Avoid warning in Many-Models Notebook #1971

Merged: 3 commits, Dec 10, 2022
@@ -368,7 +368,6 @@
 "\n",
 "forecasting_parameters = ForecastingParameters(\n",
 " time_column_name=TIME_COLNAME,\n",
-" drop_column_names=\"Revenue\",\n",
 " forecast_horizon=6,\n",
 " time_series_id_column_names=partition_column_names,\n",
 " cv_step_size=\"auto\",\n",

@@ -433,7 +433,6 @@
 "\n",
 "forecasting_parameters = ForecastingParameters(\n",
 " time_column_name=\"WeekStarting\",\n",
-" drop_column_names=\"Revenue\",\n",
 " forecast_horizon=6,\n",
 " time_series_id_column_names=partition_column_names,\n",
 " cv_step_size=\"auto\",\n",
@@ -469,7 +468,9 @@
 "\n",
 "Reuse of previous results (``allow_reuse``) is key when using pipelines in a collaborative environment since eliminating unnecessary reruns offers agility. Reuse is the default behavior when the ``script_name``, ``inputs``, and the parameters of a step remain the same. When reuse is allowed, results from the previous run are immediately sent to the next step. If ``allow_reuse`` is set to False, a new run will always be generated for this step during pipeline execution.\n",
 "\n",
-"> Note that we only support partitioned FileDataset and TabularDataset without partition when using such output as input."
+"> Note that we only support partitioned FileDataset and TabularDataset without partition when using such output as input.\n",
+"\n",
+"> Note that we **drop column** \"Revenue\" from the dataset in this step as this is not relevant for forecasting with the dataset used in this example. **Please modify the logic based on your data**."
 ]
 },
 {
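
The reuse rule described in the notebook text of this hunk can be pictured with a small toy model. This is not Azure ML's implementation: `StepCache`, `run_step`, the fingerprinting scheme, the script name `prep.py`, and the `horizon` parameter are all hypothetical and exist only for illustration. The sketch shows the decision itself: when the script name, inputs, and parameters are unchanged and `allow_reuse` is true, the earlier result is returned; otherwise the step body runs again.

```python
import hashlib
import json


class StepCache:
    """Toy stand-in for a pipeline service that remembers step results."""

    def __init__(self):
        self._results = {}
        self.run_count = 0  # how many times a step body actually executed

    @staticmethod
    def _fingerprint(script_name, inputs, params):
        # Stable key built from the three things the notebook text names.
        payload = json.dumps(
            {"script": script_name, "inputs": inputs, "params": params},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run_step(self, script_name, inputs, params, body, allow_reuse=True):
        key = self._fingerprint(script_name, inputs, params)
        if allow_reuse and key in self._results:
            return self._results[key]  # reuse: the body is not executed
        self.run_count += 1
        result = body(inputs, params)
        self._results[key] = result
        return result


def prepare(inputs, params):
    # Hypothetical step body; a real step would run a script on a compute target.
    return f"prepared {inputs} with {params}"


cache = StepCache()
first = cache.run_step("prep.py", ["train_10_models"], {"horizon": 6}, prepare)
second = cache.run_step("prep.py", ["train_10_models"], {"horizon": 6}, prepare)
forced = cache.run_step(
    "prep.py", ["train_10_models"], {"horizon": 6}, prepare, allow_reuse=False
)
changed = cache.run_step("prep.py", ["train_10_models"], {"horizon": 12}, prepare)

print(cache.run_count)
```

Only the second call is served from the cache. The forced call and the call with a changed parameter both execute the body, which is the behavior the notebook text attributes to `allow_reuse=False` and to modified step parameters.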

@@ -11,6 +11,12 @@ def main(args):
     dataset = run_context.input_datasets["train_10_models"]
     df = dataset.to_pandas_dataframe()
 
+    # Drop the column "Revenue" from the dataset
+    # Please remove if this is not required
+    drop_column_name = "Revenue"
+    if drop_column_name in df.columns:
+        df.drop(drop_column_name, axis=1, inplace=True)
+
     # Apply any data pre-processing techniques here
 
     df.to_parquet(output / "data_prepared_result.parquet", compression=None)
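
The guarded drop added in this hunk can be tried on a small in-memory frame. This is a minimal sketch, assuming pandas is installed. The sample data, the `Quantity` column, and the `drop_if_present` helper are made up for illustration and are not part of the PR. It uses `errors="ignore"` in place of the explicit membership check, which should behave the same way when the column is absent.

```python
import pandas as pd


def drop_if_present(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df without `column`; no error if it is missing."""
    # errors="ignore" makes drop a no-op for labels that do not exist.
    return df.drop(columns=[column], errors="ignore")


# Hypothetical sample; only "WeekStarting" and "Revenue" appear in the PR.
sample = pd.DataFrame(
    {
        "WeekStarting": pd.to_datetime(["2022-01-03", "2022-01-10"]),
        "Quantity": [10, 12],
        "Revenue": [25.0, 30.0],
    }
)

prepared = drop_if_present(sample, "Revenue")
unchanged = drop_if_present(prepared, "Revenue")  # second call is a no-op

print(list(prepared.columns))
```

Returning a new frame instead of using `inplace=True` leaves the caller's data untouched, which can make the preparation step easier to test in isolation.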