Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 14 additions & 5 deletions src/fev/adapters.py
Original file line number Diff line number Diff line change
Expand Up @@ -100,15 +100,24 @@ class GluonTSAdapter(PandasAdapter):
"""Converts dataset to format required by GluonTS."""

@staticmethod
def _convert_dtypes(df: pd.DataFrame, float_dtype: str = "float32") -> pd.DataFrame:
def _convert_dtypes(
df: pd.DataFrame,
id_column: str,
category_as_ordinal: bool = False,
float_dtype: str = "float32",
) -> pd.DataFrame:
"""Convert numeric dtypes to float32 and object dtypes to category"""
astype_dict = {}
for col in df.columns:
if pd.api.types.is_object_dtype(df[col]):
astype_dict[col] = "category"
elif pd.api.types.is_numeric_dtype(df[col]):
astype_dict[col] = float_dtype
return df.astype(astype_dict)
df = df.astype(astype_dict)
if category_as_ordinal:
cat_cols = [col for col in df.select_dtypes(include="category").columns if col != id_column]
df = df.assign(**{col: df[col].cat.codes for col in cat_cols})
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible alternatives:

  • automatically one-hot-encode categorical columns
  • drop categorical columns

Ideally, this should be a configurable option, but currently the fev.convert_input_data method does not allow routing kwargs to the individual adapters. @abdulfatir what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just discussed target encoding as a good option w/ @abdulfatir . why not also use it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial idea was that adapters perform the bare minimum preprocessing such that the data can be consumed by the respective frameworks, but I agree that we can also incorporate the best practices here.

If we go for target encoding, we should probably enable/disable it via an optional argument to the GluonTSAdapter. Currently these are not supported since fev.convert_input_data does not forward kwargs to the adapters.

How about we

  1. Merge this (or some other simple strategy) as a simple default that unbreaks GluonTS models with covaraites
  2. Add a better strategy after the Task refactor with an optional argument to the GluonTSAdapter?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote for putting as little model-related stuff here as possible. If the user wants to do other types of encodings, they should do this on the model side.

return df

@classmethod
def convert_input_data(
Expand All @@ -135,10 +144,10 @@ def convert_input_data(
static_columns=static_columns,
)

past_df = cls._convert_dtypes(past_df)
future_df = cls._convert_dtypes(future_df)
past_df = cls._convert_dtypes(past_df, id_column=id_column, category_as_ordinal=True)
future_df = cls._convert_dtypes(future_df, id_column=id_column, category_as_ordinal=True)
if static_df is not None:
static_df = cls._convert_dtypes(static_df.set_index(id_column))
static_df = cls._convert_dtypes(static_df.set_index(id_column), id_column=id_column)
else:
static_df = pd.DataFrame()

Expand Down