[timeseries] add feature importance to TimeSeriesPredictor #4033

Merged

Conversation

canerturkmen
Contributor

@canerturkmen canerturkmen commented Apr 3, 2024

Issue #, if available:

#3924

Description of changes:

This draft PR introduces the high-level design for TimeSeriesPredictor.feature_importance.

Additional to-dos:

  • Fix feature transformation logic to prevent shuffling the held-out future
  • Add override logic so that features not used by any model are skipped during computation and assigned an importance of 0
  • Add unit tests
  • (stretch) Revisit prediction caching logic to prevent models that don't use features from re-inferring

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@canerturkmen canerturkmen added the module: timeseries related to the timeseries module label Apr 3, 2024
@canerturkmen canerturkmen added this to the 1.1 Release milestone Apr 3, 2024
@canerturkmen canerturkmen requested a review from shchur April 3, 2024 09:07
@canerturkmen canerturkmen added enhancement New feature or request priority: 0 Maximum priority labels Apr 3, 2024
@canerturkmen canerturkmen mentioned this pull request Apr 3, 2024
23 tasks
@yinweisu
Collaborator

yinweisu commented Apr 3, 2024

Previous CI Run Current CI Run

Collaborator

@shchur shchur left a comment

This looks very elegant!

The main points that I want to discuss:

  1. Moving all feature importance logic to Trainer
  2. Using statistics of train data for masking
  3. Using the full train_data/tuning_data by default
  4. Fixing seed for determinism
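
Points 2 and 4 above can be illustrated with a rough, standard-library-only sketch. It is not the code from this PR: the helper names (`mask_with_train_statistic`, `permute_feature`, `importance`), the use of the median as the train statistic, and the toy model are all assumptions made for illustration.

```python
import random
import statistics
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mask_with_train_statistic(rows: List[Row], feature: str, train_rows: List[Row]) -> List[Row]:
    """Replace `feature` with its median over the training rows (hypothetical choice of statistic)."""
    fill_value = statistics.median(r[feature] for r in train_rows)
    return [{**r, feature: fill_value} for r in rows]


def permute_feature(rows: List[Row], feature: str, seed: int) -> List[Row]:
    """Shuffle the values of `feature` across rows; a fixed seed makes the result repeatable."""
    values = [r[feature] for r in rows]
    random.Random(seed).shuffle(values)
    return [{**r, feature: v} for r, v in zip(rows, values)]


def importance(
    predict: Callable[[Row], float],
    rows: List[Row],
    target: Sequence[float],
    perturbed_rows: List[Row],
) -> float:
    """Increase in error after perturbing a feature (larger means more important)."""
    base = mean_absolute_error(target, [predict(r) for r in rows])
    perturbed = mean_absolute_error(target, [predict(r) for r in perturbed_rows])
    return perturbed - base


# Toy data: the model only looks at "price", so "noise" should get zero importance.
train_rows = [{"price": 1.0, "noise": 5.0}, {"price": 2.0, "noise": 7.0}, {"price": 3.0, "noise": 9.0}]
eval_rows = [{"price": 4.0, "noise": 1.0}, {"price": 6.0, "noise": 3.0}]
target = [8.0, 12.0]


def toy_predict(row: Row) -> float:
    return 2.0 * row["price"]


price_importance = importance(
    toy_predict, eval_rows, target, mask_with_train_statistic(eval_rows, "price", train_rows)
)
noise_importance = importance(
    toy_predict, eval_rows, target, mask_with_train_statistic(eval_rows, "noise", train_rows)
)
```

Both perturbation helpers return new rows instead of mutating their input, so the same evaluation data can be reused for every feature.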

timeseries/src/autogluon/timeseries/predictor.py (outdated, resolved)
timeseries/src/autogluon/timeseries/predictor.py (outdated, resolved)
timeseries/src/autogluon/timeseries/predictor.py (outdated, resolved)
timeseries/src/autogluon/timeseries/predictor.py (outdated, resolved)
timeseries/src/autogluon/timeseries/utils/features.py (outdated, resolved)
timeseries/src/autogluon/timeseries/utils/features.py (outdated, resolved)
)

if features is None:
    features = feature_importance_transform.covariate_metadata.all_features
Collaborator

Some features may be present in data.columns or data.static_features.columns but missing from covariate_metadata.all_features, because TimeSeriesFeatureGenerator removes non-informative features from the data (e.g., a duplicate feature, a feature with a constant value, a feature that takes a different value in every row, or a feature with an unrecognized dtype such as datetime).

We should manually check the columns in the original data and assign a feature importance of 0 to columns that are missing from the transformed data.

Contributor

This is what is done in Tabular I believe
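
The zero-importance fallback discussed in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions: `fill_missing_importance` and the column names are hypothetical, and the actual implementation in AutoGluon may differ.

```python
from typing import Dict, Iterable


def fill_missing_importance(
    original_columns: Iterable[str], computed: Dict[str, float]
) -> Dict[str, float]:
    """Return an importance for every original column.

    Columns that were dropped before importance was computed (and are
    therefore absent from `computed`) are assigned 0.0.
    """
    return {col: computed.get(col, 0.0) for col in original_columns}


# Hypothetical example: "constant_flag" was removed as non-informative,
# so no importance was computed for it.
original_columns = ["promo", "price", "constant_flag"]
computed_importance = {"promo": 0.12, "price": 0.30}

full_importance = fill_missing_importance(original_columns, computed_importance)
```

Iterating over the original columns, not over the computed ones, guarantees that every feature the user passed in shows up in the result.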


github-actions bot commented Apr 3, 2024

Job PR-4033-32a1e3d is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4033/32a1e3d/index.html

@yinweisu
Collaborator

yinweisu commented Apr 3, 2024

Previous CI Run        | Current CI Run
pydantic_core==2.16.3  | -
ray==2.10.0            | ray==2.6.3
pydantic==2.6.4        | pydantic==1.10.15
annotated-types==0.6.0 | -
platformdirs==4.2.0    | platformdirs==3.11.0
virtualenv==20.25.1    | virtualenv==20.21.0
ray==2.10.0            | ray==2.6.3
-                      | gpustat==1.1.1
pydantic==2.6.4        | pydantic==1.10.15
-                      | blessed==1.20.0
-                      | nvidia-ml-py==12.535.133
platformdirs==4.2.0    | platformdirs==3.11.0
virtualenv==20.25.1    | virtualenv==20.21.0


github-actions bot commented Apr 3, 2024

Job PR-4033-c0a0175 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4033/c0a0175/index.html

Contributor

@Innixma Innixma left a comment

Added initial review

@canerturkmen
Contributor Author

> Added initial review

Thanks for the review, @Innixma! This was a first cut at pinning down the design. I'll base my replies on that.

@yinweisu
Collaborator

yinweisu commented Apr 4, 2024

Previous CI Run Current CI Run


github-actions bot commented Apr 4, 2024

Job PR-4033-fe04e5f is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4033/fe04e5f/index.html

timeseries/src/autogluon/timeseries/learner.py (outdated, resolved)
timeseries/src/autogluon/timeseries/predictor.py (outdated, resolved)
timeseries/src/autogluon/timeseries/predictor.py (outdated, resolved)
timeseries/src/autogluon/timeseries/predictor.py (outdated, resolved)
timeseries/src/autogluon/timeseries/utils/features.py (outdated, resolved)
timeseries/src/autogluon/timeseries/utils/features.py (outdated, resolved)
timeseries/src/autogluon/timeseries/utils/features.py (outdated, resolved)
timeseries/tests/unittests/test_features.py (resolved)
@yinweisu
Collaborator

yinweisu commented Apr 4, 2024

Previous CI Run   | Current CI Run
botocore==1.34.77 | botocore==1.34.78
boto3==1.34.77    | boto3==1.34.78
botocore==1.34.77 | botocore==1.34.78
boto3==1.34.77    | boto3==1.34.78

@yinweisu
Collaborator

yinweisu commented Apr 4, 2024

Previous CI Run Current CI Run


github-actions bot commented Apr 4, 2024

Job PR-4033-0898cb0 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4033/0898cb0/index.html

@yinweisu
Collaborator

yinweisu commented Apr 5, 2024

Previous CI Run Current CI Run

@yinweisu
Collaborator

yinweisu commented Apr 5, 2024

Previous CI Run Current CI Run

@canerturkmen canerturkmen marked this pull request as ready for review April 5, 2024 09:05
@yinweisu
Collaborator

yinweisu commented Apr 5, 2024

Previous CI Run Current CI Run


github-actions bot commented Apr 5, 2024

Job PR-4033-5051648 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4033/5051648/index.html

@yinweisu
Collaborator

yinweisu commented Apr 5, 2024

Previous CI Run           | Current CI Run
typing_extensions==4.10.0 | typing_extensions==4.11.0
typing_extensions==4.10.0 | typing_extensions==4.11.0

@canerturkmen canerturkmen force-pushed the ts-feature-importance-transforms branch from c517448 to 68b8095 Compare April 5, 2024 13:04
@yinweisu
Collaborator

yinweisu commented Apr 5, 2024

Previous CI Run  | Current CI Run
lazy_loader==0.3 | lazy_loader==0.4
lazy_loader==0.3 | lazy_loader==0.4

Collaborator

@shchur shchur left a comment

LGTM, thank you for taking care of this massive feature! 🚀

@yinweisu
Collaborator

yinweisu commented Apr 5, 2024

Previous CI Run Current CI Run

@shchur shchur merged commit 3cec54a into autogluon:master Apr 5, 2024
27 checks passed
@canerturkmen canerturkmen deleted the ts-feature-importance-transforms branch April 5, 2024 14:31

github-actions bot commented Apr 5, 2024

Job PR-4033-0c74a01 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-4033/0c74a01/index.html

Labels
enhancement New feature or request module: timeseries related to the timeseries module priority: 0 Maximum priority

4 participants