
Accelerate FastAI Preprocessing + Fix TabularNN time_limit #2909

Merged
merged 2 commits into autogluon:master on Feb 14, 2023

Conversation

@Innixma Innixma (Contributor) commented Feb 14, 2023

Issue #, if available:

Description of changes:

  • Accelerate FastAI Preprocessing
  • For datasets with many columns (>1000), this speeds up FastAI preprocessing by >300x when using pandas==1.5.3.
  • The speedup is largest with pandas==1.5.3, which has major slowdowns in several functions, but it is still substantial with older pandas versions that are not as slow (roughly 30x faster instead of 300x).
  • On the robert dataset, this speedup can cut training time from 1000 seconds to 25 seconds and inference time from 70 seconds to 0.13 seconds.
  • This PR additionally fixes another major bug: TabularNN time estimates were off by a factor of 128x, causing many situations where TabularNN would skip training because it believed it would run out of time when time_limit was specified. The bug was introduced in [Post 0.6][Tabular] make tabular nn dataset iterable #2395 and affects releases v0.6.1 and v0.6.2. The cause is that len(train_dataloader) switched from returning train_rows/batch_size to returning train_rows (see the sketch after this list).
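To illustrate the scale of the estimation error, here is a minimal sketch with illustrative numbers only (not the actual AutoGluon code; `time_per_batch` and the row/batch counts are made up) showing how a per-epoch time estimate inflates by a factor of batch_size when the dataloader length means "rows" instead of "batches":

```python
# Illustrative sketch only -- not the actual AutoGluon code.
# Shows how a per-epoch time estimate breaks when len(train_dataloader)
# changes meaning from "number of batches" to "number of rows".

train_rows = 128_000
batch_size = 128
time_per_batch = 0.01  # seconds, hypothetical measurement

# Correct estimate: the dataloader yields train_rows / batch_size batches.
batches_per_epoch = train_rows // batch_size              # 1,000 batches
correct_epoch_time = batches_per_epoch * time_per_batch   # 10 seconds

# Buggy estimate: len(train_dataloader) now returns train_rows,
# so the estimate is inflated by a factor of batch_size (128x here).
buggy_epoch_time = train_rows * time_per_batch            # 1,280 seconds

print(correct_epoch_time, buggy_epoch_time, buggy_epoch_time / correct_epoch_time)
```

With an estimate inflated this much, the model concludes it cannot finish an epoch within the remaining time_limit and skips training entirely.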

TODO:

  • Benchmark
  • Create a minimal reproducible example using only pandas and numpy, and submit a performance regression issue to pandas (see the sketch below for a possible starting point)
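As a possible starting point for that repro, the sketch below times an operation applied column-by-column on a wide DataFrame against the same work done on the underlying numpy array. The PR does not name the specific pandas functions that regressed, so `isna().sum()` here is only an illustrative stand-in for "an operation repeated across a very wide frame":

```python
# Hypothetical repro skeleton for the TODO above -- the operation being timed
# is a placeholder, not the function identified in this PR.
import time

import numpy as np
import pandas as pd

n_rows, n_cols = 10_000, 2_000  # >1000 columns, matching the PR description
df = pd.DataFrame(np.random.rand(n_rows, n_cols))

# Per-column loop: the access pattern that tends to degrade on wide frames.
start = time.perf_counter()
per_column = [df[col].isna().sum() for col in df.columns]
print(f"per-column loop:  {time.perf_counter() - start:.3f}s")

# Same work on the underlying numpy array for comparison.
start = time.perf_counter()
vectorized = np.isnan(df.to_numpy()).sum(axis=0)
print(f"numpy equivalent: {time.perf_counter() - start:.3f}s")
```

Running this under pandas==1.5.3 and an older pandas version would show whether the gap widens in the newer release, which is the evidence needed for the upstream issue.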

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added enhancement New feature or request module: tabular priority: 0 Maximum priority labels Feb 14, 2023
@Innixma Innixma added this to the 0.7 Release milestone Feb 14, 2023
@github-actions

Job PR-2909-31ad1e1 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2909/31ad1e1/index.html

@liangfu liangfu (Collaborator) left a comment

Looks awesome!

The FastAI model was the slowest model at inference time, and now it's catching up with the other models, especially for large-batch inference.

[image: inference speed comparison across models]

@Innixma Innixma added the bug Something isn't working label Feb 14, 2023
@github-actions

Job PR-2909-2364806 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2909/2364806/index.html

@Innixma Innixma (Contributor, Author) commented Feb 14, 2023

Benchmark results look good! This PR Pareto-dominates mainline (2022_02_12 is mainline, 2022_02_14 is this PR; the predict axis is based on batch_size=1):

[image: benchmark results from autogluon-benchmark run_check_inference_speed.py (autogluon-utils)]

@Innixma Innixma changed the title [Do Not Merge] Accelerate FastAI Preprocessing Accelerate FastAI Preprocessing + Fix TabularNN time_limit Feb 14, 2023
@Innixma Innixma merged commit 727d674 into autogluon:master Feb 14, 2023