
Accelerate FastAI Preprocessing + Fix TabularNN time_limit #2909

Merged
merged 2 commits into autogluon:master on Feb 14, 2023

Conversation

@Innixma Innixma (Contributor) commented Feb 14, 2023

Issue #, if available:

Description of changes:

  • Accelerate FastAI Preprocessing
  • For datasets with many columns (>1000), this speeds up FastAI preprocessing by >300x when using pandas==1.5.3.
  • The speedup is largest with pandas==1.5.3, which has major slowdowns in several functions, but it is still substantial with older pandas versions that are not as slow (roughly 30x faster instead of 300x).
  • On the robert dataset, this speedup can cut training time from 1000 seconds to 25 seconds and inference time from 70 seconds to 0.13 seconds.
  • This PR additionally fixes another major bug: TabularNN time estimates were off by a factor of 128x, causing many situations where TabularNN would skip training because it believed it would run out of time when time_limit was specified. The bug was introduced in [Post 0.6][Tabular] make tabular nn dataset iterable #2395 and affects releases v0.6.1 and v0.6.2. The cause is that len(train_dataloader) switched from returning train_rows/batch_size to returning train_rows (see the sketch after this list).
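To illustrate the scale of the estimation error, here is a minimal sketch with illustrative numbers only (not the actual AutoGluon code; `time_per_batch` and the row/batch counts are made up) showing how a per-epoch time estimate inflates by a factor of batch_size when the dataloader length means "rows" instead of "batches":

```python
# Illustrative sketch only -- not the actual AutoGluon code.
# Shows how a per-epoch time estimate breaks when len(train_dataloader)
# changes meaning from "number of batches" to "number of rows".

train_rows = 128_000
batch_size = 128
time_per_batch = 0.01  # seconds, hypothetical measurement

# Correct estimate: the dataloader yields train_rows / batch_size batches.
batches_per_epoch = train_rows // batch_size              # 1,000 batches
correct_epoch_time = batches_per_epoch * time_per_batch   # 10 seconds

# Buggy estimate: len(train_dataloader) now returns train_rows,
# so the estimate is inflated by a factor of batch_size (128x here).
buggy_epoch_time = train_rows * time_per_batch            # 1,280 seconds

print(correct_epoch_time, buggy_epoch_time, buggy_epoch_time / correct_epoch_time)
```

With an estimate inflated this much, the model concludes it cannot finish an epoch within the remaining time_limit and skips training entirely.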

TODO:

  • Benchmark
  • Create a minimal reproducible example using only pandas and numpy, and submit a performance regression issue to pandas (see the sketch below for a possible starting point)
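As a possible starting point for that repro, the sketch below times an operation applied column-by-column on a wide DataFrame against the same work done on the underlying numpy array. The PR does not name the specific pandas functions that regressed, so `isna().sum()` here is only an illustrative stand-in for "an operation repeated across a very wide frame":

```python
# Hypothetical repro skeleton for the TODO above -- the operation being timed
# is a placeholder, not the function identified in this PR.
import time

import numpy as np
import pandas as pd

n_rows, n_cols = 10_000, 2_000  # >1000 columns, matching the PR description
df = pd.DataFrame(np.random.rand(n_rows, n_cols))

# Per-column loop: the access pattern that tends to degrade on wide frames.
start = time.perf_counter()
per_column = [df[col].isna().sum() for col in df.columns]
print(f"per-column loop:  {time.perf_counter() - start:.3f}s")

# Same work on the underlying numpy array for comparison.
start = time.perf_counter()
vectorized = np.isnan(df.to_numpy()).sum(axis=0)
print(f"numpy equivalent: {time.perf_counter() - start:.3f}s")
```

Running this under pandas==1.5.3 and an older pandas version would show whether the gap widens in the newer release, which is the evidence needed for the upstream issue.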

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Innixma Innixma added enhancement New feature or request module: tabular priority: 0 Maximum priority labels Feb 14, 2023
@Innixma Innixma added this to the 0.7 Release milestone Feb 14, 2023
@github-actions

Job PR-2909-31ad1e1 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2909/31ad1e1/index.html

@liangfu liangfu (Collaborator) left a comment

Looks awesome!

The FastAI model was the slowest model at inference time, and now it's catching up with the other models, especially for large-batch inference.

[image: inference speed comparison across models]

@Innixma Innixma added the bug Something isn't working label Feb 14, 2023
@github-actions

Job PR-2909-2364806 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2909/2364806/index.html

@Innixma Innixma (Contributor, Author) commented Feb 14, 2023

Benchmark results look good! This PR Pareto-dominates mainline (2022_02_12 is mainline, 2022_02_14 is this PR; the predict axis is based on batch_size=1):

[image: benchmark results from autogluon-benchmark run_check_inference_speed.py (autogluon-utils)]

@Innixma Innixma changed the title [Do Not Merge] Accelerate FastAI Preprocessing Accelerate FastAI Preprocessing + Fix TabularNN time_limit Feb 14, 2023
@Innixma Innixma merged commit 727d674 into autogluon:master Feb 14, 2023