
[Post 0.6][Tabular] make tabular nn dataset iterable #2395

Merged (2 commits) on Nov 23, 2022

Conversation

@liangfu (Collaborator) commented Nov 15, 2022

Description of changes:
This PR converts the map-based tabular NN dataset class into an iterable dataset, which makes batch-based data loading significantly faster.
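
As a rough illustration (a minimal, hypothetical sketch with made-up names, not the actual AutoGluon code): instead of returning one row per __getitem__ call, the dataset slices whole batches inside __iter__, so the DataLoader no longer pays per-row indexing and collation overhead.

    # Hypothetical sketch of an iterable, batch-yielding tabular dataset;
    # class and variable names are illustrative only.
    import numpy as np
    import torch
    from torch.utils.data import DataLoader, IterableDataset

    class BatchedTabularDataset(IterableDataset):
        def __init__(self, features, batch_size, drop_last=False):
            self.features = features
            self.batch_size = batch_size
            self.drop_last = drop_last
            self.num_examples = len(features)

        def __iter__(self):
            for idx_start in range(0, self.num_examples, self.batch_size):
                # Optionally skip an incomplete final batch (see the drop_last discussion below).
                if self.drop_last and (idx_start + self.batch_size) > self.num_examples:
                    break
                yield torch.as_tensor(self.features[idx_start:idx_start + self.batch_size])

    # batch_size=None: the dataset already yields full batches, so the DataLoader
    # must not batch/collate again.
    val_dataloader = DataLoader(
        BatchedTabularDataset(np.random.rand(9769, 16).astype("float32"), batch_size=512),
        batch_size=None,
    )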

For instance, the time to load the 9769 rows in test.csv was reduced from 49 ms to 9 ms.

Data loading time was measured with the following code snippet.

        # Split wall-clock time into data-loading vs. prediction time: `subtotal`
        # accumulates the time spent in predict(); the remainder of `total` is the
        # time spent in the data loader. Assumes `time` is imported and
        # `val_dataloader` / `preds_dataset` are defined in the surrounding method.
        tic = time.time()
        subtotal = 0
        for batch_idx, data_batch in enumerate(val_dataloader):
            tik = time.time()
            preds_batch = self.model.predict(data_batch)
            preds_dataset.append(preds_batch)
            subtotal += time.time() - tik
        total = time.time() - tic
        print(f"elapsed (dataloader): {(total - subtotal) * 1000:.0f} ms")
        print(f"elapsed (predict): {subtotal * 1000:.0f} ms")

Under Linux, we are able to achieve 40% overall time savings for batch_size>10000.

[chart omitted: timing comparison for batch_size > 10000]

See the following chart for more benchmark results.

[chart omitted: additional benchmark results]

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@github-actions

Job PR-2395-8f85dcb is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2395/8f85dcb/index.html

@liangfu changed the title from "[Tabular] make tabular nn dataset iterable" to "[Post 0.6][Tabular] make tabular nn dataset iterable" on Nov 17, 2022
@liangfu (Collaborator, Author) commented Nov 21, 2022

cc @tonyhoo @Innixma

@Innixma (Contributor) left a comment

Looks very good! I am able to reproduce the speedups on multiple datasets, nice work!!

@github-actions

Job PR-2395-2341393 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2395/2341393/index.html

Comment on lines +144 to +146
# Drop last batch
if self.drop_last and (idx_start + self.batch_size) > self.num_examples:
break
Contributor:

Were we dropping last before this PR?

@liangfu (Collaborator, Author):

This depends on the drop_last argument in the data loader.

Contributor:

I think I found the answer: we were, if you look at the deleted code lines.

@liangfu (Collaborator, Author):

We don't drop_last or shuffle when is_test==True; we were dropping the last batch and shuffling the dataset during training. That behavior is maintained for consistency.
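
Put differently, a minimal sketch (reusing the hypothetical BatchedTabularDataset names from the description above, not the actual AutoGluon code) of how the flags follow from the train/inference split:

    import numpy as np

    is_test = False                   # True at inference time
    features = np.random.rand(9769, 16).astype("float32")

    # Training: drop the last incomplete batch (and shuffle); inference keeps every row, in order.
    dataset = BatchedTabularDataset(
        features,
        batch_size=512,
        drop_last=not is_test,        # only drop a partial final batch during training
    )
    # Any shuffling would likewise be applied inside __iter__ only when not is_test.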
