
Simplified infer speed throughput calculation #2465

Merged (2 commits into master on Nov 23, 2022)

Conversation

@Innixma (Contributor) commented on Nov 23, 2022

Issue #, if available:

Description of changes:

  • Simplifies the infer speed throughput calculation for the case where you need to compute throughput at multiple batch sizes and plot the results.

Example script:

import pandas as pd

from autogluon.tabular import TabularPredictor, TabularDataset
from autogluon.core.utils.infer_utils import get_model_true_infer_speed_per_row_batch_bulk


if __name__ == '__main__':
    train_path = 'https://autogluon.s3.amazonaws.com/datasets/CoverTypeMulticlassClassification/train_data.csv'
    test_path = 'https://autogluon.s3.amazonaws.com/datasets/CoverTypeMulticlassClassification/test_data.csv'
    label = 'Cover_Type'

    train_data = TabularDataset(train_path)
    subsample_size = 1000  # subsample subset of data for faster demo, try setting this to much larger values
    if subsample_size is not None and subsample_size < len(train_data):
        train_data = train_data.sample(n=subsample_size, random_state=0)


    hyperparameters = {
        'GBM': {},
        'NN_TORCH': {},
    }

    predictor = TabularPredictor(
        label=label,
    )

    predictor.fit(
        train_data,
        fit_weighted_ensemble=False,
        hyperparameters=hyperparameters,
    )

    test_data = TabularDataset(test_path)

    predictor.persist_models('all')
    leaderboard = predictor.leaderboard(test_data)

    repeats = 2
    batch_sizes = [
        1,
        10,
        100,
        1000,
        10000,
        100000,
    ]

    infer_df_full, infer_df_full_transform = get_model_true_infer_speed_per_row_batch_bulk(
        data=test_data,
        predictor=predictor,
        batch_sizes=batch_sizes,
        repeats=repeats,
        include_transform_features=True,
    )

    # pred_time_test is seconds per row, so its reciprocal is rows per second
    infer_df_full['rows_per_second'] = 1 / infer_df_full['pred_time_test']

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    fig.set_size_inches(12, 12)
    fig.set_dpi(300)

    plt.xscale('log')
    plt.yscale('log')

    models = list(infer_df_full['model'].unique())
    for model in models:
        infer_df_model = infer_df_full[infer_df_full['model'] == model]
        ax.plot(infer_df_model['batch_size'].values, infer_df_model['rows_per_second'].values, label=model)

    ax.set(xlabel='batch_size', ylabel='rows_per_second',
           title='Rows per second inference throughput by data batch_size (CoverType)')
    ax.grid()
    ax.legend()
    fig.savefig('infer_speed.png', dpi=300)
    plt.show()

Example output:

Throughput for batch_size=1:
	0.021s per row | LightGBM
	0.029s per row | NeuralNetTorch
	0.019s per row | transform_features
Throughput for batch_size=10:
	2.076ms per row | LightGBM
	2.934ms per row | NeuralNetTorch
	1.872ms per row | transform_features
Throughput for batch_size=100:
	0.257ms per row | LightGBM
	0.318ms per row | NeuralNetTorch
	0.207ms per row | transform_features
Throughput for batch_size=1000:
	0.046ms per row | LightGBM
	0.035ms per row | NeuralNetTorch
	0.021ms per row | transform_features
Throughput for batch_size=10000:
	0.025ms per row | LightGBM
	8.97μs per row | NeuralNetTorch
	3.032μs per row | transform_features
Throughput for batch_size=100000:
	0.025ms per row | LightGBM
	8.815μs per row | NeuralNetTorch
	4.065μs per row | transform_features
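
The per-row latency lines above can be printed from the returned DataFrame. A minimal sketch of such a summary, continuing from the example script and assuming the 'model', 'batch_size', and 'pred_time_test' (seconds-per-row) columns used in the plotting code (the exact schema returned by get_model_true_infer_speed_per_row_batch_bulk may differ):

for batch_size in sorted(infer_df_full['batch_size'].unique()):
    print(f'Throughput for batch_size={batch_size}:')
    infer_df_batch = infer_df_full[infer_df_full['batch_size'] == batch_size]
    for _, row in infer_df_batch.iterrows():
        time_per_row = row['pred_time_test']  # seconds per row
        # pick a readable unit: seconds, milliseconds, or microseconds
        if time_per_row >= 0.01:
            time_str = f'{time_per_row:.3f}s'
        elif time_per_row >= 1e-5:
            time_str = f'{time_per_row * 1e3:.3f}ms'
        else:
            time_str = f'{time_per_row * 1e6:.3f}μs'
        print(f'\t{time_str} per row | {row["model"]}')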

[Image: infer_speed_adult_rf_onnx — rows-per-second inference throughput by batch size]

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@liangfu (Collaborator) left a comment:

Left a comment, otherwise looks good.

Review thread on core/src/autogluon/core/utils/infer_utils.py (resolved)
@liangfu (Collaborator) left a comment:

Looks great! Thanks for the contribution.

@github-actions commented:

Job PR-2465-bcd89d5 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2465/bcd89d5/index.html

@github-actions commented:

Job PR-2465-581ab2a is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2465/581ab2a/index.html

@Innixma Innixma merged commit cdde0ac into master Nov 23, 2022
@Innixma Innixma deleted the simplify_infer_calc branch January 18, 2023 18:05