
Simplified infer speed throughput calculation #2465

Merged (2 commits into master on Nov 23, 2022)

Conversation

@Innixma (Contributor) commented on Nov 23, 2022

Issue #, if available:

Description of changes:

  • Simplifies the infer speed throughput calculation for the case where you need to compute throughput at multiple batch sizes and plot the results.

Example script:

import pandas as pd

from autogluon.tabular import TabularPredictor, TabularDataset
from autogluon.core.utils.infer_utils import get_model_true_infer_speed_per_row_batch_bulk


if __name__ == '__main__':
    train_path = 'https://autogluon.s3.amazonaws.com/datasets/CoverTypeMulticlassClassification/train_data.csv'
    test_path = 'https://autogluon.s3.amazonaws.com/datasets/CoverTypeMulticlassClassification/test_data.csv'
    label = 'Cover_Type'

    train_data = TabularDataset(train_path)
    subsample_size = 1000  # subsample subset of data for faster demo, try setting this to much larger values
    if subsample_size is not None and subsample_size < len(train_data):
        train_data = train_data.sample(n=subsample_size, random_state=0)


    hyperparameters = {
        'GBM': {},
        'NN_TORCH': {},
    }

    predictor = TabularPredictor(
        label=label,
    )

    predictor.fit(
        train_data,
        fit_weighted_ensemble=False,
        hyperparameters=hyperparameters,
    )

    test_data = TabularDataset(test_path)

    predictor.persist_models('all')
    leaderboard = predictor.leaderboard(test_data)

    repeats = 2
    batch_sizes = [
        1,
        10,
        100,
        1000,
        10000,
        100000,
    ]

    infer_df_full, infer_df_full_transform = get_model_true_infer_speed_per_row_batch_bulk(
        data=test_data,
        predictor=predictor,
        batch_sizes=batch_sizes,
        repeats=repeats,
        include_transform_features=True,
    )

    # pred_time_test is seconds per row, so its reciprocal is rows per second
    infer_df_full['rows_per_second'] = 1 / infer_df_full['pred_time_test']

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    fig.set_size_inches(12, 12)
    fig.set_dpi(300)

    plt.xscale('log')
    plt.yscale('log')

    models = list(infer_df_full['model'].unique())
    for model in models:
        infer_df_model = infer_df_full[infer_df_full['model'] == model]
        ax.plot(infer_df_model['batch_size'].values, infer_df_model['rows_per_second'].values, label=model)

    ax.set(xlabel='batch_size', ylabel='rows_per_second',
           title='Rows per second inference throughput by data batch_size (CoverType)')
    ax.grid()
    ax.legend()
    fig.savefig('infer_speed.png', dpi=300)
    plt.show()

Example output:

Throughput for batch_size=1:
	0.021s per row | LightGBM
	0.029s per row | NeuralNetTorch
	0.019s per row | transform_features
Throughput for batch_size=10:
	2.076ms per row | LightGBM
	2.934ms per row | NeuralNetTorch
	1.872ms per row | transform_features
Throughput for batch_size=100:
	0.257ms per row | LightGBM
	0.318ms per row | NeuralNetTorch
	0.207ms per row | transform_features
Throughput for batch_size=1000:
	0.046ms per row | LightGBM
	0.035ms per row | NeuralNetTorch
	0.021ms per row | transform_features
Throughput for batch_size=10000:
	0.025ms per row | LightGBM
	8.97μs per row | NeuralNetTorch
	3.032μs per row | transform_features
Throughput for batch_size=100000:
	0.025ms per row | LightGBM
	8.815μs per row | NeuralNetTorch
	4.065μs per row | transform_features
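
The per-row latency lines above can be printed from the returned DataFrame. A minimal sketch of such a summary, continuing from the example script and assuming the 'model', 'batch_size', and 'pred_time_test' (seconds-per-row) columns used in the plotting code (the exact schema returned by get_model_true_infer_speed_per_row_batch_bulk may differ):

for batch_size in sorted(infer_df_full['batch_size'].unique()):
    print(f'Throughput for batch_size={batch_size}:')
    infer_df_batch = infer_df_full[infer_df_full['batch_size'] == batch_size]
    for _, row in infer_df_batch.iterrows():
        time_per_row = row['pred_time_test']  # seconds per row
        # pick a readable unit: seconds, milliseconds, or microseconds
        if time_per_row >= 0.01:
            time_str = f'{time_per_row:.3f}s'
        elif time_per_row >= 1e-5:
            time_str = f'{time_per_row * 1e3:.3f}ms'
        else:
            time_str = f'{time_per_row * 1e6:.3f}μs'
        print(f'\t{time_str} per row | {row["model"]}')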

[Image: infer_speed_adult_rf_onnx — rows-per-second inference throughput by batch size]

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@liangfu (Collaborator) left a comment:

Left a comment, otherwise looks good.

Review thread on core/src/autogluon/core/utils/infer_utils.py (resolved)
@liangfu (Collaborator) left a comment:

Looks great! Thanks for the contribution.

@github-actions commented:

Job PR-2465-bcd89d5 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2465/bcd89d5/index.html

@github-actions commented:

Job PR-2465-581ab2a is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-2465/581ab2a/index.html

@Innixma Innixma merged commit cdde0ac into master Nov 23, 2022
@Innixma Innixma deleted the simplify_infer_calc branch January 18, 2023 18:05