
Switch from nn.DataParallel to nn.DistributedDataParallel for realtime inference #3565

Draft · wants to merge 10 commits into master

Conversation

tonyhoo (Collaborator) commented Oct 6, 2023

Issue #, if available:

Description of changes:
Switch from DataParallel to DistributedDataParallel, as recommended in the latest PyTorch docs.

However, after the switch, real-time inference speed is impacted significantly due to the overhead of spawning new processes, collecting results from those processes, and moving data between the CPU and GPUs. See the plot below:

[Figure: execution_time_plot.png, box plot of per-run execution time (s) for the ddp and dp strategies]

Script used to generate the plot (requires this PR's code changes):

import time
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from autogluon.multimodal import MultiModalPredictor

warnings.filterwarnings('ignore')
np.random.seed(123)

if __name__ == '__main__':
    download_dir = './ag_multimodal_tutorial'
    dataset_path = f'{download_dir}/petfinder_for_tutorial'
    test_data = pd.read_csv(f'{dataset_path}/test.csv', index_col=0)
    label_col = 'AdoptionSpeed'

    predictor = MultiModalPredictor.load("/home/ubuntu/workplace/autogluon/autogluon/multimodal/AutogluonModels/ag-20231016_231146")

    strategies = ["ddp", "dp"]

    num_runs = 30  # number of runs for each strategy
    results = []
    for strategy in strategies:
        for i in range(num_runs):
            start = time.time()
            # the 'strategy' keyword requires this PR's changes
            predictions = predictor.predict(test_data.drop(columns=label_col), realtime=True, strategy=strategy)
            end = time.time()
            if i == 0:
                continue  # discard the first run as warm-up
            results.append({'strategy': strategy, 'execution_time': end - start})

    results_df = pd.DataFrame(results)

    # save results to a csv file
    results_df.to_csv("results.csv", index=False)

    # print the average execution time for each strategy
    average_times = results_df.groupby('strategy')['execution_time'].mean()
    print(average_times)

    # plot the results; save before show() so the written figure is not blank
    results_df.boxplot(column='execution_time', by='strategy')
    plt.ylabel('Execution Time (s)')
    plt.savefig('execution_time_plot.png')
    plt.show()

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@tonyhoo added the label "resource: GPU" (Related to GPU) on Oct 12, 2023
@tonyhoo changed the title from "[Draft] Switch from nn.DataParallel to nn.DistributedDataParallel for realtime inference" to "[WIP] Switch from nn.DataParallel to nn.DistributedDataParallel for realtime inference" on Oct 12, 2023
@github-actions commented:
Job PR-3565-9201202 is done.
Docs are uploaded to http://autogluon-staging.s3-website-us-west-2.amazonaws.com/PR-3565/9201202/index.html

@tonyhoo changed the title from "[WIP] Switch from nn.DataParallel to nn.DistributedDataParallel for realtime inference" to "Switch from nn.DataParallel to nn.DistributedDataParallel for realtime inference" on Oct 16, 2023
@@ -176,7 +176,7 @@ def infer_batch(
     device = torch.device(device_type)
     batch_size = len(batch[next(iter(batch))])
     if 1 < num_gpus <= batch_size:
-        model = nn.DataParallel(model)
+        model = nn.parallel.DistributedDataParallel(model)
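
For context on the hunk above: unlike DataParallel, DistributedDataParallel expects one process per GPU, each of which must join a process group before the wrapper can be constructed. A minimal, hypothetical sketch of that setup (the toy nn.Linear model and all names below are placeholders, not AutoGluon code), illustrating where the per-call spawn and synchronization overhead discussed in this PR comes from:

import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch import nn

def run_ddp_inference(rank, world_size):
    # Every replica must join the process group before DDP can wrap the model;
    # DataParallel needs none of this setup.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    model = nn.Linear(16, 4).to(rank)  # toy stand-in for the real model
    ddp_model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])

    with torch.no_grad():
        shard = torch.randn(8, 16, device=rank)  # this rank's slice of the batch
        _ = ddp_model(shard)

    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    # One process per GPU is spawned, unlike DataParallel's in-process scatter/gather.
    mp.spawn(run_ddp_inference, args=(world_size,), nprocs=world_size)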
Contributor commented:
Did you test realtime inference in a multi-GPU environment and compare the inference times?

tonyhoo (Collaborator, Author) replied:
Updated the code and added testing results for 30 runs each of DDP and DP. DDP has much higher latency for real-time inference, largely due to the results-collection operation. Based on these results, I don't recommend using DDP for real-time inference.
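
For anyone reading along, a hypothetical sketch of the results-collection step described above (gather_predictions and its inputs are illustrative, not the actual AutoGluon code): with DDP each rank holds only its own shard of the predictions, so a collective call is needed per request to reassemble the full output, and that per-request synchronization is the latency cost showing up in the plot.

import torch.distributed as dist

def gather_predictions(local_preds, world_size):
    # Each rank contributes its shard; all_gather_object pickles the objects
    # and moves them across process boundaries, blocking until every rank
    # arrives. This per-request synchronization dominates DDP latency here.
    gathered = [None] * world_size
    dist.all_gather_object(gathered, local_preds)
    # Flatten the per-rank shards back into a single prediction list.
    return [p for shard in gathered for p in shard]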

@tonyhoo tonyhoo marked this pull request as draft October 23, 2023 21:13