add in benchmark code for parallel download #18

Merged: jdnurme merged 1 commit into main from parallel-bench on Mar 11, 2024

jdnurme commented Mar 11, 2024

A simple parallel download benchmark; instructions for running it are in the code comments.

It's a good idea to open an issue first for discussion.

  • Tests pass
  • Appropriate changes to documentation are included in the PR

jdnurme requested review from MattIrv and bernardhan33 March 11, 2024 21:31
jdnurme self-assigned this Mar 11, 2024

bernardhan33 left a comment

This is great for an initial pass. As next steps, it would be good to understand:

  1. When a fixed number of processes (<= CPU count) is already doing the download, how much performance improves from further parallelizing the download.
  2. For frameworks such as PyTorch, when a fixed number of workers (<= CPU count) is already doing the download, how much further parallelizing the download helps. There, we might not see much of an improvement because of "batch_size" and "prefetching".

jdnurme commented Mar 11, 2024

Yes, getting a full understanding of the actual impact of different values will be essential to figuring out some sort of "recommendation" function in the future.
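
One way the first question from the review could be explored is sketched below. This is not the benchmark code merged in this PR: the names `fake_download`, `download_object`, and `run_benchmark` are hypothetical, and the download itself is replaced by a `time.sleep` stand-in so the sketch runs without network access. It holds the number of outer workers fixed and varies the number of threads each worker uses per object.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_download(_chunk_id, latency_s=0.01):
    """Hypothetical stand-in for fetching one chunk: sleeps instead of doing I/O."""
    time.sleep(latency_s)
    return 1


def download_object(args):
    """Fetch all chunks of one object using `threads` threads inside this worker."""
    chunks, threads = args
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return sum(pool.map(fake_download, range(chunks)))


def run_benchmark(num_workers, threads_per_worker, objects=8,
                  chunks_per_object=8, executor_cls=ProcessPoolExecutor):
    """Return (chunks fetched, elapsed seconds) for one configuration.

    num_workers is the fixed outer parallelism (processes by default);
    threads_per_worker is the extra parallelism applied within each worker.
    """
    tasks = [(chunks_per_object, threads_per_worker)] * objects
    start = time.perf_counter()
    with executor_cls(max_workers=num_workers) as pool:
        done = sum(pool.map(download_object, tasks))
    return done, time.perf_counter() - start


if __name__ == "__main__":
    # Keep the outer worker count fixed and sweep the inner thread count.
    for threads in (1, 2, 4, 8):
        done, elapsed = run_benchmark(num_workers=2, threads_per_worker=threads)
        print(f"workers=2 threads={threads} chunks={done} elapsed={elapsed:.3f}s")
```

Because the stand-in only sleeps, the timings here say nothing about real storage throughput; replacing `fake_download` with an actual ranged read would be needed before drawing conclusions. The `executor_cls` parameter is an illustrative choice that lets the same sweep run with threads instead of processes as the outer workers.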

jdnurme merged commit 310094f into main Mar 11, 2024
jdnurme deleted the parallel-bench branch April 5, 2024 17:12