Corrdiff - Make IO Asynchronous with inference #305

daviddpruitt · 2024-01-18T22:10:34Z

Modulus Pull Request

Description

Make IO asynchronous so that inference doesn't stall while results are being written.
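For readers unfamiliar with the pattern, here is a minimal sketch of overlapping result writes with inference by using a pool of writer threads. It is an illustration only, not the code from this PR: `run_inference`, `write_result`, `generate`, and the `num_writer_workers` parameter name are all hypothetical stand-ins, and the `time.sleep` calls merely simulate compute and disk latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_inference(step):
    """Hypothetical stand-in for a model forward pass."""
    time.sleep(0.01)  # simulate compute
    return [step] * 4


def write_result(step, data, sink):
    """Hypothetical stand-in for a slow write (e.g. to disk)."""
    time.sleep(0.05)  # simulate IO latency
    sink[step] = list(data)
    return step


def generate(num_steps, num_writer_workers=2):
    """Run inference step by step while writes proceed in background threads."""
    sink = {}
    futures = []
    with ThreadPoolExecutor(max_workers=num_writer_workers) as pool:
        for step in range(num_steps):
            data = run_inference(step)
            # Hand the write to a worker and continue with the next step
            # instead of blocking on IO.
            futures.append(pool.submit(write_result, step, data, sink))
        # Leaving the with-block waits for all pending writes to finish.
    for future in futures:
        future.result()  # re-raise any exception raised inside a writer
    return sink


if __name__ == "__main__":
    results = generate(num_steps=5)
    print(sorted(results))
```

Threads suit this case under the assumption that the write path is IO-bound; the worker count is a tunable, which matches the "Make number of writer workers parameterized" commit in spirit only.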

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.

Dependencies

mnabian

Could you please revert changes to README.md and blossom-ci.yaml? Those are likely coming from a rebase from main.

mnabian · 2024-01-18T22:19:38Z

/blossom-ci

akshaysubr · 2024-01-19T17:16:22Z

/blossom-ci

mnabian · 2024-01-22T21:56:05Z

/blossom-ci

* Refactor saving results to improve perf and overlap with inferencing
* Update blossom-ci.yml (NVIDIA#295)
* Change pip install commands with the correct PyPI package name (NVIDIA#298)
* Make number of writer workers parameterized
* add comment for writer workers
* revert readme and blossom-ci
* 2nd revert of readme
* fix black formatting
---------
Co-authored-by: David Pruitt <dpruitt@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>

* Fix TypeError in CorrDiff dataloader (#290)
* Update dataset.py
* Update formatting
---------
Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: Mohammad Amin Nabian <mnabian@nvidia.com>
* Update to 0.5.0 version (#293)
* Fix diffusion example imports. (#294)
* Fix pydantic issues (#302)
* Fix minor bugs in corrdiff training (#303)
* fix small bugs
* formatting
* add documentation for task
* Fix zenith angle import, pydantic issues (#304)
* Corrdiff - Make IO Asynchronous with inference (#305)
* Refactor saving results to improve perf and overlap with inferencing
* Update blossom-ci.yml (#295)
* Change pip install commands with the correct PyPI package name (#298)
* Make number of writer workers parameterized
* add comment for writer workers
* revert readme and blossom-ci
* 2nd revert of readme
* fix black formatting
---------
Co-authored-by: David Pruitt <dpruitt@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>
* CorrDiff integration: separate regression and diffusion configs (#308)
* Update blossom-ci.yml (#295)
* Change pip install commands with the correct PyPI package name (#298)
* update configs
* remove mixture loss
* remove mixture loss from init
---------
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>
* Move URL routines to examples dir (#309)
* Update blossom-ci.yml (#295)
* Change pip install commands with the correct PyPI package name (#298)
* reorganize
* remove from init
---------
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>
* CorrDiff: Adding NVTX markers and removing redundant regression model inference (#306)
* Refactor saving results to improve perf and overlap with inferencing
* Update blossom-ci.yml (#295)
* Change pip install commands with the correct PyPI package name (#298)
* Make number of writer workers parameterized
* add comment for writer workers
* revert readme and blossom-ci
* 2nd revert of readme
* Adding NVTX markers and removing redundant regression model inference
* Code formatting
* Adding nvtx to pyproject.toml
* Updating CHANGELOG.md
---------
Co-authored-by: David Pruitt <dpruitt@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>
Co-authored-by: David Pruitt <dpruitt@nvidia.com>
Co-authored-by: David Pruitt <daviddpruitt@gmail.com>
Co-authored-by: Akshay Subramaniam <asubramaniam@login-eos01.eos.clusters.nvidia.com>
* Allowing channels last group norms
Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>
* Adding channels last convolutions
Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>
* Adding torch.compile to fix channels last groupnorm
Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>
* Adding main guard in score_samples.py
Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>
* Updating changelog and some cleanup
Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>
* Making torch.compile configurable
Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>
* Keep original group norm path for training and use custom version only for inference
Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>
---------
Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>
Co-authored-by: Tao Ge <115046371+tge25@users.noreply.github.com>
Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: Mohammad Amin Nabian <mnabian@nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Alexey Kamenev <alex.kamenev@gmail.com>
Co-authored-by: David Pruitt <dpruitt@nvidia.com>
Co-authored-by: David Pruitt <dpruitt@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>
Co-authored-by: David Pruitt <daviddpruitt@gmail.com>
Co-authored-by: Akshay Subramaniam <asubramaniam@login-eos01.eos.clusters.nvidia.com>

David Pruitt and others added 7 commits January 16, 2024 09:16

Refactor saving results to improve perf and overlap with inferencing

5914442

Update blossom-ci.yml (NVIDIA#295)

db61e53

Change pip install commands with the correct PyPI package name (NVIDI…

2a35804

…A#298)

Make number of writer workers parameterized

b7b8c13

Merge branch 'NVIDIA:main' into enh-corrdiff-perf

e8bfae2

merge with RC

6a4ad32

add comment for writer workers

f535d4d

daviddpruitt requested review from MortezaMardani and mnabian January 18, 2024 22:10

mnabian approved these changes Jan 18, 2024

mnabian assigned daviddpruitt Jan 18, 2024

mnabian added the "3 - Ready for Review" (Ready for review by team) label Jan 18, 2024

MortezaMardani approved these changes Jan 18, 2024

daviddpruitt added 3 commits January 18, 2024 15:16

revert readme and blossom-ci

2ad008c

2nd revert of readme

a87fe71

Merge branch '0.5.0-rc' into enh-corrdiff-perf

88443ef

fix black formatting

a89c529

mnabian merged commit 56a29a2 into NVIDIA:0.5.0-rc Jan 23, 2024
1 check passed
