Corrdiff - Make IO Asynchronous with inference #305

Merged on Jan 23, 2024 (11 commits)

Conversation

daviddpruitt
Collaborator

Modulus Pull Request

Description

Make IO asynchronous so inferencing doesn't stall while writing results.
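The diff itself is not reproduced in this conversation, so the following is only a minimal sketch of the general idea, assuming a thread pool is used for the writers. `run_inference`, `save_result`, `generate`, and the `num_writer_workers` parameter are hypothetical names chosen for illustration and are not taken from the PR.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_inference(step: int) -> list[float]:
    """Hypothetical stand-in for one model forward pass."""
    return [float(step)] * 4


def save_result(out_dir: Path, step: int, result: list[float]) -> Path:
    """Hypothetical stand-in for the slow write of one result to disk."""
    path = out_dir / f"sample_{step:04d}.txt"
    path.write_text(",".join(str(v) for v in result))
    return path


def generate(out_dir: Path, num_steps: int, num_writer_workers: int = 2) -> list[Path]:
    """Run inference in the main thread while writer threads save earlier results."""
    futures = []
    with ThreadPoolExecutor(max_workers=num_writer_workers) as writers:
        for step in range(num_steps):
            result = run_inference(step)
            # submit() returns immediately, so the next inference step can
            # start while this result is still being written.
            futures.append(writers.submit(save_result, out_dir, step, result))
    # Leaving the with-block waits for all pending writes; result() re-raises
    # any exception that occurred in a writer thread.
    return [f.result() for f in futures]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = generate(Path(tmp), num_steps=3)
        print([p.name for p in paths])
```

One caveat with this sketch: the executor's work queue is unbounded, so if writing is consistently slower than inference, pending results accumulate in memory until the writers catch up.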

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

Dependencies

Collaborator

@mnabian mnabian left a comment

Could you please revert changes to README.md and blossom-ci.yaml? Those are likely coming from a rebase from main.

@mnabian mnabian added the 3 - Ready for Review Ready for review by team label Jan 18, 2024
@mnabian
Collaborator

mnabian commented Jan 18, 2024

/blossom-ci

@akshaysubr
Collaborator

/blossom-ci

@mnabian
Collaborator

mnabian commented Jan 22, 2024

/blossom-ci

@mnabian mnabian merged commit 56a29a2 into NVIDIA:0.5.0-rc Jan 23, 2024
1 check passed
NickGeneva pushed a commit to NickGeneva/modulus that referenced this pull request Jan 26, 2024
* Refactor saving results to improve perf and overlap with inferencing

* Update blossom-ci.yml (NVIDIA#295)

* Change pip install commands with the correct PyPI package name (NVIDIA#298)

* Make number of writer workers parameterized

* add comment for writer workers

* revert readme and blossom-ci

* 2nd revert of readme

* fix black formatting

---------

Co-authored-by: David Pruitt <dpruitt@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>
NickGeneva pushed a commit that referenced this pull request Jan 26, 2024
* Refactor saving results to improve perf and overlap with inferencing

* Update blossom-ci.yml (#295)

* Change pip install commands with the correct PyPI package name (#298)

* Make number of writer workers parameterized

* add comment for writer workers

* revert readme and blossom-ci

* 2nd revert of readme

* fix black formatting

---------

Co-authored-by: David Pruitt <dpruitt@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>
mnabian added a commit that referenced this pull request Feb 21, 2024
* Fix TypeError in CorrDiff dataloader (#290)

* Update dataset.py

* Update formatting

---------

Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: Mohammad Amin Nabian <mnabian@nvidia.com>

* Update to 0.5.0 version (#293)

* Fix diffusion example imports. (#294)

* Fix pydantic issues (#302)

* Fix minor bugs in corrdiff training (#303)

* fix small bugs

* formatting

* add documentation for task

* Fix zenith angle import, pydantic issues (#304)

* Corrdiff - Make IO Asynchronous with inference (#305)

* Refactor saving results to improve perf and overlap with inferencing

* Update blossom-ci.yml (#295)

* Change pip install commands with the correct PyPI package name (#298)

* Make number of writer workers parameterized

* add comment for writer workers

* revert readme and blossom-ci

* 2nd revert of readme

* fix black formatting

---------

Co-authored-by: David Pruitt <dpruitt@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>

* CorrDiff integration: separate regression and diffusion configs (#308)

* Update blossom-ci.yml (#295)

* Change pip install commands with the correct PyPI package name (#298)

* update configs

* remove mixture loss

* remove mixture loss from init

---------

Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>

* Move URL routines to examples dir (#309)

* Update blossom-ci.yml (#295)

* Change pip install commands with the correct PyPI package name (#298)

* reorganize

* remove from init

---------

Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>

* CorrDiff: Adding NVTX markers and removing redundant regression model inference (#306)

* Refactor saving results to improve perf and overlap with inferencing

* Update blossom-ci.yml (#295)

* Change pip install commands with the correct PyPI package name (#298)

* Make number of writer workers parameterized

* add comment for writer workers

* revert readme and blossom-ci

* 2nd revert of readme

* Adding NVTX markers and removing redundant regression model inference

* Code formatting

* Adding nvtx to pyproject.toml

* Updating CHANGELOG.md

---------

Co-authored-by: David Pruitt <dpruitt@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>
Co-authored-by: David Pruitt <dpruitt@nvidia.com>
Co-authored-by: David Pruitt <daviddpruitt@gmail.com>
Co-authored-by: Akshay Subramaniam <asubramaniam@login-eos01.eos.clusters.nvidia.com>

* Allowing channels last group norms

Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>

* Adding channels last convolutions

Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>

* Adding torch.compile to fix channels last groupnorm

Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>

* Adding main guard in score_samples.py

Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>

* Updating changelog and some cleanup

Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>

* Making torch.compile configurable

Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>

* Keep original group norm path for training and use custom version only for inference

Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>

---------

Signed-off-by: Akshay Subramaniam <6964110+akshaysubr@users.noreply.github.com>
Co-authored-by: Tao Ge <115046371+tge25@users.noreply.github.com>
Co-authored-by: Mohammad Amin Nabian <m.a.nabiyan@gmail.com>
Co-authored-by: Mohammad Amin Nabian <mnabian@nvidia.com>
Co-authored-by: Kaustubh Tangsali <71059996+ktangsali@users.noreply.github.com>
Co-authored-by: Alexey Kamenev <alex.kamenev@gmail.com>
Co-authored-by: David Pruitt <dpruitt@nvidia.com>
Co-authored-by: David Pruitt <dpruitt@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: Abdullah <37012364+Saydemr@users.noreply.github.com>
Co-authored-by: David Pruitt <daviddpruitt@gmail.com>
Co-authored-by: Akshay Subramaniam <asubramaniam@login-eos01.eos.clusters.nvidia.com>
Labels
3 - Ready for Review Ready for review by team
6 participants