Quantized checkpoint support in export and deploy modules #8859
Merged
oyilmaz-nvidia approved these changes on Apr 22, 2024:
LGTM. Thanks!
alxzhang-amazon pushed a commit to alxzhang-amazon/NeMo that referenced this pull request on Apr 26, 2024:
* Resolve engine build command for int8_sq quantization
* Fix links and typos
* Add quantization docs to ToC
* Opt for using torchrun
* Enable exporting and running quantized qnemo checkpoints
* Report evaluation time and shorten passing results around
* Fix undefined model_info
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Unfold import path
* Enable HF tokenizer
* [pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci)
* Add copyright headers
* Update AMMO to 0.9.4
* Unpack qnemo checkpoint if it's a tarball
* Format results display

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
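One of the commits above unpacks a qnemo checkpoint when it is given as a tarball. The snippet below is a minimal sketch of that idea using only the Python standard library; the helper name `resolve_checkpoint_dir` is hypothetical and this is not the code merged in the PR. It assumes a checkpoint is either an already-extracted directory or a tar archive (any compression `tarfile` can auto-detect).

```python
import os
import tarfile
import tempfile


def resolve_checkpoint_dir(checkpoint_path: str) -> str:
    """Return a directory containing the checkpoint files.

    Hypothetical helper: a directory is returned unchanged; a tarball is
    extracted into a fresh temporary directory whose path is returned.
    """
    if os.path.isdir(checkpoint_path):
        return checkpoint_path
    if os.path.isfile(checkpoint_path) and tarfile.is_tarfile(checkpoint_path):
        target_dir = tempfile.mkdtemp(prefix="qnemo_")
        # "r:*" lets tarfile detect plain, gzip, bz2 or xz archives.
        with tarfile.open(checkpoint_path, "r:*") as tar:
            if hasattr(tarfile, "data_filter"):
                # Newer Pythons: reject unsafe members (absolute paths, "..").
                tar.extractall(target_dir, filter="data")
            else:
                tar.extractall(target_dir)
        return target_dir
    raise ValueError(f"Not a directory or tarball: {checkpoint_path}")
```

Returning a directory in both cases lets downstream export code assume a single on-disk layout regardless of how the checkpoint was shipped.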
galv pushed a commit to galv/NeMo that referenced this pull request on Apr 29, 2024.
suiyoubi pushed a commit that referenced this pull request on May 2, 2024.
This was referenced May 16, 2024
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request on Jun 25, 2024.
What does this PR do?
Adds support for testing, deploying, and running quantized "qnemo" checkpoints in the nemo.deploy and nemo.export modules.
Collection: NLP
Changelog
Usage
Building a TensorRT-LLM engine and running a basic prompt:
Running the Lambada test locally:
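The Lambada test scores how often a model predicts the final word of a passage given the text before it. As a rough illustration of the scoring only (not the PR's test script), the sketch below assumes each record has hypothetical keys "text_before_last_word" and "last_word", and that a caller-supplied `generate` function returns the model's continuation for a prompt (for example, a greedily decoded short output from the exported engine).

```python
from typing import Callable, Dict, List


def lambada_accuracy(records: List[Dict[str, str]],
                     generate: Callable[[str], str]) -> float:
    """Fraction of records whose last word is predicted exactly.

    Record keys and the `generate` callable are assumptions for this sketch.
    """
    if not records:
        raise ValueError("records must not be empty")
    correct = 0
    for record in records:
        expected = record["last_word"].strip().lower()
        # Compare only the first word of the continuation, case-insensitively.
        words = generate(record["text_before_last_word"]).split()
        predicted = words[0].lower() if words else ""
        correct += predicted == expected
    return correct / len(records)
```

Exact match on the first generated word is strict (trailing punctuation counts as a miss), which keeps the metric simple and deterministic under greedy decoding.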
Jenkins CI
To run Jenkins, a NeMo user with write access must comment "jenkins" on the PR.
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The contributor guidelines list specific people who can review PRs in various areas.
Additional Information