Address VDR feedback for NeMo FW evaluations #13701
Signed-off-by: Abhishree <abhishreetm@gmail.com>
jgerh
left a comment
Completed Tech Pubs review of docs/source/evaluation/evaluation-doc.rst and provided a few copyedits, formatting updates, and suggested text revisions.
NeMo-Run. This method is quick and easy, making it ideal for evaluation on a local workstation with GPUs, as it
facilitates easier debugging. However, for running evaluations on clusters, it is recommended to use NeMo-Run for its
ease of use.
Here is a list of line-by-line edits to read-only text. These edits should also be addressed.
Lines 4/4 to 13/13
Please fix bullet list syntax (I think it is causing the following paragraphs to be bold). See the reStructuredText Guide here: https://aschilling.gitlab-master-pages.nvidia.com/documentation/repo_docs/latest/rst-guide.html. Replace asterisk with hyphen, delete leading indent, and separate sentence from bulleted list with a blank line.
This guide provides detailed instructions on evaluating NeMo 2.0 checkpoints using the `NVIDIA Evals Factory <https://pypi.org/project/nvidia-lm-eval/>`__ within the NeMo Framework. Supported benchmarks include:

- GPQA
- GSM8K
- IFEval
- MGSM
- MMLU
- MMLU-Pro
- MMLU-Redux
- Wikilingua
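The three list fixes requested above (hyphen instead of asterisk, no leading indent, blank line between the sentence and the list) can be sketched as a small script. This is a minimal illustration, not part of NeMo or its docs tooling; the name `normalize_bullets` is hypothetical, and the sketch assumes flat lists only (nested lists and literal blocks would need extra handling).

```python
import re

# An optionally indented "* item" line. The asterisk must be followed by
# whitespace, so inline markup such as *emphasis* is left untouched.
BULLET = re.compile(r"^\s*\*\s+(.*)$")


def normalize_bullets(text: str) -> str:
    """Hypothetical helper: apply the three list fixes from the review."""
    out = []
    for line in text.splitlines():
        match = BULLET.match(line)
        if match:
            # Separate the list from a preceding sentence with a blank line.
            if out and out[-1].strip() and not out[-1].startswith("- "):
                out.append("")
            # Replace the asterisk with a hyphen and drop the leading indent.
            out.append("- " + match.group(1))
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(normalize_bullets("Supported benchmarks include:\n  * GPQA\n  * GSM8K"))
```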
Line 29/29 - 34/34
Same comment about bulleted list
The NVIDIA Evals Factory provides the following predefined configurations for evaluating the completions endpoint:

- gsm8k
- mgsm
- mmlu
- mmlu_pro
- mmlu_redux
Line 36/36 - Line 44/44
same comment about bulleted list
It also provides the following configurations for evaluating the chat endpoint:

- gpqa_diamond_cot
- gsm8k_cot_instruct
- ifeval
- mgsm_cot
- mmlu_instruct
- mmlu_pro_instruct
- mmlu_redux_instruct
- wikilingua
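As a quick reference, the split between the two endpoint types can be written as a lookup table. This is only a sketch: the configuration names are taken from the two lists quoted in this review, while `PREDEFINED_CONFIGS` and `endpoint_for` are hypothetical names that do not exist in NeMo or nvidia-lm-eval.

```python
# Configuration names copied from the reviewed doc text. The dict and the
# helper are illustrative only and are not part of any library API.
PREDEFINED_CONFIGS = {
    "completions": ["gsm8k", "mgsm", "mmlu", "mmlu_pro", "mmlu_redux"],
    "chat": [
        "gpqa_diamond_cot",
        "gsm8k_cot_instruct",
        "ifeval",
        "mgsm_cot",
        "mmlu_instruct",
        "mmlu_pro_instruct",
        "mmlu_redux_instruct",
        "wikilingua",
    ],
}


def endpoint_for(config_name: str) -> str:
    """Return the endpoint type that a predefined configuration targets."""
    for endpoint, names in PREDEFINED_CONFIGS.items():
        if config_name in names:
            return endpoint
    raise ValueError(f"unknown predefined configuration: {config_name!r}")


if __name__ == "__main__":
    print(endpoint_for("mmlu"), endpoint_for("mmlu_instruct"))
```

Keeping the base and instruct variants in separate lists makes it harder to point, say, `mmlu_instruct` at a completions endpoint by mistake.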
Line 67/70 - 70/73
revise sentence (no "killed")
The entrypoint for evaluation is the ``evaluate`` method defined in ``nemo/collections/llm/api.py``. To run evaluations on the deployed model, use the following command. Make sure to open a new terminal within the same container to execute it. For longer evaluations, it is advisable to run both the deploy and evaluate commands in tmux sessions to prevent the processes from being terminated unexpectedly and aborting the runs.
Line 86/89
revise note
.. note::
   Please refer to the deploy and evaluate methods in nemo/collections/llm/api.py to review all available argument options, as the provided commands are only examples and do not include all arguments or their default values. For more detailed information on the arguments used in the ApiEndpoint and ConfigParams classes for evaluation, see the source code at `nemo/collections/llm/evaluation/api.py <https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/llm/evaluation/api.py>`__.
Line 96/99
Make sure the # characters match the exact length of the heading text.
Line 98/101 - 99/102
remove period
The `evaluation.py <https://github.com/NVIDIA/NeMo/blob/main/scripts/llm/evaluation.py>`__ script serves as a reference for launching evaluations with NeMo-Run.
Line 120/123
Make sure the # characters match the exact length of the heading text.
Line 137/140
Make sure the - characters match the exact length of the heading text.
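The heading comments above all ask for the same thing: the underline must be exactly as long as the title. A check along those lines could look like the following sketch. The function names are hypothetical, the headings in the usage example are made up for illustration, and the heuristic (any line of one repeated punctuation character directly below a text line counts as an underline) is an assumption that may misfire on tables or transitions.

```python
import string


def is_adornment(line: str) -> bool:
    """True for a line made of one repeated punctuation character, e.g. '####'."""
    return len(line) >= 2 and len(set(line)) == 1 and line[0] in string.punctuation


def mismatched_headings(text: str) -> list:
    """Return (title_line_number, title) for each underline whose length differs from its title."""
    lines = [line.rstrip() for line in text.splitlines()]
    problems = []
    for i in range(1, len(lines)):
        title, underline = lines[i - 1], lines[i]
        # Skip blank lines and overlines; only text followed by an underline counts.
        if not title or is_adornment(title):
            continue
        if is_adornment(underline) and len(underline) != len(title):
            problems.append((i, title))  # i is the 1-based line number of the title
    return problems


if __name__ == "__main__":
    # Hypothetical headings: the first underline matches, the second is too short.
    sample = "Run Evaluations\n" + "#" * 15 + "\n\nLegacy Evaluation\n------\n"
    for line_number, title in mismatched_headings(sample):
        print(f"line {line_number}: underline length does not match {title!r}")
```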
Line 164/167 - Line 168/178
revise sentence (no "killed")
The evaluate method defined in nemo/collections/llm/api.py supports the legacy way of evaluating models. To run evaluations on the deployed model, use the following command. Make sure to pass the nemo_checkpoint_path and url parameters, as they are required for using the legacy evaluation code. Open a new terminal within the same container to execute it. For longer evaluations, it is advisable to run both the deploy and evaluate commands in tmux sessions to prevent the processes from being interrupted or terminated unexpectedly, which could cause the runs to abort.
@jgerh could you please review again, I have addressed all your comments. Thank you!
marta-sd
left a comment
LGTM, thank you!
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com> Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Signed-off-by: Abhishree <abhishreetm@gmail.com>
jgerh
left a comment
Completed tech pubs review of latest copyedits and approved.
Fast merging since this is a docs change only
* Address VDR feedback

Signed-off-by: Abhishree <abhishreetm@gmail.com>

* Update docs/source/evaluation/evaluation-doc.rst

Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>

* Address code review comments

Signed-off-by: Abhishree <abhishreetm@gmail.com>

---------

Signed-off-by: Abhishree <abhishreetm@gmail.com>
Signed-off-by: Abhishree Thittenamane <47577437+athitten@users.noreply.github.com>
Co-authored-by: jgerh <163925524+jgerh@users.noreply.github.com>
Signed-off-by: jianbinc <shjwudp@gmail.com>
Important
The Update branch button must only be pressed on very rare occasions. An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Adds details to evaluation docs addressing the VDR feedback for evaluations in NeMo FW.
Collection: [Note which collection this PR will affect]
Changelog
Usage
# Add a code snippet demonstrating how to use this
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI, remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information