-
Notifications
You must be signed in to change notification settings - Fork 214
Fix bugs in retriever sdg notebook #522
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix bugs in retriever sdg notebook #522
Conversation
Signed-off-by: viraman <viraman@nvidia.com>
Signed-off-by: viraman <viraman@nvidia.com>
Signed-off-by: viraman <viraman@nvidia.com>
Signed-off-by: Vinay Raman <viraman@nvidia.com>
Signed-off-by: Vinay Raman <viraman@nvidia.com>
Signed-off-by: Vinay Raman <viraman@nvidia.com>
Signed-off-by: Vinay Raman <viraman@nvidia.com>
Signed-off-by: Vinay Raman <viraman@nvidia.com>
Signed-off-by: Vinay Raman <viraman@nvidia.com>
Signed-off-by: Vinay Raman <viraman@nvidia.com>
…iraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com>
…dia.com Signed-off-by: Vinay Raman <viraman@nvidia.com>
Signed-off-by: Vinay Raman <viraman@nvidia.com>
Signed-off-by: Vinay Raman <viraman@nvidia.com>
tutorials/nemo-retriever-synthetic-data-generation/config/config.yaml
Outdated
Show resolved
Hide resolved
tutorials/nemo-retriever-synthetic-data-generation/config/config.yaml
Outdated
Show resolved
Hide resolved
tutorials/nemo-retriever-synthetic-data-generation/config/config.yaml
Outdated
Show resolved
Hide resolved
| if __name__ == "__main__": | ||
| dask_client = get_client() | ||
| main() | ||
| # dask_client.cancel(dask_client.futures, force=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented out code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the difference between this filter.py, generate.py, and main.py? They look nearly identical. They also all look to be CLI scripts but only main.py is mentioned in the README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
README has both generate.py and filter.py mentioned, please have a look.
This is needed if the user just needs to generate data or filter pre-generated data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding two files, can you just add two command line arguments like so?
--generate-only--filter-only
With two new files, it's very hard to tell if the differences between them are correct or buggy. CLI args in a single file make it much easier to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved, please check the latest commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this yet.
| if __name__ == "__main__": | ||
| dask_client = get_client() | ||
| main() | ||
| # dask_client.cancel(dask_client.futures, force=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented out code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still there.
Signed-off-by: viraman <viraman@nvidia.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two more minor things. Add the two CLI args to main.py instead of two separate files generate.py and filter.py is the big one though.
| "output_type": "stream", | ||
| "text": [ | ||
| "Generator model used = mistralai/mixtral-8x22b-instruct-v0.1\n" | ||
| "Generator model used = nvdev/mistralai/mixtral-8x22b-instruct-v0.1\n" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revert model name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resolved
Signed-off-by: viraman <viraman@nvidia.com>
Signed-off-by: viraman <viraman@nvidia.com>
…Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com>
Signed-off-by: viraman <viraman@nvidia.com>
ryantwolf
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vinay-raman I don't think your changes were pushed or something.
| if __name__ == "__main__": | ||
| dask_client = get_client() | ||
| main() | ||
| # dask_client.cancel(dask_client.futures, force=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still there.
| if __name__ == "__main__": | ||
| dask_client = get_client() | ||
| main() | ||
| # dask_client.cancel(dask_client.futures, force=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this yet.
Signed-off-by: viraman <viraman@nvidia.com>
* Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * fixed qa bug 5008113, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * bug fixes for generator, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed precommit, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed filters, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed all issues, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed bug with document id, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * check if filtering pipeline is present, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed notebook, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * added functionality to filter pre-generated datasets, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * separated generation & filtering pipelines, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed pre-commit, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * minor changes, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed Ryan Wolf's comments, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * fixed minor bugs in configs, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * removed commented code in main.py, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * added CLI flags for generation & filtering removed code duplication, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * minor fix to quickstart notebook, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * removed filter.py & generate.py, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> --------- Signed-off-by: viraman <viraman@nvidia.com> Signed-off-by: Vinay Raman <viraman@nvidia.com>
* Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * fixed qa bug 5008113, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * bug fixes for generator, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed precommit, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed filters, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed all issues, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed bug with document id, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * check if filtering pipeline is present, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed notebook, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * added functionality to filter pre-generated datasets, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * separated generation & filtering pipelines, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed pre-commit, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * minor changes, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed Ryan Wolf's comments, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * fixed minor bugs in configs, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * removed commented code in main.py, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * added CLI flags for generation & filtering removed code duplication, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * minor fix to quickstart notebook, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * removed filter.py & generate.py, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> --------- Signed-off-by: viraman <viraman@nvidia.com> Signed-off-by: Vinay Raman <viraman@nvidia.com>
* Signed-off by viraman@nvidia.com * Signed-off by viraman@nvidia.com * fixed qa bug 5008113, Signed-off by viraman@nvidia.com * bug fixes for generator, Signed-off by viraman@nvidia.com * fixed precommit, Signed-off by viraman@nvidia.com * fixed filters, Signed-off by viraman@nvidia.com * fixed all issues, Signed-off by viraman@nvidia.com * fixed bug with document id, Signed-off by viraman@nvidia.com * check if filtering pipeline is present, Signed-off by viraman@nvidia.com * fixed notebook, Signed-off by viraman@nvidia.com * added functionality to filter pre-generated datasets, Signed-off by viraman@nvidia.com * separated generation & filtering pipelines, Signed-off by viraman@nvidia.com * fixed pre-commit, Signed-off by viraman@nvidia.com * minor changes, Signed-off by viraman@nvidia.com * fixed Ryan Wolf's comments, Signed-off by viraman@nvidia.com * fixed minor bugs in configs, Signed-off by viraman@nvidia.com * removed commented code in main.py, Signed-off by viraman@nvidia.com * added CLI flags for generation & filtering removed code duplication, Signed-off by viraman@nvidia.com * minor fix to quickstart notebook, Signed-off by viraman@nvidia.com * removed filter.py & generate.py, Signed-off by viraman@nvidia.com --------- Signed-off-by: viraman <viraman@nvidia.com> Signed-off-by: Vinay Raman <viraman@nvidia.com> Co-authored-by: vinay-raman <98057837+vinay-raman@users.noreply.github.com>
* Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * fixed qa bug 5008113, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * bug fixes for generator, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed precommit, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed filters, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed all issues, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed bug with document id, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * check if filtering pipeline is present, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed notebook, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * added functionality to filter pre-generated datasets, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * separated generation & filtering pipelines, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed pre-commit, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * minor changes, Signed-off by viraman@nvidia.com Signed-off-by: Vinay Raman <viraman@nvidia.com> * fixed Ryan Wolf's comments, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * fixed minor bugs in configs, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * removed commented code in main.py, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * added CLI flags for generation & filtering removed code duplication, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * minor fix to quickstart notebook, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> * removed filter.py & generate.py, Signed-off by viraman@nvidia.com Signed-off-by: viraman <viraman@nvidia.com> --------- Signed-off-by: viraman <viraman@nvidia.com> Signed-off-by: Vinay Raman <viraman@nvidia.com>
Description
Fix bugs in retriever sdg notebook.
Checklist