Add RETRO indexed dataset and set_inference_key_value_memory inference #4220
Signed-off-by: Yi Dong <yidong@nvidia.com>
This pull request introduces 1 alert when merging cd0a84a into 7a9a8f0 - view on LGTM.com new alerts:
This pull request introduces 1 alert when merging e4809f7 into 7a9a8f0 - view on LGTM.com new alerts:
This pull request introduces 1 alert when merging ab14327 into e838862 - view on LGTM.com new alerts:
/blossom-ci
Very few minor comments. LGTM.
nemo/collections/nlp/models/language_modeling/megatron_retrieval_model.py
    retrieval_index: MMapRetrievalIndexedDataset,
):
    if not HAVE_APEX:
        raise ImportError(
@yidong72 can you add a description of the arguments? (e.g., what is documents?)
added comment
assert half == data_ds._index.chunk_size
neighbor_match = tokenizer.ids_to_text(token_ids[:half])
neighbor_extend = tokenizer.ids_to_text(token_ids[half:])
print(f' ->K{i}: {neighbor_match} --- {neighbor_extend}')
Why break retrieval into 2? Only half of each chunk is used for embedding?
Because the retrieved data has two parts: the first half is used to match the query chunk, and the second half is the continuation chunk.
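The split described in this reply can be sketched as follows. This is only an illustration, not the PR's code: `split_neighbor` and the toy token ids are hypothetical, and the only thing taken from the diff above is that each half has length `chunk_size`.

```python
def split_neighbor(token_ids, chunk_size):
    """Split one retrieved neighbor into its two halves.

    Hypothetical helper: the first `chunk_size` tokens are the part that
    matched the query chunk, the remaining `chunk_size` tokens are the
    continuation that follows it in the retrieval database.
    """
    if len(token_ids) != 2 * chunk_size:
        raise ValueError(
            f"expected {2 * chunk_size} tokens, got {len(token_ids)}"
        )
    return token_ids[:chunk_size], token_ids[chunk_size:]


# Toy neighbor of 8 token ids with an assumed chunk size of 4.
neighbor = list(range(8))
neighbor_match, neighbor_extend = split_neighbor(neighbor, chunk_size=4)
print(neighbor_match, "---", neighbor_extend)
```

Only the first half needs to resemble the query chunk; the second half is what gives the model text it has not yet seen.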
Thanks for making the changes! transformer.py is getting really big :) I think we should try and minimize modifications to it in the future.
Agree. Also, we should refactor the TransformerLayer class to make it abstract. For different models, we can add different implementations.
…nce (NVIDIA#4220) * added retrieval index dataset Signed-off-by: Yi Dong <yidong@nvidia.com> * added retrieval db handling Signed-off-by: Yi Dong <yidong@nvidia.com> * create and load data Signed-off-by: Yi Dong <yidong@nvidia.com> * working chunk retrieval Signed-off-by: Yi Dong <yidong@nvidia.com> * retrieval fetch works Signed-off-by: Yi Dong <yidong@nvidia.com> * unit test passes Signed-off-by: Yi Dong <yidong@nvidia.com> * add option to run retrieval preprocess Signed-off-by: Yi Dong <yidong@nvidia.com> * slice into chunks Signed-off-by: Yi Dong <yidong@nvidia.com> * add script to build index Signed-off-by: Yi Dong <yidong@nvidia.com> * building faiss index works Signed-off-by: Yi Dong <yidong@nvidia.com> * speed up the index building Signed-off-by: Yi Dong <yidong@nvidia.com> * added knn map index file Signed-off-by: Yi Dong <yidong@nvidia.com> * workign build knn map Signed-off-by: Yi Dong <yidong@nvidia.com> * added docstring Signed-off-by: Yi Dong <yidong@nvidia.com> * add retro dataset Signed-off-by: Yi Dong <yidong@nvidia.com> * dataset test passes Signed-off-by: Yi Dong <yidong@nvidia.com> * added unittest asserts Signed-off-by: Yi Dong <yidong@nvidia.com> * added dataset Signed-off-by: Yi Dong <yidong@nvidia.com> * training of retro is working Signed-off-by: Yi Dong <yidong@nvidia.com> * remove unused imports Signed-off-by: Yi Dong <yidong@nvidia.com> * fix jenkins Signed-off-by: Yi Dong <yidong@nvidia.com> * added knn example data Signed-off-by: Yi Dong <yidong@nvidia.com> * better print format Signed-off-by: Yi Dong <yidong@nvidia.com> * configure the number of neighbors Signed-off-by: Yi Dong <yidong@nvidia.com> * removed non-used cfg Signed-off-by: Yi Dong <yidong@nvidia.com> * turn on normliazaiton option Signed-off-by: Yi Dong <yidong@nvidia.com> * add layer number offset Signed-off-by: Yi Dong <yidong@nvidia.com> * need to add one Signed-off-by: Yi Dong <yidong@nvidia.com> * use at leaset one layer Signed-off-by: Yi Dong <yidong@nvidia.com> * 
added inference unit test Signed-off-by: Yi Dong <yidong@nvidia.com> * encoder inference test pass Signed-off-by: Yi Dong <yidong@nvidia.com> * encoder inference is confirmed to work Signed-off-by: Yi Dong <yidong@nvidia.com> * handles another edge case Signed-off-by: Yi Dong <yidong@nvidia.com> * alige relative position to the context Signed-off-by: Yi Dong <yidong@nvidia.com> * chunked cross attention passes test Signed-off-by: Yi Dong <yidong@nvidia.com> * fixed chunk cross attention masked attention Signed-off-by: Yi Dong <yidong@nvidia.com> * fix the padding Signed-off-by: Yi Dong <yidong@nvidia.com> * chunked cross attention layer inference passes unit test Signed-off-by: Yi Dong <yidong@nvidia.com> * working on the decoder Signed-off-by: Yi Dong <yidong@nvidia.com> * one mile stone for decoder Signed-off-by: Yi Dong <yidong@nvidia.com> * decoder is working Signed-off-by: Yi Dong <yidong@nvidia.com> * make encoder infer behave nicely Signed-off-by: Yi Dong <yidong@nvidia.com> * added encoder decoder inference unittest Signed-off-by: Yi Dong <yidong@nvidia.com> * remove bad imports Signed-off-by: Yi Dong <yidong@nvidia.com> * make training work Signed-off-by: Yi Dong <yidong@nvidia.com> * fix failed unit test Signed-off-by: Yi Dong <yidong@nvidia.com> * remove unused variables Signed-off-by: Yi Dong <yidong@nvidia.com> * added run on GPU for unittest Signed-off-by: Yi Dong <yidong@nvidia.com> * add pad id to the preprocessing script Signed-off-by: Yi Dong <yidong@nvidia.com> * added doc string for indexed dataset Signed-off-by: Yi Dong <yidong@nvidia.com> * efficient deduplicate doc Signed-off-by: Yi Dong <yidong@nvidia.com> * add test case data and retrieval use the same indexed data. 
handles no neighbor padding Signed-off-by: Yi Dong <yidong@nvidia.com> * address the reviewer comments Signed-off-by: Yi Dong <yidong@nvidia.com> * preserve some fraction data not used for retrieval index Signed-off-by: Yi Dong <yidong@nvidia.com> * added perplexity Signed-off-by: Yi Dong <yidong@nvidia.com> * remove the default batch limits Signed-off-by: Yi Dong <yidong@nvidia.com> * address review comment Signed-off-by: Yi Dong <yidong@nvidia.com> * added pad_id valdiation logics Signed-off-by: Yi Dong <yidong@nvidia.com> * fix no attention issue Signed-off-by: Yi Dong <yidong@nvidia.com> * comment the fix, waiting for the fix from apex Signed-off-by: Yi Dong <yidong@nvidia.com> * get rid of pre_decoder final layernorm Signed-off-by: Yi Dong <yidong@nvidia.com> * add index check Signed-off-by: Yi Dong <yidong@nvidia.com> * same implementation Signed-off-by: Yi Dong <yidong@nvidia.com> * fix merge error Signed-off-by: Yi Dong <yidong@nvidia.com> * fix style Signed-off-by: Yi Dong <yidong@nvidia.com> * use the dec num layers to encoder scaling Signed-off-by: Yi Dong <yidong@nvidia.com> * fix style Signed-off-by: Yi Dong <yidong@nvidia.com> * address reviewer comments Signed-off-by: Yi Dong <yidong@nvidia.com> * added the headscale option Signed-off-by: Yi Dong <yidong@nvidia.com> Co-authored-by: Eric Harper <complex451@gmail.com> Signed-off-by: Georg Kucsko <gkucsko@gmail.com>
What does this PR do?
Adds the set_inference_key_value_memory capability to RETRO modules so they can run inference efficiently.
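To show why a key/value memory makes inference cheaper, here is a minimal pure-Python sketch of the general caching idea. It is not the NeMo implementation: `KeyValueMemory`, `attend`, and the toy vectors are hypothetical, and the projections are taken to be the identity so each token vector serves as its own query, key, and value.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(dim)
    ]


class KeyValueMemory:
    """Hypothetical cache: stores keys/values of already-decoded tokens."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Append only the new token's key/value; earlier ones are reused.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Incremental decoding with the cache: one new key/value per step.
memory = KeyValueMemory()
cached = [memory.step(t, t, t) for t in tokens]

# Reference: rebuild the causal prefix from scratch at every step.
full = [attend(t, tokens[: i + 1], tokens[: i + 1]) for i, t in enumerate(tokens)]
```

Both paths give the same outputs, but the cached path never recomputes keys and values for earlier positions, which is the saving during token-by-token generation.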