Add RETRO indexed dataset and set_inference_key_value_memory inference #4220

Merged: 82 commits merged on Jun 2, 2022

yidong72
Collaborator

What does this PR do?

  1. Added the RETRO indexed dataset for both the training data and the retrieval data
  2. Added the KNN map indexed dataset
  3. Added unit tests for all the indexed datasets
  4. Added the RETRO dataset, which is built on the indexed datasets
  5. Added scripts to preprocess the data, build the Faiss index, and generate the KNN map index for training
  6. Added the set_inference_key_value_memory capability to the RETRO modules so they can run inference efficiently
  7. Added unit tests for efficient inference of all RETRO modules
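
To illustrate item 6, here is a minimal sketch of why a key/value memory speeds up autoregressive inference: keys and values of earlier tokens are stored once, so each new token only needs one attention pass instead of recomputing the whole prefix. This is not the NeMo implementation. The class `CachedSelfAttention`, the helper `attend`, the identity projections, and the boolean argument of `set_inference_key_value_memory` are all assumptions made for illustration.

```python
import math


def attend(query, keys, values):
    """Softmax attention of one query vector over a list of keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


class CachedSelfAttention:
    """Toy single-head causal self-attention with an optional key/value memory."""

    def __init__(self):
        # None means "no memory": every call starts from an empty prefix.
        self.memory_keys = None
        self.memory_values = None

    def set_inference_key_value_memory(self, enabled):
        # Hypothetical signature; the real method may take different arguments.
        self.memory_keys = [] if enabled else None
        self.memory_values = [] if enabled else None

    def forward(self, tokens):
        """tokens: list of vectors. With the memory on, pass only the new tokens."""
        if self.memory_keys is None:
            keys, values = [], []
        else:
            keys, values = self.memory_keys, self.memory_values
        outputs = []
        for token in tokens:
            # Identity projections keep the sketch short: q = k = v = token.
            keys.append(token)
            values.append(token)
            outputs.append(attend(token, keys, values))
        return outputs


sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

layer = CachedSelfAttention()
full_outputs = layer.forward(sequence)          # training-style: whole sequence at once

layer.set_inference_key_value_memory(True)
step_outputs = [layer.forward([token])[0] for token in sequence]  # one token per call
```

In this sketch both paths produce the same outputs; the cached path simply avoids rebuilding the key/value lists for the prefix on every step, which is the property the inference unit tests in item 7 would need to check.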

Signed-off-by: Yi Dong <yidong@nvidia.com>
yidong72 and others added 9 commits May 25, 2022 23:37
Signed-off-by: Yi Dong <yidong@nvidia.com>

lgtm-com bot commented May 30, 2022

This pull request introduces 1 alert when merging cd0a84a into 7a9a8f0 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: Yi Dong <yidong@nvidia.com>

lgtm-com bot commented May 30, 2022

This pull request introduces 1 alert when merging e4809f7 into 7a9a8f0 - view on LGTM.com

new alerts:

  • 1 for Unused import

Signed-off-by: Yi Dong <yidong@nvidia.com>

lgtm-com bot commented May 31, 2022

This pull request introduces 1 alert when merging ab14327 into e838862 - view on LGTM.com

new alerts:

  • 1 for Unused import

@okuchaiev
Member

/blossom-ci

Collaborator

@michalivne michalivne left a comment

Very few minor comments. LGTM.

    retrieval_index: MMapRetrievalIndexedDataset,
):
    if not HAVE_APEX:
        raise ImportError(
Collaborator

@yidong72 can you add a description of the arguments? (e.g., what is documents?)

Collaborator Author

added comment

assert half == data_ds._index.chunk_size
neighbor_match = tokenizer.ids_to_text(token_ids[:half])
neighbor_extend = tokenizer.ids_to_text(token_ids[half:])
print(f' ->K{i}: {neighbor_match} --- {neighbor_extend}')
Collaborator

Why break retrieval into 2? Only half of each chunk is used for embedding?

Collaborator Author

Because the retrieved data has two parts: the first half is the chunk that matches the query chunk, and the second half is its continuation chunk.
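
The split described in this reply can be sketched as follows. This is an illustrative example, not the code from the PR: the helper names `build_neighbor` and `split_neighbor` are hypothetical, and padding a short continuation with `pad_id` is an assumption about how a missing continuation might be handled.

```python
def build_neighbor(database_tokens, chunk_id, chunk_size, pad_id):
    """Fetch the matched chunk plus the chunk that follows it (hypothetical helper).

    The result always has 2 * chunk_size tokens; a short or missing
    continuation is filled with pad_id (an assumption for this sketch).
    """
    start = chunk_id * chunk_size
    tokens = database_tokens[start:start + 2 * chunk_size]
    return tokens + [pad_id] * (2 * chunk_size - len(tokens))


def split_neighbor(token_ids, chunk_size):
    """Split one retrieved neighbor into (matched chunk, continuation chunk)."""
    if len(token_ids) != 2 * chunk_size:
        raise ValueError("a neighbor must hold exactly two chunks")
    return token_ids[:chunk_size], token_ids[chunk_size:]


database = list(range(10))  # stand-in for a tokenized retrieval database
neighbor = build_neighbor(database, chunk_id=1, chunk_size=4, pad_id=-1)
match, continuation = split_neighbor(neighbor, chunk_size=4)
```

The first half is what the nearest-neighbor search compares against the query chunk; the second half is the text that followed it in the retrieval data, which is the part that carries new information for the model.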

Signed-off-by: Yi Dong <yidong@nvidia.com>
Contributor

@MaximumEntropy MaximumEntropy left a comment

Thanks for making the changes! transformer.py is getting really big :) I think we should try and minimize modifications to it in the future.

Collaborator Author

yidong72 commented Jun 2, 2022

> Thanks for making the changes! transformer.py is getting really big :) I think we should try and minimize modifications to it in the future.

Agree. Also, we should refactor the TransformerLayer class to make it abstract. For different models, we can add different implementations.
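
One possible shape for such a refactor, as a rough sketch only: an abstract base class holds the shared plumbing, and each model supplies its own `forward`. The names `BaseTransformerLayer`, `PlainDecoderLayer`, and `RetrievalDecoderLayer` are hypothetical, and the placeholder arithmetic stands in for real attention and MLP blocks.

```python
from abc import ABC, abstractmethod


class BaseTransformerLayer(ABC):
    """Hypothetical abstract base: shared state here, model logic in subclasses."""

    def __init__(self, layer_number):
        self.layer_number = layer_number

    @abstractmethod
    def forward(self, hidden_states, context=None):
        """Each model provides its own layer computation."""


class PlainDecoderLayer(BaseTransformerLayer):
    def forward(self, hidden_states, context=None):
        # Placeholder for self-attention + MLP: the input passes through unchanged.
        return list(hidden_states)


class RetrievalDecoderLayer(BaseTransformerLayer):
    def forward(self, hidden_states, context=None):
        # Placeholder for chunked cross-attention: add the retrieved context element-wise.
        if context is None:
            return list(hidden_states)
        return [h + c for h, c in zip(hidden_states, context)]


layers = [PlainDecoderLayer(1), RetrievalDecoderLayer(2)]
states = [1.0, 2.0]
for layer in layers:
    states = layer.forward(states, context=[0.5, 0.5])
```

With this layout, model-specific behavior lands in a new subclass rather than in another branch inside a shared transformer.py.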

@yidong72 yidong72 merged commit 75c2d82 into main Jun 2, 2022
@yidong72 yidong72 deleted the feature_retrieval_idx branch June 2, 2022 13:11
gkucsko pushed a commit to gkucsko/NeMo that referenced this pull request Jun 2, 2022
…nce (NVIDIA#4220)

* added retrieval index dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added retrieval db handling

Signed-off-by: Yi Dong <yidong@nvidia.com>

* create and load data

Signed-off-by: Yi Dong <yidong@nvidia.com>

* working chunk retrieval

Signed-off-by: Yi Dong <yidong@nvidia.com>

* retrieval fetch works

Signed-off-by: Yi Dong <yidong@nvidia.com>

* unit test passes

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add option to run retrieval preprocess

Signed-off-by: Yi Dong <yidong@nvidia.com>

* slice into chunks

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add script to build index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* building faiss index works

Signed-off-by: Yi Dong <yidong@nvidia.com>

* speed up the index building

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added knn map index file

Signed-off-by: Yi Dong <yidong@nvidia.com>

* working build knn map

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added docstring

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add retro dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* dataset test passes

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added unittest asserts

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* training of retro is working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove unused imports

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix jenkins

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added knn example data

Signed-off-by: Yi Dong <yidong@nvidia.com>

* better print format

Signed-off-by: Yi Dong <yidong@nvidia.com>

* configure the number of neighbors

Signed-off-by: Yi Dong <yidong@nvidia.com>

* removed non-used cfg

Signed-off-by: Yi Dong <yidong@nvidia.com>

* turn on normalization option

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add layer number offset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* need to add one

Signed-off-by: Yi Dong <yidong@nvidia.com>

* use at least one layer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added inference unit test

Signed-off-by: Yi Dong <yidong@nvidia.com>

* encoder inference test pass

Signed-off-by: Yi Dong <yidong@nvidia.com>

* encoder inference is confirmed to work

Signed-off-by: Yi Dong <yidong@nvidia.com>

* handles another edge case

Signed-off-by: Yi Dong <yidong@nvidia.com>

* align relative position to the context

Signed-off-by: Yi Dong <yidong@nvidia.com>

* chunked cross attention passes test

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fixed chunk cross attention masked attention

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix the padding

Signed-off-by: Yi Dong <yidong@nvidia.com>

* chunked cross attention layer inference passes unit test

Signed-off-by: Yi Dong <yidong@nvidia.com>

* working on the decoder

Signed-off-by: Yi Dong <yidong@nvidia.com>

* one milestone for decoder

Signed-off-by: Yi Dong <yidong@nvidia.com>

* decoder is working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* make encoder infer behave nicely

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added encoder decoder inference unittest

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove bad imports

Signed-off-by: Yi Dong <yidong@nvidia.com>

* make training work

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix failed unit test

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove unused variables

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added run on GPU for unittest

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add pad id to the preprocessing script

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added doc string for indexed dataset

Signed-off-by: Yi Dong <yidong@nvidia.com>

* efficient deduplicate doc

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add test case data and retrieval use the same indexed data. handles no neighbor padding

Signed-off-by: Yi Dong <yidong@nvidia.com>

* address the reviewer comments

Signed-off-by: Yi Dong <yidong@nvidia.com>

* preserve some fraction data not used for retrieval index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added perplexity

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove the default batch limits

Signed-off-by: Yi Dong <yidong@nvidia.com>

* address review comment

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added pad_id validation logic

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix no attention issue

Signed-off-by: Yi Dong <yidong@nvidia.com>

* comment the fix, waiting for the fix from apex

Signed-off-by: Yi Dong <yidong@nvidia.com>

* get rid of pre_decoder final layernorm

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add index check

Signed-off-by: Yi Dong <yidong@nvidia.com>

* same implementation

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix merge error

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* use the dec num layers to encoder scaling

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* address reviewer comments

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added the headscale option

Signed-off-by: Yi Dong <yidong@nvidia.com>

Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Georg Kucsko <gkucsko@gmail.com>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022
…nce (NVIDIA#4220)

Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>