Adding RETRO model Faiss sharding index and KNN sharding index #4713
Conversation
This pull request introduces 4 alerts when merging 46dfb46 into 28524d6 - view on LGTM.com
This pull request introduces 4 alerts when merging 3826add into c0bfa6f - view on LGTM.com
This pull request introduces 4 alerts when merging 1b1adc2 into c0bfa6f - view on LGTM.com
This pull request introduces 4 alerts when merging ae42d7d into c0bfa6f - view on LGTM.com
This pull request introduces 4 alerts when merging e027c2b into f53bb34 - view on LGTM.com
This pull request introduces 4 alerts when merging fd0b19c into f53bb34 - view on LGTM.com
This pull request introduces 4 alerts when merging 8d66ac5 into e8ba60b - view on LGTM.com
This pull request introduces 4 alerts when merging 6b4e3a9 into 1c16b96 - view on LGTM.com
LGTM! Great PR. See minor comments regarding missing docstrings/comments
# self.grad_clip_pl_default = True

if hasattr(self.cfg, "shape_file"):
    # self.grad_clip_pl_default = True
Is this comment required?
no. removed it
# add shape file
self.register_artifact("model.shape_file", self.cfg.shape_file),

for name, tensor in self.named_parameters():
Can you add a comment explaining this initialization scheme?
added
attention_mask = inputs[3]
rotary_pos_emb = inputs[4]
relative_position_bias = inputs[5]
if len(inputs) == 7:
This looks very brittle. Perhaps we can use the same solution here that we used in nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py with self._get_forward_output_only_func?
I checked the self._get_forward_output_only_func method, which has a helper function to convert dictionary arguments to positional arguments. Here I have to translate positional arguments into dictionary arguments. Not sure what a better solution would be.
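For illustration, a minimal sketch of the translation being discussed; only indices 3-5 come from the snippet above, and the remaining keyword name is a hypothetical placeholder, not the actual NeMo forward signature:

```python
def positional_to_kwargs(inputs):
    """Sketch: translate the positional `inputs` tuple into keyword arguments."""
    kwargs = {
        'attention_mask': inputs[3],
        'rotary_pos_emb': inputs[4],
        'relative_position_bias': inputs[5],
    }
    if len(inputs) == 7:
        # forward the extra trailing element under an assumed placeholder name
        kwargs['extra_arg'] = inputs[6]
    return kwargs
```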
return start, total_chunks


def process_sentence_chunks(
Can you add a docstring describing what the function does?
added
logging.info(f'Index neighbors: {f.K}')
logging.info(f'Index chunk start id: {f.chunk_start_id}')
logging.info(f'Index chunk end id: {f.chunk_end_id}')
sys.exit(0)
Can you add a few more high-level comments in the code below explaining the purpose of the code blocks?
I do have a high level comment about the script at the beginning of the file with some examples of how to use it.
@@ -105,7 +105,7 @@ def get_tokenizer(args):
logging.info(f'Data index has {data_ds.chunks} chunks')
logging.info(f'Retrieval Data index has {retrieval_ds.chunks} chunks')
logging.info(f'KNN index has {knn_index.K} neighbors')
-assert knn_index.knn_map.max() < retrieval_ds.chunks
+# assert knn_index.knn_map.max() < retrieval_ds.chunks
Remove comment if not needed
removed
@@ -68,7 +125,15 @@ def get_tokenizer(args):


def process_sentence_chunks(
-    ds: MMapRetrievalIndexedDataset, tokenizer, chunk_size: int, warm_up_size: int, percent: float
+    ds: MMapRetrievalIndexedDataset,
Can you add a docstring?
added
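For reference, a sketch of the kind of docstring being asked for; the signature mirrors the old version in the diff above, the type annotation is left as a string so the snippet stands alone, and the parameter descriptions are my reading of the discussion rather than the wording actually added in the PR:

```python
def process_sentence_chunks(
    ds: "MMapRetrievalIndexedDataset", tokenizer, chunk_size: int, warm_up_size: int, percent: float
):
    """
    Iterate over the retrieval dataset and yield fixed-size token chunks for embedding.

    Args:
        ds: memory-mapped retrieval dataset to read documents from.
        tokenizer: tokenizer used to map chunk token ids back to text.
        chunk_size: number of tokens per chunk.
        warm_up_size: number of chunks reserved for warming up (training) the index.
        percent: fraction of the dataset to process.
    """
    ...
```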
position += 1


def dedup(chunk_id_to_range, I, tmp_neighbors, chunk_id_start, offset):
    """
    deduplicate the KNN who are from the same document as the data chunks.
Why do we want to do this? Can we not have multiple neighbors from a given document? Or am I misunderstanding something here?
This is RETRO's trick to train on the same data that is used as retrieval data. We simply remove the neighbors from the same document as the input data chunk.
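To make that trick concrete, here is a rough sketch of the filtering described above; the meaning of the arguments (a map from a data-chunk id to its document's retrieval-chunk range, plus start/offset bookkeeping) is inferred from the signature, not copied from the PR:

```python
def dedup_sketch(chunk_id_to_range, I, tmp_neighbors, chunk_id_start, offset):
    """Drop retrieved neighbors that come from the query chunk's own document."""
    for row, neighbors in enumerate(I):
        # assumed: maps a data-chunk id to the [beg, end) range of retrieval
        # chunk ids belonging to the same source document
        beg, end = chunk_id_to_range[chunk_id_start + row]
        position = 0
        for neighbor in neighbors:
            if beg <= neighbor < end:
                continue  # neighbor is from the query's own document: skip it
            tmp_neighbors[row + offset, position] = neighbor
            position += 1
```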
position += 1


def dedup(chunk_id_to_range, I, tmp_neighbors, chunk_id_start, offset):
    """
    deduplicate the KNN who are from the same document as the data chunks.
Why do we want to do this? Can we not have multiple neighbors from a given document? Or am I misunderstanding something here?
see above
@@ -175,6 +264,8 @@ def get_emb():
    default='bert-base-nli-mean-tokens',
@yidong72 I thought you were now using all-mpnet-base-v2
Yes, this is just the default argument. When using this script, I set it to all-mpnet-base-v2.
sounds good
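For readers who have not used the embedding model mentioned above, a minimal sentence-transformers snippet; this is a generic example, not the PR's get_emb() code:

```python
from sentence_transformers import SentenceTransformer

# Encode a few text chunks with the model discussed above.
model = SentenceTransformer('all-mpnet-base-v2')
chunks = ["retrieval-augmented language models", "faiss index sharding"]
embeddings = model.encode(chunks, normalize_embeddings=True)
print(embeddings.shape)  # (2, 768) for all-mpnet-base-v2
```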
stage-0: train on the dataset, example,

```python
python scripts/nlp_language_modeling/build_retrieval_index.py \
    ...
```
Training uses the entirety of the dataset, right?
For training the Faiss index, we use a fraction of the chunks, specified by train_index_size.
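As a rough illustration of training the index on only a sampled fraction of the chunks (the index factory string, dimensions, and array names here are placeholders, not the script's actual settings):

```python
import faiss
import numpy as np

dim = 768
all_embeddings = np.random.rand(100_000, dim).astype('float32')  # stand-in for chunk embeddings
train_index_size = 50_000  # only this many vectors are used to train the index

index = faiss.index_factory(dim, "IVF1024,Flat")
sample_ids = np.random.choice(len(all_embeddings), train_index_size, replace=False)
index.train(all_embeddings[sample_ids])  # training sees only the sampled subset
index.add(all_embeddings)                # all vectors are still added afterwards
```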
The PR looks great to me. I have added some comments.
)
parser.add_argument(
-    '--train_index_size', type=int, required=True, help='The number of sentences that is used to train the index',
+    '--train_index_size', type=int, required=False, help='The number of sentences that is used to train the index',
So do you use the entirety or only a subset of data to train the index?
We use a subset of the data. It is impractical to use all of it, as my dataset can be as large as 200 billion tokens.
I see. Makes sense. I am guessing it shouldn't even matter that much for performance.
This pull request introduces 4 alerts when merging c978671 into d29a66b - view on LGTM.com
LGTM!
…A#4713) * dump shape Signed-off-by: Yi Dong <yidong@nvidia.com> * fix the readout layer Signed-off-by: Yi Dong <yidong@nvidia.com> * working Signed-off-by: Yi Dong <yidong@nvidia.com> * remove the scaling Signed-off-by: Yi Dong <yidong@nvidia.com> * fix style Signed-off-by: Yi Dong <yidong@nvidia.com> * save file by config Signed-off-by: Yi Dong <yidong@nvidia.com> * confirm tp works Signed-off-by: Yi Dong <yidong@nvidia.com> * added support of ckpt conversion Signed-off-by: Yi Dong <yidong@nvidia.com> * add loss scale Signed-off-by: Yi Dong <yidong@nvidia.com> * padding only Signed-off-by: Yi Dong <yidong@nvidia.com> * add comments, set normfactor in file Signed-off-by: Yi Dong <yidong@nvidia.com> * fix style Signed-off-by: Yi Dong <yidong@nvidia.com> * set it norm factor correctly Signed-off-by: Yi Dong <yidong@nvidia.com> * match the original one Signed-off-by: Yi Dong <yidong@nvidia.com> * fix weight init Signed-off-by: Yi Dong <yidong@nvidia.com> * hard code the output std Signed-off-by: Yi Dong <yidong@nvidia.com> * seq parallel base Signed-off-by: Yi Dong <yidong@nvidia.com> * recursively process the files Signed-off-by: Yi Dong <yidong@nvidia.com> * add filter filter Signed-off-by: Yi Dong <yidong@nvidia.com> * add the shape file Signed-off-by: Yi Dong <yidong@nvidia.com> * fix the inconsistence in training Signed-off-by: Yi Dong <yidong@nvidia.com> * register optimizers Signed-off-by: Yi Dong <yidong@nvidia.com> * fix the style Signed-off-by: Yi Dong <yidong@nvidia.com> * fix LGTM Signed-off-by: Yi Dong <yidong@nvidia.com> * added model training performance ci test Signed-off-by: Yi Dong <yidong@nvidia.com> * escape properly Signed-off-by: Yi Dong <yidong@nvidia.com> * multiple line quote Signed-off-by: Yi Dong <yidong@nvidia.com> * added python to ci test Signed-off-by: Yi Dong <yidong@nvidia.com> * reduce accuracy check Signed-off-by: Yi Dong <yidong@nvidia.com> * use the same data mapping Signed-off-by: Yi Dong <yidong@nvidia.com> * only test it on A100 gpu Signed-off-by: Yi Dong <yidong@nvidia.com> * added license and use base yaml conf Signed-off-by: Yi Dong <yidong@nvidia.com> * stage 0 Signed-off-by: Yi Dong <yidong@nvidia.com> * add time info Signed-off-by: Yi Dong <yidong@nvidia.com> * fix one bug Signed-off-by: Yi Dong <yidong@nvidia.com> * reduce nlist Signed-off-by: Yi Dong <yidong@nvidia.com> * stage 1 Signed-off-by: Yi Dong <yidong@nvidia.com> * working stage 1 Signed-off-by: Yi Dong <yidong@nvidia.com> * stage 2 Signed-off-by: Yi Dong <yidong@nvidia.com> * fix one bug Signed-off-by: Yi Dong <yidong@nvidia.com> * time add index Signed-off-by: Yi Dong <yidong@nvidia.com> * document data loading time Signed-off-by: Yi Dong <yidong@nvidia.com> * works with CPU only environment Signed-off-by: Yi Dong <yidong@nvidia.com> * add doc string Signed-off-by: Yi Dong <yidong@nvidia.com> * handle gpu faiss Signed-off-by: Yi Dong <yidong@nvidia.com> * added sharding knn index Signed-off-by: Yi Dong <yidong@nvidia.com> * added unit test for sharding index Signed-off-by: Yi Dong <yidong@nvidia.com> * knn stage processing Signed-off-by: Yi Dong <yidong@nvidia.com> * fix shard gpu index Signed-off-by: Yi Dong <yidong@nvidia.com> * add the unit test for dedup logics Signed-off-by: Yi Dong <yidong@nvidia.com> * fix parallel jit Signed-off-by: Yi Dong <yidong@nvidia.com> * added doc Signed-off-by: Yi Dong <yidong@nvidia.com> * fix style Signed-off-by: Yi Dong <yidong@nvidia.com> * use the sharded version Signed-off-by: Yi Dong <yidong@nvidia.com> * fix style Signed-off-by: Yi Dong 
<yidong@nvidia.com> * fix save checkpoint Signed-off-by: Yi Dong <yidong@nvidia.com> * fix id update Signed-off-by: Yi Dong <yidong@nvidia.com> * move to constructor; Signed-off-by: Yi Dong <doyend@gmail.com> * fix the test Signed-off-by: Yi Dong <doyend@gmail.com> * make checkpoint activation work Signed-off-by: Yi Dong <doyend@gmail.com> * reduce duplicate Signed-off-by: Yi Dong <doyend@gmail.com> * fix selective attention Signed-off-by: Yi Dong <doyend@gmail.com> * fix the style Signed-off-by: Yi Dong <yidong@nvidia.com> * remove the max check Signed-off-by: Yi Dong <yidong@nvidia.com> * added comments Signed-off-by: Yi Dong <doyend@gmail.com> Signed-off-by: Yi Dong <yidong@nvidia.com> Signed-off-by: Yi Dong <doyend@gmail.com> Co-authored-by: Yi Dong <doyend@gmail.com> Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com> Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
What does this PR do?
To handle very large datasets, e.g. hundreds of gigabytes to terabytes of compressed raw data, we need multiple nodes to create sharded indices and combine them together.
This PR enhances the KNN index data structure to handle sharded indices.
It creates the GPU Faiss index in 3 stages.
It creates the KNN index in 2 stages.
It includes unit tests covering the KNN sharding index, the dedup functions, etc.
I have used it to successfully create Faiss and KNN indices for a 350 GB dataset in a Slurm cluster environment.
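To illustrate the sharding idea at a high level (this is a conceptual sketch with made-up arrays, not NeMo's on-disk KNN index format):

```python
import numpy as np

# Each shard holds the KNN map for a contiguous range of data chunks; merging
# is concatenation plus tracking each shard's (chunk_start_id, chunk_end_id).
shard_a = np.array([[3, 7, 9], [1, 4, 8]])    # neighbors for data chunks 0..1
shard_b = np.array([[2, 5, 6], [0, 3, 11]])   # neighbors for data chunks 2..3

merged_knn_map = np.concatenate([shard_a, shard_b], axis=0)
chunk_ranges = [(0, 2), (2, 4)]               # per-shard chunk id ranges

def get_neighbors(chunk_id):
    # a global chunk id indexes directly into the merged map
    return merged_knn_map[chunk_id]

print(get_neighbors(2))  # -> [2 5 6]
```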