
Adding RETRO model Faiss sharding index and KNN sharding index #4713

Merged
merged 78 commits into from
Sep 7, 2022

Conversation

Collaborator

@yidong72 yidong72 commented Aug 9, 2022

What does this PR do?

To handle very large datasets, e.g. hundreds of gigabytes to terabytes of compressed raw data, we need multiple nodes to create sharding indexes and combine them together.
It enhances the KNN index data structure to handle sharding indexes.

This PR creates the GPU Faiss index in 3 stages:

  1. stage 0: train the Faiss index structure.
  2. stage 1: create the Faiss sharding indexes.
  3. stage 2: merge the sharding indexes into one.
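To make the three stages concrete, here is a minimal pure-NumPy stand-in for an IVF-style index: stage 0 trains centroids on a sampled subset, stage 1 builds an inverted-list shard per worker over its slice of the data, and stage 2 merges by concatenating lists. The real scripts use Faiss; every name and size below is an illustrative assumption, not the PR's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.standard_normal((1000, 16)).astype('float32')  # stand-in for chunk embeddings

# stage 0: train the index structure on a sampled subset of the data
# (Faiss trains its IVF centroids similarly; subset size ~ train_index_size)
train = db[rng.choice(len(db), 200, replace=False)]
k = 8                      # number of coarse centroids (nlist, in Faiss terms)
cent = train[:k].copy()
for _ in range(5):         # a few Lloyd iterations
    assign = np.argmin(((train[:, None] - cent[None]) ** 2).sum(-1), axis=1)
    for j in range(k):
        members = train[assign == j]
        if len(members):
            cent[j] = members.mean(0)

# stage 1: each node builds a sharding index over its own slice of the data,
# remembering the slice's global id offset
def build_shard(vectors, id_offset):
    lists = {j: [] for j in range(k)}
    a = np.argmin(((vectors[:, None] - cent[None]) ** 2).sum(-1), axis=1)
    for i, j in enumerate(a):
        lists[j].append(id_offset + i)
    return lists

shards = [build_shard(db[s:s + 250], s) for s in range(0, 1000, 250)]

# stage 2: merge the sharding indexes into one by concatenating inverted lists
merged = {j: sorted(sum((sh[j] for sh in shards), [])) for j in range(k)}
assert sum(len(v) for v in merged.values()) == len(db)
```

Because every shard is built against the same stage-0 centroids, the merge is a pure concatenation, which is what lets stage 1 run independently on many nodes.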

It creates the KNN index in 2 stages:

  1. stage 1: create the KNN sharding indexes.
  2. stage 2: merge the KNN sharding indexes into one.
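The two KNN stages can be sketched the same way: each shard computes exact top-K neighbors of the queries against its slice of the retrieval data (carrying a global id offset), and the merge step re-ranks the concatenated candidates. A NumPy sketch with arbitrary sizes, not the PR's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(1)
retrieval = rng.standard_normal((400, 8)).astype('float32')  # retrieval chunk embeddings
queries = rng.standard_normal((5, 8)).astype('float32')      # training chunk embeddings
K = 4

# stage 1: exact top-K within one shard of the retrieval data
def knn_shard(db_slice, id_offset):
    d = ((queries[:, None] - db_slice[None]) ** 2).sum(-1)
    idx = np.argsort(d, axis=1)[:, :K]
    return np.take_along_axis(d, idx, axis=1), idx + id_offset

parts = [knn_shard(retrieval[s:s + 100], s) for s in range(0, 400, 100)]

# stage 2: merge the per-shard KNN tables into one global top-K table
dists = np.concatenate([p[0] for p in parts], axis=1)
ids = np.concatenate([p[1] for p in parts], axis=1)
order = np.argsort(dists, axis=1)[:, :K]
merged_ids = np.take_along_axis(ids, order, axis=1)

# sanity check: merging shard results reproduces brute-force KNN on the full data
full = ((queries[:, None] - retrieval[None]) ** 2).sum(-1)
assert (merged_ids == np.argsort(full, axis=1)[:, :K]).all()
```

The merge only needs each shard's K candidate distances and ids, so the per-shard KNN files stay small regardless of retrieval-set size.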

It includes unit tests to cover the KNN sharding index, the dedup functions, etc.

I have used it to successfully create Faiss and KNN indexes for a 350 GB dataset in a Slurm cluster environment.

yidong72 and others added 30 commits July 5, 2022 21:26
Signed-off-by: Yi Dong <yidong@nvidia.com>

lgtm-com bot commented Aug 19, 2022

This pull request introduces 4 alerts when merging 46dfb46 into 28524d6 - view on LGTM.com

new alerts:

  • 2 for Module is imported with 'import' and 'import from'
  • 2 for Redundant assignment


lgtm-com bot commented Aug 23, 2022

This pull request introduces 4 alerts when merging 3826add into c0bfa6f - view on LGTM.com

new alerts:

  • 2 for Module is imported with 'import' and 'import from'
  • 2 for Redundant assignment


lgtm-com bot commented Aug 23, 2022

This pull request introduces 4 alerts when merging 1b1adc2 into c0bfa6f - view on LGTM.com

new alerts:

  • 2 for Module is imported with 'import' and 'import from'
  • 2 for Redundant assignment


lgtm-com bot commented Aug 23, 2022

This pull request introduces 4 alerts when merging ae42d7d into c0bfa6f - view on LGTM.com

new alerts:

  • 2 for Module is imported with 'import' and 'import from'
  • 2 for Redundant assignment

@okuchaiev okuchaiev requested a review from soumye August 31, 2022 17:17

lgtm-com bot commented Aug 31, 2022

This pull request introduces 4 alerts when merging e027c2b into f53bb34 - view on LGTM.com

new alerts:

  • 2 for Module is imported with 'import' and 'import from'
  • 2 for Redundant assignment


lgtm-com bot commented Aug 31, 2022

This pull request introduces 4 alerts when merging fd0b19c into f53bb34 - view on LGTM.com

new alerts:

  • 2 for Module is imported with 'import' and 'import from'
  • 2 for Redundant assignment


lgtm-com bot commented Aug 31, 2022

This pull request introduces 4 alerts when merging 8d66ac5 into e8ba60b - view on LGTM.com

new alerts:

  • 2 for Module is imported with 'import' and 'import from'
  • 2 for Redundant assignment


lgtm-com bot commented Sep 3, 2022

This pull request introduces 4 alerts when merging 6b4e3a9 into 1c16b96 - view on LGTM.com

new alerts:

  • 2 for Module is imported with 'import' and 'import from'
  • 2 for Redundant assignment

michalivne previously approved these changes Sep 6, 2022
Collaborator

@michalivne michalivne left a comment

LGTM! Great PR. See minor comments regarding missing docstrings/comments

# self.grad_clip_pl_default = True

if hasattr(self.cfg, "shape_file"):
# self.grad_clip_pl_default = True
Collaborator

Is this comment required?

Collaborator Author

no. removed it

# add shape file
self.register_artifact("model.shape_file", self.cfg.shape_file),

for name, tensor in self.named_parameters():
Collaborator

Can you add a comment explaining this initialization scheme?

Collaborator Author

added

attention_mask = inputs[3]
rotary_pos_emb = inputs[4]
relative_position_bias = inputs[5]
if len(inputs) == 7:
Collaborator

This looks very brittle. Perhaps we can use the same solution here that we used in nemo/collections/nlp/models/language_modeling/megatron_lm_encoder_decoder_model.py with self._get_forward_output_only_func?

Collaborator Author

I checked the self._get_forward_output_only_func method, which has a helper function to convert dictionary arguments to positional arguments. Here I have to translate positional arguments into dictionary arguments. Not sure what a better solution would be.
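For reference, translating a positional input tuple into keyword arguments only needs a fixed name list. The names below are assumptions for illustration, not the actual NeMo forward signature:

```python
# hypothetical argument names; the real forward signature may differ
ARG_NAMES = [
    'hidden_states', 'enc_output', 'enc_mask',
    'attention_mask', 'rotary_pos_emb', 'relative_position_bias',
    'retrieved_emb',  # optional trailing argument, present when len(inputs) == 7
]

def to_kwargs(inputs):
    """Map positional pipeline inputs to named arguments, replacing
    len(inputs)-based indexing with a single explicit name list."""
    return dict(zip(ARG_NAMES, inputs))

short = to_kwargs([1, 2, 3, 4, 5, 6])
full = to_kwargs([1, 2, 3, 4, 5, 6, 7])
assert 'retrieved_emb' not in short and full['retrieved_emb'] == 7
```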

return start, total_chunks


def process_sentence_chunks(
Collaborator

Can you add a docstring describing what the function does?

Collaborator Author

added

logging.info(f'Index neighbors: {f.K}')
logging.info(f'Index chunk start id: {f.chunk_start_id}')
logging.info(f'Index chunk end id: {f.chunk_end_id}')
sys.exit(0)
Collaborator

Can you add a few more high level comments in the code below explaining the purpose of code blocks?

Collaborator Author

I do have a high level comment about the script at the beginning of the file with some examples of how to use it.

@@ -105,7 +105,7 @@ def get_tokenizer(args):
logging.info(f'Data index has {data_ds.chunks} chunks')
logging.info(f'Retrieval Data index has {retrieval_ds.chunks} chunks')
logging.info(f'KNN index has {knn_index.K} neighbors')
assert knn_index.knn_map.max() < retrieval_ds.chunks
# assert knn_index.knn_map.max() < retrieval_ds.chunks
Collaborator

Remove comment if not needed

Collaborator Author

removed

@@ -68,7 +125,15 @@ def get_tokenizer(args):


def process_sentence_chunks(
ds: MMapRetrievalIndexedDataset, tokenizer, chunk_size: int, warm_up_size: int, percent: float
ds: MMapRetrievalIndexedDataset,
Collaborator

Can you add a docstring?

Collaborator Author

added

position += 1


def dedup(chunk_id_to_range, I, tmp_neighbors, chunk_id_start, offset):
"""
deduplicate the KNN neighbors that are from the same document as the data chunks.
Collaborator

Why do we want to do this? Can we not have multiple neighbors from a given document? Or am I misunderstanding something here?

Collaborator Author

This is RETRO's trick to train on the same data that is used as retrieval data. We simply remove the neighbors from the same document as the input data chunk.
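A minimal sketch of that filtering step (the chunk-to-document mapping and padding value here are assumptions for illustration, not the PR's actual dedup signature): neighbors whose source document matches the query chunk's document are dropped, and the row is padded to keep a fixed width.

```python
import numpy as np

# hypothetical chunk -> document mapping; chunk i belongs to doc_of_chunk[i]
doc_of_chunk = np.array([0, 0, 0, 1, 1, 2, 2, 2])

def dedup_neighbors(query_chunk_id, neighbor_ids, pad=-1):
    """Drop retrieved neighbors that come from the same document as the
    query chunk (RETRO's guard against retrieving the training text itself),
    padding the tail so the row keeps a fixed width."""
    same_doc = doc_of_chunk[neighbor_ids] == doc_of_chunk[query_chunk_id]
    kept = neighbor_ids[~same_doc]
    out = np.full(len(neighbor_ids), pad, dtype=neighbor_ids.dtype)
    out[: len(kept)] = kept
    return out

# query chunk 1 lives in document 0, so neighbors 0 and 2 are filtered out
print(dedup_neighbors(1, np.array([0, 5, 2, 7])).tolist())  # → [5, 7, -1, -1]
```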

position += 1


def dedup(chunk_id_to_range, I, tmp_neighbors, chunk_id_start, offset):
"""
deduplicate the KNN neighbors that are from the same document as the data chunks.
Collaborator

Why do we want to do this? Can we not have multiple neighbors from a given document? Or am I misunderstanding something here?

Collaborator Author

see above

@@ -175,6 +264,8 @@ def get_emb():
default='bert-base-nli-mean-tokens',
Collaborator

@yidong72 I thought you were now using all-mpnet-base-v2

Collaborator Author

Yes, this is just the default argument. When using this script, I set it to all-mpnet-base-v2.

Collaborator

sounds good

stage-0: train on the dataset, for example:

```python
python scripts/nlp_language_modeling/build_retrieval_index.py \
Collaborator

@soumye soumye Sep 6, 2022

Training uses the entirety of the dataset right?

Collaborator Author

@yidong72 yidong72 Sep 7, 2022

For training the Faiss index, we use a fraction of the chunks specified by train_index_size

Collaborator

@soumye soumye left a comment

The PR looks great to me. I have added some comments.

)
parser.add_argument(
'--train_index_size', type=int, required=True, help='The number of sentences that is used to train the index',
'--train_index_size', type=int, required=False, help='The number of sentences that is used to train the index',
Collaborator

So do you use the entirety or only a subset of data to train the index?

Collaborator Author

We use a subset of the data. It is impractical to use all of it, as my dataset can be as large as 200 billion tokens.

Collaborator

I see. Makes sense. I am guessing it shouldn't even matter that much for performance.


lgtm-com bot commented Sep 7, 2022

This pull request introduces 4 alerts when merging c978671 into d29a66b - view on LGTM.com

new alerts:

  • 2 for Module is imported with 'import' and 'import from'
  • 2 for Redundant assignment

Collaborator

@michalivne michalivne left a comment

LGTM!

@yidong72 yidong72 merged commit 0d7c4bc into main Sep 7, 2022
@yidong72 yidong72 deleted the feature_shard_index branch September 7, 2022 14:37
jubick1337 pushed a commit to jubick1337/NeMo that referenced this pull request Oct 3, 2022
…A#4713)

* dump shape

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix the readout layer

Signed-off-by: Yi Dong <yidong@nvidia.com>

* working

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove the scaling

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* save file by config

Signed-off-by: Yi Dong <yidong@nvidia.com>

* confirm tp works

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added support of ckpt conversion

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add loss scale

Signed-off-by: Yi Dong <yidong@nvidia.com>

* padding only

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add comments, set normfactor in file

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* set it norm factor correctly

Signed-off-by: Yi Dong <yidong@nvidia.com>

* match the original one

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix weight init

Signed-off-by: Yi Dong <yidong@nvidia.com>

* hard code the output std

Signed-off-by: Yi Dong <yidong@nvidia.com>

* seq parallel base

Signed-off-by: Yi Dong <yidong@nvidia.com>

* recursively process the files

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add filter filter

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add the shape file

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix the inconsistence in training

Signed-off-by: Yi Dong <yidong@nvidia.com>

* register optimizers

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix the style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix LGTM

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added model training performance ci test

Signed-off-by: Yi Dong <yidong@nvidia.com>

* escape properly

Signed-off-by: Yi Dong <yidong@nvidia.com>

* multiple line quote

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added python to ci test

Signed-off-by: Yi Dong <yidong@nvidia.com>

* reduce accuracy check

Signed-off-by: Yi Dong <yidong@nvidia.com>

* use the same data mapping

Signed-off-by: Yi Dong <yidong@nvidia.com>

* only test it on A100 gpu

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added license and use base yaml conf

Signed-off-by: Yi Dong <yidong@nvidia.com>

* stage 0

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add time info

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix one bug

Signed-off-by: Yi Dong <yidong@nvidia.com>

* reduce nlist

Signed-off-by: Yi Dong <yidong@nvidia.com>

* stage 1

Signed-off-by: Yi Dong <yidong@nvidia.com>

* working stage 1

Signed-off-by: Yi Dong <yidong@nvidia.com>

* stage 2

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix one bug

Signed-off-by: Yi Dong <yidong@nvidia.com>

* time add index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* document data loading time

Signed-off-by: Yi Dong <yidong@nvidia.com>

* works with CPU only environment

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add doc string

Signed-off-by: Yi Dong <yidong@nvidia.com>

* handle gpu faiss

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added sharding knn index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added unit test for sharding index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* knn stage processing

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix shard gpu index

Signed-off-by: Yi Dong <yidong@nvidia.com>

* add the unit test for dedup logics

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix parallel jit

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added doc

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* use the sharded version

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix save checkpoint

Signed-off-by: Yi Dong <yidong@nvidia.com>

* fix id update

Signed-off-by: Yi Dong <yidong@nvidia.com>

* move to constructor;

Signed-off-by: Yi Dong <doyend@gmail.com>

* fix the test

Signed-off-by: Yi Dong <doyend@gmail.com>

* make checkpoint activation work

Signed-off-by: Yi Dong <doyend@gmail.com>

* reduce duplicate

Signed-off-by: Yi Dong <doyend@gmail.com>

* fix selective attention

Signed-off-by: Yi Dong <doyend@gmail.com>

* fix the style

Signed-off-by: Yi Dong <yidong@nvidia.com>

* remove the max check

Signed-off-by: Yi Dong <yidong@nvidia.com>

* added comments

Signed-off-by: Yi Dong <doyend@gmail.com>

Signed-off-by: Yi Dong <yidong@nvidia.com>
Signed-off-by: Yi Dong <doyend@gmail.com>
Co-authored-by: Yi Dong <doyend@gmail.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Signed-off-by: Matvei Novikov <mattyson.so@gmail.com>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022
…A#4713)

Signed-off-by: Hainan Xu <hainanx@nvidia.com>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 29, 2022
…A#4713)

Signed-off-by: Hainan Xu <hainanx@nvidia.com>
Labels
None yet
Projects
None yet
5 participants