RowFeatureIndex Optimization #531

Merged
polinabinder1 merged 13 commits into main from polinabinder/small_row_feat_index
Dec 17, 2024

@polinabinder1
Collaborator

With this PR, if a dataframe identical to the last one in RowFeatureIndex is appended, it is not stored again; instead, the row count of that previous dataframe is incremented. Storing a separate copy on every append becomes a problem in very large datasets.

Previously, if dataframe A corresponded to rows [0, 2) and we appended the same dataframe A for 4 more rows, A was stored twice: the first copy covering rows [0, 2) and the second covering rows [2, 6).

Now, dataframe A is stored once and covers rows [0, 6).
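The merge-on-append behavior described above can be sketched as follows. This is a minimal, hypothetical sketch, not the actual bionemo-scdl implementation: the class name `DedupRowFeatureIndex`, its methods, and the cumulative end-row layout are assumptions for illustration. As in the description, only the most recently stored table is compared, so a table that reappears after a different one is stored again. The default equality uses `==`, which suits plain Python containers; for pandas DataFrames you would pass `equals=lambda a, b: a.equals(b)`, because `==` on DataFrames is element-wise.

```python
import bisect
from typing import Any, Callable, List, Tuple


class DedupRowFeatureIndex:
    """Hypothetical sketch: maps contiguous row ranges to feature tables.

    If the table being appended equals the most recently stored one, the
    existing entry's row range is extended instead of storing a duplicate.
    """

    def __init__(self, equals: Callable[[Any, Any], bool] = lambda a, b: a == b) -> None:
        self._features: List[Any] = []  # one entry per run of identical consecutive tables
        self._ends: List[int] = [0]  # _ends[i + 1] is the exclusive end row of entry i
        self._equals = equals

    def append_features(self, n_obs: int, features: Any) -> None:
        """Register `features` for the next `n_obs` rows."""
        if n_obs <= 0:
            raise ValueError("n_obs must be positive")
        if self._features and self._equals(self._features[-1], features):
            # Same table as the last entry: extend its range, store nothing new.
            self._ends[-1] += n_obs
        else:
            self._features.append(features)
            self._ends.append(self._ends[-1] + n_obs)

    def lookup(self, row: int) -> Tuple[Any, int]:
        """Return (feature table, entry index) for a global row number."""
        if not 0 <= row < self._ends[-1]:
            raise IndexError(f"row {row} out of range")
        # bisect_right counts the ends <= row; subtracting 1 gives the entry
        # whose half-open range [start, end) contains row.
        entry = bisect.bisect_right(self._ends, row) - 1
        return self._features[entry], entry

    def number_of_rows(self) -> int:
        return self._ends[-1]

    def number_of_entries(self) -> int:
        return len(self._features)
```

Appending a table for 2 rows and then an equal table for 4 more leaves `number_of_entries()` at 1 while `number_of_rows()` is 6, and every row from 0 to 5 resolves to that single stored table, matching the example above.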

@polinabinder1
Collaborator Author

/build-ci

@skothenhill-nv
Collaborator

Is the Megatron-LM change intentional, or do we need to run `git submodule update --recursive`? It shows a diff against main, which seems weird.

Comment thread sub-packages/bionemo-scdl/src/bionemo/scdl/index/row_feature_index.py Outdated
Comment thread sub-packages/bionemo-scdl/tests/bionemo/scdl/index/test_row_feature_index.py Outdated
polinabinder1 and others added 3 commits December 16, 2024 13:45
…ndex.py

Co-authored-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: polinabinder1 <pbinder@nvidia.com>
@polinabinder1
Collaborator Author

/build-ci

@polinabinder1
Collaborator Author

/build-ci

polinabinder1 enabled auto-merge (squash) December 16, 2024 23:30
@polinabinder1
Collaborator Author

/build-ci

@polinabinder1
Collaborator Author

/build-ci

@polinabinder1
Collaborator Author

/build-ci

@polinabinder1
Collaborator Author

/build-ci

polinabinder1 merged commit ff5ce98 into main Dec 17, 2024
polinabinder1 deleted the polinabinder/small_row_feat_index branch December 17, 2024 21:18
4 participants