RowFeatureIndex Optimization by polinabinder1 · Pull Request #531 · NVIDIA-BioNeMo/bionemo-framework

polinabinder1 · 2024-12-13T23:32:48Z

With this PR, if a dataframe appended to RowFeatureIndex is identical to the last one stored, it is not stored again; instead, the row counter for the previous dataframe is incremented. Storing repeated copies of the same dataframe becomes an issue in very large datasets.

Previously, if we had dataframe A corresponding to rows [0,2] and we wanted to add the same dataframe A for 4 more rows, we would store dataframe A twice, with the first copy corresponding to rows [0,2] and the second to rows [2,6].

Now, we would store dataframe A once and it would correspond to rows [0,6].
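To illustrate the behavior described above, here is a minimal sketch. It is not the bionemo-scdl implementation: the class `TinyRowFeatureIndex`, its `append` and `lookup` methods, and the use of a plain dict as a stand-in for a dataframe are all assumptions made for this example. With real pandas dataframes, the equality check would presumably be `DataFrame.equals` rather than `==`.

```python
from bisect import bisect_right


class TinyRowFeatureIndex:
    """Hypothetical stand-in for RowFeatureIndex, for illustration only."""

    def __init__(self):
        self.row_ends = []  # exclusive end row of each stored block
        self.features = []  # one feature table per stored block

    def append(self, n_rows, features):
        """Add n_rows rows that all share the given feature table."""
        if n_rows <= 0:
            raise ValueError("n_rows must be positive")
        total = self.row_ends[-1] if self.row_ends else 0
        if self.features and self.features[-1] == features:
            # Same table as the last block: extend its row range
            # instead of storing a second copy.
            self.row_ends[-1] = total + n_rows
        else:
            self.row_ends.append(total + n_rows)
            self.features.append(features)

    def lookup(self, row):
        """Return the feature table that covers the given row."""
        total = self.row_ends[-1] if self.row_ends else 0
        if not 0 <= row < total:
            raise IndexError(f"row {row} out of range")
        return self.features[bisect_right(self.row_ends, row)]


# Dict of column name -> values, standing in for dataframe A.
frame_a = {"feature_name": ["gene_1", "gene_2"]}

index = TinyRowFeatureIndex()
index.append(2, frame_a)  # rows 0 and 1
index.append(4, frame_a)  # 4 more rows with the same table

print(len(index.features))  # number of stored tables
print(index.row_ends)       # row boundaries of the stored blocks
```

In this sketch the second `append` extends the existing block, so a single copy of the table covers all six rows. Only consecutive duplicates are merged; a table that reappears after a different one is stored again.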

polinabinder1 · 2024-12-13T23:33:05Z

/build-ci

skothenhill-nv · 2024-12-14T00:37:17Z

Is the Megatron-LM change intentional, or do we need to do the recursive git submodule update thing? It shows a diff against main, which seems weird.

polinabinder1 · 2024-12-16T22:06:31Z

/build-ci

polinabinder1 · 2024-12-16T23:08:59Z

/build-ci

polinabinder1 · 2024-12-16T23:30:45Z

/build-ci

polinabinder1 · 2024-12-17T01:41:05Z

/build-ci

polinabinder1 · 2024-12-17T19:41:36Z

/build-ci

polinabinder1 · 2024-12-17T19:48:23Z

/build-ci

not having unnecessary things

c0ae9ea

polinabinder1 marked this pull request as ready for review December 13, 2024 23:32

polinabinder1 requested review from jstjohn, malcolmgreaves, ohadmo, pstjohn, skothenhill-nv and trvachov as code owners December 13, 2024 23:32

skothenhill-nv reviewed Dec 14, 2024

View reviewed changes

Comment thread sub-packages/bionemo-scdl/tests/bionemo/scdl/index/test_row_feature_index.py

Merge branch 'main' into polinabinder/small_row_feat_index

0553fb3

pstjohn approved these changes Dec 16, 2024

View reviewed changes

Comment thread sub-packages/bionemo-scdl/src/bionemo/scdl/index/row_feature_index.py Outdated

Comment thread sub-packages/bionemo-scdl/tests/bionemo/scdl/index/test_row_feature_index.py Outdated

polinabinder1 and others added 3 commits December 16, 2024 13:45

correct megatron

7dc1f32

Update sub-packages/bionemo-scdl/src/bionemo/scdl/index/row_feature_i…

89873d1

…ndex.py Co-authored-by: Peter St. John <pstjohn@nvidia.com> Signed-off-by: polinabinder1 <pbinder@nvidia.com>

fix pytest

755a135

polinabinder1 added 3 commits December 16, 2024 14:12

re-run precommit

5f9b460

correct pre-config

6e88ead

correct function name

01c07f5

skothenhill-nv approved these changes Dec 16, 2024

View reviewed changes

polinabinder1 enabled auto-merge (squash) December 16, 2024 23:30

Merge branch 'main' into polinabinder/small_row_feat_index

0fd2e9e

test fix

b18d70b

polinabinder1 and others added 2 commits December 17, 2024 10:46

Merge branch 'main' into polinabinder/small_row_feat_index

f0dcad2

Drop subprocess test and test the core logic instead (#540)

492c00c

Merge branch 'main' into polinabinder/small_row_feat_index

d56b5ac

farhadrgh mentioned this pull request Dec 17, 2024

Bump 3rdparty/NeMo from 06e6703 to 06a1491 #538

Merged

polinabinder1 merged commit ff5ce98 into main Dec 17, 2024

polinabinder1 deleted the polinabinder/small_row_feat_index branch December 17, 2024 21:18


RowFeatureIndex Optimization #531
polinabinder1 merged 13 commits into main from polinabinder/small_row_feat_index


4 participants
