Index regeneration on maintenance #2489

Merged
merged 118 commits into from Sep 22, 2023

Conversation

sounakr
Contributor

@sounakr sounakr commented Jul 18, 2023

🚀 🚀 Pull Request

Impact

When vector indexes are created on top of a dataset, any change to the base dataset must also be reflected in the indexes. This is handled by index maintenance.
This PR implements index maintenance and triggers it whenever the embedding tensor of the base dataset changes.

Description

This PR regenerates the vector indexes whenever the corresponding embedding tensor in the base dataset is modified.
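
For readers unfamiliar with the approach, here is a minimal sketch of "regenerate on every modification". It is not the code from this PR and does not use the deeplake API: `ToyEmbeddingStore`, its methods, and the brute-force cosine "index" are all hypothetical names chosen for illustration.

```python
import math


class ToyEmbeddingStore:
    """Hypothetical stand-in for an embedding tensor plus its vector index."""

    def __init__(self):
        self.embeddings = []    # base data: one vector per row
        self.index = []         # derived data: unit-normalized copies of each row
        self.rebuild_count = 0  # number of full regenerations performed

    def _regenerate_index(self):
        # Full rebuild: visits every row, so the cost grows with dataset size.
        self.index = []
        for vec in self.embeddings:
            norm = math.sqrt(sum(x * x for x in vec)) or 1.0
            self.index.append([x / norm for x in vec])
        self.rebuild_count += 1

    # Every mutation of the base data triggers a full regeneration.
    def append(self, vec):
        self.embeddings.append(list(vec))
        self._regenerate_index()

    def update(self, row, vec):
        self.embeddings[row] = list(vec)
        self._regenerate_index()

    def pop(self, row):
        self.embeddings.pop(row)
        self._regenerate_index()

    def search(self, query):
        """Return the row number with the highest cosine similarity to query."""
        if not self.index:
            raise ValueError("index is empty")
        norm = math.sqrt(sum(x * x for x in query)) or 1.0
        q = [x / norm for x in query]
        scores = [sum(a * b for a, b in zip(q, v)) for v in self.index]
        return max(range(len(scores)), key=scores.__getitem__)


store = ToyEmbeddingStore()
store.append([1.0, 0.0])
store.append([0.0, 1.0])
nearest_before = store.search([0.9, 0.1])
store.update(0, [0.0, -1.0])  # modifying the base data rebuilds the index
nearest_after = store.search([0.9, 0.1])
```

Because the index is rebuilt after each change, a search never sees stale vectors, at the price of redoing the work for all rows on every modification.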

Things to be aware of

Ideally, index maintenance is an incremental process. In this initial approach, however, the index is regenerated from scratch, which is time-consuming.

Things to worry about

This approach is slow but easy to implement. It was chosen as the initial index maintenance strategy; improvements for incremental maintenance will follow.

@CLAassistant

CLAassistant commented Jul 18, 2023

CLA assistant check
All committers have signed the CLA.

@sounakr sounakr requested a review from khustup July 18, 2023 07:11
@sounakr sounakr changed the title [WIP] Index regeneration on maintenance Index regeneration on maintenance Jul 18, 2023
deeplake/core/vector_index/factory.py (outdated, resolved)
deeplake/core/vector_index/indexer.py (outdated, resolved)
deeplake/core/vector_index/mutable_indexer.py (outdated, resolved)
@istranic istranic self-requested a review September 21, 2023 19:46
@sonarcloud

sonarcloud bot commented Sep 22, 2023

Kudos, SonarCloud Quality Gate passed!

0 Bugs (rating A)
0 Vulnerabilities (rating A)
0 Security Hotspots (rating A)
26 Code Smells (rating A)

27.9% Coverage
0.0% Duplication

@sounakr sounakr merged commit 7b89516 into main Sep 22, 2023
9 of 14 checks passed
@sounakr sounakr deleted the index_regeneration_on_maintenance branch September 22, 2023 05:11

7 participants