Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

resizing to align blocksize #177

Merged
merged 28 commits into from
Aug 1, 2022
Merged

resizing to align blocksize #177

merged 28 commits into from
Aug 1, 2022

Conversation

meiravgri
Copy link
Collaborator

@meiravgri meiravgri commented Jul 27, 2022

Changes made in this PR:

1. in HNSW index

  • add: When exceeding capacity, it is increased and aligned with vector block size
  • delete: if we free an entire vector block, and the size is at least one block size smaller than the capacity, the capacity is
    aligned to block size and decreased by block size.
  • check if label exist in hnsw_wrapper, before calling hnswlib add or delete

2. in brute force index:

  • add: When exceeding capacity (no more empty cells in idToVectorMemeberMapping) , it is increased and aligned with
    vector block size
  • no resizing down upon deleteion because in the current logic, idToVectorMemeberMapping is not continous.

@codecov
Copy link

codecov bot commented Jul 27, 2022

Codecov Report

Merging #177 (93df064) into main (4d6a240) will increase coverage by 0.14%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #177      +/-   ##
==========================================
+ Coverage   93.54%   93.68%   +0.14%     
==========================================
  Files          40       40              
  Lines        2338     2344       +6     
==========================================
+ Hits         2187     2196       +9     
+ Misses        151      148       -3     
Impacted Files Coverage Δ
src/VecSim/algorithms/brute_force/brute_force.cpp 100.00% <100.00%> (+0.45%) ⬆️
src/VecSim/algorithms/hnsw/hnsw_wrapper.cpp 97.40% <100.00%> (+0.12%) ⬆️
src/VecSim/algorithms/hnsw/hnswlib.h 97.22% <100.00%> (+0.34%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d6a240...93df064. Read the comment docs.

…leteVector in case an entire block is removed, the capacity is aligned to blocksize and a decreased by block size, added test for resize cases in hnswlib
@meiravgri meiravgri changed the title modified resize logic of BF addvector so that the ids2vectors mapping… resizing to align blocksize Jul 27, 2022
Copy link
Collaborator

@alonre24 alonre24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good first PR!! :)
I left some comments and questions

src/VecSim/algorithms/brute_force/brute_force.cpp Outdated Show resolved Hide resolved
src/VecSim/algorithms/hnsw/hnsw_wrapper.cpp Outdated Show resolved Hide resolved
src/VecSim/algorithms/hnsw/hnsw_wrapper.cpp Outdated Show resolved Hide resolved
tests/unit/test_bruteforce.cpp Show resolved Hide resolved
tests/unit/test_hnswlib.cpp Show resolved Hide resolved
tests/unit/test_hnswlib.cpp Outdated Show resolved Hide resolved
tests/unit/test_hnswlib.cpp Show resolved Hide resolved
tests/unit/test_hnswlib.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@alonre24 alonre24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last comments

src/VecSim/algorithms/hnsw/hnsw_wrapper.cpp Outdated Show resolved Hide resolved
tests/unit/test_bruteforce.cpp Outdated Show resolved Hide resolved
tests/unit/test_hnswlib.cpp Outdated Show resolved Hide resolved
tests/unit/test_hnswlib.cpp Outdated Show resolved Hide resolved
tests/unit/test_hnswlib.cpp Outdated Show resolved Hide resolved
tests/unit/test_hnswlib.cpp Outdated Show resolved Hide resolved
tests/unit/test_hnswlib.cpp Outdated Show resolved Hide resolved
tests/unit/test_hnswlib.cpp Outdated Show resolved Hide resolved
tests/unit/test_hnswlib.cpp Outdated Show resolved Hide resolved
tests/unit/test_hnswlib.cpp Outdated Show resolved Hide resolved
meiravgri and others added 5 commits July 31, 2022 11:07
Co-authored-by: alonre24 <alonreshef24@gmail.com>
Co-authored-by: alonre24 <alonreshef24@gmail.com>
Co-authored-by: alonre24 <alonreshef24@gmail.com>
alonre24
alonre24 previously approved these changes Jul 31, 2022
Copy link
Collaborator

@DvirDukhan DvirDukhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great

  1. Please edit the PR description to describe your changes
  2. please add a test where capacity is dropped to 0. If it is already covered by one of the existing tests, just mention it.

src/VecSim/algorithms/brute_force/brute_force.cpp Outdated Show resolved Hide resolved
size_t extra_space_to_free = curr_capacity % blockSize;

// Remove one block from the capacity. TODO check that we dont go below zero
this->hnsw->resizeIndex(curr_capacity - blockSize - extra_space_to_free);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this TODO?

tests/unit/test_hnswlib.cpp Outdated Show resolved Hide resolved
Co-authored-by: DvirDukhan <dvir@redis.com>
@meiravgri meiravgri merged commit 68129ce into main Aug 1, 2022
@meiravgri meiravgri deleted the update_resize_logic branch August 1, 2022 06:08
meiravgri added a commit that referenced this pull request Aug 1, 2022
* modified resize logic of BF addvector so that the ids2vectors mapping size fits blockSize

* fixed increasing count in case od re-using id, update brute_force_reindexing_same_vector test

* clang format

* HNSWIndex::addVector aligns the capacity to block size, HNSWIndex::deleteVector in case an entire block is removed, the capacity is aligned to blocksize and a decreased by block size, added test for resize cases in hnswlib

* make format

* update size estimation test

* fixed after Alon's review

* clang format

* cahnged hnsw::remove point to return false in case not found,
also changed HNSWIndex::delete vector to return false in this case (before checking if resizing is required) - this covers (hopedully) also empty index case and resizing only when actual deletion happend

* added tests for empty HNSW index

* added isLabelExists to hnswlib and modified test of id overrrides

* hnsw::removepoint return void, checking if label exists happens only in the wrrapper

* clang format

* update test to removed checking if element count is bigger than max_ellements from addPoint (cant happen, we check this in addVector)

* override test intial size

* remove if exists in addvector update

* Update src/VecSim/algorithms/hnsw/hnsw_wrapper.cpp

Co-authored-by: alonre24 <alonreshef24@gmail.com>

* Update tests/unit/test_hnswlib.cpp

Co-authored-by: alonre24 <alonreshef24@gmail.com>

* Update tests/unit/test_hnswlib.cpp

Co-authored-by: alonre24 <alonreshef24@gmail.com>

* removed redudent tests after alon's review

* fixed tests

* Update src/VecSim/algorithms/brute_force/brute_force.cpp

Co-authored-by: DvirDukhan <dvir@redis.com>

* Update tests/unit/test_hnswlib.cpp

Co-authored-by: DvirDukhan <dvir@redis.com>

* empty index and capacity = 0  tests. removed todo from hnsw_wrapper

Co-authored-by: alonre24 <alonreshef24@gmail.com>
Co-authored-by: DvirDukhan <dvir@redis.com>
meiravgri added a commit that referenced this pull request Aug 1, 2022
* modified resize logic of BF addvector so that the ids2vectors mapping size fits blockSize

* fixed increasing count in case od re-using id, update brute_force_reindexing_same_vector test

* clang format

* HNSWIndex::addVector aligns the capacity to block size, HNSWIndex::deleteVector in case an entire block is removed, the capacity is aligned to blocksize and a decreased by block size, added test for resize cases in hnswlib

* make format

* update size estimation test

* fixed after Alon's review

* clang format

* cahnged hnsw::remove point to return false in case not found,
also changed HNSWIndex::delete vector to return false in this case (before checking if resizing is required) - this covers (hopedully) also empty index case and resizing only when actual deletion happend

* added tests for empty HNSW index

* added isLabelExists to hnswlib and modified test of id overrrides

* hnsw::removepoint return void, checking if label exists happens only in the wrrapper

* clang format

* update test to removed checking if element count is bigger than max_ellements from addPoint (cant happen, we check this in addVector)

* override test intial size

* remove if exists in addvector update

* Update src/VecSim/algorithms/hnsw/hnsw_wrapper.cpp

Co-authored-by: alonre24 <alonreshef24@gmail.com>

* Update tests/unit/test_hnswlib.cpp

Co-authored-by: alonre24 <alonreshef24@gmail.com>

* Update tests/unit/test_hnswlib.cpp

Co-authored-by: alonre24 <alonreshef24@gmail.com>

* removed redudent tests after alon's review

* fixed tests

* Update src/VecSim/algorithms/brute_force/brute_force.cpp

Co-authored-by: DvirDukhan <dvir@redis.com>

* Update tests/unit/test_hnswlib.cpp

Co-authored-by: DvirDukhan <dvir@redis.com>

* empty index and capacity = 0  tests. removed todo from hnsw_wrapper

Co-authored-by: alonre24 <alonreshef24@gmail.com>
Co-authored-by: DvirDukhan <dvir@redis.com>
meiravgri added a commit that referenced this pull request Aug 1, 2022
* modified resize logic of BF addvector so that the ids2vectors mapping size fits blockSize

* fixed increasing count in case od re-using id, update brute_force_reindexing_same_vector test

* clang format

* HNSWIndex::addVector aligns the capacity to block size, HNSWIndex::deleteVector in case an entire block is removed, the capacity is aligned to blocksize and a decreased by block size, added test for resize cases in hnswlib

* make format

* update size estimation test

* fixed after Alon's review

* clang format

* cahnged hnsw::remove point to return false in case not found,
also changed HNSWIndex::delete vector to return false in this case (before checking if resizing is required) - this covers (hopedully) also empty index case and resizing only when actual deletion happend

* added tests for empty HNSW index

* added isLabelExists to hnswlib and modified test of id overrrides

* hnsw::removepoint return void, checking if label exists happens only in the wrrapper

* clang format

* update test to removed checking if element count is bigger than max_ellements from addPoint (cant happen, we check this in addVector)

* override test intial size

* remove if exists in addvector update

* Update src/VecSim/algorithms/hnsw/hnsw_wrapper.cpp

Co-authored-by: alonre24 <alonreshef24@gmail.com>

* Update tests/unit/test_hnswlib.cpp

Co-authored-by: alonre24 <alonreshef24@gmail.com>

* Update tests/unit/test_hnswlib.cpp

Co-authored-by: alonre24 <alonreshef24@gmail.com>

* removed redudent tests after alon's review

* fixed tests

* Update src/VecSim/algorithms/brute_force/brute_force.cpp

Co-authored-by: DvirDukhan <dvir@redis.com>

* Update tests/unit/test_hnswlib.cpp

Co-authored-by: DvirDukhan <dvir@redis.com>

* empty index and capacity = 0  tests. removed todo from hnsw_wrapper

Co-authored-by: alonre24 <alonreshef24@gmail.com>
Co-authored-by: DvirDukhan <dvir@redis.com>
meiravgri added a commit that referenced this pull request Aug 1, 2022
* modified resize logic of BF addvector so that the ids2vectors mapping size fits blockSize

* fixed increasing count in case od re-using id, update brute_force_reindexing_same_vector test

* clang format

* HNSWIndex::addVector aligns the capacity to block size, HNSWIndex::deleteVector in case an entire block is removed, the capacity is aligned to blocksize and a decreased by block size, added test for resize cases in hnswlib

* make format

* update size estimation test

* fixed after Alon's review

* clang format

* cahnged hnsw::remove point to return false in case not found,
also changed HNSWIndex::delete vector to return false in this case (before checking if resizing is required) - this covers (hopedully) also empty index case and resizing only when actual deletion happend

* added tests for empty HNSW index

* added isLabelExists to hnswlib and modified test of id overrrides

* hnsw::removepoint return void, checking if label exists happens only in the wrrapper

* clang format

* update test to removed checking if element count is bigger than max_ellements from addPoint (cant happen, we check this in addVector)

* override test intial size

* remove if exists in addvector update

* Update src/VecSim/algorithms/hnsw/hnsw_wrapper.cpp

Co-authored-by: alonre24 <alonreshef24@gmail.com>

* Update tests/unit/test_hnswlib.cpp

Co-authored-by: alonre24 <alonreshef24@gmail.com>

* Update tests/unit/test_hnswlib.cpp

Co-authored-by: alonre24 <alonreshef24@gmail.com>

* removed redudent tests after alon's review

* fixed tests

* Update src/VecSim/algorithms/brute_force/brute_force.cpp

Co-authored-by: DvirDukhan <dvir@redis.com>

* Update tests/unit/test_hnswlib.cpp

Co-authored-by: DvirDukhan <dvir@redis.com>

* empty index and capacity = 0  tests. removed todo from hnsw_wrapper

Co-authored-by: alonre24 <alonreshef24@gmail.com>
Co-authored-by: DvirDukhan <dvir@redis.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants