@GuyAv46 GuyAv46 commented Jun 27, 2023

Describe the changes in the pull request

Improve serialization logic for the new HNSW implementation

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

@GuyAv46 GuyAv46 requested a review from alonre24 June 27, 2023 04:38
@alonre24 alonre24 left a comment

Looks good, one cosmetic comment. Also, do we have tests that are using the new serialization methods?

@GuyAv46 GuyAv46 requested a review from alonre24 June 27, 2023 08:36
Base automatically changed from add_lock_to_meta_data to feature_hnsw_vector_blocks June 27, 2023 08:38
GuyAv46 commented Jun 27, 2023

We have the `HNSWSerializationCurrentVersion` test, which creates an index, dumps it to a file, and reloads it.
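
For readers unfamiliar with this kind of test, the create, dump, and reload pattern can be sketched as below. This is only an illustration under stated assumptions: `ToyIndex`, `round_trip`, the file layout, and the `ENCODING_VERSION` value are all hypothetical and are not the library's classes, format, or API.

```python
import os
import struct
import tempfile

# Hypothetical version tag written at the start of the file (not the real value).
ENCODING_VERSION = 3


class ToyIndex:
    """Hypothetical stand-in for a vector index: maps a label to a vector."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = {}

    def add(self, label, vector):
        if len(vector) != self.dim:
            raise ValueError("dimension mismatch")
        self.vectors[label] = [float(x) for x in vector]

    def save(self, path):
        """Dump a header (version, dim, count) followed by each labeled vector."""
        with open(path, "wb") as f:
            f.write(struct.pack("<III", ENCODING_VERSION, self.dim, len(self.vectors)))
            for label, vec in sorted(self.vectors.items()):
                f.write(struct.pack("<Q", label))
                f.write(struct.pack(f"<{self.dim}d", *vec))

    @classmethod
    def load(cls, path):
        """Rebuild an index from a file, rejecting unknown encoding versions."""
        with open(path, "rb") as f:
            version, dim, count = struct.unpack("<III", f.read(12))
            if version != ENCODING_VERSION:
                raise ValueError(f"unsupported encoding version {version}")
            index = cls(dim)
            for _ in range(count):
                (label,) = struct.unpack("<Q", f.read(8))
                index.vectors[label] = list(struct.unpack(f"<{dim}d", f.read(8 * dim)))
            return index


def round_trip(index):
    """Save the index to a temporary file, reload it, and return the copy."""
    fd, path = tempfile.mkstemp(suffix=".idx")
    os.close(fd)
    try:
        index.save(path)
        return ToyIndex.load(path)
    finally:
        os.remove(path)
```

A test in this style builds an index, calls `round_trip`, and asserts that the reloaded copy has the same dimension and the same stored vectors as the original.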

@GuyAv46 GuyAv46 force-pushed the guyav-hnsw_refactor_serializer branch from cea205c to 914e8ce Compare June 27, 2023 08:40
codecov bot commented Jun 27, 2023

Codecov Report

❗ No coverage uploaded for pull request base (feature_hnsw_vector_blocks@7a8ce87).
Patch has no changes to coverable lines.

Additional details and impacted files
@@                      Coverage Diff                      @@
##             feature_hnsw_vector_blocks     #391   +/-   ##
=============================================================
  Coverage                              ?   96.26%           
=============================================================
  Files                                 ?       67           
  Lines                                 ?     4687           
  Branches                              ?        0           
=============================================================
  Hits                                  ?     4512           
  Misses                                ?      175           
  Partials                              ?        0           


@GuyAv46 GuyAv46 merged commit ad526c2 into feature_hnsw_vector_blocks Jun 27, 2023
@GuyAv46 GuyAv46 deleted the guyav-hnsw_refactor_serializer branch June 27, 2023 11:12
GuyAv46 added a commit that referenced this pull request Jul 12, 2023
* HNSW refactor 3 - [MOD-5302] (#389)

* moved vector_block

* make DataBlock available to use in vectors

* some general improvements

* implement data blocks in HNSW

* disabled serializer benchmarks and some tests

* more disable

* enabled serialization (for current implementation)

* added prefetch for range

* move generic meta of element to a separated vector

* added prefetch for metadata

* enabled save and load from bindings

* [TMP] return of the old HNSW
REVERT ME

* fix bm

* some changes in prefetch

* shortened BM

* reverted prefetches to be as before

* shorten BM

* small improvement for range

* packing structs

* unrelated performance improvement

* fix

* revert adding origin hnsw,
remove support for v1 and v2 serialization

* update for BM file

* some fixes and test updates

* [TEMP] disable 2 tests so coverage will run

* fix flow test

* more test fixes

* improved tests

* fix for batch iterator scan (needs a benchmark)

* for benchmark

* reverting some temporary changes

* more reverting

* fix for clang

* file name update

* another prefetch option

* few improvements

* some more trying

* final change

* another update to all but `hnsw.h`

* returned `increaseCapacity` responsibility to `addVector`

* make `hnsw.h` use blocks

* BF comment fix

* comment fix

* fix some tests

* fixed hnsw tests

* fixed hnsw-multi tests

* fixed almost all tiered HNSW tests

* Fix memory bookkeeping tests

* Fix memory bookkeeping tests 2

* fixed estimations and their tests

* review fixes

* fix review fix

* more review fixes

* move rounding up of initial capacity to a static function

* added comments on data blocks

* some optimizations (reduce the use of `getDataByInternalId`)

---------

Co-authored-by: alon <alonreshef24@gmail.com>

* HNSW blocks refactor - add lock to graph data struct (#390)

* Improved serializing code - [MOD-5372] (#391)

* improved serializing code

* review fixes

* HNSW Refactor - benchmarks - [MOD-5371] (#392)

* update benchmark files

* updated wget links

* publish serialization script

* another benchmark cleanup iteration

* review fixes

* Renaming "meta" variables (#394)

* renaming "meta" variables

* revert temp change

* Optimize Distance Functions - [MOD-5434] (#395)

* initial templated with masks implementations

* format

* tidy up

* enabled spaces tests back

* changed template type and handle residual first

* re-enabled benchmarks (keeping old names)

* download fix

* improved unit testing

* improved spaces benchmarks

* verify correctness

* some cleanup

* give up optimizing dim<16 for safety

* aligned serialization links

* added lots of comments

* added a test and small fix

* include opts only on x86 machines

* remove AVX512DQ references from the project (not in use)

* rename qty to dimension

* Update AVX_utils.h comments

* Optimize - implement align allocation for vector alignment - [MOD-5433] (#399)

* aligning query vector

* implement aligned allocation

* added alignment hint to VecSimIndexAbstract, used it in block allocation

* test fix

* review fixes

* set default value to the alignment hint (1 - any address is valid)

* refactor allocation header to have alignment flag, unify free function

* use alignment only on vector blocks

* changed default alignment value (0)

* updated tests

* added missing break

* improved comment

* removed alignment from allocator test