Skip to content

Conversation

GuyAv46
Copy link
Collaborator

@GuyAv46 GuyAv46 commented Jun 29, 2023

Describe the changes in the pull request

Refactor all distance functions to be templated by the residual of the dimension, so that each dimension gets a tailor-made implementation with the fewest condition checks possible and at the same time, having less code (one function for each architecture) to maintain.

The new implementation for each architecture uses mask loading of the given arch and therefore we don't have to handle "tails" of non-512-bit-multiple vectors. we handle the residual part first, so the mask loading can safely load some more data if the implementation requires it, and with the 512-bit chunks after. We assume we have enough owned data for the mask loading, so we give up optimizing dimensions smaller than 512-bit (16 for floats, 8 for doubles). This is fine because there is no too-big hit, and there are no real-life scenarios of small dimensions. We can implement spatial optimizations for small dimensions in the future.

Main objects this PR modified

  1. All distance functions.

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

@codecov
Copy link

codecov bot commented Jun 30, 2023

Codecov Report

Patch coverage: 97.35% and project coverage change: -0.81 ⚠️

Comparison is base (21d12f8) 96.28% compared to head (baf58c0) 95.48%.

Additional details and impacted files
@@                      Coverage Diff                       @@
##           feature_hnsw_vector_blocks     #395      +/-   ##
==============================================================
- Coverage                       96.28%   95.48%   -0.81%     
==============================================================
  Files                              67       66       -1     
  Lines                            4689     4120     -569     
==============================================================
- Hits                             4515     3934     -581     
- Misses                            174      186      +12     
Impacted Files Coverage Δ
src/VecSim/spaces/space_aux.cpp 40.00% <ø> (+6.66%) ⬆️
src/VecSim/spaces/spaces.cpp 50.00% <ø> (-30.56%) ⬇️
src/VecSim/spaces/IP/IP_AVX512_FP64.h 95.45% <95.45%> (ø)
src/VecSim/spaces/L2/L2_AVX512_FP64.h 95.83% <95.83%> (ø)
src/VecSim/spaces/IP/IP_AVX_FP32.h 96.15% <96.15%> (ø)
src/VecSim/spaces/IP/IP_AVX_FP64.h 96.15% <96.15%> (ø)
src/VecSim/spaces/L2/L2_AVX_FP64.h 96.29% <96.29%> (ø)
src/VecSim/spaces/L2/L2_AVX_FP32.h 96.42% <96.42%> (ø)
src/VecSim/spaces/IP/IP_SSE_FP64.h 96.55% <96.55%> (ø)
src/VecSim/spaces/L2/L2_SSE_FP64.h 96.66% <96.66%> (ø)
... and 9 more

... and 3 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@GuyAv46 GuyAv46 force-pushed the guyav-optimize_dist_funcs branch from 5030493 to e9c7f93 Compare July 3, 2023 08:28
@GuyAv46 GuyAv46 changed the base branch from main to feature_hnsw_vector_blocks July 3, 2023 08:29
@GuyAv46 GuyAv46 requested a review from meiravgri July 3, 2023 08:30
@GuyAv46 GuyAv46 changed the title Optimize Distance Functions Optimize Distance Functions - [MOD-5434] Jul 3, 2023
@GuyAv46 GuyAv46 mentioned this pull request Jul 3, 2023
2 tasks
@GuyAv46 GuyAv46 marked this pull request as ready for review July 3, 2023 10:28
Copy link
Collaborator

@meiravgri meiravgri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks very good and clean ! Well done
Please verify we still need all the compilation & hardware flags for DQ

@GuyAv46 GuyAv46 requested a review from DvirDukhan July 7, 2023 09:02
Comment on lines +99 to 103
ASSERT_EQ(L2_FP32_GetDistFunc(dim, ARCH_OPT_AVX512_F), FP32_L2Sqr);
ASSERT_EQ(L2_FP64_GetDistFunc(dim, ARCH_OPT_AVX512_F), FP64_L2Sqr);
ASSERT_EQ(IP_FP32_GetDistFunc(dim, ARCH_OPT_AVX512_F), FP32_InnerProduct);
ASSERT_EQ(IP_FP64_GetDistFunc(dim, ARCH_OPT_AVX512_F), FP64_InnerProduct);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will always work?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we make some optimization for dim<8/16

#pragma once

/*
* Include this file after done using the implementation chooser.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or else?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably nothing unless more X definitions will be presented

Comment on lines +13 to +14
* We assume that we are dealing with 512-bit blocks, so we define a chunk size of 16 for 32-bit
* elements, and a chunk size of 8 for 64-bit elements. The main macro is CHOOSE_IMPLEMENTATION, and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why we are assuming this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was the way the functions were implemented in the first place. we can consider changing that but today it simply means X2 more condition checks for AVX and X4 for SSE. also it makes choosing the implementation simpler

Comment on lines +24 to +30
// Macro for 4 cases. Used to collapse the switch statement. For a given N, expands to 4 X macros
// of 4N, 4N+1, 4N+2, 4N+3.
#define C4(X, func, N) X(4 * N, func) X(4 * N + 1, func) X(4 * N + 2, func) X(4 * N + 3, func)

// Macros for 8 and 16 cases. Used to collapse the switch statement. Expands into 0-15 or 0-7 cases.
#define CASES16(X, func) C4(X, func, 0) C4(X, func, 1) C4(X, func, 2) C4(X, func, 3)
#define CASES8(X, func) C4(X, func, 0) C4(X, func, 1)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use C8 instead of C4

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. This was done for readability (so each definition will fit one line). I can change that

target_link_libraries(VectorSimilaritySpaces cpu_features)
endif()

if(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "(x86_64)|(AMD64|amd64)|(^i.86$)")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the removal?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as all the functions are now templated and written in H files, we no longer have files that are compiled only sometimes

#include "space_includes.h"

template <__mmask8 mask> // (2^n)-1, where n is in 1..7 (1, 4, ..., 127)
static inline __m256 my_mm256_maskz_loadu_ps(const float *p) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a need for those

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make them not templated. Probably won't affect the performance

@GuyAv46 GuyAv46 merged commit 2b3d0cf into feature_hnsw_vector_blocks Jul 12, 2023
@GuyAv46 GuyAv46 deleted the guyav-optimize_dist_funcs branch July 12, 2023 10:33
@GuyAv46 GuyAv46 restored the guyav-optimize_dist_funcs branch July 12, 2023 10:33
GuyAv46 added a commit that referenced this pull request Jul 12, 2023
* HNSW refactor 3 - [MOD-5302] (#389)

* moved vector_block

* make DataBlock available to use in vectors

* some general improvements

* implement data blocks in HNSW

* disabled serializer benchmarks and some tests

* more disable

* enabled serialization (for current implementation)

* added prefetch for range

* move generic meta of element to a separated vector

* added prefetch for metadata

* enabled save and load from bindings

* [TMP] return of the old HNSW
REVERT ME

* fix bm

* some changes in prefetch

* shortened BM

* reverted prefetches to by as before

* shorten BM

* small improvement for range

* packing structs

* unrelated performance improvement

* fix

* revert adding origin hnsw,
remove support for v1 and v2 serialization

* update for BM file

* some fixes and test updates

* [TEMP] disable 2 tests so coverage will run

* fix flow test

* more test fixes

* improved tests

* fix for bach iterator scan (needs a benchmark)

* for benchmark

* reverting some temporary changes

* more reverting

* fix for clang

* file name update

* another prefetch option

* few improvements

* some more trying

* final change

* another update to all but `hnsw.h`

* returned `increaseCapacity` responsibility to `addVector`

* make `hnsw.h` use blocks

* BF comment fix

* comment fix

* fix some tests

* fixed hnsw tests

* fixed hnsw-multi tests

* fixed almost all tiered HNSW tests

* Fix memory bookkeeping tests

* Fix memory bookkeeping tests 2

* fixed estimations and their tests

* review fixes

* fix review fix

* more review fixes

* move rounding up of initial capacity to a static function

* added comments on data blocks

* some optimizations (reduce the use of `getDataByInternalId`)

---------

Co-authored-by: alon <alonreshef24@gmail.com>

* HNSW blocks refactor - add lock to graph data struct (#390)

* Improved serializing code - [MOD-5372] (#391)

* improved serializing code

* review fixes

* HNSW Refactor - benchmarks - [MOD-5371] (#392)

* update benchmark files

* updated wget links

* publish serialization script

* another benchmark cleanup iteration

* review fixes

* Renaming "meta" variables (#394)

* renaming "meta" variables

* revert temp change

* Optimize Distance Functions - [MOD-5434] (#395)

* initial templated with masks implementations

* format

* tidy up

* enabled spaces tests back

* changed template type and handle residual first

* re-enabled benchmarks (keeping old names)

* download fix

* improved unit testing

* improved spaces benchmarks

* verify correctness

* some cleanup

* give up optimizing dim<16 for safety

* aligned serialization links

* added lots of comments

* added a test and small fix

* include opts only on x86 machines

* remove AVX512DQ references from the project (not in use)

* rename qty to dimension

* Update AVX_utils.h comments

* Optimize - implement align allocation for vector alignment - [MOD-5433] (#399)

* aligning query vector

* implement aligned allocation

* added alignment hing to VecSimIndexAbstract, used it in block allocation

* test fix

* review fixes

* set default value to the alignment hint (1 - any address is valid)

* refactor allocation header to have alignment flag, unify free function

* use alignment only on vector blocks

* changed default alignment value (0)

* updated tests

* added missing break

* improved comment

* removed alignment from allocator test
GuyAv46 added a commit that referenced this pull request Jul 12, 2023
* HNSW refactor 3 - [MOD-5302] (#389)

* moved vector_block

* make DataBlock available to use in vectors

* some general improvements

* implement data blocks in HNSW

* disabled serializer benchmarks and some tests

* more disable

* enabled serialization (for current implementation)

* added prefetch for range

* move generic meta of element to a separated vector

* added prefetch for metadata

* enabled save and load from bindings

* [TMP] return of the old HNSW
REVERT ME

* fix bm

* some changes in prefetch

* shortened BM

* reverted prefetches to by as before

* shorten BM

* small improvement for range

* packing structs

* unrelated performance improvement

* fix

* revert adding origin hnsw,
remove support for v1 and v2 serialization

* update for BM file

* some fixes and test updates

* [TEMP] disable 2 tests so coverage will run

* fix flow test

* more test fixes

* improved tests

* fix for bach iterator scan (needs a benchmark)

* for benchmark

* reverting some temporary changes

* more reverting

* fix for clang

* file name update

* another prefetch option

* few improvements

* some more trying

* final change

* another update to all but `hnsw.h`

* returned `increaseCapacity` responsibility to `addVector`

* make `hnsw.h` use blocks

* BF comment fix

* comment fix

* fix some tests

* fixed hnsw tests

* fixed hnsw-multi tests

* fixed almost all tiered HNSW tests

* Fix memory bookkeeping tests

* Fix memory bookkeeping tests 2

* fixed estimations and their tests

* review fixes

* fix review fix

* more review fixes

* move rounding up of initial capacity to a static function

* added comments on data blocks

* some optimizations (reduce the use of `getDataByInternalId`)

---------

Co-authored-by: alon <alonreshef24@gmail.com>

* HNSW blocks refactor - add lock to graph data struct (#390)

* Improved serializing code - [MOD-5372] (#391)

* improved serializing code

* review fixes

* HNSW Refactor - benchmarks - [MOD-5371] (#392)

* update benchmark files

* updated wget links

* publish serialization script

* another benchmark cleanup iteration

* review fixes

* Renaming "meta" variables (#394)

* renaming "meta" variables

* revert temp change

* Optimize Distance Functions - [MOD-5434] (#395)

* initial templated with masks implementations

* format

* tidy up

* enabled spaces tests back

* changed template type and handle residual first

* re-enabled benchmarks (keeping old names)

* download fix

* improved unit testing

* improved spaces benchmarks

* verify correctness

* some cleanup

* give up optimizing dim<16 for safety

* aligned serialization links

* added lots of comments

* added a test and small fix

* include opts only on x86 machines

* remove AVX512DQ references from the project (not in use)

* rename qty to dimension

* Update AVX_utils.h comments

* Optimize - implement align allocation for vector alignment - [MOD-5433] (#399)

* aligning query vector

* implement aligned allocation

* added alignment hing to VecSimIndexAbstract, used it in block allocation

* test fix

* review fixes

* set default value to the alignment hint (1 - any address is valid)

* refactor allocation header to have alignment flag, unify free function

* use alignment only on vector blocks

* changed default alignment value (0)

* updated tests

* added missing break

* improved comment

* removed alignment from allocator test
GuyAv46 added a commit that referenced this pull request Jul 12, 2023
* HNSW refactor 3 - [MOD-5302] (#389)

* moved vector_block

* make DataBlock available to use in vectors

* some general improvements

* implement data blocks in HNSW

* disabled serializer benchmarks and some tests

* more disable

* enabled serialization (for current implementation)

* added prefetch for range

* move generic meta of element to a separated vector

* added prefetch for metadata

* enabled save and load from bindings

* [TMP] return of the old HNSW
REVERT ME

* fix bm

* some changes in prefetch

* shortened BM

* reverted prefetches to by as before

* shorten BM

* small improvement for range

* packing structs

* unrelated performance improvement

* fix

* revert adding origin hnsw,
remove support for v1 and v2 serialization

* update for BM file

* some fixes and test updates

* [TEMP] disable 2 tests so coverage will run

* fix flow test

* more test fixes

* improved tests

* fix for bach iterator scan (needs a benchmark)

* for benchmark

* reverting some temporary changes

* more reverting

* fix for clang

* file name update

* another prefetch option

* few improvements

* some more trying

* final change

* another update to all but `hnsw.h`

* returned `increaseCapacity` responsibility to `addVector`

* make `hnsw.h` use blocks

* BF comment fix

* comment fix

* fix some tests

* fixed hnsw tests

* fixed hnsw-multi tests

* fixed almost all tiered HNSW tests

* Fix memory bookkeeping tests

* Fix memory bookkeeping tests 2

* fixed estimations and their tests

* review fixes

* fix review fix

* more review fixes

* move rounding up of initial capacity to a static function

* added comments on data blocks

* some optimizations (reduce the use of `getDataByInternalId`)

---------

Co-authored-by: alon <alonreshef24@gmail.com>

* HNSW blocks refactor - add lock to graph data struct (#390)

* Improved serializing code - [MOD-5372] (#391)

* improved serializing code

* review fixes

* HNSW Refactor - benchmarks - [MOD-5371] (#392)

* update benchmark files

* updated wget links

* publish serialization script

* another benchmark cleanup iteration

* review fixes

* Renaming "meta" variables (#394)

* renaming "meta" variables

* revert temp change

* Optimize Distance Functions - [MOD-5434] (#395)

* initial templated with masks implementations

* format

* tidy up

* enabled spaces tests back

* changed template type and handle residual first

* re-enabled benchmarks (keeping old names)

* download fix

* improved unit testing

* improved spaces benchmarks

* verify correctness

* some cleanup

* give up optimizing dim<16 for safety

* aligned serialization links

* added lots of comments

* added a test and small fix

* include opts only on x86 machines

* remove AVX512DQ references from the project (not in use)

* rename qty to dimension

* Update AVX_utils.h comments

* Optimize - implement align allocation for vector alignment - [MOD-5433] (#399)

* aligning query vector

* implement aligned allocation

* added alignment hing to VecSimIndexAbstract, used it in block allocation

* test fix

* review fixes

* set default value to the alignment hint (1 - any address is valid)

* refactor allocation header to have alignment flag, unify free function

* use alignment only on vector blocks

* changed default alignment value (0)

* updated tests

* added missing break

* improved comment

* removed alignment from allocator test
@GuyAv46 GuyAv46 deleted the guyav-optimize_dist_funcs branch July 18, 2023 04:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants