Optimize Distance Functions - [MOD-5434] #395

GuyAv46 · 2023-06-29T17:24:45Z

Describe the changes in the pull request

Refactor all distance functions to be templated by the residual of the dimension, so that each dimension gets a tailor-made implementation with the fewest condition checks possible and at the same time, having less code (one function for each architecture) to maintain.

The new implementation for each architecture uses mask loading of the given arch and therefore we don't have to handle "tails" of non-512-bit-multiple vectors. we handle the residual part first, so the mask loading can safely load some more data if the implementation requires it, and with the 512-bit chunks after. We assume we have enough owned data for the mask loading, so we give up optimizing dimensions smaller than 512-bit (16 for floats, 8 for doubles). This is fine because there is no too-big hit, and there are no real-life scenarios of small dimensions. We can implement spatial optimizations for small dimensions in the future.

Main objects this PR modified

All distance functions.

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

codecov · 2023-06-30T23:28:45Z

Codecov Report

Patch coverage: 97.35% and project coverage change: -0.81 ⚠️

Comparison is base (21d12f8) 96.28% compared to head (baf58c0) 95.48%.

Additional details and impacted files

@@                      Coverage Diff                       @@
##           feature_hnsw_vector_blocks     #395      +/-   ##
==============================================================
- Coverage                       96.28%   95.48%   -0.81%     
==============================================================
  Files                              67       66       -1     
  Lines                            4689     4120     -569     
==============================================================
- Hits                             4515     3934     -581     
- Misses                            174      186      +12

Impacted Files	Coverage Δ
src/VecSim/spaces/space_aux.cpp	`40.00% <ø> (+6.66%)`	⬆️
src/VecSim/spaces/spaces.cpp	`50.00% <ø> (-30.56%)`	⬇️
src/VecSim/spaces/IP/IP_AVX512_FP64.h	`95.45% <95.45%> (ø)`
src/VecSim/spaces/L2/L2_AVX512_FP64.h	`95.83% <95.83%> (ø)`
src/VecSim/spaces/IP/IP_AVX_FP32.h	`96.15% <96.15%> (ø)`
src/VecSim/spaces/IP/IP_AVX_FP64.h	`96.15% <96.15%> (ø)`
src/VecSim/spaces/L2/L2_AVX_FP64.h	`96.29% <96.29%> (ø)`
src/VecSim/spaces/L2/L2_AVX_FP32.h	`96.42% <96.42%> (ø)`
src/VecSim/spaces/IP/IP_SSE_FP64.h	`96.55% <96.55%> (ø)`
src/VecSim/spaces/L2/L2_SSE_FP64.h	`96.66% <96.66%> (ø)`
... and 9 more

... and 3 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

src/VecSim/spaces/IP_space.cpp

meiravgri

looks very good and clean ! Well done
Please verify we still need all the compilation & hardware flags for DQ

src/VecSim/spaces/CMakeLists.txt

tests/unit/test_spaces.cpp

DvirDukhan · 2023-07-09T12:03:06Z

tests/unit/test_spaces.cpp

+    ASSERT_EQ(L2_FP32_GetDistFunc(dim, ARCH_OPT_AVX512_F), FP32_L2Sqr);
+    ASSERT_EQ(L2_FP64_GetDistFunc(dim, ARCH_OPT_AVX512_F), FP64_L2Sqr);
+    ASSERT_EQ(IP_FP32_GetDistFunc(dim, ARCH_OPT_AVX512_F), FP32_InnerProduct);
+    ASSERT_EQ(IP_FP64_GetDistFunc(dim, ARCH_OPT_AVX512_F), FP64_InnerProduct);
 }


will always work?

Unless we make some optimization for dim<8/16

DvirDukhan · 2023-07-09T12:04:11Z

src/VecSim/spaces/implementation_chooser_cleanup.h

+#pragma once
+
+/*
+ * Include this file after done using the implementation chooser.


probably nothing unless more X definitions will be presented

DvirDukhan · 2023-07-09T12:05:30Z

src/VecSim/spaces/implementation_chooser.h

+ * We assume that we are dealing with 512-bit blocks, so we define a chunk size of 16 for 32-bit
+ * elements, and a chunk size of 8 for 64-bit elements. The main macro is CHOOSE_IMPLEMENTATION, and


why we are assuming this?

this was the way the functions were implemented in the first place. we can consider changing that but today it simply means X2 more condition checks for AVX and X4 for SSE. also it makes choosing the implementation simpler

DvirDukhan · 2023-07-09T12:48:12Z

src/VecSim/spaces/implementation_chooser.h

+// Macro for 4 cases. Used to collapse the switch statement. For a given N, expands to 4 X macros
+// of 4N, 4N+1, 4N+2, 4N+3.
+#define C4(X, func, N) X(4 * N, func) X(4 * N + 1, func) X(4 * N + 2, func) X(4 * N + 3, func)
+
+// Macros for 8 and 16 cases. Used to collapse the switch statement. Expands into 0-15 or 0-7 cases.
+#define CASES16(X, func) C4(X, func, 0) C4(X, func, 1) C4(X, func, 2) C4(X, func, 3)
+#define CASES8(X, func)  C4(X, func, 0) C4(X, func, 1)


you can use C8 instead of C4

Right. This was done for readability (so each definition will fit one line). I can change that

DvirDukhan · 2023-07-09T17:15:49Z

src/VecSim/spaces/CMakeLists.txt

 	target_link_libraries(VectorSimilaritySpaces cpu_features)
 endif()
-
-if(CMAKE_HOST_SYSTEM_PROCESSOR MATCHES "(x86_64)|(AMD64|amd64)|(^i.86$)")


why the removal?

as all the functions are now templated and written in H files, we no longer have files that are compiled only sometimes

DvirDukhan · 2023-07-09T17:28:33Z

src/VecSim/spaces/AVX_utils.h

+#include "space_includes.h"
+
+template <__mmask8 mask> // (2^n)-1, where n is in 1..7 (1, 4, ..., 127)
+static inline __m256 my_mm256_maskz_loadu_ps(const float *p) {


I don't think there is a need for those

I can make them not templated. Probably won't affect the performance

* HNSW refactor 3 - [MOD-5302] (#389) * moved vector_block * make DataBlock available to use in vectors * some general improvements * implement data blocks in HNSW * disabled serializer benchmarks and some tests * more disable * enabled serialization (for current implementation) * added prefetch for range * move generic meta of element to a separated vector * added prefetch for metadata * enabled save and load from bindings * [TMP] return of the old HNSW REVERT ME * fix bm * some changes in prefetch * shortened BM * reverted prefetches to by as before * shorten BM * small improvement for range * packing structs * unrelated performance improvement * fix * revert adding origin hnsw, remove support for v1 and v2 serialization * update for BM file * some fixes and test updates * [TEMP] disable 2 tests so coverage will run * fix flow test * more test fixes * improved tests * fix for bach iterator scan (needs a benchmark) * for benchmark * reverting some temporary changes * more reverting * fix for clang * file name update * another prefetch option * few improvements * some more trying * final change * another update to all but `hnsw.h` * returned `increaseCapacity` responsibility to `addVector` * make `hnsw.h` use blocks * BF comment fix * comment fix * fix some tests * fixed hnsw tests * fixed hnsw-multi tests * fixed almost all tiered HNSW tests * Fix memory bookkeeping tests * Fix memory bookkeeping tests 2 * fixed estimations and their tests * review fixes * fix review fix * more review fixes * move rounding up of initial capacity to a static function * added comments on data blocks * some optimizations (reduce the use of `getDataByInternalId`) --------- Co-authored-by: alon <alonreshef24@gmail.com> * HNSW blocks refactor - add lock to graph data struct (#390) * Improved serializing code - [MOD-5372] (#391) * improved serializing code * review fixes * HNSW Refactor - benchmarks - [MOD-5371] (#392) * update benchmark files * updated wget links * publish serialization script * another benchmark cleanup iteration * review fixes * Renaming "meta" variables (#394) * renaming "meta" variables * revert temp change * Optimize Distance Functions - [MOD-5434] (#395) * initial templated with masks implementations * format * tidy up * enabled spaces tests back * changed template type and handle residual first * re-enabled benchmarks (keeping old names) * download fix * improved unit testing * improved spaces benchmarks * verify correctness * some cleanup * give up optimizing dim<16 for safety * aligned serialization links * added lots of comments * added a test and small fix * include opts only on x86 machines * remove AVX512DQ references from the project (not in use) * rename qty to dimension * Update AVX_utils.h comments * Optimize - implement align allocation for vector alignment - [MOD-5433] (#399) * aligning query vector * implement aligned allocation * added alignment hing to VecSimIndexAbstract, used it in block allocation * test fix * review fixes * set default value to the alignment hint (1 - any address is valid) * refactor allocation header to have alignment flag, unify free function * use alignment only on vector blocks * changed default alignment value (0) * updated tests * added missing break * improved comment * removed alignment from allocator test

GuyAv46 added benchmarks-all and removed benchmarks-all labels Jun 29, 2023

GuyAv46 added bm-spaces benchmarks-all and removed benchmarks-all bm-spaces labels Jul 1, 2023

GuyAv46 added 14 commits July 2, 2023 23:21

initial templated with masks implementations

0a57d42

format

9e781be

tidy up

1e55bd6

enabled spaces tests back

959ca97

changed template type and handle residual first

4314ad6

re-enabled benchmarks (keeping old names)

9e196ec

download fix

5e9bbc1

improved unit testing

3c5f3cd

improved spaces benchmarks

b64fd8b

verify correctness

51a62dd

some cleanup

74c9c15

give up optimizing dim<16 for safety

6d435e4

aligned serialization links

d8f9fbb

added lots of comments

e9c7f93

GuyAv46 force-pushed the guyav-optimize_dist_funcs branch from 5030493 to e9c7f93 Compare July 3, 2023 08:28

GuyAv46 changed the base branch from main to feature_hnsw_vector_blocks July 3, 2023 08:29

GuyAv46 requested a review from meiravgri July 3, 2023 08:30

added a test and small fix

74e0fa4

GuyAv46 changed the title ~~Optimize Distance Functions~~ Optimize Distance Functions - [MOD-5434] Jul 3, 2023

GuyAv46 added the multi-build label Jul 3, 2023

GuyAv46 mentioned this pull request Jul 3, 2023

give up optimizing dim<16 for safety #396

Closed

2 tasks

include opts only on x86 machines

d32176d

GuyAv46 marked this pull request as ready for review July 3, 2023 10:28

GuyAv46 added multi-build and removed multi-build labels Jul 3, 2023

meiravgri reviewed Jul 6, 2023

View reviewed changes

src/VecSim/spaces/IP_space.cpp Show resolved Hide resolved

meiravgri reviewed Jul 6, 2023

View reviewed changes

src/VecSim/spaces/CMakeLists.txt Show resolved Hide resolved

tests/unit/test_spaces.cpp Outdated Show resolved Hide resolved

GuyAv46 added 2 commits July 6, 2023 19:38

remove AVX512DQ references from the project (not in use)

6c65927

rename qty to dimension

53e3b07

GuyAv46 requested a review from DvirDukhan July 7, 2023 09:02

Update AVX_utils.h comments

baf58c0

DvirDukhan reviewed Jul 10, 2023

View reviewed changes

DvirDukhan approved these changes Jul 12, 2023

View reviewed changes

GuyAv46 added multi-build and removed multi-build labels Jul 12, 2023

GuyAv46 merged commit 2b3d0cf into feature_hnsw_vector_blocks Jul 12, 2023

GuyAv46 deleted the guyav-optimize_dist_funcs branch July 12, 2023 10:33

GuyAv46 restored the guyav-optimize_dist_funcs branch July 12, 2023 10:33

GuyAv46 mentioned this pull request Jul 12, 2023

Feature - HNSW refactor - vector blocks - [MOD-5302] #397

Merged

2 tasks

GuyAv46 deleted the guyav-optimize_dist_funcs branch July 18, 2023 04:45

		* We assume that we are dealing with 512-bit blocks, so we define a chunk size of 16 for 32-bit
		* elements, and a chunk size of 8 for 64-bit elements. The main macro is CHOOSE_IMPLEMENTATION, and

Optimize Distance Functions - [MOD-5434] #395

Optimize Distance Functions - [MOD-5434] #395

Uh oh!

Conversation

GuyAv46 commented Jun 29, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov bot commented Jun 30, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

meiravgri left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

GuyAv46 commented Jun 29, 2023 •

edited

Loading

codecov bot commented Jun 30, 2023 •

edited

Loading