Datasets benchmark util by alonre24 · Pull Request #113 · RedisAI/VectorSimilarity

alonre24 · 2022-01-10T16:54:17Z

Waiting for #108

codecov · 2022-01-10T16:55:04Z

Codecov Report

Merging #113 (ea4e8de) into main (c586d72) will increase coverage by 0.01%.
The diff coverage is 94.73%.

@@            Coverage Diff             @@
##             main     #113      +/-   ##
==========================================
+ Coverage   89.24%   89.25%   +0.01%     
==========================================
  Files          39       39              
  Lines        1803     1806       +3     
==========================================
+ Hits         1609     1612       +3     
  Misses        194      194

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/VecSim/algorithms/hnsw/serialization.cpp | `95.07% <94.11%> (+0.04%)` | ⬆️ |
| src/VecSim/algorithms/hnsw/hnswlib.h | `96.07% <100.00%> (+<0.01%)` | ⬆️ |
| src/VecSim/memory/vecsim_malloc.cpp | `81.39% <0.00%> (ø)` | |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c586d72...ea4e8de.

.circleci/config.yml

src/VecSim/algorithms/hnsw/serialization.cpp

.circleci/config.yml

docs/benchmarks.md

DvirDukhan · 2022-01-18T23:52:34Z

src/VecSim/algorithms/hnsw/hnswlib.h


-    data_level0_memory_ = (char *)this->allocator->allocate(max_elements_ * size_data_per_element_);
+    data_level0_memory_ =
+        (char *)this->allocator->callocate(max_elements_ * size_data_per_element_);


why the change?

Valgrind complains when we write uninitialised memory to a file, which is what happens when we save an index that has fewer elements than max_elements (since all the memory is allocated in advance).
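
The situation described in this reply can be modelled in a few lines. This is only a sketch under stated assumptions, not the library's code: Python has no uninitialised memory, so a marker byte stands in for whatever a malloc-style block would contain, and `save_block`, `fill`, `saved_tail` and the sizes are hypothetical names and values chosen for illustration. The `allocate` and `callocate` functions here merely imitate the two allocator calls in the diff.

```python
import io

MAX_ELEMENTS = 4          # capacity reserved up front (hypothetical value)
SIZE_PER_ELEMENT = 8      # bytes per stored element (hypothetical value)
GARBAGE = 0xAA            # stand-in for the undefined contents of a fresh block


def allocate(n_bytes):
    """Model of a malloc-style allocation: contents are not defined."""
    return bytearray([GARBAGE]) * n_bytes


def callocate(n_bytes):
    """Model of a calloc-style allocation: every byte starts at zero."""
    return bytearray(n_bytes)


def fill(block, n_elements):
    """Write n_elements elements into the front of the block."""
    for i in range(n_elements):
        start = i * SIZE_PER_ELEMENT
        block[start:start + SIZE_PER_ELEMENT] = bytes([i + 1]) * SIZE_PER_ELEMENT


def save_block(block, stream):
    """Serialise the whole preallocated block, including unused capacity."""
    stream.write(block)


def saved_tail(alloc_fn, n_elements):
    """Return the bytes that were saved for slots never filled with data."""
    block = alloc_fn(MAX_ELEMENTS * SIZE_PER_ELEMENT)
    fill(block, n_elements)
    out = io.BytesIO()
    save_block(block, out)
    return out.getvalue()[n_elements * SIZE_PER_ELEMENT:]
```

With two of four slots filled, `saved_tail(allocate, 2)` returns the marker bytes (the part a memory checker would flag in C++), while `saved_tail(callocate, 2)` returns zeros, so the saved file is fully defined.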

DvirDukhan · 2022-01-18T23:58:01Z

tests/benchmark/bm_datasets.py

+
+# Create an HNSW index from dataset based on specific params.
+def create_hnsw_index(dataset, ef_construction, M):
+    X_train = np.array(dataset['train'])


X_train => x? We are not training, right?
Also, distance => metric.

This is ANN terminology. The vectors that we insert into the index are the "train set", and the vectors for which we perform Top K search are the "test set".
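
To make the terminology concrete, here is a minimal sketch, assuming a dataset shaped like a dict with `'train'` and `'test'` entries (as in the snippet under review). It uses an exact brute-force search as a stand-in for the HNSW index; the toy vectors and the helpers `l2` and `top_k` are hypothetical and are not part of the benchmark script.

```python
import heapq
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(train, query, k):
    """Return the ids of the k train vectors closest to query (exact search)."""
    return heapq.nsmallest(k, range(len(train)), key=lambda i: l2(train[i], query))


# Toy dataset: 'train' holds the vectors inserted into the index,
# 'test' holds the query vectors used for Top K search.
dataset = {
    "train": [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]],
    "test": [[0.9, 0.1], [4.0, 4.0]],
}

X_train = dataset["train"]   # indexed vectors, no model training involved
X_test = dataset["test"]     # queries

# One list of neighbour ids per query vector.
results = [top_k(X_train, q, k=2) for q in X_test]
```

Nothing is trained here: "train" only names the set that populates the index, and "test" names the set of queries run against it.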

tests/benchmark/bm_datasets.py

tests/benchmark/bm_batch_iterator.cpp

tests/benchmark/bm_basics.cpp

src/VecSim/algorithms/hnsw/serialization.cpp

tests/benchmark/bm_basics.cpp

alonre24 added 23 commits December 29, 2021 15:07

Add demo in jupiter notebook

3b55c3f

update and clean demo

b134276

Clean outputs

e120c0f

Add save and load index methods (wip)

5101cb2

Add save and load index methods (wip)

feb3783

Fix valgrind

8c99561

Add bindings and tests

05a728c

Merge branch 'main' into hnsw_save_load_index

f13aad4

formatting

40335d9

fix warnings

6483e1b

fix valgrind (wip)

d4737a4

formatting

00252fb

rename hnswlib_c, fix valgrind test

a4327fd

milestone - serializer work + no leaks

b5204d2

Add build option that enables build without tests (and without serialization)

6e6654b

Divide load and store to functions

d5ce8dd

update bindings + formatting

17b35c1

merge main

40df5fb

small fix

7d13bf2

create script for running ann benchmarks + fix serialization when index initial size doesn't match the actual size upon loading

99572e8

Update google benchmark

fdf89cc

Update google benchmark

ed6566d

small changes

4e02a96

alonre24 added 6 commits January 16, 2022 11:51

Merge branch 'main' into datasets_benchmark_util

3ccc0b1

Use serialization in google benchmark

4a4059f

merge main

eb211af

Refactor batch iterator BM

a0f4624

Refactor batch iterator BM

f125c24

Fix syntax error in config.yml

791683e

alonre24 added 8 commits January 18, 2022 00:28

docs fixing

46faf94

add test for serializing empty index to increase coverage

a1d45b3

Merge branch 'main' into datasets_benchmark_util

3e8caee

formatting

c991431

WIP

f0a5d11

save memory in runtime

882cc70

fix mac build

424e5f2

fix valgrind

34174e2

alonre24 requested review from DvirDukhan and GuyAv46 January 18, 2022 16:54

alonre24 marked this pull request as ready for review January 18, 2022 16:55

GuyAv46 reviewed Jan 18, 2022

.circleci/config.yml (outdated, resolved)

src/VecSim/algorithms/hnsw/serialization.cpp (outdated, resolved)

DvirDukhan reviewed Jan 19, 2022

alonre24 added 4 commits January 19, 2022 10:43

Merge branch 'main' into datasets_benchmark_util

3f72e70

Pr comments

67698b2

Merge branch 'datasets_benchmark_util' of https://github.com/RedisAI/VectorSimilarity into datasets_benchmark_util

69c23fe

bump h5py and python versions

5519843

DvirDukhan reviewed Jan 19, 2022

src/VecSim/algorithms/hnsw/serialization.cpp (outdated, resolved)

tests/benchmark/bm_basics.cpp (resolved)

alonre24 added 9 commits January 20, 2022 09:29

Add ann-benchmark to ci

133bc4b

Add poetry install to "make pybind"

f12108f

use tox to run dataset_bm in ci

9977957

use tox to run dataset_bm in ci - fix

c48de35

final fixes - remove benchmark from CI (have it only in nightly)

a5efa96

final fixes - remove benchmark from CI (have it only in nightly)

49df588

Merge branch 'main' into datasets_benchmark_util

da12aa7

formatt

edfde32

formatt

ea4e8de

DvirDukhan approved these changes Jan 20, 2022

alonre24 merged commit 8ab7ccc into main Jan 20, 2022

alonre24 deleted the datasets_benchmark_util branch January 20, 2022 11:10

alonre24 merged 53 commits into main from datasets_benchmark_util
