Datasets benchmark util#113

Merged
alonre24 merged 53 commits into main from datasets_benchmark_util
Jan 20, 2022

@alonre24
Collaborator

Waiting for #108

@codecov

codecov bot commented Jan 10, 2022

Codecov Report

Merging #113 (ea4e8de) into main (c586d72) will increase coverage by 0.01%.
The diff coverage is 94.73%.

@@            Coverage Diff             @@
##             main     #113      +/-   ##
==========================================
+ Coverage   89.24%   89.25%   +0.01%     
==========================================
  Files          39       39              
  Lines        1803     1806       +3     
==========================================
+ Hits         1609     1612       +3     
  Misses        194      194              
| Impacted Files | Coverage Δ |
|---|---|
| src/VecSim/algorithms/hnsw/serialization.cpp | 95.07% <94.11%> (+0.04%) ⬆️ |
| src/VecSim/algorithms/hnsw/hnswlib.h | 96.07% <100.00%> (+<0.01%) ⬆️ |
| src/VecSim/memory/vecsim_malloc.cpp | 81.39% <0.00%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c586d72...ea4e8de.

@alonre24 alonre24 marked this pull request as ready for review January 18, 2022 16:55

- data_level0_memory_ = (char *)this->allocator->allocate(max_elements_ * size_data_per_element_);
+ data_level0_memory_ =
+     (char *)this->allocator->callocate(max_elements_ * size_data_per_element_);

why the change?

Collaborator Author

Valgrind complains if we write uninitialised memory to a file, and that is what happens when we save an index that has fewer elements than max_elements (since all the memory is allocated in advance).
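A rough Python analogy of the point above, not the VecSim code itself: `np.zeros` stands in for `callocate` (zero-filled) and `np.empty` for `allocate` (contents unspecified). The helper names `allocate_level0` and `save_index` and the sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

def allocate_level0(max_elements, size_per_element, zeroed=True):
    """Pre-allocate one flat byte buffer that holds every element slot."""
    n_bytes = max_elements * size_per_element
    if zeroed:
        # Analogue of callocate: every byte starts as zero.
        return np.zeros(n_bytes, dtype=np.uint8)
    # Analogue of allocate: contents are whatever happened to be in memory.
    return np.empty(n_bytes, dtype=np.uint8)

def save_index(buf):
    """Serialize the whole pre-allocated block, whether slots are used or not."""
    return buf.tobytes()

# Hypothetical sizes: room for 8 elements, only 3 actually inserted.
max_elements, size_per_element, cur_elements = 8, 4, 3
buf = allocate_level0(max_elements, size_per_element)
used = cur_elements * size_per_element
buf[:used] = 0xAB  # only the first cur_elements slots are ever written

blob = save_index(buf)
tail = blob[used:]  # bytes belonging to slots that were never filled
```

With the zeroed buffer, the unused tail written to disk is well defined (all zeros); with the `np.empty` variant the same bytes would be arbitrary, which is the situation the memory checker flags in the C++ code.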


# Create an HNSW index from dataset based on specific params.
def create_hnsw_index(dataset, ef_construction, M):
    X_train = np.array(dataset['train'])

X_train => x? We are not training, right?
Also, distance => metric.

Collaborator Author

This is ANN terminology... The vectors that we insert into the index are the "train set", and the vectors for which we perform top-K search are the "test set".
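To make the train/test split concrete, here is a minimal sketch that uses an exact brute-force search in place of the HNSW index. The `'test'` key, the L2 metric, the random data, and the helper names `brute_force_top_k` and `recall` are assumptions for illustration; only `dataset['train']` appears in the code under review.

```python
import numpy as np

def brute_force_top_k(x_train, x_test, k):
    """Exact top-k neighbours (squared L2) of each test vector among the train vectors."""
    # Pairwise differences, shape (n_test, n_train, dim).
    diffs = x_test[:, None, :] - x_train[None, :, :]
    # Squared distances, shape (n_test, n_train).
    dists = np.einsum('ijk,ijk->ij', diffs, diffs)
    # Indices of the k closest train vectors for every test vector.
    return np.argsort(dists, axis=1)[:, :k]

def recall(found, truth):
    """Fraction of true neighbours that also appear in the found results."""
    hits = sum(len(set(f) & set(t)) for f, t in zip(found, truth))
    return hits / truth.size

# Hypothetical dataset: the "train" vectors populate the index,
# the "test" vectors are the queries.
rng = np.random.default_rng(0)
dataset = {'train': rng.random((100, 8)), 'test': rng.random((5, 8))}
X_train = np.array(dataset['train'])
X_test = np.array(dataset['test'])

truth = brute_force_top_k(X_train, X_test, k=10)
```

In a benchmark, the results returned by the approximate index for `X_test` would be passed to `recall` together with `truth` to measure how many exact neighbours were recovered.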

@alonre24 alonre24 merged commit 8ab7ccc into main Jan 20, 2022
@alonre24 alonre24 deleted the datasets_benchmark_util branch January 20, 2022 11:10

3 participants