Support FLOAT64 as vector data type MOD-3982 #3129

Merged on Oct 19, 2022 (12 commits)

@alonre24 (Collaborator) commented Oct 2, 2022

This PR integrates FLOAT64 vector support into RediSearch and adds flow tests. Thanks to preparation work done in advance, enabling it required only two small modifications:

  • In FT.CREATE: allow specifying FLOAT64 as the value of the TYPE argument of a vector field in a schema, for example: FT.CREATE idx SCHEMA v VECTOR HNSW 6 TYPE FLOAT64 DIM 4096 DISTANCE_METRIC L2
  • In the JSON integration: choose the callback that converts each vector element of a JSON array into the right data type, based on the vector index metadata (that is, allow converting elements to float64 in addition to float32)
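
To make the two changes concrete, here is a minimal Python sketch of the idea, not the module's C implementation. The `VECTOR_TYPES` table and the helpers `json_array_to_blob` and `create_args` are hypothetical names introduced for illustration, and little-endian packing is an assumption. It shows a converter being selected from the declared TYPE, and the same FT.CREATE arguments as in the example above.

```python
import struct

# Hypothetical lookup: declared vector TYPE -> (struct format char, bytes per element).
VECTOR_TYPES = {"FLOAT32": ("f", 4), "FLOAT64": ("d", 8)}

def json_array_to_blob(values, data_type):
    """Pack a JSON-style list of numbers using the converter chosen by data_type.

    Little-endian byte order is an assumption made for this sketch.
    """
    fmt_char, _elem_size = VECTOR_TYPES[data_type]
    return struct.pack(f"<{len(values)}{fmt_char}", *values)

def create_args(index, field, data_type, dim):
    """Build the FT.CREATE argument list from the PR description's example."""
    return ["FT.CREATE", index, "SCHEMA", field, "VECTOR", "HNSW", "6",
            "TYPE", data_type, "DIM", str(dim), "DISTANCE_METRIC", "L2"]

vec = [1.0, 2.0, 3.0, 4.0]
for data_type, (_fmt, elem_size) in VECTOR_TYPES.items():
    blob = json_array_to_blob(vec, data_type)
    # The blob length depends on the element type, not only on the dimension.
    assert len(blob) == len(vec) * elem_size
```

The point of the lookup table is that nothing else in the flow needs to know which type is in use; only the element converter and the element size change.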

The rest of this PR handles testing, including a refactoring of test_vecsim.py that makes it more generic, so the tests can run over several data types rather than a single hard-coded one.
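
As a rough illustration of what running one test body over several data types can look like, here is a self-contained sketch. It is not the refactored test file: `DATA_TYPES`, `roundtrip`, `run_for_all_types`, and `check_roundtrip` are hypothetical names, and the tolerance values are assumptions chosen for this example.

```python
import struct

# Hypothetical table of the data types a generic test iterates over.
DATA_TYPES = {"FLOAT32": "f", "FLOAT64": "d"}

def roundtrip(values, data_type):
    """Serialize values as the given type and read them back as Python floats."""
    fmt = f"<{len(values)}{DATA_TYPES[data_type]}"
    return list(struct.unpack(fmt, struct.pack(fmt, *values)))

def run_for_all_types(check):
    """Run the same check once per data type instead of hard-coding one type."""
    for data_type in DATA_TYPES:
        check(data_type)

def check_roundtrip(data_type):
    vec = [0.1, 0.2, 0.3, 0.4]
    restored = roundtrip(vec, data_type)
    # FLOAT32 loses precision, so the tolerance must depend on the type.
    tol = 1e-6 if data_type == "FLOAT32" else 0.0
    assert all(abs(a - b) <= tol for a, b in zip(vec, restored))

run_for_all_types(check_roundtrip)
```

A type-aware tolerance is the main thing a generic test needs beyond the type name itself, since expected distances computed in double precision will not match FLOAT32 results exactly.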

CLAassistant commented Oct 2, 2022

CLA assistant check
All committers have signed the CLA.

lgtm-com bot commented Oct 2, 2022

This pull request introduces 1 alert and fixes 24 when merging 9f74b3b into 2144521 - view on LGTM.com

new alerts:

  • 1 for Multiplication result converted to larger type

fixed alerts:

  • 24 for Function declared in block

lgtm-com bot commented Oct 2, 2022

This pull request introduces 1 alert and fixes 24 when merging 4393d59 into 2144521 - view on LGTM.com

new alerts:

  • 1 for Multiplication result converted to larger type

fixed alerts:

  • 24 for Function declared in block

lgtm-com bot commented Oct 2, 2022

This pull request introduces 1 alert and fixes 24 when merging 9d5b398 into 2144521 - view on LGTM.com

new alerts:

  • 1 for Multiplication result converted to larger type

fixed alerts:

  • 24 for Function declared in block

lgtm-com bot commented Oct 2, 2022

This pull request introduces 1 alert and fixes 24 when merging 863507d into 7991f4a - view on LGTM.com

new alerts:

  • 1 for Multiplication result converted to larger type

fixed alerts:

  • 24 for Function declared in block

codecov bot commented Oct 2, 2022

Codecov Report

Base: 82.71% // Head: 82.74% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (20a0b6e) compared to base (208daf1).
Patch coverage: 76.47% of modified lines in pull request are covered.

❗ Current head 20a0b6e differs from pull request most recent head 4ea2114. Consider uploading reports for the commit 4ea2114 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3129      +/-   ##
==========================================
+ Coverage   82.71%   82.74%   +0.03%     
==========================================
  Files         180      180              
  Lines       29654    29660       +6     
==========================================
+ Hits        24528    24542      +14     
+ Misses       5126     5118       -8     
Impacted Files Coverage Δ
src/aggregate/aggregate_exec.c 96.72% <60.00%> (-0.56%) ⬇️
src/debug_commads.c 88.13% <60.00%> (+0.52%) ⬆️
src/index.c 83.66% <100.00%> (+0.86%) ⬆️
src/json.c 88.88% <100.00%> (+1.34%) ⬆️
src/spec.c 87.85% <100.00%> (+0.01%) ⬆️
src/fork_gc.c 56.14% <0.00%> (-0.81%) ⬇️
src/vector_index.c 86.46% <0.00%> (+0.43%) ⬆️

@@ -127,7 +127,7 @@ FT.CREATE my_index2
SCHEMA vector_field VECTOR
HNSW
14
-TYPE FLOAT32
+TYPE FLOAT64
Contributor:

Why not add another example instead of changing it?

Collaborator Author:

The FLAT index example is with FLOAT32... Didn't want to overload with examples...

tests/pytests/test_json.py (resolved)
tests/pytests/test_vecsim.py (outdated, resolved)
tests/pytests/test_vecsim.py (outdated, resolved)
Comment on lines +1268 to +1277
# For FLOAT64, this block size exceeds 10% of system memory, but not for FLOAT32
block_size = system_memory // (dim*float64_byte_size) // 9
if data_type == 'FLOAT32':
    env.expect('FT.CREATE', currIdx, 'SCHEMA', 'v', 'VECTOR', 'FLAT', '10', 'TYPE', data_type,
               'DIM', dim, 'DISTANCE_METRIC', 'L2', 'INITIAL_CAP', 0, 'BLOCK_SIZE', block_size).ok()
else:
    env.expect('FT.CREATE', currIdx, 'SCHEMA', 'v', 'VECTOR', 'FLAT', '10', 'TYPE', data_type,
               'DIM', dim, 'DISTANCE_METRIC', 'L2', 'INITIAL_CAP', 0, 'BLOCK_SIZE', block_size).error().contains(
        f'Vector index block size {block_size} exceeded server limit')

Contributor:

It's a bit confusing that one type passes and the other does not

Collaborator Author:

Right, but that's the exact purpose of the test... to test the difference in memory estimation between the two types
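
The arithmetic behind that difference can be worked through in a few lines. This is only a sketch: the 10% threshold is taken from the comment in the quoted test, while the memory size, the dimension, and the helper `block_bytes` are hypothetical values and names chosen for illustration. The server's actual limit check may be computed differently.

```python
FLOAT32_BYTES = 4
FLOAT64_BYTES = 8

def block_bytes(block_size, dim, elem_bytes):
    """Estimated memory of one block: vectors per block * dim * bytes per element."""
    return block_size * dim * elem_bytes

system_memory = 16 * 1024**3   # hypothetical: 16 GiB
dim = 128                      # hypothetical dimension

# Same formula as the quoted test: sized so that a FLOAT64 block is about 1/9 of memory.
block_size = system_memory // (dim * FLOAT64_BYTES) // 9
limit = system_memory // 10    # assumed limit: 10% of system memory

f32 = block_bytes(block_size, dim, FLOAT32_BYTES)  # about 1/18 of memory
f64 = block_bytes(block_size, dim, FLOAT64_BYTES)  # about 1/9 of memory

print(f32 <= limit, f64 > limit)  # prints "True True" for these values
```

Because 1/9 is above 10% and 1/18 is below it, the same BLOCK_SIZE is accepted for FLOAT32 and rejected for FLOAT64, which is the behavior the test asserts.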

tests/pytests/test_vecsim.py (resolved)

lgtm-com bot commented Oct 3, 2022

This pull request introduces 1 alert and fixes 24 when merging 20a0b6e into 7991f4a - view on LGTM.com

new alerts:

  • 1 for Multiplication result converted to larger type

fixed alerts:

  • 24 for Function declared in block

@alonre24 alonre24 marked this pull request as ready for review October 19, 2022 13:13
@DvirDukhan DvirDukhan changed the title from "Support FLOAT64 as vector data type" to "Support FLOAT64 as vector data type MOD-3982" Oct 19, 2022
@oshadmi oshadmi merged commit 8f34860 into master Oct 19, 2022
@oshadmi oshadmi deleted the alon_enable_vecsim_fp64 branch October 19, 2022 16:26
4 participants