fix(elastic): update offset2ids management #416
please re-review @JoanFM
Codecov Report
@@            Coverage Diff             @@
##             main     #416      +/-   ##
==========================================
+ Coverage   86.53%   86.55%   +0.01%
==========================================
  Files         134      134
  Lines        6389     6395       +6
==========================================
+ Hits         5529     5535       +6
  Misses        860      860
assert len(elastic_doc) == len(elastic_doc[:, 'embedding'])
assert len(elastic_doc) == indexed_offset_count

elastic_doc._client.indices.delete(
Do not delete the index here; just create a random name for the index at the beginning so you do not need to care about this. In any case, if the test fails, the index will be polluted and other tests may fail.
noted
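One way to follow this suggestion is to generate a unique index name for each test run, so no cleanup is needed after the asserts. The sketch below uses only the standard library; `random_index_name` is a hypothetical helper, not something taken from the project's test suite, and the config keys mirror the ones used in the tests under review.

```python
import uuid


def random_index_name(prefix: str = "test") -> str:
    """Return a unique, lowercase index name (hypothetical helper)."""
    # uuid4().hex is 32 lowercase hex characters, so the name stays
    # lowercase and collisions between test runs are negligible.
    return f"{prefix}_{uuid.uuid4().hex}"


# Illustrative storage config; only 'index_name' changes between runs.
config = {
    "n_dim": 3,
    "distance": "l2_norm",
    "index_name": random_index_name("test_add"),
}
```

Because each run writes to its own index, a failed test leaves its data behind without affecting later tests.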
)['hits']['hits'][0]['_id']
assert actual_offset_index == expected_offset

elastic_doc._client.indices.delete(
same here, nothing after asserts should be needed
noted
index=elastic_doc._index_name_offset2id
)['count']

assert len(elastic_doc) == len(elastic_doc[:, 'embedding'])
I would also assert that len(elastic_doc) == 7 for extra safety; otherwise this test would pass even with wrong behavior.
Okay noted
index=elastic_doc._index_name_offset2id
)['count']

assert len(elastic_doc._offset2ids.ids) == indexed_offset_count
same here, what should be the length here?
yep will be updated
'n_dim': 3,
'columns': [('price', 'int')],
'distance': 'l2_norm',
'index_name': 'test_add',
Can you perhaps randomize the index name, or give the index exactly the same name as the test function?
oh sure will do
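For the second option (naming the index after the test function), a minimal sketch could look like the following. `index_name_from_test` is a hypothetical helper; it assumes the test name may carry pytest parametrization brackets, and it lowercases the result because Elasticsearch index names must be lowercase and must not start with an underscore.

```python
import re


def index_name_from_test(test_name: str) -> str:
    """Turn a test name into a lowercase, index-safe name (hypothetical)."""
    # Parametrized pytest names look like 'test_add[elasticsearch]';
    # replace every character outside [a-z0-9_] with an underscore.
    name = re.sub(r"[^a-z0-9_]", "_", test_name.lower())
    # Strip leading and trailing underscores left over from the brackets.
    return name.strip("_")
```

Inside a pytest test, the current test name is available as `request.node.name` through the built-in `request` fixture, so it could be passed straight to this helper.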
Thank you very much @alphinside for this great contribution!
Goals:
Solving inconsistent offset issue on deletion #407.
I found that, after index deletion, persisting the updated offsets did not truncate the trailing offsets when the number of ids is smaller than the initial length. This PR adds logic that compares the persisted offset count with the in-memory offset id list and bulk-deletes every offset beyond the length of that list.
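The truncation step can be sketched roughly as follows. This is not the code from the PR: `stale_offset_actions` is a hypothetical helper, and it assumes each persisted offset is stored under its position as the document `_id`. The dictionaries use the action format that the `elasticsearch.helpers.bulk` helper accepts for deletes.

```python
from typing import Dict, List


def stale_offset_actions(
    index_name: str, persisted_count: int, ids: List[str]
) -> List[Dict[str, str]]:
    """Build bulk delete actions for offsets at or beyond len(ids)."""
    # Offsets 0..len(ids)-1 are still valid and get overwritten on save;
    # anything from len(ids) up to persisted_count-1 is a trailing leftover.
    return [
        {"_op_type": "delete", "_index": index_name, "_id": str(offset)}
        for offset in range(len(ids), persisted_count)
    ]


# Five offsets were persisted, but only three ids remain in memory,
# so the actions target offsets 3 and 4.
actions = stale_offset_actions("offset2id_index", 5, ["a", "b", "c"])
```

When the persisted count does not exceed the in-memory length, the list is empty and nothing is deleted.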
Also solving the inconsistent offset issue on extend/add #412.
Elasticsearch handles document indexing as an upsert operation when the same id is indexed again, but the extend logic did not account for this, so I updated the offset indexing to not extend with ids that already exist.
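As an illustration of the extend fix, here is a minimal sketch that keeps the in-memory offset list consistent with upsert semantics. `extend_offsets` is a hypothetical function, not the PR's implementation; it only shows the idea of skipping ids that are already tracked.

```python
from typing import Iterable, List


def extend_offsets(offset_ids: List[str], new_ids: Iterable[str]) -> List[str]:
    """Append only ids that are not yet tracked, preserving order."""
    seen = set(offset_ids)
    for doc_id in new_ids:
        # Re-indexing an existing id overwrites the stored document,
        # so it must not receive a second offset.
        if doc_id not in seen:
            offset_ids.append(doc_id)
            seen.add(doc_id)
    return offset_ids


# 'b' already exists and 'c' appears twice; only one new offset is added.
result = extend_offsets(["a", "b"], ["b", "c", "c"])
```

With this behavior the offset list length stays equal to the number of distinct documents in the index.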