Fix: use external versioning for deletes #422
It looks like @dainiusjocas hasn't signed our Contributor License Agreement yet.
You can read and sign our full Contributor License Agreement here. Once you've signed, reply with [clabot:check].
Appreciation of efforts, clabot

[clabot:check]

@confluentinc It looks like @dainiusjocas just signed our Contributor License Agreement. 👍
Always at your service, clabot

A blog post that explains why this PR is a bug fix: https://www.jocas.lt/blog/post/kc_es_data_consistency/

test this please

What is the status of this change, and what prevents it from moving forward? The issue (if it still exists) is concerning.

@sp-gupta Would you be able to comment here? Or at least get someone who could? Thanks!

The issue was fixed with other changes a couple of years ago.

Problem
If the Elasticsearch indexer is highly concurrent, keys are used as ids, and the indexer is set to delete records on null values, then not using external versioning for delete requests might corrupt the data: records that should not be deleted end up being deleted.
Solution
Use "version_type": "external" for deletes, and use the topic offset as the version number.
Does this solution apply anywhere else?
Test Strategy
Create an ElasticsearchSinkTask with these properties:
Then produce a sequence of records with the same key, where around half of the records are null and half are non-null documents with a version number, and put a document with the highest version number as the last record. Start indexing and expect exactly one document in the Elasticsearch index, with "message" equal to numOfRecords.
Note:
To observe the previous faulty behavior, comment out the code that sets the external version for the delete request and run the integration test introduced in this PR several times; the results should be flaky.
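The shape of the record sequence described in the Test Strategy can be sketched roughly as follows. This is not the integration test from the PR: the `Rec` class, the helper names, offsets starting at 1, and the shuffle that stands in for concurrent delivery are all assumptions made for illustration, and no Elasticsearch instance is involved.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the test data; not the PR's integration test.
class TestSequenceSketch {
    // One record for a single key: an offset plus a nullable message (null means delete).
    static final class Rec {
        final long offset;
        final String message;

        Rec(long offset, String message) {
            this.offset = offset;
            this.message = message;
        }
    }

    // Offsets 1..numOfRecords-1 alternate between null values and documents;
    // the last record (offset numOfRecords) is a document whose message is numOfRecords.
    static List<Rec> buildSequence(int numOfRecords) {
        List<Rec> recs = new ArrayList<>();
        for (int i = 1; i < numOfRecords; i++) {
            recs.add(new Rec(i, i % 2 == 0 ? String.valueOf(i) : null));
        }
        recs.add(new Rec(numOfRecords, String.valueOf(numOfRecords)));
        return recs;
    }

    // Applies records in delivery order, accepting a record only if its offset is
    // strictly greater than the last accepted one. Returns the surviving message, or null.
    static String applyWithExternalVersions(List<Rec> delivered) {
        long currentVersion = -1L;
        String doc = null;
        for (Rec r : delivered) {
            if (r.offset <= currentVersion) {
                continue; // stale index or delete request is rejected
            }
            currentVersion = r.offset;
            doc = r.message; // null models a delete
        }
        return doc;
    }

    public static void main(String[] args) {
        int numOfRecords = 100;
        List<Rec> recs = buildSequence(numOfRecords);
        Collections.shuffle(recs, new Random(42)); // stand-in for concurrent, out-of-order delivery
        System.out.println("surviving message: " + applyWithExternalVersions(recs));
    }
}
```

Because every request carries its offset as the version, the record with the highest offset wins regardless of delivery order, which is the condition the test expects.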
Testing done:
Release Plan
It is safe to release and backport the code.