-
Notifications
You must be signed in to change notification settings - Fork 256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix starvation and busy waiting of ES IndexerBolt #989
Conversation
Protect BulkInsert from accidential closure, fix errors in close. Signed-off-by: Felix Engl <felix.engl@uni-bamberg.de>
can you please rebase this PR against the main branch? It will be easier to review |
Thanks @FelixEngl |
|
||
if (bulkItemsWithFailedFlag.size() == 1) { | ||
selected = bulkItemsWithFailedFlag.get(0); | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review the following lines (328 - 344), in the original code we processed all BulkResponseItems. Here we abbreviate that and only process the first fail or the last ack.
Does this comply with the wanted behaviour or do we want to process all?
We could also just increase the counts like we are processing all of them but in the end we only process it once. (By increasing the counts and the the reports by the length of bulkItemsWithFailedFlag
in line 329)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I missed this bit. we need to process all BulkResponseItems as in the original
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reopen this PR oder making a new one fixing this mistake?
@jnioche Sorry for that. (I don't think It'll mess up any logic except counting acks and failures, so there won't be any problem with the crawling itself.) |
A new one would be easier
…On Tue, 12 Jul 2022, 16:08 Felix Engl, ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In
external/elasticsearch/src/main/java/com/digitalpebble/stormcrawler/elasticsearch/bolt/IndexerBolt.java
<#989 (comment)>
:
> + waitAckLock.unlock();
+ }
+
+ int ackCount = 0;
+ int failureCount = 0;
+
+ for (var entry : presentTuples.entrySet()) {
+ final var id = entry.getKey();
+ final var associatedTuple = entry.getValue();
+ final var bulkItemsWithFailedFlag = idsToBulkItemsWithFailedFlag.get(id);
+
+ BulkItemResponseToFailedFlag selected;
+
+ if (bulkItemsWithFailedFlag.size() == 1) {
+ selected = bulkItemsWithFailedFlag.get(0);
+ } else {
Reopen this PR oder making a new one fixing this mistake?
—
Reply to this email directly, view it on GitHub
<#989 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABVJTYID36E5WJSX6LXUKLVTWC5NANCNFSM53AVWESQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
The
IndexerBold
has the same problems asStatusUpdaterBold
.Basically copy-pasted the
StatusUpdaterBold
intoIndexerBold
and changed some status handling code. (see #988)When the parent PR is accepted this one will only differ in 2 files. (Parent PR: #988)
Best Regards
Felix
Signed-off-by: Felix Engl felix.engl@uni-bamberg.de
Thanks for contributing to StormCrawler, your efforts are appreciated!
Developer Certificate of Origin
By contributing to StormCrawler, you accept and agree to the following terms and conditions (the Developer Certificate of Origin) for your present and future contributions submitted to StormCrawler.
Please refer to the Developer Certificate of Origin section in
CONTRIBUTING.md
for details.Before opening a PR, please check that:
Thanks!