Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue #1043: Fixing problems after restart of Frontier service #1054

Merged

Conversation

michaeldinzinger
Copy link
Contributor

Hello,

this PR should fix two problems which I faced when the crawler was communicating with an URLFrontier service, and this service was down and restarted. These problems do not appear in case the URLFrontier just keeps running without any complications. However, in case the URL Frontier goes down and has to be restarted, it means, as a consequence, that the StormCrawler has to be restarted because it is in a failed state.

The two code changes are therefore:

  • Periodically cleaning up the waitAck cache in order to trigger the eviction of expired entires. Otherwise, it comes to a starvation as soon as the cache runs full and the waitAck cache has neither read nor write accesses, so that no eviction is triggered (but still the eviction of entries is necessary in order to release permits from the inFlightSemaphore)
    The starvation looks like this:
13:41:24.635 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30000 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.645 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30010 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.656 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30020 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.666 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30030 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.677 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30040 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.687 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30050 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.698 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30060 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.709 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30070 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.719 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30080 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.730 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30090 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.740 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30100 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.751 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30110 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.761 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30120 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.772 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30130 ms for processing. There are 0 permits available for 0 waiting threads.
13:41:24.782 [Time-limited test] WARN  c.d.s.u.StatusUpdaterBolt - Waiting more than 30140 ms for processing. There are 0 permits available for 0 waiting threads.
  • Implementing an additional thread which periodically tries to initialize the connection to the URLFrontier service. This is necessary because the restarted Frontier would not receive a new PutURL call to start up the new communication
  • Another code change was that I had to skip the code format validation. After a wearisome search for the mistake, I just couldn't find out the reason why the StatusUpdaterBoltTest.java class is not correctly formatted. So I'm happy about any comment

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
@sonatype-lift
Copy link
Contributor

sonatype-lift bot commented Apr 15, 2023

🛠 Lift Auto-fix

Some of the Lift findings in this PR can be automatically fixed. You can download and apply these changes in your local project directory of your branch to review the suggestions before committing.1

# Download the patch
curl https://lift.sonatype.com/api/patch/github.com/DigitalPebble/storm-crawler/1054.diff -o lift-autofixes.diff

# Apply the patch with git
git apply lift-autofixes.diff

# Review the changes
git diff

Want it all in a single command? Open a terminal in your project's directory and copy and paste the following command:

curl https://lift.sonatype.com/api/patch/github.com/DigitalPebble/storm-crawler/1054.diff | git apply

Once you're satisfied, commit and push your changes in your project.

Footnotes

  1. You can preview the patch by opening the patch URL in the browser.

@jnioche jnioche added this to the 2.9 milestone Apr 17, 2023
@apache apache deleted a comment from sonatype-lift bot Apr 17, 2023
@jnioche
Copy link
Contributor

jnioche commented Apr 17, 2023

Judging by grpc/grpc-java#8177 it is not necessary to recreate the channel, only the streamObservers

@jnioche
Copy link
Contributor

jnioche commented Apr 22, 2023

@michaeldinzinger I have made some changes to your PR, would it be easier if I pushed these in a separate PR and we take it as a continuation of the discussion or would you rather rework the one you have started here?

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
@michaeldinzinger
Copy link
Contributor Author

michaeldinzinger commented Apr 22, 2023

I have made some changes to your PR, would it be easier if I pushed these in a separate PR and we take it as a continuation of the discussion or would you rather rework the one you have started here?

Hello Julien, sorry, I haven't seen your message as I was also just working on it and just pushed a change to the PR. You might want to have a look and, in case your and my code modifications diverge, you could modify it here in this PR

I omitted the additional thread and used the notifyWhenStateChanged callback. Now, I hope it looks better. Maybe this also goes along with your changes

Copy link
Contributor

@jnioche jnioche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments for now, please look at the existing ones as well. Will have a closer look on Monday. Thanks!

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
Copy link
Contributor

@jnioche jnioche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like we are nearly there but could you please look at the remaining few comments? Thanks @michaeldinzinger

Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
Signed-off-by: Michael Dinzinger <michael.dinzinger@uni-passau.de>
@michaeldinzinger
Copy link
Contributor Author

I hope I didn't miss any comment. In case there are further remarks to work on, please let me know

Copy link
Contributor

@jnioche jnioche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great thanks @michaeldinzinger

@jnioche jnioche merged commit 91d782c into apache:master Apr 28, 2023
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants