Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

=htc #1245 fix pool idle-timeout race condition #1311

Merged
merged 1 commit into from
Aug 15, 2017

Conversation

jrudolph
Copy link
Member

Previously, a shutdown request didn't check if there are outstanding
requests so that outstanding responses were just lost.

This is still missing tests, but I haven't been able to reproduce it reliably. The reason is that there's only a short time frame where you can hit the race, i.e. while the scheduled gateway.shutdown is dispatched through PoolMasterActor to PoolInterfaceActor a new request must come in.

@akka-ci akka-ci added validating PR that is currently being validated by Jenkins tested PR that was successfully built and tested by Jenkins and removed validating PR that is currently being validated by Jenkins labels Jul 26, 2017
@akka-ci
Copy link

akka-ci commented Jul 26, 2017

Test PASSed.

Copy link
Member

@ktoso ktoso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix LGTM, thanks for the explanation - perhaps add it above the shuttingDown too so it's easier to remember once we revisit this code some day why it was needed without reading the impl.

I see why adding a good reproducer will be tricky... Starting a server that's very slot in delivering the responses perhaps could do it?

@jrudolph
Copy link
Member Author

I see why adding a good reproducer will be tricky... Starting a server that's very slot in delivering the responses perhaps could do it?

No, actually that's not where the race condition is. It's between PoolMasterActor and PoolInterfaceActor. PIA schedules the idle-timeout and when it triggers it sends a Shutdown message to the PMA. This will then send a Shutdown message to the PIA. The other process is an incoming request that will be dispatched from the PMA to the PIA and which will reset the idle-timeout. If the request comes in right after the idle-timeout triggered the problem happened.

@jrudolph jrudolph force-pushed the jr/w/1245-fix-pool-idle-timeout-race branch from 0006c1b to 1f17111 Compare July 27, 2017 13:30
@jrudolph
Copy link
Member Author

Added another comment as you suggested.

@akka-ci akka-ci added validating PR that is currently being validated by Jenkins tested PR that was successfully built and tested by Jenkins and removed tested PR that was successfully built and tested by Jenkins validating PR that is currently being validated by Jenkins labels Jul 27, 2017
@akka-ci
Copy link

akka-ci commented Jul 27, 2017

Test PASSed.

@jrudolph
Copy link
Member Author

I have an idea for a test, let's try that.

@ktoso
Copy link
Member

ktoso commented Jul 27, 2017

Thanks for explaining the race in detail, that's indeed very tricky to reproduce :O Looking forward to see how you get around to it :)

@jrudolph
Copy link
Member Author

Haha, the test revealed another race condition 🙄

@ktoso
Copy link
Member

ktoso commented Jul 27, 2017

The pool is a present that keeps on giving... 🙄

Previously, a shutdown request didn't check if there are outstanding
requests so that outstanding responses were just lost.
@jrudolph jrudolph force-pushed the jr/w/1245-fix-pool-idle-timeout-race branch from 1f17111 to 9427fb0 Compare July 27, 2017 15:04
@akka-ci akka-ci added validating PR that is currently being validated by Jenkins and removed tested PR that was successfully built and tested by Jenkins labels Jul 27, 2017
@jrudolph
Copy link
Member Author

The test turned out to end up quite simple. It failed for me reliably with the timings given but that might be different on another machine.

@akka-ci akka-ci added tested PR that was successfully built and tested by Jenkins and removed validating PR that is currently being validated by Jenkins labels Jul 27, 2017
@akka-ci
Copy link

akka-ci commented Jul 27, 2017

Test PASSed.

@jrudolph jrudolph merged commit 6cff700 into akka:master Aug 15, 2017
@jrudolph jrudolph deleted the jr/w/1245-fix-pool-idle-timeout-race branch August 15, 2017 09:43
@jrudolph
Copy link
Member Author

Fixes #1245

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tested PR that was successfully built and tested by Jenkins
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants