Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LIVY-729] Fix livy recover the killed session #266

Closed

Conversation

runzhiwang
Copy link
Contributor

@runzhiwang runzhiwang commented Dec 18, 2019

What changes were proposed in this pull request?

Follows are steps to reproduce the problem:

  1. Set livy.server.recovery.mode=recovery, and create an interactive session: session0 in yarn-cluster
  2. kill the yarn application of the session
  3. restart livy
  4. livy try to recover session0, but application has been killed and driver does not exist, so client can not connect to driver, and exception was thrown as the image.
  5. If the ip:port of the driver was reused by session1, client of session0 will try to connect to driver of session1, then driver will throw exception: Unexpected client ID.
  6. Both the exception threw by livy and driver will confuse the user, and recover a lot of killed sessions will delay the recover of alive session.
    image

How to fix:
When session was killed or dead, remove the session from the store

How was this patch tested?

Existed IT and UT.

@runzhiwang
Copy link
Contributor Author

@jerryshao @yiheng Could you help to review ?

@runzhiwang runzhiwang changed the title [LIVY-729] Fix livy should not recover the killed session [LIVY-729] Fix livy recover the killed session Dec 18, 2019
@runzhiwang runzhiwang closed this Dec 18, 2019
@runzhiwang runzhiwang reopened this Dec 18, 2019
@codecov-io
Copy link

codecov-io commented Dec 18, 2019

Codecov Report

Merging #266 into master will increase coverage by 0.09%.
The diff coverage is 76.47%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #266      +/-   ##
============================================
+ Coverage     68.09%   68.18%   +0.09%     
- Complexity      939      945       +6     
============================================
  Files           102      102              
  Lines          5883     5897      +14     
  Branches        893      896       +3     
============================================
+ Hits           4006     4021      +15     
  Misses         1302     1302              
+ Partials        575      574       -1
Impacted Files Coverage Δ Complexity Δ
...che/livy/server/recovery/ZooKeeperStateStore.scala 77.27% <0%> (-1.8%) 17 <0> (ø)
...e/livy/server/interactive/InteractiveSession.scala 70% <100%> (+0.63%) 51 <0> (+5) ⬆️
...he/livy/server/recovery/FileSystemStateStore.scala 62.5% <66.66%> (-0.55%) 11 <1> (ø)
...la/org/apache/livy/server/batch/BatchSession.scala 86.73% <66.66%> (+0.56%) 14 <0> (ø) ⬇️
...ain/java/org/apache/livy/rsc/driver/RSCDriver.java 80.75% <0%> (+0.83%) 46% <0%> (+1%) ⬆️
...cala/org/apache/livy/scalaapi/ScalaJobHandle.scala 55.88% <0%> (+2.94%) 7% <0%> (ø) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 393c162...1a53b03. Read the comment docs.

Copy link
Contributor

@jerryshao jerryshao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@jerryshao jerryshao closed this in 1c656aa Dec 19, 2019
@yiheng
Copy link
Contributor

yiheng commented Dec 19, 2019

+1

@runzhiwang runzhiwang deleted the remove-node-session-fail branch December 19, 2019 11:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants