[FLINK-9458] Unable to recover from job failure on YARN with NPE #6101

Closed
yanghua wants to merge 1 commit

Conversation

yanghua
Contributor

yanghua commented May 30, 2018

What is the purpose of the change

This pull request fixes an NPE that occurs when recovering a job on YARN.

Brief change log

  • *Add a null check to the expression*
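
As a rough illustration of the kind of guard described here (not the actual diff), the sketch below uses hypothetical stand-in classes, `SharedSlotStub` and `ConstraintStub`, rather than Flink's real types. It only assumes that the field being dereferenced can be null at the time the expression is evaluated.

```java
// Hypothetical stand-ins for illustration only; these are not Flink classes.
final class SharedSlotStub {
    private final boolean alive;

    SharedSlotStub(boolean alive) {
        this.alive = alive;
    }

    boolean isAlive() {
        return alive;
    }
}

final class ConstraintStub {
    // May be null when no slot is currently assigned.
    private SharedSlotStub sharedSlot;

    void setSharedSlot(SharedSlotStub slot) {
        this.sharedSlot = slot;
    }

    // Without the null check, sharedSlot.isAlive() would throw a
    // NullPointerException whenever sharedSlot is null. With it, the
    // && operator short-circuits and the method returns false instead.
    boolean isAssignedAndAlive() {
        return sharedSlot != null && sharedSlot.isAlive();
    }
}

class NullGuardDemo {
    public static void main(String[] args) {
        ConstraintStub constraint = new ConstraintStub();
        System.out.println(constraint.isAssignedAndAlive()); // no slot yet

        constraint.setSharedSlot(new SharedSlotStub(true));
        System.out.println(constraint.isAssignedAndAlive()); // live slot assigned
    }
}
```

The guard treats "no slot" the same as "slot not alive", which avoids the exception but does not explain why the field was null in the first place.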

Verifying this change

This change is already covered by existing tests, such as CoLocationConstraintTest.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@yanghua
Contributor Author

yanghua commented May 30, 2018

cc @zentol

@kkrugler
Contributor

I'm wondering why, now, we're encountering cases where the sharedSlot value is null? Seems like this could be caused by a deeper problem somewhere, so just adding the null check is masking something else that should be fixed.

Also, seems like we'd want a test case to verify the failure (pre-fix) and then appropriate behavior with the fix.
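
A check along those lines might look roughly like the sketch below. It is plain Java with no test framework, and `FakeSlot`, `aliveBeforeFix`, and `aliveAfterFix` are hypothetical names standing in for the real code paths, so treat it as the shape of such a test rather than something that runs against Flink.

```java
import java.util.function.Predicate;

// Hypothetical stand-in for a slot; not a Flink class.
final class FakeSlot {
    boolean isAlive() {
        return true;
    }
}

class NullSlotRegressionSketch {
    // Pre-fix shape: dereferences the slot unconditionally.
    static boolean aliveBeforeFix(FakeSlot slot) {
        return slot.isAlive();
    }

    // Post-fix shape: a missing slot is reported as "not alive".
    static boolean aliveAfterFix(FakeSlot slot) {
        return slot != null && slot.isAlive();
    }

    // Returns true if the given check throws NullPointerException for a null slot.
    static boolean throwsNpeOnNull(Predicate<FakeSlot> check) {
        try {
            check.test(null);
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Demonstrates the failure before the fix and its absence afterwards.
        System.out.println("pre-fix throws NPE:  "
                + throwsNpeOnNull(NullSlotRegressionSketch::aliveBeforeFix));
        System.out.println("post-fix throws NPE: "
                + throwsNpeOnNull(NullSlotRegressionSketch::aliveAfterFix));
    }
}
```

A real test would additionally need to reproduce the state in which the slot ends up null, which is the part that would speak to the root-cause question.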

@kkrugler
Contributor

One other note - I ran into this problem, but it wasn't on YARN. It was running locally (via unit test triggered in Eclipse).

@tillrohrmann
Contributor

Could you please close this PR, @yanghua? It is now subsumed by #6119.

@yanghua
Contributor Author

yanghua commented Jun 5, 2018

@tillrohrmann OK, closing...
