[SPARK-9877][Core] Fix StandaloneRestServer NPE when submitting application #8127

Closed
wants to merge 1 commit into from

jerryshao
Contributor

The detailed exception log can be seen in SPARK-9877. The problem is that when `StandaloneRestServer` is created, `self` (`masterEndpoint`) is still null. This fix creates `StandaloneRestServer` only once `self` is available.
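To make the initialization-order problem concrete, here is a minimal, self-contained Scala sketch that does not depend on Spark. `EndpointRef`, `Endpoint`, `FakeRpcEnv`, `RestServer`, `EagerMaster` and `DeferredMaster` are all hypothetical stand-ins, not Spark's API. The one assumption taken from the description above is that an endpoint's `self` is only bound after the endpoint has been constructed, so an object built in the constructor captures null, while one built in a later lifecycle hook (called `onStart()` here, as an assumption of the sketch) sees the real reference. It is written as a Scala script (top-level `val`s), e.g. saved as a `.sc` file.

```scala
// Hypothetical stand-ins for Spark's RPC classes; this is NOT the real Spark API.
final class EndpointRef(val name: String) {
  def askSync(msg: String): String = s"$name received $msg"
}

trait Endpoint {
  // Null until the (fake) RPC environment binds this endpoint, modelling the
  // situation described above where `self` is still null during construction.
  private var ref: EndpointRef = null
  final def self: EndpointRef = ref
  def bind(r: EndpointRef): Unit = { ref = r }
  // Lifecycle hook invoked only after `self` has been bound.
  def onStart(): Unit = ()
}

object FakeRpcEnv {
  // The endpoint's constructor has already run (with self == null) by the time
  // it is passed in; only afterwards is `self` bound and onStart() called.
  def setupEndpoint[E <: Endpoint](name: String, endpoint: E): E = {
    endpoint.bind(new EndpointRef(name))
    endpoint.onStart()
    endpoint
  }
}

// Hypothetical REST server: it keeps whatever master reference it was given.
final class RestServer(master: EndpointRef) {
  def submit(app: String): String = master.askSync(s"submit $app")
}

// Buggy pattern: the REST server is built in the constructor and captures null.
final class EagerMaster extends Endpoint {
  val restServer: RestServer = new RestServer(self)
}

// Fixed pattern: creation is deferred until `self` is available.
final class DeferredMaster extends Endpoint {
  var restServer: Option[RestServer] = None
  override def onStart(): Unit = {
    restServer = Some(new RestServer(self))
  }
}

val eager = FakeRpcEnv.setupEndpoint("master", new EagerMaster)
// Submitting through the eagerly built server dereferences the captured null.
val eagerResult = try eager.restServer.submit("SparkPi") catch { case _: NullPointerException => "NPE" }

val deferred = FakeRpcEnv.setupEndpoint("master", new DeferredMaster)
val deferredResult = deferred.restServer.map(_.submit("SparkPi")).getOrElse("no server")

println(eagerResult)    // NPE
println(deferredResult) // master received submit SparkPi
```

The sketch follows the idea described in the PR rather than adding a null check at submission time: the dependent server is simply not constructed until the reference it needs exists.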

@SparkQA

SparkQA commented Aug 12, 2015

Test build #40630 timed out for PR 8127 at commit fdb6158 after a configured wait of 175m.

@andrewor14
Contributor

retest this please

@andrewor14
Contributor

@jerryshao weird, I've been running this and it works for me. How did you reproduce it?

@jerryshao
Contributor Author

@andrewor14, I started a local pseudo-distributed standalone cluster with the master and worker on the same machine, then submitted a simple SparkPi application and hit the exception described in SPARK-9877.

Here is the command line.

```
./bin/spark-submit --verbose --master spark://hw12100.local:6066 --deploy-mode cluster --class org.apache.spark.examples.SparkPi examples/target/scala-2.10/spark-examples-1.5.0-SNAPSHOT-hadoop2.6.0.jar
```

@jerryshao
Contributor Author

For debugging, I dumped the value of `self` when the REST server is created: it is null, so the variable does not seem to be initialized yet.

@andrewor14
Contributor

How did you set up your cluster? It works when I do the following:

```
# conf/slaves
localhost

$ sbin/stop-all.sh
$ sbin/start-all.sh
$ bin/spark-submit --master spark://...:6066 ...
```

@jerryshao
Contributor Author

Hi @andrewor14, yes, I tested the same way. Did you test with the latest master? With the 1.4.1 release I don't see this exception.

@andrewor14
Contributor

Ah, I just reproduced it! Do you know what caused this?

@andrewor14
Contributor

Looks like that line was added in #5392

@jerryshao
Contributor Author

As I mentioned before, `self` (`masterEndpoint`) is null when `StandaloneRestServer` is created, so submitting an application through it hits an NPE. The creation needs to be delayed until `self` is initialized.

@jerryshao
Contributor Author

Yeah, I think the new RPC framework may have broken the old code.

@andrewor14
Contributor

I've bumped up the priority on this since it is a regression. LGTM, will merge once tests pass. retest this please

@SparkQA

SparkQA commented Aug 13, 2015

Test build #40721 has finished for PR 8127 at commit fdb6158.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Aug 13, 2015

Test build #40744 has finished for PR 8127 at commit fdb6158.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

retest this please. I just fixed a potential source of flakiness

@SparkQA

SparkQA commented Aug 13, 2015

Test build #40790 has finished for PR 8127 at commit fdb6158.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

retest test test test this please

@jerryshao
Contributor Author

The unit tests seem really flaky :)

@SparkQA

SparkQA commented Aug 14, 2015

Test build #1578 has finished for PR 8127 at commit fdb6158.

  • This patch fails SparkR unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 14, 2015

Test build #1577 has finished for PR 8127 at commit fdb6158.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 14, 2015

Test build #1576 timed out for PR 8127 at commit fdb6158 after a configured wait of 175m.

@SparkQA

SparkQA commented Aug 14, 2015

Test build #40824 timed out for PR 8127 at commit fdb6158 after a configured wait of 175m.

@jerryshao
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Aug 14, 2015

Test build #40860 has finished for PR 8127 at commit fdb6158.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

Great, merging into master and 1.5. Thanks @jerryshao for catching this!

asfgit pushed a commit that referenced this pull request Aug 14, 2015
…ication

Detailed exception log can be seen in [SPARK-9877](https://issues.apache.org/jira/browse/SPARK-9877), the problem is when creating `StandaloneRestServer`, `self` (`masterEndpoint`) is null.  So this fix is creating `StandaloneRestServer` when `self` is available.

Author: jerryshao <sshao@hortonworks.com>

Closes #8127 from jerryshao/SPARK-9877.

(cherry picked from commit 9407baa)
Signed-off-by: Andrew Or <andrew@databricks.com>
asfgit closed this in 9407baa Aug 14, 2015
CodingCat pushed a commit to CodingCat/spark that referenced this pull request Aug 17, 2015
3 participants