Skip to content

Conversation

@zsxwing
Copy link
Member

@zsxwing zsxwing commented Nov 10, 2014

In yarn-cluster mode, the Web UI is running behind a yarn proxy server. Some features(or bugs?) of yarn proxy server will break the links for thread dump.

  1. Yarn proxy server will do http redirect internally, so if opening http://example.com:8088/cluster/app/application_1415344371838_0012/executors, it will fetch http://example.com:8088/cluster/app/application_1415344371838_0012/executors/ and return the content but won't change the link in the browser. Then when a user clicks Thread Dump, it will jump to http://example.com:8088/proxy/application_1415344371838_0012/threadDump/?executorId=2. This is a wrong link. The correct link should be http://example.com:8088/proxy/application_1415344371838_0012/executors/threadDump/?executorId=2.

Adding "/" to the tab links will fix it.

  1. Yarn proxy server has a bug about the URL encode/decode. When a user accesses http://example.com:8088/proxy/application_1415344371838_0006/executors/threadDump/?executorId=%3Cdriver%3E, the yarn proxy server will require http://example.com:36429/executors/threadDump/?executorId=%25253Cdriver%25253E. But Spark web server expects http://example.com:36429/executors/threadDump/?executorId=%3Cdriver%3E. Related to YARN-2844.

For now, it's a tricky approach to bypass the yarn bug.

threaddump

@zsxwing zsxwing changed the title Fix link issue of the executor thread dump page in yarn-cluster mode [SPARK-4313][WebUI][Yarn] Fix link issue of the executor thread dump page in yarn-cluster mode Nov 10, 2014
@SparkQA
Copy link

SparkQA commented Nov 10, 2014

Test build #23145 has started for PR 3183 at commit abfa063.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Nov 10, 2014

Test build #23145 has finished for PR 3183 at commit abfa063.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23145/
Test PASSed.

@zsxwing
Copy link
Member Author

zsxwing commented Nov 12, 2014

/cc @JoshRosen @andrewor14

@JoshRosen
Copy link
Contributor

Hey, sorry for the late review. I independently stumbled across a similar issue and opened #3236 for this, but it looks like your PR might subsume mine (or the other way around).

Is the YARN issue due to us not encoding the querystring properly in the first place? Could you see whether my PR fixes your issue?

@JoshRosen
Copy link
Contributor

/cc @henrydavidge, you might want to take a look at this PR, too, since it solves a related problem on YARN.

@zsxwing
Copy link
Member Author

zsxwing commented Nov 13, 2014

Is the YARN issue due to us not encoding the querystring properly in the first place? Could you see whether my PR fixes your issue?

I think it's not a problem of Spark. The problem is Yarn proxy server does not encode the query string properly so the Spark HTTP server receives a wrong request.

Just like in the screenshot, request.getParameter("executorId") will return %253Cdriver%253E and we need to decode it twice to get the real string. That's why I use a while loop to fix it.

@JoshRosen
Copy link
Contributor

Ah, makes sense. That's a really annoying bug!

@henrydavidge
Copy link
Contributor

Yeah, it seems that yarn is encoding the encoded url (%25 is the encoding of %). This pr could introduce bugs later on though if we for some reason want to use urls with % in them.

@zsxwing
Copy link
Member Author

zsxwing commented Nov 13, 2014

Yeah, it seems that yarn is encoding the encoded url (%25 is the encoding of %). This pr could introduce bugs later on though if we for some reason want to use urls with % in them.

Agree. But now due to this Yarn bug, I cannot find a better approach to bypass it. Any thoughts?

@henrydavidge
Copy link
Contributor

We could do something fancy to ensure that we don't consider % characters that are the result of a decoding, but I don't think that's necessary. It's unlikely we'll ever want to have % in any internal urls. So I think this is ok.

@zsxwing
Copy link
Member Author

zsxwing commented Nov 13, 2014

Due to YARN-2844, Yarn proxy server will encode the real url twice. In other modes or the future Yarn version after YARN-2844 is fixed, the url should be correct.

Now this PR does assume there is not "%" in the link and decode the content directly. If we want to solve it perfectly, we need to be able to distinguish the yarn version has this bug and other cases. That would be too complex.

So for now, I think we can warn developers not to use "%" in the url.

@JoshRosen
Copy link
Contributor

I think that executor IDs are assigned by the cluster manager, so in most cases you shouldn't wind up with % in the executor IDs because they're usually based off of hostnames.

@SparkQA
Copy link

SparkQA commented Nov 13, 2014

Test build #23298 has started for PR 3183 at commit 3379ca8.

  • This patch merges cleanly.

@zsxwing
Copy link
Member Author

zsxwing commented Nov 13, 2014

@JoshRosen is there some place to warn others about the Yarn issue. I think other WebUI work in the future may be trapped in this url issue.

@SparkQA
Copy link

SparkQA commented Nov 13, 2014

Test build #23298 has finished for PR 3183 at commit 3379ca8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/23298/
Test PASSed.

@JoshRosen
Copy link
Contributor

I've tested this out locally and it works.

@zsxwing In terms of adding warnings about this, the right place would probably be wherever we allocate executorIds. If there was an explicit ExecutorId type, then we could add documentation to that type. I'd be fine with adding those warnings in a separate commit, though, since I don't think there's a great place to do this now. (Also, don't think that any of the existing implementations should return executor ids that contain %).

@zsxwing
Copy link
Member Author

zsxwing commented Nov 14, 2014

@JoshRosen not only executor id, but also any string will appear in the URL, should pay attention to %. However, I don't know a proper place to add such general docs. Other suggestion to this PR, or it's fine to merge?

@andrewor14
Copy link
Contributor

LGTM merge 1.2 master

asfgit pushed a commit that referenced this pull request Nov 14, 2014
…page in yarn-cluster mode

In yarn-cluster mode, the Web UI is running behind a yarn proxy server. Some features(or bugs?) of yarn proxy server will break the links for thread dump.

1. Yarn proxy server will do http redirect internally, so if opening `http://example.com:8088/cluster/app/application_1415344371838_0012/executors`, it will fetch `http://example.com:8088/cluster/app/application_1415344371838_0012/executors/` and return the content but won't change the link in the browser. Then when a user clicks `Thread Dump`, it will jump to `http://example.com:8088/proxy/application_1415344371838_0012/threadDump/?executorId=2`. This is a wrong link. The correct link should be `http://example.com:8088/proxy/application_1415344371838_0012/executors/threadDump/?executorId=2`.

Adding "/" to the tab links will fix it.

2. Yarn proxy server has a bug about the URL encode/decode. When a user accesses `http://example.com:8088/proxy/application_1415344371838_0006/executors/threadDump/?executorId=%3Cdriver%3E`, the yarn proxy server will require `http://example.com:36429/executors/threadDump/?executorId=%25253Cdriver%25253E`. But Spark web server expects `http://example.com:36429/executors/threadDump/?executorId=%3Cdriver%3E`. Related to [YARN-2844](https://issues.apache.org/jira/browse/YARN-2844).

For now, it's a tricky approach to bypass the yarn bug.

![threaddump](https://cloud.githubusercontent.com/assets/1000778/4972567/d1ccba64-68ad-11e4-983e-257530cef35a.png)

Author: zsxwing <zsxwing@gmail.com>

Closes #3183 from zsxwing/SPARK-4313 and squashes the following commits:

3379ca8 [zsxwing] Encode the executor id in the thread dump link and update the comment
abfa063 [zsxwing] Fix link issue of the executor thread dump page in yarn-cluster mode

(cherry picked from commit 156cf33)
Signed-off-by: Andrew Or <andrew@databricks.com>
@asfgit asfgit closed this in 156cf33 Nov 14, 2014
@zsxwing zsxwing deleted the SPARK-4313 branch November 15, 2014 03:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants