Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34659][UI] Forbid using keyword "proxy" or "history" in reverse proxy URL #36176

Closed
wants to merge 1 commit into from

Conversation

gengliangwang
Copy link
Member

@gengliangwang gengliangwang commented Apr 13, 2022

What changes were proposed in this pull request?

When the reverse proxy URL contains "proxy" or "history", the application ID in UI is wrongly parsed.
For example, if we set spark.ui.reverseProxyURL as "/test/proxy/prefix" or "/test/history/prefix", the application ID is parsed as "prefix" and the related API calls will fail in stages/executors pages:

.../api/v1/applications/prefix/allexecutors

instead of

.../api/v1/applications/app-20220413142241-0000/allexecutors

There are more contexts in #31774
We can fix this entirely like #36174, but it is risky and complicated to do that.

Why are the changes needed?

Avoid users setting keywords in reverse proxy URL and getting wrong UI results.

Does this PR introduce any user-facing change?

No

How was this patch tested?

A new unit test.
Also doc preview:
image

@gengliangwang
Copy link
Member Author

val proxyUrl = _conf.get(UI_REVERSE_PROXY_URL.key, "").stripSuffix("/") +
"/proxy/" + _applicationId
System.setProperty("spark.ui.proxyBase", proxyUrl)
val proxyUrl = _conf.get(UI_REVERSE_PROXY_URL).getOrElse("").stripSuffix("/")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to get the ConfigEntry to trigger the checkValue method.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. Thank you, @gengliangwang and @cloud-fan .

@dongjoon-hyun
Copy link
Member

The commit passed all tests here in @gengliangwang 's repo.

@dongjoon-hyun
Copy link
Member

Merged to master for Apache Spark 3.4.0.

@PerilousApricot
Copy link

@dongjoon-hyun @cloud-fan Wait a second! This is exactly the wrong solution, and it means that it spark can't be proxied via Jupyter. Why was the original solution unceremoneously thrown out after 4 months of waiting. We could have at least discussed it further before deciding to do that.

@PerilousApricot
Copy link

@dongjoon-hyun It's extremely disappointing to be desperately waiting for a feature, get no response for months, have to plead with people to even take a look at it then have someone swoop in at the last moment and undoing all the work without even a discussion about it. Since this have been moved from 3.3 (the original ask) to 3.4, surely there's enough time to at least speak with people before doing this

@dongjoon-hyun
Copy link
Member

dongjoon-hyun commented Apr 13, 2022

Yes, we can add it back properly as you said. This is for Apache Spark 3.4 . BTW, are you sure the following? Are you using JupiterLab or the class Notebook?

it means that it spark can't be proxied via Jupyter.

@PerilousApricot
Copy link

PerilousApricot commented Apr 13, 2022 via email

@gengliangwang
Copy link
Member Author

@PerilousApricot Note that #31774 doesn't solve all the problems. If you read #36174, you will find that we need to handle the proxy/history keyword in the method createTemplateURI and createRESTEndPointForExecutorsPage?
This solution is not "wrong". It just prevents the confusing behavior if there is "proxy" or "history" keyword in the reverse proxy URL.
I would appreciate it if someone can point out what the problem is in JupiterLab. All I can get from #31774 doesn't help me understand it. IIUC JupiterLab URL has to be ".../proxy/ui_port"? If yes, can the proxy URL of JupiterLab be configurated? Is it possible we adjust JupiterLab to make it work with Spark instead?

@dongjoon-hyun
Copy link
Member

I also agree with @gengliangwang . From my side, I also asked the same question (like @gengliangwang ) to JupiterLab guy, but it seems that the limitation doesn't came from their code. I mean JupyterLab doesn't enforce proxy or history keyword from their code side.

JupiterLab URL has to be ".../proxy/ui_port"?

@PerilousApricot
Copy link

PerilousApricot commented Apr 14, 2022 via email

@PerilousApricot
Copy link

PerilousApricot commented Apr 14, 2022 via email

@PerilousApricot
Copy link

PerilousApricot commented Apr 14, 2022 via email

@gengliangwang
Copy link
Member Author

@PerilousApricot My major interest is in Spark SQL. I only work on UI in my spare time, after finishing my internal works and open source Spark SQL works. Besides Spark UI is working fine in the standalone mode.

On the dev list thread of the Spark 3.3 release, SPARK-34659 is mentioned. So I created this PR to prevent confusing behaviors. Code changes like #36174 are risky(or we can say it doesn't resolve all the issues), considering we are going to release 3.3.

Free feel to take over #36174, and please make sure you verified it working with the reverse proxy on a standalone cluster containing master/worker.

And you can ping @jasonli-db for a review who is focusing on Spark UI in Databricks.

@PerilousApricot
Copy link

PerilousApricot commented Apr 14, 2022

@gengliangwang I'm not sure I understand -- are you saying that someone else should've reviewed the PR in the first place? The intent (and what I was trying to do since December) was to get this into the 3.3 release. I don't use databricks, so is @jasonli-db still the person with the correct knowledge, or should I speak to someone else instead? Looking at their GH profile, I don't see any activity on the Spark repo. I honestly don't know at all how Spark's development model works, but I'm obviously doing something wrong here. Please let me know what I can do to satisfy your concerns, since it appears that at least for now you've taken ownership of this.

@cloud-fan
Copy link
Contributor

With my Apache hat on, I'd say there is no component owner in this community. A PR can be merged as long as there is one committer who supports it and no objections from others. In your case, it seems that there are not many interests from committers because the fix is so complicated.

As a Spark committer, I do feel the same with @gengliangwang and @dongjoon-hyun . The fact is that "proxy" and "history" are already reserved keywords in Spark UI. This PR just fails earlier and provides a better error message. We can still put more effort to make them not reserved keywords, but I'm not very confident about it and I'd like to see other solutions like making JupiterLab using a different word other than "proxy" in the Spark UI.

@PerilousApricot
Copy link

@cloud-fan with your two hats, would you at least concede that it would make sense to at least see the proposed solution before it's called "too complicated"? It seems a bit like putting the cart before the horse here to reject something that hasn't been written.

@cloud-fan
Copy link
Contributor

I did take a look at #36174 and I see code like var ind = words.indexOf("proxy");. It seems to me that some Spark code assumes "proxy" and "history" are reserved keywords and I'm not sure how many code blocks like this are out there. I don't think it's hard to fix it, but I think it's hard to find all these places that need to fix. In addition, reverse proxy and history server can bring more complexity as @gengliangwang mentioned.

To clarify, merging this PR does not mean rejecting the fixing PR. As I said before, this PR actually just improves the error message. You are welcome to continue the fix and find a committer to review it. Unfortunately, I'm not a Spark UI expert and I'm not confident to merge the fix. If you can explain the entire Spark UI stack and how your fix works, I believe it can make it easier for other committers to review the PR and give them more confidence to merge it.

@PerilousApricot
Copy link

Of course, the PR needs to be updated anyway for the cases that @gengliangwang found, I was speaking to not just your comment, but the others as well. To phrase it differently: "I'm going to update the PR, lets judge the complexity based on that"

You're critique about indexOf is well taken, and if that's the concern from @gengliangwang and @dongjoon-hyun, then I'm glad factorize the fix differently. I've repeatedly tried to emphasize that I'm willing and able to change the code to fit your criteria but, importantly, it's helpful to know what that criteria is. Now that I have your note about indexOf and @gengliangwang's comment in
#36174, I know what you're looking for and can change things accordingly.

With respect to..

To clarify, merging this PR does not mean rejecting the fixing PR

... then I misunderstood the situation, but to me, what I saw was a new PR disabling the functionality without commenting on the previous PR or giving an opportunity to fix the issues beforehand seemed like the door was being slammed on the previous PR.

I'll update #31774 with your comments and get back to you. Thanks!

@PerilousApricot
Copy link

PerilousApricot commented Oct 11, 2022 via email

@PerilousApricot
Copy link

PerilousApricot commented Oct 11, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants