[SPARK-34659][UI] Forbid using keyword "proxy" or "history" in reverse proxy URL #36176
Conversation
Diff under review:

    val proxyUrl = _conf.get(UI_REVERSE_PROXY_URL.key, "").stripSuffix("/") +
      "/proxy/" + _applicationId
    System.setProperty("spark.ui.proxyBase", proxyUrl)

changed to:

    val proxyUrl = _conf.get(UI_REVERSE_PROXY_URL).getOrElse("").stripSuffix("/")
We need to get the ConfigEntry to trigger the checkValue method.
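The distinction the reviewer is pointing at can be sketched with a simplified stand-in for Spark's config machinery (the class and method names below are illustrative, not Spark's actual internals):

```scala
// Simplified stand-in illustrating why reading a config through its entry
// (conf.get(entry)) runs the checkValue validation, while reading by raw
// key string (conf.get(entry.key, default)) bypasses it.
object ConfigEntryDemo {
  final case class ConfigEntry(key: String, check: String => Unit)

  class SparkConfLike(settings: Map[String, String]) {
    // Raw-key lookup: no validation is applied.
    def get(key: String, default: String): String =
      settings.getOrElse(key, default)

    // Entry-based lookup: validation runs on the stored value.
    def get(entry: ConfigEntry): Option[String] = {
      val v = settings.get(entry.key)
      v.foreach(entry.check)
      v
    }
  }

  // Hypothetical entry mirroring this PR's intent: reject reserved segments.
  val reverseProxyUrl: ConfigEntry = ConfigEntry(
    "spark.ui.reverseProxyUrl",
    url => require(
      !url.split("/").exists(p => p == "proxy" || p == "history"),
      s"Cannot use the keyword 'proxy' or 'history' in reverse proxy URL: $url"))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConfLike(Map("spark.ui.reverseProxyUrl" -> "/test/proxy/prefix"))
    // Bypasses validation entirely:
    println(conf.get("spark.ui.reverseProxyUrl", ""))
    // Triggers checkValue and throws IllegalArgumentException:
    val failed =
      try { conf.get(reverseProxyUrl); false }
      catch { case _: IllegalArgumentException => true }
    assert(failed)
  }
}
```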
+1, LGTM. Thank you, @gengliangwang and @cloud-fan .
The commit passed all tests here in @gengliangwang's repo.
Merged to master for Apache Spark 3.4.0.
@dongjoon-hyun @cloud-fan Wait a second! This is exactly the wrong solution, and it means that Spark can't be proxied via Jupyter. Why was the original solution unceremoniously thrown out after 4 months of waiting? We could have at least discussed it further before deciding to do that.
@dongjoon-hyun It's extremely disappointing to be desperately waiting for a feature, get no response for months, have to plead with people to even take a look at it, and then have someone swoop in at the last moment and undo all the work without even a discussion about it. Since this has been moved from 3.3 (the original ask) to 3.4, surely there's enough time to at least speak with people before doing this.
Yes, we can add it back properly as you said. This is for Apache Spark 3.4. BTW, are you sure about the following? Are you using JupyterLab or the classic Notebook?
Or, alternately, if you have specific things to change in #31774, can you let me know what they are so they can be fixed and merged? I'm more than willing to do it if that's what it takes to get it over the line, but I need more guidance than "tricky" to be able to do that.
--
It's dark in this basement.
@PerilousApricot Note that #31774 doesn't solve all the problems. If you read #36174, you will find that we need to handle the proxy/history keyword in the method.
I also agree with @gengliangwang. From my side, I also asked the same question (like @gengliangwang) to the JupyterLab folks, but it seems that the limitation doesn't come from their code. I mean, JupyterLab doesn't enforce
I think the first confusion is that it's *JupyterHub*, please see #31774 (comment). I was asked to make a reproducer that didn't involve Jupyter, which I did. Do I understand correctly that you now would like me to make a second reproducer to show how JupyterHub works?
IIUC the JupyterLab URL has to be ".../proxy/ui_port"? If yes, can the proxy URL of JupyterLab be configured? Is it possible to adjust JupyterLab to make it work with Spark instead?
No, it's not configurable; if it were, we wouldn't be here :) This whole saga is two people pointing at each other saying the other one should fix it, and the users are stuck in the middle. If you will point me to your reservations then I'll fix the PR, but please at least give me a chance.
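For context, a small illustration of the collision being discussed (the path shape below is how jupyter-server-proxy typically exposes a proxied port under JupyterHub; the user name and port are made-up values, not from this thread):

```scala
// Illustrative sketch: a Spark UI reached through jupyter-server-proxy
// inherits a reverse-proxy prefix that itself contains the path segment
// "proxy", which collides with Spark's own reserved /proxy/<appId> route.
object JupyterPrefixDemo {
  // Hypothetical path jupyter-server-proxy would use for a UI on the given port.
  def proxiedPrefix(user: String, uiPort: Int): String =
    s"/user/$user/proxy/$uiPort"

  def main(args: Array[String]): Unit = {
    val prefix = proxiedPrefix("alice", 4040)
    // The prefix contains the reserved segment "proxy",
    // so a keyword check on the reverse proxy URL would reject it.
    assert(prefix.split("/").contains("proxy"))
    println(prefix)
  }
}
```

This is why the prefix is not configurable away on the Jupyter side: the `/proxy/<port>` segment is the routing mechanism itself.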
@gengliangwang I see #36174, but there was only about half an hour between when you wrote that and when the next commit turning off the functionality completely happened, so I didn't quite have a chance to update the PR in the interim (I was also asleep). I'll fix the things you mention there and submit the correct fix for you to review.
@PerilousApricot My major interest is in Spark SQL. I only work on the UI in my spare time, after finishing my internal work and open-source Spark SQL work. Besides, the Spark UI is working fine in standalone mode. On the dev list thread of the Spark 3.3 release, SPARK-34659 is mentioned, so I created this PR to prevent confusing behaviors. Code changes like #36174 are risky (or we can say they don't resolve all the issues), considering we are going to release 3.3. Feel free to take over #36174, and please make sure you verify it works with the reverse proxy on a standalone cluster containing master/worker. And you can ping @jasonli-db for a review, who is focusing on Spark UI at Databricks.
@gengliangwang I'm not sure I understand -- are you saying that someone else should've reviewed the PR in the first place? The intent (and what I was trying to do since December) was to get this into the 3.3 release. I don't use Databricks, so is @jasonli-db still the person with the correct knowledge, or should I speak to someone else instead? Looking at their GH profile, I don't see any activity on the Spark repo. I honestly don't know at all how Spark's development model works, but I'm obviously doing something wrong here. Please let me know what I can do to satisfy your concerns, since it appears that at least for now you've taken ownership of this.
With my Apache hat on, I'd say there is no component owner in this community. A PR can be merged as long as there is one committer who supports it and no objections from others. In your case, it seems that there is not much interest from committers because the fix is so complicated. As a Spark committer, I feel the same as @gengliangwang and @dongjoon-hyun. The fact is that "proxy" and "history" are already reserved keywords in the Spark UI. This PR just fails earlier and provides a better error message. We can still put more effort into making them not reserved keywords, but I'm not very confident about it, and I'd like to see other solutions, like making JupyterLab use a word other than "proxy" in the Spark UI.
@cloud-fan With your two hats on, would you at least concede that it would make sense to see the proposed solution before it's called "too complicated"? It seems a bit like putting the cart before the horse to reject something that hasn't been written.
I did take a look at #36174, and I see code like (snippet omitted).

To clarify, merging this PR does not mean rejecting the fixing PR. As I said before, this PR actually just improves the error message. You are welcome to continue the fix and find a committer to review it. Unfortunately, I'm not a Spark UI expert and I'm not confident enough to merge the fix. If you can explain the entire Spark UI stack and how your fix works, I believe it can make it easier for other committers to review the PR and give them more confidence to merge it.
Of course, the PR needs to be updated anyway for the cases that @gengliangwang found; I was speaking not just to your comment, but to the others as well. To phrase it differently: "I'm going to update the PR, let's judge the complexity based on that." Your critique about indexOf is well taken, and if that's the concern from @gengliangwang and @dongjoon-hyun, then I'm glad to factor the fix differently. I've repeatedly tried to emphasize that I'm willing and able to change the code to fit your criteria, but, importantly, it's helpful to know what those criteria are. Now that I have your note about indexOf and @gengliangwang's comment in

With respect to...

...then I misunderstood the situation, but to me, what I saw, a new PR disabling the functionality without commenting on the previous PR or giving an opportunity to fix the issues beforehand, seemed like the door was being slammed on the previous PR. I'll update #31774 with your comments and get back to you. Thanks!
Yes, it's been an issue for years now, see jupyterhub/jupyter-server-proxy#57. It affects being able to implement functionality like more-functional Jupyter extensions that can query/display the status of jobs inline, and especially since Dask has a very nice JupyterLab extension, it would be good to get this blocker out of the way so that the equivalent can be developed for Spark.

Can we at least have the initial #31774 PR for 3.3 so *something* works while we wait for a real solution after all this time? Then we can code the total fix against the 3.4 branch.
What changes were proposed in this pull request?
When the reverse proxy URL contains "proxy" or "history", the application ID in the UI is wrongly parsed.
For example, if we set spark.ui.reverseProxyUrl to "/test/proxy/prefix" or "/test/history/prefix", the application ID is parsed as "prefix" and the related API calls fail in the stages/executors pages (the original description showed the failing request versus the expected one as screenshots, omitted here).
There is more context in #31774.
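The mis-parsing described above can be shown with a small standalone sketch (the URL and application ID below are made-up illustrative values; the real extraction logic lives in Spark's UI code, which this does not reproduce):

```scala
// Standalone illustration of why a reverse proxy URL containing the keyword
// "proxy" breaks application-ID extraction: a parser that grabs the path
// segment after the FIRST "proxy/" occurrence picks up the user's prefix
// segment instead of the real application ID.
object ProxyParseDemo {
  def appIdAfterFirstProxy(url: String): String = {
    val idx = url.indexOf("proxy/")
    url.substring(idx + "proxy/".length).takeWhile(_ != '/')
  }

  def main(args: Array[String]): Unit = {
    // Reverse proxy URL set to "/test/proxy/prefix": the first "proxy/"
    // belongs to the user's prefix, not to Spark's own /proxy/<appId> route.
    val broken = appIdAfterFirstProxy("/test/proxy/prefix/proxy/app-20220413-0001/stages")
    println(broken) // prints "prefix", not the application ID

    // Without the keyword in the prefix, the same logic works as intended.
    val ok = appIdAfterFirstProxy("/proxy/app-20220413-0001/stages")
    println(ok)
  }
}
```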
We could fix this entirely, as in #36174, but doing so is risky and complicated.
Why are the changes needed?
Avoid users setting keywords in reverse proxy URL and getting wrong UI results.
Does this PR introduce any user-facing change?
No
How was this patch tested?
A new unit test.
Also doc preview: (screenshot omitted)