[SPARK-2017] [SPARK-2016] Web UI responsiveness with large numbers of tasks/partitions #1682
Conversation
Can one of the admins verify this patch?
…css for tables and that speeds up rendering a lot
Identified the bottleneck in the rendering of the page containing the data tables. It was related to the CSS styling used (Bootstrap with some `nth-child(odd)` selectors). Added a custom style instead, and now all rendering is much faster. See JIRA SPARK-2017 for more details.
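To illustrate the kind of workaround described (this is a hedged sketch, not the patch's actual code, and the class names are made up): rather than letting the browser re-match a `nth-child(odd)` CSS rule against a growing table, the rendering code can assign an explicit odd/even class to each row as it is built, so striping becomes a plain class lookup.

```javascript
// Sketch: assign explicit striping classes while building the table rows,
// instead of relying on a CSS rule like
//   .table-striped tbody tr:nth-child(odd) { background: #f9f9f9; }
// The class names "row-even" / "row-odd" are illustrative only.
function stripeRows(rowCells) {
  return rowCells.map(function (cellsHtml, i) {
    var cls = (i % 2 === 0) ? "row-even" : "row-odd";
    return '<tr class="' + cls + '">' + cellsHtml + "</tr>";
  }).join("");
}
```

A single stylesheet rule per class then replaces the positional selector, which avoids the selector re-matching cost as rows are appended.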
…ed rending using javascript by the web ui
I added a configuration property "spark.ui.jsRenderingEnabled" that controls whether the rendering of the tables happens using Javascript or not. It is enabled by default. This ensures that people who cannot, or do not want to, run Javascript to do the rendering can still use the web UI as before.
It seems like having two different table rendering techniques, server-side HTML and client-side Javascript, could become a maintenance / complexity burden. Do you think it's important for the UI to work without Javascript? It could be important for tools that scrape the web UI, but it would be better if those tools consumed JSON data instead. Personally, I'd be a fan of delivering a stable JSON API and rewriting the web UI as a Javascript application that consumes data from those endpoints, but I'm open to other opinions.
Also, do you mind editing the title of this PR so that it's tracked correctly by our review tools? Something like
Can one of the admins verify this patch?
FWIW I run Spark in AWS, and my ops team requires that all web interfaces exposed through the proxy into the enclave have both SSL and user auth. Spark doesn't support those, so for now I'm making heavy use of the web UI via the `links` command-line browser. All that to say, yes, there are users accessing the interface via no-Javascript browsers. I can understand though if you make the web interface use JS and expose a REST API, since that's a more generally attractive setup for checking status on a cluster.
@ash211 This weekend, I'm actually working on writing a design document for web UI improvements in Spark 1.2. SSL encryption, authentication, and ACLs are all features that I'm planning to put on the roadmap. Do you have SSH access to your EC2 machines? One option is to use an SSH proxy to view the full web UI in your browser. Once you've set up the proxy, you can use a browser plugin like FoxyProxy to seamlessly proxy requests for the UI.
Hi, I have updated the title of the pull request and made sure it is mergeable.

A few points:

- The reason I added the configuration property "spark.ui.jsRenderingEnabled" is backward compatibility: anybody that relies on not using javascript can still access the info as before. Right now I see this Pull Request as a working proof of concept of what the JSON-based rendering can look like.
- The JSON currently repeats the key names on every row (`[ { ... }, ... ]`). We could use a much more compact format where the names of the keys are listed only once. Also, we need to include tests for the JSON interface.
- In the JSON interface, we should include a parameter that tells you how much data there is in total (for pagination).
- When we request the data, the server should do the sorting. That is, the JSON API should receive a parameter telling the server how to sort the entries.

Let me know what you think about the points above.
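The server-side sorting and paging being discussed could look something like the sketch below on the client side. This is purely illustrative: the parameter names (`sortBy`, `order`, `offset`, `limit`) are assumptions, not anything this PR defines; only the `/stages/stage/tasks/json/?id=nnn&attempt=mmm` path comes from the PR itself.

```javascript
// Hypothetical URL builder for the proposed tasks JSON endpoint.
// sortBy/order/offset/limit are illustrative parameter names; the PR only
// proposes that sorting and paging happen on the server.
function tasksJsonUrl(stageId, attempt, opts) {
  opts = opts || {};
  var params = { id: stageId, attempt: attempt };
  if (opts.sortBy) {
    params.sortBy = opts.sortBy;
    params.order = opts.order || "asc";
  }
  if (opts.offset !== undefined) {
    params.offset = opts.offset;
    params.limit = opts.limit;
  }
  var qs = Object.keys(params).map(function (k) {
    return encodeURIComponent(k) + "=" + encodeURIComponent(params[k]);
  }).join("&");
  return "/stages/stage/tasks/json/?" + qs;
}
```

With server-side sorting, the browser never has to hold or re-sort the full task list, which is exactly what hurts with tens of thousands of tasks.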
@JoshRosen You mentioned above that you are working on a design document for web UI improvements. Are you going to post it anywhere? I would like to take all the above and help make it happen.
@carlosfuertes I'm working on a draft this weekend; I'll make a post to the developers mailing list sometime next week once I've gotten the basics fleshed out. Is the duplication of the keys a significant problem in practice? If we configure the API's HTTP server to use gzip compression, I think the API responses still shouldn't be too big. If we do choose to remove this redundancy, would it make more sense to return the data in CSV or TSV format? You've raised some good questions about sorting, pagination, etc. I think we should address these in a consistent way throughout the API. Are there some existing JSON APIs that you think made good decisions for these features?
@JoshRosen I can imagine that avoiding duplication of keys can easily save 50% or more. If we are aiming at big data sizes, that can matter a lot. In the JIRA post I explain how I was already seeing 15 MB data sizes with just 50,000 jobs. Gzip can make sense, but if you start with something that is already 50% smaller, that's even better. In any case, this is very simple to test and benchmark (with this PR, for example, it is very simple), and it is an optimization at the end of the day.

To be very concrete: something like `[ {"key1": row1_value1, "key2": row1_value2}, ... ]` versus `{ "meta": { "keys": ["key1", "key2"] }, ... }`, where the key names appear only once.

I would stick to using JSON and not CSV or TSV, since JSON interacts extremely well with javascript (it was designed to) and with pretty much anything else, and you have the flexibility to add other meta information and parse it. I have seen some HTTP APIs that incorporate pagination and so forth (e.g. the Steam web page), but I do not have a particular one in mind freely available in the wild.

I'm thinking that talking about a JSON API is a bit confusing, and it would be better to refer to the HTTP API (which returns JSON for some calls). It may make sense, in the doc that you are working on, to delineate the full (RESTful) HTTP API for the web UI rather than just the JSON part, so that the global picture and design is clear.
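To make the two encodings being compared concrete, here is a small sketch (illustrative field names, not the PR's actual schema) that converts the verbose array-of-objects form into the compact "keys listed once" form:

```javascript
// Convert [{key1: v, key2: v}, ...] (keys repeated per row) into
// { meta: { keys: [...] }, rows: [[v, v], ...] } (keys listed once).
// Assumes every row has the same keys, in the same order.
function toCompact(rows) {
  var keys = rows.length ? Object.keys(rows[0]) : [];
  return {
    meta: { keys: keys },
    rows: rows.map(function (r) {
      return keys.map(function (k) { return r[k]; });
    })
  };
}

// Example with made-up task fields:
var verbose = [
  { taskId: 1, duration: 42 },
  { taskId: 2, duration: 17 }
];
var compact = toCompact(verbose);
// The key names now appear once instead of once per row, so the saving
// grows with the number of rows (for a handful of rows the fixed "meta"
// overhead can actually make the compact form slightly larger).
```

The conversion is lossless as long as rows are homogeneous, and the client can rebuild objects from `meta.keys` when convenient.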
I've opened SPARK-3644 as a forum for discussing the design of a REST API; sorry for the delay (got busy with other work / bug fixing). |
I address here issues SPARK-2017 and SPARK-2016 by serving the data for the tables in the Spark UI web interface as JSON for later rendering, and using javascript in the browser to build the tables from an Ajax request on that JSON.
The main addition is exposing paths that serve the JSON data:
/stages/stage/tasks/json/?id=nnn&attempt=mmm
/storage/json
/storage/rdd/workers/json?id=nnn
/storage/rdd/blocks/json?id=nnn
I also added a new configuration property "spark.ui.jsRenderingEnabled" (true by default), which controls whether to use javascript for the rendering. This preserves backward compatibility for people who cannot use javascript.
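As a rough sketch of the client-side path this PR introduces (hedged: this is not the patch's actual code, and the column names are made up), the browser fetches one of the JSON endpoints above and builds the table markup itself. The table builder below is a pure function, so it can run outside a browser; wiring it to an actual Ajax call is browser-only.

```javascript
// Build an HTML table from JSON data: a list of column names plus a list
// of rows (arrays of cell values). Illustrative sketch only.
function renderTable(keys, rows) {
  var head = "<tr>" + keys.map(function (k) {
    return "<th>" + k + "</th>";
  }).join("") + "</tr>";
  var body = rows.map(function (r) {
    return "<tr>" + r.map(function (v) {
      return "<td>" + v + "</td>";
    }).join("") + "</tr>";
  }).join("");
  return "<table>" + head + body + "</table>";
}

// In the browser this would be fed from an Ajax request, e.g. (jQuery-style,
// endpoint path from the PR, response shape assumed):
//   $.getJSON("/storage/json", function (data) {
//     container.innerHTML = renderTable(data.keys, data.rows);
//   });
```

When "spark.ui.jsRenderingEnabled" is false, the server would instead emit the table HTML itself, as the UI did before this change.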