-
Notifications
You must be signed in to change notification settings - Fork 28.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-29543][SS][UI]Structured Streaming Web UI #26201
Conversation
Test build #112437 has finished for PR 26201 at commit
|
Test build #112440 has finished for PR 26201 at commit
|
Currently, the streaming query page entry is in |
Test build #112444 has finished for PR 26201 at commit
|
retest this please. |
Amazing work! I've seen multiple requests on having tab for structured streaming be equivalent to streaming tab, and happy to see this effort. I've just skimmed through the code (don't know about front-end tech.) and have general comment:
|
@HeartSaVioR Thanks for your feedback
This is a great suggestion. I agree with you. But IMHO, we can do it in a separate PR after/before this.
Basically, streaming query is also a sql query, so it is reasonable to place the entry in
I will add some UTs
Sure, I will add them. |
Thank you, @uncleGen . Nice UI. I'm very sorry that 3.0-preview misses this. |
Test build #112446 has finished for PR 26201 at commit
|
Actually my point was, it seems to be a bit weird to make SQL event listener to be couple with other stuff. It's an event listener, and the best way to ingest the data into that listener is sending event. I'm not saying we should enable this in SHS right now, but it will be available smoothly if we follow the general way to ingest the data.
I meant the graph between streaming tab and here would be quite similar, so curious it could be reused, maybe with refactoring if necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still in skimmed review.
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLTab.scala
Outdated
Show resolved
Hide resolved
Get it. If so, we should do some refactoring work, since these two code path are located in two different packages. I do not mind to do this work in a follow-up PR. What is your opinion?
Sorry, I do not get what you mean. Could you please point out the codebase where you are referring |
Uh, I meant the query information like name, query ID, etc, as statistic page doesn't show any information on query. |
I'm OK with it. Let's reconsider if someone claims again.
Sorry my bad. I was confused SQLTab with SQL listener. I'd still suggest looping through SQL listener and access through SQLAppStore, but that's OK as it is once we are OK with supporting it only for Spark UI (not SHS). |
I think it's a good addition. Started to catch up and testing... |
Test build #112457 has finished for PR 26201 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a look a bit but stopped it as I wouldn't like to make myself continuously ask where the code comes from. I hope there're comments everywhere whenever it copied code from somewhere. It would save reviewers' time significantly.
Bunch of codes seem to be copied from streaming - while we review this, we might be better to find the way how to reuse them. It wouldn't be a blocker for this PR, but doing it in prior would even save times for reviewing, helps everyone.
sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/UIUtils.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/UIUtils.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
Outdated
Show resolved
Hide resolved
Test build #116682 has finished for PR 26201 at commit
|
Test build #116681 has finished for PR 26201 at commit
|
Today, I will give a commit to fix ut failure. |
Test build #116744 has finished for PR 26201 at commit
|
Test build #116899 has finished for PR 26201 at commit
|
Kindly reminder to @sarutak @dongjoon-hyun and @zsxwing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current approach looks good! Left some minor comments. Regarding the UI, it's better to make the following changes to make this page consistent with others:
Query Name
->Name
.Submit Time
->Submitted
.Last Batch Id
->Latest Batch
.
import org.apache.spark.sql.streaming.ui.UIUtils._ | ||
import org.apache.spark.ui.{GraphUIData, JsCollector, UIUtils => SparkUIUtils, WebUIPage} | ||
|
||
class StreamingQueryStatisticsPage(parent: StreamingQueryTab) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's make this a package private class.
import org.apache.spark.sql.streaming.ui.UIUtils._ | ||
import org.apache.spark.ui.{UIUtils => SparkUIUtils, WebUIPage} | ||
|
||
class StreamingQueryPage(parent: StreamingQueryTab) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's make this a package private class
* data to show. | ||
*/ | ||
lazy val streamingQueryStatusListener: Option[StreamingQueryStatusListener] = | ||
if (conf.get(UI_ENABLED)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add one more flag spark.sql.streaming.ui.enabled
for SS UI separately? Since we are adding one more streaming listener, it would be better if there is a flag to turn if off separately rather than turning off the whole Spark UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copy that.
* UI data for both active and inactive query. | ||
* TODO: Add support for history server. | ||
*/ | ||
class StreamingQueryStatusListener(sqlConf: SQLConf) extends StreamingQueryListener { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please make this package private.
import org.apache.spark.internal.Logging | ||
import org.apache.spark.ui.{SparkUI, SparkUITab} | ||
|
||
class StreamingQueryTab(val statusListener: StreamingQueryStatusListener, sparkUI: SparkUI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to keep this public cause it was called in SharedStatus, also StreamingQueryStatusListener.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant adding private[streaming]
or private[sql]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for clarifying, done in uncleGen#22
parent.addStaticHandler(StreamingQueryTab.STATIC_RESOURCE_DIR, "/static/sql") | ||
} | ||
|
||
object StreamingQueryTab { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
queryStatus.updateProcess(event.progress, streamingProgressRetention) | ||
} | ||
|
||
override def onQueryTerminated(event: StreamingQueryListener.QueryTerminatedEvent): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to lock the whole method. There is a race condition that allQueryStatus
may miss a query that's being removed from activeQueryStatus
and added to inactiveQueryStatus
. Since we are locking the whole method. It's better to use synchronized
directly rather than inactiveQueryStatus.synchronized
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, use synchronized directly in the next commit.
* @since 2.1.0 | ||
*/ | ||
@Evolving | ||
class QueryStartedEvent private[sql]( | ||
val id: UUID, | ||
val runId: UUID, | ||
val name: String) extends Event | ||
val name: String, | ||
val submitTime: Long) extends Event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use submissionTime
to make it consistent with other places.
Thanks for reviewing @zsxwing and thanks for your help when i was busy with other things @xuanyuanking |
Test build #117413 has finished for PR 26201 at commit
|
Could you check the test failure on the build 117413? Looks to be related. |
Test build #117435 has finished for PR 26201 at commit
|
retest this please |
Test build #117516 has finished for PR 26201 at commit
|
LGTM. Thanks! Merging to master. |
Finally! Thanks again for the amazing effort to get this done. This has been missed one. |
### What changes were proposed in this pull request? There is a minor issue in #26201 In the streaming statistics page, there is such error ``` streaming-page.js:211 Uncaught TypeError: Cannot read property 'top' of undefined at SVGCircleElement.<anonymous> (streaming-page.js:211) at SVGCircleElement.__onclick (d3.min.js:1) ``` in the console after clicking the timeline graph. ![image](https://user-images.githubusercontent.com/1097932/76479745-14b26280-63ca-11ea-9079-0065321795f9.png) This PR is to fix it. ### Why are the changes needed? Fix the error of javascript execution. ### Does this PR introduce any user-facing change? No, the error shows up in the console. ### How was this patch tested? Manual test. Closes #27883 from gengliangwang/fixSelector. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
### What changes were proposed in this pull request? There is a minor issue in #26201 In the streaming statistics page, there is such error ``` streaming-page.js:211 Uncaught TypeError: Cannot read property 'top' of undefined at SVGCircleElement.<anonymous> (streaming-page.js:211) at SVGCircleElement.__onclick (d3.min.js:1) ``` in the console after clicking the timeline graph. ![image](https://user-images.githubusercontent.com/1097932/76479745-14b26280-63ca-11ea-9079-0065321795f9.png) This PR is to fix it. ### Why are the changes needed? Fix the error of javascript execution. ### Does this PR introduce any user-facing change? No, the error shows up in the console. ### How was this patch tested? Manual test. Closes #27883 from gengliangwang/fixSelector. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com> (cherry picked from commit 0f46325) Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
### What changes were proposed in this pull request? There is a minor issue in apache#26201 In the streaming statistics page, there is such error ``` streaming-page.js:211 Uncaught TypeError: Cannot read property 'top' of undefined at SVGCircleElement.<anonymous> (streaming-page.js:211) at SVGCircleElement.__onclick (d3.min.js:1) ``` in the console after clicking the timeline graph. ![image](https://user-images.githubusercontent.com/1097932/76479745-14b26280-63ca-11ea-9079-0065321795f9.png) This PR is to fix it. ### Why are the changes needed? Fix the error of javascript execution. ### Does this PR introduce any user-facing change? No, the error shows up in the console. ### How was this patch tested? Manual test. Closes apache#27883 from gengliangwang/fixSelector. Authored-by: Gengliang Wang <gengliang.wang@databricks.com> Signed-off-by: Gengliang Wang <gengliang.wang@databricks.com>
What changes were proposed in this pull request?
This PR adds two pages to Web UI for Structured Streaming:
Input Rate
,Process Rate
,Input Rows
,Batch Duration
andOperation Duration
Why are the changes needed?
It helps users to better monitor Structured Streaming query.
Does this PR introduce any user-facing change?
No
How was this patch tested?