[SPARK-38234] [SQL] [SS] Added structured streaming monitoring APIs. #35547

Closed
yeskarthik wants to merge 1 commit into apache:branch-3.1 from yeskarthik:hs-structured-streaming-api-3.1

@yeskarthik
Contributor

What changes were proposed in this pull request?

Add two monitoring REST API endpoints under SQL that provide details about structured streaming queries. A store for this data already exists and the data is shown in the UI under the "Structured Streaming" tab, but it is not currently exposed through a REST API.

Summary API

Returns the summary of all existing streaming queries.

GET /{appId}/sql/streamingqueries

The response is a list of StreamingQueryData.
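
A minimal client-side sketch of calling the summary endpoint follows. It assumes the path above is served under Spark's usual REST prefix, `/api/v1/applications`, on a driver UI at `localhost:4040`. The helper names `summary_url` and `fetch_json` are hypothetical and not part of this PR.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumption: the new endpoints are mounted under Spark's standard REST prefix.
DEFAULT_BASE = "http://localhost:4040/api/v1/applications"


def summary_url(app_id, base=DEFAULT_BASE):
    """Build the URL of the streaming query summary endpoint for one application."""
    return f"{base.rstrip('/')}/{quote(app_id, safe='')}/sql/streamingqueries"


def fetch_json(url, timeout=10.0):
    """Send a GET request and decode the JSON response body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Usage against a running application (the application ID is a placeholder):
#   queries = fetch_json(summary_url("<appId>"))
#   for query in queries:
#       print(query)
```

Each element of the returned list would correspond to one StreamingQueryData entry. The exact JSON field names depend on how the store objects are serialized, so the sketch does not assume any.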

Progress API

Returns the progress events of a specific streaming query, identified by runId. The optional last query parameter specifies how many of the most recent events to retrieve. By default, only the most recent progress event is returned, i.e. last defaults to 1.

GET /{appId}/sql/streamingqueries/{runId}/progress?last={N}

The response is a list of StreamingQueryProgress.
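
The progress URL and its `last` parameter could be built on the client as sketched below. As before, the `/api/v1/applications` prefix and the `localhost:4040` host are assumptions, and `progress_url` is a hypothetical helper. The check for non-positive values is a client-side choice only, since the description does not state how the server treats them.

```python
from urllib.parse import quote, urlencode

# Assumption: the new endpoints are mounted under Spark's standard REST prefix.
DEFAULT_BASE = "http://localhost:4040/api/v1/applications"


def progress_url(app_id, run_id, last=None, base=DEFAULT_BASE):
    """Build the progress endpoint URL for one streaming query run.

    Leaving `last` as None omits the query parameter, so the server-side
    default described above (the single most recent event) applies.
    """
    if last is not None and last < 1:
        # Client-side guard; server behavior for such values is not stated.
        raise ValueError("last must be a positive integer")
    url = (
        f"{base.rstrip('/')}/{quote(app_id, safe='')}"
        f"/sql/streamingqueries/{quote(run_id, safe='')}/progress"
    )
    if last is not None:
        url += "?" + urlencode({"last": last})
    return url
```

For example, `progress_url("<appId>", "<runId>", last=5)` would request the five most recent progress events of that run.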

Note: We are not introducing new object definitions for the response. Because the data is returned from the store without aggregation, the response reuses the existing event structures, StreamingQueryData and StreamingQueryProgress.

Why are the changes needed?

This data can be used to monitor streaming queries, detect problems in them, and build custom dashboards. This monitoring API will be similar to the monitoring APIs that already exist for DStreams; see SPARK-18470.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • Added 4 tests under SqlResourceWithActualMetricsSuite to test the endpoints when streaming code is executed. The tests cover various cases, parameters, and boundary conditions.

  • Added 2 tests under SqlResourceInvalidEndpointSuite.scala to test the endpoints when no streaming code is running.

  • As mentioned earlier, since we are just returning the objects from the store, there is no need to test them individually.

@yeskarthik yeskarthik changed the title from "[SPARK-38234] [SQL] [SS] Added streaming APIs and corresponding tests." to "[SPARK-38234] [SQL] [SS] Added structured streaming monitoring APIs." on Feb 16, 2022
@yeskarthik yeskarthik marked this pull request as ready for review February 16, 2022 23:08
@yeskarthik
Contributor Author

yeskarthik commented Feb 16, 2022

@HeartSaVioR @uncleGen @dongjoon-hyun @xuanyuanking @zsxwing can you please take a look at this PR? Thanks

@AmplabJenkins

Can one of the admins verify this patch?

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for making a PR, @yeskarthik .

However, Apache Spark has a versioning policy which doesn't allow a new feature or improvement on the released versions.

You can open a PR to master branch for Apache Spark 3.3. Thanks!

@yeskarthik
Copy link
Contributor Author

yeskarthik commented Feb 17, 2022

@dongjoon-hyun thank you! I will make a PR against master.

@dongjoon-hyun
Member

Thank you, @yeskarthik !
