[SPARK-48102][SS] Track duration for acquiring source/sink metrics while reporting streaming query progress #46350

Closed
@anishshri-db (Contributor) commented May 3, 2024

What changes were proposed in this pull request?

Track duration for acquiring source/sink metrics while reporting streaming query progress

Why are the changes needed?

This change helps us understand how long computing the source/sink progress metrics takes, and how that time is distributed across sources when a query uses more than one.

Sample log:

17:26:14.769 INFO org.apache.spark.sql.execution.streaming.MicroBatchExecutionContext: Extracting source progress metrics for source=MemoryStream[value#636] took duration_ms=0
17:26:14.769 INFO org.apache.spark.sql.execution.streaming.MicroBatchExecutionContext: Extracting sink progress metrics for sink=MemorySink took duration_ms=0
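
For readers who want a rough picture of how log lines like these could be produced, here is a minimal standalone sketch. It is not the code in this patch: `ProgressTimingSketch`, `timed`, `extractSourceMetrics`, and `formatLog` are hypothetical names invented for illustration, and `println` stands in for Spark's logging. It only shows the general pattern of timing each extraction separately and reporting the elapsed milliseconds.

```scala
object ProgressTimingSketch {
  // Hypothetical helper: evaluates `body` once and returns its result
  // together with the elapsed wall-clock time in milliseconds.
  def timed[T](body: => T): (T, Long) = {
    val startNs = System.nanoTime()
    val result = body
    val durationMs = (System.nanoTime() - startNs) / 1000000L
    (result, durationMs)
  }

  // Hypothetical stand-in for computing one source's progress metrics.
  def extractSourceMetrics(source: String): Map[String, Long] =
    Map("numInputRows" -> 0L)

  // Builds a message in the same shape as the sample log above.
  def formatLog(kind: String, name: String, durationMs: Long): String =
    s"Extracting $kind progress metrics for $kind=$name took duration_ms=$durationMs"

  def main(args: Array[String]): Unit = {
    // Each source is timed on its own, so the per-source distribution is visible.
    val sources = Seq("MemoryStream[value#636]")
    sources.foreach { source =>
      val (_, durationMs) = timed(extractSourceMetrics(source))
      println(formatLog("source", source, durationMs))
    }

    // The sink is timed the same way; an empty map stands in for its metrics.
    val (_, sinkMs) = timed(Map.empty[String, Long])
    println(formatLog("sink", "MemorySink", sinkMs))
  }
}
```

Timing each source individually, rather than the whole progress report at once, is what makes a slow source stand out when several are attached to the same query.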

Existing test:

[info] Run completed in 9 seconds, 995 milliseconds.
[info] Total number of tests run: 11
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 11, failed 0, canceled 0, ignored 1, pending 0
[info] All tests passed.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing unit tests

Was this patch authored or co-authored using generative AI tooling?

No

@anishshri-db changed the title from "[SPARK-48102] Track duration for acquiring source/sink metrics while reporting streaming query progress" to "[SPARK-48102][SS] Track duration for acquiring source/sink metrics while reporting streaming query progress" on May 3, 2024

@anishshri-db (Contributor Author) commented:

cc @HeartSaVioR, PTAL, thx!

@HeartSaVioR (Contributor) left a comment:

+1 pending CI

@HeartSaVioR (Contributor) commented:

Thanks! Merging to master.

JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
…ile reporting streaming query progress

Closes apache#46350 from anishshri-db/task/SPARK-48102.

Authored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>