@dongjoon-hyun commented Jun 17, 2025

What changes were proposed in this pull request?

This PR aims to add a streaming word count example, org.apache.spark.examples.streaming.HdfsWordCount.
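
As a rough illustration of what a streaming word count computes for each micro-batch, here is a plain-Python sketch. It is not the Spark API and not the code added by this PR; the function name `count_words` and the sample lines are hypothetical, and it assumes words are separated by whitespace.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words across the lines of one micro-batch."""
    counts = Counter()
    for line in lines:
        # split() with no arguments also drops empty tokens from repeated spaces
        counts.update(line.split())
    return counts


# Hypothetical lines that arrived in one batch interval.
batch = ["spark streaming word count", "word count with spark"]
for word, n in sorted(count_words(batch).items()):
    print(word, n)
```

In a streaming job the same counting is applied to every new batch of lines read from the monitored directory.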

In addition, this example demonstrates the event log rolling feature, which is enabled by default in Apache Spark 4.0.0.

spark.eventLog.enabled: "true"
spark.eventLog.dir: "s3a://spark-events/"
spark.eventLog.rolling.maxFileSize: "10m"

$ aws s3 --profile localstack ls s3://spark-events/ --recursive
2025-06-17 09:43:55          0 eventlog_v2_stream-word-count-0/
2025-06-17 09:43:55          0 eventlog_v2_stream-word-count-0/appstatus_stream-word-count-0.inprogress
2025-06-17 09:52:02    1957278 eventlog_v2_stream-word-count-0/events_1_stream-word-count-0.zstd
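
To read the listing above against the configured `spark.eventLog.rolling.maxFileSize`, the following Python sketch parses the `aws s3 ls` lines and compares the newest event file with the limit. The helper names `parse_size` and `event_files` are hypothetical, the `events_<index>_...` naming is inferred from the listing, and treating `m` as mebibytes is an assumption based on how Spark usually interprets byte-size strings.

```python
import re

# Listing copied from the `aws s3 ls` output above (date, time, size, key).
LISTING = """\
2025-06-17 09:43:55          0 eventlog_v2_stream-word-count-0/
2025-06-17 09:43:55          0 eventlog_v2_stream-word-count-0/appstatus_stream-word-count-0.inprogress
2025-06-17 09:52:02    1957278 eventlog_v2_stream-word-count-0/events_1_stream-word-count-0.zstd
"""


def parse_size(text):
    """Convert a size such as "10m" to bytes (assumes k/m/g are binary units)."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    match = re.fullmatch(r"(\d+)([kmg])", text.lower())
    if match is None:
        raise ValueError(f"unsupported size string: {text!r}")
    return int(match.group(1)) * units[match.group(2)]


def event_files(listing):
    """Return {index: size_in_bytes} for the events_<index>_... objects."""
    files = {}
    for line in listing.splitlines():
        _date, _time, size, key = line.split(None, 3)
        match = re.search(r"/events_(\d+)_", key)
        if match:
            files[int(match.group(1))] = int(size)
    return files


max_size = parse_size("10m")
files = event_files(LISTING)
latest = max(files)
print(f"event file {latest}: {files[latest]} of {max_size} bytes")
```

Under these assumptions, the single `events_1` file is still well below the 10 MiB limit, which is consistent with the listing showing no second event file yet.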

Why are the changes needed?

To provide a streaming example that demonstrates rolling event logs.

Does this PR introduce any user-facing change?

No behavior change because this is an example.

How was this patch tested?

Manual tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun changed the title from "[SPARK-52513] Add a streaming word count example" to "[SPARK-52513] Add a streaming word count example with rolling event logs" on Jun 17, 2025
@dongjoon-hyun

Thank you, @viirya . Merged to main.

@dongjoon-hyun deleted the SPARK-52513 branch June 17, 2025 17:01