@fedimser fedimser commented Nov 21, 2025

What changes were proposed in this pull request?

  • Added TwsTester, a test helper for writing unit tests for StatefulProcessor implementations that will be used with the TransformWithState operator in streaming queries. It processes input rows and returns output rows equivalent to those the processor would produce in an actual Spark streaming query.
  • Supported functionality:
    • Processing input rows and producing output rows via test().
    • Initial state setup via constructor parameter.
    • Direct state manipulation via setValueState, setListState, setMapState.
    • Direct state inspection via peekValueState, peekListState, peekMapState.
  • Language support:
    • Scala.
    • PySpark. In PySpark, TwsTester supports both Row and Pandas StatefulProcessors (the former is used in transformWithState, the latter in transformWithStateInPandas).
  • Not Supported:
    • Timers: only TimeMode.None is supported. If the processor attempts to register or use timers, an error is thrown.
    • TTL: State TTL configurations are ignored. All state persists indefinitely.
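
To illustrate the idea behind the features listed above (feeding rows to a processor, seeding state, and peeking at state without a running query), here is a minimal sketch in plain Python with no Spark dependency. It is not the TwsTester API from this PR: ToyTester, ToyValueState, RunningCountProcessor, and all method names and signatures are hypothetical and exist only to show how an in-memory state handle can stand in for the real state store.

```python
from typing import Any, Dict, Iterable, Iterator, List, Tuple


class ToyValueState:
    """Hypothetical stand-in for a per-key value state, backed by a dict."""

    def __init__(self, store: Dict[Any, Any], current_key: List[Any]):
        self._store = store
        self._current_key = current_key  # one-element list shared with the tester

    def exists(self) -> bool:
        return self._current_key[0] in self._store

    def get(self) -> Any:
        return self._store.get(self._current_key[0])

    def update(self, value: Any) -> None:
        self._store[self._current_key[0]] = value


class RunningCountProcessor:
    """Toy processor: emits (key, running row count) for each call."""

    def init(self, get_value_state) -> None:
        self.count = get_value_state("count")

    def handle_input_rows(self, key: Any, rows: Iterable[Any]) -> Iterator[Tuple[Any, int]]:
        previous = self.count.get() if self.count.exists() else 0
        total = previous + sum(1 for _ in rows)
        self.count.update(total)
        yield (key, total)


class ToyTester:
    """Hypothetical harness: runs a processor against in-memory state only."""

    def __init__(self, processor):
        self._states: Dict[str, Dict[Any, Any]] = {}
        self._current_key: List[Any] = [None]
        self._processor = processor
        processor.init(self._get_value_state)

    def _get_value_state(self, name: str) -> ToyValueState:
        store = self._states.setdefault(name, {})
        return ToyValueState(store, self._current_key)

    def test(self, key: Any, rows: Iterable[Any]) -> List[Tuple[Any, int]]:
        # Set the implicit grouping key, then collect the processor's output.
        self._current_key[0] = key
        return list(self._processor.handle_input_rows(key, rows))

    def set_value_state(self, name: str, key: Any, value: Any) -> None:
        self._states.setdefault(name, {})[key] = value

    def peek_value_state(self, name: str, key: Any) -> Any:
        return self._states.get(name, {}).get(key)


tester = ToyTester(RunningCountProcessor())
first = tester.test("a", ["x", "y"])      # two rows for key "a"
second = tester.test("a", ["z"])          # one more row; state carries over
tester.set_value_state("count", "b", 10)  # seed state directly for key "b"
third = tester.test("b", ["w"])
```

In this toy model, state lives in ordinary dictionaries keyed by state name and grouping key, so direct manipulation and inspection are simple dictionary reads and writes. Timers and TTL are left out entirely, which mirrors the "Not Supported" list above only in spirit.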

Why are the changes needed?

Users have requested a way to unit test StatefulProcessor implementations for TransformWithState (TWS).

Does this PR introduce any user-facing change?

Yes, it adds a new public API to Spark:

  • org.apache.spark.sql.streaming.TwsTester in Scala.
  • pyspark.sql.streaming.TwsTester in PySpark.

How was this patch tested?

Added unit and end-to-end tests in this PR. End-to-end tests compare TwsTester output with results of a real streaming query.

Was this patch authored or co-authored using generative AI tooling?

Yes. Cursor with claude-4.5-sonnet was used to assist with coding and to generate some of the documentation and tests.
Generated-by: claude-4.5-sonnet

- Add TwsTester.scala: Main testing framework for TransformWithState
- Add InMemoryStatefulProcessorHandleImpl.scala: In-memory implementation for testing
- Add TwsTesterSuite.scala: Test suite for TwsTester framework
- Add processors: EventTimeWindow, MultiTimer, RunningCount, SessionTimeout, TopK, WordFrequency
@fedimser fedimser changed the title tws-tester-3 [SPARK-54122] [WIP] Implement TwsTester Nov 25, 2025