New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-46911][SS] Adding deleteIfExists operator to StatefulProcessorHandleImpl #44903
Conversation
...re/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala
Outdated
Show resolved
Hide resolved
...main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
Outdated
Show resolved
Hide resolved
...rc/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDBStateStoreProvider.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/streaming/StatefulProcessorHandle.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/streaming/StatefulProcessorHandle.scala
Outdated
Show resolved
Hide resolved
...re/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala
Outdated
Show resolved
Hide resolved
...main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala
Show resolved
Hide resolved
f5e41a1
to
d63aeee
Compare
80696e2
to
db52b3d
Compare
sql/api/src/main/scala/org/apache/spark/sql/streaming/StatefulProcessorHandle.scala
Outdated
Show resolved
Hide resolved
...re/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulProcessorHandleImpl.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala
Outdated
Show resolved
Hide resolved
cc @HeartSaVioR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only nits. I haven't looked into the test change in detail and I may probably revisit the code, but I can sign off without revisiting as @anishshri-db reviewed that part already.
...main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/scala/org/apache/spark/sql/streaming/TransformWithStateSuite.scala
Outdated
Show resolved
Hide resolved
@transient private var _countState: ValueState[Long] = _ | ||
@transient private var _mostRecent: ValueState[String] = _ | ||
@transient var _processorHandle: StatefulProcessorHandle = _ | ||
|
||
override def init( | ||
handle: StatefulProcessorHandle, | ||
outputMode: OutputMode) : Unit = { | ||
handle: StatefulProcessorHandle, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 4 spaces for method params. https://github.com/databricks/scala-style-guide?tab=readme-ov-file#indent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
4f38d57
to
0f2bd41
Compare
cc @HeartSaVioR, addressed all comments |
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStoreErrors.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
GA failure is only from docker integration test which is unrelated. |
Thanks! Merging to master. |
What changes were proposed in this pull request?
Adding the
deleteIfExists
method to theStatefulProcessorHandle
in order to remove state variables from the State Store. Implemented only for RocksDBStateStoreProvider, as we do not currently support multiple column families for HDFS.Why are the changes needed?
This functionality is needed to so users can remove state from the state store from the StatefulProcessorHandleImpl
Does this PR introduce any user-facing change?
Yes - this functionality (removing column families) was previously not supported from our RocksDB client.
How was this patch tested?
Added a unit test that creates two streams with the same checkpoint directory. The second stream removes state that was created in the first stream upon initialization. We ensure that the state from the previous stream isn't kept.
Was this patch authored or co-authored using generative AI tooling?