@anishshri-db
Contributor

@anishshri-db anishshri-db commented Oct 31, 2024

What changes were proposed in this pull request?

Add stateful processor handle APIs using implicit encoders in Scala

Why are the changes needed?

Without these changes, users have to pass explicit SQL encoders for the state types when acquiring an instance of the underlying state variable.

Does this PR introduce any user-facing change?

Yes

Users can now rely on the implicit encoders available in Scala through import spark.implicits._ and provide only the type when getting the state objects. For example:

      override def init(outputMode: OutputMode, timeMode: TimeMode): Unit = {
        _myValueState = getHandle.getValueState[Long]("myValueState", TTLConfig.NONE)
      }
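
For readers unfamiliar with the mechanism, here is a minimal toy sketch of how a context bound lets the compiler supply the encoder. It assumes nothing about Spark internals: Enc, Handle, ValueState and getValueStateExplicit are hypothetical stand-ins invented for illustration, not Spark's Encoder or StatefulProcessorHandle.

```scala
// Toy model only: Enc, Handle and ValueState are hypothetical stand-ins,
// not Spark's Encoder or StatefulProcessorHandle.
trait Enc[T] { def name: String }

object Enc {
  // These implicit instances play the role of what an implicits import brings into scope.
  implicit val longEnc: Enc[Long] = new Enc[Long] { val name = "long" }
  implicit val stringEnc: Enc[String] = new Enc[String] { val name = "string" }
}

final case class ValueState[T](stateName: String, encoderName: String)

class Handle {
  // Explicit form: the caller passes the encoder by hand.
  def getValueStateExplicit[T](stateName: String, enc: Enc[T]): ValueState[T] =
    ValueState[T](stateName, enc.name)

  // Implicit form: the context bound [T: Enc] asks the compiler to find an Enc[T],
  // so the caller only names the type.
  def getValueState[T: Enc](stateName: String): ValueState[T] =
    getValueStateExplicit(stateName, implicitly[Enc[T]])
}
```

In this sketch, new Handle().getValueState[Long]("count") resolves Enc.longEnc from the companion object and yields the same result as passing it explicitly. The two forms use different method names here only to keep the toy simple.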

How was this patch tested?

Existing unit tests

Was this patch authored or co-authored using generative AI tooling?

No

@anishshri-db anishshri-db changed the title from "[SPARK-50128] Add stateful processor handle APIs using implicit encoders in Scala" to "[DO-NOT-MERGE][SPARK-50128] Add stateful processor handle APIs using implicit encoders in Scala" Oct 31, 2024
@anishshri-db anishshri-db marked this pull request as draft October 31, 2024 20:21
@anishshri-db anishshri-db changed the title from "[DO-NOT-MERGE][SPARK-50128] Add stateful processor handle APIs using implicit encoders in Scala" to "[SPARK-50128][SS] Add stateful processor handle APIs using implicit encoders in Scala" Nov 3, 2024
@anishshri-db anishshri-db marked this pull request as ready for review November 3, 2024 17:01
@anishshri-db
Contributor Author

cc - @jingz-db @HeartSaVioR - PTAL, thx !

Contributor

@HeartSaVioR HeartSaVioR left a comment

The change itself looks good to me.

I feel like requiring users to provide TTLConfig(Duration.ZERO) isn't great though. Shall we check whether we could provide a dedicated instance for this? Like TTLConfig.NONE or TTLConfig.none() (if the former does not work with Java).

@anishshri-db
Contributor Author

> I feel like requiring users to provide TTLConfig(Duration.ZERO) isn't great though. Shall we check whether we could provide a dedicated instance for this? Like TTLConfig.NONE or TTLConfig.none()

Done

Contributor

@HeartSaVioR HeartSaVioR left a comment

+1 pending CI

@HeartSaVioR
Contributor

https://github.com/anishshri-db/spark/actions/runs/11675450690/job/32510196280

  • Run / Build modules: pyspark-mllib, pyspark-ml, pyspark-ml-connect
  • Run / Run Docker integration tests

These failures are unrelated to this PR.

@HeartSaVioR
Contributor

Thanks! Merging to master.

2 participants