[SPARK-55337][SS] Fix MemoryStream backward compatibility #54108
base: master
JIRA Issue Information: Bug SPARK-55337. This comment was automatically generated by GitHub Actions.
9d29544 to 6074dff
// Deprecated: Used when an implicit SQLContext is in scope
@deprecated("Use MemoryStream.apply with an implicit SparkSession instead of SQLContext", "4.1.0")
def apply[A: Encoder]()(implicit sqlContext: SQLContext): MemoryStream[A] =
This is the problem. This is not backward compatible, as the previous API is `def apply[A: Encoder](implicit sqlContext: SQLContext): MemoryStream[A]` (no parentheses).
There is no way to keep both implicits, so the proposal here is to keep only the implicit `SQLContext` variant and require `SparkSession` to be passed explicitly.
5825d62 to 14f1f0f
@cloud-fan, we cannot create a follow-up for the released JIRA issue because your PR has a different fix version, 4.1.2 (or 4.2.0), instead of 4.1.0. Please create a new JIRA ID.
### What changes were proposed in this pull request?

This is a followup to apache#52402 that addresses backward compatibility concerns:

1. Keep the original `implicit SQLContext` factory methods for full backward compatibility
2. Add new overloads with explicit `SparkSession` parameter for new code
3. Fix `TestGraphRegistrationContext` to provide implicit `spark` and `sqlContext` to avoid name shadowing issues in nested classes
4. Remove redundant `implicit val sparkSession` declarations from pipeline tests that are no longer needed with the fix

### Why are the changes needed?

PR apache#52402 changed the MemoryStream API to use `implicit SparkSession`, which broke backward compatibility for code that only has `implicit SQLContext` available. This followup ensures:

- Old code continues to work without modification
- New code can use SparkSession with explicit parameters
- Internal implementation uses SparkSession (modernization from apache#52402)

### Does this PR introduce _any_ user-facing change?

No. This maintains full backward compatibility while adding new API options.

### How was this patch tested?

Existing tests pass. The API changes are additive.

### Was this patch authored or co-authored using generative AI tooling?

Yes

Co-authored-by: Cursor <cursoragent@cursor.com>
14f1f0f to 823133a
Thank you for getting a new JIRA ID.
Remove the `apply[A: Encoder](numPartitions: Int, sparkSession: SparkSession)` factory method that creates a semantic trap: it can accidentally match calls like `MemoryStream[T](0, spark)`, interpreting the first argument as `numPartitions` instead of `id`, causing zero partitions to be created and no data to flow.

Users who need both `numPartitions` and explicit `SparkSession` can use the case class constructor directly: `new MemoryStream[A](id, sparkSession, Some(numPartitions))`.

Co-authored-by: Cursor <cursoragent@cursor.com>
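The trap described in this commit message can be reproduced in isolation. The sketch below is not the Spark source: `SparkSession` and `MemoryStream` are hypothetical stand-ins, the type parameter and `Encoder` context bound are dropped for brevity, and the case class's generated factory is written out by hand. Assuming Scala 2.13, it shows that when two overloads both accept `(Int, SparkSession)`, the compiler prefers the one that needs no default argument, so the first argument is read as `numPartitions`.

```scala
object OverloadTrapSketch {
  // Hypothetical stand-in for org.apache.spark.sql.SparkSession.
  final class SparkSession

  // Simplified stand-in for the stream class: (id, session, optional partitions).
  final class MemoryStream(
      val id: Int,
      val sparkSession: SparkSession,
      val numPartitions: Option[Int] = None)

  object MemoryStream {
    // Constructor-like factory: the first Int is the stream id.
    def apply(
        id: Int,
        sparkSession: SparkSession,
        numPartitions: Option[Int] = None): MemoryStream =
      new MemoryStream(id, sparkSession, numPartitions)

    // Overload in the style of the removed method: the first Int is numPartitions.
    // The id of -1 is a placeholder for this sketch only.
    def apply(numPartitions: Int, sparkSession: SparkSession): MemoryStream =
      new MemoryStream(-1, sparkSession, Some(numPartitions))
  }

  val session = new SparkSession

  // Intended as "id = 0", but overload resolution picks the two-argument
  // overload because it applies without using a default argument.
  val trapped: MemoryStream = MemoryStream(0, session)

  // Calling the constructor directly leaves no ambiguity about each argument.
  val explicit: MemoryStream = new MemoryStream(0, session, Some(4))

  def main(args: Array[String]): Unit = {
    println(s"trapped.numPartitions  = ${trapped.numPartitions}")
    println(s"explicit.numPartitions = ${explicit.numPartitions}")
  }
}
```

With the two-argument overload removed, `MemoryStream(0, session)` in this sketch would fall back to the three-parameter factory and treat `0` as the id.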
What changes were proposed in this pull request?
This is a followup to #52402 that addresses backward compatibility concerns:
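1. Keep the original `implicit SQLContext` factory methods for full backward compatibility
2. Add new overloads with explicit `SparkSession` parameter for new code
3. Fix `TestGraphRegistrationContext` to provide implicit `spark` and `sqlContext` to avoid name shadowing issues in nested classes
4. Remove redundant `implicit val sparkSession` declarations from pipeline tests that are no longer needed with the fix
Why are the changes needed?
PR #52402 changed the MemoryStream API to use `implicit SparkSession`, which broke backward compatibility for code that only has `implicit SQLContext` available. This followup ensures:
- Old code continues to work without modification
- New code can use SparkSession with explicit parameters
- Internal implementation uses SparkSession (modernization from #52402)
Does this PR introduce any user-facing change?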
No. This maintains full backward compatibility while adding new API options.
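As a rough illustration of the API shape this description refers to, the sketch below keeps a factory that takes only an implicit `SQLContext` next to an overload that takes an explicit `SparkSession`. It is a simplified model under stated assumptions, not the Spark implementation: `SparkSession`, `SQLContext`, `Encoder`, and `MemoryStream` are hypothetical stand-ins defined locally, and Scala 2.13 is assumed.

```scala
import java.util.concurrent.atomic.AtomicInteger

object MemoryStreamApiSketch {
  // Hypothetical stand-ins; the real types live in org.apache.spark.sql.
  final class SparkSession
  final class SQLContext(val sparkSession: SparkSession)
  trait Encoder[A]
  implicit val intEncoder: Encoder[Int] = new Encoder[Int] {}

  final class MemoryStream[A](val id: Int, val sparkSession: SparkSession)

  object MemoryStream {
    private val nextId = new AtomicInteger(0)

    // Original shape: no explicit parameter list, only an implicit SQLContext.
    def apply[A: Encoder](implicit sqlContext: SQLContext): MemoryStream[A] =
      new MemoryStream[A](nextId.getAndIncrement(), sqlContext.sparkSession)

    // Added shape: the session is passed as an ordinary argument.
    def apply[A: Encoder](sparkSession: SparkSession): MemoryStream[A] =
      new MemoryStream[A](nextId.getAndIncrement(), sparkSession)
  }

  val session = new SparkSession
  implicit val sqlContext: SQLContext = new SQLContext(session)

  // Old call sites: no parentheses, SQLContext resolved implicitly.
  val legacy: MemoryStream[Int] = MemoryStream[Int]

  // New call sites: SparkSession passed explicitly.
  val modern: MemoryStream[Int] = MemoryStream[Int](session)

  def main(args: Array[String]): Unit = {
    println(s"legacy id = ${legacy.id}, modern id = ${modern.id}")
  }
}
```

Both call styles end up holding the same session in this model, which mirrors the stated goal that the internal implementation uses `SparkSession` regardless of how the stream was created.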
How was this patch tested?
Existing tests pass. The API changes are additive.
Was this patch authored or co-authored using generative AI tooling?
Yes
Made with Cursor