[SPARK-47462][SQL][FOLLOWUP][3.5] Add migration guide for TINYINT mapping changes #45658

yaooqinn wants to merge 3 commits into apache:branch-3.5
Conversation
docs/sql-migration-guide.md
Outdated
- Since Spark 3.5, `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` is enabled by default. To restore the previous behavior, set `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` to `false`.
- Since Spark 3.5, the `array_insert` function is 1-based for negative indexes. It inserts a new element at the end of the input array for index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.
- Since Spark 3.5, Avro will throw `AnalysisException` when reading Interval types as Date or Timestamp types, or reading Decimal types with lower precision. To restore the legacy behavior, set `spark.sql.legacy.avro.allowIncompatibleSchema` to `true`.
- Since Spark 3.5, the MySQL JDBC datasource reads TINYINT(n > 1) as ByteType and TINYINT UNSIGNED as ShortType, while in Spark 3.4 and below, they were read as IntegerType. To restore the previous behavior, you can cast the column to the old type. Note that for 3.5.0 and 3.5.1, TINYINT UNSIGNED was wrongly read as ByteType; this is fixed in 3.5.2.
According to the context, should this be `Since Spark 3.5.2` instead of `Since Spark 3.5`?
dongjoon-hyun left a comment:
+1, LGTM. Only one question (https://github.com/apache/spark/pull/45658/files#r1535017016)
docs/sql-migration-guide.md
Outdated
## Upgrading from Spark SQL 3.5.1 to 3.5.2

- Since 3.5.2, MySQL JDBC datasource will read TINYINT UNSIGNED as ShortType, while in 3.5.0 and 3.5.1, it was wrongly read as ByteType.
The "3.5.0 and 3.5.1, it was wrongly read as ByteType" part seems to conflict with the "TINYINT UNSIGNED as ByteType, while in Spark 3.5.0 and below, they were read as IntegerType" line in the following section, "Upgrading from Spark SQL 3.5.0 to 3.5.1".
Just for my understanding, for TINYINT UNSIGNED, does Spark 3.5.0 read it as ByteType or IntegerType?
Oops, it should be `while in 3.5.1`.
### What changes were proposed in this pull request?

Add migration guide for TINYINT type mapping changes

### Why are the changes needed?

behavior change doc

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

doc build

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45658 from yaooqinn/SPARK-47462-FB.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
Merged to branch-3.5 for Apache Spark 3.5.2. Thank you, @yaooqinn.

Oh, @yaooqinn. It seems that SPARK-47462 is not in branch-3.5.

Could you double-check and cherry-pick the original PR to branch-3.5, please?

The bug was fixed in SPARK-47435. The patch set in branch-3.5 is fine. The PR is kind of a backport PR for #45633, so they share the same Jira ID (SPARK-47462). I know it's a bit complicated. Things might have been clearer if I had broken #45633 into 2~3 PRs.

I mean 3857c16

Thank you for double-checking.
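The TINYINT UNSIGNED mapping discussed in this PR can be illustrated outside Spark. MySQL's TINYINT UNSIGNED spans 0 to 255, which does not fit Spark's signed 8-bit ByteType (-128 to 127) but does fit the 16-bit ShortType. A plain-Python sketch (not Spark code) of the wrap-around that the ByteType mapping in 3.5.0/3.5.1 would cause:

```python
# Illustration only: why reading MySQL TINYINT UNSIGNED into an 8-bit
# signed type (Spark's ByteType) corrupts values >= 128, while a 16-bit
# signed type (ShortType) holds every value losslessly.
import struct

def as_signed_byte(value: int) -> int:
    """Reinterpret an unsigned 8-bit value as a signed byte."""
    return struct.unpack("b", struct.pack("B", value))[0]

# TINYINT UNSIGNED spans 0..255; a signed byte only spans -128..127.
assert as_signed_byte(127) == 127   # still fits
assert as_signed_byte(200) == -56   # 200 wraps around: silent corruption
assert as_signed_byte(255) == -1

# A 16-bit signed short spans -32768..32767, so all of 0..255 fits,
# which is why the fixed mapping in 3.5.2 uses ShortType.
assert all(-32768 <= v <= 32767 for v in range(256))
```

This is why the migration note suggests casting the column back to the old type only if the wider range is not needed.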