[SPARK-47462][SQL][FOLLOWUP][3.5] Add migration guide for TINYINT mapping changes by yaooqinn · Pull Request #45658 · apache/spark

yaooqinn · 2024-03-22T03:27:59Z

What changes were proposed in this pull request?

Add migration guide for TINYINT type mapping changes

Why are the changes needed?

behavior change doc

Does this PR introduce any user-facing change?

no

How was this patch tested?

doc build

Was this patch authored or co-authored using generative AI tooling?

no

dongjoon-hyun · 2024-03-22T03:53:31Z

docs/sql-migration-guide.md

 - Since Spark 3.5, `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` is enabled by default. To restore the previous behavior, set `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` to `false`.
 - Since Spark 3.5, the `array_insert` function is 1-based for negative indexes. It inserts new element at the end of input arrays for the index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.
 - Since Spark 3.5, the Avro will throw `AnalysisException` when reading Interval types as Date or Timestamp types, or reading Decimal types with lower precision. To restore the legacy behavior, set `spark.sql.legacy.avro.allowIncompatibleSchema` to `true`
+- Since Spark 3.5, MySQL JDBC datasource will read TINYINT(n > 1) as ByteType, TINYINT UNSIGNED is read as ShortType, while in Spark 3.4 and below, they were read as IntegerType. To restore the previous behavior, you can cast the column to the old type. Note that for 3.5.0 and 3.5.1, TINYINT UNSIGNED is wrongly read as ByteType, and it is fixed in 3.5.2.


According to the context, should this be `Since Spark 3.5.2` instead of `Since Spark 3.5`?
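For readers applying the migration note's advice to "cast the column to the old type", here is a minimal sketch in plain Python that only builds Spark SQL select expressions. The helper name `cast_back_exprs` and the column names are hypothetical, and the sketch assumes column names do not themselves contain backticks.

```python
def cast_back_exprs(columns, tinyint_columns, old_type="INT"):
    """Build Spark SQL select expressions that cast selected columns to old_type.

    columns:         all column names of the DataFrame, in order
    tinyint_columns: names of columns backed by MySQL TINYINT types
    old_type:        Spark SQL type name to cast to (INT corresponds to IntegerType)
    """
    exprs = []
    for name in columns:
        quoted = f"`{name}`"  # assumes the name contains no backtick
        if name in tinyint_columns:
            # Cast and keep the original column name.
            exprs.append(f"CAST({quoted} AS {old_type}) AS {quoted}")
        else:
            exprs.append(quoted)
    return exprs


# Hypothetical column names, for illustration only.
print(cast_back_exprs(["id", "flag"], {"flag"}))
```

With a DataFrame `df` loaded through the MySQL JDBC source, the result could be applied as `df.selectExpr(*cast_back_exprs(df.columns, {"flag"}))`, which should yield an integer column for `flag` on any of the versions discussed here.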

dongjoon-hyun

+1, LGTM. Only one question (https://github.com/apache/spark/pull/45658/files#r1535017016)

dongjoon-hyun · 2024-03-22T05:58:50Z

docs/sql-migration-guide.md


+## Upgrading from Spark SQL 3.5.1 to 3.5.2
+
+- Since 3.5.2, MySQL JDBC datasource will read TINYINT UNSIGNED as ShortType, while in 3.5.0 and 3.5.1, it was wrongly read as ByteType.


The `3.5.0 and 3.5.1, it was wrongly read as ByteType` part seems to conflict with the `TINYINT UNSIGNED as ByteType, while in Spark 3.5.0 and below, they were read as IntegerType` line in the following section, `Upgrading from Spark SQL 3.5.0 to 3.5.1`.

Just for my understanding, for TINYINT UNSIGNED, does Spark 3.5.0 read it as ByteType or IntegerType?

Oops, it should be `while in 3.5.1`.
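As background on why `ByteType` is the wrong target for `TINYINT UNSIGNED`: MySQL's unsigned tinyint holds 0 to 255, while Spark's `ByteType` is a signed 8-bit integer and `ShortType` a signed 16-bit one. The sketch below is a plain-Python range check and not Spark's actual JDBC dialect code. The names `fits` and `narrowest_spark_type` are hypothetical.

```python
# Value ranges of the Spark integral types discussed in this thread.
SPARK_RANGES = {
    "ByteType": (-2**7, 2**7 - 1),        # signed 8-bit
    "ShortType": (-2**15, 2**15 - 1),     # signed 16-bit
    "IntegerType": (-2**31, 2**31 - 1),   # signed 32-bit
}

# Value ranges of the MySQL column types in question.
MYSQL_RANGES = {
    "TINYINT": (-128, 127),
    "TINYINT UNSIGNED": (0, 255),
}


def fits(mysql_type, spark_type):
    """Return True if every value of mysql_type is representable in spark_type."""
    lo, hi = MYSQL_RANGES[mysql_type]
    s_lo, s_hi = SPARK_RANGES[spark_type]
    return s_lo <= lo and hi <= s_hi


def narrowest_spark_type(mysql_type):
    """Return the smallest listed Spark type that can hold mysql_type."""
    for name in ("ByteType", "ShortType", "IntegerType"):
        if fits(mysql_type, name):
            return name
    raise ValueError(f"no listed Spark type can hold {mysql_type}")


print(fits("TINYINT UNSIGNED", "ByteType"))
print(narrowest_spark_type("TINYINT UNSIGNED"))
```

Under these ranges, `ShortType` is the narrowest listed type that holds every `TINYINT UNSIGNED` value, which matches the mapping described for 3.5.2.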

dongjoon-hyun

+1, LGTM.

…ping changes

### What changes were proposed in this pull request?

Add migration guide for TINYINT type mapping changes

### Why are the changes needed?

behavior change doc

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

doc build

### Was this patch authored or co-authored using generative AI tooling?

no

Closes #45658 from yaooqinn/SPARK-47462-FB.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>

dongjoon-hyun · 2024-03-22T06:06:22Z

Merged to branch-3.5 for Apache Spark 3.5.2. Thank you, @yaooqinn .

dongjoon-hyun · 2024-03-22T06:08:24Z

Oh, @yaooqinn . It seems that SPARK-47462 is not in branch-3.5.

$ git log --oneline | grep SPARK-47462
e57a7d06883 [SPARK-47462][SQL][FOLLOWUP][3.5] Add migration guide for TINYINT mapping changes

dongjoon-hyun · 2024-03-22T06:08:46Z

Could you double-check and cherry-pick the original PR to branch-3.5, please?

yaooqinn · 2024-03-22T06:15:53Z

The bug was fixed in SPARK-47435. The patch set in branch 3.5 is fine.

This PR is essentially a backport of #45633, so they share the same Jira ID (SPARK-47462). I know it's a bit complicated. Things might have been clearer if I had split #45633 into 2 or 3 PRs.

yaooqinn · 2024-03-22T06:20:47Z

The bug was

I mean 3857c16

dongjoon-hyun · 2024-03-22T14:43:33Z

Thank you for double-checking.

[SPARK-47462][SQL][FOLLOWUP][3.5] Add migration guide for integral ty…

1b7d7df

…pe mapping changes

github-actions bot added the DOCS label Mar 22, 2024

dongjoon-hyun reviewed Mar 22, 2024

dongjoon-hyun approved these changes Mar 22, 2024

address comments

c020ea6

dongjoon-hyun reviewed Mar 22, 2024

yaooqinn commented Mar 22, 2024

Update docs/sql-migration-guide.md

16c14eb

dongjoon-hyun approved these changes Mar 22, 2024

dongjoon-hyun closed this Mar 22, 2024

