[SPARK-47462][SQL][FOLLOWUP][3.5] Add migration guide for TINYINT mapping changes #45658

Closed
yaooqinn wants to merge 3 commits into apache:branch-3.5 from yaooqinn:SPARK-47462-FB

@yaooqinn
Member

What changes were proposed in this pull request?

Add migration guide for TINYINT type mapping changes

Why are the changes needed?

To document a behavior change in the TINYINT type mapping.

Does this PR introduce any user-facing change?

no

How was this patch tested?

doc build

Was this patch authored or co-authored using generative AI tooling?

no

github-actions bot added the DOCS label Mar 22, 2024
- Since Spark 3.5, `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` is enabled by default. To restore the previous behavior, set `spark.sql.optimizer.canChangeCachedPlanOutputPartitioning` to `false`.
- Since Spark 3.5, the `array_insert` function is 1-based for negative indexes. It inserts a new element at the end of the input array for index -1. To restore the previous behavior, set `spark.sql.legacy.negativeIndexInArrayInsert` to `true`.
- Since Spark 3.5, Avro will throw `AnalysisException` when reading Interval types as Date or Timestamp types, or reading Decimal types with lower precision. To restore the legacy behavior, set `spark.sql.legacy.avro.allowIncompatibleSchema` to `true`.
- Since Spark 3.5, MySQL JDBC datasource will read TINYINT(n > 1) as ByteType and TINYINT UNSIGNED as ShortType, while in Spark 3.4 and below, they were read as IntegerType. To restore the previous behavior, you can cast the column to the old type. Note that in 3.5.0 and 3.5.1, TINYINT UNSIGNED is wrongly read as ByteType; this is fixed in 3.5.2.
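
The "cast the column to the old type" workaround in the last bullet could look roughly like the sketch below. It only builds Spark SQL select expressions as strings; the column names (`id`, `flag`, `small_count`) and the helper `legacy_int_exprs` are hypothetical and not part of this PR.

```python
# Hypothetical column names; replace them with the TINYINT columns of your table.
TINYINT_COLUMNS = ["flag", "small_count"]


def legacy_int_exprs(all_columns, tinyint_columns):
    """Build Spark SQL select expressions that cast the given columns back to INT.

    Columns not listed in tinyint_columns are passed through unchanged.
    """
    to_cast = set(tinyint_columns)
    exprs = []
    for name in all_columns:
        quoted = f"`{name}`"  # backtick-quote identifiers for Spark SQL
        if name in to_cast:
            exprs.append(f"CAST({quoted} AS INT) AS {quoted}")
        else:
            exprs.append(quoted)
    return exprs


exprs = legacy_int_exprs(["id", "flag", "small_count"], TINYINT_COLUMNS)
print(exprs)
```

Assuming a PySpark DataFrame `df` loaded through the MySQL JDBC source, `df.selectExpr(*exprs)` would then yield the affected columns as IntegerType again.
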
Member

According to the context, Since Spark 3.5.2 instead of Since Spark 3.5?

Member

@dongjoon-hyun left a comment

## Upgrading from Spark SQL 3.5.1 to 3.5.2

- Since 3.5.2, MySQL JDBC datasource will read TINYINT UNSIGNED as ShortType, while in 3.5.0 and 3.5.1, it was wrongly read as ByteType.
Member

The `3.5.0 and 3.5.1, it was wrongly read as ByteType` part seems to conflict with the following section, `Upgrading from Spark SQL 3.5.0 to 3.5.1`, whose line says `TINYINT UNSIGNED as ByteType, while in Spark 3.5.0 and below, they were read as IntegerType`.

Member

Just for my understanding, for TINYINT UNSIGNED, does Spark 3.5.0 read it as ByteType or IntegerType?

Member Author

Oops, it should be `while in 3.5.1`.
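
One reading of the mapping this thread converges on, written as a small lookup for clarity. This is a sketch of the discussion only, not Spark's actual implementation; the function names are hypothetical and the version boundaries are taken from the comments above.

```python
def parse_version(version):
    """Turn '3.5.1' into (3, 5, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def mysql_tinyint_type(spark_version, unsigned):
    """Return the Spark type name for MySQL TINYINT(n > 1) or TINYINT UNSIGNED.

    Boundaries follow the review discussion, not the Spark source code.
    """
    v = parse_version(spark_version)
    if v <= (3, 5, 0):
        return "IntegerType"  # 3.5.0 and below
    if unsigned and v >= (3, 5, 2):
        return "ShortType"    # corrected mapping for TINYINT UNSIGNED
    return "ByteType"         # 3.5.1 onward; for UNSIGNED this is the 3.5.1 bug


for version in ("3.5.0", "3.5.1", "3.5.2"):
    print(version, mysql_tinyint_type(version, unsigned=True))
```

Under this reading, only TINYINT UNSIGNED changes between 3.5.1 and 3.5.2; TINYINT(n > 1) stays ByteType from 3.5.1 onward.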

Member

@dongjoon-hyun left a comment

+1, LGTM.

dongjoon-hyun pushed a commit that referenced this pull request Mar 22, 2024
…ping changes

Closes #45658 from yaooqinn/SPARK-47462-FB.

Authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
@dongjoon-hyun
Member

Merged to branch-3.5 for Apache Spark 3.5.2. Thank you, @yaooqinn.

@dongjoon-hyun
Member

Oh, @yaooqinn. It seems that SPARK-47462 is not in branch-3.5.

$ git log --oneline | grep SPARK-47462
e57a7d06883 [SPARK-47462][SQL][FOLLOWUP][3.5] Add migration guide for TINYINT mapping changes

@dongjoon-hyun
Member

Could you double-check and cherry-pick the original PR to branch-3.5, please?

@yaooqinn
Member Author

The bug was fixed in SPARK-47435. The patch set in branch-3.5 is fine.

This PR is effectively a backport of #45633, so the two share the same Jira ID (SPARK-47462). I know it's a bit complicated. Things might have been clearer if I had broken #45633 into 2 or 3 PRs.

@yaooqinn
Member Author

> The bug was

I mean 3857c16

@dongjoon-hyun
Member

Thank you for double-checking.
