[SPARK-41741][SQL] Encode the string using the UTF_8 charset in ParquetFilters #40090

Closed
wangyum wants to merge 5 commits into apache:master from wangyum:SPARK-41741

wangyum (Member) commented Feb 20, 2023

What changes were proposed in this pull request?

This PR makes `ParquetFilters` encode strings using the `UTF_8` charset explicitly instead of relying on the default charset.

Why are the changes needed?

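Fixes a data issue that occurs when the default charset is not `UTF_8`. Parquet stores string values as UTF-8 bytes, so a filter value encoded with a different charset may not match the stored data.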
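To illustrate the kind of mismatch described above, here is a minimal sketch in Python. It is not the Spark code (`ParquetFilters` is Scala running on the JVM); the helper `encode_filter_value` and the sample string are hypothetical, and `latin-1` merely stands in for a non-UTF-8 default charset.

```python
# Illustrative sketch only: this is not Spark's ParquetFilters implementation.
# It shows why a filter literal must be encoded with the same charset as the data.


def encode_filter_value(value: str, charset: str = "utf-8") -> bytes:
    """Encode a filter literal to bytes (hypothetical helper, not a Spark API)."""
    return value.encode(charset)


# Bytes as they would be stored for a UTF-8 encoded string column.
stored = "café".encode("utf-8")

# Explicit UTF-8 encoding: independent of any platform setting.
explicit = encode_filter_value("café")

# Assumption: "latin-1" simulates a platform whose default charset is not UTF-8.
platform_default = encode_filter_value("café", "latin-1")

print(explicit == stored)          # True: bytes match the stored value
print(platform_default == stored)  # False: a byte-wise comparison would miss the row
```

Pure ASCII values encode identically under both charsets, so a mismatch of this kind would only show up for non-ASCII strings.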

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manual test.

github-actions bot added the SQL label Feb 20, 2023

HyukjinKwon (Member) left a comment

LGTM otherwise

wangyum closed this in d5fa41e Feb 20, 2023
wangyum added a commit that referenced this pull request Feb 20, 2023
…etFilters

This PR makes it encode the string using the `UTF_8` charset in `ParquetFilters`.

Fix data issue where the default charset is not `UTF_8`.

No.

Manual test.

Closes #40090 from wangyum/SPARK-41741.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(cherry picked from commit d5fa41e)
Signed-off-by: Yuming Wang <yumwang@ebay.com>
wangyum (Member, Author) commented Feb 20, 2023

Merged to master, branch-3.4 and branch-3.3.

wangyum deleted the SPARK-41741 branch February 20, 2023 11:33
snmvaughan pushed a commit to snmvaughan/spark that referenced this pull request Jun 20, 2023
…etFilters

This PR makes it encode the string using the `UTF_8` charset in `ParquetFilters`.

Fix data issue where the default charset is not `UTF_8`.

No.

Manual test.

Closes apache#40090 from wangyum/SPARK-41741.

Authored-by: Yuming Wang <yumwang@ebay.com>
Signed-off-by: Yuming Wang <yumwang@ebay.com>
(cherry picked from commit d5fa41e)
Signed-off-by: Yuming Wang <yumwang@ebay.com>