Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Fix unicode handling in HTTPClient #8046

Merged
merged 1 commit into from Jul 13, 2023

Conversation

nastra
Copy link
Contributor

@nastra nastra commented Jul 12, 2023

fixes #7821

Without the fix, tests would fail with

expected: struct<1: id: required int (unique ID 馃お), 2: data: required string>
 but was: struct<1: id: required int (unique ID ?), 2: data: required string>

Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks @nastra !

Copy link
Contributor

@dramaticlly dramaticlly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, maybe also worth updating this hardcoded UTF-8 in PartitionSpec to using StandardCharsets as well? https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L174

fixes apache#7821

Without the fix, tests would fail with
```
expected: struct<1: id: required int (unique ID 馃お), 2: data: required string>
 but was: struct<1: id: required int (unique ID ?), 2: data: required string>
```
@nastra
Copy link
Contributor Author

nastra commented Jul 13, 2023

LGTM, maybe also worth updating this hardcoded UTF-8 in PartitionSpec to using StandardCharsets as well? https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L174

Seems like there are a few more places that use the hardcoded string and it's not directly related to this PR. @dramaticlly would you be interested in updating those?

@dramaticlly
Copy link
Contributor

LGTM, maybe also worth updating this hardcoded UTF-8 in PartitionSpec to using StandardCharsets as well? https://github.com/apache/iceberg/blob/master/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L174

Seems like there are a few more places that use the hardcoded string and it's not directly related to this PR. @dramaticlly would you be interested in updating those?

Sure I can cut a separate PR to update those.

@danielcweeks danielcweeks merged commit 9572585 into apache:master Jul 13, 2023
41 checks passed
@nastra nastra deleted the string-entity-encoding branch July 14, 2023 06:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[bug] Encountered Chinese garbled code issue during Iceberg rest HTTP transmission
4 participants