Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode bug fix cleanup #4

Merged
merged 2 commits into from
Oct 17, 2014
Merged

Unicode bug fix cleanup #4

merged 2 commits into from
Oct 17, 2014

Conversation

whua-airbnb
Copy link

@airbnb/di

We have a bug that all the non-ascii unicode characters in our log files are converted into "?" since 08/05/2014. The problem was that when we convert String from/to byte[], we did not specify the character set (Charset) in our code. When the default Charset is not UTF-8, the non-ascii unicode characters are not handled correctly.

@@ -1,123 +1,128 @@
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's hard to understand what changes have happened in this pom file due to re-formatting. Would it be possible to have it formatted like it was done originally so that diffs are visible properly?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried. However, in the old file most of the white spaces are tabs. I just replaced the tabs into spaces. Unfortunately diff does not pick the changes in a way that we want.

@krishnap
Copy link

Other than the minor cosmetic issue, the change looks good to me. Thanks for tracking down this tricky bug.

@brndnmtthws
Copy link

Minus some formatting stuff, looks good. 👍 merge away.

We can deploy this on Monday to one of the clusters, and let it bake.

@andykram
Copy link
Collaborator

Good find, but why not just run the app with -Dfile.encoding=UTF-8?

@whua-airbnb
Copy link
Author

Hi Andy,
I still have not figured out why we run -Dfile.encoding=US-ASCII on 08/05/2014 yet. I assume there are some reasons. However, I think it is more reliable to specify the character set rather than depend on the default, since the default might change without informing us.

@andykram
Copy link
Collaborator

@whua-airbnb: I disagree. The way you specify file encoding is via -Dfile.encoding. We could make an alternate configuration flag, but that seems like a waste. There's no good reason for us to be running with -Dfile.encoding=US-ASCII, we should just switch it back.

whua-airbnb added a commit that referenced this pull request Oct 17, 2014
@whua-airbnb whua-airbnb merged commit be338cf into master Oct 17, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants