Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support charsets other than UTF-8 #150

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

reline
Copy link
Contributor

@reline reline commented Nov 26, 2020

No description provided.

@reline reline mentioned this pull request Nov 26, 2020
@reline reline marked this pull request as ready for review December 22, 2020 21:38
@sockeqwe
Copy link
Contributor

Thanks, I will take a look at this tomorrow or first days of January.

Just by scrolling over the code changes your PR looks good. Will check more in detail.

Thanks for your contrinution and help 👍

if (source == null) {
throw new NullPointerException("source == null");
}
this.source = source;
this.buffer = source.buffer();
this.charset = charset;
UNQUOTED_STRING_TERMINALS = ByteString.encodeString(" >/=\n", charset);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

public void defaultCharsetUTF8() {
TikXml tikXml = new TikXml.Builder().build();

Assert.assertEquals(StandardCharsets.UTF_8, tikXml.config.charset());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet! great test

@@ -435,7 +439,7 @@ public void xmlDeclarationInBeginElementScope() throws IOException {
Assert.fail("Exception expected");
} catch (IOException e) {
Assert.assertEquals(
"Xml Declatraion <?xml version=\"1.0\" encoding=\"UTF-8\"?> can only be written at the beginning of a xml document! You are not at the beginning of a xml document: current xml scope is ELEMENT_OPENING at path /foo",
"Xml Declaration <?xml version=\"1.0\" encoding=\"UTF-8\"?> can only be written at the beginning of a xml document! You are not at the beginning of a xml document: current xml scope is ELEMENT_OPENING at path /foo",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

.readString(StandardCharsets.ISO_8859_1);

Assert.assertEquals(expected, actual);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice test coverage!

@sockeqwe
Copy link
Contributor

sockeqwe commented Jan 8, 2021

Overall looks good to me, the real question to me though is if we should set the charset in the builder up front or read the XML declaration encoding property: <?xml version="1.0" encoding="UTF-8"?> and then decide which charset should be used. Even less sure how it should work best for writing ...

@reline
Copy link
Contributor Author

reline commented Jan 8, 2021

Overall looks good to me, the real question to me though is if we should set the charset in the builder up front or read the XML declaration encoding property: <?xml version="1.0" encoding="UTF-8"?> and then decide which charset should be used. Even less sure how it should work best for writing ...

I had the same thought, but wasn't sure if that was too out of scope.

I think I would approach it by using the default UTF-8 encoding as it already is to read and write, also allow an encoding to be set for reading and writing (like I've done here), then finally only when reading XML if an encoding property exists it would use that encoding instead and disregard whatever was set in the config. The only question is should we allow clients to override the encoding property when reading in case it is incorrect?

This could be interpreted as a breaking change as well, since currently all files are read using UTF-8 encoding, whereas this would start reading files using the encoding declared in the file.

@sockeqwe
Copy link
Contributor

@Bodo1981 what do you think about adding support for this in general?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants