[CALCITE-2704] Avoid use of ISO-8859-1 to parse request in JsonHandler#85
vlsi wants to merge 1 commit into apache:master from vlsi:request_encoding
  try (ServletInputStream inputStream = request.getInputStream()) {
-   rawRequest = AvaticaUtils.readFully(inputStream, buffer);
+   byte[] bytes = AvaticaUtils.readFullyToBytes(inputStream, buffer);
+   String encoding = request.getCharacterEncoding();
request#getReader() might be better here, however UnsynchronizedBuffer is for byte[] only, and I didn't want to alter the code much.
server/src/main/java/org/apache/calcite/avatica/server/AvaticaJsonHandler.java
+   byte[] bytes = AvaticaUtils.readFullyToBytes(inputStream, buffer);
+   String encoding = request.getCharacterEncoding();
+   if (encoding == null) {
+     encoding = "UTF-8";
Instead of hardcoding "UTF-8" here, wouldn't it be better to obtain this information in a more generic/configurable way? Maybe something along the lines of Charset.defaultCharset() or System.getProperty("file.encoding")?
StandardCharsets would be a preferred way of getting a Charset instance.
+1 for StandardCharsets.UTF_8. I think defaulting to UTF-8 is okay, because UTF-8 is the most popular encoding on the Internet.
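As a rough illustration of the StandardCharsets suggestion, the fallback could be resolved to a Charset instead of a charset name. This is only a sketch: the class CharsetFallback and its methods are hypothetical names, not part of Avatica, and declaredEncoding stands in for the value returned by request.getCharacterEncoding().

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Avatica code.
final class CharsetFallback {
  /** Returns the declared charset, or UTF-8 when the request declares none. */
  static Charset resolve(String declaredEncoding) {
    return declaredEncoding == null
        ? StandardCharsets.UTF_8
        // Throws an unchecked exception for illegal or unsupported names.
        : Charset.forName(declaredEncoding);
  }

  /** Decodes the request body using the resolved charset. */
  static String decode(byte[] bytes, String declaredEncoding) {
    return new String(bytes, resolve(declaredEncoding));
  }

  public static void main(String[] args) {
    byte[] body = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);
    System.out.println(resolve(null));
    System.out.println(decode(body, null).equals("\u4f60\u597d"));
  }
}
```

One difference from the String-based constructor is that new String(byte[], Charset) does not declare UnsupportedEncodingException, so the caller has no checked exception to handle.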
@zabetak, @danny0405: request.getCharacterEncoding() returns a String, and I used "UTF-8" here just to simplify the code and have a single new String(bytes, encoding) call for both cases (encoding is set, and encoding is not set).
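The single-call shape described in this comment might look roughly like the sketch below. The class RequestDecoding and the method decode are hypothetical names used for illustration only; the encoding parameter stands in for request.getCharacterEncoding(), and wrapping the checked exception is an assumption of this sketch, not something the PR does.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Avatica code.
final class RequestDecoding {
  /** Decodes with the declared encoding name, or "UTF-8" if none was declared. */
  static String decode(byte[] bytes, String encoding) {
    if (encoding == null) {
      encoding = "UTF-8";
    }
    try {
      // One constructor call covers both the declared and the fallback case.
      return new String(bytes, encoding);
    } catch (UnsupportedEncodingException e) {
      // Assumption of this sketch: surface unknown names as an unchecked error.
      throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
    }
  }

  public static void main(String[] args) {
    byte[] utf8Body = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);
    System.out.println(decode(utf8Body, null).equals("\u4f60\u597d"));
    byte[] latinBody = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
    System.out.println(decode(latinBody, "ISO-8859-1").equals("caf\u00e9"));
  }
}
```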
If we always want to use UTF-8 then we can even leave the code as is. I mentioned defaultCharset() in the case that we want to allow other encodings that depend on the VM and the underlying OS.
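To make the trade-off concrete: whatever charset the server picks has to match the one the client used to encode the body, and a VM- or OS-dependent default gives no such guarantee. The sketch below uses a hypothetical class DefaultCharsetPitfall and decodes the same UTF-8 bytes two ways; it prints the platform default without assuming what it is on any given machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class, not Avatica code.
final class DefaultCharsetPitfall {
  /** Decodes the given bytes with an explicitly chosen charset. */
  static String decodeAs(byte[] bytes, Charset charset) {
    return new String(bytes, charset);
  }

  public static void main(String[] args) {
    String original = "\u4f60\u597d";
    // Bytes as a client sending UTF-8 would put them on the wire.
    byte[] wire = original.getBytes(StandardCharsets.UTF_8);

    // The platform default varies with the JVM version and configuration.
    System.out.println("platform default: " + Charset.defaultCharset());

    // Matching charset: the text survives.
    System.out.println(decodeAs(wire, StandardCharsets.UTF_8).equals(original));
    // Mismatched charset: each byte becomes its own character.
    System.out.println(decodeAs(wire, StandardCharsets.ISO_8859_1).equals(original));
  }
}
```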
If we always want to use UTF-8 then we can even leave the code as is
Yeah, we don't need to let people provide something else. UTF-8 will be sufficient to encode everything (afaik). Using some other encoding isn't worth the hassle of us altering the protocol to allow users to tell us that they want to use some other encoding :)
https://github.com/apache/calcite-avatica/blob/master/server/src/test/java/org/apache/calcite/avatica/remote/RemoteMetaTest.java
This is the easiest place to add a new test. It should be quite obvious how to do this (just JDBC). Did you test this by hand to validate your fix since you didn't write a test?

I have not tested it.

Friendly ping @vlsi. Can you please take a look at the reviews on this PR?
Hi, here is my test code for this issue.

package org.apache.calcite.avatica.server;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.calcite.avatica.AvaticaUtils;
import org.apache.calcite.avatica.util.UnsynchronizedBuffer;

import org.junit.Assert;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AvaticaJsonHandlerTest {
  private static final Logger LOG = LoggerFactory.getLogger(AvaticaJsonHandlerTest.class);

  private String requestChineseData = "Hello Word (你好,世界) !";

  final ThreadLocal<UnsynchronizedBuffer> threadLocalBuffer;

  public AvaticaJsonHandlerTest() {
    this.threadLocalBuffer = new ThreadLocal<UnsynchronizedBuffer>() {
      @Override public UnsynchronizedBuffer initialValue() {
        return new UnsynchronizedBuffer();
      }
    };
  }

  @Test
  public void testShapshotCorrect() throws IOException {
    String requestCharacterEncoding = null;
    String requestHeader = null;
    InputStream requestInputStream = new ByteArrayInputStream(requestChineseData.getBytes(StandardCharsets.UTF_8));
    String rawRequest = requestHeader;
    if (rawRequest == null) {
      // Avoid a new buffer creation for every HTTP request
      final UnsynchronizedBuffer buffer = threadLocalBuffer.get();
      try (InputStream inputStream = requestInputStream) {
        byte[] bytes = AvaticaUtils.readFullyToBytes(inputStream, buffer);
        String encoding = requestCharacterEncoding;
        if (encoding == null) {
          encoding = StandardCharsets.UTF_8.name();
        }
        rawRequest = new String(bytes, encoding);
      } finally {
        // Reset the offset into the buffer after we're done
        buffer.reset();
      }
    }
    final String jsonRequest = rawRequest;
    LOG.info("Correct decoded request: {}", jsonRequest);
    Assert.assertEquals(requestChineseData, jsonRequest);
  }

  @Test
  public void testShapshotError() throws IOException {
    String requestHeader = null;
    InputStream requestInputStream = new ByteArrayInputStream(requestChineseData.getBytes(StandardCharsets.UTF_8));
    String rawRequest = requestHeader;
    if (rawRequest == null) {
      // Avoid a new buffer creation for every HTTP request
      final UnsynchronizedBuffer buffer = threadLocalBuffer.get();
      try (InputStream inputStream = requestInputStream) {
        rawRequest = AvaticaUtils.readFully(inputStream, buffer);
      } finally {
        // Reset the offset into the buffer after we're done
        buffer.reset();
      }
    }
    final String jsonRequest =
        new String(rawRequest.getBytes("ISO-8859-1"), "UTF-8");
    LOG.info("Error decoded request: {}", jsonRequest);
    Assert.assertEquals(requestChineseData, jsonRequest);
  }
}
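The new String(rawRequest.getBytes("ISO-8859-1"), "UTF-8") step in the second test only recovers the text when rawRequest was itself produced by decoding the body as ISO-8859-1, because that charset maps every byte to exactly one character and back. The sketch below isolates that property with a hypothetical class Latin1RoundTrip; it makes no claim about what AvaticaUtils.readFully does internally, which is not visible in this thread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class, not Avatica code.
final class Latin1RoundTrip {
  /** Re-encodes an ISO-8859-1-decoded string to bytes and decodes them as UTF-8. */
  static String redecodeAsUtf8(String latin1Decoded) {
    byte[] originalBytes = latin1Decoded.getBytes(StandardCharsets.ISO_8859_1);
    return new String(originalBytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String original = "\u4f60\u597d";
    byte[] wire = original.getBytes(StandardCharsets.UTF_8);

    // Decoding as ISO-8859-1 keeps every byte, one character per byte.
    String misdecoded = new String(wire, StandardCharsets.ISO_8859_1);
    System.out.println(redecodeAsUtf8(misdecoded).equals(original));

    // If the string was already decoded correctly, the same step loses data:
    // characters outside ISO-8859-1 cannot be encoded and are replaced.
    System.out.println(redecodeAsUtf8(original).equals(original));
  }
}
```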
We have encountered the same problem in our production environment.

@DonnyZone Would you be able to write a test and take over this PR?

#119 resolves this with a test addition.
I'm pretty confident that this one fixes https://issues.apache.org/jira/browse/CALCITE-2704, and this change should be way better than #76.
I've no idea how to test the change though, so please don't ask me to add tests. Thank you.