[CALCITE-2704] Avoid use of ISO-8859-1 to parse request in JsonHandler by vlsi · Pull Request #85 · apache/calcite-avatica

vlsi · 2019-02-12T21:25:55Z

I'm pretty confident that this one fixes https://issues.apache.org/jira/browse/CALCITE-2704, and this change should be way better than #76.

I've no idea how to test the change though, so please don't ask me to add tests. Thank you.

closes #76

vlsi · 2019-02-12T21:30:59Z

server/src/main/java/org/apache/calcite/avatica/server/AvaticaJsonHandler.java

          try (ServletInputStream inputStream = request.getInputStream()) {
-            rawRequest = AvaticaUtils.readFully(inputStream, buffer);
+            byte[] bytes = AvaticaUtils.readFullyToBytes(inputStream, buffer);
+            String encoding = request.getCharacterEncoding();


request#getReader() might be better here; however, UnsynchronizedBuffer is for byte[] only, and I didn't want to alter the code much.
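For readers following along, here is a minimal sketch of the failure mode this change targets. It assumes, as the PR title suggests, that the old path effectively decoded the request body as ISO-8859-1. The class and method names (`RequestDecodingDemo`, `decodeLatin1`, `decodeUtf8`) are hypothetical and not part of Avatica.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Avatica code.
final class RequestDecodingDemo {
  /** Decodes the body as ISO-8859-1: every byte becomes exactly one char. */
  static String decodeLatin1(byte[] body) {
    return new String(body, StandardCharsets.ISO_8859_1);
  }

  /** Decodes the body as UTF-8: multi-byte sequences become single chars. */
  static String decodeUtf8(byte[] body) {
    return new String(body, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // "你好，世界" written with Unicode escapes so the source file encoding does not matter.
    String original = "\u4f60\u597d\uff0c\u4e16\u754c";
    byte[] body = original.getBytes(StandardCharsets.UTF_8);

    System.out.println(original.equals(decodeUtf8(body)));   // prints "true"
    System.out.println(original.equals(decodeLatin1(body))); // prints "false"
    // Each of the 5 characters takes 3 bytes in UTF-8, so the
    // ISO-8859-1 decoding yields 15 unrelated chars.
    System.out.println(decodeLatin1(body).length());         // prints "15"
  }
}
```

Pure ASCII requests decode identically under both charsets, which is probably why the problem only shows up with non-Latin text.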

F21 · 2019-03-26T07:31:51Z

Looking at the changeset, it looks like this supersedes #76? @vlsi can you confirm this is the case? If so, we can close #76.

vlsi · 2019-03-26T07:35:46Z

I cannot test/validate it, but judging by the code I think this PR does supersede #76, and #76 can be closed.

F21 · 2019-03-26T07:41:33Z

Thanks @vlsi, I've gone ahead and closed #76

zabetak · 2019-03-29T07:59:20Z

server/src/main/java/org/apache/calcite/avatica/server/AvaticaJsonHandler.java

+            byte[] bytes = AvaticaUtils.readFullyToBytes(inputStream, buffer);
+            String encoding = request.getCharacterEncoding();
+            if (encoding == null) {
+              encoding = "UTF-8";


Instead of hardcoding "UTF-8" here, wouldn't it be better to obtain this information in a more generic/configurable way? Maybe something along the lines of Charset.defaultCharset() or System.getProperty("file.encoding")?

joshelser · 2019-04-09

StandardCharsets would be the preferred way of getting a Charset instance.

danny0405 · 2019-04-12

+1 for StandardCharsets.UTF_8. I think defaulting to UTF-8 is okay, because UTF-8 is the most popular encoding on the Internet.

vlsi · 2019-06-20

@zabetak, @danny0405, request.getCharacterEncoding() returns a String, and I used "UTF-8" here just to simplify the code and have a single new String(bytes, encoding) call for both cases (encoding is set, and encoding is not set).

zabetak · 2019-06-21

If we always want to use UTF-8 then we can even leave the code as is. I mentioned defaultCharset() in case we want to allow other encodings that depend on the VM and the underlying OS.

joshelser · 2019-06-21

> If we always want to use UTF-8 then we can even leave the code as is

Yeah, we don't need to let people provide something else. UTF-8 will be sufficient to encode everything (afaik). Using some other encoding isn't worth the hassle of us altering the protocol to allow users to tell us that they want to use some other encoding :)
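As an illustration of the options discussed in this thread, here is a small sketch that resolves the declared request encoding to a `Charset` and falls back to `StandardCharsets.UTF_8` when none is declared. The helper names (`RequestCharsets`, `resolve`, `decode`) are made up for this example; the PR itself keeps the encoding as a String.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Avatica or the Servlet API.
final class RequestCharsets {
  /**
   * Maps the value of request.getCharacterEncoding() (which may be null)
   * to a Charset, defaulting to UTF-8.
   */
  static Charset resolve(String declaredEncoding) {
    if (declaredEncoding == null) {
      return StandardCharsets.UTF_8;
    }
    // Throws an unchecked exception for illegal or unsupported names.
    return Charset.forName(declaredEncoding);
  }

  /** Single decode call for both cases (encoding set, encoding not set). */
  static String decode(byte[] body, String declaredEncoding) {
    return new String(body, resolve(declaredEncoding));
  }

  public static void main(String[] args) {
    byte[] body = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8);
    System.out.println(resolve(null));         // prints "UTF-8"
    System.out.println(resolve("ISO-8859-1")); // prints "ISO-8859-1"
    System.out.println("\u4f60\u597d".equals(decode(body, null))); // prints "true"
  }
}
```

Unlike `Charset.defaultCharset()`, this fallback does not depend on the VM or the underlying OS, so the same request decodes the same way on every server.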

joshelser · 2019-04-09T13:07:34Z

> I've no idea how to test the change though, so please don't ask me to add tests. Thank you.

https://github.com/apache/calcite-avatica/blob/master/server/src/test/java/org/apache/calcite/avatica/remote/RemoteMetaTest.java This is the easiest place to add a new test. It should be quite obvious how to do this (just JDBC).

Did you test this by hand to validate your fix since you didn't write a test?

vlsi · 2019-04-09T13:32:45Z

I have not tested it

F21 · 2019-04-15T22:04:33Z

Friendly ping @vlsi . Can you please take a look at the reviews on this PR?

leotu · 2019-06-20T09:49:11Z

Hi, here is my test code for this issue.
Issue: #76

package org.apache.calcite.avatica.server;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.calcite.avatica.AvaticaUtils;
import org.apache.calcite.avatica.util.UnsynchronizedBuffer;
import org.junit.Assert;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AvaticaJsonHandlerTest {
  private static final Logger LOG = LoggerFactory.getLogger(AvaticaJsonHandlerTest.class);

  private String requestChineseData = "Hello Word (你好，世界) !";

  final ThreadLocal<UnsynchronizedBuffer> threadLocalBuffer;

  public AvaticaJsonHandlerTest() {
    this.threadLocalBuffer = new ThreadLocal<UnsynchronizedBuffer>() {
      @Override public UnsynchronizedBuffer initialValue() {
        return new UnsynchronizedBuffer();
      }
    };
  }

  // New code path: read raw bytes, then decode with the request encoding
  // (falling back to UTF-8 when the request declares none).
  @Test
  public void testShapshotCorrect() throws IOException {
    String requestCharacterEncoding = null;
    String requestHeader = null;

    InputStream requestInputStream =
        new ByteArrayInputStream(requestChineseData.getBytes(StandardCharsets.UTF_8));

    String rawRequest = requestHeader;
    if (rawRequest == null) {
      // Avoid a new buffer creation for every HTTP request
      final UnsynchronizedBuffer buffer = threadLocalBuffer.get();
      try (InputStream inputStream = requestInputStream) {
        byte[] bytes = AvaticaUtils.readFullyToBytes(inputStream, buffer);
        String encoding = requestCharacterEncoding;
        if (encoding == null) {
          encoding = StandardCharsets.UTF_8.name();
        }
        rawRequest = new String(bytes, encoding);
      } finally {
        // Reset the offset into the buffer after we're done
        buffer.reset();
      }
    }
    final String jsonRequest = rawRequest;
    LOG.info("Correct decoded request: {}", jsonRequest);
    Assert.assertEquals(requestChineseData, jsonRequest);
  }

  // Old code path: readFully returns a String, which is then re-encoded
  // as ISO-8859-1 and decoded again as UTF-8.
  @Test
  public void testShapshotError() throws IOException {
    String requestHeader = null;

    InputStream requestInputStream =
        new ByteArrayInputStream(requestChineseData.getBytes(StandardCharsets.UTF_8));

    String rawRequest = requestHeader;
    if (rawRequest == null) {
      // Avoid a new buffer creation for every HTTP request
      final UnsynchronizedBuffer buffer = threadLocalBuffer.get();
      try (InputStream inputStream = requestInputStream) {
        rawRequest = AvaticaUtils.readFully(inputStream, buffer);
      } finally {
        // Reset the offset into the buffer after we're done
        buffer.reset();
      }
    }
    final String jsonRequest =
        new String(rawRequest.getBytes("ISO-8859-1"), "UTF-8");
    LOG.info("Error decoded request: {}", jsonRequest);
    Assert.assertEquals(requestChineseData, jsonRequest);
  }

}
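
The second test above relies on a property of ISO-8859-1 that may not be obvious: Java's ISO-8859-1 charset maps every byte value to exactly one char, so text that was mis-decoded that way can be turned back into the original bytes without loss and decoded again as UTF-8. A minimal sketch of that round trip follows, assuming the string really was produced by ISO-8859-1 decoding; `Latin1RoundTrip` and `repair` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Avatica code.
final class Latin1RoundTrip {
  /**
   * Recovers UTF-8 text that was mistakenly decoded as ISO-8859-1.
   * Only valid if the input really came from an ISO-8859-1 decode.
   */
  static String repair(String misdecoded) {
    byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
    return new String(originalBytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // "你好，世界" written with Unicode escapes.
    String original = "\u4f60\u597d\uff0c\u4e16\u754c";
    String misdecoded =
        new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);

    System.out.println(original.equals(misdecoded));         // prints "false"
    System.out.println(original.equals(repair(misdecoded))); // prints "true"
  }
}
```

Decoding the bytes with the right charset in the first place, as the first test does, avoids this extra encode and decode step.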

DonnyZone · 2020-01-17T03:33:59Z

We have encountered the same problem in our production environment.
This fix works well.

F21 · 2020-01-17T03:34:55Z

@DonnyZone Would you be able to write a test and take over this PR?

DonnyZone · 2020-01-17T03:43:29Z

@F21 Sure, it is valuable to merge the fix into Avatica. I will combine the work of @vlsi and @leotu.

DonnyZone · 2020-01-17T06:24:33Z

@F21 Add a test for the fix in PR

joshelser · 2020-01-17T17:29:54Z

#119 resolves this with a test addition.

[CALCITE-2704] Avoid use of ISO-8859-1 to parse request in JsonHandler

9a9dbf5

closes #76

F21 mentioned this pull request Mar 26, 2019

[CALCITE-2704] Multilingual decoded problem #76

Closed

F21 force-pushed the master branch 2 times, most recently from 0640c66 to d52c203 Compare May 9, 2019 08:41

zabetak force-pushed the master branch from 5639977 to 8f329f4 Compare July 11, 2019 07:55

vlsi force-pushed the master branch 3 times, most recently from d90fb8c to 92045d0 Compare November 17, 2019 14:44

F21 force-pushed the master branch from 204d588 to 512bbee Compare December 11, 2019 21:58

DonnyZone mentioned this pull request Jan 17, 2020

[CALCITE-2704] Avoid use of ISO-8859-1 to parse request in JsonHandler #119

Closed

joshelser closed this Jan 17, 2020
