Skip to content

[CALCITE-2704] Avoid use of ISO-8859-1 to parse request in JsonHandler#85

Closed
vlsi wants to merge 1 commit intoapache:masterfrom
vlsi:request_encoding
Closed

[CALCITE-2704] Avoid use of ISO-8859-1 to parse request in JsonHandler#85
vlsi wants to merge 1 commit intoapache:masterfrom
vlsi:request_encoding

Conversation

@vlsi
Copy link
Contributor

@vlsi vlsi commented Feb 12, 2019

I'm pretty confident that this one fixes https://issues.apache.org/jira/browse/CALCITE-2704, and this change should be way better than #76.

I've no idea how to test the change though, so please don't ask me to add tests. Thank you.

try (ServletInputStream inputStream = request.getInputStream()) {
rawRequest = AvaticaUtils.readFully(inputStream, buffer);
byte[] bytes = AvaticaUtils.readFullyToBytes(inputStream, buffer);
String encoding = request.getCharacterEncoding();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

request#getReader() might be better here, however UnsynchronizedBuffer is for byte[] only, and I didn't want to alter the code much.

@F21
Copy link
Member

F21 commented Mar 26, 2019

Looking at the changeset, it looks like this supersedes #76? @vlsi can you confirm this is the case? If so, we can close #76.

@vlsi
Copy link
Contributor Author

vlsi commented Mar 26, 2019

I cannot test/validate it, but judging by the code I think this PR does supersede #76, and #76 can be closed.

@F21
Copy link
Member

F21 commented Mar 26, 2019

Thanks @vlsi, I've gone ahead and closed #76

byte[] bytes = AvaticaUtils.readFullyToBytes(inputStream, buffer);
String encoding = request.getCharacterEncoding();
if (encoding == null) {
encoding = "UTF-8";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hardcoding "UTF-8" here wouldn't be better to obtain this information in more generic/configurable way. Maybe something along the lines of Charset.defaultCharset(); or System.getProperty("file.encoding")?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StandardCharsets would be a preferred way of getting a Charset instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for StandardCharsets.UTF_8, i think the default UTF8 is okey, cause UTF8 is the most popular encoding for Internet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zabetak , @danny0405 , request.getCharacterEncoding(); returns String, and I used "UTF-8" here just to simplify the code and have a single new String(bytes, encoding) call for both cases (encoding is set, and encoding is not set)

Copy link
Member

@zabetak zabetak Jun 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we always want to use UTF-8 then we can even leave the code as is. I mentioned defaultCharset() in the case that we want to allow other encodings that depend on the VM and the underlying OS.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we always want to use UTF-8 then we can even leave the code as is

Yeah, we don't need to let people provide something else. UTF-8 will be sufficient to encode everything (afaik). Using some other encoding isn't worth the hassle of us altering the protocol to allow users to tell us that they want to use some other encoding :)

@joshelser
Copy link
Member

I've no idea how to test the change though, so please don't ask me to add tests. Thank you.

https://github.com/apache/calcite-avatica/blob/master/server/src/test/java/org/apache/calcite/avatica/remote/RemoteMetaTest.java This is the easiest place to add a new test. It should be quite obvious how to do this (just JDBC).

Did you test this by hand to validate your fix since you didn't write a test?

@vlsi
Copy link
Contributor Author

vlsi commented Apr 9, 2019

I have not tested it

@F21
Copy link
Member

F21 commented Apr 15, 2019

Friendly ping @vlsi . Can you please take a look at the reviews on this PR?

@F21 F21 force-pushed the master branch 2 times, most recently from 0640c66 to d52c203 Compare May 9, 2019 08:41
@leotu
Copy link

leotu commented Jun 20, 2019

Hi, here is my test codes for this issue.
Issue: #76

package org.apache.calcite.avatica.server;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.calcite.avatica.AvaticaUtils;
import org.apache.calcite.avatica.util.UnsynchronizedBuffer;
import org.junit.Assert;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AvaticaJsonHandlerTest {
	private static final Logger LOG = LoggerFactory.getLogger(AvaticaJsonHandlerTest.class);
	
	private String requestChineseData = "Hello Word (你好,世界) !";
	
	final ThreadLocal<UnsynchronizedBuffer> threadLocalBuffer;
	
	public AvaticaJsonHandlerTest() {
	     this.threadLocalBuffer = new ThreadLocal<UnsynchronizedBuffer>() {
	      @Override public UnsynchronizedBuffer initialValue() {
	        return new UnsynchronizedBuffer();
	      }
	    };
	}
	    
	@Test
	public void testShapshotCorrect() throws IOException {
		String requestCharacterEncoding = null;
		String requestHeader = null;
		
		InputStream requestInputStream = new ByteArrayInputStream(requestChineseData.getBytes(StandardCharsets.UTF_8));
		
		String rawRequest = requestHeader;
        if (rawRequest == null) {
          // Avoid a new buffer creation for every HTTP request
          final UnsynchronizedBuffer buffer = threadLocalBuffer.get();
          try (InputStream inputStream = requestInputStream) {
            byte[] bytes = AvaticaUtils.readFullyToBytes(inputStream, buffer);
            String encoding = requestCharacterEncoding;
            if (encoding == null) {
              encoding = StandardCharsets.UTF_8.name();
            }
            rawRequest = new String(bytes, encoding);
            
          } finally {
            // Reset the offset into the buffer after we're done
            buffer.reset();
          }
        }
        final String jsonRequest = rawRequest;
        LOG.info("Correct decoded request: {}", jsonRequest);
        Assert.assertEquals(requestChineseData, jsonRequest);
	}

	@Test
	public void testShapshotError() throws IOException {
		String requestHeader = null;
		
		InputStream requestInputStream = new ByteArrayInputStream(requestChineseData.getBytes(StandardCharsets.UTF_8));
		
		String rawRequest = requestHeader;
        if (rawRequest == null) {
          // Avoid a new buffer creation for every HTTP request
          final UnsynchronizedBuffer buffer = threadLocalBuffer.get();
          try (InputStream inputStream = requestInputStream) {
            rawRequest = AvaticaUtils.readFully(inputStream, buffer);
          } finally {
            // Reset the offset into the buffer after we're done
            buffer.reset();
          }
        }
        final String jsonRequest =
                new String(rawRequest.getBytes("ISO-8859-1"), "UTF-8");
        LOG.info("Error decoded request: {}", jsonRequest);
        Assert.assertEquals(requestChineseData, jsonRequest);
	}

}

@DonnyZone
Copy link
Contributor

We have encountered the same problem in our production environment.
This fix works well.

@F21
Copy link
Member

F21 commented Jan 17, 2020

@DonnyZone Would you be able to write a test and take over this PR?

@DonnyZone
Copy link
Contributor

@F21 Sure, it is valuable to merge the fix into Avatica. I will combine the work of @vlsi and @leotu.

@DonnyZone
Copy link
Contributor

@F21 Add a test for the fix in PR

@joshelser
Copy link
Member

#119 resolves this with a test addition.

@joshelser joshelser closed this Jan 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants