Simple HTTP proxying fails #92

t-beckmann · 2014-11-28T09:30:06Z

The script below requires express.js and request.js simply installed using npm. When running it on node.js and browsing to http://localhost:8090/request you'll get the landing page of google, as expected. However, when running in Trireme, the response data is truncated half-way of data. Trouble is, the reading side of the client socket is closed although not all data was read...

var request = require('request');
var express = require('express');
var app = express();

app.get('/request', function(req, res) {
    request('http://www.google.de', function(error, response, body) {
       if(error) {
           console.log(error);
           res.send(response.statusCode, error);
           throw error;
       }
       console.log(body);
       res.send(response.statusCode, body);
    });
});
app.listen(8090);

console.log(process.versions);

The text was updated successfully, but these errors were encountered:

t-beckmann · 2014-11-28T10:20:10Z

~~I debugged this, HTTPParser seems to call onMessageComplete() too early.~~

t-beckmann · 2014-11-28T14:15:54Z

The problem originates at request.js, line #1297. There a buffer object is converted to a string using buffer.toString(encoding) with encoding=undefined. The buffer contains German special characters (ü), the resulting string is truncated at the first such character.

gbrail · 2014-11-29T17:03:07Z

Thanks for tracking this down!

The latest request module (in master) looks a little different. Have they fixed this in the meantime?

https://github.com/request/request/blob/master/request.js

Nonetheless, it looks to me like this is a case where Trireme is handling the invalid string encoding differently than regular Node. That means it's something that we should fix. Do you think you know enough to put together a test case? Java has many options for string encoding and decoding that effect how invalid characters are handled and it may well be the case that we need to choose another one.

t-beckmann · 2014-12-03T11:03:02Z

Yes, I tried this code and it fails:

 if (self.encoding === null) {
            // response.body = buffer
            // can't move to this until https://github.com/rvagg/bl/issues/13
            response.body = buffer.slice()
          } else {
            response.body = buffer.toString(self.encoding)
          }

Here, self.encoding is undefined, so that the code path calls buffer.toString(undefined) which will apply UTF-8 decoding to the buffer contents. The buffer contents however use a German charset and can not be interpreted as UTF-8.

My guess is, node.js represents strings as UTF-8 internally, so that conversion from buffer to string back to buffer again is a plain memcpy. In my understanding, response.js is broken in that it does not respect the charset attribute of the content-type header sent by default. A workaround is to specify binary encoding in the request options explicitly.

t-beckmann · 2014-12-03T15:44:04Z

The following code will print okay on node and 256 != 128 on trireme:

var buffer1 = new Buffer(0x100);
for(i = 0; i < 0x100; ++i) {
    buffer1[i] = i;
}
var length1 = buffer1.length;

var buffer2 = new Buffer(buffer1.toString(), 'binary');
var length2 = buffer2.length;

if(length1 != length2) {
    console.log('', length1, '!=', length2);
} else {
    console.log('okay');
}

My guess was wrong, but node and Trireme treat illegal byte sequences differently. As you pointed out, the replacement mechnism can be used here.

This will not fix request.js however, even on node the code mentioned in this issue does NOT provide the original data as sent by the remote server. It just replaces the special characters. So, I'm not sure if this is worth fixing as not specifying a correct encoding is an issue of request.js.

gbrail · 2014-12-03T18:15:36Z

Yep, that makes sense.

The short-term fix is going to be to change the "Utils.bufferToString" class to tell the CharsetEncoder to "replace" unrecognized characters rather than to "report" them. That brings Trireme to work more like regular node in that it will replace those characters.

(Interestingly, in this case, Java will replace the unrecognized characters with "?" while node uses a different ASCII code.)

Longer-term I think that we want to do more because Node does a lot of specific things with Base64 encoding that we don't do quite the same way and I'd like to fix that.

Now, on your point about request, you're absolutely right -- it looks like google.de returns content encoded in iso-8859-1, which is not the same as UTF-8, and in fact it is not a character set that's built in to Node -- even if you passed that character set name to request, I still think it'd fail. Technically request should be using a module like "iconv" or "iconv-lite" to convert the characters, since those modules (which also work in Trireme) support a much larger universe of character sets.

gbrail · 2014-12-04T01:43:06Z

This turned out to be complicated because the default UTF-8 converter in Java was treating the umlaut character as an impartial UTF-8 character rather than as an invalid character, and it stopped the decoding.

The Node.js "string_decoder" module handles this correctly but Trireme was using its own implementation. I changed that back, and also changed the string decoding and encoding code to replace invalid characters rather than stop, and now things work a lot more like regular Node.

I'd like to push out a release in the next day or two that contains the latest few fixes, as they're all pretty important bugs to get fixed out there. Can you try what is in master and see if it is any better? It worked on "http://www.google.de/" for me.

t-beckmann · 2014-12-04T16:28:24Z

Tested it, the output is just like node.js, sweet!

By the way, on Windows when running the tests via Maven I get this, which is probably because the test not being maintained on this platform:

Tests in error:
  testNpm(io.apigee.trireme.core.test.CertParserTest): Error: Mismatch in valid_
to: Mrz 14 19:06:17 2015 GMT !== Mar 14 19:06:17 2015 GMT (comparecerts.js#12)

So, I build using mvn -Dmaven.test.skip=true package, copied the npm-package output to my node_modules folder and ran it. It'll probably take a few days to get my server code base uptodate, but I don't expect to run into problems.

gbrail self-assigned this Dec 1, 2014

gbrail added next release and removed next release labels Dec 2, 2014

gbrail added this to the 0.8.4 milestone Dec 2, 2014

gbrail closed this as completed Dec 4, 2014

sufie unassigned gbrail Jun 13, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simple HTTP proxying fails #92

Simple HTTP proxying fails #92

t-beckmann commented Nov 28, 2014

t-beckmann commented Nov 28, 2014

t-beckmann commented Nov 28, 2014

gbrail commented Nov 29, 2014

t-beckmann commented Dec 3, 2014

t-beckmann commented Dec 3, 2014

gbrail commented Dec 3, 2014

gbrail commented Dec 4, 2014

t-beckmann commented Dec 4, 2014

Simple HTTP proxying fails #92

Simple HTTP proxying fails #92

Comments

t-beckmann commented Nov 28, 2014

t-beckmann commented Nov 28, 2014

t-beckmann commented Nov 28, 2014

gbrail commented Nov 29, 2014

t-beckmann commented Dec 3, 2014

t-beckmann commented Dec 3, 2014

gbrail commented Dec 3, 2014

gbrail commented Dec 4, 2014

t-beckmann commented Dec 4, 2014