Parse Error when trying to get this URL #20

Closed
karli2000 opened this Issue Jul 3, 2013 · 6 comments

@karli2000

Hi,

Sorry, today is my big testing day :)
When I try to get this URL:
http://www.amazon.co.uk/Toasters-Kitchen-Appliances-Home-Garden/b?ie=UTF8&node=11716951

I get this error object:

{ [Error: Parse Error]
  bytesParsed: 20,
  code: 'HPE_INVALID_CONSTANT',
  headers:
   { date: 'Wed, 03 Jul 2013 14:52:20 GMT',
     server: 'Server',
     'x-amz-id-1': 'A42B6F61ACC04C2EB8DE',
     'x-amz-id-2': 'w/dGiH/4tTFFRtLq9QdWOm4H/6mMIGkocOA7Jwoket0cOIGKtudh1A==',
     'content-type': 'text/html;charset=UTF-8',
     'content-encoding': 'gzip',
     vary: 'Accept-Encoding,User-Agent',
     'set-cookie':
      [ 'session-id=279-3003121-1937300; Domain=.amazon.co.uk; Expires=Tue, 28-Jun-2033 14:52:20 GMT; Path=/',
        'session-id-time=2003583140l; Domain=.amazon.co.uk; Expires=Tue, 28-Jun-2033 14:52:20 GMT; Path=/' ],
     'transfer-encoding': 'chunked' },
  url: 'http://www.amazon.co.uk/Toasters-Kitchen-Appliances-Home-Garden/b?ie=UTF8&node=11716951',
  method: 'GET' }

Any idea why?

Thank you,
Max

@SaltwaterC
Owner

~/Projects cat http.js

var http = require('http-request');

http.get('http://www.amazon.co.uk/Toasters-Kitchen-Appliances-Home-Garden/b?ie=UTF8&node=11716951', function (err, res) {
    if (err) {
        console.error(err);
        return;
    }
    console.log(res.code, res.headers);
});

~/Projects node http.js
200 { date: 'Thu, 04 Jul 2013 10:26:05 GMT',
  server: 'Server',
  pragma: 'no-cache',
  'x-amz-id-1': '10HMRES9C1C68XQMXTH1',
  p3p: 'policyref="http://www.amazon.co.uk/w3c/p3p.xml",CP="CAO DSP LAW CUR ADM IVAo IVDo CONo OTPo OUR DELi PUBi OTRi BUS PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA HEA PRE LOC GOV OTC "',
  'cache-control': 'no-cache',
  expires: '-1',
  'x-amz-id-2': 'UboVYgk+XVW5YvyGRNJ+Rv8HWm3rXPVD8dpMCbNJ8rrccjUAHJimuItA6F1L6uR1',
  vary: 'Accept-Encoding,User-Agent',
  'content-encoding': 'gzip',
  'content-type': 'text/html; charset=ISO-8859-1',
  'set-cookie':
   [ 'x-wl-uid=1q8GvZX6Eli51PU5s4Ut9xZELqZPLMvrmQiWtbj2mnvftrMOL7DJ/PEUGYZ+423N7rLXbj3sG2gc=; path=/; domain=.amazon.co.uk; expires=Tue, 01-Jan-2036 00:00:01 GMT',
     'session-id-time=2082758401l; path=/; domain=.amazon.co.uk; expires=Tue, 01-Jan-2036 00:00:01 GMT',
     'session-id=276-1520740-9907147; path=/; domain=.amazon.co.uk; expires=Tue, 01-Jan-2036 00:00:01 GMT' ],
  'transfer-encoding': 'chunked' }

Can't reproduce it with the URL alone.

@karli2000

I use these settings:

http.get({
    url: url,
    timeout: 20000,
    maxRedirects: 10,
    noUserAgent: true,
    maxBody: 1000000,
    noSslVerifier: true,
    headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.37 Safari/537.36',
        'Accept-Language': 'en,en-US;*',
        'Accept': 'text/html,*/*;q=0.8'
    }
}, function (err, result) {
    // handle err and result here
});

It looks like it works when I set noUserAgent to false. Why?

Thank you,
Max

@SaltwaterC
Owner

Found the issues:

  1. I failed to fix the private normalizeHeaders() method after refactoring the code. Header names were not lowercased before being compared, so the results were inconsistent and the evaluation of noUserAgent failed. The request carried two user agent headers: your "User-Agent" and my "user-agent". That's why it worked once noUserAgent was turned off (so the default user agent was not actually disabled).
  2. Your user agent breaks the client with the Amazon page. Passing the user agent twice made Amazon use my header, which is why it worked.
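
To illustrate the first point, here is a minimal sketch of case-insensitive header handling. It is not the actual http-request code: the function bodies, the buildHeaders() helper, the defaultUserAgent option and the agent strings are all assumptions made for the example. The idea is only that lowercasing header names before checking for "user-agent" prevents the same header from being sent twice.

```javascript
// Hypothetical sketch, not the real http-request implementation.
// Returns a copy of the headers with every name lowercased.
function normalizeHeaders(headers) {
    var normalized = {};
    Object.keys(headers || {}).forEach(function (name) {
        normalized[name.toLowerCase()] = headers[name];
    });
    return normalized;
}

// Hypothetical helper: add the default user agent only when it is
// allowed (noUserAgent is false) and the caller did not supply one.
function buildHeaders(userHeaders, options) {
    var headers = normalizeHeaders(userHeaders);
    if (!options.noUserAgent && !('user-agent' in headers)) {
        headers['user-agent'] = options.defaultUserAgent;
    }
    return headers;
}

// The caller's "User-Agent" is detected despite its casing,
// so the result holds a single user-agent entry.
console.log(buildHeaders(
    { 'User-Agent': 'custom-agent/1.0' },
    { noUserAgent: false, defaultUserAgent: 'default-agent/1.0' }
));
```

Without the lowercasing step, the `in` check would miss "User-Agent" and both headers would end up in the request, which matches the behaviour described above.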

This script reproduces the issue with the core HTTP client:

// Same request headers as the failing http-request call, sent via core http
var options = {
    host: 'www.amazon.co.uk',
    port: 80,
    path: '/Toasters-Kitchen-Appliances-Home-Garden/b?ie=UTF8&node=11716951',
    headers: {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.37 Safari/537.36',
        'accept-language': 'en,en-US;*',
        accept: 'text/html,*/*;q=0.8',
        'accept-encoding': 'gzip,deflate'
    },
    method: 'GET'
};

// Count the data events received before the parser fails
var events = 0;
var request = require('http').request(options, function (response) {
    console.log(response.statusCode, response.headers);
    response.on('data', function (data) {
        events++;
    });
});

// The parse error is reported on the request object
request.on('error', function (err) {
    console.error('error after %d data events', events);
    console.error(err);
});

request.end();

Unfortunately the issue is unrelated to http-request. I'll try to find some more info before opening an issue on node.js' issue tracker.

@karli2000

WOW! Not an easy find, great work!
My user agent is the one from the current Chrome beta, strange.

Thank you,
Max

@SaltwaterC
Owner

nodejs/node-v0.x-archive#5479 - upstream says it's not their issue. The recommendation is to use a different user agent or to disable it entirely.

@SaltwaterC SaltwaterC closed this Jul 4, 2013
@karli2000

Hehe, nice one :) OK, I will use a different user agent.

Thank you,
Max
