Parse Error when trying to get this URL #20

Closed
karli2000 opened this issue Jul 3, 2013 · 6 comments

Comments

@karli2000

Hi,

Sorry, today is my big testing day :)
When I try to get this URL:
http://www.amazon.co.uk/Toasters-Kitchen-Appliances-Home-Garden/b?ie=UTF8&node=11716951

I get this error object:

{ [Error: Parse Error]
bytesParsed: 20,
code: 'HPE_INVALID_CONSTANT',
headers:
{ date: 'Wed, 03 Jul 2013 14:52:20 GMT',
server: 'Server',
'x-amz-id-1': 'A42B6F61ACC04C2EB8DE',
'x-amz-id-2': 'w/dGiH/4tTFFRtLq9QdWOm4H/6mMIGkocOA7Jwoket0cOIGKtudh1A==',
'content-type': 'text/html;charset=UTF-8',
'content-encoding': 'gzip',
vary: 'Accept-Encoding,User-Agent',
'set-cookie':
[ 'session-id=279-3003121-1937300; Domain=.amazon.co.uk; Expires=Tue, 28-Jun-2033 14:52:20 GMT; Path=/',
'session-id-time=2003583140l; Domain=.amazon.co.uk; Expires=Tue, 28-Jun-2033 14:52:20 GMT; Path=/' ],
'transfer-encoding': 'chunked' },
url: 'http://www.amazon.co.uk/Toasters-Kitchen-Appliances-Home-Garden/b?ie=UTF8&node=11716951',
method: 'GET' }

Any idea why?

Thank you,
Max

@SaltwaterC
Owner

~/Projects cat http.js

var http = require('http-request');

http.get('http://www.amazon.co.uk/Toasters-Kitchen-Appliances-Home-Garden/b?ie=UTF8&node=11716951', function (err, res) {
    if (err) {
        console.error(err);
        return;
    }
    console.log(res.code, res.headers);
});

~/Projects node http.js
200 { date: 'Thu, 04 Jul 2013 10:26:05 GMT',
server: 'Server',
pragma: 'no-cache',
'x-amz-id-1': '10HMRES9C1C68XQMXTH1',
p3p: 'policyref="http://www.amazon.co.uk/w3c/p3p.xml",CP="CAO DSP LAW CUR ADM IVAo IVDo CONo OTPo OUR DELi PUBi OTRi BUS PHY ONL UNI PUR FIN COM NAV INT DEM CNT STA HEA PRE LOC GOV OTC "',
'cache-control': 'no-cache',
expires: '-1',
'x-amz-id-2': 'UboVYgk+XVW5YvyGRNJ+Rv8HWm3rXPVD8dpMCbNJ8rrccjUAHJimuItA6F1L6uR1',
vary: 'Accept-Encoding,User-Agent',
'content-encoding': 'gzip',
'content-type': 'text/html; charset=ISO-8859-1',
'set-cookie':
[ 'x-wl-uid=1q8GvZX6Eli51PU5s4Ut9xZELqZPLMvrmQiWtbj2mnvftrMOL7DJ/PEUGYZ+423N7rLXbj3sG2gc=; path=/; domain=.amazon.co.uk; expires=Tue, 01-Jan-2036 00:00:01 GMT',
'session-id-time=2082758401l; path=/; domain=.amazon.co.uk; expires=Tue, 01-Jan-2036 00:00:01 GMT',
'session-id=276-1520740-9907147; path=/; domain=.amazon.co.uk; expires=Tue, 01-Jan-2036 00:00:01 GMT' ],
'transfer-encoding': 'chunked' }

Can't reproduce it with the URL alone.

@karli2000
Author

I use these settings:

http.get({
    url: url, // the Amazon URL from the first comment
    timeout: 20000,
    maxRedirects: 10,
    noUserAgent: true,
    maxBody: 1000000,
    noSslVerifier: true,
    headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.37 Safari/537.36',
        'Accept-Language': 'en,en-US;*',
        'Accept': 'text/html,*/*;q=0.8'
    }
}, function (err, result) {
    // callback body omitted in the original post
});

It looks like it works when I set noUserAgent to false. Why?

Thank you,
Max

@SaltwaterC
Owner

Found the issues:

  1. I failed to fix the normalizeHeaders() private method after refactoring the code. The header names were not lowercased, which is needed for consistent results, so the evaluation of noUserAgent failed. The request had two user agent headers: your "User-Agent" and my "user-agent". That's why it worked by turning off noUserAgent (not actually turning it off in the first place).
  2. Your user agent breaks the client with the Amazon page. Passing the user agent twice actually made Amazon use my header, which is the reason why it worked.
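
The first point can be illustrated with a minimal sketch. This is not the actual http-request code: normalizeHeaders() below is a hypothetical reimplementation, and buildHeaders() and defaultUserAgent are names made up for illustration. It only shows how lowercasing the header names before checking for a user agent avoids sending both "User-Agent" and "user-agent":

```javascript
// Hypothetical sketch, not the library's real implementation.
// Lowercases every header name so that 'User-Agent' and 'user-agent'
// collapse into a single key.
function normalizeHeaders(headers) {
    var normalized = {};
    Object.keys(headers || {}).forEach(function (name) {
        normalized[name.toLowerCase()] = headers[name];
    });
    return normalized;
}

// Assumed placeholder for the client's default user agent string.
var defaultUserAgent = 'example-client/1.0';

// Builds the outgoing headers. The default user agent is added only when
// noUserAgent is not set and the caller did not supply one (assumed policy).
function buildHeaders(options) {
    var headers = normalizeHeaders(options.headers);
    if (!options.noUserAgent && !('user-agent' in headers)) {
        headers['user-agent'] = defaultUserAgent;
    }
    return headers;
}

var headers = buildHeaders({
    noUserAgent: true,
    headers: { 'User-Agent': 'Mozilla/5.0', 'Accept': 'text/html,*/*;q=0.8' }
});

// The result contains exactly one user agent entry, keyed as 'user-agent'.
console.log(headers);
```

Because the check runs against the lowercased names, the caller's header is found regardless of how it was capitalized, and only one user agent header goes out.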

This script reproduces the issue with the core HTTP client:

var options = {
    host: 'www.amazon.co.uk',
    port: 80,
    path: '/Toasters-Kitchen-Appliances-Home-Garden/b?ie=UTF8&node=11716951',
    headers: {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.37 Safari/537.36',
        'accept-language': 'en,en-US;*',
        accept: 'text/html,*/*;q=0.8',
        'accept-encoding': 'gzip,deflate'
    },
    method: 'GET'
}

var events = 0;
var request = require('http').request(options, function (response) {
    console.log(response.statusCode, response.headers)
    response.on('data', function (data) {
        events++
    })
})

request.on('error', function (err) {
    console.error('error after %d data events', events)
    console.error(err)
})

request.end()

Unfortunately the issue is unrelated to http-request. I'll try to find some more info before opening an issue on node.js' issue tracker.

@karli2000
Author

WOW! Not an easy find, great work!
My user agent is the one from the current Chrome beta, strange.

Thank you,
Max

@SaltwaterC
Owner

nodejs/node-v0.x-archive#5479: upstream says it's not their issue. The recommendation is to use another user agent or to disable it entirely.
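
A rough sketch of that workaround with the core HTTP client, for reference. The withUserAgent() and sendRequest() helpers are hypothetical names, and 'my-crawler/1.0' is a made-up user agent string; whether a given replacement avoids the parse error depends on what the server sends back, so treat this as a starting point only:

```javascript
var http = require('http');

// Hypothetical helper: returns a copy of the request options with any
// existing user agent header (in any capitalization) removed. A new one is
// set unless userAgent is null, in which case the header is left out.
function withUserAgent(options, userAgent) {
    var headers = {};
    Object.keys(options.headers || {}).forEach(function (name) {
        if (name.toLowerCase() !== 'user-agent') {
            headers[name] = options.headers[name];
        }
    });
    if (userAgent !== null) {
        headers['user-agent'] = userAgent;
    }
    var copy = {};
    Object.keys(options).forEach(function (key) {
        copy[key] = options[key];
    });
    copy.headers = headers;
    return copy;
}

// Sends the request and logs the status code and response headers.
function sendRequest(options) {
    var request = http.request(options, function (response) {
        console.log(response.statusCode, response.headers);
        response.resume(); // discard the body
    });
    request.on('error', function (err) {
        console.error(err);
    });
    request.end();
}

var options = withUserAgent({
    host: 'www.amazon.co.uk',
    port: 80,
    path: '/Toasters-Kitchen-Appliances-Home-Garden/b?ie=UTF8&node=11716951',
    method: 'GET',
    headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.37 Safari/537.36',
        'accept-language': 'en,en-US;*',
        accept: 'text/html,*/*;q=0.8'
    }
}, 'my-crawler/1.0'); // made-up replacement user agent

// Uncomment to actually perform the request:
// sendRequest(options);
```

Passing null as the second argument drops the user agent header instead of replacing it, which corresponds to the "disable it entirely" option.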

@karli2000
Author

Hehe, nice one :) OK, I will use a different user agent.

Thank you,
Max
