scraping williamhill website returns rubbish #251

hughht5 opened this Issue Jan 4, 2012 · 3 comments



hughht5 commented Jan 4, 2012

The simple script below returns a bunch of rubbish. It works for most websites, but not William Hill:

var Browser = require("zombie");
var assert = require("assert");

// Load the page (the URL was elided in the original report)
var browser = new Browser();
browser.visit("", function () {
  // The original snippet was truncated here; presumably the page
  // body was being printed, e.g.:
  console.log(browser.html());
});

Running it with node prints:

��{�^�a�yp��p������Ή��`��(���S]-����'N�8q�����/���?�ݻ���u;�݇�ׯ�Eiٲ>��-���3�ۗG��Ee�,���mF���MI��Q�۲������ڊ��ZG��O�J�^S��Cg���JO�緹�Oݎ����P����ET�n;v������v���D�tvJn��J��8'��햷r�v:��m��J��Z�nh�]�� �����Z����.{�Z���Ӳl�B'�.¶�D�$n�/��u"���z������Ni��"Nj��\00_I\00\��S��O�E8{"�m;��h���,o�����Q�y��;��a[��������c���q�D�띊?����/|?:�;���Z!�}���/�wے�h�<����������%�������A�K=-a��~'
(actual output is much longer)

Does anyone know why this happens, and specifically why it happens on the only site I actually want to scrape?


keichii commented Jan 10, 2012

I think the output is gzip-compressed; you need to tell the server which encodings you accept in the request headers so it returns plain text.

hughht5 commented Jan 10, 2012

Thanks a lot, that makes perfect sense. As I'm a noob to this, though, could you offer an example of how to decompress the gzipped response? I don't know how to edit the headers.

Thanks again,

assaf commented May 28, 2012

Zombie will now send accept-encoding header to indicate it does not support gzip.

@assaf assaf closed this May 28, 2012