Casper.download() not working correctly with binaries #73

Closed
n1k0 opened this Issue Mar 23, 2012 · 27 comments

Member

n1k0 commented Mar 23, 2012

From someone who reported the issue privately by email:

casper.download() is supposed to get the job done,

but when I try it, casper.download() behaves strangely and the saved
image files are all broken.

I wrote some sample code to show the download issue. I ran the following code on Windows XP 32-bit with PhantomJS 1.4.1 and CasperJS 0.6.4.

I use casper.download() and casper.captureSelector() to save the same image file.
captureSelector() gives a good image file; download() gives a broken image file.

phantom.casperPath = 'E:/casperjs';
var casperjsFile = phantom.casperPath + '/bin/bootstrap.js';
var ret = phantom.injectJs(casperjsFile);
if (ret) {
    console.log("loaded casperjs successfully");
    var casper = require("casper").create({
        verbose: true,
        logLevel: 'info'
    });
} else {
    console.log("load failed");
}

var logo = null;
casper.start('http://www.baidu.com/', function() {
    logo = this.evaluate(function() {
        var imgUrl = document.querySelector('img').getAttribute('src');
        var title = document.title;

        console.log("title=" + title);
        return title;
    });

    // a.jpg will be a broken image file
    this.wait(2000, function() {
        casper.echo("start downloading");
        this.download("http://www.baidu.com/img/baidu_sylogo1.gif", "a.jpg");
        this.echo("finish download");
    });

    // b.gif is a good image file
    this.captureSelector("b.gif", "img[usemap='#mp']");
});

console.log("ready to go");
casper.run(function() {
    //this.exit();
});

n1k0 Mar 23, 2012

Member

As phantomjs 1.5 now ships with a WebKit version providing Uint8Array, this will solve the issue.

xpepermint May 28, 2012

Hey, any workarounds?

n1k0 May 28, 2012

Member

I'm still struggling with writing base64-encoded contents to the filesystem using PhantomJS's native fs module.

I'm more and more thinking that this should be solved on the C++ side of things in PhantomJS rather than by hacking around in CasperJS… stay tuned though.

xpepermint May 28, 2012

Thanks for your answer @n1k0!

timbunce Jun 1, 2012

Contributor

Untested, but this might help:

Change: fs.write(targetPath, cu.decode(this.base64encode(url, method, data)), 'w');
To: fs.write(targetPath, cu.decode(this.base64encode(url, method, data)), 'wb'); // note the 'b' flag
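
To illustrate why the 'b' flag could matter, here is a minimal sketch in plain Node.js (not CasperJS or PhantomJS code). It assumes that a text-mode write re-encodes the decoded string, for example as UTF-8, while a binary-mode write emits one byte per character; that assumption is not confirmed anywhere in this thread. The payload bytes are made up for the example.

```javascript
// Plain Node.js illustration; this is not CasperJS or PhantomJS code.
// Made-up payload: "GIF89a" followed by three bytes >= 0x80.
const original = Buffer.from([0x47, 0x49, 0x46, 0x38, 0x39, 0x61, 0x80, 0xff, 0xc3]);
const base64 = original.toString('base64');

// Decode base64 into a "binary string": one character per byte.
const binaryString = Buffer.from(base64, 'base64').toString('latin1');

// Assumption: a text-mode write re-encodes the string (here as UTF-8),
// so every character above 0x7F becomes two bytes.
const textModeBytes = Buffer.from(binaryString, 'utf8');

// Assumption: a binary-mode write emits exactly one byte per character.
const binaryModeBytes = Buffer.from(binaryString, 'latin1');

console.log(textModeBytes.length);             // 12 (the original has 9 bytes)
console.log(binaryModeBytes.equals(original)); // true
```

Under that assumption, any file containing bytes above 0x7F grows and becomes unreadable when written as text, which would match the broken images reported at the top of this issue.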

n1k0 Jun 4, 2012

Member

Works in at least 80% of cases, good enough right now.

@n1k0 n1k0 closed this in e19d77e Jun 4, 2012

itelmenko Feb 13, 2013

Hello! It is broken again: casper.download() saves images as zero-byte files.
casperjs --version
1.0.2

n1k0 Feb 13, 2013

Member

Sample test case, pretty please?

n1k0 Feb 13, 2013

Member

Works for me with both casper 1.0.2 and master… what version of casper and phantom are you using? on which platform?

Edit: also, pasting the error you encountered or a stack trace would help.

itelmenko Feb 14, 2013

$ phantomjs --version
1.8.1
$ casperjs --version
1.0.2
Platform: OpenSUSE Linux 12.2 32bit.

I don't get any errors in the console. I just get zero-byte images as a result :(

n1k0 Feb 14, 2013

Member

Are you trying to download things over SSL? If so, you may want to try the --ignore-ssl-errors option.

itelmenko Feb 14, 2013

In my example I did not use HTTPS:
var url = 'http://google.ru/logos/2013/fyodor_shalyapins_140th_birthday-1047005-hp.jpg';
this.download(url, 'test.jpg');

itelmenko Feb 14, 2013

Is there another way to save an image to the filesystem?

n1k0 Feb 14, 2013

Member

Nope. That's a strange issue I unfortunately can't investigate until I can reproduce it :/

itelmenko Feb 14, 2013

Maybe there is a way to see additional info (errors or anything else)? I get no errors in the Linux console.

hexid Feb 14, 2013

Collaborator

I had been running into this issue when trying to download some images; however, I found that it was due to the images being on a subdomain of the page I was viewing. The script below shows an example of the problem.
I'm not sure whether this is the same problem being experienced here, but the two could be connected.

var casper = require('casper').create();
var img = 'http://i.imgur.com/rvNBmlf.gif';

casper.start();

casper.thenOpen('http://i.imgur.com/', function() { // the sub-domain of the image
  this.download(img, 'Success.gif', 'GET');
});
casper.thenOpen('http://imgur.com/', function() { // the domain where the image was found
  this.download(img, 'Failed.gif', 'GET');
});
casper.thenOpen(img, function() { // the image
  this.download(img, 'Success2.gif', 'GET');
});

casper.run(function() {
  this.echo('Finished downloading.');
  this.exit();
});

Tested using CasperJS 1.0.2 and PhantomJS 1.8.1
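
The success/failure pattern above is consistent with a same-origin restriction on the request that download() makes from inside the page (the getBinary() XMLHttpRequest error reported later in this thread points the same way). As a hedged illustration, here is a minimal sketch in plain Node.js, not CasperJS code; sameOrigin is a hypothetical helper introduced only for this example.

```javascript
// Plain Node.js sketch of a same-origin check; not CasperJS code.
// sameOrigin is a hypothetical helper, introduced only for illustration.
function sameOrigin(pageUrl, resourceUrl) {
  // URL#origin combines scheme, host, and port.
  return new URL(pageUrl).origin === new URL(resourceUrl).origin;
}

const img = 'http://i.imgur.com/rvNBmlf.gif';

console.log(sameOrigin('http://i.imgur.com/', img)); // true  (Success.gif)
console.log(sameOrigin('http://imgur.com/', img));   // false (Failed.gif)
console.log(sameOrigin(img, img));                   // true  (Success2.gif)
```

Only the download started from http://imgur.com/ crosses origins, and it is the only one that produced an empty file.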

n1k0 Feb 14, 2013

Member

In this case, could using the --web-security=no option solve the issue?

hexid Feb 14, 2013

Collaborator

That did it.

Also, it should be pointed out that the pageSettings.webSecurityEnabled option is currently missing from the API.

itelmenko Feb 14, 2013

Hello, hexid!
Your example works for me! I mean Success.gif and Success2.gif were loaded correctly and Failed.gif was loaded as zero-sized.

itelmenko Feb 14, 2013

I tested my examples and they work fine with web-security=false! Thank you!

FergusNelson Jun 23, 2013

I am also hitting this issue. Here are some more details.
phantomjs --version
1.9.1
casperjs --version
1.0.2

screen-capture.js

var casper = require("casper").create({
    viewportSize: {
        width: 1024,
        height: 768
    },
    pageSettings: {
        webSecurityEnabled: false
    },
    verbose: true,
    logLevel: 'debug' // camelCase, as in the first script in this thread (was "loglevel")
});

var address = casper.cli.get(0);
var output = casper.cli.get(1);

if (!address || !output || !/\.(png|jpg|pdf)$/i.test(output)) {
    casper
        .echo("Usage: $ casperjs screen-capture.js <address> <output.[jpg|png|pdf]>")
        .exit(1)
    ;
}

casper.start(address, function(status) {
    if (status !== 'success') {
        casper.echo(casper.page.settings.webSecurityEnabled);
        this.download(address, output + '.binary');
    } else {
        this.waitForSelector(".stream-container", function() {
            // the posted script used an undefined `filename` here; `output` is the CLI target
            this.captureSelector(output, "html");
            this.echo("Saved screenshot of " + this.getCurrentUrl() + " to " + output);
        }, function() {
            this.die("Timeout reached. Fail whale?");
            this.exit();
        }, 12000);
    }
});

casper.run();

Command and output:
c:\Program Files\casper\samples>casperjs --web-security=no screen-capture.js http://www.elliottmarketingpr.com/wp-content/uploads/2012/07/Foodservice-Europe-Gourmet-Burger-UK-by-Katie-Dunne1.pdf out.png
false
[error] [remote] getBinary(): Error while fetching http://www.elliottmarketingpr.com/wp-content/uploads/2012/07/Foodservice-Europe-Gourmet-Burger-UK-by-Katie-Dunne1.pdf: Error: NETWORK_ERR: XMLHttpRequest Exception 101

n1k0 Jun 24, 2013

Member

@FergusNelson have you tried using the --web-security=no CLI option or the webSecurityEnabled setting as suggested above?

FergusNelson Jun 24, 2013

@n1k0 Yes, I am using that command-line option. See above for the exact command that I am running. I also added some console output for "casper.page.settings.webSecurityEnabled", which is the "false" line in the output above, so the setting is applied correctly, but the error is still thrown.

hellojinjie Jul 21, 2013

I also encountered this issue.
With

pageSettings: {
    webSecurityEnabled: false
}

the problem is solved.

Thanks.

@sdakuri sdakuri referenced this issue in hdxsfbr/coursera-downloader Oct 8, 2013

Merged

Bugfix - Downloaded files were of 0 bytes. #2

pasht Oct 3, 2014

I'm trying to download some video files from my Coursera account, with no luck. The files get created on my disk but their length is zero. After logging in and getting the correct links, here is the code that I'm using:

casper.thenOpen('https://class.coursera.org/mmds-001/lecture', function() {
    this.waitUntilVisible('div[class="course-lectures-list"]', function() {
        // getLinks is defined elsewhere in the script (not shown)
        var links = this.evaluate(getLinks);
        this.eachThen(links, function(response) {
            this.echo('Downloading ' + response.data.filename);
            this.download(response.data.link, './' + response.data.filename + '.mp4');
        });
    });
});

I've used the --web-security=no CLI option as suggested above, with no success.
I believe that Coursera is hosted on Amazon. Any thoughts?
