Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refs #803 - Add option to create a new WebPage instance from casper #826

Merged
merged 4 commits into from Feb 6, 2016

Conversation

Projects
None yet
@ilangv
Copy link
Contributor

commented Feb 12, 2014

Currently casperjs does not handle closing a page and creating a new
page object.

If user does a casper.page.close() there is no option to create a new
page object.

With this change we can use a casper.page = casper.newPage()

Allow user to create a new page object
Currently casperjs does not handle closing a page and creating a new
page object.

If user does a casper.page.close() there is no option to create a new
page object.

With this change we can use a casper.page = casper.newPage()
@n1k0

This comment has been minimized.

Copy link
Member

commented Feb 12, 2014

Thanks. Could you please add docs and a test please?

@n1k0 n1k0 added Qualified labels Feb 12, 2014

@n1k0 n1k0 added this to the 1.1 milestone Feb 12, 2014

@mlb5000

This comment has been minimized.

Copy link

commented Feb 12, 2014

Ya, I don't think this is quite all the way there, or I'm possibly using it wrong. I just integrated this into 1.1-beta3 as a test with the code from #803 and get this error:

Error: cannot access member `customHeaders' of deleted QObject
@mlb5000

This comment has been minimized.

Copy link

commented Feb 12, 2014

@ilangv Digging a little deeper and looking at casper.start it looks like there is some additional initialization that must happen on the page object after it is created. However, even after bringing everything relevant to page initialization into the newPage function it still doesn't want to cooperate.

@mlb5000

This comment has been minimized.

Copy link

commented Feb 12, 2014

@ilangv I take that back, I messed up my test structure (different function prototypes between thenOpen and open), but there are still a couple things required in newPage. I got it to work as I want using:

var links = ['http://www.google.com', 'http://www.yahoo.com'];

var casper = require('casper').create({
    logging: 'error',
    pageSettings: {
        webSecurityEnabled: false
    }
});

casper.start('http://www.amazon.com', function () {
    this.echo(this.getTitle());
});

casper.eachThen(links, function(item) {
    var url = item.data;
    this.open(url).then(function (){
        this.echo(this.getTitle());
    }).then(function () {
        casper.page.close();
        casper.newPage();
    });
});

casper.run(function (){
    this.echo('Done');
    casper.exit();
});

combined with this (obviously a copy/paste, but a simple extract method refactoring should take care of it):

Casper.prototype.newPage = function newPage() {
    "use strict";
    this.checkStarted();
    //copied from casper.start()
    this.page = this.mainPage = createPage(this);
    this.page.settings = utils.mergeObjects(this.page.settings, this.options.pageSettings);
    if (utils.isClipRect(this.options.clipRect)) {
        this.page.clipRect = this.options.clipRect;
    }
    if (utils.isObject(this.options.viewportSize)) {
        this.page.viewportSize = this.options.viewportSize;
    }
    return this.page;
};
@mlb5000

This comment has been minimized.

Copy link

commented Feb 12, 2014

Would it also be worth it to check the deleted status of the current WebPage object to either throw an error or call close() on it for the caller? As it stands, calling newPage without first calling close() would result in a memory leak, so we might was well protect from that within casper.

@ilangv

This comment has been minimized.

Copy link
Contributor Author

commented Feb 13, 2014

@mlb5000 👍 Valid point

@ilangv

This comment has been minimized.

Copy link
Contributor Author

commented Feb 13, 2014

I think this should solve it.

Calling a page.close() explicitly should do the job

Casper.prototype.newPage = function newPage() {
        "use strict";
        this.checkStarted();
        //Close the existing page object. Does not harm anything even if close() is called twice
        this.page.close();
        //copied from casper.start()
        this.page = this.mainPage = createPage(this);
        this.page.settings = utils.mergeObjects(this.page.settings, this.options.pageSettings);
        if (utils.isClipRect(this.options.clipRect)) {
        this.page.clipRect = this.options.clipRect;
    }
    if (utils.isObject(this.options.viewportSize)) {
        this.page.viewportSize = this.options.viewportSize;
    }
    return this.page;
};
@ilangv

This comment has been minimized.

Copy link
Contributor Author

commented Feb 13, 2014

@n1k0 Will add tests and update docs shortly. Thanks

@mlb5000

This comment has been minimized.

Copy link

commented Feb 14, 2014

Unfortunately something still isn't getting cleaned up. After this patch I'm still getting this error after my eachThen runs for a while, even with newPage called each time around.

QEventDispatcherUNIXPrivate(): Unable to create thread pipe: Too many open files
2014-02-13T21:18:06 [FATAL] QEventDispatcherUNIXPrivate(): Can not continue without a thread pipe
Abort trap: 6
@mlb5000

This comment has been minimized.

Copy link

commented Feb 14, 2014

It's worth noting that without this it grows by about 10MB per page load, and grows much slower with it. It still gets to over 2GB memory usage after a few hundred iterations though...

ilangv added some commits Feb 14, 2014

Added documentation and unit test
Made change to attach page settings to the newly created WebPage object
and included unit test and documentation
@n1k0

This comment has been minimized.

Copy link
Member

commented Feb 14, 2014

Looks like an improvement. How about renaming newPage to reset though?

@ilangv

This comment has been minimized.

Copy link
Contributor Author

commented Feb 14, 2014

@n1k0 casper.reset() sounds misleading. Tends me to think of it as a master reset to the entire casper object rather than just the page. Don't you think so?

@ilangv

This comment has been minimized.

Copy link
Contributor Author

commented Feb 14, 2014

@mlb5000 Don't see this happening in my case. I am currently running the crawler code https://github.com/seethroughtrees/casperjs-spider with the above change and all is well so far. Am I missing something?

@mlb5000

This comment has been minimized.

Copy link

commented Feb 14, 2014

@n1k0 Are you talking in the sense of calling start again? Start clears all steps and history as well as resetting the page. This would be counterproductive in my use case, and is why I don't use it.

@ilangv That's a little concerning. I'm calling casper.newPage() as the last step of my eachThen callback and still it's definitely still leaking. What other resources within Phantom, aside from the page, could be leaking?

@n1k0

This comment has been minimized.

Copy link
Member

commented Feb 14, 2014

What other resources within Phantom, aside from the page, could be leaking?

The casper instance stores a bunch of logs, history, resource objects and so on; it might be interesting to check these and provide a way to purge the data they contain.

@stefan--

This comment has been minimized.

Copy link

commented Sep 25, 2014

@mlb5000 any update on this issue? I am facing the same issue... thanks

@cptX

This comment has been minimized.

Copy link

commented Oct 6, 2014

Hi to all, I have written several scripts with casperjs to scrap webpages the last couple of years. I'm using casperjs in Linux. The problem with the increased need in resources (CPU, RAM) was always a big pain for all these scripts and imagine some of them are running standalone in a vps without my supervision. To manage the increase in ram I was automatically restarting the scripts through bash scripts.

Yesterday I came across this thread and I was amazed by the idea to close my headless page in order to reallocate resources and keep the RAM consumption stable and not rising.
I tested it yesterday by adding the following code after a casper.thenOpen inside a loop of several iterations (target was 1000 iterations).

casper.then(function() {
casper.page.close();
casper.newPage();
this.wait(2000, function() {
});
});

Initially I tested it with no wait and it failed completely. The program was hanging in casper.newPage() for ever. Then I added a wait of 1000ms and it managed to do some iterations (~5) but again it hanged. Then I increased it to 2000ms and it managed to do 80-100 iterations before hanging. I repeated the test with my pc restarted (so all other resources were relatively free) and the script has hanged in the 270th iteration... After a while (~30secs) by it's own started working again.
During all these iterations the RAM is kept relatively stable and increasing only very slowly. The difference is huge compared to the same code without page.close().

The reasons I'm posting here are the following:

  1. I want to say this is a must and we definitely should find a way to close pages at the end of every iteration.
  2. I see some instability on the above code. I don't understand why we need so much time to wait. 2000ms is too much for 1000 iterations. I could live with it if at least was stable. I don't understand why it hangs every now and then.
  3. I want to ask if it's possible to make the casper.newPage() blocking until the new Page is ready. That way we don't need to put an unreliable wait() for it.
@mlb5000

This comment has been minimized.

Copy link

commented Oct 6, 2014

@cptX I do the same as you to manage the ever growing memory consumption; manage an external queue of URLs to process and execute Casper from another program in a loop. Unfortunately my own experiments came to the same unstable conclusion as your own so I just decided not to trust it altogether and continue on with looping from an outside program.

@cptX

This comment has been minimized.

Copy link

commented Oct 6, 2014

This promise of closing every page is too good to reject it so soon! I definitely want to stay with it! I saw so much memory impact difference that for sure can change everything in my implementations...
If we focus on finding the problems of the instabilities?
Why is the delay necessary? What happens behind? How can we make it blocking until the new page is ready?

@mickaelandrieu

This comment has been minimized.

Copy link
Member

commented Oct 7, 2014

For now I use this tip to reduce RAM and CPU used by my own scripts.
You can use event to send a KILL signal to phantomjs:

casper.on('run.complete', function() {
    casper.die('kill phantom', 1);
});
@mlb5000

This comment has been minimized.

Copy link

commented Oct 7, 2014

@mickaelandrieu Thanks for the tip, I'll have to try that. I have noticed that if I'm running in a loop for a long time I'll start to accumulate a lot of orphaned PhantomJS instances (from times when something crashes). Maybe this would fix that.

@cptX

This comment has been minimized.

Copy link

commented Oct 7, 2014

@mickaelandrieu Is it possible to use your script in a loop and if yes do we have to start a new phantom instance somehow after that? How?

@mickaelandrieu

This comment has been minimized.

Copy link
Member

commented Oct 7, 2014

@cptX can you post your ask in mailing list ?

@cptX

This comment has been minimized.

Copy link

commented Oct 8, 2014

Hi mickaelandrieu, as I'm new here I don't understand what you mean to post in the mailing list.

@mickaelandrieu

This comment has been minimized.

@cptX

This comment has been minimized.

Copy link

commented Oct 10, 2014

@mickaelandrieu

This comment has been minimized.

Copy link
Member

commented Oct 10, 2014

@n1k0 can you take a look at this ? I will try to reviews others waiting PR

@diwu1989

This comment has been minimized.

Copy link

commented May 26, 2015

I just hit the memory issue on a scraper, would definitely love to see this merged in 👍

});

test.assert(casper.started, 'Casper.start() started');

This comment has been minimized.

Copy link
@mickaelandrieu

mickaelandrieu May 26, 2015

Member

can you remove this unused lines ?

@mickaelandrieu

This comment has been minimized.

Copy link
Member

commented May 26, 2015

@n1k0 @paazmaya for me it's 👍

@ssmulders

This comment has been minimized.

Copy link

commented Jan 14, 2016

Please add this - it just fixed a major bug for me and would love to have it in the main branch 👯 👍

@istr

This comment has been minimized.

Copy link
Collaborator

commented Jan 17, 2016

Looks ok to me, too. @n1k0 @paazmaya @mickaelandrieu @hexid any reason to still keep this open?

@paazmaya

This comment has been minimized.

Copy link
Collaborator

commented Feb 6, 2016

Let's do this

paazmaya added a commit that referenced this pull request Feb 6, 2016

Merge pull request #826 from ilangv/master
refs #803 - Add option to create a new WebPage instance from casper

@paazmaya paazmaya merged commit 8d561c2 into casperjs:master Feb 6, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@ilangv

This comment has been minimized.

Copy link
Contributor Author

commented Feb 6, 2016

At last 👍

@robin-obrien

This comment has been minimized.

Copy link

commented Feb 28, 2016

@istr, Any estimation when can it be released, I wish I could start using it as it seems to be critical numerous memory issues

@paazmaya

This comment has been minimized.

Copy link
Collaborator

commented Feb 29, 2016

Would assume this coming out in the next beta.... couple weeks.
The actual 1.1 hopefully before Summer

@robin-obrien

This comment has been minimized.

Copy link

commented Feb 29, 2016

Thanks for the update, I am facing fatal memory issues running on Windows (BTW: On Mac, it is much more stable) and I am very eager having this fix. I would very grateful if you can guide me how to implement the fix in my code. Can I do it in my program or should I download the latest CasperJS code and compile it to a program?

@istr

This comment has been minimized.

Copy link
Collaborator

commented Mar 1, 2016

I'll try to review the change-set against beta 5 and publish beta 6 next weekend.

@mickaelandrieu

This comment has been minimized.

Copy link
Member

commented Mar 1, 2016

@istr and my students will enjoy this new version as well :) Thank you!

@robin-obrien

This comment has been minimized.

Copy link

commented Mar 2, 2016

@istr I really appreciate this

@zarkerus

This comment has been minimized.

Copy link

commented Apr 4, 2016

@ilangv @cptX When I try to use function newPage(), task manager show there are too many TCP Connection and the application got hanged, not sure why but that might be the reason

`casper.start();

var json = require('list.json');
casper.then(function() {
var current = 0
var end = json.length;
for (;current < end;) {
(function(cntr) {
casper.thenOpen(json[cntr].url, function() {
console.log(json[cntr].id);
var data = this.evaluate(function() {
return $("title");
});
}).then(function() {
casper.page.close();
casper.newPage();
});;
})(current);
current++;
}
});
casper.run();`

@istr

This comment has been minimized.

Copy link
Collaborator

commented Apr 4, 2016

@NghiemDuongHung Please open a new issue with your finding. Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.