refs #803 - Add option to create a new WebPage instance from casper #826
Currently CasperJS has no way to close a page and create a new page object. If the user calls casper.page.close(), there is no option to create a new page object. With this change we can use casper.page = casper.newPage().
Thanks. Could you please add docs and a test?
Ya, I don't think this is quite all the way there, or I'm possibly using it wrong. I just integrated this into 1.1-beta3 as a test with the code from #803 and get this error:
@ilangv Digging a little deeper and looking at |
@ilangv I take that back, I messed up my test structure (different function prototypes between thenOpen and open), but there are still a couple things required in
combined with this (obviously a copy/paste, but a simple extract method refactoring should take care of it):
Would it also be worth it to check the deleted status of the current |
@mlb5000 👍 Valid point |
I think this should solve it. Calling a
    Casper.prototype.newPage = function newPage() {
        "use strict";
        this.checkStarted();
        // Close the existing page object. Does not harm anything even if close() is called twice
        this.page.close();
        // copied from casper.start()
        this.page = this.mainPage = createPage(this);
        this.page.settings = utils.mergeObjects(this.page.settings, this.options.pageSettings);
        if (utils.isClipRect(this.options.clipRect)) {
            this.page.clipRect = this.options.clipRect;
        }
        if (utils.isObject(this.options.viewportSize)) {
            this.page.viewportSize = this.options.viewportSize;
        }
        return this.page;
    };
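For what it's worth, the extract-method refactoring mentioned earlier in this thread could look roughly like the sketch below. It runs standalone in Node with stand-in objects, so nothing here is CasperJS code: `makeFakePage`, `mergeSettings`, `applyPageOptions` and `fakeCasper` are hypothetical names, and the real patch would keep using `createPage`, `utils.mergeObjects`, `utils.isClipRect` and `utils.isObject` as in the snippet above.

```javascript
"use strict";

// Hypothetical stand-in for a PhantomJS WebPage; only what this sketch needs.
function makeFakePage() {
    return {
        settings: { loadImages: true },
        closed: false,
        close: function () { this.closed = true; }
    };
}

// Shallow merge standing in for utils.mergeObjects (an assumption for this sketch).
function mergeSettings(target, source) {
    Object.keys(source || {}).forEach(function (key) {
        target[key] = source[key];
    });
    return target;
}

// Extracted helper: apply the instance options to a freshly created page.
function applyPageOptions(page, options) {
    page.settings = mergeSettings(page.settings, options.pageSettings);
    if (options.clipRect) {
        page.clipRect = options.clipRect;
    }
    if (options.viewportSize) {
        page.viewportSize = options.viewportSize;
    }
    return page;
}

// start() and newPage() could then both call the same helper.
function newPage(casperLike) {
    if (casperLike.page) {
        casperLike.page.close();
    }
    casperLike.page = applyPageOptions(makeFakePage(), casperLike.options);
    return casperLike.page;
}

var fakeCasper = {
    page: makeFakePage(),
    options: {
        pageSettings: { loadImages: false },
        viewportSize: { width: 1024, height: 768 }
    }
};
var oldPage = fakeCasper.page;
var freshPage = newPage(fakeCasper);
```

The point is only that the settings, clipRect and viewportSize handling lives in one place instead of being copied between start() and newPage().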
@n1k0 Will add tests and update docs shortly. Thanks |
Unfortunately something still isn't getting cleaned up. After this patch I'm still getting this error after my
It's worth noting that without this it grows by about 10MB per page load, and grows much slower with it. It still gets to over 2GB memory usage after a few hundred iterations though... |
Made a change to attach the page settings to the newly created WebPage object, and included a unit test and documentation.
Looks like an improvement. How about renaming |
@n1k0 casper.reset() sounds misleading. It makes me think of a master reset of the entire casper object rather than just the page. Don't you think so?
@mlb5000 Don't see this happening in my case. I am currently running the crawler code https://github.com/seethroughtrees/casperjs-spider with the above change and all is well so far. Am I missing something? |
@n1k0 Are you talking in the sense of calling @ilangv That's a little concerning. I'm calling |
The casper instance stores a bunch of logs, history, resource objects and so on; it might be interesting to check these and provide a way to purge the data they contain. |
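A minimal sketch of what such a purge could look like, assuming the accumulated data lives in plain arrays on the instance. The property names used here (`history`, `resources`) and the `purgeArrays` helper are assumptions for illustration, not a confirmed CasperJS API, and the example runs against a stand-in object in Node.

```javascript
"use strict";

// Empty the named array properties in place, so existing references stay valid.
// Returns the number of entries that were dropped.
function purgeArrays(instance, names) {
    var purged = 0;
    names.forEach(function (name) {
        if (Array.isArray(instance[name])) {
            purged += instance[name].length;
            instance[name].length = 0;
        }
    });
    return purged;
}

// Stand-in object; a real casper instance would need checking for which
// properties actually grow between page loads.
var fakeInstance = {
    history: ["http://example.com/a", "http://example.com/b"],
    resources: [{ id: 1 }],
    started: true
};
var purgedCount = purgeArrays(fakeInstance, ["history", "resources", "missing"]);
```

Properties that are missing or not arrays are skipped, so the helper is safe to call with a generous list of names.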
@mlb5000 any update on this issue? I am facing the same issue... thanks |
Hi all, I have written several scripts with CasperJS to scrape web pages over the last couple of years. I'm using CasperJS on Linux. The growing demand for resources (CPU, RAM) has always been a big pain for all these scripts, and some of them run standalone on a VPS without my supervision. To manage the increase in RAM I was automatically restarting the scripts through bash scripts.
Yesterday I came across this thread and was amazed by the idea of closing my headless page in order to release resources and keep RAM consumption stable instead of rising. casper.then(function() {
Initially I tested it with no wait and it failed completely: the program hung in casper.newPage() forever. Then I added a wait of 1000 ms and it managed a few iterations (~5) before hanging again. Then I increased it to 2000 ms and it managed 80-100 iterations before hanging. I repeated the test after restarting my PC (so all other resources were relatively free) and the script hung at the 270th iteration. After a while (~30 s) it started working again on its own. The reasons I'm posting here are the following:
@cptX I do the same as you to manage the ever growing memory consumption; manage an external queue of URLs to process and execute Casper from another program in a loop. Unfortunately my own experiments came to the same unstable conclusion as your own so I just decided not to trust it altogether and continue on with looping from an outside program. |
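For anyone wanting to try the external-loop approach, here is a rough sketch of the driver side in Node. It only shows the queueing logic: `processQueue` and `fakeRun` are hypothetical names, and the runner is a stub so the example runs on its own without CasperJS installed.

```javascript
"use strict";

// Process a queue of URLs one at a time, handing each to a fresh worker run.
// runOne(url, done) is supplied by the caller and must call done(exitCode).
function processQueue(urls, runOne, onFinished) {
    var results = [];
    function next(index) {
        if (index >= urls.length) {
            onFinished(results);
            return;
        }
        runOne(urls[index], function (exitCode) {
            results.push({ url: urls[index], ok: exitCode === 0 });
            next(index + 1);
        });
    }
    next(0);
}

// Stub runner for illustration: pretends URLs ending in "a" succeed.
var queueResults = null;
processQueue(
    ["http://example.com/a", "http://example.com/b"],
    function fakeRun(url, done) {
        done(url.slice(-1) === "a" ? 0 : 1);
    },
    function (results) {
        queueResults = results;
    }
);
```

In real use the runner could start one CasperJS process per URL with Node's `child_process.spawn` and call `done` from the child's `exit` event, so each page load gets a fresh PhantomJS process and its memory is released when that process ends. The script name and arguments passed to the child are up to you.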
The promise of closing every page is too good to reject so soon! I definitely want to stick with it. I saw such a difference in memory usage that it could change everything in my implementations...
For now I use this tip to reduce the RAM and CPU used by my own scripts.
    casper.on('run.complete', function() {
        casper.die('kill phantom', 1);
    });
@mickaelandrieu Thanks for the tip, I'll have to try that. I have noticed that if I'm running in a loop for a long time I'll start to accumulate a lot of orphaned PhantomJS instances (from times when something crashes). Maybe this would fix that. |
@mickaelandrieu Is it possible to use your script in a loop and if yes do we have to start a new phantom instance somehow after that? How? |
@cptX can you post your question on the mailing list?
Hi mickaelandrieu, as I'm new here I don't understand what you mean by posting to the mailing list.
OK I opened an issue here: https://groups.google.com/forum/#!topic/casperjs/h-KyQWYXFfY |
@n1k0 can you take a look at this? I will try to review the other waiting PRs.
I just hit the memory issue on a scraper, would definitely love to see this merged in 👍 |
});

test.assert(casper.started, 'Casper.start() started');
Can you remove these unused lines?
Please add this - it just fixed a major bug for me and would love to have it in the main branch 👯 👍 |
Looks ok to me, too. @n1k0 @paazmaya @mickaelandrieu @hexid any reason to still keep this open? |
Let's do this |
refs #803 - Add option to create a new WebPage instance from casper
At last 👍 |
@istr, any estimate of when this can be released? I'd like to start using it, as it seems critical for numerous memory issues.
I would assume this will come out in the next beta, in a couple of weeks.
Thanks for the update. I am facing fatal memory issues running on Windows (BTW, on Mac it is much more stable) and I am very eager to have this fix. I would be very grateful if you could guide me on how to apply the fix in my code. Can I do it in my program, or should I download the latest CasperJS code and compile it into a program?
I'll try to review the change-set against beta 5 and publish beta 6 next weekend. |
@istr and my students will enjoy this new version as well :) Thank you! |
@istr I really appreciate this |
@ilangv @cptX When I try to use newPage(), Task Manager shows too many TCP connections and the application hangs. I'm not sure why, but this might be the reason: `casper.start(); var json = require('list.json');
@NghiemDuongHung Please open a new issue with your finding. Thanks. |