This repository was archived by the owner on May 30, 2023. It is now read-only.

Cookiejar module #11535

Closed


jrollinson

This is an implementation of the cookie jar module described in issue #11417.

The instance static method of CookieJar was removed and the constructor was made public.
A cookiejar.js module was added that provides an API for creating a cookie jar.

To use:

var jar = require('cookiejar').create('~/cookiejar');
var page = require('webpage').create();
page.cookieJar = jar;
// Do Stuff...
page.close();
jar.close();

I would love any suggestions or comments you might have.
Hope this helps.

@camerondavison

I would like to see this supported, and I think it would be particularly useful for web driver implementations like ghost driver. What will it take to get this pull request merged in?

@ariya
Owner

ariya commented Oct 9, 2013

Has someone tried and tested this patch?

@detro
Collaborator

detro commented Dec 18, 2013

Will try to test it soon.

@detro
Collaborator

detro commented Dec 18, 2013

What is the format of the file supposed to be?
I have seen a .ini format mentioned in the code, but I haven't worked out the actual format yet.

@camerondavison

What format are you talking about? The cookie jar? This should be just making multiple cookie jars and using the already existing save/read from disk code.

@jrollinson
Author

There are a few tests in the pull request as well.

@jrollinson
Author

I would love to help in the testing of this pull request. What can I do? I couldn't find any documentation on how testing is done.

@detro
Collaborator

detro commented Jan 5, 2014

Step 1: ran it against all the PhantomJS and GhostDriver tests; all pass.

@detro
Collaborator

detro commented Jan 5, 2014

Just to double check: the current API behaviour is fully preserved, and the only difference is the extra page.cookieJar property.

Then, if users want a dedicated CookieJar for their page, they can create and attach one.

This solution is neat, let me tell you :)

static CookieJar *singleton = NULL;
if (!singleton) {
if (cookiesFile.isEmpty()) {
qDebug() << "CookieJar - Created but will not store cookies (use option '--cookies-file=<filename>' to enable persisten cookie storage)";
Collaborator

Could you please re-integrate those qDebug() messages? They do help debugging...

@detro
Collaborator

detro commented Jan 5, 2014

Aside from my comments (which might require some very small touch-ups), I think this is good to go.
I'll wait for your reply (and fixes?).

👍

@detro
Collaborator

detro commented Jan 5, 2014

Some notes.

Related issues:

Those issues will all be solved once we merge this.

@detro
Collaborator

detro commented Jan 6, 2014

PING? @jtrollinson I'd really love to merge this soon.

Please review my comments :)

@jrollinson
Author

Hi,

I replied to a few of your code comments with a couple of questions.
I'm also making the other changes.

@detro
Collaborator

detro commented Jan 6, 2014

Commented :)

@jrollinson
Author

I believe those were all the changes that were suggested. Anything missing?

@ariya
Owner

ariya commented Jan 6, 2014

Any reason why CookieJar needs to be passed in the constructor? There's a corresponding setter after all so if we don't need to make the constructor heavier, that's better.

For more details on why this is preferable, take a look at Qt API design guidelines.

@ariya
Owner

ariya commented Jan 6, 2014

This doesn't seem to merge cleanly against the latest master. Can you take a look at that?
Also, it makes sense to just squash everything into one commit.

@detro
Collaborator

detro commented Jan 6, 2014

@ariya not having the CookieJar as a constructor parameter for WebPage leaves us with 2 scenarios:

  • either a WebPage, by default, has no CookieJar (not sure that would even work, probably not!)
  • or has to fetch the default CookieJar internally by itself

But this will bring us back to the original coding pattern I had devised: the CookieJar was a singleton and so every WebPage could just "use it internally".

Instead we want the flexibility of preserving current behaviour (all pages share the default cookiejar) but also being able to get a new one (via the setter).

Makes sense?

@jrollinson
Author

The conflicts during the merge seem very straightforward. Should I push a merge?

@detro
Collaborator

detro commented Jan 6, 2014

@jtrollinson If they are simple to solve, no need. But @ariya did raise another good point: it would be good if you squashed those commits, as they represent a single logical unit.

@jrollinson
Author

@detro

That is no problem. That would require a forced git push, correct? Is that alright with you?

@jrollinson
Author

Another question, how detailed would you like the squashed commit message to be?

@detro
Collaborator

detro commented Jan 6, 2014

The git push -f will be against your own repo, in your own branch. So, no problem there.
If you push against the same branch you have generated this PR from, the PR will automatically update (at least, that's how it worked for me all the previous times I have done it).

About the message I'd make sure you are explaining the changes (what was there before and what you are introducing), the way it looks in terms of example code use and, if you want to go the extra mile, I'd add a list of the API added.

Of course, refer to the ISSUE your commit closes.
If you add a line like: fixes issue #XYZ, on merge github should also close that issue auto-magically.

@jrollinson
Author

@detro Ok, I pushed the squashed commit. Thanks for the help.

@ariya
Owner

ariya commented Jan 7, 2014

I don't think WebPage needs to fetch the default cookie jar by itself. Basically what I think is that instead of this:

m_page = new WebPage(this, m_defaultCookieJar, QUrl::fromLocalFile(m_config.scriptFile()));

make it something like this (which is like the original code BTW):

m_page = new WebPage(this, QUrl::fromLocalFile(m_config.scriptFile()));
m_page->setCookieJar(m_defaultCookieJar);

If the above can work, then it's the preferred style. See The Convenience Trap section at http://doc.qt.digia.com/qq/qq13-apis.html for details.

@jrollinson
Author

@ariya

So one reason why I decided to do it in the first way rather than the second is that the cookie jar is used during the constructor to build the m_customWebPage.
Signals in m_customWebPage are also connected to the WebPage during the constructor.

@detro
Collaborator

detro commented Jan 7, 2014

@jtrollinson what @ariya says does make sense. After re-reading the constructor code, I think you could easily change from a 1-line construct to a 2-line construct+set.

@jrollinson
Author

I suppose I do not completely see the benefit.

I understand that one could say that this makes the code easier to read by removing arguments from the constructor. However, I feel that in this case there is no good default value. Thus, for a period of time the webpage would be in an unstable state. It would be quite easy to forget to set the cookie jar, which would not be good.

@detro
Collaborator

detro commented Jan 7, 2014

Well, essentially what will happen is that the NetworkAccessManager won't give any cookie to the page, making it unable to hold a session.

But the idea is to stick to the Qt API design principles, so it's up to the user of that API to make sure it's used properly.
And we are not REALLY exposing that to JS (where our "users" are) but to C++, where just us devs work.

So, it's actually good to preserve those Qt API principles: makes it easier for other Qt devs out there to join us (if we are so lucky to get MORE contributors).

@ariya
Owner

ariya commented Jan 7, 2014

And once (in the near future) we want to have unit tests for our C++ code, it is really weird and annoying to provide a cookie jar instance every time you want to instantiate a WebPage for the tests.

@jrollinson
Author

I have pushed a commit with that change.

Once you have read through it, I'll squash it.

One thing I noticed was that on a call to setCookieJar in the NetworkAccessManager, the cookie jar passed in has its parent set to be the phantom instance.

Removing this line leads to segmentation faults.
Is this exactly what we wish the code to be doing?

@jrollinson
Author

Ok, after closer inspection I see that QNetworkAccessManager sets itself as the parent of the cookie jar. By setting the phantom singleton as the parent of the cookie jar, the cookie jar will not be deleted with the QNetworkAccessManager.

I am pushing a commit with a change to the comments reflecting that CookieJar is not a singleton anymore.

@detro
Collaborator

detro commented Jan 8, 2014

@jtrollinson Crap! That is a bit of a pickle I think.
The fact is that the CookieJar ownership, by default, was taken by the NAM (as you figured out).

This was because, when deleted, the NAM would cleanup the memory of the Jar.
That makes sense in a "normal" browser, as you want to get rid of them at the same time: 1 NAM per browser, and that NAM used 1 jar.

Now we are introducing a (potentially) much more complex scenario.
In the current PhantomJS, closing a page will:

  • free the page's memory
  • free the nam's memory
  • let the jar stay, as it can be used by other pages

This is all driven by the QObject parent-child memory handling thingy (I'm sure this thing has a name but it's 7:30am and I'm still asleep).

Now, we want that:

  • default behaviour is maintained if the cookiejar API is not used
  • ??? when the new cookiejar API is used

I marked the new behaviour ??? for the memory because, well, we might end up creating a custom-designed-memory-leaker.

If we keep the relationship like this ("jars belong to phantom"), then scripts that run MANY pages and dispose of them, and that use CookieJars for session isolation, will never have those jars deleted until Phantom is closed. => MEMORY LEAK

At this time of the day, I see 2 options for us:

  • either we add an API to the CookieJar to dispose of it (at least the user can avoid Leaks if she uses the API right)
  • or we assign ownership of every "new" cookiejar (aka the one created with this new API) to the Page they are assigned, while keeping the "default" cookiejar to Phantom

The second option though is bad for a couple of reasons

  1. can still leak memory (i.e. jar created but not assigned to a page)
  2. it would prevent cookiejar reuse across page instances

Sorry if this got too long.
What do you think?

@jrollinson
Author

My idea was to use cookieJar.close() to delete cookie jars, because the user is the only one who knows if the cookie jar is going to be used again in the future.

I think that Phantom in this situation would still be the best parent.

@jrollinson
Author

From what I understand, the way it is set up now works well.

If the CookieJar API is not used, then the default cookie jar will not be deleted for the entire life of the program.

If the CookieJar API is used, then new cookie jar instances have this cycle:

var jar = require('cookiejar').create();
// do stuff...
jar.close();

jar.close() calls deleteLater on the cookie jar, meaning that the cookie jar will be deleted.
We have to rely on the user to delete user-created cookie jars: a cookie jar could be used by multiple web pages, so we cannot make any single web page the parent of the cookie jar.

This is also similar to how web pages are handled.
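
The lifecycle described above could be wrapped roughly as follows. This is only a sketch, assuming the `cookiejar` and `webpage` modules behave as described in this pull request. The helper name `withIsolatedJar` is hypothetical and not part of PhantomJS, and the module loader is passed in as a parameter (inside a PhantomJS script it would be `require`) so the pattern can be tried with stubs.

```javascript
// Hypothetical helper (not part of PhantomJS): gives `work` a page with its
// own cookie jar and a `done` callback that releases both exactly once.
// `load` is the module loader; inside a PhantomJS script pass `require`.
function withIsolatedJar(load, work) {
    var jar = load('cookiejar').create();
    var page = load('webpage').create();
    page.cookieJar = jar;

    var released = false;
    function release() {
        if (released) {
            return; // already closed, nothing left to do
        }
        released = true;
        // The jar is not tied to the page's lifetime, so the script has to
        // close it explicitly once no page needs it anymore.
        page.close();
        jar.close();
    }

    work(page, release);
    return release;
}

// Intended use inside a PhantomJS script (left as a comment because it
// needs the real modules):
//
//   withIsolatedJar(require, function (page, done) {
//       page.open('http://example.com', function () { done(); });
//   });
```

Passing `done` into the callback, rather than closing right after `work` returns, matters because page loads are asynchronous: the jar must stay alive until the page has finished with it.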

@detro
Collaborator

detro commented Jan 9, 2014

I agree. This seems to be the safest scenario. Ownership of jars remains with Phantom BUT the user can delete them if desired.

@ariya ?

@ariya
Owner

ariya commented Jan 9, 2014

The lack of reference counting information through the QtWebKit JS binding is the problem here. I'm fine with the solution of an explicit close if the user needs to release the memory. Care must be taken that a repeated call to close is safeguarded against, so that it cannot e.g. cause a crash.
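
One way to picture the safeguard against repeated close calls is a small script-side guard. This is a sketch only: `closeOnce` is a hypothetical helper, not part of PhantomJS, and it assumes nothing about the jar beyond a `close()` method. In the patch itself the protection may well belong on the C++ side instead.

```javascript
// Hypothetical helper (not part of PhantomJS): returns a function that calls
// target.close() the first time it is invoked and does nothing afterwards.
function closeOnce(target) {
    var closed = false;
    return function () {
        if (closed) {
            return false; // already released; leave the underlying object alone
        }
        closed = true;
        target.close();
        return true;
    };
}

// Intended use inside a PhantomJS script (left as a comment because it
// needs the real module):
//
//   var jar = require('cookiejar').create();
//   var closeJar = closeOnce(jar);
//   closeJar(); // closes the jar
//   closeJar(); // no-op
```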

@jrollinson
Author

Agreed.
cookieJar.close() simply calls deleteLater. I based that on the webpage.close() function. If that is all, this should be ready to be squashed and merged.

@ariya
Owner

ariya commented Jan 9, 2014

To ensure the cross-ref, we prefer to have the issue link in every commit. I can definitely add them as I land your patches, unless you want to do it yourself.

@jrollinson
Author

I was planning on squashing the three commits after being given the go-ahead.

Would you prefer them to be separated?

@detro
Collaborator

detro commented Jan 10, 2014

Squash away :)

@ariya
Owner

ariya commented Jan 10, 2014

Go for it!

@jrollinson
Author

All merged!

@ariya
Owner

ariya commented Jan 10, 2014

Looks good!

@detro Any further comment before I land this?

@detro
Collaborator

detro commented Jan 10, 2014

@ariya I think we now have the best compromise between usability and stability. The worst case scenario: the user keeps creating jars without closing them (memory consumption goes up), but stability is preserved.

I'm eager to add support for this in GhostDriver and see if it can stand parallel testing.
:)

@ariya
Owner

ariya commented Jan 10, 2014

Can you rebase it against the current master? Your parent commit is a bit outdated and the merge causes some conflicts (and I'd rather not make a mistake by resolving them myself).

Previously, there was a single global cookie jar shared between all web pages.
Now, one can have separate cookie jars for different web pages.

Makes CookieJar a normal class, not a singleton.
Moves many public CookieJar methods to public slots.
Adds default cookie jar to Phantom.
Adds the CookieJar module that provides access to cookie jars in javascript.
Adds cookie jar module tests.

Usage:
var jar = require('cookiejar').create();
var webpage = require('webpage').create();
webpage.cookieJar = jar;
...
webpage.close();
jar.close();

JS API changes:
Webpage:
    var jar = page.cookieJar; -- assigns 'jar' the given webpage's cookie jar.
    page.cookieJar = jar; -- sets 'jar' as the given webpage's cookie jar.
CookieJar:
    var jar = require('cookiejar').create(path)
        creates a cookie jar with persistent storage at the given file path
        (path not mandatory).
    var cookies = jar.cookies; -- assigns 'cookies' the list of cookies in the
        cookie jar.
    jar.cookies = [c1, c2]; -- sets the cookie jar's cookies as the ones in the
        list.
    jar.addCookie(cookie) -- adds cookie 'cookie' to the cookie jar.

fixes issue ariya#11417
@jrollinson
Author

Ok, it's now rebased to master.

@ariya
Owner

ariya commented Jan 11, 2014

Landed. Thank you very much @jtrollinson!

@ariya ariya closed this Jan 11, 2014
detro added a commit to detro/ghostdriver that referenced this pull request Jan 12, 2014
Finally an API to support multi-cookie jar is available in PhantomJS:
ariya/phantomjs#11535.

This allows us to create a completely new CookieJar every time
we create a Session.

A version of PhantomJS with the new CookieJar API
hasn't been released yet: once the commit linked above
is part of a stable release, this will work.
Otherwise, stick with GhostDriver 1.1.0.

Fixes #170.
@rpoisel

rpoisel commented Jan 21, 2014

Hi,

thank you for providing us with PhantomJS! Awesomely easy to use and very quick.

Where is this required cookiejar module located? Whenever I want my crawljax to use the PhantomJS driver, it keeps failing because it cannot find this module:

java -jar crawljax-cli-3.4.jar -b PHANTOMJS 'http://www.somewebpage.com' /tmp/output
PHANTOMJS
PHANTOMJS
Jan 21, 2014 3:06:44 PM org.openqa.selenium.phantomjs.PhantomJSDriverService <init>
INFO: executable: /tmp/phantomjs-1.9.6-linux-x86_64/bin/phantomjs
Jan 21, 2014 3:06:44 PM org.openqa.selenium.phantomjs.PhantomJSDriverService <init>
INFO: port: 24433
Jan 21, 2014 3:06:44 PM org.openqa.selenium.phantomjs.PhantomJSDriverService <init>
INFO: arguments: [--webdriver=24433, --webdriver-logfile=/tmp/crawljax-cli-3.4/phantomjsdriver.log]
Jan 21, 2014 3:06:44 PM org.openqa.selenium.phantomjs.PhantomJSDriverService <init>
INFO: environment: {}
PhantomJS is launching GhostDriver...
[INFO  - 2014-01-21T15:06:45.733Z] GhostDriver - Main - running on port 24433
[ERROR - 2014-01-21T15:06:46.204Z] RouterReqHand - _handle.error - {"message":"'undefined' is not a function (evaluating 'require('cookiejar').create()')","line":104,"sourceId":140258495098240,"sourceURL":":/ghostdriver/session.js","stack":"TypeError: 'undefined' is not a function (evaluating 'require('cookiejar').create()')\n    at :/ghostdriver/session.js:104\n    at :/ghostdriver/request_handlers/session_manager_request_handler.js:75\n    at :/ghostdriver/request_handlers/session_manager_request_handler.js:44\n    at :/ghostdriver/request_handlers/router_request_handler.js:70","stackArray":[{"sourceURL":":/ghostdriver/session.js","line":104},{"sourceURL":":/ghostdriver/request_handlers/session_manager_request_handler.js","line":75},{"sourceURL":":/ghostdriver/request_handlers/session_manager_request_handler.js","line":44},{"sourceURL":":/ghostdriver/request_handlers/router_request_handler.js","line":70}]}
Guice provision errors:

1) Error in custom provider, org.openqa.selenium.UnsupportedCommandException: TypeError - 'undefined' is not a function (evaluating 'require('cookiejar').create()')
Command duration or timeout: 726 milliseconds
Build info: version: '2.37.1', revision: 'a7c61cbd68657e133ae96672cf995890bad2ee42', time: '2013-10-21 09:08:07'
System info: host: 'N/A', ip: 'N/A', os.name: 'Linux', os.arch: 'amd64', os.version: '3.12-1-amd64', java.version: '1.7.0_51'
Driver info: org.openqa.selenium.phantomjs.PhantomJSDriver
  while locating com.crawljax.browser.WebDriverBrowserBuilder
  while locating com.crawljax.browser.EmbeddedBrowser
    for parameter 0 at com.crawljax.core.CrawlerContext.<init>(CrawlerContext.java:32)
  while locating com.crawljax.core.CrawlerContext
    for parameter 0 at com.crawljax.core.Crawler.<init>(Crawler.java:72)
  while locating com.crawljax.core.Crawler
    for parameter 2 at com.crawljax.core.CrawlTaskConsumer.<init>(CrawlTaskConsumer.java:30)
  while locating com.crawljax.core.CrawlTaskConsumer

1 error

Same for my PhantomJS 1.9.6 installation:

hostname:/tmp/phantomjs-1.9.6-linux-x86_64/bin% ./phantomjs 
phantomjs> var jar = require('cookiejar').create('~/cookiejar');
Cannot find module 'cookiejar'

Thanks for your answer,
Rainer

@ariya
Owner

ariya commented Jan 21, 2014

@rpoisel This is not available in 1.9. PLEASE use the mailing list for support and not this issue tracker.

@rpoisel

rpoisel commented Jan 21, 2014

Thank you for your quick answer! It seems that there is something broken in the 1.9.6 release because in session.js of the included GhostDriver the cookiejar module is already being used. If you want I can also post this message to a suitable mailing list of your project.
