Skip to content

fix: merge cookies provided by user with session cookies#1201

Merged
B4nan merged 9 commits intomasterfrom
cookie-merging
Oct 7, 2021
Merged

fix: merge cookies provided by user with session cookies#1201
B4nan merged 9 commits intomasterfrom
cookie-merging

Conversation

@B4nan
Copy link
Copy Markdown
Member

@B4nan B4nan commented Oct 6, 2021

Session cookies are no longer saved on the Request object, but passing via requestAsBrowserOptions instead. The cookie value gets calculated based on the session cookie and Request object - cookies are merged. preNavigationHooks are now fired after this merging happens, so with requestAsBrowserOptions filled with the cookie header.

If two cookies with similar name (where only casing differs), we print a warning, but merge them as two distict cookies.

The request.headers.cookies is still possible to override via pre-nav hooks, its value difference will be merged back to the requestAsBrowserOptions after pre-nav hooks are executed. This allows to work with both request.headers and requestAsBrowserOptions.headers inside the pre-nav hooks.

Closes #1197

Comment thread src/crawlers/cheerio_crawler.js Outdated
Comment thread src/crawlers/cheerio_crawler.js Outdated
B4nan added 4 commits October 6, 2021 17:24
This way we don't need to care about merging with `requestAsBrowserOptions`,
as those will be guaranteed to be empty, and it also gives us the final
`requestAsBrowserOptions` in the pre-nav hooks (e.g. with the session cookie).
@B4nan B4nan requested review from mnmkng and szmarczak October 6, 2021 16:31
@corford
Copy link
Copy Markdown
Contributor

corford commented Oct 6, 2021

@B4nan One quick comment (not sure how much it matters): _getRequestOptions() currently will overwrite any changes to request.headers.Cookie made in a preNavHook, with the contents of requestAsBrowserOptions.headers.Cookie.

I guess altering crawlingContext.request.headers in a preNavHook (which is what we currently do) should be considered an anti-pattern? i.e. header manipulation should always be done on requestAsBrowserOptions.headers ?

So far we've been doing it on crawlingContext.request.headers because it's the same access pattern for PuppeteerCrawler preNavHooks

@B4nan
Copy link
Copy Markdown
Member Author

B4nan commented Oct 6, 2021

Thanks for the input, this would indeed be quite a breaking change, I will try to implement support for both. Should be enough to diff the cookie from context.request.headers after hooks are executed, and merge it back to requestAsBrowserOptions.

Comment thread src/crawlers/crawler_utils.js Outdated
Copy link
Copy Markdown
Contributor

@szmarczak szmarczak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm except that cookie names are case sensitive

B4nan added 4 commits October 7, 2021 09:42
Instead of merging cookies with similar names (where only casing differs), we now
follow the spec and merge them as separate cookies, but warn (via `log.deprecated()`
to have only a single warning printed for each error) when we found such cookies.
@B4nan B4nan merged commit cccea78 into master Oct 7, 2021
@B4nan B4nan deleted the cookie-merging branch October 7, 2021 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CheerioCrawler overwrites request cookies when persistCookiesPerSession is false

4 participants