Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cookies not saving to userDataDir - call chromedp.Cancel() to gracefully close the browser #818

Closed
Jleagle opened this issue May 11, 2021 · 7 comments

Comments

@Jleagle
Copy link

Jleagle commented May 11, 2021

What versions are you running?

chromedp v0.6.10
chrome 90.0.4430.93
go go1.16.3 darwin/amd64

What did you do? Include clear steps.

I am requesting a page that sets cookies. I have set the user data directory and i see it creates some caches and a Cookies db. Having a look inside the Cookies db, i see a cookies table, but no rows, the other tables have rows saved.

What did you expect to see?

I expected any saved cookies to be saved into the Cookies db and any following requests to use the Cookies db.

What did you see instead?

I have yet to see anything saved to the cookies table. If i call network.GetAllCookies() it does return cookies. I am having to save them at the end of the request and set the at the start of the next request using network.SetCookie()

Here are my actions: https://github.com/Jleagle/gym-tracker/blob/master/backend/scraper.go#L106

Thanks.

@ZekeLu
Copy link
Member

ZekeLu commented May 11, 2021

Can you start the browser from command line with --user-data-dir=your-dir, and see whether the cookies are stored without chromedp?

I guess the reason is that the browser won't store session cookies.

@Jleagle
Copy link
Author

Jleagle commented May 12, 2021

Tried running the regular version of Chrome with --user-data-dir=your-dir and it filled the Cookies db up with cookies. A session cookie and the rest third party cookies.

I was not sure if Chrome actually saved session cookies to disk, but even if it did not, i am trying to use the same Chrome instance so thought it would keep the session from the last request. Thanks.

@ZekeLu
Copy link
Member

ZekeLu commented May 13, 2021

It took some time to figure out what's wrong.

TL;DR: Just call chromedp.Cancel(ctx) to gracefully close the browser so that the browser has the chance to store the cookies.

In fact, it's well documented:

chromedp/chromedp.go

Lines 181 to 195 in 864094d

// Cancel cancels a chromedp context, waits for its resources to be cleaned up,
// and returns any error encountered during that process.
//
// If the context allocated a browser, the browser will be closed gracefully by
// Cancel. A timeout can be attached to this context to determine how long to
// wait for the browser to close itself:
//
// tctx, tcancel := context.WithTimeout(ctx, 10 * time.Second)
// defer tcancel()
// chromedp.Cancel(tctx)
//
// Usually a "defer cancel()" will be enough for most use cases. However, Cancel
// is the better option if one wants to gracefully close a browser, or catch
// underlying errors happening during cancellation.
func Cancel(ctx context.Context) error {

@Jleagle Please let me know if it works for you.

The long story

First, I reproduced the issue with this reproducer:

package main

import (
	"context"
	"log"
	"net/http"
	"net/http/httptest"
	"time"

	"github.com/chromedp/chromedp"
)

func main() {
	s := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		log.Printf("received cookies: %v", r.Cookies())

		v := time.Now().Format(time.RFC3339Nano)
		log.Println("cookie value to set:", v)
		http.SetCookie(w, &http.Cookie{
			Name:  "s-cookie",
			Value: v,
		})
		http.SetCookie(w, &http.Cookie{
			Name:    "p-cookie",
			Value:   v,
			Expires: time.Now().Add(24 * time.Hour),
		})
	}))
	defer s.Close()

	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		chromedp.UserDataDir("/tmp/cdp"),
	)
	ctx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancel()
	ctx, cancel = chromedp.NewContext(ctx,
		//chromedp.WithDebugf(log.Printf),
	)
	defer cancel()

	if err := chromedp.Run(
		ctx,
		chromedp.Navigate(s.URL),
	); err != nil {
		log.Println(err)
	}
}

Then I checked https://github.com/puppeteer/puppeteer. It seems that puppeteer's users are frustrated with this feature too (puppeteer/puppeteer#921). And it pointed me to https://github.com/microsoft/playwright.

Interestingly, playwright works! I copied playwright's command line arguments passed to the browser, but with no luck. Then I checked the protocol messages of 'playwright`, nothing special.

During the process of testing, the reproducer did store cookies sometimes. So I turned back to it. Maybe the browser was killed too fast that it did not have the chance to store the cookies? I added time.Sleep() to the end of the reproducer, and luckily, the cookies got stored! 🎉 🎉 (In fact, I almost missed it; because it does not work if the duration is less than 31 seconds). And it becomes easy now. I just need to find a way to gracefully close the browser. So here comes crhomedp.Cancel().

@ZekeLu ZekeLu changed the title Cookies not saving to userDataDir Cookies not saving to userDataDir - call chromedp.Cancel() to gracefully close the browser May 13, 2021
@ZekeLu ZekeLu closed this as completed May 17, 2021
@Jleagle
Copy link
Author

Jleagle commented May 18, 2021

Hey @ZekeLu, sorry took a while to reply.

So Cancel() does save the cookies to disk. Except session cookies which is to be expected.

It seems odd to me that you need to do this though as it's not how normal Chrome behaves. The cookies get saved to sqlite instantly etc.

I was aiming to only launch Chrome once and reuse the process, just with different cookies on each request, so i don't think calling Cancel() is right for me. Plus i want to keep session cookies for the next request.

So is saving the cookies myself using GetAllCookies() & SetCookie() the only way to persist session cookies?

Thanks.

@ZekeLu
Copy link
Member

ZekeLu commented May 18, 2021

Except session cookies which is to be expected.

Yeah, session cookies won't be stored. After all, it's session cookies.

as it's not how normal Chrome behaves. The cookies get saved to sqlite instantly etc.

That's not the case. Chrome will store the cookies periodically (maybe every 30 seconds?). And when you start Chrome manually, and then close it, you do a gracefully shutdown, and Chrome gets the last chance to store unsaved cookies.

Plus i want to keep session cookies for the next request.

If you share the same browser instance, the session cookies will be shared.

I just checked your code. It's incorrect to create an ExecAllocator from a browser context (baseContext in your code). The following snippet shows how to share cookies in a single browser instance.

// ...
opts := append(chromedp.DefaultExecAllocatorOptions[:],
	chromedp.DisableGPU,
	chromedp.UserDataDir(abs+"/user-data/"+credential.Email),
)
allocatorCtx, allocatorCancel := chromedp.NewExecAllocator(context.Background(), opts...)
// it depends how will you manage the lifecycle of the browser, maybe you don't want to call allocatorCancel() here
defer allocatorCancel()
browserCtx, browserCancel := chromedp.NewContext(allocatorCtx)
// it depends how will you manage the lifecycle of the browser, maybe you don't want to call browserCancel() here
defer browserCancel()

// now start a browser
if err := chromedp.Run(browserCtx); err != nil {
	log.Fatal(err)
}

// now browserCtx can be used directly
// I will omit the timeout context here
if err := chromedp.Run(browserCtx,
	// tasks
); err != nil {
	log.Fatal(err)
}
// or create new tabs in the browser
tabCtx, cancel := chromedp.NewContext(browserCtx)
defer cancel()
// I will omit the timeout context here
if err := chromedp.Run(tabCtx,
	// tasks
); err != nil {
	log.Fatal(err)
}
// All the tasks above will share the cookies
// ...

@ZekeLu ZekeLu reopened this May 18, 2021
@Jleagle
Copy link
Author

Jleagle commented May 21, 2021

Thanks, makes more sense now.

I have rearranged my contexts to match your post above, but this does mean i will have to create a new browser every time i change the userDataDir, is this correct? Unless I am mistaken it has to be this way as the base context is created from chromedp.NewExecAllocator() which is where the userDataDir is set.

@ZekeLu
Copy link
Member

ZekeLu commented May 21, 2021

but this does mean i will have to create a new browser every time i change the userDataDir, is this correct?

Correct. But the reason behind that is that a single browser instance can not have multiple --user-data-dir. If you need to separate the data, you have to create multiple browsers. It's not chromedp.NewExecAllocator() that introduces the constraint, it just reflect the behavior of the browser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants