feat(puppeteer): enable new headless mode #1910

Merged
B4nan merged 3 commits into master from headless-new on Nov 15, 2023
B4nan
Member

@B4nan B4nan commented May 11, 2023

https://developer.chrome.com/articles/new-headless/

To opt out of it and keep using the old headless mode, add headless: 'old' to your Puppeteer crawler options.
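
A minimal sketch of that opt-out. The nesting under launchContext.launchOptions is an assumption, since the description only says "puppeteer crawler options". The crawlee import and constructor call are left as comments so the snippet runs on its own.

```javascript
// Sketch only: placing the option under launchContext.launchOptions is an assumption.
const crawlerOptions = {
    launchContext: {
        launchOptions: {
            // Opt out of the new default and keep the old headless implementation.
            headless: 'old',
        },
    },
};

// With crawlee installed, the object would be passed to the crawler:
// const { PuppeteerCrawler } = require('crawlee');
// const crawler = new PuppeteerCrawler(crawlerOptions);
console.log(crawlerOptions.launchContext.launchOptions.headless);
```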

@B4nan B4nan marked this pull request as ready for review May 11, 2023 12:45
@Strajk
Contributor

Strajk commented May 15, 2023

Technically, the old Headless was a separate, alternate browser implementation that happened to be shipped as part of the same Chrome binary. It doesn’t share any of the Chrome browser code in //chrome.

Wow, this new, proper headless mode could solve a lot of small issues 🔥

@B4nan
Member Author

B4nan commented May 22, 2023

We will probably keep this for later. From what I've read, the new headless mode can be much slower. I haven't seen that in my initial testing, but we need to be careful here. We will probably make this configurable from outside, so you can opt in to the new headless mode and try it yourself. Right now it will probably fail on some ow validation that requires a boolean, but maybe you could hack it through the launcher args.
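
An untested sketch of the launcher-args workaround mentioned above. It assumes that keeping headless as a boolean satisfies the validation and that Chrome's --headless=new switch, passed through the Puppeteer args launch option, takes effect. Neither assumption is confirmed in this thread.

```javascript
// Sketch only: not verified against Crawlee's validation or Chrome's switch handling.
const crawlerOptions = {
    launchContext: {
        launchOptions: {
            // Kept as a boolean so a boolean-only validation would accept it.
            headless: true,
            // Chrome's command-line switch for the new headless mode.
            args: ['--headless=new'],
        },
    },
};

// With crawlee installed, the object would be passed to the crawler:
// const { PuppeteerCrawler } = require('crawlee');
// const crawler = new PuppeteerCrawler(crawlerOptions);
console.log(crawlerOptions.launchContext.launchOptions.args.join(' '));
```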

@mtrunkat mtrunkat added the t-tooling Issues with this label are in the ownership of the tooling team. label Jul 18, 2023
@Dineshhardasani
Contributor

Any update on this?

@B4nan
Member Author

B4nan commented Aug 1, 2023

Nope, same as my last comment. But I think you can already opt in to the new headless mode; this PR only changes the default.

Something like this should work:

const crawler = new PuppeteerCrawler({
    launchContext: {
        launchOptions: {
            headless: 'new',
        },
    },
    // ...
});

@Dineshhardasani
Contributor

I have tried headless: 'new', but it uses a lot of memory. Have you tried benchmarking it against headless: true? @B4nan

@abhisheksurve45

Can we test this out, @B4nan? We are facing similar memory issues with headless: 'new'.

@B4nan
Member Author

B4nan commented Aug 10, 2023

I feel like you guys misunderstood this completely. This PR enables the new headless mode by default. That is something you can already do yourself, as I showed in my last comment. And as I already said, the new headless mode has a significant memory impact, so we don't want to adopt it just yet.

Saying you have memory issues with the new headless mode just confirms what I said already - this is not something we can fix anyhow.

@github-actions github-actions bot added this to the 76th sprint - Tooling team milestone Nov 15, 2023
@B4nan B4nan added the adhoc Ad-hoc unplanned task added during the sprint. label Nov 15, 2023

@github-actions github-actions bot left a comment

⚠️ Pull Request Toolkit has failed!

Pull request is neither linked to an issue or epic nor labeled as adhoc!

@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label Nov 15, 2023
@B4nan
Member Author

B4nan commented Nov 15, 2023

Potential cause of the broken tests: puppeteer/puppeteer#10017

Things are working fine locally, so it might be a Linux issue (but that's the main target for running Crawlee in the cloud).

@B4nan B4nan merged commit 7fc999c into master Nov 15, 2023
8 checks passed
@B4nan B4nan deleted the headless-new branch November 15, 2023 16:15
Labels
adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.
6 participants