Different behavior between { headless: false } and { headless: true } #665
Comments
There could be any number of things going on. They could be looking for the If it works in non-headless and fails in headless then the site itself is doing something to prevent automated access. So you'd need to figure out what that is and work around it or move on. Some things are easy to get around (like modifying the UA string) while others are non-trivial to bypass. |
I am also facing the same issue. When Headless is
When Headless is
|
Can anyone provide an actual example file to run that reproduces this issue? |
I will try to find something public that I can post. My example is confidential so I can't share it. |
@Garbee just FYI, I'm setting the UA, so I don't think that's it. And I'm performing things like delays and mouse movement etc. Since the only difference is the Are there other kinds of debugging maybe that can help point to where an issue might be? |
@Garbee Here is the code. This happens only for localhost. If I give an actual website URL (http://www.google.com... etc), it works for both options.
Output: Expected output: If headless is false I am getting the expected output. |
I'll thicken the plot. I've started debugging the POST requests to my amazon login. When headless is set to true, Amazon is making an additional POST request that I don't recognize. That doesn't exist when headless is set to false. So that says to me something else is changing with this setting that I don't yet know. |
I've also inspected the request and response for both headless and non-headless. They seem to be identical in nature. |
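One way to compare what the browser sends in each mode is to record every outgoing request and diff the two runs. The following is a minimal sketch, not anything from the Puppeteer codebase: `attachRequestLogger` and `diffHeaderNames` are hypothetical helper names, and it assumes a Puppeteer `page` object, whose `request` event hands the listener an object with `method()`, `url()` and `headers()`.

```javascript
// Hypothetical helper: record every request the page makes.
// Assumes `page` is a Puppeteer Page (anything with .on('request', fn) works).
function attachRequestLogger(page, log = []) {
  page.on('request', (request) => {
    log.push({
      method: request.method(),
      url: request.url(),
      headers: request.headers(),
    });
  });
  return log;
}

// Hypothetical helper: list header names present in one run but not the other.
function diffHeaderNames(headersA, headersB) {
  const namesA = new Set(Object.keys(headersA));
  const namesB = new Set(Object.keys(headersB));
  return {
    onlyInA: [...namesA].filter((name) => !namesB.has(name)).sort(),
    onlyInB: [...namesB].filter((name) => !namesA.has(name)).sort(),
  };
}
```

Running the same script once with headless: true and once with headless: false, then passing the headers of matching requests to `diffHeaderNames`, should show whether a header is missing in one of the modes.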
Remember the protocol is required for urls in @LoganDark that is a different issue completely. Please file your own for triage and discussion. |
Different issue? Well, I didn't know that because of the title. |
Reading the issue description as well, nothing stands out to me that would make my issue completely different. Here are the parts that made me think my issue did belong here:
|
@Garbee Yes, giving the protocol in goto solves the issue.
If I don't give the protocol for google.com, I get an error
|
@LoganDark Sorry about the poorly worded title for the issue. There is nothing I can do about that. Your issue is with screenshot functionality while this was opened about some navigational problems. They are entirely distinct issues. Therefore a new issue is required to focus on your problem. @kaushik-sundar Throwing an error for a missing protocol is a good idea IMO. I'll need to look into it though, as it could be non-trivial to set up well due to the number of allowed protocols. |
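Until the library itself rejects such URLs, a script can guard against a missing protocol before navigating. This is a rough sketch under stated assumptions: `withProtocol` is a hypothetical helper, it only recognises `scheme://` style URLs (so `about:blank` or `data:` URLs would need separate handling), and the default of `http` is an arbitrary choice.

```javascript
// Hypothetical helper: prepend a protocol when the URL has no scheme://
// prefix. Only scheme://-style URLs are recognised (an assumption).
function withProtocol(url, defaultProtocol = 'http') {
  const hasScheme = /^[a-z][a-z\d+.-]*:\/\//i.test(url);
  return hasScheme ? url : `${defaultProtocol}://${url}`;
}

// Intended use with a Puppeteer page:
//   await page.goto(withProtocol('localhost:3000'));
```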
My apologies on the title, but I do agree that protocol issue is separate. My issue is more related to something about the request from the browser is different when headless is on vs off, causing the site in question to act differently. |
Here is a gist of the problem. With Since the problem is behind an auth wall (or rather, the act of authenticating itself) I cannot share the exact code with my own credentials. However if you have or create your own vendorcentral account you should be able to see this behavior. I wrote the code in such a way that it works for some other services as well, such as imgur. For this, just change
|
Try |
@LoganDark That didn't work either. I probably shouldn't have brought it up here at all, completely unrelated. Let's ignore it. |
@optikalefx The major change is the user agent - chrome headless identifies itself as
const puppeteer = require('puppeteer');
(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Print the user agent string that the headless browser reports.
  console.log(await page.evaluate(() => navigator.userAgent));
  await browser.close();
})();
The user agent is sent with every request as a user-agent header. If there's a need, user-agent could be changed with the
@LoganDark please, file a separate issue.
@rosshadden try overriding user-agent in your gist. If this doesn't help, please file a separate issue. |
From @Garbee:
From @Garbee again:
From @aslushnikov
Yeah, 3 times already I've been told to file a different issue. I haven't. And I won't right now. Stop telling me to. |
@aslushnikov we need to re-open this ticket IMO. I'm sorry that this issue had unrelated things in it. Setting the user-agent doesn't change anything - as in something is still different about the request. The result of that user-agent log after it's set is exactly what I set it to. Can you think of anything else that changes when headless is set to true? Something that Amazon is able to detect? Maybe something about cookies? Maybe you could guide me in the right direction in the code and I can look through it myself. As I'm unfamiliar with the codebase, some quick guidance would be very helpful. |
There are a few ways Amazon can be detecting headless access. Nothing can really be done internally about them if Amazon is implementing any techniques like this. The only primary difference is the |
@Garbee super interesting. So, why can't we just define things like language, plugins etc? I can't set things on navigator, but I can polyfill other methods to prevent detection. Maybe you guys can set the navigator settings? |
It looks like I can polyfill navigator using
|
Well I polyfilled everything in that article, and it passes all of those tests after the |
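For anyone trying the same polyfill approach, here is a minimal sketch of overriding `navigator.languages` before any page script runs. It assumes Puppeteer's `page.evaluateOnNewDocument`, which accepts a function plus serializable arguments; `installNavigatorOverrides` is a hypothetical name, and the language list is only an example. Other navigator properties could be handled the same way, and passing one check does not guarantee a site has no other detection.

```javascript
// Hypothetical helper: make navigator.languages report a non-empty list.
// Must be called before page.goto() so it applies to the new document.
async function installNavigatorOverrides(page, languages = ['en-US', 'en']) {
  await page.evaluateOnNewDocument((langs) => {
    // Runs inside the page, so `navigator` here is the page's navigator.
    Object.defineProperty(navigator, 'languages', { get: () => langs });
  }, languages);
}
```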
@aslushnikov While my gist doesn't have a UA set, setting it was the first thing @optikalefx tried when we discovered this problem. What I can do is update my gist with setting the UA and the polyfills/workarounds we have tried since. |
@optikalefx @rosshadden Chrome headless is built atop of More on chromium architecture could be found here: |
As mentioned in the article @Garbee posted the headless version does not have languages set on the navigator object. Note also that the headless version will not have languages set in its Accept-Language Header. Some sites (ASP.NET in my experience) require this header to be set. Other sites are looking for this header specifically to identify headless browsers. I copied the value from an example request generated by my normal chrome install. There is probably a more minimal setting for this header that works.
await page.setExtraHTTPHeaders({
  'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
});
|
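If the header value needs to be derived from a list of languages rather than copied from a real request, something like the following could work. It is only a sketch: `buildAcceptLanguage` and `setBrowserLikeLanguage` are hypothetical helpers, and the descending q-values in steps of 0.1 simply mimic the value quoted above rather than follow any rule Chrome is known to apply.

```javascript
// Hypothetical helper: build an Accept-Language value with descending
// q-values (an assumption that mirrors the header copied from real Chrome).
function buildAcceptLanguage(languages) {
  return languages
    .map((lang, index) => {
      if (index === 0) return lang; // first entry carries an implicit q=1
      const q = Math.max(0.1, 1 - index * 0.1).toFixed(1);
      return `${lang};q=${q}`;
    })
    .join(',');
}

// Hypothetical helper: apply the header to a Puppeteer page.
async function setBrowserLikeLanguage(page, languages = ['en-US', 'en']) {
  await page.setExtraHTTPHeaders({
    'Accept-Language': buildAcceptLanguage(languages),
  });
}
```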
@koreus7 - Solution worked for Amazon issue reported by @optikalefx |
This is an absolute pearl. Thanks for sharing the code above. |
I would also like to add that, for our implementation, we turned on 2FA and will keep it on. We have set up a number with Twilio (or a Twilio-like service) to receive the SMS code, and our login script then fetches that code from Twilio to enter into the 2FA prompt. We require this because Amazon sometimes asks for it, and rather than writing retry-sometimes code, we just always assume 2FA. |
For what it's worth I've also found that adding the following user agents override can help smooth over differences in some cases:
The UA I've provided is just an example. You can use any valid UA that matches an existing browser. |
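As a hedged sketch of a user agent override: headless Chrome reports `HeadlessChrome/<version>` where regular Chrome reports `Chrome/<version>`, so one option is to reuse the browser's own string with that marker replaced, which keeps the version consistent with the real binary. `stripHeadlessMarker` and `useNonHeadlessUserAgent` are hypothetical names; `browser.userAgent()` and `page.setUserAgent()` are assumed to be the Puppeteer calls.

```javascript
// Hypothetical helper: turn "HeadlessChrome/x.y" into "Chrome/x.y".
function stripHeadlessMarker(userAgent) {
  return userAgent.replace('HeadlessChrome/', 'Chrome/');
}

// Hypothetical helper: read the browser's UA and set the patched one on a page.
async function useNonHeadlessUserAgent(browser, page) {
  const original = await browser.userAgent();
  const patched = stripHeadlessMarker(original);
  await page.setUserAgent(patched);
  return patched;
}
```

Any fixed UA that matches an existing browser would also do; this variant just avoids hard-coding a version number.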
I noticed another difference, when in non-headless mode the address seems to change |
As @jondlm said, setting the UserAgent option makes headless selenium behave the same as non-headless selenium. Thanks. |
@koreus7 setting the languages works like a charm! |
I got it working by adding these two calls:
await page.setExtraHTTPHeaders({
  'Accept-Language': 'en-US,en;q=0.9'
});
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36');
Many thanks to @koreus7 & @jondlm; it won't work if either one of them is left out. P.S. I was trying to access this site: www.blibli.com |
I've made a fake user agent generator that works pretty fine!
function* generateUserAgent() {
let webkitVersion = 10;
let chromeVersion = 1000;
const so = [
'Windows NT 6.1; WOW64',
'Windows NT 6.2; Win64; x64',
"Windows NT 5.1; Win64; x64",
'Macintosh; Intel Mac OS X 10_12_6',
"X11; Linux x86_64",
"X11; Linux armv7l"
];
let soIndex = Math.floor(Math.random() * so.length);
while (true) {
yield `Mozilla/5.0 (${so[soIndex++ % so.length]}) AppleWebKit/537.${webkitVersion} (KHTML, like Gecko) Chrome/56.0.${chromeVersion}.87 Safari/537.${webkitVersion} OPR/43.0.2442.991`;
webkitVersion++;
chromeVersion++;
}
}
const userAgents = generateUserAgent();
// ...
await page.setUserAgent(userAgents.next().value); |
So does switching headless between true and false change the user agent and other things? |
My case is completely the opposite of the OP's situation. I got Amazon's robot check with headless: false, and got past it with headless: true. I solved this issue thanks to @koreus7. Many thanks 👍 |
Recently, I had the same experience of getting blocked for using a headless browser while scraping a popular website. Even after adding proper headers and a user agent, it didn't work. I finally used puppeteer-extra with the stealth plugin, which fixed the problem. This thread helped me a lot to figure out what could go wrong. Thanks @Garbee @optikalefx |
@Bhabaranjan19966 so this https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra with this https://www.npmjs.com/package/puppeteer-extra-plugin-stealth ? i will try, thanks. |
Yes, those are the two repositories that fixed my problem. @andreabisello |
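For reference, wiring those two packages together usually looks something like the sketch below. It assumes both `puppeteer-extra` and `puppeteer-extra-plugin-stealth` are installed; `launchStealthBrowser` is a hypothetical wrapper, and the packages are required lazily only so the snippet loads without them.

```javascript
let stealthRegistered = false;

// Hypothetical wrapper: launch a browser through puppeteer-extra with the
// stealth plugin registered once. Assumes both npm packages are installed.
async function launchStealthBrowser(launchOptions = { headless: true }) {
  const puppeteer = require('puppeteer-extra');
  const StealthPlugin = require('puppeteer-extra-plugin-stealth');
  if (!stealthRegistered) {
    puppeteer.use(StealthPlugin());
    stealthRegistered = true;
  }
  return puppeteer.launch(launchOptions);
}
```

The returned browser is used like a regular Puppeteer browser, for example `const page = await browser.newPage();`.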
I'm having this same issue with peapod.com right now. In headful mode, my program runs successfully. In headless mode, I'm screenshotting to debug and see that the link is clicked, spinner is activated, but the page never changes. How can I debug this better? @aslushnikov , could you provide me some guidance? |
The stealth mode did the trick for me too! TYVM |
None of these suggested solutions work on Mac OS X. To reproduce:
What I am trying to do, is setup testing with Puppeteer for my browser extension Spellbook. I have the first test now passing on Mac OS X (using some Finnish strings), and it is probably failing on other systems when you do |
Add this just below where page is defined. |
I am using puppeteer to make a simple automation script to log in to my google account. It works fine in headless: false mode, but with headless: true it reports that the selector was not found. |
I have the same problem on grainger.com. When I set --headless : false it works, but headless: true returns a promise handling error. Any help will be appreciated. |
None of the above methods work for me; I am still facing issues related to headless vs. normal mode. Any help will be appreciated. |
I had the same issue, and a combination of puppeteer-extra and the following lib solved it
I'm curious to know what changes there are between running as headless true vs false. When I run a login to Amazon using headless: true I get an error from Amazon via the screenshot. But when I set headless: false I watch it work just fine, no error.
So I'm trying to figure out what headless: true is doing that is different from when it's not headless.
Thanks for any suggestions.