Different behavior between { headless: false } and { headless: true } #665

optikalefx · 2017-09-02T19:38:17Z

I'm curious to know what changes there are between running as headless true vs false. When I run a login to Amazon using headless: true I get an error from Amazon via the screenshot. But when I set headless: false I watch it work just fine, no error.

So I'm trying to figure out what headless: true is doing that is different from when it's not headless.

Thanks for any suggestions.


Garbee · 2017-09-02T20:00:17Z

There could be any number of things going on. They could be looking for the Headless added to the UA string and blocking that. Or they could be using some techniques to detect automated access and prevent it.

If it works in non-headless and fails in headless then the site itself is doing something to prevent automated access. So you'd need to figure out what that is and work around it or move on. Some things are easy to get around (like modifying the UA string) while others are non-trivial to bypass.

kaushiksundar · 2017-09-02T23:38:20Z

I am also facing the same issue.

When Headless is false

page url ===> http://lvh.me:3000/dashboard

When Headless is true

page url ===> about:blank
(node:29206) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1)

Garbee · 2017-09-03T00:10:17Z

Can anyone provide an actual example file to run that reproduces this issue?

optikalefx · 2017-09-03T02:16:20Z

I will try to find something public that I can post. My example is confidential so I can't share it.

optikalefx · 2017-09-03T02:19:14Z

@Garbee just FYI, I'm setting the UA, so I don't think that's it. And I'm performing things like delays and mouse movement etc. Since the only difference is the headless: true it leads me to believe that there is something going on in the lib, and not on the site that I'm scraping. But I will keep trying and hopefully will find an example to post.

Are there other kinds of debugging maybe that can help point to where an issue might be?

kaushiksundar · 2017-09-03T03:49:27Z

@Garbee Here is the code. This happens only for localhost; if I give an actual website URL (http://www.google.com... etc) it works with both options.

const browser = await puppeteer.launch({headless: true});
const page = await browser.newPage();
await page.goto('localhost:3000', {
  networkIdleTimeout: 1000,
  waitUntil: 'networkidle',
  timeout: 3000000
});
console.log(page.url());

Output:
about:blank

Expected output:
localhost:3000

If headless is false I am getting the expected output.

optikalefx · 2017-09-03T04:27:38Z

I'll thicken the plot. I've started debugging the POST requests to my amazon login. When headless is set to true, Amazon is making an additional POST request that I don't recognize. That doesn't exist when headless is set to false. So that says to me something else is changing with this setting that I don't yet know.

optikalefx · 2017-09-03T04:55:36Z

I've also inspected the request and response for both headless and non-headless. They seem to be identical in nature.

LoganDark · 2017-09-03T08:05:02Z

In non-headless mode, screenshots work differently because my screen is in HiDPI mode (MacBook Retina). Here's one of the 'different' screenshots:

Garbee · 2017-09-03T10:10:23Z

Remember the protocol is required for urls in goto.

@LoganDark that is a different issue completely. Please file your own for triage and discussion.

LoganDark · 2017-09-03T10:21:40Z

Different issue? Well, I didn't know that because of the title.

LoganDark · 2017-09-03T10:22:54Z

Reading the issue description as well, nothing stands out to me that would make my issue completely different. Here are the parts that made me think my issue did belong here:

I'm curious to know what changes there are between running as headless true vs false.

So I'm trying to figure out what headless: true is doing that is different from when it's not headless.

kaushiksundar · 2017-09-03T11:56:15Z

@Garbee Yes giving the protocol in goto solves the issue.

await page.goto('http://localhost:3000', {
  networkIdleTimeout: 1000,
  waitUntil: 'networkidle',
  timeout: 3000000
});
console.log(page.url());

If I omit the protocol for google.com, I get the error "Error: Protocol error (Page.navigate): Cannot navigate to invalid URL undefined", whereas for localhost:3000 without the protocol I get about:blank. The error handling is done differently for localhost. Shouldn't it give the protocol error as well?

await page.goto('www.google.com', {
  networkIdleTimeout: 1000,
  waitUntil: 'networkidle',
  timeout: 3000000
});
console.log(page.url());

Garbee · 2017-09-03T14:53:04Z

@LoganDark Sorry about the poorly worded title for the issue. There is nothing I can do about that. Your issue is with screenshot functionality while this was opened about some navigational problems. They are entirely distinct, separate issues. Therefore a new issue is required to focus on your problem.

@kaushik-sundar Throwing an error for a missing protocol is a good idea IMO. I'll need to look into it though, as it could be non-trivial to set up well due to the number of allowed protocols.
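
Until such a check exists, a caller-side guard could look roughly like the sketch below. The helper names `withHttpProtocol` and `describeRawUrl` and the default `http` scheme are assumptions made for illustration; none of them are Puppeteer APIs. The second helper also hints at why the two inputs behave differently: the WHATWG URL parser accepts `localhost:3000` as a URL whose scheme is `localhost`, while `www.google.com` does not parse at all, which may be why only the latter is rejected as invalid.

```javascript
// Hypothetical helper (not a Puppeteer API): prepend a scheme when the
// caller passed a bare host such as 'localhost:3000' or 'www.google.com'.
function withHttpProtocol(input, defaultScheme = 'http') {
  // A "scheme://" prefix means the caller already supplied a protocol.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(input);
  return hasScheme ? input : `${defaultScheme}://${input}`;
}

// Reports how the WHATWG URL parser (the `URL` global in Node.js)
// interprets the raw, unmodified input.
function describeRawUrl(input) {
  try {
    // 'localhost:3000' parses, with "localhost:" taken as the scheme.
    return new URL(input).protocol;
  } catch (err) {
    // 'www.google.com' has no scheme, so parsing fails.
    return 'invalid';
  }
}

// Usage with Puppeteer (assumes `page` came from browser.newPage()):
//   await page.goto(withHttpProtocol('localhost:3000'));
```

Note that the regular expression only recognises `scheme://` forms, so inputs such as `about:blank` or `data:` URLs would need separate handling.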

optikalefx · 2017-09-03T15:27:23Z

My apologies for the title, but I do agree that the protocol issue is separate. My issue is that something about the browser's request differs when headless is on vs off, causing the site in question to act differently.

rosshadden · 2017-09-03T16:36:58Z

Here is a gist of the problem. With params.isHeadless as false the browser opens and the form successfully logs in, whereas with it true I get an auth error page (which I actually cannot replicate through normal means no matter what kinds of correct/incorrect credential permutations I try to use).

Since the problem is behind an auth wall (or rather, the act of authenticating itself) I cannot share the exact code with my own credentials. However if you have or create your own vendorcentral account you should be able to see this behavior.

I wrote the code in such a way that it works for some other services as well, such as imgur. For this, just change params.url (to https://imgur.com/signin for example). It works on Imgur, which implies that Amazon is doing something explicit, however we have been as of yet unable to determine what that is, because as @optikalefx has said we have tried sporadic mouse movement, delayed typing, etc.

Note: I'll open another unrelated issue for this eventually as I need to do more research and experimentation, but I found that page.press('Enter') does not actually press the enter key. At least for me and my environment.

LoganDark · 2017-09-03T23:43:10Z

but I found that page.press('Enter') does not actually press the enter key

Try page.press('Return') as well..?

rosshadden · 2017-09-04T01:13:06Z

@LoganDark That didn't work either. I probably shouldn't have brought it up here at all, completely unrelated. Let's ignore it.

aslushnikov · 2017-09-05T18:14:15Z

I'm curious to know what changes there are between running as headless true vs false.

@optikalefx The major change is a user agent - chrome headless identifies itself as HeadlessChrome. Try running the following script in headless and headful modes:

const puppeteer = require('puppeteer');

(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  console.log(await page.evaluate(() => navigator.userAgent));
  await browser.close();
})();

User agent is sent with every request as a user-agent header. If there's a need, user-agent could be changed with the page.setUserAgent method.
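
As a minimal sketch of that override, the helper below reuses the browser's own user agent and only removes the `Headless` marker, rather than hard-coding a string for a different browser version. The function names `stripHeadlessMarker` and `useHeadfulUserAgent` are hypothetical; `browser.userAgent()` and `page.setUserAgent()` are the only Puppeteer calls assumed.

```javascript
// Pure helper: turn "HeadlessChrome/66.0..." into "Chrome/66.0...".
function stripHeadlessMarker(userAgent) {
  return userAgent.replace('HeadlessChrome/', 'Chrome/');
}

// Assumes `browser` and `page` are Puppeteer objects. Reads the browser's
// default user agent, removes the headless marker, and applies the result
// to the page so it is sent with subsequent requests.
async function useHeadfulUserAgent(browser, page) {
  const original = await browser.userAgent();
  const patched = stripHeadlessMarker(original);
  await page.setUserAgent(patched);
  return patched;
}

// Usage (call before page.goto):
//   await useHeadfulUserAgent(browser, page);
```

This only changes the user agent string; it does not address any other way a site might detect headless mode.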

In non-headless mode, screenshots work differently because my screen is in HiDPI mode (MacBook Retina). Here's one of the 'different' screenshots:

@LoganDark please, file a separate issue.

Here is a gist of the problem.

@rosshadden try overriding user-agent in your gist. If this doesn't help, please file a separate issue.

LoganDark · 2017-09-06T12:46:04Z

From @Garbee:

@LoganDark that is a different issue completely. Please file your own for triage and discussion.

From @Garbee again:

Therefore a new issue is required to focus on your problem.

From @aslushnikov

@LoganDark please, file a separate issue.

Yeah, 3 times already I've been told to file a different issue.

I haven't. And I won't right now.

Stop telling me to.

optikalefx · 2017-09-06T14:46:18Z

@aslushnikov we need to re-open this ticket IMO. I'm sorry that this issue had unrelated things in it. Setting the user-agent doesn't change anything - as in something is still different about the request. The result of that user-agent log after it's set is exactly what I set it to.

Can you think of anything else that changes when headless is set to true? Something that Amazon is able to detect? Maybe something about cookies? Maybe you could guide me in the right direction in the code and I can look through myself. Being unfamiliar with the codebase would make having a quick guidance very helpful.

Garbee · 2017-09-06T15:00:19Z

There are a few ways Amazon can be detecting headless access. Nothing can really be done internally about them if Amazon is implementing any techniques like this.

The only primary difference is the Headless in the UA string. Beyond that, everything should be functioning the same from the user perspective of headless, as stated before.

optikalefx · 2017-09-06T15:32:45Z

@Garbee super interesting. So, why can't we just define things like language, plugins etc? I can't set things on navigator, but I can polyfill other methods to prevent detection. Maybe you guys can set the navigator settings?

optikalefx · 2017-09-06T15:35:17Z

It looks like I can polyfill navigator using

Object.defineProperties(navigator, {
  'plugins': {
    value: ['adBlock'],
    writable: true
  }
});
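
If an override like this is applied with page.evaluate after navigation, the site's own scripts may already have read the original values. A possible variant, sketched below, registers the override with `page.evaluateOnNewDocument`, which Puppeteer runs in each new document before the page's scripts. The helper names and the override values are assumptions for illustration only, and nothing here guarantees that a site will not detect headless mode by other means.

```javascript
// Runs inside the browser, before any page script. It must be
// self-contained because Puppeteer serializes it; `navigator` here is
// the page's global, not a Node.js object.
function applyNavigatorOverrides(overrides) {
  for (const name of Object.keys(overrides)) {
    Object.defineProperty(navigator, name, {
      get: () => overrides[name],
      configurable: true,
    });
  }
}

// Assumes `page` is a Puppeteer Page. `overrides` must be serializable
// (plain strings, numbers, arrays, objects).
function installNavigatorOverrides(page, overrides) {
  return page.evaluateOnNewDocument(applyNavigatorOverrides, overrides);
}

// Usage (call before page.goto; the values are illustrative):
//   await installNavigatorOverrides(page, { languages: ['en-US', 'en'] });
```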

optikalefx · 2017-09-06T15:47:44Z

Well, I polyfilled everything in that article, and it passes all of those tests after the goto statement. But it is still getting caught. Quite interesting.

rosshadden · 2017-09-06T19:20:32Z

@aslushnikov While my gist doesn't have a UA set, setting it was the first thing @optikalefx tried when we discovered this problem. What I can do is update my gist with setting the UA and the polyfills/workarounds we have tried since.

aslushnikov · 2017-09-13T18:58:09Z

@optikalefx @rosshadden Chrome headless is built on top of the content/ layer and doesn't include the chrome/ layer, whereas headful Chrome includes both the content/ and chrome/ layers. So naturally, there might be multiple subtle ways to detect headless.

More on chromium architecture could be found here:

koreus7 · 2018-01-10T15:23:25Z

As mentioned in the article @Garbee posted the headless version does not have languages set on the navigator object.

Note also that the headless version will not have languages set in its Accept-Language Header. Some sites (ASP.NET in my experience) require this header to be set. Other sites are looking for this header specifically to identify headless browsers.

I copied the value from an example request generated by my normal chrome install. There is probably a more minimal setting for this header that works.

await page.setExtraHTTPHeaders({
    'Accept-Language': 'en-GB,en-US;q=0.9,en;q=0.8'
});
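
One way to spot a missing header like this without guessing is to capture the request headers in both modes and compare them. The sketch below assumes a Puppeteer `page` and uses only `page.once('request', ...)` and `request.headers()`; the helper names are hypothetical, and the headers Puppeteer reports may not include everything Chrome adds at the network layer.

```javascript
// Pure helper: list header names whose values differ between two captures.
function diffHeaders(headfulHeaders, headlessHeaders) {
  const names = new Set([
    ...Object.keys(headfulHeaders),
    ...Object.keys(headlessHeaders),
  ]);
  const differences = [];
  for (const name of [...names].sort()) {
    if (headfulHeaders[name] !== headlessHeaders[name]) {
      differences.push({
        name,
        headful: headfulHeaders[name],
        headless: headlessHeaders[name],
      });
    }
  }
  return differences;
}

// Assumes `page` is a Puppeteer Page. Resolves with the headers of the
// first request the page issues, so call it before page.goto(...).
function captureFirstRequestHeaders(page) {
  return new Promise((resolve) => {
    page.once('request', (request) => resolve(request.headers()));
  });
}

// Usage: run once with headless: false and once with headless: true,
// save each result, then pass both objects to diffHeaders.
```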

hvaoc · 2018-04-20T20:12:43Z

@koreus7 - Solution worked for Amazon issue reported by @optikalefx

mercmobily · 2018-05-30T03:04:58Z

This is an absolute pearl. Thanks for sharing the code above.

optikalefx · 2018-05-30T11:24:05Z

I would also like to add that, for our implementation, we turned on 2FA and will keep it on. We have set up a number with Twilio (or a Twilio-like service) to receive the SMS code, and our login script then fetches that code from Twilio to enter into the 2FA prompt. We require this because Amazon sometimes asks for it, and rather than writing retry logic for the occasional prompt, we always assume 2FA.

jondlm · 2018-06-07T04:17:52Z

For what it's worth I've also found that adding the following user agent override can help smooth over differences in some cases:

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36')

The UA I've provided is just an example. You can use any valid UA that matches an existing browser.

felixfbecker · 2018-06-08T02:26:27Z

I noticed another difference, when in non-headless mode the address seems to change localhost to 127.0.0.1 which means it's difficult to assert on the URL.
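
For assertions that need to tolerate this, one option is to normalise the loopback host before comparing. This is only a sketch; `normalizeLoopback` is a hypothetical test helper, not part of Puppeteer, and it relies solely on the standard `URL` class.

```javascript
// Hypothetical test helper: rewrite 127.0.0.1 to localhost so URLs
// reported in either mode compare equal. Port and path are preserved.
function normalizeLoopback(url) {
  const parsed = new URL(url);
  if (parsed.hostname === '127.0.0.1') {
    parsed.hostname = 'localhost';
  }
  return parsed.href;
}

// Usage in a test (assumes `page` is a Puppeteer Page):
//   const actual = normalizeLoopback(page.url());
//   const expected = normalizeLoopback('http://localhost:3000/dashboard');
```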

roeniss · 2018-08-21T15:59:21Z

As @jondlm said, setting the user agent makes headless Selenium behave the same as non-headless Selenium. Thanks.

stefpe · 2018-11-12T09:42:13Z

@koreus7 setting the languages works like a charm!

jslim89 · 2019-04-09T04:12:24Z

I got it working by adding these two:

await page.setExtraHTTPHeaders({
    'Accept-Language': 'en-US,en;q=0.9'
});
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36');

Many thanks to @koreus7 and @jondlm; it won't work if either one of them is missing.

P.S. I was trying to access this site: www.blibli.com


endel · 2019-09-28T15:19:14Z

I've made a fake user agent generator that works pretty fine!

function* generateUserAgent() {
  let webkitVersion = 10;
  let chromeVersion = 1000;

  const so = [
    'Windows NT 6.1; WOW64',
    'Windows NT 6.2; Win64; x64',
    "Windows NT 5.1; Win64; x64",
    'Macintosh; Intel Mac OS X 10_12_6',
    "X11; Linux x86_64",
    "X11; Linux armv7l"
  ];
  let soIndex = Math.floor(Math.random() * so.length);

  while (true) {
    yield `Mozilla/5.0 (${so[soIndex++ % so.length]}) AppleWebKit/537.${webkitVersion} (KHTML, like Gecko) Chrome/56.0.${chromeVersion}.87 Safari/537.${webkitVersion} OPR/43.0.2442.991`;

    webkitVersion++;
    chromeVersion++;
  }
}

const userAgents = generateUserAgent();

// ...
await page.setUserAgent(userAgents.next().value);

andreabisello · 2019-11-26T10:27:20Z

So does headless true/false change the user agent and other things?
I have two different tests that work in headless: false mode but fail in headless: true mode, due to differences in font rendering and in the time needed for a button to become clickable, but I cannot share them because the website is confidential.
I think headless true/false should not change the rendering process.
Should I set a common user agent to make behaviour more consistent?
Thanks.

heathera2016 · 2019-11-27T02:08:42Z

My case is the complete opposite of the OP's situation. I got Amazon's robot check with headless: false, and bypassed it with headless: true. I solved this issue thanks to @koreus7. Many thanks 👍

gdossant · 2019-12-14T00:07:17Z

Using @koreus7 and @jondlm comments solved my problem

Bhabaranjan19966 · 2019-12-24T12:07:33Z

Recently, I had the same experience of getting blocked for using a headless browser while scraping a popular website. Even after adding proper headers and a user agent, it didn't work.

Finally I used puppeteer-extra with the stealth plugin, which fixed the problem.

This thread helped me a lot to figure out what all could go wrong.

Thanks @Garbee @optikalefx

andreabisello · 2019-12-24T14:22:26Z

@Bhabaranjan19966 so this https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra with this https://www.npmjs.com/package/puppeteer-extra-plugin-stealth ? i will try, thanks.

andreabisello · 2019-12-24T16:04:01Z

Not working for me: headless and GUI mode still render the page slightly differently.

Bhabaranjan19966 · 2019-12-27T11:34:33Z

@Bhabaranjan19966 so this https://github.com/berstend/puppeteer-extra/tree/master/packages/puppeteer-extra with this https://www.npmjs.com/package/puppeteer-extra-plugin-stealth ? i will try, thanks.

Yes, those are the two repositories that fixed my problem. @andreabisello

pgibler · 2020-04-18T09:08:37Z

I'm having this same issue with peapod.com right now. In headful mode, my program runs successfully. In headless mode, I'm screenshotting to debug and see that the link is clicked, spinner is activated, but the page never changes. How can I debug this better? @aslushnikov , could you provide me some guidance?

mewtcor · 2020-04-24T11:55:11Z

Recently, I had the same experience of getting blocked for using a headless browser while scraping a popular website. Even after adding proper headers and a user agent, it didn't work.

Finally I used puppeteer-extra with the stealth plugin, which fixed the problem.

This thread helped me a lot to figure out what all could go wrong.

Thanks @Garbee @optikalefx

The stealth mode did the trick for me too! TYVM

peterhil · 2020-05-07T07:22:43Z

None of these suggested solutions work on Mac OS X. To reproduce:

Change your system language to something other than en-US or en, so that applications use that locale.
Test a browser extension or web site that is internationalised by selected user locale.
It is impossible to test or change the browser locale to en-US on non-headless mode at least.

What I am trying to do is set up testing with Puppeteer for my browser extension Spellbook.

I have the first test now passing on Mac OS X (using some Finnish strings), and it is probably failing on other systems when you do yarn run test:puppeteer, because I use every method of setting the locale: peterhil/spellbook@3480a73

harshvats2000 · 2020-12-27T13:26:22Z

For what it's worth I've also found that adding the following user agents override can help smooth over differences in some cases:
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36')
The UA I've provided is just an example. You can use any valid UA that matches an existing browser.

Add this just below where page is defined.

mishra5047 · 2021-06-04T05:32:46Z

I'm curious to know what changes there are between running as headless true vs false. When I run a login to Amazon using headless: true I get an error from Amazon via the screenshot. But when I set headless: false I watch it work just fine, no error.

So I'm trying to figure out what headless: true is doing that is different from when it's not headless.

Thanks to any suggestions.

I am using Puppeteer to write a simple automation script that logs in to my Google account. It works fine in headless: false mode, but with headless: true it reports that the selector was not found.

sajjadafridi · 2021-08-04T07:29:55Z

Can anyone provide an actual example file to run that reproduces this issue?

I have the same problem on grainger.com. With headless: false it works, but headless: true returns a promise handling error.

Any help will be appreciated.

UsmanGhani-Emumba · 2022-01-19T08:12:57Z

None of the above methods work for me; I am still seeing differences between headless and normal mode. Any help will be appreciated.

JavedBoqo · 2022-09-28T08:23:07Z

I had the same issue, and a combination of puppeteer-extra and the following library solved it:
https://www.npmjs.com/package/puppeteer-extra-plugin-stealth

aslushnikov closed this as completed Sep 5, 2017

iNeoO mentioned this issue Apr 9, 2019

Blank screenshot with headless true, else fine. #1755

Closed

andresriancho added a commit to andresriancho/w3af that referenced this issue May 25, 2019

Adding accept-language to prevent issues while crawling sites

d3b67d6

puppeteer/puppeteer#665

Swatinem mentioned this issue Jul 30, 2019

Do not throw on invalid language tags eversport/intl-codegen#39

Closed

andreabisello mentioned this issue Dec 24, 2019

headless and not headless mode are rendering the same page in a little different mode #5214

Closed

leadscloud mentioned this issue Jul 12, 2020

Puppeteer stealth mode still being detected by Datadom berstend/puppeteer-extra#182

Closed

simoconfa mentioned this issue Aug 30, 2020

Headless mode not working sup3rgiu/PoliDown#39

Open

lancejpollard mentioned this issue Aug 3, 2021

What does puppeteer do differently than a normal browser? #7456

Closed
