[BUG] Headless 'new' mode uses a headless user agent #90

cernadasjuan · 2023-04-21T14:01:19Z

Environment

chromium Version: 112.0.2
puppeteer / puppeteer-core Version: 19.9.1
Node.js Version: 18
Lambda / GCF Runtime: 18

Expected Behavior

Using headless: 'new' should produce a non-headless user agent (for example, Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36)

Current Behavior

I'm using these launch options:

launchOptions: {
    args: [
      '--allow-pre-commit-input',
      '--disable-background-networking',
      '--disable-background-timer-throttling',
      '--disable-backgrounding-occluded-windows',
      '--disable-breakpad',
      '--disable-client-side-phishing-detection',
      '--disable-component-extensions-with-background-pages',
      '--disable-component-update',
      '--disable-default-apps',
      '--disable-dev-shm-usage',
      '--disable-extensions',
      '--disable-hang-monitor',
      '--disable-ipc-flooding-protection',
      '--disable-popup-blocking',
      '--disable-prompt-on-repost',
      '--disable-renderer-backgrounding',
      '--disable-sync',
      '--enable-automation',
      '--enable-blink-features=IdleDetection',
      '--export-tagged-pdf',
      '--force-color-profile=srgb',
      '--metrics-recording-only',
      '--no-first-run',
      '--password-store=basic',
      '--use-mock-keychain',
      '--disable-domain-reliability',
      '--disable-print-preview',
      '--disable-speech-api',
      '--disk-cache-size=33554432',
      '--mute-audio',
      '--no-default-browser-check',
      '--no-pings',
      '--single-process',
      '--disable-features=Translate,BackForwardCache,AcceptCHFrame,MediaRouter,OptimizationHints,AudioServiceOutOfProcess,IsolateOrigins,site-per-process',
      '--enable-features=NetworkServiceInProcess2,SharedArrayBuffer',
      '--hide-scrollbars',
      '--ignore-gpu-blocklist',
      '--in-process-gpu',
      '--window-size=1920,1080',
      '--use-gl=angle',
      '--allow-running-insecure-content',
      '--disable-setuid-sandbox',
      '--disable-site-isolation-trials',
      '--disable-web-security',
      '--no-sandbox',
      '--no-zygote',
      '--headless=new',
    ],
    executablePath: '/tmp/chromium',
    headless: 'new'
  }

When I read the user agent, its value is Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/112.0.5614.0 Safari/537.36. Overriding the user agent works, but an antibot still detects me, so I suspect the 'new' headless mode is not being applied. (In a local environment, using a local Chromium, it works fine.)

ckosmic · 2023-04-21T16:13:30Z

I'm experiencing the same issue as well.

Sparticuz · 2023-04-21T18:58:40Z

I wonder if it's because I'm building using the headless.gn build.

cernadasjuan · 2023-04-21T19:07:43Z

@Sparticuz That makes sense. According to https://developer.chrome.com/articles/new-headless/#whats-new-in-headless, headless mode has been a separate implementation until now.

jacobi973 · 2023-05-01T23:25:45Z

I am also experiencing issues running the new headless mode on Lambda. It does not seem to be applied. @Sparticuz Are there other builds besides the headless.gn build?

Sparticuz · 2023-05-02T00:14:21Z

I've been working on moving to a normal build, but each compilation takes over an hour and a few bucks on AWS using an 8xl memory instance. There is a dbus error I've been trying to pinpoint. It's slow going.

cernadasjuan · 2023-05-02T16:39:43Z

Thanks @Sparticuz! As a workaround, I created a custom Docker image with the latest Chromium build installed, and with Lambda container images it's working! This is the Dockerfile:

FROM public.ecr.aws/lambda/nodejs:18

RUN yum install -y unzip && \
  curl -Lo "/tmp/chrome-linux.zip" "https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Linux_x64%2F1129993%2Fchrome-linux.zip?alt=media" && \
  unzip /tmp/chrome-linux.zip -d /opt/

RUN yum install atk cups-libs gtk3 libXcomposite alsa-lib \
    libXcursor libXdamage libXext libXi libXrandr libXScrnSaver \
    libXtst pango at-spi2-atk libXt xorg-x11-server-Xvfb \
    xorg-x11-xauth dbus-glib dbus-glib-devel -y

RUN mv /opt/chrome-linux /opt/chrome
 
# Copy handler function and package.json
ADD dist/ ./
ADD node_modules ./node_modules
 
 
# Set the CMD to your handler
CMD [ "/var/task/app.handler" ]

Then, in the Puppeteer project, use /opt/chrome/chrome as the executablePath.
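For anyone wiring this up, here is a minimal sketch of launch options that point at that binary. The helper name buildLaunchOptions is hypothetical, and the two default flags are only an assumption borrowed from the flag list earlier in this issue, not a verified minimal set for Lambda.

```javascript
// Hypothetical helper: builds Puppeteer launch options for the Chromium
// binary unpacked to /opt/chrome by the Dockerfile above.
// The default flags are an assumption taken from the list posted earlier
// in this issue; adjust them for your own environment.
function buildLaunchOptions(extraArgs = []) {
  return {
    executablePath: '/opt/chrome/chrome',
    headless: 'new',
    args: ['--no-sandbox', '--disable-dev-shm-usage', ...extraArgs],
  };
}

// Intended usage (not executed here, requires puppeteer or puppeteer-core):
//   const browser = await puppeteer.launch(buildLaunchOptions());
```

Keeping the options in a plain function makes it easy to log exactly what was passed to launch() when comparing local and Lambda runs.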

jacobi973 · 2023-05-02T22:39:43Z

@cernadasjuan I am trying your workaround and am getting errors while running the Lambda container image. Would you mind sharing an example repo if you can? Thanks

cernadasjuan · 2023-05-03T14:12:13Z

Hey @jacobi973! Which error are you facing? Maybe I can help you (I don't have a public repo 😢)

jacobi973 · 2023-05-03T22:14:22Z

@cernadasjuan I am getting
Failed to launch the browser process!\n[0502/224222.049000:WARNING:crashpad_client_linux.cc(364)] prctl: Operation not permitted (1)\nprctl(PR_SET_NO_NEW_PRIVS) failed\n[23:23:0502/224223.263811:FATAL:zygote_communication_linux.cc(270)] Cannot communicate with zygote\.....

To be honest, I haven't worked with Docker/Lambda images before, so I am probably doing something wrong while building the container. Here is a repository I just created showing the steps I tried to get it up and running: puppeteer-docker. Maybe it's something simple? I am not sure.

If you could show us how to get a docker/lambda image running for us on lambda we would be happy to hire you for that! You can contact my boss here to set that up. jordan@cobaltintelligence.com

Sparticuz · 2023-05-04T15:06:20Z

@cernadasjuan From some research into the new headless mode: it's not meant to evade bot detection. In fact, I've seen comments saying it specifically marks itself as headless. It's meant to close the gap between how headless and headful modes render content. I'm leaning towards this not being a bug; however, I'd like to know how you are determining that it's not working. Is there a page that will tell you whether you are using old or new headless mode?

Sparticuz · 2023-05-04T15:13:44Z

Another thing that might be affecting this is args. I've seen some flags, especially --single-process, affect 'bot detection'.
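One way to test that theory is to launch with specific flags removed and compare the results. A minimal sketch follows; withoutFlags is a hypothetical helper, and whether Chromium still launches in Lambda without --single-process would need to be checked.

```javascript
// Hypothetical helper: returns a copy of args with the named flags removed.
// Only the flag name is compared, so '--foo=bar' is dropped when '--foo'
// is listed in flagsToDrop.
function withoutFlags(args, flagsToDrop) {
  const drop = new Set(flagsToDrop);
  return args.filter((arg) => !drop.has(arg.split('=')[0]));
}

// Example: remove --single-process from an existing list before launching.
const trimmedArgs = withoutFlags(
  ['--single-process', '--no-sandbox', '--headless=new'],
  ['--single-process'],
);
console.log(trimmedArgs);
```

Running the same detection page with and without a flag should show whether that flag changes the outcome.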

jacobi973 · 2023-05-04T18:58:15Z

@Sparticuz Just chiming in with some signs I have noticed that make me believe 'new' headless may not be working correctly on Lambda. Here is a screenshot: the left side shows a Lambda function run with headless 'new', and the right side shows headless 'new' run locally. The plugins and the user agent both differ from what they should be with headless 'new'.

Here is one more screenshot of it locally run with the headless: false option. It shows the same results as headless: 'new' run in lambda.

Here is my code for lambda that I am running. Perhaps I am doing it wrong?
// Imports assumed; the original snippet did not show them.
import chromium from '@sparticuz/chromium';
import puppeteer from 'puppeteer-core';

export async function handler() {
	chromium.setHeadlessMode = 'new';
	chromium.setGraphicsMode = false;

	const browser = await puppeteer.launch({
		args: chromium.args,
		defaultViewport: chromium.defaultViewport,
		executablePath: await chromium.executablePath(),
		headless: chromium.headless,
	});
	const page = await browser.newPage();
	console.log('page created');

	const url = 'http://pizza.com/';
	console.log('going to the page');
	await page.goto(url);
	console.log('made it to the page', page.url());

	// Return values to Node; console.log inside page.evaluate only writes to the page's console.
	const permission = await page.evaluate(() => Notification.permission);
	console.log('Notification.permission: ', permission);

	const userAgentB = await page.evaluate(() => navigator.userAgent);
	console.log('userAgent: ', userAgentB);

	// check navigator plugins (PluginArray does not serialize, so return the plugin names)
	const plugins = await page.evaluate(() => Array.from(navigator.plugins, (p) => p.name));
	console.log('plugins: ', plugins);

	// check webdriver
	const webdrivers = await page.evaluate(() => navigator.webdriver);
	console.log('webdrivers: ', webdrivers);
	await page.waitForTimeout(2000);

	// Close the browser so the process does not linger between invocations.
	await browser.close();

	return {
		statusCode: 200,
		body: JSON.stringify({
			message: 'success',
			// location: res.data.trim()
		}),
	};
}

Here's a blog post about it as well, from someone at DataDome.
https://antoinevastel.com/bot%20detection/2023/02/19/new-headless-chrome.html#:~:text=The%20first%20difference%20we,111.0.0.0%20Safari/537.36.

b5414 · 2023-05-04T23:35:52Z

Same bot_detection (WEBDRIVER) with:

const browser_cfg = {
	headless: true,
	channel: 'chrome',
	timeout: 0,
	ignoreHTTPSErrors: true,
	args: [
		...chromium_sparticuz.args,
		'--single-process',
		'--disable-dev-shm-usage',
	],
},

jacobi973 · 2023-05-22T21:29:20Z

Just checking back in on this issue. I don't know if I quite understand the process of getting all of this to work, but I'd be willing to help out any way I could.

Sparticuz · 2023-05-26T01:52:07Z

It would be nice to have a website that can determine which headless version is being run. At this point, I believe the issue lies with the headless_shell build target, however, I'm not able to get the chrome build target to build on AL2023. Getting an error about dbus.

piercefreeman · 2023-08-26T18:05:25Z

I looked into this last week. There are a few different things going on here:

1. Headless arguments

The new headless mode is pretty particular about what CLI arguments it expects. It only launches with --headless=new, whereas the current flag generation logic uses headless='new'.
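As a stopgap, the args can be normalized before launching so that exactly one --headless=new flag is passed. This is a minimal sketch; forceNewHeadless is a hypothetical helper, and it assumes the generated flag looks something like --headless='new'.

```javascript
// Hypothetical helper: drops every existing --headless flag (bare or with
// any value) and appends the exact form that new headless mode expects.
function forceNewHeadless(args) {
  const others = args.filter(
    (arg) => arg !== '--headless' && !arg.startsWith('--headless='),
  );
  return [...others, '--headless=new'];
}

// Example: a malformed flag is replaced by the canonical one.
console.log(forceNewHeadless(["--headless='new'", '--no-sandbox']));
```

Note that this only fixes the flag; a binary built from the headless_shell target may still not support the new mode.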

2. User Agent

As @jacobi973 pointed out, the user agent does reveal whether we're using headless v1 or v2. V1 will have:

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/116.0.5845.82 Safari/537.36

While V2 has:

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/116.0.0.0 Safari/537.36

If you're testing for the user agent remotely in Lambda, you can either add some logging or ping a user agent detection website that echoes back the content of this header.
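The check itself can be reduced to a string test on the two user agents above. A minimal sketch, with a hypothetical function name; a plain Chrome token cannot distinguish new headless from headful, and an overridden user agent defeats the check entirely.

```javascript
// Hypothetical helper: classifies a user agent string based on the two
// examples above. 'HeadlessChrome/' indicates the old headless mode; a plain
// 'Chrome/' token is consistent with new headless or with headful Chrome.
function headlessVersionFromUserAgent(userAgent) {
  if (/HeadlessChrome\//.test(userAgent)) return 'v1';
  if (/\bChrome\//.test(userAgent)) return 'v2-or-headful';
  return 'unknown';
}

// Intended usage inside a handler (not executed here):
//   const ua = await page.evaluate(() => navigator.userAgent);
//   console.log('headless version:', headlessVersionFromUserAgent(ua));
```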

3. Build target

The combination of headless.gn / the headless_shell target will only build the old headless codebase. I have a fork going where I'm trying to build the full chromium payload, which should support V1 and V2. Will keep this thread posted on progress there. /cc @Sparticuz

iwaduarte · 2024-02-19T15:44:29Z

@Sparticuz Hey mate. I have been testing the setup configurations recently, and Headless Chrome is still showing in the header:

Node 20. AL2023.
Are we still building headless.gn?

Is there anything I should be doing? Thanks!

cernadasjuan added the bug Something isn't working label Apr 21, 2023

Sparticuz added the help wanted Extra attention is needed label Jun 2, 2023

Sparticuz mentioned this issue Feb 22, 2024

Chromium 122 #233

Merged

Sparticuz mentioned this issue Apr 3, 2024

Can't bypass Cloudflare protection using headless browser in AWS Lambda. #107

Closed
