Puppeteer detected via variable name #209

Closed
prescience-data opened this issue May 28, 2020 · 13 comments · Fixed by #273
Labels
bug Something isn't working enhancement New feature or request help wanted Extra attention is needed plugin: stealth ㊙️ Detection evasion related

Comments

@prescience-data
Collaborator

This very slick detection just popped up, and my tests with both base Puppeteer and Puppeteer-Extra with the Extra-Stealth plugin are failing it.

https://github.com/digitalhurricane-io/puppeteer-detection-100-percent

Would be great to get that variable renamed to something opaque...
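
For context, here is a minimal sketch of how a check of this kind can work. It assumes, as the issue title and the rest of this thread suggest, that code injected by Puppeteer appears in stack traces under the script name `__puppeteer_evaluation_script__`. The helper name `stackRevealsPuppeteer` and both sample traces are made up for illustration; this is not the code from the linked repository.

```javascript
// Assumption: the script name Puppeteer assigns to evaluated code.
const PUPPETEER_MARKER = '__puppeteer_evaluation_script__';

// Hypothetical helper: true when a stack trace mentions the marker.
function stackRevealsPuppeteer(stack) {
  return typeof stack === 'string' && stack.includes(PUPPETEER_MARKER);
}

// Illustrative (made-up) stack traces, as a page-side hook might capture them.
const automatedStack = [
  'Error',
  '    at HTMLDocument.getElementById (https://example.com/detect.js:10:15)',
  '    at __puppeteer_evaluation_script__:1:10',
].join('\n');

const ordinaryStack = [
  'Error',
  '    at HTMLDocument.getElementById (https://example.com/detect.js:10:15)',
  '    at https://example.com/main.js:42:7',
].join('\n');

console.log(stackRevealsPuppeteer(automatedStack)); // true
console.log(stackRevealsPuppeteer(ordinaryStack)); // false
```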

@prescience-data prescience-data changed the title from "Puppeteer detected via" to "Puppeteer detected via variable name" May 28, 2020
@StevenVeshkini

Yikes...how would you do that though? Wouldn't you need to recompile puppeteer from source?

@prescience-data
Collaborator Author

The only workarounds I have so far are:
a) forking original puppeteer, or;
b) running something like this:

const fs = require('fs');

// Paths are relative to the current working directory.
const files = [
	'../../node_modules/puppeteer/lib/ExecutionContext.js',
	'../../node_modules/puppeteer/node6/lib/ExecutionContext.js',
];

files.forEach(file => {
	fs.readFile(file, 'utf8', (err, data) => {
		if (err) {
			return console.error(err);
		}
		// Swap the telltale script name for something less conspicuous.
		const result = data.replace(/__puppeteer_evaluation_script__/g, '__jquery__');

		fs.writeFile(file, result, 'utf8', err => {
			if (err) {
				return console.error(err);
			}
		});
	});
});
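
A synchronous variant of the same idea, sketched under the assumption that it runs as a one-off script (for example after installing dependencies). The function name `patchFile` is hypothetical. It skips missing files and reports whether anything changed, which makes repeated runs harmless.

```javascript
const fs = require('fs');

const MARKER = /__puppeteer_evaluation_script__/g;

// Replaces the marker in one file. Returns true if the file was changed,
// false if the file is missing or does not contain the marker.
function patchFile(file, replacement) {
  if (!fs.existsSync(file)) return false;
  const original = fs.readFileSync(file, 'utf8');
  // A replacer function avoids special handling of "$" in the replacement.
  const patched = original.replace(MARKER, () => replacement);
  if (patched === original) return false;
  fs.writeFileSync(file, patched, 'utf8');
  return true;
}

// Example call, using one of the paths from the snippet above:
// patchFile('../../node_modules/puppeteer/lib/ExecutionContext.js', '__jquery__');
```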

@m4eba

m4eba commented May 28, 2020

You could proxy the websocket connection that puppeteer uses to control the browser and modify the sourceURL there.
I made a proof of concept. It's ugly, but it works:
https://github.com/m4eba/puppeteer-detection-100-percent

But that is out of scope for this plugin. The proxy would need to go into the core library and then be activated with the requirements variable, maybe?
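
The rewriting step of such a proxy can be sketched as a pure function over one outgoing message. This is not the code from the linked proof of concept. It assumes that Puppeteer appends the comment `//# sourceURL=__puppeteer_evaluation_script__` to code sent through the DevTools protocol methods `Runtime.evaluate` and `Runtime.callFunctionOn`; the name `rewriteCdpMessage` is hypothetical, and the websocket plumbing is left out.

```javascript
// Assumption: the suffix Puppeteer appends to evaluated code.
const SOURCE_URL_SUFFIX = '//# sourceURL=__puppeteer_evaluation_script__';

// Protocol methods that carry code, mapped to the parameter holding it.
const CODE_PARAMS = new Map([
  ['Runtime.evaluate', 'expression'],
  ['Runtime.callFunctionOn', 'functionDeclaration'],
]);

// Takes one raw JSON message on its way to the browser and returns it with
// the sourceURL suffix stripped. Anything else passes through unchanged.
function rewriteCdpMessage(raw) {
  let message;
  try {
    message = JSON.parse(raw);
  } catch (err) {
    return raw; // not JSON, leave it alone
  }
  if (!message || typeof message !== 'object') return raw;

  const key = CODE_PARAMS.get(message.method);
  const code = key && message.params ? message.params[key] : undefined;
  if (typeof code !== 'string' || !code.includes(SOURCE_URL_SUFFIX)) return raw;

  message.params[key] = code.split(SOURCE_URL_SUFFIX).join('');
  return JSON.stringify(message);
}
```

Stripping the suffix, rather than substituting another fixed name, sidesteps the question of which replacement string to pick.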

@prescience-data
Collaborator Author

Hopefully if this gets implemented it may fix it:

puppeteer/puppeteer#2671

@kensnyder

Adding this bit of trickery avoids detection:

await page.evaluateOnNewDocument(() => {
  const errors = { Error, EvalError, RangeError, ReferenceError, SyntaxError, TypeError, URIError };
  for (const name in errors) {
    globalThis[name] = (function(NativeError) {
      return function(message) {
        const err = new NativeError(message);
        const stub = {
          message: err.message,
          name: err.name,
          toString: () => err.toString(),
          get stack() {
            const lines = err.stack.split('\n');
            lines.splice(1, 1); // remove anonymous function above
            lines.pop(); // remove puppeteer line
            return lines.join('\n');
          },
        };
        if (!new.target) {
          // called as a function, not as a constructor
          Object.setPrototypeOf(stub, NativeError.prototype);
          return stub;
        }
        Object.assign(this, stub);
        Object.setPrototypeOf(this, NativeError.prototype);
      };
    })(errors[name]);
  }
});

@a10kiloham

This is a good addition to the extra stealth plugin. Has anyone done a PR yet? I might try to find time this week otherwise.

@digitalhurricane-io

digitalhurricane-io commented Jun 26, 2020

@kensnyder That is some nice trickery!

One suggestion. You're missing part of the stack trace with this code though, as written. That would be pretty easy to detect.

I would suggest adding these 2 lines after "lines.pop(); // remove puppeteer line":

const inRange = (min, max) => Math.round(Math.random() * (max - min) + min).toString();
lines.push(`    at jQuery.js:${inRange(5, 200)}:${inRange(1, 80)}`);
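
The positional `splice` and `pop` in the snippet being discussed assume the unwanted frames always sit in the same places. A hedged alternative is to drop frames by content instead. The name `stripFrames` is hypothetical and the sample trace is made up; this only removes frames and does not address whether the remaining trace looks plausible.

```javascript
// Removes every stack frame that contains one of the given markers.
// The first line (the error name and message) is always kept.
function stripFrames(stack, markers) {
  return stack
    .split('\n')
    .filter((line, index) => index === 0 || !markers.some((m) => line.includes(m)))
    .join('\n');
}

// Illustrative (made-up) trace.
const sample = [
  'Error: boom',
  '    at <anonymous>:5:21',
  '    at https://example.com/app.js:10:3',
  '    at __puppeteer_evaluation_script__:1:1',
].join('\n');

console.log(stripFrames(sample, ['__puppeteer_evaluation_script__']));
```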

@berstend
Owner

berstend commented Jul 10, 2020

What's the latest on this? I'm a bit worried that simply replacing __puppeteer_evaluation_script__ with something else doesn't fix the issue and the other side will simply add whatever string we use/generate to their blacklist. :-)

Has someone checked out what vanilla Chrome is using in their stack traces so we can emulate this properly?

An expected vs actual comparison would be very useful here.

@berstend berstend added bug Something isn't working enhancement New feature or request help wanted Extra attention is needed plugin: stealth ㊙️ Detection evasion related labels Jul 10, 2020
@digitalhurricane-io

@berstend Every script that runs on the page has a name. I don't think you can really blacklist something like "main.js" as that's pretty generic. However, there may be better solutions than forking the library.

@prescience-data
Collaborator Author

> What's the latest on this? I'm a bit worried that simply replacing __puppeteer_evaluation_script__ with something else doesn't fix the issue and the other side will simply add whatever string we use/generate to their blacklist. :-)
>
> Has someone checked out what vanilla Chrome is using in their stack traces so we can emulate this properly?
>
> An expected vs actual comparison would be very useful here.

After working on and researching this over the last month, it's actually a deeper problem than just the string. It's the ability of the detection script to monitor execution.

I've got a solution working on Puppeteer 1.19, as mentioned above, that rewrites some core Puppeteer logic to force all Puppeteer execution to run within its own context. Even if the detection script attempts to override functions, it can't access the isolated world, so it can't even be aware of the script.

#224

The only way I figure they could detect execution is to watch the DOM for unexpected changes, but even then, they would get a ton of false positives from things like password managers writing to the DOM for functionality.

Thoughts?
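
To illustrate the isolated-world idea (this is not the code from #224, which rewrites Puppeteer internals), here is a minimal sketch built on the DevTools protocol methods `Page.getFrameTree`, `Page.createIsolatedWorld` and `Runtime.evaluate`. It assumes `client` is a session-like object with a `send(method, params)` method, such as the one returned by `page.target().createCDPSession()`. The function name `evaluateInIsolatedWorld` and the default world name are hypothetical.

```javascript
// Evaluates an expression in a fresh isolated world of the main frame.
// Functions overridden by page scripts in the main world are not visible there.
async function evaluateInIsolatedWorld(client, expression, worldName = 'isolated') {
  const { frameTree } = await client.send('Page.getFrameTree');
  const { executionContextId } = await client.send('Page.createIsolatedWorld', {
    frameId: frameTree.frame.id,
    worldName, // hypothetical name, any string should do
  });
  const { result, exceptionDetails } = await client.send('Runtime.evaluate', {
    expression,
    contextId: executionContextId,
    returnByValue: true,
  });
  if (exceptionDetails) {
    throw new Error(exceptionDetails.text);
  }
  return result.value;
}

// Possible usage inside an async function, given a Puppeteer `page`:
//   const client = await page.target().createCDPSession();
//   const title = await evaluateInIsolatedWorld(client, 'document.title');
```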

@berstend
Owner

berstend commented Jul 11, 2020

> @berstend Every script that runs on the page has a name. I don't think you can really blacklist something like "main.js" as that's pretty generic. However, there may be better solutions than forking the library.

This is what I was hinting at: we should be able to derive the proper "originator" script dynamically and use that instead of something hardcoded. But this is all still too theoretical; a proper comparison of vanilla vs. puppeteer behavior would add a lot to the discussion.

edit: For better or worse puppeteer-extra seems to be used as a benchmark by the other teams, similar to how we use their various bot-detection test pages. We can't really hardcode anything and expect to get away with it. E.g. if my site isn't using jQuery.js that makes it very simple to detect the presence of a headless browser in disguise. :-)

edit2: Out of curiosity, has anyone checked how Playwright behaves in these scenarios?

@berstend
Owner

berstend commented Aug 2, 2020

Fixed in #273

berstend added a commit that referenced this issue Aug 3, 2020
* chore: Update yarn.lock

* feat(plugin-stealth): Add sourceurl evasion

* chore(plugin-stealth): Add more sourceurl evasion test

* chore(plugin-stealth): Add docs for sourceurl evasion

* chore(plugin-stealth): Cleanup
@berstend
Owner

berstend commented Aug 3, 2020

Fix published in puppeteer-extra-plugin-stealth@2.4.15
