feat(plugin-stealth): Add support for UA hints #413

Merged
merged 8 commits into from
Feb 2, 2021


Niek
Collaborator

@Niek Niek commented Jan 30, 2021

This PR adds support for UA hints to the user-agent-override evasion. Note that this is a breaking change: I opted to change the functionality of the evasion a bit and detect the platform (as well as the now-required version, model, architecture, etc.) from the provided user agent string.
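
As a rough illustration of deriving these values from a user agent string, here is a minimal sketch. It is not the code from this PR: `describeUserAgent` is a hypothetical helper, the `navigator.platform` values are assumed typical ones, and the returned field names are chosen to resemble the user agent metadata accepted by the DevTools protocol.

```javascript
// Hypothetical helper (not the PR's implementation): derive platform details
// and UA-hint style metadata from a user agent string.
function describeUserAgent(ua) {
  // Return the first capture group of a regex, or '' when nothing matches.
  const capture = (re) => {
    const match = ua.match(re)
    return match ? match[1].trim() : ''
  }

  // Android user agents also contain "Linux", so Android is checked first.
  if (ua.includes('Android')) {
    return {
      navigatorPlatform: 'Linux armv8l', // assumption: a typical Android value
      platform: 'Android',
      platformVersion: capture(/Android ([\d.]+)/),
      architecture: '',
      model: capture(/Android [^;)]+;\s*([^;)]+)/),
      mobile: true
    }
  }

  // Default to Windows, then refine for macOS and desktop Linux.
  let navigatorPlatform = 'Win32'
  let platform = 'Windows'
  let platformVersion = capture(/Windows NT ([\d.]+)/)
  if (ua.includes('Mac OS X')) {
    navigatorPlatform = 'MacIntel'
    platform = 'Mac OS X'
    platformVersion = capture(/Mac OS X ([\d_.]+)/)
  } else if (ua.includes('Linux')) {
    navigatorPlatform = 'Linux x86_64' // assumption: a typical desktop value
    platform = 'Linux'
    platformVersion = ''
  }

  return {
    navigatorPlatform,
    platform,
    platformVersion,
    architecture: 'x86',
    model: '',
    mobile: false
  }
}

console.log(
  describeUserAgent(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36'
  )
)
```

Keeping the derivation in one pure function makes it easy to check that the user agent string, `navigator.platform` and the hints stay consistent with each other.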

Unfortunately, UA hints only work in headful mode (yet another reason not to use headless...), so the test needs to run headful.

@github-actions github-actions bot added plugin: puppeteer-extra PuppeteerExtra Plugin related plugin: stealth ㊙️ Detection evasion related labels Jan 30, 2021
@berstend berstend self-assigned this Jan 30, 2021
Owner

@berstend berstend left a comment

Good stuff! Added a few comments/suggestions :-)

@berstend
Owner

berstend commented Jan 30, 2021

I just noticed a more common use-case we might want to optimize for:
Say I'm running stealth/pptr in Docker and I just want to spoof the platform: I'd need to get the UA string myself (at a point where page.browser().userAgent() doesn't exist yet), string-replace the platform and then pass it to the plugin, correct?

I think deriving the platform from the UA string is neat and was wondering if we can additionally provide a platform override as well (in case the rest of the UA isn't relevant to the user). Maybe there's a lightweight UA string manipulation library out there that allows us to just replace UA.platform (similar to the URL object) and which runs before the rest of the code?

@Niek
Collaborator Author

Niek commented Jan 31, 2021

> I just noticed a more common use-case we might want to optimize for:
> Say I'm running stealth/pptr in Docker and I just want to spoof the platform: I'd need to get the UA string myself (at a point where page.browser().userAgent() doesn't exist yet), string-replace the platform and then pass it to the plugin, correct?
>
> I think deriving the platform from the UA string is neat and was wondering if we can additionally provide a platform override as well (in case the rest of the UA isn't relevant to the user). Maybe there's a lightweight UA string manipulation library out there that allows us to just replace UA.platform (similar to the URL object) and which runs before the rest of the code?

That's a good one, let me look into this. Another thing I thought of is skipping the locale parameter and converting that into the language preference in case headful is used - WDYT?

@berstend
Owner

> Another thing I thought of is skipping the locale parameter and converting that into the language preference in case headful is used - WDYT?

How do you mean? :-) We do set acceptLanguage based on the locale which affects both headless/headful?

@Niek
Collaborator Author

Niek commented Jan 31, 2021

> > Another thing I thought of is skipping the locale parameter and converting that into the language preference in case headful is used - WDYT?
>
> How do you mean? :-) We do set acceptLanguage based on the locale which affects both headless/headful?

Correct, and that leads to the wrong order in the Accept-Language header. So in case of headful it's probably better to skip that and use the preference setting for language instead (same effect but not messing up the headers).

@berstend
Owner

berstend commented Feb 1, 2021

Yeah, that makes sense. I'm a bit concerned as the user-data-dir plugin is quite old and not as well tested, but overall it should do the trick :-) It'd be good to detect if we're using connect, so we can fall back to the old method to set the language header.

@Niek
Collaborator Author

Niek commented Feb 1, 2021

> Yeah, that makes sense. I'm a bit concerned as the user-data-dir plugin is quite old and not as well tested, but overall it should do the trick :-) It'd be good to detect if we're using connect, so we can fall back to the old method to set the language header.

I pushed some changes in 04445d5: it always sets the preferences - which are ignored on headless. On headless, it uses the acceptLanguage parameter.

Good point about connect(), I pushed a fix for that in fe37796
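
The language handling described in this comment could be sketched roughly as follows. This is an illustration under assumptions, not the code from 04445d5 or fe37796: `planLanguageOverride` and its `headless` and `connected` flags are hypothetical names, while `intl.accept_languages` is the Chrome preference key and `acceptLanguage` is the header override parameter mentioned in the thread.

```javascript
// Hypothetical sketch: decide how a locale should be applied for a given
// launch mode. Not the plugin's actual implementation.
function planLanguageOverride({
  locale = 'en-US,en',
  headless = true,
  connected = false
} = {}) {
  // The preference is always prepared; per the discussion above it is
  // ignored by headless browsers.
  const userPreferences = { intl: { accept_languages: locale } }

  // Fall back to the header override where preferences cannot take effect:
  // headless launches, or an already running browser reached via connect().
  const needsHeaderOverride = headless || connected

  return {
    userPreferences,
    acceptLanguage: needsHeaderOverride ? locale : undefined
  }
}

console.log(planLanguageOverride({ locale: 'de-DE,de', headless: false }))
```

Returning a plain description of the plan, rather than applying it directly, keeps the decision testable without launching a browser.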

@berstend
Owner

berstend commented Feb 1, 2021

Looks good! Do we want to fix the Linux use-case before merging this in? I can't think of a scenario where it'd be desired to signal Linux as the platform 🤔

A quick fix could be:

  • Detect if the UA has parentheses and is Linux (search for Linux in the UA string) - examples
  • If Linux, then replace the contents of the first parentheses with Windows 10 data (Windows NT 10.0; Win64; x64)
  • Make this opt-out with a maskLinux: true option or so
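
The three steps above can be sketched as a small pure function. This is a minimal illustration, assuming the option is called `maskLinux` as proposed; `maskLinuxInUserAgent` is a hypothetical name and the sample user agent is only an example.

```javascript
// Hypothetical sketch of the proposed quick fix: swap the platform details
// of a Linux user agent for Windows 10 data, unless the caller opts out.
function maskLinuxInUserAgent(ua, { maskLinux = true } = {}) {
  if (!maskLinux || !ua.includes('Linux')) {
    return ua
  }
  // Without the "g" flag only the first parenthesized group is replaced,
  // which is where the platform details live. A UA without parentheses is
  // returned unchanged.
  return ua.replace(/\(([^)]+)\)/, '(Windows NT 10.0; Win64; x64)')
}

const linuxUa =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36'

console.log(maskLinuxInUserAgent(linuxUa))
console.log(maskLinuxInUserAgent(linuxUa, { maskLinux: false }))
```

Because only the first group is touched, the browser version stays in sync with the real browser, which avoids the outdated hardcoded UA strings mentioned later in the thread.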

WDYT?

@Niek
Collaborator Author

Niek commented Feb 1, 2021

Hmmm, while it definitely makes sense UX-wise, it can also be confusing. I assume that many users run macOS or Windows to test and Linux to deploy, so having the default of this evasion act differently depending on the host is a bit strange IMHO. Maybe it makes more sense to output a big fat warning in case the UA contains Linux, something along the lines of "Warning: your user agent signals the Linux OS. This could be a signal to some sites to block your browsing session."

In any case it's an improvement to the situation before this PR, where the default was to use the host UA + hardcoded Win32 platform.

@berstend
Owner

berstend commented Feb 1, 2021

Replacing Linux as the platform has been the default since the first version of this evasion (in a broken way though lol) :-)

There's still no good UX story here when upgrading to this version of the evasion and running things in a docker container (and wanting to hide that) - it'd still be necessary to provide a full UA string which is hard to do when initializing the plugin and if hardcoded will result in outdated/mismatching version numbers quickly 🤔

@berstend
Owner

berstend commented Feb 1, 2021

Small thing: we might want to add user-preferences to the stealth dependencies, or things will initially fail for users

@Niek
Collaborator Author

Niek commented Feb 1, 2021

> Replacing Linux as the platform has been the default since the first version of this evasion (in a broken way though lol) :-)

Are you sure about that? I can't find any place where "Linux" in the UA is replaced with "Windows NT XX" for example. Only the platform value was hardcoded to Win32, but the UA still signaled Linux or macOS or whatever the user has. This was initially one of my main reasons to fix the evasion - some people complained about the mismatch between the UA string and navigator.platform 😄 edit: I see it's in puppeteer-extra-plugin-anonymize-ua, maybe it makes sense to merge these?

> There's still no good UX story here when upgrading to this version of the evasion and running things in a docker container (and wanting to hide that) - it'd still be necessary to provide a full UA string which is hard to do when initializing the plugin and if hardcoded will result in outdated/mismatching version numbers quickly 🤔

Hmm yes I agree - it's kind of a breaking change. What do you suggest? There's no way to make "required" options, right?

@berstend
Owner

berstend commented Feb 1, 2021

Apologies, I didn't formulate that clearly enough: Yeah, initially we used the anonymize-ua plugin for this stuff, which did anonymize the UA string. Eventually we switched to using CDP methods and ended up doing it half-baked (only spoofing the platform). Not replacing Linux in the UA was an oversight really 😄

Overall I think it'd be good to replace Linux (both in the UA and the platform) by default, with an opt-out (also something we didn't have before) or automatic opt-out when providing a custom UA string.
I feel that would satisfy most use-cases while staying backwards compatible with the previous approach (or at least intent lol).

As for the implementation: Not sure if it's worthwhile pulling in the anonymize-ua plugin for this, as opposed to just copy pasting the necessary code to this evasion. Overall the stealth plugin is used the most, so limiting the number of dependencies (and things that can go wrong) might be a better approach. :-)

@berstend
Owner

berstend commented Feb 1, 2021

I agree that masking Linux is a fair bit "opinionated" though :-)

@berstend
Owner

berstend commented Feb 1, 2021

Let's approach this differently: the previous evasion was broken in behavior (spoofing Win32 platform on macOS as well, etc).

What I would want an ideal/proper user-agent-override evasion to do:

  • Remove notions of Headless from the UA by default
  • Ability to set a custom locale in the best way possible (user prefs vs CDP)
  • Ability to set a completely custom UA string
  • Ability to spoof navigator.platform (automatically based on the UA string is a bonus)
  • Ability to spoof client hints matching all the other data
  • Ability to retain the original UA string/versions (sans headless) while replacing Linux with Windows (also in navigator.platform), with opt-out
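
Two of the items on this list, honoring a fully custom UA string and removing the Headless marker by default, can be illustrated with a short sketch. It rests on assumptions: `resolveUserAgent` and its `userAgent` option are hypothetical names, and the sketch relies on headless Chrome reporting itself as `HeadlessChrome/<version>` in its user agent.

```javascript
// Hypothetical sketch: pick the user agent to present and remove the
// headless marker from it.
function resolveUserAgent(browserUserAgent, { userAgent = null } = {}) {
  // A custom UA string, when provided, takes precedence over the browser's.
  const base = userAgent || browserUserAgent
  // Headless Chrome identifies itself as "HeadlessChrome/<version>".
  return base.replace('HeadlessChrome/', 'Chrome/')
}

const headlessUa =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/88.0.4324.96 Safari/537.36'

console.log(resolveUserAgent(headlessUa))
```

Starting from the browser's own UA keeps the version numbers current, which is the ergonomic concern raised about hardcoded strings earlier in the thread.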

Except for the last thing we're already there 😄 The reason I think masking Linux by default is worthwhile: Everyone running pptr in docker will have to do it anyway, and the current ergonomics (removing the user-agent-override evasion first, etc) are pretty verbose

We eventually will refactor the stealth plugin to allow for a single config object without needing to remove evasions first - if we had that already I wouldn't mind making this behavior opt-in :-)

@Niek
Collaborator Author

Niek commented Feb 1, 2021

OK cool, that sounds good to me. I'll add a spoofWindows boolean option (default true) - I'll spoof it on macOS + Linux I guess (not just Linux).

@berstend
Owner

berstend commented Feb 1, 2021

Sounds good :-) No strong preference regarding macOS > Linux, though I think it'd be fine not spoofing macOS (as it's considered a regular user platform of sorts lol)

@Niek
Collaborator Author

Niek commented Feb 2, 2021

> Sounds good :-) No strong preference regarding macOS > Linux, though I think it'd be fine not spoofing macOS (as it's considered a regular user platform of sorts lol)

Good point - added Linux-only spoofing now with a maskLinux option (default true). I excluded Android devices because if an Android UA is set that's very likely on purpose.
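
The rule described here (mask Linux by default, leave Android alone, allow opting out) could look roughly like the sketch below. It is an illustration and not necessarily the merged code; `shouldMaskLinux` and `applyLinuxMask` are hypothetical helper names, while `maskLinux` is the option named in this comment.

```javascript
// Hypothetical sketch of the decision rule: mask only desktop Linux.
function shouldMaskLinux(ua, { maskLinux = true } = {}) {
  // Android UAs contain "Linux" too, but an Android UA is very likely set
  // on purpose, so it is left untouched.
  return maskLinux && ua.includes('Linux') && !ua.includes('Android')
}

function applyLinuxMask(ua, options) {
  if (!shouldMaskLinux(ua, options)) {
    return ua
  }
  // Replace the first parenthesized group with Windows 10 platform details.
  return ua.replace(/\(([^)]+)\)/, '(Windows NT 10.0; Win64; x64)')
}

const androidUa =
  'Mozilla/5.0 (Linux; Android 10; Pixel 3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.93 Mobile Safari/537.36'

console.log(applyLinuxMask(androidUa) === androidUa) // true: Android is excluded
```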

@berstend
Owner

berstend commented Feb 2, 2021

Looking good 👍

@berstend berstend changed the title from "Add support for UA hints" to "feat(plugin-stealth): Rewrite user-agent-override, add support for UA hints" Feb 2, 2021
@berstend berstend changed the title from "feat(plugin-stealth): Rewrite user-agent-override, add support for UA hints" to "feat(plugin-stealth): Add support for UA hints" Feb 2, 2021
@berstend berstend merged commit 5802c6a into master Feb 2, 2021
@berstend berstend deleted the ua-hints branch February 2, 2021 18:18
@berstend
Owner

berstend commented Feb 2, 2021

Successfully published:
 - puppeteer-extra-plugin-stealth@2.7.3

@berstend
Owner

berstend commented Feb 7, 2021

Since the user-preferences plugin became much more relevant with this change I added tests here: fdae8b0 (#303) (also to check everything works with the rewrite)

@ushuz
Contributor

ushuz commented May 23, 2022

The dependency on user-preferences has some side effects.

user-preferences overrides certain user prefs on each launch, making it pretty hard to persist those prefs when needed. And since it's an implicit dependency, it's hard to debug as well. I had to dig into each evasion's code to find out that the user-agent-override evasion was messing with my user prefs.

Since this dependency is meant to avoid the initial failure (#413 (comment)), it may be better to have an option to avoid overriding existing user prefs, thus avoiding unexpected side effects.

Labels
plugin: puppeteer-extra PuppeteerExtra Plugin related plugin: stealth ㊙️ Detection evasion related

4 participants