(Umbrella) Anti-fingerprinting features #101

Open
4 tasks
berstend opened this issue Dec 10, 2019 · 16 comments
Labels
enhancement New feature or request idea

Comments

@berstend
Owner

berstend commented Dec 10, 2019

The current goal of the stealth plugin is to hide headless browser usage by mocking/spoofing missing functionality in headless to emulate its headful counterpart.

Another nice feature (either as part of stealth or a dedicated plugin) would be to add anti-fingerprinting measures, which could mean that we emulate the most common data, mock certain things in more detail or shuffle data on each request (or triggered by user) to make fingerprinting harder.

Things to look into:

@berstend
Owner Author

We need to keep in mind the basic issue about anti-fingerprinting:

CanvasBlocker actually increases your track-ability because the consistent factor is now that you have a changing canvas fingerprint (which almost no one does).
This is why Safari tries to give a universal canvas fingerprint so you can "blend in" with other users.

(https://news.ycombinator.com/item?id=20054831)

@berstend
Owner Author

Look into this FF addon for inspiration:

https://addons.mozilla.org/en-US/firefox/addon/canvasblocker/

The different block modes are:

  • fake: Canvas Blocker's default setting, and my favorite! All websites not on the white list or black list can use the protected APIs. But values obtained by the APIs are altered so that consistent fingerprinting is not possible.
  • ask for permission: If a website is not listed on the white list or black list, the user will be asked if the website should be allowed to use the protected APIs each time they are called.
  • block everything: Ignore all lists and block the protected APIs on all websites.
  • allow only white list: Only websites in the white list are allowed to use the protected APIs.
  • block only black list: Block the protected APIs only for websites on the black list.
  • allow everything: Ignore all lists and allow the protected APIs on all websites.
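
The block modes above boil down to a small decision table. Here is a minimal sketch of that table, not CanvasBlocker's actual code: the names `BlockMode`, `Decision`, `SiteLists` and `decide` are hypothetical, and the handling of listed sites in the "fake" and "ask" modes (white-listed sites get real values, black-listed sites are blocked) is an assumption inferred from the wording of the descriptions.

```typescript
// Hypothetical names; a sketch of the mode descriptions, not CanvasBlocker's code.
type BlockMode =
  | "fake"
  | "ask"
  | "blockEverything"
  | "allowOnlyWhiteList"
  | "blockOnlyBlackList"
  | "allowEverything";

// What should happen when a site calls a protected API.
type Decision = "allow" | "fake" | "ask" | "block";

interface SiteLists {
  whiteList: Set<string>;
  blackList: Set<string>;
}

function decide(mode: BlockMode, host: string, lists: SiteLists): Decision {
  const white = lists.whiteList.has(host);
  const black = lists.blackList.has(host);

  // These two modes ignore both lists.
  if (mode === "blockEverything") return "block";
  if (mode === "allowEverything") return "allow";

  // These two modes consult exactly one list.
  if (mode === "allowOnlyWhiteList") return white ? "allow" : "block";
  if (mode === "blockOnlyBlackList") return black ? "block" : "allow";

  // Remaining modes: "fake" and "ask".
  // Assumption: listed sites bypass the mode; unlisted sites get it.
  if (white) return "allow";
  if (black) return "block";
  return mode;
}
```

For a stealth-style plugin, only the "fake" branch is really relevant, since a headless browser cannot ask a user for permission.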

Protected "fingerprinting" APIs:

canvas 2d
webGL
audio
history
window (disabled by default)
DOMRect
navigator (disabled by default)

@berstend
Owner Author

Panopticlick's numbers are extremely confusing and borderline useless.

On my initial run, I got an overall entropy of 17.63. My two biggest identifiers were screen resolution (1000x595x24 which was approx 1/22000 browsers) and webgl hash (approx 1/3800 browsers). I fixed screen resolution to 1000x600x24 (approx 1/85 browsers) and disabled webgl hashing (approx 1/6 browsers) and the overall entropy did not change one iota, despite also closing browser, flushing cache and cookies, etc. I gave it another run with a deliberately weird resolution (1420x701 which was something like 1/105000 browsers) and once again, the overall entropy was exactly 17.63. So based on my experiment, it seems that screen resolution and webgl hash have no effect whatsoever on [Panopticlick's] overall entropy score.

An update on last night's experiment, if anyone cares. The next largest identifier was system fonts (approx 1/1300 browsers). I set browser.display.use_document_fonts=0 which hid the system fonts (now the same as approx 1/10 browsers) and my overall entropy dropped to just below 11 bits. At this point, none of the metrics were less common than 1/10 browsers, so I figured I wouldn't be able to do better than that.

As a side note, I ended up re-enabling system fonts because disabling them broke a large percentage of web sites' CSS.
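
For reference, the "1 in N browsers" figures in the quote convert to bits as log2(N). A minimal sketch of that arithmetic follows; the function names are made up for illustration, and nothing here claims to reproduce how Panopticlick computes its overall score.

```typescript
// Surprisal, in bits, of an attribute value shared by 1 in `oneIn` browsers.
function surprisalBits(oneIn: number): number {
  if (!(oneIn >= 1)) throw new RangeError("oneIn must be >= 1");
  return Math.log2(oneIn);
}

// Sum of per-attribute bits. This equals the combined information only if
// the attributes are statistically independent; correlated attributes
// (e.g. platform and fonts) contribute less than their sum.
function independentSumBits(oneInValues: number[]): number {
  return oneInValues.reduce((sum, n) => sum + surprisalBits(n), 0);
}

// Figures taken from the quoted experiment.
console.log(surprisalBits(22000).toFixed(1)); // screen resolution before: 14.4
console.log(surprisalBits(85).toFixed(1)); // screen resolution after: 6.4
```

Because per-attribute bits only add up under independence, an overall score does not have to move by the same amount as any single attribute.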

@berstend berstend mentioned this issue Dec 10, 2019
@berstend berstend added the enhancement New feature or request label Dec 14, 2019
@Vittitow

Vittitow commented Jan 7, 2020

Would love to see this feature. I believe recaptcha v3 is somehow factoring in browser fingerprint when calculating your overall score and this could potentially mitigate that.

@yalexx

yalexx commented Mar 27, 2020

We need a random fingerprint plugin :)

@brunogaspar
Collaborator

I don't mind working on such a plugin, but I would need help, as there are a lot of things that would need to be randomised for it to work effectively.

@StevenVeshkini

StevenVeshkini commented May 28, 2020

I volunteer to help you @brunogaspar ! Instead of trying to make a "common" fingerprint, I think it would be a lot easier to make it possible for each browser instance to have a unique fingerprint (by adding different fonts, etc. )

@itsdarrylnorris
Contributor

@brunogaspar , @StevenVeshkini , I am happy to help out if needed. Do you guys have an idea of what data needs to be randomised, and how?

@evading-bot-detection

evading-bot-detection commented Jul 15, 2020

@brunogaspar , @StevenVeshkini , I am happy to help out if needed. Do you guys have an idea of what data needs to be randomised, and how?

I think the user agent and the webgl vendor and renderer should be picked at random from a list of up-to-date values. Currently there is only a single default value for each, which may get flagged as suspicious precisely because it is the default.

@berstend
Owner Author

berstend commented Jul 15, 2020

It could be nice to separate this into two steps:

  • A more generic browser "persona" or fingerprint data generator, similar in spirit to e.g. faker.js
    • The job of this library is to generate new convincing browser data/fingerprints
    • Ideally with coherent data, e.g. webgl vendor matching the platform
    • There are lists with most commonly used viewports/user-agent as an initial data source
    • An even better data source would be to write a small fingerprint.js utility which will sniff realistic/full fingerprints from a website (can be hosted by a supporter with a bit of traffic)
  • A plugin for puppeteer-extra which will apply this generated data
    • Could potentially be used to seed the stealth plugin as well, otherwise the defaults will be used
    • Devs need some control over when to refresh a fingerprint and which properties to skip (e.g. if the generated locale doesn't fit the proxy geo)
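
A minimal sketch of the generator half of this split, under loud assumptions: `Persona`, `PROFILES`, `generatePersona` and the seeded random helper are hypothetical names, and the sample user agents, GPU strings, viewports and locales are illustrative placeholders rather than a vetted or current data source. The point is only to show coherent picks (user agent and webgl vendor tied to one platform), a seed to control when the fingerprint refreshes, and a skip list.

```typescript
// Hypothetical persona shape; not an existing puppeteer-extra API.
interface Persona {
  platform: string;
  userAgent: string;
  webglVendor: string;
  webglRenderer: string;
  viewport: { width: number; height: number };
  locale: string;
}

interface PlatformProfile {
  platform: string;
  userAgents: string[];
  gpus: { vendor: string; renderer: string }[];
}

// Illustrative sample data only; a real list would come from observed traffic.
const PROFILES: PlatformProfile[] = [
  {
    platform: "Win32",
    userAgents: [
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
    ],
    gpus: [
      { vendor: "Google Inc.", renderer: "ANGLE (Intel(R) HD Graphics 620 Direct3D11 vs_5_0 ps_5_0)" },
    ],
  },
  {
    platform: "MacIntel",
    userAgents: [
      "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36",
    ],
    gpus: [{ vendor: "Intel Inc.", renderer: "Intel Iris OpenGL Engine" }],
  },
];
const VIEWPORTS = [
  { width: 1920, height: 1080 },
  { width: 1366, height: 768 },
  { width: 1440, height: 900 },
];
const LOCALES = ["en-US", "en-GB", "de-DE"];

// Small seeded PRNG (mulberry32-style) so a persona is reproducible per seed.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function pick<T>(rng: () => number, items: readonly T[]): T {
  if (items.length === 0) throw new Error("cannot pick from an empty list");
  return items[Math.floor(rng() * items.length)] as T;
}

// Same seed, same persona; change the seed to "refresh" the fingerprint.
// `skip` drops properties the caller wants to control (e.g. locale).
function generatePersona(
  seed: number,
  skip: ReadonlyArray<keyof Persona> = []
): Partial<Persona> {
  const rng = seededRandom(seed);
  const profile = pick(rng, PROFILES); // platform first, so the rest stays coherent
  const gpu = pick(rng, profile.gpus);
  const result: Partial<Persona> = {
    platform: profile.platform,
    userAgent: pick(rng, profile.userAgents),
    webglVendor: gpu.vendor,
    webglRenderer: gpu.renderer,
    viewport: pick(rng, VIEWPORTS),
    locale: pick(rng, LOCALES),
  };
  for (const key of skip) delete result[key];
  return result;
}
```

Something like `generatePersona(42, ["locale"])` would then leave the locale to whatever matches the proxy geo; applying the result to a page would be the job of the separate plugin.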

This is one of the features most commonly built in-house by companies using puppeteer-extra, often with outdated or hardcoded lists of user-agents mixed with other, non-matching fingerprint data.

It'd be worthwhile to create a higher-quality, reusable plugin here. This would also tie in neatly with a future proxy-manager (luminati/oxylabs) plugin (another thing I've seen being built in-house countless times). ;-)

@itsdarrylnorris
Contributor

Hey @berstend

There are lists with most commonly used viewports/user-agent as an initial data source

I have a primitive version of this for a project that I am working on right now. I could easily abstract it into a separate project, and we could move to use it over here.

@andersonaguiar

To build anti-fingerprinting measures, you need to understand which fingerprinting strategies are in use. There are several nowadays; two of them are:

@Hypnos999

Any update on the anti-fingerprint feature?

@JaneJeon

JaneJeon commented Jan 2, 2022

Wouldn't something like this already do the job? https://github.com/apify/fingerprint-injector

@kaliiiiiiiiii

kaliiiiiiiiii/Selenium-Driverless#207 (comment) might be relevant/helpful regarding the implementation for keyboard layout.

@ComplexProjects

Wouldn't something like this already do the job? https://github.com/apify/fingerprint-injector

Unfortunately, it will only work to some extent. Yes, it will change your device info, but your canvas fingerprint will remain the same across all instances.
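
One way to give each instance its own canvas fingerprint without it changing on every read (the problem raised earlier in the thread) is noise derived from a per-instance seed. The following is a minimal sketch under assumptions: `addSeededNoise` and `seededRandom` are hypothetical helpers, the 10% flip rate is arbitrary, and the code only transforms an RGBA byte buffer such as the `data` of an `ImageData` object. Hooking it into the page's canvas APIs is left out.

```typescript
// Small seeded PRNG (mulberry32-style); same seed gives the same sequence.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Returns a copy of an RGBA pixel buffer with deterministic, barely visible
// noise. The same seed always yields the same output for the same input,
// so the fingerprint is stable within an instance and differs between seeds.
function addSeededNoise(pixels: Uint8ClampedArray, seed: number): Uint8ClampedArray {
  const rng = seededRandom(seed);
  const out = new Uint8ClampedArray(pixels); // copy; the input is not mutated
  for (let i = 0; i + 3 < out.length; i += 4) {
    // For roughly 1 in 10 pixels (arbitrary rate), flip the lowest bit of
    // one colour channel. The alpha channel (offset 3) is never touched.
    if (rng() < 0.1) {
      const idx = i + Math.floor(rng() * 3);
      out[idx] = out[idx] ^ 1;
    }
  }
  return out;
}
```

Whether this helps in practice is a separate question: a noisy canvas that matches no real device can itself be a signal, which is the trade-off the CanvasBlocker quote above describes.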
