Understanding the user data json file #2716

Closed
aryan096 opened this issue Nov 18, 2020 · 12 comments
Labels
documentation & specs question Further information is requested

Comments

@aryan096

Hello!

I am a student working with Privacy Badger's downloadable user data for a project. I need to understand what the JSON file represents and couldn't find any information online. Could someone explain what action_map, settings_map, and snitch_map store?

@ghostwords ghostwords added the question Further information is requested label Nov 18, 2020
@ghostwords
Member

Hello!

I just updated our design document to explain action_map and snitch_map (under Further Details).

As for settings_map, it contains various Privacy Badger settings, both internal (seenComic stores whether the user interacted with the new user welcome page already) and user-facing (typically available on the options page).

Let me know if you have any questions.

@aryan096
Author

Hi! Thank you for the update! I had a quick question: is the info stored in the snitch_map based on the user's activity? It seems to store data about websites that I have never visited. For example, the domain value that occurs the most is informationweek.com, a website I haven't visited.

@ghostwords
Member

ghostwords commented Nov 19, 2020

Privacy Badger includes pre-trained tracker data gathered by Badger Sett. If you enable learning to block trackers from your browsing in options, your Privacy Badger will then build on that pre-trained data as you browse the Web.

You can clear the pre-trained data under the Manage Data tab on the options page.

@ghostwords
Member

ghostwords commented Nov 19, 2020

By the way, there are a few problems with using a stock Privacy Badger to answer questions like which websites have the most trackers.

For example, trackers often load other trackers dynamically. When a tracker that brings in other trackers is blocked, Privacy Badger never sees the trackers that would have been loaded otherwise.

Moreover, snitch_map records only up to three sites per tracker.

So if you want to use Privacy Badger to figure out which sites have the most trackers, which trackers appear on the most sites, etc., you should use a modified Privacy Badger, one that doesn't block requests (nor modify them in any way?), and one that doesn't cap snitch_map entries to three sites.
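To make the snitch_map caveat concrete, here is a minimal Python sketch of the kind of tally discussed above. It assumes the exported file has a top-level snitch_map object that maps each tracker domain to a list of sites it was seen on; that layout and all domain names in the sample are assumptions for illustration, not taken from the design document.

```python
import json
from collections import Counter

# Hypothetical stand-in for an exported user data file.
# Assumed layout: snitch_map maps tracker domain -> list of sites.
sample_export = json.loads("""
{
  "action_map": {},
  "settings_map": {},
  "snitch_map": {
    "tracker-a.example": ["news.example", "shop.example", "blog.example"],
    "tracker-b.example": ["news.example"],
    "tracker-c.example": ["news.example", "shop.example"]
  }
}
""")


def count_trackers_per_site(user_data):
    """Count, for each site, how many snitch_map entries list it."""
    counts = Counter()
    for sites in user_data.get("snitch_map", {}).values():
        # Use a set so a site is counted at most once per tracker.
        counts.update(set(sites))
    return counts


site_counts = count_trackers_per_site(sample_export)
for site, n in site_counts.most_common():
    print(f"{site}: {n}")
```

With a stock Privacy Badger, a tally like this undercounts: each tracker contributes at most three sites, and trackers that were never loaded because their loader was blocked do not appear at all.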

@aryan096
Author

That is really helpful, thank you! I have been looking at the code to figure out how to remove the three sites cap on snitch_map entries, but couldn't really find how to do that. Do you have any suggestions?
Also, if blocking is entirely disabled, do you think the dynamically loaded trackers problem would be solved?

@ghostwords
Member

I think your best bet is to set TRACKING_THRESHOLD to a high number so that Privacy Badger effectively never decides to block anything. This should take care of both the snitch_map cap (entries are capped at TRACKING_THRESHOLD) and the blocking itself: Privacy Badger would no longer block requests ("red" slider), nor modify any requests or deny JS storage access ("yellow" slider).

Don't forget to enable local learning (under General Settings > Advanced) and clear all pre-trained data (under the Manage Data tab).

You may also want to disable sending "Global Privacy Control" and "Do Not Track" signals (under General Settings), as some websites do respect them and serve fewer or no trackers in response.
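As a rough illustration of why raising TRACKING_THRESHOLD addresses both the cap and the blocking, here is a toy Python model. It is not Privacy Badger's actual code: the function `simulate` and the sample domains are hypothetical, and the only behavior it encodes is what this thread states, namely that sites are recorded per tracker up to the threshold, at which point the tracker is marked for blocking.

```python
def simulate(observations, tracking_threshold):
    """Toy model (hypothetical, not Privacy Badger's implementation).

    observations: iterable of (tracker, site) pairs in the order seen.
    Returns (snitch_map, blocked), where snitch_map maps each tracker to
    the sites recorded for it and blocked is the set of trackers that
    reached the threshold.
    """
    snitch_map = {}
    blocked = set()
    for tracker, site in observations:
        sites = snitch_map.setdefault(tracker, [])
        if tracker in blocked:
            # Entry is capped: no further sites are recorded.
            continue
        if site not in sites:
            sites.append(site)
        if len(sites) >= tracking_threshold:
            blocked.add(tracker)
    return snitch_map, blocked


# One tracker observed on five different (made-up) sites.
observations = [("tracker.example", f"site{i}.example") for i in range(5)]

capped_map, capped_blocked = simulate(observations, tracking_threshold=3)
raised_map, raised_blocked = simulate(observations, tracking_threshold=300)

print(len(capped_map["tracker.example"]), capped_blocked)
print(len(raised_map["tracker.example"]), raised_blocked)
```

In this model, the threshold of 3 stops recording after three sites and marks the tracker as blocked, while the threshold of 300 records all five sites and blocks nothing.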

@aryan096
Author

aryan096 commented Nov 28, 2020

Gotcha! Thank you so much. From there, the process to install this modified version should just be to create an xpi file using the manifest.json? (I have no experience with extensions haha)
EDIT: Never mind, I found the develop doc! Thanks a lot for your help!

@aryan096
Author

Hey! So it seems like raising the threshold doesn't actually do anything. PB still blocks trackers after 3 sites. I commented out the code on lines 331-335 in heuristicblocking.js and nothing changed. Do you have any more pointers?

@ghostwords
Member

Did you update TRACKING_THRESHOLD to, let's say, 3000000 and then reload the unpacked extension? You have to reload the extension after every code change you make, or your changes do not get applied.
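One way to sanity-check that a raised threshold is actually in effect is to export the user data after some browsing and look for trackers recorded on more than three sites. The Python sketch below assumes the same snitch_map layout as the export (tracker domain mapped to a list of sites); the helper name `trackers_over_cap` and the sample data are hypothetical.

```python
STOCK_CAP = 3  # per this thread: stock builds record up to three sites per tracker


def trackers_over_cap(user_data, cap=STOCK_CAP):
    """Return tracker domains whose snitch_map entry lists more than `cap` sites."""
    snitch_map = user_data.get("snitch_map", {})
    return sorted(t for t, sites in snitch_map.items() if len(sites) > cap)


# Hypothetical export contents from a stock build and a modified build.
stock_export = {
    "snitch_map": {
        "tracker-a.example": ["s1.example", "s2.example", "s3.example"],
    }
}
modified_export = {
    "snitch_map": {
        "tracker-a.example": ["s1.example", "s2.example", "s3.example", "s4.example"],
        "tracker-b.example": ["s1.example"],
    }
}

print(trackers_over_cap(stock_export))     # no entry exceeds the stock cap
print(trackers_over_cap(modified_export))  # tracker-a.example exceeds it
```

An empty result after browsing many sites would suggest the stock cap is still being applied, for example because the code change was not reloaded or a different copy of the extension is the one running.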

@aryan096
Author

I did. I am using Chrome, and I load the src every time I change something. I can try Firefox to see if that is any different.

@ghostwords
Member

The browser shouldn't make a difference. I think there is something off with your workflow. Did you clear the pre-trained data, for example?

If you're still having trouble, post the exact steps you followed and we could go from there. For example:

  1. git clone https://github.com/EFForg/privacybadger.git
  2. Edited constants.js to set TRACKING_THRESHOLD to 300
  3. Loaded the unpacked extension into Chrome
  4. Clicked "Remove all" under Manage Data on the options page
    ...

@aryan096
Author

This is probably the silliest mistake I have made. I had a packed release of PB installed on Chrome, which was overriding the unpacked one I installed. It works as intended now. Thank you, and sorry for the confusion!!
