Feature Request: Radio Silence by Default for Browser Startup and Background Connections aka "Disable Phone Home" #1807

Closed
adrelanos opened this issue Feb 14, 2024 · 2 comments

@adrelanos

Introduction

Radio Silence: No background connections should be initiated without the user's knowledge and explicit consent. If the user opens the browser and does not visit a website, no data should be transmitted. In my opinion, this should be the core philosophy of a privacy-focused browser. The concept of 'Radio Silence' embodies this principle, ensuring that users have complete control over their data and privacy. Others have advocated a similarly stringent approach to user privacy. Mullvad Browser expresses the same sentiment in its blog post: Telemetry: don’t let your browser ‘call home’ and betray you.

Put simply, for the user Radio Silence means "no phone home". This should be the browser's default setting.
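To make the principle concrete, here is a minimal sketch of a Radio Silence policy check. It assumes a hypothetical request object with an `initiator` field ("user" for actions the user triggered, anything else for background tasks) and a hypothetical set of features the user has opted into; none of these names come from Firefox or any other existing browser.

```javascript
// Hypothetical sketch: a Radio Silence policy gate.
// `initiator` and `feature` are illustrative fields, not a real browser API.

/**
 * Decide whether an outgoing connection may proceed.
 * @param {{initiator: string, feature?: string}} request
 * @param {Set<string>} consentedFeatures features the user explicitly enabled
 * @returns {boolean}
 */
function allowConnection(request, consentedFeatures) {
  // Connections the user triggered directly (e.g. typing a URL) are allowed.
  if (request.initiator === "user") {
    return true;
  }
  // Background connections are blocked by default and only allowed
  // for features the user has explicitly opted into.
  return request.feature !== undefined && consentedFeatures.has(request.feature);
}

// Usage: with no consent given, startup background traffic stays silent.
const consent = new Set();
console.log(allowConnection({ initiator: "user" }, consent)); // true
console.log(allowConnection({ initiator: "background", feature: "telemetry" }, consent)); // false
```

The design choice is deny-by-default: a background feature has to be named and consented to before it may connect, rather than being allowed until someone finds and disables it.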

LibreFox (now abandoned) also had this feature.

IJWY (I Just Want You To Shut Up)

This is a set of settings that aims to remove all the server links embedded in Firefox and other calling-home functions, for the purpose of blocking unneeded connections. The objective is zero unauthorized connections (ping/telemetry/Mozilla/Google...).

// Section : IJWY To Shut Up
// I Just Want You To Shut Up : Closing all non necessary communication to mozilla.org etc.
//                              Those settings are not used in gHacks for the moment.
//                              Will be upstreamed once stable in final version.
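For illustration only, a few prefs in this spirit might look like the partial user.js fragment below. It is not LibreFox's actual IJWY section (which is not reproduced here) and is far from exhaustive. The pref names are Firefox about:config prefs, but prefs are renamed and removed over time, so each one should be checked against the Firefox version in use.

```javascript
// Partial, illustrative user.js fragment; NOT the original IJWY section.
// Verify each pref in about:config for your Firefox version.
user_pref("datareporting.healthreport.uploadEnabled", false); // telemetry/health data upload
user_pref("toolkit.telemetry.enabled", false);
user_pref("toolkit.telemetry.unified", false);
user_pref("app.normandy.enabled", false);                      // remote studies and pref rollouts
user_pref("app.shield.optoutstudies.enabled", false);
user_pref("network.captive-portal-service.enabled", false);    // captive portal detection requests
user_pref("network.connectivity-service.enabled", false);      // connectivity check requests
```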

Unfortunately, after LibreFox was deprecated, the developers of LibreWolf (a fork of LibreFox) removed this feature.

This is a development goal / feature request for users who want privacy not only from arbitrary websites but also from the browser vendor, Mozilla.

Connections upon browser startup

See these screenshots of OpenSnitch. They were taken with LibreWolf, but the results would be similar for user.js. Since I first wanted to find out whether "no phone home" would be considered a worthwhile development goal, I have not yet re-created this test with user.js specifically; that could be done later if there is interest.

This happens just after starting the browser.

[Screenshot: librewolf-connections-at-startup-1]

[Screenshot: librewolf-connections-at-startup-2]

Ironically, Firefox sends ~13 tracking signals to tracking-protection.cdn.mozilla.net.
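A figure such as "~13 requests to one host" comes from tallying a firewall log by destination. Below is a minimal sketch, assuming a hypothetical log format of one `<timestamp> <process> <host>:<port>` entry per line (OpenSnitch's real export format differs). The hosts in the usage example are placeholders, not the connections captured in the screenshots.

```javascript
// Hypothetical sketch: tally outgoing connections per destination host.
// Assumes one "<timestamp> <process> <host>:<port>" entry per line;
// this is an illustrative format, not OpenSnitch's export format.

/**
 * @param {string} log newline-separated connection entries
 * @returns {Map<string, number>} connection count per host, most frequent first
 */
function countConnectionsByHost(log) {
  const counts = new Map();
  for (const line of log.split("\n")) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 3) continue; // skip blank or malformed lines
    const endpoint = fields[2];
    // Strip a trailing ":port" (if present) to keep only the host name.
    const colon = endpoint.lastIndexOf(":");
    const host = colon === -1 ? endpoint : endpoint.slice(0, colon);
    counts.set(host, (counts.get(host) ?? 0) + 1);
  }
  // Sort by descending count so the chattiest hosts appear first.
  return new Map([...counts.entries()].sort((a, b) => b[1] - a[1]));
}

// Usage with placeholder data: example.com -> 2, example.net -> 1
const sampleLog = [
  "08:00:01 browser example.com:443",
  "08:00:01 browser example.net:443",
  "08:00:02 browser example.com:443",
].join("\n");
console.log(countConnectionsByHost(sampleLog));
```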

Why are these connections a privacy issue?

Let's consider the implications of a simple network connection from an IP address and its potential impact on user privacy:

  1. Data Conveyed by IP Connections: Companies like Google, Mozilla, and Cloudflare receive the IP addresses of users every time a browser (primarily Firefox or Chromium-based) is launched, because launching a browser generally triggers connections from the user's IP address. This action is typically human-driven, suggesting that someone is active at that IP location.

  2. Behavioral Patterns: Regular habits, like opening a browser at 7:00 AM to read news, create recognizable patterns. Any deviation from this routine, like not opening the browser at the usual time, could indicate a change in the user's behavior or circumstances. This kind of personal detail is none of these companies' business.

  3. Browser Usage as a Routine Activity: Most people tend to start a browser as one of the first activities when they use a computer. The only exception might be if the computer is left on with the browser continuously open.

  4. IP Addresses and User Identification: Services like Google, which receive connections from numerous browsers worldwide, might be able to convert an IP address from just "some computer or network" to a "particular user." This capability depends on factors like Network Address Translation (NAT) and the extensive databases that can link IP addresses to complete user information.

  5. Determining Individual Users: Even in cases of shared IP addresses, metadata sent with browser queries can help in distinguishing individual users from a small set of potential users. Cloud providers have technologies to differentiate between real persons and bots.

  6. The Nature of 'Pings': Imagine if these companies openly stated, "We want to collect everyone's IP address each time they start their browser." That would sound creepy, yet it is effectively what is happening. A simple ping sent each time a browser starts can equate to telling a company, "Hey, it's me [User's Name], currently at [specific location], using [specific computer]." Over time, this data reveals patterns of behavior, like where a person lives and works, their habits, and their social interactions.

  7. Statistical Analysis: Even when individual data points are sporadic or imprecise, statistical analysis over time can reveal a lot about a user.

  8. Privacy Concerns and Trust Issues: There's a general distrust towards companies regarding their use of user data. The safest assumption is that they will utilize whatever data they receive to its fullest extent, regardless of their public statements on privacy.

  9. Protecting Privacy: The only way to ensure privacy is to avoid sending sensitive data in the first place. Pings that are specific to certain user events and carry uniquely identifiable information are extremely detrimental to privacy.

  10. Geolocation and Device Mapping: Open Source and private databases can convert IP addresses to near-exact geographical positions. For instance, Android phones with "enhanced" GPS actively scan and map WiFi and Bluetooth networks, which can lead to an invasion of privacy on a massive scale. This is mentioned because it is prudent to assume that all of this data is being merged.

  11. The Siege of Modern Technology: The widespread use of devices that constantly scan for other devices, akin to wardriving, means companies like Google have extensive data on our movements and interactions.

  12. No Harmless Network Connections: All outgoing signals and network connections provide specific information about the user. This data, when correlated with information in massive databases, can reveal intimate details about a person's life.

  13. Implications for Technology Users: The reality that every web interaction divulges significant information about a user challenges the notion of deploying technology that automatically does things for the user without their explicit interaction.

  14. Combination with other Tracking Technologies: This data, when combined with other information gathered during normal browsing and browser fingerprinting, becomes quite revealing.

  15. Long-Term Data Collection and Monetization: Such data collection can continue for decades, with dedicated employees constantly thinking of ways to monetize and (ab)use this information, not for the benefit of the user. It's wise to assume that this data will be, or could be, used to its maximum potential.

  16. Preventing Data Misuse: The primary defense is to prevent such data from falling into the wrong hands in the first place. Privacy policies certainly should not be relied upon.

  17. The Only True Privacy Measure: Radio Silence: If privacy is a genuine concern, the only effective approach is complete radio silence. At the very least, disabling default browser signals that are sent upon opening can help protect privacy.
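Points 2, 6 and 7 can be made concrete with a small sketch of what a server receiving one ping per browser start could compute. It assumes, hypothetically, that the server keeps only (IP address, timestamp) pairs, which is less than a real request carries. From those alone it derives each address's usual startup hour and the days on which the routine was broken. The IP addresses are from documentation ranges.

```javascript
// Hypothetical sketch: infer a daily routine from browser-startup pings.
// Input: [{ ip, time }] where time is a Date; only IP and timestamp are used.

/** Per IP: the most common startup hour (UTC) and the sorted days seen. */
function startupRoutine(pings) {
  const byIp = new Map();
  for (const { ip, time } of pings) {
    if (!byIp.has(ip)) byIp.set(ip, { hours: new Map(), days: new Set() });
    const entry = byIp.get(ip);
    const hour = time.getUTCHours();
    entry.hours.set(hour, (entry.hours.get(hour) ?? 0) + 1);
    entry.days.add(time.toISOString().slice(0, 10)); // YYYY-MM-DD
  }
  const result = new Map();
  for (const [ip, { hours, days }] of byIp) {
    // The modal hour is the user's "usual" time to start the browser.
    let usualHour = -1;
    let best = 0;
    for (const [hour, n] of hours) {
      if (n > best) {
        best = n;
        usualHour = hour;
      }
    }
    result.set(ip, { usualHour, days: [...days].sort() });
  }
  return result;
}

/** Days between the first and last observation on which no ping arrived. */
function missedDays(days) {
  const seen = new Set(days);
  const last = new Date(days[days.length - 1] + "T00:00:00Z");
  const missed = [];
  for (let d = new Date(days[0] + "T00:00:00Z"); d <= last; d.setUTCDate(d.getUTCDate() + 1)) {
    const key = d.toISOString().slice(0, 10);
    if (!seen.has(key)) missed.push(key);
  }
  return missed;
}

// Usage: three startups around 07:00 UTC, none on Feb 14.
const pings = [
  { ip: "203.0.113.7", time: new Date("2024-02-12T07:02:00Z") },
  { ip: "203.0.113.7", time: new Date("2024-02-13T07:05:00Z") },
  { ip: "203.0.113.7", time: new Date("2024-02-15T07:01:00Z") },
];
const routine = startupRoutine(pings).get("203.0.113.7");
console.log(routine.usualHour); // 7
console.log(missedDays(routine.days)); // [ '2024-02-14' ]
```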

Excursion on Cloudflare's Capabilities

Cloudflare is relevant because many of the browser's background connections are probably proxied by Cloudflare. The CDN provider Cloudflare is used by millions of websites. [1] [2] This is 18.9% of all websites, or almost 1 in 5. According to https://trends.builtwith.com/cdn/Cloudflare, it is 32% of the top 100K websites. Most websites today use Cloudflare or another CDN (content delivery network) with similar features and tracking capabilities for their anti-spambot, anti-DDoS and performance features. CDNs act as a man-in-the-middle (MiTM) between a website and its visitors.

Here are selected excerpts that illustrate the determination and sophistication of full-time professionals dedicated to advancing tracking technologies. These examples highlight the challenges and complexities faced when developing privacy-preserving software, emphasizing the level of expertise and resources one must contend with in this field.

Quotes from Cloudflare blog posts:

By adding multi-user IP address detection to Cloudflare products, we're improving the quality of our detection techniques and reducing false positives for our customers.

There are some tradeoffs to this approach: some users may use multiple web browsers and some other users may have exactly the same user agent. Nevertheless, past research has shown that the number of unique web browser user agents is the best tradeoff to most accurately determine CG-NAT usage.

When an Internet user visits a website, the underlying TCP stack opens a number of connections in order to send and receive data from remote servers. Each connection is identified by a 4-tuple (source IP, source port, destination IP, destination port). Repeating requests from the same web client will likely be mapped to the same source port, so the number of distinct source ports can serve as a good indication of the number of distinct client applications. By counting the number of open source ports for a given IP address, you can estimate whether this address is shared by multiple users.

User agents provide device-reported information about themselves such as browser and operating system versions. For multi-user IP detection, you can count the number of distinct user agents in requests from a given IP.
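The counting heuristic quoted above is simple to express. Here is a minimal sketch, assuming a hypothetical request record `{ ip, srcPort, userAgent }` and an arbitrary threshold; Cloudflare's actual implementation, thresholds and additional signals are not reproduced here.

```javascript
// Hypothetical sketch of the multi-user IP heuristic described above:
// count distinct source ports and distinct user agents per source IP.

/**
 * @param {{ip: string, srcPort: number, userAgent: string}[]} requests
 * @param {number} uaThreshold distinct user agents above which an IP is
 *   treated as shared (illustrative value, not Cloudflare's)
 */
function classifyIps(requests, uaThreshold = 3) {
  const stats = new Map();
  for (const { ip, srcPort, userAgent } of requests) {
    if (!stats.has(ip)) stats.set(ip, { ports: new Set(), agents: new Set() });
    const s = stats.get(ip);
    s.ports.add(srcPort);
    s.agents.add(userAgent);
  }
  const result = new Map();
  for (const [ip, { ports, agents }] of stats) {
    result.set(ip, {
      distinctPorts: ports.size, // rough proxy for distinct client applications
      distinctAgents: agents.size, // rough proxy for distinct browsers/devices
      likelyShared: agents.size > uaThreshold,
    });
  }
  return result;
}

// Usage: repeated requests from one client do not look like a shared IP.
const observed = [
  { ip: "192.0.2.10", srcPort: 51000, userAgent: "UA-1" },
  { ip: "192.0.2.10", srcPort: 51000, userAgent: "UA-1" },
];
console.log(classifyIps(observed).get("192.0.2.10").likelyShared); // false
```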

Our service also uses other publicly available data sources to further refine the accuracy of our identification and to classify the type of multi-user IP address. For example, we collect data from PeeringDB, which is a database where network operators self-identify their network type, traffic levels, interconnection points, and peering policy. This data only covers a fraction of the Internet's autonomous systems (ASes). To overcome this limitation, we use this data and our own data (number of requests per AS, number of websites in each AS) to infer AS type. We also use external data sources such as IRR to identify requests from VPNs and proxy servers.

One method we used was running traceroute queries through RIPE Atlas, from each RIPE Atlas probe to the probe's public IP address. By examining the traceroute hops, we can determine if an IP is behind a CG-NAT or another middlebox.
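The traceroute method relies on the fact that carrier-grade NAT commonly uses the shared address space 100.64.0.0/10 reserved by RFC 6598. Below is a minimal sketch, assuming the traceroute hops are already available as a list of IPv4 strings. Real CG-NAT deployments may also use private (RFC 1918) ranges, so this check is only one possible signal, not Cloudflare's method.

```javascript
// Hypothetical sketch: flag a traceroute path that passes through
// RFC 6598 shared address space (100.64.0.0/10), a common CG-NAT sign.

/** Parse a dotted-quad IPv4 address into a non-negative integer, or null. */
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet; // plain arithmetic avoids signed 32-bit bitwise ops
  }
  return value;
}

/** True if `ip` lies in the block base/prefixLength. */
function inCidr(ip, base, prefixLength) {
  const addr = ipv4ToInt(ip);
  const net = ipv4ToInt(base);
  if (addr === null || net === null) return false;
  const blockSize = 2 ** (32 - prefixLength);
  return Math.floor(addr / blockSize) === Math.floor(net / blockSize);
}

/** True if any hop lies in 100.64.0.0/10. */
function pathSuggestsCgNat(hops) {
  return hops.some((hop) => inCidr(hop, "100.64.0.0", 10));
}

// Usage: the second hop is in shared address space.
console.log(pathSuggestsCgNat(["192.168.1.1", "100.72.3.1", "203.0.113.9"])); // true
```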

  • Cloudflare Bot Management: machine learning and more (product)
    • machine learning: "Cloudflare's Machine Learning trains on a curated subset of hundreds of billions of requests per day to create a reliable bot score for every request."
    • heuristic engine
    • behavioral analysis: "Cloudflare analyzes behavior and detects anomalies in your Internet property's specific traffic, scoring every request by how different it is from the baseline."
    • verified bots
    • (JS) fingerprinting: "Cloudflare uses fingerprinting from millions of Internet properties to accurately classify bots. They do not generate or store device fingerprints, eliminating the risk of user privacy being compromised."

Also:

  • Characteristics of the home router and its firmware version, which users often do not update (or for which the manufacturer provides no updates).
  • Characteristics of the user's network hardware and operating system.

It's reasonable to assume that Cloudflare takes advantage of the pings sent by nearly all browsers globally at the start of each browsing session.

JS Fingerprinting

When it comes to Bot Management detection quality it's all about the signal quality and quantity. All previously described detections use request attributes sent over the network and analyzed on the server side using different techniques. Are there more signals available, which can be extracted from the client to improve our detections?

As a matter of fact there are plenty, as every browser has unique implementation quirks. Every web browser graphics output such as canvas depends on multiple layers such as hardware (GPU) and software (drivers, operating system rendering). This highly unique output allows precise differentiation between different browser/device types. Moreover, this is achievable without sacrificing website visitor privacy as it's not a supercookie, and it cannot be used to track and identify individual users, but only to confirm that request's user agent matches other telemetry gathered through browser canvas API.
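The last quoted claim, that canvas output is used "to confirm that request's user agent matches other telemetry", amounts to a consistency check. Here is a minimal server-side sketch, assuming the client has already reported its canvas output and a claimed browser family; the table of known hashes is hypothetical and would in practice have to be learned from large volumes of traffic.

```javascript
// Hypothetical sketch: check that a client's canvas rendering hash is one
// previously observed for the browser family its user agent claims.
const { createHash } = require("node:crypto");

/** Hash raw canvas output (e.g. a data URL string) with SHA-256. */
function canvasHash(canvasData) {
  return createHash("sha256").update(canvasData, "utf8").digest("hex");
}

/**
 * @param {string} claimedFamily browser family parsed from the User-Agent
 * @param {string} canvasData canvas output reported by client-side JS
 * @param {Map<string, Set<string>>} knownHashes family -> hashes seen before
 * @returns {boolean} true if the rendering is consistent with the claim
 */
function userAgentMatchesCanvas(claimedFamily, canvasData, knownHashes) {
  const seen = knownHashes.get(claimedFamily);
  return seen !== undefined && seen.has(canvasHash(canvasData));
}

// Usage with placeholder data.
const knownRenderings = new Map([["firefox", new Set([canvasHash("render-A")])]]);
console.log(userAgentMatchesCanvas("firefox", "render-A", knownRenderings)); // true
console.log(userAgentMatchesCanvas("chrome", "render-A", knownRenderings)); // false
```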

[1] https://community.cloudflare.com/t/statistically-speaking-whats-the-percentage-of-total-sites-that-use-cf/372054
[2] https://blog.cloudflare.com/application-security/

Excursion on Mobile Phone Data Harvesting

  • Google operates servers dedicated to testing Android connectivity.
  • Android devices routinely send requests to these servers in response to certain events related to phone connectivity. Although the specifics of these events may not be widely known, it is common knowledge that these devices frequently "call home" to verify connectivity.
  • Through this process, Google collects user data, including IP addresses and other personal information.
  • It is prudent to assume that data brokers engage in daily transactions, buying and selling this kind of information.
  • Phones harvest data on a massive scale.

Conclusions from this data:

  • Google is most likely capable of associating an IP address with a specific individual in many cases.
  • By analyzing the timing and frequency of these requests, especially when they correlate with specific user events, the recipient of this information can deduce significant insights into user behavior.

It is prudent to assume that this data is (or at least can be) combined with what browsers reveal during startup due to the default background connections. This is why it is mentioned here.

Blocklist Downloads as a Security Risk

Downloading blocklists from web servers all over the place increases the attack surface and makes the user more vulnerable to targeted attacks. There is a risk that an attacker could compel or hack Mozilla into selectively targeting particular users with malware. Parsing downloaded blocklists is itself a security risk, because string-parsing code is a frequent source of exploitable bugs.

Likelihood of Complex Targeted Attacks

The attacker behind the recent iPhone backdoor was not necessarily state-sponsored. I highly recommend the video Operation Triangulation: What You Get When Attacking iPhones of Researchers because it:

  • shows the lengths attackers go to and how much effort they spend to compromise user services.
  • serves as a reminder of how important it is to keep the code for connection handling and string parsing as small as possible.

Compromising a browser is a high reward for the attacker and a high risk for the user.

But Mozilla is trusted?

Even if you trust Mozilla and other servers being contacted, any good actor with lots of trust and privileges can potentially be transformed into a bad actor at any point. So, let's reduce the attack surface.

Local Blocklists as a Security Feature

The blog post Revocation Checking is Pointless explains that a man-in-the-middle capable of swapping out certificates is obviously also capable of blocking or redirecting revocation list downloads. Therefore, if these lists were available locally and updated through the normal system package manager, there would be no additional reliance on yet another web service to provide them.

But the phone home features are part of the normal browser operation?

There is far too much browser source code: literally millions of lines. Has all the phone home code been thoroughly reviewed? Does the community have the ability to do so? Even if someone had that capability, they would need to keep up with all the constant changes.

Tickets such as user.js: Can't stop Firefox background connections show how difficult it is to disable all browser phone home features. If the community even has trouble with that, why would one assume that the browser's phone home features are free of privacy and/or security issues? The safe choice is to disable these features by default.

But this is anonymized data only?

Please refer to the blog post Commercial mass surveillance: The collected data can’t be kept anonymous, which states: "Organizations that collect data often claim it’s anonymous. Research shows this is impossible."

Open Source in Name Only

In my opinion, downloading many blocklists is Open Source in name only, not in spirit. Since the blocklists live in different repositories, there is no audit trail in the source code versioning system (git) showing which exact blocklist file was shipped with a particular browser version. Perhaps all the needed lists should be in a separate repository or package that is updated more frequently than the browser itself. Then everything would be in a reasonable number of places instead of being downloaded by each user from many different servers all the time.
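One way to obtain the audit trail described above would be to record, in the browser's own repository, a SHA-256 digest for each blocklist shipped with a release, and to reject any list that does not match. Here is a minimal sketch, assuming a hypothetical manifest object that maps list names to hex digests; the list name and contents are placeholders.

```javascript
// Hypothetical sketch: verify bundled blocklists against digests pinned
// in version control, so each release maps to exact list contents.
const { createHash } = require("node:crypto");

/**
 * @param {Record<string, string>} manifest list name -> expected SHA-256 (hex)
 * @param {Record<string, string>} lists list name -> file contents as shipped
 * @returns {string[]} names of lists that are missing or do not match
 */
function verifyBlocklists(manifest, lists) {
  const failures = [];
  for (const [name, expected] of Object.entries(manifest)) {
    if (!Object.prototype.hasOwnProperty.call(lists, name)) {
      failures.push(name); // pinned in the manifest but not shipped
      continue;
    }
    const actual = createHash("sha256").update(lists[name], "utf8").digest("hex");
    if (actual !== expected) failures.push(name);
  }
  return failures;
}

// Usage with placeholder data.
const sha256 = (s) => createHash("sha256").update(s, "utf8").digest("hex");
const pinned = { "trackers.txt": sha256("example.com\n") };
console.log(verifyBlocklists(pinned, { "trackers.txt": "example.com\n" })); // []
console.log(verifyBlocklists(pinned, { "trackers.txt": "tampered\n" })); // [ 'trackers.txt' ]
```

Because the manifest lives in git next to the browser source, every release tag pins the exact list contents it was built with.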

Many file downloads also complicate software forks, which would not only have to provide the software but also host web services for all the different downloaded files.

The ideal state would be if the browser was fully set up after initial installation, not requiring any additional background connections.

The Principled Approach

Even if one disregards all privacy and security arguments, a user may simply not want their browser to connect to GitHub, Microsoft, Cloudflare, or similar services upon startup. The question then arises: How difficult can it be to accommodate such a preference?

Practical Implementation

From a practical standpoint, implementing this feature would require bundling all necessary components for a basic startup with the browser itself. This approach ensures that the browser does not need to make external connections to retrieve essential resources during or immediately after installation.

Centralization vs. Decentralization in Privacy

The debate on whether to download filter lists from multiple sources encapsulates a larger conversation about centralization versus decentralization related to privacy. While decentralization offers the advantage of spreading trust among a variety of entities, it paradoxically introduces complexities into privacy management. Users find themselves in the challenging position of evaluating the credibility of numerous parties, each with their own privacy philosophies.

In contrast, centralization through a transparent, unified entity with clearly defined privacy and value policies offers a more streamlined approach. A project like this one could serve as a trusted intermediary, aligning disparate privacy standards into a cohesive, dependable privacy framework. The introduction of a Radio Silence feature, designed to minimize unsolicited background communications, represents a significant step towards achieving a centralized privacy model.

Distributing and updating filter lists through the browser vendor’s own software delivery mechanisms—rather than relying on third-party downloads—simplifies the process considerably. This approach necessitates securing and auditing only one updating mechanism: the system's package manager, thereby streamlining the security and trust model.

Disclaimer

These views are my personal opinions only.

Conclusion

The website privacytests.org highlights a crucial issue: despite significant efforts invested in enhancing privacy features, browsers continue to expose users to centralized services upon each launch. The Radio Silence feature represents an attempt to address this concern by offering users greater control over their privacy. Its implementation poses challenges, yet it embodies an ambitious goal to elevate privacy standards in web browsing.

Feasibility and Importance

While the implementation of Radio Silence could be considered a daunting task, potentially verging on impractical, its significance cannot be overstated for a browser committed to respecting user privacy. An honest acknowledgment of the feature's complexity, coupled with an invitation for community contributions ("Patches welcome"), might be a pragmatic approach to advancing this initiative.

Question for Consideration

Given the complexities and potential benefits, do you agree that pursuing the Radio Silence feature is a laudable goal for a browser dedicated to enhancing user privacy?

@Thorin-Oakenpants
Contributor

https://codeberg.org/librewolf/issues/issues/1779

I'm not going to bother explaining why I'm closing this as invalid, but I suggest you get a better handle on what privacy and security are

@arkenfox arkenfox locked and limited conversation to collaborators Feb 14, 2024
@Thorin-Oakenpants
Contributor

just going to link back to Kicksecure/security-misc#192 so people can follow along
