Unauthenticated Firehose API exposes user data without consent and likely violates privacy regulations #3166

Open
adamedx opened this issue Dec 3, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@adamedx

adamedx commented Dec 3, 2024

The Firehose API provides a mechanism to obtain updates to the protocol stream via websockets. With the BlueSky app, this currently means that anyone can, without signing in to any application or agreeing to follow privacy regulations, see the BlueSky identifier (as a DID) of every person who posts, the time they posted, and even the content of the post, along with other metadata.

This allows negligent and / or hostile parties to cause harm in addition to violating laws, and because the API allows anonymous access, there is no means for BlueSky or regulators to stop harmful actions, uphold laws, or provide recourse for victims.

To fix this, access to any personal data or customer content must be authenticated, and anyone granted access must accept an enforceable agreement to follow any relevant privacy policies, including applicable regulations like GDPR, the Washington State My Health My Data Act, and the California Consumer Privacy Act (CCPA).

Harms that are currently viable via this unauthenticated API include:

  • BlueSky's privacy policy and terms of service are effectively meaningless: even if BlueSky itself handles data according to policy, anyone in the world can use the unauthenticated API to copy all users' data anywhere. BlueSky may as well not have a privacy policy, since such copiers cannot even be tracked when access is anonymous!
  • Mass cyberstalking BlueSky account users, including monitoring their activity via times that they post, the people they talk to, and learning their current or upcoming physical location
  • Mass harassment of people suspected of exercising their reproductive rights by looking for posts indicating particular health conditions
  • Identification of large groups of people as members of the LGBTQIA+ community in order to fuel mass harassment campaigns
  • State actors can perform all of the above and may use this to undermine the security of sovereign nations or track political dissidents

It should be noted that privacy regulations do apply at the individual level, so even when unauthenticated access is allowed for a single BlueSky account, this likely violates such regulations. But even outside of regulations, the availability of such an API at scale poses a significant risk in its own right. Comparable services require app developers to obtain authenticated access and agree to proper uses and at times even enforce correct behavior by cutting off access when policies are violated; at a minimum BlueSky should follow their lead.

Please let me know if you need help testing the fix!

To Reproduce

Steps to reproduce the behavior:

  1. Build an app that uses a websocket to access either the CBOR-based com.atproto.sync.subscribeRepos feed or the (much simpler) JSON-based JetStream feed and outputs all events. Sample app given below
  2. Run the app!

Expected behavior

Expect the app to fail because I didn't sign in. That's not what happened -- it let me see all the posts in BlueSky, likes, follows, etc. I didn't even have to be a BlueSky user to see it!

Details

  • Operating system: Windows (and Linux, and Mac, probably AmigaOS and Sinclair Basic if it had enough RAM)
  • Node version: I used .NET actually, not Node; see below

Additional context

Here's the code I threw together for .net (Windows / Linux / Mac):

using System.Text;
using System.Net.WebSockets;

Uri uri = new("wss://jetstream1.us-west.bsky.network/subscribe");

using ClientWebSocket ws = new();
await ws.ConnectAsync(uri, default);

Exception? readFault = null;
var closed = false;
int maxMessages = 100;
int currentMessages = 0;
var bytes = new byte[65536];

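// Stop on the message limit, a close frame, or a read fault (otherwise a fault would leave the loop reading forever)
while ( currentMessages < maxMessages && ! closed && readFault is null )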
{
    var messageString = "";
    var messageComplete = false;

    // Collect raw bytes first: a multi-byte UTF-8 character can be split across frames
    using var messageBytes = new System.IO.MemoryStream();

    while ( ! messageComplete )
    {
        var result = await ws.ReceiveAsync(bytes, default);

        if ( result.MessageType == WebSocketMessageType.Close )
        {
            closed = true;
            break;
        }
        else if ( result.MessageType != WebSocketMessageType.Text )
        {
            readFault = new NotSupportedException($"The specified web socket returned a message of type {result.MessageType} which is not supported -- only the Text type is supported.");
            break;
        }

        messageBytes.Write(bytes, 0, result.Count);

        messageComplete = result.EndOfMessage;
    }

    // Decode once per message so characters spanning frame boundaries stay intact
    messageString = Encoding.UTF8.GetString(messageBytes.ToArray());

    if ( messageComplete )
    {
        Console.WriteLine(messageString);
        currentMessages++;
    }
}

await ws.CloseAsync(WebSocketCloseStatus.NormalClosure, "Client closed", default);

if ( readFault is not null )
{
    throw readFault;
}

Console.WriteLine("End of line.");
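
To make the exposed fields concrete, here is a rough sketch that pulls the account DID, the event time, and the post text out of a single JetStream event. The sample event is made up, and the field names (did, time_us, commit.collection, commit.record.text) are my assumption about the JetStream JSON shape rather than something established in this thread, so treat it as illustrative only:

```csharp
using System;
using System.Text.Json;

// Hypothetical sample event. The field names below are assumed from the
// JetStream JSON format and should be checked against real output.
string sampleEvent = @"{
  ""did"": ""did:plc:example123"",
  ""time_us"": 1733184000000000,
  ""kind"": ""commit"",
  ""commit"": {
    ""operation"": ""create"",
    ""collection"": ""app.bsky.feed.post"",
    ""record"": { ""text"": ""hello world"" }
  }
}";

using JsonDocument doc = JsonDocument.Parse(sampleEvent);
JsonElement root = doc.RootElement;

// Account identifier and event timestamp (assumed to be microseconds since the Unix epoch)
string? did = root.GetProperty("did").GetString();
long timeUs = root.GetProperty("time_us").GetInt64();
DateTimeOffset seenAt = DateTimeOffset.FromUnixTimeMilliseconds(timeUs / 1000);

// Only post records are expected to carry a text field; other events leave text null
string? text = null;
if (root.TryGetProperty("commit", out JsonElement commit)
    && commit.TryGetProperty("collection", out JsonElement collection)
    && collection.GetString() == "app.bsky.feed.post"
    && commit.TryGetProperty("record", out JsonElement record)
    && record.TryGetProperty("text", out JsonElement textElement))
{
    text = textElement.GetString();
}

Console.WriteLine($"{did} at {seenAt:O}: {text}");
```

In the sample app above, the same parsing could be applied to each messageString instead of printing the raw JSON.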
@adamedx adamedx added the bug Something isn't working label Dec 3, 2024
@bnewbold
Collaborator

Hi Adam,

The permissionless and open nature of the atproto firehose is by design, not an oversight. Content in the atproto network is "manifestly public". The fact that all posts are public is communicated directly to users during onboarding if they sign up with Bluesky. The data being public is not a grant of rights, and parties consuming from the firehose are not exempt from laws or regulations.

This follows the norm and precedent of the open public web: browsers, HTML, and HTTP. It is distinct from private digital messaging protocols and services like email, Matrix, private forums, and the like. It is not our stance that atproto is appropriate for all, or even a plurality of, digital messaging use cases. Many folks don't want some or all of their posts publicly broadcast to the entire world, and that is fine, they should not use the public aspects of Bluesky or atproto.

Digital public broadcast is an important medium, however. We think that such media are particularly prone to network effects, and that the ability to exclude participants (such as competitors) leads towards centralization, and ultimately to exploitation of the network and individual users, to the detriment of society.

On a pragmatic level, the concerns you raise about potential harm to users and communities are legitimate. But many of those harms are just as present on proprietary social media platforms, where bad actors have broad access to sensitive and private information despite platform ToS.
