# Bluesky Starter Pack Data Collection

Team name: ___

Team members:
- Trang Kieu
- Terresa Tran
- Wynne Tseng 
- Vivien Wang
- Mei Wu


#### Concept Definition: Echo Chamber

An echo chamber refers to a social environment in which users are primarily exposed to information, accounts, and viewpoints that reinforce similar perspectives, with limited exposure to diverse or opposing viewpoints.

In the context of Bluesky starter packs, echo chambers may form when starter packs repeatedly recommend accounts that are highly similar in terms of social connections, interests, views, or topical focus. This can lead to clustered communities where information circulates within the same group.

Starter packs may contribute to echo chambers because they act as curated recommendation systems that introduce users to specific communities rather than the broader network.


##### We define an echo chamber through:
- Follower overlap percentage between accounts in the same pack
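
The overlap metric can be made concrete in more than one way. The sketch below is one possible reading, not the team's final definition: it assumes overlap is measured as Jaccard similarity (shared followers divided by combined followers) and averaged over every pair of accounts in a pack. The function names and the toy follower sets are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def follower_overlap_pct(followers_a: Set[str], followers_b: Set[str]) -> float:
    """Percentage of shared followers, measured as Jaccard similarity * 100 (an assumption)."""
    union = followers_a | followers_b
    if not union:
        return 0.0  # neither account has followers, so there is nothing to overlap
    return 100.0 * len(followers_a & followers_b) / len(union)


def mean_pack_overlap(pack_followers: Dict[str, Set[str]]) -> float:
    """Average pairwise overlap across all accounts in one starter pack."""
    pairs = list(combinations(pack_followers.values(), 2))
    if not pairs:
        return 0.0  # a pack with fewer than two accounts has no pairs
    return sum(follower_overlap_pct(a, b) for a, b in pairs) / len(pairs)


# Toy data: account handle -> set of follower DIDs (all values are made up)
example_pack = {
    "alice.example": {"did:1", "did:2", "did:3"},
    "bob.example": {"did:2", "did:3", "did:4"},
    "carol.example": {"did:3", "did:5"},
}
pack_score = mean_pack_overlap(example_pack)
```

A higher `pack_score` would suggest that the accounts in a pack share much of their audience. Other choices, such as dividing by the smaller follower set instead of the union, would give different numbers and may suit packs with very uneven account sizes better.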


#### Context and Motivation

Bluesky starter packs are curated lists of accounts and feeds (up to 150 people and up to 3 custom feeds) designed to help users discover communities and content when joining the platform. Because Bluesky is decentralized and lacks traditional algorithmic recommendation systems, starter packs play a critical role in shaping user discovery and social network formation.

According to "Bootstrapping Social Networks: Lessons from Bluesky Starter Packs", starter packs often circulate within communities, creating clusters or "social bubbles" where users promote and follow others within the same social groups. This suggests that starter packs may reinforce community structures and amplify visibility of certain accounts.

Understanding these patterns helps answer broader questions about influence, discovery, and community formation in decentralized social networks.

#### Hypothesis

There are shared patterns across Bluesky starter packs. Specifically:

- Some accounts appear frequently across many starter packs, indicating higher visibility or influence
- Certain feeds are repeatedly included, suggesting they play a central role in content discovery
- Starter packs reflect community clusters or "social bubbles," where users promote accounts within their own communities

#### Research Questions
This phase focuses on three primary questions:

1. User Influence: Which accounts appear most frequently across starter packs?

This helps identify influential or highly recommended users within the Bluesky ecosystem.

2. Feed Popularity: Which feeds appear most frequently across starter packs?

Feeds act as discovery mechanisms, so frequently included feeds may serve as important information hubs.

3. Thematic Categorization

Do starter pack descriptions reveal distinct themes such as:

- journalism
- sports
- politics
- art
- technology

Starter packs may be organized around shared interests or communities.
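
One simple way to approach the thematic question is keyword matching on the description text. The sketch below is only a starting point under stated assumptions: the keyword lists in `THEME_KEYWORDS` and the function `tag_themes` are hypothetical, and matching is done on whole lowercase words, so plurals and non-English descriptions are missed.

```python
import re
from typing import Dict, List, Optional

# Hypothetical keyword lists; a real analysis would need a vetted vocabulary.
THEME_KEYWORDS: Dict[str, List[str]] = {
    "journalism": ["journalist", "reporter", "news"],
    "sports": ["sports", "football", "nba"],
    "politics": ["politics", "election", "policy"],
    "art": ["art", "artist", "illustration"],
    "technology": ["tech", "software", "developer"],
}


def tag_themes(description: Optional[str]) -> List[str]:
    """Return every theme whose keywords occur as whole words in the description."""
    if not isinstance(description, str):
        return []  # missing descriptions (None or NaN) get no theme
    words = set(re.findall(r"[a-z]+", description.lower()))
    return [theme for theme, keywords in THEME_KEYWORDS.items()
            if words.intersection(keywords)]


sample_tags = tag_themes("Tech news for software developers")
```

Because a description can match several themes, the function returns a list rather than a single label. Descriptions that match nothing would need manual review or a richer method.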


In [105]:
# Import Libraries
import json

from atproto import Client, models
from atproto import exceptions
from password import BSKY_USERNAME, BSKY_APP_PASSWORD
import pandas as pd
from typing import List, Dict



In [106]:
# Enter your Bluesky username and app password for authentication.
# Note: You can also create a file named password.py in the same directory and store your username as BSKY_USERNAME and your app password as BSKY_APP_PASSWORD.
# This notebook imports that file and reads the BSKY_USERNAME and BSKY_APP_PASSWORD variables automatically.
USERNAME = BSKY_USERNAME
APP_PASSWORD = BSKY_APP_PASSWORD

# Authenticate steps:
client = Client()
client.login(USERNAME, APP_PASSWORD)

ProfileViewDetailed(did='did:plc:lmc4xbbyqqyui7m6ptolv3lb', handle='tkieu137.bsky.social', associated=ProfileAssociated(activity_subscription=ProfileAssociatedActivitySubscription(allow_subscriptions='followers', py_type='app.bsky.actor.defs#profileAssociatedActivitySubscription'), chat=None, feedgens=0, labeler=False, lists=0, starter_packs=0, py_type='app.bsky.actor.defs#profileAssociated'), avatar='https://cdn.bsky.app/img/avatar/plain/did:plc:lmc4xbbyqqyui7m6ptolv3lb/bafkreig5n2ooeo3ixz4yfygzcuablb4ovfm7fijbq7dorzbsfu2ezxrefy@jpeg', banner=None, created_at='2026-01-13T22:59:12.525Z', debug=None, description=None, display_name='', followers_count=6, follows_count=75, indexed_at='2026-01-13T22:59:52.725Z', joined_via_starter_pack=None, labels=[], pinned_post=None, posts_count=4, pronouns=None, status=None, verification=None, viewer=ViewerState(activity_subscription=None, blocked_by=False, blocking=None, blocking_by_list=None, followed_by=None, following=None, known_followers=KnownFol

In [107]:
# Read the starter pack dataset provided by Martin; its "uri" column lists the starter pack URIs used to gather data about the accounts in each pack
df = pd.read_json("starterpacks.jsonl", lines=True)
df

Unnamed: 0,list,name,$type,createdAt,cid,author,uri,rkey,collection_time,feeds,updatedAt,description,descriptionFacets,image
0,at://did:plc:zdtz65xortdlxi7d6hlyz2j5/app.bsky...,@‪motesandbeams.bsky.soci…'s Starter Pack,app.bsky.graph.starterpack,2024-11-16T18:42:13.188Z,bafyreie3wdnxditj3sbinyctdum23xk3x6luvdy6voy3o...,did:plc:zdtz65xortdlxi7d6hlyz2j5,at://did:plc:zdtz65xortdlxi7d6hlyz2j5/app.bsky...,3lb3k7fkymr2f,2026-02-11 08:00:03.164000+00:00,,,,,
1,at://did:plc:zdtz65xortdlxi7d6hlyz2j5/app.bsky...,@‪motesandbeams.bsky.soci…'s Starter Pack,app.bsky.graph.starterpack,2024-11-16T18:49:24.216Z,bafyreihb7o3slrv5i4uzl3uarlpb4gu2rluqsk6wzxhzv...,did:plc:zdtz65xortdlxi7d6hlyz2j5,at://did:plc:zdtz65xortdlxi7d6hlyz2j5/app.bsky...,3lb3kmamw462k,2026-02-11 08:00:03.164000+00:00,,,,,
2,at://did:plc:oextljnuf4ix335o7aapym55/app.bsky...,Knowledge,app.bsky.graph.starterpack,2024-12-21T16:45:49.695Z,bafyreidb7dfysn45jtudxaro6n6alqiemb5irmv2ztoan...,did:plc:oextljnuf4ix335o7aapym55,at://did:plc:oextljnuf4ix335o7aapym55/app.bsky...,3ldtdzitbxs2z,2026-02-11 08:00:17.963000+00:00,[],2025-01-09T12:32:19.214Z,Education is the most powerful weapon which yo...,,
3,at://did:plc:anumzyo4b5gclvho6uqkrpap/app.bsky...,‪rngmom03.bsky.social‬'s Starter Pack,app.bsky.graph.starterpack,2024-11-21T18:32:11.874Z,bafyreibbdck4dxrmulfafmwai2rul3co52zqedrsxiuad...,did:plc:anumzyo4b5gclvho6uqkrpap,at://did:plc:anumzyo4b5gclvho6uqkrpap/app.bsky...,3lbi3y3hdgx23,2026-02-11 08:00:24.338000+00:00,[{'uri': 'at://did:plc:z72i7hdynmk6r22z27h6tvu...,,,,
4,at://did:plc:d42nr7dwbfh4vfvduimuk7j5/app.bsky...,‪pietve.bsky.social‬'s Starter Pack,app.bsky.graph.starterpack,2024-11-16T11:22:54.573Z,bafyreicggizjorcm6jldsbdrnlysxmwqgij4ywtvd7ljm...,did:plc:d42nr7dwbfh4vfvduimuk7j5,at://did:plc:d42nr7dwbfh4vfvduimuk7j5/app.bsky...,3lb2rnu76p42x,2026-02-11 08:00:25.661000+00:00,[{'uri': 'at://did:plc:5rw2on4i56btlcajojaxwca...,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
354690,at://did:plc:uos7xqdk7ggroepqnlzl7zev/app.bsky...,‪rockmetal68.bsky.social‬'s Starter Pack,app.bsky.graph.starterpack,2025-05-12T21:50:04.900Z,bafyreiecae6zmpsrvn4aspjdccmidq6byxapydls5zw24...,did:plc:uos7xqdk7ggroepqnlzl7zev,at://did:plc:uos7xqdk7ggroepqnlzl7zev/app.bsky...,3loyxac7ew325,2026-02-01 07:59:12.704000+00:00,[],,,,
354691,at://did:plc:huo5yychqhf6ifzoxek7bs4y/app.bsky...,Cen - Creative & Candid �…'s Starter Pack,app.bsky.graph.starterpack,2025-05-14T12:31:52.439Z,bafyreihty74mzmeubzcbuj4idgvbnsubwm26rj344wof3...,did:plc:huo5yychqhf6ifzoxek7bs4y,at://did:plc:huo5yychqhf6ifzoxek7bs4y/app.bsky...,3lp4yxxptdh2j,2026-02-01 07:59:13.744000+00:00,,,,,
354692,at://did:plc:n2noxvecqcig4lvhthwcyp3q/app.bsky...,‪mazzystar906.bsky.social‬'s Starter Pack,app.bsky.graph.starterpack,2025-01-08T01:29:23.125Z,bafyreibfc3btykpkhyou74pdto4pv47lrbo4mmvr2zsr6...,did:plc:n2noxvecqcig4lvhthwcyp3q,at://did:plc:n2noxvecqcig4lvhthwcyp3q/app.bsky...,3lf6z7du7dx2e,2026-02-01 07:59:13.932000+00:00,[],,,,
354693,at://did:plc:gaeodbykn5riavylcovou3mh/app.bsky...,Ian Greenberg's Starter Pack,app.bsky.graph.starterpack,2025-01-12T21:17:20.652Z,bafyreidkco5kixpxxkemx3exwuynniprwt4f7ayjwl2x6...,did:plc:gaeodbykn5riavylcovou3mh,at://did:plc:gaeodbykn5riavylcovou3mh/app.bsky...,3lfl5hb5zee23,2026-02-01 07:59:14.393000+00:00,,,,,


## Question 1: Which accounts appear most often in Bluesky starter packs?

Bluesky starter packs are curated lists of recommended accounts meant to help new users quickly find communities and high-quality content upon joining. They were created to mitigate the "cold start" problem and were responsible for up to 43% of daily follow operations at their peak. Because each pack is created by a different user and focuses on a different theme or interest group, the accounts that appear most frequently across many starter packs are likely to be:

- broadly influential,
- highly visible across communities,
- central hubs in the network.

Identifying these frequently-included accounts helps us understand:

- what kinds of voices are most prominent on Bluesky,
- which users cross community boundaries,
- whether certain media outlets, journalists, organizations, or personalities act as “anchor nodes” in the platform’s social ecosystem.

In [108]:
def get_accounts_info(account_list: List) -> List[Dict]:
    """
    Takes a list of account objects (each one is one entry from a starter pack)
    and extracts just the important identity information we care about.

    Parameters
    ----------
    account_list : list
        A list of accounts. Each account contains a 'subject' field with
        information about the actual user (like DID and handle).

    Returns
    -------
    list[dict]
        A list of simple dictionaries. Each dictionary has:
        - "Account DID": the unique ID for the account
        - "Account Handler": the user's handle (e.g. 'nytimes.com')
    """
    list_accounts = []
    account_dict = {}
    for account in account_list:
        # Each "account" is actually a dictionary-like structure
        # pulled from the starter pack. Inside it, "subject" stores
        # the actual user profile. We extract the two fields we care about.
        account_did = account["subject"]["did"]
        account_handler = account["subject"]["handle"]
        account_dict = {"Account DID": account_did,
                        "Account Handler": account_handler}
        list_accounts.append(account_dict)
    return list_accounts

In [109]:
def get_starter_pack_info(starter_pack) -> dict:
    starter_pack_dict = {"SP DID": starter_pack["starter_pack"]["cid"],                            # CID of this starter pack record (stored under the "SP DID" key)
                        "SP Creator DID": starter_pack["starter_pack"]["creator"]["did"],          # DID of the user who created the pack
                        "SP Creator Handle": starter_pack["starter_pack"]["creator"]["handle"],    # the creator's handle (username)
                        "SP Description": starter_pack["starter_pack"]["record"]["description"]}   # text description of the pack
    return starter_pack_dict

In [110]:
def get_feed_info(starting_feed_list: List) -> List[Dict]:
    feed_list = []
    for feed in starting_feed_list:
        feed_list.append({"Feed DID": feed["did"],   # DID of the feed generator service
                        "Feed CID": feed["cid"],   # CID of the feed generator record
                        "Feed Description": feed["description"], 
                        "Feed Creator DID": feed["creator"]["did"],
                        "Feed Creator Handle": feed["creator"]["handle"],
                        "Feed Like Count": feed["like_count"]})
    return feed_list

In [None]:
# def get_all_accounts(list_uri: str) -> List[Dict]:
#     accounts = []
#     cursor = None

#     while True:
#         if cursor is not None:
#             cursor = cursor

#         res = client.app.bsky.graph.get_list(params= {"list":list_uri,"limit":100})

        
#         accounts.append(res["items"])

#         cursor = res["cursor"]
#         if not cursor:
#             break

#     return accounts

In [None]:
from typing import List, Dict, Any

def get_all_accounts(list_uri: str) -> List[Any]:
    """
    Fetch ALL accounts in a list (app.bsky.graph.list) and return a flat list
    of profile objects (subjects).

    Each returned element is either:
    - a dict with keys like "did", "handle", ...
    - or a ProfileView/ProfileViewDetailed object from atproto_client.
    """
    accounts: List[Any] = []
    cursor = None

    while True:
        params = {"list": list_uri, "limit": 100}
        if cursor:
            params["cursor"] = cursor

        res = client.app.bsky.graph.get_list(params=params)

        # Handle both dict-like and model-style responses
        items = res.get("items") if isinstance(res, dict) else res.items
        if not items:
            break

        for item in items:
            # item is a ListItemView or dict; its "subject" is the account
            if isinstance(item, dict):
                subject = item.get("subject")
            else:
                subject = item.subject
            accounts.append(subject)

        cursor = res.get("cursor") if isinstance(res, dict) else res.cursor
        if not cursor:
            break

    return accounts

In [137]:
def get_starter_packs_info(uris: List) -> List[List[Dict]]:
    """
    Given a list of starter pack URIs, download information about each starter pack,
    the (sample of) accounts included in it, and any feeds attached to it.

    For each starter pack URI, we:
      1. Call the Bluesky API to get a detailed view of that starter pack.
      2. Extract metadata about the starter pack (who created it, description, etc.).
      3. Extract a sample list of accounts that appear in that starter pack.
      4. Flatten this into one row per (starter pack, account) pair.

    Parameters
    ----------
    uris : list
        A list of starter pack AT-URIs (strings). Each URI identifies one starter pack.

    Returns
    -------
    list[list[dict]]
        A two-element list [starter_packs_list, feed_list]:
        - starter_packs_list: one row per (starter pack, sampled account) pair
        - feed_list: one row per (starter pack, feed) pair
    """
    # this will hold all rows across all starter packs
    starter_packs_list = []
    feed_list = []

    sb_dict = {}

    for uri in uris:
        try:
            # 1. Ask the Bluesky API for detailed information about this starter pack
            starter_pack = client.app.bsky.graph.get_starter_pack(params={"starterPack": uri})
        except exceptions.BadRequestError as e:
            # If the server says "starter pack not found", we skip this URI and continue.
            print("Skipping URI (starter pack not found):", uri)
            continue
        
        # 2) Extract some basic metadata about this starter pack with get_starter_pack_info()
        starter_pack_info = get_starter_pack_info(starter_pack)
        
        # 3) Get the sample of accounts that appear in this starter pack
        unproccessed_account_list_from_sb = starter_pack["starter_pack"]["list_items_sample"]
        #unproccessed_account_list_from_sb = get_all_accounts(starter_pack["starter_pack"]["list"]["uri"])
        
        # Use our helper to extract just DID + handle for each account in the sample
        proccessed_list = get_accounts_info(unproccessed_account_list_from_sb)

        # The feeds field may be missing (None) when a pack has no feeds, so fall back to an empty list
        feeds = starter_pack["starter_pack"]["feeds"] or []
        if feeds:
            feed_info = get_feed_info(feeds)
            for feed in feed_info:
                feed_list.append({"SP URI": uri,
                        "SP DID": starter_pack_info["SP DID"],
                        "SP Creator DID": starter_pack_info["SP Creator DID"],
                        "SP Creator Handle": starter_pack_info["SP Creator Handle"],
                        "SP Description": starter_pack_info["SP Description"],
                        "Feed DID": feed["Feed DID"],
                        "Feed CID": feed["Feed CID"],
                        "Feed Description": feed["Feed Description"], 
                        "Feed Creator DID": feed["Feed Creator DID"],
                        "Feed Creator Handle": feed["Feed Creator Handle"],
                        "Feed Like Count": feed["Feed Like Count"]
                        })

        # 4) For each account in this starter pack's sample, create one flat row
        for account in proccessed_list:
            sb_dict = {"SP URI": uri,
                        "SP DID": starter_pack_info["SP DID"],
                        "SP Creator DID": starter_pack_info["SP Creator DID"],
                        "SP Creator Handle": starter_pack_info["SP Creator Handle"],
                        "SP Description": starter_pack_info["SP Description"],
                        "AccountDID": account["Account DID"],
                        "Account Handler": account["Account Handler"]}
            starter_packs_list.append(sb_dict)
    
    return [starter_packs_list,feed_list]
            

### Creating Testing Dataset

In [125]:
test_100 = df[:100]
test_1000 = df[:1000]
    

### Run the Script to Get Starter Packs Accounts and Starter Packs Feeds Dataset

In [138]:
result = get_starter_packs_info(test_100["uri"])
accounts_df = pd.DataFrame(result[0])
feeds_df = pd.DataFrame(result[1])


### Top 10 Accounts That Appear Most Frequently in Starter Packs

In [None]:
# Count how many times each account appears across the sampled starter packs and store the result in a DataFrame
account_appearance_count = accounts_df["Account Handler"].value_counts()
df_counts = account_appearance_count.reset_index()
df_counts.columns = ["Account Handler", "Count"]

# Add frequencies column
total = account_appearance_count.sum()
df_counts["Frequency"] = df_counts["Count"] / total
df_counts.head(10)

Unnamed: 0,Account Handler,Count,Frequency
0,bsky.app,27,0.02415
1,aoc.bsky.social,10,0.008945
2,stephenking.bsky.social,8,0.007156
3,georgetakei.bsky.social,8,0.007156
4,nytimes.com,8,0.007156
5,marcelias.bsky.social,6,0.005367
6,npr.org,6,0.005367
7,pchone.bsky.social,5,0.004472
8,washingtonpost.com,5,0.004472
9,theonion.com,5,0.004472
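
The `Frequency` column above divides each count by the total number of (starter pack, account) rows, so it is a share of all listings rather than a share of packs. As a hedged alternative, the sketch below counts the number of distinct packs that include each account and divides by the number of distinct packs. It runs on made-up toy rows that only mimic the `SP URI` and `Account Handler` columns of `accounts_df`; the names `toy_accounts_df`, `packs_per_account`, and `pack_share` are illustrative.

```python
import pandas as pd

# Toy rows in the same shape as accounts_df (values are made up)
toy_accounts_df = pd.DataFrame({
    "SP URI": ["at://pack/1", "at://pack/1", "at://pack/2", "at://pack/3"],
    "Account Handler": ["a.example", "a.example", "a.example", "b.example"],
})

# Count each (pack, account) pair once, then count packs per account
packs_per_account = (
    toy_accounts_df.drop_duplicates(["SP URI", "Account Handler"])
    .groupby("Account Handler")["SP URI"]
    .nunique()
    .sort_values(ascending=False)
)

# Share of all distinct packs that include each account
pack_share = packs_per_account / toy_accounts_df["SP URI"].nunique()
```

Dropping duplicate pairs first guards against the same URI being processed twice, which would otherwise inflate an account's count.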


In [None]:
# TODO: The results currently cover only the sampled accounts (list_items_sample) from each starter pack.
# Fetch the full member list of every pack and add it to the dataset.

## Question 2: Which feeds appear most often among starter packs that include feeds?

In [121]:
# Count how often each feed creator appears across starter packs and store the result in a DataFrame
feeds_appearance_count = feeds_df["Feed Creator Handle"].value_counts()
df_counts = feeds_appearance_count.reset_index()
df_counts.columns = ["Feed Creator Handle", "Count"]

# Add frequencies column
total = feeds_appearance_count.sum()
df_counts["Frequency"] = df_counts["Count"] / total
df_counts.head(10)

Unnamed: 0,Feed Creator Handle,Count,Frequency
0,bsky.app,5,0.15625
1,skyfeed.xyz,4,0.125
2,colinbaines15.bsky.social,3,0.09375
3,eepy.bsky.social,2,0.0625
4,timmersionmedia.com,2,0.0625
5,shultzman.com,2,0.0625
6,clarabelle.xyz,2,0.0625
7,aendra.com,2,0.0625
8,bsky.art,2,0.0625
9,bossett.social,1,0.03125
