<div style="border-left: 6px solid #00356B; padding-left: 15px; margin-bottom: 20px;">
  <h1 style="margin-bottom: 5px; color: #00356B"><strong>Assignment 2:</strong> Part 1 (Feed Analysis)</h1>
  <span style="font-size: 1.2em; color: #444; font-weight: bold">S&DS 5350 | Social Algorithms</span>
  <br><br>
  <strong>Primary:</strong> Cailey Bobadilla (cjb239)
  <br>
  <strong>Partner:</strong> Brandon Tran (bat53)
</div>

---

*Mood for this part:*

<iframe data-testid="embed-iframe" style="border-radius:12px" src="https://open.spotify.com/embed/track/6NEoeBLQbOMw92qMeLfI40?utm_source=generator&theme=0" width="40%" height="152" frameBorder="0" allowfullscreen="" allow="autoplay; clipboard-write; encrypted-media; fullscreen; picture-in-picture" loading="lazy"></iframe>

In [None]:
import ssl

# Workaround for SSL certificate errors: disables HTTPS certificate
# verification for default contexts (insecure; only for this assignment)
ssl._create_default_https_context = ssl._create_unverified_context

When a senator opens Bluesky, they see a reverse-chronological feed of posts from accounts they follow. This feed represents their information environment on the platform.

#### I.1 Data Collection

Write code to collect the feed for each senator:

1. For each senator, retrieve the **list of accounts they follow** using `getFollows`
2. For **each account in that follow list**, fetch that account's posts from the last 24 hours using `getAuthorFeed`
3. Combine these posts into a single reverse-chronological feed for each senator
4. Save the data as JSON (one file per senator, or one combined file)

Note: Look ahead at Section I.3. You’ll need to track both the follow relationships and the posts (for computing Jaccard similarities). Consider saving follow data and feed data separately.
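Steps 2 and 3 above could be sketched roughly as follows. This is a minimal illustration, assuming each post is a dictionary with an ISO 8601 timestamp at `record.createdAt`; `parse_timestamp` and `recent_posts_newest_first` are hypothetical names, not functions from `bluesky_helpers.py`. Timestamps are parsed into timezone-aware datetimes before filtering and sorting, because raw string comparison can misorder timestamps that carry different UTC offsets.

```python
from datetime import datetime, timedelta, timezone

def parse_timestamp(ts):
    """Parse an ISO 8601 timestamp; a trailing 'Z' is treated as UTC."""
    dt = datetime.fromisoformat(ts.replace('Z', '+00:00'))
    # Assumption: timestamps without an offset are interpreted as UTC
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt

def recent_posts_newest_first(posts, now=None, hours=24):
    """Keep posts created in the last `hours` hours, newest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    # Pair each post with its parsed creation time
    dated = [(parse_timestamp(p['record']['createdAt']), p) for p in posts]
    recent = [(dt, p) for dt, p in dated if dt >= cutoff]
    # Sort on the parsed datetime only, in reverse-chronological order
    recent.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in recent]
```

Passing an explicit `now` makes the 24-hour window reproducible when the analysis is rerun on saved data.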

**API endpoints** (see `bluesky_helpers.py` for wrapper functions):

- `app.bsky.graph.getFollows` — list of followed accounts (paginated, up to 100 per request)
- `app.bsky.feed.getAuthorFeed` — recent posts from an account

For API documentation see: Bluesky API documentation

**Rate limits**: Add `time.sleep(0.1)` between requests. The public API allows ~3,000 requests per 5 minutes.
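A pause of 0.1 seconds caps the loop at 10 requests per second, which is 3,000 requests per 5 minutes. The cursor-based pagination for `getFollows` might be sketched as below. Here `collect_all_pages` and `fetch_page` are hypothetical names used for illustration (the wrappers in `bluesky_helpers.py` may work differently); the sketch assumes each page is a dictionary with a `follows` list and an optional `cursor` for the next page.

```python
import time

def collect_all_pages(fetch_page, delay=0.1, max_pages=1000):
    """Follow cursors until the API stops returning one.

    fetch_page is a hypothetical callable: it takes a cursor (None for the
    first page) and returns a dict with a 'follows' list and, when more
    results remain, a 'cursor' string.
    """
    items, cursor = [], None
    # max_pages is a safety bound against an endless cursor loop
    for _ in range(max_pages):
        page = fetch_page(cursor)
        if not page:
            break
        items.extend(page.get('follows', []))
        cursor = page.get('cursor')
        if not cursor:
            break
        # Pause between requests to stay under the rate limit
        time.sleep(delay)
    return items
```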

In [None]:
import time
import bluesky_helpers

In [None]:
# Load the list of senators from the senators_bluesky.csv
senators = bluesky_helpers.load_senators('data/senators_bluesky.csv')

# Iterate through each senator
for senator in senators:
    # Get the handle of the current senator
    senator_handle = senator['handle']

    # ========================================================================
    # 1. For each senator, retrieve the list of accounts they follow using 
    #    getFollows
    # ========================================================================

    # Get the list of accounts the current senator follows
    follows_list = bluesky_helpers.get_all_follows(senator_handle)

    # ========================================================================
    # 2. For each account in that follow list, fetch that account's posts from 
    #    the last 24 hours using getAuthorFeed
    # ========================================================================

    # Collect the posts from each of the accounts the current senator follows
    raw_feed_posts = []

    # Iterate through each account the current senator follows
    for i, followed_account in enumerate(follows_list):
        # Get the handle of the current followed account
        followed_handle = followed_account['handle']

        # Get the recent posts for the current account
        result = bluesky_helpers.get_author_feed(followed_handle, limit=50)

        # Pause between requests (the public API allows ~3,000 requests per 5 
        # minutes)
        time.sleep(0.1)

        # Check that the response for the current account actually contains a
        # list of posts
        # NOTE: We check that result is truthy and contains 'feed' because 
        # get_author_feed() can return None if the account no longer exists
        if result and 'feed' in result:
            # Iterate through each post from the feed
            for item in result['feed']:
                # Get the post from the feed
                post = item['post']

                # Get the time the post was posted
                created_at = post['record']['createdAt']

                # Check if the post was posted within the last 24 hours
                if bluesky_helpers.is_within_hours(created_at, hours=24):
                    # If it is, then add the current post to the list of posts 
                    # for the current senator
                    raw_feed_posts.append(post)

    # ========================================================================
    # 3. Combine these posts into a single reverse-chronological feed for each 
    #    senator
    # ========================================================================

    # Sort the posts from the current senator's feed in reverse-chronological
    # order
    # NOTE: createdAt is an ISO 8601 string, so comparing the strings orders 
    # posts by time as long as all timestamps share the same format and time 
    # zone; the lambda extracts that field from each post dictionary
    sorted_feed = sorted(
        raw_feed_posts, 
        key=lambda x: x['record']['createdAt'],
        reverse=True
    )

    # ========================================================================
    # 4. Save the data as JSON (one file per senator, or one combined file)
    # ========================================================================

    # Save the followed account and reverse-chronological feed for the current 
    # senator
    # NOTE: follows is a separate category for computing Jaccard similarities in
    # I.3
    senator_data = {
        'senator_info': senator, 
        'follows': follows_list, 
        'feed': sorted_feed
    }

    # Define the file name for the current senator's json file
    filename = f'part1_data/{senator_handle}_feed_data.json'

    # Save the current senator's data to the json file
    bluesky_helpers.save_json(senator_data, filename)

#### I.2 "Senators You May Know" Recommendations

Build a simple recommendation system to suggest which senators each senator should follow, based on the follow graph among senators. Use a **triangle-counting** approach: for each senator X who doesn’t follow senator Y, count how many senators that X *does* follow also follow Y. This is the "mutual follows" count — the more senators in X’s network who follow Y, the stronger the recommendation.

`recommendation_score(A, B) = |{C : A follows C and C follows B}|`

This is analogous to the main feature in "people you may know" recommendations on various social networks.
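As a small worked example of the score, consider a toy follow graph stored as a dictionary of sets. The graph and the function name `recommendation_scores` are made up for illustration; this is a sketch of the counting rule, not the required implementation.

```python
def recommendation_scores(graph, a):
    """Count two-step paths a -> c -> b for every b that a does not follow."""
    following = graph.get(a, set())
    scores = {}
    for c in following:
        for b in graph.get(c, set()):
            # Skip a itself and accounts a already follows
            if b != a and b not in following:
                scores[b] = scores.get(b, 0) + 1
    return scores

# Toy graph: A follows B and C; B follows D; C follows D and E; E follows A
toy_graph = {
    'A': {'B', 'C'},
    'B': {'D'},
    'C': {'D', 'E'},
    'D': set(),
    'E': {'A'},
}

scores = recommendation_scores(toy_graph, 'A')
# Rank by score (descending), breaking ties alphabetically
top_3 = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(top_3)  # [('D', 2), ('E', 1)]
```

D scores 2 because both B and C follow D, while E scores 1 because only C follows E.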

Task 

For each senator:

1. Identify which other senators they do **not** currently follow
2. For each non-followed senator, compute the recommendation score (number of followed senators who follow them)
3. Report the **top 3 recommendations** with their scores

If a senator already follows all other senators, note this in your output. If a senator follows no other senators, note this in your output.

In [5]:
import json
import pandas as pd

In [7]:
# ============================================================================
# 1. Identify which other senators they do not currently follow
# ============================================================================

# Load the list of senators from senators_bluesky.csv
senators = bluesky_helpers.load_senators('data/senators_bluesky.csv')

# Get the handles for all of the senators
all_senator_handles = set([s['handle'] for s in senators])

# Build a graph of which senators follow which senators using a dictionary
# NOTE: The dictionary will have the format {'sen_A': {'sen_B', 'sen_C'}}
senator_follows_graph = {}

# Iterate through each senator
for senator in senators:
    # Get the handle of the current senator
    senator_handle = senator['handle']

    # Store the path to the saved json file for this senator
    filename = f'part1_data/{senator_handle}_feed_data.json'

    # Get the data from the current senator's json file
    with open(filename, 'r') as f:
        data = json.load(f)

        # Get the list of all accounts the current senator follows
        all_follows = data.get('follows', [])

        # Set of senator handles to make it quicker to lookup who the current 
        # senator follows
        followed_senators = set()

        # Iterate through all of the accounts the current senator follows
        for account in all_follows:
            # Get the handle of the current account
            account_handle = account.get('handle')

            # Check if that account handle belongs to a senator
            if account_handle in all_senator_handles: 
                # If it does, then add it to the list
                followed_senators.add(account_handle)

        # Add all of the senators that the current senator follows to the graph
        senator_follows_graph[senator_handle] = followed_senators

In [11]:
# ============================================================================
# 2. For each non-followed senator, compute the recommendation score (number 
#    of followed senators who follow them)
# ============================================================================

# List that will store the senator and their top 3 recommendations
results = []

# Dictionary that will contain the senators that meet each edge-case condition
special_notes = {
    'follows_all': [],
    'follows_zero': [],
    'followed_by_all': [],
    'followed_by_zero': []
}

# Calculate the number of accounts following each senator
followed_by_counts = {h: 0 for h in all_senator_handles}

# Iterate through each senator (follower) and the set of senators they follow
for follower, following_set in senator_follows_graph.items():
    # Iterate through each senator they follow
    for followed in following_set: 
        # Add one to the followed-by count (number of senators who follow 
        # them) for that senator
        followed_by_counts[followed] += 1

# Iterate through each senator
for A in senators:
    # Get the current senator's handle and name
    a_handle = A['handle']
    a_name = A['name']

    # Get the set of people senator A currently follows
    following_set = senator_follows_graph.get(a_handle, set())

    # Check if senator A follows all of the senators
    # NOTE: We need to do -1 because a senator cannot follow themself
    if len(following_set) == len(all_senator_handles) - 1:
        # Add senator A to the follows all list
        special_notes['follows_all'].append(a_name)

        # Format the response for the output recommendations table
        recs_formatted = ['Follows all senators', '', '']
    # Check if senator A follows no other senators    
    # NOTE: If this happens, there will be no recommendations since they have no
    # mutuals to generate recommendations from
    elif len(following_set) == 0:
        # Add senator A to the follows zero list
        special_notes['follows_zero'].append(a_name)

        # Format the response for the output recommendations table
        recs_formatted = ['Follows zero senators', '', '']
    # If senator A does not meet the conditions above, then scores will be 
    # calculated for the group of senators senator A doesn't follow
    else:
        # List will store the scores for the senators senator A doesn't follow
        scores = []

        # Iterate through each senator
        for B in senators:
            # Get senator B's handle
            b_handle = B['handle']

            # Check if senator A is the same as senator B or if senator A 
            # already follows senator B
            # NOTE: continue skips the rest of the current iteration of the for 
            # loop and immediately goes to the next iteration 
            if a_handle == b_handle or b_handle in following_set:
                continue
            
            # Initialize the score
            # NOTE: score represents the number of senators C for which A 
            # follows C and C follows B
            score = 0

            # Iterate through every senator A follows (this will be senator C)
            for c_handle in following_set:
                # Get the set of senators C follows
                c_following_set = senator_follows_graph.get(c_handle, set())

                # Check if B is in C's following set of senators
                if b_handle in c_following_set:
                    # Add one to the score since this meets the recommendation 
                    # score requirements
                    score += 1

            # Check if the score is greater than 0
            if score > 0:
                # Add the name of senator B and their recommendation score for 
                # senator A 
                scores.append((B['name'], score))

        # Sort the scores in descending order 
        # NOTE: A higher score means that senator A and senator B have more 
        # mutuals, index of 1 in scores contains the score for sorting key
        scores.sort(key=lambda x: x[1], reverse=True)

        # List will contain the formatted top 3 recommendations for the output 
        # table
        recs_formatted = []

        # Get the top 3 recommendations for senator A
        top_3 = scores[:3]

        # Iterate through the name and score for the top 3 recommendations
        for name, score in top_3:
            # Add them to the formatted recommendations
            recs_formatted.append(f'{name} ({score})')

        # If there are fewer than 3 recommendations, pad the formatted list 
        # with empty strings
        while len(recs_formatted) < 3:
            recs_formatted.append('')

    # Add the recommendations as a dictionary to the results list for senator A
    results.append({
        'Senator': a_name,
        'Recommendation 1': recs_formatted[0], 
        'Recommendation 2': recs_formatted[1],
        'Recommendation 3': recs_formatted[2]
    })

# Iterate through each handle and the number of senators following that senator
for handle, count in followed_by_counts.items():
    # Get the name of the senator from their handle
    # NOTE: next() stops iterating through the for loop once the handle matches
    name = next(s['name'] for s in senators if s['handle'] == handle)

    # Check if the number of accounts following the current senator is the same 
    # as the number of senators on Bluesky
    # NOTE: We need to do -1 because a senator cannot follow themself
    if count == len(all_senator_handles) - 1:
        # Add the senator's name to the followed by all list
        special_notes['followed_by_all'].append(name)
    # Check if the number of accounts following the current senator is 0
    if count == 0:
        # Add the senator's name to the followed by zero list
        special_notes['followed_by_zero'].append(name)

Output

Generate a **table** in your report (programmatically, not by hand) with the following format:

| **Senator** | **Recommendation 1** | **Recommendation 2** | **Recommendation 3** |
| :--- | :--- | :--- | :--- |
| Bernie Sanders | Chuck Schumer (8) | Amy Klobuchar (7) | ... |
| Elizabeth Warren | *follows all senators* | &nbsp; | &nbsp; |
| ... | ... | ... | ... |

The number in parentheses is the recommendation score (mutual follows count).

Additionally, identify:

- Which senators (if any) **follow all** or **follow zero** other senators in the dataset
- Which senators (if any) are **followed by all** or **followed by zero** other senators in the dataset

(Note: patterns in recommendations, network centrality, and disconnected senators are the subject of questions in Section I.4.)

In [15]:
# ============================================================================
# 3. Report the top 3 recommendations with their scores
# ============================================================================

# Create a dataframe using the results from 2 and output the table
df_recs = pd.DataFrame(results)

# Change the indexing in the table to start at 1 instead of 0
df_recs.index = df_recs.index + 1

# Output the table
display(df_recs)

# Separate the list of senators for each of the edge cases
# NOTE: join() takes a list and puts a specific separator between each item
follows_all_str = ', '.join(special_notes['follows_all'])
follows_zero_str = ', '.join(special_notes['follows_zero'])
followed_by_all_str = ', '.join(special_notes['followed_by_all'])
followed_by_zero_str = ', '.join(special_notes['followed_by_zero'])

# Print out the findings from the edge cases
# NOTE: If the list is empty, we print None
print('Senators who follow everyone: ' +
      f"{follows_all_str if follows_all_str else 'None'}")
print('Senators who follow no one: ' +
      f"{follows_zero_str if follows_zero_str else 'None'}")
print('Senators followed by everyone: ' +
      f"{followed_by_all_str if followed_by_all_str else 'None'}")
print('Senators followed by no one: ' +
      f"{followed_by_zero_str if followed_by_zero_str else 'None'}")

| | **Senator** | **Recommendation 1** | **Recommendation 2** | **Recommendation 3** |
| :--- | :--- | :--- | :--- | :--- |
| 1 | Tammy Baldwin | John Fetterman (27) | Dick Durbin (13) | Jack Reed (12) |
| 2 | Patty Murray | Dick Durbin (13) | Jack Reed (11) | Mazie Hirono (11) |
| 3 | Maria Cantwell | Chuck Schumer (22) | Chris Murphy (16) | Amy Klobuchar (15) |
| 4 | Mark Warner | Patty Murray (18) | Martin Heinrich (18) | Cory Booker (18) |
| 5 | Tim Kaine | Mazie Hirono (13) | Jeff Merkley (5) | Maggie Hassan (4) |
| 6 | Bernie Sanders | Follows zero senators | | |
| 7 | Sheldon Whitehouse | Elissa Slotkin (17) | Jeff Merkley (5) | Maggie Hassan (4) |
| 8 | Jack Reed | Andy Kim (15) | Elissa Slotkin (14) | Dick Durbin (13) |
| 9 | John Fetterman | Follows zero senators | | |
| 10 | Ron Wyden | Elissa Slotkin (15) | Mazie Hirono (11) | Chris Coons (11) |


Senators who follow everyone: None
Senators who follow no one: Bernie Sanders, John Fetterman, Jeff Merkley, Kirsten Gillibrand, Maggie Hassan, Chris Murphy, Mark Kelly
Senators followed by everyone: None
Senators followed by no one: Kirsten Gillibrand


#### I.3 Echo Chamber Analysis

Examine the degree to which senators’ feeds overlap using two complementary measures:

Follow Jaccard Similarity

For each pair of senators, compute the Jaccard similarity of the accounts they follow (all accounts, not just senators):

`Jaccard(A, B) = |follows_A ∩ follows_B| / |follows_A ∪ follows_B|`

This measures overlap in terms of who the senators follow, treating all followed accounts equally regardless of how active they are.
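A minimal sketch of the formula follows. The function name `jaccard` and the toy handles are illustrative, and returning 0.0 when both sets are empty is an assumed convention, since the ratio is undefined in that case.

```python
def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two collections of hashable ids."""
    a, b = set(a), set(b)
    union = a | b
    # Assumed convention: two empty sets have similarity 0.0
    if not union:
        return 0.0
    return len(a & b) / len(union)

# Toy follow sets: 2 shared accounts out of 4 distinct accounts
follows_a = {'x.example', 'y.example', 'z.example'}
follows_b = {'y.example', 'z.example', 'w.example'}
print(jaccard(follows_a, follows_b))  # 0.5
```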

Post Jaccard Similarity

The follow Jaccard treats a dormant account the same as one posting 50 times per day, and doesn’t necessarily reflect similarities in the senators’ user experience. To capture similarity in what senators *actually* see, define and compute a suitable Jaccard coefficient over the posts in their feeds in the last 24 hours.

Each of these measures (follow Jaccard, post Jaccard) can be interpreted as similarity measures between the senators. Computing these measures for all the senators in the dataset forms two similarity matrices.
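One possible definition of the post Jaccard, sketched below, treats each feed as the set of its post URIs and applies the same set formula. This is only one reasonable choice, and it assumes every post dictionary carries a unique `uri` field; `post_uri_set` and `similarity_matrix` are hypothetical helper names, and the toy feeds are invented.

```python
def post_uri_set(feed):
    """Collect the unique post URIs that appear in one senator's feed."""
    return {post['uri'] for post in feed if 'uri' in post}

def similarity_matrix(sets_by_name):
    """Return (names, matrix) of pairwise Jaccard similarities."""
    names = sorted(sets_by_name)
    matrix = []
    for row_name in names:
        row = []
        for col_name in names:
            a, b = sets_by_name[row_name], sets_by_name[col_name]
            union = a | b
            # Assumed convention: similarity is 0.0 when both sets are empty
            row.append(len(a & b) / len(union) if union else 0.0)
        matrix.append(row)
    return names, matrix

# Toy feeds keyed by senator name
toy_feeds = {
    'Senator A': [{'uri': 'post1'}, {'uri': 'post2'}],
    'Senator B': [{'uri': 'post2'}, {'uri': 'post3'}],
    'Senator C': [],
}
post_sets = {name: post_uri_set(feed) for name, feed in toy_feeds.items()}
names, matrix = similarity_matrix(post_sets)
```

With this convention a senator whose feed is empty has similarity 0.0 even with themselves, which is worth keeping in mind when reading the diagonal of the heatmap.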

Visualization and analysis

1. Create **two heatmaps**: one for follow Jaccard similarity, one for post Jaccard similarity. You may present these as two separate figures or side-by-side in a single figure for easy comparison.
2. **Sort the rows and columns** so that similar senators are adjacent. Use hierarchical clustering to determine the ordering (see `scipy.cluster.hierarchy.linkage` and `dendrogram`), or (if you want to explore alternative visualization techniques) explore some other sorting algorithm. An unsorted heatmap with arbitrary row/column order is much harder to interpret — clusters appear scattered rather than as visible blocks along the diagonal. Use the same ordering for both heatmaps to enable direct comparison.
3. **Compare and contrast the two matrices**: What do they show? Interpret any clustering patterns you see. Where do the two matrices differ? Senators with similar follows but different posts might follow the same accounts but experience very different information environments based on posting frequency.
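The ordering in step 2 could be derived roughly as below. This sketch assumes the third-party packages `numpy` and `scipy` are installed and that the similarity matrix is symmetric with values in [0, 1]; `cluster_order` is a hypothetical helper name, and average linkage on the distance `1 - similarity` is one choice among several.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def cluster_order(similarity):
    """Return a row/column order that places similar items next to each other."""
    sim = np.asarray(similarity, dtype=float)
    # Convert similarity to distance
    dist = 1.0 - sim
    # Enforce symmetry and a zero diagonal before condensing the matrix
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)
    # Average-linkage hierarchical clustering on the condensed distances
    tree = linkage(condensed, method='average')
    return leaves_list(tree).tolist()

# Toy matrix: items 0 and 2 are similar, and items 1 and 3 are similar
toy_sim = [
    [1.0, 0.1, 0.9, 0.1],
    [0.1, 1.0, 0.1, 0.8],
    [0.9, 0.1, 1.0, 0.1],
    [0.1, 0.8, 0.1, 1.0],
]
order = cluster_order(toy_sim)
# Apply the same order to rows and columns
reordered = np.asarray(toy_sim)[np.ix_(order, order)]
```

Computing `order` once (for example, from the follow matrix) and applying it to both matrices keeps the two heatmaps directly comparable.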

**Suggested questions to consider** (you don’t need to answer all or any of these, but they may guide your interpretation): Do senators from the same state follow similar accounts? Do female senators follow systematically different accounts than male senators? Are there identifiable “information bubbles”? How does weighting by post volume change the similarity structure?