<div style="background-color: #002676; padding: 20px;">
<img src="https://live-masters-in-computational-social-science.pantheon.berkeley.edu/wp-content/uploads/2025/04/image-3-2.png" alt="MaCSS" width="200">
</div>

# **Notebook 3:** Interacting with the Bluesky API

[wdtmacss@berkeley.edu](mailto:wdtmacss@berkeley.edu)\
**Computational Social Science 1A**\
[Human Psychology and Social Technologies](https://classes.berkeley.edu/content/2025-fall-compss-214a-001-lec-001)\
Fall 2025\
UC Berkeley [Masters in Computational Social Science](https://macss.berkeley.edu/about/)

**Week 3:** Welcome! Decentralized social media and the trade-offs of open data; getting to know the [Bluesky API](https://docs.bsky.app/).

📱🫣

---
**Table of Contents**

- [Class Summary](#class-summary)
- [Class Discussion](#class-discussion)
- [Introduction to Bluesky](#introduction-to-Bluesky)
- [Today's Lab Session](#lab-session)
---

<div style="padding: 20px;">
<img src="https://upload.wikimedia.org/wikipedia/commons/7/7a/Bluesky_Logo.svg" alt="" width="200">
</div>

# Class Summary
Today we will continue our investigation into the modern landscape of social media platforms and the ways that we can interact with them as data analysts and computational social scientists. 

In [last week's class](https://github.com/ccs-ucb/CSS1AF25/blob/main/notebooks/notebook-2-social-media-tiktok.ipynb) we discussed the challenges associated with accessing data from modern social media platforms and with evaluating aspects of their content prioritization algorithms. We explored the use of external audits through experimentation with sock puppet accounts by diving into a [recent study of TikTok](https://arxiv.org/pdf/2503.20231v1). We also explored the complications around using (even publicly available) data retrieved from social media platforms by diving into [Reddit's terms of service for research practices](https://www.reddit.com/r/reddit/comments/1co0xnu/sharing_our_public_content_policy_and_a_new/) and the underlying motivations for these changes.

Today, we will begin by talking about the conclusions people reached in the lab session last week. Question 5 of last week's lab posed a question: is the class project I proposed in line with Reddit's policies? We will talk about your answers.

After our discussion, I will introduce Bluesky, a relatively recent addition to the social media landscape notable for its open, distributed design protocols and radically open data access policies. Most of today's class will be a hands-on lab session in which we get to know [Bluesky's public API](https://docs.bsky.app/docs/category/http-reference) and how to interact with it via HTTP requests in Python.

---

# Class Discussion
Retrieve your notebook from last week's class. Recall Question 5:

> Q5: For this class, I was hoping to make your class projects a combination of Reddit data and LLM analyses. In particular, I was planning to ask everybody to choose a subreddit they enjoy, to download some posts from that subreddit using the API, then pass these posts to a Large Language Model AI system such as ChatGPT, so that we can study how the language model responds to real world questions.

> **Under the current Reddit terms, would such a project be allowable? If not, why not?**

What was your conclusion? What did you learn about Reddit's policy changes? Do you agree with current policies? 

---

# Introduction to Bluesky
<div style="padding: 20px;">
<img src="https://bsky.social/about/editorial-1.webp" alt="" width="200">
<img src="https://bsky.social/about/editorial-2.webp" alt="" width="200">
<img src="https://bsky.social/about/editorial-3.webp" alt="" width="200">
</div>

Bluesky ([https://bsky.app/](https://bsky.app/)) is a decentralized social media platform built around principles of openness. It was initially created by a research team within Twitter as a way to establish whether, in theory, a social media platform with the qualities of Twitter could be run in a decentralized fashion.

The platform is built on an open-source model called the [AT Protocol](https://github.com/bluesky-social/atproto). In this model, every user can own their data and choose which algorithms select the contents of their feed (more information here: [https://bsky.social/about](https://bsky.social/about)). In principle, the entire platform can function in a decentralized way where no one entity controls data, algorithm, or user experience. In practice, the current platform depends heavily on services provided by [Bluesky Social](https://bsky.social/about).

Under this model, each user's data is publicly available. It can be a surprise to learn that nearly *everything* you do on Bluesky is publicly available: the people you follow, the people you block, the things you like and respond to, and all of your profile information. There are clear benefits to this openness, but are there also potentially challenging consequences? Here is [an interesting thread on Reddit](https://www.reddit.com/r/DefendingAIArt/comments/1h1pfwh/alpin_releasing_a_dataset_of_two_million_bluesky/) around the ethics of collecting data at scale from Bluesky to train machine learning models. What do you think?

[Bluesky.social](https://en.wikipedia.org/wiki/Bluesky#Corporate_structure) provides a comprehensive public API for data access. Today, we will get to know that API concretely and learn to write code that interacts with the API to retrieve different kinds of data.

<div style="padding: 20px;">
<img src="https://upload.wikimedia.org/wikipedia/commons/7/71/Bluesky%E2%80%93AT_Protocol_federation_architecture.svg" alt="" width="800">
</div>

---

# Lab Session: Interacting with the Bluesky API
Today we will get to know the Bluesky Public API. You can consult the [documentation for the HTTP API here](https://docs.bsky.app/docs/category/http-reference). Work through the questions and examples below, editing the notebook and adding your own code. The end goal is to use the API to write a brief summary of a specific Bluesky user (more detail below).

* For this exercise, you are **not allowed to use AI systems for support whatsoever**. This is a coding exercise and the goal is to provide you with an opportunity to struggle with difficult coding problems yourself, to practice writing Python functions, and to understand how to consult documentation to resolve uncertainties.

* You do not need to complete all of the goals below; focus on doing what you can at a level that is appropriate to your coding experience. If you are new to coding, it is better to spend all of your effort on one or two goals until you really understand them than to go through all the goals superficially.

* In addition to Exercise 1, there is an Advanced Exercise that I don't expect many people to attempt. You can ignore this if you are new to coding or don't feel comfortable with the preceding exercises.

## Exercise 1: Accessing information about a user
Today's exercise will revolve around using the API to extract and process information about a specific user on Bluesky. To get started, I have provided an example below of a function that makes a call to a public endpoint in the API to retrieve a specific user's posts. Try to understand this function line by line. Take time to research any lines that you don't understand. Once you understand how this function works, try to complete some of the goals below.

In [None]:
import requests

def fetch_recent_posts_by_user(handle, limit=20):
    """
    Fetch up to `limit` most recent posts by a Bluesky user.
    
    Args:
      handle (str): Bluesky handle, e.g., 'alyankovic.bsky.social'.
      limit (int): Number of posts to retrieve (the API accepts values from 1 to 100).
    
    Returns:
      dict: JSON response containing the user's posts.
    """
    url = "https://public.api.bsky.app/xrpc/app.bsky.feed.getAuthorFeed"
    params = {"actor": handle, "limit": limit}
    # The timeout keeps the call from hanging indefinitely if the server is slow.
    resp = requests.get(url, params=params, timeout=10)
    resp.raise_for_status()
    return resp.json()
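
To see what `requests.get(url, params=params)` sends over the network, it can help to build the request URL by hand. The sketch below uses only the standard library and illustrates the idea rather than what `requests` does internally. `build_xrpc_url` is a hypothetical helper name chosen for this example; it is not part of `requests` or the Bluesky API.

```python
from urllib.parse import urlencode

# Base address of the public Bluesky API, taken from the function above.
BASE_URL = "https://public.api.bsky.app/xrpc"


def build_xrpc_url(method, params):
    """Return the full URL for a query (hypothetical helper, for illustration)."""
    # urlencode turns {"actor": "x", "limit": 20} into "actor=x&limit=20".
    return f"{BASE_URL}/{method}?{urlencode(params)}"


example_url = build_xrpc_url(
    "app.bsky.feed.getAuthorFeed",
    {"actor": "alyankovic.bsky.social", "limit": 20},
)
print(example_url)
```

Pasting the printed URL into a browser should show the same JSON that `fetch_recent_posts_by_user` returns, since this is a public endpoint reached with a plain GET request.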


### Example
Let's collect the 20 most recent posts by [Mark Hamill](https://en.wikipedia.org/wiki/Mark_Hamill) (aka The Joker and Luke Skywalker).

In [None]:
mark_hamil = "markhamillofficial.bsky.social"
hamil_posts = fetch_recent_posts_by_user(mark_hamil, limit=20)

And print out the response object to look at its structure.

In [None]:
import json
print(json.dumps(hamil_posts, indent=2))
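
The full printout can be long. One way to get an overview is to list only the keys and value types at each level of nesting. The sketch below is one possible approach: `describe_structure` is a hypothetical helper written for this notebook, and the `sample` dictionary is made-up data that does not reflect the shape of a Bluesky response.

```python
def describe_structure(obj, indent=0, max_depth=3):
    """Return lines listing keys and value types in nested dicts and lists."""
    if indent >= max_depth:
        return []
    pad = "  " * indent
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{pad}{key}: {type(value).__name__}")
            lines.extend(describe_structure(value, indent + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Only inspect the first element, assuming list items share a shape.
        lines.append(f"{pad}[0]: {type(obj[0]).__name__} (list of {len(obj)})")
        lines.extend(describe_structure(obj[0], indent + 1, max_depth))
    return lines


# Made-up data for illustration; this is NOT the shape of a Bluesky response.
sample = {
    "library": {
        "name": "Doe",
        "books": [
            {"title": "A", "pages": 100},
            {"title": "B", "pages": 250},
        ],
    },
    "open": True,
}

outline = describe_structure(sample)
print("\n".join(outline))
```

Calling `describe_structure(hamil_posts)` on the real response gives a compact outline to read alongside the API documentation. Increase `max_depth` to look further into the nesting.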

### Goal 1: Choose your user and get their posts
Select another Bluesky user you are interested in and access their most recent 10 posts. You can use the function provided above (`fetch_recent_posts_by_user`) to achieve this. 

In [None]:
# Your code goes here.

### Goal 2: Get your user's profile information
Use the public API endpoint `app.bsky.actor.getProfile` to retrieve your user's display name and follower count. Write a new function to achieve this. The function name should be `fetch_profile_for_user`.

In [None]:
# Your code goes here.

### Goal 3: Who does your user follow?
Now let's access the "Social Graph". Use the public API endpoint `app.bsky.graph.getFollows` to retrieve a list of the people that your user follows. Write a new function to achieve this. The function name should be `fetch_follows_for_user`. This list could be quite large! If it is, find a way to simplify it or to access only a subset of the people your user follows.

In [None]:
# Your code goes here.

### Goal 4: Extract the text of your user's posts
Earlier you extracted the N most recent posts by your user. Extract the text of each post and save it into a list. Print out the list so we can see what they've been saying. You can consult [the Bluesky HTTP API Documentation](https://docs.bsky.app/docs/api/app-bsky-feed-get-author-feed) to understand the structure of the response object. You will likely need several functions to achieve this. I suggest splitting up this problem into distinct parts.

In [None]:
# Your code goes here.

## User Report
Who was your user? How active are they on Bluesky? What kinds of topics have they posted about recently? How many people do they follow? Do the people they follow also follow them? Are the people they follow popular users?

Write a brief report on your user, referencing the code you used to answer specific questions. 

* Your report goes here.

## Advanced Exercise
Using the API, find 10 users who post about Psychology (or another topic of your choosing). Extract and visualize the network relations between these users. Who follows who? Who reposts who? Using NetworkX or another library, visualize the graph that describes their network of interactions.
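
"Who follows who" is naturally a directed graph: each follow is an edge pointing from the follower to the account being followed. The sketch below shows one way to hold such edges in plain Python and ask two simple questions of them. The handles and relations are made up for illustration, and `count_followers` and `mutual_follows` are hypothetical helper names; collecting real edges from the API is left to you.

```python
from collections import Counter

# Made-up (follower, followed) pairs, for illustration only.
follow_edges = [
    ("alice.example", "bob.example"),
    ("bob.example", "alice.example"),
    ("carol.example", "alice.example"),
]


def count_followers(edges):
    """Count how many accounts in the edge list follow each account."""
    return Counter(followed for _, followed in edges)


def mutual_follows(edges):
    """Return sorted pairs of accounts that follow each other."""
    edge_set = set(edges)
    pairs = {
        tuple(sorted((a, b)))
        for a, b in edge_set
        if (b, a) in edge_set
    }
    return sorted(pairs)


follower_counts = count_followers(follow_edges)
mutuals = mutual_follows(follow_edges)
print(follower_counts)
print(mutuals)
```

A list of `(follower, followed)` tuples like this is also a convenient starting point for a graph library, since an edge list is a common input format for building a directed graph.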