# Getting Bsky Data
Sara Rosenau 2/21/2025  
This notebook documents my attempt to learn the Bsky (Bluesky) API and collect some data from it.

**Contents**
- [1 Logging into the api](#1-Logging-into-the-api)
- [2 Getting data](#2-Getting-data)
- [3 Making it into a CSV](#3-Making-it-into-a-CSV)
- [4 Data Stats](#4-Data-Stats)

## 1 Logging into the api
- First, import the AT Protocol client library (`atproto`) and start a new client session

In [1]:
from atproto import Client, client_utils

In [2]:
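# keeping the password in a separate file so it isn't written in the notebook itself
fname = 'private/pw.txt'  # forward slashes work on Windows too
with open(fname, 'r') as file:
    # strip() removes a trailing newline, which would otherwise become part of the password
    pw = file.read().strip()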

In [None]:
client = Client()
client.login('sararosenauling.bsky.social', pw)
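Reading the password from a text file keeps it out of the notebook itself. A common alternative, sketched here under assumptions, is to read the handle and password from environment variables. The variable names `BSKY_HANDLE` and `BSKY_APP_PASSWORD` and the helper `load_credentials()` are hypothetical choices for this example. Bluesky also lets you create app passwords in its settings that can be revoked individually, so logging in with one instead of the main account password limits the damage if it ever leaks.

```python
import os


def load_credentials(handle_var="BSKY_HANDLE", password_var="BSKY_APP_PASSWORD"):
    """Return (handle, password) read from environment variables.

    The default variable names are hypothetical; use whatever names you set.
    """
    handle = os.environ.get(handle_var)
    password = os.environ.get(password_var)
    if not handle or not password:
        raise RuntimeError(f"Set {handle_var} and {password_var} before logging in")
    # strip() drops stray whitespace such as a trailing newline
    return handle.strip(), password.strip()


# Usage in the notebook (needs real credentials and network access):
# handle, pw = load_credentials()
# client = Client()
# client.login(handle, pw)
```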

## 2 Getting data
- Building a function called `search2df_top()` that is similar to my earlier `print_search()` function (which isn't included in this notebook)
    - builds a list with one dictionary per post, holding the metadata I might want for each search term, such as author, date, likes, and reposts
    - then converts that list into a pandas DataFrame!
    - This function only requests the top 30 results; if I need more data, I might make a version that gets the latest 30 as well
- The terms I use in my algospeak paper are as follows:
    - *unalive* - kill, murder, suicide
    - *seggs* - sex
    - *grape* - rape
    - *palm-colored* - white
    - *watermelon* - Palestine

Since I'm already working with these terms, I might as well keep using this set.

In [10]:
import pandas as pd

def search2df_top(query):
    """Search Bluesky for `query` and return the top results as a DataFrame."""
    # 'limit' is a maximum: the API can return fewer posts than requested
    results = client.app.bsky.feed.search_posts({'q': query, 'limit': 30, 'sort': 'top'})
    # convert the response model into plain nested dictionaries
    results_dict = results.model_dump()
    query_data = []
    for post in results_dict['posts']:
        # one row of metadata per post
        metadata = {
            'text': post['record']['text'],
            'author': post['author']['handle'],
            'display_name': post['author']['display_name'],
            'date': post['record']['created_at'],
            'likes': post['like_count'],
            'quotes': post['quote_count'],
            'replies': post['reply_count'],
            'reposts': post['repost_count'],
            'uri': post['uri'],
            'query': query,  # remember which search term found this post
        }
        query_data.append(metadata)
    return pd.DataFrame(query_data)
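The notes above mention possibly making a version that gets the latest posts as well. One way to do that, sketched here rather than taken from the `atproto` docs, is to make `sort` a parameter (Bluesky's `app.bsky.feed.searchPosts` endpoint documents `top` and `latest`) and to follow the response's `cursor` field to page past the first batch. To keep the sketch testable without logging in, it takes a hypothetical `fetch_page(params)` callable that returns the `model_dump()` dictionary of one search response; `search2df()` and `post_to_row()` are illustrative names.

```python
import pandas as pd


def post_to_row(post, query):
    """Flatten one post dict (shaped like model_dump() output) into a row."""
    return {
        "text": post["record"]["text"],
        "author": post["author"]["handle"],
        "display_name": post["author"].get("display_name"),
        "date": post["record"]["created_at"],
        "likes": post["like_count"],
        "quotes": post["quote_count"],
        "replies": post["reply_count"],
        "reposts": post["repost_count"],
        "uri": post["uri"],
        "query": query,
    }


def search2df(fetch_page, query, sort="top", max_posts=60, page_size=30):
    """Collect up to max_posts search results for query, following the cursor.

    fetch_page(params) is a hypothetical callable returning a dict with
    'posts' and 'cursor' keys, e.g. the model_dump() of a search response.
    """
    rows, cursor = [], None
    while len(rows) < max_posts:
        params = {"q": query, "sort": sort,
                  "limit": min(page_size, max_posts - len(rows))}
        if cursor:
            params["cursor"] = cursor  # ask for the next page
        page = fetch_page(params)
        posts = page.get("posts") or []
        rows.extend(post_to_row(p, query) for p in posts)
        cursor = page.get("cursor")
        # stop when there is no further page or the server returned nothing
        if not cursor or not posts:
            break
    return pd.DataFrame(rows[:max_posts])
```

With the logged-in client, something like `search2df(lambda p: client.app.bsky.feed.search_posts(p).model_dump(), 'grape', sort='latest')` would request up to 60 of the most recent matches in pages of 30.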

In [11]:
unalive_top_df = search2df_top('unalive')

In [12]:
unalive_top_df.head()

|   | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | "Orange Man Bad" and "Unalive the Boer" are un... | maiamindel.bsky.social | Maia | 2025-02-23T17:07:01.256Z | 92 | 1 | 1 | 18 | at://did:plc:ur77nun2q74loi34r2e6r43u/app.bsky... | unalive |
| 1 | please for the love of god can we NOT normalis... | adrierising.bsky.social | adrie rose 🇯🇲 | 2025-02-25T03:25:40.832Z | 83 | 0 | 1 | 10 | at://did:plc:uajcdhsabyf4t7a3qgclm55x/app.bsky... | unalive |
| 2 | Good morning to everyone except Asian Doll who... | authorreneeb.bsky.social | ReneeB | 2025-02-24T12:32:05.544Z | 19 | 1 | 1 | 2 | at://did:plc:ng6mdz23xa3jae4yr2crgocy/app.bsky... | unalive |
| 3 | Why did I marry someone who picks the WORST FU... | vitaminpac1.bsky.social | Vitamin Bee 🐝 | 2025-02-24T01:18:42.171Z | 22 | 0 | 7 | 1 | at://did:plc:37nxbbnnrozlllwyfngsalya/app.bsky... | unalive |
| 4 | postponing my unalive | baratiddys.bsky.social | Luis 🤍 | 2025-02-20T16:47:35.943Z | 21 | 0 | 1 | 1 | at://did:plc:dluxclbmnsh3bt6wyih5l6ds/app.bsky... | unalive |


In [13]:
seggs_top_df = search2df_top('seggs')

In [14]:
seggs_top_df.head()

|   | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Two fishies, one seggs 🐟🐟 | presialexander.bsky.social | PresiAlexander 🐬🐍🔞 | 2025-02-23T20:53:43.308Z | 207 | 0 | 2 | 41 | at://did:plc:icwzpfib5cfcflnwlmlhu7kc/app.bsky... | seggs |
| 1 | Ruff seggs - Comm from the end of 2024 ❣ | sinnamonlatte.bsky.social | Sinnamon | 2025-02-21T22:11:08.273Z | 551 | 0 | 4 | 85 | at://did:plc:3e76i34nzifdoqnk2z2cxp3t/app.bsky... | seggs |
| 2 | need some intimacy soon & im not talking about... | naturallylauraj.bsky.social | ✨ Optimus Fine ✨ | 2025-02-25T05:47:02.401Z | 21 | 1 | 0 | 6 | at://did:plc:dvx2lh53zsz3nxuxp5gomqiu/app.bsky... | seggs |
| 3 | Lucille and Aurelia get a very interesting mes... | cslucaris.bsky.social | cslucaris | 2025-02-23T17:20:50.775Z | 66 | 0 | 1 | 2 | at://did:plc:pkibjkrefiicsbzviu4tnvax/app.bsky... | seggs |
| 4 | FOXGIRL SEGGS | ciosart.bsky.social | Cios | 2025-02-19T22:54:52.685Z | 82 | 0 | 2 | 21 | at://did:plc:ncvjakureeu2dqzvf3ub72uc/app.bsky... | seggs |


In [15]:
grape_top_df = search2df_top('grape')
grape_top_df.head()
# algospeak uses of this term may be harder to find, since "grape" is also an everyday word

|   | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | It’s 65 degrees out, Portia is under the weath... | angryblacklady.bsky.social | Imani Gandy | 2025-02-23T20:05:34.921Z | 1081 | 5 | 39 | 12 | at://did:plc:fvzkql2aqtbk7qmqjkoo2lv2/app.bsky... | grape |
| 1 | Punishing myself by eating Grape Nuts for brea... | steamymac.bsky.social | Tony ChocoLonely | 2025-02-23T15:30:52.500Z | 224 | 4 | 33 | 52 | at://did:plc:qt4x3vjjrllbji532qk4yxig/app.bsky... | grape |
| 2 | triscuits are the grape nuts of crackers | ygrene.bsky.social | Ygrene | 2025-02-24T21:17:18.159Z | 109 | 0 | 15 | 17 | at://did:plc:mkkvzj3q3pegqam2n7yuxbwy/app.bsky... | grape |
| 3 | My grape dragon still needs a name!!! He guard... | gummyforrest.bsky.social | Forrest | 2025-02-23T16:56:14.975Z | 1129 | 0 | 12 | 234 | at://did:plc:cvvqe2swzug4z5cemdcisrtt/app.bsky... | grape |
| 4 | Grape Nuts - grape nuts is trans | kennedytcooper.bsky.social | Themperor Kennedy🐸🏳️‍🌈 | 2025-02-21T14:16:23.785Z | 104 | 0 | 1 | 7 | at://did:plc:shk3rptrsj34wkl34djcub4o/app.bsky... | grape |


In [16]:
palm_colored_top_df = search2df_top('palm colored')
palm_colored_top_df.head()

|   | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | IBC Palm Springs fit check! Getting ready to l... | simonyoung.bsky.social | Simon Young 🏳️‍🌈🎮 | 2025-02-21T19:39:40.110Z | 59 | 0 | 6 | 1 | at://did:plc:2xlpftmc4nmp7qo4ma3fqds4/app.bsky... | palm colored |
| 1 | Respectfully submitted for your approval for, ... | paulmollon.bsky.social | Straight Jacket Dreamer | 2025-02-20T23:34:56.331Z | 59 | 0 | 4 | 3 | at://did:plc:kqv5ikhundppmkd2qmgbxk5r/app.bsky... | palm colored |
| 2 | Last night tonight\n\n“And there were so many ... | mariannedenton.bsky.social | Marianne Denton | 2025-02-17T23:52:20.944Z | 64 | 0 | 2 | 9 | at://did:plc:xrxbycbachtxi4gjqk3f4rzb/app.bsky... | palm colored |
| 3 | And just like that, our first egg of the year! | itsjustjustin.com | Justin | 2025-02-18T23:47:52.352Z | 49 | 1 | 2 | 3 | at://did:plc:7awbrbkbjnbwjz73bevgtk6l/app.bsky... | palm colored |
| 4 | Sniff sniff and nite nite from me and Ember! #... | auntienome.bsky.social | Naomi J. | 2025-02-20T07:39:44.032Z | 19 | 0 | 0 | 2 | at://did:plc:37sgocwq62qjaavnbwekf6e5/app.bsky... | palm colored |


In [17]:
palm_colored_top_df.text[0]  # might have the same problem as 'grape': this top post doesn't contain "palm colored" at all

'IBC Palm Springs fit check! Getting ready to leave soon 🌴'
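The top "palm colored" result doesn't contain the phrase at all, so the search seems to match more loosely than an exact phrase. Below is a minimal sketch for keeping only posts that literally contain a term, using a case-insensitive whole-word regular expression. `contains_term()` is a hypothetical helper, and treating spaces and hyphens as interchangeable (so "palm colored" also matches "palm-colored") is an assumption about how the term gets spelled. This only removes non-matches; it can't tell algospeak from literal uses like "grape nuts", which still means reading the posts.

```python
import re

import pandas as pd


def contains_term(df, term, column="text"):
    """Return the rows whose text contains term as a whole word or phrase.

    Spaces and hyphens inside term are treated as interchangeable
    (an assumption made for this sketch).
    """
    words = re.split(r"[\s-]+", term.strip())
    # \b on both ends stops 'grape' from matching 'grapefruit'
    pattern = r"\b" + r"[\s-]+".join(re.escape(w) for w in words) + r"\b"
    mask = df[column].str.contains(pattern, case=False, regex=True, na=False)
    return df[mask]
```

For example, `contains_term(palm_colored_top_df, 'palm colored')` would keep only the posts where the phrase actually appears.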

In [18]:
watermelon_top_df = search2df_top('watermelon')
watermelon_top_df.head()

|   | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | RASPBERRY WATERMELON LEMONADE REFRESHER FROM D... | pawrincess.on.computer | vera ᰔ | 2025-02-22T19:10:57.849Z | 66 | 0 | 6 | 1 | at://did:plc:pm6drkowdy4ex3zftltyhi7i/app.bsky... | watermelon |
| 1 | Sandia (Watermelon) Mountains ~\nAfter a fresh... | swphotographer.bsky.social | Susan | 2025-02-24T18:29:34.197Z | 29 | 0 | 1 | 1 | at://did:plc:ejmqr7phzk2knkvor5cit7tt/app.bsky... | watermelon |
| 2 | A farm worker shared this photo with us from a... | ufw.bsky.social | United Farm Workers | 2025-02-21T15:00:35.298Z | 567 | 9 | 14 | 114 | at://did:plc:53pxyzwrw4zx5ft67czyyjyy/app.bsky... | watermelon |
| 3 | Watermelon Dinosaur Plushie\n\n#plushie #handm... | shoplalisa.bsky.social | Shop LaLisa | 2025-02-22T16:44:55.172Z | 25 | 0 | 2 | 1 | at://did:plc:2ao34vz33cquabmhevrc2jub/app.bsky... | watermelon |
| 4 | And if I make one of those watermelon videos? | arcbreak.bsky.social | Arc | 2025-02-22T15:31:46.390Z | 46 | 0 | 3 | 1 | at://did:plc:tbtvvcvghlofiyz7r2vmlpwm/app.bsky... | watermelon |


## 3 Making it into a CSV
- Concatenating all the DataFrames into one, then exporting the result to a CSV file

In [19]:
frames = [unalive_top_df, seggs_top_df, grape_top_df, palm_colored_top_df, watermelon_top_df]

In [20]:
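# each frame keeps its own 0-based index labels, so labels repeat; ignore_index=True would renumber them 0-147
algospeak_top_df = pd.concat(frames)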

In [21]:
algospeak_top_df.shape

(148, 10)

In [22]:
# expected 150 rows (5 terms x 30 posts) but got 148, so checking each frame
for frame in frames:
    print(frame.shape)

(29, 10)
(30, 10)
(30, 10)
(30, 10)
(29, 10)
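The combined frame has 148 rows rather than 150 because two of the searches, `unalive` and `watermelon`, returned 29 posts each. The `limit` parameter caps the number of results, but the search endpoint can return fewer, so a per-term count is worth checking after each collection run. A small sketch, with `short_queries()` as a hypothetical helper name:

```python
import pandas as pd


def short_queries(df, expected, column="query"):
    """Return {query: row_count} for every query with fewer than `expected` rows."""
    counts = df.groupby(column).size()
    # int() turns NumPy integers into plain Python ints for a clean printout
    return {q: int(n) for q, n in counts[counts < expected].items()}
```

On `algospeak_top_df` with `expected=30`, this should return `{'unalive': 29, 'watermelon': 29}`, matching the shapes printed above.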


In [23]:
algospeak_top_df.head()

|   | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | "Orange Man Bad" and "Unalive the Boer" are un... | maiamindel.bsky.social | Maia | 2025-02-23T17:07:01.256Z | 92 | 1 | 1 | 18 | at://did:plc:ur77nun2q74loi34r2e6r43u/app.bsky... | unalive |
| 1 | please for the love of god can we NOT normalis... | adrierising.bsky.social | adrie rose 🇯🇲 | 2025-02-25T03:25:40.832Z | 83 | 0 | 1 | 10 | at://did:plc:uajcdhsabyf4t7a3qgclm55x/app.bsky... | unalive |
| 2 | Good morning to everyone except Asian Doll who... | authorreneeb.bsky.social | ReneeB | 2025-02-24T12:32:05.544Z | 19 | 1 | 1 | 2 | at://did:plc:ng6mdz23xa3jae4yr2crgocy/app.bsky... | unalive |
| 3 | Why did I marry someone who picks the WORST FU... | vitaminpac1.bsky.social | Vitamin Bee 🐝 | 2025-02-24T01:18:42.171Z | 22 | 0 | 7 | 1 | at://did:plc:37nxbbnnrozlllwyfngsalya/app.bsky... | unalive |
| 4 | postponing my unalive | baratiddys.bsky.social | Luis 🤍 | 2025-02-20T16:47:35.943Z | 21 | 0 | 1 | 1 | at://did:plc:dluxclbmnsh3bt6wyih5l6ds/app.bsky... | unalive |


In [24]:
algospeak_top_df.tail()

|    | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|---|---|---|---|---|---|---|---|---|---|
| 24 | 🧀 This in-progress moment gave me a giggle. 🧀 | sewfresh.bsky.social | Antonia Keithahn | 2025-02-24T03:57:01.198Z | 26 | 0 | 3 | 0 | at://did:plc:ysv76nbh2f3rssqtlwheysij/app.bsky... | watermelon |
| 25 | The Disregarded Watermelon.\n\nOne of my favor... | silverystars.bsky.social | Silverystars | 2025-02-22T04:57:50.866Z | 98 | 0 | 2 | 2 | at://did:plc:ubjspcly5oqn54rzvbz2gbaf/app.bsky... | watermelon |
| 26 | This is the sexiest watermelon I’ve ever seen\... | xenionx.bsky.social | Leonardo Decapitator | 2025-02-24T03:39:58.719Z | 7 | 0 | 2 | 1 | at://did:plc:zs7ipxozo7uk44alreempiiy/app.bsky... | watermelon |
| 27 | Sour watermelon slices seem to have not been m... | localswampgay.bsky.social | localswampgay | 2025-02-20T17:57:30.577Z | 45 | 0 | 4 | 0 | at://did:plc:7ssingpiouvbnyjlvtd4ivcn/app.bsky... | watermelon |
| 28 | Watermelon coded for @/TheEvieka | nise-loftsteinn.bsky.social | Nise Loftsteinn | 2025-02-19T22:48:37.728Z | 60 | 0 | 1 | 8 | at://did:plc:nybwkmje4bss3vna7lwoo3l5/app.bsky... | watermelon |


In [25]:
# algospeak_top_df.to_csv('algospeak_top_posts.csv', index=False)

## 4 Data Stats

In [26]:
algospeak_top_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 148 entries, 0 to 28
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   text          148 non-null    object
 1   author        148 non-null    object
 2   display_name  148 non-null    object
 3   date          148 non-null    object
 4   likes         148 non-null    int64 
 5   quotes        148 non-null    int64 
 6   replies       148 non-null    int64 
 7   reposts       148 non-null    int64 
 8   uri           148 non-null    object
 9   query         148 non-null    object
dtypes: int64(4), object(6)
memory usage: 12.7+ KB
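`info()` shows that `date` is stored as plain strings (`object`). For time-based questions, such as how recent the top posts for each term are, it may help to parse it into timezone-aware datetimes. A sketch using `pd.to_datetime`; the `format="ISO8601"` option needs pandas 2.0 or later and accepts timestamps with or without fractional seconds. `with_datetimes()` is a hypothetical helper name.

```python
import pandas as pd


def with_datetimes(df, column="date"):
    """Return a copy of df with an ISO 8601 timestamp column parsed to UTC datetimes.

    format="ISO8601" requires pandas 2.0+; it accepts timestamps with or
    without fractional seconds.
    """
    out = df.copy()  # leave the original string column untouched
    out[column] = pd.to_datetime(out[column], utc=True, format="ISO8601")
    return out
```

For example, `with_datetimes(algospeak_top_df).groupby('query')['date'].agg(['min', 'max'])` would show the date range covered by each term's top posts.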


In [27]:
algospeak_top_df.describe()

|       | likes | quotes | replies | reposts |
|---|---|---|---|---|
| count | 148.0 | 148.0 | 148.0 | 148.0 |
| mean | 87.851351 | 0.777027 | 3.837838 | 10.925676 |
| std | 209.262728 | 3.799071 | 9.867871 | 30.184938 |
| min | 3.0 | 0.0 | 0.0 | 0.0 |
| 25% | 14.0 | 0.0 | 0.0 | 0.0 |
| 50% | 30.5 | 0.0 | 2.0 | 2.0 |
| 75% | 76.5 | 0.0 | 3.25 | 9.0 |
| max | 1845.0 | 42.0 | 106.0 | 234.0 |


In [28]:
algospeak_top_df.groupby('query')[['likes', 'quotes', 'replies', 'reposts']].mean()

| query | likes | quotes | replies | reposts |
|---|---|---|---|---|
| grape | 210.533333 | 1.1 | 10.2 | 25.633333 |
| palm colored | 48.366667 | 0.3 | 2.1 | 4.133333 |
| seggs | 62.133333 | 1.566667 | 2.0 | 9.033333 |
| unalive | 26.172414 | 0.413793 | 1.413793 | 3.586207 |
| watermelon | 90.068966 | 0.482759 | 3.37931 | 12.034483 |


In [29]:
algospeak_top_df.groupby('query')['likes'].describe()

| query | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| grape | 30.0 | 210.533333 | 407.893642 | 18.0 | 35.75 | 69.5 | 114.25 | 1845.0 |
| palm colored | 30.0 | 48.366667 | 97.600694 | 3.0 | 9.0 | 18.5 | 43.25 | 517.0 |
| seggs | 30.0 | 62.133333 | 104.945547 | 3.0 | 12.25 | 29.5 | 62.5 | 551.0 |
| unalive | 29.0 | 26.172414 | 31.03117 | 3.0 | 5.0 | 14.0 | 24.0 | 117.0 |
| watermelon | 29.0 | 90.068966 | 113.56368 | 7.0 | 26.0 | 53.0 | 98.0 | 567.0 |


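Even though these are just the "top" posts, engagement still varies quite a bit between the terms. On average, "grape" gets the most response by a wide margin: its mean likes (about 211) are more than double those of the next-highest term, watermelon (about 90), and it also has the highest mean replies and reposts. The gap is much smaller for the median (69.5 likes vs. 53 for watermelon), so a handful of very popular posts (up to 1,845 likes) likely account for much of the difference. I will have to read through these posts to see whether "grape" is being used as algospeak in any of them. There's a good chance it isn't, and in that case I may have to rethink which terms I want to use for this project.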
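Because a few very popular posts pull the means well above the medians (overall mean likes are about 88 against a median of 30.5), reporting medians next to means gives a steadier comparison between terms. The sketch below does that per query and adds an `engagement` column, the per-post sum of likes, quotes, replies, and reposts; that composite is a hypothetical measure chosen for this example, not anything Bluesky defines, and `engagement_summary()` is an illustrative name.

```python
import pandas as pd


def engagement_summary(df, metrics=("likes", "quotes", "replies", "reposts")):
    """Per-query median and mean of each metric and of a summed 'engagement' column.

    'engagement' (the per-post sum of the metrics) is a hypothetical composite
    for this sketch, not a Bluesky-defined measure.
    """
    cols = list(metrics)
    out = df.copy()
    out["engagement"] = out[cols].sum(axis=1)
    # result has MultiIndex columns: (metric, "median") and (metric, "mean")
    return out.groupby("query")[cols + ["engagement"]].agg(["median", "mean"])
```

Running `engagement_summary(algospeak_top_df).round(1)` would show, for each term, whether a high mean is backed by a high median or driven by a few outliers.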