# Getting Bsky Data
Sara Rosenau 2/21/2025  
In this notebook I try to learn the Bsky (Bluesky) API and use it to collect some data.

## Playing around with the API a bit
- First, import the `atproto` SDK (a Python client for the Bluesky API) and start a new client session

In [1]:
from atproto import Client, client_utils

In [2]:
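# doing this for some measure of privacy with regards to my password
import os
fname = os.path.join('private', 'pw.txt')  # portable version of r'private\pw.txt'
with open(fname, 'r', encoding='utf-8') as file:  # closes the file automatically
    pw = file.read().strip()  # strip() drops a trailing newline in case the file has one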
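
A slightly sturdier variant of the password cell, sketched under a couple of assumptions: `load_password` and the environment-variable name `BSKY_APP_PASSWORD` are hypothetical (not part of `atproto`), and the file fallback mirrors the `private/pw.txt` layout used here. Using `pathlib` keeps the path working outside Windows, and `.strip()` removes a trailing newline that would otherwise be sent as part of the password.

```python
import os
from pathlib import Path


def load_password(env_var='BSKY_APP_PASSWORD', fallback_file=Path('private') / 'pw.txt'):
    """Return a password from an environment variable, else from a local text file."""
    # BSKY_APP_PASSWORD is a hypothetical name: setting it in the shell means
    # the password never has to be saved inside the project folder
    value = os.environ.get(env_var)
    if value:
        return value.strip()
    # otherwise read the file; strip() removes a trailing newline or spaces
    return Path(fallback_file).read_text(encoding='utf-8').strip()


# pw = load_password()
```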

In [None]:
client = Client()
client.login('sararosenauling.bsky.social', pw)

### Making a post from the API!

In [4]:
#client.send_post(text='testing api') # this works but commenting it out so I don't post every time I run it

### Getting posts from my timeline
- Following [this example](https://github.com/MarshalX/atproto/blob/main/examples/home_timeline.py)
- Getting the 10 most recent posts from my Following feed

In [5]:
def main() -> None:
    print('Home (Following):\n')

    timeline = client.get_timeline(limit=10, algorithm='reverse-chronological')
    for feed_view in timeline.feed:
        action = 'New Post'
        if feed_view.reason:
            action_by = feed_view.reason.by.handle
            action = f'Reposted by @{action_by}'

        post = feed_view.post.record
        author = feed_view.post.author

        print(f'[{action}] {author.display_name}: {post.text}')
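
The output below has at least one post with no text (probably an image-only post), and display names are optional on Bluesky profiles, so `author.display_name` can be empty. A minimal sketch of a formatting helper that handles both; `format_feed_item` is a hypothetical name, and it only reads the attributes `main()` already uses (`reason.by.handle`, `post.author`, `post.record.text`), so it should work on the same feed items.

```python
def format_feed_item(feed_view) -> str:
    """Build the '[action] name: text' line that main() prints."""
    action = 'New Post'
    if feed_view.reason:
        action = f'Reposted by @{feed_view.reason.by.handle}'
    author = feed_view.post.author
    # display_name can be None or empty, so fall back to the handle
    name = author.display_name or f'@{author.handle}'
    # image-only posts carry an empty text field
    text = feed_view.post.record.text or '(no text)'
    return f'[{action}] {name}: {text}'


# inside main(): print(format_feed_item(feed_view))
```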

In [6]:
main()

Home (Following):

[New Post] Dave Kamper: Indeed.
[New Post] Fredo Fabrucci: The Poster's Poster™: No but if he dies I will do my duty
[Reposted by @gymclasswarfare.bsky.social] Lily: Ursula K. Le Guin on the use of language. She never missed
[Reposted by @olufemiotaiwo.bsky.social] The 51st: It's been one year since DCist was shut down. We're proud of what we've built in that time with all of you. If you'd indulge us in some reminiscing, plus dreaming of the future, we'd love to hear your thoughts: docs.google.com/forms/d/e/1F...
[Reposted by @olufemiotaiwo.bsky.social] jamelle: RIP to one of the bona fide greats
[New Post] Fredo Fabrucci: The Poster's Poster™: 
[New Post] Godspeed You! Woke Moralists: oh so then yeah that's totally likely actually, especially since she's already deep in shit with funky missing money that went to X-Wings for domestic terrorists
[Reposted by @paleofuture.bsky.social] John Rogers: It is truly amazing how much of the US right now is driven by people who

### Trying a search
- Just trying to get the hang of what `client.app.bsky.feed.search_posts()` returns
  - It turns out it returns a `Response` object, and the SDK's models are built on a library called Pydantic
  - Calling `.model_dump()` on that object returns a dictionary whose `'posts'` entry is a list, with one dictionary of metadata per post
  - With a for loop and some indexing, we can pull out all the info we need!
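
To see that nesting without logging in, here is a minimal sketch on a made-up dictionary that copies only the keys this notebook relies on (`'posts'`, then `'author'` → `'handle'` and `'record'` → `'text'`). It is not a real API response, and `extract_posts` is a hypothetical helper. Using `.get()` with defaults means a post missing a field yields an empty string instead of a `KeyError`.

```python
def extract_posts(results_dict):
    """Pull (handle, text) pairs out of a search_posts(...).model_dump() dictionary."""
    rows = []
    for post in results_dict.get('posts', []):
        # each post holds an 'author' dict and a 'record' dict with the post body
        handle = post.get('author', {}).get('handle', '')
        text = post.get('record', {}).get('text', '')
        rows.append((handle, text))
    return rows


# made-up sample with the same nesting as the real dictionary
sample = {
    'posts': [
        {'author': {'handle': 'example.bsky.social'}, 'record': {'text': 'hello'}},
        {'author': {'handle': 'other.bsky.social'}, 'record': {}},
    ]
}
for handle, text in extract_posts(sample):
    print(f'{handle}: {text}')
```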

In [7]:
import requests  # not actually used below: the atproto client makes the HTTP requests itself

In [8]:
def print_search(query):
    print(query, 'Search Results:\n')
    results = client.app.bsky.feed.search_posts({'q': query, 'limit': 10, 'sort': 'top'})
    results_dict = results.model_dump()
    for post in results_dict['posts']:
        text = post['record']['text']
        author = post['author']['handle']
        print(author+':', text+'\n')

In [9]:
print_search('unalive')

unalive Search Results:

maiamindel.bsky.social: "Orange Man Bad" and "Unalive the Boer" are universal truths that all living beings are innately attuned to. Like a message from the Creator

baratiddys.bsky.social: postponing my unalive

kingserpentico.bsky.social: I *AM* on vacation and people still wanna unalive me at work.

authorreneeb.bsky.social: Good morning to everyone except Asian Doll who openly admitted to trying to unalive Kash Doll over a damn name during BHM. 🤦🏿‍♀️

aliothfox.ursamajorartworks.com: Y'all, please stop saying "unalive" and "self-delete" and so forth. There's no algorithm to fight here and a lot of people have the word "suicide" muted for a reason. The euphemisms are neither cute nor helpful.

vitaminpac1.bsky.social: Why did I marry someone who picks the WORST FUCKING MOVIES

My god I either want to die of boredom, want to unalive myself, or am too confused to decide

usakoing.bsky.social: Chat if I unalive myself

verynormalguy.bsky.social: wait, people ac

## Getting some actual data
- Building a function called `search2df_top()` that is pretty similar to the `print_search()` function above
    - builds a list with one dictionary per post, holding the metadata I might want for each search term
    - then converts it into a pandas DataFrame!
    - It only requests the top 30 results (`'limit': 30`); if I need more data, I might make a version that gets the latest 30 as well
- The terms I use in my algospeak paper are as follows:
   - *unalive* - kill, murder, suicide
   - *seggs* - sex
   - *grape* - rape
   - *palm-colored* - white
   - *watermelon* - Palestine

Since I'm already working with these terms, I might as well keep going with this set.
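
If 30 posts per term turns out not to be enough, the search endpoint is paged: a response can include a `cursor` that is passed back to request the next batch, and `'sort': 'latest'` would give the "latest" version mentioned above. A minimal sketch, assuming the dumped response dictionary has `'posts'` and `'cursor'` keys; `collect_posts` and `fetch_page` are hypothetical names, and the page-fetching call is passed in as a function so the loop can be tried without logging in.

```python
def collect_posts(fetch_page, query, total=90, page_size=30, sort='latest'):
    """Gather up to `total` posts for `query` by following the response cursor."""
    posts, cursor = [], None
    while len(posts) < total:
        params = {'q': query, 'limit': page_size, 'sort': sort}
        if cursor:
            params['cursor'] = cursor  # ask for the page after the previous one
        page = fetch_page(params)
        batch = page.get('posts', [])
        posts.extend(batch)
        cursor = page.get('cursor')
        # stop when there is no cursor or an empty page comes back
        if not cursor or not batch:
            break
    return posts[:total]


# with the logged-in client from above (assumed to accept the same params dict):
# fetch = lambda params: client.app.bsky.feed.search_posts(params).model_dump()
# unalive_latest = collect_posts(fetch, 'unalive')
```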

In [10]:
import pandas as pd
def search2df_top(query):
    results = client.app.bsky.feed.search_posts({'q': query, 'limit': 30, 'sort': 'top'})
    results_dict = results.model_dump()
    query_data = []
    for post in results_dict['posts']:
        metadata = {}
        metadata['text'] = post['record']['text']
        metadata['author'] = post['author']['handle']
        metadata['display_name'] = post['author']['display_name']
        metadata['date'] = post['record']['created_at']
        metadata['likes'] = post['like_count']
        metadata['quotes'] = post['quote_count']
        metadata['replies'] = post['reply_count']
        metadata['reposts'] = post['repost_count']
        metadata['uri'] = post['uri']
        metadata['query'] = query
        query_data.append(metadata)
    query_df = pd.DataFrame(query_data)
    return query_df

In [11]:
unalive_top_df = search2df_top('unalive')

In [12]:
unalive_top_df.head()

|   | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|------|--------|--------------|------|------:|-------:|--------:|--------:|-----|-------|
| 0 | "Orange Man Bad" and "Unalive the Boer" are un... | maiamindel.bsky.social | Maia | 2025-02-23T17:07:01.256Z | 91 | 1 | 1 | 17 | at://did:plc:ur77nun2q74loi34r2e6r43u/app.bsky... | unalive |
| 1 | postponing my unalive | baratiddys.bsky.social | Luis 🤍 | 2025-02-20T16:47:35.943Z | 21 | 0 | 1 | 1 | at://did:plc:dluxclbmnsh3bt6wyih5l6ds/app.bsky... | unalive |
| 2 | I \*AM\* on vacation and people still wanna unal... | kingserpentico.bsky.social | SNAKEMAN | 2025-02-20T02:26:55.167Z | 61 | 0 | 3 | 1 | at://did:plc:pv4626x5y7zxxsupflujztqb/app.bsky... | unalive |
| 3 | Good morning to everyone except Asian Doll who... | authorreneeb.bsky.social | ReneeB | 2025-02-24T12:32:05.544Z | 19 | 1 | 1 | 1 | at://did:plc:ng6mdz23xa3jae4yr2crgocy/app.bsky... | unalive |
| 4 | Y'all, please stop saying "unalive" and "self-... | aliothfox.ursamajorartworks.com | Alioth Daddyfox | 2025-02-21T17:55:36.580Z | 115 | 2 | 7 | 28 | at://did:plc:5mkojgjmjfhdpd5lvepg2q6h/app.bsky... | unalive |


In [13]:
seggs_top_df = search2df_top('seggs')

In [14]:
seggs_top_df.head()

|   | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|------|--------|--------------|------|------:|-------:|--------:|--------:|-----|-------|
| 0 | Two fishies, one seggs 🐟🐟 | presialexander.bsky.social | PresiAlexander 🐬🐍🔞 | 2025-02-23T20:53:43.308Z | 191 | 0 | 2 | 35 | at://did:plc:icwzpfib5cfcflnwlmlhu7kc/app.bsky... | seggs |
| 1 | Ruff seggs - Comm from the end of 2024 ❣ | sinnamonlatte.bsky.social | Sinnamon | 2025-02-21T22:11:08.273Z | 544 | 0 | 4 | 85 | at://did:plc:3e76i34nzifdoqnk2z2cxp3t/app.bsky... | seggs |
| 2 | FOXGIRL SEGGS | ciosart.bsky.social | Cios | 2025-02-19T22:54:52.685Z | 82 | 0 | 2 | 21 | at://did:plc:ncvjakureeu2dqzvf3ub72uc/app.bsky... | seggs |
| 3 | Lucille and Aurelia get a very interesting mes... | cslucaris.bsky.social | cslucaris | 2025-02-23T17:20:50.775Z | 66 | 0 | 1 | 2 | at://did:plc:pkibjkrefiicsbzviu4tnvax/app.bsky... | seggs |
| 4 | Drawing some nasty furry seggs... | dicsaw.bsky.social | Dicsaw (COMMS CLOSED 3/3) 🇧🇷🔞 | 2025-02-22T16:40:27.962Z | 19 | 0 | 1 | 0 | at://did:plc:oyzj4pdrxg66qpuycyxwl7zh/app.bsky... | seggs |


In [15]:
grape_top_df = search2df_top('grape')
grape_top_df.head()
# algospeak uses of this one may be hard to find among all the literal "grape" posts

|   | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|------|--------|--------------|------|------:|-------:|--------:|--------:|-----|-------|
| 0 | It’s 65 degrees out, Portia is under the weath... | angryblacklady.bsky.social | Imani Gandy | 2025-02-23T20:05:34.921Z | 1079 | 5 | 39 | 12 | at://did:plc:fvzkql2aqtbk7qmqjkoo2lv2/app.bsky... | grape |
| 1 | Punishing myself by eating Grape Nuts for brea... | steamymac.bsky.social | Tony ChocoLonely | 2025-02-23T15:30:52.500Z | 221 | 4 | 33 | 51 | at://did:plc:qt4x3vjjrllbji532qk4yxig/app.bsky... | grape |
| 2 | My grape dragon still needs a name!!! He guard... | gummyforrest.bsky.social | Forrest | 2025-02-23T16:56:14.975Z | 1058 | 0 | 12 | 222 | at://did:plc:cvvqe2swzug4z5cemdcisrtt/app.bsky... | grape |
| 3 | Grape Nuts - grape nuts is trans | kennedytcooper.bsky.social | Themperor Kennedy🐸🏳️‍🌈 | 2025-02-21T14:16:23.785Z | 101 | 0 | 1 | 7 | at://did:plc:shk3rptrsj34wkl34djcub4o/app.bsky... | grape |
| 4 | Just opened a box of Grape-Nuts. I haven’t bee... | saltymactavish.bsky.social | Salty MacTavish | 2025-02-23T13:23:32.167Z | 196 | 0 | 17 | 45 | at://did:plc:eklnz5ml4ge3tnpj4cmkpbgm/app.bsky... | grape |


In [16]:
palm_colored_top_df = search2df_top('palm colored')
palm_colored_top_df.head()

|   | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|------|--------|--------------|------|------:|-------:|--------:|--------:|-----|-------|
| 0 | IBC Palm Springs fit check! Getting ready to l... | simonyoung.bsky.social | Simon Young 🏳️‍🌈🎮 | 2025-02-21T19:39:40.110Z | 59 | 0 | 6 | 1 | at://did:plc:2xlpftmc4nmp7qo4ma3fqds4/app.bsky... | palm colored |
| 1 | Respectfully submitted for your approval for, ... | paulmollon.bsky.social | Straight Jacket Dreamer | 2025-02-20T23:34:56.331Z | 50 | 0 | 2 | 2 | at://did:plc:kqv5ikhundppmkd2qmgbxk5r/app.bsky... | palm colored |
| 2 | Last night tonight\n\n“And there were so many ... | mariannedenton.bsky.social | Marianne Denton | 2025-02-17T23:52:20.944Z | 64 | 0 | 2 | 9 | at://did:plc:xrxbycbachtxi4gjqk3f4rzb/app.bsky... | palm colored |
| 3 | And just like that, our first egg of the year! | itsjustjustin.com | Justin | 2025-02-18T23:47:52.352Z | 49 | 1 | 2 | 3 | at://did:plc:7awbrbkbjnbwjz73bevgtk6l/app.bsky... | palm colored |
| 4 | I’m here, in spirit 🏝️\n#photography\n#nature | tony421.bsky.social | TonyM | 2025-02-16T20:52:25.782Z | 193 | 0 | 5 | 9 | at://did:plc:726oqz4xcs3be2y7xn4wvtg3/app.bsky... | palm colored |


In [17]:
palm_colored_top_df.text[0] # same problem as "grape": the top result doesn't use the phrase at all

'IBC Palm Springs fit check! Getting ready to leave soon 🌴'
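
The Palm Springs hit suggests the search can match the words separately rather than the exact phrase. One way to narrow the results afterwards is to keep only rows whose text contains the term itself. A minimal sketch with pandas: `filter_term` is a hypothetical helper, the example rows are made up, and the regular expression (allowing `palm colored`, `palm-colored`, and `palm coloured`) is my guess at which spellings count. This only drops clearly irrelevant hits; it can't tell algospeak "grape" from fruit "grape".

```python
import pandas as pd


def filter_term(df, pattern):
    """Keep rows whose 'text' matches `pattern` (case-insensitive regex)."""
    mask = df['text'].str.contains(pattern, case=False, regex=True, na=False)
    return df[mask].reset_index(drop=True)


# hyphen or space, American or British spelling
PALM_PATTERN = r'palm[-\s]colou?red'

# made-up rows for illustration
example = pd.DataFrame({'text': [
    'Palm Springs trip',
    'a palm-colored example',
    'PALM COLOURED example',
]})
print(filter_term(example, PALM_PATTERN)['text'].tolist())

# palm_colored_matches = filter_term(palm_colored_top_df, PALM_PATTERN)
```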

In [18]:
watermelon_top_df = search2df_top('watermelon')
watermelon_top_df.head()

|   | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|------|--------|--------------|------|------:|-------:|--------:|--------:|-----|-------|
| 0 | RASPBERRY WATERMELON LEMONADE REFRESHER FROM D... | pawrincess.on.computer | vera ᰔ | 2025-02-22T19:10:57.849Z | 66 | 0 | 6 | 1 | at://did:plc:pm6drkowdy4ex3zftltyhi7i/app.bsky... | watermelon |
| 1 | A farm worker shared this photo with us from a... | ufw.bsky.social | United Farm Workers | 2025-02-21T15:00:35.298Z | 567 | 9 | 14 | 114 | at://did:plc:53pxyzwrw4zx5ft67czyyjyy/app.bsky... | watermelon |
| 2 | Watermelon Dinosaur Plushie\n\n#plushie #handm... | shoplalisa.bsky.social | Shop LaLisa | 2025-02-22T16:44:55.172Z | 24 | 0 | 2 | 1 | at://did:plc:2ao34vz33cquabmhevrc2jub/app.bsky... | watermelon |
| 3 | And if I make one of those watermelon videos? | arcbreak.bsky.social | Arc | 2025-02-22T15:31:46.390Z | 45 | 0 | 3 | 1 | at://did:plc:tbtvvcvghlofiyz7r2vmlpwm/app.bsky... | watermelon |
| 4 | #DailyBat 🖤🦇🖤 | afizgig.bsky.social | AlTheColorsOfTheDark🖤🦇🪦 | 2025-02-23T13:57:15.347Z | 61 | 0 | 0 | 6 | at://did:plc:sfichlc25a4wuzsqyqauuvq3/app.bsky... | watermelon |


## Making it into a CSV
- Concatenating all the DataFrames into one, then exporting it to a CSV

In [19]:
frames = [unalive_top_df, seggs_top_df, grape_top_df, palm_colored_top_df, watermelon_top_df]

In [20]:
algospeak_top_df = pd.concat(frames)

In [21]:
algospeak_top_df.shape

(148, 10)

In [28]:
# why 148 rows instead of 5 x 30 = 150? `limit` is only an upper bound, so check each frame
for frame in frames:
    print(frame.shape)

(29, 10)
(30, 10)
(30, 10)
(30, 10)
(29, 10)
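
A quicker way to see which searches came back short is to count rows per term in the combined frame, using the `query` column that `search2df_top()` adds. Checking `uri` for duplicates is also worth doing before analysis, in case the same post shows up under more than one term (an assumption to verify, not something observed here). A minimal sketch; `rows_per_query` is a hypothetical helper and `demo` is a made-up stand-in for `algospeak_top_df`.

```python
import pandas as pd


def rows_per_query(df, expected=30):
    """Count rows per search term and flag terms that came back short."""
    counts = df['query'].value_counts().sort_index()
    short = counts[counts < expected]
    return counts, short


# small stand-in frame; in the notebook, pass algospeak_top_df instead
demo = pd.DataFrame({
    'query': ['unalive'] * 29 + ['seggs'] * 30,
    'uri': [f'at://example/{i}' for i in range(59)],
})
counts, short = rows_per_query(demo)
print(short)
print('duplicate posts:', demo['uri'].duplicated().sum())
```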


In [25]:
algospeak_top_df.head()

|   | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|---|------|--------|--------------|------|------:|-------:|--------:|--------:|-----|-------|
| 0 | "Orange Man Bad" and "Unalive the Boer" are un... | maiamindel.bsky.social | Maia | 2025-02-23T17:07:01.256Z | 91 | 1 | 1 | 17 | at://did:plc:ur77nun2q74loi34r2e6r43u/app.bsky... | unalive |
| 1 | postponing my unalive | baratiddys.bsky.social | Luis 🤍 | 2025-02-20T16:47:35.943Z | 21 | 0 | 1 | 1 | at://did:plc:dluxclbmnsh3bt6wyih5l6ds/app.bsky... | unalive |
| 2 | I \*AM\* on vacation and people still wanna unal... | kingserpentico.bsky.social | SNAKEMAN | 2025-02-20T02:26:55.167Z | 61 | 0 | 3 | 1 | at://did:plc:pv4626x5y7zxxsupflujztqb/app.bsky... | unalive |
| 3 | Good morning to everyone except Asian Doll who... | authorreneeb.bsky.social | ReneeB | 2025-02-24T12:32:05.544Z | 19 | 1 | 1 | 1 | at://did:plc:ng6mdz23xa3jae4yr2crgocy/app.bsky... | unalive |
| 4 | Y'all, please stop saying "unalive" and "self-... | aliothfox.ursamajorartworks.com | Alioth Daddyfox | 2025-02-21T17:55:36.580Z | 115 | 2 | 7 | 28 | at://did:plc:5mkojgjmjfhdpd5lvepg2q6h/app.bsky... | unalive |


In [26]:
algospeak_top_df.tail()

|    | text | author | display_name | date | likes | quotes | replies | reposts | uri | query |
|----|------|--------|--------------|------|------:|-------:|--------:|--------:|-----|-------|
| 24 | I'm gonna start selling strong indica gummies ... | cerromerussell.bsky.social | MonsterKing | 2025-02-21T12:44:36.371Z | 45 | 5 | 3 | 7 | at://did:plc:gea43dvjdqehvtuewqvlq3df/app.bsky... | watermelon |
| 25 | Sandia (Watermelon) Mountains ~\nAfter a fresh... | swphotographer.bsky.social | Susan | 2025-02-24T18:29:34.197Z | 16 | 0 | 0 | 1 | at://did:plc:ejmqr7phzk2knkvor5cit7tt/app.bsky... | watermelon |
| 26 | This is the sexiest watermelon I’ve ever seen\... | xenionx.bsky.social | Leonardo Decapitator | 2025-02-24T03:39:58.719Z | 7 | 0 | 2 | 1 | at://did:plc:zs7ipxozo7uk44alreempiiy/app.bsky... | watermelon |
| 27 | You guys think that watermelon is okay?\n\n#Fr... | manwithpez.bsky.social | ManWithPez | 2025-02-24T03:54:07.032Z | 6 | 0 | 1 | 1 | at://did:plc:be2lj522knei72jgshclexfn/app.bsky... | watermelon |
| 28 | i should totally crush a watermelon w my thigh... | carnalkitsune.bsky.social | Foxie ♡ (18+) | 2025-02-20T17:46:14.232Z | 34 | 0 | 1 | 3 | at://did:plc:gpbafdt2bxardryvwf2gkzms/app.bsky... | watermelon |
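
The tail ends at index 28 even though the frame has 148 rows: `pd.concat` keeps each piece's original index, so the labels 0 to 28 (or 29) repeat once per search term. This doesn't affect the CSV, since `to_csv` is called with `index=False`, but a label lookup like `.loc[0]` on the combined frame returns one row per term instead of a single row. Passing `ignore_index=True` builds a fresh 0..n-1 index. A small sketch with two made-up stand-in frames:

```python
import pandas as pd

# two tiny stand-ins for the per-term frames, each with its own 0-based index
a = pd.DataFrame({'query': ['unalive', 'unalive'], 'likes': [91, 21]})
b = pd.DataFrame({'query': ['seggs', 'seggs'], 'likes': [191, 544]})

kept = pd.concat([a, b])                      # index labels: 0, 1, 0, 1
fresh = pd.concat([a, b], ignore_index=True)  # index labels: 0, 1, 2, 3

print(len(kept.loc[0]))  # two rows share the label 0
print(fresh.index.tolist())

# in the notebook: algospeak_top_df = pd.concat(frames, ignore_index=True)
```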


In [29]:
algospeak_top_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 148 entries, 0 to 28
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   text          148 non-null    object
 1   author        148 non-null    object
 2   display_name  148 non-null    object
 3   date          148 non-null    object
 4   likes         148 non-null    int64 
 5   quotes        148 non-null    int64 
 6   replies       148 non-null    int64 
 7   reposts       148 non-null    int64 
 8   uri           148 non-null    object
 9   query         148 non-null    object
dtypes: int64(4), object(6)
memory usage: 12.7+ KB
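
`.info()` lists `date` as `object`, meaning the `created_at` strings are stored as plain text. Converting them to timestamps makes sorting and filtering by time straightforward. A minimal sketch using `pd.to_datetime` with `utc=True` (the strings end in `Z`, i.e. UTC); the two dates are copied from the output above. If some timestamps turn out to use a different precision, pandas 2.x may additionally need `format='ISO8601'`.

```python
import pandas as pd

# two created_at strings copied from the output above
dates = pd.DataFrame({'date': ['2025-02-23T17:07:01.256Z', '2025-02-20T16:47:35.943Z']})

# parse the ISO 8601 strings; utc=True keeps them as timezone-aware UTC timestamps
dates['date'] = pd.to_datetime(dates['date'], utc=True)

print(dates['date'].dtype)
print(dates.sort_values('date')['date'].dt.date.tolist())

# in the notebook:
# algospeak_top_df['date'] = pd.to_datetime(algospeak_top_df['date'], utc=True)
```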


In [30]:
algospeak_top_df.to_csv('algospeak_top_posts.csv', index=False)