# Getting Unalive Data
This notebook collects my second round of data, reusing the code from the [getting_bsky_data](https://github.com/Data-Science-for-Linguists-2025/Algospeak-on-Bluesky/blob/main/getting_bsky_data.ipynb) notebook.

**Contents**
- [1 Logging into the API](#1-Logging-into-the-API)
- [2 Getting connected posts and other metadata](#2-Getting-connected-posts-and-other-metadata)
    - [2.1 Converting URIs to URLs](#2.1-Converting-URIs-to-URLs)
- [3 Getting Posts](#3-Getting-Posts)
  
## 1 Logging into the API

In [1]:
from atproto import Client, client_utils
import pandas as pd

# Read the password from a local file; the with-block closes the file automatically
fname = r'private\pw.txt'
with open(fname, 'r') as file:
    pw = file.read()

client = Client()
client.login('sararosenauling.bsky.social', pw)

ProfileViewDetailed(did='did:plc:66lizgbz577cdzpwnfyox3g6', handle='sararosenauling.bsky.social', associated=ProfileAssociated(chat=None, feedgens=0, labeler=False, lists=0, starter_packs=0, py_type='app.bsky.actor.defs#profileAssociated'), avatar='https://cdn.bsky.app/img/avatar/plain/did:plc:66lizgbz577cdzpwnfyox3g6/bafkreicddmiczeju5vfi5yxgp2oosshkakweumupixgkofwwkq6jp3nqiq@jpeg', banner='https://cdn.bsky.app/img/banner/plain/did:plc:66lizgbz577cdzpwnfyox3g6/bafkreih3qtqbp27r3sq6jskcn4wmb7oxmbbbt6p3lcmhd5zbqi6ww5k5pm@jpeg', created_at='2023-06-25T12:38:14.621Z', description='she/her - Sociolinguist studying the internet at Pitt! Also into kpop/kdrama, musicals, the Phillies, cross stiching and crocheting, and singing in general\n', display_name='Sara Rosenau', followers_count=948, follows_count=1071, indexed_at='2024-11-11T20:50:23.741Z', joined_via_starter_pack=None, labels=[], pinned_post=None, posts_count=325, viewer=ViewerState(blocked_by=False, blocking=None, blocking_by_list=N

## 2 Getting connected posts and other metadata

In [2]:
results = client.app.bsky.feed.search_posts({'q': 'unalive', 'limit': 30, 'sort': 'top'})
results_dict = results.model_dump()

In [3]:
results_dict['posts'][2]

{'author': {'did': 'did:plc:uzfyjf4spsvvsn4mcqwxq4hs',
  'handle': 'willraisesfunds.bsky.social',
  'associated': {'chat': {'allow_incoming': 'following',
    'py_type': 'app.bsky.actor.defs#profileAssociatedChat'},
   'feedgens': None,
   'labeler': None,
   'lists': None,
   'starter_packs': None,
   'py_type': 'app.bsky.actor.defs#profileAssociated'},
  'avatar': 'https://cdn.bsky.app/img/avatar/plain/did:plc:uzfyjf4spsvvsn4mcqwxq4hs/bafkreigwi2logmvwjcmx6qvjlt6a5psxko7a3ldykq5n44oezsx7sxykti@jpeg',
  'created_at': '2023-05-17T01:07:26.183Z',
  'display_name': 'will',
  'labels': [],
  'viewer': {'blocked_by': False,
   'blocking': None,
   'blocking_by_list': None,
   'followed_by': None,
   'following': None,
   'known_followers': None,
   'muted': False,
   'muted_by_list': None,
   'py_type': 'app.bsky.actor.defs#viewerState'},
  'py_type': 'app.bsky.actor.defs#profileViewBasic'},
 'cid': 'bafyreicyncb7lqt7a55vcnmbztzrumf6cui3kg6a3rlh7fzd5lmvfzacgq',
 'indexed_at': '2025-04-27T2

If a post is a reply, the 'reply' field under its 'record' is not None; for a top-level post it is None.
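
A minimal sketch of that check, assuming `post` is one entry of `results_dict['posts']` as dumped above. The helper name `is_reply` and the two sample dicts are made up for illustration and are not real data.

```python
def is_reply(post):
    """Return True when the post's record carries a non-None 'reply' field."""
    # .get() is used so a record with no 'reply' key at all also counts as a non-reply
    return post.get('record', {}).get('reply') is not None


# Hand-made stand-ins for entries of results_dict['posts'] (hypothetical values)
top_level = {'record': {'text': 'hello', 'reply': None}}
reply = {'record': {'text': 'hi back',
                    'reply': {'parent': {'uri': 'at://did:plc:example/app.bsky.feed.post/abc'}}}}

print(is_reply(top_level), is_reply(reply))
```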

### 2.1 Converting URIs to URLs

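An AT URI for a post can be converted into a bsky.app URL, as explained in [this discussion](https://github.com/bluesky-social/atproto/discussions/2523). The cells below split the URI on `/` and reuse the DID and the record key.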

In [4]:
import re
u = re.split('/', 'at://did:plc:awzzrtrcrvpnxi3ph2sbhxwv/app.bsky.feed.post/3lmt2lhdlwk2l')
u

['at:',
 '',
 'did:plc:awzzrtrcrvpnxi3ph2sbhxwv',
 'app.bsky.feed.post',
 '3lmt2lhdlwk2l']

In [5]:
'https://bsky.app/profile/'+u[2]+'/post/'+u[4]

'https://bsky.app/profile/did:plc:awzzrtrcrvpnxi3ph2sbhxwv/post/3lmt2lhdlwk2l'

In [6]:
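def uri_to_url(uri):
    """Convert a post's AT URI into its bsky.app URL."""
    # After splitting on '/', index 2 is the DID and index 4 is the record key
    u = re.split('/', uri)
    url = 'https://bsky.app/profile/'+u[2]+'/post/'+u[4]
    return url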

In [7]:
uri_to_url('at://did:plc:g7cu7736qmemcopvjip74g3b/app.bsky.feed.post/3lmt2wyd5q22a')

'https://bsky.app/profile/did:plc:g7cu7736qmemcopvjip74g3b/post/3lmt2wyd5q22a'
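
`uri_to_url` trusts that its input splits into exactly the expected parts. A slightly more defensive sketch follows, assuming only post URIs of the form `at://<did>/app.bsky.feed.post/<rkey>` need to be handled. The name `post_uri_to_url` and the choice to raise `ValueError` are my own assumptions, not part of the notebook.

```python
def post_uri_to_url(uri):
    """Convert an AT URI for a post into a bsky.app URL, or raise ValueError."""
    prefix = 'at://'
    if not uri.startswith(prefix):
        raise ValueError(f'not an AT URI: {uri!r}')
    parts = uri[len(prefix):].split('/')
    # Expect exactly three non-empty parts: authority, collection, record key
    if len(parts) != 3 or parts[1] != 'app.bsky.feed.post' or not parts[0] or not parts[2]:
        raise ValueError(f'not a post URI: {uri!r}')
    authority, _, rkey = parts
    return f'https://bsky.app/profile/{authority}/post/{rkey}'


print(post_uri_to_url('at://did:plc:g7cu7736qmemcopvjip74g3b/app.bsky.feed.post/3lmt2wyd5q22a'))
```

For a well-formed post URI this gives the same URL as `uri_to_url`; anything else fails early instead of producing a wrong link or an `IndexError`.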

## 3 Getting Posts
A single search request returns at most 100 posts, so to get more I'll run the search again over a different time frame.
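
Re-running the search on an earlier time frame can also be written as a loop. The sketch below is an illustration under assumptions, not what this notebook does: `fetch` stands for any function that takes an `until` timestamp and returns a list of dicts with a `date` key, and `collect_batches`, `parse_ts`, and `fake_fetch` are hypothetical names. Because results sorted by 'top' may not cover every post in a window, moving the cut-off to the oldest date seen is only a heuristic.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp such as '2025-04-15T02:50:06.393Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat
    return datetime.fromisoformat(ts.replace('Z', '+00:00'))


def collect_batches(fetch, until=None, rounds=2):
    """Call fetch(until) repeatedly, moving `until` back to the oldest date seen."""
    batches = []
    for _ in range(rounds):
        batch = fetch(until)
        if not batch:
            break  # nothing older was returned
        batches.append(batch)
        # The next round only asks for posts older than the oldest one in this batch
        until = min(batch, key=lambda row: parse_ts(row['date']))['date']
    return batches


# Stand-in for a real search call: made-up rows, one per day (hypothetical data)
FAKE_POSTS = [{'date': f'2025-04-{day:02d}T12:00:00.000Z'} for day in range(1, 11)]


def fake_fetch(until, limit=4):
    rows = [r for r in FAKE_POSTS
            if until is None or parse_ts(r['date']) < parse_ts(until)]
    # Newest first, capped at `limit`, to imitate a per-request maximum
    return sorted(rows, key=lambda r: r['date'], reverse=True)[:limit]


batches = collect_batches(fake_fetch, rounds=3)
print([len(b) for b in batches])
```

With a real search function in place of `fake_fetch`, each batch could be turned into a DataFrame and concatenated in the same way as the two batches below.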

In [8]:
def search2df_top(query, since, until):
    """Search for up to 100 'top' posts matching query and return them as a DataFrame."""
    results = client.app.bsky.feed.search_posts({'q': query, 'limit': 100, 'sort': 'top', 'since': since, 'until': until})
    results_dict = results.model_dump()
    query_data = []
    for post in results_dict['posts']:
        metadata = {}
        metadata['text'] = post['record']['text']
        metadata['author'] = post['author']['handle']
        metadata['display_name'] = post['author']['display_name']
        metadata['date'] = post['record']['created_at']
        metadata['likes'] = post['like_count']
        metadata['quotes'] = post['quote_count']
        metadata['replies'] = post['reply_count']
        metadata['reposts'] = post['repost_count']
        metadata['uri'] = post['uri']
        metadata['url'] = uri_to_url(post['uri'])

        if post['record']['reply'] is not None:
            metadata['reply_to'] = 'Yes'
            metadata['reply_to_uri'] = post['record']['reply']['parent']['uri']
            metadata['reply_to_url'] = uri_to_url(post['record']['reply']['parent']['uri'])
        else:
            metadata['reply_to'] = 'No'
            metadata['reply_to_uri'] = None
            metadata['reply_to_url'] = None
            
        metadata['query'] = query
        query_data.append(metadata)
    query_df = pd.DataFrame(query_data)
    return query_df

In [9]:
df = search2df_top('unalive', None, None)

In [10]:
pd.set_option('display.max_columns', 100) # not working for some reason :/
df.head()

Unnamed: 0,text,author,display_name,date,likes,quotes,replies,reposts,uri,url,reply_to,reply_to_uri,reply_to_url,query
0,Just saw a tiktok that said Luigi could be fac...,bransonreese.bsky.social,Branson Reese,2025-04-27T18:07:39.513Z,223,1,9,7,at://did:plc:zat42a5ynbahtktax56auasx/app.bsky...,https://bsky.app/profile/did:plc:zat42a5ynbaht...,No,,,unalive
1,Plan to unalive a cheeto.,tsambuca.bsky.social,Tsambuca 🇨🇦🍁,2025-04-27T07:06:13.357Z,25,0,1,0,at://did:plc:kecwogf3ul6mt5yf4hm4upw4/app.bsky...,https://bsky.app/profile/did:plc:kecwogf3ul6mt...,Yes,at://did:plc:uqfjrpikgbri6xxcygyyugju/app.bsky...,https://bsky.app/profile/did:plc:uqfjrpikgbri6...,unalive
2,as much as I hate “unalive” (which is a lot) t...,willraisesfunds.bsky.social,will,2025-04-27T22:57:35.149Z,65,2,6,2,at://did:plc:uzfyjf4spsvvsn4mcqwxq4hs/app.bsky...,https://bsky.app/profile/did:plc:uzfyjf4spsvvs...,No,,,unalive
3,Alasdair Gold ran the London marathon whilst i...,mickeygif.bsky.social,Mickey 'Club Signing' Gif,2025-04-29T07:45:35.503Z,21,0,1,0,at://did:plc:qibxoac5ahyhhcka6fu7t3bl/app.bsky...,https://bsky.app/profile/did:plc:qibxoac5ahyhh...,No,,,unalive
4,🤬donald has shut down the unalive hotline \...,joleneandgrant.bsky.social,JOLENEandGRANT #elbowsup,2025-04-28T04:14:31.913Z,18,1,2,12,at://did:plc:6ulypwexav6e5ftfxdibwv77/app.bsky...,https://bsky.app/profile/did:plc:6ulypwexav6e5...,Yes,at://did:plc:7qqkq2zdwq4j5jingukgtuky/app.bsky...,https://bsky.app/profile/did:plc:7qqkq2zdwq4j5...,unalive


In [11]:
df[['reply_to', 'reply_to_url']][:10] # seems to work!

Unnamed: 0,reply_to,reply_to_url
0,No,
1,Yes,https://bsky.app/profile/did:plc:uqfjrpikgbri6...
2,No,
3,No,
4,Yes,https://bsky.app/profile/did:plc:7qqkq2zdwq4j5...
5,Yes,https://bsky.app/profile/did:plc:qxwx3yb4m3dmj...
6,No,
7,No,
8,Yes,https://bsky.app/profile/did:plc:rcchhh6t2do26...
9,No,


In [12]:
# finding the timeframe of this set
date = df[['date']]
date

Unnamed: 0,date
0,2025-04-27T18:07:39.513Z
1,2025-04-27T07:06:13.357Z
2,2025-04-27T22:57:35.149Z
3,2025-04-29T07:45:35.503Z
4,2025-04-28T04:14:31.913Z
...,...
95,2025-04-17T03:16:39.000Z
96,2025-04-16T18:24:14.009Z
97,2025-04-17T16:44:40.908Z
98,2025-04-15T02:50:06.393Z


In [13]:
date.sort_values(by=['date'])

Unnamed: 0,date
98,2025-04-15T02:50:06.393Z
96,2025-04-16T18:24:14.009Z
95,2025-04-17T03:16:39.000Z
88,2025-04-17T04:32:30.687Z
93,2025-04-17T14:49:07.836Z
...,...
22,2025-04-28T07:18:56.421Z
16,2025-04-28T23:38:58.264Z
45,2025-04-29T05:57:00.238Z
3,2025-04-29T07:45:35.503Z


In [14]:
df2 = search2df_top('unalive', None, '2025-03-31T00:00:00.000Z')
df2.head()

Unnamed: 0,text,author,display_name,date,likes,quotes,replies,reposts,uri,url,reply_to,reply_to_uri,reply_to_url,query
0,"new ""unalive"" euphemism just dropped",joadsprocket.bsky.social,Joad The Wet Sprocket,2025-03-30T23:50:12.319Z,28,0,0,0,at://did:plc:3nqgwlbyx4i6j553jilcb2lo/app.bsky...,https://bsky.app/profile/did:plc:3nqgwlbyx4i6j...,No,,,unalive
1,Molly has every right to unalive Kristina by now.,leslieb68.bsky.social,Leslie,2025-03-30T15:48:54.259Z,3,0,1,0,at://did:plc:tnip6m3flxs2mrlt5bqlc5kd/app.bsky...,https://bsky.app/profile/did:plc:tnip6m3flxs2m...,Yes,at://did:plc:itooo5oj5hr255oyvszbboh7/app.bsky...,https://bsky.app/profile/did:plc:itooo5oj5hr25...,unalive
2,"can we extend this to ""unalive"" also?",yodoops.bsky.social,doops.,2025-03-30T16:29:52.359Z,4,0,1,0,at://did:plc:heyj4lbfp3znjle5kjoxa3xv/app.bsky...,https://bsky.app/profile/did:plc:heyj4lbfp3znj...,Yes,at://did:plc:h3y3f4pmwha4pqzekjpbjg4s/app.bsky...,https://bsky.app/profile/did:plc:h3y3f4pmwha4p...,unalive
3,Because in Dictatorships it’s a waste of money...,debsmith1647.bsky.social,,2025-03-30T03:14:18.635Z,29,0,1,1,at://did:plc:stjjhxmg7igobpbyyiy6kt65/app.bsky...,https://bsky.app/profile/did:plc:stjjhxmg7igob...,Yes,at://did:plc:y5xyloyy7s4a2bwfeimj7r3b/app.bsky...,https://bsky.app/profile/did:plc:y5xyloyy7s4a2...,unalive
4,"Every cold season, i have a deep fear of never...",potat0princess.has.army,✨ potato ✨,2025-03-30T16:37:32.374Z,2,0,1,0,at://did:plc:xygwtbnfmwfztjxs3ylvik4w/app.bsky...,https://bsky.app/profile/did:plc:xygwtbnfmwfzt...,Yes,at://did:plc:xygwtbnfmwfztjxs3ylvik4w/app.bsky...,https://bsky.app/profile/did:plc:xygwtbnfmwfzt...,unalive


In [15]:
date2 = df2[['date']]
date2.sort_values(by=['date']) # no overlap with the first batch's date range!

Unnamed: 0,date
98,2025-03-16T15:28:42.737Z
99,2025-03-16T16:44:12.457Z
97,2025-03-17T01:17:05.026Z
92,2025-03-17T05:22:54.143Z
83,2025-03-17T09:27:02.579Z
...,...
8,2025-03-30T13:50:11.523Z
1,2025-03-30T15:48:54.259Z
2,2025-03-30T16:29:52.359Z
4,2025-03-30T16:37:32.374Z
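
That the two batches cover separate periods can also be checked in code rather than by eye. A small sketch using the earliest and latest timestamps visible in the outputs above for each batch; `parse_ts` and `ranges_overlap` are hypothetical helper names.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp; 'Z' is rewritten for older Python versions."""
    return datetime.fromisoformat(ts.replace('Z', '+00:00'))


def ranges_overlap(first, second):
    """Return True if two (start, end) timestamp pairs share any moment."""
    a_start, a_end = (parse_ts(t) for t in first)
    b_start, b_end = (parse_ts(t) for t in second)
    return a_start <= b_end and b_start <= a_end


# (oldest, newest) dates as shown in the outputs above
batch1 = ('2025-04-15T02:50:06.393Z', '2025-04-29T07:45:35.503Z')
batch2 = ('2025-03-16T15:28:42.737Z', '2025-03-30T23:50:12.319Z')

print(ranges_overlap(batch1, batch2))
```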


In [16]:
unalive_top_df = pd.concat([df, df2])

In [17]:
unalive_top_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 200 entries, 0 to 99
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   text          200 non-null    object
 1   author        200 non-null    object
 2   display_name  200 non-null    object
 3   date          200 non-null    object
 4   likes         200 non-null    int64 
 5   quotes        200 non-null    int64 
 6   replies       200 non-null    int64 
 7   reposts       200 non-null    int64 
 8   uri           200 non-null    object
 9   url           200 non-null    object
 10  reply_to      200 non-null    object
 11  reply_to_uri  92 non-null     object
 12  reply_to_url  92 non-null     object
 13  query         200 non-null    object
dtypes: int64(4), object(10)
memory usage: 23.4+ KB
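
The `Index: 200 entries, 0 to 99` line shows that `pd.concat` kept each frame's own row labels, so every label from 0 to 99 occurs twice. If unique labels are wanted, one option is `ignore_index=True`. The sketch below uses toy frames with made-up URIs rather than the real data, and also counts repeated `uri` values as a check for posts returned in both batches.

```python
import pandas as pd

# Toy frames standing in for df and df2 (hypothetical values)
a = pd.DataFrame({'uri': ['at://x/app.bsky.feed.post/1', 'at://x/app.bsky.feed.post/2']})
b = pd.DataFrame({'uri': ['at://x/app.bsky.feed.post/3', 'at://x/app.bsky.feed.post/4']})

kept = pd.concat([a, b])                            # labels 0, 1, 0, 1
renumbered = pd.concat([a, b], ignore_index=True)   # labels 0, 1, 2, 3

print(kept.index.tolist(), renumbered.index.tolist())

# Number of rows whose uri already appeared earlier in the combined frame
n_duplicates = int(renumbered['uri'].duplicated().sum())
print(n_duplicates)
```

Since the CSV export at the end uses `index=False`, the repeated labels do not reach the saved file; they only matter for label-based lookups such as `.loc` inside the notebook.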


In [18]:
unalive_top_df.tail()

Unnamed: 0,text,author,display_name,date,likes,quotes,replies,reposts,uri,url,reply_to,reply_to_uri,reply_to_url,query
95,Biodome2 failed. Looks like we should be tryin...,dawnwilliamson.bsky.social,,2025-03-17T10:29:55.416Z,6,0,0,0,at://did:plc:7aqvrita4kqxrcjwgw45opip/app.bsky...,https://bsky.app/profile/did:plc:7aqvrita4kqxr...,Yes,at://did:plc:ux34natbhxube3xgdu3rhf45/app.bsky...,https://bsky.app/profile/did:plc:ux34natbhxube...,unalive
96,THEY'RE UNALIVING HER...\n\nTHEN THEY'RE GONNA...,beanycatte.bsky.social,I THINK THE DEATH PLAN SUCKS,2025-03-17T09:37:11.508Z,6,0,2,1,at://did:plc:3u7o6h2rx7elcbipmuwmvmrz/app.bsky...,https://bsky.app/profile/did:plc:3u7o6h2rx7elc...,No,,,unalive
97,"If Labour cuts £675 a month ,I'll save the the...",emmadm101.bsky.social,🇺🇦emmadm101 🇵🇸,2025-03-17T01:17:05.026Z,21,0,0,5,at://did:plc:ihdaraxh2s4vwlmw2yign5au/app.bsky...,https://bsky.app/profile/did:plc:ihdaraxh2s4vw...,Yes,at://did:plc:vovinwhtulbsx4mwfw26r5ni/app.bsky...,https://bsky.app/profile/did:plc:vovinwhtulbsx...,unalive
98,PSA: starting tonight I'll be on a two-week so...,amiberger.bsky.social,Ami Berger,2025-03-16T15:28:42.737Z,153,0,9,0,at://did:plc:fhuifhwlgpfh233on5jrmxrl/app.bsky...,https://bsky.app/profile/did:plc:fhuifhwlgpfh2...,No,,,unalive
99,i think only trans people should be allowed to...,livinginjeopardy.bsky.social,transsexual anarchy cringe CEO 🔆🥦,2025-03-16T16:44:12.457Z,7,0,0,1,at://did:plc:bucg2dsfr66ecm7kc3bmepwv/app.bsky...,https://bsky.app/profile/did:plc:bucg2dsfr66ec...,No,,,unalive


In [19]:
#unalive_top_df.to_csv('unalive_top_posts.csv', index=False)