# Web Data Scraping

[Spring 2023 ITSS Mini-Course](https://www.colorado.edu/cartss/programs/interdisciplinary-training-social-sciences-itss/mini-course-web-data-scraping) — ARSC 5040  
[Brian C. Keegan, Ph.D.](http://brianckeegan.com/)  
[Assistant Professor, Department of Information Science](https://www.colorado.edu/cmci/people/information-science/brian-c-keegan)  
University of Colorado Boulder  

Copyright and distributed under an [MIT License](https://opensource.org/licenses/MIT)

## Class outline

* **Week 1**: Introduction to Jupyter, browser console, structured data, ethical considerations
* **Week 2**: Scraping HTML with `requests` and `BeautifulSoup`
* **Week 3**: Scraping web data with Selenium
* **Week 4**: Scraping the Internet Archive and Wikipedia APIs
* **Week 5**: Scraping the Reddit and Mastodon APIs

## Acknowledgements

This course will draw on resources built by myself and [Allison Morgan](https://allisonmorgan.github.io/) for the [2018 Summer Institute for Computational Social Science](https://github.com/allisonmorgan/sicss_boulder), which were in turn derived from [other resources](https://github.com/simonmunzert/web-scraping-with-r-extended-edition) developed by [Simon Munzert](http://simonmunzert.github.io/) and [Chris Bail](http://www.chrisbail.net/). 

Thank you also to Professor [Terra McKinnish](https://www.colorado.edu/economics/people/faculty/terra-mckinnish) for coordinating the ITSS seminars.

This notebook is adapted from excellent notebooks in Dr. [Cody Buntain](http://cody.bunta.in/)'s seminar on [Social Media and Crisis Informatics](http://cody.bunta.in/teaching/2018_winter_umd_inst728e/) as well as the [PRAW documentation](https://praw.readthedocs.io/en/latest/).

## Class 5 goals

* Sharing accomplishments and challenges with last week's material
* Authenticating with a closed API 
* Retrieving data from the Reddit API using [PRAW](https://praw.readthedocs.io/en/stable/) and [PMAW](https://github.com/mattpodolak/pmaw)
* Retrieving data from the Mastodon API using [`mastodon.py`](https://mastodonpy.readthedocs.io/en/stable/)

We'll need a few common libraries for all these examples.

In [1]:
# Lets us talk to other servers on the web
import requests

# APIs spit out data in JSON
import json

# Use BeautifulSoup to parse some HTML
from bs4 import BeautifulSoup

# Handling dates and times
from datetime import datetime

# DataFrames!
import pandas as pd
import numpy as np

# Data visualization
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sb

## Scraping Reddit

Reddit also hosts a lot of detailed behavioral data that could be of interest to social scientists. As was the case with Wikipedia, our naïve inclination may be to develop scrapers and parsers to extract this information, but Reddit will give much of it to you for free through their API!

You can retrieve a few different types of entities from Reddit's API: sub-reddits, submissions, comments, and redditors. Many of these are interoperable: a sub-reddit contains submissions contributed by redditors with comments from other redditors.

We will use a wrapper library to communicate with the Reddit API called [Python Reddit API Wrapper](https://praw.readthedocs.io/en/latest/) or `praw`. 

Run `pip install praw` in your terminal (or `! pip install praw` in a notebook cell) to install `praw`.

Afterwards, we can import `praw`.

In [None]:
import praw

We then need to authenticate with Reddit to get access to the API. Typically you can just enter the client ID, client secret, password, username, *etc*. as strings. 

1. You will need to create an account on Reddit. After you have created an account and logged in, go to https://www.reddit.com/prefs/apps/. 
2. Scroll down and click the "create app" button at the bottom. Provide a basic name, description, and enter a URL for your homepage (or just use http://www.colorado.edu).
3. You will need the client ID (the string of characters beneath the name of your app), the secret (the other string of characters), and your username and password.
4. You can make up a user-agent string, but include your username as good practice for the sysadmins to track you down if you break things.

![Image from Cody Buntain](http://www.cs.umd.edu/~cbuntain/inst728e/reddit_screens/1-003a.png)

You'll create an API connector object (`r`) below that will authenticate with the API and handle making the requests.

In [None]:
r = praw.Reddit(
    client_id='your application id',
    client_secret='your application secret',
    password='your account password',
    user_agent='scraping script by /u/youraccountname',
    username='your account name'
)

You can confirm that this authentication process worked by making a simple request like printing your username.

In [None]:
print(r.user.me())

I'm going to read them in from a local file ("reddit_login.json") so that I can post this notebook on the internet in the future without compromising my account security. This won't work for you, so just skip this step.

In [None]:
# Load my credentials from a local disk so I don't show the world
with open('reddit_login.json','r') as f:
    r_creds = json.load(f)
    
# Create an authenticated reddit instance using the creds
r = praw.Reddit(client_id = r_creds['client_id'],
                client_secret = r_creds['client_secret'],
                password = r_creds['password'],
                user_agent = r_creds['user_agent'],
                username = r_creds['username'])

# Make sure your reddit instance works
print(r.user.me())
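If you want to adopt the same approach, here is a minimal sketch of a loader that reads the JSON file and checks that every field `praw.Reddit` needs is present before you try to authenticate. The names `load_reddit_credentials` and `REQUIRED_KEYS` are hypothetical, and the sketch assumes your file is a flat JSON object using the same five keys as the cell above.

```python
import json

# The five keyword arguments the cell above passes to praw.Reddit
REQUIRED_KEYS = ("client_id", "client_secret", "password", "user_agent", "username")


def load_reddit_credentials(path):
    """Read a flat JSON credentials file and fail early if a field is missing or empty."""
    with open(path, "r") as f:
        creds = json.load(f)
    missing = [key for key in REQUIRED_KEYS if not creds.get(key)]
    if missing:
        raise KeyError(f"Missing credential fields: {', '.join(missing)}")
    # Keep only the fields praw.Reddit expects, ignoring any extras in the file
    return {key: creds[key] for key in REQUIRED_KEYS}
```

With a valid file, `r = praw.Reddit(**load_reddit_credentials('reddit_login.json'))` should be equivalent to the cell above. Remember to keep the credentials file out of version control (for example, by listing it in `.gitignore`).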

### Sub-reddits

We'll start by inspecting the /r/news sub-reddit.

[Documentation for the Subreddit model in PRAW](https://praw.readthedocs.io/en/latest/code_overview/models/subreddit.html).

Create a `news_subreddit` object to store the various attributes about this sub-reddit.

In [None]:
news_subreddit = r.subreddit('news')

The `news_subreddit` has a number of attributes and methods you can call on it. For example, `created_utc` is the time the sub-reddit was founded.

In [None]:
news_subreddit.created_utc

That value is a UNIX timestamp (seconds since 1 January 1970, UTC), but we can convert it into a more readable date and time with `datetime`'s `utcfromtimestamp`.

In [None]:
print(datetime.utcfromtimestamp(news_subreddit.created_utc))
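`datetime.utcfromtimestamp` returns a *naive* datetime (no timezone attached) and is deprecated as of Python 3.12. A minimal alternative sketch using only the standard library attaches UTC explicitly; `to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def to_utc(unix_seconds):
    """Convert a UNIX timestamp (seconds since 1970-01-01 UTC) to a timezone-aware datetime."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


# The epoch itself, and one day (86,400 seconds) later
print(to_utc(0))        # 1970-01-01 00:00:00+00:00
print(to_utc(86400.0))  # 1970-01-02 00:00:00+00:00
```

Calling `to_utc(news_subreddit.created_utc)` should give the same moment as the cell above, but with the UTC offset recorded, which avoids ambiguity if you later mix these values with local times.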

There are other attributes, such as the number of subscribers, whether the sub-reddit is flagged as NSFW (`over18`), the number of currently active users, and the description of the sub-reddit.

In [None]:
'{0:,}'.format(news_subreddit.subscribers)

In [None]:
news_subreddit.over18

In [None]:
news_subreddit.active_user_count

In [None]:
print(news_subreddit.description)

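The rules of the sub-reddit are available through the `.rules()` method, which returns a dictionary whose `'rules'` key holds a list of dictionaries, one per rule. (Recent versions of PRAW deprecate calling `.rules()`; iterating over `news_subreddit.rules` directly yields `Rule` objects instead.)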

In [None]:
news_subreddit.rules()['rules']

When was each of these rules created? Loop through the rules and print each rule's "short_name" and creation timestamp.

In [None]:
for rule in news_subreddit.rules()['rules']:
    created = rule['created_utc']
    print(rule['short_name'], datetime.utcfromtimestamp(created))

We can also get a list of the moderators for this subreddit.

In [None]:
mod_list = []

for mod in news_subreddit.moderator():
    mod_list.append(mod.name)
    
mod_list

### Submissions

We can get a list of submissions to a sub-reddit using [a few different methods](https://praw.readthedocs.io/en/latest/code_overview/models/subreddit.html).

* `.controversial()`
* `.hot()`
* `.new()`
* `.rising()`
* `.search()`
* `.top()`

Here we will use the `.top()` method to get the top 25 submissions on the /r/news subreddit from the past 12 months.

[Documentation for the Submission model in PRAW](https://praw.readthedocs.io/en/latest/code_overview/models/submission.html).

In [None]:
top25_news = r.subreddit('news').top(time_filter='year', limit=25)

`top25_news` is a `ListingGenerator` object, which is a special [generator](https://www.dataquest.io/blog/python-generators-tutorial/) class defined by PRAW. It does not actually go out and get the data at this stage. There's not much you can do to look inside this `ListingGenerator` other than loop through it and perform operations. In this case, let's add each submission to a list of `top25_submissions`.

In [None]:
top25_submissions = []

for submission in top25_news:
    top25_submissions.append(submission)
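To make this laziness concrete, here is a toy sketch of a paginated generator. It is *not* PRAW's implementation: `lazy_listing` and `fake_fetch` are hypothetical, and `fake_fetch` stands in for an API request that returns one batch of items plus a cursor pointing at the next batch (Reddit's listings paginate with an `after` cursor in a similar spirit). Creating the generator makes no requests; requests happen only as the loop consumes items.

```python
def lazy_listing(fetch_page, limit, batch_size=100):
    """Yield up to `limit` items, requesting one batch at a time.

    `fetch_page(after, size)` is a hypothetical callable returning
    (items, next_cursor); next_cursor is None when no pages remain.
    """
    yielded = 0
    after = None
    while yielded < limit:
        size = min(batch_size, limit - yielded)
        items, after = fetch_page(after, size)  # the only "network" call
        for item in items:
            yield item
            yielded += 1
            if yielded >= limit:
                return
        if after is None or not items:
            return


# A fake "API" over the numbers 0..249 that records every call it receives
calls = []


def fake_fetch(after, size):
    start = 0 if after is None else after
    calls.append((start, size))
    items = list(range(start, min(start + size, 250)))
    next_cursor = start + size if start + size < 250 else None
    return items, next_cursor


gen = lazy_listing(fake_fetch, limit=150)
print(len(calls))  # 0: creating the generator made no requests
results = list(gen)
print(len(results), calls)  # 150 [(0, 100), (100, 50)]
```

Because a generator can only be consumed once, looping over `top25_news` a second time would produce nothing; call `.top()` again if you need a fresh listing.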

We can inspect the first (top) `Submission` object.

In [None]:
first_submission = top25_submissions[0]
first_submission

Use the `dir` function to see the other methods and attributes inside this first top `Submission` object. (There are also many "hidden" attributes and methods whose names begin with an underscore "\_", which this list comprehension filters out.)

In [None]:
[i for i in dir(first_submission) if not i.startswith('_')]

`vars` may be even more helpful, since it returns a dictionary of the object's attributes along with their values.

In [None]:
vars(first_submission)

We can extract the features of each submission, store them in a dictionary, and append each dictionary to an external list. This step may take a moment because the `ListingGenerator` returned by `.top()` only requests data from the API once we start looping through it.

In [None]:
submission_stats = []

for submission in r.subreddit('news').top(time_filter='year', limit=25):
    d = {}
    d['id'] = submission.id
    d['title'] = submission.title
    d['num_comments'] = submission.num_comments
    d['score'] = submission.score
    d['upvote_ratio'] = submission.upvote_ratio
    d['date'] = datetime.utcfromtimestamp(submission.created_utc)
    d['domain'] = submission.domain
    d['gilded'] = submission.gilded
    d['num_crossposts'] = submission.num_crossposts
    d['nsfw'] = submission.over_18
    if submission.author is not None:
        d['author'] = submission.author.name
    submission_stats.append(d)
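The same copy-attributes-into-a-dictionary pattern recurs for comments and redditors below. One way to reduce the repetition is a reusable helper. The minimal sketch below, with the hypothetical names `to_record` and `SUBMISSION_FIELDS`, uses `getattr` with a default so that a missing attribute becomes `None` rather than raising an error, and guards against deleted authors. It runs on `SimpleNamespace` stand-ins instead of live PRAW objects; on real PRAW objects, asking for an attribute that was not loaded may trigger an extra API request before the default is used.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def to_record(obj, fields):
    """Copy the named attributes of `obj` into a dict (missing ones become None)."""
    record = {field: getattr(obj, field, None) for field in fields}
    # Replace the UNIX timestamp, if present, with a timezone-aware datetime
    if record.get("created_utc") is not None:
        record["date"] = datetime.fromtimestamp(record.pop("created_utc"), tz=timezone.utc)
    # Deleted accounts come back as author=None, so check before reading .name
    author = getattr(obj, "author", None)
    record["author"] = author.name if author is not None else None
    return record


SUBMISSION_FIELDS = ["id", "title", "num_comments", "score", "created_utc", "domain"]

# Stand-ins for two PRAW Submission objects; the second has a deleted author
posts = [
    SimpleNamespace(id="abc", title="First", num_comments=10, score=500,
                    created_utc=0, domain="example.com",
                    author=SimpleNamespace(name="someone")),
    SimpleNamespace(id="def", title="Second", num_comments=2, score=40,
                    created_utc=86400, domain="self.news", author=None),
]
records = [to_record(p, SUBMISSION_FIELDS) for p in posts]
print(records[1]["author"], records[1]["date"])  # None 1970-01-02 00:00:00+00:00
```

With live data, something like `pd.DataFrame([to_record(s, SUBMISSION_FIELDS) for s in top25_submissions])` should produce a table similar to `top25_df`, although columns keep PRAW's attribute names rather than renamed ones like `nsfw`.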

We can turn `submission_stats` into a pandas DataFrame.

In [None]:
top25_df = pd.DataFrame(submission_stats)
top25_df.head()

Plot out the relationship between score and number of comments.

In [None]:
ax = top25_df.plot.scatter(x='score',y='num_comments',s=50,c='k',alpha=.5)
# ax.set_xlim((0,200000))
# ax.set_ylim((0,16000))
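To put a number on the relationship shown in the scatter plot, you can compute a correlation coefficient. pandas offers `top25_df['score'].corr(top25_df['num_comments'])`; the sketch below spells out the Pearson formula with the standard library so the calculation is visible. The values are made up for illustration and are not Reddit data. Because a few viral posts can dominate 25 points, a rank-based measure such as Spearman's (`method='spearman'` in pandas) may be more robust.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # The 1/n factors cancel, so sums of deviations are enough
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Illustrative (made-up) scores and comment counts
scores = [1000, 2000, 3000, 4000]
comments = [100, 180, 330, 390]
print(round(pearson(scores, comments), 3))
```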

### Comments

This is a simple Reddit submission: [What is a dataset that you can't believe is available to the public?](https://www.reddit.com/r/datasets/comments/akb4mr/what_is_a_dataset_that_you_cant_believe_is/). We can inspect the comments in this simple submission.

[Documentation for Comment model in PRAW](https://praw.readthedocs.io/en/latest/code_overview/models/comment.html).

In [None]:
cant_believe = r.submission(id='akb4mr')

print("This submission was made on {0}.".format(datetime.utcfromtimestamp(cant_believe.created_utc)))
print("There are {0:,} comments.".format(cant_believe.num_comments))

We can inspect these comments, working from the [Comment Extraction and Parsing](https://praw.readthedocs.io/en/latest/tutorials/comments.html) tutorial in PRAW.

In [None]:
cant_believe.comments.replace_more(limit=None)

for comment in cant_believe.comments.list():
    print(comment.body)
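After `replace_more`, `.list()` flattens the comment forest breadth-first: all top-level comments first, then their replies, and so on. PRAW's comment tutorial shows an equivalent loop built on a queue. The sketch below applies that idea to a toy thread made of hypothetical dicts with `body` and `replies` keys, so it runs without an API connection, and it also tracks each comment's depth (0 for top-level comments).

```python
from collections import deque


def flatten_breadth_first(top_level):
    """Return (depth, body) pairs for every comment, level by level."""
    flat = []
    queue = deque((0, comment) for comment in top_level)  # seed with top-level comments
    while queue:
        depth, comment = queue.popleft()
        flat.append((depth, comment["body"]))
        # Replies wait in line until every comment at the current depth is visited
        queue.extend((depth + 1, reply) for reply in comment["replies"])
    return flat


# A toy thread: two top-level comments, one with a nested reply chain
thread = [
    {"body": "A", "replies": [{"body": "A.1", "replies": [{"body": "A.1.a", "replies": []}]}]},
    {"body": "B", "replies": [{"body": "B.1", "replies": []}]},
]
for depth, body in flatten_breadth_first(thread):
    print("  " * depth + body)
```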

Each comment has a lot of metadata we can preserve.

In [None]:
cant_believe_comment_metadata = []

for comment in cant_believe.comments.list():
    if not comment.collapsed: # Skip comments that Reddit has collapsed
        d = {}
        d['id'] = comment.id
        d['parent_id'] = comment.parent_id
        d['body'] = comment.body
        d['depth'] = comment.depth
        d['edited'] = comment.edited
        d['score'] = comment.score
        d['date'] = datetime.utcfromtimestamp(comment.created_utc)
        d['submission_id'] = comment.submission.id
        d['submission_title'] = comment.submission.title
        d['subreddit'] = comment.subreddit.display_name
        if comment.author is not None:
            d['author'] = comment.author.name
        cant_believe_comment_metadata.append(d)

Convert to a DataFrame.

In [None]:
cant_believe_df = pd.DataFrame(cant_believe_comment_metadata)

# How long is the comment
cant_believe_df['comment_length'] = cant_believe_df['body'].str.len()

cant_believe_df.head()

Do comments deeper in this comment tree have lower scores?

In [None]:
sb.catplot(x='depth',y='score',data=cant_believe_df,kind='bar',color='lightblue')

Do comments deeper in this comment tree have shorter lengths?

In [None]:
sb.catplot(x='depth',y='comment_length',data=cant_believe_df,kind='bar',color='lightblue')
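By default, `catplot(kind='bar')` draws the *mean* of each depth level, with a bootstrapped confidence interval. With pandas, `cant_believe_df.groupby('depth')[['score', 'comment_length']].mean()` gives the underlying numbers. The standard-library sketch below performs the same aggregation on illustrative, made-up rows shaped like `cant_believe_comment_metadata`; `mean_by` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean


def mean_by(rows, group_key, value_key):
    """Average `value_key` within each distinct value of `group_key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in sorted(groups.items())}


# Illustrative (made-up) rows shaped like cant_believe_comment_metadata
rows = [
    {"depth": 0, "score": 120, "body": "top-level comment"},
    {"depth": 0, "score": 80, "body": "another one"},
    {"depth": 1, "score": 10, "body": "a reply"},
    {"depth": 2, "score": 3, "body": "ok"},
]
for row in rows:
    row["comment_length"] = len(row["body"])

print(mean_by(rows, "depth", "score"))
print(mean_by(rows, "depth", "comment_length"))
```

Keep in mind that deeper levels usually contain far fewer comments, so their means are noisier.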

### Redditors

A Redditor is a user and we can get meta-data about the account as well as the history of the user's comments and submissions from the API.

[Documentation for the Redditor model in PRAW](https://praw.readthedocs.io/en/latest/code_overview/models/redditor.html).

How much link and comment karma does the user u/spez have?

In [None]:
spez = r.redditor('spez')
print("Link karma: {0:,}".format(spez.link_karma))
print("Comment karma: {0:,}".format(spez.comment_karma))

Reddit also flags whether a user is a Reddit employee and whether the account has a verified email address.

In [None]:
spez.is_employee

In [None]:
spez.has_verified_email

We can also get the time this user's account was created.

In [None]:
datetime.utcfromtimestamp(spez.created_utc)

We can also get information about individual redditors' submissions and comment histories. Here we will use u/spez (the CEO of Reddit), get his top-voted submissions, and loop through them to get the data for each submission.

In [None]:
spez_submissions = []

for submission in r.redditor('spez').submissions.top(time_filter='all', limit=25):
    d = {}
    d['id'] = submission.id
    d['title'] = submission.title
    d['num_comments'] = submission.num_comments
    d['score'] = submission.score
    d['upvote_ratio'] = submission.upvote_ratio
    d['date'] = datetime.utcfromtimestamp(submission.created_utc)
    d['domain'] = submission.domain
    d['gilded'] = submission.gilded
    d['num_crossposts'] = submission.num_crossposts
    d['nsfw'] = submission.over_18
    if submission.author is not None:
        d['author'] = submission.author.name
    spez_submissions.append(d)

Again we can turn this list of dictionaries into a DataFrame to do substantive data analysis.

In [None]:
pd.DataFrame(spez_submissions).head()

We can also get the top comments made by a redditor.

In [None]:
spez_comments = []

for comment in r.redditor('spez').comments.top(time_filter='all', limit=25):
    d = {}
    d['id'] = comment.id
    d['body'] = comment.body
    try:
        d['depth'] = comment.depth
    except AttributeError: # depth may be missing for comments from a user's history
        d['depth'] = np.nan
    d['edited'] = comment.edited
    d['score'] = comment.score
    d['date'] = datetime.utcfromtimestamp(comment.created_utc)
    d['submission_id'] = comment.submission.id
    d['submission_title'] = comment.submission.title
    d['subreddit'] = comment.subreddit.display_name
    if comment.author is not None:
        d['author'] = comment.author.name
    spez_comments.append(d)

In [None]:
pd.DataFrame(spez_comments).head()

This user's top comments are mostly focused in the /r/announcements subreddit.

In [None]:
pd.DataFrame(spez_comments)['subreddit'].value_counts()

### Archived data via PushShift

PushShift is a researcher-maintained archive of Reddit posts and comments. Full data dumps of [submissions](https://files.pushshift.io/reddit/submissions/) and [comments](https://files.pushshift.io/reddit/comments/) are available, although these are (unsurprisingly) very space intensive. You can also access an API to make [ElasticSearch](https://www.elastic.co/elasticsearch/) queries against a database of this archive of submissions and comments. Unfortunately, the service is frequently down. 

We will use the [`pmaw`](https://github.com/mattpodolak/pmaw) library to access this data endpoint using Python. Install `pmaw` once:

In [4]:
! pip install pmaw

Collecting pmaw
  Downloading pmaw-3.0.0-py3-none-any.whl (29 kB)
Installing collected packages: pmaw
Successfully installed pmaw-3.0.0


Load up the `PushshiftAPI` class from `pmaw`.

In [5]:
from pmaw import PushshiftAPI

api = PushshiftAPI()

Retrieve up to 10,000 submissions to /r/wallstreetbets with a score above 1,000.

In [19]:
submissions = api.search_submissions(subreddit="wallstreetbets",limit=10000,score=">1000")

submissions_list = [p for p in submissions]

len(submissions_list)

Not all PushShift shards are active. Query results may be incomplete.


0

## Scraping Mastodon

After Twitter's acquisition by Elon Musk in October 2022, the service rapidly deteriorated through neglect and mismanagement. Given the increasing precarity of Twitter's API access, I do not recommend that researchers build projects around Twitter data access and availability. 

I am consciously moving this course towards alternatives like [Mastodon](https://joinmastodon.org/). Mastodon and its API offer affordances similar, but not identical, to Twitter's for both users and developers. **Importantly**, Mastodon users are much more resistant to having their data collected, even by researchers. Users can [opt out of being indexed](https://docs.joinmastodon.org/user/preferences/#misc) by search engines, and this choice is reflected in the "noindex" metadata value included in their user information.
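One way to honor that preference in practice is to drop statuses from accounts that have opted out before you store or analyze them. Recent Mastodon versions include a boolean `noindex` field on account objects, but it may be missing or `null` on some servers and for some remote accounts. The minimal sketch below assumes statuses shaped like the Mastodon.py dictionaries retrieved later in this notebook (an `account` dict on each status, and the boosted post under `reblog`); `respect_noindex` and its `drop_if_unknown` switch are hypothetical names.

```python
def respect_noindex(statuses, drop_if_unknown=False):
    """Keep statuses whose author (and boosted author, if any) has not opted out.

    A 'noindex' value of True means the user opted out. If the value is
    missing or None, the status is kept unless drop_if_unknown is True.
    """
    def allowed(account):
        flag = account.get("noindex")
        if flag is None:
            return not drop_if_unknown
        return not flag

    kept = []
    for status in statuses:
        accounts = [status["account"]]
        # A boost's content belongs to the original author, so check them too
        if status.get("reblog"):
            accounts.append(status["reblog"]["account"])
        if all(allowed(account) for account in accounts):
            kept.append(status)
    return kept
```

For example, `respect_noindex(masto.timeline_local())` would filter the local timeline; passing `drop_if_unknown=True` is the stricter choice when you cannot confirm a user's preference.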

As with most semi-public APIs, you will need to register an account first. I would recommend the following academic-focused instances, and there are [many others](https://github.com/nathanlesage/academics-on-mastodon):
* [sciences.social](https://sciences.social/) - General social sciences
* [hcommons.social](https://hcommons.social/) - Digital humanities
* [mstdn.science](https://mstdn.science/) - Science (initially microbiology)
* [datasci.social](https://datasci.social/) - Data science
* [historians.social](https://historians.social/) - Historians
* [hci.social](https://hci.social/) - Human-computer interaction
* [sigmoid.social](https://sigmoid.social/) - AI

We will use the [Mastodon.py](https://mastodonpy.readthedocs.io/) library. Install from pip once:

In [20]:
! pip install Mastodon.py

Collecting Mastodon.py
  Downloading Mastodon.py-1.8.0-py2.py3-none-any.whl (64 kB)
Collecting python-magic
  Downloading python_magic-0.4.27-py2.py3-none-any.whl (13 kB)
Collecting blurhash>=1.1.4
  Downloading blurhash-1.1.4-py2.py3-none-any.whl (5.3 kB)
Installing collected packages: blurhash, python-magic, Mastodon.py
Successfully installed Mastodon.py-1.8.0 blurhash-1.1.4 python-magic-0.4.27


Import the `Mastodon` class from Mastodon.py.

In [21]:
from mastodon import Mastodon

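Register an application with your Mastodon instance (here, `https://hci.social`; replace it with your own instance's URL). You only need to do this once: `to_file` saves the returned client ID and secret to a local file that later calls can reuse.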

In [29]:
Mastodon.create_app(
    client_name = 'itss',
    scopes = ['read'],
    api_base_url = 'https://hci.social',
    to_file = 'itss_credentials.secret'
)

('TbRairSxsP56ThK6xmGDhGyGMIHMAw0e7KT7VhcNZNQ',
 'sEOxSS2BcNErdNCXKMg06THW81fYnFHAPOpVSymHcRk')

Load up *my* credentials for logging into my account.

In [30]:
with open('login.json','r') as f:
    credentials = json.load(f)

Log in.

In [32]:
masto = Mastodon(client_id = 'itss_credentials.secret')
masto.log_in(
    username = credentials['email'],
    password = credentials['password'],
    scopes = ['read'],
    to_file = 'itss_user_credentials.secret'
)

'oi79cEZzbfJYK_6Vqky6jz2ciBrg-JUbIgi9f_bQ3u4'

Verify that your app is working.

In [33]:
masto.app_verify_credentials()

{'name': 'itss',
 'website': None,
 'vapid_key': 'BLBt86rTuiyg19iKV80dvsZ55WxX7Z-k91TZfeGBimVy-_CdEPi2AFz8lhwNXCrbZCpGmhF2rXieL2-xe-NQDXs='}

Get information about my account.

In [34]:
masto.me()

{'id': 108280556926327188,
 'username': 'bkeegan',
 'acct': 'bkeegan',
 'display_name': 'Brian C. Keegan',
 'locked': False,
 'bot': False,
 'discoverable': True,
 'group': False,
 'created_at': datetime.datetime(2022, 5, 11, 0, 0, tzinfo=tzutc()),
 'note': '<p>{Social, Data, Network, Information} Scientist. @CUInfoScience. High-tempo collaboration, online commons, cannabis informatics. @ucwcolorado. Born at 345ppm.</p>',
 'url': 'https://hci.social/@bkeegan',
 'avatar': 'https://storage.googleapis.com/hci-social-storage/accounts/avatars/108/280/556/926/327/188/original/e41167600515b0ae.png',
 'avatar_static': 'https://storage.googleapis.com/hci-social-storage/accounts/avatars/108/280/556/926/327/188/original/e41167600515b0ae.png',
 'header': 'https://storage.googleapis.com/hci-social-storage/accounts/headers/108/280/556/926/327/188/original/24ced44cadb7b394.png',
 'header_static': 'https://storage.googleapis.com/hci-social-storage/accounts/headers/108/280/556/926/327/188/original/24ce

Search for another account. [Eugen Rochko](https://mastodon.social/@Gargron) (@Gargron) is the lead developer of Mastodon.

In [35]:
masto.account_search("Gargron@mastodon.social")

[{'id': 108290692843086639,
  'username': 'Gargron',
  'acct': 'Gargron@mastodon.social',
  'display_name': 'Eugen Rochko',
  'locked': False,
  'bot': False,
  'discoverable': True,
  'group': False,
  'created_at': datetime.datetime(2016, 3, 16, 0, 0, tzinfo=tzutc()),
  'note': '<p>Founder, CEO and lead developer <span class="h-card"><a href="https://mastodon.social/@Mastodon" class="u-url mention" rel="nofollow noopener noreferrer" target="_blank">@<span>Mastodon</span></a></span>, Germany.</p>',
  'url': 'https://mastodon.social/@Gargron',
  'avatar': 'https://storage.googleapis.com/hci-social-storage/cache/accounts/avatars/108/290/692/843/086/639/original/50cc6dd87123c2a1.jpg',
  'avatar_static': 'https://storage.googleapis.com/hci-social-storage/cache/accounts/avatars/108/290/692/843/086/639/original/50cc6dd87123c2a1.jpg',
  'header': 'https://storage.googleapis.com/hci-social-storage/cache/accounts/headers/108/290/692/843/086/639/original/b3821e505a12f479.jpeg',
  'header_static

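Get my recent statuses, using my account ID (the `id` returned by `masto.me()` above). Mastodon servers return at most 40 statuses per request, so `limit=500` still yields a single page of up to 40 statuses.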

In [42]:
bkeegan_statuses = masto.account_statuses(108280556926327188,limit=500)
bkeegan_statuses[0]

{'id': 109898984328297750,
 'created_at': datetime.datetime(2023, 2, 20, 20, 37, 0, 54000, tzinfo=tzutc()),
 'in_reply_to_id': None,
 'in_reply_to_account_id': None,
 'sensitive': False,
 'spoiler_text': '',
 'visibility': 'public',
 'language': None,
 'uri': 'https://hci.social/users/bkeegan/statuses/109898984328297750/activity',
 'url': 'https://hci.social/users/bkeegan/statuses/109898984328297750/activity',
 'replies_count': 0,
 'reblogs_count': 0,
 'favourites_count': 0,
 'edited_at': None,
 'favourited': False,
 'reblogged': False,
 'muted': False,
 'bookmarked': False,
 'content': '',
 'filtered': [],
 'reblog': {'id': 109898952295262798,
  'created_at': datetime.datetime(2023, 2, 20, 20, 28, 47, tzinfo=tzutc()),
  'in_reply_to_id': None,
  'in_reply_to_account_id': None,
  'sensitive': False,
  'spoiler_text': '',
  'visibility': 'public',
  'language': 'en',
  'uri': 'https://mastodon.social/users/atomicpoet/statuses/109898952063347105',
  'url': 'https://mastodon.social/@atomi
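Because each request returns only one page, collecting a longer history means paging backwards. Mastodon's API does this with a `max_id` parameter, which asks for results older than a given status ID, and Mastodon.py wraps the process in helpers such as `masto.fetch_remaining(first_page)`. The sketch below spells out the logic so it can run offline: `collect_statuses` and `fake_fetch` are hypothetical, and `fetch(max_id, limit)` stands in for a call like `masto.account_statuses(account_id, max_id=max_id, limit=limit)`. The `max_pages` cap is an assumption added to keep request volume polite.

```python
def collect_statuses(fetch, page_size=40, max_pages=10):
    """Page backwards through a newest-first timeline using max_id.

    `fetch(max_id, limit)` is a hypothetical callable returning a list of
    status dicts (each with an integer 'id'), newest first.
    """
    collected = []
    max_id = None
    for _ in range(max_pages):
        page = fetch(max_id, page_size)
        if not page:
            break  # no older statuses left
        collected.extend(page)
        # Next request: statuses strictly older than the oldest one on this page
        max_id = min(status["id"] for status in page)
    return collected


# Offline stand-in: 95 statuses with IDs 1..95, served newest first
ALL_IDS = list(range(95, 0, -1))


def fake_fetch(max_id, limit):
    older = [i for i in ALL_IDS if max_id is None or i < max_id]
    return [{"id": i} for i in older[:limit]]


statuses = collect_statuses(fake_fetch)
print(len(statuses), statuses[0]["id"], statuses[-1]["id"])
```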

Get the accounts I follow (the first page of results).

In [43]:
bkeegan_following = masto.account_following(108280556926327188)
bkeegan_following[0]

{'id': 109252321582544851,
 'username': 'ryanstraight',
 'acct': 'ryanstraight',
 'display_name': 'Dr. Ryan Straight',
 'locked': False,
 'bot': False,
 'discoverable': True,
 'group': False,
 'created_at': datetime.datetime(2022, 10, 29, 0, 0, tzinfo=tzutc()),
 'note': '<p>Assoc/Honors Applied Computing/Cyber Operations Prof @uarizona | Director: <span class="h-card"><a href="https://hci.social/@mavrxlab" class="u-url mention">@<span>mavrxlab</span></a></span>  | toots mine alone 🤭, Boost ≠ endorsement | <a href="https://hci.social/tags/highered" class="mention hashtag" rel="tag">#<span>highered</span></a> <a href="https://hci.social/tags/hci" class="mention hashtag" rel="tag">#<span>hci</span></a> <a href="https://hci.social/tags/xr" class="mention hashtag" rel="tag">#<span>xr</span></a> <a href="https://hci.social/tags/vr" class="mention hashtag" rel="tag">#<span>vr</span></a> <a href="https://hci.social/tags/ar" class="mention hashtag" rel="tag">#<span>ar</span></a> <a href="https:

Get the accounts that follow me (the first page of results).

In [44]:
bkeegan_followers = masto.account_followers(108280556926327188)
bkeegan_followers[0]

{'id': 109444142731904621,
 'username': 'chitalyconf',
 'acct': 'chitalyconf',
 'display_name': 'CHItaly 2023 Turin',
 'locked': False,
 'bot': False,
 'discoverable': True,
 'group': False,
 'created_at': datetime.datetime(2022, 12, 2, 0, 0, tzinfo=tzutc()),
 'note': '<p>CHItaly2023 is the 15th edition of Biannual Conference of the Italian SIGCHI Chapter, hosted in Turin.<br />The conference theme is “Crossing HCI and AI.”</p>',
 'url': 'https://hci.social/@chitalyconf',
 'avatar': 'https://storage.googleapis.com/hci-social-storage/accounts/avatars/109/444/142/731/904/621/original/fba1f4565a8e80aa.png',
 'avatar_static': 'https://storage.googleapis.com/hci-social-storage/accounts/avatars/109/444/142/731/904/621/original/fba1f4565a8e80aa.png',
 'header': 'https://storage.googleapis.com/hci-social-storage/accounts/headers/109/444/142/731/904/621/original/66e920544988d237.png',
 'header_static': 'https://storage.googleapis.com/hci-social-storage/accounts/headers/109/444/142/731/904/621/o
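With both lists in hand, a simple network question is which relationships are reciprocal. The minimal sketch below assumes account dictionaries shaped like the Mastodon.py output above (each with an `id` and an `acct`); `reciprocal_follows` is a hypothetical helper. Each of the calls above returns only one page of accounts, so for complete lists you would first page through them, for example with `masto.fetch_remaining(...)`.

```python
def reciprocal_follows(following, followers):
    """Split 'acct' handles into mutual, follow-only, and follower-only groups."""
    following_by_id = {account["id"]: account["acct"] for account in following}
    followers_by_id = {account["id"]: account["acct"] for account in followers}
    # Compare on account IDs, which are stable even if a display name changes
    mutual_ids = following_by_id.keys() & followers_by_id.keys()
    return {
        "mutual": sorted(following_by_id[i] for i in mutual_ids),
        "i_follow_only": sorted(following_by_id[i] for i in following_by_id.keys() - mutual_ids),
        "follows_me_only": sorted(followers_by_id[i] for i in followers_by_id.keys() - mutual_ids),
    }
```

For example, `reciprocal_follows(bkeegan_following, bkeegan_followers)['mutual']` would list the handles that both follow and are followed by this account, within the pages retrieved.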

Get the local timeline: recent public posts from accounts on my instance (hci.social).

In [46]:
local_timeline = masto.timeline_local()
local_timeline[0]

{'id': 109899267268038294,
 'created_at': datetime.datetime(2023, 2, 20, 21, 48, 57, 369000, tzinfo=tzutc()),
 'in_reply_to_id': None,
 'in_reply_to_account_id': None,
 'sensitive': False,
 'spoiler_text': '',
 'visibility': 'public',
 'language': 'en',
 'uri': 'https://hci.social/users/cfiesler/statuses/109899267268038294',
 'url': 'https://hci.social/@cfiesler/109899267268038294',
 'replies_count': 1,
 'reblogs_count': 4,
 'favourites_count': 7,
 'edited_at': None,
 'favourited': False,
 'reblogged': False,
 'muted': False,
 'bookmarked': False,
 'content': '<p>&quot;5 burning questions about ChatGPT, answered by humans&quot; from my university. One of those humans is me, I&#39;m a human! </p><p><a href="https://www.colorado.edu/today/2023/02/20/5-burning-questions-about-chatgpt-answered-humans" target="_blank" rel="nofollow noopener noreferrer"><span class="invisible">https://www.</span><span class="ellipsis">colorado.edu/today/2023/02/20/</span><span class="invisible">5-burning-que
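The `content` field holds HTML, not plain text. Since the notebook already imports BeautifulSoup, `BeautifulSoup(status['content'], 'html.parser').get_text(' ')` is one option. The sketch below does something similar with the standard library's `html.parser`, treating paragraph and line breaks as spaces; `html_to_text` and `_TextExtractor` are hypothetical names. It keeps the text inside Mastodon's hidden `<span class="invisible">` link fragments (visible in the output above), which reassembles the full URL text.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes; HTMLParser already unescapes entities such as &quot;."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Separate words that sit in different paragraphs or on different lines
        if tag in ("br", "p"):
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Collapse runs of whitespace into single spaces
    return " ".join("".join(parser.parts).split())


sample = '<p>&quot;5 burning questions&quot; from my university.</p><p>I&#39;m a human!</p>'
print(html_to_text(sample))
```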

Search for toots with a given hashtag.

In [47]:
hashtag_toots = masto.timeline_hashtag('theorizingthefediverse')
hashtag_toots

[{'id': 109872751291748175,
  'created_at': datetime.datetime(2023, 2, 16, 5, 25, 35, 686000, tzinfo=tzutc()),
  'in_reply_to_id': None,
  'in_reply_to_account_id': None,
  'sensitive': False,
  'spoiler_text': '',
  'visibility': 'public',
  'language': 'en',
  'uri': 'https://hci.social/users/bkeegan/statuses/109872751291748175',
  'url': 'https://hci.social/@bkeegan/109872751291748175',
  'replies_count': 6,
  'reblogs_count': 8,
  'favourites_count': 4,
  'edited_at': datetime.datetime(2023, 2, 16, 5, 58, 31, 120000, tzinfo=tzutc()),
  'favourited': False,
  'reblogged': True,
  'muted': False,
  'bookmarked': False,
  'pinned': False,
  'content': '<p>What is the biggest challenge the fediverse will face by the end of 2023? <a href="https://hci.social/tags/TheorizingTheFediverse" class="mention hashtag" rel="tag">#<span>TheorizingTheFediverse</span></a></p>',
  'filtered': [],
  'reblog': None,
  'application': {'name': 'Ivory for iOS', 'website': 'https://tapbots.com/'},
  'accou
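Each status also lists its hashtags in a `tags` field, a list of dictionaries with a `name`. The minimal sketch below assumes that shape and counts which other hashtags co-occur with the one you searched for; `cooccurring_hashtags` is a hypothetical helper. Names are lowercased because Mastodon hashtags are case-insensitive (compare `TheorizingTheFediverse` in the output above with the lowercase `theorizingthefediverse` passed to `timeline_hashtag`).

```python
from collections import Counter


def cooccurring_hashtags(statuses, searched_tag):
    """Count hashtags appearing alongside `searched_tag`, ignoring case."""
    searched = searched_tag.lower()
    counts = Counter()
    for status in statuses:
        # A set ensures a tag repeated within one status is counted once
        names = {tag["name"].lower() for tag in status.get("tags", [])}
        counts.update(names - {searched})
    return counts
```

For example, `cooccurring_hashtags(hashtag_toots, 'theorizingthefediverse').most_common(10)` would list the ten most frequent companion hashtags in the retrieved page of toots.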