# NB01 – Data Gathering
**Project Question:** Example: *How does post engagement (score, comments) vary across r/technology, r/Futurology, and r/science?*

**AI Prompt 1:**  
> “Suggest a concise way to phrase my research question about Reddit engagement differences between subreddits.”  
**AI Response:** “...”  
**Refinement:** “...”  
**Commit:** “Added project question after AI‑aided refinement.”


# 🔐 2. Install Dependencies & Handle Secret Keys

In [1]:
from google.colab import userdata
import os
!pip install praw --quiet
import praw

os.environ["REDDIT_CLIENT_ID"] = userdata.get("REDDIT_CLIENT_ID")
os.environ["REDDIT_CLIENT_SECRET"] = userdata.get("REDDIT_CLIENT_SECRET")
os.environ["REDDIT_USERNAME"] = userdata.get("REDDIT_USERNAME")
os.environ["REDDIT_PASSWORD"] = userdata.get("REDDIT_PASSWORD")

# Now use os.getenv in your PRAW setup


**AI Prompt 2:**  
> “Write code to set Reddit credentials from Colab Secrets into environment variables.”

[Include AI response here and any adjustments]
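Before building the PRAW client, it can help to confirm that all four credentials actually made it into the environment. The following is a minimal sketch in plain Python; `missing_credentials` and `REQUIRED_VARS` are hypothetical names introduced here for illustration and are not part of PRAW or Colab.

```python
import os

# Names of the environment variables set from Colab Secrets in this notebook.
REQUIRED_VARS = (
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USERNAME",
    "REDDIT_PASSWORD",
)


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty.

    `env` can be any mapping and defaults to os.environ.
    Hypothetical helper, not a PRAW or Colab function.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with a stand-in mapping instead of real secrets.
example_env = {"REDDIT_CLIENT_ID": "abc", "REDDIT_USERNAME": "someone"}
print(missing_credentials(example_env))
# ['REDDIT_CLIENT_SECRET', 'REDDIT_PASSWORD']
```

Calling `missing_credentials()` with no argument checks the real environment; an empty list suggests the credentials are in place.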

In [3]:
reddit = praw.Reddit(
    client_id=os.getenv("REDDIT_CLIENT_ID"),
    client_secret=os.getenv("REDDIT_CLIENT_SECRET"),
    user_agent="DS105 Project by /u/" + os.getenv("REDDIT_USERNAME"),
    username=os.getenv("REDDIT_USERNAME"),
    password=os.getenv("REDDIT_PASSWORD")
)

print("Read-only?", reddit.read_only)
# Retrieve 5 hot posts from r/python
for post in reddit.subreddit("python").hot(limit=5):
    print(post.title, post.score)


It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.



Read-only? False
Sunday Daily Thread: What's everyone working on this week? 9
Saturday Daily Thread: Resource Request and Sharing! Daily Thread 3
Premier: Instantly Turn Your ASGI App into an API Gateway 8
Building an ERP: ready-made platforms vs custom development 2
New in coding world. Need recommendations of tutorials for python in finance. 2


# ⚙️ 3. Authenticate PRAW

In [4]:
import praw

reddit = praw.Reddit(
    client_id=os.getenv("REDDIT_CLIENT_ID"),
    client_secret=os.getenv("REDDIT_CLIENT_SECRET"),
    user_agent="ds105:reddit.engagement.analysis:v1 (by /u/" + os.getenv("REDDIT_USERNAME") + ")",
    username=os.getenv("REDDIT_USERNAME"),
    password=os.getenv("REDDIT_PASSWORD"),
)

print("Authenticated:", not reddit.read_only)


Authenticated: True


# 📦 4. Define Subreddits & Fetch Posts

In [5]:
# AI Prompt 3
# “Generate Python code to fetch 100 hot posts including metadata for a list of subreddits using PRAW.”
# [Include AI response and your refinements]

subreddits = ["technology", "Futurology", "science"]
posts = []

for sr in subreddits:
    for post in reddit.subreddit(sr).hot(limit=100):
        posts.append({
            "subreddit": sr,
            "post_id": post.id,
            "title": post.title,
            "author": str(post.author),
            "created_utc": post.created_utc,
            "score": post.score,
            "upvote_ratio": post.upvote_ratio,
            "num_comments": post.num_comments
        })

import pandas as pd

df_posts = pd.DataFrame(posts)
print("Total posts:", len(df_posts))




Total posts: 300
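The row-building step in the loop can also be pulled into a small function, which makes it easier to check without calling the Reddit API. This is a sketch under stated assumptions: `post_to_row` is a hypothetical helper, the `created_at` column is an addition not present in the notebook, and the `SimpleNamespace` object is only a stand-in for a PRAW submission. It also keeps a missing author as `None`, since `str(post.author)` would store the text `"None"` when the author attribute is `None`.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def post_to_row(subreddit_name, post):
    """Flatten one submission-like object into a dict row (hypothetical helper)."""
    return {
        "subreddit": subreddit_name,
        "post_id": post.id,
        "title": post.title,
        # Keep a missing author as None instead of the string "None".
        "author": str(post.author) if post.author is not None else None,
        "created_utc": post.created_utc,
        # Assumed extra column: timezone-aware datetime from the Unix timestamp.
        "created_at": datetime.fromtimestamp(post.created_utc, tz=timezone.utc),
        "score": post.score,
        "upvote_ratio": post.upvote_ratio,
        "num_comments": post.num_comments,
    }


# Stand-in for a PRAW submission; all values here are made up for illustration.
fake_post = SimpleNamespace(
    id="abc123",
    title="Example title",
    author=None,
    created_utc=1749868000.0,
    score=10,
    upvote_ratio=0.9,
    num_comments=3,
)

row = post_to_row("technology", fake_post)
print(row["author"], row["created_at"].isoformat())
# None 2025-06-14T02:26:40+00:00
```

In the notebook loop, `posts.append(post_to_row(sr, post))` would play the same role as the inline dictionary.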


# 🗃️ 5. Fetch Comments

In [6]:
# AI Prompt 4
# “Generate pseudocode to fetch up to 20 top-level comments per post and normalize into a list.”

comments = []
for _, row in df_posts.iterrows():
    submission = reddit.submission(id=row.post_id)
    submission.comment_sort = 'top'
    submission.comments.replace_more(limit=0)
    for c in submission.comments[:20]:
        comments.append({
            "comment_id": c.id,
            "post_id": row.post_id,
            "author": str(c.author),
            "created_utc": c.created_utc,
            "score": c.score,
            "body": c.body
        })
df_comments = pd.DataFrame(comments)
print("Total comments:", len(df_comments))
df_comments.head()


Total comments: 3279


| | comment_id | post_id | author | created_utc | score | body |
|---|---|---|---|---|---|---|
| 0 | mxocdxt | 1laxbzg | Affectionate_Lie5601 | 1749868000.0 | 1789 | bro \n\n3.409M ads blocked for me since instal... |
| 1 | mxoa9b7 | 1laxbzg | the_cat_did_it | 1749867000.0 | 1385 | At some point won't it just be faster to downl... |
| 2 | mxoeuxq | 1laxbzg | VeryGayLopunny | 1749869000.0 | 567 | It's abhorrent on Roku. 60-75-second ad breaks... |
| 3 | mxoc4dt | 1laxbzg | romjpn | 1749868000.0 | 494 | The era of free access to many websites is end... |
| 4 | mxo8nzu | 1laxbzg | EzeakioDarmey | 1749867000.0 | 445 | Meanwhile; Brave users are just watching videos. |
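The comment rows carry a `post_id` but no subreddit name, so comparing subreddits requires a lookup through the posts table. A minimal sketch of that lookup in plain Python follows; `comments_per_subreddit` is a hypothetical helper and the sample rows are made up for illustration.

```python
from collections import Counter


def comments_per_subreddit(post_rows, comment_rows):
    """Count gathered comments per subreddit by joining on post_id."""
    # Map each post to its subreddit once, then look comments up through it.
    post_to_subreddit = {p["post_id"]: p["subreddit"] for p in post_rows}
    counts = Counter()
    for comment in comment_rows:
        subreddit = post_to_subreddit.get(comment["post_id"])
        if subreddit is not None:  # skip comments whose post was not gathered
            counts[subreddit] += 1
    return dict(counts)


# Made-up rows with the same keys as the notebook's post and comment dicts.
sample_posts = [
    {"post_id": "p1", "subreddit": "technology"},
    {"post_id": "p2", "subreddit": "science"},
]
sample_comments = [
    {"comment_id": "c1", "post_id": "p1"},
    {"comment_id": "c2", "post_id": "p1"},
    {"comment_id": "c3", "post_id": "p2"},
]

print(comments_per_subreddit(sample_posts, sample_comments))
# {'technology': 2, 'science': 1}
```

With the DataFrames from this notebook, `df_comments.merge(df_posts[["post_id", "subreddit"]], on="post_id")` should give an equivalent joined table.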


# 🧠 6. Create Subreddits Table

In [7]:
sub_data = []
for sr in subreddits:
    info = reddit.subreddit(sr)
    sub_data.append({
        "subreddit_id": info.id,
        "name": info.display_name,
        "subscribers": info.subscribers,
        "created_utc": info.created_utc,
        "description": info.public_description
    })

df_subreddits = pd.DataFrame(sub_data)
df_subreddits.head()




| | subreddit_id | name | subscribers | created_utc | description |
|---|---|---|---|---|---|
| 0 | 2qh16 | technology | 19468200 | 1201232000.0 | Subreddit dedicated to the news and discussion... |
| 1 | 2t7no | Futurology | 21574310 | 1323681000.0 | A subreddit devoted to the field of Future(s) ... |
| 2 | mouw | science | 34144890 | 1161180000.0 | This community is a place to share and discuss... |
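With posts, comments, and subreddits held in three separate tables, a quick key check can catch rows that would silently drop out of a later join. The sketch below is one possible approach in plain Python; `orphan_keys` is a hypothetical helper, and apart from the three subreddit names the rows are stand-ins.

```python
def orphan_keys(child_rows, child_key, parent_rows, parent_key):
    """Return sorted child key values that have no matching parent row."""
    parent_values = {row[parent_key] for row in parent_rows}
    return sorted({row[child_key] for row in child_rows} - parent_values)


# Stand-in rows; only the subreddit names come from this notebook.
subreddit_rows = [{"name": "technology"}, {"name": "Futurology"}, {"name": "science"}]
post_rows = [
    {"post_id": "p1", "subreddit": "technology"},
    {"post_id": "p2", "subreddit": "futurology"},  # wrong letter case on purpose
]
comment_rows = [
    {"comment_id": "c1", "post_id": "p1"},
    {"comment_id": "c2", "post_id": "p9"},  # refers to a post that was never gathered
]

print(orphan_keys(post_rows, "subreddit", subreddit_rows, "name"))
# ['futurology']
print(orphan_keys(comment_rows, "post_id", post_rows, "post_id"))
# ['p9']
```

The comparison is case-sensitive, so a name typed as `futurology` would not match the `Futurology` display name stored in the subreddits table.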
