
# I. Install libraries

In [None]:
%%capture
%pip install praw pandas tqdm



In [None]:
import praw
import pandas as pd
from tqdm import tqdm
import hashlib

# II. Load Reddit API credentials and Post URLs

In [None]:
# Reddit API credentials
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="YOUR_USER_AGENT",
    username="YOUR_USERNAME",
    password="YOUR_PASSWORD"
)
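Hard-coding credentials in a shared notebook makes them easy to leak. One possible alternative is to read them from environment variables. This is a minimal sketch: the variable names (`REDDIT_CLIENT_ID` and so on) and the helper `load_reddit_credentials` are hypothetical and not part of the original notebook.

```python
import os

# Hypothetical environment variable names; adjust to your own setup.
REQUIRED_VARS = {
    "client_id": "REDDIT_CLIENT_ID",
    "client_secret": "REDDIT_CLIENT_SECRET",
    "user_agent": "REDDIT_USER_AGENT",
    "username": "REDDIT_USERNAME",
    "password": "REDDIT_PASSWORD",
}


def load_reddit_credentials(env=None):
    """Return praw.Reddit keyword arguments read from environment variables."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_VARS.values() if not env.get(var)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in REQUIRED_VARS.items()}


# Usage (requires praw and the variables above to be set):
# reddit = praw.Reddit(**load_reddit_credentials())
```

Failing early on a missing variable gives a clearer error than an authentication failure later in the fetch loop.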


In [None]:
# Store the URLs of the posts to fetch
urls = [
    "https://www.reddit.com/r/Anxiety/comments/1czzuoo/",
    "https://www.reddit.com/r/Anxiety/comments/r1ridv/",
    "https://www.reddit.com/r/Anxietyhelp/comments/16ixn6t/",
    "https://www.reddit.com/r/Anxiety/comments/1d0unue/",
    "https://www.reddit.com/r/Anxiety/comments/11e5epk/",
    "https://www.reddit.com/r/Anxiety/comments/191w0az/",
    "https://www.reddit.com/r/Anxiety/comments/18hwzku/",
    "https://www.reddit.com/r/Anxiety/comments/um67a8/",
    "https://www.reddit.com/r/Anxiety/comments/1e466yk/",
    "https://www.reddit.com/r/Anxiety/comments/10jcfko/",
    "https://www.reddit.com/r/Anxiety/comments/1dcr9t0/",
    "https://www.reddit.com/r/Anxiety/comments/1cl2frr/",
    "https://www.reddit.com/r/Anxiety/comments/13nzoyr/",
    "https://www.reddit.com/r/Anxiety/comments/1hese67/"
]


In [None]:
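def anon_author(author):
    """Return a SHA-256 hex digest of the author name, or None if the author is missing (e.g. a deleted account)."""
    if author is None:
        return None
    return hashlib.sha256(str(author).encode()).hexdigest()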
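A plain SHA-256 digest is deterministic, so anyone who knows a public username can hash it and look for a match in the data. A keyed hash (HMAC) is one way to make that harder. The sketch below assumes a private key kept outside the notebook. The function name `anon_author_keyed` and the key value are hypothetical.

```python
import hashlib
import hmac


def anon_author_keyed(author, secret_key):
    """Return a keyed SHA-256 digest of a username, or None if the author is missing."""
    if author is None:
        return None
    return hmac.new(secret_key, str(author).encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for illustration only; keep a real key out of the notebook.
key = b"replace-with-a-private-random-key"
print(anon_author_keyed("example_user", key))
```

The same username still maps to the same digest under a given key, so per-user counts such as `unique_users` keep working.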


In [None]:
# Collect threads: one row per post and one row per comment
post_rows = []
comment_rows = []

for url in tqdm(urls):
    submission = reddit.submission(url=url)
    submission.comments.replace_more(limit=None)

    # ---- POST ----
    post_rows.append({
        "post_id": submission.id,
        "subreddit": submission.subreddit.display_name,
        "title": submission.title,
        "post_body": submission.selftext,
        "author_hash": anon_author(submission.author),
        "score": submission.score,
        "upvote_ratio": submission.upvote_ratio,
        "num_comments": submission.num_comments,
        "created_utc": submission.created_utc,
        "url": submission.url
    })

    # ---- COMMENTS ----
    for c in submission.comments.list():
        comment_rows.append({
            # ---- Comment info ----
            "comment_id": c.id,
            "parent_id": c.parent_id,
            "comment_body": c.body,
            "author_hash": anon_author(c.author),
            "score": c.score,
            "created_utc": c.created_utc,

            # ---- Post info ----
            "post_id": submission.id,
            "subreddit": submission.subreddit.display_name,
            "title": submission.title,
            "post_body": submission.selftext,
            "post_author_hash": anon_author(submission.author),
            "post_score": submission.score,
            "post_upvote_ratio": submission.upvote_ratio,
            "num_comments": submission.num_comments,
            "post_created_utc": submission.created_utc,
            "post_url": submission.url
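        })

# Build the DataFrames used in the sections below
posts_df = pd.DataFrame(post_rows)
comments_df = pd.DataFrame(comment_rows)
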

100%|██████████████████████████████████████████████████████████████████████████████████| 14/14 [01:03<00:00,  4.55s/it]
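The `parent_id` stored for each comment is a Reddit fullname: a type prefix, an underscore, and a base ID. The prefix `t3` marks a submission (a top-level comment) and `t1` marks another comment (a reply). A small sketch of how the column could be decoded follows. The helper `parent_kind` is hypothetical and `abc123` is a made-up comment ID.

```python
def parent_kind(parent_id):
    """Classify a Reddit fullname by its type prefix and return (kind, base_id)."""
    prefix, _, base_id = parent_id.partition("_")
    kinds = {"t1": "comment", "t3": "submission"}
    return kinds.get(prefix, "unknown"), base_id


# Illustrative values in the same shape as the parent_id column.
print(parent_kind("t3_1czzuoo"))  # ('submission', '1czzuoo')
print(parent_kind("t1_abc123"))   # ('comment', 'abc123')
```

Separating top-level comments from replies this way can help later, since replies often respond to another commenter rather than to the post itself.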


# III. Save raw comments file

In [None]:
import pandas as pd

# --------------------------------
# SAVE RAW COMMENTS FILE
# --------------------------------

# Ensure unique comments
comments_raw = comments_df.drop_duplicates(subset="comment_id").copy()

# Create full_text for NLP
comments_raw["full_text"] = (
    comments_raw["title"].fillna("") + " " +
    comments_raw["post_body"].fillna("") + " " +
    comments_raw["comment_body"].fillna("")
)

# Save raw file
comments_raw.to_csv("reddit_anxietysymptoms_comments_raw.csv", index=False)
print("Saved raw NLP file: reddit_anxietysymptoms_comments_raw.csv")

# Ready for NLP
texts = comments_raw["comment_body"].astype(str).tolist()


Saved raw NLP file: reddit_anxietysymptoms_comments_raw.csv


# IV. Basic Exploration of Reddit Data

This section explores the fetched posts and comments: counts, length, upvotes, and time span.

In [None]:
print("Total fetched comments across all posts:", len(comments_raw))

Total fetched comments across all posts: 3075


In [None]:
import pandas as pd

# Reload the saved raw file so the columns below include full_text
comments_df = pd.read_csv("reddit_anxietysymptoms_comments_raw.csv")


In [None]:
print(comments_df.columns)

Index(['post_id', 'comment_id', 'parent_id', 'comment_body', 'author_hash',
       'score', 'created_utc', 'title', 'post_body', 'full_text'],
      dtype='object')



In [None]:
# --------------------------------
# POST-LEVEL STATS
# --------------------------------

# Count fetched comments per post
comment_counts = (
    comments_df
    .groupby("post_id")
    .agg(
        fetched_comments=("comment_id", "count"),
        unique_users=("author_hash", "nunique"),
        first_comment_utc=("created_utc", "min"),
        last_comment_utc=("created_utc", "max")
    )
    .reset_index()
)

# Base post table
posts_base = (
    posts_df[[
        "post_id",
        "subreddit",
        "title",
        "post_body",
        "score",
        "upvote_ratio",
        "num_comments",
        "created_utc"
    ]]
    .rename(columns={"created_utc": "post_created_utc"})
    .drop_duplicates(subset="post_id")
)

# Merge stats
posts_stats = posts_base.merge(
    comment_counts,
    on="post_id",
    how="left"
)

# Fill missing (posts with no comments)
posts_stats[[
    "fetched_comments",
    "unique_users"
]] = posts_stats[[
    "fetched_comments",
    "unique_users"
]].fillna(0).astype(int)

# Convert timestamps
posts_stats["first_comment_time"] = pd.to_datetime(
    posts_stats["first_comment_utc"], unit="s", errors="coerce"
)
posts_stats["last_comment_time"] = pd.to_datetime(
    posts_stats["last_comment_utc"], unit="s", errors="coerce"
)
posts_stats["post_created_time"] = pd.to_datetime(
    posts_stats["post_created_utc"], unit="s", errors="coerce"
)

# Save stats file
posts_stats.to_csv("reddit_anxietysymptoms_stats.csv", index=False)
print("Saved stats file: reddit_anxietysymptoms_stats.csv")


Saved stats file: reddit_anxietysymptoms_stats.csv
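The left merge keeps every post, including posts with no fetched comments. Those posts get `NaN` in the count columns, which is why the cell above fills with 0 and casts back to integers. A toy illustration of that behavior, using made-up IDs rather than the real data:

```python
import pandas as pd

# Toy data: post "b" has no fetched comments.
posts = pd.DataFrame({"post_id": ["a", "b"], "num_comments": [3, 0]})
comments = pd.DataFrame({
    "post_id": ["a", "a", "a"],
    "comment_id": ["c1", "c2", "c3"],
    "author_hash": ["u1", "u2", "u1"],
})

counts = (
    comments.groupby("post_id")
    .agg(fetched_comments=("comment_id", "count"),
         unique_users=("author_hash", "nunique"))
    .reset_index()
)

merged = posts.merge(counts, on="post_id", how="left")
# After the left merge, post "b" has NaN in both count columns.
cols = ["fetched_comments", "unique_users"]
merged[cols] = merged[cols].fillna(0).astype(int)
print(merged)
```

With `how="inner"` post "b" would be dropped from the stats table instead of appearing with zero counts.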


In [None]:
from IPython.display import FileLink, display

filename = "reddit_anxietysymptoms_stats.csv"

display(FileLink(filename))

In [None]:
print(posts_stats.columns)

Index(['post_id', 'subreddit', 'title', 'post_body', 'score', 'upvote_ratio',
       'num_comments', 'post_created_utc', 'fetched_comments', 'unique_users',
       'first_comment_utc', 'last_comment_utc', 'first_comment_time',
       'last_comment_time', 'post_created_time'],
      dtype='object')


In [None]:
print("Total posts:", len(posts_stats))

Total posts: 14


In [None]:
# Show first 5 rows
posts_stats.head()

Unnamed: 0,post_id,subreddit,title,post_body,score,upvote_ratio,num_comments,post_created_utc,fetched_comments_x,unique_users,first_comment_utc,last_comment_utc,first_comment_time,last_comment_time,post_created_time
0,1czzuoo,Anxiety,Here is a full list of anxiety symptoms I deal...,Anxiety easily can cause a million different s...,484,0.99,380,1716597000.0,374,208,1716598000.0,1765582000.0,2024-05-25 00:48:51,2025-12-12 23:31:37,2024-05-25 00:37:51
1,r1ridv,Anxiety,What are your anxiety symptoms (Physical)?,"Mine is jaw numbness/pain, shortness of breath...",223,1.0,297,1637827000.0,292,160,1637828000.0,1745191000.0,2021-11-25 08:13:10,2025-04-20 23:11:18,2021-11-25 08:00:15
2,16ixn6t,Anxietyhelp,everyone with an anxiety disorder what are /wa...,i want to know what other people’s symptoms wa...,58,1.0,97,1694735000.0,94,57,1694735000.0,1764270000.0,2023-09-14 23:38:36,2025-11-27 18:54:40,2023-09-14 23:38:35
3,1d0unue,Anxiety,What are YOUR symptoms of anxiety?,My aunt was telling me that it helped her unde...,212,1.0,307,1716702000.0,313,167,1716702000.0,1760810000.0,2024-05-26 05:46:42,2025-10-18 17:49:06,2024-05-26 05:45:13
4,11e5epk,Anxiety,What’s everyone’s everyday anxiety symptoms?,Mines racing heart\nPalpitations \nConstant he...,242,0.99,390,1677589000.0,387,237,1677590000.0,1761670000.0,2023-02-28 13:14:31,2025-10-28 16:49:50,2023-02-28 12:57:09


In [None]:
# Show only a few columns
posts_stats[["post_id", "subreddit", "title", "num_comments", "fetched_comments", "unique_users"]]

Unnamed: 0,post_id,subreddit,title,num_comments,fetched_comments,unique_users
0,1czzuoo,Anxiety,Here is a full list of anxiety symptoms I deal...,380,374,208
1,r1ridv,Anxiety,What are your anxiety symptoms (Physical)?,297,292,160
2,16ixn6t,Anxietyhelp,everyone with an anxiety disorder what are /wa...,97,94,57
3,1d0unue,Anxiety,What are YOUR symptoms of anxiety?,307,313,167
4,11e5epk,Anxiety,What’s everyone’s everyday anxiety symptoms?,390,387,237
5,191w0az,Anxiety,What are your physical symptoms of anxiety?,411,402,238
6,18hwzku,Anxiety,What are your anxiety symptoms like?,105,103,53
7,um67a8,Anxiety,What are some symptoms/signs of anxiety that y...,106,105,75
8,1e466yk,Anxiety,What symptoms have you felt from anxiety?,304,304,159
9,10jcfko,Anxiety,Long-Term Anxiety Symptoms I Had! (from someon...,693,700,319


In [None]:
#Total fetched comments across all posts:

total_fetched_comments = posts_stats["fetched_comments"].sum()
print("Total fetched comments across all posts:", total_fetched_comments)


Total fetched comments across all posts: 3822


# V. Timespan

In [None]:
# Time span: earliest post to latest comment
first_post_time = posts_stats["post_created_time"].min()
last_comment_time = posts_stats["last_comment_time"].max()

print("Earliest post:", first_post_time)
print("Latest comment:", last_comment_time)


Earliest post: 2021-11-25 08:00:15
Latest comment: 2025-12-12 23:31:37
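The `created_utc` values are epoch seconds in UTC, so the span between the two timestamps printed above can be checked with the standard library alone. A short sketch with the two values copied from that output:

```python
from datetime import datetime, timezone

# Values copied from the output above.
first_post = datetime(2021, 11, 25, 8, 0, 15, tzinfo=timezone.utc)
last_comment = datetime(2025, 12, 12, 23, 31, 37, tzinfo=timezone.utc)

span = last_comment - first_post
print(f"Span: {span.days} days (~{span.days / 365.25:.1f} years)")

# Converting a raw epoch value (seconds since 1970-01-01 UTC) by hand:
print(datetime.fromtimestamp(1716597000, tz=timezone.utc))
```

Comments keep arriving long after a post is created, so the span reflects the whole discussion rather than the posting dates alone.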


In [None]:
print("Total fetched comments across all posts:", len(comments_df))


Total fetched comments across all posts: 3822


In [None]:
# Count comments per post title
comments_raw.groupby('title').size()

title
Anyone suffering with long term anxiety physical symptoms?                                             79
Here is a full list of anxiety symptoms I dealt with during my anxiety recovery journey               374
Long term ongoing physical anxiety symptoms.                                                           27
Long-Term Anxiety Symptoms I Had! (from someone who has recovered/been free from them for 1+ year)    700
What are YOUR symptoms of anxiety?                                                                    313
What are some symptoms/signs of anxiety that you didn’t realize were from anxiety at first?           105
What are your anxiety symptoms (Physical)?                                                            292
What are your anxiety symptoms like?                                                                  103
What are your physical symptoms of anxiety?                                                           402
What are your physical symptoms of chron

In [None]:
# Visually inspect one document
print(comments_raw.loc[0, "full_text"][:3000])

Here is a full list of anxiety symptoms I dealt with during my anxiety recovery journey Anxiety easily can cause a million different symptoms. I made a near full recovery and one of the worst things I had to deal with was the symptoms. Dealing with symptoms is an endless cycle that seems to never end. When I lost the fear of 1 symptom, I had a new one the next week. Its important to understand these symptoms because it takes away the power they have over you. Here is a SHORT list of the symptoms I had. I easily had 100+ symptoms, and I am leaving out the dpdr and ocd symptoms. I have recovered 95% from all of this. Feel free to ask me about any of these symptoms! Edit, please checkout my page which has all resources for free. It’s on my Reddit profile! 

**Physical Symptoms that I had**

1.  Heart Palpitations 
2. Shortness of Breath 
3. Weakness  
4. Feelings of fainting 
5. Intense Headaches 
6. Tingling Sensations all over the body
7. Body pains (Back pain, shoulder pain, leg pain, 

In [None]:
# Explicitly confirm each component separately
i = 0

print("TITLE:\n", comments_raw.loc[i, "title"][:300])
print("\nPOST BODY:\n", comments_raw.loc[i, "post_body"][:500])
print("\nCOMMENTS (first 500 chars):\n", comments_raw.loc[i, "comment_body"][:500])

TITLE:
 Here is a full list of anxiety symptoms I dealt with during my anxiety recovery journey

POST BODY:
 Anxiety easily can cause a million different symptoms. I made a near full recovery and one of the worst things I had to deal with was the symptoms. Dealing with symptoms is an endless cycle that seems to never end. When I lost the fear of 1 symptom, I had a new one the next week. Its important to understand these symptoms because it takes away the power they have over you. Here is a SHORT list of the symptoms I had. I easily had 100+ symptoms, and I am leaving out the dpdr and ocd symptoms. I ha

COMMENTS (first 500 chars):
 omg you have no idea how much better i feel. i’ve had all these feeling and i genuinely have been feeling like im going to die. i have been going to therapy and have been taking medication for about a month and i feel a lot better than i did two months ago but im still not 100% yet.


In [None]:
# Upvotes
# Keep only comments with a score of at least 2 to filter out low-quality comments.
# The score can also be used to weight symptom mentions and identify salient symptoms.
comments_df = comments_df[comments_df["score"] >= 2].copy()
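The score can do more than filter. It can also weight how much each comment contributes to a symptom count. The sketch below is one possible weighting, not the method used in this project. The keyword list, the toy comments, and the log damping are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical symptom keywords and toy comments, for illustration only.
SYMPTOMS = ["palpitations", "headache", "dizziness"]
comments = [
    {"comment_body": "Palpitations and dizziness every day", "score": 12},
    {"comment_body": "Mostly headache for me", "score": 1},
    {"comment_body": "palpitations at night", "score": 3},
]


def weighted_mentions(rows, keywords, min_score=2):
    """Sum log-damped score weights per keyword over comments with score >= min_score."""
    totals = Counter()
    for row in rows:
        if row["score"] < min_score:
            continue  # same threshold as the filter above
        text = row["comment_body"].lower()
        weight = math.log1p(row["score"])
        for kw in keywords:
            if kw in text:
                totals[kw] += weight
    return totals


print(weighted_mentions(comments, SYMPTOMS).most_common())
```

The logarithm keeps a single highly upvoted comment from dominating the totals. A raw score or a simple count would be equally valid starting points.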


End of Data Extraction