# Complete Reddit API Deep Dive
This notebook explores **EVERYTHING** you can extract from Reddit using PRAW (Python Reddit API Wrapper).

We'll use r/learnpython as our example - it's active, has diverse content, and is well suited to testing.

## 1. Setup & Installation

In [4]:
# Install PRAW (Python Reddit API Wrapper), python-dotenv, and pandas
%pip install praw python-dotenv pandas

Collecting python-dotenv
  Downloading python_dotenv-1.2.1-py3-none-any.whl (21 kB)
Collecting pandas
  Using cached pandas-2.3.3-cp310-cp310-macosx_11_0_arm64.whl (10.8 MB)
Collecting tzdata>=2022.7
  Downloading tzdata-2025.3-py2.py3-none-any.whl (348 kB)
Collecting numpy>=1.22.4
  Using cached numpy-2.2.6-cp310-cp310-macosx_14_0_arm64.whl (5.3 MB)
Collecting pytz>=2020.1
  Using cached pytz-2025.2-py2.py3-none-any.whl (509 kB)
Installing collected packages: pytz, tzdata, python-dotenv, numpy, pandas
Successfully installed numpy-2.2.6 pandas-2.3.3 python-dotenv-1.2.1 pytz-2025.2 tzdata-2025.3

[notice] A new release of pip available: 22.3.1 -> 25.3
[notice] To update, run: pip install --upgrade pip
Note: you may need to r

In [6]:
import praw
import pandas as pd
from datetime import datetime
import json
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Initialize Reddit API using environment variables
reddit = praw.Reddit(
    client_id=os.getenv('REDDIT_CLIENT_ID'),
    client_secret=os.getenv('REDDIT_CLIENT_SECRET'),
    username=os.getenv('REDDIT_USERNAME'),
    password=os.getenv('REDDIT_PASSWORD'),
    user_agent=os.getenv('REDDIT_USER_AGENT')
)

print(f"Read-only mode: {reddit.read_only}")
print("API initialized successfully!")

Read-only mode: False
API initialized successfully!
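
The client above is built from five environment variables, and a missing one may only show up later as an authentication error on the first request. A minimal sketch of a pre-flight check, assuming the same variable names as the cell above; the helper name `missing_env_vars` is hypothetical and not part of PRAW or python-dotenv.

```python
import os

# Variable names taken from the setup cell above.
REQUIRED_VARS = (
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USERNAME",
    "REDDIT_PASSWORD",
    "REDDIT_USER_AGENT",
)


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```

Running this right after `load_dotenv()` gives a readable message before any request is made.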


## 2. SUBREDDIT-LEVEL DATA
First, let's extract everything available at the subreddit level.

In [8]:
# Get the subreddit
subreddit = reddit.subreddit("learnpython")

# Extract ALL subreddit metadata
subreddit_data = {
    # Basic Info
    'display_name': subreddit.display_name,
    'title': subreddit.title,
    'description': subreddit.description,
    'public_description': subreddit.public_description,
    'subscribers': subreddit.subscribers,
    'active_user_count': getattr(subreddit, 'active_user_count', None),  # May not always be available
    
    # Dates & Times
    'created_utc': datetime.fromtimestamp(subreddit.created_utc),
    
    # Content Settings
    'over18': subreddit.over18,
    'subreddit_type': subreddit.subreddit_type,
    'allow_images': getattr(subreddit, 'allow_images', None),
    'allow_videogifs': getattr(subreddit, 'allow_videogifs', None),
    'allow_videos': getattr(subreddit, 'allow_videos', None),
    'allow_polls': getattr(subreddit, 'allow_polls', None),
    'allow_predictions': getattr(subreddit, 'allow_predictions', None),
    'allow_galleries': getattr(subreddit, 'allow_galleries', None),
    
    # Submission Settings
    'submission_type': subreddit.submission_type,
    'link_flair_enabled': subreddit.link_flair_enabled,
    'can_assign_link_flair': getattr(subreddit, 'can_assign_link_flair', None),
    'can_assign_user_flair': getattr(subreddit, 'can_assign_user_flair', None),
    
    # Community Settings
    'community_icon': subreddit.community_icon,
    'banner_img': getattr(subreddit, 'banner_img', None),
    'banner_background_image': getattr(subreddit, 'banner_background_image', None),
    'header_img': getattr(subreddit, 'header_img', None),
    'icon_img': getattr(subreddit, 'icon_img', None),
    'primary_color': getattr(subreddit, 'primary_color', None),
    'key_color': getattr(subreddit, 'key_color', None),
    
    # Rules & Restrictions
    'free_form_reports': getattr(subreddit, 'free_form_reports', None),
    'spoilers_enabled': getattr(subreddit, 'spoilers_enabled', None),
    'wiki_enabled': getattr(subreddit, 'wiki_enabled', None),
    
    # URLs
    'url': subreddit.url,
    'display_name_prefixed': subreddit.display_name_prefixed,
    
    # Language
    'lang': subreddit.lang,
    
    # Other
    'id': subreddit.id,
    'name': subreddit.name,
}

print("=" * 80)
print("SUBREDDIT METADATA")
print("=" * 80)
for key, value in subreddit_data.items():
    print(f"{key:30} : {value}")
    
print(f"\n✅ Found {len(subreddit_data)} subreddit attributes")

SUBREDDIT METADATA
display_name                   : learnpython
title                          : Python Education
description                    : *****************************

> [**Rules**](#icon-information)
> 
> 1: Be polite

> 2: Posts to this subreddit must be requests for help learning python.

> 3: Replies on this subreddit must be pertinent to the question OP asked. 

> 4: No replies copy / pasted from ChatGPT or similar.  

> 5: No advertising. No blogs/tutorials/videos/books/recruiting attempts.

> This means no posts advertising blogs/videos/tutorials/etc, no recruiting/hiring/seeking others posts. We're here to help, not to be advertised to. 

> Please, no "hit and run" posts, if you make a post, engage with people that answer you. Please do not delete your post after you get an answer, others might have a similar question or want to continue the conversation.

*****************************

> [**Learning resources**](#icon-information)

> Wiki and FAQ: [/r/learnpython/w/i
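
A note on the `created_utc` conversion used in this notebook: `datetime.fromtimestamp(ts)` without a `tz` argument returns a naive datetime in the machine's local timezone, so the stored value is only UTC when the notebook runs on a UTC machine. A small sketch of a timezone-aware alternative; the helper name `utc_datetime` is hypothetical.

```python
from datetime import datetime, timezone


def utc_datetime(epoch_seconds):
    """Convert a Unix timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


print(utc_datetime(0).isoformat())  # 1970-01-01T00:00:00+00:00
```

Swapping this in for the naive calls would change the displayed timestamps (they gain a `+00:00` suffix), so the outputs shown in this notebook reflect the naive version.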

### 2.1 Subreddit Rules

In [9]:
# Get all subreddit rules
rules_data = []
for rule in subreddit.rules:
    rules_data.append({
        'short_name': rule.short_name,
        'description': rule.description,
        'kind': rule.kind,
        'violation_reason': rule.violation_reason,
        'created_utc': datetime.fromtimestamp(rule.created_utc),
        'priority': rule.priority,
    })

rules_df = pd.DataFrame(rules_data)
print(f"Found {len(rules_df)} rules")
print("\n", rules_df.to_string())

Found 5 rules

                                                               short_name                                                                                                                                                                                                                                          description     kind                                                    violation_reason         created_utc  priority
0                                                             Be polite.                                                                                             Don't insult others; everyone comes to Python with a different level of knowledge and experience, and what is obvious to you may not be obvious to them.      all                                                          Be polite. 2022-05-19 15:01:00         0
1     Posts to this subreddit must be requests for help learning python.  Please check [the FAQ](https://www.reddit.com/r/learnpython/

### 2.2 Flair Templates

In [10]:
# Get link flair templates (iterating link_templates needs moderator access,
# so non-moderators should expect a 403 here)
link_flairs = []
try:
    for flair in subreddit.flair.link_templates:
        link_flairs.append({
            'id': flair['id'],
            'text': flair['text'],
            'css_class': flair.get('css_class', ''),
            'background_color': flair.get('background_color', ''),
            'text_color': flair.get('text_color', ''),
            'mod_only': flair.get('mod_only', False),
            'allowable_content': flair.get('allowable_content', ''),
            'max_emojis': flair.get('max_emojis', 0),
        })
    
    if link_flairs:
        flair_df = pd.DataFrame(link_flairs)
        print(f"Found {len(flair_df)} link flair templates")
        print("\n", flair_df.to_string())
    else:
        print("No link flair templates found")
except Exception as e:
    print(f"Could not retrieve flairs: {e}")

Could not retrieve flairs: received 403 HTTP response


## 3. POST-LEVEL DATA (THE BIG ONE!)
Now let's get EVERYTHING from posts. We'll grab a few posts and extract every single attribute.

In [12]:
# Get the current hot posts
posts = list(subreddit.hot(limit=5))
print(f"Retrieved {len(posts)} posts\n")

all_posts_data = []

for idx, post in enumerate(posts, 1):
    print(f"Processing post {idx}/{len(posts)}: {post.title[:50]}...")
    
    post_data = {
        # === BASIC INFO ===
        'id': post.id,
        'name': post.name,
        'title': post.title,
        'selftext': post.selftext,
        'author_name': str(post.author) if post.author else '[deleted]',
        
        # === SCORES & VOTING ===
        'score': post.score,
        'upvote_ratio': post.upvote_ratio,
        'ups': post.ups,
        'downs': post.downs,
        
        # === ENGAGEMENT METRICS ===
        'num_comments': post.num_comments,
        'num_crossposts': post.num_crossposts,
        
        # === AWARDS & GILDING ===
        'total_awards_received': post.total_awards_received,
        'all_awardings': len(post.all_awardings),
        'gilded': post.gilded,
        'gildings': post.gildings,
        
        # === TIMESTAMPS ===
        'created_utc': datetime.fromtimestamp(post.created_utc),
        'edited': post.edited if post.edited else False,
        
        # === POST TYPE & CONTENT ===
        'is_self': post.is_self,
        'is_video': post.is_video,
        'is_original_content': post.is_original_content,
        'is_reddit_media_domain': post.is_reddit_media_domain,
        'is_meta': post.is_meta,
        'is_robot_indexable': post.is_robot_indexable,
        
        # === MEDIA & LINKS ===
        'url': post.url,
        'domain': post.domain,
        'permalink': post.permalink,
        'full_link': f"https://reddit.com{post.permalink}",
        'thumbnail': post.thumbnail,
        'post_hint': getattr(post, 'post_hint', None),
        
        # === FLAIR ===
        'link_flair_text': post.link_flair_text,
        'link_flair_css_class': post.link_flair_css_class,
        'link_flair_background_color': post.link_flair_background_color,
        'link_flair_text_color': post.link_flair_text_color,
        'link_flair_type': post.link_flair_type,
        
        # === STATUS FLAGS ===
        'stickied': post.stickied,
        'pinned': post.pinned,
        'locked': post.locked,
        'archived': post.archived,
        'hidden': post.hidden,
        'saved': post.saved,
        'spoiler': post.spoiler,
        'over_18': post.over_18,
        'quarantine': post.quarantine,
        
        # === MODERATION ===
        'removed_by_category': post.removed_by_category,
        'approved': getattr(post, 'approved', None),
        'can_mod_post': post.can_mod_post,
        'mod_note': getattr(post, 'mod_note', None),
        'mod_reason_by': getattr(post, 'mod_reason_by', None),
        'mod_reason_title': getattr(post, 'mod_reason_title', None),
        'mod_reports': post.mod_reports,
        'user_reports': post.user_reports,
        'num_reports': getattr(post, 'num_reports', None),
        
        # === CONTEST & SPECIAL ===
        'contest_mode': post.contest_mode,
        'allow_live_comments': post.allow_live_comments,
        'treatment_tags': post.treatment_tags,
        
        # === SUBREDDIT INFO ===
        'subreddit': post.subreddit.display_name,
        'subreddit_id': post.subreddit_id,
        'subreddit_name_prefixed': post.subreddit_name_prefixed,
        'subreddit_subscribers': post.subreddit_subscribers,
        
        # === VISIBILITY ===
        'view_count': getattr(post, 'view_count', None),
        'visited': post.visited,
        'hide_score': post.hide_score,
        
        # === DISCUSSION ===
        'suggested_sort': post.suggested_sort,
        'can_gild': post.can_gild,
        'discussion_type': getattr(post, 'discussion_type', None),
        
        # === COLLECTIONS ===
        'collections': getattr(post, 'collections', None),
        
        # === PREVIEW ===
        'preview_enabled': hasattr(post, 'preview'),
        
        # === WHITELIST STATUS ===
        'whitelist_status': getattr(post, 'whitelist_status', None),
        'wls': getattr(post, 'wls', None),
        
        # === DISTINGUISHED ===
        'distinguished': post.distinguished,
        
        # === CATEGORY ===
        'category': getattr(post, 'category', None),
        'content_categories': getattr(post, 'content_categories', None),
    }
    
    all_posts_data.append(post_data)

posts_df = pd.DataFrame(all_posts_data)
print(f"\n✅ Extracted {len(posts_df.columns)} attributes from each post!")
print(f"\nColumns: {list(posts_df.columns)}")
posts_df

Retrieved 5 posts

Processing post 1/5: Ask Anything Monday - Weekly Thread...
Processing post 2/5: Ask Anything Monday - Weekly Thread...
Processing post 3/5: i wanna start to learn coding...
Processing post 4/5: How did you go about learning Python, and how long...
Processing post 5/5: Why does Spark spill to disk even with tons of mem...

✅ Extracted 72 attributes from each post!

Columns: ['id', 'name', 'title', 'selftext', 'author_name', 'score', 'upvote_ratio', 'ups', 'downs', 'num_comments', 'num_crossposts', 'total_awards_received', 'all_awardings', 'gilded', 'gildings', 'created_utc', 'edited', 'is_self', 'is_video', 'is_original_content', 'is_reddit_media_domain', 'is_meta', 'is_robot_indexable', 'url', 'domain', 'permalink', 'full_link', 'thumbnail', 'post_hint', 'link_flair_text', 'link_flair_css_class', 'link_flair_background_color', 'link_flair_text_color', 'link_flair_type', 'stickied', 'pinned', 'locked', 'archived', 'hidden', 'saved', 'spoiler', 'over_18', 'quarantine'

|  | id | name | title | selftext | author_name | score | upvote_ratio | ups | downs | num_comments | ... | suggested_sort | can_gild | discussion_type | collections | preview_enabled | whitelist_status | wls | distinguished | category | content_categories |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1pmt14q | t3_1pmt14q | Ask Anything Monday - Weekly Thread | Welcome to another /r/learnPython weekly "Ask ... | AutoModerator | 1 | 0.6 | 1 | 0 | 0 | ... | new | False |  |  | False |  | 6 |  |  |  |
| 1 | 1paxmgz | t3_1paxmgz | Ask Anything Monday - Weekly Thread | Welcome to another /r/learnPython weekly "Ask ... | AutoModerator | 3 | 0.81 | 3 | 0 | 9 | ... | new | False |  |  | False |  | 6 |  |  |  |
| 2 | 1pohs4v | t3_1pohs4v | i wanna start to learn coding | so i’ve heard that python is the best to start... | G2-118 | 4 | 0.7 | 4 | 0 | 7 | ... |  | False |  |  | False |  | 6 |  |  |  |
| 3 | 1po7kn6 | t3_1po7kn6 | How did you go about learning Python, and how ... | I recently transitioned from Cybersecurity to ... | Practical-Secret3344 | 11 | 0.78 | 11 | 0 | 11 | ... |  | False |  |  | False |  | 6 |  |  |  |
| 4 | 1pnxqkr | t3_1pnxqkr | Why does Spark spill to disk even with tons of... | i’m running a pretty big Apache Spark job. lot... | Familiar_Network_108 | 23 | 0.91 | 23 | 0 | 5 | ... |  | False |  |  | False |  | 6 |  |  |  |
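
The post cell mixes direct attribute access with `getattr(post, name, None)` for fields that may be absent. One way to keep that pattern uniform is a small helper that takes a list of attribute names. This is a sketch only: `extract_attrs` is a hypothetical helper, and the `SimpleNamespace` object stands in for a PRAW submission so the example runs without network access.

```python
from types import SimpleNamespace


def extract_attrs(obj, names, default=None):
    """Build a dict of attribute values, using `default` for missing ones."""
    return {name: getattr(obj, name, default) for name in names}


# Stand-in object for illustration; a real submission carries many more fields.
fake_post = SimpleNamespace(id="abc123", score=42)

row = extract_attrs(fake_post, ["id", "score", "post_hint"])
print(row)  # {'id': 'abc123', 'score': 42, 'post_hint': None}
```

The trade-off is that a typo in an attribute name silently yields `None` instead of raising, so direct access is still the safer choice for fields that should always exist.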


### 3.1 Detailed Award Information

In [13]:
# Extract detailed award information from posts
all_awards = []

for post in posts:
    if post.all_awardings:
        for award in post.all_awardings:
            all_awards.append({
                'post_id': post.id,
                'post_title': post.title[:50],
                'award_name': award['name'],
                'award_count': award['count'],
                'coin_price': award.get('coin_price', 0),
                'coin_reward': award.get('coin_reward', 0),
                'description': award.get('description', ''),
                'icon_url': award.get('icon_url', ''),
                'days_of_premium': award.get('days_of_premium', 0),
            })

if all_awards:
    awards_df = pd.DataFrame(all_awards)
    print(f"Found {len(awards_df)} awards across {len(posts)} posts")
    print(awards_df)
else:
    print("No awards found on these posts")

No awards found on these posts


### 3.2 Media & Gallery Information

In [14]:
# Extract media information
media_data = []

for post in posts:
    media_info = {
        'post_id': post.id,
        'post_title': post.title[:50],
        'is_video': post.is_video,
        'is_gallery': getattr(post, 'is_gallery', False),
        'thumbnail': post.thumbnail,
        'url': post.url,
        'domain': post.domain,
    }
    
    # Video details
    if post.is_video and hasattr(post, 'media') and post.media:
        reddit_video = post.media.get('reddit_video', {})
        media_info['video_duration'] = reddit_video.get('duration', None)
        media_info['video_height'] = reddit_video.get('height', None)
        media_info['video_width'] = reddit_video.get('width', None)
        media_info['video_fallback_url'] = reddit_video.get('fallback_url', None)
        media_info['video_bitrate_kbps'] = reddit_video.get('bitrate_kbps', None)
    
    # Gallery details
    if hasattr(post, 'is_gallery') and post.is_gallery:
        if hasattr(post, 'gallery_data'):
            media_info['gallery_item_count'] = len(post.gallery_data.get('items', []))
    
    # Preview images
    if hasattr(post, 'preview'):
        preview_images = post.preview.get('images', [])
        if preview_images:
            media_info['preview_image_count'] = len(preview_images)
            source = preview_images[0].get('source', {})
            media_info['preview_width'] = source.get('width', None)
            media_info['preview_height'] = source.get('height', None)
    
    media_data.append(media_info)

media_df = pd.DataFrame(media_data)
print(f"Media information for {len(media_df)} posts:")
media_df

Media information for 5 posts:


|  | post_id | post_title | is_video | is_gallery | thumbnail | url | domain |
|---|---|---|---|---|---|---|---|
| 0 | 1pmt14q | Ask Anything Monday - Weekly Thread | False | False | self | https://www.reddit.com/r/learnpython/comments/... | self.learnpython |
| 1 | 1paxmgz | Ask Anything Monday - Weekly Thread | False | False | self | https://www.reddit.com/r/learnpython/comments/... | self.learnpython |
| 2 | 1pohs4v | i wanna start to learn coding | False | False | self | https://www.reddit.com/r/learnpython/comments/... | self.learnpython |
| 3 | 1po7kn6 | How did you go about learning Python, and how ... | False | False | self | https://www.reddit.com/r/learnpython/comments/... | self.learnpython |
| 4 | 1pnxqkr | Why does Spark spill to disk even with tons of... | False | False | self | https://www.reddit.com/r/learnpython/comments/... | self.learnpython |
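
All five posts here are text posts, so the video, gallery, and preview branches never ran. The flags collected above can be folded into a single label per post, as in the sketch below. The function name `classify_post`, the label strings, and the precedence order are illustrative choices, and treating `post_hint == "image"` as an image post is an assumption about the values Reddit returns.

```python
from types import SimpleNamespace


def classify_post(post):
    """Return a coarse content label based on the flags used in the cell above."""
    if getattr(post, "is_self", False):
        return "text"
    if getattr(post, "is_video", False):
        return "video"
    if getattr(post, "is_gallery", False):
        return "gallery"
    if getattr(post, "post_hint", None) == "image":  # assumed hint value
        return "image"
    return "link"


# Stand-in objects for illustration; real submissions come from PRAW.
text_post = SimpleNamespace(is_self=True, is_video=False)
gallery_post = SimpleNamespace(is_self=False, is_video=False, is_gallery=True)

print(classify_post(text_post))     # text
print(classify_post(gallery_post))  # gallery
```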


## 4. COMMENT-LEVEL DATA (DEEP DIVE!)
Now let's extract EVERYTHING from comments within a post.

In [23]:
# Find a post with comments
target_post = None
for post in posts:
    if post.num_comments > 0:
        target_post = post
        break

if not target_post:
    print("⚠️ No posts with comments found in the current batch. Try getting more posts or a different subreddit.")
    # Create empty dataframe to prevent errors
    comments_df = pd.DataFrame()
else:
    print(f"Analyzing comments from: {target_post.title}")
    print(f"Post has {target_post.num_comments} comments\n")

    # Drop unexpanded "MoreComments" placeholders. limit=0 removes them without
    # fetching anything; use limit=None to load every comment (more API calls).
    target_post.comments.replace_more(limit=0)

    # Get all comments (flattened)
    all_comments = target_post.comments.list()
    print(f"Found {len(all_comments)} total comments (after expand)\n")

    # Extract EVERYTHING from comments
    comments_data = []

    # Process up to 20 comments for demo
    num_to_process = min(20, len(all_comments))
    
    for idx, comment in enumerate(all_comments[:num_to_process], 1):
        print(f"Processing comment {idx}/{num_to_process}...")
        
        comment_data = {
            # === BASIC INFO ===
            'id': comment.id,
            'name': comment.name,
            'body': comment.body[:100],  # First 100 chars
            'body_length': len(comment.body),
            'author_name': str(comment.author) if comment.author else '[deleted]',
            
            # === SCORES & VOTING ===
            'score': comment.score,
            'ups': comment.ups,
            'downs': comment.downs,
            'score_hidden': comment.score_hidden,
            
            # === AWARDS & GILDING ===
            'total_awards_received': comment.total_awards_received,
            'all_awardings_count': len(comment.all_awardings),
            'gilded': comment.gilded,
            'gildings': comment.gildings,
            
            # === TIMESTAMPS ===
            'created_utc': datetime.fromtimestamp(comment.created_utc),
            'edited': comment.edited if comment.edited else False,
            
            # === HIERARCHY & THREADING ===
            'parent_id': comment.parent_id,
            'link_id': comment.link_id,
            'depth': getattr(comment, 'depth', None),
            'is_root': comment.is_root,
            
            # === STATUS FLAGS ===
            'stickied': comment.stickied,
            'locked': getattr(comment, 'locked', False),
            'archived': comment.archived,
            'saved': comment.saved,
            'can_gild': comment.can_gild,
            
            # === MODERATION ===
            'distinguished': comment.distinguished,
            'removed': getattr(comment, 'removed', False),
            'approved': getattr(comment, 'approved', None),
            'mod_note': getattr(comment, 'mod_note', None),
            'mod_reason_by': getattr(comment, 'mod_reason_by', None),
            'mod_reason_title': getattr(comment, 'mod_reason_title', None),
            'mod_reports': comment.mod_reports,
            'user_reports': comment.user_reports,
            'num_reports': getattr(comment, 'num_reports', None),
            'can_mod_post': comment.can_mod_post,
            
            # === SPECIAL TYPES ===
            'is_submitter': comment.is_submitter,  # Is the post author
            'send_replies': comment.send_replies,
            'collapsed': comment.collapsed,
            'collapsed_reason': getattr(comment, 'collapsed_reason', None),
            'collapsed_because_crowd_control': getattr(comment, 'collapsed_because_crowd_control', None),
            
            # === SUBREDDIT INFO ===
            'subreddit': comment.subreddit.display_name,
            'subreddit_id': comment.subreddit_id,
            'subreddit_name_prefixed': comment.subreddit_name_prefixed,
            
            # === PERMALINK ===
            'permalink': comment.permalink,
            'full_link': f"https://reddit.com{comment.permalink}",
            
            # === CONTROVERSY ===
            'controversiality': comment.controversiality,
            
            # === TREATMENT ===
            'treatment_tags': comment.treatment_tags,
            
            # === ASSOCIATED POST ===
            'submission_title': comment.submission.title[:50],
            'submission_id': comment.submission.id,
        }
        
        comments_data.append(comment_data)

    comments_df = pd.DataFrame(comments_data)
    print(f"\n✅ Extracted {len(comments_df.columns)} attributes from each comment!")
    print(f"\nColumns: {list(comments_df.columns)}")
    
comments_df

Analyzing comments from: Ask Anything Monday - Weekly Thread
Post has 9 comments

Found 8 total comments (after expand)

Processing comment 1/8...
Processing comment 2/8...
Processing comment 3/8...
Processing comment 4/8...
Processing comment 5/8...
Processing comment 6/8...
Processing comment 7/8...
Processing comment 8/8...

✅ Extracted 48 attributes from each comment!

Columns: ['id', 'name', 'body', 'body_length', 'author_name', 'score', 'ups', 'downs', 'score_hidden', 'total_awards_received', 'all_awardings_count', 'gilded', 'gildings', 'created_utc', 'edited', 'parent_id', 'link_id', 'depth', 'is_root', 'stickied', 'locked', 'archived', 'saved', 'can_gild', 'distinguished', 'removed', 'approved', 'mod_note', 'mod_reason_by', 'mod_reason_title', 'mod_reports', 'user_reports', 'num_reports', 'can_mod_post', 'is_submitter', 'send_replies', 'collapsed', 'collapsed_reason', 'collapsed_because_crowd_control', 'subreddit', 'subreddit_id', 'subreddit_name_prefixed', 'permalink', 'full_l

|  | id | name | body | body_length | author_name | score | ups | downs | score_hidden | total_awards_received | ... | collapsed_because_crowd_control | subreddit | subreddit_id | subreddit_name_prefixed | permalink | full_link | controversiality | treatment_tags | submission_title | submission_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | nrosyi4 | t1_nrosyi4 | I am using Spyder with Anaconda in Windows for... | 1187 | iorgfeflkd | 1 | 1 | 0 | False | 0 | ... |  | learnpython | t5_2r8ot | r/learnpython | /r/learnpython/comments/1paxmgz/ask_anything_m... | https://reddit.com/r/learnpython/comments/1pax... | 0 | [] | Ask Anything Monday - Weekly Thread | 1paxmgz |
| 1 | nrqtoqe | t1_nrqtoqe | I heard the Automate The Boring Stuff course b... | 184 | redash12345 | 1 | 1 | 0 | False | 0 | ... |  | learnpython | t5_2r8ot | r/learnpython | /r/learnpython/comments/1paxmgz/ask_anything_m... | https://reddit.com/r/learnpython/comments/1pax... | 0 | [] | Ask Anything Monday - Weekly Thread | 1paxmgz |
| 2 | nrsou6g | t1_nrsou6g | Hello guys, beginner Python user here. \nAny m... | 528 | papupig | 1 | 1 | 0 | False | 0 | ... |  | learnpython | t5_2r8ot | r/learnpython | /r/learnpython/comments/1paxmgz/ask_anything_m... | https://reddit.com/r/learnpython/comments/1pax... | 0 | [] | Ask Anything Monday - Weekly Thread | 1paxmgz |
| 3 | nt6gmza | t1_nt6gmza | Hello everyone, I am a beginner in Python. I w... | 204 | Impressive_Ice_7083 | 1 | 1 | 0 | False | 0 | ... |  | learnpython | t5_2r8ot | r/learnpython | /r/learnpython/comments/1paxmgz/ask_anything_m... | https://reddit.com/r/learnpython/comments/1pax... | 0 | [] | Ask Anything Monday - Weekly Thread | 1paxmgz |
| 4 | nt7szai | t1_nt7szai | I started easing my way into coding about 4-5 ... | 1027 | [deleted] | 1 | 1 | 0 | False | 0 | ... |  | learnpython | t5_2r8ot | r/learnpython | /r/learnpython/comments/1paxmgz/ask_anything_m... | https://reddit.com/r/learnpython/comments/1pax... | 0 | [] | Ask Anything Monday - Weekly Thread | 1paxmgz |
| 5 | nrtie23 | t1_nrtie23 | Super odd. Grand that you posted the answer he... | 70 | CowboyBoats | 1 | 1 | 0 | False | 0 | ... |  | learnpython | t5_2r8ot | r/learnpython | /r/learnpython/comments/1paxmgz/ask_anything_m... | https://reddit.com/r/learnpython/comments/1pax... | 0 | [] | Ask Anything Monday - Weekly Thread | 1paxmgz |
| 6 | nrti65w | t1_nrti65w | > Hello guys, beginner Python user here. Any m... | 837 | CowboyBoats | 1 | 1 | 0 | False | 0 | ... |  | learnpython | t5_2r8ot | r/learnpython | /r/learnpython/comments/1paxmgz/ask_anything_m... | https://reddit.com/r/learnpython/comments/1pax... | 0 | [] | Ask Anything Monday - Weekly Thread | 1paxmgz |
| 7 | nteons7 | t1_nteons7 | YouTube is a great place to start! I'd keep se... | 239 | CowboyBoats | 1 | 1 | 0 | False | 0 | ... |  | learnpython | t5_2r8ot | r/learnpython | /r/learnpython/comments/1paxmgz/ask_anything_m... | https://reddit.com/r/learnpython/comments/1pax... | 0 | [] | Ask Anything Monday - Weekly Thread | 1paxmgz |
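
The `name`, `parent_id`, `link_id`, and `subreddit_id` columns hold Reddit "fullnames": a type prefix, an underscore, and a base-36 ID. In the output above, comments carry `t1_`, posts `t3_`, and the subreddit `t5_`. A short sketch of splitting a fullname into its kind and ID; the helper name `split_fullname` and the label strings are hypothetical.

```python
# Type prefixes used by Reddit fullnames.
KIND_BY_PREFIX = {
    "t1": "comment",
    "t2": "account",
    "t3": "submission",
    "t4": "message",
    "t5": "subreddit",
    "t6": "award",
}


def split_fullname(fullname):
    """Split a fullname such as 't1_nrosyi4' into (kind, base36_id)."""
    prefix, sep, base36_id = fullname.partition("_")
    if not sep or prefix not in KIND_BY_PREFIX or not base36_id:
        raise ValueError(f"Not a recognised fullname: {fullname!r}")
    return KIND_BY_PREFIX[prefix], base36_id


print(split_fullname("t1_nrosyi4"))  # ('comment', 'nrosyi4')
print(split_fullname("t3_1paxmgz"))  # ('submission', '1paxmgz')
```

This makes it easy to tell from `parent_id` alone whether a comment replies to the post (`t3_`) or to another comment (`t1_`).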


### 4.1 Comment Awards Detail

In [24]:
# Extract comment awards
comment_awards = []

for comment in all_comments[:20]:
    if comment.all_awardings:
        for award in comment.all_awardings:
            comment_awards.append({
                'comment_id': comment.id,
                'comment_body': comment.body[:50],
                'award_name': award['name'],
                'award_count': award['count'],
                'coin_price': award.get('coin_price', 0),
                'description': award.get('description', ''),
            })

if comment_awards:
    comment_awards_df = pd.DataFrame(comment_awards)
    print(f"Found {len(comment_awards_df)} awards on comments")
    print(comment_awards_df)
else:
    print("No awards found on these comments")

No awards found on these comments


### 4.2 Comment Thread Structure

In [17]:
# Visualize comment thread structure
def build_thread_tree(comment, depth=0, max_depth=3):
    """Recursively build comment thread structure"""
    if depth > max_depth:
        return []
    
    thread_data = [{
        'depth': depth,
        'comment_id': comment.id,
        'author': str(comment.author) if comment.author else '[deleted]',
        'score': comment.score,
        'body_preview': comment.body[:60].replace('\n', ' '),
        'num_replies': len(comment.replies) if hasattr(comment, 'replies') else 0,
        'is_submitter': comment.is_submitter,
    }]
    
    # Process replies
    if hasattr(comment, 'replies'):
        for reply in comment.replies[:3]:  # Limit to 3 replies per level
            if isinstance(reply, praw.models.Comment):
                thread_data.extend(build_thread_tree(reply, depth + 1, max_depth))
    
    return thread_data

# Build thread for top-level comments
thread_structure = []
for top_comment in target_post.comments[:3]:  # Top 3 comment threads
    if isinstance(top_comment, praw.models.Comment):
        thread_structure.extend(build_thread_tree(top_comment))

if thread_structure:
    thread_df = pd.DataFrame(thread_structure)
    print("Comment thread structure (first 3 threads):")
    print(thread_df.to_string())
else:
    print("No thread structure to display")

No thread structure to display
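
As an alternative to walking `comment.replies` recursively, thread depth can be derived from the flattened comment list, since each record already carries `name` and `parent_id`. The sketch below assumes records shaped like the rows extracted in section 4; the function name `compute_depths` and the sample IDs are made up for illustration.

```python
def compute_depths(records):
    """Map each comment fullname to its depth (0 = top-level)."""
    parent_of = {rec["name"]: rec["parent_id"] for rec in records}
    depths = {}

    def depth_of(name):
        if name not in depths:
            parent = parent_of[name]
            # A parent outside the record set (the t3_ submission, or a
            # comment that was not fetched) makes this comment a root here.
            depths[name] = depth_of(parent) + 1 if parent in parent_of else 0
        return depths[name]

    for rec in records:
        depth_of(rec["name"])
    return depths


# Hypothetical sample: two top-level comments, one with a nested reply chain.
sample = [
    {"name": "t1_aaa", "parent_id": "t3_post1"},
    {"name": "t1_bbb", "parent_id": "t1_aaa"},
    {"name": "t1_ccc", "parent_id": "t1_bbb"},
    {"name": "t1_ddd", "parent_id": "t3_post1"},
]
print(compute_depths(sample))
# {'t1_aaa': 0, 't1_bbb': 1, 't1_ccc': 2, 't1_ddd': 0}
```

The recursion is fine for demo-sized threads; very deep threads would call for an iterative version.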


## 5. USER-LEVEL DATA (Basic Info Only)
Extract basic information about users (post authors and comment authors).

In [18]:
# Collect unique users from posts and comments
unique_users = set()

# From posts
for post in posts:
    if post.author:
        unique_users.add(post.author.name)

# From comments
for comment in all_comments[:20]:
    if comment.author:
        unique_users.add(comment.author.name)

print(f"Found {len(unique_users)} unique users\n")

# Extract basic user info
users_data = []

for username in list(unique_users)[:10]:  # First 10 users for demo
    try:
        user = reddit.redditor(username)
        
        user_info = {
            'username': user.name,
            'id': user.id,
            'link_karma': user.link_karma,
            'comment_karma': user.comment_karma,
            'total_karma': user.link_karma + user.comment_karma,
            'created_utc': datetime.fromtimestamp(user.created_utc),
            'account_age_days': (datetime.now() - datetime.fromtimestamp(user.created_utc)).days,
            'is_gold': user.is_gold,
            'is_mod': user.is_mod,
            'is_employee': user.is_employee,
            'has_verified_email': user.has_verified_email,
            'icon_img': getattr(user, 'icon_img', None),
        }
        
        users_data.append(user_info)
        print(f"✓ {username}")
        
    except Exception as e:
        print(f"✗ {username}: {e}")

users_df = pd.DataFrame(users_data)
print(f"\n✅ Collected basic info for {len(users_df)} users")
users_df

Found 4 unique users

✓ Practical-Secret3344
✓ Familiar_Network_108
✓ AutoModerator
✓ G2-118

✅ Collected basic info for 4 users


|  | username | id | link_karma | comment_karma | total_karma | created_utc | account_age_days | is_gold | is_mod | is_employee | has_verified_email | icon_img |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Practical-Secret3344 | 1vvqmmc3yz | 22 | 1 | 23 | 2025-08-18 19:58:57 | 120 | False | False | False | True | https://www.redditstatic.com/avatars/defaults/... |
| 1 | Familiar_Network_108 | 1uh8zatwpm | 78 | 46 | 124 | 2025-07-28 04:39:59 | 141 | False | False | False | True | https://www.redditstatic.com/avatars/defaults/... |
| 2 | AutoModerator | 6l4z3 | 1000 | 1000 | 2000 | 2012-01-05 00:24:28 | 5094 | True | True | False | True | https://styles.redditmedia.com/t5_1yz875/style... |
| 3 | G2-118 | xv700dulp | 253 | 219 | 472 | 2024-04-08 04:00:09 | 617 | False | True | False | True | https://styles.redditmedia.com/t5_b9h0u4/style... |


## 6. AGGREGATE STATISTICS & INSIGHTS

In [20]:
# Generate comprehensive statistics
print("=" * 80)
print("COMPREHENSIVE REDDIT DATA SUMMARY")
print("=" * 80)

print(f"\n📊 SUBREDDIT: r/{subreddit.display_name}")
print(f"   Subscribers: {subreddit.subscribers:,}")
active_users = getattr(subreddit, 'active_user_count', None)
if active_users:
    print(f"   Active Users: {active_users:,}")
print(f"   Created: {datetime.fromtimestamp(subreddit.created_utc).strftime('%Y-%m-%d')}")

print(f"\n📝 POSTS ANALYZED: {len(posts)}")
if not posts_df.empty:
    print(f"   Total Score: {posts_df['score'].sum():,}")
    print(f"   Average Score: {posts_df['score'].mean():.2f}")
    print(f"   Total Comments: {posts_df['num_comments'].sum():,}")
    print(f"   Average Comments: {posts_df['num_comments'].mean():.2f}")
    print(f"   Total Awards: {posts_df['total_awards_received'].sum()}")
    print(f"   Average Upvote Ratio: {posts_df['upvote_ratio'].mean():.2%}")
    print(f"   Video Posts: {posts_df['is_video'].sum()}")
    print(f"   OC Posts: {posts_df['is_original_content'].sum()}")
    print(f"   Locked Posts: {posts_df['locked'].sum()}")
    print(f"   NSFW Posts: {posts_df['over_18'].sum()}")
else:
    print("   No post data available")

print(f"\n💬 COMMENTS ANALYZED: {len(comments_df)}")
if not comments_df.empty:
    print(f"   Total Score: {comments_df['score'].sum():,}")
    print(f"   Average Score: {comments_df['score'].mean():.2f}")
    print(f"   Total Awards: {comments_df['total_awards_received'].sum()}")
    print(f"   Gilded Comments: {comments_df['gilded'].sum()}")
    print(f"   Distinguished Comments: {comments_df['distinguished'].notna().sum()}")
    print(f"   Comments by OP: {comments_df['is_submitter'].sum()}")
    if comments_df['depth'].notna().sum() > 0:
        print(f"   Average Depth: {comments_df['depth'].mean():.2f}")
    print(f"   Root Comments: {comments_df['is_root'].sum()}")
else:
    print("   No comment data available - run the comment extraction cells first")

print(f"\n👥 USERS FOUND: {len(unique_users)}")
if not users_df.empty:
    print(f"   Total Karma (sample): {users_df['total_karma'].sum():,}")
    print(f"   Average Account Age: {users_df['account_age_days'].mean():.0f} days")
    print(f"   Gold Members: {users_df['is_gold'].sum()}")
    print(f"   Moderators: {users_df['is_mod'].sum()}")
else:
    print("   No user data available - run the user extraction cell first")

print("\n" + "=" * 80)
print("✅ DATA EXTRACTION COMPLETE!")
print("=" * 80)

COMPREHENSIVE REDDIT DATA SUMMARY

📊 SUBREDDIT: r/learnpython
   Subscribers: 980,532
   Created: 2009-10-02

📝 POSTS ANALYZED: 5
   Total Score: 42
   Average Score: 8.40
   Total Comments: 32
   Average Comments: 6.40
   Total Awards: 0
   Average Upvote Ratio: 76.00%
   Video Posts: 0
   OC Posts: 0
   Locked Posts: 0
   NSFW Posts: 0

💬 COMMENTS ANALYZED: 0
   No comment data available - run the comment extraction cells first

👥 USERS FOUND: 4
   Total Karma (sample): 2,619
   Average Account Age: 1493 days
   Gold Members: 1
   Moderators: 2

✅ DATA EXTRACTION COMPLETE!
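
The printed summary is convenient to read but hard to reuse. As a minimal sketch, the same post-level figures could be collected into a plain dict that later cells can log or export. The function name `summarize_posts` and the demo rows are hypothetical; only the column names (`score`, `num_comments`, `upvote_ratio`) come from the cell above.

```python
import pandas as pd


def summarize_posts(posts_df: pd.DataFrame) -> dict:
    """Collect post-level summary figures into a plain dict."""
    if posts_df.empty:
        return {}
    return {
        "posts": len(posts_df),
        "total_score": int(posts_df["score"].sum()),
        "avg_score": float(posts_df["score"].mean()),
        "total_comments": int(posts_df["num_comments"].sum()),
        "avg_comments": float(posts_df["num_comments"].mean()),
        "avg_upvote_ratio": float(posts_df["upvote_ratio"].mean()),
    }


# Made-up rows for illustration only; they are not the notebook's data.
demo = pd.DataFrame(
    {
        "score": [10, 20, 30],
        "num_comments": [1, 2, 3],
        "upvote_ratio": [0.5, 0.75, 1.0],
    }
)
summary = summarize_posts(demo)
print(summary)
```

Converting to built-in `int` and `float` keeps the dict serializable with `json.dump` without needing `default=str`.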


## 7. EXPORT DATA
Save all extracted data to files for further analysis.

In [21]:
# Track each file as it is written so the summary lists only files that exist
exported = []

# Export posts and comments to CSV, plus JSON for complete data preservation
for name, df in [('posts', posts_df), ('comments', comments_df)]:
    df.to_csv(f'reddit_{name}_data.csv', index=False)
    df.to_json(f'reddit_{name}_data.json', orient='records', indent=2)
    exported += [f'reddit_{name}_data.csv', f'reddit_{name}_data.json']

# Export users only if the user extraction cell produced data
if not users_df.empty:
    users_df.to_csv('reddit_users_data.csv', index=False)
    exported.append('reddit_users_data.csv')

# Save subreddit metadata (default=str serializes values json cannot encode natively)
with open('reddit_subreddit_data.json', 'w') as f:
    json.dump(subreddit_data, f, indent=2, default=str)
exported.append('reddit_subreddit_data.json')

print("✅ Data exported to:")
for path in exported:
    print(f"   - {path}")

✅ Data exported to:
   - reddit_posts_data.csv
   - reddit_posts_data.json
   - reddit_comments_data.csv
   - reddit_comments_data.json
   - reddit_users_data.csv
   - reddit_subreddit_data.json
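
A quick way to confirm an export worked is to reload the files and compare row counts. The sketch below does this in a temporary directory with a small made-up DataFrame; the helper name `roundtrip_row_counts` and the `demo_posts` file stem are hypothetical and not part of the notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def roundtrip_row_counts(df: pd.DataFrame, directory: Path, stem: str) -> dict:
    """Write df as CSV and JSON, reload both, and report the row counts."""
    csv_path = directory / f"{stem}.csv"
    json_path = directory / f"{stem}.json"
    df.to_csv(csv_path, index=False)
    df.to_json(json_path, orient="records", indent=2)
    return {
        "original": len(df),
        "csv": len(pd.read_csv(csv_path)),
        "json": len(pd.read_json(json_path, orient="records")),
    }


# Made-up rows for illustration only.
demo = pd.DataFrame({"id": ["a1", "b2"], "score": [3, 5]})
with tempfile.TemporaryDirectory() as tmp:
    counts = roundtrip_row_counts(demo, Path(tmp), "demo_posts")
print(counts)
```

Matching counts only show that no rows were lost. Column dtypes can still differ after a CSV round trip, for example timestamps coming back as strings.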


## 📋 COMPLETE ATTRIBUTE REFERENCE

### Subreddit Attributes (40+)
- **Basic**: name, title, description, subscribers, active users
- **Settings**: over18, subreddit_type, allow_images, allow_videos, etc.
- **Appearance**: icons, banners, colors
- **Features**: wiki_enabled, spoilers_enabled, flair settings
- **Moderation**: rules and moderation settings

### Post Attributes (70+)
- **Identity**: id, name, title, author
- **Content**: selftext, url, domain, media
- **Scores**: score, upvote_ratio, ups, downs
- **Engagement**: num_comments, num_crossposts
- **Awards**: total_awards_received, gildings, all_awardings
- **Timing**: created_utc, edited
- **Type**: is_self, is_video, is_original_content
- **Status**: locked, archived, stickied, pinned, hidden, saved, spoiler
- **Flair**: link_flair_text, colors, css_class
- **Moderation**: removed_by_category, mod_reports, user_reports
- **Special**: contest_mode, allow_live_comments, treatment_tags

### Comment Attributes (50+)
- **Identity**: id, name, author
- **Content**: body, body_html
- **Scores**: score, ups, downs, score_hidden
- **Awards**: total_awards_received, gilded, gildings
- **Hierarchy**: parent_id, depth, is_root
- **Status**: stickied, locked, archived, collapsed
- **Special**: is_submitter, controversiality, distinguished
- **Moderation**: mod_reports, user_reports, removed

### User Attributes (Basic)
- **Identity**: username, id, icon_img
- **Karma**: link_karma, comment_karma, total_karma
- **Timing**: created_utc, account_age_days
- **Status**: is_gold, is_mod, is_employee, has_verified_email
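
The lists above are a summary, and the exact attribute set can change as Reddit's API changes. One common way to see what an object actually carries is Python's built-in `vars()`. The sketch below wraps it in a hypothetical helper, `public_attributes`, and runs it on a stand-in object so it works offline. With PRAW, the same call should apply to a submission, comment, or redditor once its data has been fetched, since PRAW objects load their attributes lazily.

```python
from types import SimpleNamespace


def public_attributes(obj) -> list:
    """Return sorted instance attribute names, skipping private ones."""
    return sorted(name for name in vars(obj) if not name.startswith("_"))


# Stand-in object so the sketch runs without network access; with PRAW you
# would pass an object whose data has already been fetched.
fake_post = SimpleNamespace(id="abc123", title="Example", score=1, _fetched=True)
print(public_attributes(fake_post))
```

Counting the returned names for one real post, comment, and user is a simple way to check the approximate totals in the headings above.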

---

**🎯 This notebook demonstrates extraction of 170+ unique data points from Reddit (40+ subreddit, 70+ post, 50+ comment, and 12 user attributes)!**