# Climate Change Posts Data Collection

This notebook downloads hot posts and keyword search results from the r/climate, r/environment, and r/sustainability subreddits using the Reddit API (PRAW) and saves them to a CSV file.

**Author**: Bhupinder jit Mehton                                                
**Created**: November 2, 2025  
**Purpose**: Collect climate change posts for analysis

## 1. Import Required Libraries

Import all necessary libraries for Reddit API access, CSV handling, and environment variable management.

In [37]:
%pip install praw python-dotenv



In [38]:
# -*- coding: utf-8 -*-
import praw
import csv
import os
from dotenv import load_dotenv

print("Libraries imported successfully!")

Libraries imported successfully!


## 2. Load Environment Variables

Load Reddit API credentials from the environment file.

In [39]:
# Load Reddit API credentials from a .env file
import os
from dotenv import dotenv_values

# Path to the .env file that holds the Reddit API credentials.
env_file_path = '/content/reddit_api_template.env'

# dotenv_values() returns the file's key/value pairs as a dict
# without modifying os.environ.
if os.path.exists(env_file_path):
    config = dotenv_values(env_file_path)
    print(f" Environment variables loaded from {env_file_path}!")
else:
    config = {}
    print(f"Error: '{env_file_path}' not found. Environment variables not loaded.")
    print("Please ensure the file exists at the path set in env_file_path.")

 Environment variables loaded from /content/reddit_api_template.env!
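
The authentication cell in Section 3 reads four keys from `config`, and `config.get()` silently returns `None` for any that are missing. A minimal sketch of a check that reports missing or empty keys before authenticating; the helper name `find_missing_keys` is hypothetical, and the key list is copied from that cell:

```python
# Keys read by the authentication cell (the password line is commented out there).
REQUIRED_KEYS = (
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USERNAME",
    "REDDIT_USER_AGENT",
)


def find_missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in config."""
    return [key for key in required if not config.get(key)]


# Example with a partially filled config (values are placeholders).
example_config = {"REDDIT_CLIENT_ID": "abc", "REDDIT_USER_AGENT": ""}
print(find_missing_keys(example_config))
```

Calling `find_missing_keys(config)` right after loading the file would surface an incomplete template before any request is sent to Reddit.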


## 3. Authenticate with Reddit API

Establish connection to Reddit using PRAW with the loaded credentials.

In [40]:
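# Create the Reddit client from the loaded credentials.
# No password is supplied, so PRAW runs in read-only mode, which is
# sufficient for reading public posts.
reddit = praw.Reddit(
    client_id=config.get('REDDIT_CLIENT_ID'),
    client_secret=config.get('REDDIT_CLIENT_SECRET'),
    username=config.get('REDDIT_USERNAME'),
    #password=config.get('REDDIT_PASSWORD'),
    user_agent=config.get('REDDIT_USER_AGENT')
)

# Creating the client does not contact Reddit yet; requests are sent lazily.
print("Reddit API authenticated successfully!")
# In read-only mode reddit.user.me() returns None.
print(f"Connected as: {reddit.user.me()}")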

Reddit API authenticated successfully!
Connected as: None
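
`Connected as: None` is what PRAW reports in read-only mode, the mode in effect when no password is passed. A minimal sketch of a clearer status message; `describe_session` is a hypothetical helper that relies only on the `read_only` attribute and the `user.me()` call of a `praw.Reddit` instance. The stand-in object exists only so the example runs without credentials:

```python
from types import SimpleNamespace


def describe_session(reddit):
    """Return a status line for a praw.Reddit-like client."""
    if reddit.read_only:
        # Read-only clients can browse public content but have no logged-in user.
        return "Read-only session (no logged-in user)"
    return f"Authenticated as: {reddit.user.me()}"


# Stand-in for a read-only praw.Reddit instance (assumption for this demo).
fake_reddit = SimpleNamespace(read_only=True, user=SimpleNamespace(me=lambda: None))
print(describe_session(fake_reddit))
```

In the notebook, the same function could be called with the real `reddit` object in place of `fake_reddit`.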


## 4. Define Data Collection Functions

Define functions that fetch hot posts and keyword search results from a list of subreddits, extract a fixed set of attributes from each post, and export the combined, de-duplicated data to a CSV file.

In [41]:
import praw
import pandas as pd
from datetime import datetime
import numpy as np



# Helper function: Extract post attributes safely

def extract_post_data(post, search_query=None):
    """Extracts relevant attributes from a Reddit post."""
    return {
        "title": getattr(post, "title", None),
        "score": getattr(post, "score", None),
        "upvote_ratio": getattr(post, "upvote_ratio", None),
        "num_comments": getattr(post, "num_comments", None),
        "author": str(getattr(post, "author", None)),
        "subreddit": str(getattr(post, "subreddit", None)),
        "url": getattr(post, "url", None),
        "permalink": f"https://www.reddit.com{getattr(post, 'permalink', None)}",
        "created_utc": getattr(post, "created_utc", None),
        "is_self": getattr(post, "is_self", None),
        "selftext": (getattr(post, "selftext", "")[:500] if getattr(post, "selftext", None) else None),
        "flair": getattr(post, "link_flair_text", None),
        "domain": getattr(post, "domain", None),
        "search_query": search_query
    }


# Fetch "Hot" Posts

def fetch_hot_posts(subreddits, limit=50):
    all_posts = []

    for subreddit_name in subreddits:
        subreddit = reddit.subreddit(subreddit_name)
        hot_posts = subreddit.hot(limit=limit)

        count = 0
        for post in hot_posts:
            all_posts.append(extract_post_data(post))
            count += 1

        print(f"Collected {count} posts from r/{subreddit_name}.")

    return all_posts


# Keyword-Based Search

def search_posts(query, subreddits, limit=50):
    search_results = []

    for subreddit_name in subreddits:
        subreddit = reddit.subreddit(subreddit_name)
        results = subreddit.search(query, limit=limit)

        count = 0
        for post in results:
            search_results.append(extract_post_data(post, search_query=query))
            count += 1

        print(f"Searched '{query}' in r/{subreddit_name}: collected {count} posts.")

    return search_results


# Data Export to CSV

def export_to_csv(data, filename="reddit_data.csv"):
    df = pd.DataFrame(data)
    df.drop_duplicates(subset=["permalink"], inplace=True)
    df.replace({np.nan: None}, inplace=True)
    df.to_csv(filename, index=False)
    print(f"\nSaved {len(df)} unique posts to {filename}.")


# Run the collection pipeline

if __name__ == "__main__":
    target_subreddits = ["climate", "environment", "sustainability"]

    # Fetch hot posts
    hot_data = fetch_hot_posts(target_subreddits, limit=50)

    # Search posts by keyword (example: "global warming")
    keyword_data = search_posts("global warming", target_subreddits, limit=50)

    # Combine both datasets
    all_data = hot_data + keyword_data

    # Export to CSV
    export_to_csv(all_data, "reddit_climate_data.csv")


It is strongly recommended to use Async PRAW: https://asyncpraw.readthedocs.io.
See https://praw.readthedocs.io/en/latest/getting_started/multiple_instances.html#discord-bots-and-asynchronous-environments for more info.

Collected 50 posts from r/climate.


Collected 50 posts from r/environment.


Collected 50 posts from r/sustainability.


Searched 'global warming' in r/climate: collected 50 posts.


Searched 'global warming' in r/environment: collected 50 posts.
Searched 'global warming' in r/sustainability: collected 50 posts.

Saved 300 unique posts to reddit_climate_data.csv.
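
The `created_utc` column is stored as seconds since the Unix epoch, which is hard to read in the CSV. A minimal sketch, assuming a DataFrame with that column, of adding a timezone-aware timestamp column with pandas; the column name `created_at` and the sample rows are made up for illustration:

```python
import pandas as pd

# Two made-up rows using the same created_utc format (epoch seconds as floats).
sample = pd.DataFrame(
    {
        "title": ["post a", "post b"],
        "created_utc": [1761955200.0, 1762041600.0],
    }
)

# unit="s" interprets the values as seconds; utc=True makes the result tz-aware.
sample["created_at"] = pd.to_datetime(sample["created_utc"], unit="s", utc=True)

print(sample[["title", "created_at"]])
```

The same conversion could be applied inside `export_to_csv` before `to_csv`, or later when the file is loaded for analysis.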


## 5. Verify Results

Check the created CSV file and display some sample data using pandas.
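
Beyond overall statistics, a per-subreddit breakdown shows whether each source contributed as expected. A minimal sketch using the `subreddit` and `score` columns of the exported data; the rows below are made-up stand-ins for the contents of `reddit_climate_data.csv`:

```python
import pandas as pd

# Made-up rows with the same column names as the exported CSV.
posts = pd.DataFrame(
    {
        "subreddit": ["climate", "climate", "environment", "sustainability"],
        "score": [10, 30, 5, 7],
    }
)

# Count posts and summarize scores for each subreddit.
summary = posts.groupby("subreddit")["score"].agg(["count", "mean", "max"])
print(summary)
```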

In [42]:
import os
import pandas as pd

# Path of the CSV written by export_to_csv (assumes the working directory is /content)
csv_path = '/content/reddit_climate_data.csv'

# Check if file exists and load data
if os.path.exists(csv_path):
    # Load the CSV file
    df = pd.read_csv(csv_path)

    print(" Dataset Overview:")
    print(f"   Total posts: {len(df)}")
    print(f"   Columns: {list(df.columns)}")
    print(f"   File size: {os.path.getsize(csv_path)} bytes")

    print("\n Sample Posts:")
    print("=" * 50)

    # Display first 3 posts
    for i, row in df.head(3).iterrows():
        print(f"\nPost {i+1}:")
        print(f"Title: {row['title'][:80]}...")
        print(f"Author: {row['author']}")
        print(f"Score: {row['score']}")
        print(f"URL: {row['url']}")
        print("-" * 30)

    # Basic statistics
    print("\n Basic Statistics:")
    print(f"   Average score: {df['score'].mean():.2f}")
    print(f"   Highest score: {df['score'].max()}")
    print(f"   Lowest score: {df['score'].min()}")

else:
    print(f"File '{csv_path}' not found!")


 Dataset Overview:
   Total posts: 300
   Columns: ['title', 'score', 'upvote_ratio', 'num_comments', 'author', 'subreddit', 'url', 'permalink', 'created_utc', 'is_self', 'selftext', 'flair', 'domain', 'search_query']
   File size: 138019 bytes

 Sample Posts:

Post 1:
Title: How to get involved with a local group to create the political will for climate ...
Author: silence7
Score: 1499
URL: https://www.reddit.com/r/climate/comments/b49xgi/how_to_get_involved_with_a_local_group_to_create/
------------------------------

Post 2:
Title: Bill Gates Says Climate Change ‘Will Not Lead to Humanity’s Demise’. In a memo, ...
Author: coolbern
Score: 208
URL: https://www.nytimes.com/2025/10/28/climate/bill-gates-climate-change-humanity.html?unlocked_article_code=1.yE8.njbJ.jmwXlb3yEWw-&smid=url-share
------------------------------

Post 3:
Title: US accused of ‘bully-boy’ tactics to sink climate deal | Officials say ‘threats’...
Author: silence7
Score: 396
URL: https://www.ft.com/content/4e0a9a3