# Data Mining
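To install the dependencies (praw for the Reddit API, python-dotenv for loading credentials from a `.env` file):
`pip install praw python-dotenv`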

## Data Source
Data is gathered from a variety of subreddits to ensure a mix of post topics and audience sizes.

To fetch data, I create an instance of the `praw.Reddit` class.
From there I send requests to the Reddit API to fetch posts from several popular subreddits.
To ensure a wide variety of success levels and avoid "survivorship bias" in the data, I request up to 6000 of the most recent posts from each subreddit, which should cover a long enough period to include both highly successful and less successful posts. Note that Reddit listings typically stop at roughly 1000 items, so the number actually returned per subreddit may be lower than the requested limit.
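
One way to check that the sample really covers a wide enough period is to compare the oldest and newest `created_utc` values. The sketch below is a minimal illustration, not part of the original notebook: the helper name `time_span_days` and the sample timestamps are made up for the example.

```python
from datetime import datetime, timezone


def time_span_days(created_utc_values):
    """Return the number of days between the oldest and newest timestamps."""
    stamps = [float(v) for v in created_utc_values]
    if not stamps:
        return 0.0
    oldest = datetime.fromtimestamp(min(stamps), tz=timezone.utc)
    newest = datetime.fromtimestamp(max(stamps), tz=timezone.utc)
    return (newest - oldest).total_seconds() / 86400


# Invented Unix timestamps for illustration: three posts, one day apart.
sample = [1_700_000_000, 1_700_086_400, 1_700_172_800]
print(time_span_days(sample))
```

A very short span for a busy subreddit would suggest that the newest posts have not yet had time to accumulate votes.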

I then write the fields I want to analyse to a CSV file using Python's `csv` module.
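
The parsing step can be sketched in isolation, without network access. This is a hedged illustration: `FIELDS`, `to_row`, and `fake_post` are hypothetical names, and the `SimpleNamespace` object only stands in for a praw submission. It shows how `csv.writer` quotes a title that contains commas and double quotes.

```python
import csv
import io
from types import SimpleNamespace

FIELDS = ["id", "title", "ups"]  # shortened, illustrative field list


def to_row(post, fields):
    """Pull the named attributes off a post object, in order."""
    return [getattr(post, name) for name in fields]


# Stand-in for a praw submission; the values are invented.
fake_post = SimpleNamespace(id="abc123", title='He said "hi", then left', ups=42)

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(FIELDS)
writer.writerow(to_row(fake_post, FIELDS))
csv_text = buffer.getvalue()
print(csv_text)
```

The title is wrapped in quotes and its inner quotes are doubled, so the row still parses back into exactly three fields.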

import praw
from dotenv import load_dotenv
import os
import csv

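# Load credentials from a .env file into environment variables.
# Caution: Windows already defines USERNAME (the login name), and load_dotenv()
# does not override existing variables by default, so pass override=True.
load_dotenv(override=True)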
APP_NAME = os.getenv("APP_NAME")
SECRET = os.getenv("SECRET")
CLIENT_ID = os.getenv("CLIENT_ID")
USERNAME = os.getenv("USERNAME")

USER_AGENT = f"windows:{APP_NAME}:v1.0 (by /u/{USERNAME})"


SUBREDDITS = ["funny", "todayilearned", "technology", "aww", "worldnews", "food", "gaming"]
VARIABLES = ["id", "title", "created_utc", "ups", "downs", "is_video", "selftext", "is_self"]


reddit = praw.Reddit(
    client_id=CLIENT_ID,
    client_secret=SECRET,
    user_agent=USER_AGENT
)

# Use the csv module so commas, quotes, and newlines in titles and post bodies are escaped correctly.
# getattr() reads each field by name from the submission object; see
# https://stackoverflow.com/questions/2425272/how-to-dynamically-access-class-properties-in-python
# newline="" prevents the csv module from writing blank rows on Windows.
with open("reddit_data.csv", "w", newline="", encoding="utf-8") as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(VARIABLES)
    for sub in SUBREDDITS:
        # .new() yields the most recent posts first; Reddit may return fewer than the requested limit.
        for submission in reddit.subreddit(sub).new(limit=6000):
            data = [getattr(submission, var) for var in VARIABLES]
            csv_writer.writerow(data)
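
Because the `csv` module writes every value as text, numbers and booleans need converting when the file is read back for analysis. The following is a minimal sketch under stated assumptions: `load_posts` is a hypothetical helper, and the demo file and its values are invented so the example runs without Reddit credentials.

```python
import csv
import os
import tempfile
from datetime import datetime, timezone


def load_posts(path):
    """Read the CSV back into a list of dicts with basic type conversion."""
    posts = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            row["ups"] = int(row["ups"])
            row["created"] = datetime.fromtimestamp(
                float(row["created_utc"]), tz=timezone.utc
            )
            # Booleans were written as the strings "True" / "False".
            row["is_self"] = row["is_self"] == "True"
            posts.append(row)
    return posts


# Illustration only: write a tiny file with invented values, then load it.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "created_utc", "ups", "selftext", "is_self"])
    writer.writerow(["abc123", 1700000000.0, 42, "line one\nline two", True])

demo_posts = load_posts(demo_path)
print(demo_posts[0]["ups"], demo_posts[0]["created"].year)
```

Opening the file with `newline=""` on both the write and the read keeps multi-line `selftext` values intact inside a single quoted field.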
