### Aim

* The code first collects the average number of comments per post for each of the subreddits listed below
* It then compares each subreddit with the other subreddits in its own niche
* It also calculates, for each niche, the average of the comments-per-post figures of its subreddits, and compares the niches with appropriate graphs
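Read literally, the third aim is a mean of the per-subreddit averages. The calculation can be written as follows, using $C_s$ for the total comments and $P_s$ for the posts scanned in subreddit $s$, and $S_n$ for the set of subreddits in niche $n$ (notation introduced here for illustration only):

```latex
\bar{c}_s = \frac{C_s}{P_s}, \qquad
\bar{c}_n = \frac{1}{|S_n|} \sum_{s \in S_n} \bar{c}_s
```

When every subreddit is scanned with the same number of posts $P$, this equals the pooled ratio $\sum_{s \in S_n} C_s \,/\, (|S_n| P)$. If some subreddits return fewer posts than the limit, the two figures can differ.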

### Niches

In [1]:
subreddits = [
    # Technology
    "programming",
    "machinelearning",
    "technology",
    # Gaming
    "gaming",
    "Minecraft",
    "leagueoflegends",
    # Fitness & Health
    "fitness",
    "running",
    "nutrition",
    ]
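The comments inside the list record each subreddit's niche only for human readers. Here is a minimal sketch of one alternative, which assumes a hypothetical `NICHES` dictionary that keeps the niche labels in the data so that later cells can group by them:

```python
# Hypothetical mapping of niche name -> subreddits (same entries as the list above)
NICHES = {
    "Technology": ["programming", "machinelearning", "technology"],
    "Gaming": ["gaming", "Minecraft", "leagueoflegends"],
    "Fitness & Health": ["fitness", "running", "nutrition"],
}

# Flatten into the same ordered list of subreddit names used by the rest of the notebook
subreddits = [sub for subs in NICHES.values() for sub in subs]

# Reverse lookup: subreddit name -> niche, useful when labelling results later
niche_of = {sub: niche for niche, subs in NICHES.items() for sub in subs}
```

With this layout, adding a niche only requires one new dictionary entry.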

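Number of posts to fetch from each subreddit. Increasing this value should make the averages more representative, at the cost of more API requests. Reddit listings stop at roughly 1,000 items, so values above that have no further effect.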

In [2]:
post_limit = 100

Currently there are only 3 niches with 3 subreddits each. This is a small sample, and more subreddits and niches will be added in the future.

### Imports

PRAW (the Python Reddit API Wrapper) is a Python wrapper around Reddit's official API. It will be used to collect the necessary data from each subreddit.

In [3]:
import praw

`python-dotenv` will be used to load the environment variables from the `.env` file saved in the root directory

In [4]:
from dotenv import load_dotenv
import os

Pandas is used to convert the collected data into a DataFrame and export it as a CSV file

In [5]:
import pandas as pd

The client ID, client secret and Reddit username have been saved in `.env` and are loaded into the script with `dotenv`. The user agent is built from the username.

In [6]:
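load_dotenv(override=True)  # let .env values take precedence, e.g. over the USERNAME variable that Windows sets automatically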

True

Accessing the variables

In [7]:
client_id = os.getenv("CLIENT_ID")
client_secret = os.getenv("CLIENT_SECRET")
username = os.getenv("USERNAME")
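`os.getenv` returns `None` for a variable that is missing from both `.env` and the environment. That problem usually only surfaces later, as an authentication error. The sketch below fails early instead. The `require_env` helper is hypothetical and is not part of `python-dotenv`:

```python
import os


def require_env(*names):
    """Return the values of the named environment variables, raising if any are unset or empty."""
    # Hypothetical helper: collect every missing name so the error lists them all at once
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return [os.environ[name] for name in names]


# Usage (after load_dotenv()):
# client_id, client_secret, username = require_env("CLIENT_ID", "CLIENT_SECRET", "USERNAME")
```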

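Create the main `praw.Reddit` instance using the `.env` variables. Without a Reddit password, PRAW runs in read-only mode, which is enough for reading public posts.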

In [8]:
reddit = praw.Reddit(
    client_id=client_id,
    client_secret=client_secret,
    user_agent=f"SubredditActivityComparator by /u/{username}",
)
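Reddit's API rules ask clients to send a descriptive user agent of the form `<platform>:<app ID>:<version string> (by /u/<reddit username>)`. The user agent above follows this pattern only loosely. The sketch below builds the recommended form. `build_user_agent` is a hypothetical helper, and the app ID, version and username are placeholder values:

```python
def build_user_agent(platform, app_id, version, username):
    """Format a user agent following the pattern recommended in Reddit's API rules."""
    # Pattern: <platform>:<app ID>:<version string> (by /u/<reddit username>)
    return f"{platform}:{app_id}:{version} (by /u/{username})"


# Example with placeholder values for the app ID, version and username
ua = build_user_agent("python", "SubredditActivityComparator", "v0.1", "example_user")
```

The resulting string would then be passed as `user_agent=ua` when creating `praw.Reddit`.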

## Getting and storing the data

The data is stored in a list of dictionaries, with one dictionary per subreddit. It can be visualised as a table:

| Subreddit | Posts Scanned | Total Comments | Average Comments |
|-----------|---------------|----------------|------------------|
| technology | ... | ... | ... |
| gaming | ... | ... | ... |
The code below loops through each subreddit, fetches its newest posts, calculates the total comments and the average comments per post, and appends the results to the `data` list.


In [9]:
data = []  # List storing one dictionary of statistics per subreddit

for sub in subreddits:  # Loop through each subreddit
    subreddit = reddit.subreddit(sub)

    # Fetch the newest posts, up to post_limit
    posts = list(subreddit.new(limit=post_limit))

    total_posts = len(posts)
    total_comments = sum(post.num_comments for post in posts)
    # Average comments per post; guard against division by zero for an empty subreddit
    avg_comments = total_comments / total_posts if total_posts > 0 else 0

    # Append the dictionary to the data list
    data.append({
        "Subreddit": sub,  # store the name string rather than the PRAW object
        "Posts Scanned": total_posts,  # actual number fetched, which may be below post_limit
        "Total Comments": total_comments,
        "Average Comments": avg_comments,
    })
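The loop body can also be moved into a function, so the calculation can be tested without calling the Reddit API. This sketch assumes each post only needs a `num_comments` attribute, which PRAW `Submission` objects provide. `summarise_posts` is a hypothetical name:

```python
from types import SimpleNamespace


def summarise_posts(name, posts):
    """Build one row of statistics from an iterable of posts exposing `num_comments`."""
    posts = list(posts)  # materialise so the posts can be both counted and summed
    total_posts = len(posts)
    total_comments = sum(post.num_comments for post in posts)
    # Guard against an empty subreddit to avoid dividing by zero
    avg_comments = total_comments / total_posts if total_posts else 0.0
    return {
        "Subreddit": name,
        "Posts Scanned": total_posts,
        "Total Comments": total_comments,
        "Average Comments": avg_comments,
    }


# Offline check with stand-in posts (SimpleNamespace mimics the num_comments attribute)
fake_posts = [SimpleNamespace(num_comments=n) for n in (4, 0, 8)]
row = summarise_posts("example", fake_posts)
```

Inside the loop, the body would then reduce to `data.append(summarise_posts(sub, subreddit.new(limit=post_limit)))`.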

Convert the list to a data frame

In [10]:
df = pd.DataFrame(data)
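The niche comparison in the aims needs a niche label for each row. The sketch below computes the mean of the per-subreddit averages per niche with `groupby`. It assumes a hypothetical `niche_of` dictionary (subreddit name to niche) and an average column named `Average Comments`:

```python
import pandas as pd


def niche_averages(df, niche_of, column="Average Comments"):
    """Mean of the per-subreddit averages within each niche, highest first.

    `niche_of` is a hypothetical dict mapping subreddit name -> niche label.
    """
    # Look up each row's niche from its subreddit name (str() also covers PRAW Subreddit objects)
    labelled = df.assign(Niche=df["Subreddit"].astype(str).map(niche_of))
    # Average the per-subreddit averages; rows without a niche are dropped by groupby
    return labelled.groupby("Niche")[column].mean().sort_values(ascending=False)
```

Calling `niche_averages(df, niche_of)` returns a Series indexed by niche, which can feed the comparison graphs directly.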

Export to CSV

In [11]:
os.makedirs("../data/raw", exist_ok=True)  # create the output folder if it does not exist yet
df.to_csv("../data/raw/subreddit_stats.csv", index=False)