## Business Case
The business case: a video game company wants to investigate the popularity of the Epic Games Store. The company plans to use the results to inform future business decisions, in particular whether to make an exclusivity deal with the Epic Games Store for its upcoming title. We approach the problem by analysing the sentiment of posts in the Epic Games Store subreddit (r/EpicGamesPC: https://www.reddit.com/r/EpicGamesPC/ ).

In [1]:
import datetime

import praw
import pandas as pd

from keys import client_id, client_secret

### Collecting the posts for our topic

Initializing a Reddit Instance

In [2]:
reddit = praw.Reddit( client_id=client_id,
                      client_secret=client_secret,
                      user_agent='web-app:sentimentAnalysis:v1 (by /u/sakyawira)')
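The `keys` module imported in the first cell is not shown in this notebook. As one possible alternative, the credentials could be read from environment variables so that they never end up in a file under version control. The sketch below assumes two hypothetical variable names, `REDDIT_CLIENT_ID` and `REDDIT_CLIENT_SECRET`, and a hypothetical helper `load_reddit_credentials`; neither is part of the original notebook or of PRAW.

```python
import os


def load_reddit_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables.

    The variable names are assumptions made for this sketch.
    """
    try:
        return env["REDDIT_CLIENT_ID"], env["REDDIT_CLIENT_SECRET"]
    except KeyError as missing:
        # Fail early with a readable message instead of a bare KeyError
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Demonstration with an in-memory mapping instead of the real environment
demo_env = {"REDDIT_CLIENT_ID": "abc", "REDDIT_CLIENT_SECRET": "xyz"}
client_id, client_secret = load_reddit_credentials(demo_env)
```

The returned pair can then be passed to `praw.Reddit` exactly as in the cell above.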

Calling the API and building a dataframe from it

In [3]:
EGS = reddit.subreddit('EpicGamesPC')

# Select the top 500 posts, with their title, URL, body, upvotes, timestamp, 
# and an index that serves as a key between the posts and the comments we collect later.
posts = []
for index, post in enumerate(EGS.top(limit=500)):
    posts.append([post.title, "https://www.reddit.com" + post.permalink, post.selftext, post.score, post.created_utc, index])

# Convert to DataFrame
posts = pd.DataFrame(posts, columns=['Title', 'URL', 'Body', 'Upvotes', 'Time', 'Key'])

# Convert the Unix epoch seconds in created_utc to a datetime
# (fromtimestamp returns the time in this machine's local time zone)
posts.Time = posts.Time.apply(lambda x: pd.to_datetime(datetime.datetime.fromtimestamp(x)))

# The first post is a sticky, so we can drop it
posts = posts.iloc[1:]
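Because `datetime.datetime.fromtimestamp(x)` without a `tz` argument interprets the epoch value in the local time zone of the machine running the notebook, the `Time` column can differ between machines. If reproducible UTC timestamps are preferred, a minimal sketch might look like the following; the helper name `epoch_to_utc` and the example epoch value are hypothetical and used only for illustration.

```python
import datetime


def epoch_to_utc(seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


# Illustrative epoch value, not taken from the collected data
example = epoch_to_utc(1600000000)
print(example.isoformat())
```

Applied to the DataFrame, the same helper could replace the lambda body, for example `posts.Time.apply(epoch_to_utc)`.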

In [4]:
posts.head(3)

| | Title | URL | Body | Upvotes | Time | Key |
|---|---|---|---|---|---|---|
| 1 | That time of the week again boys | https://www.reddit.com/r/EpicGamesPC/comments/... | | 1050 | 2020-05-22 04:34:25 | 1 |
| 2 | Bonjour 🐻 | https://www.reddit.com/r/EpicGamesPC/comments/... | | 853 | 2020-08-06 17:03:10 | 2 |
| 3 | It's been a week, and I'm impatient | https://www.reddit.com/r/EpicGamesPC/comments/... | | 823 | 2020-06-26 03:31:24 | 3 |


In [5]:
posts.shape

(499, 6)

### Collecting the comments for each of our posts

We want to get all the comments for the posts we collected

In [6]:
def collect_replies(key, url):
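    '''
    Collect every comment (top-level and nested) of a single post.
    params key: the Key of the post, stored with each comment so it can be matched to its post later
    params url: the full URL of the post
    Returns a pandas DataFrame, where each row represents an individual comment
    '''
    submission = reddit.submission(url=url)
    # Resolve all "load more comments" placeholders so the full tree is available
    submission.comments.replace_more(limit=None)
    # Start the queue with a copy of the top-level comments
    comment_queue = submission.comments[:]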

    table = {'Reply':[], 'Upvote':[], 'Time':[], 'Key':[]}

    while comment_queue:
        comment = comment_queue.pop(0)
        table['Reply'].append(comment.body)
        table['Time'].append(comment.created_utc)
        table['Upvote'].append(comment.score)
        table['Key'].append(key)
        comment_queue.extend(comment.replies)
    
    return pd.DataFrame.from_dict(table)
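The loop in `collect_replies` is a breadth-first traversal of the comment tree: each comment is taken from the front of the queue and its replies are appended to the back. The sketch below reproduces only that traversal on a small in-memory tree so it can be run without Reddit credentials. `FakeComment` and `flatten_breadth_first` are hypothetical stand-ins, not PRAW classes, and `collections.deque` is swapped in for the plain list because `popleft()` avoids the shifting cost of `list.pop(0)`.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FakeComment:
    """Hypothetical stand-in for a Reddit comment with nested replies."""
    body: str
    replies: list = field(default_factory=list)


def flatten_breadth_first(top_level):
    """Return comment bodies in breadth-first order."""
    queue = deque(top_level)
    bodies = []
    while queue:
        comment = queue.popleft()      # take the oldest queued comment
        bodies.append(comment.body)
        queue.extend(comment.replies)  # visit its replies after the current level
    return bodies


thread = [
    FakeComment("a", [FakeComment("a1", [FakeComment("a1x")]), FakeComment("a2")]),
    FakeComment("b", [FakeComment("b1")]),
]
order = flatten_breadth_first(thread)
```

Top-level comments come first, followed by their direct replies, then deeper levels, which matches the row order produced for each post by the function above.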

Now that the function is defined, we can build our DataFrame of comments. Using a list comprehension may speed things up slightly compared with an explicit loop.

In [7]:
# Generate a list of tuples that contains the Key and URL for each row - 
# the first value of the tuple is the Key, and the second value is the URL

keys = posts.Key.tolist()
urls = posts.URL.tolist()
key_url_pairs = list(zip(keys, urls))

# Generate the 'Comments' DataFrame using a list comprehension
comments = pd.concat([collect_replies(key, url) for key, url in key_url_pairs])

In [8]:
# Convert the Unix epoch seconds in created_utc to a datetime
# (fromtimestamp returns the time in this machine's local time zone)
comments.Time = comments.Time.apply(lambda x: pd.to_datetime(datetime.datetime.fromtimestamp(x)))

In [9]:
comments.head(3)

| | Reply | Upvote | Time | Key |
|---|---|---|---|---|
| 0 | I just did my first purchase from epic, I got ... | 24.0 | 2020-05-22 12:37:51 | 1.0 |
| 1 | Actually i already bought 3 games on epic:\nTh... | 8.0 | 2020-05-22 12:45:06 | 1.0 |
| 2 | me who got the 80th game for free :) | 7.0 | 2020-05-22 14:11:36 | 1.0 |


In [10]:
comments.shape

(24781, 4)

In [11]:
comments.to_csv('CommentsEpic.csv', index=False)

In [12]:
posts.to_csv('PostsEpic.csv', index=False)
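The `Key` column is what links each comment back to its post. As a rough illustration of how the two saved tables could be recombined later, the sketch below joins two tiny hand-made DataFrames on `Key`. The toy rows are invented for the example and are not taken from the collected data; only the column names follow the notebook.

```python
import pandas as pd

# Toy stand-ins for the posts and comments tables (invented rows)
toy_posts = pd.DataFrame(
    {"Title": ["First post", "Second post"], "Upvotes": [10, 5], "Key": [1, 2]}
)
toy_comments = pd.DataFrame(
    {"Reply": ["nice", "agreed", "hmm"], "Upvote": [3, 1, 2], "Key": [1, 1, 2]}
)

# Attach each comment to the title of its post; many comments map to one post
joined = toy_comments.merge(
    toy_posts[["Key", "Title"]], on="Key", how="left", validate="many_to_one"
)

# Count how many comments each post received
comments_per_post = joined.groupby("Title")["Reply"].count()
```

With the real data, the same join could be run after reading `PostsEpic.csv` and `CommentsEpic.csv` back with `pd.read_csv`.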