In this project, I plan to calculate average sentiment scores week by week across January to March 2021, particularly for the weeks leading up to the immense rise in the GME stock. GME is the ticker symbol for GameStop, whose stock rose sharply in early 2021. I want to see whether the estimated sentiment of the posts that mention GME reflects the rise in the stock's value over time.

In [None]:
from bs4 import BeautifulSoup
import requests

Next, we set up a client for the Pushshift API (through the psaw package) so that we can search subreddits.

In [None]:
%pip install psaw

import pandas as pd
# Widen pandas display limits so long post text and many columns stay visible
pd.set_option('display.max_colwidth', 500)
pd.set_option('display.max_columns', 50)

from psaw import PushshiftAPI

# Initialize PushShift
api = PushshiftAPI()

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting psaw
  Downloading psaw-0.1.0-py3-none-any.whl (15 kB)
Installing collected packages: psaw
Successfully installed psaw-0.1.0


To query this data set by date, we also need to convert each week's start and end dates into Unix timestamps, which is the format the search function takes for "after" and "before".

In [None]:
import datetime as dt
start_Jan1 = int(dt.datetime(2021, 1, 1).timestamp())
end_Jan1 = int(dt.datetime(2021, 1, 7).timestamp())

GMEJan1_generator = api.search_submissions(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)',
        subreddit = "WallStreetBets", after = start_Jan1, before = end_Jan1)
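One thing to watch with the timestamps above: a window that ends at midnight on January 7 stops before that day's posts, and the next window starts on January 8, so January 7 falls into neither week. Below is a minimal sketch of one way to build the windows so that each one ends exactly where the next begins. The helper names (to_utc_epoch, week_windows) are hypothetical, and pinning the dates to UTC is an assumption; whether the API treats its boundaries as inclusive would still need checking.

```python
import datetime as dt

def to_utc_epoch(day):
    """Convert a calendar date to a Unix timestamp at 00:00 UTC."""
    moment = dt.datetime(day.year, day.month, day.day, tzinfo=dt.timezone.utc)
    return int(moment.timestamp())

def week_windows(boundaries):
    """Turn consecutive boundary dates into (after, before) epoch pairs.

    Each window ends where the next one starts, so no day is skipped.
    """
    return [(to_utc_epoch(a), to_utc_epoch(b))
            for a, b in zip(boundaries, boundaries[1:])]

# January 2021 boundaries matching the four weeks used in this notebook
january = [dt.date(2021, 1, 1), dt.date(2021, 1, 8), dt.date(2021, 1, 15),
           dt.date(2021, 1, 22), dt.date(2021, 2, 1)]
windows = week_windows(january)

for after_ts, before_ts in windows:
    print(after_ts, before_ts)
```

Each pair could then be passed as the "after" and "before" arguments of the search call.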

In [None]:
GMEJan1_Submissions = pd.DataFrame([submission.d_ for submission in GMEJan1_generator])
GMEJan1_Submissions.columns

Index(['all_awardings', 'allow_live_comments', 'author',
       'author_flair_css_class', 'author_flair_richtext', 'author_flair_text',
       'author_flair_type', 'author_fullname', 'author_patreon_flair',
       'author_premium', 'awarders', 'can_mod_post', 'contest_mode',
       'created_utc', 'domain', 'full_link', 'gildings', 'id',
       'is_crosspostable', 'is_meta', 'is_original_content',
       'is_reddit_media_domain', 'is_robot_indexable', 'is_self', 'is_video',
       'link_flair_background_color', 'link_flair_css_class',
       'link_flair_richtext', 'link_flair_template_id', 'link_flair_text',
       'link_flair_text_color', 'link_flair_type', 'locked', 'media_only',
       'no_follow', 'num_comments', 'num_crossposts', 'over_18',
       'parent_whitelist_status', 'permalink', 'pinned', 'post_hint',
       'preview', 'pwls', 'removed_by_category', 'retrieved_on', 'score',
       'selftext', 'send_replies', 'spoiler', 'stickied', 'subreddit',
       'subreddit_id', 'subred

Now we import the VADER package to calculate sentiment values.

In [None]:
import nltk
nltk.download('vader_lexicon')

from nltk.sentiment.vader import SentimentIntensityAnalyzer
SIA = SentimentIntensityAnalyzer()

[nltk_data] Downloading package vader_lexicon to /root/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!
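For context on the number we will be averaging: VADER's "compound" score is a sum of word-level valence scores squashed into the range -1 to 1. The sketch below reproduces that squashing step as I understand it from the reference implementation, where alpha defaults to 15. The label helper and its 0.05 cutoff follow the commonly quoted convention and are an assumption here, not something this notebook uses.

```python
import math

def normalize(raw_score, alpha=15):
    """Squash an unbounded valence sum into the range [-1, 1]."""
    norm = raw_score / math.sqrt(raw_score * raw_score + alpha)
    # Clamp to guard against floating-point overshoot
    return max(-1.0, min(1.0, norm))

def label(compound, threshold=0.05):
    """Map a compound score to a coarse sentiment label (assumed cutoff)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

for raw in (-6.0, -1.0, 0.0, 1.0, 6.0):
    compound = normalize(raw)
    print(raw, round(compound, 4), label(compound))
```

Because the squashing flattens out for large sums, very long posts do not score much higher than moderately long ones, which is worth keeping in mind when averaging over posts of very different lengths.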


Now that we've created the VADER analyzer, we can apply it to the text of each post.

In [None]:
final_df = pd.DataFrame(columns = ["Week", "Sentiment_Value"])
final_dfcomments = pd.DataFrame(columns = ["Week", "Sentiment_Value"])
final_dfaverage = pd.DataFrame(columns = ["Week", "Sentiment_Value"])

In [None]:
from numpy import average  # public NumPy import path

# Boolean flag passed as ignore_index in the cells below
TRUE = True

def calculate_sentiment(text):
    # Run VADER on the text
    scores = SIA.polarity_scores(text)
    # Extract the compound score
    compound_score = scores['compound']
    # Return compound score
    return compound_score

GMEJan1_Submissions['selftext'] = GMEJan1_Submissions['selftext'].astype(str)
GMEJan1_Submissions['sentiment_score'] = GMEJan1_Submissions['selftext'].apply(calculate_sentiment)

GMEJan1_average = average(GMEJan1_Submissions["sentiment_score"])
final_df = final_df.append({"Week": "Week 1: Jan 1-7 2021", "Sentiment_Value": GMEJan1_average}, ignore_index = TRUE)
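One caveat with averaging every row: Reddit data often contains placeholder text such as "[removed]" or "[deleted]", and astype(str) turns missing values into the literal string "nan". Those rows carry no real post text, so they may pull the weekly average toward whatever score VADER gives the placeholder. A minimal sketch of filtering them out first, where the PLACEHOLDERS set and the usable_text helper are hypothetical names introduced for illustration:

```python
import pandas as pd

# Values treated as "no real text" (an assumed list; extend as needed)
PLACEHOLDERS = {"", "[removed]", "[deleted]"}

def usable_text(frame, column):
    """Return only the rows whose text column holds real post text."""
    # Fill missing values first so they become empty strings, not "nan"
    text = frame[column].fillna("").astype(str).str.strip()
    return frame[~text.isin(PLACEHOLDERS)]

demo = pd.DataFrame(
    {"selftext": ["GME to the moon", "[removed]", None, "   ", "Holding GME"]}
)
kept = usable_text(demo, "selftext")
print(len(demo), "rows before,", len(kept), "rows after")
```

The sentiment column would then be computed on the filtered frame rather than on the full one.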

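Now we repeat the process for each remaining week, starting with January. Note that some of the queries add the filter score = ">1000", so the averages for those weeks cover only highly upvoted posts.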

In [None]:
# January 8-14
start_Jan2 = int(dt.datetime(2021, 1, 8).timestamp())
end_Jan2 = int(dt.datetime(2021, 1, 14).timestamp())

GMEJan2_generator = api.search_submissions(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)',
        subreddit = "WallStreetBets", after = start_Jan2, before = end_Jan2)

GMEJan2_Submissions = pd.DataFrame([submission.d_ for submission in GMEJan2_generator])

GMEJan2_Submissions['selftext'] = GMEJan2_Submissions['selftext'].astype(str)

GMEJan2_Submissions['sentiment_score'] = GMEJan2_Submissions['selftext'].apply(calculate_sentiment)

GMEJan2_average = average(GMEJan2_Submissions["sentiment_score"])
final_df = final_df.append({"Week": "Week 2: Jan 8-14 2021", "Sentiment_Value": GMEJan2_average}, ignore_index = TRUE)

# January 15-21
start_Jan3 = int(dt.datetime(2021, 1, 15).timestamp())
end_Jan3 = int(dt.datetime(2021, 1, 21).timestamp())

GMEJan3_generator = api.search_submissions(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)',
        subreddit = "WallStreetBets", after = start_Jan3, before = end_Jan3)

GMEJan3_Submissions = pd.DataFrame([submission.d_ for submission in GMEJan3_generator])

GMEJan3_Submissions['selftext'] = GMEJan3_Submissions['selftext'].astype(str)

GMEJan3_Submissions['sentiment_score'] = GMEJan3_Submissions['selftext'].apply(calculate_sentiment)

GMEJan3_average = average(GMEJan3_Submissions["sentiment_score"])
final_df = final_df.append({"Week": "Week 3: Jan 15-21 2021", "Sentiment_Value": GMEJan3_average}, ignore_index = TRUE)

# January 22-31
start_Jan4 = int(dt.datetime(2021, 1, 22).timestamp())
end_Jan4 = int(dt.datetime(2021, 1, 31).timestamp())

GMEJan4_generator = api.search_submissions(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', score = ">1000",
        subreddit = "WallStreetBets", after = start_Jan4, before = end_Jan4)

GMEJan4_Submissions = pd.DataFrame([submission.d_ for submission in GMEJan4_generator])

GMEJan4_Submissions['selftext'] = GMEJan4_Submissions['selftext'].astype(str)

GMEJan4_Submissions['sentiment_score'] = GMEJan4_Submissions['selftext'].apply(calculate_sentiment)

GMEJan4_average = average(GMEJan4_Submissions["sentiment_score"])
final_df = final_df.append({"Week": "Week 4: Jan 22-31 2021", "Sentiment_Value": GMEJan4_average}, ignore_index = TRUE)
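Since every week follows the same steps, the repeated blocks could be folded into one loop. This is a sketch of that idea, not the code that produced the results in this notebook. The names weekly_rows, fake_fetch and fake_score are hypothetical, and the fetch and scoring functions are stand-ins so the example runs without network access; in the notebook they would wrap the search call and calculate_sentiment.

```python
from statistics import mean

def weekly_rows(weeks, fetch_texts, score_text):
    """Build one {"Week", "Sentiment_Value"} row per (label, after, before) window."""
    rows = []
    for label, after_ts, before_ts in weeks:
        texts = fetch_texts(after_ts, before_ts)
        # Keep the week even when nothing came back, so rows stay aligned
        value = mean([score_text(t) for t in texts]) if texts else None
        rows.append({"Week": label, "Sentiment_Value": value})
    return rows

# Stand-ins for the real search call and the VADER scorer
canned = {(0, 7): ["good", "bad", "good"], (7, 14): []}

def fake_fetch(after_ts, before_ts):
    return canned[(after_ts, before_ts)]

def fake_score(text):
    return 1.0 if text == "good" else -1.0

weeks = [("Week 1", 0, 7), ("Week 2", 7, 14)]
rows = weekly_rows(weeks, fake_fetch, fake_score)
print(rows)
```

Collecting plain dictionaries and calling pd.DataFrame(rows) once at the end also avoids growing a DataFrame row by row.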



Repeat the same process for February.

In [None]:
# February 1-7
start_Feb1 = int(dt.datetime(2021, 2, 1).timestamp())
end_Feb1 = int(dt.datetime(2021, 2, 7).timestamp())

GMEFeb1_generator = api.search_submissions(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets", score = ">1000",
                                           after = start_Feb1, before = end_Feb1)
GMEFeb1_Submissions = pd.DataFrame([submission.d_ for submission in GMEFeb1_generator])

GMEFeb1_Submissions['selftext'] = GMEFeb1_Submissions['selftext'].astype(str)

GMEFeb1_Submissions['sentiment_score'] = GMEFeb1_Submissions['selftext'].apply(calculate_sentiment)

GMEFeb1_average = average(GMEFeb1_Submissions["sentiment_score"])
final_df = final_df.append({"Week": "Week 5: Feb 1-7 2021", "Sentiment_Value": GMEFeb1_average}, ignore_index=TRUE)

# February 8-14
start_Feb2 = int(dt.datetime(2021, 2, 8).timestamp())
end_Feb2 = int(dt.datetime(2021, 2, 14).timestamp())

GMEFeb2_generator = api.search_submissions(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets", score = ">1000",
                                           after = start_Feb2, before = end_Feb2)
GMEFeb2_Submissions = pd.DataFrame([submission.d_ for submission in GMEFeb2_generator])

GMEFeb2_Submissions['selftext'] = GMEFeb2_Submissions['selftext'].astype(str)

GMEFeb2_Submissions['sentiment_score'] = GMEFeb2_Submissions['selftext'].apply(calculate_sentiment)

GMEFeb2_average = average(GMEFeb2_Submissions["sentiment_score"])
final_df = final_df.append({"Week": "Week 6: Feb 8-14 2021", "Sentiment_Value": GMEFeb2_average}, ignore_index=TRUE)

# February 15-21
start_Feb3 = int(dt.datetime(2021, 2, 15).timestamp())
end_Feb3 = int(dt.datetime(2021, 2, 21).timestamp())

GMEFeb3_generator = api.search_submissions(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets",
                                           after = start_Feb3, before = end_Feb3)
GMEFeb3_Submissions = pd.DataFrame([submission.d_ for submission in GMEFeb3_generator])

GMEFeb3_Submissions['selftext'] = GMEFeb3_Submissions['selftext'].astype(str)

GMEFeb3_Submissions['sentiment_score'] = GMEFeb3_Submissions['selftext'].apply(calculate_sentiment)

GMEFeb3_average = average(GMEFeb3_Submissions["sentiment_score"])
final_df = final_df.append({"Week": "Week 7: Feb 15-21 2021", "Sentiment_Value": GMEFeb3_average}, ignore_index=TRUE)

# February 22-28
start_Feb4 = int(dt.datetime(2021, 2, 22).timestamp())
end_Feb4 = int(dt.datetime(2021, 2, 28).timestamp())

GMEFeb4_generator = api.search_submissions(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets", score = ">1000",
                                           after = start_Feb4, before = end_Feb4)
GMEFeb4_Submissions = pd.DataFrame([submission.d_ for submission in GMEFeb4_generator])

GMEFeb4_Submissions['selftext'] = GMEFeb4_Submissions['selftext'].astype(str)

GMEFeb4_Submissions['sentiment_score'] = GMEFeb4_Submissions['selftext'].apply(calculate_sentiment)

GMEFeb4_average = average(GMEFeb4_Submissions["sentiment_score"])
final_df = final_df.append({"Week": "Week 8: Feb 22-28 2021", "Sentiment_Value": GMEFeb4_average}, ignore_index=TRUE)

Repeat weekly again for March.

In [None]:
# March 1-7
start_Mar1 = int(dt.datetime(2021, 3, 1).timestamp())
end_Mar1 = int(dt.datetime(2021, 3, 7).timestamp())

GMEMar1_generator = api.search_submissions(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets", score = ">1000",
                                           after = start_Mar1, before = end_Mar1)
GMEMar1_Submissions = pd.DataFrame([submission.d_ for submission in GMEMar1_generator])

GMEMar1_Submissions['selftext'] = GMEMar1_Submissions['selftext'].astype(str)

GMEMar1_Submissions['sentiment_score'] = GMEMar1_Submissions['selftext'].apply(calculate_sentiment)

GMEMar1_average = average(GMEMar1_Submissions["sentiment_score"])
final_df = final_df.append({"Week": "Week 9: March 1-7 2021", "Sentiment_Value": GMEMar1_average}, ignore_index=TRUE)

# March 8-14
start_Mar2 = int(dt.datetime(2021, 3, 8).timestamp())
end_Mar2 = int(dt.datetime(2021, 3, 14).timestamp())

GMEMar2_generator = api.search_submissions(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets",
                                           after = start_Mar2, before = end_Mar2)
GMEMar2_Submissions = pd.DataFrame([submission.d_ for submission in GMEMar2_generator])

GMEMar2_Submissions['selftext'] = GMEMar2_Submissions['selftext'].astype(str)

GMEMar2_Submissions['sentiment_score'] = GMEMar2_Submissions['selftext'].apply(calculate_sentiment)

GMEMar2_average = average(GMEMar2_Submissions["sentiment_score"])
final_df = final_df.append({"Week": "Week 10: March 8-14 2021", "Sentiment_Value": GMEMar2_average}, ignore_index=TRUE)

# March 15-21
start_Mar3 = int(dt.datetime(2021, 3, 15).timestamp())
end_Mar3 = int(dt.datetime(2021, 3, 21).timestamp())

GMEMar3_generator = api.search_submissions(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets",
                                           after = start_Mar3, before = end_Mar3)
GMEMar3_Submissions = pd.DataFrame([submission.d_ for submission in GMEMar3_generator])

GMEMar3_Submissions['selftext'] = GMEMar3_Submissions['selftext'].astype(str)

GMEMar3_Submissions['sentiment_score'] = GMEMar3_Submissions['selftext'].apply(calculate_sentiment)

GMEMar3_average = average(GMEMar3_Submissions["sentiment_score"])
final_df = final_df.append({"Week": "Week 11: March 15-21 2021", "Sentiment_Value": GMEMar3_average}, ignore_index=TRUE)

# March 22-31
start_Mar4 = int(dt.datetime(2021, 3, 22).timestamp())
end_Mar4 = int(dt.datetime(2021, 3, 31).timestamp())

GMEMar4_generator = api.search_submissions(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets", score = ">1000",
                                           after = start_Mar4, before = end_Mar4)
GMEMar4_Submissions = pd.DataFrame([submission.d_ for submission in GMEMar4_generator])

GMEMar4_Submissions['selftext'] = GMEMar4_Submissions['selftext'].astype(str)

GMEMar4_Submissions['sentiment_score'] = GMEMar4_Submissions['selftext'].apply(calculate_sentiment)

GMEMar4_average = average(GMEMar4_Submissions["sentiment_score"])
final_df = final_df.append({"Week": "Week 12: March 22-31 2021", "Sentiment_Value": GMEMar4_average}, ignore_index=TRUE)

We can now repeat this for the comments on the subreddit. The process is exactly the same, with two differences: the search function becomes "api.search_comments", and the "selftext" column is replaced with "body", which is the name of the text field for comments.

In [None]:
# January 1-7
GMEJan1_generator = api.search_comments(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets",
                              after = start_Jan1, before = end_Jan1)
GMEJan1_Submissions = pd.DataFrame([submission.d_ for submission in GMEJan1_generator])

GMEJan1_Submissions['body'] = GMEJan1_Submissions['body'].astype(str)

GMEJan1_Submissions['sentiment_score'] = GMEJan1_Submissions['body'].apply(calculate_sentiment)

GMEJan1_average = average(GMEJan1_Submissions["sentiment_score"])
final_dfcomments = final_dfcomments.append({"Week": "Week 1: Jan 1-7 2021", "Sentiment_Value": GMEJan1_average}, ignore_index=TRUE)

# January 8-14
start_Jan2 = int(dt.datetime(2021, 1, 8).timestamp())
end_Jan2 = int(dt.datetime(2021, 1, 14).timestamp())

GMEJan2_generator = api.search_comments(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)',
        subreddit = "WallStreetBets", after = start_Jan2, before = end_Jan2)

GMEJan2_Submissions = pd.DataFrame([submission.d_ for submission in GMEJan2_generator])

GMEJan2_Submissions['body'] = GMEJan2_Submissions['body'].astype(str)

GMEJan2_Submissions['sentiment_score'] = GMEJan2_Submissions['body'].apply(calculate_sentiment)

GMEJan2_average = average(GMEJan2_Submissions["sentiment_score"])
final_dfcomments = final_dfcomments.append({"Week": "Week 2: Jan 8-14 2021", "Sentiment_Value": GMEJan2_average}, ignore_index = TRUE)

# January 15-21
start_Jan3 = int(dt.datetime(2021, 1, 15).timestamp())
end_Jan3 = int(dt.datetime(2021, 1, 21).timestamp())

GMEJan3_generator = api.search_comments(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)',
        subreddit = "WallStreetBets", after = start_Jan3, before = end_Jan3)

GMEJan3_Submissions = pd.DataFrame([submission.d_ for submission in GMEJan3_generator])

GMEJan3_Submissions['body'] = GMEJan3_Submissions['body'].astype(str)

GMEJan3_Submissions['sentiment_score'] = GMEJan3_Submissions['body'].apply(calculate_sentiment)

GMEJan3_average = average(GMEJan3_Submissions["sentiment_score"])
final_dfcomments = final_dfcomments.append({"Week": "Week 3: Jan 15-21 2021", "Sentiment_Value": GMEJan3_average}, ignore_index = TRUE)

# January 22-31
start_Jan4 = int(dt.datetime(2021, 1, 22).timestamp())
end_Jan4 = int(dt.datetime(2021, 1, 31).timestamp())

GMEJan4_generator = api.search_comments(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', score = ">1000",
        subreddit = "WallStreetBets", after = start_Jan4, before = end_Jan4)

GMEJan4_Submissions = pd.DataFrame([submission.d_ for submission in GMEJan4_generator])

GMEJan4_Submissions['body'] = GMEJan4_Submissions['body'].astype(str)

GMEJan4_Submissions['sentiment_score'] = GMEJan4_Submissions['body'].apply(calculate_sentiment)

GMEJan4_average = average(GMEJan4_Submissions["sentiment_score"])
final_dfcomments = final_dfcomments.append({"Week": "Week 4: Jan 22-31 2021", "Sentiment_Value": GMEJan4_average}, ignore_index = TRUE)



In [None]:
# February 1-7
GMEFeb1_generator = api.search_comments(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets",
                             score = ">1000", after = start_Feb1, before = end_Feb1)
GMEFeb1_Submissions = pd.DataFrame([submission.d_ for submission in GMEFeb1_generator])

GMEFeb1_Submissions['body'] = GMEFeb1_Submissions['body'].astype(str)

GMEFeb1_Submissions['sentiment_score'] = GMEFeb1_Submissions['body'].apply(calculate_sentiment)

GMEFeb1_average = average(GMEFeb1_Submissions["sentiment_score"])
final_dfcomments = final_dfcomments.append({"Week": "Week 5: Feb 1-7 2021", "Sentiment_Value": GMEFeb1_average}, ignore_index=TRUE)

# February 8-14
GMEFeb2_generator = api.search_comments(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets",
                             score = ">1000", after = start_Feb2, before = end_Feb2)
GMEFeb2_Submissions = pd.DataFrame([submission.d_ for submission in GMEFeb2_generator])

GMEFeb2_Submissions['body'] = GMEFeb2_Submissions['body'].astype(str)

GMEFeb2_Submissions['sentiment_score'] = GMEFeb2_Submissions['body'].apply(calculate_sentiment)

GMEFeb2_average = average(GMEFeb2_Submissions["sentiment_score"])
final_dfcomments = final_dfcomments.append({"Week": "Week 6: Feb 8-14 2021", "Sentiment_Value": GMEFeb2_average}, ignore_index=TRUE)

# February 15-21
GMEFeb3_generator = api.search_comments(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets",
                                           after = start_Feb3, before = end_Feb3)
GMEFeb3_Submissions = pd.DataFrame([submission.d_ for submission in GMEFeb3_generator])

GMEFeb3_Submissions['body'] = GMEFeb3_Submissions['body'].astype(str)

GMEFeb3_Submissions['sentiment_score'] = GMEFeb3_Submissions['body'].apply(calculate_sentiment)

GMEFeb3_average = average(GMEFeb3_Submissions["sentiment_score"])
final_dfcomments = final_dfcomments.append({"Week": "Week 7: Feb 15-21 2021", "Sentiment_Value": GMEFeb3_average}, ignore_index=TRUE)

# February 22-28
start_Feb4 = int(dt.datetime(2021, 2, 22).timestamp())
end_Feb4 = int(dt.datetime(2021, 2, 28).timestamp())

GMEFeb4_generator = api.search_comments(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets",
                             score = ">1000", after = start_Feb4, before = end_Feb4)
GMEFeb4_Submissions = pd.DataFrame([submission.d_ for submission in GMEFeb4_generator])

GMEFeb4_Submissions['body'] = GMEFeb4_Submissions['body'].astype(str)

GMEFeb4_Submissions['sentiment_score'] = GMEFeb4_Submissions['body'].apply(calculate_sentiment)

GMEFeb4_average = average(GMEFeb4_Submissions["sentiment_score"])
final_dfcomments = final_dfcomments.append({"Week": "Week 8: Feb 22-28 2021", "Sentiment_Value": GMEFeb4_average}, ignore_index=TRUE)

In [None]:
# March 1-7
GMEMar1_generator = api.search_comments(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets", score = ">1000",
                                           after = start_Mar1, before = end_Mar1)
GMEMar1_Submissions = pd.DataFrame([submission.d_ for submission in GMEMar1_generator])

GMEMar1_Submissions['body'] = GMEMar1_Submissions['body'].astype(str)

GMEMar1_Submissions['sentiment_score'] = GMEMar1_Submissions['body'].apply(calculate_sentiment)

GMEMar1_average = average(GMEMar1_Submissions["sentiment_score"])
final_dfcomments = final_dfcomments.append({"Week": "Week 9: March 1-7 2021", "Sentiment_Value": GMEMar1_average}, ignore_index=TRUE)

# March 8-14
start_Mar2 = int(dt.datetime(2021, 3, 8).timestamp())
end_Mar2 = int(dt.datetime(2021, 3, 14).timestamp())

GMEMar2_generator = api.search_comments(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets", score = ">1000",
                                           after = start_Mar2, before = end_Mar2)
GMEMar2_Submissions = pd.DataFrame([submission.d_ for submission in GMEMar2_generator])

GMEMar2_Submissions['body'] = GMEMar2_Submissions['body'].astype(str)

GMEMar2_Submissions['sentiment_score'] = GMEMar2_Submissions['body'].apply(calculate_sentiment)

GMEMar2_average = average(GMEMar2_Submissions["sentiment_score"])
final_dfcomments = final_dfcomments.append({"Week": "Week 10: March 8-14 2021", "Sentiment_Value": GMEMar2_average}, ignore_index=TRUE)

# March 15-21
start_Mar3 = int(dt.datetime(2021, 3, 15).timestamp())
end_Mar3 = int(dt.datetime(2021, 3, 21).timestamp())

GMEMar3_generator = api.search_comments(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets",
                                           after = start_Mar3, before = end_Mar3)
GMEMar3_Submissions = pd.DataFrame([submission.d_ for submission in GMEMar3_generator])

GMEMar3_Submissions['body'] = GMEMar3_Submissions['body'].astype(str)

GMEMar3_Submissions['sentiment_score'] = GMEMar3_Submissions['body'].apply(calculate_sentiment)

GMEMar3_average = average(GMEMar3_Submissions["sentiment_score"])
final_dfcomments = final_dfcomments.append({"Week": "Week 11: March 15-21 2021", "Sentiment_Value": GMEMar3_average}, ignore_index=TRUE)

# March 22-31
start_Mar4 = int(dt.datetime(2021, 3, 22).timestamp())
end_Mar4 = int(dt.datetime(2021, 3, 31).timestamp())

GMEMar4_generator = api.search_comments(q = '(GME)|(GameStop)|(Gamestop)|(gamestop)', subreddit = "WallStreetBets",
                                           after = start_Mar4, before = end_Mar4)
GMEMar4_Submissions = pd.DataFrame([submission.d_ for submission in GMEMar4_generator])

GMEMar4_Submissions['body'] = GMEMar4_Submissions['body'].astype(str)

GMEMar4_Submissions['sentiment_score'] = GMEMar4_Submissions['body'].apply(calculate_sentiment)

GMEMar4_average = average(GMEMar4_Submissions["sentiment_score"])
final_dfcomments = final_dfcomments.append({"Week": "Week 12: March 22-31 2021", "Sentiment_Value": GMEMar4_average}, ignore_index=TRUE)
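Because submissions and comments differ only in the search function and the name of the text field, that difference can be written down once as a small lookup table. The sketch below assumes an API object shaped like the psaw client used above (search methods that yield items with a d_ dictionary). FakeAPI and FakeItem are hypothetical stand-ins so the example runs offline, and SOURCES and fetch_texts are names introduced only for illustration.

```python
# Which search method and which text field belong to each kind of post
SOURCES = {
    "submissions": {"search": "search_submissions", "text_column": "selftext"},
    "comments": {"search": "search_comments", "text_column": "body"},
}

def fetch_texts(api, kind, **query):
    """Run the right search for `kind` and return the text of each result."""
    source = SOURCES[kind]
    search = getattr(api, source["search"])
    column = source["text_column"]
    return [str(item.d_.get(column, "")) for item in search(**query)]

class FakeItem:
    """Stand-in for a search result exposing its fields as `d_`."""
    def __init__(self, **fields):
        self.d_ = fields

class FakeAPI:
    """Stand-in client returning canned results instead of calling Pushshift."""
    def search_submissions(self, **query):
        return iter([FakeItem(selftext="GME post")])

    def search_comments(self, **query):
        return iter([FakeItem(body="GME comment"), FakeItem(body="another")])

api_stub = FakeAPI()
print(fetch_texts(api_stub, "submissions", q="GME"))
print(fetch_texts(api_stub, "comments", q="GME"))
```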

In [None]:
final_df

    Week                      Sentiment_Value
0   Week 1: Jan 1-7 2021      0.100271
1   Week 2: Jan 8-14 2021     0.056268
2   Week 3: Jan 15-21 2021    0.084896
3   Week 4: Jan 22-31 2021    -0.000548
4   Week 5: Feb 1-7 2021      0.059575
5   Week 6: Feb 8-14 2021     0.145467
6   Week 7: Feb 15-21 2021    0.0955
7   Week 8: Feb 22-28 2021    0.178414
8   Week 9: March 1-7 2021    0.205091
9   Week 10: March 8-14 2021  0.098718


In [None]:
final_dfcomments

    Week                      Sentiment_Value
0   Week 1: Jan 1-7 2021      0.059575
1   Week 2: Jan 8-14 2021     0.089548
2   Week 3: Jan 15-21 2021    0.095264
3   Week 4: Jan 22-31 2021    0.027579
4   Week 5: Feb 1-7 2021      0.071215
5   Week 6: Feb 8-14 2021     0.149867
6   Week 7: Feb 15-21 2021    0.055861
7   Week 8: Feb 22-28 2021    0.041493
8   Week 9: March 1-7 2021    0.073942
9   Week 10: March 8-14 2021  -0.257867


In [None]:
final_dfaverage = final_df.merge(final_dfcomments, on = "Week")
final_dfaverage["Average_Sentiment"] = final_dfaverage[["Sentiment_Value_x", "Sentiment_Value_y"]].mean(axis = 1)

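# GME stock value for each of the 12 weeks, in order, stored as numbers rather than strings
stock_value = [4.40, 8.88, 16.25, 81.25, 15.94, 13.10, 10.15, 25.44, 34.44, 66.13, 50.07, 45.25]
# Trim to the merged row count; this assumes any missing weeks are the last ones
final_dfaverage["Stock Value"] = stock_value[:len(final_dfaverage)]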

final_dfaverage = final_dfaverage.rename(columns={"Sentiment_Value_x":"Sentiment_Value_Submissions", "Sentiment_Value_y":"Sentiment_Value_Comments"})
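As a side note, the merge and the rename can be combined by giving merge its own suffixes. Here is a small sketch using the first two weeks from the tables above. The validate argument is an optional extra check that each week appears only once on each side.

```python
import pandas as pd

submissions = pd.DataFrame({
    "Week": ["Week 1: Jan 1-7 2021", "Week 2: Jan 8-14 2021"],
    "Sentiment_Value": [0.100271, 0.056268],
})
comments = pd.DataFrame({
    "Week": ["Week 1: Jan 1-7 2021", "Week 2: Jan 8-14 2021"],
    "Sentiment_Value": [0.059575, 0.089548],
})

# Suffixes name the two sentiment columns directly, so no rename is needed
merged = submissions.merge(
    comments,
    on="Week",
    suffixes=("_Submissions", "_Comments"),
    validate="one_to_one",
)

# Row-wise mean of the submission and comment sentiment
merged["Average_Sentiment"] = merged[
    ["Sentiment_Value_Submissions", "Sentiment_Value_Comments"]
].mean(axis=1)

print(merged)
```

This is an unweighted mean, so a week with thousands of comments and a handful of submissions still counts both sides equally.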

In [None]:
final_dfaverage

    Week                      Sentiment_Value_Submissions  Sentiment_Value_Comments  Average_Sentiment  Stock Value
0   Week 1: Jan 1-7 2021      0.100271                     0.059575                  0.079923           4.4
1   Week 2: Jan 8-14 2021     0.056268                     0.089548                  0.072908           8.88
2   Week 3: Jan 15-21 2021    0.084896                     0.095264                  0.09008            16.25
3   Week 4: Jan 22-31 2021    -0.000548                    0.027579                  0.013516           81.25
4   Week 5: Feb 1-7 2021      0.059575                     0.071215                  0.065395           15.94
5   Week 6: Feb 8-14 2021     0.145467                     0.149867                  0.147667           13.1
6   Week 7: Feb 15-21 2021    0.0955                       0.055861                  0.07568            10.15
7   Week 8: Feb 22-28 2021    0.178414                     0.041493                  0.109954           25.44
8   Week 9: March 1-7 2021    0.205091                     0.073942                  0.139517           34.44
9   Week 10: March 8-14 2021  0.098718                     -0.257867                 -0.079575          66.13


From the results so far, there does appear to be a gradual rise in the sentiment value between January 2021 and March 2021. A few issues currently prevent me from computing sentiment values for later in the year, which I believe stem from the amount of data I was pulling from Reddit at one time. I hope to resolve this as I complete the project. Even so, there still seems to be at least a slight correlation between the stock's value and the sentiment value.
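One way to put a number on that relationship is a Pearson correlation between the average sentiment and the stock value. The sketch below uses the ten weeks shown in the final table above. With only ten points any coefficient should be read cautiously, and the column name Stock_Value is just a convenience for this example.

```python
import pandas as pd

# Values copied from the ten rows of the final table above
results = pd.DataFrame({
    "Average_Sentiment": [0.079923, 0.072908, 0.09008, 0.013516, 0.065395,
                          0.147667, 0.07568, 0.109954, 0.139517, -0.079575],
    "Stock_Value": [4.40, 8.88, 16.25, 81.25, 15.94,
                    13.10, 10.15, 25.44, 34.44, 66.13],
})

# Pearson correlation: +1 is a perfect positive linear relation, -1 a perfect negative one
pearson = results["Average_Sentiment"].corr(results["Stock_Value"])
print("Pearson correlation:", round(pearson, 3))
```

A value near zero would suggest no linear relationship, while the sign shows whether sentiment and price tend to move together or in opposite directions.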