# RGR Stock Price Forecasting Project - Part 2

Author: Jack Wang

---

## Problem Statement

Stock prices are hard to predict because they depend not only on the performance of the underlying companies but also on the expectations of the general public. The stock prices of firearm companies are known to be highly correlated with public opinion on gun control. My model aims to predict the stock price of RGR (Sturm, Ruger & Co.), one of the largest firearm companies in the United States, using its historical stock price, public opinion on gun control, and its financial reports filed with the SEC.

## Executive Summary

The goal of my project is to build a **time series regression model** that predicts the stock price of RGR. The data consist of historical stock prices from [Yahoo Finance](https://finance.yahoo.com/quote/RGR/history?p=RGR), tweets scraped from [Twitter](https://twitter.com/), subreddit posts that mention gun control, and the company's financial reports filed with the [SEC](https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000095029&type=&dateb=&owner=exclude&count=100). I will run sentiment analysis on the text data and time series modeling on the historical stock price data. The model will be evaluated using mean squared error (MSE).
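As a reminder of what the evaluation metric measures, MSE is the average of the squared differences between actual and predicted prices. The snippet below is a minimal sketch in plain Python; the function name and the price values are hypothetical and are not taken from the project's data or from the modeling notebook.

```python
def mean_squared_error(actual, predicted):
    """Return the average squared difference between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must be non-empty")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical closing prices and forecasts, for illustration only
actual_prices = [45.0, 46.0, 44.0]
predicted_prices = [44.0, 46.0, 46.0]

mse = mean_squared_error(actual_prices, predicted_prices)
print(mse)
```

Because the errors are squared, a single large miss is penalized more heavily than several small ones, which matters for a price series with occasional sharp moves.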

## Content

This project consists of 7 Jupyter notebooks:
- Part-1-stock-price-data
- ***Part-2-twitter-scraper***
- Part-3-twitter-data-cleaning
- Part-4-reddit-data-scraper
- Part-5-reddit-data-cleaning
- Part-6-sec-data-cleaning
- Part-7-modeling-and-evaluation


---


**Because the Twitter API limits the number of requests, I will use [twitterscraper](https://github.com/taspinar/twitterscraper), created by taspinar on GitHub, to collect my Twitter data.**

### Twitter scraper

In [1]:
!pip install twitterscraper



In [2]:
from twitterscraper import query_tweets
import pandas as pd
import datetime, time
import re
from nltk.tokenize import RegexpTokenizer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

INFO: {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201'}


In [21]:
# Keywords I want to search for (retweets are excluded)
query_string = ('"gun ban" OR "gun control" OR "firearm control" OR'
                ' "gun-control" OR "firearm-control" OR "gun reform" OR "gun-reform" OR'
                ' "firearm-reform" OR "firearm reform" -filter:retweets')

# initialize a time counter
t0 = time.time()

# set empty lists that we will fill with tweet data
text = []
times = []

# scrape Twitter for tweets matching the query
list_of_tweets = query_tweets(query_string,
                              begindate=datetime.date(2019, 12, 3),  # set the date range here
                              enddate=datetime.date(2019, 12, 5),
                              poolsize=2,
                              lang="en")

# loop through each tweet to grab data and append the data to their respective lists
for tweet in list_of_tweets:
    text.append(tweet.text)
    times.append(tweet.timestamp)

# build the dataframe
df = pd.DataFrame({
    'tweet': text,
    'time_stamp': times
})

# report the elapsed scraping time in minutes
print((time.time() - t0) / 60)

INFO: queries: ['"gun ban" OR "gun control" OR "firearm control" OR "gun-control" OR "firearm-control" OR "gun reform" OR "gun-reform" OR "firearm-reform" OR "firearm reform" -filter:retweets since:2019-12-03 until:2019-12-04', '"gun ban" OR "gun control" OR "firearm control" OR "gun-control" OR "firearm-control" OR "gun reform" OR "gun-reform" OR "firearm-reform" OR "firearm reform" -filter:retweets since:2019-12-04 until:2019-12-05']
INFO: Querying "gun ban" OR "gun control" OR "firearm control" OR "gun-control" OR "firearm-control" OR "gun reform" OR "gun-reform" OR "firearm-reform" OR "firearm reform" -filter:retweets since:2019-12-03 until:2019-12-04
INFO: Querying "gun ban" OR "gun control" OR "firearm control" OR "gun-control" OR "firearm-control" OR "gun reform" OR "gun-reform" OR "firearm-reform" OR "firearm reform" -filter:retweets since:2019-12-04 until:2019-12-05

1.6759276151657105
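The log output above shows the two-day range being split into one query per day, each with its own `since:` and `until:` bounds, which matches `poolsize = 2`. The sketch below reproduces that pattern in plain Python. It is an illustration of the idea only, not twitterscraper's actual implementation, and the helper names `split_date_range` and `build_queries` are hypothetical.

```python
import datetime


def split_date_range(begin, end, pool_size):
    """Split [begin, end) into at most pool_size consecutive sub-ranges."""
    total_days = (end - begin).days
    if total_days <= 0:
        raise ValueError("end must be after begin")
    # Never create more sub-ranges than there are days
    n = min(pool_size, total_days)
    # Evenly spaced boundaries, rounded down to whole days
    bounds = [begin + datetime.timedelta(days=total_days * i // n)
              for i in range(n + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def build_queries(query, begin, end, pool_size):
    """Append since:/until: operators to the query for each sub-range."""
    return [f"{query} since:{since} until:{until}"
            for since, until in split_date_range(begin, end, pool_size)]


queries = build_queries('"gun control" -filter:retweets',
                        datetime.date(2019, 12, 3),
                        datetime.date(2019, 12, 5),
                        pool_size=2)
for q in queries:
    print(q)
```

Splitting the range this way lets each sub-range be scraped in parallel, so a longer date range can benefit from a larger pool size.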


In [22]:
df = df.drop_duplicates()

# remove any Twitter picture URLs (dots escaped so they match literally)
df['tweet'] = [re.sub(r'pic\.twitter\.com\S+', '', post).strip() for post in df['tweet']]

# remove any http(s) URLs
df['tweet'] = [re.sub(r'http\S+', '', post).strip() for post in df['tweet']]

# instantiate the tokenizer (keeps runs of letters, digits, and '&')
tknr = RegexpTokenizer(r'[a-zA-Z&0-9]+')

# lowercase and tokenize each tweet, then rejoin the tokens with single spaces
df['tweet'] = [" ".join(tknr.tokenize(post.lower())) for post in df['tweet']]

# add a word count column
df['tweet_word_count'] = df['tweet'].apply(lambda post: len(post.split()))

# instantiate the VADER sentiment analyzer
# (requires the lexicon: nltk.download('vader_lexicon'))
sia = SentimentIntensityAnalyzer()

# return the VADER compound score for a piece of text
def get_compound(text):
    return sia.polarity_scores(text)['compound']

# add a compound sentiment score column for the tweets
df['compound'] = df['tweet'].map(get_compound)

# sort the tweets chronologically
df = df.sort_values(by='time_stamp')
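To make the cleaning steps concrete, here is a minimal sketch that applies the same URL removal and tokenization to a single made-up tweet. It uses only the standard library: `re.findall` with the same pattern stands in for `RegexpTokenizer`, and both the sample text and the `clean_tweet` helper are hypothetical.

```python
import re


def clean_tweet(raw):
    """Strip picture and http URLs, lowercase, and keep only alphanumeric/& tokens."""
    text = re.sub(r'pic\.twitter\.com\S+', '', raw).strip()
    text = re.sub(r'http\S+', '', text).strip()
    # Stand-in for RegexpTokenizer(r'[a-zA-Z&0-9]+').tokenize(...)
    tokens = re.findall(r'[a-zA-Z&0-9]+', text.lower())
    return " ".join(tokens)


# Made-up tweet, for illustration only
sample = "Gun-control debate heats up! https://example.com/story pic.twitter.com/abc123"

cleaned = clean_tweet(sample)
word_count = len(cleaned.split())

print(cleaned)
print(word_count)
```

Note that the tokenizer pattern drops hyphens and punctuation, so a term such as "gun-control" becomes two tokens, which affects the word count.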

In [25]:
df.shape

(2, 5)

In [8]:
# save the dataframe file
df.to_csv('../data/twitter_2019_12.csv', index=False)