# RGR Stock Price Forecasting Project - Part 2

Author: Jack Wang

---

## Problem Statement

Stock prices are hard to predict because they are affected not only by the performance of the underlying companies but also by the expectations of the general public. The stock prices of firearm companies are known to be highly correlated with public opinion on gun control. My model aims to predict the stock price of RGR (Sturm, Ruger & Co.), one of the largest firearm companies in the United States, using its historical stock price, public opinion on gun control, and its financial reports filed with the SEC.

## Executive Summary

The goal of my project is to build a **time series regression model** that predicts the stock price of RGR. The data consist of historical stock prices from [Yahoo Finance](https://finance.yahoo.com/quote/RGR/history?p=RGR), tweets scraped from [Twitter](https://twitter.com/), subreddit posts that mention gun control, and the company's financial reports filed with the [SEC](https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000095029&type=&dateb=&owner=exclude&count=100). I will run sentiment analysis on the text data and time series modeling on the historical stock price data. The model will be evaluated using mean squared error (MSE).
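As a reminder of what the evaluation metric measures, here is a minimal sketch of MSE in plain Python. The function name and the sample prices are hypothetical and only illustrate the arithmetic; the modeling notebook may well compute the metric with a library call instead.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("inputs must not be empty")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical closing prices and forecasts, for illustration only
actual_prices = [50.0, 51.0, 49.0]
predicted_prices = [49.0, 51.0, 51.0]

# Squared errors are 1, 0 and 4, so the mean is 5/3
mse = mean_squared_error(actual_prices, predicted_prices)
print(mse)
```

Because the errors are squared, a single large miss is penalized more heavily than several small ones, which is worth keeping in mind when comparing models on volatile days.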

## Content

This project consists of 7 Jupyter notebooks:
- Part-1-stock-price-data
- ***Part-2-twitter-scraper***
- Part-3-twitter-data-cleaning
- Part-4-reddit-data-scraper
- Part-5-reddit-data-cleaning
- Part-6-sec-data-cleaning
- Part-7-modeling-and-evaluation


---


**Because the Twitter API limits the number of requests, I will use [twitterscraper](https://github.com/taspinar/twitterscraper), created by taspinar on GitHub, to collect my Twitter data.**

### Twitter scraper

In [1]:
!pip install twitterscraper

In [2]:
from twitterscraper import query_tweets
import pandas as pd
import datetime, time
import re
from nltk.tokenize import RegexpTokenizer
from nltk.sentiment.vader import SentimentIntensityAnalyzer

INFO: {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:2.2) Gecko/20110201'}


In [3]:
# Keywords I want to search for (retweets are excluded)
query_string = ('"gun ban" OR "gun control" OR "firearm control" OR'
                ' "gun-control" OR "firearm-control" OR "gun reform" OR "gun-reform" OR'
                ' "firearm-reform" OR "firearm reform" -filter:retweets')

query_string
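
The same search string can also be assembled from a list of phrases, which may be easier to maintain if keywords are added later. This is only a sketch: `build_query` and `keywords` are hypothetical names that do not appear in the notebook, and the result is meant to match the hand-written string exactly.

```python
def build_query(phrases, exclude_retweets=True):
    """Join quoted phrases with OR and optionally append the retweet filter."""
    quoted = " OR ".join(f'"{phrase}"' for phrase in phrases)
    if exclude_retweets:
        return f"{quoted} -filter:retweets"
    return quoted


# Same phrases, in the same order, as the hand-written query string
keywords = [
    "gun ban", "gun control", "firearm control",
    "gun-control", "firearm-control", "gun reform", "gun-reform",
    "firearm-reform", "firearm reform",
]

rebuilt_query = build_query(keywords)
print(rebuilt_query)
```
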

In [5]:
# initialize a time counter
t0 = time.time()

# set empty lists that we will fill with tweet data
text = []
times = []

# scrape twitter for tweets containing certain keywords 
list_of_tweets = query_tweets(query_string,
                        begindate = datetime.date(2019,11,1), #Set up time range here
                        enddate = datetime.date(2019,12,1),
                        poolsize = 2,
                        lang="en"
                       )

# loop through each tweet to grab data and append the data to their respective lists
for tweet in list_of_tweets:
    text.append(tweet.text)
    times.append(tweet.timestamp)

# build the dataframe
df = pd.DataFrame({
    'tweet': text,
    'time_stamp': times
})

print((time.time()-t0)/60)

INFO: queries: ['"gun ban" OR "gun control" OR "firearm control" OR "gun-control" OR "firearm-control" OR "gun reform" OR "gun-reform" OR "firearm-reform" OR "firearm reform" -filter:retweets since:2016-11-10 until:2016-11-11', '"gun ban" OR "gun control" OR "firearm control" OR "gun-control" OR "firearm-control" OR "gun reform" OR "gun-reform" OR "firearm-reform" OR "firearm reform" -filter:retweets since:2016-11-11 until:2016-11-12']
INFO: Querying "gun ban" OR "gun control" OR "firearm control" OR "gun-control" OR "firearm-control" OR "gun reform" OR "gun-reform" OR "firearm-reform" OR "firearm reform" -filter:retweets since:2016-11-11 until:2016-11-12
INFO: Querying "gun ban" OR "gun control" OR "firearm control" OR "gun-control" OR "firearm-control" OR "gun reform" OR "gun-reform" OR "firearm-reform" OR "firearm reform" -filter:retweets since:2016-11-10 until:2016-11-11
INFO: Scraping tweets from https://twitter.com/search?f=tweets&vertical=default&q="gun%20ban"%20OR%20"gun%20cont

INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAwLvpgZ_4jhYWgIC0ufWThI8WEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-reform"%20OR%20"firearm%20reform"%20-filter%3Aretweets%20since%3A2016-11-10%20until%3A2016-11-11&l=en
INFO: Using proxy 188.163.170.130:41209
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAgLLd796XkBYWgMC79dXkqJAWEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-reform"%20OR%20"firearm%20reform"%20-filte

INFO: Using proxy 159.224.220.63:44299
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaCgLzdqez_jxYWgMC79dXkqJAWEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-reform"%20OR%20"firearm%20reform"%20-filter%3Aretweets%20since%3A2016-11-11%20until%3A2016-11-12&l=en
INFO: Using proxy 159.224.220.63:44299
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAgKnJkLLojhYWgIC0ufWThI8WEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-refo

INFO: Using proxy 88.247.10.31:8080
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAgLjNzajbjhYWgIC0ufWThI8WEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-reform"%20OR%20"firearm%20reform"%20-filter%3Aretweets%20since%3A2016-11-10%20until%3A2016-11-11&l=en
INFO: Using proxy 114.134.191.194:31867
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAgLy5hLjujxYWgMC79dXkqJAWEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-reform

INFO: Using proxy 88.118.134.214:38662
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaIgLKJ7YzfjxYWgMC79dXkqJAWEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-reform"%20OR%20"firearm%20reform"%20-filter%3Aretweets%20since%3A2016-11-11%20until%3A2016-11-12&l=en
INFO: Using proxy 88.118.134.214:38662
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaSwLvFzLDOjhYWgIC0ufWThI8WEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-refo

INFO: Using proxy 203.172.185.122:55482
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAgLi979LEjhYWgIC0ufWThI8WEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-reform"%20OR%20"firearm%20reform"%20-filter%3Aretweets%20since%3A2016-11-10%20until%3A2016-11-11&l=en
INFO: Using proxy 194.183.168.129:31385
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAgLKxj8WnjxYWgMC79dXkqJAWEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-re

INFO: Using proxy 109.74.142.138:53281
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaCwLD98NiSjxYWgMC79dXkqJAWEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-reform"%20OR%20"firearm%20reform"%20-filter%3Aretweets%20since%3A2016-11-11%20until%3A2016-11-12&l=en
INFO: Using proxy 109.74.142.138:53281
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAwLDl8oOqjhYWgIC0ufWThI8WEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-refo

INFO: Using proxy 103.9.190.206:42726
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAwKPJgfaHjhYWgIC0ufWThI8WEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-reform"%20OR%20"firearm%20reform"%20-filter%3Aretweets%20since%3A2016-11-10%20until%3A2016-11-11&l=en
INFO: Using proxy 117.54.250.58:38508
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAgLT18YSFjhYWgIC0ufWThI8WEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-reform

INFO: Using proxy 158.69.62.238:3128
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAgLy1ibDrjRYWgIC0ufWThI8WEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-reform"%20OR%20"firearm%20reform"%20-filter%3Aretweets%20since%3A2016-11-10%20until%3A2016-11-11&l=en
INFO: Using proxy 34.68.165.19:80
INFO: Scraping tweets from https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaEwLeV9O_ojRYWgIC0ufWThI8WEjUAFQAlAFUAFQAA&q="gun%20ban"%20OR%20"gun%20control"%20OR%20"firearm%20control"%20OR%20"gun-control"%20OR%20"firearm-control"%20OR%20"gun%20reform"%20OR%20"gun-reform"%20OR%20"firearm-reform"%20O

1.279092220465342
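
The scraping cell covers one calendar month and the output file is named after that month, so the cell is presumably re-run with a different `begindate` and `enddate` for each month. Under that assumption, a small helper like the one below could generate the date pairs. `month_ranges` is a hypothetical name, not part of twitterscraper, and it uses only the standard library.

```python
import datetime


def month_ranges(start_year, start_month, n_months):
    """Yield (begin, end) date pairs, one per calendar month.

    `end` is the first day of the following month, mirroring the
    begindate/enddate values used in the scraping cell.
    """
    year, month = start_year, start_month
    for _ in range(n_months):
        begin = datetime.date(year, month, 1)
        # advance to the next month, rolling over into the next year after December
        if month == 12:
            year, month = year + 1, 1
        else:
            month += 1
        yield begin, datetime.date(year, month, 1)


ranges = list(month_ranges(2019, 11, 3))
for begin, end in ranges:
    print(begin, end)
```

Each pair could then be passed as `begindate` and `enddate` to the scraping call, with the output file named after `begin`.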


In [8]:
df = df.drop_duplicates()

In [9]:
# remove any twitter pic urls (dots escaped so they match literally)
df['tweet'] = [re.sub(r'pic\.twitter\.com\S+', '', post).strip() for post in df['tweet']]

# remove any http urls
df['tweet'] = [re.sub(r'http\S+', '', post).strip() for post in df['tweet']]

# instantiate the tokenizer (keeps runs of letters, digits, and '&')
tknr = RegexpTokenizer(r'[a-zA-Z&0-9]+')

# start with an empty list
tokens = []

# fill the list with the lowercased, tokenized version of each tweet
for post in df['tweet']:
    tokens.append(" ".join(tknr.tokenize(post.lower())))
df['tweet'] = tokens

# add a word count column
df['tweet_word_count'] = df['tweet'].apply(lambda post: len(post.split()))

# instantiate the VADER sentiment analyzer
# (requires the vader_lexicon resource, e.g. via nltk.download('vader_lexicon'))
sia = SentimentIntensityAnalyzer()

# return the VADER compound score for a piece of text
def get_compound(text):
    return sia.polarity_scores(text)['compound']

# add the compound score as a feature for the tweet column
df['compound'] = df['tweet'].map(get_compound)
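
To make the effect of the cleaning steps concrete, the sketch below applies the same sequence (strip picture links, strip URLs, lowercase, keep runs of letters, digits and `&`) to a single made-up tweet. It uses only the standard `re` module: `re.findall` stands in for `RegexpTokenizer` with the same pattern, and `clean_tweet` and the sample text are hypothetical.

```python
import re


def clean_tweet(raw):
    """Apply the notebook's cleaning steps to one tweet and return the cleaned text."""
    # drop Twitter picture links, then any remaining http/https links
    no_pics = re.sub(r'pic\.twitter\.com\S+', '', raw)
    no_urls = re.sub(r'http\S+', '', no_pics)
    # lowercase and keep only runs of letters, digits and '&'
    tokens = re.findall(r'[a-zA-Z&0-9]+', no_urls.lower())
    return " ".join(tokens)


# Made-up tweet, for illustration only
sample = "Gun-control debate heats up! https://t.co/abc123 pic.twitter.com/XyZ"
cleaned = clean_tweet(sample)
word_count = len(cleaned.split())

print(cleaned)     # gun control debate heats up
print(word_count)  # 5
```

One design point to be aware of: VADER treats capitalization and punctuation such as exclamation marks as intensity cues, so scoring the cleaned text may give somewhat different compound values than scoring the raw tweet.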

In [12]:
df = df.sort_values(by = 'time_stamp')

In [14]:
df.shape

(2400, 4)

In [15]:
# save the dataframe file
df.to_csv('../data/twitter_2019_11.csv', index=False)