# RGR Stock Price Forecasting Project

Author: Jack Wang

---

## Problem Statement

Stock prices are hard to predict because they are affected not only by the performance of the underlying companies but also by the expectations of the general public. The stock prices of firearm companies are known to be highly correlated with public opinion on gun bans. My model aims to predict the stock price of one of the largest firearm companies in the United States, RGR (Sturm, Ruger & Co.), using its historical stock price and public opinion on gun bans.

## Executive Summary

The goal of my project is to build a **time series regression model** that predicts the stock price of RGR. The data consists of historical stock prices from Yahoo Finance, Twitter posts scraped from [twitter](https://twitter.com/), and news articles from major news websites. I will perform NLP on the text data and time series modeling on the historical stock price data. The model will be evaluated using the R^2 score.
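
As a minimal sketch of the evaluation metric named above, R^2 compares the model's squared error with the squared error of always predicting the mean. The helper name `r2_score` and the price values below are made up for illustration and are not real RGR data. The modeling notebook may compute the score differently, for example with a library function.

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of the mean baseline
    return 1.0 - ss_res / ss_tot

# Hypothetical closing prices and predictions (illustrative values only).
actual = [50.0, 51.0, 52.0, 53.0]
predicted = [50.5, 50.5, 52.5, 52.5]

score = r2_score(actual, predicted)
print(round(score, 3))  # 0.8
```

A score of 1 means perfect predictions, 0 means no better than predicting the mean, and negative values mean worse than the mean baseline.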

## Content

This project consists of the following Jupyter notebooks:
- Part-1-stock-price-data
- Part-2-twitter-scraper
- Part-3-twitter-data-cleaning
- Part-4-reddit-data-scraper
- ***Part-5-reddit-data-cleaning***
- Part-4-combined-data-and-EDA
- Part-5-modeling
    - [Example](#Most-Frequent-Words-in-Title-and-Content)
- Part-6-Conclusion-and-Discussion


---


In [165]:
import pandas as pd
import numpy as np
# import matplotlib.pyplot as plt
# import itertools
# import re

from datetime import datetime
# from nltk.tokenize import RegexpTokenizer
# from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [166]:
df = pd.read_csv("../data/SEC.csv")

In [167]:
df['Filing Date']= pd.to_datetime(df['Filing Date'])

In [168]:
# 'Filing Date' is already datetime after the previous cell; keep only the calendar date
df['date'] = df['Filing Date'].dt.date

In [169]:
df['Filings'].value_counts()

8-K         44
SC 13G/A    14
10-Q        11
UPLOAD       6
PX14A6G      5
CORRESP      4
DEF 14A      3
DEFA14A      3
10-K         3
SD           3
SC 13G       2
S-8 POS      1
S-8          1
Name: Filings, dtype: int64

In [170]:
# Keep only current reports (8-K), annual reports (10-K), and quarterly reports (10-Q)
df = df.loc[df['Filings'].isin(['8-K', '10-K', '10-Q']), :]

In [171]:
df = pd.get_dummies(df,columns=['Filings'])

In [172]:
df = df.reset_index(drop=True)

In [173]:
# Keep rows 5 through 53 by position; the bounds are hard-coded for this dataset
df = df[5:54].copy()

In [175]:
df = df.reset_index(drop=True)

In [178]:
df = df.drop(columns='Filing Date')

In [182]:
# Short labels; relies on the column order date, Filings_10-K, Filings_10-Q, Filings_8-K
df.columns = ['date', '10-k', '10-q', '8-k']

In [195]:
df = df.groupby('date').sum()

In [197]:
df['date'] = df.index

In [199]:
df.to_csv("../data/sec_data.csv",index=False)
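
The cells above one-hot encode the filing type and then sum the indicator columns per day, which gives a daily count of each filing type. Here is a minimal sketch of that idea on a toy table. The dates and rows are made up and are not taken from `SEC.csv`, and the variable names `toy` and `daily` are illustrative only.

```python
import pandas as pd

# Toy filings table with made-up dates (hypothetical, not the real SEC.csv).
toy = pd.DataFrame({
    "Filing Date": ["2019-02-20", "2019-02-20", "2019-05-01", "2019-07-31"],
    "Filings": ["8-K", "10-K", "10-Q", "8-K"],
})

# Parse the timestamps and keep only the calendar date.
toy["date"] = pd.to_datetime(toy["Filing Date"]).dt.date

# One indicator column per filing type, summed per day.
daily = pd.get_dummies(toy["Filings"], dtype=int).groupby(toy["date"]).sum()

print(daily)
```

Each row of `daily` is one filing date and each column is a filing type, so a day with both an 8-K and a 10-K shows a 1 in both columns.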