# RGR Stock Price Forecasting Project - Part 6

Author: Jack Wang

---

## Problem Statement

Stock prices are hard to predict because they are affected not only by the performance of the underlying companies but also by the expectations of the general public. The stock prices of firearm companies, in particular, are known to be highly correlated with public opinion on gun control. My model intends to predict the stock price of RGR (Sturm, Ruger & Co.), one of the largest firearm companies in the United States, using its historical stock price, public opinion on gun control, and its financial reports to the SEC.

## Executive Summary

The goal of my project is to build a **time series regression model** that predicts the stock price of RGR. The data I am using are the historical stock price from [Yahoo Finance](https://finance.yahoo.com/quote/RGR/history?p=RGR), posts scraped from [Twitter](https://twitter.com/), subreddit posts that mention gun control, and the financial reports filed with the [SEC](https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000095029&type=&dateb=&owner=exclude&count=100). I will run sentiment analysis on the text data and time series modeling on the historical stock price data. The model will be evaluated using mean squared error (MSE).
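For reference, MSE is the average of the squared differences between actual and predicted values. The snippet below is a minimal sketch of that calculation; the helper name `mean_squared_error` and the price values are made up for illustration and are not taken from the project data.

```python
import numpy as np


def mean_squared_error(y_true, y_pred):
    """Return the mean of the squared differences between two sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Hypothetical closing prices, for illustration only
actual = [50.0, 51.0, 49.0]
predicted = [49.0, 51.0, 51.0]

mse = mean_squared_error(actual, predicted)
print(mse)
```

Because the errors are squared, a few large misses weigh more heavily than many small ones.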

## Content

This project consists of 7 Jupyter notebooks:
- Part-1-stock-price-data
- Part-2-twitter-scraper
- Part-3-twitter-data-cleaning
- Part-4-reddit-data-scraper
- Part-5-reddit-data-cleaning
- ***Part-6-sec-data-cleaning***
- Part-7-modeling-and-evaluation


---


**All public companies in the US are required to submit financial statements and public announcements to the SEC, so I collected RGR's reports (10-K, 10-Q, 8-K) from the [SEC](https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=0000095029&type=&dateb=&owner=exclude&count=100) and incorporated them into the model.**

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime

In [3]:
df = pd.read_csv("../data/sec/SEC.csv")

In [4]:
# Convert datetime

df['Filing Date'] = pd.to_datetime(df['Filing Date'])
df['date'] = df['Filing Date'].dt.date

# Select only 8K, 10K, and 10Q

df = df.loc[df['Filings'].isin(['8-K', '10-K', '10-Q']), :]

# Get dummies

df = pd.get_dummies(df,columns=['Filings'])
df = df.reset_index(drop=True)
# Keep rows 5 through 53 (positional slice)
df = df.iloc[5:54].copy()
df = df.reset_index(drop=True)
df = df.drop(columns='Filing Date')

# Rename
df.columns = ['date', '10-k', '10-q', '8-k']
df = df.groupby('date').sum()

# Include date column

df['date'] = df.index
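
The cell above turns the list of filings into one row per date with a count for each form type. The snippet below is a minimal sketch of the same idea on a toy table. Its rows are invented for illustration and do not come from `SEC.csv`, and the `"4"` row stands in for any form type outside the three of interest.

```python
import pandas as pd

# Made-up filings for illustration only; not taken from SEC.csv
toy = pd.DataFrame({
    "Filings": ["8-K", "10-Q", "8-K", "4", "10-K"],
    "Filing Date": ["2019-05-01", "2019-05-01", "2019-05-03",
                    "2019-05-03", "2019-02-20"],
})

# Parse the filing date and keep only the calendar date
toy["date"] = pd.to_datetime(toy["Filing Date"]).dt.date

# Keep only 8-K, 10-K, and 10-Q filings
toy = toy.loc[toy["Filings"].isin(["8-K", "10-K", "10-Q"])]

# One indicator column per form type
dummies = pd.get_dummies(toy["Filings"], dtype=int)

# Sum the indicators per date to get daily filing counts
daily = pd.concat([toy[["date"]], dummies], axis=1).groupby("date").sum()

# Move the date from the index back into a regular column
daily = daily.reset_index()
print(daily)
```

Using `reset_index()` keeps `date` as the first column, whereas assigning `df['date'] = df.index` appends it as the last one. Either way the date survives a later `to_csv(..., index=False)`.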

In [16]:
df.to_csv("../data/sec/sec_data.csv",index=False)