# Stock Price Prediction

### Task
* Try changing the model and hyperparameters to improve accuracy
* Try the various techniques suggested below
* The main flow of this code follows this [blog post](https://medium.com/@aniruddha.choudhury94/stock-market-prediction-by-recurrent-neural-network-on-lstm-model-56de700bff68)
* Download stock data yourself and make real predictions
  * Train data: January 1, 2017 to December 31, 2018
  * Test data: January 1, 2019 to February 28, 2019
  * Predict the close price (the baseline uses the open price)

### Dataset
* [Yahoo finance datasets](https://finance.yahoo.com/)
* Use 2-3 years of daily stock price data to predict stock prices for the following month

### Baseline code
* Dataset: train, test로 split
* Input data shape: (`batch_size`, `past_day`=60, 1)
* Output data shape: (`batch_size`, 1)
* Architecture: 
  * `LSTM` - `Dense`
  * Uses [`tf.keras.layers`](https://www.tensorflow.org/api_docs/python/tf/keras/layers)
* Training
  * Uses `model.fit`
* Evaluation
  * Uses `model.evaluate` on the test dataset

### Try some techniques
* Change model architectures (Custom model)
  * Use other cells (LSTM, GRU, etc.)
  * Use dropout layers
  * Change the `past_day`
* Data augmentation (if possible)
* Try Early stopping
* Use various features (open, high, low, close prices and volume features)

## Import modules

In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import sys
import time

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from IPython import display

import tensorflow as tf
from tensorflow.keras import layers
tf.enable_eager_execution()

tf.logging.set_verbosity(tf.logging.INFO)

os.environ["CUDA_VISIBLE_DEVICES"]="0"

## Data Download

* We can download daily stock prices using the `fix_yahoo_finance` library.
* Some NASDAQ stock ticker symbols ([link](http://eoddata.com/symbols.aspx))
  * `AAPL`: Apple Inc.
  * `AMZN`: Amazon.com Inc.
  * `GOOG`: Alphabet Class C (Google)
  * `MSFT`: Microsoft Corp.

In [None]:
import fix_yahoo_finance as yf

dataset = yf.download(tickers='AAPL', start='2017-01-01', end='2019-01-01', auto_adjust=True)

In [None]:
dataset.head()

### Data Preprocessing

1. Data discretization: Part of data reduction but with particular importance, especially for numerical data
2. Data transformation: Normalization.
3. Data cleaning: Fill in missing values.
4. Data integration: Integration of data files.

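After the dataset has been cleaned, it is divided into training and testing sets so that the model can be evaluated on data it has not seen. The training series is then arranged into a data structure with 60 timesteps and 1 output: each sample holds the previous 60 days of prices, and the target is the price on the following day.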
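
To make the "60 timesteps and 1 output" structure concrete, here is a toy sketch with a window of 3 instead of 60. The series and the window length are assumptions chosen for readability, not values from the notebook.

```python
import numpy as np

series = np.arange(6, dtype=np.float32)  # toy series: 0, 1, 2, 3, 4, 5
past_days = 3                            # 60 in the notebook; 3 here for readability

# Each row of X is a window of past values; y is the value that follows it
X = np.array([series[i - past_days:i] for i in range(past_days, len(series))])
y = series[past_days:]

print(X)
# [[0. 1. 2.]
#  [1. 2. 3.]
#  [2. 3. 4.]]
print(y)  # [3. 4. 5.]

# Recurrent layers expect (samples, timesteps, features), so add a feature axis
X = X[..., np.newaxis]
print(X.shape)  # (3, 3, 1)
```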

In [None]:
#Data cleaning
dataset.isna().any()

In [None]:
dataset.info()

In [None]:
dataset['Open'].plot(figsize=(16, 6))
dataset.Close.plot(figsize=(16, 6))
plt.show()

In [None]:
# convert column to float type when column type is an object
#dataset["Close"] = dataset["Close"].str.replace(',', '').astype(float)

In [None]:
# 7 day rolling mean
dataset.rolling(7).mean().head(20)

In [None]:
dataset['Close: 30 Day Mean'] = dataset.Close.rolling(window=30).mean()
dataset[['Close', 'Close: 30 Day Mean']].plot(figsize=(16, 6))
plt.show()
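
To spell out what `rolling(window).mean()` computes, here is a NumPy-only sketch of a trailing moving average. `rolling_mean` is a hypothetical helper written for illustration; it mirrors the pandas default of returning NaN until a full window is available.

```python
import numpy as np

def rolling_mean(values, window):
    """Trailing moving average; the first window-1 entries are NaN."""
    values = np.asarray(values, dtype=float)
    out = np.full(len(values), np.nan)
    if len(values) >= window:
        # Differences of cumulative sums give the sum over each window
        csum = np.cumsum(np.insert(values, 0, 0.0))
        out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

smoothed = rolling_mean([1, 2, 3, 4, 5], window=3)
print(smoothed)  # two NaNs, then 2.0, 3.0, 4.0
```

This is why the 30-day mean plotted above starts later than the raw close price: the first 29 rows have no complete window.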

### Make a training dataset

In [None]:
train_data = dataset['Open']
train_data = pd.DataFrame(train_data)

In [None]:
# Feature Scaling
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0, 1))
train_data_scaled = sc.fit_transform(train_data)
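
For reference, min-max scaling to the range (0, 1) maps each value `x` to `(x - min) / (max - min)`. The sketch below uses made-up prices to show that mapping and its inverse. It also shows why the minimum and maximum should come from the training data only: test values outside the training range then fall outside [0, 1], which is expected.

```python
import numpy as np

train = np.array([100.0, 120.0, 110.0, 150.0])  # toy training prices (hypothetical)
test = np.array([140.0, 160.0])                 # toy test prices (hypothetical)

lo, hi = train.min(), train.max()               # statistics from the training set only

def scale(x):
    """Map prices to the training-set (0, 1) range."""
    return (x - lo) / (hi - lo)

def unscale(z):
    """Map scaled values back to prices."""
    return z * (hi - lo) + lo

print(scale(train))           # 0.0, 0.4, 0.2, 1.0
print(scale(test))            # 0.8, 1.2 (above the training max, so above 1)
print(unscale(scale(test)))   # back to 140.0, 160.0
```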

In [None]:
plt.figure(figsize=(16, 6))
plt.plot(train_data_scaled)
plt.show()

In [None]:
# Creating a data structure with 60 timesteps and 1 output
past_days = 60
X_train = []
y_train = []
for i in range(past_days, len(train_data_scaled)):
  X_train.append(train_data_scaled[i-past_days:i, 0])
  y_train.append(train_data_scaled[i, 0])
X_train, y_train = np.array(X_train), np.array(y_train)

# Reshaping
X_train = np.reshape(X_train, (X_train.shape[0], X_train.shape[1], 1))

# Cast
X_train = X_train.astype(np.float32)
y_train = y_train.astype(np.float32)

In [None]:
# We predict the next day's price given the prices of the past 60 days
print(X_train.shape)
print(y_train.shape)

## Build a model

In [None]:
model = tf.keras.Sequential()

In [None]:
# Adding the first LSTM layer and some Dropout regularisation
model.add(layers.LSTM(units = 50, return_sequences = True, input_shape = (X_train.shape[1], 1)))
model.add(layers.Dropout(0.2))

# Adding a second LSTM layer and some Dropout regularisation
model.add(layers.LSTM(units = 50))
model.add(layers.Dropout(0.2))

# Adding the output layer
model.add(layers.Dense(units = 1))

In [None]:
# Check for model
model(X_train[0:2])[0]

In [None]:
# Compiling the RNN
model.compile(optimizer=tf.train.AdamOptimizer(0.001),
              loss='mean_squared_error')

In [None]:
# Fitting the RNN to the Training set
model.fit(X_train, y_train, epochs=10, batch_size=32)
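
Early stopping, one of the techniques suggested at the top, halts training once the validation loss stops improving. In Keras this is available as the `tf.keras.callbacks.EarlyStopping` callback passed to `model.fit`. The framework-free sketch below only illustrates the patience rule; `early_stopping_epoch` and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0   # improvement: remember it and reset the counter
        else:
            wait += 1              # no improvement this epoch
            if wait >= patience:
                return epoch
    return len(val_losses) - 1     # never triggered: all epochs ran

# Made-up validation losses: the best value (0.35) is reached at epoch 2
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.30]
print(early_stopping_epoch(losses, patience=3))  # 5
```

Note that with `patience=3` the later improvement at epoch 6 is never seen, which is the usual trade-off when choosing the patience value.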

## Performance on Test-Set

Now that the model has been trained we can calculate its mean squared error on the test-set.

In [None]:
# Part 3 - Making the predictions and visualising the results

# Getting the real stock price for January 2019 (same ticker as the training data)
dataset_test = yf.download(tickers='AAPL', start='2019-01-01', end='2019-02-01', auto_adjust=True)

In [None]:
dataset_test.head()

In [None]:
dataset_test.info()

In [None]:
test_data = dataset_test['Open']
test_data = pd.DataFrame(test_data)

In [None]:
test_data.info()

In [None]:
# Feature Scaling: reuse the scaler fitted on the training data
# (refitting on the test set would put test inputs on a different scale than the model was trained on)
test_set_scaled = sc.transform(test_data)

In [None]:
test_set_scaled = pd.DataFrame(test_set_scaled)
test_set_scaled.head()

In [None]:
# Getting the predicted stock price for January 2019
dataset_total = pd.concat((dataset['Open'], dataset_test['Open']), axis = 0)
inputs = dataset_total[len(dataset_total) - len(dataset_test) - past_days:].values
inputs = inputs.reshape(-1,1)
inputs = sc.transform(inputs)
X_test = []
y_test = []

prediction_days = dataset_test.shape[0]
for i in range(past_days, past_days + prediction_days):
  X_test.append(inputs[i-past_days:i, 0])
  y_test.append(inputs[i, 0])
  
X_test, y_test = np.array(X_test), np.array(y_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))

# Cast
X_test = X_test.astype(np.float32)
y_test = y_test.astype(np.float32)

In [None]:
predicted_stock_price = model.predict(X_test)
predicted_stock_price = sc.inverse_transform(predicted_stock_price)

In [None]:
# Visualising the results
real_stock_price = dataset_test.Open.values
plt.plot(real_stock_price, color='red', label='Real Stock Price')
plt.plot(predicted_stock_price, color='blue', label='Predicted Stock Price')
plt.title('Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()

### Evaluate on the test dataset

In [None]:
%%time
result = model.evaluate(X_test, y_test)

In [None]:
# The loss is a mean squared error on scaled prices, not a percentage
print("Mean Squared Error (scaled prices): {0:.6f}".format(result))
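
Because `y_test` is min-max scaled, the value reported by `model.evaluate` is in scaled units. The sketch below uses made-up numbers to show how the mean squared error is computed by hand and how a root mean squared error can be converted back to price units by multiplying by the training price range. `price_range` and the arrays are assumptions for illustration.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """MSE after flattening, so shapes (n,) and (n, 1) do not broadcast to (n, n)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))

y_true_scaled = np.array([0.50, 0.60, 0.70])
y_pred_scaled = np.array([[0.40], [0.60], [0.90]])  # predictions come back with shape (n, 1)

mse_scaled = mean_squared_error(y_true_scaled, y_pred_scaled)

# Hypothetical (max - min) of the training prices used by the scaler
price_range = 50.0
rmse_price = np.sqrt(mse_scaled) * price_range

print("MSE (scaled): {:.6f}".format(mse_scaled))
print("RMSE (price units): {:.2f}".format(rmse_price))
```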