# Case Study 2 : Data Science in Financial Data

**Recommended Readings:** 
* [Quantopian Tutorials](https://www.quantopian.com/tutorials/) 
* Please register an account in [Quantopian online notebook system](https://www.quantopian.com/notebooks/).
* Upload this file into the system and start working on your idea.


**NOTE**
* Please download your code (notebook file as an ipynb file) and include it in your submission.


# Problem: pick a data science problem that you plan to solve using Stock Price Data
* The problem should be important and interesting, which has a potential impact in some area.
* The problem should be solvable using the data available and data science solutions.

Please briefly describe in the following cell: what problem are you trying to solve? why this problem is important and interesting?

In [None]:
# We will solve the problem of not having more money, by writing an 
# algorithm which can optimize puchasing and selling gold stocks.

# Data Collection/Processing: 

In [71]:
# import relevant libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
# load the daily stock price of GOLD
gold_data = pd.read_csv("gold.csv")
# calculate continuous target feature
gold_data['Price Change'] = gold_data['Close'].pct_change().shift(-1)
# calculate discrete target features
gold_data['Price Increased'] = (gold_data['Price Change'] > 0).astype(int)
gold_data['Price Stayed'] = (gold_data['Price Change'] == 0).astype(int)
gold_data['Price Decreased'] = (gold_data['Price Change'] < 0).astype(int)
# calculate additional features
gold_data['Open Change'] = gold_data['Open'].pct_change()
gold_data['High Change'] = gold_data['High'].pct_change()
gold_data['Low Change'] = gold_data['Low'].pct_change()
gold_data['Close Change'] = gold_data['Close'].pct_change()
gold_data['Adj Close Change'] = gold_data['Adj Close'].pct_change()
gold_data['Volume Change'] = gold_data['Volume'].pct_change()
# TODO: Adjust for inflation
# Drop unnecesary rows and columns
gold_data.drop(['Date','Open','High','Low','Close','Adj Close','Volume'],axis=1,inplace=True)
gold_data.replace([np.inf,-np.inf],np.nan,inplace=True)
gold_data.dropna(inplace=True)
# extract neural network inputs
nn_input = gold_data.drop(['Price Change','Price Increased','Price Stayed','Price Decreased'],axis=1).values
nn_continuous_target = gold_data['Price Change'].values
nn_discrete_target = gold_data[['Price Increased','Price Stayed','Price Decreased']].values
# process into training, validation, and test sets
sample_num = nn_input.shape[0]
training_input = nn_input[:int(sample_num*0.5),:]
validation_input = nn_input[int(sample_num*0.5):int(sample_num*0.75),:]
testing_input = nn_input[int(sample_num*0.75):,:]
training_continuous_target = nn_continuous_target[:int(sample_num*0.5)]
validation_continuous_target = nn_continuous_target[int(sample_num*0.5):int(sample_num*0.75)]
testing_continuous_target = nn_continuous_target[int(sample_num*0.75):]
training_discrete_target = nn_discrete_target[:int(sample_num*0.5),:]
validation_discrete_target = nn_discrete_target[int(sample_num*0.5):int(sample_num*0.75),:]
testing_discrete_target = nn_discrete_target[int(sample_num*0.75):,:]
# prepare for normalizing data
input_scaler = StandardScaler()
training_input = input_scaler.fit_transform(training_input).reshape(training_input.shape[0],1,training_input.shape[1])
validation_input = input_scaler.transform(validation_input).reshape(validation_input.shape[0],1,validation_input.shape[1])
testing_input = input_scaler.transform(testing_input).reshape(testing_input.shape[0],1,testing_input.shape[1])
target_scaler = StandardScaler()
training_continuous_target = target_scaler.fit_transform(training_continuous_target.reshape(-1, 1))
validation_continuous_target = target_scaler.transform(validation_continuous_target.reshape(-1, 1))
testing_continuous_target = target_scaler.transform(testing_continuous_target.reshape(-1, 1))

inputs:  Index(['Open Change', 'High Change', 'Low Change', 'Close Change',
       'Adj Close Change', 'Volume Change'],
      dtype='object')


# Data Exploration: Exploring the Dataset

**plot the weekly returns of a set of stocks of your choice** 


# The Solution: implement a data science solution to the problem you are trying to solve.

Briefly describe the idea of your solution to the problem in the following cell:

In [None]:
# Naive Approach: Build a feed-forward neural network using last n days as input
# Convolutional Approach: Build a one-dimentional convolutional neural network using last n days as input
# Recurrent Approach: Build a neural network of long short-term memory nodes using current day as input

Write codes to implement the solution in python:

In [81]:
from keras.models import Sequential
from keras.layers import Dense, LSTM
from keras.callbacks import EarlyStopping

feature_num = training_input.shape[2]
callbacks = [EarlyStopping(monitor='val_loss',patience=5,restore_best_weights=False)]

continuous_lstm_model = Sequential()
continuous_lstm_model.add(LSTM(8,input_shape=(1,feature_num)))
continuous_lstm_model.add(Dense(1,activation='linear'))
continuous_lstm_model.compile(loss='mse',optimizer='adam',metrics=['accuracy'])
continuous_lstm_model.fit(training_input,training_continuous_target,validation_data=(validation_input,validation_continuous_target),epochs=1000,callbacks=callbacks)
print(continuous_lstm_model.evaluate(testing_input,testing_continuous_target))
'''
discrete_lstm_model = Sequential()
discrete_lstm_model.add(LSTM(8,input_shape=(1,feature_num)))
discrete_lstm_model.add(Dense(3,activation='softmax'))
discrete_lstm_model.compile(loss='categorical_crossentropy',optimizer='adam',metrics=['accuracy'])
discrete_lstm_model.fit(training_input,training_discrete_target,validation_data=(validation_input,validation_discrete_target),epochs=1000,callbacks=callbacks)
discrete_lstm_model.evaluate(testing_input,testing_discrete_target)
'''

SyntaxError: invalid syntax (<ipython-input-81-f89d005a6bf1>, line 15)

# Results: summarize and visualize the results discovered from the analysis

Please use figures, tables, or videos to communicate the results with the audience.



# Done

All set! 

**What do you need to submit?**

* **Notebook File**: Save this Jupyter notebook, and find the notebook file in your folder (for example, "filename.ipynb"). This is the file you need to submit. Please make sure all the plotted tables and figures are in the notebook. If you used "jupyter notebook --pylab=inline" to open the notebook, all the figures and tables should have shown up in the notebook.

* **PPT Slides**: please prepare PPT slides to present about the case study . Each team present their case studies in class for 7 minutes.

Please compress all the files in a zipped file.


**How to submit:**

        Please submit through Canvas, in the Assignment "Case Study 2".
        
**Note: Each team only needs to submit one submission in Canvas**