# Stock prices dataset
The data contains a stock exchange's listings for each trading day from 2010 to 2016.

## Description
A brief description of columns.
- open: The opening market price of the equity symbol on the date
- high: The highest market price of the equity symbol on the date
- low: The lowest recorded market price of the equity symbol on the date
- close: The closing recorded price of the equity symbol on the date
- symbol: Symbol of the listed company
- volume: Total traded volume of the equity symbol on the date
- date: Date of record

In this assignment, we will work on the stock prices dataset named "prices.csv". The task is to create a neural network that predicts the closing price of a stock from its other price columns.

In [245]:
# Initialize the random number generator
import random
random.seed(0)

# Ignore the warnings
import warnings
warnings.filterwarnings("ignore")

## Question 1

### Load the data
- load the csv file and read it using pandas
- file name is prices.csv

In [29]:
# run this cell to upload file using GUI if you are using google colab

from google.colab import files
files.upload()

ModuleNotFoundError: No module named 'google.colab'

In [None]:
# run this cell to mount Google Drive if you are using Google Colab

from google.colab import drive
drive.mount('/content/drive')

In [246]:
import pandas as pd
data = pd.read_csv("prices.csv")

## Question 2

### Drop null
- Drop null values if any

In [247]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 851264 entries, 0 to 851263
Data columns (total 7 columns):
date      851264 non-null object
symbol    851264 non-null object
open      851264 non-null float64
close     851264 non-null float64
low       851264 non-null float64
high      851264 non-null float64
volume    851264 non-null float64
dtypes: float64(5), object(2)
memory usage: 45.5+ MB


In [248]:
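# Note: count() reports the rows per column, not the nulls; use data.isna().sum() for the null count.
# data.info() above already shows that every column is fully non-null.
data.isna().count()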

date      851264
symbol    851264
open      851264
close     851264
low       851264
high      851264
volume    851264
dtype: int64
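
The difference between counting rows and counting nulls can be seen on a small frame. The sketch below uses a made-up three-row frame (the name `toy` and its values are illustrative and do not come from prices.csv): `count()` on the boolean mask counts every row, because `True` and `False` are both non-null, while `sum()` counts only the `True` entries.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value in "open".
toy = pd.DataFrame({"open": [1.0, np.nan, 3.0], "close": [1.5, 2.5, 3.5]})

rows_per_column = toy.isna().count()   # counts rows, whether null or not
nulls_per_column = toy.isna().sum()    # counts True entries, i.e. nulls

print(rows_per_column.to_dict())
print(nulls_per_column.to_dict())
```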

In [249]:
data = data.dropna()   # dropna() returns a new frame, so assign the result back
data.head(10)

Unnamed: 0,date,symbol,open,close,low,high,volume
0,2016-01-05 00:00:00,WLTW,123.430000,125.839996,122.309998,126.250000,2163600.0
1,2016-01-06 00:00:00,WLTW,125.239998,119.980003,119.940002,125.540001,2386400.0
2,2016-01-07 00:00:00,WLTW,116.379997,114.949997,114.930000,119.739998,2489500.0
3,2016-01-08 00:00:00,WLTW,115.480003,116.620003,113.500000,117.440002,2006300.0
4,2016-01-11 00:00:00,WLTW,117.010002,114.970001,114.089996,117.330002,1408600.0
5,2016-01-12 00:00:00,WLTW,115.510002,115.550003,114.500000,116.059998,1098000.0
6,2016-01-13 00:00:00,WLTW,116.459999,112.849998,112.589996,117.070000,949600.0
7,2016-01-14 00:00:00,WLTW,113.510002,114.379997,110.050003,115.029999,785300.0
8,2016-01-15 00:00:00,WLTW,113.330002,112.529999,111.919998,114.879997,1093700.0
9,2016-01-19 00:00:00,WLTW,113.660004,110.379997,109.870003,115.870003,1523500.0


### Drop columns
- We no longer need the "date", "volume" and "symbol" columns
- Drop the "date", "volume" and "symbol" columns from the data


In [250]:
df = data.drop(columns=["date", "volume", "symbol"])

## Question 3

### Print the dataframe
- print the modified dataframe

In [251]:
df

Unnamed: 0,open,close,low,high
0,123.430000,125.839996,122.309998,126.250000
1,125.239998,119.980003,119.940002,125.540001
2,116.379997,114.949997,114.930000,119.739998
3,115.480003,116.620003,113.500000,117.440002
4,117.010002,114.970001,114.089996,117.330002
5,115.510002,115.550003,114.500000,116.059998
6,116.459999,112.849998,112.589996,117.070000
7,113.510002,114.379997,110.050003,115.029999
8,113.330002,112.529999,111.919998,114.879997
9,113.660004,110.379997,109.870003,115.870003


### Get features and label from the dataset in separate variable
- Let's separate labels and features now. We are going to predict the value for "close" column so that will be our label. Our features will be "open", "low", "high"
- Take "open", "low", "high" columns as features
- Take "close" column as label

In [252]:
X = df.drop("close",axis="columns")
X.head()

Unnamed: 0,open,low,high
0,123.43,122.309998,126.25
1,125.239998,119.940002,125.540001
2,116.379997,114.93,119.739998
3,115.480003,113.5,117.440002
4,117.010002,114.089996,117.330002


In [253]:
y = df[["close"]]
y.head()

Unnamed: 0,close
0,125.839996
1,119.980003
2,114.949997
3,116.620003
4,114.970001


## Question 4

### Create train and test sets
- Split the data into training and testing

In [254]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)
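
As a sanity check on the split, the expected set sizes can be worked out by hand. This is a small sketch assuming scikit-learn's rule of rounding a fractional test size up and giving the remaining rows to training; the row count comes from the `data.info()` output, and the variable names are illustrative.

```python
import math

n_samples = 851264      # row count reported by data.info()
test_size = 0.30

# A fractional test size is rounded up; training gets the remaining rows.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)
```

These figures should agree with the array shapes printed later in the notebook.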

## Question 5

### Scaling
- Scale the data (features only)
- Use StandardScaler

In [255]:
from sklearn.preprocessing import StandardScaler

# Define the scaler 
scaler = StandardScaler().fit(X_train)

# Scale the train set
X_train = scaler.transform(X_train)

# Scale the test set
X_test = scaler.transform(X_test)
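
What the scaler does can be reproduced with plain NumPy. The sketch below uses made-up feature rows (not values from prices.csv) and illustrates the key point of the cell above: the mean and standard deviation are estimated from the training rows only and then reused for the test rows.

```python
import numpy as np

# Hypothetical feature rows (open, low, high); not taken from prices.csv.
train = np.array([[10.0, 9.0, 11.0],
                  [20.0, 19.0, 22.0],
                  [30.0, 28.0, 33.0]])
test = np.array([[25.0, 24.0, 26.0]])

# Statistics come from the training rows only.
mean = train.mean(axis=0)
std = train.std(axis=0)      # population std (ddof=0), as StandardScaler uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std   # reuse the training statistics

print(train_scaled.mean(axis=0))    # close to 0 for every column
print(train_scaled.std(axis=0))     # close to 1 for every column
```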

## Question 6

### Convert data to NumPy array
- Convert features and labels to numpy array

In [256]:
import numpy as np

In [257]:
y_train=np.array(y_train)
y_train

array([[62.98    ],
       [10.4     ],
       [73.060003],
       ...,
       [51.779999],
       [47.900002],
       [73.989998]])

In [258]:
y_test = np.array(y_test)
y_test

array([[56.990002],
       [12.42    ],
       [10.92    ],
       ...,
       [51.59    ],
       [69.260002],
       [73.160004]])

In [259]:
X_train.shape

(595884, 3)

In [260]:
X_test.shape

(255380, 3)

In [261]:
y_train.shape

(595884, 1)

In [262]:
y_test.shape

(255380, 1)

In [263]:
X_train = X_train.reshape(X_train.shape[0],X_train.shape[1],1)
print("X_train",X_train)
X_test = X_test.reshape(X_test.shape[0],X_test.shape[1],1)
print("X_test",X_test)

X_train [[[-0.09066067]
  [-0.08749153]
  [-0.09026979]]

 [[-0.72185845]
  [-0.72071592]
  [-0.72065945]]

 [[ 0.03814548]
  [ 0.03322552]
  [ 0.03312362]]

 ...

 [[-0.2507178 ]
  [-0.24636758]
  [-0.23132448]]

 [[-0.28256286]
  [-0.27672687]
  [-0.2747712 ]]

 [[ 0.04788915]
  [ 0.04282527]
  [ 0.03971722]]]
X_test [[[-0.1687286 ]
  [-0.16248968]
  [-0.16892133]]

 [[-0.69916289]
  [-0.69911645]
  [-0.69605141]]

 [[-0.71223363]
  [-0.71207613]
  [-0.71218204]]

 ...

 [[-0.2346764 ]
  [-0.23616786]
  [-0.23473901]]

 [[-0.01128568]
  [-0.01105338]
  [-0.00902798]]

 [[ 0.03125365]
  [ 0.03322547]
  [ 0.02264461]]]
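
The reshape above only adds a trailing axis of length 1; no values change. A minimal NumPy sketch on a made-up 4x3 matrix (names are illustrative) shows the shape change and that flattening each sample recovers the original matrix:

```python
import numpy as np

# Hypothetical scaled feature matrix: 4 samples, 3 features.
features_2d = np.arange(12, dtype=float).reshape(4, 3)

# Same call pattern as the notebook: add a trailing axis of length 1.
features_3d = features_2d.reshape(features_2d.shape[0], features_2d.shape[1], 1)

# Flattening everything except the sample axis recovers the original matrix.
flattened = features_3d.reshape(features_3d.shape[0], -1)

print(features_3d.shape, flattened.shape)
```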


## Question 7

### Define Model
- Initialize a Sequential model
- Add a Flatten layer
- Add a Dense layer with one neuron as output
  - add 'linear' as activation function


In [264]:
# Using TensorFlow Keras instead of the standalone Keras package
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Flatten

# Define the model architecture
model = Sequential()

# Flatten each (3, 1) sample into a vector of 3 features
model.add(Flatten())

# Output layer: one neuron with a linear activation for regression
model.add(Dense(1, activation='linear'))
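
A Flatten layer followed by a single linear Dense neuron amounts to a linear model, `x @ kernel + bias`. The NumPy sketch below mimics that forward pass; the `kernel` and `bias` values are hypothetical stand-ins for the weights Keras would learn, and the batch is random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch shaped like the notebook's input: (samples, 3, 1).
batch = rng.normal(size=(5, 3, 1))

# Hypothetical parameters standing in for the Dense layer's kernel and bias.
kernel = np.array([[0.2], [0.3], [0.5]])   # shape (3, 1)
bias = np.array([1.0])                     # shape (1,)

flat = batch.reshape(batch.shape[0], -1)   # what Flatten does: (5, 3)
prediction = flat @ kernel + bias          # linear activation: no nonlinearity

print(prediction.shape)
```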

## Question 8

### Compile the model
- Compile the model
- Use "sgd" optimizer
- for calculating loss, use mean squared error

In [265]:
model.compile(loss='mean_squared_error',
              optimizer='sgd')
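
To make the loss and optimizer choice concrete, here is a rough NumPy sketch of mean squared error and a plain gradient-descent update for a linear model. It is an illustration under assumptions, not Keras's implementation: the data is synthetic, the helper names `mse` and `sgd_step` are made up, and the same batch is reused at every step for simplicity. The learning rate of 0.01 matches the Keras default for `sgd`.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return float(np.mean((y_true - y_pred) ** 2))

def sgd_step(x, y, w, b, lr=0.01):
    """One gradient-descent update for the linear model x @ w + b on one batch."""
    error = x @ w + b - y                 # shape (batch, 1)
    grad_w = 2.0 * x.T @ error / len(x)   # d(MSE)/dw
    grad_b = 2.0 * error.mean()           # d(MSE)/db
    return w - lr * grad_w, b - lr * grad_b

# Hypothetical batch generated from a known linear rule.
rng = np.random.default_rng(0)
x = rng.normal(size=(128, 3))
y = x @ np.array([[1.0], [2.0], [3.0]]) + 0.5

w, b = np.zeros((3, 1)), 0.0
loss_before = mse(y, x @ w + b)
for _ in range(200):
    w, b = sgd_step(x, y, w, b)
loss_after = mse(y, x @ w + b)

print(loss_before, loss_after)
```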

## Question 9

### Fit the model
- epochs: 50
- batch size: 128
- specify validation data

In [266]:
history = model.fit(X_train, y_train, validation_data = (X_test,y_test),epochs=50, batch_size=128)

Train on 595884 samples, validate on 255380 samples
Epoch 1/50
...
Epoch 50/50
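
For a sense of how much work this run involves, the number of weight updates follows from the training-set size, batch size and epoch count. A small arithmetic sketch (variable names are illustrative; the sample count comes from the fit log above):

```python
import math

n_train = 595884    # training rows, from the fit log
batch_size = 128
epochs = 50

steps_per_epoch = math.ceil(n_train / batch_size)   # the last batch is partial
last_batch = n_train - (steps_per_epoch - 1) * batch_size
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, last_batch, total_updates)
```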


## Question 10

### Evaluate the model
- Evaluate the model on test data

In [267]:
model.evaluate(X_test, y_test)

0.7341381271683308
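
Since the model was compiled with mean squared error and no extra metrics, the value above is the test MSE in squared price units. Taking the square root gives an error in the same units as the closing price, which may be easier to read; a minimal sketch:

```python
import math

test_mse = 0.7341381271683308   # value returned by model.evaluate above
test_rmse = math.sqrt(test_mse)

print(round(test_rmse, 3))
```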

In [268]:
y_pred=model.predict(X_test)
y_pred

array([[56.983585],
       [12.3386  ],
       [11.129799],
       ...,
       [51.156464],
       [70.065094],
       [73.29221 ]], dtype=float32)

### Manual predictions
- Test the predictions on manual inputs
- We scaled our training data, so custom inputs must be transformed with the same fitted scaler object
- Example of manual input: [123.430000, 122.30999, 116.250000]

In [278]:
input_data = [[123.430000, 122.30999, 116.250000]]
scaled_input = scaler.transform(input_data)
# Match the (samples, features, 1) shape the model was trained on
scaled_input = scaled_input.reshape(scaled_input.shape[0], scaled_input.shape[1], 1)
model.predict(scaled_input)

array([[119.648895]], dtype=float32)

# Build a DNN

### Collect Fashion-MNIST data from tf.keras.datasets

In [232]:
%matplotlib inline


import tensorflow as tf

### Change train and test labels into one-hot vectors

In [292]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

number = 10 


In [280]:
import numpy as np
np.unique(y_train)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8)

In [281]:
y_train = tf.keras.utils.to_categorical(y_train, number)   # Convert the integer class labels to categorical form
y_test = tf.keras.utils.to_categorical(y_test, number)     # Keras converts each label into a one-hot encoded vector

print('Train size:', x_train.shape[0])
print('Test size:', x_test.shape[0])

Train size: 60000
Test size: 10000


In [282]:
y_train                 #One hot encoded vector

array([[0., 0., 0., ..., 0., 0., 1.],
       [1., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [1., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)

In [283]:
y_test

array([[0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 1., ..., 0., 0., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 1., 0.],
       [0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]], dtype=float32)
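
One-hot encoding can be reproduced in NumPy by indexing an identity matrix with the labels. The sketch below is an illustration, not how `to_categorical` is implemented; the helper name `one_hot` and the three labels are made up.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return a float32 matrix with a single 1 per row at the label's index."""
    return np.eye(num_classes, dtype=np.float32)[labels]

labels = np.array([9, 0, 2], dtype=np.uint8)   # hypothetical class ids
encoded = one_hot(labels, 10)

print(encoded[0])
print(encoded.argmax(axis=1))   # argmax undoes the encoding
```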

### Build the Graph

### Initialize model, reshape & normalize data

In [284]:
model = Sequential()
model.add(tf.keras.layers.Reshape((784,), input_shape = (28,28,)))
model.add(tf.keras.layers.BatchNormalization())
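
The two layers above flatten each 28x28 image into a 784-element vector and then normalize every pixel position across the batch. The NumPy sketch below approximates the training-time behaviour on random data, assuming a scale of 1, an offset of 0 and the Keras default epsilon of 1e-3. At inference time the layer uses moving averages instead of batch statistics, which this sketch does not model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch of 8 grayscale images with pixel values in 0..255.
images = rng.integers(0, 256, size=(8, 28, 28)).astype(np.float32)

flat = images.reshape(len(images), 784)        # what Reshape((784,)) does

# Training-time batch normalization with scale 1 and offset 0.
epsilon = 1e-3
mean = flat.mean(axis=0)
var = flat.var(axis=0)
normalized = (flat - mean) / np.sqrt(var + epsilon)

print(flat.shape, normalized.mean(), normalized.std())
```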

### Add two fully connected layers with 200 and 100 neurons respectively with `relu` activations. Add a dropout layer with `p=0.25`

In [285]:
from tensorflow.keras.layers import Dropout

# First hidden layer: 200 neurons, each taking the 784-element input vector
model.add(Dense(200, activation='relu'))

# Second hidden layer: 100 neurons
model.add(Dense(100, activation='relu'))

# Drop 25% of the second layer's outputs during training
model.add(Dropout(0.25))
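
As a rough picture of what a dropout rate of 0.25 means during training, the sketch below zeroes each unit with probability 0.25 and rescales the survivors so the expected activation is unchanged (often called inverted dropout). The helper name `dropout` and the all-ones input are illustrative; at inference time the layer passes values through unchanged.

```python
import numpy as np

def dropout(activations, rate, rng):
    """Zero each unit with probability `rate` and rescale the survivors."""
    keep = rng.random(activations.shape) >= rate
    return activations * keep / (1.0 - rate)

rng = np.random.default_rng(0)
hidden = np.ones((1000, 100))          # hypothetical layer output
dropped = dropout(hidden, 0.25, rng)

print((dropped == 0).mean())           # fraction of zeroed units, near 0.25
print(dropped.mean())                  # mean stays near 1 after rescaling
```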



### Add a fully connected output layer with 10 neurons and `softmax` activation. Use `categorical_crossentropy` loss and the `adam` optimizer, train the network, and report the final validation result.

In [286]:
model.add(Dense(10, activation='softmax'))
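
With the output layer in place, the size of the network can be estimated by hand. The sketch below counts parameters from the layer sizes used above, assuming each Dense layer has one bias per unit and batch normalization keeps four vectors per input feature (two of them non-trainable moving statistics). The helper `dense_params` is illustrative.

```python
def dense_params(inputs, units):
    """Weights plus one bias per unit."""
    return inputs * units + units

# Layer sizes taken from the model above.
batch_norm = 4 * 784                 # gamma, beta, moving mean, moving variance
hidden_1 = dense_params(784, 200)
hidden_2 = dense_params(200, 100)
output = dense_params(100, 10)

total = batch_norm + hidden_1 + hidden_2 + output
trainable = total - 2 * 784          # moving statistics are not trained

print(hidden_1, hidden_2, output, total, trainable)
```

Calling `model.summary()` on the built model should give a per-layer breakdown to compare against.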


In [287]:
model.compile(loss="categorical_crossentropy", optimizer = "adam", metrics=["accuracy"])
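
The loss named in the compile call pairs naturally with the softmax output. As an illustration only (not the Keras implementation), the NumPy sketch below computes a row-wise softmax and the mean categorical cross-entropy for made-up logits and one-hot targets; the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(one_hot_targets, probabilities, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    clipped = np.clip(probabilities, eps, 1.0)
    return float(-np.mean(np.sum(one_hot_targets * np.log(clipped), axis=1)))

# Hypothetical logits for two samples and three classes.
logits = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = categorical_crossentropy(targets, probs)
print(probs.sum(axis=1), loss)
```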

In [288]:
model.fit(x_train, y_train, batch_size=32, epochs=10)

Train on 60000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x15c00cd6780>

In [289]:
model.evaluate(x_test,y_test)



[0.34804484552145004, 0.889]