# Stock prices dataset
The dataset contains the stock exchange's stock listings for each trading day from 2010 to 2016.

## Description
A brief description of the columns:
- open: The opening market price of the equity symbol on the date
- high: The highest market price of the equity symbol on the date
- low: The lowest recorded market price of the equity symbol on the date
- close: The closing recorded price of the equity symbol on the date
- symbol: Symbol of the listed company
- volume: Total traded volume of the equity symbol on the date
- date: Date of record
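These definitions imply that every row should satisfy low <= open, close <= high. Below is a minimal pandas sketch of that sanity check, run on the first two rows shown in the notebook output. The helper name `ohlc_consistent` is hypothetical and not part of the assignment.

```python
import pandas as pd

def ohlc_consistent(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: True where low <= min(open, close) and max(open, close) <= high."""
    body_low = frame[["open", "close"]].min(axis=1)
    body_high = frame[["open", "close"]].max(axis=1)
    return (frame["low"] <= body_low) & (body_high <= frame["high"])

# First two rows of prices.csv as displayed in the notebook
sample = pd.DataFrame({
    "open":  [123.430000, 125.239998],
    "close": [125.839996, 119.980003],
    "low":   [122.309998, 119.940002],
    "high":  [126.250000, 125.540001],
})
print(ohlc_consistent(sample).all())  # True
```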

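In this assignment, we will work on the stock prices dataset named "prices.csv". The task is to build a neural network that predicts the closing price of a stock from its opening, lowest and highest prices for the day. Because the target is a continuous value, this is a regression problem rather than a classification.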

In [3]:
# Initialize the random number generator
import random
random.seed(0)

# Ignore the warnings
#import warnings
#warnings.filterwarnings("ignore")

## Question 1

### Load the data
- load the csv file and read it using pandas
- file name is prices.csv

In [4]:
import pandas as pd
import numpy as np

In [5]:
df = pd.read_csv('prices.csv')

In [7]:
df

| | date | symbol | open | close | low | high | volume |
|---|---|---|---|---|---|---|---|
| 0 | 2016-01-05 00:00:00 | WLTW | 123.430000 | 125.839996 | 122.309998 | 126.250000 | 2163600.0 |
| 1 | 2016-01-06 00:00:00 | WLTW | 125.239998 | 119.980003 | 119.940002 | 125.540001 | 2386400.0 |
| 2 | 2016-01-07 00:00:00 | WLTW | 116.379997 | 114.949997 | 114.930000 | 119.739998 | 2489500.0 |
| 3 | 2016-01-08 00:00:00 | WLTW | 115.480003 | 116.620003 | 113.500000 | 117.440002 | 2006300.0 |
| 4 | 2016-01-11 00:00:00 | WLTW | 117.010002 | 114.970001 | 114.089996 | 117.330002 | 1408600.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 205087 | 2011-09-26 | HST | 10.710000 | 11.190000 | 10.580000 | 11.210000 | 9152800.0 |
| 205088 | 2011-09-26 | HSY | 59.419998 | 60.000000 | 58.959999 | 60.000000 | 1638100.0 |
| 205089 | 2011-09-26 | HUM | 77.809998 | 78.629997 | 76.650002 | 79.000000 | 2264700.0 |
| 205090 | 2011-09-26 | IBM | 170.960007 | 174.509995 | 169.860001 | 174.699997 | 6745700.0 |


## Question 2

### Drop null
- Drop null values if any
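`DataFrame.dropna()` returns a new DataFrame and leaves the original untouched unless its result is assigned back. A minimal sketch on a toy frame (illustrative data, not taken from prices.csv) that counts missing values first and then drops them:

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value (illustrative only)
frame = pd.DataFrame({"open": [1.0, 2.0, np.nan], "close": [1.5, 2.5, 3.5]})

# Number of missing values per column before dropping anything
print(frame.isna().sum())

cleaned = frame.dropna()         # new DataFrame; `frame` itself still has 3 rows
print(len(frame), len(cleaned))  # 3 2
```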

In [8]:
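# dropna() returns a new DataFrame, so assign the result back to df;
# otherwise the rows with missing values stay in the data used below
df = df.dropna()
df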

| | date | symbol | open | close | low | high | volume |
|---|---|---|---|---|---|---|---|
| 0 | 2016-01-05 00:00:00 | WLTW | 123.430000 | 125.839996 | 122.309998 | 126.250000 | 2163600.0 |
| 1 | 2016-01-06 00:00:00 | WLTW | 125.239998 | 119.980003 | 119.940002 | 125.540001 | 2386400.0 |
| 2 | 2016-01-07 00:00:00 | WLTW | 116.379997 | 114.949997 | 114.930000 | 119.739998 | 2489500.0 |
| 3 | 2016-01-08 00:00:00 | WLTW | 115.480003 | 116.620003 | 113.500000 | 117.440002 | 2006300.0 |
| 4 | 2016-01-11 00:00:00 | WLTW | 117.010002 | 114.970001 | 114.089996 | 117.330002 | 1408600.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 205086 | 2011-09-26 | HSIC | 63.070000 | 63.130001 | 61.389999 | 63.389999 | 506400.0 |
| 205087 | 2011-09-26 | HST | 10.710000 | 11.190000 | 10.580000 | 11.210000 | 9152800.0 |
| 205088 | 2011-09-26 | HSY | 59.419998 | 60.000000 | 58.959999 | 60.000000 | 1638100.0 |
| 205089 | 2011-09-26 | HUM | 77.809998 | 78.629997 | 76.650002 | 79.000000 | 2264700.0 |


### Drop columns
- Now, we don't need the "date", "volume" and "symbol" columns
- Drop the "date", "volume" and "symbol" columns from the data


In [9]:
newstockdf = df.drop(['date','volume','symbol'],axis =1 )
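`drop(..., axis=1)` removes columns by label; pandas also accepts the more explicit `columns=` keyword, which gives the same result. A small sketch on a one-row frame copied from the first row of the data:

```python
import pandas as pd

frame = pd.DataFrame({
    "date": ["2016-01-05"], "symbol": ["WLTW"], "open": [123.43],
    "close": [125.839996], "low": [122.309998], "high": [126.25], "volume": [2163600.0],
})

by_axis = frame.drop(["date", "volume", "symbol"], axis=1)
by_name = frame.drop(columns=["date", "volume", "symbol"])  # same result, keyword form

print(list(by_name.columns))  # ['open', 'close', 'low', 'high']
```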

## Question 3

### Print the dataframe
- print the modified dataframe

In [10]:
newstockdf.head()

| | open | close | low | high |
|---|---|---|---|---|
| 0 | 123.43 | 125.839996 | 122.309998 | 126.25 |
| 1 | 125.239998 | 119.980003 | 119.940002 | 125.540001 |
| 2 | 116.379997 | 114.949997 | 114.93 | 119.739998 |
| 3 | 115.480003 | 116.620003 | 113.5 | 117.440002 |
| 4 | 117.010002 | 114.970001 | 114.089996 | 117.330002 |


## Question 3

### Get features and label from the dataset in separate variable
- Let's separate labels and features now. We are going to predict the value of the "close" column, so that will be our label. Our features will be "open", "low" and "high"
- Take the "open", "low" and "high" columns as features
- Take "close" column as label
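Selecting with double brackets returns a DataFrame, so `y` keeps a 2-D shape `(n, 1)`, matching the single-neuron output of the model; single brackets would return a 1-D Series instead. A quick sketch of the resulting shapes on two toy rows:

```python
import pandas as pd

frame = pd.DataFrame({"open": [123.43, 125.239998], "close": [125.839996, 119.980003],
                      "low": [122.309998, 119.940002], "high": [126.25, 125.540001]})

X = frame[["open", "low", "high"]]  # DataFrame, shape (n, 3)
y_2d = frame[["close"]]             # double brackets: DataFrame, shape (n, 1)
y_1d = frame["close"]               # single brackets: Series, shape (n,)

print(X.shape, y_2d.shape, y_1d.shape)  # (2, 3) (2, 1) (2,)
```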

In [11]:
X = newstockdf[['open','low','high']]
y= newstockdf[['close']]

## Question 4

### Create train and test sets
- Split the data into training and testing
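With a float `test_size`, scikit-learn reserves that fraction of rows (rounded up) for testing and shuffles rows while keeping features and labels aligned. A toy sketch with 10 rows, where each label equals its row number so alignment is easy to check:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Row i of X is [3i, 3i+1, 3i+2] (stand-ins for open, low, high); label i is i
X = np.arange(30, dtype=float).reshape(10, 3)
y = np.arange(10, dtype=float).reshape(10, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)
print(X_train.shape, X_test.shape)  # (7, 3) (3, 3)
```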

In [12]:
# Imports for splitting, scaling and modelling. Only tf.keras is used below,
# so the standalone keras, grid-search and matplotlib imports are not needed.
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np
import tensorflow as tf

In [13]:
X_train , X_test , y_train, y_test = train_test_split(X, y, test_size=0.30, random_state = 1)

## Question 5

### Scaling
- Scale the data (features only)
- Use StandardScaler

In [14]:
sc = StandardScaler()
#X_trainsc = sc.fit_transform(X_train)
#X_testsc = sc.transform(X_test)
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
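`fit_transform` learns each feature's mean and standard deviation from the training set only, and `transform` reuses those statistics on the test set, so no test information leaks into the scaling. A sketch on random toy data confirming that the scaler computes `(x - mean) / std` with the population standard deviation:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=100.0, scale=20.0, size=(50, 3))  # toy training features
X_test = rng.normal(loc=100.0, scale=20.0, size=(10, 3))   # toy test features

sc = StandardScaler()
X_train_sc = sc.fit_transform(X_train)  # learns mean_ and scale_ from the training set
X_test_sc = sc.transform(X_test)        # applies the training statistics to new data

# Same transform by hand, using the population standard deviation (ddof=0)
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)
print(np.allclose(X_test_sc, manual))             # True
print(np.allclose(X_train_sc.mean(axis=0), 0.0))  # True
```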

## Question 6

### Convert data to NumPy array
- Convert features and labels to numpy array

In [17]:
y_train = np.array(y_train)
y_test = np.array(y_test)


In [16]:
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

## Question 7

### Define Model
- Initialize a Sequential model
- Add a Flatten layer
- Add a Dense layer with one neuron as output
  - add 'linear' as activation function
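`Flatten` followed by a single `Dense` neuron with a linear activation is plain linear regression on the three scaled features: one weight per feature plus a bias, 3 + 1 = 4 parameters, which matches the model summary further down. A NumPy sketch of the forward pass, using random weights rather than Keras's actual initialisation:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3, 1))   # batch shaped like X_train after the reshape: (samples, 3, 1)

W = rng.normal(size=(3, 1))      # Dense kernel: 3 inputs -> 1 output (random, illustrative)
b = np.zeros(1)                  # Dense bias: one value per output neuron

flat = x.reshape(x.shape[0], -1)  # Flatten: (5, 3, 1) -> (5, 3)
y_hat = flat @ W + b              # linear activation returns this unchanged

n_params = W.size + b.size
print(flat.shape, y_hat.shape, n_params)  # (5, 3) (5, 1) 4
```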


In [18]:
# Initialize the model
model = tf.keras.Sequential()

# Flatten the (samples, 3, 1) input to (samples, 3)
model.add(tf.keras.layers.Flatten())
# A single linear output neuron: the predicted closing price
model.add(tf.keras.layers.Dense(1, activation='linear'))

## Question 8

### Compile the model
- Compile the model
- Use "sgd" optimizer
- for calculating loss, use mean squared error
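Mean squared error averages the squared gap between label and prediction, MSE = (1/n) * sum_i (y_i - yhat_i)^2, so large errors are penalised quadratically. A small NumPy version for reference:

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between labels and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

print(mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # 1.3333333333333333
```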

In [19]:
model.compile(optimizer='sgd', loss='mean_squared_error', metrics=['mse'])

## Question 9

### Fit the model
- epochs: 50
- batch size: 128
- specify validation data
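Roughly, `model.fit` with the `sgd` optimizer shuffles the training rows each epoch, walks through them in mini-batches of 128, and takes one gradient step on the batch MSE per batch. A NumPy sketch of that loop for the same linear model on noise-free toy data; the learning rate of 0.01 is an assumption (it is the Keras `SGD` default), and the toy coefficients are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))             # toy standardized features
true_w = np.array([2.0, -1.0, 0.5])     # made-up coefficients
y = X @ true_w + 3.0                    # noise-free linear target

w = np.zeros(3)
b = 0.0
lr, epochs, batch_size = 0.01, 50, 128  # lr=0.01 assumed (Keras SGD default)

for epoch in range(epochs):
    order = rng.permutation(n)          # reshuffle each epoch (model.fit shuffles by default)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        err = X[idx] @ w + b - y[idx]   # prediction error on this mini-batch
        # gradient step on the batch mean squared error
        w -= lr * 2.0 * X[idx].T @ err / len(idx)
        b -= lr * 2.0 * err.mean()

print(np.round(w, 3), round(b, 3))
```

The validation data passed to `fit` is only evaluated at the end of each epoch; it never contributes to these gradient steps.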

In [20]:
output = model.fit(X_train, y_train , validation_data= (X_test, y_test), epochs = 50,batch_size=128)

Epoch 1/50
...
Epoch 50/50


## Question 10

### Evaluate the model
- Evaluate the model on test data

In [21]:
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            multiple                  0         
_________________________________________________________________
dense (Dense)                multiple                  4         
Total params: 4
Trainable params: 4
Non-trainable params: 0
_________________________________________________________________


In [23]:
#Testing the model on test set
score = model.evaluate(X_test, y_test,batch_size=128)




In [24]:
score

[nan, nan]
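A `[nan, nan]` score means at least one label or prediction in the test set is NaN. One plausible cause is that rows with missing values reach the train/test split, for example if the result of `df.dropna()` is not assigned back to `df`; `StandardScaler` ignores NaNs when fitting and keeps them in `transform`, so they pass silently into `X_test`. A toy sketch of how a single NaN poisons the MSE and how to find the offending rows:

```python
import numpy as np

y_true = np.array([[10.0], [20.0], [np.nan]])  # toy labels, one missing
y_pred = np.array([[11.0], [19.0], [30.0]])    # toy predictions

print(np.mean((y_true - y_pred) ** 2))         # nan: one NaN propagates to the mean

bad_rows = np.isnan(y_true).any(axis=1)        # rows containing any NaN
print(np.flatnonzero(bad_rows))                # [2]
print(np.mean((y_true[~bad_rows] - y_pred[~bad_rows]) ** 2))  # 1.0
```

The same `np.isnan(...).any(axis=1)` check can be applied to `X_test` and `y_test` before calling `model.evaluate`.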

### Manual predictions
- Test the predictions on manual inputs
- We have scaled our training data, so we need to transform custom inputs with the same fitted scaler object
- Example of manual input, in feature order [open, low, high]: [123.430000, 122.30999, 116.250000]

In [27]:
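# The model was trained on standardized features, so a raw (unscaled) input gives a meaningless prediction
maininput = np.array([[123.430000, 122.30999, 116.250000]])
model.predict(maininput)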


array([[6352.6895]], dtype=float32)

In [28]:
maininputsc = sc.transform(maininput)

In [29]:
model.predict(maininputsc)

array([[120.337425]], dtype=float32)
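The manual input is written as a nested list because scikit-learn scalers expect a 2-D array of shape `(n_samples, n_features)`; a flat 1-D array is rejected with a `ValueError`. A sketch using a toy scaler fitted on made-up `[open, low, high]` rows (not the notebook's fitted `sc`):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy scaler fitted on made-up [open, low, high] rows
sc = StandardScaler().fit(np.array([[100.0, 98.0, 102.0],
                                    [110.0, 107.0, 112.0],
                                    [120.0, 118.0, 123.0]]))

single = np.array([123.43, 122.30999, 116.25])  # one sample as a flat 1-D array

rejected = False
try:
    sc.transform(single)                        # 1-D input is not accepted
except ValueError:
    rejected = True
print(rejected)                                 # True

scaled = sc.transform(single.reshape(1, -1))    # one row, three features
print(scaled.shape)                             # (1, 3)
```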

# Build a DNN

### Collect the Fashion-MNIST data from `tf.keras.datasets`

In [186]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

In [187]:
x_train.shape

(60000, 28, 28)

In [188]:
print(x_train[0].shape)

(28, 28)


### Change train and test labels into one-hot vectors

In [189]:
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)
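For 1-D integer labels, one-hot encoding amounts to picking rows of an identity matrix: class `k` becomes a length-10 vector with a 1 at position `k`. A NumPy sketch of the same idea, using `float32` as `to_categorical` does by default:

```python
import numpy as np

labels = np.array([0, 3, 9])                   # integer class ids, as returned by load_data()
one_hot = np.eye(10, dtype="float32")[labels]  # row k of the identity is the vector for class k

print(one_hot.shape)           # (3, 10)
print(one_hot.argmax(axis=1))  # [0 3 9]
```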

### Build the Graph

### Initialize model, reshape & normalize data

In [166]:
#x_train = x_train.astype('float32')
#x_test = x_test.astype('float32')

In [167]:
#x_train /= 255
#x_test /= 255

In [153]:
#x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
#x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

#img_rows, img_cols = 28, 28

(10000, 28, 28)
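The commented-out cells would cast the `uint8` images to `float32`, divide by 255 to map pixel values from [0, 255] to [0, 1], and reshape them; the model below instead flattens inside the network and lets `BatchNormalization` rescale the raw pixels. A toy sketch of the manual path on two fake images:

```python
import numpy as np

# Two fake 28x28 uint8 "images" (stand-ins for x_train)
images = np.random.default_rng(0).integers(0, 256, size=(2, 28, 28), dtype=np.uint8)

scaled = images.astype("float32") / 255.0    # pixel range [0, 255] -> [0.0, 1.0]
flat = scaled.reshape(scaled.shape[0], 784)  # row-major flattening, as a Reshape((784,)) layer does

print(flat.shape, flat.min() >= 0.0, flat.max() <= 1.0)  # (2, 784) True True
```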

In [190]:
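# Flatten each 28x28 image into a 784-value vector, then let BatchNormalization
# standardize the raw pixel values (used here instead of the commented-out division by 255)
model_2 = tf.keras.models.Sequential()
model_2.add(tf.keras.layers.Reshape((784,), input_shape=(28, 28)))
model_2.add(tf.keras.layers.BatchNormalization())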

### Add two fully connected layers with 200 and 100 neurons respectively with `relu` activations. Add a dropout layer with `p=0.25`

In [191]:
model_2.add(tf.keras.layers.Dense(200, activation='relu'))
#model_2.add(tf.keras.layers.Dropout(0.25))

In [192]:
model_2.add(tf.keras.layers.Dense(100, activation='relu'))
model_2.add(tf.keras.layers.Dropout(0.25))
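During training, a dropout layer with rate 0.25 zeroes each activation with probability 0.25 and scales the survivors by 1/(1 - 0.25), so the expected activation is unchanged; at inference it passes inputs through untouched. A NumPy sketch of this "inverted dropout" scheme (the function name `dropout` is illustrative, not the Keras implementation):

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest by 1/(1-rate)."""
    if not training or rate == 0.0:
        return x                        # identity at inference time
    keep = rng.random(x.shape) >= rate  # each unit kept with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

rng = np.random.default_rng(0)
activations = np.ones((1000, 100))      # toy output of the 100-unit Dense layer
out = dropout(activations, rate=0.25, rng=rng)

print(np.unique(out))                   # two values: 0 and 1/0.75
print(out.mean())                       # close to 1.0, the mean without dropout
```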

### Add a fully connected output layer with 10 neurons and `softmax` activation. Use the `categorical_crossentropy` loss and the `adam` optimizer, train the network, and report the final validation accuracy.

In [193]:
model_2.add(tf.keras.layers.Dense(10, activation='softmax'))

In [194]:
model_2.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
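The softmax output turns the 10 scores into probabilities that sum to 1, and categorical cross-entropy compares them with the one-hot labels as -sum(y * log(p)), averaged over samples. A NumPy sketch of both; the clipping constant `eps` is an assumption added only to avoid `log(0)`:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, probs, eps=1e-7):
    """Mean over samples of -sum(y_true * log(probs)); eps clipping is an assumption."""
    probs = np.clip(probs, eps, 1.0 - eps)
    return np.mean(-np.sum(y_true * np.log(probs), axis=1))

logits = np.zeros((2, 10))   # equal scores for all 10 classes
y_true = np.eye(10)[[3, 7]]  # one-hot labels for classes 3 and 7
probs = softmax(logits)

print(probs.sum(axis=1))                        # each row sums to 1
print(categorical_crossentropy(y_true, probs))  # ln(10), about 2.302585
```

A uniform guess over 10 classes therefore starts at a loss of about 2.30, a useful reference point for the first epoch.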

In [195]:
y_train.shape

(60000, 10)

In [196]:
output = model_2.fit(x_train, y_train , validation_data= (x_test, y_test), epochs = 20,batch_size=100)

Epoch 1/20
...
Epoch 20/20


In [197]:
model_2.summary()

Model: "sequential_15"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
reshape_4 (Reshape)          (None, 784)               0         
_________________________________________________________________
batch_normalization_11 (Batc (None, 784)               3136      
_________________________________________________________________
dense_26 (Dense)             (None, 200)               157000    
_________________________________________________________________
dense_27 (Dense)             (None, 100)               20100     
_________________________________________________________________
dropout_14 (Dropout)         (None, 100)               0         
_________________________________________________________________
dense_28 (Dense)             (None, 10)                1010      
Total params: 181,246
Trainable params: 179,678
Non-trainable params: 1,568
___________________________________________
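The parameter counts in this summary follow from the layer sizes: a Dense layer has inputs x units weights plus one bias per unit, and `BatchNormalization` keeps four vectors per feature (gamma, beta, moving mean, moving variance), of which only gamma and beta are trained. A short check of the arithmetic:

```python
# Parameter counts for the layers listed in the summary above
inputs = 28 * 28                    # 784 values after Reshape

bn_total = 4 * inputs               # gamma, beta, moving_mean, moving_variance
bn_non_trainable = 2 * inputs       # moving_mean and moving_variance are not trained

dense_params = [
    inputs * 200 + 200,             # 784 -> 200, plus 200 biases
    200 * 100 + 100,                # 200 -> 100, plus 100 biases
    100 * 10 + 10,                  # 100 -> 10, plus 10 biases
]

total = bn_total + sum(dense_params)
trainable = total - bn_non_trainable
print(bn_total, dense_params, total, trainable)
```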

In [199]:
score = model_2.evaluate(x_test, y_test, verbose=0)

In [200]:
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Test loss: 0.3995426595211029
Test accuracy: 0.8888000249862671
