# Mini-Project #3 (ANN)

Student Name: **Subhadyuti Sahoo**
<br>
Course: **Adv Topics in Machine Learning**

### Importing Necessary Libraries, Modules and Classes

In [1]:
import os
import time
import datetime
import sys
import numpy as np
import scipy
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sns
import pandas as pd
import statistics
import sklearn
from sklearn.metrics import mean_squared_error
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential 
from tensorflow.keras.layers import Dense
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
import warnings
from numpy.random import seed
tf.random.set_seed(1234)
warnings.simplefilter("ignore")
warnings.filterwarnings("ignore")
np.set_printoptions(formatter={'float': lambda x: "{0:0.3f}".format(x)})

INFO:tensorflow:Enabling eager execution
INFO:tensorflow:Enabling v2 tensorshape
INFO:tensorflow:Enabling resource variables
INFO:tensorflow:Enabling tensor equality
INFO:tensorflow:Enabling control flow v2


<a id='Goal'></a>
<div class=" alert alert-warning">
    <b>Goal.</b>

In this mini-project, you will explore the possibility of using artificial neural networks to predict the stock market, in particular the price of a very interesting stock NASDAQ: NVDA (NVIDIA Corporation). Below is its historic price chart. You can see that its per share price has jumped from USD $22$ to USD $615$ in the past $12$ years, a return of $28$x. In other words, if you had invested USD $1,000$ in NVDA in $2015$, your account would now have a balance of about USD $28,000$. Of course, we cannot look back; we can only look forward. The question is then: since we have plenty of data (we can get the historic price of about every stock), can we build a machine learning model on it so that we can identify the next NVDA? 

</div>

<img src="./plotsAndFigures/NVDAStockPrices.png" width=700>

<a id='Description'></a>
<div class=" alert alert-warning">
    <b>Description.</b>

The project will use the historic adjusted closing price data of NVDA as the training/test data and build an artificial neural network (ANN) to predict the price in the future. The file NVDA.csv provided by NASDAQ contains the opening price, intra-day high, intra-day low, closing price, adjusted closing price and trading volume of each trading day since January $22$, $1999$ (for a total of $5,566$ trading days). For this project we will only use the adjusted closing price. We will $60:20:20$ split the data for training, validation and test, reserving the most recent $20\%$ data for test, as though we had traveled back in time to September $30, 2016$. You will have access to any data on or before that date to train/validate your model, and will use the data after that date to test your model.
    
</div>
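
The $60:20:20$ chronological split described above can be sketched in a few lines. This is only an illustration: the helper name `chronological_split` is hypothetical, and the index of the last row on or before the cut-off date (`4452`) is an assumption inferred from the dataset shapes printed later in this notebook, not a value read from NVDA.csv here.

```python
def chronological_split(n_rows, train_fraction, last_trainval_index):
    """Return (train, validation, test) row ranges for a time-ordered dataset."""
    train_end = int(train_fraction * n_rows)   # first row of the validation block
    val_end = last_trainval_index + 1          # first row of the test block
    return range(0, train_end), range(train_end, val_end), range(val_end, n_rows)


# Assumed values: 5,566 rows in total, and row 4452 taken as the last row
# on or before the cut-off date (inferred, not read from NVDA.csv here).
train_rows, val_rows, test_rows = chronological_split(5566, 0.6, 4452)
print(len(train_rows), len(val_rows), len(test_rows))
```

Because the rows are never shuffled, every validation row is later in time than every training row, and every test row is later than both.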

### Importing the Dataset

In [2]:
# Forming the pandas dataframe
# entireDataSet = pd.read_csv('/content/drive/My Drive/AdvTopicsInML/NVDA.csv')   # for Google Colab
entireDataSet = pd.read_csv('NVDA.csv')   # for Jupyter Notebook

# Displaying the pandas dataFrame
display(entireDataSet)  

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,1999-01-22,1.750000,1.953125,1.552083,1.640625,1.508412,67867200
1,1999-01-25,1.770833,1.833333,1.640625,1.812500,1.666436,12762000
2,1999-01-26,1.833333,1.869792,1.645833,1.671875,1.537143,8580000
3,1999-01-27,1.677083,1.718750,1.583333,1.666667,1.532354,6109200
4,1999-01-28,1.666667,1.677083,1.651042,1.661458,1.527566,5688000
...,...,...,...,...,...,...,...
5561,2021-03-01,555.000000,557.000000,542.130005,553.669983,553.669983,8802500
5562,2021-03-02,556.000000,556.820007,535.840027,536.250000,536.250000,6585500
5563,2021-03-03,537.049988,538.059998,511.950012,512.190002,512.190002,9408000
5564,2021-03-04,512.030029,519.000000,483.350006,494.809998,494.809998,14292400


### Extracting The Working Dataset

In [3]:
dSet = pd.DataFrame(entireDataSet['Adj Close'], columns=['Adj Close'])   # Working with only the Adjusted Closing Price

### Checking for Missing Values (if any)

In [4]:
# Checking if there are missing values in the working dataset
# (isna() and isnull() are aliases, so a single call avoids counting each missing value twice)
result = dSet.isna().values.any()
if result:
    n_missing_values = dSet.isna().sum().sum()

# Displaying if there are any missing values in dataFrame
print('--- Checking for Missing Values ---')
print('Q. Are there any missing values in the dataset?')   
if (result == True):
    print('A. Yes')
    print('Q. How many?')
    print('A. ', n_missing_values)
else:
    print('A. No') 
print('-----------------------------------')
print('\n')

--- Checking for Missing Values ---
Q. Are there any missing values in the dataset?
A. No
-----------------------------------




### Splitting the Working Dataset

In [5]:
# Locating the rows dated on or before September 30, 2016
dates = entireDataSet['Date'].to_numpy()
trainval_indices = np.where(dates < '2016-10-01')
trainval_index_start = trainval_indices[0][0]
trainval_index_end = trainval_indices[0][-1]

# Splitting the working dataset into train, val and test datasets
# (the first 60% of all rows for training, the remaining rows up to the cut-off date for validation)
train_index_end = int(0.6 * len(dSet))
trainDataSet = dSet[:train_index_end]
valDataSet = dSet[train_index_end:trainval_index_end+1]
testDataSet = dSet[trainval_index_end+1:]
tempTestDates = entireDataSet['Date'][trainval_index_end+1:]

<a id='QuestionA'></a>
<div class=" alert alert-warning">
    <b>Question.</b>
 
To get you started, an interesting observation you can make is that NVDA price remained mostly flat between 2010 and 2015, and then it started to soar. In one sentence, explain why (remember this is a computer science class, not finance).

</div>

<a id='AnswerA'></a>
<div class=" alert alert-info">
    <b>Answer.</b>
 
The NVDA price started to soar in 2015 because universities, companies and machine learning enthusiasts across the globe realized that the CUDA cores in NVIDIA GPUs, which were built to render graphics for computer video games, could also be used to train deep neural networks, especially for image- and video-related machine learning and deep learning problems, which led to bulk purchases of NVIDIA GPUs.

</div>

<a id='QuestionB'></a>
<div class=" alert alert-warning">
    <b>Question.</b>
 
Now a quantitative research intern at a hedge fund designs a two-layer ANN with $\texttt{200}$ hidden neurons to predict the next day’s NVDA adjusted closing price using the past thirty days. Do you see any issue to train such a network on the data provided? Please explain (you will need to do some math here). 
 

</div>

### Creating the ANN Model containing 1 Hidden Layer with 200 Hidden Neurons for 30 Prior Days

In [6]:
model302001 = Sequential()
model302001.add(Dense(units=200, activation='relu', input_dim=30))
model302001.add(Dense(units=1))
model302001.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 200)               6200      
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 201       
Total params: 6,401
Trainable params: 6,401
Non-trainable params: 0
_________________________________________________________________


<img src="./plotsAndFigures/model302001.png">

<a id='AnswerB'></a>
<div class=" alert alert-info">
    <b>Answer.</b>
 
$$
\texttt{Number of Model Parameters}
= \Big( (30 + 1) \times 200 \Big) + \Big( (200 + 1) \times 1 \Big) 
= 6401
$$
    
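The number of model parameters ($6401$) is more than $\texttt{210}$ times the number of input neurons ($30$) and, more importantly, almost twice the number of training samples: the $60\%$ training split holds $3339$ prices, which yield only $3339 - 30 = 3309$ thirty-day windows. With more parameters than training samples, the network can simply memorize the training data (overfitting), which would then lead to erroneous predictions on the validation and test datasets. 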

</div>
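
The arithmetic in the answer above can be checked with a short sketch. The helper names `dense_parameter_count` and `window_sample_count` are hypothetical, and the size of the training split (`int(0.6 * 5566)` prices) is an assumption based on the $60\%$ split of the $5,566$ trading days mentioned in the description.

```python
def dense_parameter_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def window_sample_count(n_prices, prior_days):
    """Number of (window, next-day label) pairs a sliding window can produce."""
    return max(n_prices - prior_days, 0)


n_params = dense_parameter_count([30, 200, 1])

# Assumption: the training split holds int(0.6 * 5566) = 3339 prices.
n_train_samples = window_sample_count(int(0.6 * 5566), 30)

print(n_params, n_train_samples, round(n_params / n_train_samples, 2))
```

Under these assumptions the network has roughly two parameters for every training sample, which is the situation in which overfitting becomes likely.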

<a id='StudentTask'></a>
<div class=" alert alert-warning">
    <b>Student Task.</b>
 
So now we decide to build our own models to predict the next day’s NVDA adjusted closing price based on the past seven days. To extract the train/validation/test data, you may use a sliding window approach (i.e., the first sample would be the adjusted closing prices on the seven trading days between $1/22/99$ and $2/1/99$ and its label/target is USD $1.369541$, the adjusted closing price on $2/2/99$; the second sample would be the adjusted closing prices on the seven trading days from $1/25/99$ to $2/2/99$ and the label/target would be that on $2/3/99$, etc.). We will build $4$ two-layer ANNs, with $20$, $40$, $60$ and $80$ hidden neurons respectively. Use ReLU activation for all the neurons and mean squared error/sum of squared error as the objective function. 
 

</div>
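
The sliding window approach described in the task can be illustrated on a toy sequence. This is a minimal sketch in plain Python: the function name `sliding_windows` is hypothetical and the prices are made up for illustration, they are not NVDA data.

```python
def sliding_windows(prices, prior_days):
    """Split a price sequence into (features, label) pairs."""
    features, labels = [], []
    for start in range(len(prices) - prior_days):
        features.append(prices[start:start + prior_days])   # the prior days
        labels.append(prices[start + prior_days])            # the day that follows
    return features, labels


# Toy prices (made up for illustration, not NVDA data)
toy_prices = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
X_toy, y_toy = sliding_windows(toy_prices, prior_days=7)
print(len(X_toy), X_toy[0], y_toy[0])
```

A sequence of `n` prices produces `n - prior_days` samples, because the last `prior_days` prices can only serve as features for a label that is not in the data yet.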

### User-Defined Function to Form Training, Validation and Test Datasets Depending Upon Prior Days

In [7]:
def get_refurbished_datasets(dataset, prior_days):
  """
  Returns the sliding-window features and labels built from a single price column
  
  Args: 
    dataset (pandas DataFrame, shape = [n, 1]): the column of interest from the entire, bigger dataset
    prior_days (int): the number of prior days used as features for each sample

  Returns:
    X_ (array, shape = [n - prior_days, prior_days]): the prices on the prior days
    y_ (vector, shape = [n - prior_days, 1]): the price on the day that follows each window
  """

  # Flattening the received dataset into a 1-D numpy array for ease of operation
  values = dataset.to_numpy().astype(np.float64).ravel()

  # Guarding against windows that are longer than the dataset itself
  n_samples = len(values) - prior_days
  if n_samples <= 0:
    raise ValueError('prior_days must be smaller than the number of rows in dataset')

  # Forming one row per window: prior_days consecutive prices as features
  X_ = np.array([values[i:i+prior_days] for i in range(n_samples)])

  # The label of each window is the price on the day right after it
  y_ = values[prior_days:].reshape(-1,1)

  # Returning the reconfigured datasets
  return X_, y_

### Forming the Working Datasets for Training, Validation and Test for 7 Prior Days

In [8]:
# Forming the final training dataset
X_train, y_train = get_refurbished_datasets(dataset=trainDataSet, prior_days=7)
X_train = np.asarray(X_train).astype(np.float32)  # Converting to Tensor Form
y_train = np.asarray(y_train).astype(np.float32)  # Converting to Tensor Form

# Forming the final validation dataset
X_val, y_val = get_refurbished_datasets(dataset=valDataSet, prior_days=7)
X_val = np.asarray(X_val).astype(np.float32)  # Converting to Tensor Form
y_val = np.asarray(y_val).astype(np.float32)  # Converting to Tensor Form

# Forming the final test dataset
X_test, y_test = get_refurbished_datasets(dataset=testDataSet, prior_days=7)
X_test = np.asarray(X_test).astype(np.float32)  # Converting to Tensor Form
y_test = np.asarray(y_test).astype(np.float32)  # Converting to Tensor Form

# Printing out the shapes of each dataset
print('--- Shapes of Each Dataset (7 Prior Days) ---')
print(f'Training Features Dataset Shape: {X_train.shape}')
print(f'Training Labels Dataset Shape: {y_train.shape}')
print(f'Validation Features Dataset Shape: {X_val.shape}')
print(f'Validation Labels Dataset Shape: {y_val.shape}')
print(f'Test Features Dataset Shape: {X_test.shape}')
print(f'Test Labels Dataset Shape: {y_test.shape}')
print('---------------------------------------------')

--- Shapes of Each Dataset (7 Prior Days) ---
Training Features Dataset Shape: (3332, 7)
Training Labels Dataset Shape: (3332, 1)
Validation Features Dataset Shape: (1107, 7)
Validation Labels Dataset Shape: (1107, 1)
Test Features Dataset Shape: (1106, 7)
Test Labels Dataset Shape: (1106, 1)
---------------------------------------------


### Creating A User-Defined Function to Build Customized ANN Models

In [9]:
def build_model(init_empty_model, trainFeatures, n_hiddenLayers, n_neurons):
  """
  Returns the customized model 
  
  Args:
    init_empty_model (keras sequential model): the skeleton of the final model returned 
    trainFeatures (array, shape = [m, n]): the training dataset features
    n_hiddenLayers (int): the number of hidden layers to be added to the skeleton model
    n_neurons (int): the number of neurons each of the hidden layers is going to contain
  """

  # Calculating the input dimensions to be used for building the model
  inputDim = trainFeatures.shape[1]

  # Putting the skeleton model into the variable which is going to get filled up soon
  filled_model = init_empty_model

  # Adding the first hidden layer, which also fixes the input dimensions
  filled_model.add(Dense(units = n_neurons,
                         activation = 'relu',   # Activation function for hidden layer
                         input_dim = inputDim))
  
  # Adding the remaining hidden layers (none when n_hiddenLayers == 1)
  for _ in range(n_hiddenLayers - 1):
    filled_model.add(Dense(units = n_neurons, activation = 'relu'))
  
  # Adding the final output layer to the final model
  filled_model.add(Dense(units=1))

  # Returning the final model
  return filled_model

### The ANN Models containing 1 Hidden Layer with Different No. of Hidden Neurons for 7 Prior Days

<img src="./plotsAndFigures/model_7_XX_1.png" width=900>

<a id='StudentTask'></a>
<div class=" alert alert-warning">
    <b>Student Task.</b>
 
Plot the training and validation accuracy vs. training iteration (epochs) for each model (four figures in total).  

</div> 

### Creating A User-Defined Function In Order To Compile and Train Different ANN Models 

In [10]:
def compile_and_fit(model, trainFeatures, trainLabels, valFeatures,
                    valLabels, batchSize, n_epochs):
  """
  Compiles and trains the model, and returns its training history
  
  Args:
    model (keras sequential model): the filled model
    trainFeatures (array, shape = [m,n]): the training dataset features
    trainLabels (vector, shape = [m,1]): the training dataset labels
    valFeatures (array, shape = [p, n]): the validation dataset features
    valLabels (vector, shape = [p,1]): the validation dataset labels
    batchSize (int): the no. of samples per gradient update (mini-batch size)
    n_epochs (int): the no. of passes over the training dataset

  Returns:
    history (keras History object): the per-epoch training and validation losses
  """

  # Compiling the model
  model.compile(# Objective (Loss) Function
                loss = 'mean_squared_error',  # meanSquaredError

                # Optimizer
                optimizer = 'adam'
               )
  
  # Training the model
  history = model.fit(x = trainFeatures,
                      y = trainLabels,
                      batch_size = batchSize, 
                      epochs = n_epochs,
                      validation_data = (valFeatures, valLabels))
  
  # Returning the training history
  return history 

### Creating A User-Defined Function to Plot Mean Squared Error Values v/s No. of Epochs

In [11]:
def plot_mse(history, n_epochs, n_hiddenLayers, n_neurons, prior_days):
  """
  Plots the training and validation MSEs (y-axis) against the No. of Epochs (x-axis)
  
  Args:
    history (keras History object): the object returned by model.fit(); its .history dict holds the per-epoch losses
    n_epochs (int): the no. of epochs the model was trained for
    n_hiddenLayers (int): the no. of hidden layers in the ANN model
    n_neurons (int): the no. of neurons in each of the hidden layers
    prior_days (int): the no. of days for which data was collected for daily prediction thereafter
  """

  # Setting the string for title
  if n_hiddenLayers > 1:
    titleString = "MSE vs No. of Epochs for the ANN Model \n containing "+str(n_hiddenLayers)+" Hidden Layers with "+str(n_neurons)+" Neurons Each ("+str(prior_days)+" Prior Days)"
  else:
    titleString = "MSE vs No. of Epochs for the ANN Model \n containing "+str(n_hiddenLayers)+" Hidden Layer with "+str(n_neurons)+" Neurons ("+str(prior_days)+" Prior Days)"

  # Extracting the MSE values for training and validation
  trainMSE = np.asarray(history.history['loss'])
  valMSE = np.asarray(history.history['val_loss'])

  # Plotting MSEs v/s No. of Epochs 
  epochs = range(1, n_epochs+1)
  fig, ax = plt.subplots(figsize=(7,6)) 
  ax.axhline(y=0.0, color='k', linestyle='--', linewidth=2)   # Reference line at zero error
  ax.plot(epochs, trainMSE, color='violet', label='Train')   
  ax.plot(epochs, valMSE, color='darkslategray', label='Validation')   
  ax.set_title(titleString, fontsize=16)  
  ax.set_ylabel('Mean Squared Errors', fontsize=15)   
  ax.set_xlabel('No. of Epochs', fontsize=15)  
  ax.set_xlim(left=1, right=n_epochs)
  ax.tick_params(axis='x', labelsize=14)
  ax.tick_params(axis='y', labelsize=14)
  ax.grid(axis='y', linestyle='-', alpha=0.5)
  ax.legend(prop={'size': 14}, ncol=1, labelspacing=0.05, loc='upper right')   
  plt.tight_layout()
  plt.show()   

### Plotting Mean Squared Errors v/s No. of Epochs for the ANN Models containing 1 Hidden Layer with Different No. of Hidden Neurons for 7 Prior Days

<img src="./plotsAndFigures/plotMSE_7_XX_1.png" width=900>

<a id='StudentTask'></a>
<div class=" alert alert-warning">
    <b>Student Task.</b>
 
Also, for each trained model, plot your predicted adjusted closing prices from $10/1/16$ to now and the actual prices with respect to time.

</div>

### Creating A User-Defined Function for Plotting Real and Predicted Adjusted Closing Prices between 10/12/2016 and 03/05/2021

In [12]:
def plot_prices(dates, testLabels, predTestLabels, 
                n_hiddenLayers, n_neurons, prior_days):
  """
  Returns a plot with Adj Closing Prices on y-axis and Dates on x-axis
  
  Args:
    dates (pandas Series): the dates of the test period, including the first prior_days dates that have no prediction
    testLabels (vector, shape = [p,1]): the test dataset labels
    predTestLabels (vector, shape = [p,1]): the predicted test dataset labels
    n_hiddenLayers (int): the no. of hidden layers in the ANN model
    n_neurons (int): the no. of neurons in each of the hidden layers
    prior_days (int): the no. of days for which data was collected for daily prediction thereafter
  """
  
  # Forming a pandas dataframe
  testDates = dates.iloc[prior_days:]
  dFrame = pd.DataFrame({'Date': testDates, 
                         'Real Prices': np.asarray(testLabels).astype(np.float64).flatten(), 
                         'Predicted Prices': predTestLabels.flatten()})

  # Setting the string for title
  if n_hiddenLayers > 1:
    titleString = "Adj Closing Prices vs Dates for the ANN Model \n containing "+str(n_hiddenLayers)+" Hidden Layers with "+str(n_neurons)+" Neurons Each ("+str(prior_days)+" Prior Days)"
  else:
    titleString = "Adj Closing Prices vs Dates for the ANN Model \n containing "+str(n_hiddenLayers)+" Hidden Layer with "+str(n_neurons)+" Neurons ("+str(prior_days)+" Prior Days)"

  # Plotting the Predicted and Real Prices
  fig, ax = plt.subplots(figsize=(7,6)) 
  dFrame.plot(ax = ax, 
              x = "Date", 
              y = ["Real Prices", "Predicted Prices"],
              rot = 30)
  ax.tick_params(axis='x', labelsize=14)
  ax.tick_params(axis='y', labelsize=14)
  ax.set_title(titleString, fontsize=16) 
  ax.set_xlabel('Dates', fontsize=15)
  ax.set_ylabel('Adj Closing Prices \n (USD)', fontsize=15)
  ax.grid(axis='y', linestyle='-', alpha=0.5)
  ax.legend(prop={'size': 14}, ncol=1, labelspacing=0.05, loc='upper left')
  plt.show() 

### Plotting The Predicted and Real Prices from 10/1/2016 to now for the ANN Models containing 1 Hidden Layer with Different No. of Hidden Neurons for 7 Prior Days

<img src="./plotsAndFigures/plotPrices_7_XX_1.png" width=900>

<a id='StudentTask'></a>
<div class=" alert alert-warning">
    <b>Student Task.</b>
 
Compute the average accuracy (MSE) of the adjusted closing prices on the test data.

</div>
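
For reference, the mean squared error over $N$ test samples with real prices $y_i$ and predicted prices $\hat{y}_i$ is $\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2$. The sketch below computes it in plain Python on made-up numbers; the function name `mean_squared_error_sketch` is hypothetical, and `sklearn.metrics.mean_squared_error` (imported at the top of this notebook) computes the same quantity.

```python
def mean_squared_error_sketch(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Toy values (made up for illustration, not model output)
toy_actual = [100.0, 102.0, 101.0]
toy_predicted = [99.0, 104.0, 101.0]
toy_mse = mean_squared_error_sketch(toy_actual, toy_predicted)
print(toy_mse)   # squared errors are 1, 4 and 0, so the mean is 5/3
```

Since the errors are squared, the MSE is expressed in squared USD, and a few large misses on high-priced days can dominate the average.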

### Plotting the MSE Values for Different ANN Models Trained So Far

<img src="./plotsAndFigures/barPlotMSEFirstFour.png">

<a id='QuestionC004'></a>
<div class=" alert alert-warning">
    <b>Question.</b>
 
Comparing the four models, what can you observe and why? Would any of them provide satisfactory accuracy?  

</div>

<a id='AnswerC004'></a>
<div class=" alert alert-info">
    <b>Answer.</b>
 
$\textbf{Observation}$: Although the real and predicted adjusted closing prices are almost visually indistinguishable for all four ANN models with $\texttt{7}$ prior days, the models differ clearly in their training, validation and test MSEs. If only the MSEs of the test dataset are considered, then the two-layer ANN model with $\texttt{20}$ hidden neurons in the hidden layer is the one that provides satisfactory accuracy. 

</div>

<a id='StudentTask'></a>
<div class=" alert alert-warning">
    <b>Student Task.</b>
 
Now re-do all the things from the previous part for two deeper ANN models: one with two hidden layers ($30$ neurons each), and one with three hidden layers ($20$ neurons each), and see if they can do a better job. Compare your results with the ANN model in previous part that has $60$ hidden neurons in one hidden layer. All these three models have the same number of neurons. 
 

</div>
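
The three models compared in this part have the same number of hidden neurons but not the same number of trainable parameters. The sketch below counts the parameters of each architecture for $7$ inputs and $1$ output; the helper name `dense_layer_parameters` and the dictionary keys are hypothetical labels chosen for illustration.

```python
def dense_layer_parameters(layer_sizes):
    """Return the parameter count of each dense layer (weights plus biases)."""
    return [(n_in + 1) * n_out
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


# Layer widths: 7 inputs, the hidden layers, 1 output
architectures = {
    "60": [7, 60, 1],
    "30-30": [7, 30, 30, 1],
    "20-20-20": [7, 20, 20, 20, 1],
}

totals = {}
for name, sizes in architectures.items():
    hidden_neurons = sum(sizes[1:-1])          # 60 in every case
    totals[name] = sum(dense_layer_parameters(sizes))
    print(name, hidden_neurons, totals[name])
```

Under this count the single-hidden-layer model is the smallest of the three, because the deeper models add hidden-to-hidden weight matrices.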

### The ANN Models with Different No. of Hidden Layers and Different No. of Hidden Neurons for 7 Prior Days

<img src="./plotsAndFigures/model_7_30_20_1.png" width=900>

### Comparing the Training and Validation MSEs of the Above Two ANN Models with those of the ANN Model with 1 Hidden Layer and 60 Hidden Neurons for 7 Prior Days

<img src="./plotsAndFigures/plotMSE_7_60_30_20.png" width=900>

### Comparing the Real and Predicted Adjusted Closing Prices from 10/1/2016 to now of the Above Two ANN Models with those of the ANN Model with 1 Hidden Layer and 60 Hidden Neurons for 7 Prior Days

<img src="./plotsAndFigures/plotPrices_7_60_30_20.png" width=900>

### Plotting the MSE Values for Different ANN Models Trained So Far

<img src="./plotsAndFigures/barPlotMSEFirstSix.png">

<a id='QuestionD001'></a>
<div class=" alert alert-warning">
    <b>Question.</b>
 
What can you observe and why? Would any of them provide satisfactory accuracy?

</div>

<a id='AnswerD001'></a>
<div class=" alert alert-info">
    <b>Answer.</b>
 
$\textbf{Observation}$: As in the previous part, if only the MSEs of the test dataset are compared, then the ANN model with $\texttt{30}$-$\texttt{30}$ hidden neurons provides satisfactory accuracy amongst the ANN models with $\texttt{60}$, $\texttt{30}$-$\texttt{30}$ and $\texttt{20}$-$\texttt{20}$-$\texttt{20}$ hidden neurons in the hidden layer(s). Otherwise, there are no perceptible differences between the real and predicted adjusted closing prices for these $\texttt{3}$ ANN models.

</div>

<a id='StudentTask'></a>
<div class=" alert alert-warning">
    <b>Student Task. </b>
 
Now re-do the previous two parts but the model now uses the data from past $10$ days.

</div>

### Forming the Working Datasets for Training, Validation and Test for 10 Prior Days

In [13]:
# Forming the final training dataset
X_train, y_train = get_refurbished_datasets(dataset=trainDataSet, prior_days=10)
X_train = np.asarray(X_train).astype(np.float32)  # Converting to Tensor Form
y_train = np.asarray(y_train).astype(np.float32)  # Converting to Tensor Form

# Forming the final validation dataset
X_val, y_val = get_refurbished_datasets(dataset=valDataSet, prior_days=10)
X_val = np.asarray(X_val).astype(np.float32)  # Converting to Tensor Form
y_val = np.asarray(y_val).astype(np.float32)  # Converting to Tensor Form

# Forming the final test dataset
X_test, y_test = get_refurbished_datasets(dataset=testDataSet, prior_days=10)
X_test = np.asarray(X_test).astype(np.float32)  # Converting to Tensor Form
y_test = np.asarray(y_test).astype(np.float32)  # Converting to Tensor Form

# Printing out the shapes of each dataset
print('--- Shapes of Each Dataset ---')
print(f'Training Features Dataset Shape: {X_train.shape}')
print(f'Training Labels Dataset Shape: {y_train.shape}')
print(f'Validation Features Dataset Shape: {X_val.shape}')
print(f'Validation Labels Dataset Shape: {y_val.shape}')
print(f'Test Features Dataset Shape: {X_test.shape}')
print(f'Test Labels Dataset Shape: {y_test.shape}')
print('------------------------------')

--- Shapes of Each Dataset ---
Training Features Dataset Shape: (3329, 10)
Training Labels Dataset Shape: (3329, 1)
Validation Features Dataset Shape: (1104, 10)
Validation Labels Dataset Shape: (1104, 1)
Test Features Dataset Shape: (1103, 10)
Test Labels Dataset Shape: (1103, 1)
------------------------------


### The ANN Models containing Different No. of Hidden Layers with Different No. of Hidden Neurons for 10 Prior Days

<img src="./plotsAndFigures/model_10_All.png" width=850>

### Plotting Training and Validation MSEs of the ANN Models containing Different No. of Hidden Layers with Different No. of Hidden Neurons for 10 Prior Days

<img src="./plotsAndFigures/plotMSE_10.png" width=900>

### Plotting Real and Predicted Prices from 10/1/2016 to now for the ANN Models containing Different No. of Hidden Layers with Different No. of Neurons for 10 Prior Days

<img src="./plotsAndFigures/plotPrices_10.png" width=900>

### Plotting the MSE Values for Different ANN Models Trained So Far

<img src="./plotsAndFigures/barPlotMSEAllTwelve.png">

<a id='QuestionE'></a>
<div class=" alert alert-warning">
    <b>Question. </b>
 
Do we get any benefits by using more data for the prediction? 

</div>

<a id='AnswerE'></a>
<div class=" alert alert-info">
    <b>Answer. </b>
 
$\textbf{Benefits}$: Once again, if the real and predicted closing prices are compared visually, no benefit can be observed. However, if the MSEs of the test dataset are compared, then the ANN model with $\texttt{80}$ hidden neurons for $\texttt{10}$ prior days is the <u>best one</u> in my case, because its MSE on the adjusted closing prices of the test dataset is the <u>lowest</u> amongst all the models. 

$\textbf{Final Conclusion}$: In my case, the ANN model with $\texttt{80}$ hidden neurons in the hidden layer for $\texttt{10}$ prior days is the <u>best model</u>.

</div>

<a id='StudentTask'></a>
<div class=" alert alert-warning">
    <b>Student Task.</b>
 
Use the best model you have to predict the adjusted closing price of NVDA on Wednesday ($3/24/2021$). 
 
</div>

### Prediction of Adjusted Closing Price for 3/24/2021

<img src="./plotsAndFigures/pricesLastTenDays.png" width=700>

<a id='AnswerF'></a>
<div class=" alert alert-info">
    <b>Answer.</b>
 
Using my best model, I predict the adjusted closing price of NVDA on Wednesday ($3/24/2021$) to be $\textbf{USD 532.89}$ (rounded to two decimal places). 
    
$\textbf{P.S.}$: The actual adjusted closing price of NVDA on Wednesday ($3/24/2021$) turned out to be $\textbf{USD 505.72}$ (rounded to two decimal places), which is USD $27.17$ (about $5.4\%$) below what my best model predicted.  
 
</div>
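
The gap between the predicted and the actual price quoted in the answer above can be expressed as an absolute and a percentage error. The sketch below only redoes that arithmetic; the variable names are hypothetical and the two prices are copied from the answer.

```python
predicted_price = 532.89   # prediction quoted in the answer above (USD)
actual_price = 505.72      # actual adjusted closing price quoted above (USD)

# Positive values mean the model overshot the actual price
absolute_error = predicted_price - actual_price
percentage_error = 100 * absolute_error / actual_price

print(f"{absolute_error:.2f} USD, {percentage_error:.2f}%")
```

A single-day miss of this size says little on its own; it is best read next to the test MSE of the same model.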