# FRE7773 - Machine Learning in Financial Engineering
# Assignment 4
# Please submit this .ipynb file on Brightspace before **11:59 pm 11 December**.

## Models:
1. Deep learning for regression problem (30 points)
2. Recurrent Neural Networks (RNN) (30 points)
3. Deep Learning for Classification - Back-propagation from Scratch (40 points)

### General Guidelines:
1. You can choose the same or different financial applications for each model.
2. All your work, from explanations to code and analysis, should be presented within a
single Jupyter notebook.
3. While reusing content from other sources is allowed, always ensure you provide
appropriate citations and references.
4. This is an individual assignment. Adhere strictly to NYU’s policy on plagiarism. Late
submissions incur a deduction of 10 points (20 points if more than 24 hours late).
### Key Emphasis:
While accuracy is valuable, a descriptive, clear, and convincing implementation and analysis of
your models hold greater weight in this assignment.

In [3]:
# Install the required packages
! pip install fastparquet
! pip install tensorflow
! pip install yfinance

In [4]:
import numpy as np
import pandas as pd
import math
import datetime
import tensorflow as tf
import yfinance as yf
import matplotlib.pyplot as plt

from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import layers, models
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, LSTM, SimpleRNN
from tensorflow.keras.optimizers import Adam

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# 1. Deep learning for regression problem

## 1.1 Introduction

**Problem Statement:**

- The task is to predict the next trading day's closing price of Apple Inc. (AAPL) stock from the current day's price and volume data.

- For this problem, we use a Multilayer Perceptron (MLP) regression model. MLPs are versatile neural networks capable of learning nonlinear relationships between input features and a target variable. By training an MLP regression model on historical AAPL price and volume data, we aim to forecast the numerical value of the next closing price.

- If the forecasts are accurate enough, they can help traders and investors make more informed decisions about buying, selling, or holding AAPL stock, for example buying when the price is expected to rise and selling when it is expected to fall. This could improve portfolio performance.

**Data Description:**

The dataset used in this project contains daily historical price data for AAPL stock.

1. **Timeframe**: 2014-01-01 to 2024-01-01
2. **Geography**: The dataset primarily focuses on AAPL stock trading in the United States market.
3. **Source**: Yahoo Finance (downloaded with `yfinance`)
4. **Target variable**: The next trading day's closing price of AAPL stock.
5. **Features**: The current day's opening price, highest price, lowest price, adjusted closing price, and trading volume. The unadjusted closing price is dropped from the features because it defines the target.

## 1.2 Implementation

In [9]:
# Fetch historical stock price data from Yahoo Finance
ticker = 'AAPL'
start_date = '2014-01-01'
end_date = '2024-01-01'
data = yf.download(ticker, start=start_date, end=end_date)
data.index = data.index.strftime('%Y-%m-%d')

data.head()

[*********************100%***********************]  1 of 1 completed


| Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2014-01-02 | 19.845715 | 19.893929 | 19.715 | 19.754642 | 17.234293 | 234684800 |
| 2014-01-03 | 19.745001 | 19.775 | 19.301071 | 19.320715 | 16.855732 | 392467600 |
| 2014-01-06 | 19.194643 | 19.52857 | 19.057142 | 19.426071 | 16.947651 | 412610800 |
| 2014-01-07 | 19.440001 | 19.498571 | 19.21143 | 19.287144 | 16.826437 | 317209200 |
| 2014-01-08 | 19.243214 | 19.484285 | 19.23893 | 19.409286 | 16.933001 | 258529600 |


#### Train-Test Split
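Split the dataset into training, validation, and test sets.
The test set holds 20% of the data. Of the remaining 80%, a further 20% is held out for validation, so the final proportions are 64% training, 16% validation, and 20% test. Note that `train_test_split` shuffles the rows by default, so the split is random rather than chronological.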

In [11]:

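# Features: today's Open, High, Low, Adj Close, and Volume (the last row has no next-day target)
X = data.drop('Close', axis=1).iloc[:-1]
# Target: the next trading day's Close
y = data['Close'].shift(-1).iloc[:-1]
# Hold out 20% of the rows for testing (rows are shuffled by default)
X_train_val, X_test, y_train_val, y_test = train_test_split(X, y, test_size=0.2, random_state=42)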

In [12]:
# Scale the data
scaler = StandardScaler()
X_train_val_scaled = scaler.fit_transform(X_train_val)
X_test_scaled = scaler.transform(X_test)

# Split off the validation set: 20% of the training data serves as the validation set.
X_train, X_val, y_train, y_val = train_test_split(X_train_val_scaled, y_train_val, test_size=0.2, random_state=42)
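
Because the rows are daily observations, a shuffled split places test days between training days, and fitting the scaler before the validation split lets validation rows influence the scaling. The following is a minimal sketch of a chronological alternative using only NumPy. The function names `chronological_split` and `standardize` and the default fractions are illustrative assumptions, not part of the notebook's pipeline, and the results reported in this notebook were not produced with it.

```python
import numpy as np

def chronological_split(X, y, val_frac=0.16, test_frac=0.20):
    """Split time-ordered arrays into train/val/test blocks without shuffling."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(X)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    n_train = n - n_val - n_test
    train = (X[:n_train], y[:n_train])                       # oldest rows
    val = (X[n_train:n_train + n_val], y[n_train:n_train + n_val])
    test = (X[n_train + n_val:], y[n_train + n_val:])        # most recent rows
    return train, val, test

def standardize(train, *others):
    """Scale all arrays with the mean and std of the training rows only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return tuple((a - mean) / std for a in (train, *others))

# Toy data for illustration only (not AAPL prices)
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10, dtype=float)
(train_X, train_y), (val_X, val_y), (test_X, test_y) = chronological_split(X_demo, y_demo)
train_Xs, val_Xs, test_Xs = standardize(train_X, val_X, test_X)
print(train_X.shape, val_X.shape, test_X.shape)
```

With a split like this, every validation and test row lies strictly after the training period, which is closer to how a forecast would be used in practice.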

### Implement the MLP model


* Define the architecture of the Multilayer Perceptron (MLP) model, which will be used for regression.
* Configure the model for training by specifying the optimizer and loss function.
* Call the `fit()` method on the model, passing the training data and specifying the number of epochs, batch size, and validation data.


In [14]:
# Fix the NumPy and TensorFlow seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Define the MLP model
mlp = Sequential()
mlp.add(layers.Input(shape=(X_train.shape[1],)))  # One input per feature
mlp.add(Dense(128, activation='relu'))  # First hidden layer, 128 neurons, ReLU activation
mlp.add(Dropout(0.2))  # Dropout layer to reduce overfitting
mlp.add(Dense(64, activation='relu'))  # Hidden layer with 64 neurons, ReLU activation
mlp.add(Dense(32, activation='relu'))  # Another hidden layer with 32 neurons
mlp.add(Dense(1))  # Output layer with 1 neuron for regression (predicting a continuous value)

# Compile the model with the Adam optimizer, MSE loss, and MAE as an extra metric
mlp.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=['mae'])

# Train the model
history = mlp.fit(
    X_train, y_train,  # Training data
    epochs=100,  # Number of epochs for training
    batch_size=32,  # Batch size
    validation_data=(X_val, y_val),  # Validation data
    verbose=1,
)

Epoch 1/100
51/51 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 9183.7959 - mae: 77.6694 - val_loss: 5689.6973 - val_mae: 59.2665
Epoch 2/100
51/51 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 3598.6208 - mae: 45.1873 - val_loss: 489.9407 - val_mae: 17.2141
Epoch 3/100
51/51 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 466.4017 - mae: 16.6911 - val_loss: 340.2370 - val_mae: 14.3181
Epoch 4/100
51/51 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 327.1206 - mae: 13.8272 - val_loss: 246.6574 - val_mae: 12.3343
Epoch 5/100
51/51 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 273.9019 - mae: 12.7911 - val_loss: 170.2478 - val_mae: 10.4808
Epoch 6/100
51/51 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 182.7730 - mae: 10.5756 - val_loss: 106.9088 - val_mae: 8.2475
Epoch 7/100
51/51 ━━━━━━━━━━━━━━━━━━━━

### Evaluation and Discussion


* Analyze why this model performed well or poorly for this specific problem and dataset.
* Discuss the strengths and weaknesses of this approach, with particular attention to potential overfitting, underfitting, or any other relevant observations.



In [16]:
# Evaluate the model on test data
loss, mae = mlp.evaluate(X_test_scaled, y_test, verbose=1)

train_loss = history.history['loss']
val_loss = history.history['val_loss']
train_mae = history.history['mae']
val_mae = history.history['val_mae']

y_pred = mlp.predict(X_test_scaled)
residuals = y_test.values - y_pred.flatten()

# Print the evaluation results
print(f"Test Loss (MSE): {loss}")
print(f"Test MAE: {mae}")

print(f"Final Training Loss (MSE): {train_loss[-1]}")
print(f"Final Validation Loss (MSE): {val_loss[-1]}")
print(f"Final Training MAE: {train_mae[-1]}")
print(f"Final Validation MAE: {val_mae[-1]}")

print("First few residuals (actual - predicted):")
print(residuals[:10])

16/16 ━━━━━━━━━━━━━━━━━━━━ 0s 914us/step - loss: 3.5159 - mae: 1.1834
16/16 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step
Test Loss (MSE): 3.459104537963867
Test MAE: 1.1653990745544434
Final Training Loss (MSE): 20.865951538085938
Final Validation Loss (MSE): 4.000646114349365
Final Training MAE: 2.8869597911834717
Final Validation MAE: 1.228691816329956
First few residuals (actual - predicted):
[-0.16228104 -0.72394562 -0.01425171 -0.10179901  2.00271606  0.00528717
 -0.47060013  0.10610008 -2.99743652  0.35032463]


# Discussion:

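### Overall performance:
The model fits the validation and test sets well, yet the final training loss (MSE 20.87) is about five times the validation loss (MSE 4.00). This gap does not by itself mean the model failed to learn from the training data; the comparison of training and validation metrics below explains why.

### Test Loss and MAE:
The test loss (MSE) of 3.46 and test MAE of 1.17 mean that the predicted next-day close is off by about $1.17 on average. This is reasonable but should be read with care: `train_test_split` shuffles rows by default, so test days are interleaved with training days and the figure is not a true out-of-sample forecast error.

### Training vs Validation Loss and MAE:
The final training loss (MSE 20.87) and training MAE (2.89) are considerably higher than the validation values (4.00 and 1.23) and the test values (3.46 and 1.17). Overfitting would show the opposite pattern, with training error below validation error. A more likely explanation is the `Dropout(0.2)` layer: dropout is active when the training loss is computed but disabled during validation and testing. In addition, Keras reports the training loss as an average over the batches of an epoch, while the validation loss is measured with the weights at the end of the epoch.

### Residuals:
Among the first ten test residuals, the largest is -3.00 on the 9th data point, meaning the model overpredicted that close by about $3. The next largest is 2.00 on the 5th data point; the remaining eight are within ±0.75.

### Potential Overfitting:
The validation loss is much lower than the training loss (4.00 vs 20.87 for MSE), and the test loss (3.46) is close to the validation loss, so these metrics show no sign of overfitting. The caveat is that a shuffled split can hide overfitting to time-local patterns; a chronological split would be a stricter check.

### Potential Underfitting:
The relatively high training loss and MAE could suggest underfitting, but they are measured with dropout active. Evaluating the trained model on the training set with `mlp.evaluate(X_train, y_train)`, which disables dropout, would show whether the network really fails to capture the patterns in the training data.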
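
To judge whether a test MAE of about 1.17 is good, it helps to compare it with a naive benchmark. The following is a minimal sketch of a persistence baseline, which predicts tomorrow's close with today's close. The function name `persistence_baseline` and the toy prices are assumptions for illustration; no baseline figure was computed in this notebook.

```python
import numpy as np

def persistence_baseline(close):
    """Error of predicting each next-day close with the current day's close."""
    close = np.asarray(close, dtype=float)
    errors = close[1:] - close[:-1]  # actual next close minus naive prediction
    return {
        "mae": float(np.mean(np.abs(errors))),
        "mse": float(np.mean(errors ** 2)),
    }

# Toy prices for illustration only (not AAPL data)
toy_close = [10.0, 11.0, 10.5, 12.0]
print(persistence_baseline(toy_close))
```

Applied to the notebook's data, a call such as `persistence_baseline(data['Close'])` (assuming that column is a single price series) would give the number the MLP needs to beat before its forecasts add value.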

# 2. RNN

## 2.1 Introduction


**Problem Statement:**

As in Section 1, the task is to forecast the next closing price of Apple Inc. (AAPL) stock from historical price data.

Compared to a Multilayer Perceptron (MLP), Recurrent Neural Networks (RNNs) are designed for sequential data such as stock prices. When they are fed sequences of several time steps, they can capture temporal dependencies and extract relevant features automatically, which can improve forecasts and help the model adapt to changing market conditions.
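
An RNN can only use temporal structure if each sample contains several consecutive time steps. The following is a minimal sketch of how such samples could be built with a sliding window. The function name `make_windows` and the window length are illustrative assumptions, and `target` is assumed to be the unshifted series aligned row by row with `features`.

```python
import numpy as np

def make_windows(features, target, window=20):
    """Build inputs of shape (samples, window, n_features) and next-step targets."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X, y = [], []
    for end in range(window, len(features)):
        X.append(features[end - window:end])  # rows end-window .. end-1
        y.append(target[end])                 # value right after the window
    return np.array(X), np.array(y)

# Toy data for illustration only: 6 days, 2 features, window of 3 days
toy_features = np.arange(12, dtype=float).reshape(6, 2)
toy_target = np.arange(6, dtype=float)
X_seq, y_seq = make_windows(toy_features, toy_target, window=3)
print(X_seq.shape, y_seq.shape)
```

The resulting three-dimensional array has the `(samples, timesteps, features)` layout that Keras recurrent layers expect.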


**Data Description:**

The dataset is the same as in Section 1.
1. **Timeframe**: 2014-01-01 to 2024-01-01
2. **Geography**: The dataset primarily focuses on AAPL stock trading in the United States market.
3. **Source**: Yahoo Finance (downloaded with `yfinance`)
4. **Target variable**: The next trading day's closing price of AAPL stock.
5. **Features**: The current day's opening price, highest price, lowest price, adjusted closing price, and trading volume.

## 2.2 Implementation

### Task 1:  Model Implementation
Using the same dataset,

* Define the RNN model architecture using the `Sequential` class from `keras.models`. Add an RNN layer with a specified number of units, followed by a `Dense` output layer.
* Compile the model using the `compile()` method, specifying the Adam optimizer with a learning rate and the mean squared error loss function.
* Train the RNN model using the training data.


In [23]:
# Reshape to (samples, timesteps, features); each sample is a one-step sequence
X_val_scaled = X_val.reshape((X_val.shape[0], 1, X_val.shape[1]))
X_train_scaled = X_train.reshape((X_train.shape[0], 1, X_train.shape[1]))
# -1 keeps the feature count, so re-running this cell leaves the shape unchanged
X_test_scaled = X_test_scaled.reshape((X_test_scaled.shape[0], 1, -1))

# Define the RNN model
rnn = Sequential()
rnn.add(layers.Input(shape=(X_train_scaled.shape[1], X_train_scaled.shape[2])))

# Add RNN layer (units can be adjusted based on experimentation)
rnn.add(SimpleRNN(units=50, return_sequences=False))

# Add output layer
rnn.add(Dense(1))  # Predicting a single value (AAPL closing price)

# Compile the model
rnn.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=['mae'])

# Train the model
history = rnn.fit(
    X_train_scaled,  # Training features (scaled)
    y_train,         # Training target variable (closing price)
    epochs=50,       # Train for 50 epochs
    batch_size=32,   # Use a batch size of 32
    validation_data=(X_val_scaled, y_val),  # Validation data
    verbose=1        # Display training progress
)

Epoch 1/50
51/51 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 9207.9062 - mae: 77.7148 - val_loss: 8625.6582 - val_mae: 75.5627
Epoch 2/50
51/51 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 9226.1240 - mae: 79.0824 - val_loss: 8329.0879 - val_mae: 75.2163
Epoch 3/50
51/51 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 8745.5332 - mae: 78.3662 - val_loss: 8033.2129 - val_mae: 74.7273
Epoch 4/50
51/51 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 8536.4619 - mae: 77.7184 - val_loss: 7727.9810 - val_mae: 74.0059
Epoch 5/50
51/51 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 7952.8359 - mae: 75.4097 - val_loss: 7415.4648 - val_mae: 72.9393
Epoch 6/50
51/51 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 7622.3789 - mae: 74.3323 - val_loss: 7092.7612 - val_mae: 71.3878
Epoch 7/50
51/51 ━━━━━━━━━━━━━━━━━━━━

### Task 2: Evaluation and Discussion


* Compared with the MLP model, analyze why this model performed better or worse for this specific problem and dataset.
* Discuss the strengths and weaknesses of this approach, with particular attention to potential overfitting, underfitting, or any other relevant observations.


In [25]:
# Evaluate the model on test data
loss, mae = rnn.evaluate(X_test_scaled, y_test, verbose=1)

# Print the evaluation results
print(f"Test Loss (MSE): {loss}")
print(f"Test MAE: {mae}")

16/16 ━━━━━━━━━━━━━━━━━━━━ 0s 955us/step - loss: 778.6627 - mae: 19.1387
Test Loss (MSE): 754.56005859375
Test MAE: 18.769922256469727


# Discussion

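### Comparison with the MLP:
On the same test set the RNN performs much worse than the MLP: test MSE 754.56 and MAE 18.77, against 3.46 and 1.17 for the MLP. Several implementation choices likely contribute:

* Each sample is reshaped to a sequence of length 1, so the recurrent layer never sees earlier days and cannot use temporal structure. In this setup it behaves like a single 50-unit dense layer with tanh activation (the `SimpleRNN` default).
* The target is the unscaled price, while tanh outputs are bounded in [-1, 1]. The output layer therefore needs large weights to reach the price level, and the loss falls slowly: the training log shows a loss still above 7,000 after six epochs.
* The model is smaller than the MLP and trains for 50 epochs instead of 100.

### Strengths:
Sequential Data Handling: RNNs are designed for sequential data like stock prices. When given multi-step sequences, they can capture temporal dependencies and patterns that MLPs might miss, which is crucial for time series forecasting.

Dynamic Adaptation: RNNs can adapt to changing market conditions by "remembering" previous data points, potentially offering better predictions when the temporal structure is significant.

### Weaknesses:
Overfitting: RNNs, especially with small datasets or noisy data like stock prices, are prone to overfitting. If the model is too complex or trained for too long, it can learn noise rather than meaningful patterns.

Underfitting: If the model is too simple (e.g., too few units or epochs), it might not capture the full complexity of the stock price data. The test MAE of 18.77 points to underfitting in this run.

Vanishing Gradients: RNNs can suffer from the vanishing gradient problem, especially with long time sequences. This makes it harder for the network to learn long-term dependencies, limiting its effectiveness for tasks requiring long-range memory.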

# 3. Deep Learning for Classification - Back-propagation from Scratch


**Problem Statement:**

The objective is to train a neural network **from scratch using NumPy** to gain a deeper understanding of backpropagation.


* You should implement the forward pass and backward pass from scratch, manually coding everything (e.g., cross-entropy loss, softmax, sigmoid activation) using NumPy.
* Provide explanations for each step's code and computations.
* Train the model for 3 epochs.
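
The computations of the forward and backward pass can be written out explicitly. The following is a sketch in the row-vector convention, where $X$ is an $m \times 784$ batch, $Y$ holds the one-hot labels, $\sigma$ is the sigmoid, and $\odot$ is the element-wise product. The symbols $A_l$, $Z_l$, and $\delta_l$ are notation introduced here for illustration, with $A_0 = X$.

$$
\begin{aligned}
Z_1 &= X W_1 + b_1, & A_1 &= \sigma(Z_1) \\
Z_2 &= A_1 W_2 + b_2, & A_2 &= \sigma(Z_2) \\
Z_3 &= A_2 W_3 + b_3, & A_3 &= \operatorname{softmax}(Z_3) \\
L &= -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{10} Y_{ik} \log A_{3,ik}
\end{aligned}
$$

For a single sample with logits $z$, output $a = \operatorname{softmax}(z)$, label $y$, and loss $\ell = -\sum_k y_k \log a_k$, the softmax derivative is $\partial a_k / \partial z_j = a_k(\mathbb{1}[k=j] - a_j)$. Because $\sum_k y_k = 1$:

$$
\frac{\partial \ell}{\partial z_j}
= -\sum_k \frac{y_k}{a_k}\, a_k \left(\mathbb{1}[k=j] - a_j\right)
= -y_j + a_j \sum_k y_k
= a_j - y_j
$$

Using $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$, the errors propagate backward as

$$
\begin{aligned}
\delta_3 &= A_3 - Y \\
\delta_2 &= (\delta_3 W_3^\top) \odot A_2 \odot (1 - A_2) \\
\delta_1 &= (\delta_2 W_2^\top) \odot A_1 \odot (1 - A_1)
\end{aligned}
$$

and the factor $1/m$ from the mean loss is applied in the parameter gradients:

$$
\nabla_{W_l} L = \frac{1}{m} A_{l-1}^\top \delta_l, \qquad
\nabla_{b_l} L = \frac{1}{m} \mathbf{1}^\top \delta_l, \qquad l = 1, 2, 3
$$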

**Data Description:**

This project uses a standard **MNIST benchmark dataset** (provided in tensorflow.keras.datasets package), a well-known dataset in machine learning and computer vision. The MNIST dataset contains grayscale images of handwritten digits (0-9), commonly used for training and testing digit recognition models.

**Model Guidance:**

The network will include:
* Sigmoid activations for neurons, softmax activation for the output layer, and cross-entropy loss.
* An architecture with **two hidden layers, each containing 32 neurons**. The number of neurons at each layer will be as follows: **784 - 32 - 32 - 10**.
* Weights initialized from a normal distribution with **mean 0** and **variance 1 / max(n_in, n_out)**, where n_in and n_out represent the number of input and output neurons, respectively. **Biases will be initialized to 0**.
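
The initialization rule above can be sketched for the 784 - 32 - 32 - 10 architecture as follows. The helper name `init_layer` and the use of `np.random.default_rng` are assumptions made for this illustration; note that a standard deviation of $\sqrt{1 / \max(n_{in}, n_{out})}$ corresponds to the required variance.

```python
import numpy as np

def init_layer(n_in, n_out, rng):
    """Weights drawn from N(0, 1 / max(n_in, n_out)); biases start at zero."""
    std = np.sqrt(1.0 / max(n_in, n_out))  # standard deviation = sqrt(variance)
    W = rng.normal(loc=0.0, scale=std, size=(n_in, n_out))
    b = np.zeros((1, n_out))
    return W, b

rng = np.random.default_rng(42)  # constant seed for reproducibility
layer_sizes = [784, 32, 32, 10]
params = [
    init_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
for W, b in params:
    print(W.shape, b.shape)
```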

**Evaluation and Discussion:**

After completing the implementation from scratch, you need to repeat the same classification task using a deep learning library such as **PyTorch or TensorFlow**. This will involve using built-in layers and backpropagation. Compare and discuss the results obtained from these two approaches.

**Note**: You should use a constant seed for random number generation to ensure the reproducibility of your results.

In [29]:

# Fixing the random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Reshape the data to 784-dimensional vectors and normalize it
x_train = x_train.reshape(-1, 784) / 255.0
x_test = x_test.reshape(-1, 784) / 255.0

# Convert labels to one-hot encoding
y_train_onehot = np.eye(10)[y_train]
y_test_onehot = np.eye(10)[y_test]

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Sigmoid derivative, expressed in terms of the sigmoid output a = sigmoid(z)
def sigmoid_derivative(a):
    return a * (1 - a)

# Softmax activation function
def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))  # Stability improvement
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)

# Cross-entropy loss function
def cross_entropy_loss(y_pred, y_true):
    m = y_true.shape[0]
    return -np.sum(y_true * np.log(y_pred + 1e-10)) / m

# Initialize weights and biases
def initialize_parameters(input_size, hidden_size, output_size):
    # Weights ~ N(0, 1 / max(n_in, n_out)), as specified; biases start at zero
    W1 = np.random.randn(input_size, hidden_size) * np.sqrt(1. / max(input_size, hidden_size))
    b1 = np.zeros((1, hidden_size))
    W2 = np.random.randn(hidden_size, hidden_size) * np.sqrt(1. / hidden_size)
    b2 = np.zeros((1, hidden_size))
    W3 = np.random.randn(hidden_size, output_size) * np.sqrt(1. / max(hidden_size, output_size))
    b3 = np.zeros((1, output_size))

    return W1, b1, W2, b2, W3, b3

# Forward pass function
def forward_pass(x, W1, b1, W2, b2, W3, b3):
    z1 = np.dot(x, W1) + b1
    a1 = sigmoid(z1)
    
    z2 = np.dot(a1, W2) + b2
    a2 = sigmoid(z2)
    
    z3 = np.dot(a2, W3) + b3
    a3 = softmax(z3)
    
    return a1, a2, a3

# Backward pass function (Backpropagation)
def backward_pass(x, y_true, a1, a2, a3, W1, W2, W3):
    m = x.shape[0]
    
    # Output layer error (cross-entropy + softmax)
    dz3 = a3 - y_true
    dW3 = np.dot(a2.T, dz3) / m
    db3 = np.sum(dz3, axis=0, keepdims=True) / m
    
    # Hidden layer 2 error
    dz2 = np.dot(dz3, W3.T) * sigmoid_derivative(a2)
    dW2 = np.dot(a1.T, dz2) / m
    db2 = np.sum(dz2, axis=0, keepdims=True) / m
    
    # Hidden layer 1 error
    dz1 = np.dot(dz2, W2.T) * sigmoid_derivative(a1)
    dW1 = np.dot(x.T, dz1) / m
    db1 = np.sum(dz1, axis=0, keepdims=True) / m
    
    return dW1, db1, dW2, db2, dW3, db3

# Training function
def train(x_train, y_train, epochs=3, learning_rate=0.01):
    input_size = x_train.shape[1]
    hidden_size = 32
    output_size = 10
    
    # Initialize parameters
    W1, b1, W2, b2, W3, b3 = initialize_parameters(input_size, hidden_size, output_size)
    
    # Train the network with full-batch gradient descent (one parameter update per epoch)
    for epoch in range(epochs):
        # Forward pass
        a1, a2, a3 = forward_pass(x_train, W1, b1, W2, b2, W3, b3)
        
        # Compute the loss
        loss = cross_entropy_loss(a3, y_train)
        
        # Backward pass
        dW1, db1, dW2, db2, dW3, db3 = backward_pass(x_train, y_train, a1, a2, a3, W1, W2, W3)
        
        # Update weights and biases using gradient descent
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
        W3 -= learning_rate * dW3
        b3 -= learning_rate * db3
        
        # Print the loss every 100 epochs (with epochs=3, only epoch 0 is printed)
        if epoch % 100 == 0:
            print(f"Epoch {epoch}, Loss: {loss}")
    
    return W1, b1, W2, b2, W3, b3

# Prediction function
def predict(x, W1, b1, W2, b2, W3, b3):
    _, _, a3 = forward_pass(x, W1, b1, W2, b2, W3, b3)
    return np.argmax(a3, axis=1)

# Evaluate the model
def evaluate(x, y_true, W1, b1, W2, b2, W3, b3):
    y_pred = predict(x, W1, b1, W2, b2, W3, b3)
    accuracy = np.mean(y_pred == np.argmax(y_true, axis=1))
    return accuracy

# Train the model
W1, b1, W2, b2, W3, b3 = train(x_train, y_train_onehot, epochs=3, learning_rate=0.01)

# Evaluate the model on the test set
accuracy = evaluate(x_test, y_test_onehot, W1, b1, W2, b2, W3, b3)
print(f"Test Accuracy: {accuracy * 100:.2f}%")


Epoch 0, Loss: 2.434248573305333
Test Accuracy: 9.80%


# Discussion:
1. Advantages of Training from Scratch: Implementing the forward and backward passes by hand gives a deeper understanding of how backpropagation works, how gradients are propagated through the layers, and how the weights are updated.
2. Limitations: This implementation is less efficient than frameworks like TensorFlow or PyTorch, which optimize these operations. It may also converge more slowly than models trained with these libraries because it lacks advanced optimization techniques such as momentum or adaptive learning rates.
3. Result: The test accuracy after 3 epochs is 9.80%, about chance level for ten classes. The training loop uses full-batch gradient descent, so 3 epochs amount to only 3 parameter updates at a learning rate of 0.01, which is too few to move the loss far from its initial value of 2.43.
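
One way to get many more parameter updates out of the same 3 epochs is mini-batch stochastic gradient descent. The following is a minimal, self-contained NumPy sketch of that idea. The function names, the learning rate of 0.1, the batch size of 32, and the use of `np.random.default_rng` are assumptions chosen for illustration; this is not the training loop used for the result above, and no accuracy figure is claimed for it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=1, keepdims=True)

def init_params(sizes, rng):
    """One (W, b) pair per layer; W ~ N(0, 1 / max(n_in, n_out)), b = 0."""
    return [
        (rng.normal(0.0, np.sqrt(1.0 / max(n_in, n_out)), size=(n_in, n_out)),
         np.zeros((1, n_out)))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(x, params):
    """Return the activations of every layer, with the input as acts[0]."""
    acts = [x]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        is_output = i == len(params) - 1
        acts.append(softmax(z) if is_output else sigmoid(z))
    return acts

def gradients(acts, y_onehot, params):
    """Backpropagate the mean cross-entropy loss through all layers."""
    m = y_onehot.shape[0]
    delta = acts[-1] - y_onehot  # softmax + cross-entropy error
    grads = []
    for i in reversed(range(len(params))):
        grads.append((acts[i].T @ delta / m,
                      delta.sum(axis=0, keepdims=True) / m))
        if i > 0:
            W = params[i][0]
            # acts[i] is a sigmoid output, so its derivative is a * (1 - a)
            delta = (delta @ W.T) * acts[i] * (1.0 - acts[i])
    return grads[::-1]

def train_minibatch(x, y_onehot, sizes, epochs=3, lr=0.1, batch_size=32, seed=42):
    """Mini-batch SGD; returns the parameters and the loss after each epoch."""
    rng = np.random.default_rng(seed)
    params = init_params(sizes, rng)
    losses = []
    for _ in range(epochs):
        order = rng.permutation(len(x))  # reshuffle the samples every epoch
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            acts = forward(x[idx], params)
            grads = gradients(acts, y_onehot[idx], params)
            params = [(W - lr * dW, b - lr * db)
                      for (W, b), (dW, db) in zip(params, grads)]
        probs = forward(x, params)[-1]
        loss = -np.mean(np.sum(y_onehot * np.log(probs + 1e-10), axis=1))
        losses.append(float(loss))
    return params, losses
```

With the notebook's arrays, a call such as `train_minibatch(x_train, y_train_onehot, sizes=[784, 32, 32, 10])` would perform 1,875 updates per epoch on the 60,000 training images instead of one.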