# Neural Network

1. **Imports necessary packages**: The script uses pandas for data manipulation, numpy for numerical operations, Keras for building the neural network, and Scikit-learn for model evaluation and hyperparameter tuning.

2. **Loads data**: Training and testing data are loaded from CSV files. The `date` column is dropped from the features, and the target values are converted to 1D arrays.

3. **Defines a model creation function**: This function builds and compiles a Keras Sequential model with a configurable set of hidden layers, first-layer width, dropout rate, and optimizer.

4. **Creates a KerasRegressor model**: This model uses the previously defined function for model creation.

5. **Defines a parameter grid for grid search**: The grid includes various combinations of layers, neurons, dropout rates, optimizers, batch sizes, and epochs.

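    - `"layers"`: This parameter is a tuple with one entry per hidden layer, so its length sets the number of hidden layers. In `create_model`, the first entry is ignored (the first hidden layer takes its width from `"neurons"`), and each later entry is the width of the corresponding hidden layer. The options are `(11, 11)`, `(11, 22)`, and `(11, 121)`, so every candidate has two hidden layers with a second-layer width of 11, 22, or 121.

    - `"neurons"`: This parameter defines the number of neurons in the first hidden layer. The options are `1`, `11`, and `25`.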

    - `"dropout_rate"`: This parameter defines the dropout rate for regularization to prevent overfitting. The options are `0.0`, `0.3`, and `0.5`.

    - `"optimizer"`: This parameter defines the optimization algorithm to use for training the neural network. The options are `"Adam"`, `"RMSprop"`, and `"Adagrad"`.

    - `"batch_size"`: This parameter defines the number of samples per gradient update. The options are `10`, `25`, and `50`.

    - `"epochs"`: This parameter defines the number of times the learning algorithm will work through the entire training dataset. The options are `10`, `25`, and `50`.

6. **Performs a grid search**: The grid search uses the KerasRegressor model, the parameter grid, and the negative mean squared error as the scoring method.

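7. **Fits the grid search on the training data**: Each parameter combination is scored with cross-validation on the training data (3 folds in the first run, 5 in the second). By default, `GridSearchCV` then refits the best combination on the full training set, and this refitted model is the one evaluated on the test data.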

8. **Evaluates the best model**: Predictions are made on the test features, and these predictions are compared with the actual test targets to calculate the mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and R-squared score.

9. **Prints the evaluation metrics**: The script outputs the name of the model, the best parameters found by the grid search, and the evaluation metrics.
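
To make the interaction between `"layers"` and `"neurons"` concrete, the sketch below mirrors the loop in `create_model` without building a Keras model. The helper name `hidden_layer_widths` is hypothetical and exists only for illustration; it returns the width of each hidden layer that a given parameter pair would produce.

```python
def hidden_layer_widths(layers, neurons):
    """Return the hidden-layer widths create_model would build (illustrative helper)."""
    widths = []
    for i, nodes in enumerate(layers):
        # The first hidden layer takes its width from `neurons`;
        # the first entry of `layers` is not used.
        widths.append(neurons if i == 0 else nodes)
    return widths


# The best configuration reported by the first grid search
print(hidden_layer_widths((11, 121), 25))  # [25, 121]
```

Because the first tuple entry is never read, the three `"layers"` options differ only in the width of the second hidden layer.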

_**Note:** Because neural network training takes a long time, the best parameters were found using only 10% of the data (about 50,000 rows). The model was then retrained on all the data (about 500,000 rows) using those parameters, with 100 epochs tried as an additional option._
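
A rough count of the work involved shows why the search was run on a subsample. The sketch below enumerates the first cell's parameter grid with `itertools.product`, as a stand-in for what `GridSearchCV` does internally, and multiplies by the 3 cross-validation folds. The variable names `n_candidates` and `n_fits` are illustrative.

```python
from itertools import product

# Same grid as in the first cell
param_grid = {
    "layers": [(11, 11), (11, 22), (11, 121)],
    "neurons": [1, 11, 25],
    "dropout_rate": [0.0, 0.3, 0.5],
    "optimizer": ["Adam", "RMSprop", "Adagrad"],
    "batch_size": [10, 25, 50],
    "epochs": [10, 25, 50],
}

# Every combination of one value per parameter is a candidate
n_candidates = sum(1 for _ in product(*param_grid.values()))

# Each candidate is trained once per cross-validation fold
cv_folds = 3
n_fits = n_candidates * cv_folds

print(n_candidates, n_fits)  # 729 2187
```

With the default refit, one more training run follows on the full training set using the winning combination.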

In [1]:
# Import packages
import pandas as pd
import numpy as np
from sklearn.model_selection import GridSearchCV
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from math import sqrt

# Set NumPy's random seed (the Keras backend keeps its own seed, so runs may still differ)
np.random.seed(30)

# Load the split data from csv (10% of total data)
X_train = pd.read_csv("X_train.csv", skiprows=lambda l: l % 10 != 0).drop(columns=["date"])
X_test = pd.read_csv("X_test.csv", skiprows=lambda l: l % 10 != 0).drop(columns=["date"])
y_train = np.ravel(pd.read_csv("y_train.csv", skiprows=lambda l: l % 10 != 0))
y_test = np.ravel(pd.read_csv("y_test.csv", skiprows=lambda l: l % 10 != 0))


# Function to create model, required for KerasRegressor
def create_model(layers, neurons, dropout_rate, optimizer):
    """
    Create a neural network model with the specified architecture.

    Parameters:
    - layers (tuple): One entry per hidden layer. The first entry is ignored (see neurons);
      each later entry is the number of nodes in that hidden layer.
    - neurons (int): Number of neurons in the first hidden layer.
    - dropout_rate (float): Dropout rate to be applied after each hidden layer.
    - optimizer (str): Name of the optimizer to be used for training.

    Returns:
    - model (Sequential): Compiled Keras model.
    """

    model = Sequential()
    for i, nodes in enumerate(layers):
        if i == 0:
            model.add(Dense(neurons, input_dim=X_train.shape[1], activation="relu"))
            model.add(Dropout(dropout_rate))
        else:
            model.add(Dense(nodes, activation="relu"))
            model.add(Dropout(dropout_rate))
    model.add(Dense(1))  # Single linear output for regression (no activation)

    model.compile(optimizer=optimizer, loss="mean_squared_error")

    # print the model summary
    # model.summary()

    return model


# Create the KerasRegressor model
model = KerasRegressor(build_fn=create_model, verbose=0)

# Define the grid search parameters
param_grid = {
    "layers": [(11, 11), (11, 22), (11, 121)],
    "neurons": [1, 11, 25],
    "dropout_rate": [0.0, 0.3, 0.5],
    "optimizer": ["Adam", "RMSprop", "Adagrad"],
    "batch_size": [10, 25, 50],
    "epochs": [10, 25, 50],
}

# Create the GridSearchCV to find the best parameters
grid_search = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
    # n_jobs=-1,
)

# Fit the grid search
grid_search.fit(X_train, y_train)

# Evaluate the best model
best_model = grid_search.best_estimator_

# Make predictions on the test set
predictions = best_model.predict(X_test)

# Calculate the evaluation metrics
mse = mean_squared_error(y_test, predictions)
rmse = sqrt(mse)
mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

# Print the evaluation metrics
print(f"Model: {model.__class__.__name__}")
print(f"- Best Parameters: {grid_search.best_params_}")
print(f"- MSE: {mse}")
print(f"- RMSE: {rmse}")
print(f"- MAE: {mae}")
print(f"- R2 Score: {r2}")



Model: KerasRegressor
- Best Parameters: {'batch_size': 10, 'dropout_rate': 0.3, 'epochs': 50, 'layers': (11, 121), 'neurons': 25, 'optimizer': 'Adam'}
- MSE: 304.5404722198552
- RMSE: 17.451087995304338
- MAE: 6.87880855847875
- R2 Score: 0.05844453471083666
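
For reference, the four metrics printed above can be written out in plain Python. This is a minimal sketch on made-up numbers, not the notebook's data. The function name `regression_metrics` is hypothetical, and the scikit-learn functions used in the cell remain the source of the reported values.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R-squared from two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {"mse": mse, "rmse": sqrt(mse), "mae": mae, "r2": r2}


# Toy values chosen only to make the arithmetic easy to follow
toy_true = [3.0, 5.0, 7.0, 9.0]
toy_pred = [2.0, 5.0, 8.0, 10.0]
metrics = regression_metrics(toy_true, toy_pred)
print(metrics)
```

RMSE is the square root of MSE, so it is in the same units as the target, which makes it easier to compare with MAE.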


In [2]:
# Set NumPy's random seed (the Keras backend keeps its own seed, so runs may still differ)
np.random.seed(30)

# Load the split data from csv (all data)
X_train = pd.read_csv("X_train.csv").drop(columns=["date"])
X_test = pd.read_csv("X_test.csv").drop(columns=["date"])
y_train = np.ravel(pd.read_csv("y_train.csv"))
y_test = np.ravel(pd.read_csv("y_test.csv"))

# Create the KerasRegressor model
model = KerasRegressor(build_fn=create_model, verbose=0)

# Reuse the best parameters from the 10% search; additionally try 100 epochs
param_grid = {
    "layers": [grid_search.best_params_["layers"]],
    "neurons": [grid_search.best_params_["neurons"]],
    "dropout_rate": [grid_search.best_params_["dropout_rate"]],
    "optimizer": [grid_search.best_params_["optimizer"]],
    "batch_size": [grid_search.best_params_["batch_size"]],
    "epochs": [grid_search.best_params_["epochs"], 100],
}

# Create the GridSearchCV to find the best parameters
grid_search = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
    # n_jobs=-1,
)

# Fit the grid search
grid_search.fit(X_train, y_train)

# Evaluate the best model
best_model = grid_search.best_estimator_

# Make predictions on the test set
predictions = best_model.predict(X_test)

# Calculate the evaluation metrics
mse = mean_squared_error(y_test, predictions)
rmse = sqrt(mse)
mae = mean_absolute_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

# Print the evaluation metrics
print(f"Model: {model.__class__.__name__}")
print(f"- Best Parameters: {grid_search.best_params_}")
print(f"- MSE: {mse}")
print(f"- RMSE: {rmse}")
print(f"- MAE: {mae}")
print(f"- R2 Score: {r2}")



Model: KerasRegressor
- Best Parameters: {'batch_size': 10, 'dropout_rate': 0.3, 'epochs': 50, 'layers': (11, 121), 'neurons': 25, 'optimizer': 'Adam'}
- MSE: 349.3167812148078
- RMSE: 18.69001822403627
- MAE: 7.602593240184953
- R2 Score: 0.06122559010663098
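
Both R2 scores are close to zero. One way to read them is to compare against a baseline that always predicts the mean of the test targets. Since R2 = 1 - MSE / Var(y_test), that baseline's MSE can be recovered from the two printed numbers alone. The sketch below does this arithmetic; `baseline_mse` is an illustrative helper, and the inputs are copied from the two outputs above.

```python
from math import sqrt


def baseline_mse(mse, r2):
    """MSE of a predict-the-mean baseline implied by a model's MSE and R2."""
    return mse / (1.0 - r2)


# (MSE, R2) pairs copied from the printed outputs of the two runs
runs = {
    "10% sample": (304.5404722198552, 0.05844453471083666),
    "all data": (349.3167812148078, 0.06122559010663098),
}

for name, (mse, r2) in runs.items():
    base = baseline_mse(mse, r2)
    print(f"{name}: model RMSE {sqrt(mse):.2f}, mean-baseline RMSE {sqrt(base):.2f}")
```

In both runs the model's RMSE is only about 3% below the baseline's. The two runs are also scored on different test sets (a 10% subsample versus the full file), so their MSE values are not directly comparable.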
