# Notebook 6: Neural Network Modeling of Spindle Current

In Notebook 4, we modeled the **spindle current** (`ACT_CURRENT_S`) using a **Random Forest Regressor** to understand how process variables influence current consumption during milling.
In this notebook, we will **develop a neural network model** to predict the spindle current using the same set of process features. The goal is to compare the performance of a **deep learning approach** with the previously used **Random Forest model**.

By completing this exercise, you will:

- Understand how to design and train a **feedforward neural network** for regression tasks  
- Learn how to **scale data** before training a neural network  
- Apply **performance metrics** such as **MAE**, **MSE**, and **R²** to evaluate model performance  
- Visualize **learning curves** and compare predicted vs. actual values  

### Given Information and Guidelines

You are provided with:
- Training, validation, and test datasets: `df_Train`, `df_Val`, and `df_Test`  
- A list of **features** and the **target variable**
- Evaluation metrics to use:  
  - `Mean Absolute Error (MAE)`  
  - `Mean Squared Error (MSE)`  
  - `R² Score`

>### The **neural network architecture and hyperparameters** (number of layers, units, dropout, learning rate, etc.) are **not provided**. You must **tune and select** them yourself through experimentation.

## Step 1: Import Libraries and Set Up Environment

The cell below imports the common libraries and loads the datasets. Add any further libraries that your neural network requires (e.g., `tensorflow`) to this cell.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import glob
import os
import time
import numpy as np
import seaborn as sns
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler


# Consistent plot style
sns.set(style="whitegrid")

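# Load all training chunks and concatenate them into a single DataFrame
files = sorted(glob.glob("Rough_Train_New*.csv.gz"))
if not files:
    raise FileNotFoundError("No files match 'Rough_Train_New*.csv.gz' in the working directory")
dfs = [pd.read_csv(f, compression="gzip") for f in files]
df_Train = pd.concat(dfs, ignore_index=True)

# Load the validation set and the two test sets
df_Val = pd.read_csv("Rough_Val_New.csv.gz", compression="gzip")
df_Test = pd.read_csv("Rough_Test_New.csv.gz", compression="gzip")
df_Test_2 = pd.read_csv("Rough_Test_D6D6D8.csv.gz", compression="gzip")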


## Step 2: Data Verification
Before modeling, ensure that **no overlap** exists between training and test data.

See Notebook 4 for inspiration.
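
One possible way to check for overlap is to count the rows that appear in both DataFrames. The sketch below is only an illustration on two tiny stand-in frames; the helper name `count_overlapping_rows` and the demo data are assumptions, and the approach used in Notebook 4 may differ. In the exercise you would pass `df_Train`, `df_Test`, and the columns you want to compare.

```python
import pandas as pd


def count_overlapping_rows(df_a, df_b, columns):
    """Count distinct rows (restricted to `columns`) that occur in both frames."""
    a = df_a[columns].drop_duplicates()
    b = df_b[columns].drop_duplicates()
    # An inner merge keeps only the rows whose values match in every column
    return len(a.merge(b, on=columns, how="inner"))


# Tiny stand-in frames (hypothetical values, not taken from the real datasets)
train_demo = pd.DataFrame(
    {"SPINDLE_SPEED": [1000, 1200, 1500], "ACT_CURRENT_S": [2.1, 2.4, 3.0]}
)
test_demo = pd.DataFrame(
    {"SPINDLE_SPEED": [1500, 1800], "ACT_CURRENT_S": [3.0, 3.3]}
)

n_shared = count_overlapping_rows(
    train_demo, test_demo, ["SPINDLE_SPEED", "ACT_CURRENT_S"]
)
print(f"Rows shared between the two frames: {n_shared}")
```

A result of zero means that no row occurs in both sets. Note that exact matching on floating-point columns only detects rows that are bit-for-bit identical.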

## Step 3: Define Features and Target

In [None]:
# Define target variable and feature list
target_variables = ['ACT_CURRENT_S']

features = [
    'ACTIVE_TOOL_LENGTH', 'SPINDLE_SPEED', 'Fz_N', 'Fy_N', 'Fx_N',
    'MultiDexel:GridX-EngagementHeight', 'MultiDexel:GridY-EngagementHeight', 
    'MultiDexel:GridZ-EngagementHeight', 'Tool_Diameter', 'Feed_Rate', 
    'Feed_per_Tooth', 'Cutting_Speed', 'ae', 'Qw', 'F_xyz'
]


## Step 4: Data Preprocessing

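Before training a neural network, the input features should be **standardized** to have zero mean and unit variance. The scaler is fitted on the training data only and then reused for the validation and test sets, so that no information from those sets leaks into training.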

In [None]:
# Fit the scaler on the training data only, then reuse it for validation and test
scaler = StandardScaler()
target = target_variables[0]  # 'ACT_CURRENT_S'

X_train = scaler.fit_transform(df_Train[features])
y_train = df_Train[target].values

X_val = scaler.transform(df_Val[features])
y_val = df_Val[target].values

X_test = scaler.transform(df_Test[features])
y_test = df_Test[target].values
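
To see what standardization does, the following sketch applies `StandardScaler` to synthetic stand-in arrays (the `*_demo` names and the random data are assumptions for illustration only). The training array ends up with zero mean and unit variance per column, while the validation array is only approximately standardized because it is transformed with the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 3 features with mean 50 and standard deviation 10
rng = np.random.default_rng(0)
train_demo = rng.normal(loc=50.0, scale=10.0, size=(500, 3))
val_demo = rng.normal(loc=50.0, scale=10.0, size=(100, 3))

scaler_demo = StandardScaler()
train_scaled = scaler_demo.fit_transform(train_demo)  # learns mean and std here
val_scaled = scaler_demo.transform(val_demo)          # reuses the training statistics

print("train mean:", train_scaled.mean(axis=0).round(3))
print("train std: ", train_scaled.std(axis=0).round(3))
print("val mean:  ", val_scaled.mean(axis=0).round(3))
```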

## Step 5: Build and Train the Neural Network

### Guidelines:
- Use **Dense layers** with `ReLU` activations  
- Include **Dropout layers** for regularization  
- Experiment with different **numbers of layers** and **units per layer**  
- Compile the model using:
  - **Loss:** Mean Squared Error (`'mse'`)  
  - **Metric:** Mean Absolute Error (`'mae'`)  
  - **Optimizer:** `Adam` with a learning rate you choose  
- Implement **EarlyStopping** to prevent overfitting (monitor `'val_loss'`)
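
The guidelines above could be turned into code roughly as follows. This is a minimal starting-point sketch assuming TensorFlow/Keras is installed; the helper `build_model` and every hyperparameter value in it (layer sizes, dropout rate, learning rate, patience, batch size, epochs) are placeholders chosen for illustration, not recommended settings. It trains on synthetic stand-in data so that it runs on its own; in the exercise you would use `X_train`, `y_train`, `X_val`, and `y_val` instead.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_model(n_features, units=(64, 32), dropout=0.2, learning_rate=1e-3):
    """Feedforward regressor; all default values are placeholders to be tuned."""
    model = keras.Sequential()
    model.add(keras.Input(shape=(n_features,)))
    for n_units in units:
        model.add(layers.Dense(n_units, activation="relu"))
        model.add(layers.Dropout(dropout))
    model.add(layers.Dense(1))  # single linear output for regression
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="mse",
        metrics=["mae"],
    )
    return model


# Synthetic stand-in data with 15 features (hypothetical, not the milling data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(256, 15)).astype("float32")
y_demo = (X_demo[:, 0] - 0.5 * X_demo[:, 1]).astype("float32")

model = build_model(n_features=X_demo.shape[1])

# Stop when the validation loss has not improved for `patience` epochs
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True
)

history = model.fit(
    X_demo[:200], y_demo[:200],
    validation_data=(X_demo[200:], y_demo[200:]),
    epochs=20,
    batch_size=32,
    callbacks=[early_stop],
    verbose=0,
)
print("Epochs run:", len(history.history["loss"]))
```

Wrapping the architecture in a function makes it easier to try several combinations of `units`, `dropout`, and `learning_rate` and compare their validation losses.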

## Step 6: Model Evaluation

After training, evaluate your model using the following metrics:
- `R² Score`
- `Mean Absolute Error (MAE)`
- `Mean Squared Error (MSE)`

Compare the results against the **Random Forest model** from Notebook 4.
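
The three metrics can be computed with the scikit-learn functions already imported in Step 1. The sketch below uses small hand-made arrays so that the values are easy to verify; the helper name `regression_report` and the demo arrays are assumptions. In the exercise you would pass `y_test` and your model's predictions on `X_test`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return MAE, MSE, and R² for one set of predictions as a dictionary."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mean_squared_error(y_true, y_pred),
        "R2": r2_score(y_true, y_pred),
    }


# Hand-made stand-in values (hypothetical)
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.5, 2.0, 2.5, 4.0])

report = regression_report(y_true_demo, y_pred_demo)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
# MAE: 0.250
# MSE: 0.125
# R2: 0.900
```

If your Keras model returns predictions of shape `(n, 1)`, flattening them with `.ravel()` before plotting or computing residuals avoids accidental broadcasting.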

## Step 7: Visualization

Create the following visualizations to interpret model performance:

1. **Learning Curves** – plot training vs. validation loss  
   *(Hint: use `history.history['loss']` and `history.history['val_loss']`)*  

2. **Predictions vs. Actual Values** – compare predicted spindle current with true values  
   *(Hint: `plt.plot(y_test, label='Actual')` vs. `plt.plot(y_pred, label='Predicted')`)*
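
Both plots can be sketched with Matplotlib as shown below. The function names and the stand-in values are assumptions for illustration; in the exercise you would pass `history.history` from your training run, and `y_test` together with your predictions.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_learning_curves(history_dict):
    """Plot training and validation loss per epoch."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(history_dict["loss"], label="Training loss")
    ax.plot(history_dict["val_loss"], label="Validation loss")
    ax.set_xlabel("Epoch")
    ax.set_ylabel("Loss (MSE)")
    ax.set_title("Learning curves")
    ax.legend()
    return fig


def plot_predictions(y_true, y_pred):
    """Overlay predicted and actual values in sample order."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(y_true, label="Actual")
    ax.plot(y_pred, label="Predicted", alpha=0.7)
    ax.set_xlabel("Sample index")
    ax.set_ylabel("ACT_CURRENT_S")
    ax.set_title("Predicted vs. actual spindle current")
    ax.legend()
    return fig


# Stand-in values (hypothetical) so that the sketch runs on its own
history_demo = {"loss": [1.0, 0.6, 0.4, 0.3], "val_loss": [1.1, 0.7, 0.55, 0.5]}
y_true_demo = np.sin(np.linspace(0, 6, 100))
y_pred_demo = y_true_demo + 0.1

fig_curves = plot_learning_curves(history_demo)
fig_pred = plot_predictions(y_true_demo, y_pred_demo)
```

A validation curve that flattens or rises while the training curve keeps falling is a typical sign of overfitting.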