# Neural Network 
## Activity Classification

This notebook serves as a record of preliminary investigations into using a neural network model for LISA. Hopefully, it serves as a useful starting point for future work - but shouldn't be treated as 100% verified!

A version of this process also exists in `lisa/modeling/neural_net.py`, used to train a model on the full dataset on HPC. The resulting model is saved in `models/neural_net` - feel free to experiment with this pre-trained model.

### Introduction
We're using the Keras API that ships with TensorFlow, one of the most widely used approaches for building neural nets: https://www.tensorflow.org/guide/keras

Neural nets consist of layers of interconnected nodes (neurons) that process data and learn patterns, loosely inspired by the human brain. They can be used for classification, regression, and even unsupervised tasks. They are generally more complex than other models, and so require more data and computational resources.

They can outperform other types of model, though by no means in every case, but usually need much more tuning to get there and are more prone to overfitting.

### Installation
Installing TensorFlow/Keras is slightly more involved than for other modelling libraries (e.g. scikit-learn), so be prepared for this to take some work; see `post_install.sh` for macOS-specific differences. Verify your installation with the cell below.

In [None]:
import keras_tuner as kt
import polars as pl
from sklearn.preprocessing import LabelEncoder
from tensorflow import keras

from lisa.config import MODELS_DIR, PROCESSED_DATA_DIR
from lisa.features import sequential_stratified_split, standard_scaler

# Check packages have installed correctly
print("Keras version: ", keras.__version__)
print("Keras tuner version: ", kt.__version__)

Load the training and validation data below. We need to convert the data to NumPy `ndarray`s, and use a label encoder to convert the categorical activity labels to integers.
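
To illustrate what the label encoding step does, here is a minimal sketch using plain NumPy as a stand-in for `LabelEncoder`: the distinct classes are sorted and each sample is replaced by the index of its class. The activity names are made up for illustration and are not taken from the LISA data.

```python
import numpy as np

# Hypothetical activity labels, for illustration only
labels = np.array(["walk", "run", "jump", "walk", "run"])

# np.unique sorts the distinct classes and returns, for each sample,
# the index of its class in that sorted array
classes, encoded = np.unique(labels, return_inverse=True)

print(classes)  # the sorted class names
print(encoded)  # one integer per sample

# Indexing back into `classes` recovers the original strings,
# which is what LabelEncoder.inverse_transform does for predictions
decoded = classes[encoded]
print(decoded)
```

The same idea applies after prediction: the integer class predicted by the model can be mapped back to an activity name with `label_encoder.inverse_transform`.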

In [None]:
df = pl.scan_parquet(PROCESSED_DATA_DIR / "P1.parquet")

X_train, X_val, y_train, y_val = sequential_stratified_split(
    df, 0.8, 800, ["ACTIVITY"]
)

label_encoder = LabelEncoder()

X_train, X_val, scaler = standard_scaler(X_train, X_val)

X_train = X_train.to_numpy()
X_val = X_val.to_numpy()

# Flatten the single-column targets to 1D, which is what LabelEncoder expects
y_train = y_train.collect().to_numpy().ravel()
y_val = y_val.collect().to_numpy().ravel()

y_train = label_encoder.fit_transform(y_train)
y_val = label_encoder.transform(y_val)

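Now we define a simple feedforward model for 3-class classification. In a feedforward network, information only flows forward from input to output (recurrent networks, which feed outputs back in, are another type). Backpropagation is the algorithm used to train the network, not a type of network.
It has 3 layers: a first hidden layer that also receives the input, a second hidden layer, and the output layer.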

We also specify 3 hyperparameters: `neurons1` and `neurons2` set the number of neurons in the two hidden layers, and `dropout` sets the fraction of neurons to randomly drop after each hidden layer during training (to reduce overfitting).
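
As a rough illustration of what `dropout` does during training, here is a minimal NumPy sketch of "inverted" dropout: a random fraction of activations is zeroed and the survivors are scaled up so the expected value is unchanged. This is only a sketch of the idea, not Keras's actual implementation, and the array sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)


def dropout(activations, rate, rng):
    """Zero a random fraction `rate` of activations and rescale the rest."""
    keep_prob = 1.0 - rate
    # Each activation is kept independently with probability keep_prob
    mask = rng.random(activations.shape) < keep_prob
    # Scale the kept activations so the expected sum stays the same
    return activations * mask / keep_prob


activations = np.ones((1000, 64))  # arbitrary example activations
dropped = dropout(activations, rate=0.5, rng=rng)

print((dropped == 0).mean())  # fraction of activations zeroed, roughly 0.5
print(dropped.mean())         # mean stays close to the original 1.0
```

Dropout is only active during training; at prediction time the layer passes its input through unchanged.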

To run through the other details:
 - ReLU (Rectified Linear Unit): Activation function to introduce non-linearity.
 - Softmax: Activation function which outputs probabilities for each class. The class with the highest probability is the predicted class.
 - Adam: Adaptive optimization algorithm that adjusts learning rates during training for faster convergence.
 - Sparse Categorical Cross Entropy: Loss function, suitable for multi-class classification when the target labels are integers (not one-hot encoded).
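
The pieces listed above can be sketched in a few lines of NumPy. This is an illustrative re-implementation under simple assumptions (a batch of two samples with made-up scores), not the code Keras runs internally, and it leaves out Adam, which is harder to show briefly.

```python
import numpy as np


def relu(x):
    """Pass positive values through and clamp negatives to zero."""
    return np.maximum(0.0, x)


def softmax(logits):
    """Turn each row of raw scores into probabilities that sum to 1."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def sparse_categorical_crossentropy(probs, labels):
    """Mean negative log-probability assigned to the true (integer) class."""
    picked = probs[np.arange(len(labels)), labels]
    return -np.log(picked).mean()


# Made-up output-layer scores for 2 samples and 3 classes
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])  # integer labels, as produced by the label encoder

probs = softmax(logits)
predicted = probs.argmax(axis=1)  # class with the highest probability
loss = sparse_categorical_crossentropy(probs, labels)

print(probs)
print(predicted)
print(loss)
```

Because the labels are plain integers, the loss just picks out one probability per sample; no one-hot encoding is needed.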

In [None]:
def create_classifier(neurons1, neurons2, dropout):
    """Build and compile a classifier with two hidden layers and a 3-class softmax output."""
    model = keras.models.Sequential([
        keras.layers.Dense(neurons1, activation='relu', input_shape=(X_train.shape[1],)),
        keras.layers.Dropout(dropout),
        keras.layers.Dense(neurons2, activation='relu'),
        keras.layers.Dropout(dropout),
        keras.layers.Dense(3, activation='softmax')  # 3 classes for classification
    ])
    model.compile(
        optimizer='adam',
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'],
    )
    return model



Now we can just define a function specifying the hyperparameter value ranges we want to sample:

In [None]:
def build_classifier(hp):
    neurons1 = hp.Int("units1", min_value=32, max_value=512, step=32)
    neurons2 = hp.Int("units2", min_value=32, max_value=512, step=32)
    dropout = hp.Float("dropout", min_value=0.1, max_value=0.5, step=0.1)

    model = create_classifier(
        neurons1=neurons1, neurons2=neurons2, dropout=dropout
    )
    return model

Define the tuner used for hyperparameter optimisation. 

We'll use Bayesian optimisation, which uses the results of earlier trials to choose which hyperparameters to try next.
`max_trials` sets the number of hyperparameter combinations to try, so it directly affects the time taken.
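
To put `max_trials` in context, here is a rough count of the search space defined in `build_classifier`. It assumes the tuner treats both `min_value` and `max_value` as inclusive, which is worth checking against the Keras Tuner documentation.

```python
# Candidate values for each hyperparameter, assuming inclusive bounds
neuron_choices = list(range(32, 512 + 1, 32))               # 32, 64, ..., 512
dropout_choices = [round(0.1 * i, 1) for i in range(1, 6)]  # 0.1, 0.2, ..., 0.5

# units1 and units2 share the same range, so the full grid is
# (neuron choices) x (neuron choices) x (dropout choices)
grid_size = len(neuron_choices) ** 2 * len(dropout_choices)

max_trials = 5
print(grid_size)  # 1280
print(f"Fraction of the grid covered: {max_trials / grid_size:.2%}")
```

With only 5 trials the tuner sees a tiny slice of the possible combinations, so the "best" hyperparameters from a short run should be treated as a starting point rather than an optimum.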

In [None]:
tuner = kt.BayesianOptimization(
    build_classifier,
    objective='val_loss',
    max_trials=5,
    overwrite=True)

Finally, we call `search()` to train the model and find the best hyperparameters.
You should see the progress and current results printed as each trial runs.

`epochs` and `batch_size` are also important parameters:
- `epochs`: Number of full passes through the training set.
- `batch_size`: Number of training samples processed before updating model weights.

They will affect performance, runtime, and memory usage.
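
As a quick worked example of how the two interact, the number of weight updates can be estimated as below. The sample count is hypothetical, and the sketch assumes the last partial batch of each epoch is still used.

```python
import math


def weight_updates(n_samples, batch_size, epochs):
    """Return (updates per epoch, total updates) for one training run."""
    # One update per batch; the final batch may be smaller than batch_size
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


n_samples = 10_000  # hypothetical training set size
steps, total = weight_updates(n_samples, batch_size=32, epochs=10)

print(steps)  # updates per epoch
print(total)  # updates over the whole run
```

A larger `batch_size` means fewer, smoother updates per epoch but more memory per step; more `epochs` means more updates overall and a longer runtime.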

In [None]:
tuner.search(X_train, y_train, epochs=10, validation_data=(X_val, y_val), batch_size=32)

We can get the best model and hyperparameters, and take a closer look:

In [None]:
best_model = tuner.get_best_models()[0]
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]

print(best_hps.get('units1'))
print(best_hps.get('units2'))
print(best_hps.get('dropout'))

best_model.summary()
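
As a sanity check on the `Param #` column that `summary()` prints, the parameter count of each dense layer can be worked out by hand: one weight per input-output pair plus one bias per neuron. The feature count and layer sizes below are hypothetical.

```python
def dense_params(n_inputs, units):
    """Weights (n_inputs * units) plus one bias per unit."""
    return n_inputs * units + units


def classifier_params(n_features, neurons1, neurons2, n_classes=3):
    """Total trainable parameters; Dropout layers add none."""
    return (
        dense_params(n_features, neurons1)
        + dense_params(neurons1, neurons2)
        + dense_params(neurons2, n_classes)
    )


# Hypothetical values: 20 input features, 64 and 32 hidden neurons
total = classifier_params(n_features=20, neurons1=64, neurons2=32)
print(total)
```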

We can [save the trained model](https://www.tensorflow.org/guide/keras/serialization_and_saving). Save it in the native `.keras` format rather than pickling it to a `.pkl` file.

In [None]:
best_model.save(MODELS_DIR / "best_model.keras")

## Speed Regression
The process for training a regression model appears to be fairly similar. 

The key differences are the single linear output in the final layer, and the loss and metric functions in the compilation step. We'll forgo the hyperparameter tuning and model saving this time.
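
For reference, the regression loss and metric used here (mean squared error and mean absolute error) are simple to compute by hand. The values below are made up for illustration.

```python
import numpy as np

# Made-up true and predicted speeds
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.0, 6.0])

errors = y_pred - y_true

# MSE squares each error, so large misses dominate the loss
mse = np.mean(errors ** 2)

# MAE is in the same units as the target, so it is easier to interpret
mae = np.mean(np.abs(errors))

print(mse)
print(mae)
```

This is why MSE is used as the training loss while MAE is reported as the metric: one penalises large errors heavily, the other reads directly as an average error in speed units.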

In [None]:
# Define the regression model
regression_model = keras.models.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(X_train.shape[1],)),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(32, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1, activation='linear')  # Single output for regression
])

# Compile the model
regression_model.compile(
    optimizer='adam',
    loss='mse',  # Mean Squared Error for regression
    metrics=['mae']  # Mean Absolute Error as an evaluation metric
)

# Summary of the model
regression_model.summary()

In [None]:
# Process the data for regression
X_train, X_val, y_train, y_val = sequential_stratified_split(
    df, 0.8, 800, ["SPEED"]
)

X_train, X_val, scaler = standard_scaler(X_train, X_val)

X_train = X_train.to_numpy()
X_val = X_val.to_numpy()

# Keep the speed targets as continuous floats. Label-encoding them would
# replace each speed with a class index and discard its actual magnitude,
# and would fail on validation speeds not seen in training.
y_train = y_train.collect().to_numpy().ravel().astype("float32")
y_val = y_val.collect().to_numpy().ravel().astype("float32")

In [None]:
# Train the regression model
history = regression_model.fit(
    X_train, y_train,  # X_train: features, y_train: continuous target values
    epochs=5,
    batch_size=32,
    validation_split=0.2  # holds out the last 20% of the training data for per-epoch validation
)

# Evaluate the model on the separate validation split
loss, mae = regression_model.evaluate(X_val, y_val)
print(f"Validation MAE: {mae:.4f}")