# Unedited Code for Models

This file contains preliminary code for the following models:

1) Decision tree
2) CNN with heatmap
3) CNN with DT (parallel fusion architecture requiring different inputs for each model)
4) Logistic regression

These models will be trained on the data, and their performance (precision, recall, F1 score) compared against our main model, the sequential CNN-DT model (one input, passed through the CNN and then the DT). The code here has not yet been evaluated and may need major modifications before it works.

All models are meant to predict two variables: process (values 'dressing' and 'grinding') and condition (values 'anomalous' and 'normal').
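Every model is compared on the same metrics for the same two targets, so a shared evaluation helper keeps the comparison consistent. The following is a minimal sketch. It assumes labels are stored as an (n_samples, 2) array with condition in column 0 and process in column 1, which is the layout the decision tree code below uses. `evaluate_two_targets` and `TARGETS` are hypothetical names, and the label arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

TARGETS = ("condition", "process")  # assumed column order: condition first, then process


def evaluate_two_targets(y_true, y_pred):
    """Macro precision/recall/F1 for each target column of an (n_samples, 2) label array."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    results = {}
    for col, name in enumerate(TARGETS):
        p, r, f, _ = precision_recall_fscore_support(
            y_true[:, col], y_pred[:, col], average="macro", zero_division=0
        )
        results[name] = {"precision": p, "recall": r, "f1": f}
    return results


# Illustrative labels only (not project data)
y_true = [["anomalous", "grinding"], ["normal", "dressing"],
          ["normal", "grinding"], ["anomalous", "dressing"]]
y_pred = [["anomalous", "grinding"], ["normal", "dressing"],
          ["anomalous", "grinding"], ["anomalous", "dressing"]]
print(evaluate_two_targets(y_true, y_pred))
```

Macro averaging weights both classes of each target equally, which may matter if anomalous samples are rarer than normal ones.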

### Decision Tree

**Input:** Four features extracted from recordings: variance of acoustic emissions, energy of acoustic emissions, variance of current, and energy of current.
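The four features can be computed per recording roughly as follows. This is only a sketch. Variance is the population variance. Energy is *assumed* to be the mean squared amplitude, and the project's own definition may differ (for example, a plain sum of squares). The function names are hypothetical, and the signals are toy values.

```python
import numpy as np


def signal_variance(x):
    """Population variance of a 1D signal."""
    return float(np.var(x))


def signal_energy(x):
    """ASSUMPTION: energy = mean squared amplitude; use np.sum instead for total energy."""
    x = np.asarray(x, dtype=float)
    return float(np.mean(x ** 2))


def extract_features(ae_signal, current_signal):
    """Return the 4-tuple (AE variance, AE energy, current variance, current energy)."""
    return (signal_variance(ae_signal), signal_energy(ae_signal),
            signal_variance(current_signal), signal_energy(current_signal))


ae = np.array([1.0, -1.0, 1.0, -1.0])      # toy acoustic-emission signal
current = np.array([2.0, 4.0, 2.0, 4.0])   # toy current signal
print(extract_features(ae, current))  # (1.0, 1.0, 1.0, 10.0)
```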

In [None]:
'''
To adapt the decision tree so it takes a list of four-element feature tuples and predicts two targets,
condition ('anomalous'/'normal') and process ('dressing'/'grinding'), the input handling and model
training are set up for multi-output classification:

1. Input data: the features X form an array of shape (n_samples, 4), and the labels y form an array
   of shape (n_samples, 2) whose columns hold the condition and process labels.
2. Model training: a single DecisionTreeClassifier can handle multi-output classification directly;
   alternatively, a separate model can be trained for each target.
3. Prediction and evaluation: predictions have two columns, and each column is evaluated separately.

The modified code follows:
'''

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
import numpy as np

def TimeSeriesDT(data, args):
    """Train a multi-output decision tree and report metrics for each target.

    data[0]: list of 4-tuples of features; data[1]: list of (condition, process) label pairs.
    args.split: fraction of samples used for training (passed to train_test_split as train_size).
    """
    X = np.asarray(data[0], dtype=float)  # (n_samples, 4)
    y = np.asarray(data[1])               # (n_samples, 2): column 0 = condition, column 1 = process

    # Split data
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=args.split, random_state=42)

    # Train one tree that predicts both columns (Gini impurity by default)
    dt = DecisionTreeClassifier(random_state=42)
    dt.fit(X_train, y_train)
    y_pred = dt.predict(X_test)

    # Evaluate each target separately
    metrics_condition = compute_metrics(y_pred[:, 0], y_test[:, 0])
    metrics_process = compute_metrics(y_pred[:, 1], y_test[:, 1])
    print("Metrics for Condition:", metrics_condition)
    print("Metrics for Process:", metrics_process)
    return dt, metrics_condition, metrics_process

def acc_and_f1(preds, labels):
    # Macro averages weight both classes equally, regardless of class imbalance
    return {
        "accuracy": accuracy_score(y_true=labels, y_pred=preds),
        "f1": f1_score(y_true=labels, y_pred=preds, average='macro', zero_division=0),
        "precision": precision_score(y_true=labels, y_pred=preds, average='macro', zero_division=0),
        "recall": recall_score(y_true=labels, y_pred=preds, average='macro', zero_division=0),
    }

def compute_metrics(preds, labels):
    return acc_and_f1(preds, labels)

'''
This code assumes data[0] is a list of tuples representing the features and
data[1] is a list of tuples where each tuple contains two elements: the condition
label and the process label for each sample. The DecisionTreeClassifier is used
for multi-output classification by fitting and predicting on y, which is structured
as an array of shape (n_samples, 2), where the first column contains the condition
labels and the second column contains the process labels. The metrics are computed
separately for each output to evaluate the model's performance on predicting both
condition and process.
'''
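The docstring above mentions two ways to handle the two targets. The minimal sketch below contrasts them on made-up toy data. The first option is a single DecisionTreeClassifier fitted on an (n_samples, 2) label array, as TimeSeriesDT does. The second is scikit-learn's `MultiOutputClassifier`, which fits one independent tree per target. The feature values and labels are illustrative only.

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data in the layout TimeSeriesDT expects: feature 4-tuples and (condition, process) pairs.
# Synthetic rule: condition follows feature 0, process follows feature 2.
features = [(0.1, 1.0, 5.0, 1.0), (0.2, 1.0, 50.0, 1.0), (0.9, 1.0, 5.0, 1.0),
            (0.8, 1.0, 50.0, 1.0), (0.15, 2.0, 6.0, 2.0), (0.85, 2.0, 60.0, 2.0)]
labels = [("normal", "dressing"), ("normal", "grinding"), ("anomalous", "dressing"),
          ("anomalous", "grinding"), ("normal", "dressing"), ("anomalous", "grinding")]

X = np.asarray(features, dtype=float)
y = np.asarray(labels)  # shape (n_samples, 2), string labels

# Option 1: one tree predicting both columns at once (native multi-output support)
joint_tree = DecisionTreeClassifier(random_state=42).fit(X, y)

# Option 2: one independent tree per target
per_target = MultiOutputClassifier(DecisionTreeClassifier(random_state=42)).fit(X, y)

print(joint_tree.predict(X[:2]))
print(per_target.predict(X[:2]))
```

The joint tree shares its splits between the two targets, which may help if condition and process are correlated. Independent trees can instead be tuned separately for each target.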

### CNN with Heatmap

The first CNN uses a Grad-CAM heatmap; the second uses a simpler activation-based alternative. If time permits, Grad-CAM is the preferred variant.

**Input:** Wavelet transformed data not subjected to manual feature extraction.
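Both CNN variants below expect input of shape (batch, 5, sequence_length) and derive the second pooling kernel from sequence_length. The sketch below traces the time axis with the standard output-length formula for unpadded Conv1d/MaxPool1d, floor((L - k) / s) + 1, where max pooling uses a stride equal to its kernel size. It shows that the final pooling collapses the time axis to a single step, which is why flattening yields exactly 64 features per sample. It also means that activations taken after that pooling layer carry no time resolution for a heatmap. The helper names are hypothetical, and 1024 is an arbitrary example length.

```python
def conv_out(length, kernel, stride=1):
    """Output length of an unpadded Conv1d/MaxPool1d with dilation 1."""
    return (length - kernel) // stride + 1


def cnn_time_lengths(sequence_length):
    """Trace the time axis through conv1 -> pool(8) -> conv2 -> pool(k2) of TimeSeriesCNN."""
    after_conv1 = conv_out(sequence_length, 32)
    after_pool1 = conv_out(after_conv1, 8, stride=8)   # MaxPool1d stride defaults to kernel_size
    after_conv2 = conv_out(after_pool1, 4)
    k2 = int((sequence_length - 32) / 8 - 4 + 1)       # kernel formula used in TimeSeriesCNN
    after_pool2 = conv_out(after_conv2, k2, stride=k2)
    return after_conv1, after_pool1, after_conv2, after_pool2


print(cnn_time_lengths(1024))  # (993, 124, 121, 1)
```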

In [None]:
'''
To add a heatmap feature for visualizing the parameters that have the strongest bearing on the
CNN's classification decisions, you can use the Gradient-weighted Class Activation Mapping (Grad-CAM) technique.
Grad-CAM uses the gradients of any target concept (say, the output of the model for a particular class) flowing
into the final convolutional layer to produce a coarse localization map highlighting the important regions in
the image for predicting the concept.

Since this model works on time series rather than images, the heatmap highlights important time steps
instead of spatial regions. The implementation steps are:

1. Modify the model: add hooks to capture the activations of the last convolutional layer and their gradients.
2. Grad-CAM algorithm: weight the activations by their gradients to generate the heatmap.
3. Visualization: plot the heatmap along with the original time series data.

Here's how you can modify the TimeSeriesCNN class and add a Grad-CAM visualization:
'''

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeSeriesCNN(nn.Module):
    def __init__(self, args, device, sequence_length):
        super().__init__()
        self.args = args
        self.device = device
        self.sequence_length = sequence_length
        self.gradients = None        # gradient of the last conv feature map (set during backward)
        self.output_feature = None   # last conv feature map (set during forward)
        self._build_model()

    def _build_model(self):
        self.conv1 = nn.Sequential(
            nn.Conv1d(in_channels=5, out_channels=256, kernel_size=32),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=8)
        )

        # The final pooling is kept separate from conv2: it reduces the time axis to a single
        # step, so Grad-CAM must use the feature map *before* this pooling.
        self.conv2 = nn.Sequential(
            nn.Conv1d(in_channels=256, out_channels=64, kernel_size=4),
            nn.BatchNorm1d(64),
            nn.ReLU(),
        )
        self.pool2 = nn.MaxPool1d(kernel_size=int((self.sequence_length - 32)/8 - 4 + 1))

        # NOTE: this 2-way head predicts one target (condition or process); predicting
        # both targets needs two heads or a 4-output layer.
        self.mlp = nn.Sequential(
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 2)
        )

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        self.output_feature = x
        if x.requires_grad:
            # Tensor hook: stores the gradient of the output w.r.t. this feature map
            x.register_hook(self.save_gradients)
        x = self.pool2(x)
        x = torch.flatten(x, 1)  # (batch, 64), since the pooled time axis has length 1
        x = self.mlp(x)
        return F.log_softmax(x, dim=1)

    def save_gradients(self, grad):
        self.gradients = grad

    def get_activations_gradient(self):
        return self.gradients

    def get_activations(self):
        return self.output_feature

def generate_heatmap(activations, gradients):
    # Grad-CAM: weight each channel by its time-averaged gradient, average over channels, apply ReLU
    weights = torch.mean(gradients, dim=2, keepdim=True)     # (batch, channels, 1)
    weighted_activations = weights * activations             # (batch, channels, time)
    heatmap = torch.mean(weighted_activations, dim=1)[0]     # first sample -> (time,)
    heatmap = np.maximum(heatmap.cpu().detach().numpy(), 0)
    if heatmap.max() > 0:  # avoid division by zero when no time step has positive importance
        heatmap /= heatmap.max()
    return heatmap

def visualize_heatmap(original_time_series, heatmap):
    # Assuming original_time_series is a 2D array of shape (time_steps, features);
    # for simplicity, the heatmap is plotted against the first feature only
    signal = original_time_series[:, 0]
    fig, ax = plt.subplots(figsize=(10, 4))
    # Stretch the (shorter) heatmap across the full time axis of the signal
    im = ax.imshow(heatmap[np.newaxis, :], cmap='jet', aspect='auto', alpha=0.5,
                   extent=(0, len(signal), signal.min(), signal.max()))
    ax.plot(signal, color='black')
    fig.colorbar(im, ax=ax)
    plt.show()

'''
Note: this implementation assumes a single output feature map from the last convolutional layer (self.conv2).
The generate_heatmap function computes the mean of the gradient-weighted activations across channels to produce
a 1D heatmap for time series data, and visualize_heatmap gives a simple visualization of it.
You may need to adjust the visualization to the specific structure of your time series data.

To use Grad-CAM:

1. Perform a forward pass with the input data.
2. Perform a backward pass from the output score of the target class.
3. Capture the output feature maps and their gradients.
4. Compute the weighted activations to generate the heatmap.
5. Visualize the heatmap along with the original time series data.
'''
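The sketch below runs the Grad-CAM steps end to end on a deliberately small, hypothetical model (`TinyCNN`, random input) so the mechanics are visible. Instead of module hooks, it returns the convolutional feature map from `forward` and calls `retain_grad()` on it, a standard PyTorch way to read the gradient of a non-leaf tensor. The channel weights are the gradients averaged over time, as in Grad-CAM. To apply the same idea to TimeSeriesCNN, the feature map has to be taken before the final pooling layer, since that layer reduces the time axis to one step.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyCNN(nn.Module):
    """Hypothetical stand-in: conv features with a time axis, global pooling, linear head."""

    def __init__(self, in_channels=5, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(nn.Conv1d(in_channels, 8, kernel_size=5), nn.ReLU())
        self.head = nn.Linear(8, n_classes)

    def forward(self, x):
        fmap = self.features(x)  # (batch, channels, time)
        logits = self.head(F.adaptive_max_pool1d(fmap, 1).flatten(1))
        return logits, fmap


def grad_cam_1d(model, x, target_class):
    """Return a (time,) Grad-CAM map scaled to [0, 1] for one input of shape (1, C, L)."""
    model.eval()
    logits, fmap = model(x)                            # step 1: forward pass
    fmap.retain_grad()                                 # keep the gradient of a non-leaf tensor
    model.zero_grad()
    logits[0, target_class].backward()                 # step 2: backward from the class score
    weights = fmap.grad.mean(dim=2, keepdim=True)      # step 3: gradients averaged over time
    cam = F.relu((weights * fmap).sum(dim=1))[0]       # step 4: weighted sum over channels, ReLU
    cam = cam.detach()
    return cam / cam.max() if cam.max() > 0 else cam


torch.manual_seed(0)
model = TinyCNN()
x = torch.randn(1, 5, 64)
cam = grad_cam_1d(model, x, target_class=1)
print(cam.shape)  # torch.Size([60])
```

Summing or averaging over channels only changes the scale, which the final normalisation removes.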


In [None]:
'''
For a simpler way to visualize the influence of different time steps on the model's predictions without
implementing Grad-CAM, you can use the activations of the last convolutional layer directly. This method
is less precise than Grad-CAM but can still show which parts of the input time series the model focuses on.

The simplified approach:

1. Extract activations: modify the model to return the activations of the last convolutional layer in addition
   to the final output.
2. Average activations: average these activations across all channels to get a single 1D array
   representing the importance of each time step.
3. Normalize: scale this array to values between 0 and 1.
4. Visualize: plot the normalized values as a heatmap alongside the original time series data.

Here's how you can modify the TimeSeriesCNN class and add a simple heatmap visualization:
'''

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeSeriesCNN(nn.Module):
    def __init__(self, args, device, sequence_length):
        super().__init__()
        self.args = args
        self.device = device
        self.sequence_length = sequence_length
        self._build_model()

    def _build_model(self):
        self.conv1 = nn.Sequential(
            nn.Conv1d(in_channels=5, out_channels=256, kernel_size=32),
            nn.BatchNorm1d(256),
            nn.ReLU(),
            nn.MaxPool1d(kernel_size=8)
        )

        self.conv2 = nn.Sequential(
            nn.Conv1d(in_channels=256, out_channels=64, kernel_size=4),
            nn.BatchNorm1d(64),
            nn.ReLU(),
        )
        # Kept separate from conv2 because it reduces the time axis to a single step
        self.pool2 = nn.MaxPool1d(kernel_size=int((self.sequence_length - 32)/8 - 4 + 1))

        self.mlp = nn.Sequential(
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, 2)
        )

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        activations = x  # taken before pool2, so the time axis is preserved for the heatmap
        x = self.pool2(x)
        x = torch.flatten(x, 1)  # (batch, 64)
        x = self.mlp(x)
        return F.log_softmax(x, dim=1), activations

def visualize_simple_heatmap(original_time_series, activations):
    # activations: tensor of shape (batch_size, num_channels, length); the first sample is used
    heatmap = torch.mean(activations, dim=1)[0].cpu().detach().numpy()  # average across channels
    span = heatmap.max() - heatmap.min()
    # Normalize to [0, 1], guarding against a constant heatmap
    heatmap = (heatmap - heatmap.min()) / span if span > 0 else np.zeros_like(heatmap)

    # original_time_series: array of shape (time_steps, features); only the first feature is plotted
    signal = original_time_series[:, 0]
    fig, ax = plt.subplots(figsize=(10, 4))
    im = ax.imshow(heatmap[np.newaxis, :], cmap='hot', aspect='auto', alpha=0.5,
                   extent=(0, len(signal), signal.min(), signal.max()))
    ax.plot(signal, color='blue', label='Original Time Series')
    fig.colorbar(im, ax=ax, label='Activation Intensity')
    ax.legend()
    plt.show()

# Example usage:
# model = TimeSeriesCNN(args, device, sequence_length)
# output, activations = model(input_data)
# visualize_simple_heatmap(original_time_series, activations)

'''
This approach provides a straightforward way to visualize which parts of the input time series are activating the neurons
in the last convolutional layer the most, giving you a rough idea of what the model is paying attention to. Remember, this
method averages across all channels and thus might lose some specificity in what different channels might be focusing on.
'''
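The last convolutional layer works at a coarser time resolution than the input, because each position summarises a window of input samples. As a result, the averaged activations have fewer steps than the original series. One option, sketched here, is to stretch the normalised map to the input length with linear interpolation (`torch.nn.functional.interpolate`) before overlaying it. This ignores exact receptive-field alignment and is only an approximation. `heatmap_to_input_length` is a hypothetical helper, and the activation values are made up.

```python
import torch
import torch.nn.functional as F


def heatmap_to_input_length(activations, input_length):
    """Average channels, stretch a (1, C, T) map to input_length steps, min-max normalise."""
    heat = activations.mean(dim=1, keepdim=True)  # (1, 1, T)
    heat = F.interpolate(heat, size=input_length, mode="linear", align_corners=False)
    heat = heat[0, 0]                             # (input_length,)
    span = heat.max() - heat.min()
    return (heat - heat.min()) / span if span > 0 else torch.zeros_like(heat)


acts = torch.tensor([[[0.0, 2.0, 4.0], [0.0, 2.0, 4.0]]])  # (1 sample, 2 channels, 3 steps)
heat = heatmap_to_input_length(acts, input_length=12)
print(heat.shape)  # torch.Size([12])
```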

### Parallel CNN-DT Model

**Input for CNN:** Wavelet transformed data not subjected to manual feature extraction.

**Input for DT:** Four features extracted from recordings: variance of acoustic emissions, energy of acoustic emissions, variance of current, and energy of current.

In [None]:
# Step 1: Train the CNN
# - Train the CNN on the wavelet-transformed data (training loop, loss function, optimizer).

# Step 2: Train the DT
# - Train the Decision Tree on the four extracted features (variance and energy of
#   acoustic emissions and of current) with the corresponding labels.

# Step 3: Combine predictions
# - For each sample, pass the wavelet data through the CNN and the four features through the DT,
#   obtaining class probabilities from each model.
# - Combine these probabilities with a weighted average to make the final decision.
# - The weights can be determined based on validation performance or set empirically.

# Implementation in Python
import numpy as np
import torch
import torch.nn as nn
from sklearn.tree import DecisionTreeClassifier

class MixedModel:
    """Parallel fusion of a CNN (wavelet input) and a DT (four-feature input) for one target."""

    def __init__(self, cnn_model, dt_model, cnn_weight=0.5, device='cpu'):
        self.cnn_model = cnn_model.to(device)
        self.dt_model = dt_model
        self.cnn_weight = cnn_weight  # the DT gets weight 1 - cnn_weight
        self.device = device

    def _cnn_log_probs(self, x):
        out = self.cnn_model(x.to(self.device))
        # The simple-heatmap TimeSeriesCNN returns (log_probs, activations)
        return out[0] if isinstance(out, tuple) else out

    def train_cnn(self, train_loader, epochs=10, lr=1e-3):
        # train_loader yields (wavelet_batch, label_batch); labels are int64 class indices (0 or 1)
        optimizer = torch.optim.Adam(self.cnn_model.parameters(), lr=lr)
        loss_fn = nn.NLLLoss()  # the CNN already outputs log-probabilities
        self.cnn_model.train()
        for _ in range(epochs):
            for x, y in train_loader:
                optimizer.zero_grad()
                loss = loss_fn(self._cnn_log_probs(x), y.to(self.device))
                loss.backward()
                optimizer.step()

    def train_dt(self, features, labels):
        # features: array of shape (n_samples, 4); labels use the same 0/1 encoding as the CNN
        self.dt_model.fit(features, labels)

    def predict(self, wavelet_data, features):
        # wavelet_data (tensor) and features (array) must describe the same samples in the same order
        self.cnn_model.eval()
        with torch.no_grad():
            cnn_probs = self._cnn_log_probs(wavelet_data).exp().cpu().numpy()
        dt_probs = self.dt_model.predict_proba(features)  # columns follow dt_model.classes_, i.e. [0, 1]
        combined = self.cnn_weight * cnn_probs + (1 - self.cnn_weight) * dt_probs
        return np.argmax(combined, axis=1)

# Usage (args, device, sequence_length, the data loaders and the feature arrays must be defined first):
# cnn_model = TimeSeriesCNN(args, device, sequence_length)
# dt_model = DecisionTreeClassifier(random_state=42)
# mixed_model = MixedModel(cnn_model, dt_model, device=device)
# mixed_model.train_cnn(train_loader)
# mixed_model.train_dt(train_features, train_labels)
# predictions = mixed_model.predict(test_wavelets, test_features)
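The parallel model's steps note that the fusion weights can be determined from validation performance. A minimal sketch of that choice is a grid search over the CNN weight w, scoring w * p_cnn + (1 - w) * p_dt by macro F1 on a held-out set. It assumes both models output class probabilities with columns in the same class order (for the DT, `predict_proba` follows `dt_model.classes_`). `choose_cnn_weight` is a hypothetical helper, and the toy probabilities are made up.

```python
import numpy as np
from sklearn.metrics import f1_score


def choose_cnn_weight(cnn_val_probs, dt_val_probs, y_val, grid=None):
    """Pick the CNN weight w maximising validation macro F1 of w*p_cnn + (1-w)*p_dt."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 11)
    best_w, best_f1 = None, -1.0
    for w in grid:
        combined = w * cnn_val_probs + (1.0 - w) * dt_val_probs
        f1 = f1_score(y_val, combined.argmax(axis=1), average="macro")
        if f1 > best_f1:  # strict '>' keeps the smallest weight among ties
            best_w, best_f1 = float(w), f1
    return best_w, best_f1


# Toy validation set: the CNN is right but unsure, the DT is confidently wrong
y_val = np.array([0, 1, 0, 1])
cnn_probs = np.array([[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.4, 0.6]])
dt_probs = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
w, f1 = choose_cnn_weight(cnn_probs, dt_probs, y_val)
print(w, f1)
```

With very small validation sets the chosen weight can be noisy, so a coarse grid (or cross-validation) may be preferable to fine tuning.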

### Logistic Regression
No edits needed here, but comments are welcome.

**Input:** Four features extracted from recordings: variance of acoustic emissions, energy of acoustic emissions, variance of current, and energy of current.

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import accuracy_score, classification_report
import numpy as np

# Sample dataset: (ae_variance, ae_energy, current_variance, current_energy, condition, process)
dataset = [
    (0.0028296275479566097, 0.014225336147977692, 106988.68985835543, 111653.46909800009, 'anomalous', 'grinding'),
    # Add the rest of your dataset here (train_test_split needs more than one sample)
]

# Extract features and labels
features = np.array([datapoint[:4] for datapoint in dataset], dtype=float)
labels_anomalous = np.array([datapoint[4] for datapoint in dataset])
labels_process = np.array([datapoint[5] for datapoint in dataset])

# Encode categorical labels as integers
encoder_anomalous = LabelEncoder()
encoder_process = LabelEncoder()
labels_anomalous_encoded = encoder_anomalous.fit_transform(labels_anomalous)
labels_process_encoded = encoder_process.fit_transform(labels_process)

# One split shared by both targets, so both models see the same training and validation samples
X_train, X_val, y_train_anomalous, y_val_anomalous, y_train_process, y_val_process = train_test_split(
    features, labels_anomalous_encoded, labels_process_encoded, test_size=0.2, random_state=42)

# Train logistic regression models; features are standardized because their scales differ by
# several orders of magnitude (about 1e-3 to 1e-2 for acoustic emissions vs. 1e5 for current)
model_anomalous = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model_anomalous.fit(X_train, y_train_anomalous)

model_process = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model_process.fit(X_train, y_train_process)

# Evaluate the models: accuracy plus precision, recall and F1, to match the other models
y_pred_anomalous = model_anomalous.predict(X_val)
y_pred_process = model_process.predict(X_val)

print("Accuracy for 'anomalous'/'normal':", accuracy_score(y_val_anomalous, y_pred_anomalous))
print(classification_report(y_val_anomalous, y_pred_anomalous, labels=np.arange(len(encoder_anomalous.classes_)),
                            target_names=encoder_anomalous.classes_, zero_division=0))
print("Accuracy for 'dressing'/'grinding':", accuracy_score(y_val_process, y_pred_process))
print(classification_report(y_val_process, y_pred_process, labels=np.arange(len(encoder_process.classes_)),
                            target_names=encoder_process.classes_, zero_division=0))
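The four features span very different magnitudes. In the sample row, the acoustic-emission values are around 1e-3 to 1e-2, while the current values are around 1e5. Standardising the features before logistic regression keeps the default L2 penalty from treating them unevenly. It also makes the fitted coefficients comparable, which gives a rough per-feature importance ranking. The sketch below uses synthetic data, not project data, and `FEATURE_NAMES` is a hypothetical ordering that matches the dataset tuples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURE_NAMES = ["ae_variance", "ae_energy", "current_variance", "current_energy"]

# Synthetic stand-in data (NOT project data): only ae_variance carries the label
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) * np.array([1e-3, 1e-2, 1e5, 1e5])  # mimic very different scales
y = (X[:, 0] > 0).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
coefs = model.named_steps["logisticregression"].coef_[0]

# Coefficients on standardized features, largest magnitude first
for name, c in sorted(zip(FEATURE_NAMES, coefs), key=lambda t: -abs(t[1])):
    print(f"{name:>18}: {c:+.3f}")
```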