# V2 - Clean Code Development of Deep Model - with Adversarial Training
This is a *supremely* tidied-up version of my earlier code files for developing the Deep Learning model.

This came as a result of learning to use:
1. **Torch**: No prior experience, so a big learning curve, but it is a fascinatingly nuanced library. Over time, I recognized re-used code and modularized it into functions.
2. **wandb**: Intuitive, but a learning curve too; the wandb project for these models is very messy!
3. **DL Concepts**: Earlier methods used only stratified validation and **NO** user embedding/adversarial training. After understanding these concepts and learning how they should be implemented, I copied much of the previous code into this file and developed it further.


This notebook is as used within Google Colab, to take advantage of CUDA (torch.cuda) and the T4 GPU runtime.
For simplicity, I've distilled everything into 1 code block, with the contents as follows:

1. **Imports/Initialisations** (wandb, random_state seeds)
2. **Functions** for:
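    1. Preparing the data as torch.Tensors (user_ids, X, y)
    2. Methods for the model: AdversarialGradReverse (identity in the forward pass; in the backward pass it returns $-\alpha \times \frac{\partial L}{\partial x}$, the negated and scaled gradient of the user-prediction loss $L$, to *confuse*/lower the accuracy of user-prediction - Adversarial Training!), Normalization automations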
    3. WANDB automations
3. **The Model**: NewModel(nn.Module) - built from extract_features and a user_discriminator head in .forward() for domain-adaptable predictive capacity
4. **Training Functions** (train-test split, confusion matrix, predictions on test data)
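
To make the gradient-reversal idea from item 2.2 concrete, here is a minimal standalone sketch. It mirrors the AdversarialGradReverse class defined in the code cell, but the class name `GradReverse`, the toy tensor and the value `alpha = 0.5` are illustrative assumptions only. It shows that the forward pass leaves values untouched while the backward pass flips and scales the gradient.

```python
import torch


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -alpha in backward."""

    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One gradient per forward input: reversed gradient for x, None for alpha.
        return -ctx.alpha * grad_output, None


# Toy input (hypothetical values, not taken from the dataset).
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = GradReverse.apply(x, 0.5)
y.sum().backward()

print(y.detach().tolist())  # [1.0, 2.0, 3.0] -> forward is the identity
print(x.grad.tolist())      # [-0.5, -0.5, -0.5] -> gradient of sum() is 1, reversed and scaled
```

In the model this means the user discriminator still learns to predict the user, while the feature extractor upstream receives the opposite gradient and is pushed towards features that carry less user information.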

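The strength of the adversarial pressure is ramped up over training rather than applied in full from the first epoch. The sketch below reuses the formula of alpha_schedule from the code cell under the hypothetical name `ramp`; the choice of `max_epochs = 30` is an assumption, picked because it reproduces the Alpha values printed in the training log (0.0667 at epoch 1, 0.1333 at epoch 2).

```python
def ramp(epoch, max_epochs):
    # Linear ramp from 0 to 1 over the first half of training, then held at 1.
    return min(1.0, epoch / (max_epochs * 0.5))


# max_epochs = 30 is an assumed value for illustration.
values = [round(ramp(e, 30), 4) for e in (0, 1, 2, 15, 29)]
print(values)  # [0.0, 0.0667, 0.1333, 1.0, 1.0]
```

Starting at 0 lets the activity classifier find useful features first; the reversal only reaches full strength once half of the epochs have passed.
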
In [None]:
from google.colab import drive
drive.mount('/content/drive')

import json
import pandas as pd

path = '/path/to/my/data/dir'
predictions_path = '/path/to/my/predictions/dir'

df_train = pd.read_csv(path + '/train_signal_clean.csv', converters={'x': json.loads, 'y': json.loads, 'z': json.loads})
df_test = pd.read_csv(path + '/test_signal_clean.csv', converters={'x': json.loads, 'y': json.loads, 'z': json.loads})


import numpy as np
import seaborn as sns
import time
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import confusion_matrix

import matplotlib.pyplot as plt
plt.style.use('ggplot')

import torch
from torch import Tensor
from torch.nn import Module, Linear, Conv1d, MaxPool1d, ReLU, LSTM, CrossEntropyLoss, functional as F, Sequential, Embedding
from torch.optim import Adam, RMSprop
from torch.utils.data import DataLoader, TensorDataset, Subset, WeightedRandomSampler
from sklearn.metrics import f1_score, precision_score, recall_score

try:
  import wandb
except ImportError:
  # wandb is not guaranteed to be present on a fresh Colab runtime
  !pip install wandb
  import wandb

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

WANDB_ENTITY = 'my_wandb_entity' # Sensitive - replaced with generic values
WANDB_PROJECT = 'my_wandb_project' # Sensitive - replaced with generic values

MY_SEED = 6

torch.manual_seed(MY_SEED)
def get_X_signal(df_signal):
    # Stack the per-row x/y/z lists into one array of shape (n_samples, n_timesteps, 3).
    X = np.stack([np.array(df_signal[c].tolist()) for c in ['x', 'y', 'z']], axis=-1)
    return torch.tensor(X, dtype = torch.float32)

def extract_user_ids(df_signal):
    user_strings = df_signal['user'].values
    user_ids = np.array([int(u.split('_')[1]) for u in user_strings])
    return torch.tensor(user_ids, dtype=torch.long)

def plot_signal(row, title = None, test = False, seperate_ax = False):
    # Default title: the sample id, plus the activity label when one exists (training data).
    if title is None:
        title = row['id'] if test else row['id'] + ': ' + row['activity']

    channels = ['x', 'y', 'z']
    if seperate_ax:
        # One subplot per signal axis, each with its own legend.
        fig, axs = plt.subplots(nrows = 3, ncols = 1, figsize = (13, 10), layout = 'constrained')
        for ax, c in zip(axs, channels):
            ax.plot(row[c], label = c)
            ax.legend(bbox_to_anchor = [1, 1])
    else:
        # All three signal axes overlaid on a single plot.
        fig, ax = plt.subplots(figsize = (13, 3), layout = 'constrained')
        for c in channels:
            ax.plot(row[c], label = c)
        ax.legend(bbox_to_anchor = [1, 1])

    fig.suptitle(title)
    plt.show()

def get_mean_std(X: Tensor, dims = (0, 1), device = device) -> tuple:
    return X.mean(dim = dims).to(device), X.std(dim = dims).to(device)

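def normalize_X(X: Tensor, mean_std: tuple = None, dims = (0, 1)) -> Tensor:
    # No statistics supplied: compute them from X itself.
    if mean_std is None:
        mean_std = get_mean_std(X, dims = dims)
    mean, std = mean_std
    # (None, None) means normalization is disabled, so X is returned unchanged.
    if mean is None and std is None:
        return X

    return (X - mean) / std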

def extract_model_config(model: Module) -> dict:
    config = {}
    for name, module in model.named_modules():
        if name == "":
            continue

        layer_type = type(module).__name__
        layer_config = {"type": layer_type}

        if isinstance(module, Conv1d):
            layer_config.update({
                "in_channels": module.in_channels,
                "out_channels": module.out_channels,
                "kernel_size": module.kernel_size,
                "padding": module.padding,
            })
        elif isinstance(module, Linear):
            layer_config.update({
                "in_features": module.in_features,
                "out_features": module.out_features,
            })
        elif isinstance(module, LSTM):
            layer_config.update({
                "input_size": module.input_size,
                "hidden_size": module.hidden_size,
                "num_layers": module.num_layers,
                "bidirectional": module.bidirectional,
            })
        elif isinstance(module, MaxPool1d):
            layer_config.update({
                "kernel_size": module.kernel_size,
                "stride": module.stride,
            })
        elif isinstance(module, Embedding):
            layer_config.update({
                "num_embeddings": module.num_embeddings,
                "embedding_dim": module.embedding_dim,
            })

        config[name] = layer_config

    return config

def wandb_config_entry(model: Module, name: str, train_params: dict, entity: str, project: str) -> wandb.init:
    config = {}
    config['architecture'] = extract_model_config(model)
    config['training'] = train_params
    return wandb.init(name = name, project = project, entity = entity, config = config)

def train_test_datasets(X, y, user_ids, train_proportion, batch_size, normalize, device = device):
    # Stratified split so train and validation keep the same activity proportions.
    sss = StratifiedShuffleSplit(n_splits=1, train_size=train_proportion, random_state=MY_SEED)
    indices = np.arange(len(y))
    train_idx, val_idx = next(sss.split(indices, y.numpy()))

    if normalize:
        # Statistics come from the training split only, so no validation data leaks in.
        mean, scale = get_mean_std(X[train_idx])
        train_X = normalize_X(X[train_idx], (mean, scale))
        val_X = normalize_X(X[val_idx], (mean, scale))
    else:
        mean, scale = None, None
        train_X, val_X = X[train_idx], X[val_idx]

    train_dataset = TensorDataset(train_X, y[train_idx], user_ids[train_idx])
    val_dataset = TensorDataset(val_X, y[val_idx], user_ids[val_idx])

    # Oversample rare activities. NOTE: relies on the global class_weights tensor defined further down.
    y_train_split = y[train_idx]
    sampler_weights = class_weights[y_train_split.numpy()]
    sampler = WeightedRandomSampler(sampler_weights, num_samples=len(sampler_weights))

    train_loader = DataLoader(train_dataset, batch_size=batch_size, sampler=sampler)
    val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
    return train_loader, train_dataset, val_dataset, val_loader, mean, scale

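class AdversarialGradReverse(torch.autograd.Function):
    # Gradient reversal: identity in the forward pass, gradient scaled by -alpha in the backward pass.
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        # Return a view rather than the input object itself (a common precaution for custom Functions).
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # One return value per forward() input: reversed gradient for x, None for alpha.
        return -ctx.alpha * grad_output, None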

def grad_reverse(x, alpha=1.0):
    return AdversarialGradReverse.apply(x, alpha)

class NewModel(Module):
    def __init__(self, num_classes, num_users, user_embedding_dim=16, num_filters=40, num_layers=2, bidirectional=True):
        super().__init__()
        # NOTE: user_embedding_dim is accepted but not used by any layer below.

        self.relu = ReLU()
        self.conv1 = Conv1d(in_channels=3, out_channels=num_filters, kernel_size=5, padding='same')
        self.conv2 = Conv1d(in_channels=num_filters, out_channels=num_filters, kernel_size=7, padding='same')
        self.conv3 = Conv1d(in_channels=num_filters, out_channels=num_filters, kernel_size=9, padding='valid', stride=2)
        # conv4 is defined but currently disabled in extract_features.
        self.conv4 = Conv1d(in_channels=num_filters, out_channels=num_filters, kernel_size=11, padding='valid', stride=3)

        # LSTM dropout only acts between stacked layers, so it is switched off for a
        # single layer (same behaviour, without the PyTorch warning).
        self.lstm1 = LSTM(num_filters, hidden_size=num_filters, batch_first=True,
                          dropout=0.5 if num_layers > 1 else 0.0,
                          num_layers=num_layers, bidirectional=bidirectional)
        # Mean-pool and max-pool are concatenated, hence the factor of 2.
        feature_dim = num_filters * 2 * (2 if bidirectional else 1)
        self.activity_classifier = Sequential(
            Linear(feature_dim, num_classes)
        )
        # User ids are used directly as class indices, so ids up to num_users need num_users + 1 logits.
        self.user_discriminator = Sequential(
            Linear(feature_dim, 64),
            ReLU(),
            Linear(64, num_users + 1)
        )
        
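    def extract_features(self, x):
        # (batch, time, 3) -> (batch, 3, time): Conv1d expects channels first.
        x = x.permute(0, 2, 1)
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))  # stride 2 shortens the time axis
        # x = self.relu(self.conv4(x))

        # (batch, filters, time') -> (batch, time', filters) for the batch_first LSTM.
        x = x.permute(0, 2, 1)
        x, _ = self.lstm1(x)
        # (batch, time', hidden * directions) -> (batch, hidden * directions, time').
        x = x.permute(0, 2, 1)
        # Pool over the time axis, then concatenate both summaries.
        avg_pool = torch.mean(x, dim=2)
        max_pool, _ = torch.max(x, dim=2)
        features = torch.cat((avg_pool, max_pool), dim=1)
        return features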
    
    def forward(self, x, user_ids=None, alpha=1.0):
        features = self.extract_features(x)
        activity_logits = self.activity_classifier(features)
        if user_ids is not None:
            reversed_features = grad_reverse(features, alpha)
            user_logits = self.user_discriminator(reversed_features)
            
            return activity_logits, user_logits
        else:
            return activity_logits

def train_model(model, train_loader, val_dataset, optimizer, loss_fn, num_epochs, run, alpha_schedule, device = device):
    X_val, y_val, user_ids_val = val_dataset[:]
    
    X_val = X_val.to(device)
    y_val = y_val.to(device)
    user_ids_val = user_ids_val.to(device)
    
    for epoch in range(num_epochs):
        t = time.time()
        model.train()
        alpha = alpha_schedule(epoch, num_epochs)
        
        total_loss = 0.0
        activity_loss_sum = 0.0
        user_loss_sum = 0.0
        batch_count = 0
        
        for X_batch, y_batch, user_ids_batch in train_loader:
            X_batch = X_batch.to(device)
            y_batch = y_batch.to(device)
            user_ids_batch = user_ids_batch.to(device)
            optimizer.zero_grad()
            
            activity_logits, user_logits = model(X_batch, user_ids_batch, alpha)
            activity_loss = loss_fn(activity_logits, y_batch)
            user_loss = F.cross_entropy(user_logits, user_ids_batch)
            loss = activity_loss + user_loss
            loss.backward()
            optimizer.step()
            
            total_loss += loss.item()
            activity_loss_sum += activity_loss.item()
            user_loss_sum += user_loss.item()
            batch_count += 1
        avg_loss = total_loss / batch_count
        avg_activity_loss = activity_loss_sum / batch_count
        avg_user_loss = user_loss_sum / batch_count
        model.eval()
        with torch.no_grad():
            val_activity_logits, val_user_logits = model(X_val, user_ids_val, alpha)
            val_activity_loss = loss_fn(val_activity_logits, y_val)
            val_user_loss = F.cross_entropy(val_user_logits, user_ids_val)
            val_loss = val_activity_loss + val_user_loss
            _, predicted_activities = torch.max(val_activity_logits, 1)
            activity_accuracy = (predicted_activities == y_val).float().mean().item()
            _, predicted_users = torch.max(val_user_logits, 1)
            user_accuracy = (predicted_users == user_ids_val).float().mean().item()
        
        predicted_np = predicted_activities.cpu().numpy()
        y_val_np = y_val.cpu().numpy()

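        # zero_division=0 scores classes with no predictions as 0 instead of emitting a warning each epoch.
        precision = precision_score(y_val_np, predicted_np, average='weighted', zero_division=0)
        recall = recall_score(y_val_np, predicted_np, average='weighted', zero_division=0)
        f1 = f1_score(y_val_np, predicted_np, average='weighted', zero_division=0)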
        epoch_time = time.time() - t
        
        run.log({
            'epoch': epoch,
            'alpha': alpha,
            'train_loss': avg_loss,
            'train_activity_loss': avg_activity_loss,
            'train_user_loss': avg_user_loss,
            'val_loss': val_loss.item(),
            'val_activity_loss': val_activity_loss.item(),
            'val_user_loss': val_user_loss.item(),
            'activity_accuracy': activity_accuracy,
            'user_accuracy': user_accuracy,
            'f1_score': f1,
            'precision': precision,
            'recall': recall,
            'epoch_time': epoch_time
        })
        
        print(f"Epoch {epoch}: Train Loss = {avg_loss:.4f}, Val Loss = {val_loss.item():.4f}")
        print(f"  Activity Acc = {activity_accuracy:.4f}, User Acc = {user_accuracy:.4f} (lower is better)")
        print(f"  Precision = {precision:.4f}, Recall = {recall:.4f}, F1 Score = {f1:.4f}")
        print(f"  Alpha = {alpha:.4f}")

def alpha_schedule(epoch, max_epochs):
    """
    Gradually increase alpha (adversarial pressure): a linear ramp from 0 to 1
    over the first half of training, held at 1 for the second half.
    """
    return min(1.0, epoch / (max_epochs * 0.5))

def new_model_run(model, name, train_params, entity=WANDB_ENTITY, project=WANDB_PROJECT, 
                             X=None, y=None, user_ids=None, optimizer_class=RMSprop, 
                             loss_fn=None, show_conf_matrix=False, normalize=False, alpha=0.005):
    
    run = wandb_config_entry(model, name, train_params, entity, project)
    train_proportion = train_params['train_proportion']
    batch_size = train_params['batch_size']
    lr = train_params['lr']
    num_epochs = train_params['num_epochs']
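    # NOTE: `alpha` in this function is the optimizer's L2 weight decay, not the
    # gradient-reversal alpha (that one comes from alpha_schedule inside train_model).
    optimizer = optimizer_class(params=model.parameters(), lr=lr, weight_decay=alpha)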
    
    train_loader, train_dataset, val_dataset, val_loader, mean, scale = train_test_datasets(
        X, y, user_ids, train_proportion, batch_size, normalize=normalize
    )

    train_model(
        model=model,
        train_loader=train_loader,
        val_dataset=val_dataset,
        optimizer=optimizer,
        loss_fn=loss_fn,
        num_epochs=num_epochs,
        run=run,
        alpha_schedule=alpha_schedule
    )
    
    run.finish()
    
    if show_conf_matrix:
        conf_matrix(model, val_loader)
    
    return mean, scale
    
def conf_matrix(model, val_loader):
    all_preds = []
    all_labels = []
    
    model.eval()
    with torch.no_grad():
        for X_batch, y_batch, _ in val_loader:
            # Only the activity head is needed here, so no user ids are passed to the model.
            outputs = model(X_batch.to(device))
            _, preds = torch.max(outputs, 1)

            all_preds.append(preds.cpu())
            all_labels.append(y_batch.cpu())
            
    y_pred = torch.cat(all_preds).numpy()
    y_true = torch.cat(all_labels).numpy()
    
    cm = confusion_matrix(y_true, y_pred)
    
    print(cm)

    plt.figure(figsize=(8,6))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
                xticklabels=le.classes_, yticklabels=le.classes_)
    plt.xlabel('Predicted')
    plt.ylabel('True')
    plt.title('Confusion Matrix')
    plt.show()

X_train = get_X_signal(df_train).to(device)
X_test = get_X_signal(df_test).to(device)

user_ids_train = extract_user_ids(df_train)
user_ids_test = extract_user_ids(df_test)
NUM_USERS = int(user_ids_train.max().item())

scale_mean, scale_std = get_mean_std(X_train)

labels_train = df_train['activity']
le = LabelEncoder()
y_train = torch.tensor(le.fit_transform(labels_train), dtype=torch.long)
NUM_CLASSES = len(le.classes_)

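# Balanced class weights: the two steps below simplify to N / (K * count_c),
# with N samples and K classes, so rarer activities get larger weights.
class_counts = np.bincount(y_train.numpy(), minlength=NUM_CLASSES)
class_weights = 1.0 / class_counts
class_weights = class_weights * (len(y_train) / np.sum(class_weights*class_counts))
class_weights = torch.tensor(class_weights, dtype=torch.float32)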

loss_fn = CrossEntropyLoss(weight=class_weights.to(device))

normalize = False
bidirectional = True
num_filters = 64
num_layers = 1
alpha = 0.005
user_embedding_dim = 64

model = NewModel(
    num_classes=NUM_CLASSES,
    num_users=NUM_USERS,
    user_embedding_dim=user_embedding_dim,
    num_filters=num_filters,
    num_layers=num_layers,
    bidirectional=bidirectional
)

model = model.to(device)
name = 'CNN-3-BiLSTM-1_userembed_150epoch_64_64'
train_params = {
    'train_proportion': 0.8,
    'batch_size': 64,
    'lr': 0.001,
    'num_epochs': 150
}

mean, scale = new_model_run(
    model=model,
    name=name,
    train_params=train_params,
    X=X_train,
    y=y_train,
    user_ids=user_ids_train,
    optimizer_class=Adam,
    loss_fn=loss_fn,
    normalize=normalize,
    alpha=alpha
)

def predict_on_test(model, X_test, batch_size=train_params['batch_size']):
    model.eval()
    test_dataset = TensorDataset(X_test)
    test_loader = DataLoader(test_dataset, batch_size=batch_size)
    
    all_preds = []
    
    with torch.no_grad():
        for (X_batch,) in test_loader:
            outputs = model(X_batch)
            _, preds = torch.max(outputs, 1)
            all_preds.append(preds)
            
    return torch.cat(all_preds).to('cpu').numpy()

Epoch 0: Train Loss = 4.1319, Val Loss = 4.1508
  Activity Acc = 0.4084, User Acc = 0.0800 (lower is better)
  Precision = 0.4193, Recall = 0.4084, F1 Score = 0.3752
  Alpha = 0.0000
Epoch 1: Train Loss = 3.4481, Val Loss = 3.7753
  Activity Acc = 0.5054, User Acc = 0.0776 (lower is better)
  Precision = 0.5002, Recall = 0.5054, F1 Score = 0.4610
  Alpha = 0.0667
Epoch 2: Train Loss = 3.3969, Val Loss = 3.7840
  Activity Acc = 0.5070, User Acc = 0.0893 (lower is better)
  Precision = 0.4681, Recall = 0.5070, F1 Score = 0.4635
  Alpha = 0.1333
Epoch 3: Train Loss = 3.3202, Val Loss = 3.6935
  Activity Acc = 0.5435, User Acc = 0.0947 (lower is better)
  Precision = 0.4844, Recall = 0.5435, F1 Score = 0.4944
  Alpha = 0.2000
Epoch 4: Train Loss = 3.2788, Val Loss = 3.7338
  Activity Acc = 0.5357, User Acc = 0.0846 (lower is better)
  Precision = 0.4663, Recall = 0.5357, F1 Score = 0.4826
  Alpha = 0.2667
Epoch 5: Train Loss = 3.2786, Val Loss = 3.7393
  Activity Acc = 0.5404, User Acc = 0.0753 (lower is better)
  Precision = 0.4878, Recall = 0.5404, F1 Score = 0.4867
  Alpha = 0.3333
Epoch 6: Train Loss = 3.2865, Val Loss = 3.7659
  Activity Acc = 0.5295, User Acc = 0.0668 (lower is better)
  Precision = 0.4789, Recall = 0.5295, F1 Score = 0.4813
  Alpha = 0.4000
Epoch 7: Train Loss = 3.2893, Val Loss = 3.7181
  Activity Acc = 0.5334, User Acc = 0.0714 (lower is better)
  Precision = 0.8655, Recall = 0.5334, F1 Score = 0.4936
  Alpha = 0.4667
Epoch 8: Train Loss = 3.2667, Val Loss = 3.6454
  Activity Acc = 0.8005, User Acc = 0.0722 (lower is better)
  Precision = 0.8466, Recall = 0.8005, F1 Score = 0.8127
  Alpha = 0.5333
Epoch 9: Train Loss = 3.2447, Val Loss = 3.5908
  Activity Acc = 0.8067, User Acc = 0.0745 (lower is better)
  Precision = 0.8888, Recall = 0.8067, F1 Score = 0.8216
  Alpha = 0.6000
Epoch 10: Train Loss = 3.2881, Val Loss = 3.5914
  Activity Acc = 0.8043, User Acc = 0.0675 (lower is better)
  Precision = 0.8852, Recall = 0.8043, F1 Score = 0.8154
  Alpha = 0.6667
Epoch 11: Train Loss = 3.2892, Val Loss = 3.6616
  Activity Acc = 0.7554, User Acc =

Run history (wandb):
activity_accuracy,▁▂▂▃▃▃▃▃▆▆▆▆▇▇▇▇█▅▇██▇██▇█████
alpha,▁▁▂▂▃▃▄▄▅▅▆▆▇▇████████████████
epoch,▁▁▁▂▂▂▂▃▃▃▃▄▄▄▄▅▅▅▅▆▆▆▆▇▇▇▇███
epoch_time,▃▄▂▂▁▂▁▂▁▁▁▁▁▁▁█▃▃▃▃▃▅▃▄▄▆▃▂▂▃
f1_score,▁▂▂▂▂▂▂▂▆▇▆▆▇▇▇▇█▅▇██▇████████
precision,▁▂▂▂▂▂▂▇▇▇▇▇▇▇▇██▇████████████
recall,▁▂▂▃▃▃▃▃▆▆▆▆▇▇▇▇█▅▇██▇██▇█████
train_activity_loss,█▄▃▃▃▃▃▃▃▂▃▃▂▂▂▂▂▁▂▁▂▁▁▁▁▁▁▁▁▁
train_loss,█▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁
train_user_loss,█▄▃▁▁▂▂▃▂▂▃▃▃▄▃▄▃▄▅▄▄▄▄▄▄▄▄▄▄▃

Run summary (wandb):
activity_accuracy,0.91149
alpha,1.0
epoch,29.0
epoch_time,6.59713
f1_score,0.91576
precision,0.92891
recall,0.91149
train_activity_loss,0.15318
train_loss,3.12594
train_user_loss,2.97276


# Output Test Results

In [None]:
test_predictions = predict_on_test(model, X_test)
df_submission = pd.DataFrame({
    'id': df_test['id'],
    'predicted': le.inverse_transform(test_predictions)
})

# to_csv returns None, so concatenate first and write the file in a separate step.
df_submission = pd.concat([
    df_submission,
    pd.read_csv(path + '/predictions_allzero.csv')
])
df_submission.to_csv(f'{predictions_path}/{name}.csv', index=False)