# Unsupervised Feature Alignment with Soft DTW Loss

In this exercise you will use a library that provides a distance function on time series and use that distance as a loss. You will get to know dynamic time warping (DTW), which measures the distance between sequences. You will also use a network architecture called a Siamese network, which can be trained supervised or unsupervised to map positive and negative examples into a feature space and then minimize the distance between their feature representations. In this notebook you will combine both to train a network without supervision so that similar activities lie close to each other in the learned representation, and finally use a kNN classifier to measure the performance on a downstream classification task.

### Exercise Overview

In this exercise, you will:
1. Use a library for **soft** Dynamic Time Warping (DTW) to compute time-series similarity. The soft variant is essential because standard DTW is not differentiable. Please look into pysdtw (available via pip or at https://github.com/toinsson/pysdtw), a PyTorch-based, CUDA-ready implementation.
2. Implement a custom PyTorch Dataset for generating sequence pairs dynamically.
3. Train a Siamese network using Soft DTW as the loss function.
4. Evaluate the learned representations using a k-Nearest Neighbors (kNN) classifier.
5. Fine-tune using a single linear layer.
6. Visualize and analyze the results.
7. (Bonus) Show how strongly the DTW distance between inputs correlates with the distance between their feature representations.

### Dataset
We will use the UCI Human Activity Recognition (HAR) dataset, which contains time-series data from smartphone accelerometers and gyroscopes.

**Important**: At the end you should write a report of adequate size, which will probably mean at least half a page. In the report you should describe how you approached the task. You should describe:
- Encountered difficulties (due to the method, e.g. "not enough training samples to converge", not technical like "I could not install a package over pip")
- Steps taken to alleviate difficulties
- A general description of what you did: explain how you understood the task and how you solved it, in plain language and without code.
- Potential limitations of your approach: what could go wrong, and why the method might struggle on different data or under slightly different conditions.
- If you have an idea how this could be extended in an interesting way, describe it.

# Some explanations

## Dynamic Time Warping
Dynamic Time Warping (DTW) is an algorithm used to measure the similarity between two time series, even if they are out of sync in terms of speed or timing. Unlike traditional methods that align data point by point, DTW allows for non-linear alignment by "warping" the time axis. The idea is to find the optimal match between two sequences by stretching or compressing them along the time axis, minimizing the total distance between corresponding points. DTW does this by computing a cost matrix, where each entry represents the cost of aligning a point from one series with a point from the other. The path with the lowest cumulative cost is the optimal alignment.
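
The cost-matrix idea described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the library implementation used later in the exercise: the function name `dtw_distance_classic` is hypothetical, the sequences are assumed to be one-dimensional, and the absolute difference is assumed as the local cost.

```python
def dtw_distance_classic(a, b):
    """Classic (non-differentiable) DTW between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost between the two points (assumption: absolute difference)
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The same shape played at half speed: every point of `fast` appears twice in `slow`
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
fast = [0, 1, 2, 1, 0]
print(dtw_distance_classic(slow, fast))  # 0.0
```

The two sequences have different lengths, so a point-by-point comparison is not even defined, yet DTW reports a distance of zero because each point of `fast` can be matched to its two copies in `slow`.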

As a distance function, DTW is useful for comparing time series that have different lengths or varying speeds. For example, DTW is used in speech recognition, where two spoken phrases might differ in length or speaking rate but still convey the same meaning. By calculating the DTW distance, we can measure how similar two time series are, regardless of time shifts or distortions.

Excellent introduction: https://www.youtube.com/watch?v=ERKDHZyZDwA

## Siamese Networks
Siamese Networks are a type of neural network architecture designed for comparing two inputs and measuring their similarity. Instead of directly predicting a single label for an input, a Siamese Network takes two input data points, passes them through identical networks with shared weights (hence "siamese"), and compares the outputs. The network can be trained by telling it whether the two inputs are equal or unequal. If the branches map into a feature space rather than directly into an output label space, the distance in that feature space can be used in the loss function, which is the task here.
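
To make the shared-weights idea concrete, here is a hedged PyTorch sketch. It is not the solution to Part 3: the class name `TinyEncoder`, the layer sizes, and the input size of 16 are assumptions for illustration, and the loss is the standard margin-based contrastive loss on Euclidean feature distance, swapped in here instead of Soft DTW to keep the example self-contained.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical encoder; both inputs pass through this same module."""

    def __init__(self, input_size=16, feature_size=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_size, 32), nn.ReLU(), nn.Linear(32, feature_size)
        )

    def forward(self, x):
        return self.net(x)


def contrastive_loss(f1, f2, same, margin=1.0):
    # Euclidean distance between the two feature vectors of each pair
    dist = torch.norm(f1 - f2, dim=1)
    # Positive pairs are pulled together, negative pairs pushed beyond the margin
    pos = same * dist.pow(2)
    neg = (1 - same) * torch.clamp(margin - dist, min=0).pow(2)
    return (pos + neg).mean()


torch.manual_seed(0)
encoder = TinyEncoder()
x1, x2 = torch.randn(8, 16), torch.randn(8, 16)  # random stand-in inputs
same = torch.randint(0, 2, (8,)).float()          # 1 = same class, 0 = different

# "Siamese" means one set of weights is applied to both inputs
loss = contrastive_loss(encoder(x1), encoder(x2), same)
loss.backward()
print(loss.item())
```

Because a single `encoder` instance processes both inputs, the gradients from both branches accumulate in the same parameters.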

In the context of unsupervised representation learning, Siamese Networks can be used to learn meaningful features from unlabeled data. A popular related technique is *MoCo* (Momentum Contrast), which uses Siamese-like networks for contrastive learning. In MoCo, two different views (augmented versions) of the same data point are passed through two encoders with the same architecture. One encoder is updated by backpropagation, while the other follows a momentum-based update rule, that is, a moving average of the first encoder's weights. The networks are trained to bring the representations of similar views (positive pairs) closer together while pushing the representations of dissimilar views (negative pairs) apart. This allows the model to learn useful representations for later fine-tuning on a classification task without explicit labels, relying instead on the assumption that augmented views of the same instance should be similar in the learned feature space.

In this exercise the Siamese network should learn to structure the feature space, guided by the similarity computed with DTW. Afterwards you should evaluate how well this method performs for classification. The remark about MoCo is background information on a closely related topic only; you do not need to use it.

## Part 1: Data Preparation

Load the UCI HAR dataset and implement a custom `Dataset` class to generate pairs dynamically.

In [1]:
import os
import urllib.request
from zipfile import ZipFile

def unzip(filename, dest_path=None):
    # Extract all members of the zip archive into dest_path
    # (defaults to the current working directory, i.e. the notebook folder)
    if dest_path is None:
        dest_path = os.getcwd()
    with ZipFile(filename, 'r') as zObject:
        zObject.extractall(path=dest_path)

def download(url, filename):
    # Download the file unless it already exists locally
    if os.path.isfile(filename):
        return
    urllib.request.urlretrieve(url, filename)

# Un-comment lines below only if executing on Google-COLAB
# ![[ -f UCI_HAR.zip ]] || wget --no-check-certificate https://people.minesparis.psl.eu/fabien.moutarde/ES_MachineLearning/Practical_sequentialData/UCI_HAR.zip
# ![[ -f "UCI_HAR" ]] || unzip UCI_HAR.zip

download('https://people.minesparis.psl.eu/fabien.moutarde/ES_MachineLearning/Practical_sequentialData/UCI_HAR.zip','UCI_HAR.zip')

unzip('UCI_HAR.zip')

In [None]:
# Import necessary libraries
import numpy as np
import pandas as pd
from torch.utils.data import Dataset, DataLoader
import torch
import random

# Load the UCI HAR dataset (adjust file paths as needed)
# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
X_train = pd.read_csv('./UCI_HAR/train/X_train.txt', sep=r'\s+', header=None)
y_train = pd.read_csv('./UCI_HAR/train/y_train.txt', header=None).values.ravel()
X_test = pd.read_csv('./UCI_HAR/test/X_test.txt', sep=r'\s+', header=None)
y_test = pd.read_csv('./UCI_HAR/test/y_test.txt', header=None).values.ravel()

# print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

# Define a custom Dataset for Siamese Network
class SiameseDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # Randomly decide whether to build a positive (1) or negative (0) pair
        pair_label = random.randint(0, 1)

        if pair_label:
            # Positive pair (same class)
            idx2 = random.randint(0, len(self.data) - 1)
            while self.labels[idx2] != self.labels[idx]:
                idx2 = random.randint(0, len(self.data) - 1)

        else:
            # Negative pair (different class)
            idx2 = random.randint(0, len(self.data) - 1)
            while self.labels[idx2] == self.labels[idx]:
                idx2 = random.randint(0, len(self.data) - 1)

        seq1 = self.data.iloc[idx].values
        seq2 = self.data.iloc[idx2].values
        return torch.tensor(seq1, dtype=torch.float32), torch.tensor(seq2, dtype=torch.float32), pair_label

# Create train and test datasets using the custom Dataset class
train_dataset = SiameseDataset(X_train, y_train)
test_dataset = SiameseDataset(X_test, y_test)

## Part 2: Using a DTW Library

Use a library to compute the Dynamic Time Warping (DTW) distance between sequences. Implement a differentiable Soft DTW function to calculate this distance.
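
To see what "soft" changes, the following pure-Python reference sketch replaces the hard `min` of classic DTW with a smooth minimum controlled by a parameter `gamma`. It is meant only to illustrate the recursion: the names `soft_min` and `soft_dtw_reference` are hypothetical, the inputs are assumed to be one-dimensional, and the squared difference is assumed as the local cost. For the exercise itself, use the batched and differentiable implementation from the library.

```python
import math


def soft_min(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    finite = [v for v in values if v != math.inf]
    if not finite:
        return math.inf
    lo = min(finite)
    # Subtracting the smallest value first keeps the exponentials in range
    return lo - gamma * math.log(sum(math.exp(-(v - lo) / gamma) for v in finite))


def soft_dtw_reference(a, b, gamma=1.0):
    """Soft-DTW value between two 1-D sequences (illustrative, not optimized)."""
    n, m = len(a), len(b)
    r = [[math.inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2  # assumption: squared difference
            r[i][j] = d + soft_min([r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]], gamma)
    return r[n][m]


slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
fast = [0, 1, 2, 1, 0]
for gamma in (1.0, 0.1, 0.01):
    print(gamma, soft_dtw_reference(slow, fast, gamma))
```

The classic DTW distance of these two sequences is zero. The soft value approaches it from below as `gamma` shrinks, and it can be negative, so treat it as a discrepancy rather than a true distance.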

In [None]:
# Install and use a DTW library like pysdtw
# Implement a function to calculate Soft DTW distance between sequences
def dtw_distance(seq1, seq2):
    # Add your implementation here
    pass

# Test the DTW function with example sequences
seq1, seq2, label = train_dataset[0]
distance = dtw_distance(seq1, seq2)
print(f"DTW distance: {distance}")

## Part 3: Train a Siamese Network

Define and train a Siamese network in PyTorch using the Soft DTW loss function. Implement the network structure and training logic.

In [None]:
import torch.nn as nn

# Define a Siamese Network class
class SiameseNetwork(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Initialize the network layers here

    def forward_one(self, x):
        # Forward pass logic for one branch
        pass

    def forward(self, input1, input2):
        # Logic for processing both branches
        pass

# Train the network with the defined loss function and optimizer

## Part 4: Evaluate Representations with kNN

Use the trained network to extract embeddings and evaluate their quality using a kNN classifier.
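
A possible shape for this evaluation is sketched below with scikit-learn. The embeddings here are synthetic stand-ins (three Gaussian clusters in 2-D) so that the snippet runs on its own; in the notebook they would come from passing the train and test data through the trained encoder under `torch.no_grad()`. The helper `make_split` and the choice of `n_neighbors=5` are assumptions.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])


def make_split(n_per_class):
    """Hypothetical helper: fake embeddings scattered around class centers."""
    labels = np.repeat(np.arange(3), n_per_class)
    feats = centers[labels] + rng.normal(scale=0.5, size=(len(labels), 2))
    return feats, labels


train_emb, train_y = make_split(50)
test_emb, test_y = make_split(20)

# Fit kNN on the training embeddings, then classify the test embeddings
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(train_emb, train_y)
pred = knn.predict(test_emb)

acc = accuracy_score(test_y, pred)
cm = confusion_matrix(test_y, pred)
print(f"kNN accuracy: {acc:.3f}")
print(cm)
```

Rows of the confusion matrix correspond to true classes and columns to predicted classes, which makes it easy to see which activities the representation confuses.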

In [None]:
# Extract embeddings for train and test data
# Use a kNN classifier to evaluate performance
# Calculate and print accuracy and confusion matrix

## Part 5: Fine-Tuning with a Linear Layer

Freeze the Siamese network and add a linear layer on top. Fine-tune the linear layer and re-evaluate the model.
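
One way to set this up is sketched below. The `encoder` is a randomly initialized stand-in for one trained branch of the Siamese network, the data are random tensors, and the sizes, learning rate, and number of steps are assumptions. Only the linear head receives gradient updates.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the trained Siamese branch (assumption: 16 inputs, 8 features)
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

# Freeze the encoder so its weights stay fixed
for p in encoder.parameters():
    p.requires_grad_(False)
encoder.eval()

num_classes = 6  # UCI HAR distinguishes six activities
head = nn.Linear(8, num_classes)
optimizer = torch.optim.Adam(head.parameters(), lr=1e-2)
criterion = nn.CrossEntropyLoss()

x = torch.randn(64, 16)                   # random stand-in inputs
y = torch.randint(0, num_classes, (64,))  # labels must lie in 0..num_classes-1

for step in range(20):
    with torch.no_grad():
        feats = encoder(x)  # no gradients flow into the frozen encoder
    loss = criterion(head(feats), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final training loss: {loss.item():.3f}")
```

Note that the labels in `y_train.txt` start at 1, so they need to be shifted to start at 0 before being passed to `nn.CrossEntropyLoss`.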

In [None]:
# Freeze the Siamese network
# Add and train a linear layer for fine-tuning
# Evaluate the fine-tuned model