In [None]:
#|default_exp models.TST

# TST

This is an unofficial PyTorch implementation by Ignacio Oguiza of  - timeseriesAI@gmail.com based on:
* George Zerveas et al. A Transformer-based Framework for Multivariate Time Series Representation Learning, in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14--18, 2021. ArXiV version: https://arxiv.org/abs/2010.02803
* Official implementation: https://github.com/gzerveas/mvts_transformer

```bash
@inproceedings{10.1145/3447548.3467401,
author = {Zerveas, George and Jayaraman, Srideepika and Patel, Dhaval and Bhamidipaty, Anuradha and Eickhoff, Carsten},
title = {A Transformer-Based Framework for Multivariate Time Series Representation Learning},
year = {2021},
isbn = {9781450383325},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3447548.3467401},
doi = {10.1145/3447548.3467401},
booktitle = {Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &amp; Data Mining},
pages = {2114–2124},
numpages = {11},
keywords = {regression, framework, multivariate time series, classification, transformer, deep learning, self-supervised learning, unsupervised learning, imputation},
location = {Virtual Event, Singapore},
series = {KDD '21}
}
```


This paper uses 'Attention is all you need' as a major reference:
* Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). **Attention is all you need**. In Advances in neural information processing systems (pp. 5998-6008).

This implementation is adapted to work with the rest of the `tsai` library, and contain some hyperparameters that are not available in the original implementation. They are included to experiment with them. 

## TST arguments

Usual values are the ones that appear in the "Attention is all you need" and "A Transformer-based Framework for Multivariate Time Series Representation Learning" papers. 

The default values are the ones selected as a default configuration in the latter.

* c_in: the number of features (aka variables, dimensions, channels) in the time series dataset. dls.var
* c_out: the number of target classes. dls.c
* seq_len: number of time steps in the time series. dls.len
* max_seq_len: useful to control the temporal resolution in long time series to avoid memory issues. Default. None.
* d_model: total dimension of the model (number of features created by the model). Usual values: 128-1024. Default: 128.
* n_heads:  parallel attention heads. Usual values: 8-16. Default: 16.
* d_k: size of the learned linear projection of queries and keys in the MHA. Usual values: 16-512. Default: None -> (d_model/n_heads) = 32.
* d_v: size of the learned linear projection of values in the MHA. Usual values: 16-512. Default: None -> (d_model/n_heads) = 32.
* d_ff: the dimension of the feedforward network model. Usual values: 256-4096. Default: 256.
* dropout: amount of residual dropout applied in the encoder. Usual values: 0.-0.3. Default: 0.1.
* activation: the activation function of intermediate layer, relu or gelu. Default: 'gelu'.
* n_layers: the number of sub-encoder-layers in the encoder. Usual values: 2-8. Default: 3.
* fc_dropout: dropout applied to the final fully connected layer. Usual values: 0.-0.8. Default: 0.
* y_range: range of possible y values (used in regression tasks). Default: None
* kwargs: nn.Conv1d kwargs. If not {}, a nn.Conv1d with those kwargs will be applied to original time series.

In [1]:
#@title ECG Data
# Connect to Google Drive
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)

# Install ECG Dependencies
!pip install ecg_plot
!pip install pc
!pip install wfdb

# Bert dependencies
!pip install transformers

# Required Imports
import pandas as pd
import numpy as np
import wfdb
import ast
import csv
import numpy as np
import random
import ecg_plot
import pc
import os
from wfdb.io.record import Record, rdrecord
from wfdb.plot.plot import plot_wfdb
from socket import socket
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets,transforms
from torch.utils.data.sampler import SubsetRandomSampler
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader
from transformers import BertTokenizer, BertModel

import math
from typing import Tuple
import copy

import torch
from torch import nn, Tensor
import torch.nn.functional as F
from torch.nn import TransformerEncoder, TransformerEncoderLayer
from torch.utils.data import dataset

import torchvision

Mounted at /content/gdrive
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting ecg_plot
  Downloading ecg_plot-0.2.8-py3-none-any.whl (9.2 kB)
Installing collected packages: ecg_plot
Successfully installed ecg_plot-0.2.8
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pc
  Downloading pc-0.0.3.tar.gz (5.7 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: pc
  Building wheel for pc (setup.py) ... [?25l[?25hdone
  Created wheel for pc: filename=pc-0.0.3-py3-none-any.whl size=6176 sha256=8f7f2e6285e466e8613819ff5772544518d218c5bbc5af923c53e46066869b0c
  Stored in directory: /root/.cache/pip/wheels/2a/1b/b9/98ee104e81e5d88884eadd6fe24abcfd1b8c30816d8611df14
Successfully built pc
Installing collected packages: pc
Successfully installed pc-0.0.3
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-whee

In [2]:
!pip install tsai

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting tsai
  Downloading tsai-0.3.4-py3-none-any.whl (272 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m272.3/272.3 KB[0m [31m19.0 MB/s[0m eta [36m0:00:00[0m
Collecting pyts>=0.12.0
  Downloading pyts-0.12.0-py3-none-any.whl (2.5 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.5/2.5 MB[0m [31m34.1 MB/s[0m eta [36m0:00:00[0m
Installing collected packages: pyts, tsai
Successfully installed pyts-0.12.0 tsai-0.3.4


In [3]:
# Ankur's Path:
!unzip "/content/gdrive/MyDrive/Colab Notebooks/CardiacAbnormalityTransformerProject/physionet.org/ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.2.zip"
# Meghna's Path:
!unzip "/content/gdrive/MyDrive/Fall 2022/APS360/CardiacAbnormalityTransformerProject/physionet.org/ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.2.zip"
# Mark's path
!unzip "/content/gdrive/MyDrive/CardiacAbnormalityTransformerProject/physionet.org/ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.2.zip"

[1;30;43mStreaming output truncated to the last 5000 lines.[0m
  inflating: ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.2/records100/19000/19338_lr.hea  
  inflating: ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.2/records100/19000/19339_lr.dat  
  inflating: ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.2/records100/19000/19339_lr.hea  
  inflating: ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.2/records100/19000/19340_lr.dat  
  inflating: ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.2/records100/19000/19340_lr.hea  
  inflating: ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.2/records100/19000/19341_lr.dat  
  inflating: ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.2/records100/19000/19341_lr.hea  
  inflating: ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.2/records100/19000/19342_lr.dat  
  inflating: ptb-xl-a-large-pub

In [4]:
# Chris
!unzip "/content/gdrive/MyDrive/CardiacAbnormalityTransformerProject/clinicalEmbeddings.zip"

# Ankur
!unzip "/content/gdrive/MyDrive/Colab Notebooks/CardiacAbnormalityTransformerProject/clinicalEmbeddings.zip"

#Meghna
!unzip "/content/gdrive/MyDrive/Fall 2022/APS360/CardiacAbnormalityTransformerProject/clinicalEmbeddings.zip"

[1;30;43mStreaming output truncated to the last 5000 lines.[0m
  inflating: 6273.tensor             
  inflating: 6274.tensor             
  inflating: 6275.tensor             
  inflating: 6276.tensor             
  inflating: 6277.tensor             
  inflating: 6278.tensor             
  inflating: 6279.tensor             
  inflating: 6280.tensor             
  inflating: 6281.tensor             
  inflating: 6282.tensor             
  inflating: 6283.tensor             
  inflating: 6284.tensor             
  inflating: 6285.tensor             
  inflating: 6286.tensor             
  inflating: 6287.tensor             
  inflating: 6288.tensor             
  inflating: 6289.tensor             
  inflating: 6290.tensor             
  inflating: 6291.tensor             
  inflating: 6292.tensor             
  inflating: 6293.tensor             
  inflating: 6294.tensor             
  inflating: 6295.tensor             
  inflating: 6296.tensor             
  inflating: 6297.tenso

In [5]:
def load_raw_data(df, sampling_rate, path):
    if sampling_rate == 100:
        data = [wfdb.rdsamp(path+f)[0] for f in df.filename_lr]
    else:
        data = [wfdb.rdsamp(path+f) for f in df.filename_hr]
    data = np.array(data)
    return data

In [6]:
# Shortcut to preprocessing
# Load raw signal data
#df = pd.read_csv('/content/gdrive/MyDrive/Colab Notebooks/CardiacAbnormalityTransformerProject/undersampledY')
df = pd.read_csv('/content/gdrive/MyDrive/CardiacAbnormalityTransformerProject/undersampledY')
# Meghna's path
#df = pd.read_csv('/content/gdrive/MyDrive/Fall 2022/APS360/CardiacAbnormalityTransformerProject/undersampledY')
#df = df.head(20)
X = load_raw_data(df, 100, '/content/ptb-xl-a-large-publicly-available-electrocardiography-dataset-1.0.2/')

# Check how many ecg segments
X.shape

(11271, 1000, 12)

In [7]:
class M(Dataset):
    def __init__(self, diagnostics_superclass,X):
        self.labels = diagnostics_superclass
        self.data = X
        self.id = df.index.to_list()
        self.notes = df.report.to_list()
        self.map = {'[\'NORM\']':0,'[\'MI\']':1,'[\'STTC\']':2,'[\'CD\']':3,'[\'HYP\']':4, '[\'HYP\', \'MI\']':5, '[\'HYP\', \'CD\']':6,
                    '[\'HYP\', \'STTC\']':7,'[\'MI\', \'CD\']':8,'[\'MI\', \'STTC\']':9,'[\'CD\', \'STTC\']':10,'[\'HYP\', \'MI\', \'STTC\']':11,
                    '[\'HYP\', \'MI\', \'CD\']':12, '[\'MI\', \'CD\', \'STTC\']':13,'[\'HYP\', \'CD\', \'STTC\']':14,'[\'HYP\', \'MI\', \'CD\', \'STTC\']':15, 
                    '[\'CD\', \'NORM\']':16, '[\'CD\', \'NORM\', \'STTC\']':17, '[\'NORM\', \'STTC\']':18, '[\'HYP\', \'NORM\']':19, '[\'MI\', \'NORM\']':20, 
                    '[\'HYP\', \'MI\', \'NORM\']':21, '[\'HYP\', \'CD\', \'NORM\']':22, '[\'HYP\', \'MI\', \'CD\', \'NORM\']':23, '[\'HYP\', \'MI\', \'CD\', \'NORM\', \'STTC\']':24,
                    '[\'HYP\', \'NORM\', \'STTC\']':25, '[\'MI\', \'NORM\', \'STTC\']':26, '[\'HYP\', \'MI\', \'NORM\', \'STTC\']':27, '[\'HYP\', \'CD\', \'NORM\', \'STTC\']':28, '[\'MI\', \'CD\', \'NORM\', \'STTC\']':29}
    def __getitem__(self, i):
        return self.data[i][:250] , self.map["".join(self.labels[i])], torch.load("/content/"+ str(self.id[i]) + ".tensor") 
    def __len__(self):
        return len(self.labels)
diagnostics_superclass = df['diagnostic_superclass'].to_list()
diagnostics_superclass = np.array(diagnostics_superclass)
ds = M(diagnostics_superclass,X)
# dl = DataLoader(ds, batch_size=6)

In [8]:
def get_data_loader(train_data,val_data,batch_size):
  #dataloader = torch.utils.data.DataLoader(dataset,batch_size=batch_size,shuffle=True)
  return torch.utils.data.DataLoader(train_data,batch_size=batch_size,shuffle=True), torch.utils.data.DataLoader(val_data,batch_size=batch_size,shuffle=True)

def get_split_data(dataset):
  train_size = int(0.08 * len(dataset))
  valid_size = int(0.01 * len(dataset))
  test_size = len(dataset) - train_size - valid_size
  train_loader, val_loader, test_loader = torch.utils.data.random_split(dataset, [train_size, valid_size, test_size])
  return train_loader, val_loader, test_loader

In [9]:
train_data, val_data, test_data = get_split_data(ds)
train_loader, val_loader = get_data_loader(train_data,val_data,1)
print(len(train_loader))

901


## Imports

In [10]:
#|export
from tsai.imports import *
from tsai.utils import *
from tsai.models.layers import *
from tsai.models.utils import *

## TST

In [11]:
#|exporti
class _ScaledDotProductAttention(Module):
    def __init__(self, d_k:int): self.d_k = d_k
    def forward(self, q:Tensor, k:Tensor, v:Tensor, mask:Optional[Tensor]=None):

        # MatMul (q, k) - similarity scores for all pairs of positions in an input sequence
        scores = torch.matmul(q, k)                                         # scores : [bs x n_heads x q_len x q_len]
        
        # Scale
        scores = scores / (self.d_k ** 0.5)
        
        # Mask (optional)
        if mask is not None: scores.masked_fill_(mask, -1e9)
        
        # SoftMax
        attn = F.softmax(scores, dim=-1)                                    # attn   : [bs x n_heads x q_len x q_len]
        
        # MatMul (attn, v)
        context = torch.matmul(attn, v)                                     # context: [bs x n_heads x q_len x d_v]
        
        return context, attn

In [12]:
#|exporti
class _MultiHeadAttention(Module):
    def __init__(self, d_model:int, n_heads:int, d_k:int, d_v:int):
        r"""
        Input shape:  Q, K, V:[batch_size (bs) x q_len x d_model], mask:[q_len x q_len]
        """
        self.n_heads, self.d_k, self.d_v = n_heads, d_k, d_v
        
        self.W_Q = nn.Linear(d_model, d_k * n_heads, bias=False)
        self.W_K = nn.Linear(d_model, d_k * n_heads, bias=False)
        self.W_V = nn.Linear(d_model, d_v * n_heads, bias=False)
        
        self.W_O = nn.Linear(n_heads * d_v, d_model, bias=False)

    def forward(self, Q:Tensor, K:Tensor, V:Tensor, mask:Optional[Tensor]=None):
        
        bs = Q.size(0)

        # Linear (+ split in multiple heads)
        q_s = self.W_Q(Q).view(bs, -1, self.n_heads, self.d_k).transpose(1,2)       # q_s    : [bs x n_heads x q_len x d_k]
        k_s = self.W_K(K).view(bs, -1, self.n_heads, self.d_k).permute(0,2,3,1)     # k_s    : [bs x n_heads x d_k x q_len] - transpose(1,2) + transpose(2,3)
        v_s = self.W_V(V).view(bs, -1, self.n_heads, self.d_v).transpose(1,2)       # v_s    : [bs x n_heads x q_len x d_v]

        # Scaled Dot-Product Attention (multiple heads)
        context, attn = _ScaledDotProductAttention(self.d_k)(q_s, k_s, v_s)          # context: [bs x n_heads x q_len x d_v], attn: [bs x n_heads x q_len x q_len]

        # Concat
        context = context.transpose(1, 2).contiguous().view(bs, -1, self.n_heads * self.d_v) # context: [bs x q_len x n_heads * d_v]

        # Linear
        output = self.W_O(context)                                                  # context: [bs x q_len x d_model]
        
        return output, attn

In [13]:
t = torch.rand(16, 50, 128)
output, attn = _MultiHeadAttention(d_model=128, n_heads=3, d_k=8, d_v=6)(t, t, t)
output.shape, attn.shape

(torch.Size([16, 50, 128]), torch.Size([16, 3, 50, 50]))

In [14]:
#|exporti
def get_activation_fn(activation):
    if activation == "relu": return nn.ReLU()
    elif activation == "gelu": return nn.GELU()
    else: return activation()
#         raise ValueError(f'{activation} is not available. You can use "relu" or "gelu"')

class _TSTEncoderLayer(Module):
    def __init__(self, q_len:int, d_model:int, n_heads:int, d_k:Optional[int]=None, d_v:Optional[int]=None, d_ff:int=256, dropout:float=0.1, 
                 activation:str="gelu"):

        assert d_model // n_heads, f"d_model ({d_model}) must be divisible by n_heads ({n_heads})"
        d_k = ifnone(d_k, d_model // n_heads)
        d_v = ifnone(d_v, d_model // n_heads)

        # Multi-Head attention
        self.self_attn = _MultiHeadAttention(d_model, n_heads, d_k, d_v)

        # Add & Norm
        self.dropout_attn = nn.Dropout(dropout)
        self.batchnorm_attn = nn.Sequential(Transpose(1,2), nn.BatchNorm1d(d_model), Transpose(1,2))

        # Position-wise Feed-Forward
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), 
                                get_activation_fn(activation), 
                                nn.Dropout(dropout), 
                                nn.Linear(d_ff, d_model))

        # Add & Norm
        self.dropout_ffn = nn.Dropout(dropout)
        self.batchnorm_ffn = nn.Sequential(Transpose(1,2), nn.BatchNorm1d(d_model), Transpose(1,2))

    def forward(self, src:Tensor, mask:Optional[Tensor]=None) -> Tensor:

        # Multi-Head attention sublayer
        ## Multi-Head attention
        src2, attn = self.self_attn(src, src, src, mask=mask)
        ## Add & Norm
        src = src + self.dropout_attn(src2) # Add: residual connection with residual dropout
        src = self.batchnorm_attn(src)      # Norm: batchnorm 

        # Feed-forward sublayer
        ## Position-wise Feed-Forward
        src2 = self.ff(src)
        ## Add & Norm
        src = src + self.dropout_ffn(src2) # Add: residual connection with residual dropout
        src = self.batchnorm_ffn(src) # Norm: batchnorm

        return src

In [15]:
t = torch.rand(16, 50, 128)
output = _TSTEncoderLayer(q_len=50, d_model=128, n_heads=3, d_k=None, d_v=None, d_ff=512, dropout=0.1, activation='gelu')(t)
output.shape

torch.Size([16, 50, 128])

In [16]:
#|exporti
class _TSTEncoder(Module):
    def __init__(self, q_len, d_model, n_heads, d_k=None, d_v=None, d_ff=None, dropout=0.1, activation='gelu', n_layers=1):
        
        self.layers = nn.ModuleList([_TSTEncoderLayer(q_len, d_model, n_heads=n_heads, d_k=d_k, d_v=d_v, d_ff=d_ff, dropout=dropout, 
                                                            activation=activation) for i in range(n_layers)])

    def forward(self, src):
        output = src
        for mod in self.layers: output = mod(output)
        return output

In [59]:
#|export
class TST(Module):
    def __init__(self, c_in:int, c_out:int, seq_len:int, max_seq_len:Optional[int]=None, 
                 n_layers:int=3, d_model:int=128, n_heads:int=16, d_k:Optional[int]=None, d_v:Optional[int]=None,  
                 d_ff:int=256, dropout:float=0.1, act:str="gelu", fc_dropout:float=0., 
                 y_range:Optional[tuple]=None, verbose:bool=False, **kwargs):
        r"""TST (Time Series Transformer) is a Transformer that takes continuous time series as inputs.
        As mentioned in the paper, the input must be standardized by_var based on the entire training set.
        Args:
            c_in: the number of features (aka variables, dimensions, channels) in the time series dataset.
            c_out: the number of target classes.
            seq_len: number of time steps in the time series.
            max_seq_len: useful to control the temporal resolution in long time series to avoid memory issues.
            d_model: total dimension of the model (number of features created by the model)
            n_heads:  parallel attention heads.
            d_k: size of the learned linear projection of queries and keys in the MHA. Usual values: 16-512. Default: None -> (d_model/n_heads) = 32.
            d_v: size of the learned linear projection of values in the MHA. Usual values: 16-512. Default: None -> (d_model/n_heads) = 32.
            d_ff: the dimension of the feedforward network model.
            dropout: amount of residual dropout applied in the encoder.
            act: the activation function of intermediate layer, relu or gelu.
            n_layers: the number of sub-encoder-layers in the encoder.
            fc_dropout: dropout applied to the final fully connected layer.
            y_range: range of possible y values (used in regression tasks).
            kwargs: nn.Conv1d kwargs. If not {}, a nn.Conv1d with those kwargs will be applied to original time series.

        Input shape:
            bs (batch size) x nvars (aka features, variables, dimensions, channels) x seq_len (aka time steps)
        """
        self.c_out, self.seq_len = c_out, seq_len
        
        # Input encoding
        q_len = seq_len
        self.new_q_len = False
        if max_seq_len is not None and seq_len > max_seq_len: # Control temporal resolution
            self.new_q_len = True
            q_len = max_seq_len
            tr_factor = math.ceil(seq_len / q_len)
            total_padding = (tr_factor * q_len - seq_len)
            padding = (total_padding // 2, total_padding - total_padding // 2)
            self.W_P = nn.Sequential(Pad1d(padding), Conv1d(c_in, d_model, kernel_size=tr_factor, padding=0, stride=tr_factor))
            pv(f'temporal resolution modified: {seq_len} --> {q_len} time steps: kernel_size={tr_factor}, stride={tr_factor}, padding={padding}.\n', verbose)
        elif kwargs:
            self.new_q_len = True
            t = torch.rand(1, 1, seq_len)
            q_len = nn.Conv1d(1, 1, **kwargs)(t).shape[-1]
            self.W_P = nn.Conv1d(c_in, d_model, **kwargs) # Eq 2
            pv(f'Conv1d with kwargs={kwargs} applied to input to create input encodings\n', verbose)
        else:
            self.W_P = nn.Linear(c_in, d_model) # Eq 1: projection of feature vectors onto a d-dim vector space

        # Positional encoding
        W_pos = torch.empty((q_len, d_model), device=default_device())
        nn.init.uniform_(W_pos, -0.02, 0.02)
        self.W_pos = nn.Parameter(W_pos, requires_grad=True)

        # Residual dropout
        self.dropout = nn.Dropout(dropout)

        # Encoder
        self.encoder = _TSTEncoder(q_len, d_model, n_heads, d_k=d_k, d_v=d_v, d_ff=d_ff, dropout=dropout, activation=act, n_layers=n_layers)
        self.flatten = Flatten()
        
        # Head
        self.head_nf = q_len * d_model
        self.head = self.create_head(self.head_nf, c_out, act=act, fc_dropout=fc_dropout, y_range=y_range)

    def create_head(self, nf, c_out, act="gelu", fc_dropout=0., y_range=None, **kwargs):
        layers = [get_activation_fn(act), Flatten()]
        if fc_dropout: layers += [nn.Dropout(fc_dropout)]
        layers += [nn.Linear(nf, c_out)]
        if y_range: layers += [SigmoidRange(*y_range)]
        return nn.Sequential(*layers)    
        

    def forward(self, x:Tensor, mask:Optional[Tensor]=None) -> Tensor:  # x: [bs x nvars x q_len]

        # Input encoding
        if self.new_q_len: u = self.W_P(x).transpose(2,1) # Eq 2        # u: [bs x d_model x q_len] transposed to [bs x q_len x d_model]
        else: u = self.W_P(x.transpose(2,1)) # Eq 1                     # u: [bs x q_len x nvars] converted to [bs x q_len x d_model]

        # Positional encoding
        u = self.dropout(u + self.W_pos)

        # Encoder
        z = self.encoder(u)                                             # z: [bs x q_len x d_model]
        z = z.transpose(2,1).contiguous()                               # z: [bs x d_model x q_len]

        # Classification/ Regression head
        return self.head(z)                                             # output: [bs x c_out]

In [None]:
# bs = 32
# c_in = 9  # aka channels, features, variables, dimensions
# c_out = 2
# seq_len = 5000

# xb = torch.randn(bs, c_in, seq_len)

# # standardize by channel by_var based on the training set
# xb = (xb - xb.mean((0, 2), keepdim=True)) / xb.std((0, 2), keepdim=True)

# # Settings
# max_seq_len = 256
# d_model = 128
# n_heads = 16
# d_k = d_v = None # if None --> d_model // n_heads
# d_ff = 256
# dropout = 0.1
# activation = "gelu"
# n_layers = 3
# fc_dropout = 0.1
# kwargs = {}

# model = TST(c_in, c_out, seq_len, max_seq_len=max_seq_len, d_model=d_model, n_heads=n_heads,
#             d_k=d_k, d_v=d_v, d_ff=d_ff, dropout=dropout, activation=activation, n_layers=n_layers,
#             fc_dropout=fc_dropout, **kwargs)
# test_eq(model.to(xb.device)(xb).shape, [bs, c_out])
# print(f'model parameters: {count_parameters(model)}')

In [18]:
#|eval: false
#|hide
from tsai.export import get_nb_name; nb_name = get_nb_name(locals())
from tsai.imports import create_scripts; create_scripts(nb_name)

ModuleNotFoundError: ignored

In [56]:
#@title Train TST
def evaluate(net, loader, criterion, batch_size):
  accurate = 0
  total = 0

  total_err = 0
  total_loss = 0
  total_epoch = 0
  enum = 1
  wavelet = 'db4'
  level = 5
  for images, labels, note in iter(loader):
    #############################################
    #To Enable GPU Usage
    coeffs = pywt.wavedec(images, wavelet, level=level, axis=1)
    denoised_coeffs = [pywt.threshold(data=c, mode='soft', value=0.1) for c in coeffs]
    images = pywt.waverec(denoised_coeffs, wavelet, axis=1)

    images = torch.tensor(images).float()
    if use_cuda and torch.cuda.is_available():
      images = images.cuda()
      labels = labels.cuda()
      note = note.cuda()
    #############################################
    output = net(images.transpose(1,2),note)
      
    loss = criterion(output,W[labels].float())

    values, indices = torch.max(output,dim=1)
    for i in range(labels.size(0)):
      if W[labels][i][indices[i]] == 0:
        total_err += 1
    total_loss += loss.item()
    total_epoch += len(labels)
    enum = enum + 1
  err = float(total_err) / total_epoch
  loss = float(total_loss) / (enum+1)
  return err, loss

In [57]:
def train_net(net,train_data,val_data,batch_size=64,learning_rate=0.01,num_epochs=30):
  train_loader, val_loader = get_data_loader(train_data,val_data,batch_size)
  # The loss function will be Binary Cross Entropy (BCE). In this case we
  # will use the BCEWithLogitsLoss which takes unnormalized output from
  # the neural network and scalar label.
  # Optimizer will be SGD with Momentum.
  criterion = nn.BCEWithLogitsLoss()
  optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)
  # Set up some numpy arrays to store the training/test loss/erruracy
  train_err = np.zeros(num_epochs)
  train_loss = np.zeros(num_epochs)
  val_err = np.zeros(num_epochs)
  val_loss = np.zeros(num_epochs)
  wavelet = 'db4'
  level = 5
   # training
  epoch = 0 # the number of iterations
  for epoch in range(num_epochs):
    total_train_loss = 0.0
    total_train_err = 0.0
    total_epoch = 0
    enum = 0
    for images, labels, note in iter(train_loader):
      optimizer.zero_grad()
      #############################################
      #To Enable GPU Usage
      coeffs = pywt.wavedec(images, wavelet, level=level, axis=1)
      denoised_coeffs = [pywt.threshold(data=c, mode='soft', value=0.1) for c in coeffs]
      images = pywt.waverec(denoised_coeffs, wavelet, axis=1)
 
      images = torch.tensor(images).float()
      if use_cuda and torch.cuda.is_available():
        images = images.cuda()
        labels = labels.cuda()
        note = note.cuda()
      ############################################# 

      #output = net(torch.reshape(images,(batch_size,12,250)),note)
      output = net(images.transpose(1,2),note)
      loss = criterion(output,W[labels].float())
  
      loss.backward()
      optimizer.step()

      values, indices = torch.max(output,dim=1)
      for i in range(labels.size(0)):
        if W[labels][i][indices[i]] == 0:
          total_train_err += 1
      total_train_loss += loss.item()
      total_epoch += len(labels)
      enum = enum + 1
    train_err[epoch] = float(total_train_err) / total_epoch
    train_loss[epoch] = float(total_train_loss) / (enum+1)
    val_err[epoch], val_loss[epoch] = evaluate(net, val_loader, criterion, batch_size)
    print(("Epoch {}: Train Error: {}, Train loss: {} |"+
               "Validation Error: {}, Validation loss: {}").format(
                   epoch + 1,
                   train_err[epoch],
                   train_loss[epoch],
                   val_err[epoch],
                   val_loss[epoch]))
  # plotting
  plt.title("Training Curve")
  plt.plot(np.arange(1,num_epochs+1,1), train_loss, label="Train")
  plt.plot(np.arange(1,num_epochs+1,1), val_loss, label="Validation")
  plt.xlabel("Iterations")
  plt.ylabel("Loss")
  plt.legend(loc='best')
  plt.show()

  plt.title("Training Curve")
  plt.plot(np.arange(1,num_epochs+1,1), train_err, label="Train")
  plt.plot(np.arange(1,num_epochs+1,1), val_err, label="Validation")
  plt.xlabel("Iterations")
  plt.ylabel("Training Error")
  plt.legend(loc='best')
  plt.show()

In [60]:
bs = 4
c_in = 12  # aka channels, features, variables, dimensions
c_out = 5
seq_len = 250

xb = torch.randn(bs, c_in, seq_len)

# standardize by channel by_var based on the training set
xb = (xb - xb.mean((0, 2), keepdim=True)) / xb.std((0, 2), keepdim=True)

# Settings
max_seq_len = 250
d_model = 64
n_heads = 12
d_k = d_v = None # if None --> d_model // n_heads
d_ff = 64
dropout = 0.2
act = "gelu"
n_layers = 4
fc_dropout = 0.2
kwargs = {}
# kwargs = dict(kernel_size=5, padding=2)

model = TST(c_in, c_out, seq_len, max_seq_len=max_seq_len, d_model=d_model, n_heads=n_heads,
            d_k=d_k, d_v=d_v, d_ff=d_ff, dropout=dropout, act=act, n_layers=n_layers,
            fc_dropout=fc_dropout, **kwargs)
test_eq(model.to(xb.device)(xb).shape, [bs, c_out])
print(f'model parameters: {count_parameters(model)}')

model parameters: 8418250


In [44]:
import pywt

In [45]:
use_cuda = True
#{['NORM']:0,['MI']:1,['STTC']:2,['CD']:3,['HYP']:4, ['HYP','MI']:5, ['HYP','CD']:6,
 #                   ['HYP','STTC']:7,['MI','CD']:8,['MI','STTC']:9,['CD','STTC']:10,['HYP','MI','STTC']:11,
  #                  ['HYP','MI','CD']:12, ['HYP','MI','CD','STTC']:13}
W = np.array([[1,0,0,0,0],[0,1,0,0,0],[0,0,1,0,0],[0,0,0,1,0],[0,0,0,0,1],[0,1,0,0,1],[0,0,0,1,1],[0,0,1,0,1],[0,1,0,1,0],
              [0,1,1,0,0],[0,0,1,1,0],[0,1,1,0,1],[0,1,0,1,1],[0,1,1,1,0],[0,0,1,1,1],[0,1,1,1,1],[1,0,0,1,0],[1,0,1,1,0],[1,0,1,0,0],
              [1,0,0,0,1],[1,1,0,0,0],[1,1,0,0,1],[1,0,0,1,1],[1,1,0,1,1],[1,1,1,1,1],[1,0,1,0,1],[1,1,1,0,0],[1,1,1,0,1],[1,0,1,1,1],[1,1,1,1,0]])
W = torch.tensor(W)
if use_cuda and torch.cuda.is_available():
  W = W.cuda()

In [54]:
model.head

Sequential(
  (0): GELU(approximate='none')
  (1): fastai.layers.Flatten(full=False)
  (2): Dropout(p=0.2, inplace=False)
  (3): Linear(in_features=16000, out_features=5, bias=True)
)

In [61]:
#@title Instantiate & Test
use_cuda = True
if use_cuda and torch.cuda.is_available():
  model.cuda()
  print('CUDA is available!  Training on GPU ...')
else:
  print('CUDA is not available.  Training on CPU ...')

train_net(model,train_data,val_data,batch_size=2,learning_rate=0.0001,num_epochs=30)

CUDA is available!  Training on GPU ...
Epoch 1: Train Error: 0.660377358490566, Train loss: 0.6835143171844229 |Validation Error: 0.6785714285714286, Validation loss: 0.6511042909375553
Epoch 2: Train Error: 0.6448390677025527, Train loss: 0.6641221145349266 |Validation Error: 0.6339285714285714, Validation loss: 0.6314605680005304
Epoch 3: Train Error: 0.6293007769145395, Train loss: 0.6471123232514457 |Validation Error: 0.6339285714285714, Validation loss: 0.6134859395438227
Epoch 4: Train Error: 0.607103218645949, Train loss: 0.6293629577993292 |Validation Error: 0.625, Validation loss: 0.5969511917952833
Epoch 5: Train Error: 0.5926748057713651, Train loss: 0.6131278008750055 |Validation Error: 0.6696428571428571, Validation loss: 0.5847113615479963
Epoch 6: Train Error: 0.5915649278579356, Train loss: 0.6030370023398273 |Validation Error: 0.5982142857142857, Validation loss: 0.57050896513051
Epoch 7: Train Error: 0.5693673695893452, Train loss: 0.5945927736769735 |Validation Erro

KeyboardInterrupt: ignored

In [36]:
model

TST(
  (W_P): Linear(in_features=12, out_features=128, bias=True)
  (dropout): Dropout(p=0.1, inplace=False)
  (encoder): _TSTEncoder(
    (layers): ModuleList(
      (0): _TSTEncoderLayer(
        (self_attn): _MultiHeadAttention(
          (W_Q): Linear(in_features=128, out_features=120, bias=False)
          (W_K): Linear(in_features=128, out_features=120, bias=False)
          (W_V): Linear(in_features=128, out_features=120, bias=False)
          (W_O): Linear(in_features=120, out_features=128, bias=False)
        )
        (dropout_attn): Dropout(p=0.1, inplace=False)
        (batchnorm_attn): Sequential(
          (0): Transpose(1, 2)
          (1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): Transpose(1, 2)
        )
        (ff): Sequential(
          (0): Linear(in_features=128, out_features=128, bias=True)
          (1): GELU(approximate='none')
          (2): Dropout(p=0.1, inplace=False)
          (3): Linear(in_features=1