# ECG Data Classification with MINA

## Overview


In this project, we will implement an advanced CNN+RNN model with an attention mechanism to classify ECG recordings. Specifically, this is a binary classification problem: the goal is to distinguish atrial fibrillation (AF), an abnormal heart rhythm, from normal sinus rhythm.

In [156]:
import os
import random
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F

# set seed
seed = 24
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
os.environ["PYTHONHASHSEED"] = str(seed)

# define data path
DATA_PATH = "lib/data/"

## 1 ECG Data 

We will use a subset of the data from the public [PhysioNet 2017 Challenge](https://physionet.org/content/challenge-2017/1.0.0/). More details are available at that link.

ECG recordings were sampled at 300 Hz, and for this task the data we use is split into 10-second segments.
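As a quick check of the numbers used throughout the notebook: a 10-second segment at 300 Hz contains 3000 samples (the `n` in the shapes printed below), and splitting it into windows of `T = 50` samples (the default `T` of the models in Section 2) gives `M = 60` windows, which matches the 60 rhythm features per channel. The sketch only reproduces that arithmetic; the zero-filled signal is a placeholder, not real data.

```python
import numpy as np

fs = 300          # sampling rate in Hz
duration_s = 10   # segment length in seconds
T = 50            # samples per window fed to the CNN (BeatNet default)

n = fs * duration_s   # samples per 10-second segment
M = n // T            # number of windows per segment

segment = np.zeros(n)            # placeholder signal
windows = segment.reshape(M, T)  # (M, T): one row per window

print(n, M, windows.shape)  # 3000 60 (60, 50)
```

Each window of 50 samples therefore spans 50 / 300 ≈ 0.167 s of signal.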

### 1.1 Preprocessing

Because preprocessing requires a large amount of memory and time, the data used here has already been preprocessed.

Specifically, for each raw recording (an ECG signal sampled at 300 Hz), we performed the following steps:
1. split the dataset into training/validation/test sets with a ratio of [placeholder]
2. normalize each recording to have a mean of 0 and a standard deviation of 1
3. slide a window over each recording and cut it into overlapping 10-second segments (stride = $\frac{5}{3}$ second for class 0, and $\frac{5}{30}$ second for class 1, to oversample class 1).
4. apply FIR bandpass filtering to expand the data from 1 channel to 4 channels.
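Steps 2 and 3 can be sketched as follows. This is a minimal illustration that assumes a single-channel NumPy recording at least 10 s long, with the strides above converted to samples (5/3 s × 300 Hz = 500 samples for class 0, 5/30 s × 300 Hz = 50 samples for class 1). The helper names `zscore` and `sliding_segments` are hypothetical; the actual preprocessing code is not part of this notebook, and the FIR filtering of step 4 is omitted.

```python
import numpy as np

FS = 300         # sampling rate (Hz)
WIN = 10 * FS    # 10-second window = 3000 samples

def zscore(signal):
    """Normalize one recording to mean 0 and standard deviation 1."""
    return (signal - signal.mean()) / signal.std()

def sliding_segments(signal, label):
    """Cut a recording into overlapping 10-second segments.

    Hypothetical helper: the stride depends on the label, so class 1 (AF)
    yields about 10x as many segments as class 0. Assumes len(signal) >= WIN.
    """
    stride_s = 5 / 3 if label == 0 else 5 / 30
    stride = int(round(stride_s * FS))   # 500 or 50 samples
    starts = range(0, len(signal) - WIN + 1, stride)
    return np.stack([signal[s:s + WIN] for s in starts])

rng = np.random.default_rng(0)
recording = zscore(rng.normal(size=30 * FS))   # a fake 30-second recording

print(sliding_segments(recording, 0).shape)  # (13, 3000)
print(sliding_segments(recording, 1).shape)  # (121, 3000)
```

The class-dependent stride is a simple way to oversample without duplicating identical segments: the extra class-1 segments are shifted copies of the same recording.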

The last preprocessing step is computing the knowledge features. As the figure below shows, AF signals exhibit distinctive patterns at different levels (beat, rhythm, and frequency), so we computed knowledge features at each of these levels to guide the attention mechanism. More details are in Section 2.
![Beat/Rhythm/Frequency](img/Data.png)

### 1.2 Load the Data

Because of these resource constraints, the data and knowledge features have already been computed. Let's load them below.

In [157]:
train_dict = pd.read_pickle(os.path.join(DATA_PATH, 'train.pkl'))
test_dict = pd.read_pickle(os.path.join(DATA_PATH, 'test.pkl'))

print(f"There are {len(train_dict['Y'])} training data, {len(test_dict['Y'])} test data")
print(f"Shape of X: {train_dict['X'][:, 0,:].shape} = (#channels, n)")
print(f"Shape of beat feature: {train_dict['K_beat'][:, 0, :].shape} = (#channels, n)")
print(f"Shape of rhythm feature: {train_dict['K_rhythm'][:, 0, :].shape} = (#channels, M)")
print(f"Shape of frequency feature: {train_dict['K_freq'][:, 0, :].shape} = (#channels, 1)")

There are 1696 training data, 425 test data
Shape of X: (4, 3000) = (#channels, n)
Shape of beat feature: (4, 3000) = (#channels, n)
Shape of rhythm feature: (4, 60) = (#channels, M)
Shape of frequency feature: (4, 1) = (#channels, 1)


In [4]:
# some exploration
print(train_dict.keys())

dict_keys(['X', 'Y', 'K_beat', 'K_rhythm', 'K_freq'])


In [5]:
# some exploration
for key, value in train_dict.items():
    print(key, ": ", value.shape)

X :  (4, 1696, 3000)
Y :  (1696,)
K_beat :  (4, 1696, 3000)
K_rhythm :  (4, 1696, 60)
K_freq :  (4, 1696, 1)


In [6]:
train_dict['Y'][0]

0

In [11]:
train_dict['X'][:, 1, :].shape

(4, 3000)

We need to define an `ECGDataset` class, and then define the `DataLoader` as well.

In [158]:
from torch.utils.data import Dataset

class ECGDataset(Dataset):

    def __init__(self, data_dict):

        self.data = data_dict


    def __len__(self):

        return len(self.data['Y'])


    def __getitem__(self, i):
        """
        Generates one sample of data: return the ((X, K_beat, K_rhythm, K_freq), Y) for the i-th data.
        """
        X = self.data['X'][:, i, :]
        Y = self.data['Y'][i]
        K_beat = self.data['K_beat'][:, i, :]
        K_rhythm = self.data['K_rhythm'][:, i, :]
        K_freq = self.data['K_freq'][:, i, :]
        return ((X, K_beat, K_rhythm, K_freq), Y)
        
        
from torch.utils.data import DataLoader

def load_data(dataset, batch_size=128):
    """
    Return a DataLoader instance based on a Dataset instance, with batch_size specified.
    Note that since the data has already been shuffled, we set shuffle=False
    """
    def my_collate(batch):
        """
        :param batch: this is essentially [dataset[i] for i in [...]]
        batch[i] should be ((Xi, Ki_beat, Ki_rhythm, Ki_freq), Yi)
        output: ((X, K_beat, K_rhythm, K_freq), Y)
            each output variable is a batched version of what's in the input *batch*
            For each output variable - it should be either float tensor or long tensor (for Y). If applicable, channel dim precedes batch dim
            e.g. the shape of each Xi is (# channels, n). In the output, X should be of shape (# channels, batch_size, n)
        """
        X_data, Y = zip(*batch)
        X, K_beat, K_rhythm, K_freq = zip(*X_data)

        def to_channel_first(arrays):
            # np.stack first (building a tensor from a tuple of ndarrays is slow),
            # giving (batch_size, # channels, ...); then swap to (# channels, batch_size, ...).
            # contiguous() keeps later .view() calls valid.
            t = torch.tensor(np.stack(arrays), dtype=torch.float)
            return t.transpose(0, 1).contiguous()

        X = to_channel_first(X)
        K_beat = to_channel_first(K_beat)
        K_rhythm = to_channel_first(K_rhythm)
        K_freq = to_channel_first(K_freq)
        Y = torch.tensor(np.array(Y), dtype=torch.long)
        return (X, K_beat, K_rhythm, K_freq), Y

    return DataLoader(dataset, batch_size=batch_size, shuffle=False, collate_fn=my_collate)


train_loader = load_data(ECGDataset(train_dict))
test_loader = load_data(ECGDataset(test_dict))

In [42]:
# example to understand batch in my_collate():
batch = [(('a', 'b', 'c', 'd'), 'e'), ((1, 2, 3, 4), 5)]
X_data, Y = zip(*batch)
X, K_beat, K_rhythm, K_freq = zip(*X_data)

In [44]:
list(zip(*batch))

[(('a', 'b', 'c', 'd'), (1, 2, 3, 4)), ('e', 5)]

In [43]:
Y

('e', 5)

In [45]:
list(zip(*X_data))

[('a', 1), ('b', 2), ('c', 3), ('d', 4)]

In [46]:
X

('a', 1)

In [47]:
K_beat

('b', 2)

In [48]:
K_rhythm

('c', 3)

In [49]:
K_freq

('d', 4)

In [9]:
# example to better understand the data

loader_iter = iter(train_loader)
(X, K_beat, K_rhythm, K_freq), Y = next(loader_iter)
print(X.shape)
print(Y.shape)
print(K_beat.shape)
print(K_rhythm.shape)
print(K_freq.shape)

In [65]:
print(X.shape)
X

torch.Size([4, 128, 3000])


tensor([[[ 1.2629e+00,  1.2760e+00,  1.2936e+00,  ..., -1.9800e-01,
          -4.8844e-02,  1.7489e-01],
         [ 3.7879e-01,  3.3828e-01,  3.0588e-01,  ...,  2.2322e-02,
          -1.0085e-02, -3.4389e-02],
         [-1.3664e-01, -1.5056e-01, -1.7144e-01,  ..., -4.3687e-03,
          -2.5253e-02, -3.9176e-02],
         ...,
         [-1.9771e-01, -2.1619e-01, -2.3204e-01,  ..., -1.5282e-01,
          -1.6074e-01, -1.6602e-01],
         [-2.6389e-01, -2.7908e-01, -2.9428e-01,  ...,  2.9059e-01,
           2.6781e-01,  2.7540e-01],
         [ 6.4430e-01,  5.6095e-01,  4.7760e-01,  ...,  2.3026e+00,
           1.5831e+00,  8.1101e-01]],

        [[ 6.5649e-03,  1.9695e-02,  3.2848e-02,  ..., -1.0684e-01,
          -1.0704e-01, -1.0531e-01],
         [ 1.9691e-03,  5.6763e-03,  8.9659e-03,  ..., -1.7435e-01,
          -1.7249e-01, -1.7095e-01],
         [-7.1029e-04, -2.1959e-03, -3.8469e-03,  ...,  2.0905e-02,
           2.0541e-02,  2.0000e-02],
         ...,
         [-1.0278e-03, -3

In [66]:
print(Y.shape)
Y

torch.Size([128])


tensor([0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1,
        0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1,
        0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1,
        0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0,
        1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1,
        0, 0, 0, 1, 1, 0, 0, 0])

In [67]:
print(K_beat.shape)
K_beat

torch.Size([4, 128, 3000])


tensor([[[0.0000e+00, 1.3161e-02, 1.7548e-02,  ..., 5.7031e-02,
          1.4916e-01, 2.2374e-01],
         [0.0000e+00, 4.0508e-02, 3.2406e-02,  ..., 2.4305e-02,
          3.2406e-02, 2.4305e-02],
         [0.0000e+00, 1.3923e-02, 2.0884e-02,  ..., 3.4807e-02,
          2.0884e-02, 1.3923e-02],
         ...,
         [0.0000e+00, 1.8484e-02, 1.5843e-02,  ..., 5.2811e-03,
          7.9217e-03, 5.2811e-03],
         [0.0000e+00, 1.5191e-02, 1.5191e-02,  ..., 4.5574e-02,
          2.2787e-02, 7.5957e-03],
         [0.0000e+00, 8.3353e-02, 8.3353e-02,  ..., 5.5715e-01,
          7.1947e-01, 7.7211e-01]],

        [[0.0000e+00, 1.3130e-02, 1.3153e-02,  ..., 1.2912e-03,
          2.0594e-04, 1.7347e-03],
         [0.0000e+00, 3.7072e-03, 3.2896e-03,  ..., 2.1735e-03,
          1.8561e-03, 1.5420e-03],
         [0.0000e+00, 1.4856e-03, 1.6511e-03,  ..., 7.5172e-05,
          3.6390e-04, 5.4106e-04],
         ...,
         [0.0000e+00, 2.1409e-03, 2.2971e-03,  ..., 2.6491e-03,
          2.690

In [68]:
print(K_rhythm.shape)
K_rhythm

torch.Size([4, 128, 60])


tensor([[[2.2597e-01, 2.5406e-01, 1.5397e-02,  ..., 3.1772e-01,
          1.9605e-02, 1.1986e-01],
         [1.4608e-01, 1.3917e+00, 4.0025e-01,  ..., 1.3893e-01,
          2.2768e-01, 1.4389e+00],
         [1.9493e+00, 2.7210e-01, 8.9493e-02,  ..., 1.7915e+00,
          3.0423e-01, 1.1694e-01],
         ...,
         [7.4320e-02, 6.4594e-01, 2.3840e-01,  ..., 3.7538e-02,
          6.7724e-02, 5.9460e-01],
         [1.5563e+00, 2.8111e-01, 7.0884e-01,  ..., 1.7832e+00,
          3.9169e-01, 1.0505e-01],
         [3.1824e-01, 1.1837e-02, 2.0031e-01,  ..., 2.0892e-02,
          7.1600e-02, 9.7305e-01]],

        [[1.9403e-01, 9.1282e-02, 5.8968e-02,  ..., 7.0103e-02,
          3.8834e-02, 1.2493e-02],
         [3.1508e-02, 9.9468e-02, 4.6085e-02,  ..., 3.7933e-02,
          1.1804e-02, 1.2178e-01],
         [1.2805e-01, 6.1724e-02, 8.1987e-03,  ..., 1.4678e-01,
          6.6127e-02, 4.4027e-02],
         ...,
         [2.2237e-02, 4.9779e-02, 2.1465e-02,  ..., 2.7595e-02,
          9.154

In [69]:
print(K_freq.shape)
K_freq

torch.Size([4, 128, 1])


tensor([[[5.5387e+00],
         [1.0000e+01],
         [8.2317e+00],
         [7.0163e+00],
         [1.2804e+00],
         [9.6590e+00],
         [7.0819e-01],
         [8.6440e-01],
         [6.8304e+00],
         [9.1337e+00],
         [9.9332e+00],
         [2.2602e+01],
         [9.7953e+00],
         [8.1712e+00],
         [6.8707e+00],
         [2.9273e+00],
         [9.3838e+00],
         [8.8979e+00],
         [8.8408e+00],
         [1.9380e+01],
         [8.5938e+00],
         [8.1198e+00],
         [9.4539e+00],
         [7.0049e+00],
         [5.8857e+00],
         [8.2901e+00],
         [9.8349e+00],
         [1.6851e+01],
         [9.5970e+00],
         [3.1653e+00],
         [1.2374e+01],
         [6.9046e+00],
         [7.4591e+00],
         [7.5990e+00],
         [9.3953e+00],
         [5.6439e+00],
         [9.2764e+00],
         [9.8193e+00],
         [9.4656e+00],
         [1.2144e+00],
         [1.2128e+00],
         [9.1712e+00],
         [8.3574e+00],
         [4

## 2 Model Definitions

Now, let us implement a model that combines a CNN, an RNN, and attention mechanisms. More specifically, we will implement [MINA: Multilevel Knowledge-Guided Attention for Modeling Electrocardiography Signals](https://www.ijcai.org/Proceedings/2019/0816.pdf).

### 2.1 Knowledge-guided attention
Knowledge-guided attention is an attention mechanism that injects prior knowledge (such as features designed by human experts) into the inputs used to compute the attention weights. We will first define the general KnowledgeAttn module, and use it at different levels later.

There are three steps:
* 1\. concatenate the input ($X$) and knowledge ($K$).
* 2\. pass it through a linear layer, a tanh, another linear layer, and a softmax over the $D$ positions: $attn = softmax(V^\top \tanh(W^\top \begin{bmatrix}X\\K\end{bmatrix}))$
* 3\. use attention values to sum $X$: $output = \sum_{i=1}^D attn_i x_i$ where $attn_i$ is a scalar and $x_i$ is a vector.
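A small sanity check on these three steps: if both linear layers have all-zero weights, every score $V^\top \tanh(\cdot)$ is 0, so the softmax assigns $1/D$ to each of the $D$ positions and the output reduces to the plain mean of the $x_i$. The sketch writes the steps as a standalone function (`knowledge_attention` is a hypothetical name, independent of the `KnowledgeAttn` module defined next) to make that behavior visible.

```python
import torch
import torch.nn.functional as F

def knowledge_attention(x, k, W, v):
    """Three-step knowledge-guided attention (illustrative, no module state).

    x: (batch, D, n_features), k: (batch, D, 1)
    W: (n_features + 1, attn_dim), v: (attn_dim, 1)
    """
    concat = torch.cat([x, k], dim=-1)    # step 1: (batch, D, n_features + 1)
    scores = torch.tanh(concat @ W) @ v   # step 2: (batch, D, 1)
    attn = F.softmax(scores, dim=1)       #         normalize over the D positions
    out = (attn * x).sum(dim=1)           # step 3: (batch, n_features)
    return out, attn

torch.manual_seed(0)
x = torch.randn(2, 5, 3)
k = torch.randn(2, 5, 1)
W = torch.zeros(3 + 1, 4)
v = torch.zeros(4, 1)

out, attn = knowledge_attention(x, k, W, v)
print(attn.squeeze(-1))                    # every entry is 0.2 = 1/D
print(torch.allclose(out, x.mean(dim=1)))  # True
```

With learned weights, the knowledge column $K$ shifts the scores so that positions with informative knowledge values receive more than $1/D$ of the weight.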

In [160]:
import torch.nn.functional as F

class KnowledgeAttn(nn.Module):
    def __init__(self, input_features, attn_dim):
        """
        This is the general knowledge-guided attention module.
        It will transform the input and knowledge with 2 linear layers, computes attention, and then aggregate.
        :param input_features: the number of features for each
        :param attn_dim: the number of hidden nodes in the attention mechanism
        We define the following 2 linear layers WITHOUT bias (with the names provided)
                att_W: a Linear layer of shape (input_features + n_knowledge, attn_dim)
                att_v: a Linear layer of shape (attn_dim, 1)
            init the weights using self.init() (already given)
        """
        super(KnowledgeAttn, self).__init__()
        self.input_features = input_features
        self.attn_dim = attn_dim
        self.n_knowledge = 1

        self.att_W = nn.Linear(input_features + self.n_knowledge, attn_dim, bias=False)
        self.att_v = nn.Linear(attn_dim, 1, bias=False)

        self.init()

    def init(self):
        nn.init.normal_(self.att_W.weight)
        nn.init.normal_(self.att_v.weight)

    @classmethod
    def attention_sum(cls, x, attn):
        """
        :param x: of shape (-1, D, nfeatures)
        :param attn: of shape (-1, D, 1)
        Return the weighted sum of x along the middle axis, with weights given in attn. The output should be of shape (-1, nfeatures)
        """
        return torch.sum(attn*x, dim=1)


    def forward(self, x, k):
        """
        :param x: shape of (-1, D, input_features)
        :param k: shape of (-1, D, 1)
        :return:
            out: shape of (-1, input_features), the aggregated x
            attn: shape of (-1, D, 1)
        Steps:
            concatenate the input x and knowledge k together (on the last dimension)
            pass the concatenated output through the learnable Linear transforms
                first att_W, then tanh, then att_v
                the output shape should be (-1, D, 1)
            to get attention values, apply softmax on the output of linear layer
            aggregate x using the attention values via self.attention_sum, and return
        """
        concat = torch.cat((x, k), -1)
        concat = self.att_v(torch.tanh(self.att_W(concat)))
        attn = F.softmax(concat, dim=1)
        out = self.attention_sum(x, attn)
        return out, attn

In [77]:
# example to understand the model:
input_features, attn_dim = 3000, 16
x = X
k = K_freq
print(x.shape)
print(k.shape)

torch.Size([4, 128, 3000])
torch.Size([4, 128, 1])


In [78]:
n_knowledge = 1
att_W = nn.Linear(input_features + n_knowledge, attn_dim, bias=False)
att_v = nn.Linear(attn_dim, 1, bias=False)
nn.init.normal_(att_W.weight)
nn.init.normal_(att_v.weight)

Parameter containing:
tensor([[-0.8838,  2.8390,  0.3181, -0.9566, -0.6491, -0.3402, -1.2464, -0.0099,
         -0.0447, -0.8738,  0.8336,  0.4724, -0.7647, -1.2130,  2.8262,  1.4449]],
       requires_grad=True)

In [80]:
concat = torch.cat((x, k), -1)
print(concat.shape)
concat = att_v(torch.tanh(att_W(concat)))
print(concat.shape)
attn = F.softmax(concat, dim=1)
print(attn.shape)
out = torch.sum(attn*x, dim=1)
print(out.shape)

torch.Size([4, 128, 3001])
torch.Size([4, 128, 1])
torch.Size([4, 128, 1])
torch.Size([4, 3000])


In [22]:
m = KnowledgeAttn(2, 2)
m.att_W.weight.data = torch.tensor([[0.3298,  0.7045, -0.1067],
                                    [0.9656,  0.3090,  1.2627]], requires_grad=True)
m.att_v.weight.data = torch.tensor([[-0.2368,  0.5824]], requires_grad=True)

x = torch.tensor([[[-0.6898, -0.9098], [0.0230,  0.2879], [-0.2534, -0.3190]],
                  [[ 0.5412, -0.3434], [0.0289, -0.2837], [-0.4120, -0.7858]]])
k = torch.tensor([[ 0.5469,  0.3948, -1.1430], [0.7815, -1.4787, -0.2929]]).unsqueeze(2)
out, attn = m(x, k)

tout = torch.tensor([[-0.2817, -0.2531], [0.2144, -0.4387]])
tattn = torch.tensor([[[0.3482], [0.4475], [0.2043]],
                      [[0.5696], [0.1894], [0.2410]]])

In [23]:
x = torch.tensor([[[-0.6898, -0.9098], [0.0230,  0.2879], [-0.2534, -0.3190]],
                  [[ 0.5412, -0.3434], [0.0289, -0.2837], [-0.4120, -0.7858]]])
print(x.shape)
k = torch.tensor([[ 0.5469,  0.3948, -1.1430], [0.7815, -1.4787, -0.2929]]).unsqueeze(2)
print(k.shape)

torch.Size([2, 3, 2])
torch.Size([2, 3, 1])


In [161]:
def float_tensor_equal(a, b, eps=1e-3):
    # True if the (Frobenius) norm of the difference is below eps
    return torch.norm(a - b).item() < eps

def testKnowledgeAttn():
    m = KnowledgeAttn(2, 2)
    m.att_W.weight.data = torch.tensor([[0.3298,  0.7045, -0.1067],
                                        [0.9656,  0.3090,  1.2627]], requires_grad=True)
    m.att_v.weight.data = torch.tensor([[-0.2368,  0.5824]], requires_grad=True)

    x = torch.tensor([[[-0.6898, -0.9098], [0.0230,  0.2879], [-0.2534, -0.3190]],
                      [[ 0.5412, -0.3434], [0.0289, -0.2837], [-0.4120, -0.7858]]])
    k = torch.tensor([[ 0.5469,  0.3948, -1.1430], [0.7815, -1.4787, -0.2929]]).unsqueeze(2)
    out, attn = m(x, k)

    tout = torch.tensor([[-0.2817, -0.2531], [0.2144, -0.4387]])
    tattn = torch.tensor([[[0.3482], [0.4475], [0.2043]],
                          [[0.5696], [0.1894], [0.2410]]])
    assert float_tensor_equal(attn, tattn), "The attention values are wrong"
    assert float_tensor_equal(out, tout), "output of the attention module is wrong"
    
testKnowledgeAttn()

### 2.2 MINA

We will now use the knowledge-guided attention mechanism to construct MINA. The overall structure is shown below. The steps from "Input" to "Sliding Window Segmentation" were already performed during data preprocessing; in this section we define everything above "Segment".
![MINAstructure](img/MINA_structure.png)


Here, a CNN (`BeatNet`) captures beat-level information, a Bi-LSTM (`RhythmNet`) captures rhythm-level information, and the step from $c^{(i)}$ to $p$ aggregates frequency-level information (`FreqNet`). Note that although the input has 4 channels, each channel must be handled separately, because after FIR filtering each channel corresponds to a different frequency band and carries different information. Thus, we need 4 `BeatNet`s, 4 `RhythmNet`s, and 1 `FreqNet`.
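Holding the four per-channel nets in an `nn.ModuleList` (as `FreqNet` does below) matters because PyTorch only registers submodules that are assigned as modules. Nets kept in a plain Python list are invisible to `model.parameters()`, so an optimizer built from it would never update them. The toy modules below (hypothetical `PlainList` and `WithModuleList`, with small `nn.Linear` layers standing in for `BeatNet`) show the difference in registered parameters.

```python
import torch.nn as nn

class PlainList(nn.Module):
    def __init__(self, n_channels=4):
        super().__init__()
        # A plain list: these layers are NOT registered as submodules
        self.nets = [nn.Linear(10, 2) for _ in range(n_channels)]

class WithModuleList(nn.Module):
    def __init__(self, n_channels=4):
        super().__init__()
        # nn.ModuleList registers every layer it holds
        self.nets = nn.ModuleList([nn.Linear(10, 2) for _ in range(n_channels)])

def n_params(model):
    return sum(p.numel() for p in model.parameters())

print(n_params(PlainList()))       # 0
print(n_params(WithModuleList()))  # 88 = 4 x (10*2 weights + 2 biases)
```

Registration also affects `model.to(device)` and `state_dict()`, which likewise only see registered submodules.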
 

MINA has three different knowledge guided attention mechanisms:
 - Beat Level $K_{beat}$: beat knowledge, represented by the first-order difference of the signal followed by a convolutional operation $Conv_\alpha$ on each segment
 - Rhythm Level $K_{rhythm}$: rhythm knowledge, represented by the standard deviation of each segment
 - Frequency Level $K_{freq}$: frequency knowledge, represented by the power spectral density (PSD), a standard measure of signal energy in signal processing.
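The notebook loads these features precomputed, so the exact recipe is not shown here. The following is only a plausible sketch for one channel of one 10-second segment, under these assumptions: beat knowledge as the absolute first-order difference with a leading 0 (so it keeps length n, like `K_beat`), rhythm knowledge as the standard deviation of each of the M windows of T samples, and frequency knowledge as one scalar taken from a periodogram PSD estimate (here its mean, an arbitrary illustrative choice). The learned $Conv_\alpha$ part of the beat knowledge is applied inside `BeatNet`, not here. `knowledge_features` is a hypothetical name.

```python
import numpy as np

def knowledge_features(x, fs=300, T=50):
    """Hypothetical knowledge features for one channel of one segment.

    x: 1-D array of length n (n must be divisible by T).
    Returns k_beat (n,), k_rhythm (n // T,), k_freq (1,).
    """
    n = len(x)
    # Beat level: absolute first-order difference, with a leading 0 to keep length n
    k_beat = np.abs(np.diff(x, prepend=x[0]))
    # Rhythm level: standard deviation within each of the M = n // T windows
    k_rhythm = x.reshape(n // T, T).std(axis=1)
    # Frequency level: periodogram PSD estimate, summarized by its mean (assumed summary)
    spectrum = np.fft.rfft(x - x.mean())
    psd = np.abs(spectrum) ** 2 / (fs * n)
    k_freq = np.array([psd.mean()])
    return k_beat, k_rhythm, k_freq

rng = np.random.default_rng(0)
segment = rng.normal(size=3000)
k_beat, k_rhythm, k_freq = knowledge_features(segment)
print(k_beat.shape, k_rhythm.shape, k_freq.shape)  # (3000,) (60,) (1,)
```

The output shapes match one channel of `K_beat`, `K_rhythm`, and `K_freq` as loaded in Section 1.2.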

### 2.2.1 BeatNet
For BeatNet, the attention $\alpha$ is computed by the following:
    $$\alpha = softmax(V_\alpha^\top \tanh(W_\alpha^\top \begin{bmatrix} \mathbf{L}\\\mathbf{K}_{beat} \end{bmatrix}))$$
Here, $\mathbf{L}$ is the output of the convolutional layers, and $\mathbf{K}_{beat}$ is the precomputed beat-level knowledge feature.
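With the settings used in `BeatNet` below (kernel size 32, stride 2, no padding), the length $N$ of the convolution output for one window of $T$ samples follows the `nn.Conv1d` output-length formula from the PyTorch documentation, which here reduces to $N = \lfloor (T - 32)/2 \rfloor + 1$. For the default $T = 50$ this gives $N = 10$, which is also the length of the attention vector $\alpha$ per window; for $T = 34$ it gives $N = 2$. The snippet checks the formula against an actual `nn.Conv1d` (`conv1d_out_len` is a hypothetical helper).

```python
import torch
import torch.nn as nn

def conv1d_out_len(length, kernel_size=32, stride=2, padding=0, dilation=1):
    """Output length of nn.Conv1d (floor formula from the PyTorch docs)."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

for T in (50, 34):
    conv = nn.Conv1d(1, 64, kernel_size=32, stride=2)
    N = conv(torch.zeros(3, 1, T)).shape[-1]  # output is (batch * M, channels, N)
    print(T, conv1d_out_len(T), N)            # "50 10 10", then "34 2 2"
```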

In [162]:
class BeatNet(nn.Module):
    #Attention for the CNN step/ beat level/local information
    def __init__(self, n=3000, T=50,
                 conv_out_channels=64):
        """
        :param n: size of each 10-second-data
        :param T: size of each smaller segment used to capture local information in the CNN stage
        :param conv_out_channels: also called number of filters/kernels
        We will define a network that does two things. Specifically:
            1. use one 1-D convolutional layer to capture local information, on x and k_beat (see forward())
                conv: The kernel size should be set to 32, and the number of filters should be set to *conv_out_channels*. Stride should be *conv_stride*
                conv_k: same as conv, except that it has only 1 filter instead of *conv_out_channels*
            2. an attention mechanism to aggregate the convolution outputs. Specifically:
                attn: KnowledgeAttn with input_features equaling conv_out_channels, and attn_dim=att_cnn_dim
        """
        super(BeatNet, self).__init__()
        self.n, self.M, self.T = n, int(n/T), T
        self.conv_out_channels = conv_out_channels
        self.conv_kernel_size = 32
        self.conv_stride = 2
        #Define conv and conv_k, the two Conv1d modules
        self.conv = nn.Conv1d(1, self.conv_out_channels, kernel_size=self.conv_kernel_size, stride=self.conv_stride)
        self.conv_k = nn.Conv1d(1, 1, kernel_size=self.conv_kernel_size, stride=self.conv_stride)

        self.att_cnn_dim = 8
        #Define attn, the KnowledgeAttn module
        self.attn = KnowledgeAttn(self.conv_out_channels, attn_dim=self.att_cnn_dim)


    def forward(self, x, k_beat):
        """
        :param x: shape (batch, n)
        :param k_beat: shape (batch, n)
        :return:
            out: shape (batch, M, self.conv_out_channels)
            alpha: shape (batch * M, N, 1) where N is the length of the convolution output
        Steps:
            [Given] reshape the data - convert x/k_beat of shape (batch, n) to (batch * M, 1, T), where n = MT
                If you define the data carefully, you could use torch.Tensor.view() for all reshapes in this HW
            apply convolution on x and k_beat
                pass the reshaped x through self.conv, and then ReLU
                pass the reshaped k_beat through self.conv_k, and then ReLU
            (at this step, you might need to swap axes 1 & 2 to align the dimensions depending on how you defined the layers)
            pass the conv'd x and conv'd knowledge through self.attn to get the output (*out*) and attention (*alpha*)
            [Given] reshape the output *out* to be of shape (batch, M, self.conv_out_channels)
        """
        x = x.view(-1, self.T).unsqueeze(1)
        k_beat = k_beat.view(-1, self.T).unsqueeze(1)
        x = F.relu(self.conv(x))
        k_beat = F.relu(self.conv_k(k_beat))        
        out, alpha = self.attn(x.transpose(1,2), k_beat.transpose(1,2))
        out = out.view(-1, self.M, self.conv_out_channels)
        return out, alpha

In [109]:
# try 2.2.1

x0 = X[0]
print(x0.shape)
k_beat0 = K_beat[0]
print(k_beat0.shape)
x = x0.contiguous().view(-1, 50).unsqueeze(1)
k_beat = k_beat0.contiguous().view(-1, 50).unsqueeze(1)
print(x.shape)
print(k_beat.shape)

torch.Size([128, 3000])
torch.Size([128, 3000])
torch.Size([7680, 1, 50])
torch.Size([7680, 1, 50])


In [110]:
n=3000
T=50
M = int(n/T)
conv_out_channels=64
conv_kernel_size = 32
conv_stride = 2
conv = nn.Conv1d(1, conv_out_channels, kernel_size=conv_kernel_size, stride=conv_stride)
conv_k = nn.Conv1d(1, 1, kernel_size=conv_kernel_size, stride=conv_stride)

In [111]:
x = F.relu(conv(x))
k_beat = F.relu(conv_k(k_beat))
print(x.shape)
print(k_beat.shape)

torch.Size([7680, 64, 10])
torch.Size([7680, 1, 10])


In [112]:
att_cnn_dim = 8
att = KnowledgeAttn(conv_out_channels, attn_dim=att_cnn_dim)

In [113]:
out, attn = att(x.transpose(1,2), k_beat.transpose(1,2))
print(out.shape)
print(attn.shape)

torch.Size([7680, 64])
torch.Size([7680, 10, 1])


In [114]:
out = out.view(-1, M, conv_out_channels)
print(out.shape)

torch.Size([128, 60, 64])


In [163]:
_testm = BeatNet(12 * 34, 34, 56)
assert isinstance(_testm.conv, torch.nn.Conv1d) and isinstance(_testm.conv_k, torch.nn.Conv1d), "Should use nn.Conv1d"
assert _testm.conv.bias.shape == torch.Size([56]) and _testm.conv.weight.shape == torch.Size([56,1,32]), "conv definition is incorrect"
assert _testm.conv_k.bias.shape == torch.Size([1]) and _testm.conv_k.weight.shape == torch.Size([1, 1, 32]), "conv_k definition is incorrect"
assert isinstance(_testm.attn, KnowledgeAttn), "Should use one KnowledgeAttn Module"

_out, _alpha =_testm(torch.randn(37, 12*34), torch.randn(37, 12*34))
assert _alpha.shape == torch.Size([444,2,1]), "The attention's dimension is incorrect"
assert _out.shape==torch.Size([37, 12,56]), "The output's dimension is incorrect"
del _testm, _out, _alpha

### 2.2.2 RhythmNet
For `RhythmNet`, the attention $\beta$ is computed by the following:
    $$\beta = softmax(V_\beta^\top \tanh(W_\beta^\top \begin{bmatrix} \mathbf{H}\\\mathbf{K}_{rhythm} \end{bmatrix}))$$
Here, $\mathbf{H}$ is the output of the Bi-LSTM, and $K_{rhythm}$ is the computed rhythm-level knowledge feature.
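Because the LSTM in `RhythmNet` is bidirectional, each time step of $\mathbf{H}$ concatenates a forward and a backward hidden state, so its size is `2 * rnn_hidden_size` (64 with the default of 32). This is why the attention and the fully connected layer take 64 input features. The sketch below, with sizes chosen to mirror the defaults, confirms the layout of PyTorch's `nn.LSTM` output: the first half of the last step matches the final forward state, and the second half of the first step matches the final backward state.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, M, input_size, hidden = 4, 60, 64, 32

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden,
               num_layers=1, bidirectional=True, batch_first=True)
x = torch.randn(batch, M, input_size)
H, (h_n, c_n) = lstm(x)   # H: (batch, M, 2 * hidden); h_n: (2, batch, hidden)

forward_last = H[:, -1, :hidden]  # forward direction after reading the whole sequence
backward_last = H[:, 0, hidden:]  # backward direction after reading it in reverse

print(H.shape)                                # torch.Size([4, 60, 64])
print(torch.allclose(forward_last, h_n[0]))   # True
print(torch.allclose(backward_last, h_n[1]))  # True
```

Attending over all M steps of $\mathbf{H}$, rather than using only `h_n`, lets $\beta$ weight every window with context from both directions.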

In [164]:
class RhythmNet(nn.Module):
    def __init__(self, n=3000, T=50, input_size=64, rhythm_out_size=8):
        """
        :param n: size of each 10-second-data
        :param T: size of each smaller segment used to capture local information in the CNN stage
        :param input_size: This is the same as the # of filters/kernels in the CNN part.
        :param rhythm_out_size: output size of this network
        Steps: We will define a network that does two things to handle rhythms. Specifically:
            1. use a bi-directional LSTM to process the learned local representations from the CNN part
                lstm: bidirectional, 1 layer, batch_first, and hidden_size should be set to *rnn_hidden_size*
            2. an attention mechanism to aggregate the LSTM outputs. Specifically:
                attn: KnowledgeAttn with input_features equaling lstm output, and attn_dim=att_rnn_dim
            3. output layers
                fc: a Linear layer making the output of shape (..., self.out_size)
                do: a Dropout layer with p=0.5
        """
        #input_size is the cnn_out_channels
        super(RhythmNet, self).__init__()
        self.n, self.M, self.T = n, int(n/T), T
        self.input_size = input_size

        self.rnn_hidden_size = 32
        ### define lstm: LSTM Input is of shape (batch size, M, input_size)
        self.lstm = nn.LSTM(input_size=self.input_size, num_layers=1, bidirectional=True, batch_first=True, hidden_size=self.rnn_hidden_size)

        ### Attention mechanism: define attn to be a KnowledgeAttn
        self.att_rnn_dim = 8
        self.attn = KnowledgeAttn(input_features=self.rnn_hidden_size*2, attn_dim=self.att_rnn_dim)

        ### Define the Dropout and fully connected layers (fc and do)
        self.out_size = rhythm_out_size
        self.fc = nn.Linear(in_features=self.rnn_hidden_size*2, out_features=self.out_size)
        self.do = nn.Dropout(p=0.5)
        
        
    def forward(self, x, k_rhythm):
        """
        :param x: shape (batch, M, self.input_size)
        :param k_rhythm: shape (batch, M)
        :return:
            out: shape (batch, self.out_size)
            beta: shape (batch, M, 1)
        Steps:
            reshape the k_rhythm->(batch, M, 1)
            pass the reshaped x through lstm
            pass the lstm output and knowledge through attn
            pass the result through fully connected layer - ReLU - Dropout
            denote the final output as *out*, and the attention output as *beta*
        """

        k_rhythm = k_rhythm.unsqueeze(2)
        x, _ = self.lstm(x)
        out, beta = self.attn(x, k_rhythm)
        out = self.do(F.relu(self.fc(out)))

        return out, beta

In [115]:
# try 2.2.2
x = out
print(x.shape)
k_rhythm = K_rhythm[0]
print(k_rhythm.shape)

torch.Size([128, 60, 64])
torch.Size([128, 60])


In [116]:
n=3000
T=50
input_size=64
rhythm_out_size=8
rnn_hidden_size = 32

In [117]:
k_rhythm = k_rhythm.unsqueeze(2)
print(k_rhythm.shape)

torch.Size([128, 60, 1])


In [118]:
lstm = nn.LSTM(input_size=input_size, num_layers=1, bidirectional=True, batch_first=True, hidden_size=rnn_hidden_size)
x, _ = lstm(x)
print(x.shape)

torch.Size([128, 60, 64])


In [119]:
att_rnn_dim = 8
attn = KnowledgeAttn(64, attn_dim=att_rnn_dim)
out, beta = attn(x, k_rhythm)
print(out.shape)
print(beta.shape)

torch.Size([128, 64])
torch.Size([128, 60, 1])


In [120]:
fc = nn.Linear(64, rhythm_out_size)
do = nn.Dropout(p=0.5)
out = do(F.relu(fc(out)))
print(fc.weight.shape)
print(out.shape)

torch.Size([8, 64])
torch.Size([128, 8])


In [165]:
_B, _M, _T = 17, 23, 31
_testm = RhythmNet(_M * _T, _T, 37)
assert isinstance(_testm.lstm, torch.nn.LSTM), "Should use nn.LSTM"
assert _testm.lstm.bidirectional, "LSTM should be bidirectional"
assert isinstance(_testm.attn, KnowledgeAttn), "Should use one KnowledgeAttn Module"
assert isinstance(_testm.fc, nn.Linear) and _testm.fc.weight.shape == torch.Size([8,64]), "The fully connected is incorrect"
assert isinstance(_testm.do, nn.Dropout), "Dropout layer is not defined correctly"

_out, _beta = _testm(torch.randn(_B, _M, 37), torch.randn(_B, _M))
assert _beta.shape == torch.Size([_B,_M,1]), "The attention's dimension is incorrect"
assert _out.shape==torch.Size([_B, 8]), "The output's dimension is incorrect"
del _testm, _out, _beta,  _B, _M, _T

In [103]:
# exploration
_B, _M, _T = 17, 23, 31
_testm = RhythmNet(_M * _T, _T, 37)
n=_M * _T
T=_T
input_size=37
_testm.fc.weight.shape  # expected: torch.Size([8, 64])

torch.Size([8, 64])

In [104]:
_out, _beta = _testm(torch.randn(_B, _M, 37), torch.randn(_B, _M))
assert _beta.shape == torch.Size([_B,_M,1]), "The attention's dimension is incorrect"
assert _out.shape==torch.Size([_B, 8]), "The output's dimension is incorrect"
del _testm, _out, _beta,  _B, _M, _T

In [105]:
_B, _M, _T = 17, 23, 31
_testm = RhythmNet(_M * _T, _T, 37)
n=_M * _T
T=_T
input_size=37
rnn_hidden_size = 32
lstm = nn.LSTM(input_size=input_size, num_layers=1, bidirectional=True, batch_first=True, hidden_size=rnn_hidden_size)

test_x = torch.randn(_B, _M, 37)
test_k = torch.randn(_B, _M)
print(test_x.shape)
print(test_k.shape)
test_k = test_k.unsqueeze(2)
test_x, _ = lstm(test_x)
print(test_x.shape)
print(test_k.shape)
#out = self.do(F.relu(self.fc(out)))

torch.Size([17, 23, 37])
torch.Size([17, 23])
torch.Size([17, 23, 64])
torch.Size([17, 23, 1])


### 2.2.3 FreqNet
The attention $\gamma$ is computed by the following:
    $$\gamma = softmax(V_\gamma^\top \tanh(W_\gamma^\top \begin{bmatrix} \mathbf{Q}\\\mathbf{K}_{freq} \end{bmatrix}))$$
Here, $\mathbf{Q}$ is the output of the RhythmNets, and $K_{freq}$ is the computed frequency-level knowledge feature.
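The only shape bookkeeping in `FreqNet` is at the channel level. The per-channel `RhythmNet` outputs, each of shape (batch, rhythm_out_size), are stacked on a new dimension 1 to form $\mathbf{Q}$ of shape (batch, n_channels, rhythm_out_size), while `K_freq` arrives channel-first as (n_channels, batch, 1) and must be transposed to (batch, n_channels, 1) so that it lines up with $\mathbf{Q}$ before concatenation. Random tensors stand in for real outputs in this sketch.

```python
import torch

n_channels, batch, rhythm_out_size = 4, 128, 8

# Stand-ins for the four RhythmNet outputs, one per FIR channel
per_channel = [torch.randn(batch, rhythm_out_size) for _ in range(n_channels)]
Q = torch.stack(per_channel, dim=1)          # (batch, n_channels, rhythm_out_size)

k_freq = torch.randn(n_channels, batch, 1)   # channel-first, as produced by the DataLoader
k = k_freq.transpose(0, 1)                   # (batch, n_channels, 1)

concat = torch.cat([Q, k], dim=-1)           # the input seen by the attention's att_W
print(Q.shape, k.shape, concat.shape)
# torch.Size([128, 4, 8]) torch.Size([128, 4, 1]) torch.Size([128, 4, 9])
```

Here the attention runs over D = n_channels = 4 positions, so $\gamma$ has shape (batch, 4, 1).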

In [176]:
class FreqNet(nn.Module):
    def __init__(self, n_channels=4, n=3000, T=50):
        """
        :param n_channels: number of channels (F in the paper). We will need to define this many BeatNet & RhythmNet nets.
        :param n: size of each 10-second-data
        :param T: size of each smaller segment used to capture local information in the CNN stage
        Steps: This is the main network that orchestrates the previously defined attention modules:
            1. define n_channels many BeatNet and RhythmNet modules. (Hint: use nn.ModuleList)
                beat_nets: for each beat_net, pass parameter conv_out_channels into the init()
                rhythm_nets: for each rhythm_net, pass conv_out_channels as input_size, and self.rhythm_out_size as the output size
            2. define frequency (channel) level knowledge-guided attention module
                attn: KnowledgeAttn with input_features equaling rhythm_out_size, and attn_dim=att_channel_dim
            3. output layer: a Linear layer for 2 classes output
        """
        super(FreqNet, self).__init__()
        self.n, self.M, self.T = n, int(n / T), T
        self.n_class = 2
        self.n_channels = n_channels
        self.conv_out_channels=64
        self.rhythm_out_size=8

        self.beat_nets = nn.ModuleList()
        self.rhythm_nets = nn.ModuleList()
        
        #use self.beat_nets.append() and self.rhythm_nets.append() to append 4 BeatNets/RhythmNets
        for _ in range(self.n_channels):
            self.beat_nets.append(BeatNet(n, T, self.conv_out_channels))
            self.rhythm_nets.append(RhythmNet(n, T, self.conv_out_channels, self.rhythm_out_size))


        self.att_channel_dim = 2
        ### Add the frequency attention module using KnowledgeAttn (attn)
        self.attn = KnowledgeAttn(input_features=self.rhythm_out_size, attn_dim=self.att_channel_dim)

        ### Create the fully-connected output layer (fc)
        self.fc = nn.Linear(in_features=self.rhythm_out_size, out_features=self.n_class)  


    def forward(self, x, k_beats, k_rhythms, k_freq):
        """
        We need to use the attention submodules to process data from each channel separately, and then pass the
            output through an attention on frequency for the final output

        :param x: shape (n_channels, batch, n)
        :param k_beats: (n_channels, batch, n)
        :param k_rhythms: (n_channels, batch, M)
        :param k_freq: (n_channels, batch, 1)
        :return:
            out: softmax output for each data point, shape (batch, n_class)
            gama: the attention weights over channels, shape (batch, n_channels, 1)
        Steps:
            1. [Given] pass each channel of x through the corresponding beat_net, then rhythm_net.
                We will discard the attention (alpha and beta) outputs for now
                Using ModuleList for self.beat_nets/rhythm_nets is necessary for the gradient to propagate
            2. [Given] stack the output from 1 together into a tensor of shape (batch, n_channels, rhythm_out_size)
            3. pass the result from 2 and k_freq through the attention module to get the aggregated result and *gama*.
                k_freq has shape (n_channels, batch, 1); use k_freq.permute() or transpose() to reshape it to (batch, n_channels, 1)
            4. pass aggregated result from 3 through the final fully connected layer.
            5. Apply Softmax to normalize output to a probability distribution (over 2 classes)
        """
        new_x = [None for _ in range(self.n_channels)]
        for i in range(self.n_channels):
            tx, _ = self.beat_nets[i](x[i], k_beats[i])
            new_x[i], _ = self.rhythm_nets[i](tx, k_rhythms[i])
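        x = torch.stack(new_x, 1)  # (batch, n_channels, rhythm_out_size)

        # k_freq: (n_channels, batch, 1) -> (batch, n_channels, 1) to align with x
        out, gama = self.attn(x, k_freq.transpose(0, 1))
        out = F.softmax(self.fc(out), dim=1)  # (batch, n_class) class probabilities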

        return out, gama

In [138]:
# try 2.2.3

loader_iter = iter(train_loader)
(X, K_beat, K_rhythm, K_freq), Y = next(loader_iter)
print(X.shape)
print(Y.shape)
print(K_beat.shape)
print(K_rhythm.shape)
print(K_freq.shape)

n_channels=4
n=3000
T=50
M= int(n / T)
n_class = 2
conv_out_channel=64
rhythm_out_size=8

# Build a separate module per channel; [module] * n_channels would repeat ONE shared module
beat_nets = nn.ModuleList([BeatNet(n, T, conv_out_channel) for _ in range(n_channels)])
rhythm_nets = nn.ModuleList([RhythmNet(n, T, conv_out_channel, rhythm_out_size) for _ in range(n_channels)])

new_x = [None for _ in range(n_channels)]
for i in range(n_channels):
    tx, _ = beat_nets[i](X[i], K_beat[i])
    new_x[i], _ = rhythm_nets[i](tx, K_rhythm[i])
x = torch.stack(new_x, 1)  # shape (batch, n_channels, rhythm_out_size)
print(x.shape)


torch.Size([4, 128, 3000])
torch.Size([128])
torch.Size([4, 128, 3000])
torch.Size([4, 128, 60])
torch.Size([4, 128, 1])
torch.Size([128, 4, 8])
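
A subtle pitfall when building per-channel modules: `nn.ModuleList([BeatNet(...)] * n_channels)` does not create `n_channels` independent networks. Python list repetition copies references, so every entry is the same object and all channels share one set of weights. The stdlib-only sketch below uses a hypothetical stand-in class `Net` to show the difference from calling the constructor once per channel, as the loop in `FreqNet.__init__` does.

```python
class Net:
    """Hypothetical stand-in for a per-channel module such as BeatNet."""
    def __init__(self):
        self.weight = [0.0]

n_channels = 4

# List repetition copies the reference: every entry is the SAME object.
shared = [Net()] * n_channels
# A comprehension calls the constructor once per channel: independent objects.
independent = [Net() for _ in range(n_channels)]

# "Train" channel 0 in both lists.
shared[0].weight[0] = 1.0
independent[0].weight[0] = 1.0

print(len({id(m) for m in shared}))                   # 1
print(len({id(m) for m in independent}))              # 4
print(shared[3].weight[0], independent[3].weight[0])  # 1.0 0.0
```

`nn.ModuleList` registers whatever module objects it is given, so repeated references to one module behave the same way: an update to "channel 0" is an update to every channel.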


In [139]:
# 3. pass result from 2 and k_freq through attention module, to get the aggregated result and *gama*
#     k_freq is (n_channels, batch, 1); transpose it to (batch, n_channels, 1) to match x
att_channel_dim = 2
attn = KnowledgeAttn(input_features=rhythm_out_size, attn_dim=att_channel_dim)
out, gama = attn(x, K_freq.transpose(0,1))
print(out.shape)
print(gama.shape)

torch.Size([128, 8])
torch.Size([128, 4, 1])


In [140]:
# 4. pass aggregated result from 3 through the final fully connected layer.
# 5. Apply Softmax to normalize output to a probability distribution (over 2 classes)
fc = nn.Linear(in_features=rhythm_out_size, out_features=n_class) 
out = F.softmax(fc(out), dim=1)
print(out.shape)

torch.Size([128, 2])


In [177]:
_B, _M, _T = 17, 59, 109
_testm = FreqNet(n=_M * _T, T=_T)
assert isinstance(_testm.attn, KnowledgeAttn), "Should use one KnowledgeAttn Module"
assert isinstance(_testm.fc, nn.Linear) and _testm.fc.weight.shape == torch.Size([2,8]), "The fully connected is incorrect"
assert isinstance(_testm.beat_nets, nn.ModuleList), "beat_nets has to be a ModuleList"

_out, _gamma = _testm(torch.randn(4, _B, _M * _T), torch.randn(4, _B, _M * _T), torch.randn(4, _B, _M), torch.randn(4, _B, 1))
assert _gamma.shape == torch.Size([_B, 4, 1]), "The attention's dimension is incorrect"
assert _out.shape==torch.Size([_B, 2]), "The output's dimension is incorrect"
del _testm, _out, _gamma,  _B, _M, _T

## 3 Training and Evaluation
In this part we will define the training procedures, train the model, and evaluate the model on the test set.

In [172]:
def train_model(model, train_dataloader, n_epoch=5, lr=0.003, device=None):
    """
    :param model: the FreqNet instance to train
    :param train_dataloader: the DataLoader of the training data
    :param n_epoch: number of epochs to train
    :param lr: learning rate for Adam
    :param device: torch.device to train on (defaults to CPU)
    :return:
        model: trained model
        loss_history: recorded training loss history, a list of floats (one per batch)
    Steps:
        Specify the optimizer (*optimizer*) to be optim.Adam
        Specify the loss function (*loss_func*) to be CrossEntropyLoss
        Within the loop, do the normal training procedure:
            pass the input through the model
            pass the output through loss_func to compute the loss
            zero out the accumulated gradients, call loss.backward() to backpropagate, then call optimizer.step()
    """
    import torch.optim as optim

    device = device or torch.device('cpu')
    model.train()

    optimizer = optim.Adam(model.parameters(), lr=lr)
    # Note: FreqNet.forward already returns softmax probabilities, and CrossEntropyLoss
    # applies log-softmax internally, so the outputs are normalized twice.
    loss_func = nn.CrossEntropyLoss()

    loss_history = []
    for epoch in range(n_epoch):
        curr_epoch_loss = []
        for (X, K_beat, K_rhythm, K_freq), Y in train_dataloader:
            # Move the batch to the same device as the model
            X, K_beat, K_rhythm, K_freq, Y = (t.to(device) for t in (X, K_beat, K_rhythm, K_freq, Y))
            optimizer.zero_grad()
            y_hat, _ = model(X, K_beat, K_rhythm, K_freq)
            loss = loss_func(y_hat, Y)
            loss.backward()
            optimizer.step()
            curr_epoch_loss.append(loss.item())
        print(f"epoch{epoch}: curr_epoch_loss={np.mean(curr_epoch_loss)}")
        loss_history += curr_epoch_loss
    return model, loss_history

def eval_model(model, dataloader, device=None):
    """
    :return:
        pred_all: predictions of the model on the dataloader.
            A 2D numpy float array of shape (n_data_points, 2).
        Y_test: ground-truth labels. A 1D numpy int array of shape (n_data_points,).
    Steps:
        Evaluate the model on the data in the dataloader.
        Collect all predictions and labels in the corresponding lists.
        Concatenate them into numpy arrays.
    """
    device = device or torch.device('cpu')
    model.eval()
    pred_all = []
    Y_test = []
    with torch.no_grad():  # no gradients are needed for inference
        for (X, K_beat, K_rhythm, K_freq), Y in dataloader:
            X, K_beat, K_rhythm, K_freq = (t.to(device) for t in (X, K_beat, K_rhythm, K_freq))
            y_hat, _ = model(X, K_beat, K_rhythm, K_freq)
            pred_all.append(y_hat.cpu().numpy())
            Y_test.append(Y.cpu().numpy())
    pred_all = np.concatenate(pred_all, axis=0)
    Y_test = np.concatenate(Y_test, axis=0)

    return pred_all, Y_test
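
`FreqNet.forward` returns softmax probabilities, while `nn.CrossEntropyLoss` applies log-softmax to its input internally, so the probabilities are normalized a second time. Training still makes progress, but the loss cannot approach zero. The NumPy sketch below reproduces the cross-entropy computation by hand (the helper `cross_entropy_from_logits` is hypothetical, written to mirror what the PyTorch loss computes) to show the floor for two classes: even a perfectly confident prediction `[0, 1]` gives a loss of about 0.313.

```python
import numpy as np

def cross_entropy_from_logits(z, y):
    """Mean cross-entropy that treats each row of z as logits, like nn.CrossEntropyLoss."""
    z = z - z.max(axis=1, keepdims=True)                          # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # row-wise log-softmax
    return -log_probs[np.arange(len(y)), y].mean()

y = np.array([1])

# Probabilities fed in as if they were logits (FreqNet output + CrossEntropyLoss)
loss_perfect = cross_entropy_from_logits(np.array([[0.0, 1.0]]), y)
loss_uniform = cross_entropy_from_logits(np.array([[0.5, 0.5]]), y)
# Raw logits that strongly favour class 1
loss_logits = cross_entropy_from_logits(np.array([[-10.0, 10.0]]), y)

print(round(loss_perfect, 3))  # 0.313
print(round(loss_uniform, 3))  # 0.693
print(loss_logits < 1e-6)      # True
```

With probabilities fed as logits, the two-class loss stays within roughly [0.313, 1.313]. An uninformative `[0.5, 0.5]` output gives ln 2 ≈ 0.693, close to the first-epoch training loss of about 0.694. A common alternative is to have `forward` return logits and apply softmax only when reporting probabilities; that would change the interface this notebook specifies, so the code here keeps the softmax.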

In [132]:
out

tensor([[0.5386, 0.4614],
        [0.5399, 0.4601],
        [0.5303, 0.4697],
        [0.5342, 0.4658],
        [0.5251, 0.4749],
        [0.5348, 0.4652],
        [0.5298, 0.4702],
        [0.5361, 0.4639],
        [0.5379, 0.4621],
        [0.5390, 0.4610],
        [0.5416, 0.4584],
        [0.5320, 0.4680],
        [0.5447, 0.4553],
        [0.5360, 0.4640],
        [0.5387, 0.4613],
        [0.5328, 0.4672],
        [0.5432, 0.4568],
        [0.5353, 0.4647],
        [0.5306, 0.4694],
        [0.5309, 0.4691],
        [0.5337, 0.4663],
        [0.5380, 0.4620],
        [0.5355, 0.4645],
        [0.5348, 0.4652],
        [0.5321, 0.4679],
        [0.5306, 0.4694],
        [0.5327, 0.4673],
        [0.5245, 0.4755],
        [0.5369, 0.4631],
        [0.5227, 0.4773],
        [0.5372, 0.4628],
        [0.5358, 0.4642],
        [0.5298, 0.4702],
        [0.5321, 0.4679],
        [0.5351, 0.4649],
        [0.5457, 0.4543],
        [0.5315, 0.4685],
        [0.5354, 0.4646],
        [0.5

In [133]:
out[:,1]

tensor([0.4614, 0.4601, 0.4697, 0.4658, 0.4749, 0.4652, 0.4702, 0.4639, 0.4621,
        0.4610, 0.4584, 0.4680, 0.4553, 0.4640, 0.4613, 0.4672, 0.4568, 0.4647,
        0.4694, 0.4691, 0.4663, 0.4620, 0.4645, 0.4652, 0.4679, 0.4694, 0.4673,
        0.4755, 0.4631, 0.4773, 0.4628, 0.4642, 0.4702, 0.4679, 0.4649, 0.4543,
        0.4685, 0.4646, 0.4729, 0.4640, 0.4747, 0.4666, 0.4730, 0.4572, 0.4691,
        0.4717, 0.4603, 0.4688, 0.4680, 0.4613, 0.4534, 0.4575, 0.4686, 0.4642,
        0.4674, 0.4674, 0.4734, 0.4709, 0.4681, 0.4722, 0.4639, 0.4600, 0.4537,
        0.4651, 0.4663, 0.4669, 0.4635, 0.4717, 0.4707, 0.4702, 0.4670, 0.4647,
        0.4592, 0.4638, 0.4696, 0.4679, 0.4680, 0.4590, 0.4666, 0.4653, 0.4677,
        0.4620, 0.4636, 0.4628, 0.4593, 0.4559, 0.4760, 0.4618, 0.4609, 0.4740,
        0.4657, 0.4640, 0.4651, 0.4766, 0.4704, 0.4721, 0.4632, 0.4650, 0.4580,
        0.4671, 0.4642, 0.4661, 0.4671, 0.4686, 0.4680, 0.4706, 0.4761, 0.4668,
        0.4577, 0.4745, 0.4709, 0.4758, 

In [173]:
device = torch.device('cpu')
n_epoch = 4
lr = 0.003
n_channel = 4
n_dim=3000
T=50

model = FreqNet(n_channel, n_dim, T)
model = model.to(device)

model, loss_history = train_model(model, train_loader, n_epoch=n_epoch, lr=lr, device=device)
pred, truth = eval_model(model, test_loader, device=device)
#pd.to_pickle((pred, truth), "./deliverable.pkl")

epoch0: curr_epoch_loss=0.6938453316688538
epoch1: curr_epoch_loss=0.6530133485794067
epoch2: curr_epoch_loss=0.5634876489639282
epoch3: curr_epoch_loss=0.5047913193702698


In [154]:
pred

array([[0.4815316 , 0.51846844],
       [0.3961958 , 0.6038043 ],
       [0.42789656, 0.5721035 ],
       [0.40555933, 0.59444064],
       [0.39976516, 0.6002349 ],
       [0.39584902, 0.60415095],
       [0.44617122, 0.55382884],
       [0.40487388, 0.5951261 ],
       [0.40420148, 0.59579855],
       [0.40179858, 0.59820145],
       [0.41622755, 0.5837724 ],
       [0.4386548 , 0.5613452 ],
       [0.40757766, 0.59242237],
       [0.4035367 , 0.5964633 ],
       [0.40318662, 0.5968134 ],
       [0.40344015, 0.5965599 ],
       [0.39568657, 0.60431343],
       [0.39402115, 0.60597885],
       [0.41545358, 0.5845464 ],
       [0.38942513, 0.61057484],
       [0.4073527 , 0.5926474 ],
       [0.39171174, 0.6082883 ],
       [0.3933879 , 0.6066121 ],
       [0.39287063, 0.60712934],
       [0.4291128 , 0.57088727],
       [0.38944063, 0.6105594 ],
       [0.43307915, 0.5669209 ],
       [0.39747265, 0.6025273 ],
       [0.41073215, 0.58926785],
       [0.41249874, 0.5875012 ],
       [0.

In [155]:
truth

array([0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1,
       1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1,
       0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1,
       0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0,
       1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0,
       0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0,
       0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0,
       1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0,
       1, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1,
       0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0,
       0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1,
       1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1,
       0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1,

In [174]:
def evaluate_predictions(truth, pred):
    """
    Evaluate the predictions via AUROC and F1 score.

    Each row of pred is a vector [p_0, p_1]. Both scores treat class 1 as the
    positive class, so only p_1 is used.
    (Hint: use roc_auc_score and f1_score from sklearn.metrics, and read their documentation.)
    return: auroc, f1
    """
    from sklearn.metrics import roc_auc_score, f1_score

    p1 = pred[:, 1]
    auroc = roc_auc_score(truth, p1)   # threshold-free: uses the raw class-1 probabilities
    y_pred = (p1 > 0.5).astype(int)    # hard labels at a 0.5 threshold for F1
    f1 = f1_score(truth, y_pred)

    return auroc, f1
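
To make explicit what the two metrics measure, here is a stdlib-only sketch that computes them from scratch: AUROC as the probability that a randomly chosen positive scores higher than a randomly chosen negative (ties count one half), and F1 for class 1 at the same 0.5 threshold on p_1. The helper names and toy values are hypothetical; for binary labels these definitions should agree with `roc_auc_score` and `f1_score`.

```python
def auroc_pairwise(labels, scores):
    """AUROC as P(score_pos > score_neg), ties counted as 1/2. O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def f1_at_threshold(labels, scores, threshold=0.5):
    """F1 for class 1 after predicting 1 whenever score > threshold."""
    preds = [1 if s > threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

labels = [0, 0, 1, 1, 0, 1]
p1 = [0.2, 0.6, 0.7, 0.4, 0.3, 0.9]  # toy values standing in for pred[:, 1]

auroc = auroc_pairwise(labels, p1)   # 8 of 9 positive/negative pairs ranked correctly
f1 = f1_at_threshold(labels, p1)     # tp=2, fp=1, fn=1
print(auroc, f1)
```

AUROC only depends on how the scores rank positives against negatives, so it is unaffected by the fact that the model's probabilities sit close to 0.5; F1 depends on the chosen threshold.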

In [175]:
'''
AUTOGRADER CELL. DO NOT MODIFY THIS.
'''
pred, truth = eval_model(model, test_loader, device=device)
auroc, f1 = evaluate_predictions(truth, pred)
print(f"AUROC={auroc} and F1={f1}")

assert auroc > 0.8 and f1 > 0.7, "Performance is too low {}. Something's probably off.".format((auroc, f1))

AUROC=0.9161640798226164 and F1=0.8397291196388261
