In [334]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [335]:
import warnings
warnings.filterwarnings("ignore")

In [336]:
import os
import numpy as np
import torch
import torch.nn as nn
import pytorch_lightning as pl
from pytorch_lightning import seed_everything
seed_everything(seed=7)

Global seed set to 7


7

# Full pipeline for a new WeaSEL problem
**This tutorial walks you through how to
make Weasel run on *your own***:
 - ***Data***
 - ***Set of Labeling functions***
 - ***End-model***

by completing the full pipeline on a synthetic example.

# Simulating the data...
In this tutorial, the data and LFs are completely synthetic and have no semantics attached.
This notebook is meant as an illustration of how you would go about using Weasel
on a new problem.

Therefore, we will just create random data features. We will, however, assume that the data are image-like and thus
call for a custom end-model (a CNN).

**Note:**
***For the sake of clarity, this notebook spells out the most important arguments/parameters
that need to (or can) be defined. In practice, though, you will likely want to write an analogous YAML config
(e.g. starting from [this template](../configs/template.yaml)) for your
problem, like [this one](configs/profTeacher_full.yaml), and use Hydra to read and modify it and to instantiate
all modules, as in [this example notebook](1_bias_bios.ipynb).***

In [337]:
n, n_evaluation = 10_000, 1_000  # number of training and test samples
n_channels = 3  #  e.g. could be RGB
height = width = 28  # grid resolution

X_train = np.random.randn(n, n_channels, height, width)
X_test = np.random.randn(n_evaluation, n_channels, height, width)

As this is a WeaSEL problem, we ***do not know any ground truth training labels***.
For evaluation purposes we will usually want to have access to a small gold-labeled test set.
To simulate this part of the pipeline we will also generate such labels,
assuming that there are $C=3$ classes.
Note though, that this is not needed to train Weasel.

In [338]:
C = 3
possible_labels = list(range(C))
Y_test = np.random.choice(possible_labels, size=n_evaluation)

As in the whole library, we assume that you have a label matrix $L \in \{-1, 0, \ldots, C-1\}^{n\times m}$
available. Here:
 - $n$ is the number of training samples
 - $m$ is the number of labeling functions (LF)/heuristics/rules
 - $C$ is the number of classes
 - $L_{i,j} = -1$ means that LF $j$ abstained from labeling example $i$.
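
To make this convention concrete, here is a minimal sketch with a tiny, hand-written label matrix (the values are hypothetical and only serve as an illustration). It computes how often each LF votes and how many LFs vote on each example:

```python
import numpy as np

ABSTAIN = -1
# Hypothetical label matrix: n=4 examples (rows), m=3 LFs (columns), C=3 classes.
L = np.array([
    [ 0, -1,  0],
    [-1, -1, -1],
    [ 2,  1, -1],
    [-1,  1,  1],
])

voted = L != ABSTAIN
coverage_per_lf = voted.mean(axis=0)   # fraction of examples each LF labels
votes_per_example = voted.sum(axis=1)  # number of non-abstaining LFs per example

print(coverage_per_lf)    # [0.5 0.5 0.5]
print(votes_per_example)  # [2 0 2 2]
```

The second example receives no vote at all, which is a common situation when LFs abstain often.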

If your problem is not yet at this stage, e.g. no LFs have been defined or applied to the training set,
you'll have to start with that. The [Snorkel library](https://github.com/snorkel-team/snorkel)
is a neat library for this step of the pipeline.

Now, we will create 10 synthetic LFs (without semantics), assuming
that each LF abstains 85% of the time and otherwise votes for one of the three classes uniformly at random.

Of course, in a real setting the LFs will depend on the data and will most likely not be independent of each other, as they are below.

In [339]:
m = 10
ABSTAIN = -1

possible_LF_outputs = [ABSTAIN] + list(range(C))
label_matrix = np.empty((n, m))
for LF in range(m):
    label_matrix[:, LF] = np.random.choice(
        possible_LF_outputs, size=n, p=[0.85] + [(1 - 0.85)*1/C for _ in range(C)]
    )

label_matrix.shape

(10000, 10)
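
As a quick sanity check of such a synthetic label matrix, one could compare the empirical abstain rate with the intended 85%, and look at how many examples get no vote from any LF. The sketch below regenerates a matrix with the same settings (using a separate, seeded `np.random.default_rng`, which is an assumption of this example and not what the cell above uses). With 10 independent LFs, the expected fraction of fully uncovered examples is $0.85^{10} \approx 0.197$.

```python
import numpy as np

rng = np.random.default_rng(7)  # separate generator, only for this check
n, m, C, ABSTAIN, p_abstain = 10_000, 10, 3, -1, 0.85

# Same output distribution as above: abstain with prob. 0.85, else a uniform class vote.
probs = [p_abstain] + [(1 - p_abstain) / C] * C
L = rng.choice([ABSTAIN] + list(range(C)), size=(n, m), p=probs)

abstain_rate = (L == ABSTAIN).mean()            # should be close to 0.85
uncovered = (L == ABSTAIN).all(axis=1).mean()   # examples on which every LF abstains
expected_uncovered = p_abstain ** m             # 0.85 ** 10, roughly 0.197

print(f"abstain rate: {abstain_rate:.3f}")
print(f"uncovered examples: {uncovered:.3f} (expected about {expected_uncovered:.3f})")
```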

# From data to DataModule
Having checked off the raw data components, we now have to
map them to a format usable by a ``pl.Trainer``. We recommend either subclassing our
[AbstractWeaselDataModule](../weasel/datamodules/base_datamodule.py) (a specific ``pl.LightningDataModule``
suitable for training Weasel, see [ProfTeacher_DataModule](datamodules/ProfTeacher_datamodule.py))
or simply passing the raw components to ``BasicWeaselDataModule`` as below.

In [340]:
from weasel.datamodules.base_datamodule import BasicWeaselDataModule
weasel_datamodule = BasicWeaselDataModule(
    label_matrix=label_matrix,
    X_train=X_train,
    X_test=X_test,
    Y_test=Y_test,
    batch_size=256,
    val_test_split=(200, 800)  # 200 validation, 800 test points will be split from (X_test, Y_test)
)
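
Conceptually, `val_test_split=(200, 800)` carves a validation and a test subset out of the gold-labeled evaluation data. The following is only a plain NumPy sketch of such a split, under the assumption that the data are shuffled once and then cut; it is not the library's implementation, and the names `X_eval`/`Y_eval` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
X_eval = rng.standard_normal((1_000, 3, 28, 28))  # stand-in for the gold-labeled features
Y_eval = rng.integers(0, 3, size=1_000)           # stand-in for the gold labels

n_val, n_test = 200, 800
perm = rng.permutation(len(X_eval))  # shuffle once, then cut (an assumption)
val_idx, test_idx = perm[:n_val], perm[n_val:n_val + n_test]

X_val, Y_val = X_eval[val_idx], Y_eval[val_idx]
X_test, Y_test = X_eval[test_idx], Y_eval[test_idx]

print(X_val.shape, X_test.shape)  # (200, 3, 28, 28) (800, 3, 28, 28)
```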
        

## Defining an End-model

Having set up the data part, you'll now have to choose your favorite neural net as the end-model <br>
 (the one that you want to use as ``predictions = end-model(X)`` eventually).

Here, since we are simulating image-like features, we will be using a toy CNN.
 To do so, just subclass the ``DownstreamBaseModel`` abstract class like you would any nn.Module (and override any methods as needed) as below.
 You can analogously define any other fancy neural net as the end-model.

 While you only have to override the ``__init__`` and ``forward`` methods, ``DownstreamBaseModel``
 is actually an appropriately defined ``LightningModule`` that allows you to easily [run baselines without Weasel](../weasel/models/downstream_models/README.md).


In [355]:
from weasel.models.downstream_models.base_model import DownstreamBaseModel
class MyCNN(DownstreamBaseModel):
    def __init__(self, in_channels,
                 hidden_dim,
                 conv_layers: int,
                 n_classes: int,
                 kernel_size=(3, 3),
                 *args, **kwargs):
        super().__init__()
        # Good practice:
        self.out_dim = n_classes
        self.example_input_array = torch.randn((1, in_channels, height, width))

        cnn_modules = []

        in_dim = in_channels
        for layer in range(conv_layers):
            cnn_modules += [
                nn.Conv2d(in_dim, hidden_dim, kernel_size),
                nn.GELU(),
                nn.MaxPool2d(2, 2)
            ]
            in_dim = hidden_dim

        self.convs = nn.Sequential(*cnn_modules)

        self.flattened_dim = torch.flatten(
            self.convs(self.example_input_array), start_dim=1
        ).shape[1]

        mlp_modules = [
            nn.Linear(self.flattened_dim, int(self.flattened_dim/2)),
            nn.GELU()
        ]
        mlp_modules += [nn.Linear(int(self.flattened_dim/2), n_classes)]
        self.readout = nn.Sequential(*mlp_modules)

    def forward(self, X: torch.Tensor, readout=True):
        conv_out = self.convs(X)
        flattened = torch.flatten(conv_out, start_dim=1)
        if not readout:
            return flattened
        logits = self.readout(flattened)
        return logits # We predict the raw logits in forward!
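
Since ``forward`` returns raw logits, class probabilities and hard predictions have to be derived from them afterwards. A minimal NumPy sketch of that step, using made-up logits for two examples and three classes (the helper `softmax` is written out here for illustration):

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row maximum keeps exp() numerically stable."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Hypothetical logits for 2 examples and C=3 classes.
logits = np.array([[2.0, 0.0, -1.0],
                   [0.0, 0.0,  0.0]])
probs = softmax(logits)       # each row sums to 1
preds = probs.argmax(axis=1)  # hard class predictions

print(probs.round(3))
print(preds)
```

Identical logits, as in the second row, give a uniform distribution over the classes.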


In [342]:
cnn_end_model = MyCNN(in_channels=n_channels, hidden_dim=16, conv_layers=2, n_classes=C)
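
As a sanity check on this configuration, the flattened dimension and the parameter count can be worked out by hand. The sketch below assumes the defaults used in ``MyCNN`` above (3x3 kernels, stride 1, no padding, 2x2 max-pooling) and a 28x28 input; the helper functions are hypothetical and only mirror the shape arithmetic.

```python
def conv_out(size, kernel):
    """Spatial size after a convolution with stride 1 and no padding."""
    return size - kernel + 1

def pool_out(size, window=2):
    """Spatial size after max-pooling with window = stride = 2 (odd sizes are floored)."""
    return size // window

in_channels, hidden_dim, conv_layers, n_classes, kernel, side = 3, 16, 2, 3, 3, 28

n_params, in_dim = 0, in_channels
for _ in range(conv_layers):
    n_params += in_dim * hidden_dim * kernel * kernel + hidden_dim  # weights + biases
    side = pool_out(conv_out(side, kernel))  # 28 -> 26 -> 13, then 13 -> 11 -> 5
    in_dim = hidden_dim

flattened_dim = hidden_dim * side * side   # 16 * 5 * 5
half = flattened_dim // 2
n_params += flattened_dim * half + half    # first readout layer
n_params += half * n_classes + n_classes   # output layer

print(side, flattened_dim, n_params)  # 5 400 83571
```

The total of 83,571 parameters agrees with the 83.6 K that the training summary reports for the end-model.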


# Coupling end-model into Weasel
Now that we have the data and end-model defined, we just need to pass them
to Weasel as follows:

In [343]:
from weasel.models import Weasel
weasel = Weasel(
    end_model=cnn_end_model,
    num_LFs=m,
    n_classes=C,
    encoder={'hidden_dims': [32, 10]},
    optim_encoder={'name': 'adam', 'lr': 1e-4},
    optim_end_model=['_target=torch.optim.Adam', 'lr=1e-4']  # different way of getting the same optim with Hydra
)


## Training Weasel and end-model

Before fitting Weasel and the end-model, we now just need to instantiate a pl.Trainer instance
(we will checkpoint the best model w.r.t. F1-macro performance on a small validation set that is split off the test set, although
this of course makes little sense in this simulated example).

In [344]:
from pytorch_lightning.callbacks import ModelCheckpoint
checkpoint_callback = ModelCheckpoint(monitor="Val/f1_macro", mode="max")

trainer = pl.Trainer(
    gpus=0,  # >= 1 to use GPU(s)
    max_epochs=3,  # kept small, since this is only for illustration
    logger=False,
    deterministic=True,
    callbacks=[checkpoint_callback]
)

trainer.fit(model=weasel, datamodule=weasel_datamodule)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name          | Type       | Params
---------------------------------------------
0 | end_model     | MyCNN      | 83.6 K
1 | encoder       | MLPEncoder | 1.1 K 
2 | accuracy_func | Softmax    | 0     
---------------------------------------------
84.7 K    Trainable params
0         Non-trainable params
84.7 K    Total params
0.339     Total estimated model params size (MB)
Global seed set to 7


Data split sizes for training, validation, testing: 10000 200 800



## Evaluation

Now that Weasel has finished training, we can evaluate on the held-out test set to see how well Weasel did.
That is, we evaluate how good the ``predictions = end-model(X_test)`` are with respect to our gold test labels
*Y_test*.
<br>Note that the LFs, *L*, and Weasel are no longer needed after training, i.e. for prediction.
Indeed, *we will retrieve the best CNN end-model from the saved Weasel checkpoint*, which can now be used
for predictions on the image-like features only.

In [345]:
# The below will give the same test results
# test_stats = trainer.test(datamodule=weasel_datamodule, ckpt_path='best')

final_cnn_model = weasel.load_from_checkpoint(
    trainer.checkpoint_callback.best_model_path
).end_model
# Test the stand-alone, fully-trained CNN model (the metrics have of course no meaning in this simulated example):
test_stats = pl.Trainer().test(model=final_cnn_model, test_dataloaders=weasel_datamodule.test_dataloader())


GPU available: False, used: False
TPU available: False, using: 0 TPU cores


--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'Test/accuracy': 0.35875,
 'Test/brier': 0.33304962510873465,
 'Test/f1_macro': 0.24905422953533293,
 'Test/f1_micro': 0.35875}
--------------------------------------------------------------------------------
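
In the results above, `Test/accuracy` and `Test/f1_micro` coincide, which is expected for single-label multi-class prediction. The following sketch recomputes both F1 variants from scratch on a small made-up example to show why; the function `f1_scores` is a hypothetical helper, not the metric code used by the library.

```python
import numpy as np

def f1_scores(y_true, y_pred, n_classes):
    """Return (macro F1, micro F1) for single-label multi-class predictions."""
    per_class = []
    tp_all = fp_all = fn_all = 0
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom > 0 else 0.0)
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
    micro = 2 * tp_all / (2 * tp_all + fp_all + fn_all)
    return float(np.mean(per_class)), float(micro)

# Made-up labels: class 1 is never predicted correctly.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 2, 1])

macro, micro = f1_scores(y_true, y_pred, n_classes=3)
accuracy = float(np.mean(y_true == y_pred))
print(round(macro, 4), micro, accuracy)  # 0.4444 0.5 0.5
```

Every misclassified example counts once as a false positive and once as a false negative, so micro F1 reduces to accuracy. Macro F1 averages per-class scores and therefore drops when some classes are predicted poorly.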


##  Using features as auxiliary input to the encoder
Weasel can also feed some form of the data features to the encoder as auxiliary input, in addition to the LF outputs.
In our paper we found this to consistently lead to slightly better performance.
We'll now briefly show how to make this happen for our simulated problem and CNN.
For that, we first need to define which features the encoder will use. The default, and currently only supported, encoder
is an MLP: without auxiliary features it computes ``MLP(L)``, while with auxiliary features
they are concatenated to $L$ before being passed to the MLP.
Therefore, the ``get_encoder_features(.)`` method has to return flattened/one-dimensional features for each example.
One option is to simply flatten the image-like input data across all non-batch dimensions.
Another option is to return an intermediate representation. Here we return the flattened representation that sits between
the convolutions and the readout MLP of our CNN:

In [356]:
class MyCNN(MyCNN):
    def get_encoder_features(self, X):
        return self(X, readout=False).detach()
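
To illustrate the concatenation described above, here is a shape-only NumPy sketch. It assumes a flattened feature dimension of 400 (16 channels on a 5x5 grid, which is what two conv/pool blocks produce for a 28x28 input) and uses the raw label matrix as a stand-in; how the library encodes $L$ internally is not shown here.

```python
import numpy as np

batch_size, m, feature_dim = 4, 10, 400  # feature_dim = 16 * 5 * 5 (assumed, see lead-in)

L_batch = np.full((batch_size, m), -1.0)        # stand-in LF outputs (all abstain)
features = np.zeros((batch_size, feature_dim))  # stand-in for get_encoder_features(X)

# One flat input vector per example: LF outputs followed by the auxiliary features.
encoder_input = np.concatenate([L_batch, features], axis=1)
print(encoder_input.shape)  # (4, 410)
```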

Now we just have to set the ``use_aux_input_for_encoder=True`` flag, and the encoder will automatically
include the output of ``get_encoder_features(.)`` in its input.

In [358]:
cnn_end_model2 = MyCNN(in_channels=n_channels, hidden_dim=16, conv_layers=2, n_classes=C)

weasel2 = Weasel(
    end_model=cnn_end_model2,
    num_LFs=m,
    n_classes=C,
    use_aux_input_for_encoder=True,
    encoder={'hidden_dims': [32, 10]},
    optim_encoder={'name': 'adam', 'lr': 1e-4},
    optim_end_model={'name': 'adam', 'lr': 1e-4}
)
trainer = pl.Trainer(
    gpus=0,  # >= 1 to use GPU(s)
    max_epochs=3,  # kept small, since this is only for illustration
    logger=False,
    deterministic=True,
    callbacks=[checkpoint_callback]
)

trainer.fit(model=weasel2, datamodule=weasel_datamodule)

GPU available: False, used: False
TPU available: False, using: 0 TPU cores

  | Name          | Type       | Params
---------------------------------------------
0 | end_model     | MyCNN      | 83.6 K
1 | encoder       | MLPEncoder | 13.9 K
2 | accuracy_func | Softmax    | 0     
---------------------------------------------
97.5 K    Trainable params
0         Non-trainable params
97.5 K    Total params
0.390     Total estimated model params size (MB)
Global seed set to 7



Note how the number of parameters in the encoder is much higher than before (13.9 K instead of 1.1 K), since the
input now includes the whole flattened hidden CNN representation.
<br>Funnily, it actually improves performance on this simulated example (although with purely random data and labels, such a difference is just noise)! :D
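
The size of this increase can be checked with a line of arithmetic. The sketch assumes that the auxiliary features enter the first encoder layer, a linear layer with 32 units (the first entry of `hidden_dims`), and that the flattened CNN representation has 16 * 5 * 5 = 400 entries.

```python
flattened_dim = 16 * 5 * 5  # channels * height * width after the two conv/pool blocks
first_hidden = 32           # first entry of hidden_dims=[32, 10]

# Each extra input feature adds one weight per unit of the first hidden layer.
extra_weights = flattened_dim * first_hidden
print(extra_weights)  # 12800
```

This matches the difference between the two training summaries: 13.9 K - 1.1 K = 12.8 K.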


In [360]:
test_stats = trainer.test(datamodule=weasel_datamodule, ckpt_path='best')


--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'Test/accuracy': 0.37875,
 'Test/brier': 0.3327239181449149,
 'Test/f1_macro': 0.25437041512531283,
 'Test/f1_micro': 0.37875}
--------------------------------------------------------------------------------
