In this notebook, we will:

* See how we can represent a polygon as a tensor
* Implement a basic Transformer model
* Train our Transformer on randomly generated polygons
* Submit the results for the Cartesius competition

# 0. Setup

## 0.1 Ensure GPU is accessible

If the command below does not list a GPU, open `Settings` in the right-hand panel and set `Accelerator` to `GPU`.

Also ensure `Internet` is enabled.

In [None]:
!nvidia-smi

## 0.2 Download and install `cartesius` package

You need to set the secret `GH_PAT` to your GitHub personal access token (PAT) in order to clone the repository.

In [None]:
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()
GH_PAT = user_secrets.get_secret("GH_PAT")

!git clone https://{GH_PAT}@github.com/TeamSPWK/cartesius.git
%cd cartesius
!pip install -e . -qqq
!pip install torchtext==0.10 -qqq

## 0.3 Login with `wandb`

We use `wandb` to keep track of experiments and compare runs.

You need to set the secret `WANDB_KEY` in order to login to `wandb`. You can get your `wandb` key by visiting `https://wandb.ai/authorize`.

In [None]:
import wandb
from kaggle_secrets import UserSecretsClient

user_secrets = UserSecretsClient()
WANDB_KEY = user_secrets.get_secret("WANDB_KEY")

wandb.login(key=WANDB_KEY)

# 1. The data

We will use the `cartesius` package to randomly generate polygons, and train our model on this data.

In [None]:
from cartesius.data import PolygonDataset


train_data = PolygonDataset(
    x_range=[-50, 50],          # Range for the center of the polygon (x)
    y_range=[-50, 50],          # Range for the center of the polygon (y)
    avg_radius_range=[1, 10],   # Possible average radii: each polygon gets an average radius of either 1 or 10
    n_range=[6, 8, 11],         # Possible point counts: each polygon gets 6, 8 or 11 points
)

Let's check what a generated polygon looks like.

In [None]:
import matplotlib.pyplot as plt
from cartesius.utils import print_polygon


def disp(*polygons):
    plt.clf()
    for p in polygons:
        print_polygon(p)
    plt.gca().set_aspect(1)
    plt.axis("off")
    plt.show()


polygon, labels = train_data[0]
disp(polygon)
print(labels)

_Note: You can rerun this cell several times. A different polygon is generated each time._

# 2. The tokenizer

A polygon is defined by a list of points, and each point is defined by its XY coordinates.

We need a way to represent this as a **tensor**. We should also ensure that several samples can be **batched** into one tensor.

So let's define a `Tokenizer` class that takes care of that:

In [None]:
from shapely.geometry import Polygon
import torch
from cartesius.tokenizers import Tokenizer

PAD_COORD = (0, 0)

class TransformerTokenizer(Tokenizer):
    """Tokenizer for Transformer model.
    
    This is a basic tokenizer, used with the Transformer model. It just takes the
    coordinates of the polygon and pads them appropriately.
    
    Args:
        max_seq_len (int): Maximum sequence length. An exception will be raised if you
            try to tokenize a polygon with more points than this.
    """

    def __init__(self, max_seq_len, *args, **kwargs):  # pylint: disable=unused-argument
        super().__init__()

        self.max_seq_len = max_seq_len

    def tokenize(self, polygons):
        poly_coords = [list(p.boundary.coords) if isinstance(p, Polygon) else list(p.coords) for p in polygons]
        pad_size = max(len(p_coords) for p_coords in poly_coords)

        if pad_size > self.max_seq_len:
            raise RuntimeError(f"Polygons are too big to be tokenized ({pad_size} > {self.max_seq_len})")

        masks = []
        tokens = []
        for p_coords in poly_coords:
            m = [1 if i < len(p_coords) else 0 for i in range(pad_size)]
            p = p_coords + [PAD_COORD for _ in range(pad_size - len(p_coords))]

            masks.append(m)
            tokens.append(p)

        return {
            "polygon": torch.tensor(tokens),
            "mask": torch.tensor(masks, dtype=torch.bool),
        }
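
To make the padding and masking concrete, here is a minimal sketch of the same logic in plain Python, without `torch` or `shapely`. The helper name `pad_polygons` and the two sample shapes are illustrative assumptions, not part of `cartesius`. The shapes are written as closed rings (first point repeated at the end), which is how shapely reports a polygon's boundary.

```python
# Plain-Python sketch of the padding logic (illustrative, not the cartesius API).
PAD_COORD = (0, 0)


def pad_polygons(poly_coords):
    """Pad each coordinate list to the longest one and build a matching mask."""
    pad_size = max(len(coords) for coords in poly_coords)
    tokens, masks = [], []
    for coords in poly_coords:
        n_pad = pad_size - len(coords)
        tokens.append(list(coords) + [PAD_COORD] * n_pad)
        # True marks a real point, False marks padding
        masks.append([True] * len(coords) + [False] * n_pad)
    return tokens, masks


# A triangle and a square, each written as a closed ring
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]
square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (0.0, 0.0)]

tokens, masks = pad_polygons([triangle, square])
print(masks[0])  # [True, True, True, True, False]
print(masks[1])  # [True, True, True, True, True]
```

Since `(0, 0)` is also a valid coordinate, the padded values alone cannot tell padding from real points. The mask is what carries that information.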

# 3. The model: Transformer

Now we just have to define a model that takes the tensor representing a polygon as input, and encodes it into a polygon representation.

We will use a basic Transformer model for this, and extract the representation of the first token as the polygon representation.

In [None]:
from torch import nn


class Transformer(nn.Module):
    """Basic Transformer implementation for Cartesius.
    
    Args:
        d_model (int): Dimension for the Transformer Encoder Layer.
        max_seq_len (int): Maximum sequence length.
        n_heads (int): Number of attention heads for the Transformer Encoder Layer.
        d_ff (int): Hidden size of the FF network in the Transformer Encoder Layer.
        dropout (float): Dropout for the Transformer Encoder Layer.
        activation (str): Activation function to use in the Transformer Encoder Layer.
        n_layers (int): Number of layers in the Transformer Encoder.
    """

    def __init__(self, d_model, max_seq_len, n_heads, d_ff, dropout, activation, n_layers):
        super().__init__()

        # Embeddings
        self.coord_embeds = nn.Linear(2, d_model, bias=False)
        self.position_embeds = nn.Embedding(max_seq_len, d_model)

        # Transformer encoder
        encoder_layers = nn.TransformerEncoderLayer(d_model=d_model,
                                                    nhead=n_heads,
                                                    dim_feedforward=d_ff,
                                                    dropout=dropout,
                                                    activation=activation,
                                                    batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layers, n_layers)

    def forward(self, polygon, mask):
        batch_size, seq_len, _ = polygon.size()
        device = polygon.device

        # Embed polygon's coordinates
        coord_emb = self.coord_embeds(polygon)
        pos_emb = self.position_embeds(torch.arange(seq_len, device=device).repeat((batch_size, 1)))
        emb = coord_emb + pos_emb

        # Encode polygon
        hidden = self.encoder(emb, src_key_padding_mask=~mask)

        # Extract a representation for the whole polygon
        poly_feat = hidden[:, 0, :]
        return poly_feat
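
Two details of `forward` are easy to misread: the position indices and the `~mask` inversion. The tokenizer marks real points with `True`, whereas PyTorch's `src_key_padding_mask` expects `True` on the positions to ignore, hence the negation. The sketch below mirrors both steps with nested lists in plain Python. The helper names `position_ids` and `key_padding_mask` are hypothetical and used only for illustration.

```python
def position_ids(batch_size, seq_len):
    """Same values as torch.arange(seq_len).repeat((batch_size, 1)), as nested lists."""
    return [list(range(seq_len)) for _ in range(batch_size)]


def key_padding_mask(mask):
    """Invert the tokenizer mask: True now marks padding to be ignored."""
    return [[not is_real for is_real in row] for row in mask]


# Batch of two polygons: the first has one padded position, the second has none
mask = [[True, True, True, False], [True, True, True, True]]

print(position_ids(2, 4))      # [[0, 1, 2, 3], [0, 1, 2, 3]]
print(key_padding_mask(mask))  # [[False, False, False, True], [False, False, False, False]]
```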

# 4. Training

Now that we have defined our model, it's time to train it!

We will use the PyTorch Lightning module provided in `cartesius` to train the model.

After training, the model will run on the test set and write the predictions to a file.

In [None]:
# First, let's define our hyperparameters
PROJECT_NAME = "Cartesius"
SAVE_DIR = "results"

SEED = 1234

D_MODEL = 128
MAX_SEQ_LEN = 256
N_HEADS = 8
D_FF = 256
DROPOUT = 0
ACTIVATION = "gelu"
N_LAYERS = 3
TASK_DROPOUT = 0.1

WATCH_MODEL = True
MAX_TIME = "00:12:00:00"   # Maximum 12h of training

GRAD_CLIP = 40
AUTO_LR_FIND = True
LR = 3e-4
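
The `MAX_TIME` string follows the `DD:HH:MM:SS` layout (days, hours, minutes, seconds). As a quick way to double-check such a value, the sketch below converts it to a `timedelta`. The helper `parse_max_time` is a hypothetical name introduced here for illustration. It is not part of `cartesius` or PyTorch Lightning.

```python
from datetime import timedelta


def parse_max_time(value):
    """Convert a "DD:HH:MM:SS" string into a timedelta."""
    days, hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


budget = parse_max_time("00:12:00:00")
print(budget)                         # 12:00:00
print(budget.total_seconds() / 3600)  # 12.0
```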

In [None]:
import pytorch_lightning as pl

# Set seed
pl.seed_everything(SEED, workers=True)

In [None]:
# Define our tokenizer + model
tokenizer = TransformerTokenizer(max_seq_len=MAX_SEQ_LEN)

encoder = Transformer(
    d_model=D_MODEL,
    max_seq_len=MAX_SEQ_LEN,
    n_heads=N_HEADS,
    d_ff=D_FF,
    dropout=DROPOUT,
    activation=ACTIVATION,
    n_layers=N_LAYERS,
)

In [None]:
from cartesius.data import PolygonDataModule
from cartesius import PolygonEncoder
from cartesius.tasks import TASKS

# Create the tasks we will train our model on
tasks = {n: t(d_model=D_MODEL, task_dropout=TASK_DROPOUT) for n, t in TASKS.items()}

# Create the PL modules for training
model = PolygonEncoder(tasks, encoder, lr=LR)
data = PolygonDataModule(tasks, tokenizer)

In [None]:
from pytorch_lightning.callbacks.early_stopping import EarlyStopping
from pytorch_lightning.callbacks.model_checkpoint import ModelCheckpoint

# Create the trainer
wandb_logger = pl.loggers.WandbLogger(project=PROJECT_NAME, config={
    "seed": SEED,
    "d_model": D_MODEL,
    "max_seq_len": MAX_SEQ_LEN,
    "n_heads": N_HEADS,
    "d_ff": D_FF,
    "dropout": DROPOUT,
    "activation": ACTIVATION,
    "n_layers": N_LAYERS,
    "task_dropout": TASK_DROPOUT,
    "grad_clip": GRAD_CLIP,
    "auto_lr_find": AUTO_LR_FIND,
    "lr": LR,
})
if WATCH_MODEL:
    wandb_logger.watch(model, log="all")
mc = ModelCheckpoint(monitor="val_loss", mode="min", filename="{step}-{val_loss:.4f}")
trainer = pl.Trainer(
    gpus=1,
    logger=wandb_logger,
    callbacks=[EarlyStopping(monitor="val_loss", mode="min", verbose=True), mc],
    gradient_clip_val=GRAD_CLIP,
    max_time=MAX_TIME,
    auto_lr_find=AUTO_LR_FIND,
    default_root_dir=SAVE_DIR,
    num_sanity_val_steps=-1,
)

In [None]:
# Train!
# With `auto_lr_find` enabled, `tune` runs the learning rate finder before fitting
trainer.tune(model, datamodule=data)
trainer.fit(model, datamodule=data)

# 5. Testing & Submission

In [None]:
# Test !
ckpt = mc.best_model_path
_ = trainer.test(model, datamodule=data, ckpt_path=ckpt)

In [None]:
!cp submission.csv ../
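
Before copying the file, it can be worth checking that it was actually written and is not empty. The sketch below only counts rows and makes no assumption about the column layout of `submission.csv`. The helper `count_rows` is a hypothetical name used for illustration.

```python
import csv
from pathlib import Path


def count_rows(path):
    """Return the number of CSV rows in `path`, or 0 if the file is missing."""
    path = Path(path)
    if not path.is_file():
        return 0
    with path.open(newline="") as f:
        return sum(1 for _ in csv.reader(f))


# Prints 0 if the test step did not produce the file
print(count_rows("submission.csv"))
```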