# PROTAC-STAN Demo
- This is a code demo of PROTAC-STAN for PROTAC degradation prediction. It takes about 3 minutes to run the whole pipeline.

## Setup
In this step, we set up the notebook environment and import the required modules.

In [1]:
import toml
import torch

import wandb
from data_loader import PROTACLoader, collate_fn
from model import PROTAC_STAN

## Configuration
In this step, we configure the run settings and the model settings.

[`wandb`](https://wandb.ai/) (Weights & Biases) is a platform for tracking and visualizing machine-learning experiments. Here, we set `mode="disabled"` so that the demo runs without logging anything to a wandb account.

In [2]:
from main import setup_seed
from pprint import pprint

cfg = toml.load('config_demo.toml')
model_cfg = cfg['model']
train_cfg = cfg['train']

setup_seed(model_cfg['seed'])

wandb.init(
    mode="disabled",
    project='protac-stan',
    config=cfg,
    group=f'run_bz{train_cfg["batch_size"]}_lr{train_cfg["learning_rate"]}',
)


pprint(cfg)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Running on:', device)

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


{'model': {'clf': {'class': 2, 'embed': 192, 'hidden': 64},
           'desc': 'model parameters',
           'protac': {'edge_dim': 3,
                      'embed': 64,
                      'feature': 146,
                      'hidden': 128},
           'protein': {'embed': 1280, 'hidden': 128, 'out_dim': 64},
           'seed': 21332,
           'tan': {'heads': 2, 'in_dims': [1, 1, 1]},
           'type': 'PROTAC-STAN-Demo'},
 'train': {'batch_size': 4,
           'desc': 'train parameters',
           'learning_rate': 0.0005,
           'num_epochs': 50,
           'train_ratio': 0.8}}
Running on: cuda
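
`config_demo.toml` itself is not shown in this demo. As a rough reconstruction from the dictionary printed above, the file could look like the TOML embedded below. Only the values come from the printed output; the key order and layout are assumptions. The sketch parses the text with the standard-library `tomllib` (Python 3.11+), falling back to the `toml` package used in the notebook, and rebuilds the `group` string passed to `wandb.init`.

```python
try:
    import tomllib  # standard library in Python 3.11+
except ModuleNotFoundError:
    import toml as tomllib  # the package the notebook uses; also provides loads()

# Reconstructed from the pprint output; layout and key order are assumptions.
CONFIG_TEXT = """
[model]
desc = "model parameters"
type = "PROTAC-STAN-Demo"
seed = 21332

[model.protac]
feature = 146
edge_dim = 3
hidden = 128
embed = 64

[model.protein]
embed = 1280
hidden = 128
out_dim = 64

[model.tan]
in_dims = [1, 1, 1]
heads = 2

[model.clf]
embed = 192
hidden = 64
class = 2

[train]
desc = "train parameters"
batch_size = 4
learning_rate = 0.0005
num_epochs = 50
train_ratio = 0.8
"""

cfg = tomllib.loads(CONFIG_TEXT)
train_cfg = cfg["train"]

# Same f-string as in the wandb.init call
group = f'run_bz{train_cfg["batch_size"]}_lr{train_cfg["learning_rate"]}'
print(group)  # prints run_bz4_lr0.0005
```

Each TOML table (`[model.protac]`, `[train]`, and so on) becomes a nested dictionary, which is why the notebook can index the result as `cfg['model']` and `cfg['train']`.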


## DataLoader
In this step, we specify the train/test dataloaders. The demo dataset is stored in `data/demo`.

In [3]:
import pandas as pd

df = pd.read_csv('data/PROTAC-fine/protac-fine.csv')
df = df.sample(100, random_state=47) # sample 100 for demo
df.to_csv('data/demo/demo.csv', index=False)

In [4]:
train_loader, test_loader = PROTACLoader(
    root='data/demo', 
    name='demo',
    batch_size=train_cfg['batch_size'], 
    collate_fn=collate_fn, 
    train_ratio=train_cfg['train_ratio']
)

Processing...


data/demo/processed/demo
Cleaned Dataset: 
Total size:  100
Train size:  80
Test size:  20
Dropped overlapping:
Train size:  80
Test size:  20


Done!
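
The reported sizes follow from `train_ratio = 0.8` applied to the 100 sampled rows, and the "Dropped overlapping" step left both sizes unchanged in this run. How `PROTACLoader` actually performs the split is not shown here. The sketch below assumes a plain seeded shuffle of row indices, using the hypothetical helper `split_indices`, only to illustrate the arithmetic.

```python
import random


def split_indices(n_samples: int, train_ratio: float, seed: int = 0):
    """Hypothetical helper: shuffle row indices and cut them at train_ratio."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # local RNG, leaves global state alone
    n_train = round(n_samples * train_ratio)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(100, 0.8)
print(len(train_idx), len(test_idx))  # 80 20
```

With `batch_size = 4`, these sizes correspond to 20 training batches and 5 test batches per epoch.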


## Building Model
In this step, we set up our model with configurations.

In [5]:
model = PROTAC_STAN(model_cfg)
print(model)

PROTAC_STAN(
  (protac_encoder): MolecularEncoder(
    (lin): Linear(in_features=146, out_features=64, bias=True)
    (bn): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv1): EdgedGCNConv(
    	(node_lin): Linear(in_features=64, out_features=128, bias=False)
    	(edge_lin): Linear(in_features=3, out_features=128, bias=False)
    )
    (conv2): EdgedGCNConv(
    	(node_lin): Linear(in_features=128, out_features=64, bias=False)
    	(edge_lin): Linear(in_features=3, out_features=64, bias=False)
    )
  )
  (e3_ligase_encoder): ProteinEncoder(
    (adapter): Linear(in_features=1280, out_features=128, bias=True)
    (fc): Linear(in_features=128, out_features=64, bias=True)
  )
  (poi_encoder): ProteinEncoder(
    (adapter): Linear(in_features=1280, out_features=128, bias=True)
    (fc): Linear(in_features=128, out_features=64, bias=True)
  )
  (tan): TAN(
    (x_net): FCNet(
      (fcnet): Sequential(
        (0): Dropout(p=0.2, inplace=False)
   



## Training and Testing

In [6]:
from main import train, test

model = train(
    model, train_loader, test_loader, device, 
    lr=train_cfg['learning_rate'], 
    num_epochs=train_cfg['num_epochs'], 
)

torch.save(model.state_dict(), './demo_model_state_dict.pt')  # save the model's state_dict
wandb.finish()
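
In the log for this cell, `Best model updated with roc_auc=...` appears only on epochs where the test ROC AUC exceeds the best value seen so far. The internals of `train` are not shown in this demo, so the following is only a sketch of that kind of bookkeeping, assuming a strict `>` comparison. `track_best` is a hypothetical helper, and the AUC values are the first seven reported in the log.

```python
def track_best(auc_per_epoch):
    """Hypothetical helper: return the 1-based epochs where the best AUC improved."""
    best_auc = float("-inf")
    improved_epochs = []
    for epoch, auc in enumerate(auc_per_epoch, start=1):
        if auc > best_auc:  # strict improvement only; ties keep the earlier model
            best_auc = auc
            improved_epochs.append(epoch)
            # a real training loop would snapshot the model weights here
    return improved_epochs, best_auc


# Test ROC AUC for epochs 1-7, copied from the log
aucs = [0.5000, 0.5960, 0.6616, 0.6616, 0.7071, 0.7424, 0.6414]
epochs, best = track_best(aucs)
print(epochs, best)  # [1, 2, 3, 5, 6] 0.7424
```

Epoch 4 ties the previous best (0.6616) and epoch 7 falls below it, which matches the two epochs in the log that print no update message.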

# Expected output is as follows:

Epoch: 1/50, train loss: 0.650
Best model updated with roc_auc=0.5000!
Test Accuracy: 55.00 %
Test Loss: 0.6900
Test ROC AUC: 0.5000
Test F1 Score: 0.0000
Epoch: 2/50, train loss: 0.582
Best model updated with roc_auc=0.5960!
Test Accuracy: 60.00 %
Test Loss: 0.6809
Test ROC AUC: 0.5960
Test F1 Score: 0.5556
Epoch: 3/50, train loss: 0.596
Best model updated with roc_auc=0.6616!
Test Accuracy: 65.00 %
Test Loss: 0.6540
Test ROC AUC: 0.6616
Test F1 Score: 0.6667
Epoch: 4/50, train loss: 0.558
Test Accuracy: 65.00 %
Test Loss: 0.6370
Test ROC AUC: 0.6616
Test F1 Score: 0.6667
Epoch: 5/50, train loss: 0.534
Best model updated with roc_auc=0.7071!
Test Accuracy: 70.00 %
Test Loss: 0.6060
Test ROC AUC: 0.7071
Test F1 Score: 0.7000
Epoch: 6/50, train loss: 0.578
Best model updated with roc_auc=0.7424!
Test Accuracy: 75.00 %
Test Loss: 0.6011
Test ROC AUC: 0.7424
Test F1 Score: 0.7059
Epoch: 7/50, train loss: 0.475
Test Accuracy: 65.00 %
Test Loss: 0.6008
Test ROC AUC: 0.6414
Test F1 Score: 0.