# PROTAC-STAN Demo
- This is a code demo of PROTAC-STAN for PROTAC degradation prediction. It takes about 5 minutes to run the whole pipeline.
- [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/PROTACs/PROTAC-STAN/blob/main/demo.ipynb) (then click Runtime → Run all, or press Ctrl+F9)

## Setup
In this step, we set up the notebook environment and import the required modules.

In [1]:
if 'google.colab' in str(get_ipython()):
    print('Running on CoLab')
    !pip install torch_geometric==2.5.1
    !pip install rdkit==2023.9.2
    !git clone https://github.com/PROTACs/PROTAC-STAN
    %cd PROTAC-STAN
else:
    print('Not running on CoLab')

Not running on CoLab


In [2]:
import toml
import torch

import wandb
from data_loader import PROTACLoader, collate_fn
from model import PROTAC_STAN
import warnings

warnings.filterwarnings('ignore')



## Configuration
In this step, we load the training and model settings from `config_demo.toml`.

[`wandb`](https://wandb.ai/) (Weights & Biases) is an experiment-tracking platform for logging training runs. The cell below calls `wandb.init` with `mode="online"`, which syncs the run to your wandb account; set `mode="disabled"` to skip logging.
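As a minimal sketch of switching modes without editing the cell, the mode can be chosen from the environment. `pick_wandb_mode` is a hypothetical helper and not part of PROTAC-STAN; it assumes the `WANDB_MODE` and `WANDB_API_KEY` environment variables and only considers the `"online"`, `"offline"`, and `"disabled"` modes.

```python
import os

# Modes this sketch considers for wandb.init(mode=...).
VALID_MODES = ("online", "offline", "disabled")


def pick_wandb_mode(env=None):
    """Hypothetical helper: choose a wandb mode from environment variables."""
    env = os.environ if env is None else env
    requested = env.get("WANDB_MODE", "").lower()
    if requested in VALID_MODES:
        # An explicit, recognized request wins.
        return requested
    # Assumption: log online only when an API key is available.
    return "online" if env.get("WANDB_API_KEY") else "disabled"


print(pick_wandb_mode({}))  # disabled
```

The result could then be passed as `wandb.init(mode=pick_wandb_mode(), ...)`.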

In [27]:
from main import setup_seed
from pprint import pprint

cfg = toml.load('config_demo.toml')
model_cfg = cfg['model']
train_cfg = cfg['train']

setup_seed(model_cfg['seed'])
# mode selects the wandb run mode; "online" syncs the results to your wandb account
wandb.init(
    mode="online",
    project='protac-stan',
    config=cfg,
    group=f'run_bz{train_cfg["batch_size"]}_lr{train_cfg["learning_rate"]}',
)


pprint(cfg)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Running on:', device)

{'model': {'clf': {'class': 2, 'embed': 192, 'hidden': 64},
           'desc': 'model parameters',
           'protac': {'edge_dim': 3,
                      'embed': 64,
                      'feature': 146,
                      'hidden': 128},
           'protein': {'embed': 1280, 'hidden': 128, 'out_dim': 64},
           'seed': 21332,
           'tan': {'heads': 2, 'in_dims': [1, 1, 1]},
           'type': 'PROTAC-STAN-Demo'},
 'train': {'batch_size': 4,
           'desc': 'train parameters',
           'learning_rate': 0.0005,
           'num_epochs': 5,
           'train_ratio': 0.8}}
Running on: cuda
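To make the `group` argument concrete, the sketch below rebuilds the group name from the `train` settings printed above. The dictionary is copied from that output rather than read from `config_demo.toml`, and `group_name` is an illustrative helper, not a function from the repository.

```python
# Training settings copied from the printed configuration above.
train_cfg = {
    "batch_size": 4,
    "learning_rate": 0.0005,
    "num_epochs": 5,
    "train_ratio": 0.8,
}


def group_name(cfg):
    """Illustrative helper: same f-string as the wandb.init call in the demo."""
    return f'run_bz{cfg["batch_size"]}_lr{cfg["learning_rate"]}'


print(group_name(train_cfg))  # run_bz4_lr0.0005
```

Runs that share a batch size and learning rate therefore land in the same wandb group.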


## DataLoader
In this step, we specify the train and test dataloaders. The demo dataset is stored in `data/demo`; the cell below, as run here, loads `protac_fine_with_e3uniprot` from `data/protacdb3`.

In [4]:
# import pandas as pd

# df = pd.read_csv('data/PROTAC-fine/protac-fine.csv')
# df = df.sample(100, random_state=47) # sample 100 for demo
# df.to_csv('data/demo/demo.csv', index=False)

In [24]:
train_loader, test_loader = PROTACLoader(
    root='data/protacdb3', 
    name='protac_fine_with_e3uniprot',
    batch_size=train_cfg['batch_size'], 
    collate_fn=collate_fn, 
    train_ratio=train_cfg['train_ratio']
)

Cleaned Dataset: 
Total size:  3200
Train size:  2560
Test size:  640
Dropped overlapping:
Train size:  2560
Test size:  432
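The sizes above are consistent with an 80/20 split (3200 × 0.8 = 2560 training samples) followed by removing test samples that overlap with the training set. The source does not show how `PROTACLoader` defines an overlap, so the sketch below is only an assumption: it treats two records as overlapping when they share a caller-supplied key. `split_and_drop_overlap` and the toy records are hypothetical.

```python
import random


def split_and_drop_overlap(records, train_ratio, key, seed=0):
    """Shuffle, split by ratio, then drop test records whose key is in train."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_ratio)
    train, test = shuffled[:n_train], shuffled[n_train:]
    # Assumed overlap rule: same key value as any training record.
    seen = {key(r) for r in train}
    test = [r for r in test if key(r) not in seen]
    return train, test


# Toy data: ten records, some sharing a "protac" identifier.
records = [{"protac": f"P{i % 7}", "label": i % 2} for i in range(10)]
train, test = split_and_drop_overlap(records, 0.8, key=lambda r: r["protac"])
print(len(train), len(test))
```

Only the test set shrinks under this rule, which matches the output above, where the training size stays at 2560.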


## Building Model
In this step, we build the model from the settings in `model_cfg`.

In [25]:
model = PROTAC_STAN(model_cfg)
print(model)

PROTAC_STAN(
  (protac_encoder): MolecularEncoder(
    (lin): Linear(in_features=146, out_features=64, bias=True)
    (bn): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (conv1): EdgedGCNConv(
    	(node_lin): Linear(in_features=64, out_features=128, bias=False)
    	(edge_lin): Linear(in_features=3, out_features=128, bias=False)
    )
    (conv2): EdgedGCNConv(
    	(node_lin): Linear(in_features=128, out_features=64, bias=False)
    	(edge_lin): Linear(in_features=3, out_features=64, bias=False)
    )
  )
  (e3_ligase_encoder): ProteinEncoder(
    (adapter): Linear(in_features=1280, out_features=128, bias=True)
    (fc): Linear(in_features=128, out_features=64, bias=True)
  )
  (poi_encoder): ProteinEncoder(
    (adapter): Linear(in_features=1280, out_features=128, bias=True)
    (fc): Linear(in_features=128, out_features=64, bias=True)
  )
  (tan): TAN(
    (x_net): FCNet(
      (fcnet): Sequential(
        (0): Dropout(p=0.2, inplace=False)
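The layer shapes printed above are enough to estimate the size of each encoder by hand. The sketch below counts the parameters of a fully connected layer as `in_features * out_features` weights plus `out_features` biases; `linear_params` is an illustrative helper, and only layers visible in the printout are counted.

```python
def linear_params(in_features, out_features, bias=True):
    """Parameter count of a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# ProteinEncoder: adapter (1280 -> 128) followed by fc (128 -> 64).
protein_encoder = linear_params(1280, 128) + linear_params(128, 64)

# MolecularEncoder input projection: lin (146 -> 64).
protac_lin = linear_params(146, 64)

print(protein_encoder)  # 172224
print(protac_lin)       # 9408
```

The two protein encoders (`e3_ligase_encoder` and `poi_encoder`) have the same shapes, so each contributes the same count.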
   

## Training and Testing

In [28]:
from main import train, test

model = train(
    model, train_loader, test_loader, device, 
    lr=train_cfg['learning_rate'], 
    num_epochs=train_cfg['num_epochs'], 
)

torch.save(model.state_dict(), './demo_model_state_dict.pt')  # save the model state_dict
wandb.finish()

# Expected output:

Epoch: 1/5, train loss: 0.536
Best model updated with roc_auc=0.5242!
Test Accuracy: 79.40 %
Test Loss: 0.4604
Test ROC AUC: 0.5242
Test F1 Score: 0.1010
Epoch: 2/5, train loss: 0.510
Best model updated with roc_auc=0.5725!
Test Accuracy: 76.39 %
Test Loss: 0.5015
Test ROC AUC: 0.5725
Test F1 Score: 0.3014
Epoch: 3/5, train loss: 0.505
Test Accuracy: 78.70 %
Test Loss: 0.4570
Test ROC AUC: 0.5595
Test F1 Score: 0.2459
Epoch: 4/5, train loss: 0.488
Best model updated with roc_auc=0.5768!
Test Accuracy: 78.94 %
Test Loss: 0.4610
Test ROC AUC: 0.5768
Test F1 Score: 0.2946
Epoch: 5/5, train loss: 0.475
Best model updated with roc_auc=0.5926!
Test Accuracy: 78.94 %
Test Loss: 0.4334
Test ROC AUC: 0.5926
Test F1 Score: 0.3358


Run history:
test/accuracy   █▁▆▇▇
test/epoch      ▁▃▅▆█
test/f1_score   ▁▇▅▇█
test/loss       ▄█▃▄▁
test/roc_auc    ▁▆▅▆█
train/epoch     ▁▃▅▆█
train/loss      █▅▅▃▁

Run summary:
test/accuracy   0.78935
test/epoch      5.0
test/f1_score   0.33577
test/loss       0.43335
test/roc_auc    0.59265
train/epoch     5.0
train/loss      0.47451
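The "Best model updated" lines in the log follow the test ROC AUC: an update is reported whenever an epoch beats the best value seen so far. The sketch below replays that rule on the five ROC AUC values printed above. It is an assumption about the selection rule, not the actual code of `train` in `main.py`, and `best_checkpoint_epochs` is a hypothetical name.

```python
def best_checkpoint_epochs(roc_aucs):
    """Return the epochs where the best ROC AUC improved, and the best value."""
    best = float("-inf")
    updated = []
    for epoch, auc in enumerate(roc_aucs, start=1):
        if auc > best:  # assumed rule: strict improvement triggers an update
            best = auc
            updated.append(epoch)
    return updated, best


# Test ROC AUC per epoch, copied from the log above.
aucs = [0.5242, 0.5725, 0.5595, 0.5768, 0.5926]
print(best_checkpoint_epochs(aucs))  # ([1, 2, 4, 5], 0.5926)
```

Epoch 3 is skipped because its ROC AUC (0.5595) is below the epoch 2 value, which agrees with the log.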
