
# **Antibody Drug Conjugate Tutorial**
Author : [Viren Loka](https://github.com/VirenLoka)

This tutorial is designed to run in Google Colab. To open this notebook in Colab, use the following link.


[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1An5DlWOL7SiJ4u6rFcUeD9rlONM6ralI?usp=sharing)
# **What is an Antibody-Drug Conjugate?**  

**Antibody-Drug Conjugates (ADCs)** are a class of targeted cancer therapies. They combine the target specificity of monoclonal antibodies with the cytotoxic potency of small-molecule drugs, offering a highly selective approach to destroying cancer cells while minimizing damage to healthy tissues.  

They are molecules consisting of:  

### **1. Monoclonal Antibodies (mAb):**  
- Laboratory-produced proteins that are used to target/bind to specific antigen proteins on the surface of cancer cells.  

### **2. Linker:**  
- A chemical structure that connects the antibody to the drug.  
- The linker must do more than attach the payload to the antibody: it must also stay stable in circulation so that the payload is not released prematurely, as discussed under payload toxicity below.  

### **3. Drug (Payload):**  
- A highly potent cytotoxic agent designed to kill cancer cells.  

Antibody-drug conjugates are effective because they pair the **high specificity** of a monoclonal antibody with the **potent cytotoxicity** of a drug payload.  

![Structure of ADC](assets/ADC_Structure.png)  
Image Credit: https://www.urotoday.com/

---

# **Structure of Antibody**  

Antibodies (Abs) are typically represented as **Y-shaped proteins** that bind to their cognate epitope surfaces with **high specificity and affinity**.  

### **Key Structural Features:**  
- **Composed of:**  
  - Two identical **light chains**  
  - Two identical **heavy chains**  
  - Linked by **disulfide bonds**  
- **Antigen-binding site:**  
  - Formed by the **variable regions**  
  - Contains **hypervariable loops** known as **complementarity-determining regions (CDRs)**  
  - CDRs dictate **specificity and affinity** of the antibody-antigen interaction  

![Antibody Structure](assets/AntiBody_Structure.png)  
Image Credit:  https://www.dianova.com/

---

# **Working Mechanism of an Antibody-Drug Conjugate**  

The mechanism of action of an ADC can be described in **five key steps**:  

### **1. Target Recognition**  
- The **monoclonal antibody (mAb)** component of the ADC specifically recognizes and binds to **antigens** (usually proteins) expressed on the surface of target cells, often **cancer cells**.  
- The antibody ensures high specificity by only binding to cells that express the **target antigen**.  

### **2. Endocytosis of ADC-Antigen Complex**  
- After binding to the **cell surface antigen**, the **ADC-antigen complex** undergoes **receptor-mediated endocytosis**.  
- **Endocytosis** is the process by which a cell takes in substances from its surroundings by enclosing them in a portion of its plasma membrane.  
- This internalizes the complex into the cell, forming an **early endosome**, which transports it deeper inside the cell.  
- This step allows the drug to enter the cell in a **controlled and selective** manner.  

### **3. Antigen Recovery and Trafficking**  
- The endosome **matures** and traffics the ADC-antigen complex toward the **lysosome**.  
- During this process:  
  - The antigen may be **recycled back** to the **cell membrane**  
  - Or it may be **degraded**  
  - Meanwhile, the ADC itself is processed further inside the cell.  

### **4. Lysosomal Degradation**  
- Within the **lysosome**, the **acidic environment** and **proteolytic enzymes** break down the antibody component of the ADC.  
- This degradation releases the **cytotoxic payload**, often a highly potent chemotherapeutic agent, which is conjugated to the antibody via a **linker**.  

### **5. Release of Payload and Cell Death**  
- Once released, the **cytotoxic drug molecules** (depicted as **red stars** in the image) diffuse into the **cytoplasm** and/or **nucleus**, where they interfere with essential processes such as **DNA replication** or **microtubule assembly**.  
- The **disruption of these processes** triggers the **death of the target cancer cell**.  

![ADC Mechanism](assets/ADC_Flow.png)
Image Credits: https://arxiv.org/pdf/2401.0917

---

# **Role of Payload Toxicity in ADCs**  

- **Payload toxicity** is a **critical factor** in the design and effectiveness of **Antibody-Drug Conjugates (ADCs)**.  
- Since only a **small amount of payload** can be delivered to each cancer cell, the payload must be **extremely potent** to be effective.  
- However, this extreme potency also raises the risk of **off-target toxicity** if the payload is released prematurely or if the antibody binds to non-cancerous tissues.  

### **Key Considerations for Payload Toxicity:**  
✔ **Potency is essential, but uncontrolled toxicity can be dangerous.**  
✔ **Unstable linkers** or **non-specific antibodies** can cause payload release into **healthy tissues**, leading to:  
  - **Off-target toxicity**  
  - **Adverse side effects**  
✔ The payload must be **potent enough to kill cancer cells**, with its toxicity **controlled** through:  
  - **Stable linkers**  
  - **Highly specific antibodies**  
✔ **Excessive toxicity** can limit the tolerable dose, so that too little ADC reaches the tumor to kill it effectively.  

---

### **Conclusion**  
Antibody-Drug Conjugates (ADCs) offer a powerful **targeted therapy approach** for cancer treatment by combining the specificity of **monoclonal antibodies** with the **cytotoxic potency** of small-molecule drugs. However, careful design choices, such as **stable linkers**, **specific antibodies**, and **controlled toxicity**, are necessary to maximize their effectiveness and safety.  

---
---


## **Setup**
To run DeepChem within Colab, run the following cell of installation commands.

In [None]:
!pip install deepchem
!pip install rdkit

Collecting deepchem
  Downloading deepchem-2.8.0-py3-none-any.whl.metadata (2.0 kB)
Collecting rdkit (from deepchem)
  Downloading rdkit-2024.9.6-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.0 kB)
Downloading deepchem-2.8.0-py3-none-any.whl (1.0 MB)
Downloading rdkit-2024.9.6-cp311-cp311-manylinux_2_28_x86_64.whl (34.3 MB)
Installing collected packages: rdkit, deepchem
Successfully installed deepchem-2.8.0 rdkit-2024.9.6


Import the packages you'll need.

In [None]:
import deepchem as dc
import torch
import torch.nn as nn
import torch.nn.functional as F
import pickle
import pandas as pd

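The cell below downloads the ADC dataset from Dropbox into your working directory: the `ADCDB.xlsx` spreadsheet with the input features and labels, plus three pickle files (`Antigen.pkl`, `Heavy.pkl`, `Light.pkl`) holding precomputed features for the antigen and antibody chain sequences.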

In [None]:
!wget "https://www.dropbox.com/scl/fi/bi5tcl05i5waih8loloar/ADCDB.xlsx?rlkey=kxzdsjnoxv01auz3afzwddcju&dl=1" -O ADCDB.xlsx
!wget "https://www.dropbox.com/scl/fi/gsoqheepmce1t3wx4esxx/Antigen.pkl?rlkey=xkequ6khj7zuoc4sgjydgnkzy&st=g4npi065&dl=1" -O Antigen.pkl
!wget "https://www.dropbox.com/scl/fi/z5hm5975jpmdq6oyfikl9/Heavy.pkl?rlkey=7co5qvugzpm3bol4ivgck2vrz&st=3lhpy6qn&dl=1" -O Heavy.pkl
!wget "https://www.dropbox.com/scl/fi/juyc5pmaetgfiqaysi910/Light.pkl?rlkey=mecvvdwtf7145k5vu9z9n8523&st=v18ldzsp&dl=1" -O Light.pkl


Load the downloaded xlsx file into a pandas DataFrame.

In [None]:
df = pd.read_excel('ADCDB.xlsx')

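Preview the first five rows and list the columns of the DataFrame. In a notebook only the last expression of a cell is displayed, so the output below shows just the column index.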


In [None]:
df.head(5)
df.columns

Index(['index', 'ADC ID', 'ADC Name', 'Antibody Name',
       'Antibody Heavy Chain Sequence', 'Antibody Light Chain Sequence',
       'Antigen Sequence', 'Payload Isosmiles', 'Linker Isosmiles', 'DAR',
       'label（10nm）', 'label（100nm）', 'label（1nm）', 'label（1000nm）',
       'DAR_val'],
      dtype='object')
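
Note that the label column names in the output above use full-width parentheses (`（` and `）`) rather than ASCII ones, so typing `label(100nm)` by hand will not match. As a minimal sketch, assuming you want ASCII-friendly aliases, the standard-library `unicodedata` module can normalize such names. The helper `ascii_alias` is a hypothetical name introduced here, and the later cells in this tutorial keep using the original column names.

```python
import unicodedata

def ascii_alias(column_name):
    # NFKC normalization maps full-width characters such as '（' to their
    # ASCII counterparts such as '('.
    return unicodedata.normalize("NFKC", column_name)

# Column names copied from the df.columns output above
label_columns = ["label（10nm）", "label（100nm）", "label（1nm）", "label（1000nm）"]

# Map ASCII alias -> original column name, so that
# df[aliases["label(100nm)"]] selects the right column.
aliases = {ascii_alias(name): name for name in label_columns}
print(sorted(aliases))
```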

Extract the columns needed as model inputs from the DataFrame and convert them to NumPy arrays.

In [None]:
sml_list1 = df["Payload Isosmiles"].to_numpy()
sml_list2 = df["Linker Isosmiles"].to_numpy()
t1 = df["Antibody Heavy Chain Sequence"].to_numpy()
t2 = df["Antibody Light Chain Sequence"].to_numpy()
t3 = df["Antigen Sequence"].to_numpy()
t4 = df["DAR_val"].to_numpy()

In [None]:
print(sml_list1.shape, sml_list2.shape, t1.shape, t2.shape, t3.shape, t4.shape)

(435,) (435,) (435,) (435,) (435,) (435,)


## **Custom Artificial Neural Network (ANN) Model**

This **CustomANN** model is a simple **feedforward neural network (FNN)** built using PyTorch. It takes a batch of **(40, 1280)** shaped input tensors and produces a single **40-dimensional output vector**.

### **Model Architecture**
- **Input Shape:** `(40, 1280)`  
- **Flattening Layer:**  
  - The input tensor is reshaped into a **1D vector of size 51200 (40 × 1280)** for processing.  
- **Fully Connected Layers:**  
  - `fc1`: First dense layer (**51200 → 512**)  
  - **ReLU Activation** is applied for non-linearity.  
  - `fc2`: Second dense layer (**512 → 40**)  
- **Output Activation:**  
  - A **Sigmoid function** normalizes the final output values between **0 and 1**.  
- **Averaging Operation:**  
  - The final output is **averaged across the batch dimension** using `x.mean(dim=0)`, producing a **single 40-dimensional output vector**.
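
To get a feel for the size of this architecture, the layer dimensions listed above can be turned into a parameter count. This is a plain-arithmetic sketch (no PyTorch needed); the variable names are introduced here for illustration only.

```python
# Layer sizes taken from the architecture description above
n_items, n_features = 40, 1280
hidden_size, output_size = 512, 40

flat_size = n_items * n_features                      # 51200 inputs to fc1
fc1_params = flat_size * hidden_size + hidden_size    # weights + biases
fc2_params = hidden_size * output_size + output_size  # weights + biases
total_params = fc1_params + fc2_params

print(flat_size, fc1_params, fc2_params, total_params)
# 51200 26214912 20520 26235432
```

Nearly all of the roughly 26 million parameters sit in `fc1`, which is worth keeping in mind given that only 40 examples are used for training later on.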


In [None]:
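class CustomANN(nn.Module):
    def __init__(self, input_size=(40, 1280), hidden_size=512, output_size=40):
        super().__init__()

        # Collapse each (40, 1280) item into one 51200-long vector
        self.flatten = nn.Flatten()

        # Derive the flattened width from input_size instead of hard-coding it
        self.fc1 = nn.Linear(input_size[0] * input_size[1], hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.flatten(x)    # (batch, 40, 1280) -> (batch, 51200)

        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = torch.sigmoid(x)   # squash each output into (0, 1)
        return x.mean(dim=0)   # average over the batch -> (40,)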


## **cover_dict: Convert Pickle Dictionary to PyTorch Tensors**

The `cover_dict` function processes a **pickle file containing a dictionary**, converts its values into **PyTorch tensors**, and reassigns numerical indices as new keys.

### **Function Overview**
- **Loads a Pickle File:** Reads a serialized dictionary from the specified `path`.
- **Converts Values to PyTorch Tensors:** Ensures compatibility with PyTorch-based operations.
- **Reassigns Keys:** The original dictionary keys are replaced with **sequential integer indices**.
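
The re-keying step can be illustrated without PyTorch. The sketch below is a simplified stand-in, not the tutorial's `cover_dict`: it skips the tensor conversion, and `rekey_pickled_dict` and the toy dictionary are hypothetical names and data. It writes a small dictionary to a temporary pickle file and shows that the original keys are replaced by `0, 1, 2, ...` in insertion order.

```python
import os
import pickle
import tempfile

def rekey_pickled_dict(path):
    # Load the pickled dictionary from disk
    with open(path, "rb") as fh:
        data = pickle.load(fh)
    # Replace the original keys with sequential integers (insertion order)
    return {i: value for i, value in enumerate(data.values())}

# Toy data standing in for sequence -> feature-vector entries
toy = {"seq_A": [0.1, 0.2], "seq_B": [0.3, 0.4]}

with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.pkl")
    with open(toy_path, "wb") as fh:
        pickle.dump(toy, fh)
    rekeyed = rekey_pickled_dict(toy_path)

print(rekeyed)  # {0: [0.1, 0.2], 1: [0.3, 0.4]}
```

Because the new keys depend only on insertion order, this approach assumes the entries in each pickle file are stored in the same order as the rows they are later paired with.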


In [None]:
def cover_dict(path):
    # Load the pickled dictionary from disk
    with open(path, 'rb') as file:
        data = pickle.load(file)
    # Convert every value to a PyTorch tensor
    tensor_dict = {key: torch.tensor(value) for key, value in data.items()}
    # Replace the original keys with sequential integers (insertion order)
    new_data = {i: value for i, value in enumerate(tensor_dict.values())}
    return new_data

Load the pickle files and convert each one into an index-keyed dictionary of tensors using the `cover_dict()` function.

In [None]:
heavy_dict = cover_dict('Heavy.pkl')
light_dict = cover_dict('Light.pkl')
antigen_dict = cover_dict("Antigen.pkl")

## **SMILES to Molecular Fingerprint Encoding**
This script converts **SMILES representations of molecules** into **fixed-size numerical encodings** using **Morgan fingerprints**. These encodings are then stored as **PyTorch tensors** for machine learning applications.

---


### **Steps in the Process**
1. **Convert SMILES to Molecular Representation**  
   - `Chem.MolFromSmiles(smiles)`: Converts a SMILES string into an RDKit molecule object.
2. **Generate Morgan Fingerprints**  
   - `AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1280)`:  
     - Uses **radius = 2** for capturing molecular neighborhoods.  
     - Produces a **fixed-length (1280-bit) fingerprint** for each molecule.  
3. **Store Encodings as Tensors**  
   - Converts the fingerprints into **NumPy arrays**.  
   - Wraps them in **PyTorch tensors** for ML compatibility.  
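
`Chem.MolFromSmiles` returns `None` when a SMILES string cannot be parsed, and the fingerprint call then fails. The sketch below shows one way to guard against that, assuming an all-zero vector is an acceptable stand-in for unparseable entries (that choice is an assumption, not part of the original pipeline). It uses RDKit's `rdFingerprintGenerator` interface as an alternative to `GetMorganFingerprintAsBitVect`; `smiles_to_encoding_safe` and `N_BITS` are hypothetical names introduced here.

```python
import numpy as np
from rdkit import Chem
from rdkit.Chem import rdFingerprintGenerator

N_BITS = 1280  # fingerprint length used throughout this tutorial

# Morgan fingerprint generator with radius 2 and a fixed bit length
morgan_gen = rdFingerprintGenerator.GetMorganGenerator(radius=2, fpSize=N_BITS)

def smiles_to_encoding_safe(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # Assumption: represent unparseable SMILES as an all-zero vector
        return np.zeros(N_BITS, dtype=np.uint8)
    return morgan_gen.GetFingerprintAsNumPy(mol)

valid = smiles_to_encoding_safe("CCO")     # ethanol
invalid = smiles_to_encoding_safe("C1CC")  # unclosed ring, cannot be parsed
print(valid.shape, int(valid.sum()) > 0, int(invalid.sum()))
```

In a real pipeline it may be preferable to drop or log the failing rows instead of zero-filling them, so that invalid molecules do not silently enter training.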

---

### **Tensor Shape**
- The final tensor, `sml_list1_enc`, has a shape of **(40, 1280)**:
  - `40`: Number of molecules processed.
  - `1280`: Length of the Morgan fingerprint for each molecule.

---

In [None]:
from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np

def smiles_to_encoding(smiles):
    # Parse the SMILES string into an RDKit molecule
    mol = Chem.MolFromSmiles(smiles)
    # 1280-bit Morgan fingerprint with radius 2
    morgan_fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=1280)
    return np.array(morgan_fp)

# Encode the first 40 payloads and the first 40 linkers
sml_list1_enc = torch.tensor(np.array([smiles_to_encoding(s) for s in sml_list1[:40]]))
sml_list2_enc = torch.tensor(np.array([smiles_to_encoding(s) for s in sml_list2[:40]]))

sml_list1_enc.shape



torch.Size([40, 1280])

## **Converting Dictionary Values to NumPy Arrays**


### **Code Breakdown**
1. **Extract Values from Dictionaries**  
   - `[value for key, value in heavy_dict.items()]`:  
     - Iterates through the dictionary and collects **only the values**.
   - Similar steps apply to `light_dict` and `antigen_dict`.

2. **Convert Lists to NumPy Arrays**  
   - `np.array(...)` wraps the extracted values into **NumPy arrays** for numerical computation.
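
Because the dictionary values are already PyTorch tensors, they could also be combined without a NumPy round trip. The following is a minimal sketch of that alternative, assuming every value has the same shape; `toy_dict` is a hypothetical stand-in for `heavy_dict`.

```python
import torch

# Toy stand-in for heavy_dict: integer keys mapping to equal-length tensors
toy_dict = {i: torch.full((4,), float(i)) for i in range(3)}

# torch.stack adds a new leading dimension: three (4,) tensors -> (3, 4)
stacked = torch.stack(list(toy_dict.values()))
print(stacked.shape)  # torch.Size([3, 4])
```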



In [None]:
t1 = np.array([value for key,value in heavy_dict.items()])

t2 = np.array([value for key,value in light_dict.items()])
t3 = np.array([value for key,value in antigen_dict.items()])

## **Preparing and Reshaping Input Data for Model Training**
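This code converts the feature blocks to tensors, **concatenates** them, and reshapes the result into the input tensor for the neural network.

- **Converts arrays to PyTorch tensors** for efficient computation.
- **Uses six blocks of identical shape** `(40, 1280)`: payload fingerprints, linker fingerprints, heavy chain, light chain, antigen, and `t4`.
- **Fills `t4` with random values** from `torch.randn`; the `DAR_val` column loaded earlier is not used in this step.
- **Reshapes the final input tensor** into the required format `(6, 40, 1280)`.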
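
To see what concatenating and then reshaping does to the data layout, here is a small sketch with toy sizes. The shapes `(4, 5)` and the three blocks are assumptions for illustration, standing in for the six `(40, 1280)` blocks.

```python
import torch

# Three toy feature blocks of shape (4, 5), each filled with its own index
blocks = [torch.full((4, 5), float(k)) for k in range(3)]

via_cat = torch.cat(blocks).view(3, 4, 5)  # (12, 5) regrouped into (3, 4, 5)
via_stack = torch.stack(blocks)            # new leading dimension directly

print(torch.equal(via_cat, via_stack))     # True
print(via_cat[1].unique())                 # tensor([1.])
```

In other words, `inputs[k]` is the k-th feature block, so the leading dimension of size 6 indexes feature types rather than ADC samples, and the model's `mean(dim=0)` averages over those six blocks.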


In [None]:
# The SMILES encodings are already tensors; cast them to float instead of re-wrapping
sml_list1 = sml_list1_enc.float()
sml_list2 = sml_list2_enc.float()
t1 = torch.tensor(t1[:40])
t2 = torch.tensor(t2[:40])
t3 = torch.tensor(t3[:40])
t4 = torch.randn(40, 1280)  # random placeholder block
labels = torch.tensor(df['label（100nm）'].to_numpy()[:40])
print(labels.shape)
print(sml_list1.shape, sml_list2.shape, t1.shape, t2.shape, t3.shape, t4.shape)
# Concatenate along dim 0 -> (240, 1280), then regroup into six (40, 1280) blocks
inputs = torch.cat([sml_list1, sml_list2, t1, t2, t3, t4])
inputs = inputs.view(6, 40, 1280)
inputs.shape

torch.Size([40])
torch.Size([40, 1280]) torch.Size([40, 1280]) torch.Size([40, 1280]) torch.Size([40, 1280]) torch.Size([40, 1280]) torch.Size([40, 1280])



torch.Size([6, 40, 1280])

In [None]:
model = CustomANN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # PyTorch Adam over the model's parameters

In [None]:
labels

tensor([0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1,
        0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1])
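
Before training, it helps to know how balanced these 40 labels are, since that sets the accuracy a trivial classifier would reach. A minimal standard-library sketch, with the label values copied by hand from the tensor printed above (`labels_list` and `majority_baseline` are names introduced here):

```python
from collections import Counter

# Labels copied from the tensor printed above
labels_list = [0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1,
               0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1]

counts = Counter(labels_list)
# Accuracy of always predicting the most common class
majority_baseline = max(counts.values()) / len(labels_list)

print(counts[1], counts[0], majority_baseline)  # 26 14 0.65
```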

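## **Model Training**

The loop below runs 10 epochs of full-batch training, minimizing the binary cross-entropy between the 40 sigmoid outputs and the 40 binary labels. Because each output unit corresponds to one fixed training example, the network can simply memorize the labels, so a loss that drops to zero here shows that the pipeline runs, not that the model generalizes.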

In [None]:
model.train()
num_epochs = 10
for epoch in range(num_epochs):
  optimizer.zero_grad()
  outputs = model(inputs)

  loss = F.binary_cross_entropy(outputs, labels.float())
  loss.backward()
  optimizer.step()
  print(f"Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}")

Epoch 1/10, Loss: 0.6924586296081543
Epoch 2/10, Loss: 1.9352261272290302e-15
Epoch 3/10, Loss: 3.386609027082028e-37
Epoch 4/10, Loss: 0.0
Epoch 5/10, Loss: 0.0
Epoch 6/10, Loss: 0.0
Epoch 7/10, Loss: 0.0
Epoch 8/10, Loss: 0.0
Epoch 9/10, Loss: 0.0
Epoch 10/10, Loss: 0.0
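
The first-epoch loss of about 0.692 is close to ln 2 ≈ 0.693, which is the binary cross-entropy of a model that predicts 0.5 for every example. A plain-Python sketch of the loss formula makes this concrete; it is not PyTorch's implementation, and `binary_cross_entropy` and the toy targets are hypothetical.

```python
import math

def binary_cross_entropy(probs, targets):
    # Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples
    total = 0.0
    for p, y in zip(probs, targets):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)

targets = [0, 0, 1, 1]
# A model with no information outputs 0.5 everywhere
uninformed = binary_cross_entropy([0.5] * 4, targets)
print(round(uninformed, 4))  # 0.6931
```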
