# 🔬 Peptide Fragmentation + ADMET-AI Calculations

This notebook demonstrates a streamlined workflow for **fragmenting peptides** using RDKit and running **ADMET profiling** with [Neurosnap](https://neurosnap.ai)’s API suite. Starting from an amino acid sequence, it generates **overlapping fragments** with a configurable sliding window, converts each fragment to SMILES, and evaluates the fragments using the **ADMET-AI** model.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NeurosnapInc/neurosnap/blob/main/example_notebooks/peptide_fragmentation_admet.ipynb)

---

## 📝 Instructions

1. **Set up your environment**
   Ensure your Python environment includes the required packages listed below.

2. **Get your API Key**
   Generate a secure API key at: [neurosnap.ai/overview?view=api](https://neurosnap.ai/overview?view=api)
   > ⚠️ **Important:** Never share your API key with untrusted notebooks or third parties.
   >
   > A typical key looks like this:
   > **fd9f9be1974aa6f1609430c1e477926d4884188d5f752b5071457e10440b816773b92c0f1116442e54364734fd141537fcb6ce1619ad6825679f86511f38a80e**

3. **Input an amino acid sequence for your peptide**
   This peptide will be fragmented using a customizable sliding window.

4. **Configure other settings**

5. **Run the notebook**

6. **Review the results**
   Outputs are formatted for downstream use, with sequence names, ranks, and scores.

7. **Cleanup (Optional)**
   You may delete your API key after the run.
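
The sliding window from step 3 can be pictured with a few lines of plain Python. This is a minimal sketch rather than the notebook's implementation: `window_subsequences` is a hypothetical helper name, and it only slices the string without any RDKit conversion. A sequence of length `n` with window size `w` yields `n - w + 1` overlapping subsequences.

```python
def window_subsequences(sequence: str, window_size: int = 4) -> list[str]:
  """Return every contiguous subsequence of length `window_size` (hypothetical helper)."""
  if window_size < 1:
    raise ValueError("window_size must be at least 1")
  # A sequence shorter than the window produces an empty range, so no fragments.
  return [sequence[i : i + window_size] for i in range(len(sequence) - window_size + 1)]


subseqs = window_subsequences("ACDEFGHIKLMNPQRSTVWY", window_size=4)
print(len(subseqs))  # 17
print(subseqs[:4])  # ['ACDE', 'CDEF', 'DEFG', 'EFGH']
```

With the 20-residue example sequence and a window of 4, this gives the same 17 fragments that the notebook reports in its log output.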

---

## 📦 Dependencies

Run the **🔧 Install Dependencies** cell or manually install using pip:
```bash
pip install git+https://github.com/NeurosnapInc/neurosnap.git
```

---

## 👏 Credits

Written by Keaun Amani

In [None]:
# @title 🔧 Install Dependencies
# @markdown Run this code cell to install all the dependencies for this notebook.
# @markdown **NOTE:** This cell only needs to be executed once.
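import subprocess
import sys

# Install into the interpreter running this notebook; raises CalledProcessError if pip fails.
subprocess.check_call([sys.executable, "-m", "pip", "install", "git+https://github.com/NeurosnapInc/neurosnap.git"])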

In [None]:
# @title 🔧 Configure Notebook
# @markdown Set your inputs and preferences below, then **run this cell** to initialize the notebook.
# @markdown
# @markdown Running this cell generates one fragment per sliding-window position in your input amino acid sequence.
# @markdown
# @markdown ---
import json

import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem

from neurosnap.api import NeurosnapAPI
from neurosnap.log import logger

# @markdown ### Notebook Settings
# @markdown
API_KEY = ""  # @param {type:"string", placeholder:"Enter your neurosnap API key"}
api = NeurosnapAPI(api_key=API_KEY)

# @markdown ---
# @markdown ### Input Settings
PEPTIDE_SEQ = ""  # @param {type:"string", placeholder:"Peptide amino acid sequence to fragment (e.g., ACDEFGHIKLMNPQRSTVWY)"}
PEPTIDE_WINDOW_SIZE = 4  # @param {type:"integer", placeholder:"Size of the sliding window in terms of number of amino acids"}


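def sliding_window_fragments(peptide_sequence: str, window_size: int = 4):
  """
  Simple sliding window fragmentation on a linear peptide sequence (1-letter codes).
  Returns a dict mapping each unique subsequence to its SMILES string.
  """
  # Fail early instead of silently producing zero fragments.
  if not 1 <= window_size <= len(peptide_sequence):
    raise ValueError(f"Window size must be between 1 and the peptide length ({len(peptide_sequence)}).")
  fragments = {}
  for i in range(len(peptide_sequence) - window_size + 1):
    subseq = peptide_sequence[i : i + window_size]
    frag = Chem.MolFromFASTA(subseq)
    if frag is None:  # RDKit could not parse the subsequence
      raise ValueError(f"Could not parse fragment {subseq!r}; check for invalid residue codes.")
    AllChem.Compute2DCoords(frag)
    smiles = Chem.MolToSmiles(frag)
    fragments[subseq] = smiles  # repeated subsequences collapse into a single entry
  return fragments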


# Example usage:
frags = sliding_window_fragments(PEPTIDE_SEQ, window_size=PEPTIDE_WINDOW_SIZE)

logger.info(
  f"NOTE: The following {len(frags)} peptide fragments of length {PEPTIDE_WINDOW_SIZE} residues are going to be used for downstream ADMET-AI analysis."
)
logger.info("Generated sliding window fragments (SMILES):")
for i, (subseq, frag) in enumerate(frags.items(), start=1):
  logger.debug(f" - {i:0>4}: {subseq} -> {frag}")


[-] 2025-07-19 18:28:05,290 ⚠️ NEVER SHARE YOUR API KEY WITH ANY UNAUTHORIZED PARTIES ⚠️ (api.py:30)
[*] Successfully connected to the Neurosnap API.
 - For information visit https://neurosnap.ai/blog/post/66b00dacec3f2aa9b4be703a
 - For support visit https://neurosnap.ai/support
 - For bug reports visit https://github.com/NeurosnapInc/neurosnap
[*] NOTE: The following 17 peptide fragments of length 4 residues are going to be used for downstream ADMET-AI analysis.
[*] Generated sliding window fragments (SMILES):
[+]  - 0001: ACDE -> C[C@H](N)C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)O
[+]  - 0002: CDEF -> N[C@@H](CS)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](Cc1ccccc1)C(=O)O
[+]  - 0003: DEFG -> N[C@@H](CC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](Cc1ccccc1)C(=O)NCC(=O)O
[+]  - 0004: EFGH -> N[C@@H](CCC(=O)O)C(=O)N[

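Because `PEPTIDE_SEQ` starts out empty, it can help to check the input before fragmenting it. The sketch below is one possible pre-check, not part of the `neurosnap` package: `check_peptide` and `CANONICAL_AMINO_ACIDS` are hypothetical names, and restricting the input to the 20 canonical one-letter codes is an assumption made here for illustration (RDKit's FASTA parser may accept other symbols).

```python
CANONICAL_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: canonical residues only


def check_peptide(sequence: str, window_size: int) -> str:
  """Normalize and validate a peptide sequence (hypothetical helper)."""
  seq = sequence.strip().upper()
  if not seq:
    raise ValueError("Peptide sequence is empty.")
  invalid = sorted(set(seq) - CANONICAL_AMINO_ACIDS)
  if invalid:
    raise ValueError(f"Unsupported residue codes: {', '.join(invalid)}")
  if not 1 <= window_size <= len(seq):
    raise ValueError(f"Window size must be between 1 and {len(seq)}, got {window_size}.")
  return seq


print(check_peptide(" acdefghik ", window_size=4))  # ACDEFGHIK
```

The returned string could then be passed to `sliding_window_fragments` in place of the raw form input.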
In [57]:
# @title 🧪 Run ADMET-AI
# @markdown This cell takes the generated peptide fragments and **runs them through ADMET-AI** for rapid profiling.
# @markdown
# @markdown ✅ **When to run this cell**:
# @markdown - Only proceed if you are happy with the generated fragments from previous steps.
# @markdown - The results will be automatically saved as **`results.csv`** in your working directory.
# @markdown
# @markdown ⏳ **Note:** Runtime depends on the number of fragments (typically ~10-30 seconds for small batches).
# @markdown
# @markdown ---


# submit job
fields = {
  "Input Molecules": json.dumps([{"data": frag, "type": "smiles"} for frag in frags.values()]),
}
job_id = api.submit_job("ADMET-AI", fields=fields, note=f"Peptide Fragmentation ADMET Notebook | Calculations for {len(frags)} fragments.")

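status = api.wait_job_status(job_id)
if status != "completed":  # explicit check; assert statements are skipped under `python -O`
  raise RuntimeError(f"Job with ID {job_id} did not complete (final status: {status}).")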

# download results file
api.get_job_file(job_id, "out", "results.csv", "results.csv")
df = pd.read_csv("results.csv")

# resolve original subsequences
smiles2subseq = {v: k for k, v in frags.items()}
df.insert(loc=0, column="seq", value=df.molecule.apply(lambda x: smiles2subseq[x]))

# display results to the user
df

[*] ADMET-AI job successfully submitted with ID 687c1c48c2d61a011f7ce5ef.
[*] Waiting for Neurosnap job with ID 687c1c48c2d61a011f7ce5ef (current status: pending)
[*] Waiting for Neurosnap job with ID 687c1c48c2d61a011f7ce5ef (

index,seq,molecule,molecular_weight,logP,hydrogen_bond_acceptors,hydrogen_bond_donors,Lipinski,QED,stereo_centers,tpsa,...,Caco2_Wang_drugbank_approved_percentile,Clearance_Hepatocyte_AZ_drugbank_approved_percentile,Clearance_Microsome_AZ_drugbank_approved_percentile,Half_Life_Obach_drugbank_approved_percentile,HydrationFreeEnergy_FreeSolv_drugbank_approved_percentile,LD50_Zhu_drugbank_approved_percentile,Lipophilicity_AstraZeneca_drugbank_approved_percentile,PPBR_AZ_drugbank_approved_percentile,Solubility_AqSolDB_drugbank_approved_percentile,VDss_Lombardo_drugbank_approved_percentile
0,ACDE,C[C@H](N)C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(=O)O)...,436.443,-2.8581,8,8,3.0,0.1378,4,225.22,...,1.4734,7.8713,34.2381,27.879,9.6161,16.7119,5.157,16.3242,82.5902,75.4168
1,CDEF,N[C@@H](CS)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](C...,512.541,-1.6353,8,8,2.0,0.1251,4,225.22,...,1.8612,17.6813,48.5847,71.6169,15.8976,31.6014,9.6937,30.3606,72.625,56.8825
2,DEFG,N[C@@H](CC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C...,466.447,-1.9337,7,7,3.0,0.1605,3,225.22,...,2.6755,19.7363,42.1869,57.309,12.3304,33.5789,9.7325,22.373,74.8352,49.1663
3,EFGH,N[C@@H](CCC(=O)O)C(=O)N[C@@H](Cc1ccccc1)C(=O)N...,488.501,-1.4426,7,7,3.0,0.1653,3,216.6,...,1.8612,13.4161,27.3362,35.4013,5.2346,24.0403,5.6611,20.2016,78.2862,77.1229
4,FGHI,CC[C@H](C)[C@H](NC(=O)[C@H](Cc1c[nH]cn1)NC(=O)...,472.546,-0.2613,6,6,3.0,0.2322,4,179.3,...,3.5285,13.6099,38.387,3.1795,8.4917,50.8337,8.8019,26.8321,75.1066,68.2435
5,GHIK,CC[C@H](C)[C@H](NC(=O)[C@H](Cc1c[nH]cn1)NC(=O)...,453.544,-1.375,7,7,3.0,0.1572,4,205.32,...,1.8612,15.6262,17.8364,7.5611,6.0876,38.6972,1.8612,10.5855,82.3963,90.9267
6,HIKL,CC[C@H](C)[C@H](NC(=O)[C@@H](N)Cc1c[nH]cn1)C(=...,509.652,0.0397,7,7,2.0,0.1424,5,205.32,...,2.1326,24.8158,40.5196,4.3815,10.1978,41.1012,6.3978,18.5343,77.627,84.2187
7,IKLM,CC[C@H](C)[C@H](N)C(=O)N[C@@H](CCCCN)C(=O)N[C@...,503.71,0.827,7,6,2.0,0.1489,5,176.64,...,2.4428,23.8852,50.1357,16.7895,22.8771,20.0853,11.6712,24.777,71.2679,35.4789
8,KLMN,CSCC[C@H](NC(=O)[C@H](CC(C)C)NC(=O)[C@@H](N)CC...,504.654,-1.3437,8,7,2.0,0.1096,4,219.73,...,1.3959,19.969,51.3377,7.2509,12.1365,13.4161,9.5386,18.7282,68.166,29.8565
9,LMNP,CSCC[C@H](NC(=O)[C@@H](N)CC(C)C)C(=O)N[C@@H](C...,473.596,-0.9665,7,5,4.0,0.2261,4,184.92,...,3.5673,10.3528,39.7441,4.3815,9.7712,23.3036,9.5386,16.4405,61.1477,42.1481
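
The shape of the submission payload and the reverse lookup used to label result rows can be inspected offline. The sketch below does not contact the Neurosnap API: it only rebuilds the `fields` dictionary and the `smiles2subseq` mapping from the cell above, using two fragments copied from the earlier log output as a stand-in for the full `frags` dictionary.

```python
import json

# Stand-in for the full fragments dictionary (subsequence -> SMILES), taken from the log above.
frags = {
  "ACDE": "C[C@H](N)C(=O)N[C@@H](CS)C(=O)N[C@@H](CC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)O",
  "DEFG": "N[C@@H](CC(=O)O)C(=O)N[C@@H](CCC(=O)O)C(=O)N[C@@H](Cc1ccccc1)C(=O)NCC(=O)O",
}

# The "Input Molecules" field is a JSON-encoded list with one entry per fragment.
payload = json.dumps([{"data": smiles, "type": "smiles"} for smiles in frags.values()])
fields = {"Input Molecules": payload}

# Reverse lookup for labelling result rows; it only works if the results echo the
# submitted SMILES strings unchanged.
smiles2subseq = {smiles: subseq for subseq, smiles in frags.items()}

decoded = json.loads(payload)
print(len(decoded))  # 2
print(decoded[0]["type"])  # smiles
print(smiles2subseq[decoded[1]["data"]])  # DEFG
```

If the `molecule` column ever came back in a different SMILES form, the direct dictionary lookup would raise a `KeyError`, so this mapping step is worth checking first when the labelling fails.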


In [58]:
# @title 🏆 Rank Fragments by Drug-Like Properties
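# @markdown This cell ranks fragments by a weighted composite of drug-likeness (QED), aqueous solubility, permeability (PAMPA and Caco-2), and a penalty for high logP.
# @markdown Toxicity predictions (AMES, hERG, DILI, ClinTox) are displayed next to the score but do not contribute to it.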


# Copy to avoid altering original DataFrame
ranking_df = df.copy()


# Step 1: Normalized Scoring
def normalize(series):
  return (series - series.min()) / (series.max() - series.min() + 1e-8)


ranking_df["QED_score"] = normalize(ranking_df["QED"])
ranking_df["solubility_score"] = normalize(ranking_df["Solubility_AqSolDB"])
ranking_df["permeability_score"] = normalize(ranking_df["PAMPA_NCATS"])
ranking_df["caco2_score"] = normalize(ranking_df["Caco2_Wang"])
ranking_df["hydrophobicity_penalty"] = normalize(ranking_df["logP"]) * -1  # penalize high logP

# Step 2: Weighted Composite Score
ranking_df["composite_score"] = (
  0.3 * ranking_df["QED_score"]
  + 0.3 * ranking_df["solubility_score"]
  + 0.15 * ranking_df["permeability_score"]
  + 0.15 * ranking_df["caco2_score"]
  + 0.1 * ranking_df["hydrophobicity_penalty"]
)

# Step 3: Ranking
ranking_df = ranking_df.sort_values(by="composite_score", ascending=False).reset_index(drop=True)

logger.info(f"Top {min(10, len(ranking_df))} fragments after filtering and ranking:")

# Display top 10 ranked fragments
display(
  ranking_df.head(10)[
    ["molecular_weight", "QED", "Solubility_AqSolDB", "PAMPA_NCATS", "Caco2_Wang", "logP", "AMES", "hERG", "DILI", "ClinTox", "composite_score"]
  ]
)

[*] Top 10 fragments after filtering and ranking:


index,molecular_weight,QED,Solubility_AqSolDB,PAMPA_NCATS,Caco2_Wang,logP,AMES,hERG,DILI,ClinTox,composite_score
0,472.546,0.2322,-1.597,0.0611,-6.868,-0.2613,0.1951,0.1244,0.2735,0.1828,0.603948
1,453.544,0.1572,-1.0649,0.0414,-7.2156,-1.375,0.2582,0.11,0.1722,0.2202,0.54357
2,436.443,0.1378,-1.0452,0.0128,-7.3668,-2.8581,0.6436,0.0499,0.3706,0.1359,0.498394
3,509.652,0.1424,-1.3993,0.1021,-7.1472,0.0397,0.3149,0.2188,0.1809,0.2026,0.490409
4,488.501,0.1653,-1.3542,0.0149,-7.2291,-1.4426,0.2605,0.1073,0.298,0.2369,0.47881
5,503.71,0.1489,-1.8729,0.1836,-7.1154,0.827,0.3097,0.2298,0.1896,0.1213,0.470956
6,473.596,0.2261,-2.4796,0.0924,-6.838,-0.9665,0.139,0.026,0.2849,0.0928,0.466109
7,466.447,0.1605,-1.6362,0.0264,-7.0679,-1.9337,0.3432,0.0635,0.2354,0.2532,0.451629
8,491.545,0.1768,-1.9505,0.06,-7.2145,-1.3942,0.1168,0.1762,0.4403,0.1928,0.423446
9,512.541,0.1251,-1.7753,0.0285,-7.2164,-1.6353,0.6277,0.1449,0.4807,0.2837,0.346197
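
The scoring arithmetic can be followed on a small hand-made example. The sketch below reimplements the min-max normalization and the weighted sum from the ranking cell in plain Python. The three fragments and their values are toy numbers chosen for illustration, not ADMET-AI output, and the `weights` dictionary simply restates the coefficients from the cell, with the logP penalty written as a negative weight.

```python
def normalize(values: list[float]) -> list[float]:
  """Min-max scale a column; the small epsilon avoids division by zero for constant columns."""
  lo, hi = min(values), max(values)
  return [(v - lo) / (hi - lo + 1e-8) for v in values]


# Toy values for three hypothetical fragments: worst, best, and midpoint in every column.
toy = {
  "QED": [0.10, 0.20, 0.15],
  "Solubility_AqSolDB": [-2.0, -1.0, -1.5],
  "PAMPA_NCATS": [0.0, 0.1, 0.05],
  "Caco2_Wang": [-7.4, -6.8, -7.1],
  "logP": [-3.0, 1.0, -1.0],
}
weights = {"QED": 0.3, "Solubility_AqSolDB": 0.3, "PAMPA_NCATS": 0.15, "Caco2_Wang": 0.15, "logP": -0.1}

scaled = {name: normalize(column) for name, column in toy.items()}
composite = [sum(weights[name] * scaled[name][i] for name in weights) for i in range(3)]
print([round(score, 3) for score in composite])  # [0.0, 0.8, 0.4]
```

Two properties follow from the formula: the composite score always lies between -0.1 and 0.9, and every term is scaled relative to the current batch, so scores from different runs are not directly comparable.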
