
SPRINT: Sequence-based Immunogenicity Prediction Networks

📄 RECOMB 2025 Supplementary Material: The file appendix_recomb.pdf contains the supplementary materials for our RECOMB submission.

A unified PyTorch-based benchmarking framework for deep learning methods in T-cell receptor (TCR) and peptide-MHC (pMHC) binding prediction.

Table of Contents

  • Quick Start
  • Available Resources
  • Usage
  • Command Options
  • Citation
  • License
  • Contact

Quick Start

Installation

git clone https://github.com/Computational-Machine-Intelligence/SPRINT.git
cd SPRINT
pip install -r requirements.txt

Requirements: Python >= 3.8, PyTorch >= 2.0.0
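A quick way to confirm the interpreter meets the stated minimum before installing; this is a sketch, and the PyTorch check is left as a comment so it runs even in a bare environment:

```python
import sys

def meets_minimum(version_info, minimum=(3, 8)):
    """Return True if the (major, minor) version meets the stated minimum."""
    return tuple(version_info[:2]) >= minimum

assert meets_minimum(sys.version_info), "SPRINT requires Python >= 3.8"
# After `pip install -r requirements.txt`, also confirm PyTorch >= 2.0.0:
# import torch; print(torch.__version__)
```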

Set HuggingFace Token (Required)

# Linux/Mac
export HF_TOKEN="your_huggingface_token"

# Windows PowerShell
$env:HF_TOKEN = "your_huggingface_token"
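Since the token is read from the `HF_TOKEN` environment variable, a small helper (hypothetical, not part of SPRINT) can fail fast with a clear message instead of a mid-run download error:

```python
import os

def require_hf_token(env=os.environ):
    """Return the HuggingFace token, or raise a clear error if it is unset."""
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running SPRINT.")
    return token
```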

Available Resources

Methods (7)

  • pmtnet: Multi-head attention for pMHC-TCR binding
  • piste: Pre-trained immune system transformer encoder
  • fusionpmt: Fusion model for pMHC-TCR interactions
  • fusionpm: Peptide-MHC binding prediction
  • transphla: Transformer for pHLA binding
  • ergo2: LSTM/Autoencoder TCR specificity model
  • nettcr: CNN-based TCR-pMHC prediction

Datasets (6)

  • pmt: Large-scale pMHC-TCR training set
  • pm: Peptide-MHC binding dataset
  • pt: Peptide-TCR binding dataset
  • allelic_ood: Out-of-distribution test (unseen alleles)
  • modality_ood: Cross-modality test (BA vs. EL, i.e. binding-affinity vs. eluted-ligand assays)
  • temporal_ood: Temporal test (post-2021 data)

Modes (3)

  • train: Train models from scratch
  • eval: Evaluate models on test data
  • both: Train then evaluate

Usage

1. Evaluate Pre-trained Models

Evaluate a pre-trained model on test data:

# Evaluate on standard dataset
python scripts/run_benchmark.py --method pmtnet --dataset pmt --mode eval --pretrain

# Evaluate on OOD datasets
python scripts/run_benchmark.py --method transphla --dataset temporal_ood --mode eval --pretrain
python scripts/run_benchmark.py --method fusionpm --dataset modality_ood --mode eval --pretrain

What happens:

  • Downloads pre-trained model from HuggingFace (first time only)
  • Loads test data automatically
  • Evaluates and saves results to outputs/<method>/pre_train_results/evaluations/
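To run the same evaluation over every bundled method, the command above can be generated in a loop. This sketch only prints the commands so the sweep can be inspected (or piped to `sh`) before launching:

```shell
# Print one eval command per method for a sweep over the temporal_ood split.
sprint_sweep() {
  for method in pmtnet piste fusionpmt fusionpm transphla ergo2 nettcr; do
    echo "python scripts/run_benchmark.py --method $method --dataset temporal_ood --mode eval --pretrain"
  done
}

sprint_sweep
# To actually launch the sweep:  sprint_sweep | sh
```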

2. Train Models from Scratch

Train a model on a dataset:

# Basic training
python scripts/run_benchmark.py --method pmtnet --dataset pmt --mode train

# Train with custom config
python scripts/run_benchmark.py --method ergo2 --dataset pt --mode train --config configs/methods/ergo2.yaml

Output: Trained model saved to outputs/<method>/<dataset>_<timestamp>/
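Given the documented layout outputs/<method>/<dataset>_<timestamp>/, a small helper (hypothetical, not part of SPRINT) can locate the newest run directory for a method:

```shell
# Print the most recently modified run directory for a given method.
latest_run() {
  ls -1dt outputs/"$1"/*/ 2>/dev/null | head -n 1
}

# Example:  latest_run pmtnet   ->  outputs/pmtnet/<newest run>/
```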

3. Train and Evaluate

Train a model and immediately evaluate:

python scripts/run_benchmark.py --method fusionpmt --dataset pmt --mode both

4. List Available Resources

# List all methods
python scripts/run_benchmark.py --list-methods

# List all datasets
python scripts/run_benchmark.py --list-datasets

Command Options

| Option | Description | Example |
| --- | --- | --- |
| `--method` | Model to use | `pmtnet`, `ergo2`, `piste` |
| `--dataset` | Dataset to use | `pmt`, `temporal_ood` |
| `--mode` | Operation mode | `train`, `eval`, `both` |
| `--config` | Path to a custom method config | `configs/methods/ergo2.yaml` |
| `--pretrain` | Use a pre-trained model (eval only) | flag, takes no value |
| `--device` | Computing device | `cuda`, `cpu`, `auto` |
| `--seed` | Random seed | `42` |

Citation

If you use SPRINT in your research, please cite:

@software{yin2025sprint,
  title={SPRINT for Benchmarking Sequence-based Immunogenicity Prediction Networks},
  author={Yin, Yujia and Li, Hongzong and Ma, Jiahao and Chen, Weijia and Yu, Yingying and Zhang, Xiaoyuan and Qu, Tianyi and Wu, Xinhong and Li, Junyi and Huang, Jian-Dong and Hu, Ye-Fan and Chen, Yifan},
  year={2025},
  url={https://github.com/Computational-Machine-Intelligence/SPRINT}
}

License

This project is licensed under the MIT License.

Contact
