A toolkit for Bayesian Optimization of Antibody Traits (BOAT). The methods naturally extend to other modalities based on sequences of amino acids.
It combines:
- Sequence encodings (PLMs, bio-specific embeddings)
- Bayesian & genetic optimization loops
- Liability filtering
- An interface to wrap models for sequence scoring
- Modular acquisition & model abstractions for rapid experimentation
The goal: enable fast design-iteration cycles with pluggable scoring functions and flexible optimization strategies.
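As a rough illustration of the liability-filtering component listed above, the sketch below scans sequences for a few well-known sequence motifs and drops candidates that contain any of them. The motif set and the names `LIABILITY_PATTERNS`, `find_liabilities`, and `filter_candidates` are assumptions made for this example; they are not BOAT's actual API or liability list.

```python
import re

# Illustrative motif set (an assumption, not BOAT's actual liability list).
# Lookaheads are used so that overlapping motifs are all reported.
LIABILITY_PATTERNS = {
    "n_glycosylation": re.compile(r"(?=N[^P][ST])"),  # N-X-S/T sequon, X != P
    "deamidation": re.compile(r"(?=NG)"),             # Asn-Gly hotspot
    "isomerization": re.compile(r"(?=DG)"),           # Asp-Gly hotspot
}


def find_liabilities(sequence: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of every motif found in the sequence."""
    hits: dict[str, list[int]] = {}
    for name, pattern in LIABILITY_PATTERNS.items():
        positions = [match.start() for match in pattern.finditer(sequence)]
        if positions:
            hits[name] = positions
    return hits


def filter_candidates(sequences: list[str]) -> list[str]:
    """Keep only the sequences that contain none of the motifs."""
    return [seq for seq in sequences if not find_liabilities(seq)]


if __name__ == "__main__":
    for seq in ["QVQLVE", "QVNGSL", "ADGTK"]:
        print(seq, find_liabilities(seq))
```

A hard filter like this is the simplest option; a softer alternative would be to turn the hit counts into a penalty term that an optimizer can trade off against other objectives.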
- Bayesian optimization (single & multi-objective) with BoTorch / GPyTorch
- Genetic algorithm framework for sequence-level search
- Encodings: one-hot, physicochemical, PLM-based, etc.
- Liability and developability scoring utilities
- Pluggable scoring interfaces (fake, PLM, Oasis, liabilities)
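Of the encodings listed above, one-hot is the simplest to show in full. The following is a minimal, dependency-free sketch of the idea, assuming the 20 standard amino acids in alphabetical one-letter order; the function name `one_hot_encode` and the handling of unknown residues are choices made for this example and may differ from the toolkit's own encoders.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> list[list[int]]:
    """Encode a sequence as a len(sequence) x 20 matrix of 0/1 values."""
    encoded = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        idx = AA_INDEX.get(residue)
        # Assumption: unknown residues (e.g. "X") are left as all-zero rows.
        if idx is not None:
            row[idx] = 1
        encoded.append(row)
    return encoded


def flatten(matrix: list[list[int]]) -> list[int]:
    """Concatenate the rows into one feature vector, e.g. as model input."""
    return [value for row in matrix for value in row]


if __name__ == "__main__":
    features = flatten(one_hot_encode("ACDX"))
    print(len(features), sum(features))
```

Flattened one-hot vectors only line up across sequences of equal length, so variable-length inputs would need alignment or padding first.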
Requires: Python 3.10 or 3.11 (see pyproject). Poetry is used for dependency management.
- Install Poetry (if needed).
- Standard install (core only):
  poetry install
- Install with selected extras, e.g.:
  poetry install --extras "plms"
- Activate virtual environment:
  eval $(poetry env activate)
  and run commands with:
  poetry run python ...

- boat: Bayesian optimization stack (ablang2, blosum, botorch, gpytorch, scikit-learn)
Install any combination via:
poetry install --extras "<space separated extras>"

Example (pseudo) usage sketch:
from boat.bayesopt.mo_loop import MOBayesOptOnSequences
from boat.scoring_function.fake import FakeScoringFunction

loop = MOBayesOptOnSequences(
    scoring_functions=[FakeScoringFunction()],
    n_init=8,
    n_iter=5,
)
loop.run()

Replace FakeScoringFunction with real interfaces (PLM, humanness, etc.) as configured.
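To show what swapping in a different scoring function could look like, here is a generic sketch of a pluggable scoring interface built on `typing.Protocol`. Everything in it is hypothetical: `SequenceScorer`, `HydrophobicFractionScorer`, `LengthScorer`, and `score_all` are illustrative names, the hydrophobic residue set is one common convention among several, and the real base class and method signatures in boat.scoring_function may well differ.

```python
from typing import Protocol, Sequence


class SequenceScorer(Protocol):
    """Hypothetical interface: one score per input sequence."""

    def score(self, sequences: Sequence[str]) -> list[float]: ...


class HydrophobicFractionScorer:
    """Toy scorer: fraction of residues in an illustrative hydrophobic set."""

    HYDROPHOBIC = frozenset("AILMFVWY")  # assumption, conventions vary

    def score(self, sequences: Sequence[str]) -> list[float]:
        return [
            sum(res in self.HYDROPHOBIC for res in seq) / len(seq) if seq else 0.0
            for seq in sequences
        ]


class LengthScorer:
    """Toy scorer: sequence length as a float."""

    def score(self, sequences: Sequence[str]) -> list[float]:
        return [float(len(seq)) for seq in sequences]


def score_all(
    scorers: Sequence[SequenceScorer], sequences: Sequence[str]
) -> dict[str, list[float]]:
    """Apply every scorer uniformly and key the results by class name."""
    return {type(s).__name__: s.score(sequences) for s in scorers}


if __name__ == "__main__":
    print(score_all([HydrophobicFractionScorer(), LengthScorer()], ["AAAA", "AD"]))
```

Because the protocol is structural, any object with a matching `score` method can be passed in without inheriting from a shared base class, which is one way to keep scoring backends interchangeable.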
├── .github/workflows # CI workflows (lint, test, build, publish, docs)
├── Makefile # Common developer shortcuts
├── Dockerfile # Base container recipe
├── README.md
├── data/ # Example data
├── docs/ # MkDocs documentation project
├── pyproject.toml # Poetry configuration & extras
└── boat/
    ├── data_utils.py # Generic data helpers
    ├── bayesopt/ # Bayesian optimization components
    │   ├── mo_loop.py # Multi-objective loop orchestration
    │   ├── acquisition/ # Acquisition strategies & utilities
    │   ├── encodings/ # Feature encodings for sequences
    │   ├── loop/ # Core loop utilities
    │   └── models/ # GP models, kernels, wrappers
    ├── biologics/ # Domain-specific sequence & liability helpers
    ├── genetic_algorithm/ # GA operators, optimizers, vocabularies
    ├── scoring_function/ # Unified scoring interfaces (fake, PLM, Oasis, liabilities)
    └── __init__.py
- bayesopt: Acquisition functions, GP kernels, loops for sequential / multi-objective optimization.
- genetic_algorithm: Mutation / crossover / population management for sequence search.
- scoring_function: Abstraction layer to plug different scoring backends uniformly.
- biologics: Sequence manipulation, liabilities and developability heuristics.
- Missing optional features: confirm that you installed the correct extras.
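The mutation / crossover / population management described for genetic_algorithm can be sketched in a few lines of plain Python. This is a generic textbook-style loop written for illustration, assuming equal-length sequences of at least two residues and a population of at least two; the names `mutate`, `crossover`, and `evolve` are hypothetical and do not reflect the module's actual operators or signatures.

```python
import random
from typing import Callable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rng: random.Random, rate: float = 0.1) -> str:
    """Resample each position from the alphabet with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Single-point crossover between two equal-length parents."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(
    population: list[str],
    fitness: Callable[[str], float],
    generations: int = 20,
    seed: int = 0,
) -> str:
    """Run a simple elitist GA and return the best sequence found."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, size // 2)]  # truncation selection
        children = []
        for _ in range(size - 1):
            parent_a, parent_b = rng.sample(parents, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = [ranked[0]] + children  # elitism: keep the current best
    return max(population, key=fitness)


if __name__ == "__main__":
    start = ["ACDEFGHIKL", "MNPQRSTVWY", "AAAAAAAAAA", "WYWYACDEFG"]
    print(evolve(start, fitness=lambda s: s.count("W")))
```

Keeping the best individual in every generation means the best fitness never decreases, which makes the loop easy to sanity-check with a toy fitness function before plugging in a real scoring backend.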