Frame2seq

Official repository for Frame2seq, a structured-conditioned masked language model for protein sequence design, as described in our preprint Structure-conditioned masked language models for protein sequence design generalize beyond the native sequence space.

Colab notebook

Colab notebook for generating sequences with Frame2seq:

Setup

To use Frame2seq, install via pip:

pip install frame2seq

If previously installed via pip, upgrade to the latest version:

pip install --upgrade frame2seq

Usage

Sequence design

To use Frame2seq to generate sequences, you can use the design function.

from frame2seq import Frame2seqRunner

runner = Frame2seqRunner()
runner.design(pdb_file, chain_id, temperature, num_samples)

Arguments

pdb_file: Path to PDB file.
chain_id: Chain ID of protein.
temperature: Sampling temperature.
num_samples: Number of sequences to sample.
omit_AA: Amino acids to omit from sampling. Pass a list of single-letter amino acid strings (e.g. ['C', 'M']). Default None.
fixed_positions: Amino acid positions to fix during sampling. Pass a list of integers (e.g. [1, 3, 11]). Residue numbering starts from 1. Default None.
save_indiv_seqs: Whether to save sequences to indidual .fasta files. Default False.
save_indiv_neg_pll: Whether to save the per-residue negative pseudo-log-likelihoods of the sampled sequences. Default False.
verbose: Whether to print the sampled sequences and time taken for sampling. Default True.

Outputs

A .fasta file containing all sampled sequences is automatically saved. If save_indiv_seqs is True, individual .fasta files for each sampled sequence are also saved.

>pdbid=2fra chain_id=A recovery=62.67% score=0.83 temperature=1.0
PPSSVDWRDLGCITDVLDMGGCGACWAFSAVGALEARTTQKTGELTRLSAQDLVDCAREKYGNEGCDGGRMKSSFQFIIDKNGIDSHQAYPFTASDQECLYNSKYKAATCTDYTVLPEGDEDKLREAVSNVGPVAVGIDATHPEFRNFKSGVYHDPKCTTETNHGVLVVGYGTLKGKRFYKVKTCWGTYFGEDGFIRVAKNQGNHCGISTDPSYPEM

If save_indiv_neg_pll is True, a .csv file containing the per-residue negative pseudo-log-likelihoods of the sampled sequences is also saved.

Advanced sequence design

To use Frame2seq to generate sequences with advanced options, you can use the design function with additional arguments.

from frame2seq import Frame2seqRunner

runner = Frame2seqRunner()
runner.design(pdb_file, chain_id, temperature, num_samples, omit_AA=['C'], fixed_positions=[1, 3, 11])

Scoring

To use Frame2seq to score sequences, you can use the score function.

The following will score the PDB sequence for the PDB backbone.

from frame2seq import Frame2seqRunner

runner = Frame2seqRunner()
runner.score(pdb_file, chain_id)

The following will score all sequences in the given .fasta file for the PDB backbone.

from frame2seq import Frame2seqRunner

runner = Frame2seqRunner()
runner.score(pdb_file, chain_id, fasta_file)

Arguments

pdb_file: Path to PDB file.
chain_id: Chain ID of protein.
fasta_file: Path to .fasta file containing sequences to score. Default None. If None, will score the PDB sequence.
save_indiv_neg_pll: Whether to save the per-residue negative pseudo-log-likelihoods of the given sequence(s). Default False.
verbose: Whether to print time taken for scoring. Default True.

Outputs

A .csv file containing the average negative pseudo-log-likelihoods of the given sequence(s) is automatically saved. If save_indiv_neg_pll is True, per-residue negative pseudo-log-likelihoods are also saved in individual .csv files.

Citing this work

@article{akpinaroglu2023structure,
  title={Structure-conditioned masked language models for protein sequence design generalize beyond the native sequence space},
  author={Akpinaroglu, Deniz and Seki, Kosuke and Guo, Amy and Zhu, Eleanor and Kelly, Mark JS and Kortemme, Tanja},
  journal={bioRxiv},
  pages={2023--12},
  year={2023},
  publisher={Cold Spring Harbor Laboratory}
}

zenodo

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
.github		.github
frame2seq		frame2seq
.gitignore		.gitignore
Frame2seq.ipynb		Frame2seq.ipynb
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.cfg		setup.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.github

.github

frame2seq

frame2seq

.gitignore

.gitignore

Frame2seq.ipynb

Frame2seq.ipynb

LICENSE

LICENSE

README.md

README.md

requirements.txt

requirements.txt

setup.cfg

setup.cfg

Repository files navigation

Frame2seq

Colab notebook

Setup

Usage

Sequence design

Arguments

Outputs

Advanced sequence design

Scoring

Arguments

Outputs

Citing this work

About

Releases

Packages

Languages

License

dakpinaroglu/Frame2seq

Folders and files

Latest commit

History

Repository files navigation

Frame2seq

Colab notebook

Setup

Usage

Sequence design

Arguments

Outputs

Advanced sequence design

Scoring

Arguments

Outputs

Citing this work

About

Resources

License

Stars

Watchers

Forks

Languages