De novo antimicrobial peptide sequence generation with a recurrent neural network
- Python 3.6
- PyTorch 1.7.1
- Numpy
- Pandas
- Biopython
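A quick way to confirm that the dependencies listed above are importable in the active environment is sketched below. The helper name `missing_modules` is hypothetical and not part of AMPd-Up; the import names (`torch`, `numpy`, `pandas`, `Bio`) are the standard ones for PyTorch, Numpy, Pandas, and Biopython. The sketch checks presence only, not versions.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names for the dependencies listed above (hypothetical check, not shipped with AMPd-Up).
REQUIRED = ["torch", "numpy", "pandas", "Bio"]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```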
- Create a new conda environment:
conda create -n ampd-up python=3.6
- Activate the environment:
conda activate ampd-up
- Install AMPd-Up in the environment:
conda install -c bioconda ampd-up
AMPd-Up can now be run. See usage information below.
- To deactivate an active environment, use:
conda deactivate
The training set (antibacterial sequences only) and known AMP sequences for sequence novelty analysis are stored in the data folder.
- Training set:
APD3_ABP_20190320.fa
- Known AMP sequences:
APD3_20220711.fa + DADP_mature_AMP_20181206.fa
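As an illustration of how the known AMP sequences could be used for a novelty check, here is a minimal sketch. It assumes that "novel" means no exact, case-insensitive match to any known sequence; the criterion used in the publication may differ. The function names `read_fasta` and `novel_sequences` and the toy sequences are hypothetical. Biopython's `SeqIO.parse` could replace the small parser used here.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def novel_sequences(generated, known):
    """Keep generated sequences with no exact match in the known set (assumed criterion)."""
    known_set = {seq.upper() for seq in known}
    return [seq for seq in generated if seq.upper() not in known_set]


# Toy data for illustration only; in practice, pass open file handles for the
# FASTA files in the data folder to read_fasta instead.
known_fasta = """>known_1
GIGKFLHSAKKFGKAFVGEIMNS
>known_2
KWKLFKKIEK
VGQNIRDGII
"""
known = [seq for _, seq in read_fasta(known_fasta.splitlines())]
generated = ["GIGKFLHSAKKFGKAFVGEIMNS", "KKLLKKLLKKLL"]
print(novel_sequences(generated, known))
```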
The 1,000 model instances used to generate the peptide sequences presented in the publication are available from the Zenodo repository. Users can either sample from these pre-trained models or train their own models for sequence generation.
Usage: AMPd-Up [-h] [-fm FROM_MODEL] -n NUM_SEQ [-sm SAVE_MODEL] [-od OUT_DIR] [-of {fasta,tsv}]

optional arguments:
  -h, --help            Show this help message and exit
  -fm FROM_MODEL, --from_model FROM_MODEL
                        Directory of the existing models; only specify this
                        argument if you want to sample from existing models
                        (optional)
  -n NUM_SEQ, --num_seq NUM_SEQ
                        Number of sequences to sample
  -sm SAVE_MODEL, --save_model SAVE_MODEL
                        Prefix of the models if you want to save them; only
                        specify this argument if you want to sample by
                        training new models (optional)
  -od OUT_DIR, --out_dir OUT_DIR
                        Output directory (optional)
  -of {fasta,tsv}, --out_format {fasta,tsv}
                        Output format, fasta or tsv (tsv by default, optional)
Examples:
- Sample sequences by training new models:
AMPd-Up -n 100
- Sample sequences from existing models:
AMPd-Up -fm ../models/ -n 100
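For scripted runs, the command lines above can be assembled from Python. The following is a minimal sketch that uses only the flags documented in the usage text. The function names `build_ampd_up_command` and `run_ampd_up` are hypothetical and not part of AMPd-Up, and treating `-fm` and `-sm` as mutually exclusive is an assumption drawn from the help text rather than confirmed behaviour.

```python
import subprocess


def build_ampd_up_command(num_seq, from_model=None, save_model=None,
                          out_dir=None, out_format=None):
    """Build an argument list for the AMPd-Up command-line tool."""
    if num_seq < 1:
        raise ValueError("num_seq must be a positive integer")
    # Assumption: the help text implies -fm (sample from existing models)
    # and -sm (save newly trained models) are not meant to be combined.
    if from_model and save_model:
        raise ValueError("specify either from_model or save_model, not both")
    if out_format not in (None, "fasta", "tsv"):
        raise ValueError("out_format must be 'fasta' or 'tsv'")

    cmd = ["AMPd-Up"]
    if from_model:
        cmd += ["-fm", from_model]
    cmd += ["-n", str(num_seq)]
    if save_model:
        cmd += ["-sm", save_model]
    if out_dir:
        cmd += ["-od", out_dir]
    if out_format:
        cmd += ["-of", out_format]
    return cmd


def run_ampd_up(**kwargs):
    """Run AMPd-Up; requires the tool to be installed and on PATH."""
    return subprocess.run(build_ampd_up_command(**kwargs), check=True)


# Mirrors the two documented examples without executing anything.
print(" ".join(build_ampd_up_command(100)))
print(" ".join(build_ampd_up_command(100, from_model="../models/")))
```

Calling `run_ampd_up(num_seq=100, from_model="../models/")` would then execute the second example in the active environment.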
Chenkai Li (cli@bcgsc.ca)
If you have questions or comments, or would like to report a bug, please file a GitHub issue or contact us.
If you use AMPd-Up in your work, please cite our publication:
Li, C., Sutherland, D., Richter, A. et al. De novo synthetic antimicrobial peptide design with a recurrent neural network. Protein Science 33, e5088 (2024). https://doi.org/10.1002/pro.5088