Epitopedia: identifying molecular mimicry of known immune epitopes

cbalbin-bio/Epitopedia

Epitopedia

Epitopedia Screencap Example

Getting started

The quickest way to start using Epitopedia is to clone the repository and pull the Docker image, which has all the dependencies preinstalled:

git clone https://github.com/cbalbin-bio/Epitopedia.git

docker pull cbalbin/epitopedia

Epitopedia requires the PDB in mmCIF format, EpitopediaDB and EPI-SEQ DB. EpitopediaDB and EPI-SEQ DB can be downloaded here.

To download the entire PDB in mmCIF format:

rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/structures/divided/mmCIF/ ./mmCIF

OR

To download only the PDB files present in EpitopediaDB (EPI-PDB), you can supply pdb_id_list.txt to rsync:

rsync -rlpt -v -z --delete --port=33444 --include-from=/path/to/pdb_id_list.txt \
rsync.rcsb.org::ftp_data/structures/divided/mmCIF/ ./mmCIF

To run Epitopedia provide the paths to the various directories discussed below.

The data directory should contain Epitopedia DB (epitopedia.sqlite3) and EPI-SEQ (EPI-SEQ.fasta*) which can be downloaded here.

The mmcif directory should point to the sharded PDB directory in mmCIF format as downloaded above.
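To illustrate what "sharded" means here: the PDB's divided mirror groups entries into subdirectories named after the middle two characters of the four-character PDB ID. The helper below is a hypothetical sketch, not part of Epitopedia, that computes where an entry would be expected under the mmCIF directory. The function name and the `compressed` switch are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def sharded_mmcif_path(mmcif_dir: str, pdb_id: str, compressed: bool = False) -> PurePosixPath:
    """Return the expected location of a PDB entry in the divided mmCIF layout."""
    pdb_id = pdb_id.lower()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError(f"expected a 4-character PDB ID, got {pdb_id!r}")
    shard = pdb_id[1:3]  # middle two characters, e.g. "vx" for 6vxx
    suffix = ".cif.gz" if compressed else ".cif"
    return PurePosixPath(mmcif_dir) / shard / f"{pdb_id}{suffix}"
```

Checking that this path exists for each input structure before starting a run can catch an incomplete rsync early.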

NOTE: you may need to decompress the files in the mmCIF directory:

gunzip -r mmCIF
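Where gunzip is not available, the same recursive decompression can be sketched with the Python standard library. This is an illustrative equivalent, not a script shipped with Epitopedia; the function name `gunzip_tree` is hypothetical, and like gunzip it removes each archive after writing the decompressed file.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_tree(root: str) -> int:
    """Decompress every *.gz file under root and delete the archives.

    Returns the number of files decompressed.
    """
    count = 0
    # Materialize the listing first so the tree is not modified while walking it.
    for gz_path in sorted(Path(root).rglob("*.gz")):
        target = gz_path.with_suffix("")  # 6vxx.cif.gz -> 6vxx.cif
        with gzip.open(gz_path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()
        count += 1
    return count
```

For example, `gunzip_tree("mmCIF")` would process the directory downloaded above.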

The output directory is where the output files will be written.

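Replace the placeholder paths with the actual absolute paths on your local system. When mounting volumes with docker directly (-v host:container), change only the path on the left side of the colon; the paths on the right side of the colon are internal to the container and should not be altered.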

python3 Epitopedia/docker/run_epitopedia.py \
/Path/to/Output/Dir/ \
/Path/to/PDB/Dir/ \
/Path/to/Data/Dir/ \
--afdb-dir /Path/to/AlphaFold/Dir/ \
--taxid-filter 11118 --PDB-IDS 6VXX_A
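When launching runs from a script, it can help to assemble the command programmatically and reject relative paths up front. The sketch below is a hypothetical wrapper, not part of Epitopedia: `build_command` is an illustrative name, it assumes POSIX-style host paths, and it uses only the positional arguments and flags shown in the command above.

```python
from pathlib import PurePosixPath


def build_command(output_dir, mmcif_dir, data_dir, pdb_ids, taxid_filter=None):
    """Build the run_epitopedia.py argument list, requiring absolute paths."""
    dirs = [output_dir, mmcif_dir, data_dir]
    for d in dirs:
        # Assumption: host paths are POSIX-style, so absolute means a leading "/".
        if not PurePosixPath(d).is_absolute():
            raise ValueError(f"path must be absolute: {d!r}")
    if not pdb_ids:
        raise ValueError("at least one PDBID_CHAINID is required")
    cmd = ["python3", "Epitopedia/docker/run_epitopedia.py", *dirs]
    if taxid_filter is not None:
        cmd += ["--taxid-filter", str(taxid_filter)]
    cmd += ["--PDB-IDS", *pdb_ids]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with paths that contain spaces.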

NOTE: on some systems you may need to run docker with sudo.

It is recommended to use the taxid_filter flag to prevent the input protein from finding itself or other versions of itself. For example, to find mimics of the SARS-CoV-2 spike protein (6VXX), we could use a taxid_filter of 11118 to exclude hits from other Coronaviridae. The NCBI Taxonomy Browser is helpful for determining which taxid to specify.

Epitopedia can run on multiple input structures to represent a conformational ensemble. To do so, simply provide a list of structures in the format PDBID_CHAINID as shown below.

run_epitopedia.py --PDB-IDS 6VXX_A 6VXX_B 6XR8_A 6XR8_B
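A malformed identifier is easier to catch before a long run starts. The following sketch checks the PDBID_CHAINID format described above; it is not Epitopedia's own validation. The regular expression assumes classic four-character PDB IDs that begin with a digit and alphanumeric chain IDs, and the function name is hypothetical.

```python
import re

# Assumption: a 4-character PDB ID starting with a digit, an underscore,
# then an alphanumeric chain ID (for example 6VXX_A).
STRUCTURE_ID = re.compile(r"([0-9][A-Za-z0-9]{3})_([A-Za-z0-9]+)")


def parse_structure_ids(ids):
    """Split each PDBID_CHAINID string into a (pdb_id, chain_id) tuple."""
    parsed = []
    for raw in ids:
        match = STRUCTURE_ID.fullmatch(raw)
        if match is None:
            raise ValueError(f"not in PDBID_CHAINID format: {raw!r}")
        parsed.append((match.group(1).upper(), match.group(2)))
    return parsed
```

Chain IDs are left in their original case because chain identifiers in mmCIF files are case-sensitive.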

Epitopedia defaults to a span length of 5, a surface accessibility cutoff of 20%, a surface accessibility span length of 3, and no taxa filter, but these parameters can be set using the following flags:

| Flag | Description |
| --- | --- |
| --span | Minimum span length for a hit to progress |
| --rasa | Cutoff for relative accessible surface area |
| --rasa_span | Minimum consecutive accessible residues to consider a hit a SeqBMM |
| --taxid_filter | Taxa filter; for example, to filter out all Coronaviridae: --taxid_filter 11118 |
| --rmsd | Max RMSD to still be considered a structural mimic |
| --view | View results from a previous run |
| --port | Port to be used by the webserver |
| --use-afdb | Include AFDB in search |
| --pplddt | Minimum protein pLDDT score a structure predicted by AlphaFold must have to be considered |
| --mplddt | Minimum average local pLDDT score a region predicted by AlphaFold must have to be considered |
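To make the interplay of --rasa and --rasa_span concrete, here is a minimal sketch of one way such a check could work: a hit passes if it contains a run of at least rasa_span consecutive residues whose relative accessible surface area meets the cutoff. This is an illustration under stated assumptions, not Epitopedia's actual code. It assumes RASA values are given as fractions between 0 and 1 and that the cutoff is inclusive; both function names are hypothetical.

```python
def longest_accessible_run(rasa_values, rasa_cutoff=0.20):
    """Length of the longest run of consecutive residues with RASA >= cutoff."""
    longest = current = 0
    for value in rasa_values:
        if value >= rasa_cutoff:  # assumption: the cutoff is inclusive
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def passes_accessibility(rasa_values, rasa_cutoff=0.20, rasa_span=3):
    """True if the hit has at least rasa_span consecutive accessible residues."""
    return longest_accessible_run(rasa_values, rasa_cutoff) >= rasa_span
```

With the defaults, a five-residue hit with RASA values 0.10, 0.30, 0.40, 0.25, 0.05 passes because residues two through four form an accessible run of three, while 0.30, 0.10, 0.30, 0.10, 0.30 does not because no two accessible residues are adjacent.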

Output

Example output files for 6VXX_A as input, with a taxid_filter of 11118, can be found here.
Definitions for the output file headers can be found here.

Intermediate Output

Epitopedia will output the following files at various stages of execution:

| File Name | Description |
| --- | --- |
| EPI_SEQ_hits_{pdb_id(s)}.tsv | Contains the raw results from the BLAST search of the input structure against EPI-SEQ |
| EPI_SEQ_span_filt_hits_{pdb_id(s)}.tsv | Contains hits with consecutive spans that meet the set minimum span length |
| EPI_SEQ_span_filt_acc_hits_{pdb_id(s)}.tsv | Contains the above spans that contain the minimum span of accessible residues |
| EPI_PDB_hits_{pdb_id(s)}.tsv | Contains hits of epitope source sequences against EPI_PDB |
| EPI_PDB_fragment_pairs_{pdb_id(s)}.tsv | Contains structurally aligned fragment pairs consisting of spans of the input structure aligned against the structural representatives |
| EPI_PDB_fragment_pairs_{pdb_id(s)}_ranked.tsv | Contains the above but ranked from best to worst RMSD |
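The intermediate files are plain TSV, so they can be inspected with the Python standard library. The sketch below re-ranks fragment pairs from lowest to highest RMSD and optionally applies a maximum RMSD. The column name "RMSD" is a placeholder assumption, so check the header definitions linked above for the actual column names; `rank_by_rmsd` is a hypothetical helper, not part of Epitopedia.

```python
import csv
import io


def rank_by_rmsd(tsv_text, rmsd_column="RMSD", max_rmsd=None):
    """Parse TSV text and return rows sorted from lowest to highest RMSD."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    if max_rmsd is not None:
        # Keep only rows at or below the cutoff.
        rows = [row for row in rows if float(row[rmsd_column]) <= max_rmsd]
    return sorted(rows, key=lambda row: float(row[rmsd_column]))
```

To use it on a file, read the file contents first, for example with `pathlib.Path(path).read_text()`, and pass the text to the function.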

Final Output

At the final stage of execution, if there are redundant source sequences, Epitopedia shows only the best hit per epitope motif. These results can be viewed in a TSV file (Example) or a more legible HTML file (Example).

Epitopedia database generation

Epitopedia uses IEDB and PDB to generate EpitopediaDB, which is used in the molecular mimicry search.

Generation of the database takes some time (~12 hours). Thus, the EpitopediaDB is provided above.

To create the EpitopediaDB, download IEDB and an mmCIF version of PDB.

Point the container to the appropriate paths for the IEDB, PDB (mmCIF format) and a data directory where the databases will be written.

docker run --rm -it \
-v /Path/To/iedb_public.sql:/app/iedb \
-v /Path/to/mmCIF/Dir/:/app/mmcif \
-v /Path/to/Data/Dir/:/app/data \
cbalbin/epitopedia generate_database.py

License

This software is released under the MIT License.

Software and databases used in Epitopedia may be released under various licenses:

Software:

Databases:

Reference

If you use Epitopedia in your work, please cite:

Epitopedia: identifying molecular mimicry of known immune epitopes
Christian Andrew Balbin, Janelle Nunez-Castilla, Jessica Siltberg-Liberles
bioRxiv 2021.08.26.457577; doi: https://doi.org/10.1101/2021.08.26.457577