The SpikePro algorithm predicts the fitness of a SARS-CoV-2 strain from the sequence of its spike protein. Given the target sequence in FASTA format, the algorithm aligns it to the reference SARS-CoV-2 spike protein (UniProt P0DTC2), lists all mutations with respect to the reference, and computes the fitness of each mutation as well as of the overall viral strain. You can find more details in our preprint (Pucci and Rooman, Prediction and evolution of the molecular fitness of SARS-CoV-2 variants: Introducing SpikePro, submitted).
To compile the C++ program, type this command:
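To illustrate the mutation-listing step, here is a minimal sketch. It is not taken from SpikePro.cpp: the function name `listSubstitutions` is hypothetical, and it assumes the two sequences have already been aligned (equal length, with `-` marking gaps) rather than calling edlib. It reports substitutions only, in the usual notation of reference residue, reference position, target residue.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of SpikePro: lists the substitutions between
// two already-aligned sequences of equal length, where '-' marks a gap.
std::vector<std::string> listSubstitutions(const std::string& alignedRef,
                                           const std::string& alignedTarget) {
    // Assumption: the caller provides a pairwise alignment, so lengths match.
    assert(alignedRef.size() == alignedTarget.size());

    std::vector<std::string> mutations;
    std::size_t refPos = 0;  // 1-based residue number in the reference

    for (std::size_t i = 0; i < alignedRef.size(); ++i) {
        const char r = alignedRef[i];
        const char t = alignedTarget[i];

        if (r == '-') continue;  // insertion in the target: no reference position
        ++refPos;                // every non-gap reference column is one residue

        if (t == '-' || t == r) continue;  // deletion or identical residue

        // Substitution, written as <reference residue><position><target residue>.
        mutations.push_back(std::string(1, r) + std::to_string(refPos) +
                            std::string(1, t));
    }
    return mutations;
}
```

Insertions and deletions are skipped in this sketch; only the position counter accounts for them, so that substitution positions stay in reference numbering.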
c++ SpikePro.cpp edlib/src/edlib.cpp CSVparser.cpp -o SpikePro -I edlib/include/ -std=c++11
To run the code and predict the fitness of a viral strain, type:
./SpikePro TEST.fasta go
where TEST.fasta contains the sequence of the considered variant of the SARS-CoV-2 spike protein in FASTA format.
List of files in this directory:
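For readers unfamiliar with the FASTA format, the sketch below shows one way such an input file could be read. It is an illustration only, not the parser used in SpikePro.cpp: the function names `readFirstFastaSequence` and `readFastaFile` are hypothetical, and the sketch assumes the file holds a header line starting with `>` followed by one or more sequence lines.

```cpp
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper, not part of SpikePro: returns the sequence of the first
// FASTA record in the stream, with the line breaks removed.
std::string readFirstFastaSequence(std::istream& in) {
    std::string line;
    std::string sequence;
    bool inRecord = false;

    while (std::getline(in, line)) {
        // Tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;

        if (line[0] == '>') {
            if (inRecord) break;  // a second header: stop after the first record
            inRecord = true;
            continue;
        }
        if (inRecord) sequence += line;  // sequence may span several lines
    }
    return sequence;
}

// Convenience wrapper that opens a file by path; returns an empty string if
// the file cannot be opened or contains no record.
std::string readFastaFile(const std::string& path) {
    std::ifstream file(path);
    if (!file) return std::string();
    return readFirstFastaSequence(file);
}
```

With these helpers, `readFastaFile("TEST.fasta")` would return the spike protein sequence as a single string.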
- Structures: All PDB structures used in the main paper (two PDB models of the full spike protein, the ACE2-spike protein complex with PDB code 6M0J, and 31 antibody-spike protein complexes) and a list of all RBD epitopes.
- SpikePro.cpp: Main .cpp file
- edlib, CSVparser.cpp, CSVparser.hpp, P0DTC2.fasta and PIO_6.csv: Dependencies
- TEST.fasta: Fasta file to use as example input