EvoCov

EvoCov is a pipeline for analysing SARS-CoV-2 sequences from GISAID. It can be run interactively or with default settings, and uses SARS-CoV-2 sequence information to make an evolutionarily aware estimate of efficient epitopes on the spike protein for antibody design. While the code is open source, the data intended for use with the pipeline may only be obtained with express permission from GISAID. If you have any issues using the pipeline, or suggestions for how it could be improved, please open an issue or start a discussion!

Installation

Clone the GitHub repository to your machine to use the EvoCov package. Before running it, install the Python dependencies listed in requirements.txt from inside the cloned directory.

git clone https://github.com/ciarajudge/EvoCov.git
cd EvoCov
pip install -r requirements.txt
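
To confirm that the dependencies are actually present after installation, a small check along the following lines may help. It is a minimal sketch and not part of EvoCov: the helper name `missing_requirements` is hypothetical, it only tests whether each listed package is installed (version constraints are ignored), and it assumes the requirements file uses plain `name==version` style lines.

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(lines):
    """Return the names of listed packages that are not installed.

    Hypothetical helper: only presence is checked, not version constraints.
    """
    missing = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        result = missing_requirements(req.read_text().splitlines())
        print(result or "All listed packages are installed.")
```

Run it from the root of the cloned repository so that requirements.txt is found.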

Preparation of site-wise mutation rates using Treecov

The prediction step of the pipeline uses site-wise mutation rates estimated from a SARS-CoV-2 phylogenetic tree with baseml, a program from the PAML (Phylogenetic Analysis by Maximum Likelihood) package. To generate these rates, download and compile PAML and place it in the ./treecov/ directory, so that the path to the baseml executable is ./treecov/paml/bin/baseml. The folders must be named exactly as shown. You must also download a phylogenetic tree from GISAID by clicking Audacity on the platform, and place the file global.tree in the ./treecov/ directory. To run the treecov pipeline and generate the rates, navigate to the treecov directory and use the command:

python treepipe.py /absolute/path/to/GISAID/fasta/file

This runs iterative sampling and analysis of the phylogenetic tree 100 times, in 10 batches of 10. The batch size and the number of batches can be adjusted by changing the number of loops in subsampletree.R (batch size) and treepipe.py (number of batches).
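
Because the treecov step depends on exact file locations, it can be worth checking the layout before starting a long run. The sketch below is not part of EvoCov; `check_treecov_layout` is a hypothetical helper that only tests for the paths named above (./treecov/paml/bin/baseml, ./treecov/global.tree) and for an absolute FASTA path.

```python
import os
from pathlib import Path


def check_treecov_layout(treecov_dir, fasta_path):
    """Return a list of problems with the layout; empty if all is in place.

    Hypothetical pre-flight check, based only on the paths named in the README.
    """
    treecov = Path(treecov_dir)
    baseml = treecov / "paml" / "bin" / "baseml"
    tree = treecov / "global.tree"
    fasta = Path(fasta_path)

    problems = []
    if not baseml.is_file():
        problems.append(f"baseml not found at {baseml}")
    elif not os.access(baseml, os.X_OK):
        problems.append(f"baseml at {baseml} is not executable")
    if not tree.is_file():
        problems.append(f"global.tree not found at {tree}")
    if not fasta.is_absolute():
        problems.append(f"FASTA path is not absolute: {fasta}")
    elif not fasta.is_file():
        problems.append(f"FASTA file not found at {fasta}")
    return problems
```

For example, `check_treecov_layout("./treecov", "/absolute/path/to/GISAID/fasta/file")` returns an empty list when everything is where treepipe.py expects it, and a list of readable messages otherwise.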

Default Usage of EvoCov

Navigate to the cloned repository and call the package with the file paths of your latest GISAID unmasked sequence file and metadata file. This starts a default run of the pipeline, with any exceptions or options handled automatically, and ends with the final step in which the results are written to a PDF using R.

python -m evocov /path/to/sequencefile_masked.fa /path/to/metadata.tsv

If you'd like to be notified by text message when the pipeline is complete, pass a valid mobile number as a third argument, without plus signs or brackets. For example, 353877910680 is the country code +353 followed by the phone number 0877910680 with its leading zero dropped.

python -m evocov /path/to/sequencefile_masked.fa /path/to/metadata.tsv 353877910680
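
If the pipeline is launched from a script, the number format can be produced programmatically. The following is a minimal sketch, not part of EvoCov: `to_evocov_number` and `build_command` are hypothetical helpers, and dropping a single leading zero from the national number is an assumption inferred from the example above (it matches Irish numbers but is not correct for every country).

```python
import re
import sys


def to_evocov_number(country_code, national_number):
    """Join country code and national number into a digits-only string.

    Assumption: one leading zero on the national number is a trunk prefix
    and is dropped, as in the README example.
    """
    cc = re.sub(r"\D", "", country_code)          # "+353" -> "353"
    national = re.sub(r"\D", "", national_number)  # strip spaces, brackets, dashes
    if national.startswith("0"):
        national = national[1:]
    if not cc or not national:
        raise ValueError("country code and national number must contain digits")
    return cc + national


def build_command(sequence_file, metadata_file, phone=None):
    """Build the argument list for a default run, with an optional phone number."""
    cmd = [sys.executable, "-m", "evocov", sequence_file, metadata_file]
    if phone is not None:
        cmd.append(phone)
    return cmd
```

The resulting list can be passed to `subprocess.run(..., check=True)` from the root of the cloned repository, for example with `build_command("/path/to/sequencefile_masked.fa", "/path/to/metadata.tsv", to_evocov_number("+353", "0877910680"))`.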

Interactive Usage

Navigate to the cloned repository and call the package using the below command.

python -m evocov

Running the pipeline this way starts an interactive session in which you can choose file names for the output and name the variants you want included in the analysis. After epitope scoring, you will also be given the option to use R to generate an output PDF with the key findings of the pipeline.

Pipeline Structure

Below is a flowchart outlining the rough pipeline structure.

[Image: pipeline flowchart]

Things to note

  1. If the text notification stops working, contact me at judge.ciara@gmail.com or here on GitHub. The text message is sent through a subscription-type service, and I would just need to buy more allowance.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT
