A framework for specific design of mRNA vaccine targets.
VaxCollapse requires a working Python installation. You can either install the official distribution, install the conda manager (recommended), or use an existing system installation (most recent Linux distributions should provide a sufficient environment).
VaxCollapse has been tested only on Python 3.9; however, it may work on other versions as well.
Please follow these steps to set up the pipeline:
- Obtain the latest version of the package by either:
  - Cloning this repository and cd-ing into it:
      git clone https://github.com/apalkowski/vaxcollapse.git
      cd ./vaxcollapse
    or
  - Downloading the contents of this repository, unzipping them, and cd-ing into the resulting directory:
      wget -O vaxcollapse-main.zip https://github.com/apalkowski/vaxcollapse/archive/refs/heads/main.zip
      unzip vaxcollapse-main.zip
      cd ./vaxcollapse-main
- Install dependencies. It is recommended to create a Python virtual environment or use the conda manager to prevent conflicts with your system's Python environment.
    pip install -r requirements.txt
- Download and install at least one of the inference tools listed in the Obtaining Immunological Features section.
At a minimum, VaxCollapse requires three files as input:
- FASTA file with protein sequences.
- FASTA file with coding region sequences (CDS) of the above proteins.
- Text table file describing a specific immunological feature of the above proteins, produced by one of the supported models.
The third input can be extended to several such files, each describing a different immunological feature, whether related to a different immune mechanism (e.g., MHC class I antigen presentation or B-cell epitope detection) or to genetic variation (e.g., alternative MHC alleles).
Because of current limitations, VaxCollapse requires specifically tailored sequence files as input. Make sure that they meet the following conditions:
- Sequences represent the same type of protein.
- Sequences are of the same length (within one file).
- Related protein and CDS sequences must have the same FASTA header.
- CDS and protein sequences must correspond 1:1 under translation, i.e., each codon in a CDS encodes exactly one amino acid of the matching protein.
Hence, you should prepare a FASTA file of same-length protein sequences, e.g.:
>sequence_1
MFVFLVLLP
>sequence_2
LKGVKLHYT
>sequence_3
FDEDDSEPV
and a corresponding CDS FASTA file:
>sequence_1
ATGTTTGTTTTTCTTGTTTTATTGCCA
>sequence_2
CTCAAAGGAGTCAAATTACATTACACA
>sequence_3
TTTGATGAAGACGACTCTGAGCCAGTG
Important note: make sure that sequence IDs (FASTA headers) are as short as possible and contain no spaces, because some inference software may truncate them in its results files, which would make VaxCollapse reporting impossible.
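The input conditions listed above can be checked before running any inference software. The following is a minimal sketch, not part of VaxCollapse itself: the helper names `parse_fasta`, `translate`, and `check_inputs` are hypothetical, and the check assumes the standard genetic code with no trailing stop codon in the CDS, as in the example files above.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# CDSs use the standard code and do not end with a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def parse_fasta(text):
    """Return {header: sequence} from FASTA-formatted text."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def translate(cds):
    """Translate a CDS codon by codon; unknown or partial codons become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds), 3))


def check_inputs(proteins, cds):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if set(proteins) != set(cds):
        problems.append("protein and CDS files have different FASTA headers")
    if len({len(s) for s in proteins.values()}) > 1:
        problems.append("protein sequences differ in length")
    if len({len(s) for s in cds.values()}) > 1:
        problems.append("CDS sequences differ in length")
    for header, seq in proteins.items():
        if any(ch.isspace() for ch in header):
            problems.append(f"{header!r}: header contains whitespace")
        if header in cds and translate(cds[header]) != seq:
            problems.append(f"{header!r}: CDS does not translate 1:1 to the protein")
    return problems


# Demo on the first example record shown above.
proteins = parse_fasta(">sequence_1\nMFVFLVLLP\n")
cds = parse_fasta(">sequence_1\nATGTTTGTTTTTCTTGTTTTATTGCCA\n")
print(check_inputs(proteins, cds))
```

An empty list means that none of the checked conditions was violated. The first condition (same type of protein) cannot be verified automatically this way.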
VaxCollapse currently supports protein features inferred by the following models:
- BepiPred v3.0b
- NetMHCpan v4.1b
- NetMHCIIpan v4.3b
It is important to use the same protein FASTA file (with the same sequence headers) as input to all supported models within one analysis pipeline.
Prediction of potential B-cell epitopes from protein sequence.
To read a detailed description of the tool and its terms of use, and to access the online or standalone versions, please refer to the BepiPred server website.
For VaxCollapse, use the standalone version, as the online server may produce a different results format. VaxCollapse currently supports only linear epitope prediction.
An example command to produce inference results should look like the following:
python bepipred3_CLI.py -i <PROTEINS.FASTA> -o <OUTPUT_DIR> -pred vt_pred -plot_linear_epitope_scores
The input file for a BepiPred-included analysis should be raw_output.csv, residing in the output directory <OUTPUT_DIR> given as an argument. The table within should have the following structure:
Accession | Residue | BepiPred-3.0 score | BepiPred-3.0 linear epitope score |
---|---|---|---|
sequence_1 | M | 0.0239487458020449 | 0.0113448531677326 |
sequence_1 | F | 0.023962065577507 | 0.0128710796642635 |
sequence_1 | V | 0.0242273863404989 | 0.0140518152879344 |
... | ... | ... | ... |
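To inspect such a table before passing it to VaxCollapse, it can be loaded with the Python standard library. The sketch below assumes that raw_output.csv is comma-separated and that its header row matches the column names shown above exactly; `read_bepipred_scores` is a hypothetical helper, not a VaxCollapse function.

```python
import csv
import io
from collections import defaultdict


def read_bepipred_scores(handle, column="BepiPred-3.0 linear epitope score"):
    """Return {accession: [score, ...]} with scores kept in residue order."""
    scores = defaultdict(list)
    for row in csv.DictReader(handle):
        scores[row["Accession"]].append(float(row[column]))
    return dict(scores)


# Demo on the first rows of the table above; for a real file, use
# open("<OUTPUT_DIR>/raw_output.csv", newline="") instead of io.StringIO.
sample = io.StringIO(
    "Accession,Residue,BepiPred-3.0 score,BepiPred-3.0 linear epitope score\n"
    "sequence_1,M,0.0239487458020449,0.0113448531677326\n"
    "sequence_1,F,0.023962065577507,0.0128710796642635\n"
)
for accession, values in read_bepipred_scores(sample).items():
    print(accession, len(values), max(values))
```

The number of scores per accession should equal the protein length, which gives a quick check that headers were not truncated by the tool.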
Prediction of pan-specific binding of peptides to MHC class I alleles.
To read a detailed description of the tool and its terms of use, and to access the online or standalone versions, please refer to the NetMHCpan server website.
For VaxCollapse, use the standalone version, as the online server may produce a different results format. Produce one results file per allele and peptide length. VaxCollapse supports only whole-protein-based input.
An example command to produce inference results should look like the following:
./netMHCpan -f <PROTEINS.FASTA> -l <PEPTIDE_LENGTH> -xls -xlsfile <OUTPUT_TABLE.TSV> -a <ALLELE_NAME>
The output file <OUTPUT_TABLE.TSV> can be used as one of the inputs to a NetMHCpan-included analysis. The table within should have the following structure:
HLA-A01:01 | ||||||||
---|---|---|---|---|---|---|---|---|
Pos | Peptide | ID | core | icore | EL-score | EL_Rank | Ave | NB |
0 | MFVFLVLLP | sequence_1 | MFVFLVLLP | MFVFLVLLP | 0.0001 | 68.3333 | 0.0001 | 0 |
1 | FVFLVLLPL | sequence_1 | FVFLVLLPL | FVFLVLLPL | 0.0002 | 43.9091 | 0.0002 | 0 |
2 | VFLVLLPLV | sequence_1 | VFLVLLPLV | VFLVLLPLV | 0.0002 | 46.25 | 0.0002 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
You may produce and use multiple results files for one protein set, each for a different supported MHC class I allele.
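A results table of this shape can be read with a few lines of Python, for example to confirm that the sequence IDs survived intact. This is a sketch under stated assumptions: the file is taken to be tab-separated, with the allele name somewhere on the first line and the column names on the second, as in the table above. `read_netmhcpan_table` is a hypothetical helper, and the exact position of the allele name in the sample is a guess.

```python
import csv
import io


def read_netmhcpan_table(handle):
    """Return (allele, rows) from a table written with -xls -xlsfile.

    Assumes tab-separated text: line 1 carries the allele name,
    line 2 the column names, and the remaining lines one peptide each.
    """
    reader = csv.reader(handle, delimiter="\t")
    first = next(reader)
    allele = next(cell.strip() for cell in first if cell.strip())
    columns = next(reader)
    rows = [dict(zip(columns, values)) for values in reader if values]
    return allele, rows


# Demo on the first row of the table above (allele column position assumed).
sample = io.StringIO(
    "\t\t\tHLA-A01:01\t\t\t\t\t\n"
    "Pos\tPeptide\tID\tcore\ticore\tEL-score\tEL_Rank\tAve\tNB\n"
    "0\tMFVFLVLLP\tsequence_1\tMFVFLVLLP\tMFVFLVLLP\t0.0001\t68.3333\t0.0001\t0\n"
)
allele, rows = read_netmhcpan_table(sample)
best = min(rows, key=lambda r: float(r["EL_Rank"]))
print(allele, best["ID"], best["Peptide"], best["EL_Rank"])
```

Values are kept as strings and converted only where needed, so the same reader works regardless of which score columns a given table contains.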
Prediction of pan-specific binding of peptides to MHC class II alleles.
To read a detailed description of the tool and its terms of use, and to access the online or standalone versions, please refer to the NetMHCIIpan server website.
For VaxCollapse, use the standalone version, as the online server may produce a different results format. Produce one results file per allele and peptide length. VaxCollapse supports only whole-protein-based input.
An example command to produce inference results should look like the following:
./netMHCIIpan -f <PROTEINS.FASTA> -length <PEPTIDE_LENGTH> -inptype 0 -xls -xlsfile <OUTPUT_TABLE.TSV> -a <ALLELE_NAME>
The output file <OUTPUT_TABLE.TSV> can be used as one of the inputs to a NetMHCIIpan-included analysis. The table within should have the following structure:
DRB1_0301 | |||||||||
---|---|---|---|---|---|---|---|---|---|
Pos | Peptide | ID | Target | Core | Inverted | Score | Rank | Ave | NB |
1 | MFVFLVLLPLVSSQC | sequence_1 | NA | LVLLPLVSS | 0 | 0.000041 | 90.740738 | 0.000041 | 0 |
2 | FVFLVLLPLVSSQCV | sequence_1 | NA | LVLLPLVSS | 0 | 0.000059 | 88.25 | 0.000059 | 0 |
3 | VFLVLLPLVSSQCVN | sequence_1 | NA | LLPLVSSQC | 0 | 0.00008 | 85.625 | 0.00008 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
You may produce and use multiple results files for one protein set, each for a different supported MHC class II allele.
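Since one results file is needed per allele and peptide length, the individual commands can be generated programmatically. The sketch below only builds the argument lists, using the flags from the example command above. `netmhciipan_commands`, the output naming scheme, the input file name, and the second allele name are illustrative assumptions.

```python
import shlex
from itertools import product


def netmhciipan_commands(fasta, alleles, lengths, executable="./netMHCIIpan"):
    """Yield (argv, output_path) pairs, one per allele and peptide length."""
    for allele, length in product(alleles, lengths):
        output = f"{allele}_{length}.tsv"  # hypothetical naming scheme
        argv = [executable, "-f", fasta, "-length", str(length),
                "-inptype", "0", "-xls", "-xlsfile", output, "-a", allele]
        yield argv, output


# Demo: print the commands instead of running them.
for argv, output in netmhciipan_commands(
        "proteins.fasta", ["DRB1_0301", "DRB1_0101"], [15]):
    print(shlex.join(argv))
```

Each argument list can be passed to `subprocess.run(argv, check=True)` to execute the tool, assuming the standalone NetMHCIIpan is installed in the working directory.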
- Add support for more immunological models.
- Add support for different-length sequences (alignment?).
- Change the core model to a network-based one.
If you use VaxCollapse in a scientific publication, please cite:
@Article{VaxCollapseX,
author = {Palkowski, Aleksander},
title = {{VaxCollapse: A framework for specific design of mRNA vaccine targets}},
journal = {X},
year = {X},
volume = {X},
number = {X},
pages = {X--X},
doi = {X}
}
VaxCollapse uses and/or references the following external libraries, packages, and other software:
We wish to thank all their contributors and maintainers!
Copyright 2023 Aleksander Pałkowski.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
The software, libraries, or code from third parties mentioned in the Acknowledgements section above may come with their own terms and conditions or licensing requirements. When using this third-party software, libraries, or code, it's essential to adhere to these terms. Ensure you understand and can follow any relevant restrictions or terms and conditions prior to using them.