Scripts for "HPiP: an R/Bioconductor package for predicting host-pathogen protein-protein interactions from protein sequences using ensemble machine learning approach"
Despite arduous and time-consuming experimental efforts, protein-protein interactions (PPIs) between many pathogenic microbes and their human host are still unknown, which limits our ability to understand the intricate interactions that occur during infection and to identify therapeutic targets. Because computational tools offer a promising alternative, we developed HPiP (Host-Pathogen Interaction Prediction), an R/Bioconductor software toolkit that combines a series of amino acid sequence property descriptors with an ensemble of machine-learning (ML) classifiers to define the yet unmapped interactions between pathogen and host proteins.
This package requires R version 4.1 or higher. If you are using an older version of R, you will be prompted to upgrade when installing the package.
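To check the version requirement before installing, a minimal sketch in base R could look like the following. The variable names are illustrative only and are not part of HPiP.

```r
# Minimum R version stated for HPiP
required_version <- "4.1.0"

# getRversion() returns the running R version as a comparable object
r_is_recent_enough <- getRversion() >= required_version

if (!r_is_recent_enough) {
  message("R ", getRversion(), " detected; HPiP needs R >= ", required_version)
}
```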
The official release of HPiP is available on Bioconductor. You can install HPiP from Bioconductor using:
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}
BiocManager::install("HPiP")
To install the development version in R, run:
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("mrbakhsh/HPiP")
- The main page: https://github.com/BabuLab-UofR/HPiP
- Vignette: browseVignettes("HPiP")
- Trainingset_priotFC.csv includes SARS-CoV-1-human PPIs. To construct this set, the positive set was retrieved from Gordon, Hiatt, et al., 2020. Negative PPIs were constructed from the positive ones by negative sampling with the get_negativePPI function. Both sets were mixed to construct the labelled training set.
- Testset_priotFC.csv includes SARS-CoV-2-human PPIs. This set contains high-confidence interactions between SARS-CoV-2 and human proteins extracted from Gordon, Hiatt, et al., 2020. Negative PPIs were then constructed from the positive ones by negative sampling with the get_negativePPI function. Both sets were mixed to construct the test set.
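The idea behind negative sampling can be sketched in base R as follows. This is a generic illustration and not the implementation of get_negativePPI: the protein identifiers are placeholders, the 1:1 positive-to-negative ratio is an assumption, and pair_key is a hypothetical helper introduced only for this example.

```r
set.seed(42)  # fixed seed so the sampled negatives are reproducible

# Toy positive set; the identifiers are placeholders, not real accessions
positives <- data.frame(
  pathogen = c("P1", "P1", "P2", "P3"),
  host     = c("H1", "H2", "H3", "H4"),
  stringsAsFactors = FALSE
)

# All possible pathogen-host combinations of proteins seen in the positive set
candidates <- expand.grid(
  pathogen = unique(positives$pathogen),
  host     = unique(positives$host),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one string key per pathogen-host pair
pair_key <- function(df) paste(df$pathogen, df$host, sep = "~")

# Drop every combination that is a known positive interaction
candidates <- candidates[!pair_key(candidates) %in% pair_key(positives), ]

# Sample as many negatives as there are positives (1:1 ratio, an assumption)
negatives <- candidates[sample(nrow(candidates), nrow(positives)), ]

# Mix both classes into one labelled set
positives$class <- "Positive"
negatives$class <- "Negative"
labelled <- rbind(positives, negatives)
```

Pairs sampled this way are only presumed non-interacting, because they are drawn from combinations that are absent from the positive set.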
- Trainingset_priotFC.csv includes SARS-CoV-2-human PPIs. To construct this set, all the positive interactions, including CoV-2-human PPIs deposited in BioGRID, were retrieved using the get_positivePPI function provided in the HPiP package. Negative PPIs were constructed from the positive ones by negative sampling. Both sets were then mixed to construct the labelled training set. Only 20% of the data was used for model construction. This subset was further split into a training set (70%) for model training and a test set (30%) for performance assessment of the classifiers.
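The two-stage partitioning described above (keep 20% of the data, then split that subset 70/30) can be sketched in base R as follows. The toy table, the seed, and the use of simple random sampling without class stratification are assumptions made for illustration; the original scripts may partition the data differently.

```r
set.seed(1)  # fixed seed so the partitions are reproducible

# Toy labelled set standing in for the real PPI table (values are made up)
n <- 1000
labelled <- data.frame(
  pair  = paste0("pair_", seq_len(n)),
  class = rep(c("Positive", "Negative"), each = n / 2),
  stringsAsFactors = FALSE
)

# Step 1: keep 20% of the rows for model construction
keep_idx  <- sample(n, size = round(0.20 * n))
model_set <- labelled[keep_idx, ]

# Step 2: split that subset 70/30 into training and test partitions
train_idx <- sample(nrow(model_set), size = round(0.70 * nrow(model_set)))
train_set <- model_set[train_idx, ]
test_set  <- model_set[-train_idx, ]
```

With 1000 toy rows this yields 200 rows for model construction, of which 140 go to training and 60 to testing, and no pair appears in both partitions.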
- Trainingset_priotFC.csv includes Mtb-human PPIs. To construct this set, all the positive interactions, including Mtb-human PPIs, were retrieved from Penn, Bennett H, et al., 2018, while negative instances were constructed using negative sampling. Both sets were mixed to construct the labelled training set. This set was further split into a training set (70%) for model training and a test set (30%) for performance assessment of the classifiers.
Rahmatbakhsh, M. et al. (2022) HPiP: an R/Bioconductor package for predicting host–pathogen protein–protein interactions from protein sequences using ensemble machine learning approach. Bioinforma. Adv., 2, vbac038.