In [1]:
"""
The purpose of this Jupyter notebook is to evaluate the performance of
xCAPT5 without XGBoost, which serves as one of the three published
benchmark models.
"""


In [2]:
import sys
# Jupyter notebooks exhibit some peculiarities; one of them is that the
# present/current working directory is always the directory that was
# opened in VS Code; another one is that importing from directories
# above the Jupyter notebook's one fails, be it via a direct or a
# relative import
# Therefore, the directory containing the desired code has to be added
# to the module search path explicitly (this step is necessary in order
# to import functionalities housed by `evaluation_utils.py`)
sys.path.append("../..")
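Executing the cell above repeatedly appends `"../.."` to `sys.path` once per run, and a relative entry like this depends on the working directory rather than pointing at one fixed location. A slightly more defensive sketch is shown below. The helper name `add_to_sys_path` is hypothetical, and the sketch still assumes, like the original cell, that `evaluation_utils.py` lives two levels above the current working directory.

```python
import sys
from pathlib import Path


def add_to_sys_path(relative_dir: str = "../..") -> Path:
    """Hypothetical helper: resolve `relative_dir` against the current
    working directory and append it to `sys.path` at most once."""
    # Turn the relative entry into an absolute, normalised path
    target = (Path.cwd() / relative_dir).resolve()
    # Re-running a notebook cell would otherwise append the same entry again
    if str(target) not in sys.path:
        sys.path.append(str(target))
    return target


repo_root = add_to_sys_path("../..")
print(repo_root)
```

Appending (rather than prepending) mirrors the original cell, so installed packages keep priority over same-named local modules.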

In [3]:
import importlib

import evaluation_utils
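Because a module is imported only once per kernel session, edits made to `evaluation_utils.py` afterwards are not seen until the module is reloaded, which is what the `importlib` import in this cell is used for later on. The sketch below checks this behaviour in isolation: it writes a throwaway module (the name `demo_utils` and its `THRESHOLD` attribute are hypothetical) to a temporary directory, imports it, rewrites the file and reloads it. It prints `0.5` and then `0.75`. IPython's `autoreload` extension (`%load_ext autoreload` followed by `%autoreload 2`) is an alternative that reloads modules automatically before each cell is executed.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Do not write .pyc files, so that a rewrite of the source within the same
# second cannot be shadowed by cached bytecode (process-wide setting)
sys.dont_write_bytecode = True

tmp_dir = Path(tempfile.mkdtemp())
module_file = tmp_dir / "demo_utils.py"
module_file.write_text("THRESHOLD = 0.5\n")

sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()  # make the freshly created file discoverable
import demo_utils

print(demo_utils.THRESHOLD)

# Simulate editing the external file while the kernel keeps running
module_file.write_text("THRESHOLD = 0.75\n")
# reload() re-executes the module's code in the existing module object
demo_utils = importlib.reload(demo_utils)
print(demo_utils.THRESHOLD)
```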

In [4]:
# In the case of xCAPT5, only the predicted probabilities, but not the
# corresponding labels, are stored in the output file
# Therefore, as a first step, a `label` column is added to each output
# file
# A threshold of 0.5 is applied, i.e. a PPI having a probability of at
# least 0.5 is predicted to occur and is assigned a label of 1;
# conversely, PPIs with a probability below 0.5 are predicted not to
# occur and are assigned a label of 0

evaluation_utils.add_labels_based_on_probs(
    "xCAPT5_interaction_probs_VACV_WR_10-fold_cross-val_test_set_{i}_without_XGBoost.tsv",
    pred_col_name="interaction_probability",
    n_fold=10
)

The file xCAPT5_interaction_probs_VACV_WR_10-fold_cross-val_test_set_0_without_XGBoost.tsv already comprises a `label` column.
The file xCAPT5_interaction_probs_VACV_WR_10-fold_cross-val_test_set_1_without_XGBoost.tsv already comprises a `label` column.
The file xCAPT5_interaction_probs_VACV_WR_10-fold_cross-val_test_set_2_without_XGBoost.tsv already comprises a `label` column.
The file xCAPT5_interaction_probs_VACV_WR_10-fold_cross-val_test_set_3_without_XGBoost.tsv already comprises a `label` column.
The file xCAPT5_interaction_probs_VACV_WR_10-fold_cross-val_test_set_4_without_XGBoost.tsv already comprises a `label` column.
The file xCAPT5_interaction_probs_VACV_WR_10-fold_cross-val_test_set_5_without_XGBoost.tsv already comprises a `label` column.
The file xCAPT5_interaction_probs_VACV_WR_10-fold_cross-val_test_set_6_without_XGBoost.tsv already comprises a `label` column.
The file xCAPT5_interaction_probs_VACV_WR_10-fold_cross-val_test_set_7_without_XGBoost.tsv already comprises a 
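The thresholding rule described in the comments of the cell above can be reproduced with the standard library alone. The function below is a hypothetical stand-in for `evaluation_utils.add_labels_based_on_probs`, whose actual implementation is not shown in this notebook. It assumes tab-separated files, a file-name template with an `{i}` placeholder for the fold index, and that files which already contain a `label` column are skipped, which is consistent with the messages printed above.

```python
import csv
from pathlib import Path


def add_labels_based_on_probs_sketch(
    path_template: str,
    pred_col_name: str = "interaction_probability",
    n_fold: int = 10,
    threshold: float = 0.5,
) -> None:
    """Hypothetical re-implementation: add a `label` column that is 1 if
    the probability is >= `threshold` and 0 otherwise, fold by fold."""
    for i in range(n_fold):
        path = Path(path_template.format(i=i))
        with path.open(newline="") as f:
            reader = csv.DictReader(f, delimiter="\t")
            fieldnames = list(reader.fieldnames or [])
            rows = list(reader)
        # Leave files untouched that have already been processed
        if "label" in fieldnames:
            print(f"The file {path.name} already comprises a `label` column.")
            continue
        for row in rows:
            # "At least 0.5" maps to label 1, everything below to label 0
            row["label"] = int(float(row[pred_col_name]) >= threshold)
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(
                f,
                fieldnames=fieldnames + ["label"],
                delimiter="\t",
                lineterminator="\n",
            )
            writer.writeheader()
            writer.writerows(rows)
```

Because the label column is only added when it is missing, the function is idempotent: re-running the cell leaves already labelled files unchanged.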

In [6]:
# Yet another peculiarity exhibited by Jupyter notebooks is that once an
# external file has been loaded, it isn't automatically reloaded
# This renders code development cumbersome, as the Python kernel would
# have to be restarted each time changes are introduced to the external
# file
# Fortunately, there is a more sophisticated solution to this issue,
# which consists of reloading the respective external file by means of
# the built-in `importlib` module
importlib.reload(evaluation_utils)

evaluation_utils.evaluation_k_fold_cross_val(
    ground_truth_path=(
        "/Users/jacobanter/Documents/Code/VACV_screen/"
        "HVIDB_pos_instances_with_nucleolus_neg_instances/VACV_WR_pos_and"
        "_nucleolus_prots_neg_PPI_instances.tsv"
    ),
    splits_path=(
        "xCAPT5_interaction_probs_VACV_WR_10-fold_cross-val_"
        "test_set_{i}_without_XGBoost.tsv"
    ),
    n_fold=10,
    probability_key="interaction_probability",
    model_name="xCAPT5 without XGBoost",
    output_path=(
        "xCAPT5_without_XGBoost_results_10-fold_cross-"
        "validation_on_combined_VACV_WR_data_set_without_training.txt"
    )
)

Using 10-fold cross-validation, the metrics for xCAPT5 without XGBoost are as follows:
Accuracy:      0.6432371023790431 ± 0.06186294045717659
Precision:     0.6048990856200923 ± 0.10734006770924984
Recall:        0.707400286757122 ± 0.06632584787404003
F1-score:      0.6476739928885946 ± 0.07935969370594026
Specificity:   0.5891093662345321 ± 0.07082521379945325
ROC AUC score: 0.6352982596561614 ± 0.08957027520962184


((0.6432371023790431, 0.06186294045717659),
 (0.6048990856200923, 0.10734006770924984),
 (0.707400286757122, 0.06632584787404003),
 (0.6476739928885946, 0.07935969370594026),
 (0.5891093662345321, 0.07082521379945325),
 (0.6352982596561614, 0.08957027520962184))
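Each figure above is a mean over the ten test folds followed by a spread, and the returned tuple lists the same (mean, spread) pairs in the order accuracy, precision, recall, F1-score, specificity, ROC AUC. A minimal sketch of how such per-fold metrics could be computed and aggregated is given below; it is not the implementation of `evaluation_utils.evaluation_k_fold_cross_val`, which is not shown here. The function names are hypothetical. The sketch assumes binary labels in {0, 1} and the same 0.5 threshold, computes ROC AUC as the fraction of positive/negative pairs that are ranked correctly (equivalent to the normalised Mann-Whitney U statistic), and reports the spread as the population standard deviation (`statistics.pstdev`, matching NumPy's default `ddof=0`). Whether the original code uses the population or the sample standard deviation is not stated.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count 1/2); O(n_pos * n_neg), kept simple for clarity."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs both classes in every fold")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, ...]:
    """Accuracy, precision, recall, F1, specificity and ROC AUC for one fold."""
    y_pred = [int(s >= threshold) for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (accuracy, precision, recall, f1, specificity, roc_auc(y_true, scores))


def aggregate(
    folds: Sequence[Tuple[Sequence[int], Sequence[float]]]
) -> List[Tuple[float, float]]:
    """(mean, population std) of every metric across folds."""
    per_fold = [fold_metrics(y, s) for y, s in folds]
    # Transpose: one tuple per metric, holding one value per fold
    return [(mean(values), pstdev(values)) for values in zip(*per_fold)]
```

For instance, with two toy folds, `aggregate([([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]), ([1, 1, 0, 0], [0.8, 0.7, 0.3, 0.1])])` yields an accuracy of 0.75 ± 0.25 and a ROC AUC of 0.875 ± 0.125.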