/
OriginalModelTutorial.Rmd
61 lines (49 loc) · 3.33 KB
/
OriginalModelTutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
---
title: "Evaluating Signatures Using Original Models"
author:
- name: Xutao Wang
affiliation:
- Department of Biostatistics, Boston University, Boston, MA
email: xutaow@bu.edu
package: TBSignatureProfiler
output:
BiocStyle::html_document
vignette: >
%\VignetteIndexEntry{"Evaluating signatures using their original models"}
%\VignetteEncoding{UTF-8}
%\VignetteEngine{knitr::rmarkdown}
editor_options:
chunk_output_type: inline
---
# Introduction
Some of the gene signatures included in the TBSignatureProfiler were originally trained using a machine learning or statistical model. In order to provide an element of completeness to our package, we have included these models for users to run and compare to the methods that serve as the main mechanism of scoring gene signatures in the TBSP.
This vignette provides some examples to allow users to evaluate certain signatures' performance using these original models. Currently, the package has incorporated the original methods for the gene signatures listed in the code chunk below. The specific genes within each biomarker can be found by calling that gene within the `TBsignatures` data object.
```{r, eval=FALSE}
library(TBSignatureProfiler)
signatureOriginalModel <- c("Anderson_42", "Anderson_OD_51", "Kaforou_27",
"Kaforou_OD_44", "Kaforou_OD_53", "Sweeney_OD_3",
"Maertzdorf_4", "Maertzdorf_15", "LauxdaCosta_OD_3",
"Verhagen_10", "Jacobsen_3", "Sambarey_HIV_10",
"Leong_24", "Berry_OD_86", "Berry_393",
"Bloom_OD_144", "Suliman_RISK_4", "Zak_RISK_16",
"Leong_RISK_29", "Zhao_NANO_6")
```
# Evaluation
In this tutorial, we will work with HIV and Tuberculosis (TB) gene expression data in a `SummarizedExperiment` format. First, we evaluate the performance of all available TB gene signatures whose original models have been included in the package by setting `geneSignaturesName = ""`.
```{r, eval = FALSE, message=FALSE, results='hide'}
# HIV/TB gene expression data, included in the package
hivtb_data <- TB_hiv
out <- evaluateOriginalModel(input = hivtb_data, geneSignaturesName = "",
useAssay = "counts")
out$Zak_RISK_16_OriginalModel
```
Users can also evaluate selected gene signatures based on their preference.
```{r, eval = FALSE, message=FALSE, results='hide'}
outSub <- evaluateOriginalModel(input = hivtb_data,
geneSignaturesName = c("Anderson_42", "Sweeney_OD_3",
"Verhagen_10", "Zak_RISK_16"),
useAssay = "counts")
# The predicted score from each signature can be viewed by calling:
colData(outSub)[, paste0(c("Anderson_42", "Sweeney_OD_3", "Verhagen_10", "Zak_RISK_16"), "_OriginalModel")]
```
The returned object is also of the `SummarizedExperiment`. The scores will be returned as a part of the `colData` with column names formatted as "Name_Of_Signature_OriginalModel". The structure of the returned object is the same as the one given by `runTBsigProfiler`. At this point, users may now follow the guidance to using the package given in the main [package vignette](https://wejlab.github.io/TBSignatureProfiler-docs/articles/rmd/TBSig_Vignette.html) for downstream analysis.