/
find_wu1_epitopes.Rmd
138 lines (102 loc) · 4.27 KB
/
find_wu1_epitopes.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
---
title: "Wu1-SARS2 Epitopes"
output: html_notebook
---
# Description
This rmarkdown file takes the pepMeld output, calculates signficance on the sequence level using
a two-sided t-test (COVID+ vs COVID-), and then finds epitopes based upon consecutive probes that have survived a significance and fold-change threshold.
# Requirements
## R Packages
UW.Adult.Covid.19 - Part of this repository (use devtools::install("UW.Adult.Covid.19))
Matrix - https://cran.r-project.org/web/packages/Matrix/index.html
data.table - https://cran.r-project.org/web/packages/data.table/index.html
multtest - https://www.bioconductor.org/packages/release/bioc/html/multtest.html
preprocessCore - http://bioconductor.org/packages/release/bioc/html/preprocessCore.html
Matrix and data.table are in the requirements of the UW.Adult.Covid.19 package and should be installed upon its installation, the Bioconductor packages can be installed manually using the instructions on the respective package's bioconductor webpage.
## Data
Data can be downloaded from: https://dholk.primate.wisc.edu/project/dho/sequencing/Polypeptide%20Microarrays/public/COVID_19/begin.view
From the above site, this script uses:
- aggregated_data/df_stacked.tsv.gz
- all_sequences_except_wi.tsv.gz
Please download these files and place them in the same directory as the script (or provide the paths in the variables below).
The df_stacked.tsv.gz file can also be generated using pepMeld (http://https://github.com/DABAKER165/pepMeld). Please read the documentation at https://dholk.primate.wisc.edu/project/dho/sequencing/Polypeptide%20Microarrays/public/COVID_19/begin.view
## Output
After successful completion of this script, the following files will be output:
epitopes_wu1.csv - Epitopes detected and their probe boundaries with sequence information
epitope_mat_wu1.csv - Average probe signal within each sample epitope. Rows are protein_start_stop, where start is the first probe's with the protein starting position, and stop is the last probe's protein starting position.
```{r}
# Load library
#devtools::install("~/git/github/Ong-Research/UW_Adult_Covid-19/UW.Adult.Covid.19")
library(UW.Adult.Covid.19);
```
``` {r}
##############################
# BEGIN OF SCRIPT
##############################
```
```{r, parameters}
stacked_df_path = "df_stacked.tsv.gz";
probe_meta_path = "all_sequences_except_wi.tsv.gz";
padj_threshold = 0.1;
lfc_threshold = 1;
```
```{r, sequence_matrix}
seq_mat = loadSeqMat(file_name = stacked_df_path);
```
```{r, sample_meta}
sample_meta = attr(seq_mat, "sample_meta");
```
```{r, probe_meta}
probe_meta = loadProbeMeta(probe_meta_path);
probe_meta = probe_meta[probe_meta$PROBE_SEQUENCE %in% rownames(seq_mat),];
```
```{r, quantile_normalize}
seq_mat_qn = UW.Adult.Covid.19::normalizeQuantile(seq_mat[,-1]);
```
```{r, ttest_covid_vs_control}
seq_mat_qn_covid_vs_control_ttest = UW.Adult.Covid.19::doTTest(
seq_mat_qn,
sample_meta$SAMPLE_NAME[sample_meta$COVID_POSITIVE == "NO"],
sample_meta$SAMPLE_NAME[sample_meta$COVID_POSITIVE == "YES"],
var.equal = FALSE,
label.col = "row.names"
)
```
```{r, probe}
probe_mat_qn_covid_vs_control_ttest =
as.data.frame(
UW.Adult.Covid.19::convertSequenceMatToProbeMat(
seq_mat = seq_mat_qn_covid_vs_control_ttest[seq_mat_qn_covid_vs_control_ttest$ID %in% probe_meta$PROBE_SEQUENCE,-1],
probe_meta = probe_meta
)
)
```
```{r, epitopes}
epi = findEpitopesTTestProbe(probe_mat_qn_covid_vs_control_ttest, probe_meta, lfc.threshold = lfc_threshold, pvalue.threshold = padj_threshold);
```
```{r, epitopes_wu1}
epi$Epitope_ID = getEpitopeID(epi$Protein, epi$Start, epi$Stop);
epi_wu1 = epi[grep("Wu1", epi$Protein),]
epi_wu1 = epi_wu1[grep("NC_045512.2;YP_009725295.1;Wu1-SARS2_orf1a", epi_wu1$Protein, invert=TRUE),];
write.csv(epi_wu1, "epitopes_wu1.csv", row.names=TRUE);
```
```{r, probe_mat}
probe_mat = convertSequenceMatToProbeMat(
seq_mat = seq_mat_qn[rownames(seq_mat_qn) %in% probe_meta$PROBE_SEQUENCE,],
probe_meta = probe_meta
);
```
```{r}
epitope_mat_wu1 = getEpitopeMat(probe_mat = probe_mat, epitopes_df = epi_wu1);
write.csv(epitope_mat_wu1, "epitopes_mat_wu1.csv", row.names=TRUE);
```
```{r}
##############################
# END OF SCRIPT
##############################
cat("END OF SCRIPT!\n");
```
```{r results='asis'}
#Print out session info
sessionInfo()
```