/
Pancreas_facs.Rmd
338 lines (252 loc) · 10.7 KB
/
Pancreas_facs.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
---
title: "Pancreas FACS Notebook"
output: html_notebook
---
Specify the tissue of interest, run the boilerplate code which sets up the functions and environment, load the tissue object.
```{r}
tissue_of_interest = "Pancreas"
library(here)
source(here("00_data_ingest", "02_tissue_analysis_rmd", "boilerplate.R"))
tiss = load_tissue_facs(tissue_of_interest)
```
Visualize top genes in principal components
```{r, echo=FALSE, fig.height=4, fig.width=8}
PCHeatmap(object = tiss, pc.use = 1:3, cells.use = 500, do.balanced = TRUE, label.columns = FALSE, num.genes = 8)
```
Later on (in FindClusters and TSNE) you will pick a number of principal components to use. This has the effect of keeping the major directions of variation in the data and, ideally, supressing noise. There is no correct answer to the number to use, but a decent rule of thumb is to go until the plot plateaus.
```{r}
PCElbowPlot(object = tiss)
```
Choose the number of principal components to use.
```{r}
# Set number of principal components.
n.pcs = 12
```
The clustering is performed based on a nearest neighbors graph. Cells that have similar expression will be joined together. The Louvain algorithm looks for groups of cells with high modularity--more connections within the group than between groups. The resolution parameter determines the scale. Higher resolution will give more clusters, lower resolution will give fewer.
For the top-level clustering, aim to under-cluster instead of over-cluster. It will be easy to subset groups and further analyze them below.
```{r}
# Set resolution
res.used <- 0.5
tiss <- FindClusters(object = tiss, reduction.type = "pca", dims.use = 1:n.pcs,
resolution = res.used, print.output = 0, save.SNN = TRUE)
```
We use TSNE solely to visualize the data.
```{r}
# If cells are too spread out, you can raise the perplexity. If you have few cells, try a lower perplexity (but never less than 10).
tiss <- RunTSNE(object = tiss, dims.use = 1:n.pcs, seed.use = 10, perplexity=30)
```
```{r}
TSNEPlot(object = tiss, do.label = T, pt.size = 1.2, label.size = 4)
```
Check expression of genes useful for indicating cell type. For the islet cells, the mRNA for their specific secretory molecule is a strong signal.
general endocrine: Chga, Isl1
alpha: Gcg, Mafb, Arx,
beta: Ins1, Ins2, Mafa, Nkx6-1, Slc2a2,
gamma: Ppy
delta: Sst, Hhex
epsilon: Ghrl
ductal: Krt19, Hnf1b
immune: Ptprc
stellate: Pdgfra, Pdgfrb
endothelial: Pecam1, Cdh5, Kdr
acinar: Amy2b, Cpa1
other genes of interest: Cpa1, Ptf1a, Neurog3(endocrine progenitor and perhaps adult delta),Pdx1(beta and delta)
```{r, echo=FALSE, fig.height=12, fig.width=12}
genes_to_check = c('Chga', 'Isl1', 'Gcg', 'Mafb', 'Arx', 'Ins1', 'Ins2', 'Mafa', 'Nkx6-1', 'Slc2a2', 'Sst', 'Hhex', 'Pdx1', 'Ppy','Ghrl', 'Krt19', 'Hnf1b', 'Amy2b', 'Cpa1', 'Ptf1a', 'Pdgfra', 'Pdgfrb', 'Pecam1', 'Cdh5', 'Kdr','Ptprc', 'Neurog3')
FeaturePlot(tiss, genes_to_check, pt.size = 1, nCol = 5, cols.use = c("grey", "blue"))
```
Dotplots let you see the intensity of expression and the fraction of cells expressing for each of your genes of interest.
The radius shows you the percent of cells in that cluster with at least one read sequenced from that gene. The color level indicates the average Z-score of gene expression for cells in that cluster, where the scaling is done over all cells in the sample.
```{r, echo=FALSE, fig.height=8, fig.width=10}
DotPlot(tiss, genes_to_check, plot.legend = T, col.max = 2.5, do.return = T) + coord_flip()
```
We can also find all differentially expressed genes marking each cluster. This may take some time.
```{r}
#clust.markers0 <- FindMarkers(object = tiss, ident.1 = 0, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)
#tiss.markers <- FindAllMarkers(object = tiss, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)
```
Display the top markers you computed above.
```{r}
#tiss.markers %>% group_by(cluster) %>% top_n(5, avg_logFC)
```
Using the markers above, we can confidentaly label many of the clusters:
0: beta
3: acinar
4: ductal
6: beta
7: endothelial
8: immune
9: stellate
The abundance of Ppy and Gcg in clusters 1 and 2 makes them seem like mixtures of alpha and gamma cells. The expression of Sst and Hhex in cluster 5
indicates that it might contain many delta cells, but to get a finer resolution, we subset the data and recompute.
We will add those cell_ontology_class to the dataset.
```{r}
tiss <- StashIdent(object = tiss, save.name = "cluster.ids")
cluster.ids <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
free_annotation <- c(
"beta cell",
NA,
NA,
"acinar cell",
"ductal cell",
NA,
"beta cell",
NA,
NA,
"stellate cell")
cell_ontology_class <-c(
"type B pancreatic cell",
NA,
NA,
"pancreatic acinar cell",
"pancreatic ductal cell",
NA,
"type B pancreatic cell",
"endothelial cell",
"leukocyte",
"pancreatic stellate cell")
tiss = stash_annotations(tiss, cluster.ids, free_annotation, cell_ontology_class)
```
## Checking for batch effects
Color by metadata, like plate barcode, to check for batch effects.
```{r}
TSNEPlot(object = tiss, do.return = TRUE, group.by = "plate.barcode")
```
## Subcluster
```{r}
subtiss = SubsetData(tiss, ident.use = c(1,2,5))
```
```{r}
subtiss <- subtiss %>% ScaleData() %>%
FindVariableGenes(do.plot = TRUE, x.high.cutoff = Inf, y.cutoff = 0.5) %>%
RunPCA(do.print = FALSE)
```
```{r}
PCHeatmap(object = subtiss, pc.use = 1:3, cells.use = 500, do.balanced = TRUE, label.columns = FALSE, num.genes = 8)
PCElbowPlot(subtiss)
```
```{r}
sub.n.pcs = 8
sub.res.use = 1
subtiss <- subtiss %>% FindClusters(reduction.type = "pca", dims.use = 1:sub.n.pcs,
resolution = sub.res.use, print.output = 0, save.SNN = TRUE) %>%
RunTSNE(dims.use = 1:sub.n.pcs, seed.use = 10, perplexity=30)
TSNEPlot(object = subtiss, do.label = T, pt.size = 1.2, label.size = 4)
```
```{r, echo=FALSE, fig.height=12, fig.width=8}
FeaturePlot(subtiss, genes_to_check, nCol = 5, cols.use = c("grey", "blue"))
```
```{r, echo=FALSE, fig.height=8, fig.width=10}
DotPlot(subtiss, genes_to_check, col.max = 2.5, plot.legend = T, do.return = T) + coord_flip()
```
```{r}
VlnPlot(subtiss, 'Ppy')
```
```{r}
table(subtiss@ident)
```
Discover new PP marker genes (negative or positive)
```{r}
gamma_markers = FindMarkers(subtiss, ident.1 = c(6,7), ident.2 = c(0,1,2,4), test.use = "roc")
```
```{r}
gamma_markers = FindMarkers(subtiss, ident.1 = c(6,7), ident.2 = c(0,1,2,4), test.use = "wilcox")
```
New markers from this test include
1) negative marker genes, aka. those abscent in PP cells or highly abundant in alpha cells
'Arg1', 'Mafb', 'Gfra3', 'Slc38a5', 'Dpp10','Ang','Irx1',
2) positive marker genes, aka. those abscent in alpha cells or highly abundant in PP cells
'Cd9', 'Spp1', 'Tspan8', 'Folr1','Vsig1'
```{r}
gamma_genes_to_check_neg = c('Arg1', 'Mafb', 'Gfra3', 'Slc38a5', 'Dpp10','Ang','Irx1')
gamma_genes_to_check_pos = c('Cd9', 'Spp1', 'Tspan8', 'Folr1','Vsig1')
```
```{r, echo=FALSE, fig.height=4, fig.width=8}
FeaturePlot(subtiss, gamma_genes_to_check_neg, nCol = 5, cols.use = c("grey", "blue"))
```
```{r}
DotPlot(subtiss, gamma_genes_to_check_neg, col.max = 2.5, plot.legend = T, do.return = T) + coord_flip()
```
```{r, echo=FALSE, fig.height=2, fig.width=8}
FeaturePlot(subtiss, gamma_genes_to_check_pos, nCol = 5, cols.use = c("grey", "blue"))
```
```{r}
DotPlot(subtiss, gamma_genes_to_check_pos, col.max = 2.5, plot.legend = T, do.return = T) + coord_flip()
```
```{r, echo=FALSE, fig.height=6, fig.width=8}
subtiss_genes_to_check = c('Chga', 'Isl1', 'Gcg', 'Mafb', 'Arx', 'Sst', 'Hhex', 'Pdx1', 'Ppy','Ghrl','Gfra3', 'Slc38a5', 'Dpp10','Ang','Irx1','Cd9', 'Spp1', 'Tspan8', 'Folr1','Vsig1')
FeaturePlot(subtiss, subtiss_genes_to_check, nCol = 5, cols.use = c("grey", "blue"))
```
```{r, echo=FALSE, fig.height=8, fig.width=10}
DotPlot(subtiss, subtiss_genes_to_check, col.max = 2.5, plot.legend = T, do.return = T) + coord_flip()
```
From these genes, it appears that the clusters represent:
0: alpha
1: alpha
2: alpha
3: delta
4: alpha
5: delta
6: gamma
7: gamma
The multitude of clusters of each type correspond mostly to individual animals/sexes.
```{r}
table(FetchData(subtiss, c('mouse.id','ident')) %>% droplevels())
```
```{r}
sub.cluster.ids <- c(0, 1, 2, 3, 4, 5, 6, 7)
sub.free_annotation <- c("pancreatic A cell", "pancreatic A cell", "pancreatic A cell", "pancreatic D cell", "pancreatic A cell", "pancreatic D cell", "pancreatic PP cell", "pancreatic PP cell")
sub.cell_ontology_class <-c("pancreatic A cell", "pancreatic A cell", "pancreatic A cell", "pancreatic D cell", "pancreatic A cell", "pancreatic D cell", "pancreatic PP cell", "pancreatic PP cell")
subtiss = stash_annotations(subtiss, sub.cluster.ids, sub.free_annotation, sub.cell_ontology_class)
tiss = stash_subtiss_in_tiss(tiss, subtiss)
```
## Checking for batch effects
Color by metadata, like plate barcode, to check for batch effects.
```{r}
TSNEPlot(object = tiss, do.return = TRUE, group.by = "plate.barcode")
```
# Final coloring
Color by cell ontology class on the original TSNE.
```{r}
TSNEPlot(object = tiss, do.return = TRUE, group.by = "cell_ontology_class")
table(tiss@meta.data[["cell_ontology_class"]])
```
# Save the Robject for later
```{r}
filename = here('00_data_ingest', '04_tissue_robj_generated',
paste0("facs_", tissue_of_interest, "_seurat_tiss.Robj"))
print(filename)
save(tiss, file=filename)
```
```{r}
# To reload a saved object
# filename = here('00_data_ingest', '04_tissue_robj_generated',
# paste0("facs_", tissue_of_interest, "_seurat_tiss.Robj"))
# load(file=filename)
```
## Add column for Neurog3 expression for supplemental figures
```{r}
tiss@meta.data[, 'Neurog3>0_scaled'] = FetchData(tiss, c("Neurog3"), use.raw=FALSE) > 0
tiss@meta.data[, 'Neurog3>0_raw'] = FetchData(tiss, c("Neurog3"), use.raw=TRUE) > 0
additional_cols = c('Neurog3>0_scaled', 'Neurog3>0_raw')
```
# Explore Ppy+ multihormonal cells (to be continued)
```{r}
FetchData(subtiss, c('Ppy', 'Gcg', 'Arx', 'Irx2', 'Mafb', 'mouse.id', 'plate.barcode', 'ident')) %>%
ggplot(aes(x = Ppy, y = Gcg, color = plate.barcode)) + geom_point()
# It would be interesting to fetch the ids of these cells expressing high Gcg as well as Ppy, and examine their genetic signature.
```
```{r}
gammatiss <- RunPCA(subtiss, pc.genes = c('Arg1', 'Mafb', 'Gfra3', 'Slc38a5', 'Dpp10','Ang','Irx1', 'Cd9', 'Spp1', 'Tspan8', 'Folr1','Vsig1'), pcs.compute = 3)
```
```{r}
PCHeatmap(object = gammatiss, pc.use = 1:3, cells.use = 500, do.balanced = TRUE, label.columns = FALSE, num.genes = 8)
```
```{r}
GenePlot(subtiss, 'Vsig1', 'Gfra3')
```
# Export the final metadata
Write the cell ontology and free annotations to CSV.
```{r}
save_annotation_csv(tiss, tissue_of_interest, "facs", additional_cols = additional_cols)
```