---
title: "Trachea FACS Notebook"
output: html_notebook
---
Specify the tissue of interest, run the boilerplate code that sets up the functions and environment, and load the tissue object.
```{r}
tissue_of_interest = "Trachea"
library(here)
source(here("00_data_ingest", "02_tissue_analysis_rmd", "boilerplate.R"))
tiss = load_tissue_facs(tissue_of_interest)
```
Visualize the top genes in the first few principal components.
```{r, echo=FALSE, fig.height=4, fig.width=8}
PCHeatmap(object = tiss, pc.use = 1:3, cells.use = 500, do.balanced = TRUE, label.columns = FALSE, num.genes = 8)
```
Later on (in `FindClusters` and `RunTSNE`) you will pick a number of principal components to use. This keeps the major directions of variation in the data and, ideally, suppresses noise. There is no single correct number to use, but a decent rule of thumb is to keep adding components until the elbow plot plateaus.
```{r}
PCElbowPlot(object = tiss)
```
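The plateau rule of thumb can be illustrated outside Seurat. The sketch below uses base R's `prcomp` on simulated data; the matrix `toy_expr`, the three planted signals, and the 0.3 cutoff are all assumptions made for illustration, not values from this analysis. It keeps components up to the last large relative drop in standard deviation.

```{r}
# Simulated data: 200 "cells" x 30 "genes" driven by 3 strong signals plus noise.
set.seed(1)
toy_cells <- 200
toy_genes <- 30
toy_signal <- matrix(rnorm(toy_cells * 3), toy_cells, 3) %*% diag(c(10, 6, 4)) %*%
  matrix(rnorm(3 * toy_genes), 3, toy_genes)
toy_expr <- toy_signal + matrix(rnorm(toy_cells * toy_genes), toy_cells, toy_genes)

# Standard deviation of each principal component (the quantity an elbow plot shows).
toy_sdev <- prcomp(toy_expr, center = TRUE)$sdev

# Relative drop in standard deviation from each component to the next.
toy_rel_drop <- -diff(toy_sdev) / head(toy_sdev, -1)

# Keep components up to the last large drop; 0.3 is an arbitrary illustrative cutoff.
toy_n_pcs <- max(which(toy_rel_drop > 0.3))
toy_n_pcs
```

On real data the plateau is usually more gradual than in this simulation, so the number is still best chosen by eye from the elbow plot.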
Choose the number of principal components to use.
```{r}
n.pcs = 16
```
The clustering is performed on a nearest-neighbor graph, in which cells with similar expression are joined together. The Louvain algorithm looks for groups of cells with high modularity, meaning more connections within the group than between groups. The resolution parameter determines the scale: higher resolution gives more clusters, lower resolution gives fewer.
For the top-level clustering, aim to under-cluster instead of over-cluster. It will be easy to subset groups and further analyze them below.
```{r}
# Set resolution
res.used <- 0.2
tiss <- FindClusters(object = tiss, reduction.type = "pca", dims.use = 1:n.pcs,
                     resolution = res.used, print.output = 0, save.SNN = TRUE, force.recalc = TRUE)
```
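To make modularity and the resolution parameter concrete, here is a minimal base R sketch on a toy graph. It uses the common generalized modularity in which a resolution multiplier `gamma` scales the expected-edge term; whether Seurat's implementation uses exactly this form is an assumption, and `modularity_q` and the toy graph are hypothetical names introduced here.

```{r}
# Generalized modularity of a partition of an undirected, unweighted graph.
# adj: symmetric 0/1 adjacency matrix; membership: community label per node;
# gamma: resolution multiplier on the expected-edge term (gamma = 1 is standard).
modularity_q <- function(adj, membership, gamma = 1) {
  degree <- rowSums(adj)
  two_m <- sum(adj)  # twice the number of edges
  same_group <- outer(membership, membership, "==")
  sum((adj - gamma * outer(degree, degree) / two_m) * same_group) / two_m
}

# Toy graph: two triangles (nodes 1-3 and 4-6) joined by the single edge 3-4.
toy_edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
toy_adj <- matrix(0, 6, 6)
toy_adj[toy_edges] <- 1
toy_adj[toy_edges[, 2:1]] <- 1  # make the matrix symmetric

split_groups <- c(1, 1, 1, 2, 2, 2)  # one community per triangle
merged_group <- rep(1, 6)            # all nodes in a single community

toy_modularity <- data.frame(
  gamma  = c(1, 0.2),
  split  = c(modularity_q(toy_adj, split_groups, 1),
             modularity_q(toy_adj, split_groups, 0.2)),
  merged = c(modularity_q(toy_adj, merged_group, 1),
             modularity_q(toy_adj, merged_group, 0.2))
)
toy_modularity
```

At `gamma = 1` the two-triangle split scores higher than the merged partition, while at `gamma = 0.2` the merged partition wins, which is the sense in which a lower resolution yields fewer clusters.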
To visualize the clusters, embed the cells in two dimensions with t-SNE.
```{r}
# If cells are too spread out, you can raise the perplexity. If you have few cells, try a lower perplexity (but never less than 10).
tiss <- RunTSNE(object = tiss, dims.use = 1:n.pcs, seed.use = 10, perplexity=15)
```
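When a tissue has few cells, the perplexity has to shrink with it. The sketch below assumes that `RunTSNE` delegates to the `Rtsne` package and that `Rtsne` requires `3 * perplexity <= n_cells - 1`; both are assumptions worth checking against the installed versions. The helper `cap_perplexity` and its defaults are hypothetical.

```{r}
# Hypothetical helper: cap a preferred perplexity at the largest value allowed
# under the assumed constraint 3 * perplexity <= n_cells - 1.
cap_perplexity <- function(n_cells, preferred = 15, minimum = 10) {
  allowed <- floor((n_cells - 1) / 3)
  chosen <- min(preferred, allowed)
  if (chosen < minimum) {
    warning("Too few cells for a perplexity of at least ", minimum)
  }
  chosen
}

cap_perplexity(1000)  # plenty of cells: the preferred value is kept
cap_perplexity(40)    # few cells: capped at floor(39 / 3)
```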
```{r}
# Set do.label = TRUE to label each cluster on the plot.
TSNEPlot(object = tiss, do.label = TRUE)
```
Check expression of genes of interest.
```{r, echo=FALSE, fig.height=12, fig.width=8}
genes_to_check = c('Epcam', 'Cdh1', 'Krt5', 'Scgb1a1', 'Pdgfrb', 'Pdgfra', 'Col1a1', 'Col8a1', 'Foxj1', 'Pecam1', 'Ptprc')
FeaturePlot(tiss, genes_to_check, pt.size = 1, nCol = 3)
```
Dot plots let you see the intensity of expression and the fraction of cells expressing each of your genes of interest.
```{r, echo=FALSE, fig.height=4, fig.width=8}
# To change the y-axis to show raw counts, add use.raw = TRUE.
DotPlot(tiss, genes_to_check, plot.legend = TRUE, cols.use = c("green","red"))
```
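The two quantities a dot plot encodes can be computed by hand. The sketch below does so on a made-up count matrix in base R; `toy_counts`, `toy_clusters`, and the gene names are hypothetical, and this is not Seurat's internal code, which may additionally normalize or scale the averages.

```{r}
# Made-up counts: 2 genes (rows) x 6 cells (columns).
toy_counts <- matrix(
  c(5, 3, 0, 0, 0, 1,
    0, 0, 2, 4, 6, 0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GeneA", "GeneB"), paste0("cell", 1:6))
)
toy_clusters <- factor(c("c0", "c0", "c0", "c1", "c1", "c1"))

# Average expression of each gene within each cluster (dot color).
toy_avg_expr <- t(apply(toy_counts, 1, function(g) tapply(g, toy_clusters, mean)))

# Fraction of cells in each cluster with nonzero expression (dot size).
toy_pct_expr <- t(apply(toy_counts, 1, function(g) tapply(g > 0, toy_clusters, mean)))

toy_avg_expr
toy_pct_expr
```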
How big are the clusters?
```{r}
table(tiss@ident)
```
Which markers identify a specific cluster?
```{r}
clust.markers <- FindMarkers(object = tiss, ident.1 = 0, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)
```
```{r}
print(x = head(x = clust.markers, n = 10))
```
You can also compute all markers for all clusters at once. This may take some time.
```{r}
#tiss.markers <- FindAllMarkers(object = tiss, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)
```
Display the top markers you computed above.
```{r}
#tiss.markers %>% group_by(cluster) %>% top_n(5, avg_logFC)
```
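For readers unfamiliar with the grouped `top_n` idiom, here is an equivalent selection in base R on a made-up marker table. The data frame `toy_markers`, its gene names, and its values are invented for illustration; only the column names `cluster` and `avg_logFC` mirror the commented-out code above.

```{r}
# Made-up marker table with the same column names used above.
toy_markers <- data.frame(
  cluster   = c(0, 0, 0, 1, 1, 1),
  gene      = c("g1", "g2", "g3", "g4", "g5", "g6"),
  avg_logFC = c(2.1, 1.4, 0.9, 1.8, 2.5, 0.7),
  stringsAsFactors = FALSE
)

# Sort by cluster, then by decreasing fold change within each cluster.
toy_sorted <- toy_markers[order(toy_markers$cluster, -toy_markers$avg_logFC), ]

# Keep the top 2 rows of each cluster.
toy_top <- do.call(rbind, lapply(split(toy_sorted, toy_sorted$cluster), head, n = 2))
toy_top
```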
## Assigning cell type identity to clusters
At a coarse level, we can use canonical markers to match the unbiased clustering to known cell types:
```{r}
# stash current cluster IDs
tiss <- StashIdent(object = tiss, save.name = "cluster.ids")
# enumerate current cluster IDs and the labels for them
cluster.ids <- c(0, 1, 2, 3, 4, 5, 6, 7)
free_annotation = c(NA,NA,NA,NA,NA,NA,NA, NA)
cell_ontology_class <- c("mesenchymal cell","mesenchymal cell","mesenchymal cell","epithelial cell","blood cell","endothelial cell","blood cell", "epithelial cell")
tiss = stash_annotations(tiss, cluster.ids, free_annotation, cell_ontology_class)
TSNEPlot(object = tiss, do.label = TRUE, pt.size = 0.5, group.by='cell_ontology_class')
```
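The helper `stash_annotations` comes from the boilerplate and is not shown here. Conceptually, annotating clusters is a lookup from each cell's cluster ID to a label, which can be sketched in base R as follows. The per-cell vector `toy_cluster_of_cell` is invented, and this is an illustration of the mapping, not the helper's actual implementation.

```{r}
# Cluster IDs and labels, in matching order (labels as used in this notebook).
toy_ids <- c(0, 1, 2, 3, 4, 5, 6, 7)
toy_labels <- c("mesenchymal cell", "mesenchymal cell", "mesenchymal cell",
                "epithelial cell", "blood cell", "endothelial cell",
                "blood cell", "epithelial cell")

# The two vectors must line up one-to-one.
stopifnot(length(toy_ids) == length(toy_labels))

# Invented cluster assignment for six cells.
toy_cluster_of_cell <- c(0, 3, 4, 5, 3, 1)

# Look up each cell's label by its cluster ID.
toy_cell_labels <- toy_labels[match(toy_cluster_of_cell, toy_ids)]
toy_cell_labels
```

Checking that the ID and label vectors have the same length before mapping guards against a silent misalignment if a cluster is added or removed after re-clustering.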
## Checking for batch effects
Color by metadata, like plate barcode, to check for batch effects.
```{r}
TSNEPlot(object = tiss, do.return = TRUE, group.by = "plate.barcode")
```
Print a table showing the count of cells in each identity category from each plate.
```{r}
table(as.character(tiss@ident), as.character(tiss@meta.data$plate.barcode))
```
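A raw count table can be hard to read when plates differ in size. One option, sketched below in base R on invented data, is to convert each cluster's row to fractions and flag clusters drawn almost entirely from a single plate. The vectors, plate names, and the 0.9 threshold are assumptions for illustration.

```{r}
# Invented cluster and plate assignments for 12 cells.
toy_cluster <- c("c0", "c0", "c0", "c0", "c1", "c1", "c1", "c1", "c2", "c2", "c2", "c2")
toy_plate   <- c("P1", "P2", "P1", "P2", "P1", "P2", "P2", "P1", "P1", "P1", "P1", "P1")

toy_tab <- table(toy_cluster, toy_plate)

# Fraction of each cluster's cells that come from each plate (rows sum to 1).
toy_row_frac <- prop.table(toy_tab, margin = 1)

# Largest single-plate share per cluster; 0.9 is an arbitrary threshold.
toy_max_share <- apply(toy_row_frac, 1, max)
plate_dominated <- names(toy_max_share)[toy_max_share > 0.9]
plate_dominated
```

A cluster dominated by one plate is a prompt for a closer look rather than proof of a batch effect, since plates can also differ for biological or sorting reasons.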
# Save the Robject for later
When you save the annotated tissue, please give it a name.
```{r}
filename = here('00_data_ingest', '04_tissue_robj_generated',
                paste0("facs_", tissue_of_interest, "_seurat_tiss.Robj"))
print(filename)
save(tiss, file=filename)
```
```{r}
# To reload a saved object
# filename = here('00_data_ingest', '04_tissue_robj_generated',
# paste0("facs_", tissue_of_interest, "_seurat_tiss.Robj"))
# load(file=filename)
```
# Export the final metadata
Write the cell ontology and free annotations to CSV.
```{r}
save_annotation_csv(tiss, tissue_of_interest, "facs")
```