Skip to content

Commit

Permalink
Josh/vignette (#158)
Browse files Browse the repository at this point in the history
* Created Tissue Annotation Vignette

* Synced Liver_facs with the more detailed vignette.

* fixed typos.
  • Loading branch information
batson committed Mar 19, 2018
1 parent 0791b44 commit 2ccc12a
Show file tree
Hide file tree
Showing 3 changed files with 107 additions and 129 deletions.
60 changes: 23 additions & 37 deletions 00_data_ingest/02_tissue_analysis_rmd/Liver_facs.Rmd
Expand Up @@ -5,8 +5,6 @@ output:
html_notebook: default
---

## Preprocess and Cluster

Enter the name of the tissue you want to analyze.

```{r}
Expand All @@ -16,34 +14,25 @@ source(here("00_data_ingest", "02_tissue_analysis_rmd", "boilerplate.R"))
tiss = load_tissue_facs(tissue_of_interest)
```

The loading function above includes basic preprocessing of the gene x cell matrix, including

1. Filtering out cells with fewer than 500 genes or 50,000 reads.
2. Log-normalizing counts for each cell (log(1 + counts per million)).
3. Shifting and scaling each gene to have mean 0 and variance 1.
4. Finding variable genes (those with high dispersion, given their mean).
5. PCA to project the variable genes to 20 dimensions.

We can visualize top genes in each principal component.

```{r, echo=FALSE, fig.height=4, fig.width=8}
```{r, echo=FALSE}
PCHeatmap(object = tiss, pc.use = 1:3, cells.use = 500, do.balanced = TRUE, label.columns = FALSE, num.genes = 8)
```

To further reduce variance in the data, we will project onto the top principal components. This has the effect of keeping the major directions of variation in the data and, ideally, supressing noise. A decent rule of thumb is to pick the elbow in the plot below.
We then project onto just the top principal components. This has the effect of keeping the major directions of variation in the data and, ideally, supressing noise. A decent rule of thumb is to pick the elbow in the plot below.

```{r}
PCElbowPlot(object = tiss)
```

Choose the number of principal components to use.

```{r}
# Set number of principal components.
n.pcs = 11
```


The clustering is performed using on a shared-nearest-neighbors graph on the cells. Two cells are connected in the graph of their k-neighborhoods of cells overlap. The (enhanced) Louvain algorithm looks for groups of cells with high modularity--more connections within the group than between groups. The resolution parameter determines the tradeoff between in-group connections and between-group connections in that objective. Higher resolution will give more clusters, lower resolution will give fewer.
## Cluster

```{r}
# Set resolution
Expand All @@ -67,32 +56,26 @@ TSNEPlot(object = tiss, do.label = T, pt.size = 1.2, label.size = 4)

Check expression of genes useful for indicating cell type.

hepatocyte: Alb, Ttr, Apoa1, and Serpina1c
endothelial cells: Pecam1, Nrp1, Kdr+ and Oit3+
Kuppfer cells: Emr1, Clec4f, Cd68, Irf7
NK/NKT cells: Zap70, Il2rb, Nkg7, Cxcr6, Klr1c, Gzma
B cells: Cd79a, Cd79b, Cd74 and Cd19

```{r}
genes_hep = c('Alb', 'Ttr', 'Apoa1', 'Serpina1c')
genes_endo = c('Pecam1', 'Nrp1', 'Kdr','Oit3')
genes_kuppfer = c('Emr1', 'Clec4f', 'Cd68', 'Irf7')
genes_nk = c('Zap70', 'Il2rb', 'Nkg7', 'Cxcr6', 'Gzma')
genes_b = c('Cd79a', 'Cd79b', 'Cd74')
genes_hep = c('Alb', 'Ttr', 'Apoa1', 'Serpina1c') #hepatocyte
genes_endo = c('Pecam1', 'Nrp1', 'Kdr','Oit3') # endothelial
genes_kuppfer = c('Emr1', 'Clec4f', 'Cd68', 'Irf7') # Kuppfer cells
genes_nk = c('Zap70', 'Il2rb', 'Nkg7', 'Cxcr6') # Natural Killer cells
genes_b = c('Cd79a', 'Cd79b', 'Cd74', 'Cd19') # B Cells
genes_all = c(genes_hep, genes_endo, genes_kuppfer, genes_nk, genes_b)
```

In the tSNE plots below, the intensity of each point represents the gene expression, scaled to mean 0 and variance 1 across all cells.
In the tSNE plots below, the intensity of each point represents the log-normalized gene expression $N_{ij}$.

```{r, echo=FALSE, fig.height=24, fig.width=12}
FeaturePlot(tiss, genes_all, pt.size = 3, nCol = 3, cols.use = c("lightgrey", "blue"))
```{r, echo=FALSE, fig.height=20, fig.width=16}
FeaturePlot(tiss, genes_all, pt.size = 3, nCol = 4, cols.use = c("lightgrey", "blue"), no.legend = F)
```

Dotplots show, for each cluster and gene, the fraction of cells with at least one read for the gene (circle size) and the average scaled expression for that gene among the cells expressing it (circle color).

```{r, echo=FALSE, fig.height=10, fig.width=6}
DotPlot(tiss, genes_all, plot.legend = T, col.max = 2.5, do.return = T) + coord_flip()
```{r, echo=FALSE}
DotPlot(tiss, genes_all, plot.legend = T, col.max = 2.5, x.lab.rot = T)
```

The low but nonzero levels of Albumin present in all clusters is consistent with a small amount of leakage, either through physical contamination or index hopping. Nevertheless, the absolute levels of expression confirm a sharp difference between the hepatocyte clusters and the others.
Expand All @@ -104,16 +87,17 @@ VlnPlot(tiss, 'Alb', use.raw = T, do.return = T)
To confirm the identity of a cluster, you can inspect the genes differentially expressed in that cluster compared to the others.

```{r}
clust.markers7 <- FindMarkers(object = tiss, ident.1 = 7, only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)
clust.markers7 <- FindMarkers(object = tiss, ident.1 = 7,
only.pos = TRUE, min.pct = 0.25, thresh.use = 0.25)
```

The top markers for cluster 7 include histocompatibility markers H2-*, consistent with the expression of other B-cell markers seen above.

```{r}
clust.markers7
head(clust.markers7)
```

Using the markers, we can confidentaly label the clusters:
Using the markers, we can confidentaly label the clusters. We provide both a free annotation (where anything name can be used) and a cell ontology class. The latter uses a controlled vocabulary for easy comparison between studies and different levels of the taxonomy.

```{r}
tiss <- StashIdent(object = tiss, save.name = "cluster.ids")
Expand All @@ -129,7 +113,7 @@ free_annotation <- c(
"kuppfer",
"hepatocyte",
"B cell",
"Nk/NKT cells")
"NK/NKT cells")
cell_ontology_class <-c(
"endothelial cell of hepatic sinusoid",
Expand All @@ -148,13 +132,13 @@ tiss = stash_annotations(tiss, cluster.ids, free_annotation, cell_ontology_class

## Checking for batch effects

Color by metadata, like plate barcode, to check for batch effects. Here we see that the clusters are segregated by sex, though cell types are mixed within each population.
Color by metadata, like plate barcode, to check for batch effects. Here we see that the clusters are segregated by sex.

```{r}
TSNEPlot(object = tiss, do.return = TRUE, group.by = "mouse.id")
```

Every cluster contains cell types from multiple mice.
Nevertheless, every cluster contains cells from multiple mice.

```{r}
table(FetchData(tiss, c('mouse.id','ident')) %>% droplevels())
Expand Down Expand Up @@ -190,3 +174,5 @@ save(tiss, file=filename)
```{r}
save_annotation_csv(tiss, tissue_of_interest, "facs")
```


172 changes: 82 additions & 90 deletions 00_data_ingest/02_tissue_analysis_rmd/Liver_facs.nb.html

Large diffs are not rendered by default.

Expand Up @@ -139,7 +139,7 @@ $$Q = \sum_{ij} \left (A_{ij} - \gamma \frac{k_i k_j}{2m} \right ) \delta(c_i, c

where $A_{ij}$ is the weighted adjacency matrix, $k_i$ and $k_j$ are the weighted degrees of cells $i$ and $j$, $m$ is the total weight of edges in the graph, $c_i$ denotes cluster membership, and $\delta(c_i,c_j)$ is $1$ if $i$ and $j$ are in the same cluster, and $0$ otherwise.

The resolution $\gamma$ is a tuneable parameter in this analysis that sets the tradeoff off between in-group connections and between-group connections. High resolution favors smaller clusters.
The resolution $\gamma$ is a tuneable parameter in this analysis that sets the tradeoff between in-group connections and between-group connections. High resolution favors smaller clusters.

```{r}
# Set resolution
Expand Down Expand Up @@ -204,7 +204,7 @@ The top markers for cluster 7 include histocompatibility markers H2-*, consisten
head(clust.markers7)
```

Using the markers, we can confidentaly label the clusters. We provide both a free annotation (where anything name can be used) and a cell ontology class. The latter uses a controlled vocabulary for easy comparison between studies and different levels of the taxonomy.
Using the markers, we can confidentaly label the clusters. We provide both a free annotation (where any name can be used) and a cell ontology class. The latter uses a controlled vocabulary for easy comparison between studies and different levels of the taxonomy.

```{r}
tiss <- StashIdent(object = tiss, save.name = "cluster.ids")
Expand Down

0 comments on commit 2ccc12a

Please sign in to comment.