Make bone marrow scRNA-seq example dataset smaller
Make the example data smaller by excluding unexpressed genes and reducing the number of cells. This should reduce the memory usage and the time needed to build and run the tapseq_target_genes vignette.
argschwind committed Mar 11, 2020
1 parent 85c321c commit 945a66d
Showing 4 changed files with 17 additions and 11 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Package: TAPseq
Type: Package
Title: Targeted scRNA-seq primer design for TAP-seq
-Version: 0.99.2
+Version: 0.99.3
Authors@R: c(
person("Andreas", "Gschwind", email = "andreas.gschwind@stanford.edu",
role = c("aut", "cre"), comment = c(ORCID = "0000-0002-0769-6907")),
20 changes: 13 additions & 7 deletions data-raw/bone_marrow_genex.R
@@ -1,4 +1,5 @@
library(Seurat)
+library(Matrix)

## create Seurat object containing cell population example data

@@ -22,19 +23,24 @@ cell_idents <- Idents(NicheData10x_filt)
object <- CreateSeuratObject(counts = counts)
Idents(object) <- cell_idents

-# subsample cells to about 10% of cells (~350cells)
-set.seed("20200115")
+# get top 5% cells per population (~180cells)
+n_txs <- colSums(object)
+cell_idents <- cell_idents[names(sort(n_txs, decreasing = TRUE))]
idents_split <- split(cell_idents, f = cell_idents)
-idents_sampled <- lapply(idents_split, FUN = function(x) {
-  sample(x, size = length(x) * 0.10)
+idents_top <- lapply(idents_split, FUN = function(x) {
+  head(x, n = length(x) * 0.05)
})

# create vector with cell ids for these cells
-names(idents_sampled) <- NULL
-sampled_cells <- names(unlist(idents_sampled))
+names(idents_top) <- NULL
+top_cells <- names(unlist(idents_top))

# subset object to these cells
-bone_marrow_genex <- subset(object, cells = sampled_cells)
+bone_marrow_genex <- subset(object, cells = top_cells)
+
+# remove any genes with less than 10 total transcripts
+txs <- rowSums(GetAssayData(bone_marrow_genex))
+bone_marrow_genex <- subset(bone_marrow_genex, features = names(txs[txs > 10]))

# save data as RData files in data directory
usethis::use_data(bone_marrow_genex, overwrite = TRUE)
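
The selection logic in this script can be tried without Seurat or the original bone marrow data. The following is a minimal base-R sketch under stated assumptions: the count matrix, the population labels `A` to `D`, the gene rates, and the name `counts_small` are all hypothetical stand-ins, and a plain matrix replaces the Seurat object. It ranks cells by total transcripts, keeps the top 5% of each population, and then drops genes whose total is not above 10. The explicit `floor()` is added here only to make the truncation of the fractional cell count visible.

```r
# Toy stand-in for the Seurat object: a genes x cells count matrix plus a
# named factor of cell identities (all names and values are hypothetical).
set.seed(1)
n_genes <- 50
n_cells <- 200
gene_rates <- rep(c(0.2, 2), each = n_genes / 2)  # lowly and highly expressed genes
counts <- matrix(
  rpois(n_genes * n_cells, lambda = gene_rates),  # rates recycle along rows
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)
cell_idents <- factor(rep(c("A", "B", "C", "D"), times = c(100, 60, 20, 20)))
names(cell_idents) <- colnames(counts)

# order cells by total transcripts, highest first
n_txs <- colSums(counts)
cell_idents <- cell_idents[names(sort(n_txs, decreasing = TRUE))]

# keep the top 5% of cells within each population
idents_split <- split(cell_idents, f = cell_idents)
idents_top <- lapply(idents_split, FUN = function(x) {
  head(x, n = floor(length(x) * 0.05))
})

# drop the population names so unlist() keeps the bare cell ids as names
names(idents_top) <- NULL
top_cells <- names(unlist(idents_top))

# subset to the selected cells, then keep genes with more than 10 transcripts
counts_small <- counts[, top_cells, drop = FALSE]
txs <- rowSums(counts_small)
counts_small <- counts_small[txs > 10, , drop = FALSE]

dim(counts_small)
```

Unlike the earlier random subsample, picking the top cells is deterministic, so no seed is needed for the selection step itself; the seed above only fixes the simulated toy counts.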
Binary file modified data/bone_marrow_genex.rda
Binary file not shown.
6 changes: 3 additions & 3 deletions vignettes/tapseq_target_genes.Rmd
@@ -70,11 +70,11 @@ length(target_genes_100)
To intuitively assess how well a chosen set of target genes distinguishes cell types, we can use
UMAP plots based on the full gene expression data and on target genes only.
```{r, message=FALSE, warning=FALSE, fig.height=3, fig.width=7.15}
-plotTargetGenes(bone_marrow_genex, target_genes = target_genes_cv)
+plotTargetGenes(bone_marrow_genex, target_genes = target_genes_100)
```

-We can see that the expression of the `r length(target_genes_cv)` automatically selected target
-genes groups cells of different populations together.
+We can see that the expression of the `r length(target_genes_100)` selected target genes groups
+cells of different populations together.

A good follow-up would be to cluster the cells based on only the target genes following the same
workflow used to define the cell identities in the original object. This could then be used to