Filter the raw data from Ernst et al. (2019) to obtain count matrices for different cell types.

Data available at https://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-6946/ (Ernst et al. 2019)


In [1]:
library(Matrix)
library("data.table")
library(edgeR)
library(SingleCellExperiment)
library(scran)
library(scuttle)

Loading required package: limma



In [14]:
# Load the raw count matrix (3.01 GB file) as a sparse matrix
raw_counts <- readMM('data/ernst/raw_counts.mtx')

In raw_counts, rows are genes and columns are cells.

In [3]:
raw_counts[1:5,1:5] #This takes time as the raw_counts.mtx file is 3.01 GB 

5 x 5 sparse Matrix of class "dgTMatrix"
              
[1,] . . . . .
[2,] . . . . .
[3,] . . . . .
[4,] . . . . .
[5,] . . . . .
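
As the output above shows, readMM returns the matrix in triplet form (dgTMatrix). One option that may speed up repeated subsetting is to convert it once to column-compressed storage. The sketch below uses a made-up toy matrix (toy_counts is not part of the dataset), and whether the conversion pays off on the full 3.01 GB matrix was not measured here.

```r
library(Matrix)

# Toy stand-in for raw_counts: 4 genes x 3 cells, stored in triplet form
toy_counts <- sparseMatrix(i = c(1, 3, 4), j = c(1, 2, 3), x = c(5, 2, 7),
                           dims = c(4, 3), repr = "T")

# Convert once to column-compressed storage; the values are unchanged
toy_csc <- as(toy_counts, "CsparseMatrix")

class(toy_csc)
dim(toy_csc)
```

The same call applied to raw_counts would keep genes in rows and cells in columns.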

In [29]:
# Load metadata and genes
cell_metadata <- read.delim("data/ernst/cell_metadata.txt", header = TRUE, sep = " ", dec = ".")
genes <- read.delim("data/ernst/genes.tsv", header = TRUE, sep = "\t", dec = ".")


names(cell_metadata) # Check the column names of the metadata table
head(cell_metadata)

Unnamed: 0_level_0,Sample,Barcode,Library,is_cell_control,total_features_by_counts,log10_total_features_by_counts,total_counts,log10_total_counts,pct_counts_in_top_50_features,pct_counts_in_top_100_features,pct_counts_in_top_200_features,pct_counts_in_top_500_features,BroadClusters,AnnotatedClusters
Unnamed: 0_level_1,<chr>,<chr>,<chr>,<lgl>,<int>,<dbl>,<int>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<chr>,<chr>
1,B6,AAACCTGAGATGGCGT-1,do17815,False,4761,3.677789,41018,4.612985,26.05929,34.15817,45.16798,63.93778,Germ,S9
2,B6,AAACCTGAGTTCGATC-1,do17815,False,3211,3.506776,20236,4.306146,30.46057,39.52856,51.37379,70.47341,Germ,S10
3,B6,AAACCTGTCAAGATCC-1,do17815,False,6175,3.790707,41335,4.616328,12.71078,19.64437,29.30204,46.93117,Germ,S3
4,B6,AAACCTGTCACGATGT-1,do17815,False,4873,3.687886,13920,4.14367,13.06753,20.39511,29.67672,45.41667,Germ,Spermatogonia
5,B6,AAACCTGTCTGCTGCT-1,do17815,False,6029,3.780317,48207,4.683119,14.77379,21.76862,31.5867,50.32672,Germ,S6
6,B6,AAACGGGCAGCCAATT-1,do17815,False,5796,3.763203,42084,4.624127,13.13088,20.28324,30.4011,49.39169,Germ,S6


The table has two columns of interest, Sample and AnnotatedClusters. They give the age of the mice (Sample)
and the cell type (AnnotatedClusters). The annotation is based on the analysis published by Ernst et al. 2019.

In [4]:
table(cell_metadata$Sample) # Ages of the mice: B6 is adult, and P followed by a number is days after birth.


   B6   P10   P15   P20   P25   P30   P35    P5   Tc0   Tc1 
 3355  3213  4258  1775  4334  2278  3160  8112  9677 13348 

In [5]:
table(cell_metadata$AnnotatedClusters) # Clusters corresponding to different cell types. See Fig. 2 in Ernst et al. 2019


                D Endothelial_cells               eP1               eP2 
             3162               309               688              1855 
   Fetal_Leydig_1    Fetal_Leydig_2  Interstitial_tMg          Leydig_1 
             5730               845               198               819 
         Leydig_2               lP1               lP2                MI 
              598              2738              2972              1819 
              MII                mP          Outliers               PTM 
             1602              2254              3310              1279 
               S1               S10               S11                S2 
             2704              2087              2273              2131 
               S3                S4                S5                S6 
             1116              2212              2016              2444 
               S7                S8                S9           Sertoli 
             1086              1113              1

Abbreviations: A1: type A1 spermatogonia, In: intermediate spermatogonia, BS: S phase type B spermatogonia, BG2: G2/M phase type B spermatogonia, G1: G1 phase pre-leptotene SC, epL: early-S phase pre-leptotene SC, mpL: mid-S phase pre-leptotene SC, lpL: late-S phase pre-leptotene SC, L: leptotene SC, Z: zygotene SC, eP: early-pachytene SC, mP: mid-pachytene SC, lP: late-pachytene SC, D: diplotene SC, MI: metaphase I, MII: metaphase II, RS1o2: S1–2 spermatids, RS3o4: S3–4 spermatids, RS5o6: S5–6 spermatids, RS7o8: S7–8 spermatids

In [30]:
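# Build unique cell names of the form "<cluster> - <row index>" (paste() inserts spaces by default)
row.names(cell_metadata) <- paste(cell_metadata$AnnotatedClusters, "-", row.names(cell_metadata))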

In [31]:
row.names(genes) = genes$ID

In [26]:
head(genes)

Unnamed: 0_level_0,ID,Symbol
Unnamed: 0_level_1,<chr>,<chr>
1,ENSMUSG00000102693,4933401J01Rik
2,ENSMUSG00000051951,Xkr4
3,ENSMUSG00000103377,Gm37180
4,ENSMUSG00000104017,Gm37363
5,ENSMUSG00000103025,Gm37686
6,ENSMUSG00000089699,Gm1992


In [33]:
# Add names to rows and columns
rownames(raw_counts) <- genes$ID
colnames(raw_counts) <- row.names(cell_metadata)
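
Assigning names this way relies on the gene table and the metadata being in the same order and of the same length as the matrix dimensions. A minimal sketch of the checks one might run first, using small made-up stand-ins (toy_counts, toy_genes and toy_meta are hypothetical, not the real objects):

```r
# Toy stand-ins: 3 genes x 2 cells
toy_counts <- matrix(0, nrow = 3, ncol = 2)
toy_genes  <- data.frame(ID = c("g1", "g2", "g3"), Symbol = c("A", "B", "C"))
toy_meta   <- data.frame(AnnotatedClusters = c("S9", "S10"))

# Same naming scheme as above: "<cluster> - <row index>"
cell_names <- paste(toy_meta$AnnotatedClusters, "-", row.names(toy_meta))

# Dimensions must agree and names must be unique before assignment
stopifnot(nrow(toy_counts) == nrow(toy_genes),
          ncol(toy_counts) == nrow(toy_meta),
          !anyDuplicated(toy_genes$ID),
          !anyDuplicated(cell_names))

rownames(toy_counts) <- toy_genes$ID
colnames(toy_counts) <- cell_names
colnames(toy_counts)
```

The row index in each cell name is what keeps the names unique when many cells share a cluster label.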

In [34]:

sce <- SingleCellExperiment(list(counts = raw_counts), colData = cell_metadata, rowData = genes)
sce

class: SingleCellExperiment 
dim: 33226 53510 
metadata(0):
assays(1): counts
rownames(33226): ENSMUSG00000102693 ENSMUSG00000051951 ...
  ENSG00000160307 ENSG00000160310
rowData names(2): ID Symbol
colnames(53510): S9 - 1 S10 - 2 ... S4 - 53509 S11 - 53510
colData names(14): Sample Barcode ... BroadClusters AnnotatedClusters
reducedDimNames(0):
mainExpName: NULL
altExpNames(0):

In [39]:
#Nfix <- sce[ rowData(sce)$Symbol == 'Nfix',c(1:10)]

In [44]:
qcstats <- perCellQCMetrics(sce)


In [48]:
clusters <- quickCluster(sce)
sce <- computeSumFactors(sce, clusters=clusters)
summary(sizeFactors(sce))

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.04222  0.40280  0.77707  1.00000  1.35858 15.88151 

In [49]:
sce <- logNormCounts(sce)
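
As a rough illustration of how the size factors are used in this step: assuming the scuttle defaults (log2 transform with a pseudo-count of 1), each count is divided by the size factor of its cell and then log-transformed. The counts and size factors below are made up, and the sketch does not reproduce how computeSumFactors estimates the factors.

```r
# Toy counts: 3 genes (rows) x 2 cells (columns)
counts <- matrix(c(10, 0, 4,
                   20, 2, 8), nrow = 3)

# Hypothetical size factors, one per cell, with mean 1
size_factors <- c(0.5, 1.5)

# Divide each column by its cell's size factor
norm_counts <- sweep(counts, 2, size_factors, "/")

# log2 with a pseudo-count of 1 (assumed default)
log_counts <- log2(norm_counts + 1)
log_counts
```

A zero count stays at zero after the transform, which keeps sparse data sparse.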

We are interested in the first wave of spermatogenesis and in adult spermatogenesis. The first wave starts at ~P3 and is completed by ~P30-35. We therefore extract the spermatogonia from the P15 and adult (B6) samples.

In [50]:
Spermatogonia = sce[,sce$AnnotatedClusters == 'Spermatogonia']

In [63]:
SpermatogoniaP15 = Spermatogonia[ , Spermatogonia$Sample == 'P15' ]

In [64]:
SpermatogoniaAdult = Spermatogonia[ , Spermatogonia$Sample == 'B6' ]

Save the data to files

In [94]:
save(sce, file="data/ernst/spermatogenesisCellErnst.Rdata")

In [93]:
save(Spermatogonia, file="data/ernst/Spermatogonia.Rdata")
save(SpermatogoniaP15, file="data/ernst/SpermatogoniaP15.Rdata")
save(SpermatogoniaAdult, file="data/ernst/SpermatogoniaAdult.Rdata")
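
Files written with save() are read back with load(), which restores each object under the name it had when saved. A small round-trip sketch with a made-up object and a temporary file (toy_subset and the temporary path are hypothetical):

```r
toy_subset <- matrix(1:6, nrow = 2)   # stand-in for e.g. Spermatogonia
tmp <- tempfile(fileext = ".Rdata")   # temporary path, not a project file

save(toy_subset, file = tmp)
rm(toy_subset)

# load() returns the names of the objects it restored
restored_names <- load(tmp)
restored_names
dim(toy_subset)
```

Because the name is fixed at save time, loading one of these files into a session that already has an object of that name would overwrite it.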

In [70]:
table(Spermatogonia$Sample)


 B6 P10 P15 P20 P25 P30 P35  P5 Tc0 Tc1 
 40 345 820  36  18  18   6 545  28  62 

Filter Spermatocytes

In [96]:
Spermatocytes = sce[,sce$AnnotatedClusters %in% c("eP1","eP2","mP","lP1","lP2","D","MI","MII")]



ERROR: Error: subscript is a logical vector with out-of-bounds TRUE values


In [97]:
SpermatocytesP15 = Spermatocytes[ , Spermatocytes$Sample == 'P15' ]
SpermatocytesAdult = Spermatocytes[ , Spermatocytes$Sample == 'B6' ]

In [98]:
save(Spermatocytes, file="data/ernst/Spermatocytes.Rdata")
save(SpermatocytesP15, file="data/ernst/SpermatocytesP15.Rdata")
save(SpermatocytesAdult, file="data/ernst/SpermatocytesAdult.Rdata")

In [99]:
table(Spermatocytes$AnnotatedClusters)


   D  eP1  eP2  lP1  lP2   MI  MII   mP 
3162  688 1855 2738 2972 1819 1602 2254 
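
Selecting several clusters at once needs %in%, as used above. Comparing with == against a vector of labels recycles the labels element by element and can silently drop cells. A toy example with made-up labels:

```r
clusters <- c("eP1", "mP", "eP1", "D", "mP", "S1")
wanted   <- c("eP1", "mP")

# Membership test: every cell whose label is in `wanted`
n_in <- sum(clusters %in% wanted)

# Element-wise comparison with recycling: misses the second "mP"
n_eq <- sum(clusters == wanted)

c(n_in = n_in, n_eq = n_eq)
```

Here %in% finds 4 cells while == finds only 3, with no warning because the lengths happen to divide evenly.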

Filter Sertoli cells

In [100]:
Sertoli = sce[, sce$AnnotatedClusters == "Sertoli"]
table(Sertoli$Sample)


 B6 P10 P15 P20 P25 P30 P35  P5 Tc0 Tc1 
 29 112 377  12  13  34  14 170 127 119 

In [101]:
SertoliP15 = Sertoli[ , Sertoli$Sample == 'P15' ]
SertoliAdult = Sertoli[ , Sertoli$Sample == 'B6' ]

save(Sertoli, file="data/ernst/Sertoli.Rdata")
save(SertoliP15, file="data/ernst/SertoliP15.Rdata")
save(SertoliAdult, file="data/ernst/SertoliAdult.Rdata")


Filter Leydig cells

In [102]:
Leydig = sce[,sce$AnnotatedClusters %in% c("Leydig_1","Leydig_2")]
table(Leydig$Sample)


 B6 P10 P15 P20 P25 P30 P35  P5 Tc0 Tc1 
 13 516  96   7   6   3   2 641  31 102 

In [103]:
LeydigP15 = Leydig[ , Leydig$Sample == 'P15' ]
LeydigAdult = Leydig[ , Leydig$Sample == 'B6' ]

save(Leydig, file="data/ernst/Leydig.Rdata")
save(LeydigP15, file="data/ernst/LeydigP15.Rdata")
save(LeydigAdult, file="data/ernst/LeydigAdult.Rdata")
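
The same P15/adult split is repeated for every cell type above. One way to factor it out is sketched below on a plain matrix; split_by_sample is a hypothetical helper, not part of any package, and applying it to a SingleCellExperiment (passing sce$Sample as the labels) is an untested assumption.

```r
# Hypothetical helper: split an object with cells in columns by sample label
split_by_sample <- function(x, sample_labels, samples = c("P15", "B6")) {
  stopifnot(length(sample_labels) == ncol(x))
  out <- lapply(samples, function(s) x[, sample_labels == s, drop = FALSE])
  names(out) <- samples
  out
}

# Toy data: 2 genes x 4 cells with made-up sample labels
toy <- matrix(1:8, nrow = 2,
              dimnames = list(c("g1", "g2"), paste0("cell", 1:4)))
toy_samples <- c("P15", "B6", "P15", "P5")

parts <- split_by_sample(toy, toy_samples)
sapply(parts, ncol)
```

Each element of the returned list could then be saved with its own file name.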