# Step 0 - prepare your data
Prepare CellPhoneDB inputs starting from a Seurat object.

In [1]:

library(Seurat)
library(SeuratObject)
library(Matrix)

Attaching SeuratObject



## 1. Load Seurat object
The Seurat object is expected to contain counts that have been normalized (per cell) and log-transformed. If your data are raw counts, normalize them first (for example with `NormalizeData`, as done for each time point below).

In [2]:
#sceasy::convertFormat("Atlashumanized.h5ad", from="anndata", to="seurat",
#                      outFile='Atlashumanized.Rds')

In [3]:
Atlas = readRDS('Atlashumanized.Rds')

In [3]:
Atlas

An object of class Seurat 
4588 features across 95097 samples within 1 assay 
Active assay: RNA (4588 features, 0 variable features)
 5 dimensional reductions calculated: emb, mde, mde_scanvi, scANVI, scVI

In [4]:
head(rownames(Atlas))

# E10.5

In [5]:
Idents(Atlas) <- Atlas@meta.data$DevTP

In [6]:
so <- subset(Atlas, idents = c('E10.5'))

In [7]:
so <- NormalizeData(object = so)

In [8]:
so

An object of class Seurat 
4588 features across 5826 samples within 1 assay 
Active assay: RNA (4588 features, 0 variable features)
 5 dimensional reductions calculated: mde, mde_scanvi, pca, scANVI, scVI

## 2. Write gene expression in mtx format

In [10]:
# Create the output directory if it does not exist yet
dir.create('E10.5', showWarnings = FALSE)
# Save normalised counts - NOT scaled!
writeMM(so@assays$RNA@data, file = 'E10.5/matrix.mtx')
# Save gene and cell names
write(x = rownames(so@assays$RNA@data), file = "E10.5/features.tsv")
write(x = colnames(so@assays$RNA@data), file = "E10.5/barcodes.tsv")

NULL
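
As a quick sanity check, the three files can be read back and compared with the matrix they came from. The sketch below uses a small toy sparse matrix instead of the Seurat object; the gene names, barcodes and the temporary output directory are assumptions made only for illustration.

```r
library(Matrix)

# Toy stand-in for so@assays$RNA@data: 3 genes x 4 cells (names are hypothetical)
expr <- Matrix(c(0,   1.2, 0,
                 0.5, 0,   0,
                 0,   0,   2.1,
                 1.0, 0,   0),
               nrow = 3, sparse = TRUE,
               dimnames = list(c("GeneA", "GeneB", "GeneC"),
                               c("cell1", "cell2", "cell3", "cell4")))

out_dir <- file.path(tempdir(), "toy_timepoint")
dir.create(out_dir, showWarnings = FALSE)

# Same three writes as in the cell above
writeMM(expr, file = file.path(out_dir, "matrix.mtx"))
write(x = rownames(expr), file = file.path(out_dir, "features.tsv"))
write(x = colnames(expr), file = file.path(out_dir, "barcodes.tsv"))

# Read everything back and restore the dimnames
mat_back <- readMM(file.path(out_dir, "matrix.mtx"))
features <- readLines(file.path(out_dir, "features.tsv"))
barcodes <- readLines(file.path(out_dir, "barcodes.tsv"))
dimnames(mat_back) <- list(features, barcodes)

# The matrix needs one row per feature and one column per barcode
stopifnot(nrow(mat_back) == length(features),
          ncol(mat_back) == length(barcodes))
roundtrip_ok <- isTRUE(all.equal(as.matrix(mat_back), as.matrix(expr)))
roundtrip_ok
```

The same comparison could be run on the real `E10.5/` files by pointing `out_dir` at that directory and comparing against `so@assays$RNA@data`.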

## 3. Generate your meta
In this example, the input is a Seurat object (converted from anndata) with the cluster/cell-type information in `meta.data$CellType`.

The object also has `meta.data$lineage` information, which can be used for a hierarchical DEG approach (see step 4).

In [11]:
table(so@meta.data$CellType)


                  Embryonic skin                              TBD 
                             245                                0 
                     LepR+ BMSCs                        Tenocytes 
                               0                                0 
    Proximal limb bud mesenchyme    Pre-hypertrophic chondrocytes 
                            1300                                0 
           Periosteal stem cells           Periosteal progenitors 
                               0                                0 
                Osteoprogenitors                      Osteoclasts 
                               0                                0 
                     Osteoblasts Intermediate limb bud mesenchyme 
                               0                             2078 
                    Immune cells        Hypertrophic chondrocytes 
                               0                                0 
      Distal limb bud mesenchyme         Fast proliferating c

In [12]:
so@meta.data$Cell = rownames(so@meta.data)
df = so@meta.data[, c('Cell','CellType')]
write.table(df, file ='E10.5_meta.tsv', sep = '\t', quote = F, row.names = F)
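
The `table()` output above lists many cell types with a count of 0, because `CellType` is a factor that keeps the levels of the full atlas after subsetting. The sketch below, on a toy data frame with hypothetical barcodes and labels, shows one way to drop the unused levels and to check that the `Cell` column lines up with the barcodes written in step 2. Whether empty levels matter downstream is an assumption to verify for your own run.

```r
# Toy stand-in for so@meta.data (barcodes and labels are hypothetical)
meta <- data.frame(
  CellType = factor(c("Embryonic skin", "Immune cells", "Embryonic skin"),
                    levels = c("Embryonic skin", "Immune cells", "Tenocytes")),
  row.names = c("cell1", "cell2", "cell3")
)
barcodes <- c("cell1", "cell2", "cell3")  # as written to barcodes.tsv

table(meta$CellType)  # "Tenocytes" is listed although no cell carries it

meta$Cell <- rownames(meta)
meta$CellType <- droplevels(meta$CellType)
df <- meta[, c("Cell", "CellType")]

# Every barcode in the matrix needs exactly one row in the meta file
stopifnot(identical(df$Cell, barcodes), !anyDuplicated(df$Cell))
levels(df$CellType)
```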

## 4. Compute DEGs (optional)

Use Seurat `FindAllMarkers` to compute differentially expressed genes and extract the corresponding data frame `DEGs`.
Here there are two options you may be interested in:
1. Identify DEGs for each cell type (compare cell type vs rest, most likely option)
2. Identify DEGs for each cell type using a per-lineage hierarchical approach (compare cell type vs rest in the lineage, as in the endometrium paper, Garcia-Alonso et al. 2021)

In the endometrium paper (Garcia-Alonso et al. 2021) we were interested in the differences within the stromal and epithelial lineages rather than the commonalities (for example, what is specific to epithelial cells in the glands compared to epithelial cells in the lumen). The reason is that epithelial and stromal subtypes vary in space and type, so we want to extract the subtle differences within each lineage to better understand their differential location and biological role.
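
The per-lineage idea (option 2) can be illustrated on simulated data. This is only a sketch under stated assumptions: the cell types, lineages and genes are hypothetical, and a Wilcoxon test with a plain difference of means stands in for Seurat's `FindAllMarkers`, so it is not the method used in this notebook or in the paper. The point is the comparison set: each cell type is tested against the other cells of its own lineage only.

```r
set.seed(1)

# Toy log-normalised expression: 2 genes x 40 cells (all names are hypothetical)
cell_type <- rep(c("Epi_glandular", "Epi_luminal", "Stromal_a", "Stromal_b"), each = 10)
lineage   <- rep(c("Epithelial", "Stromal"), each = 20)
expr <- rbind(
  # GeneX is high only in glandular epithelial cells
  GeneX = rnorm(40, mean = ifelse(cell_type == "Epi_glandular", 3, 1), sd = 0.3),
  # GeneY is high in the whole epithelial lineage (a lineage commonality)
  GeneY = rnorm(40, mean = ifelse(lineage == "Epithelial", 3, 1), sd = 0.3)
)

# Compare each cell type against the remaining cells of the same lineage
markers_within_lineage <- function(expr, cell_type, lineage) {
  res <- list()
  for (lin in unique(lineage)) {
    in_lin <- lineage == lin
    for (ct in unique(cell_type[in_lin])) {
      target <- in_lin & cell_type == ct
      rest   <- in_lin & cell_type != ct
      for (g in rownames(expr)) {
        res[[length(res) + 1]] <- data.frame(
          lineage   = lin,
          cluster   = ct,
          gene      = g,
          mean_diff = mean(expr[g, target]) - mean(expr[g, rest]),
          p_val     = wilcox.test(expr[g, target], expr[g, rest])$p.value
        )
      }
    }
  }
  do.call(rbind, res)
}

lineage_degs <- markers_within_lineage(expr, cell_type, lineage)
subset(lineage_degs, p_val < 0.05 & mean_diff > 0)
```

GeneY is simulated with no difference inside the epithelial lineage, so a within-lineage comparison has no real signal to pick up for it, whereas a cell type vs rest comparison across all cells would flag it for both epithelial subtypes.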


In [13]:
Idents(so) <- so$CellType

In [14]:
# OPTION 1 - compute DEGs for all cell types
# Extract DEGs for each cell type (each cell type vs the rest)
DEGs <- FindAllMarkers(so,
                       test.use = 'LR',
                       verbose = FALSE,
                       only.pos = TRUE,
                       random.seed = 1,
                       logfc.threshold = 0.2,
                       min.pct = 0.1,
                       return.thresh = 0.05)

“glm.fit: fitted probabilities numerically 0 or 1 occurred”
“glm.fit: algorithm did not converge”

In [15]:
 'BMP7' %in% rownames(so@assays$RNA@counts)

In [16]:
DEGs

| | p_val | avg_log2FC | pct.1 | pct.2 | p_val_adj | cluster | gene |
|---|---|---|---|---|---|---|---|
| | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` | `<chr>` |
| WNT6 | 0.000000e+00 | 3.622856 | 0.996 | 0.030 | 0.000000e+00 | Embryonic skin | WNT6 |
| PDGFA | 0.000000e+00 | 3.603954 | 0.996 | 0.079 | 0.000000e+00 | Embryonic skin | PDGFA |
| PERP | 0.000000e+00 | 3.575402 | 0.984 | 0.027 | 0.000000e+00 | Embryonic skin | PERP |
| WNT4 | 0.000000e+00 | 3.555949 | 0.951 | 0.026 | 0.000000e+00 | Embryonic skin | WNT4 |
| FERMT1 | 0.000000e+00 | 3.428088 | 0.984 | 0.019 | 0.000000e+00 | Embryonic skin | FERMT1 |
| EPCAM | 7.040435e-321 | 3.361296 | 0.996 | 0.102 | 3.230152e-317 | Embryonic skin | EPCAM |
| HSPB1 | 1.125062e-315 | 3.409250 | 0.988 | 0.069 | 5.161783e-312 | Embryonic skin | HSPB1 |
| WNT7B | 6.081035e-308 | 3.161385 | 0.853 | 0.009 | 2.789979e-304 | Embryonic skin | WNT7B |
| BCAM | 1.657467e-306 | 3.307708 | 0.971 | 0.046 | 7.604460e-303 | Embryonic skin | BCAM |
| SFN | 4.837702e-299 | 3.445808 | 0.935 | 0.035 | 2.219538e-295 | Embryonic skin | SFN |


In [17]:
fDEGs = subset(DEGs, p_val_adj < 0.05 & avg_log2FC > 0.1)

# 1st column = cluster; 2nd column = gene 
fDEGs = fDEGs[, c('cluster', 'gene', 'p_val_adj', 'p_val', 'avg_log2FC', 'pct.1', 'pct.2')] 
write.table(fDEGs, file ='E10.5_DEGs.tsv', sep = '\t', quote = F, row.names = F)
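
The filter and the column reordering can be checked on a toy table before running them on the real output. In this sketch all values, gene names and the temporary file path are made up; only the column names and thresholds are taken from the cell above.

```r
# Toy stand-in for the FindAllMarkers output (all values are made up)
DEGs <- data.frame(
  p_val      = c(1e-20, 0.03, 1e-8),
  avg_log2FC = c(3.6, 0.25, 0.05),
  pct.1      = c(0.99, 0.30, 0.80),
  pct.2      = c(0.03, 0.25, 0.10),
  p_val_adj  = c(1e-16, 1, 1e-4),
  cluster    = factor(c("Embryonic skin", "Embryonic skin", "Immune cells")),
  gene       = c("Gene1", "Gene2", "Gene3")
)

# Same filter and column order as above: Gene2 fails on p_val_adj, Gene3 on avg_log2FC
fDEGs <- subset(DEGs, p_val_adj < 0.05 & avg_log2FC > 0.1)
fDEGs <- fDEGs[, c("cluster", "gene", "p_val_adj", "p_val", "avg_log2FC", "pct.1", "pct.2")]

out_file <- file.path(tempdir(), "toy_DEGs.tsv")
write.table(fDEGs, file = out_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back: cluster must be column 1 and gene column 2
degs_back <- read.delim(out_file, check.names = FALSE)
stopifnot(identical(names(degs_back)[1:2], c("cluster", "gene")))
degs_back
```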

In [18]:
head(fDEGs)

| | cluster | gene | p_val_adj | p_val | avg_log2FC | pct.1 | pct.2 |
|---|---|---|---|---|---|---|---|
| | `<fct>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| WNT6 | Embryonic skin | WNT6 | 0.0 | 0.0 | 3.622856 | 0.996 | 0.03 |
| PDGFA | Embryonic skin | PDGFA | 0.0 | 0.0 | 3.603954 | 0.996 | 0.079 |
| PERP | Embryonic skin | PERP | 0.0 | 0.0 | 3.575402 | 0.984 | 0.027 |
| WNT4 | Embryonic skin | WNT4 | 0.0 | 0.0 | 3.555949 | 0.951 | 0.026 |
| FERMT1 | Embryonic skin | FERMT1 | 0.0 | 0.0 | 3.428088 | 0.984 | 0.019 |
| EPCAM | Embryonic skin | EPCAM | 3.230152e-317 | 7.04e-321 | 3.361296 | 0.996 | 0.102 |


In [19]:
# Check the gene column: row names can carry a suffix when a gene marks several clusters
'BMP7' %in% fDEGs$gene
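
Membership checks on a marker table are safer against the `gene` column than against the row names. The sketch below assumes that a gene reported for several clusters gets de-duplicated row names in the style of `make.unique` (for example `Gene1.1`); that is an assumption to confirm on your own `DEGs` table. All values are made up.

```r
# Toy marker table: the same gene is reported for two clusters (values are made up)
DEGs <- data.frame(
  cluster    = c("A", "B"),
  gene       = c("Gene1", "Gene1"),
  p_val_adj  = c(0.2, 0.001),
  avg_log2FC = c(0.5, 1.5),
  row.names  = make.unique(c("Gene1", "Gene1"))  # "Gene1", "Gene1.1"
)

# The row named "Gene1" is filtered out, the row named "Gene1.1" is kept
fDEGs <- subset(DEGs, p_val_adj < 0.05 & avg_log2FC > 0.1)

in_rownames <- "Gene1" %in% rownames(fDEGs)  # FALSE: only "Gene1.1" is left
in_gene_col <- "Gene1" %in% fDEGs$gene       # TRUE
c(rownames = in_rownames, gene_column = in_gene_col)
```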

# E11.5

In [20]:
Idents(Atlas) <- Atlas@meta.data$DevTP

In [21]:
so <- subset(Atlas, idents = c('E11.5'))

In [22]:
so <- NormalizeData(object = so)

In [23]:
so

An object of class Seurat 
4588 features across 8290 samples within 1 assay 
Active assay: RNA (4588 features, 0 variable features)
 5 dimensional reductions calculated: mde, mde_scanvi, pca, scANVI, scVI

## 2. Write gene expression in mtx format

In [24]:
# Create the output directory if it does not exist yet
dir.create('E11.5', showWarnings = FALSE)
# Save normalised counts - NOT scaled!
writeMM(so@assays$RNA@data, file = 'E11.5/matrix.mtx')
# Save gene and cell names
write(x = rownames(so@assays$RNA@data), file = "E11.5/features.tsv")
write(x = colnames(so@assays$RNA@data), file = "E11.5/barcodes.tsv")

NULL

## 3. Generate your meta
In this example, the input is a Seurat object (converted from anndata) with the cluster/cell-type information in `meta.data$CellType`.

The object also has `meta.data$lineage` information, which can be used for a hierarchical DEG approach (see step 4).

In [25]:
table(so@meta.data$CellType)


                  Embryonic skin                              TBD 
                             441                                0 
                     LepR+ BMSCs                        Tenocytes 
                               0                                0 
    Proximal limb bud mesenchyme    Pre-hypertrophic chondrocytes 
                             112                                0 
           Periosteal stem cells           Periosteal progenitors 
                               0                                0 
                Osteoprogenitors                      Osteoclasts 
                               0                                0 
                     Osteoblasts Intermediate limb bud mesenchyme 
                               0                             1544 
                    Immune cells        Hypertrophic chondrocytes 
                              73                                0 
      Distal limb bud mesenchyme         Fast proliferating c

In [26]:
so@meta.data$Cell = rownames(so@meta.data)
df = so@meta.data[, c('Cell','CellType')]
write.table(df, file ='E11.5_meta.tsv', sep = '\t', quote = F, row.names = F)

## 4. Compute DEGs (optional)

Use Seurat `FindAllMarkers` to compute differentially expressed genes and extract the corresponding data frame `DEGs`.
Here there are two options you may be interested in:
1. Identify DEGs for each cell type (compare cell type vs rest, most likely option)
2. Identify DEGs for each cell type using a per-lineage hierarchical approach (compare cell type vs rest in the lineage, as in the endometrium paper, Garcia-Alonso et al. 2021)

In the endometrium paper (Garcia-Alonso et al. 2021) we were interested in the differences within the stromal and epithelial lineages rather than the commonalities (for example, what is specific to epithelial cells in the glands compared to epithelial cells in the lumen). The reason is that epithelial and stromal subtypes vary in space and type, so we want to extract the subtle differences within each lineage to better understand their differential location and biological role.


In [27]:
Idents(so) <- so$CellType

In [28]:
# OPTION 1 - compute DEGs for all cell types
# Extract DEGs for each cell type (each cell type vs the rest)
DEGs <- FindAllMarkers(so,
                       test.use = 'LR',
                       verbose = FALSE,
                       only.pos = TRUE,
                       random.seed = 1,
                       logfc.threshold = 0.2,
                       min.pct = 0.1,
                       return.thresh = 0.05)

“glm.fit: fitted probabilities numerically 0 or 1 occurred”

In [29]:
 'BMP7' %in% rownames(so@assays$RNA@counts)

In [30]:
fDEGs = subset(DEGs, p_val_adj < 0.05 & avg_log2FC > 0.1)

# 1st column = cluster; 2nd column = gene 
fDEGs = fDEGs[, c('cluster', 'gene', 'p_val_adj', 'p_val', 'avg_log2FC', 'pct.1', 'pct.2')] 
write.table(fDEGs, file ='E11.5_DEGs.tsv', sep = '\t', quote = F, row.names = F)

In [31]:
head(fDEGs)

| | cluster | gene | p_val_adj | p_val | avg_log2FC | pct.1 | pct.2 |
|---|---|---|---|---|---|---|---|
| | `<fct>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| SFN | Embryonic skin | SFN | 0 | 0 | 4.13444 | 0.971 | 0.029 |
| PERP | Embryonic skin | PERP | 0 | 0 | 4.102231 | 0.993 | 0.032 |
| WNT6 | Embryonic skin | WNT6 | 0 | 0 | 4.015696 | 0.964 | 0.024 |
| KRT14 | Embryonic skin | KRT14 | 0 | 0 | 3.983657 | 0.916 | 0.035 |
| KRT5 | Embryonic skin | KRT5 | 0 | 0 | 3.937162 | 0.902 | 0.024 |
| CLDN4 | Embryonic skin | CLDN4 | 0 | 0 | 3.932182 | 0.778 | 0.012 |


# E12.5

In [36]:
Idents(Atlas) <- Atlas@meta.data$DevTP

In [37]:
so <- subset(Atlas, idents = c('E12.5'))

In [38]:
so <- NormalizeData(object = so)

In [39]:
so

An object of class Seurat 
4588 features across 10080 samples within 1 assay 
Active assay: RNA (4588 features, 0 variable features)
 5 dimensional reductions calculated: mde, mde_scanvi, pca, scANVI, scVI

## 2. Write gene expression in mtx format

In [40]:
# Create the output directory if it does not exist yet
dir.create('E12.5', showWarnings = FALSE)
# Save normalised counts - NOT scaled!
writeMM(so@assays$RNA@data, file = 'E12.5/matrix.mtx')
# Save gene and cell names
write(x = rownames(so@assays$RNA@data), file = "E12.5/features.tsv")
write(x = colnames(so@assays$RNA@data), file = "E12.5/barcodes.tsv")

NULL

## 3. Generate your meta
In this example, the input is a Seurat object (converted from anndata) with the cluster/cell-type information in `meta.data$CellType`.

The object also has `meta.data$lineage` information, which can be used for a hierarchical DEG approach (see step 4).

In [41]:
table(so@meta.data$CellType)


                  Embryonic skin                              TBD 
                             235                               14 
                     LepR+ BMSCs                        Tenocytes 
                               0                              303 
    Proximal limb bud mesenchyme    Pre-hypertrophic chondrocytes 
                             495                               49 
           Periosteal stem cells           Periosteal progenitors 
                               0                                0 
                Osteoprogenitors                      Osteoclasts 
                               0                                0 
                     Osteoblasts Intermediate limb bud mesenchyme 
                               0                             1319 
                    Immune cells        Hypertrophic chondrocytes 
                              25                                0 
      Distal limb bud mesenchyme         Fast proliferating c

In [42]:
so@meta.data$Cell = rownames(so@meta.data)
df = so@meta.data[, c('Cell','CellType')]
write.table(df, file ='E12.5_meta.tsv', sep = '\t', quote = F, row.names = F)

## 4. Compute DEGs (optional)

Use Seurat `FindAllMarkers` to compute differentially expressed genes and extract the corresponding data frame `DEGs`.
Here there are two options you may be interested in:
1. Identify DEGs for each cell type (compare cell type vs rest, most likely option)
2. Identify DEGs for each cell type using a per-lineage hierarchical approach (compare cell type vs rest in the lineage, as in the endometrium paper, Garcia-Alonso et al. 2021)

In the endometrium paper (Garcia-Alonso et al. 2021) we were interested in the differences within the stromal and epithelial lineages rather than the commonalities (for example, what is specific to epithelial cells in the glands compared to epithelial cells in the lumen). The reason is that epithelial and stromal subtypes vary in space and type, so we want to extract the subtle differences within each lineage to better understand their differential location and biological role.


In [43]:
Idents(so) <- so$CellType

In [44]:
# OPTION 1 - compute DEGs for all cell types
# Extract DEGs for each cell type (each cell type vs the rest)
DEGs <- FindAllMarkers(so,
                       test.use = 'LR',
                       verbose = FALSE,
                       only.pos = TRUE,
                       random.seed = 1,
                       logfc.threshold = 0.2,
                       min.pct = 0.1,
                       return.thresh = 0.05)

“glm.fit: fitted probabilities numerically 0 or 1 occurred”


In [45]:
 'BMP7' %in% rownames(so@assays$RNA@counts)

In [46]:
fDEGs = subset(DEGs, p_val_adj < 0.05 & avg_log2FC > 0.1)

# 1st column = cluster; 2nd column = gene 
fDEGs = fDEGs[, c('cluster', 'gene', 'p_val_adj', 'p_val', 'avg_log2FC', 'pct.1', 'pct.2')] 
write.table(fDEGs, file ='E12.5_DEGs.tsv', sep = '\t', quote = F, row.names = F)

In [47]:
head(fDEGs)

| | cluster | gene | p_val_adj | p_val | avg_log2FC | pct.1 | pct.2 |
|---|---|---|---|---|---|---|---|
| | `<fct>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| KRT14 | Embryonic skin | KRT14 | 0 | 0 | 4.239889 | 0.97 | 0.018 |
| KRT5 | Embryonic skin | KRT5 | 0 | 0 | 4.059047 | 0.987 | 0.013 |
| WNT6 | Embryonic skin | WNT6 | 0 | 0 | 3.900353 | 0.962 | 0.014 |
| GJB2 | Embryonic skin | GJB2 | 0 | 0 | 3.88891 | 0.872 | 0.009 |
| PERP | Embryonic skin | PERP | 0 | 0 | 3.753152 | 0.979 | 0.019 |
| TP63 | Embryonic skin | TP63 | 0 | 0 | 3.724515 | 0.945 | 0.016 |
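
The E10.5, E11.5 and E12.5 blocks repeat the same file-writing steps, so they could be wrapped in one helper and looped over the time points. The sketch below is a hypothetical refactoring on toy data: `write_cpdb_inputs`, the toy matrix and the temporary base directory are assumptions, and it covers only steps 2 and 3 (the per-time-point `NormalizeData` and `FindAllMarkers` calls are left out).

```r
library(Matrix)

# Toy stand-ins: 3 genes x 6 cells over two time points (all names hypothetical)
expr <- Matrix(c(1, 0, 2,  0, 3, 0,  0, 0, 1,  2, 2, 0,  0, 1, 0,  4, 0, 0),
               nrow = 3, sparse = TRUE,
               dimnames = list(paste0("Gene", 1:3), paste0("cell", 1:6)))
meta <- data.frame(
  DevTP     = rep(c("E10.5", "E11.5"), each = 3),
  CellType  = c("A", "B", "A", "B", "B", "A"),
  row.names = colnames(expr)
)

# Write matrix.mtx, features.tsv, barcodes.tsv and <timepoint>_meta.tsv
write_cpdb_inputs <- function(expr, meta, timepoint, base_dir) {
  keep     <- meta$DevTP == timepoint
  sub_expr <- expr[, keep, drop = FALSE]
  out_dir  <- file.path(base_dir, timepoint)
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  writeMM(sub_expr, file = file.path(out_dir, "matrix.mtx"))
  write(x = rownames(sub_expr), file = file.path(out_dir, "features.tsv"))
  write(x = colnames(sub_expr), file = file.path(out_dir, "barcodes.tsv"))

  df <- data.frame(Cell = rownames(meta)[keep], CellType = meta$CellType[keep])
  write.table(df, file = file.path(base_dir, paste0(timepoint, "_meta.tsv")),
              sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(out_dir)
}

base_dir <- tempdir()
for (tp in unique(meta$DevTP)) write_cpdb_inputs(expr, meta, tp, base_dir)
list.files(base_dir, pattern = "_meta\\.tsv$")
```

Keeping the time point as a function argument makes it harder for the output paths and the subset to drift apart, which is easy to do when cells are copied and edited by hand.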
