Question about running the scDECAF #2

NianzhenGu · 2024-01-02T14:35:58Z

Hi! I'm working on a single-cell RNA-seq project that compares single-cell transcriptomic data from embryonic and adult mouse colons to identify embryonic-specific gene signatures, and then uses these genes to score colon cancer single-cell data. I think the scDECAF algorithm is well suited to this project.

I have some problems understanding the inputs to the algorithm in the Quick Start part.

For the variable x, can I pass a SingleCellExperiment object?
I don't understand what geneset and HM_geneset mean. Does HM_geneset refer to a human gene set that can be downloaded online? What about mouse gene sets?

What I have now is: the gene signature, a vector of gene IDs such as "ENSMUSG00000031957" "ENSMUSG00000069893" "ENSMUSG00000055827"...; the data I want to score, as a SingleCellExperiment object; and a list of highly variable genes (hvg).

I would very much appreciate it if you could give me some guidance! Thanks!


soroorh · 2024-01-02T14:57:35Z

Hi! Thank you for using the issue tracker!

We only support standard matrices at the moment, so you'd have to supply the matrix of normalised expression values for the hvg genes.
HM_geneset, or more generally the input to genesetlist in pruneGenesets(), is a list of gene sets, i.e. each element of the list is a set of genes (names or IDs, depending on the row names of x). This is the collection of all gene sets one is interested in, and it must contain more than one gene set (say, the C2 collection from MSigDB). The name was chosen based on our application (we actually meant Hallmark gene sets), and it does not imply the species. I encourage you to look up the documentation for any function of interest by installing the package and running ?functionname.

The vector space representation computed by scDECAF requires more than one gene signature, so I'd recommend adding other gene sets to run the model. For your project, for example, differentially expressed genes in differentially abundant neighbourhoods, which you can get from miloDE, will give you a sufficient number of gene sets to use as input to scDECAF.

Link to miloDE https://github.com/MarioniLab/miloDE

Hope this helps.
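
As a rough illustration of the two inputs described in this reply, the sketch below builds a plain matrix restricted to highly variable genes and a named list of gene sets keyed on the same identifiers. The toy matrix, the gene IDs, and the names `expr`, `hvg`, `x` and `gene_sets` are all made up for illustration. With a real SingleCellExperiment, the matrix would presumably come from something like `as.matrix(logcounts(sce))` instead.

```r
# Toy log-normalised expression matrix: genes in rows, cells in columns.
set.seed(1)
genes <- paste0("ENSMUSG", sprintf("%011d", 1:6))
cells <- paste0("cell", 1:5)
expr <- matrix(runif(30, 0, 5), nrow = 6, dimnames = list(genes, cells))

# Hypothetical vector of highly variable genes; one ID is absent from expr
# on purpose, so only the IDs that exist as row names are kept.
hvg <- c(genes[c(1, 2, 4, 5)], "ENSMUSG99999999999")
hvg <- intersect(hvg, rownames(expr))
x <- expr[hvg, , drop = FALSE]

# Named list of gene sets; each element is a character vector of gene IDs
# of the same type as rownames(x). The set names are invented.
gene_sets <- list(
  embryonic_sig = genes[c(1, 2, 3)],
  adult_sig     = genes[c(4, 5, 6)]
)

# Count how many genes of each set are actually present in x.
overlap <- sapply(gene_sets, function(gs) sum(gs %in% rownames(x)))
print(dim(x))
print(overlap)
```

Checking the overlap first is a cheap way to spot an identifier mismatch (for example gene symbols in the gene sets versus Ensembl IDs in the matrix) before running anything else.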

NianzhenGu · 2024-01-02T15:16:34Z

So, for example, if I have two gene signatures, s1 = [a, b, c] and s2 = [d, e, f], I can create a gene set list like [s1, s2]. Then I run pruneGenesets() and genesets2ids() before scDECAF(), right?

soroorh · 2024-01-02T15:26:10Z

The genesetlist has to be a named list, so I suggest:

gslist = list()
gslist[['gs1']] <- s1
gslist[['gs2']] <- s2

As I mentioned, due to the nature of the model we generally need more than 2 gene sets, which is why I suggested obtaining additional gene sets from a miloDE analysis, for example. You got the order of running the functions right, but if your full gene set list has fewer than 10 gene sets, pruning via pruneGenesets() might not be required.

I suggest you also check out our tutorials from the reproducibility repo:
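
A minimal sketch of the named-list construction above, with a basic sanity check on the names and on the number of gene sets. The third signature `s3` and the variable `needs_pruning` are hypothetical additions for illustration; the threshold of 10 simply mirrors the rule of thumb given in this reply.

```r
# Toy gene signatures; s3 is a made-up extra set so that there are more than two.
s1 <- c("a", "b", "c")
s2 <- c("d", "e", "f")
s3 <- c("a", "e", "g")

# Build the named list in one step.
gslist <- list(gs1 = s1, gs2 = s2, gs3 = s3)

# Every gene set needs a non-empty, unique name.
stopifnot(
  !is.null(names(gslist)),
  all(nzchar(names(gslist))),
  !anyDuplicated(names(gslist))
)

# Rule of thumb from the reply: pruning is mainly useful for larger collections.
n_sets <- length(gslist)
needs_pruning <- n_sets >= 10
cat("gene sets:", n_sets, "| pruning suggested:", needs_pruning, "\n")
```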
https://github.com/DavisLaboratory/scDECAF-reproducibility/blob/master/kang_pbmc/kang_pbmc.ipynb
https://github.com/DavisLaboratory/scDECAF-reproducibility/blob/master/cite_pbmc/TotalVI_scDECAF_analysis-addMilo.ipynb

Hope this helps

NianzhenGu · 2024-01-02T15:40:16Z

Great! Thanks for your suggestion! I will try it in the coming days.

NianzhenGu · 2024-01-05T05:12:51Z

Hi! I still have a problem. The picture shows the command I used to run scDECAF. I'm not sure what I should use for the embedding.

The rest of the data: merged_counts is my original data, with genes as rows and samples as columns. Its dimensions are 25904 x 4.

target is the result obtained from genesets2ids(), with genes as rows and gene sets as columns. Its dimensions are 106 x 8.

hvg_union is the list of hvg, with length 8698.

For embedding = reducedDims(tumor_sce)[["UMAP"]], tumor_sce is the SingleCellExperiment object. I'm not sure whether my inputs are correct for the data described above.

I would appreciate it if you could help me find the problem! Thanks!

soroorh · 2024-01-05T05:43:24Z

Hi. The error is suggesting that dim(embedding) != dim(merged_counts). Can you please verify that? Also, I suggest you use log-normalised gene expression rather than raw counts, e.g. scTransformed data.

For the embedding, you can use UMAP as you're doing here, but you can also consider any other embedding (PCA, PHATE, etc., with > 2 dimensions).

Hope this helps!

NianzhenGu · 2024-01-05T06:22:02Z

merged_counts is log-normalized. I tried PCA but still got the same error:

soroorh · 2024-01-05T06:43:57Z

OK, thanks. Are the row names set for the embedding matrix? At some point scDECAF matches the column names of merged_counts with the row names of the embedding. Hopefully that fixes it?

NianzhenGu · 2024-01-05T06:51:12Z

Sorry, I'm not sure what you mean. The row names of the embedding are the gene names and the column names are PC1, PC2, PC3, PC4. The row names of merged_counts are the gene names in the same order, and the column names are the four sample names.

soroorh · 2024-01-05T06:55:12Z

Ah, then I see what's going wrong. The embedding is a cell embedding, i.e. it has dimensions n_cells x n_D, where D is the number of dimensions of the reduced space, whereas you are providing a gene embedding. Your initial code was correct because you had reducedDims(tumor_sce)[["UMAP"]]. Please just check the row names there and verify that nrow(reducedDims(tumor_sce)[["UMAP"]]) == ncol(merged_counts).

NianzhenGu · 2024-01-05T07:01:26Z

So merged_counts should be n_genes x n_cells and the embedding should be n_cells x n_D. But then dim(merged_counts) will not be equal to dim(embedding)? Also, the column names of merged_counts should be the same as the row names of the embedding, which are the cell names, right?

soroorh · 2024-01-05T07:04:31Z

Correct. dim(embedding) != dim(merged_counts) is always true; what I actually meant is that the error is suggesting nrow(embedding) != ncol(merged_counts), or that row names are not set in the embedding. Apologies for the confusion.
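
The shape and naming requirements discussed above can be checked before calling scDECAF. The following is only a sketch on toy data: the helper `check_alignment()` is hypothetical and not part of scDECAF, and the matrices are random stand-ins for merged_counts and the cell embedding.

```r
set.seed(42)
n_genes <- 20
n_cells <- 6
gene_ids <- paste0("gene", seq_len(n_genes))
cell_ids <- paste0("cell", seq_len(n_cells))

# Expression matrix: genes x cells.
merged_counts <- matrix(rnorm(n_genes * n_cells), nrow = n_genes,
                        dimnames = list(gene_ids, cell_ids))

# Cell embedding: cells x reduced dimensions, with cells in shuffled order.
embedding <- matrix(rnorm(n_cells * 2), nrow = n_cells,
                    dimnames = list(rev(cell_ids), c("UMAP1", "UMAP2")))

# Hypothetical helper: returns "ok" or a short description of the first problem.
check_alignment <- function(expr, emb) {
  if (is.null(rownames(emb))) return("embedding has no row names")
  if (nrow(emb) != ncol(expr)) return("nrow(embedding) != ncol(expr)")
  if (!setequal(rownames(emb), colnames(expr))) return("cell names differ")
  "ok"
}

print(check_alignment(merged_counts, embedding))

# A gene-level embedding (genes x PCs) fails the row-count check.
gene_embedding <- matrix(rnorm(n_genes * 4), nrow = n_genes,
                         dimnames = list(gene_ids, paste0("PC", 1:4)))
print(check_alignment(merged_counts, gene_embedding))

# Optionally put the embedding rows in the same order as the expression columns.
embedding <- embedding[colnames(merged_counts), , drop = FALSE]
```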

soroorh · 2024-01-05T07:06:09Z

Also, since you only have 8 gene sets, k should be < 8 (you have 10 now). I have also updated the README with more specifications.
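
One way to guard against this is to cap k from the number of columns of the gene set matrix. This is a sketch only: the all-zero `target` matrix below is a placeholder with the same 106 x 8 shape as reported in the thread, `k_requested` is a made-up name, and the check encodes nothing beyond the constraint stated above.

```r
# Placeholder gene-by-gene-set matrix with the dimensions reported in the thread.
target <- matrix(0L, nrow = 106, ncol = 8,
                 dimnames = list(paste0("gene", 1:106), paste0("gs", 1:8)))

k_requested <- 10

# Keep k strictly below the number of gene sets.
k <- min(k_requested, ncol(target) - 1)
if (k < k_requested) {
  message("k reduced from ", k_requested, " to ", k,
          " because only ", ncol(target), " gene sets are available")
}
```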

NianzhenGu · 2024-01-05T07:06:48Z

Thanks! Will try it.

Jade0904 · 2024-01-05T14:03:27Z

Hi, I'm NianzhenGu's teammate. I still cannot run scDECAF successfully. Here's the error:

"merged_logcounts" was defined by "merged_logcounts <- logcounts(merged_sce)" and the logcounts assay was generated by "merged_sce <- logNormCounts(merged_sce)".

"target" was defined by "target <- genesets2ids(merged_logcounts, gene_signature)", where "gene_signature" was a list of gene sets, as below:

hvg_union was a vector of highly variable genes we chose.

Reduced dimensions were generated by:
merged_sce <- runPCA(merged_sce)
merged_sce <- runUMAP(merged_sce, dimred = "PCA")
I tried both of them (UMAP and PCA) in scDECAF() but it threw the same error.

Do you have any idea about what could possibly be the problems? Thanks a lot :)

soroorh · 2024-01-05T14:07:21Z

Hey :).
Since the data is log-transformed, please set standardize=FALSE, as in the example code in the README. Hope this helps.

Jade0904 · 2024-01-05T14:16:14Z


Yes, it worked! Thank you so much for your help!

soroorh · 2024-01-05T14:18:06Z

No worries. Please close the issue if this is resolved!

soroorh added and then removed the enhancement label on Jan 2, 2024

soroorh mentioned this issue Jan 2, 2024

compatibility with singlecellexperiment and seurat objs #3

Open

soroorh mentioned this issue Jan 4, 2024

Input data columns #4

Closed

NianzhenGu closed this as completed Jan 10, 2024
