Since clusterProfiler uses fgsea under the hood for gene set enrichment analysis, I checked whether the reported issue originates from the way clusterProfiler processes the input/output data, or from fgsea itself. It turns out that I could reproduce the issue when using fgsea directly, hence this post.
Please note that the OP reported this issue with R-4.2.2, but I could also reproduce it with the current versions of R (R-4.3.0 and R-4.3.3, respectively) and fgsea on my Windows and Linux machines.
Also note that the issue occurs when minSize is set to 10; when minSize=11 is used, fgsea runs as expected...
For your convenience I have attached the 2 input objects to this post as an RData file (which I compressed into a ZIP archive in order to be able to upload it). See below how these objects were generated, in case you would like to generate them yourselves.
I would appreciate it if you could have a look at this to see whether this can be fixed.
G
> ## load libraries
> library(clusterProfiler)
> library(fgsea)
> library(org.Hs.eg.db)
>
> ## import input genes (human ENSEMBL) and GO-BP gene sets
> load("fgsea.input.Rdata")
>
> ######
> ## if preferred, code to generate input
>
> ## copy/paste list of input genes ('hgene_list') from:
> ## https://github.com/YuLab-SMU/clusterProfiler/issues/659#issuecomment-2027820878
>
>
> ## create GO-based gene sets; limit to BP
> ## 'ont' should be one of "BP", "CC", "MF" or "ALL"
> library(GO.db)
> ont <- "BP"
>
> goterms <- AnnotationDbi::Ontology(GO.db::GOTERM)
> if (ont != "ALL") {goterms <- goterms[goterms == ont]}
>
> term2gene.go <- AnnotationDbi::mapIds(org.Hs.eg.db,
+ keys=names(goterms),
+ column="ENTREZID",
+ keytype="GOALL",
+ multiVals='list')
'select()' returned 1:many mapping between keys and columns
>
> ## end code to generate input.
> ######
>
> ## manually convert ENSEMBL into ENTREZID using function bitr from clusterProfiler.
> ## when using the function gseGO from clusterProfiler, this is being done on the fly;
> ## see for gseGO function call: https://github.com/YuLab-SMU/clusterProfiler/issues/659#issuecomment-2027820878
>
> ensembl.2.eg <- bitr( names(hgene_list),
+ fromType="ENSEMBL",
+ toType="ENTREZID",
+ OrgDb="org.Hs.eg.db",
+ drop = TRUE)
'select()' returned 1:many mapping between keys and columns
Warning message:
In bitr(names(hgene_list), fromType = "ENSEMBL", toType = "ENTREZID", :
0.05% of input gene IDs are fail to map...
>
>
> input.genes <- hgene_list[ensembl.2.eg$ENSEMBL]
> names(input.genes) <- ensembl.2.eg$ENTREZID
> ## perform GSEA
> ## with minSize = 11; works fine!
>
> system.time({
+
+ res <- fgseaMultilevel(
+ pathways = term2gene.go,
+ stats = input.genes,
+ minSize = 11,
+ maxSize = 500,
+ eps = 0,
+ scoreType = c("std") )
+
+ })
user system elapsed
3.47 0.87 20.19
Warning messages:
1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, :
There are ties in the preranked stats (2.19% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.
2: In fgseaMultilevel(pathways = term2gene.go, stats = input.genes, :
There were 8 pathways for which P-values were not calculated properly due to unbalanced (positive and negative) gene-level statistic values. For such pathways pval, padj, NES, log2err are set to NA. You can try to increase the value of the argument nPermSimple (for example set it nPermSimple = 10000)
3: In fgseaMultilevel(pathways = term2gene.go, stats = input.genes, :
For some of the pathways the P-values were likely overestimated. For such pathways log2err is set to NA.
>
> ## perform GSEA
> ## now with minSize = 10; run was aborted after 5 mins since it wasn't finished by then...
>
> system.time({
+
+ res <- fgseaMultilevel(
+ pathways = term2gene.go,
+ stats = input.genes,
+ minSize = 10,
+ maxSize = 500,
+ eps = 0,
+ scoreType = c("std") )
+
+ })
Warning messages:
1: In preparePathwaysAndStats(pathways, stats, minSize, maxSize, gseaParam, :
There are ties in the preranked stats (2.19% of the list).
The order of those tied genes will be arbitrary, which may produce unexpected results.
2: In fgseaMultilevel(pathways = term2gene.go, stats = input.genes, :
There were 4 pathways for which P-values were not calculated properly due to unbalanced (positive and negative) gene-level statistic values. For such pathways pval, padj, NES, log2err are set to NA. You can try to increase the value of the argument nPermSimple (for example set it nPermSimple = 10000)
Timing stopped at: 3.07 0.91 592.6
>
>
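Because the run completes with minSize = 11 but not with minSize = 10, the gene sets that overlap the input list in exactly 10 genes are the natural suspects. Below is a rough sketch of how they could be isolated, assuming the `term2gene.go` and `input.genes` objects from the session above. The helper name `sets_of_size` is hypothetical, and counting the overlap this way only approximates the size filter that fgsea applies internally.

```r
## Hypothetical helper (not part of fgsea): names of the pathways whose
## overlap with the ranked gene list contains exactly 'size' unique genes.
sets_of_size <- function(pathways, stats, size) {
  overlap <- vapply(pathways,
                    function(p) length(intersect(p, names(stats))),
                    integer(1))
  names(overlap)[overlap == size]
}

## toy example
toy.stats <- c(g1 = 2, g2 = 1, g3 = -1, g4 = -2)
toy.sets  <- list(A = c("g1", "g2"),
                  B = c("g1", "g2", "g3"),
                  C = c("g4", "x"))
sets_of_size(toy.sets, toy.stats, 2)   # "A" only; C overlaps in a single gene

## assumed usage with the objects from this session:
## suspects <- sets_of_size(term2gene.go, input.genes, 10)
## length(suspects)
```

Running fgsea on `term2gene.go[suspects]` alone, or on subsets of it, might then narrow the problem down to individual gene sets.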
To keep you updated: this turned out to be an issue with the algorithm that we were generally aware of, although not in this setting. We recently developed an approach to fix it properly. Hopefully we'll be able to integrate the fix into fgsea in the not-so-distant future, but it's not trivial, so I can't give an ETA. As a workaround for now, one could add random noise to the input scores, and everything should start working fine:
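A minimal sketch of what such a workaround could look like, assuming `input.genes` is the named numeric vector from the session above. The helper name `add_jitter`, the noise scale and the seed are illustrative assumptions and not the maintainers' code. The noise is kept tiny relative to the range of the scores so that the ranking is essentially unchanged; as a side effect it also breaks the ties reported in the warnings.

```r
## Hypothetical helper (not part of fgsea): add a small amount of Gaussian
## noise to the gene-level statistics while keeping the gene names.
add_jitter <- function(stats, rel.sd = 1e-6, seed = 1) {
  set.seed(seed)                          # fixed seed for reproducibility
  noise.sd <- rel.sd * diff(range(stats)) # scale noise to the score range
  stats + rnorm(length(stats), mean = 0, sd = noise.sd)
}

## toy example with tied scores
toy <- c(a = 1.5, b = 1.5, c = -0.3, d = 0, e = 0)
toy.jittered <- add_jitter(toy)
anyDuplicated(toy.jittered) == 0          # TRUE once the ties are broken

## assumed usage with the objects from the session above:
## res <- fgseaMultilevel(pathways  = term2gene.go,
##                        stats     = add_jitter(input.genes),
##                        minSize   = 10,
##                        maxSize   = 500,
##                        eps       = 0,
##                        scoreType = "std")
```

Whether a noise level this small is sufficient for this particular data set has not been verified here; `rel.sd` may need to be increased.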
Hi Alex,
A (reproducible) issue ("GSEA hangs") was posted on the clusterProfiler GitHub. See: YuLab-SMU/clusterProfiler#659 (comment), and posts below that one.
sessionInfo() Windows machine:
sessionInfo() Linux machine:
Attachment: fgsea.input.zip