reduce redundancy of enriched GO terms #28
Comments
Finished in 7e32c7d. I will close the issue; comments are still welcome. |
Output from compareCluster is also supported; see 983262b. |
see also the blog post |
Hi, I applied the simplify() function to a compareCluster result, but it returned an error message: "Error in FUN(X[[i]], ...) : unused argument (organism = "human")". When I used an enrichGO result as input, simplify worked. Could you help me solve this problem? Thank you. |
It was fixed in the GitHub version, which will be released on Wednesday in BioC 3.4. |
Thank you very much. Just one more question: which "semData" should I use for simplifying compareCluster results? If it needs to be set manually, what is the default semData of simplify()? |
you need to refer to the GOSemSim vignette. If you use measure = 'Wang', you don't need to pass |
But when I run simplify(x, measure="Wang"), it also returns an error: Error in FUN(X[[i]], ...) : argument "semData" is missing, with no default. If I add 'semData = NULL', I get: Error in match.arg(ont, c("BP", "CC", "MF")) : 'arg' must be of length 1. Where did I go wrong? |
library(clusterProfiler)
data(gcSample)
x <- compareCluster(gcSample, fun='enrichGO', ont='MF', OrgDb='org.Hs.eg.db')
y <- simplify(x, measure='Wang', semData=NULL)
The code above works. Please follow the guide and provide a reproducible example. |
Hi, I applied the simplify() function to reduce the redundancy of the enriched GO terms produced by enrichGO. The code below runs without error, but I noticed that several redundant GO terms were not filtered out. The example is below:
genedata <- read.table("genedata.txt", quote=NULL, header=TRUE, check.names=FALSE, sep="\t")
I show the first six GO terms below. I compared the similarity of the GO terms using GOSemSim, and the result is as follows:
It looks like one of "GO:0006335" and "GO:0034723" should have been filtered out, but neither was. Could you look into this problem? Any advice is much appreciated. Thanks for developing clusterProfiler! Zhuofei Xu |
Please follow https://guangchuangyu.github.io/2016/07/how-to-bug-author/ and provide a reproducible example. |
Hi, |
thanks for sharing this info, Massino! I will try again. |
Doesn't
|
Hi, I am using the simplify function, but it throws errors.
I am sure the input is the result from enrichGO. Could you please take a look at this? Also, you mentioned that you prefer the second criterion, which is "REVIGO", but here in simplify you integrated GOSemSim, right? Thanks, |
It takes a really long time to run the simplify function. Is that normal? |
I have a similar problem to "Lucyyang1991". It takes quite a long time to run the simplify function. Is that normal? |
|
same here...also when I don't use "ALL" but like ont="BP", simplify never finishes for me |
the same... ont="BP", simplify never finishes for me |
Actually, the problem seems to be that now (was there a change?) enrichGO(), for example, returns an object that also stores all non-significant results in ego@result. When I strip this down to the significant-only results
simplify finishes normally. But I don't know whether the result of simplify is still correct with that change. I also just strip down |
@steffenheyne thanks for pointing this out. Yes, the result should be stripped down before simplify. Will fix it. |
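The workaround described above (keeping only the significant rows before calling simplify) can be illustrated with a minimal base R sketch. The data frame below is a hypothetical stand-in for the table stored in ego@result; the term IDs, the p-values, and the 0.05 cutoff are all assumptions made for illustration, not values from this thread.

```r
# Hypothetical stand-in for the table held in ego@result (made-up values).
result <- data.frame(
  ID       = c("T1", "T2", "T3"),
  p.adjust = c(0.01, 0.20, 0.04),
  stringsAsFactors = FALSE
)

# Assumed significance cutoff; use whatever cutoff the enrichment call used.
alpha <- 0.05

# Keep only rows whose adjusted p-value passes the cutoff.
sig_only <- result[result$p.adjust < alpha, , drop = FALSE]

print(sig_only$ID)
```

On a real enrichment object the same subsetting would presumably be applied to ego@result and assigned back before calling simplify; that step is not run here.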
Thanks Steffen! Your suggestion worked perfectly. |
Have you solved this problem? I also encountered this issue. |
@xxz19900 simplify only works within a sub-ontology like BP, MF or CC, not with
|
I am having the following error when trying to simplify Arabidopsis GO BP data:
Any suggestion will be appreciated. |
Hi Dr. Yu (@GuangchuangYu ), I would like to use GOSemSim to select the most informative term when comparing lists of GO IDs using mgoSim(). I am using the mouse genome and calculated IC scores: mmGO <- godata('org.Mm.eg.db', ont='BP'). Because my GO IDs are not from using enrichGO, I am unable to use the simplify function. So I would like to try GOSemSim instead. Thank you! |
I am having the same problem as @kamalmdmostafa, getting the error "unused arguments (cutoff = 0.7, by = "p.adjust", select_fun = min)" even though it seems to work with another set of genes. I don't think it's an issue with my enrich object (ego) as it is still showing the relevant GO Terms.
Error in simplify(EEDDOC_ego, cutoff = 0.7, by = "p.adjust", select_fun = min) : |
@claraina were you able to solve this issue? It seems like they gave up on it. Is there any other way to solve this? Creating a new issue, maybe? |
Hi, I love the clusterProfiler package and regularly use the gseGO and gseKEGG functions to investigate my scRNA-seq experiment results. However, with these there is always the problem of many related terms being enriched. Would it be possible to implement a function that selects only the most specific enriched term in the result for visualization in the dotplot() function? And/or extend the simplify function to work on GSEA objects? Best, Niko |
@sparsepix You could use the |
@huerqiang thanks for the tip. Is there a way to easily compute which terms to select (for example, the most specific term in a subtree, only the term with the lowest FDR within the subtree, or something similar)? |
@sparsepix Hi Niko, here is some code that selects the most general term: #372 If you instead compute
you will get the most specific enriched terms. If you are not looking for the most specific terms and just want to deal with the redundancy issue, use the msigdbr package to obtain non-redundant gene sets/pathways. Then, supply them to |
hi @TylerSagendorf |
@deevdevil88 The gene sets/pathways provided by MSigDB on their website and through the msigdbr package are non-redundant. They explain how GO terms are filtered in the v7.0 release notes. I only know for sure that the same type of procedure is applied to Reactome pathways, but they claim that the entire database has had its redundancy decreased (doi: 10.1016/j.cels.2015.12.004). |
@TylerSagendorf awesome! Thank you for confirming and also including the relevant release notes. I use MSigDB a lot, but this non-redundancy of GO terms wasn't obvious. Thanks again. |
@deevdevil88 No problem! Glad I could help. |
To simplify the enriched result, we can use the slim version of GO and use the enricher() function to analyze. Another strategy is to use GOSemSim to calculate the similarity of GO terms and remove highly similar terms by keeping one representative term.
The criteria for selecting the representative term can be:
GOSemSim; can be extended to un-supported organisms).
I prefer the second criterion because it is more intuitive and easier to implement for organisms not internally supported by GOSemSim.
I propose to define a function to simplify the output from enrichGO by removing redundant GO terms.
Any comment/suggestion is welcome.
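The strategy proposed above (remove highly similar terms and keep one representative) can be sketched in base R as a greedy filter. This is only an illustration under stated assumptions, not the implementation used by clusterProfiler: the term IDs, p-values, and similarity matrix are invented, the function name reduce_redundancy is hypothetical, the 0.7 cutoff mirrors the simplify() call quoted earlier in this thread, and the representative is taken to be the term with the smallest adjusted p-value.

```r
# Toy inputs (hypothetical values, not taken from the thread).
terms <- data.frame(
  ID       = c("GO:A", "GO:B", "GO:C", "GO:D"),
  p.adjust = c(0.001, 0.004, 0.020, 0.030),
  stringsAsFactors = FALSE
)

# Symmetric pairwise similarity in [0, 1]; in practice such a matrix
# could be computed with a tool like GOSemSim.
sim <- matrix(
  c(1.0, 0.9, 0.2, 0.1,
    0.9, 1.0, 0.3, 0.2,
    0.2, 0.3, 1.0, 0.8,
    0.1, 0.2, 0.8, 1.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(terms$ID, terms$ID)
)

# Greedy filter: visit terms from most to least significant and keep a
# term only if its similarity to every already-kept term is below cutoff.
reduce_redundancy <- function(terms, sim, cutoff = 0.7) {
  ordered_ids <- terms$ID[order(terms$p.adjust)]
  kept <- character(0)
  for (id in ordered_ids) {
    if (length(kept) == 0 || all(sim[id, kept] < cutoff)) {
      kept <- c(kept, id)
    }
  }
  terms[terms$ID %in% kept, , drop = FALSE]
}

reduced <- reduce_redundancy(terms, sim)
print(reduced$ID)
```

Processing terms in order of significance means each cluster of similar terms is represented by its most significant member, which matches the preference for a p-value based criterion because it needs no organism-specific data beyond the similarity matrix.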