maxGSSize in enrichGO? #46

Closed
mevers opened this issue Mar 30, 2016 · 2 comments

Comments

@mevers

mevers commented Mar 30, 2016

Dear Guangchuang.

I've got another question/request, this time concerning the return output of enrichGO. Is there a simple way to filter entries based on an upper (max) bound on the number of genes associated with a particular term? Something like an upper bound equivalent to minGSSize?

The reason being that I always end up with e.g. a (trivial) 100% overlap with genes from term "biological process" if I run a BP GO enrichment analysis. I can filter out these events if I do a summary(...), but it would be nice to do this directly using the enrichResult return object, so I can use the filtered results for visualisation.

Thanks,
Maurits

@GuangchuangYu
Member

In the devel branch, I have added maxGSSize=500 for both the hypergeometric test and GSEA, so the annoying "biological process" term won't appear again.

For GO, you can use simplify to remove redundant terms, and dropGO to remove user-specified terms or all terms at a specific level.

You are not the only one who wants to restrict enrichment results by gene set size. Although the size can be restricted with minGSSize and maxGSSize, it would be better to filter an existing result than to re-run the analysis.

I have implemented a function, gsfilter, in DOSE that does this. By default, it filters by gene set size. You can also filter by the Count column (gene count) by specifying by='Count'.

> library(DOSE)
> data(geneList)
> de=names(geneList)[1:100]
> x=enrichDO(de)
> x
#
# over-representation test
#
#...@organism    Homo sapiens
#...@ontology    DO
#...@keytype     ENTREZID
#...@gene    chr [1:100] "4312" "8318" "10874" "55143" "55388" "991" ...
#...pvalues adjusted by 'BH' with cutoff <0.05
#...28 enriched terms found
'data.frame':   28 obs. of  9 variables:
 $ ID         : chr  "DOID:0060071" "DOID:5295" "DOID:8719" "DOID:3007" ...
 $ Description: chr  "pre-malignant neoplasm" "intestinal disease" "in situ carcinoma" "breast ductal carcinoma" ...
 $ GeneRatio  : chr  "5/77" "9/77" "4/77" "4/77" ...
 $ BgRatio    : chr  "22/8007" "157/8007" "18/8007" "29/8007" ...
 $ pvalue     : num  1.67e-06 1.76e-05 2.18e-05 1.56e-04 2.08e-04 ...
 $ p.adjust   : num  0.000632 0.002752 0.002752 0.013426 0.013426 ...
 $ qvalue     : num  0.000456 0.001985 0.001985 0.009684 0.009684 ...
 $ geneID     : chr  "6280/6278/10232/332/4321" "4312/6279/3627/10563/4283/890/366/4902/3620" "6280/6278/10232/332" "6280/6279/4751/6286" ...
 $ Count      : int  5 9 4 4 13 6 13 5 5 6 ...
#...Citation
  Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an
  R/Bioconductor package for Disease Ontology Semantic and Enrichment
  analysis. Bioinformatics 2015 31(4):608-609

> gsfilter(x, min=150, max=170)
#
# over-representation test
#
#...@organism    Homo sapiens
#...@ontology    DO
#...@keytype     ENTREZID
#...@gene    chr [1:100] "4312" "8318" "10874" "55143" "55388" "991" ...
#...pvalues adjusted by 'BH' with cutoff <0.05
#...4 enriched terms found
'data.frame':   4 obs. of  9 variables:
 $ ID         : chr  "DOID:5295" "DOID:3082" "DOID:3310" "DOID:1176"
 $ Description: chr  "intestinal disease" "interstitial lung disease" "atopic dermatitis" "bronchial disease"
 $ GeneRatio  : chr  "9/77" "7/77" "6/77" "6/77"
 $ BgRatio    : chr  "157/8007" "170/8007" "152/8007" "153/8007"
 $ pvalue     : num  1.76e-05 1.18e-03 3.32e-03 3.43e-03
 $ p.adjust   : num  0.00275 0.03099 0.04694 0.04694
 $ qvalue     : num  0.00198 0.02235 0.03385 0.03385
 $ geneID     : chr  "4312/6279/3627/10563/4283/890/366/4902/3620" "4312/6280/3627/27299/6362/81930/4321" "597/6278/3627/820/4283/6362" "4312/6280/3627/820/6362/3620"
 $ Count      : int  9 7 6 6
#...Citation
  Guangchuang Yu, Li-Gen Wang, Guang-Rong Yan, Qing-Yu He. DOSE: an
  R/Bioconductor package for Disease Ontology Semantic and Enrichment
  analysis. Bioinformatics 2015 31(4):608-609

>

Before filter:

#...28 enriched terms found

After filter:

#...4 enriched terms found
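
To make the filtering rule concrete, here is a minimal base-R sketch of the same idea applied to a plain data frame. It is not the DOSE implementation: `filter_by_size` is a hypothetical helper, and it assumes the gene set size is the numerator of the BgRatio column and that both bounds are inclusive, which is consistent with the gsfilter(x, min=150, max=170) output above. The rows are the first four terms of the enrichDO result shown earlier.

```r
# First four rows of the enrichDO result above (ID, BgRatio and Count only).
res <- data.frame(
  ID      = c("DOID:0060071", "DOID:5295", "DOID:8719", "DOID:3007"),
  BgRatio = c("22/8007", "157/8007", "18/8007", "29/8007"),
  Count   = c(5L, 9L, 4L, 4L),
  stringsAsFactors = FALSE
)

# Hypothetical helper, NOT part of DOSE: keep rows whose size lies in [min, max].
# Assumption: gene set size is the numerator of BgRatio.
filter_by_size <- function(df, by = c("GSSize", "Count"), min = NA, max = NA) {
  by <- match.arg(by)
  size <- if (by == "GSSize") {
    as.integer(sub("/.*$", "", df$BgRatio))  # "157/8007" -> 157
  } else {
    df$Count
  }
  keep <- rep(TRUE, nrow(df))
  if (!is.na(min)) keep <- keep & size >= min  # an NA bound means "no limit"
  if (!is.na(max)) keep <- keep & size <= max
  df[keep, , drop = FALSE]
}

filtered <- filter_by_size(res, min = 150, max = 170)   # filter by gene set size
by_count <- filter_by_size(res, by = "Count", min = 5)  # filter by gene count
```

Of these four rows, only DOID:5295 (gene set size 157) falls in the 150 to 170 range, matching the first row of the filtered output above. For a real enrichResult object, gsfilter is the function to use, since it returns an enrichResult that can be passed on to the visualisation functions.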

@mevers
Author

mevers commented Mar 31, 2016

That's great; exactly what I was looking for. Thanks again for the fast help.

Maurits
