GOMCL: a Python tool for Gene Ontology gene sets clustering

GOMCL

Overview

GOMCL is a tool for clustering Gene Ontology (GO) terms and extracting summarized associations of GO-based functions from omics data. It clusters GO terms with MCL (the Markov Cluster algorithm) based on their overlapping ratios, calculated as either the Overlap coefficient (OC) or the Jaccard coefficient (JC). The resulting clusters can be further analyzed and separated into sub-clusters with a second script, GOMCL-sub. GOMCL reduces the time researchers spend manually curating large lists of GO terms and minimizes the bias that shared GO terms introduce into data interpretation.
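To make the two overlapping ratios concrete, here is a minimal sketch of how OC and JC can be computed for a pair of GO terms, each represented by the set of genes annotated to it. The function names, the `term_*` labels and the gene IDs are hypothetical, and keeping pairs with a score at or above the threshold is an assumption about how a cutoff such as -Ct might be applied. GOMCL's own code may differ.

```python
from itertools import combinations


def overlap_coefficient(genes_a, genes_b):
    """Shared genes divided by the size of the smaller gene set."""
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def jaccard_coefficient(genes_a, genes_b):
    """Shared genes divided by the size of the union of both gene sets."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similar_pairs(term_to_genes, metric, threshold):
    """Return (term1, term2, score) for every pair scoring at or above threshold.

    Assumption: a pair is kept when score >= threshold.
    """
    pairs = []
    for t1, t2 in combinations(sorted(term_to_genes), 2):
        score = metric(term_to_genes[t1], term_to_genes[t2])
        if score >= threshold:
            pairs.append((t1, t2, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical GO terms and gene sets, for illustration only.
    terms = {
        "term_A": {"g1", "g2", "g3", "g4"},
        "term_B": {"g3", "g4"},
        "term_C": {"g5", "g6"},
    }
    print(similar_pairs(terms, overlap_coefficient, 0.5))
    print(similar_pairs(terms, jaccard_coefficient, 0.5))
```

In this toy data, term_B is a subset of term_A, so OC is 1.0 while JC is 0.5. This shows why OC tends to group small, specific terms with their larger parent terms more readily than JC does.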


Getting Started

Prerequisites - programs

Prerequisites - input data

Installation

1. Download the zip file and install

 wget https://github.com/Guannan-Wang/GOMCL/archive/master.zip
 unzip master.zip
 cd GOMCL-master
 chmod 755 *.py scripts/*.py
 export PATH=/path/to/GOMCL-master:$PATH 

After installation, check that GOMCL is properly installed by running the following command:

GOMCL.py -h # This will print all options for GOMCL

Ready to run GOMCL

Usage examples:

GOMCL:

GOMCL.py OBOfile EnrichedGO -d -gosize 3500 -Ct 0.5 -I 1.5 
GOMCL.py OBOfile EnrichedGO -Ct 0.5 -I 1.5 -hm -nw -d -hg 0 -hgt -ssd 0 

GOMCL-sub:

GOMCL-sub.py OBOfile ClstrGO -C 1 -gosize 2000 -I 1.8 -ssd 0 -hg 0 -hgt -hm -nw # Cluster C1 will be further separated.

Option explanations:

These can also be displayed with -h or --help.

required arguments:

  -OBO		obo file should be provided, e.g. go-basic.obo
  -enGO		Enriched GO input file from a GO enrichment analysis tool; currently supported tools are BiNGO, agriGO, GOrilla and gProfiler

optional arguments:

  -d		Only needed if depth for input GO terms is desired 
  -got		GO enrichment tool used for the enrichment test (default: BiNGO)
  -gosize	Threshold for the size of GO terms, only GO terms below this threshold will be printed out (default: 3000)
  -gotype	Type of GO terms, only GO terms in this or these categories will be printed out 
  -SI		Method to calculate similarity between GO terms, OC (Overlap coefficient) or JC (Jaccard coefficient) (default: OC)
  -Ct		Clustering threshold for the overlapping ratio between two GO terms, any value between 0 and 1 (default: 0.5)
  -I		Inflation value, main handle for cluster granularity, usually chosen somewhere in the range [1.2-5.0] (default: 1.5)
  -Sig		Significance level (p-value cutoff) used in the enrichment test (default: 0.05)
  -ssd		Only needed if a similarity score distribution is desired for clusters with number of GOs larger than this threshold
  -hg		Only needed if a hierarchy graph is desired for clusters with number of GOs larger than this threshold
  -hgt		Only needed if a tabular output of the GO hierarchy is desired for the clusters specified by option -hg, should always be used with option -hg
  -hm		Only needed if a similarity heatmap is desired
  -nw		Only needed if a similarity-based network is desired
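To give a feel for what the inflation value -I controls, the following is a toy, pure-Python sketch of the general MCL procedure: alternate expansion (matrix squaring) with inflation (element-wise power followed by column normalization), then read clusters from the rows of the resulting matrix. It is not GOMCL's code, which may rely on a different or external MCL implementation. The function names, the fixed iteration count and the 1e-6 cutoff are assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    col_sums = [sum(m[r][c] for r in range(n)) for c in range(n)]
    return [[m[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n)]
            for r in range(n)]


def expand(m):
    """Expansion step: multiply the matrix by itself."""
    n = len(m)
    return [[sum(m[r][k] * m[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]


def inflate(m, inflation):
    """Inflation step: raise each entry to a power, then renormalize columns."""
    return normalize_columns([[v ** inflation for v in row] for row in m])


def mcl(adjacency, inflation=1.5, iterations=50):
    """Toy MCL on a square matrix of non-negative edge weights."""
    n = len(adjacency)
    # Add self-loops, a common MCL convention, then normalize.
    m = [[adjacency[r][c] + (1.0 if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each non-empty row lists the members of one cluster; skip duplicates.
    clusters = []
    for r in range(n):
        members = frozenset(c for c in range(n) if m[r][c] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


if __name__ == "__main__":
    # Hypothetical similarity graph: nodes 0-2 are linked, nodes 3-4 are linked.
    graph = [
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ]
    print(sorted(sorted(c) for c in mcl(graph, inflation=1.5)))
    # prints [[0, 1, 2], [3, 4]]
```

In general, higher inflation values strengthen strong connections and weaken faint ones more aggressively, which tends to produce more and smaller clusters.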

Note:

  1. GOMCL is currently compatible with BiNGO, agriGO, GOrilla, g:Profiler and customized inputs. Support for other tools will be added upon request.
  2. Similarity between GO terms is calculated either as Jaccard Coefficient (JC) or Overlap Coefficient (OC), as described in Merico et al., 2010.
  3. The choice of -Ct and -I values depends heavily on the number of input GO terms and how similar they are. Trying several -Ct and -I values is recommended to find the best combination.
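One way to try several -Ct and -I combinations, as note 3 suggests, is to generate the command lines with a small helper script. The sketch below only builds and prints the commands. The helper name `build_gomcl_commands` and the input file names are hypothetical placeholders, and only the -Ct and -I flags shown in the usage examples are used.

```python
import itertools


def build_gomcl_commands(obo_file, enriched_go, ct_values, inflation_values,
                         extra_flags=()):
    """Build one GOMCL.py command (as an argument list) per -Ct/-I combination."""
    commands = []
    for ct, inflation in itertools.product(ct_values, inflation_values):
        cmd = ["GOMCL.py", obo_file, enriched_go,
               "-Ct", str(ct), "-I", str(inflation)]
        cmd.extend(extra_flags)
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Placeholder file names; replace them with your own OBO and enrichment files.
    for cmd in build_gomcl_commands("go-basic.obo", "EnrichedGO",
                                    ct_values=[0.4, 0.5, 0.6],
                                    inflation_values=[1.5, 2.0]):
        print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to execute it. This README does not say whether successive runs overwrite each other's output files, so running each combination in its own working directory may be a prudent precaution.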

Running the test

Will be available soon
