Workshop
mclust: multi-resolution clustering of omics data
- For additional information and a quick demo, please see the mclust tutorial.
- Installation
m2clust is implemented in Python and is available from PyPI.
Run the following command to install it (use sudo to install it for all users,
or use --user with a path you have write access to):
$ sudo pip3 install m2clust
- Input data
The input is an n*n distance matrix of features, where n is the number of features.
An optional second input is an n*m metadata table, where n is the number of features and m is the number of metadata variables.
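As a rough illustration of the n*n shape described above, the sketch below builds a small Euclidean distance matrix from made-up feature profiles and writes it as tab-delimited text. The file layout (a header row plus a first column of feature names), the helper names, the output file name, and the choice of Euclidean distance are all assumptions for illustration; please check the tutorial for the exact format m2clust expects.

```python
import csv
import math


def pairwise_distances(profiles):
    """Return feature names and an n*n Euclidean distance matrix.

    `profiles` maps a feature name to a list of numeric values
    (hypothetical example data, not part of m2clust).
    """
    names = list(profiles)
    matrix = [
        [math.dist(profiles[a], profiles[b]) for b in names]
        for a in names
    ]
    return names, matrix


def write_matrix(path, names, matrix):
    """Write a labeled square matrix as tab-delimited text (assumed layout)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow([""] + names)  # header row: column labels
        for name, row in zip(names, matrix):
            writer.writerow([name] + [f"{value:.4f}" for value in row])


if __name__ == "__main__":
    demo = {"f1": [0.0, 0.0], "f2": [3.0, 4.0], "f3": [6.0, 8.0]}
    names, matrix = pairwise_distances(demo)
    write_matrix("adist_example.txt", names, matrix)
```

Rows and columns are written in the same order, so the resulting file is square and symmetric with a zero diagonal.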
- How to run?
$ m2clust -i synthetic_demo/adist.txt -o demo_output
If metadata is available, use the following command instead:
$ m2clust -i synthetic_demo/adist.txt -o demo_output --metadata synthetic_demo/metadata.txt --plot
--plot
is optional and generates a heatmap with a
dendrogram of the data.
--metadata
is optional and is used to shape the clusters by the metadata with the
highest influence on the clusters.
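When m2clust is called from a script, the two command forms above can be assembled programmatically. The sketch below is only a thin wrapper around the flags shown in this section (-i, -o, --metadata, --plot); the function names are hypothetical and not part of m2clust.

```python
import shlex
import subprocess


def build_m2clust_command(distance_file, output_dir, metadata_file=None, plot=False):
    """Assemble the m2clust argument list from the options shown above."""
    command = ["m2clust", "-i", distance_file, "-o", output_dir]
    if metadata_file is not None:
        command += ["--metadata", metadata_file]
    if plot:
        command.append("--plot")
    return command


def run_m2clust(command):
    """Run the assembled command; requires m2clust to be installed and on PATH."""
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    command = build_m2clust_command(
        "synthetic_demo/adist.txt",
        "demo_output",
        metadata_file="synthetic_demo/metadata.txt",
        plot=True,
    )
    # Print the command for inspection instead of running it.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems when file paths contain spaces.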
- Output
- m2clust.txt contains the clusters, their members, and the metadata resolution score, sorted from the highest to the lowest score.
- Download the input (distance matrix and metadata)
- Run m2clust from the command line with the input:
$ m2clust -i synthetic_demo/adist.txt -o demo_output --metadata synthetic_demo/metadata.txt --plot
- Check your output folder
Here we show the PCoA and MDS plots as representative visualizations of the results.
Please see the wiki for real-world examples, including gene expression, microbial species strains, and metabolite profiles.
Please see the wiki for the data and their descriptions.
- Please submit your questions or issues with the software to the issue tracker.
$ pip install mclust
To test if mclust is running correctly, you may run the following command in the terminal:
$ mclust_test
Which yields:
test_create_folders (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
test_is_present (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
----------------------------------------------------------------------
Ran 2 tests in 1.01s
OK
usage: mclust [-h] [-i INPUT] -o OUTPUT [-m SIMILARITY] [--metadata METADATA]
[-n ESTIMATED_NUMBER_OF_CLUSTERS] [--size-to-plot SIZE_TO_PLOT]
[-c LINKAGE_METHOD] [--plot] [--resolution {high,medium,low}]
[-v]
Multi-resolution clustering using hierarchical clustering and Silhouette score.
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
the input file D*N, Rows: D features and columns: N samples OR
a distance matrix file D*D (rows and columns should be the same and in the same order)
-o OUTPUT, --output OUTPUT
the output directory
-m SIMILARITY, --similarity SIMILARITY
similarity measurement {default spearman, options: spearman, nmi, ami, dmic, mic, pearson, dcor}
--metadata METADATA Rows are features and each column is a metadata
-n ESTIMATED_NUMBER_OF_CLUSTERS, --estimated_number_of_clusters ESTIMATED_NUMBER_OF_CLUSTERS
estimated number of clusters
--size-to-plot SIZE_TO_PLOT
Minimum size of cluster to be plotted
-c LINKAGE_METHOD, --linkage_method LINKAGE_METHOD
linkage clustering method method {default = single, options average, complete
--plot dendrogram plus heatmap
--resolution {high,medium,low}
Resolution c . Low resolution is good when clusters are well-separated clusters.
-v, --verbose additional output is printed
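The help text above describes the method as hierarchical clustering scored with the Silhouette score. For readers unfamiliar with that score, here is a minimal pure-Python sketch of the standard silhouette definition computed from a precomputed distance matrix. It is not mclust's implementation; the function name and the toy data are made up for illustration.

```python
def silhouette_score(distances, labels):
    """Mean silhouette width from a precomputed square distance matrix.

    distances: list of n lists of n floats (symmetric, zero diagonal).
    labels: list of n cluster labels, one per feature.
    """
    n = len(labels)
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i in range(n):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # a singleton cluster contributes a width of 0
        # a: mean distance to the other members of the same cluster
        a = sum(distances[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(distances[i][j] for j in members) / len(members)
            for label, members in clusters.items()
            if label != labels[i]
        )
        denominator = max(a, b)
        if denominator > 0:
            total += (b - a) / denominator
    return total / n


if __name__ == "__main__":
    # Toy data: four points on a line at positions 0, 1, 10, 11.
    positions = [0.0, 1.0, 10.0, 11.0]
    distances = [[abs(p - q) for q in positions] for p in positions]
    print(silhouette_score(distances, [0, 0, 1, 1]))  # well-separated grouping
    print(silhouette_score(distances, [0, 1, 0, 1]))  # poor grouping
```

Scores close to 1 indicate compact, well-separated clusters, while scores near or below 0 indicate that features sit closer to another cluster than to their own.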
- -i or --input: a distance matrix.
- --output-folder: a folder containing all the output files.
- --resolution: the resolution to be used for clustering {low or high}.
Returns a list of clusters for features. An example output is coming soon.
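Because the input is expected to be a distance matrix whose rows and columns list the same features in the same order, a quick sanity check before running can catch malformed input early. The sketch below is a hypothetical helper, not part of mclust; the symmetry and zero-diagonal checks are general properties of distance matrices assumed here rather than documented requirements of the tool.

```python
def check_distance_matrix(row_names, column_names, matrix, tolerance=1e-8):
    """Return a list of problems found; an empty list means the matrix looks valid."""
    problems = []
    if row_names != column_names:
        problems.append("row and column labels differ or are in a different order")
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        problems.append("matrix is not square")
        return problems  # the remaining checks need a square matrix
    for i in range(n):
        if abs(matrix[i][i]) > tolerance:
            problems.append(f"non-zero diagonal at row {i}")
        for j in range(i + 1, n):
            if abs(matrix[i][j] - matrix[j][i]) > tolerance:
                problems.append(f"asymmetric entries at ({i}, {j})")
    return problems


if __name__ == "__main__":
    names = ["f1", "f2", "f3"]
    matrix = [
        [0.0, 5.0, 10.0],
        [5.0, 0.0, 5.0],
        [10.0, 5.0, 0.0],
    ]
    print(check_distance_matrix(names, names, matrix))
```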
-
Basic usage:
$ mclustviz $DISTANCE_MATRIX.txt /path-to-mclust-output/mclust.txt --metadata $META_DATA.txt --shapeby $METADATA1 -o /path-to-mclust-output/
- $DISTANCE_MATRIX.txt = the distance matrix that was used for clustering
- mclust.txt = the mclust output file, which assigns features to clusters
- $META_DATA.txt = a metadata file which contains metadata for the features
- $METADATA1 = a metadata variable in the metadata file to be used for shaping points in the ordination plot
- Run with -h to see additional command line options
Produces a set of ordination plots for features colored by computational clusters and shaped by metadata.
usage: mclustviz [-h] [--metadata METADATA] [--shapeby SHAPEBY] -o OUTPUT
[--size-to-plot SIZE_TO_PLOT]
adist clusters
mclust visualization script.
positional arguments:
adist the input file D*N, Rows: D features and columns: N samples OR
a distance matrix file D*D (rows and columns should be the same and in the same order)
clusters the input file D*N, Rows: D features and columns: N samples OR
a distance matrix file D*D (rows and columns should be the same and in the same order)
optional arguments:
-h, --help show this help message and exit
--metadata METADATA metadata
--shapeby SHAPEBY the input file D*N, Rows: D features and columns: N samples OR
a distance matrix file D*D (rows and columns should be the same and in the same order)
-o OUTPUT, --output OUTPUT
the output directory
--size-to-plot SIZE_TO_PLOT
Minimum size of cluster to be plotted