
Clustering. Worked examples and exercises

Francisco García edited this page Jan 21, 2015 · 1 revision

Worked Examples

Example 1. Fibroblasts K-means clustering

  • Go to the Babelomics page and select the Clustering option from the //Expression// menu.
  • Press //Online Examples// and select example number 1; the parameters and form fields are filled in automatically. This example is set up to cluster both genes (rows) and conditions (columns) with the K-means algorithm, using 5 sample clusters and 15 gene clusters. The selected distance is Euclidean (square).
  • Press Run and wait for your job to finish.
  • When the process finishes, a new //green job// appears on the right side of the web page. Click it to check your results.
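To make the K-means step concrete, the sketch below implements Lloyd's K-means with the squared Euclidean distance, first clustering the rows (genes) of a matrix and then its transposed columns (conditions). It is an illustration under our own assumptions (random initialisation, a toy 4 x 4 matrix, hypothetical function names `sq_euclidean` and `kmeans`), not the Babelomics implementation; in online example 1, `k` would be 15 for genes and 5 for samples.

```python
import random

def sq_euclidean(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, n_iter=100, seed=0):
    """Lloyd's K-means; returns one cluster label (0..k-1) per row."""
    rng = random.Random(seed)
    # Start from k distinct rows picked at random as the initial centroids.
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_euclidean(r, centroids[c]))
                      for r in rows]
        if new_labels == labels:  # no profile changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean profile of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy matrix: rows are genes, columns are conditions (hypothetical values).
matrix = [
    [1.0, 1.1, 5.0, 5.2],
    [0.9, 1.0, 5.1, 4.9],
    [5.0, 5.1, 1.0, 0.8],
    [4.8, 5.2, 1.1, 1.0],
]
gene_labels = kmeans(matrix, k=2)
# Clustering conditions means clustering the columns, i.e. the transposed matrix.
sample_labels = kmeans([list(col) for col in zip(*matrix)], k=2)
print(gene_labels, sample_labels)
```

Because the initial centroids are random, the label numbers themselves are arbitrary; only which profiles share a label is meaningful, and different starts can occasionally give different partitions on real data.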

** Questions **

These are some questions that you should be able to answer about the previous example:

  • Do you think that the clustering was able to differentiate any group of coexpressed genes?
  • How many sample clusters are there? How many gene clusters?

Launch online examples number 3 (Fibroblasts SOTA clustering) and number 4 (Fibroblasts UPGMA clustering). Compare the results.

  • Do you obtain the same result?
  • What are the differences between the results of these three examples?
  • Why are they different?
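Unlike K-means, UPGMA does not need the number of clusters in advance: it repeatedly merges the two closest clusters, where the distance between clusters is the average of all pairwise distances between their members, and produces a tree that can be cut at any level. The following is a minimal sketch of that procedure with Euclidean distance; the function names, the `cut_tree` helper, and the toy data are hypothetical, and Babelomics' own implementation and tree output may differ.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def upgma(profiles, dist=euclidean):
    """Average-linkage (UPGMA) agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, height) tuples,
    where each cluster is a tuple of row indices.
    """
    n = len(profiles)
    d = [[dist(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]

    def average_distance(pair):
        # UPGMA: mean distance over all cross-cluster pairs of members.
        a, b = pair
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    clusters = [(i,) for i in range(n)]
    merges = []
    while len(clusters) > 1:
        # Merge the two closest clusters and record the height of the join.
        a, b = min(combinations(clusters, 2), key=average_distance)
        merges.append((a, b, average_distance((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

def cut_tree(merges, n, k):
    """Replay the first n - k merges to obtain k flat clusters."""
    clusters = [(i,) for i in range(n)]
    for a, b, _ in merges[: n - k]:
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return sorted(sorted(c) for c in clusters)

# Toy matrix: rows are genes, columns are conditions (hypothetical values).
profiles = [
    [1.0, 1.1, 5.0, 5.2],
    [0.9, 1.0, 5.1, 4.9],
    [5.0, 5.1, 1.0, 0.8],
    [4.8, 5.2, 1.1, 1.0],
]
merges = upgma(profiles)
print(cut_tree(merges, len(profiles), k=2))  # → [[0, 1], [2, 3]]
```

This brute-force version recomputes average distances at every step, which is fine for a handful of genes but far too slow for a full microarray.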

Example 2. Rheumatoid SOTA clustering

  • Go to the Babelomics page and select the Clustering option from the //Expression// menu.
  • Press //Online Examples// and select example number 2; the parameters and form fields are filled in automatically. This example is set up to cluster both genes (rows) and conditions (columns) with the SOTA algorithm and the Euclidean (square) distance.
  • Press Run and wait for your job to finish.
  • When the process finishes, a new //green job// appears on the right side of the web page. Click it to check your results.

** Questions **

These are some questions that you should be able to answer about the previous example:

  • Do you think that the clustering was able to differentiate any group of coexpressed genes?
  • How many groups/clusters are there?
  • What is your answer based on?
  • Do your selected groups represent different functional classes?

Try other distances and clustering methods by selecting different options in the Babelomics interface. Compare the results.

  • Do you obtain the same result?
  • What is the main difference between the hierarchical and non-hierarchical results?
  • Does the distance measure affect your results?
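A small illustration of why the distance matters: with Euclidean distance, two genes are close when their expression values are close, while a correlation-based distance treats genes as close when their profiles rise and fall together, whatever their level. The sketch uses 1 minus Pearson's r as an example of the latter (check the Babelomics interface for the measures it actually offers); the three gene profiles are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance: sensitive to the absolute expression level."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    """1 - Pearson correlation: 0 for parallel profiles, 2 for mirrored ones."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return 1 - cov / (sa * sb)

gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [11.0, 12.0, 13.0, 14.0]  # same shape as gene_a, shifted up by 10
gene_c = [4.0, 3.0, 2.0, 1.0]      # similar values, opposite trend

for name, dist in [("euclidean", euclidean), ("pearson", pearson_distance)]:
    nearest = min([("gene_b", gene_b), ("gene_c", gene_c)],
                  key=lambda item: dist(gene_a, item[1]))
    print(name, "-> nearest to gene_a:", nearest[0])
# euclidean -> nearest to gene_a: gene_c
# pearson -> nearest to gene_a: gene_b
```

The two measures disagree about which gene is the nearest neighbour of `gene_a`, so a clustering built on one can group genes quite differently from a clustering built on the other.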

Exercises

Exercise 1. Random dataset

Download this {{example_data:clustering:random_array.txt|random dataset}} and perform a clustering analysis.

  • What would you expect to obtain from analyzing data with no structure?
  • Do you obtain a result?
  • What can you say about this result?
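To explore these questions numerically, one simple check (our own addition, not a Babelomics feature) is to compare the within-cluster sum of squares (WCSS) of K-means on the data with the WCSS obtained after shuffling each gene's values independently across conditions. Shuffling keeps every gene's values but destroys any co-expression between genes, so a ratio close to 1 suggests the clusters are no tighter than those found in structureless data. The sketch assumes a deterministic farthest-point initialisation and synthetic data; all names are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_wcss(rows, k, n_iter=100):
    """K-means with farthest-point initialisation; returns the final WCSS."""
    # Start from the first row, then repeatedly add the row farthest from
    # the centroids chosen so far.
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for r in rows:
            groups[min(range(k), key=lambda c: sq_dist(r, centroids[c]))].append(r)
        # Move each centroid to its group mean (keep it if the group is empty).
        new = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
               for i, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return sum(sq_dist(r, centroids[c]) for c, g in enumerate(groups) for r in g)

def shuffle_ratio(rows, k, seed=0):
    """WCSS of the data divided by WCSS after shuffling each row independently."""
    rng = random.Random(seed)
    shuffled = []
    for r in rows:
        r = list(r)
        rng.shuffle(r)
        shuffled.append(r)
    return kmeans_wcss(rows, k) / kmeans_wcss(shuffled, k)

rng = random.Random(1)
# Structureless data: 30 genes x 8 conditions of uniform noise.
noise = [[rng.random() for _ in range(8)] for _ in range(30)]
# Structured data: two opposite expression patterns plus a little noise.
up = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
structured = [[v + rng.uniform(-0.5, 0.5) for v in (up if i % 2 else up[::-1])]
              for i in range(20)]
print(shuffle_ratio(noise, k=3))       # expected to be close to 1
print(shuffle_ratio(structured, k=2))  # expected to be far below 1
```

A single shuffle is a rough guide; repeating it with several seeds gives a better sense of how much the ratio varies.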

Exercise 2. Response of human fibroblasts to serum

Download {{example_data:clustering:fibro.txt|this dataset}} and perform a clustering analysis.

This dataset was explored in detail in http://genome-www.stanford.edu/serum/clusters.html, where the detected clusters were functionally validated. You can perform the same clustering analysis and compare your clusters with the biological interpretation given there.

Select a cluster and click on the highlighted region. Continue the analysis by sending the cluster to enrichment_analysis to compare it against the rest of the genome.

  • Do you obtain any interesting results? If not, you can try with another cluster of genes.
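Enrichment analyses of this kind ask whether a functional term is over-represented in the selected cluster compared with the rest of the genome. A minimal sketch of that idea as a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test on the cluster vs. rest 2x2 table) follows. The Babelomics enrichment tool may use a different variant, for example a two-sided test with multiple-testing correction, and the counts in the usage example are hypothetical.

```python
from math import comb

def enrichment_pvalue(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    k: genes in the cluster annotated with the term
    n: genes in the cluster
    K: genes annotated with the term in the cluster plus the rest of the genome
    N: genes in the cluster plus the rest of the genome
    """
    # Probability of drawing k or more annotated genes when n genes are
    # picked at random from N, of which K carry the annotation.
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 8 of 40 cluster genes carry a term that 300 of
# 20,000 genes carry overall (about 0.6 expected by chance).
p = enrichment_pvalue(k=8, n=40, K=300, N=20000)
print(f"p = {p:.3g}")
```

When many terms are tested for one cluster, the resulting p-values should be adjusted for multiple testing before calling any term enriched.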

Exercise 3. Zebrafish embryogenesis data

Download {{:example_data:clustering:zebrafish_embryo.txt|this file}} and perform a hierarchical clustering analysis of its genes. This example file contains the first 999 genes of the 3,657 genes that showed significant levels of differential expression in http://www.plosgenetics.org/article/info%3Adoi%2F10.1371%2Fjournal.pgen.0010029.

  • Do you see any patterns of gene expression between different developmental stages?
  • Are gene clusters of different developmental stages functionally enriched?

Exercise 4. Golub data

  • Run a differential expression analysis with the Golub data (the train dataset from the Predictor exercises): {{:images:clustering:diff_expr_golub.jpg|}}

  • Then, redirect the differentially expressed genes to clustering: {{:images:clustering:golub_redirect_clustering.jpg|}}

  • Do the sample clusters make sense? Are samples of the same condition clustered together? Do the gene clusters make sense? Are the genes in each cluster functionally related?
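Before genes are redirected to clustering, the differential expression step ranks them by how strongly they separate the two sample groups (ALL and AML in the Golub data). A minimal sketch of such a ranking with Welch's t statistic follows; the Babelomics differential expression tool may use other tests and corrections, and the matrix, labels, and function names `welch_t` and `top_genes` are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic; each group needs at least 2 values and some spread."""
    return (mean(x) - mean(y)) / math.sqrt(variance(x) / len(x) + variance(y) / len(y))

def top_genes(matrix, labels, group_a, group_b, n_top):
    """Indices of the n_top genes with the largest |t| between two sample groups."""
    scores = []
    for i, row in enumerate(matrix):
        a = [v for v, lab in zip(row, labels) if lab == group_a]
        b = [v for v, lab in zip(row, labels) if lab == group_b]
        scores.append((abs(welch_t(a, b)), i))
    return [i for _, i in sorted(scores, reverse=True)[:n_top]]

# Toy data: rows are genes, columns are samples (hypothetical values).
matrix = [
    [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],  # higher in AML
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # no difference
    [6.0, 5.5, 6.2, 1.0, 1.1, 0.9],  # higher in ALL
]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
selected = top_genes(matrix, labels, "ALL", "AML", n_top=2)
print(selected)  # rows to pass on to the clustering step
```

With thousands of genes, the ranking would normally be combined with an adjusted p-value threshold rather than a fixed `n_top`.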
