# 📝 Learning goals of this practical

- You can describe how to apply hierarchical clustering to a transcriptomics dataset

- You can discuss the goals of unsupervised machine learning when applied to transcriptomics data

In this practical, we will explore RNA-seq data from the study:

*High-Throughput RNA Sequencing of Pseudomonas-Infected Arabidopsis Reveals Hidden Transcriptome Complexity and Novel Splice Variants* by [Howard et al. (2013)](https://doi.org/10.1371/journal.pone.0074183)

### ❓Questions
Have a look at the paper.

- What three treatments are the plants subjected to?

- What is the goal of these three treatments?

# Hierarchical clustering

Let's perform hierarchical clustering on this dataset.

In [None]:
%pip install -q observable_jupyter==0.1.10 clustergrammer2 fastcluster pyvis

import sys
if "google.colab" in sys.modules:
    %pip install git+https://github.com/CropXR/EduXR.git
    from google.colab import files
else:
    %load_ext autoreload
    %autoreload 2

from clustergrammer2 import net
import seaborn as sns
import matplotlib.pyplot as plt
from observable_jupyter import embed
import pandas as pd
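from IPython.display import Javascript, display

from dsplantbreeding.clustering import plot_correlation_network


def resize_colab_cell(*args):
    # Let Colab output cells grow to 5000 px so that large plots are not cut off.
    # *args absorbs the optional `info` object that IPython may pass to the callback.
    display(Javascript('google.colab.output.setIframeHeight(0, true, {maxHeight: 5000})'))


# The resize call only exists on Colab, so register the hook only there.
if "google.colab" in sys.modules:
    get_ipython().events.register('pre_run_cell', resize_colab_cell)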

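# Download the expression matrix; -nc skips the download if the file already exists.
!wget -nc https://raw.githubusercontent.com/CropXR/EduXR/refs/heads/main/data/biotic_transcriptomics.txt
# Load the matrix and cluster it using correlation distance and average linkage.
net.load_file('biotic_transcriptomics.txt')
net.cluster(dist_type='correlation', linkage_type='average')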

Now we can display a heatmap of the (normalised) gene expression values across the samples:

In [None]:
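# Read the expression matrix; skiprows=[1, 2] drops the two lines directly below the header.
df = pd.read_csv('biotic_transcriptomics.txt', sep='\t', header=[0], index_col=0, skiprows=[1, 2])
# clustermap creates its own figure, so the size is passed here instead of via plt.figure().
g = sns.clustermap(df, metric='correlation', method='average', cmap="vlag", vmin=-2, vmax=2, figsize=(10, 10))
# Hide the gene labels on the y-axis.
g.ax_heatmap.set_yticks([])
g.ax_heatmap.set_yticklabels([])
plt.show()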
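
Both clustering calls above use `correlation` as the distance. As a minimal sketch of what that means, the toy example below computes the correlation distance (1 minus the Pearson correlation) by hand. The gene names and values are made up for illustration and are not taken from the dataset.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_distance(x, y):
    """Correlation distance: 0 for identical shapes, 2 for opposite shapes."""
    return 1 - pearson(x, y)


# Hypothetical expression profiles over four samples (illustrative values only).
gene_a = [1.0, 2.0, 3.0, 4.0]      # rises across the samples
gene_b = [10.0, 20.0, 30.0, 40.0]  # same shape as gene_a, ten times larger
gene_c = [4.0, 3.0, 2.0, 1.0]      # falls across the samples

print(correlation_distance(gene_a, gene_b))  # close to 0: same pattern
print(correlation_distance(gene_a, gene_c))  # close to 2: opposite pattern
```

Because this distance depends only on the shape of a profile, genes with the same pattern end up close together even when their expression levels differ in magnitude.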

### ❓Questions
- What do the rows and columns represent? What do the colours mean?

- Is the avirulent sample more similar to the virulent sample or the mock sample?

- Does the treatment or the time point play a bigger role for clustering the samples?

- Could you explain the clustering? What does it tell you about the relation between the samples?

The same clustering can also be explored interactively. Adjust the sliders on the right and at the bottom to set the clustering cutoff. Clicking the trapezoid that belongs to a cluster selects that group of genes.

In [None]:
embed('@cornhundred/clustergrammer-gl', cells=['clustergrammer'], inputs={'network': net.viz})
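
Choosing a cutoff with the slider corresponds to cutting the dendrogram at a given height. The sketch below shows the same idea in code with SciPy on a small made-up matrix. It is an illustration under those assumptions, not the code that clustergrammer runs internally, and the values are not from the dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical matrix: 6 genes (rows) x 4 samples (columns), illustrative values only.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.0, 8.2],
    [0.5, 1.0, 1.4, 2.1],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.1, 4.0, 2.2],
    [2.0, 1.6, 1.0, 0.4],
])

# Build the gene dendrogram with the same settings as in the practical.
tree = linkage(expr, method="average", metric="correlation")

# Cut the tree so that at most two clusters remain.
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)

# Alternatively, cut at a fixed height (here a correlation distance of 0.5).
labels_by_height = fcluster(tree, t=0.5, criterion="distance")
print(labels_by_height)
```

In this toy matrix the three rising genes and the three falling genes end up in separate clusters with either criterion. On real data the two criteria can give different answers, which is part of why the choice of cutoff deserves some thought.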

### ❓Questions
- How many gene clusters do you think is most appropriate for this dataset? Why?

- Change the clustering parameters. What differences do you observe?

- What group(s) of genes would be most interesting to study? Why?


If you have extra time, you can investigate which biological processes the genes you found are involved in and how these contribute to the stress response, for example by looking the genes up in databases such as [UniProt](https://www.uniprot.org/). Can you link the genes to certain processes? How does that relate to what the literature reports about this response?

# Correlation network

In [None]:
plot_correlation_network(df, threshold=0.9)

### ❓Questions
- What happens as you change the correlation threshold?
- What could be the biological meaning of two nodes being connected in the network?
- Is there a relationship between the hierarchical clustering shown above and the tightly connected nodes you find in this network?
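
To make the role of the threshold concrete, here is a minimal sketch of how a correlation network can be built from an expression matrix. It assumes that an edge is drawn when the signed Pearson correlation between two genes is at least the threshold; whether `plot_correlation_network` uses the signed or the absolute correlation is not stated here, so treat that as an assumption. The gene names and values are hypothetical.

```python
import numpy as np

# Hypothetical matrix: 4 genes (rows) x 5 samples (columns), illustrative values only.
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.3, 2.9, 4.2, 4.8],
    [2.0, 1.0, 4.0, 3.0, 5.0],
    [5.0, 4.0, 3.0, 2.0, 1.0],
])

# Pairwise Pearson correlation between the gene profiles (rows).
corr = np.corrcoef(expr)


def edges_above(corr, names, threshold):
    """Return the gene pairs whose correlation is at least `threshold`."""
    rows, cols = np.triu_indices(len(names), k=1)  # each pair once, no self-pairs
    return [(names[i], names[j]) for i, j in zip(rows, cols) if corr[i, j] >= threshold]


for threshold in (0.5, 0.9):
    print(threshold, edges_above(corr, genes, threshold))
```

Raising the threshold in this sketch removes the weaker edges first, so the network becomes sparser and only the most tightly co-expressed pairs stay connected.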

Now explore some node metrics to identify genes that could be worth studying. You can replace `degree` with `betweenness` or `closeness` to colour the nodes by a different property.

In [None]:
plot_correlation_network(df, threshold=0.9, interactive=True, colour_by='degree')
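
To get a feeling for what the three metrics measure, the sketch below computes them on a small hypothetical network with NetworkX. Using NetworkX is an assumption made for this illustration; how `plot_correlation_network` computes the metrics internally is not shown in this practical.

```python
import networkx as nx

# Hypothetical co-expression network: an edge joins two strongly correlated genes.
graph = nx.Graph()
graph.add_edges_from([
    ("gene_a", "gene_b"),
    ("gene_a", "gene_c"),
    ("gene_b", "gene_c"),  # gene_a, gene_b and gene_c form a tight module
    ("gene_c", "gene_d"),  # gene_c also links the module to gene_d
    ("gene_d", "gene_e"),
])

degree = nx.degree_centrality(graph)            # fraction of the other nodes a node is linked to
betweenness = nx.betweenness_centrality(graph)  # fraction of shortest paths passing through a node
closeness = nx.closeness_centrality(graph)      # inverse of the mean distance to the other nodes

for gene in graph.nodes:
    print(f"{gene}: degree={degree[gene]:.2f}, "
          f"betweenness={betweenness[gene]:.2f}, closeness={closeness[gene]:.2f}")
```

In this toy graph `gene_c` scores highest on all three metrics. `gene_d` has the same degree as `gene_a` and `gene_b` but a much higher betweenness, because it is the only connection to `gene_e`.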

### ❓Questions
- How would you interpret these metrics biologically?
- When working with a plant biologist, what experiment would you suggest for further study?