# Visualization with hierarchical clustering and t-SNE

In this chapter, you'll learn about two unsupervised learning techniques for data visualization: hierarchical clustering and t-SNE. Hierarchical clustering merges the data samples into ever-coarser clusters, yielding a tree visualization of the resulting cluster hierarchy. t-SNE maps the data samples into 2D space so that the proximity of the samples to one another can be visualized.

# (1) Visualizing hierarchies

## Visualizations communicate insight
- "t-SNE": Creates a 2D map of a dataset (later)
- "Hierarchical clustering" (this video)

## A hierarchy of groups
- Groups of living things can form a hierarchy
- Clusters are contained in one another

<p align='center'>
    <img src='image/Screenshot 2021-02-18 235251.png'>
</p>

## Eurovision scoring dataset
- Countries gave scores to songs performed at Eurovision 2016
- 2D array of scores
- Rows are countries, columns are songs

<p align='center'>
    <img src='image/Screenshot 2021-02-18 235505.png'>
</p>

## Hierarchical clustering of voting countries
<p align='center'>
    <img src='image/Screenshot 2021-02-18 235628.png'>
</p>

## Hierarchical clustering
- Every country begins in a separate cluster
- At each step, the two closest clusters are merged
- Continue until all countries are in a single cluster
- This is "agglomerative" hierarchical clustering
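
The merge steps above can be traced with SciPy on a tiny dataset. This is a minimal sketch: the four 1D points are made up for illustration and are not taken from the Eurovision data. Each row of the array returned by `linkage()` records one merge.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Four made-up 1D samples (hypothetical values, for illustration only)
points = np.array([[0.0], [1.0], [5.0], [7.0]])

# Each row of the result records one merge:
# [cluster index a, cluster index b, merge distance, size of new cluster]
mergings = linkage(points, method='complete')

for step, (a, b, dist, size) in enumerate(mergings, start=1):
    print(f"step {step}: merge clusters {int(a)} and {int(b)} "
          f"at distance {dist:.1f} -> new cluster of {int(size)} samples")
```

Original samples are numbered from 0, and every newly formed cluster receives the next unused index, so later rows can refer to clusters created in earlier steps.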

## The dendrogram of a hierarchical clustering
- Read from the bottom up
- Vertical lines represent clusters

<p align='center'>
    <img src='image/Screenshot 2021-02-19 000009.png'>
</p>
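
To make the reading of a dendrogram concrete, the layout data can be inspected without drawing anything. The sketch below uses made-up points and hypothetical labels; `no_plot=True` asks `dendrogram()` to return the layout dictionary only. The height at which two vertical lines are joined is the distance between the two clusters being merged.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Four made-up 1D samples with hypothetical labels
points = np.array([[0.0], [1.0], [5.0], [7.0]])
labels = ['A', 'B', 'C', 'D']

mergings = linkage(points, method='complete')

# no_plot=True returns the layout data without drawing anything
layout = dendrogram(mergings, labels=labels, no_plot=True)

# Leaf labels in left-to-right order along the bottom of the tree
print(layout['ivl'])

# Each entry of 'dcoord' holds the y-coordinates of one inverted-U link;
# its top is the distance at which the two clusters were merged
link_heights = sorted(max(ys) for ys in layout['dcoord'])
print(link_heights)
```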

## Hierarchical clustering with SciPy
- Given `samples` (the array of scores), and `country_names`

In [None]:
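# Import the plotting and clustering functions
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Perform complete-linkage hierarchical clustering on the scores
mergings = linkage(samples, method='complete')

# Plot the dendrogram, labelling each leaf with its country name
dendrogram(mergings,
           labels=country_names,
           leaf_rotation=90,
           leaf_font_size=6)
plt.show()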

# Exercise I: How many merges?

If there are 5 data samples, how many merge operations will occur in a hierarchical clustering? (To help answer this question, think back to the video, in which Ben walked through an example of hierarchical clustering using 6 countries.)

### Possible Answers
- 4 merges (correct)
- 3 merges
- This can't be known in advance
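
The count follows from the merging procedure: each merge reduces the number of clusters by one, so n samples need n - 1 merges to reach a single cluster. A small check with SciPy is sketched below. The helper `count_merges` and its random samples are hypothetical and exist only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def count_merges(n_samples, n_features=3, seed=0):
    """Cluster random samples and return the number of merges performed."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(size=(n_samples, n_features))
    mergings = linkage(samples, method='complete')
    # linkage() returns one row per merge
    return mergings.shape[0]


for n in (5, 6, 10):
    print(f"{n} samples -> {count_merges(n)} merges")
```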

# Exercise II: Hierarchical clustering of the grain data

In the video, you learned that the SciPy `linkage()` function performs hierarchical clustering on an array of samples. Use the `linkage()` function to obtain a hierarchical clustering of the grain samples, and use `dendrogram()` to visualize the result. A sample of the grain measurements is provided in the array `samples`, while the variety of each grain sample is given by the list `varieties`.

### Instructions

- Import:
    - `linkage` and `dendrogram` from `scipy.cluster.hierarchy`.
    - `matplotlib.pyplot` as `plt`.
- Perform hierarchical clustering on `samples` using the `linkage()` function with the `method='complete'` keyword argument. Assign the result to `mergings`.
- Plot a dendrogram using the `dendrogram()` function on `mergings`. Specify the keyword arguments `labels=varieties`, `leaf_rotation=90`, and `leaf_font_size=6`.


In [None]:
# Perform the necessary imports
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

# Calculate the linkage: mergings
mergings = linkage(samples, method='complete')

# Plot the dendrogram, using varieties as labels
dendrogram(mergings,
           labels=varieties,
           leaf_rotation=90,
           leaf_font_size=6,
)
plt.show()


## Plot

<p align='center'>
    <img src='image/[2021-02-19] 013215.svg'>
</p>

# Exercise III: Hierarchies of stocks

In chapter 1, you used k-means clustering to cluster companies according to their stock price movements. Now, you'll perform hierarchical clustering of the companies. You are given a NumPy array of price movements `movements`, where the rows correspond to companies, and a list of the company names `companies`. SciPy hierarchical clustering doesn't fit into a sklearn pipeline, so you'll need to use the `normalize()` function from `sklearn.preprocessing` instead of `Normalizer`.

`linkage` and `dendrogram` have already been imported from `scipy.cluster.hierarchy`, and PyPlot has been imported as `plt`.
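
As a minimal sketch of what `normalize()` does with its default settings (L2 norm, applied to each row), the example below rescales a small made-up array. The name `movements_demo` and its values are hypothetical and stand in for the real `movements` array.

```python
import numpy as np
from sklearn.preprocessing import normalize

# Hypothetical price movements for two companies (rows) over three days
movements_demo = np.array([[3.0, 4.0, 0.0],
                           [10.0, 0.0, 0.0]])

# Rescale each row independently to unit Euclidean length
normalized_demo = normalize(movements_demo)

# Every row now has Euclidean length 1
print(np.linalg.norm(normalized_demo, axis=1))
```

After this rescaling, companies are compared by the pattern of their price movements rather than by the overall size of those movements.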

### Instructions

- Import `normalize` from `sklearn.preprocessing`.
- Rescale the price movements for each stock by using the `normalize()` function on `movements`.
- Apply the `linkage()` function to `normalized_movements`, using `'complete'` linkage, to calculate the hierarchical clustering. Assign the result to `mergings`.
- Plot a dendrogram of the hierarchical clustering, using the list `companies` of company names as the `labels`. In addition, specify the `leaf_rotation=90` and `leaf_font_size=6` keyword arguments, as you did in the previous exercise.


In [None]:
# Import normalize
from sklearn.preprocessing import normalize

# Normalize the movements: normalized_movements
normalized_movements = normalize(movements)

# Calculate the linkage: mergings
mergings = linkage(normalized_movements, method='complete')

# Plot the dendrogram
dendrogram(mergings, labels=companies, leaf_rotation=90, leaf_font_size=6)
plt.show()


## Plot

<p align='center'>
    <img src='image/[2021-02-19] 014008.svg'>
</p>