### Exercise 1

The following cell defines a plotting helper and loads the data for the first few exercises. In the first exercise we'll explore how different dimensionality-reduction strategies interact with k-means clustering.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load scikit-learn's digits dataset (1,797 8x8 images of handwritten digits, a small MNIST-like set)
digits = datasets.load_digits()
X = digits.data
y = digits.target

# Standardize each pixel feature to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Scatter a 2D projection, coloring each point by its (true or inferred) label
def visualize_clusters(X_2D, labels, title=""):
    plt.figure(figsize=(10, 6))
    scatter = plt.scatter(X_2D[:, 0], X_2D[:, 1], c=labels, cmap='tab10', s=50, alpha=0.6, edgecolors='w')
    plt.title(title)
    # Label the axes by projection component; `labels` holds cluster IDs, not axis names
    plt.xlabel("Component 1")
    plt.ylabel("Component 2")
    plt.colorbar(scatter)
    plt.show()
```



#### Step 1: Helper function for k-means evaluation

Build a function that takes some data, a minimum k, a maximum k, and a title, and plots three graphs for k-means over that range of k: the Davies-Bouldin index, the silhouette score, and the inertia (for the elbow method). Once you're done, run the function on the raw data to establish a baseline.
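A minimal sketch of the Step 1 helper, assuming scikit-learn's `KMeans`, `davies_bouldin_score`, and `silhouette_score`. The name `evaluate_kmeans` and the optional `axes` parameter (which lets the same function draw into one row of a larger grid later) are illustrative choices, not a required interface. Both scores need at least two clusters, so `k_min` should be 2 or more.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score


def evaluate_kmeans(X, k_min, k_max, title="", axes=None):
    """Fit k-means for every k in [k_min, k_max] and plot three diagnostics.

    k_min must be >= 2 (Davies-Bouldin and silhouette need two or more clusters).
    If `axes` (a sequence of three Matplotlib Axes) is given, draw into it;
    otherwise create and show a new 1x3 figure. Returns the per-k scores.
    """
    ks = list(range(k_min, k_max + 1))
    scores = {"k": ks, "davies_bouldin": [], "silhouette": [], "inertia": []}
    for k in ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        labels = model.labels_
        scores["davies_bouldin"].append(davies_bouldin_score(X, labels))  # lower is better
        scores["silhouette"].append(silhouette_score(X, labels))  # higher is better
        scores["inertia"].append(model.inertia_)  # look for the "elbow"

    show = axes is None
    if show:
        _, axes = plt.subplots(1, 3, figsize=(15, 4))
    panels = [
        ("davies_bouldin", "Davies-Bouldin (lower is better)"),
        ("silhouette", "Silhouette (higher is better)"),
        ("inertia", "Inertia (elbow)"),
    ]
    for ax, (key, label) in zip(axes, panels):
        ax.plot(ks, scores[key], marker="o")
        ax.set_xlabel("k")
        ax.set_title(f"{title}\n{label}" if title else label)
    if show:
        plt.tight_layout()
        plt.show()
    return scores
```

For the baseline, a call such as `evaluate_kmeans(X_scaled, 2, 15, "Standardized pixels")` evaluates k from 2 to 15 on the full 64-dimensional data.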

#### Step 2: Build a helper function to examine different numbers of dimensions

Build a function that takes a list of named transformers (as `(name, transformer)` tuples) and uses the previous function to evaluate k-means clustering on each transformer's output. The function should generate one row of plots per transformer.
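A sketch of the Step 2 driver. To keep it independent of any particular Step 1 implementation, it takes the evaluator as an argument: `evaluate_fn` is assumed (a hypothetical signature) to accept `(X, k_min, k_max, title, axes)` and draw its three metric plots onto the three Matplotlib axes it receives. It calls `fit_transform` rather than `fit` followed by `transform`, because scikit-learn's `TSNE` has no `transform` method.

```python
import matplotlib.pyplot as plt


def evaluate_transformers(X, transformers, k_min, k_max, evaluate_fn):
    """Project X with each named transformer and evaluate k-means on the projection.

    transformers: list of (name, transformer) tuples; each transformer must
        implement fit_transform (PCA, UMAP, and TSNE all do).
    evaluate_fn: callable (X, k_min, k_max, title, axes) that draws three
        metric plots onto the three Axes it is given (hypothetical signature).
    Returns a dict mapping each transformer name to evaluate_fn's result.
    """
    n = len(transformers)
    # squeeze=False keeps `axes` 2D even when there is only one transformer
    fig, axes = plt.subplots(n, 3, figsize=(15, 4 * n), squeeze=False)
    results = {}
    for row, (name, transformer) in enumerate(transformers):
        X_proj = transformer.fit_transform(X)
        results[name] = evaluate_fn(X_proj, k_min, k_max, name, axes[row])
    fig.tight_layout()
    plt.show()
    return results
```

If your Step 1 function has that signature, `evaluate_transformers(X_scaled, transformers, 2, 15, evaluate_kmeans)` draws the whole grid; an evaluator with a different signature can be adapted with a small `lambda`.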


#### Step 3: Build a simple iterator to construct the list of transformers to test

Rather than writing each configuration by hand, build a simple loop that constructs nine transformers: PCA, UMAP, and t-SNE, each at three numbers of components (e.g., 2, 5, and 10 dimensions). This produces the list of named transformers that the Step 2 function expects.

Loop through this list to evaluate each transformer and parameter setting. This should give you a 9 × 3 grid of plots (one row per transformer, one column per metric) that you can use to decide which dimensionality-reduction method to use.
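One way to build the list, as a sketch. UMAP comes from the third-party `umap-learn` package (imported as `umap`), so the import is guarded and UMAP is skipped if the package is missing. One scikit-learn detail matters here: `TSNE`'s default Barnes-Hut method only supports fewer than 4 components, so the 5- and 10-dimensional t-SNE runs fall back to `method="exact"`, which is considerably slower. The function name `build_transformers` and the `"PCA (2D)"`-style names are illustrative.

```python
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

try:
    from umap import UMAP  # provided by the umap-learn package
except ImportError:
    UMAP = None  # UMAP configurations are skipped when the package is absent


def build_transformers(dims=(2, 5, 10), random_state=42):
    """Return a list of (name, transformer) tuples: each method at each dimensionality."""
    factories = [("PCA", lambda n: PCA(n_components=n, random_state=random_state))]
    if UMAP is not None:
        factories.append(("UMAP", lambda n: UMAP(n_components=n, random_state=random_state)))
    # Barnes-Hut t-SNE requires n_components < 4, so use the exact solver above that
    factories.append((
        "t-SNE",
        lambda n: TSNE(n_components=n, method="barnes_hut" if n < 4 else "exact",
                       random_state=random_state),
    ))
    return [(f"{name} ({n}D)", make(n)) for name, make in factories for n in dims]


transformers = build_transformers()
for name, _ in transformers:
    print(name)
```

Grouping the rows by method (all PCA rows, then UMAP, then t-SNE) makes the resulting grid easier to scan when comparing dimensionalities within a method.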

#### Step 4: Visualize your results

Build a function that takes a transformer class, a set of keyword arguments for it, a k value, and the ground-truth labels, and visually compares the ground truth with the clustered data. That is, the function projects the data with the specified method and parameters and runs k-means on the projection to get the inferred labels. It then displays two plots side by side, using the specified projection method (PCA, UMAP, or t-SNE), so you can see how the ground truth corresponds to the inferred clusters. Use the function to inspect your "best" methods from the above.

To make things easier, you can use the following function to build your transformer with arbitrary parameters:

```python
def new_transformer(cls, **kwargs):
    """
    Given a class (e.g. PCA) and kwargs, return a new instance.
    Example:
        t = new_transformer(PCA, n_components=2)
    """
    return cls(**kwargs)
```
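A sketch of the Step 4 comparison function, under a few assumptions of its own. It instantiates the transformer directly with `transformer_cls(**kwargs)`, which is equivalent to `new_transformer`. When clustering happens in more than two dimensions, it re-projects with the same class and parameters but `n_components=2` purely for display. It also reports the adjusted Rand index (ARI) between the two labelings in the title, an extra not asked for in the exercise. The name `compare_clusters` is illustrative.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def compare_clusters(X, y_true, transformer_cls, k, **kwargs):
    """Project X with transformer_cls(**kwargs), run k-means with k clusters,
    and plot ground truth next to the inferred clusters. Returns the inferred labels."""
    X_proj = transformer_cls(**kwargs).fit_transform(X)
    y_pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_proj)

    # Plots need 2D coordinates; if we clustered in more dimensions,
    # re-project with the same method and parameters but n_components=2
    if X_proj.shape[1] > 2:
        X_2d = transformer_cls(**{**kwargs, "n_components": 2}).fit_transform(X)
    else:
        X_2d = X_proj

    fig, (ax_true, ax_pred) = plt.subplots(1, 2, figsize=(14, 6))
    panels = [(ax_true, y_true, "Ground truth"), (ax_pred, y_pred, f"k-means (k={k})")]
    for ax, labels, title in panels:
        sc = ax.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, cmap="tab10", s=15, alpha=0.6)
        ax.set_title(title)
        ax.set_xlabel("Component 1")
        ax.set_ylabel("Component 2")
        fig.colorbar(sc, ax=ax)
    ari = adjusted_rand_score(y_true, y_pred)
    fig.suptitle(f"{transformer_cls.__name__} {kwargs} | ARI = {ari:.3f}")
    fig.tight_layout()
    plt.show()
    return y_pred
```

For example, `compare_clusters(X_scaled, y, PCA, 10, n_components=5)` clusters in five PCA dimensions and displays both labelings on a 2D PCA projection. For t-SNE above 3 components, remember to pass `method="exact"` in the keyword arguments.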

### Exercise 2

Using any one of the projected versions of the handwritten-digit data above, apply hierarchical agglomerative clustering (HAC) with different linkages and different cutoff levels to understand how the identified clusters vary over both of these hyperparameters. How does HAC compare with k-means? Which linkage do you prefer, and why?
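A minimal sketch of such a sweep, using SciPy's `linkage` and `fcluster` (scikit-learn's `AgglomerativeClustering` with `distance_threshold` would work too). Expressing each cutoff as a fraction of the tallest merge height is an assumption made here so that one set of cutoffs is comparable across linkages, whose merge heights live on different scales. The adjusted Rand index (ARI) against the digit labels is one convenient way to line the results up against k-means. The name `hac_sweep` is illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.metrics import adjusted_rand_score


def hac_sweep(X, y_true, methods=("single", "complete", "average", "ward"),
              fractions=(0.1, 0.25, 0.5)):
    """For each linkage, cut the dendrogram at several heights and report the
    number of flat clusters and the ARI against the ground-truth labels.

    Cutoffs are fractions of the tallest merge height for that linkage.
    """
    rows = []
    for method in methods:
        Z = linkage(X, method=method)  # Euclidean distance by default; 'ward' requires it
        max_height = Z[:, 2].max()
        for frac in fractions:
            # Points whose cophenetic distance is <= the cutoff share a cluster
            labels = fcluster(Z, t=frac * max_height, criterion="distance")
            rows.append({
                "linkage": method,
                "cutoff_fraction": frac,
                "n_clusters": len(np.unique(labels)),
                "ari": adjusted_rand_score(y_true, labels),
            })
    return rows
```

For example, `hac_sweep(PCA(n_components=10).fit_transform(X_scaled), y)` returns one row per (linkage, cutoff) pair. Plotting a truncated dendrogram of each linkage matrix with `scipy.cluster.hierarchy.dendrogram(..., truncate_mode="lastp", p=30)` helps pick cutoffs that are worth examining.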