### Exercise 1

The following example sets up a clustering problem with scikit-learn's handwritten digits dataset (8×8 grayscale images of the digits 0 to 9, a smaller cousin of MNIST).

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load scikit-learn's 8x8 handwritten digits data (a small MNIST-like dataset)
digits = datasets.load_digits()
X = digits.data
y = digits.target

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Use t-SNE for dimensionality reduction
tsne = TSNE(n_components=2, random_state=42)
X_tsne = tsne.fit_transform(X_scaled)

# Visualization function
def visualize_clusters(X_2D, labels, title=""):
    plt.figure(figsize=(10, 6))
    scatter = plt.scatter(X_2D[:, 0], X_2D[:, 1], c=labels, cmap='tab10', s=50, alpha=0.6, edgecolors='w')
    plt.title(title)
    plt.xlabel('t-SNE feature 1')
    plt.ylabel('t-SNE feature 2')
    plt.colorbar(scatter)
    plt.show()

# Visualize ground truth
visualize_clusters(X_tsne, y, title="Ground Truth (Handwritten Digits)")

# Apply KMeans clustering (n_init set explicitly, since its default changed across scikit-learn versions)
kmeans = KMeans(n_clusters=10, n_init=10, random_state=42)
kmeans_labels = kmeans.fit_predict(X_scaled)

# Visualize KMeans clustering results
visualize_clusters(X_tsne, kmeans_labels, title="KMeans Clustering on Handwritten Digits")


#### Step 1: Select K

Use the elbow method, the Davies-Bouldin index, and the silhouette score to pick the best value of k. What do you get?
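A minimal sketch of all three criteria, assuming KMeans on the standardized digits data (as in the cell above) and a search range of k = 2 to 15; the range is an arbitrary choice. For the elbow method you look for the k where inertia stops dropping sharply; the Davies-Bouldin index is better when lower, and the silhouette score is better when higher.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Same preprocessing as the notebook: standardize the 64 pixel features
X_scaled = StandardScaler().fit_transform(datasets.load_digits().data)

k_values = list(range(2, 16))  # assumed search range
inertias, db_scores, sil_scores = [], [], []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(X_scaled)
    inertias.append(km.inertia_)                              # within-cluster sum of squares (elbow)
    db_scores.append(davies_bouldin_score(X_scaled, labels))  # lower is better
    sil_scores.append(silhouette_score(X_scaled, labels))     # higher is better

# One panel per criterion
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
panels = [
    (inertias, "Inertia (look for the elbow)"),
    (db_scores, "Davies-Bouldin (lower is better)"),
    (sil_scores, "Silhouette (higher is better)"),
]
for ax, (values, title) in zip(axes, panels):
    ax.plot(k_values, values, marker="o")
    ax.set_xlabel("k")
    ax.set_title(title)
plt.tight_layout()
plt.show()

# Read off the k each index prefers; the elbow has to be judged from the plot
best_k_db = k_values[int(np.argmin(db_scores))]
best_k_sil = k_values[int(np.argmax(sil_scores))]
print(f"Davies-Bouldin suggests k = {best_k_db}; silhouette suggests k = {best_k_sil}")
```

The criteria do not have to agree, and none of them is guaranteed to land on 10 even though there are ten digit classes.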

#### Step 2: Pick the best method

Try several clustering methods and pick the one that best recovers the true digit labels. Which one do you prefer?
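One way to compare methods is to score each against the true digit labels with the adjusted Rand index (ARI, 1.0 means a perfect match up to relabeling). The sketch below assumes three candidates (KMeans, Ward agglomerative clustering, and a Gaussian mixture) and a PCA step down to 20 components; both the candidate list and the number of components are illustrative choices, not the only reasonable ones.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

digits = datasets.load_digits()
X_scaled = StandardScaler().fit_transform(digits.data)
y = digits.target

# Assumption: reduce to 20 principal components first; this speeds things up
# and keeps the Gaussian mixture's covariance estimates well-conditioned.
X_pca = PCA(n_components=20, random_state=42).fit_transform(X_scaled)

candidates = {
    "KMeans": KMeans(n_clusters=10, n_init=10, random_state=42),
    "Agglomerative (Ward)": AgglomerativeClustering(n_clusters=10, linkage="ward"),
    "Gaussian mixture": GaussianMixture(n_components=10, random_state=42),
}

# Fit each method and compare its labels with the ground truth
ari_scores = {}
for name, model in candidates.items():
    labels = model.fit_predict(X_pca)
    ari_scores[name] = adjusted_rand_score(y, labels)

for name, ari in sorted(ari_scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:22s} ARI = {ari:.3f}")
```

Running the same loop on the 2-D t-SNE embedding instead of `X_pca` is another variant worth trying.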

### Exercise 2

There are four datasets in the `data` folder. For each one, load it and try to find a clustering method that recovers the labeled clusters. The last file (`clustering_4.csv`) might benefit from dimensionality reduction! I've included code below to evaluate your clustering.
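A possible workflow for a single file is sketched below. It assumes (hypothetically, since the file layout is not described here) that each CSV stores the ground-truth labels in a column named `label` and that every other column is a numeric feature; `cluster_csv` is an illustrative helper, not part of the exercise. The optional `n_components` argument adds a PCA step, which is one way to act on the dimensionality-reduction hint.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler


def cluster_csv(path, label_col="label", n_components=None, model=None):
    """Hypothetical helper: load one CSV, optionally apply PCA, cluster, and score.

    Assumptions: labels live in `label_col` (a guessed name, adjust to the
    real files) and all remaining columns are numeric features.
    Returns (features used for clustering, predicted labels, ARI).
    """
    df = pd.read_csv(path)
    y_true = df[label_col].to_numpy()
    X = StandardScaler().fit_transform(df.drop(columns=[label_col]).to_numpy())

    # Optional dimensionality reduction before clustering
    if n_components is not None:
        X = PCA(n_components=n_components, random_state=42).fit_transform(X)

    # Default model: KMeans with one cluster per labeled class
    if model is None:
        model = KMeans(n_clusters=len(np.unique(y_true)), n_init=10, random_state=42)

    y_pred = model.fit_predict(X)
    return X, y_pred, adjusted_rand_score(y_true, y_pred)
```

For example, `X4, y4_pred, ari4 = cluster_csv("data/clustering_4.csv", n_components=2)`; pass a different `model` (any scikit-learn estimator with `fit_predict`) when KMeans is not a good fit for a file's cluster shapes.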

In [2]:
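import numpy as np
from sklearn.metrics import adjusted_rand_score, silhouette_score

def evaluate_clustering(y_true, y_pred, X):
    """
    Evaluates clustering performance using multiple metrics.
    X is the feature matrix the clustering was run on (needed for silhouette).
    Returns a dictionary of scores; silhouette is -1 when it is undefined
    (fewer than 2 clusters, or one cluster per sample).
    """
    n_labels = len(np.unique(y_pred))
    scores = {
        'ari': adjusted_rand_score(y_true, y_pred),
        'silhouette': silhouette_score(X, y_pred) if 1 < n_labels < len(X) else -1
    }
    return scores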
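The two metrics answer different questions: ARI compares your labels with the true ones, while the silhouette score only looks at geometry and is generally higher for compact, convex clusters. The small illustration below uses synthetic data (scikit-learn's `make_moons`, not one of the exercise files) and compares KMeans with DBSCAN, a density-based method swapped in here for illustration. The `eps` and `min_samples` values are assumptions chosen for this toy data, not general defaults.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score, silhouette_score

# Synthetic stand-in data: two interleaved half-moons (non-convex clusters)
X_moons, y_moons = make_moons(n_samples=300, noise=0.05, random_state=0)

models = {
    "KMeans": KMeans(n_clusters=2, n_init=10, random_state=0),
    "DBSCAN": DBSCAN(eps=0.2, min_samples=5),  # assumed parameters for this toy data
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X_moons)
    n_found = len(set(labels) - {-1})  # DBSCAN labels noise points as -1
    sil = silhouette_score(X_moons, labels) if len(set(labels)) > 1 else -1
    results[name] = {
        "ari": adjusted_rand_score(y_moons, labels),
        "silhouette": sil,
        "clusters": n_found,
    }

for name, scores in results.items():
    print(name, scores)
```

On shapes like these the two metrics can rank methods differently, which is why it helps to look at ARI (when labels are available) rather than silhouette alone.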