# Homework Assignment: Clustering a Single Atom's Trajectory

## Introduction

In computational biology, clustering is often used to identify distinct protein
conformations from simulation trajectories. Normally, one would first align the
trajectory to remove overall translations and rotations so that only internal
motions are considered. In this assignment, however, you will work with a
simplified scenario: a simulation of a single atom moving in a plane. The 2D
points provided below represent the positions the atom visits over time. Your
goal is to use clustering algorithms from scikit-learn to determine which
regions of the plane the atom samples most frequently.
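
Once every frame has a cluster label, "samples most frequently" can be made concrete by counting frames per cluster. The following is a minimal sketch, assuming a hypothetical label array `example_labels` whose values are made up for illustration (they are not the output of any algorithm in this assignment) and assuming frames are evenly spaced in time:

```python
import numpy as np

# Hypothetical cluster labels for 10 frames (illustrative values only)
example_labels = np.array([0, 0, 1, 2, 2, 2, 1, 2, 0, 2])

# Count how many frames fall in each cluster
counts = np.bincount(example_labels)

# Convert counts to the fraction of frames spent in each cluster
occupancy = counts / counts.sum()

# Index of the cluster that contains the most frames
most_visited = int(np.argmax(counts))

for cluster_id, (n, frac) in enumerate(zip(counts, occupancy)):
    print(f"cluster {cluster_id}: {n} frames ({frac:.0%})")
print("most visited cluster:", most_visited)
```

Note that `np.bincount` only accepts non-negative integers, so labels that include -1 would need to be filtered out first.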

In [None]:
# Part 1: Data Generation

# The points below represent the trajectory of a single atom moving through 2D
# space. Each point is the atom's position at one of 200 evenly spaced time
# steps. Colors represent simulation time, ranging from red (0 ns) through white
# (~50 ns) to blue (100 ns). Position-versus-time data of this kind is used in
# computational biology to track atomic or molecular motion. You will apply
# clustering to determine which regions of space the atom samples most often.

import numpy as np
import matplotlib.pyplot as plt

# Set a fixed random seed so the trajectory is reproducible
np.random.seed(42)

# Simulate a continuous 2D random walk representing atomic movement
n_points = 200
steps = np.random.randn(n_points, 2) * 0.2  # small steps for smooth movement
data = np.cumsum(steps, axis=0)  # accumulate steps to simulate trajectory

# Simulated time: 0 ns to 100 ns
times = np.linspace(0, 100, n_points)

# Plot the trajectory with color mapped to simulation time
plt.figure(figsize=(6, 5))
scatter = plt.scatter(data[:, 0], data[:, 1], c=times, cmap='RdBu')  # red = early, blue = late
plt.title("Simulated Atom Trajectory (2D)")
plt.xlabel("X-coordinate")
plt.ylabel("Y-coordinate")
cbar = plt.colorbar(scatter, label="Simulation Time (ns)")
plt.show()

In [None]:
# Part 2: K-means Clustering

# Cluster the data using the K-means algorithm with 3 clusters. Fill in both
# lines marked `# YOUR CODE HERE`.

from sklearn.cluster import KMeans

# Initialize KMeans with 3 clusters
kmeans = # YOUR CODE HERE

# Fit the model and predict cluster labels
labels_kmeans = # YOUR CODE HERE

# Plot the K-means clustering results
plt.figure(figsize=(6, 5))
plt.scatter(data[:, 0], data[:, 1], c=labels_kmeans, cmap='viridis')
plt.title("K-means Clustering")
plt.xlabel("X-coordinate")
plt.ylabel("Y-coordinate")
plt.show()
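
For orientation, here is a minimal sketch of the scikit-learn estimator pattern on separate toy data, not the assignment's `data`. The two blobs, the names `toy`, `toy_model`, and `toy_labels`, and the settings `n_clusters=2`, `n_init=10`, and `random_state=0` are all assumptions chosen for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy data (illustrative assumption): two tight, well-separated blobs
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(20, 2))
blob_b = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(20, 2))
toy = np.vstack([blob_a, blob_b])

# K-means starts from random centroids; fixing n_init and random_state
# makes repeated runs give the same result
toy_model = KMeans(n_clusters=2, n_init=10, random_state=0)

# fit_predict fits the model and returns one integer label per row
toy_labels = toy_model.fit_predict(toy)

print(toy_labels)
print(toy_model.cluster_centers_)  # one (x, y) centroid per cluster
```

The label numbers themselves are arbitrary: which blob is called 0 and which is called 1 can change with the seed, so only the grouping is meaningful.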

In [None]:
# Part 3: Affinity Propagation Clustering

# Use Affinity Propagation with a damping factor of 0.8 and a preference value
# of -50. Fill in both lines marked `# YOUR CODE HERE`.

from sklearn.cluster import AffinityPropagation

# Initialize AffinityPropagation with damping=0.8 and preference=-50
affprop = # YOUR CODE HERE

# Fit the model and predict cluster labels
labels_affprop = # YOUR CODE HERE

# Plot the Affinity Propagation clustering results
plt.figure(figsize=(6, 5))
plt.scatter(data[:, 0], data[:, 1], c=labels_affprop, cmap='viridis')
plt.title("Affinity Propagation Clustering")
plt.xlabel("X-coordinate")
plt.ylabel("Y-coordinate")
plt.show()
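
Unlike K-means, Affinity Propagation is not told how many clusters to find. It selects some data points as exemplars, and the `preference` value influences how many it picks (lower values tend to give fewer clusters), while `damping` must lie in [0.5, 1). The sketch below shows how one might inspect the result on separate toy data; the three blobs and the value `preference=-10` are assumptions chosen for this toy set only and say nothing about the assignment's data:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy data (illustrative assumption): three tight, well-separated blobs
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
toy = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 2)) for c in centers])

# preference=-10 is picked for this toy data only
toy_model = AffinityPropagation(damping=0.8, preference=-10, random_state=0)
toy_labels = toy_model.fit_predict(toy)

# Row indices of the data points chosen as exemplars (one per cluster)
exemplar_idx = toy_model.cluster_centers_indices_
print("number of clusters found:", len(exemplar_idx))
print("exemplar coordinates:")
print(toy[exemplar_idx])
```

If the algorithm fails to converge, scikit-learn issues a warning and the labels may all be -1, so it is worth checking the number of exemplars before plotting.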


In [None]:
# Part 4: Agglomerative Clustering

# Use Agglomerative Clustering to divide the atom's trajectory into 4 clusters.
# Fill in both lines marked `# YOUR CODE HERE`.

from sklearn.cluster import AgglomerativeClustering

# Initialize Agglomerative Clustering with 4 clusters
agglo = # YOUR CODE HERE

# Fit the model and predict cluster labels
labels_agglo = # YOUR CODE HERE

# Plot the Agglomerative Clustering results
plt.figure(figsize=(6, 5))
plt.scatter(data[:, 0], data[:, 1], c=labels_agglo, cmap='viridis')
plt.title("Agglomerative Clustering")
plt.xlabel("X-coordinate")
plt.ylabel("Y-coordinate")
plt.show()
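
Agglomerative clustering builds clusters by repeatedly merging the closest pair, and the `linkage` argument decides what "closest" means; scikit-learn uses `"ward"` when none is given. The sketch below compares two linkage criteria on separate toy data. The blobs, the choice of `"single"` as the second criterion, and `n_clusters=2` are assumptions made for illustration:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Toy data (illustrative assumption): two tight, well-separated blobs
rng = np.random.default_rng(0)
toy = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.1, size=(15, 2)),
    rng.normal(loc=[4.0, 0.0], scale=0.1, size=(15, 2)),
])

# Run the same clustering with two different linkage criteria
for linkage in ("ward", "single"):
    toy_model = AgglomerativeClustering(n_clusters=2, linkage=linkage)
    toy_labels = toy_model.fit_predict(toy)
    # Number of points assigned to each cluster
    print(linkage, np.bincount(toy_labels))
```

On blobs this clean both criteria agree; on a meandering trajectory they can split the points quite differently, which is worth keeping in mind when interpreting the plot above.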
