# K-Means Clustering Notebook

This notebook explains and demonstrates K-Means clustering in simple terms. We'll learn how computers can automatically group similar things together, with beautiful animations to help you visualize each step.

## Step 1: Importing Libraries

First, we import the necessary Python libraries. These help us work with data, create visualizations, and make beautiful animations.

In [None]:
# Import Manim first: its wildcard import brings in many names, and the
# explicit imports below should take precedence if any of them collide
from manim import *

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

## Step 2: Creating Sample Data

Let's create some example data to work with. We'll generate synthetic customer data with two features, age (in years) and annual income (in thousands of dollars), which we want to group into similar customer segments.

In [None]:
# Set random seed for reproducible results
np.random.seed(42)

# Create sample customer data
n_customers = 150

# Generate data with natural clusters
X, y_true = make_blobs(n_samples=n_customers, centers=4, cluster_std=1.5, 
                       center_box=(20.0, 80.0), random_state=42)

# Convert to meaningful customer data
ages = X[:, 0]
incomes = X[:, 1]

# Create DataFrame
customers_df = pd.DataFrame({
    'Customer_ID': range(1, n_customers + 1),
    'Age': ages,
    'Annual_Income': incomes
})

print(f"Created {n_customers} customers with age and income data")
print(customers_df.head())

## Step 3: Visualizing the Data

Before we start clustering, let's look at our customer data. We'll create a scatter plot to see how customers are distributed.

In [None]:
# Create initial scatter plot
fig = px.scatter(customers_df, x='Age', y='Annual_Income', 
                title='Customer Data - Age vs Annual Income',
                labels={'Age': 'Age (years)', 'Annual_Income': 'Annual Income ($000)'})

fig.update_traces(marker=dict(size=8, color='lightblue', 
                             line=dict(width=1, color='darkblue')))

fig.show()

## Step 4: Understanding K-Means Algorithm

K-Means clustering works by:
1. Choosing the number of clusters (K)
2. Placing K cluster centers randomly
3. Assigning each point to the nearest cluster center
4. Moving cluster centers to the middle of their assigned points
5. Repeating steps 3-4 until the centers stop moving

Let's implement this step by step!

In [None]:
# Manual K-Means implementation for educational purposes
def manual_kmeans(X, k, max_iters=100):
    """
    Manual K-Means implementation to show each step.

    Returns the final labels, the final centroids, and a history of the
    centroids and labels seen in each iteration.
    """
    # Step 1: Initialize centroids by picking k distinct data points at random
    centroids = X[np.random.choice(X.shape[0], k, replace=False)]

    history = {'centroids': [centroids.copy()], 'labels': []}

    for iteration in range(max_iters):
        # Step 2: Assign each point to its nearest centroid (Euclidean distance)
        distances = cdist(X, centroids)
        labels = np.argmin(distances, axis=1)
        history['labels'].append(labels.copy())

        # Step 3: Move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid instead of becoming NaN.
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])

        # Check for convergence
        if np.allclose(centroids, new_centroids):
            print(f"Converged after {iteration + 1} iterations")
            break

        centroids = new_centroids
        history['centroids'].append(centroids.copy())

    return labels, centroids, history

# Run K-Means with k=4
k = 4
data_points = customers_df[['Age', 'Annual_Income']].values
final_labels, final_centroids, clustering_history = manual_kmeans(data_points, k)

# One label array is stored per iteration, so its length is the iteration count
print(f"K-Means completed with {len(clustering_history['labels'])} iterations")
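
To make "until the centers stop moving" concrete, the sketch below records how far the centroids move in each iteration. It is a standalone illustration: the helper name `centroid_shifts`, the seed, and the regenerated blobs are assumptions, not part of the notebook's pipeline.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import make_blobs


def centroid_shifts(X, k, rng, max_iters=100):
    """Run K-Means and return the largest centroid movement per iteration."""
    # Hypothetical helper: same loop as manual_kmeans, but it tracks movement
    centroids = X[rng.choice(X.shape[0], k, replace=False)]
    shifts = []
    for _ in range(max_iters):
        labels = np.argmin(cdist(X, centroids), axis=1)
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Largest distance any single centroid moved in this iteration
        shift = np.linalg.norm(new_centroids - centroids, axis=1).max()
        shifts.append(float(shift))
        centroids = new_centroids
        if np.isclose(shift, 0.0):
            break
    return shifts


# Assumed demo data: the same blob settings as Step 2
X_demo, _ = make_blobs(n_samples=150, centers=4, cluster_std=1.5,
                       center_box=(20.0, 80.0), random_state=42)

shifts = centroid_shifts(X_demo, k=4, rng=np.random.default_rng(0))
for iteration, shift in enumerate(shifts, start=1):
    print(f"Iteration {iteration}: largest centroid move = {shift:.4f}")
```

The movement typically shrinks from one iteration to the next and reaches zero once no point changes cluster, which is the stopping condition used above.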

## Step 5: Visualizing the Results

Now let's see how our customers have been grouped into clusters. Each color represents a different customer segment.

In [None]:
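# Add cluster labels to our dataframe
customers_df['Cluster'] = final_labels

# Plotly treats an integer column as a continuous color scale, so build a
# plotting copy with 1-based string labels to get one discrete color per cluster
colors = ['red', 'blue', 'green', 'orange', 'purple']
cluster_names = [str(i + 1) for i in range(k)]
plot_df = customers_df.assign(Cluster=(customers_df['Cluster'] + 1).astype(str))

# Create clustered scatter plot
fig = px.scatter(plot_df, x='Age', y='Annual_Income', color='Cluster',
                title='Customer Segments after K-Means Clustering',
                labels={'Age': 'Age (years)', 'Annual_Income': 'Annual Income ($000)'},
                category_orders={'Cluster': cluster_names},
                color_discrete_sequence=colors[:k])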

# Add centroids
for i, centroid in enumerate(final_centroids):
    fig.add_trace(go.Scatter(x=[centroid[0]], y=[centroid[1]], 
                            mode='markers', 
                            marker=dict(size=15, color='black', symbol='x'),
                            name=f'Centroid {i+1}',
                            showlegend=True))

fig.show()

# Print cluster summary
print("\nCluster Summary:")
for i in range(k):
    cluster_data = customers_df[customers_df['Cluster'] == i]
    avg_age = cluster_data['Age'].mean()
    avg_income = cluster_data['Annual_Income'].mean()
    count = len(cluster_data)
    print(f"Cluster {i+1}: {count} customers, Avg Age: {avg_age:.1f}, Avg Income: ${avg_income:.1f}k")
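
The same kind of summary can be produced without an explicit loop by using `groupby`. The sketch below runs on a hypothetical five-row DataFrame (`demo_df`, with made-up values) that mirrors the columns of `customers_df`:

```python
import pandas as pd

# Hypothetical mini dataset with the same columns as customers_df
demo_df = pd.DataFrame({
    'Age': [25.0, 27.0, 52.0, 58.0, 41.0],
    'Annual_Income': [38.0, 42.0, 81.0, 79.0, 55.0],
    'Cluster': [0, 0, 1, 1, 2],
})

# One row per cluster: size, mean age, and mean income
summary = demo_df.groupby('Cluster').agg(
    count=('Age', 'size'),
    avg_age=('Age', 'mean'),
    avg_income=('Annual_Income', 'mean'),
)
print(summary)
```

Swapping `demo_df` for `customers_df` should give the per-cluster counts and averages in a single table.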

## Step 6: Manim Animations

Now let's create beautiful animations to show how K-Means clustering works step by step!

### Animation 1: What is Clustering?

In [None]:
%%manim -qm -v WARNING ClusteringIntro

class ClusteringIntro(Scene):
    def construct(self):
        # Title
        title = Text("What is Clustering?", font_size=48, color=BLUE)
        self.play(Write(title))
        self.wait(1)
        self.play(title.animate.to_edge(UP))
        
        # Create scattered points representing customers
        np.random.seed(42)
        points = []
        colors = [RED, BLUE, GREEN, YELLOW]
        
        # Create groups of points
        centers = [[-2, 2], [2, 2], [-2, -2], [2, -1]]
        
        for center in centers:
            for _ in range(8):
                x = center[0] + np.random.normal(0, 0.5)
                y = center[1] + np.random.normal(0, 0.5)
                point = Dot([x, y, 0], color=WHITE, radius=0.08)
                points.append(point)
        
        # Show scattered points
        explanation1 = Text(
            "Imagine we have many customers with different ages and incomes...",
            font_size=24, color=WHITE
        ).to_edge(DOWN)
        
        self.play(Write(explanation1))
        
        for point in points:
            self.add(point)
        
        self.wait(2)
        
        # Group them by color
        explanation2 = Text(
            "Clustering helps us group similar customers together!",
            font_size=24, color=WHITE
        ).to_edge(DOWN)
        
        self.play(ReplacementTransform(explanation1, explanation2))
        
        # Color the points by groups
        for i, center in enumerate(centers):
            group_points = []
            for point in points:
                # Local names chosen to avoid shadowing the plotly.express alias `px`
                x_pos, y_pos = point.get_center()[:2]
                if abs(x_pos - center[0]) < 1.5 and abs(y_pos - center[1]) < 1.5:
                    group_points.append(point)
            
            self.play(
                *[point.animate.set_color(colors[i]) for point in group_points],
                run_time=0.8
            )
        
        # Add group labels in the same order as `centers`
        # (x = age, y = income, as in the rest of the notebook)
        labels = ["Young & High Income", "Older & High Income", 
                 "Young & Low Income", "Older & Low Income"]
        
        for i, (center, label) in enumerate(zip(centers, labels)):
            label_text = Text(label, font_size=16, color=colors[i])
            label_text.move_to([center[0], center[1] - 1, 0])
            self.play(Write(label_text), run_time=0.5)
        
        self.wait(3)

### Animation 2: Understanding Distance

In [None]:
%%manim -qm -v WARNING DistanceConcept

class DistanceConcept(Scene):
    def construct(self):
        # Title
        title = Text("How Do We Measure Similarity?", font_size=42, color=BLUE)
        self.play(Write(title))
        self.wait(1)
        self.play(title.animate.to_edge(UP))
        
        # Create coordinate system
        axes = Axes(
            x_range=[0, 60, 10],
            y_range=[0, 100, 20],
            x_length=8,
            y_length=6,
            axis_config={"color": GREY}
        )
        
        x_label = Text("Age", font_size=20).next_to(axes.x_axis, DOWN)
        y_label = Text("Income ($k)", font_size=20).next_to(axes.y_axis, LEFT)
        
        self.play(Create(axes), Write(x_label), Write(y_label))
        
        # Create three customer points
        point_a = Dot(axes.coords_to_point(25, 40), color=RED, radius=0.1)
        point_b = Dot(axes.coords_to_point(30, 45), color=BLUE, radius=0.1)
        point_c = Dot(axes.coords_to_point(50, 80), color=GREEN, radius=0.1)
        
        label_a = Text("Customer A\n(25, $40k)", font_size=16, color=RED).next_to(point_a, DOWN)
        label_b = Text("Customer B\n(30, $45k)", font_size=16, color=BLUE).next_to(point_b, UP)
        label_c = Text("Customer C\n(50, $80k)", font_size=16, color=GREEN).next_to(point_c, UP)
        
        self.play(Create(point_a), Create(point_b), Create(point_c))
        self.play(Write(label_a), Write(label_b), Write(label_c))
        
        # Show distance between A and B
        line_ab = Line(point_a.get_center(), point_b.get_center(), color=YELLOW)
        distance_ab = Text("Short distance = Similar!", font_size=18, color=YELLOW)
        distance_ab.to_edge(LEFT).shift(DOWN*2)
        
        self.play(Create(line_ab), Write(distance_ab))
        self.wait(1)
        
        # Show distance between A and C
        line_ac = Line(point_a.get_center(), point_c.get_center(), color=ORANGE)
        distance_ac = Text("Long distance = Different!", font_size=18, color=ORANGE)
        distance_ac.next_to(distance_ab, DOWN)
        
        self.play(Create(line_ac), Write(distance_ac))
        
        # Show the formula
        formula = MathTex(
            r"\text{Distance} = \sqrt{(\text{age}_1 - \text{age}_2)^2 + (\text{income}_1 - \text{income}_2)^2}",
            font_size=24
        ).to_edge(DOWN)
        
        self.play(Write(formula))
        self.wait(3)
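
The distances behind the two lines in this animation can be worked out directly from the formula. A minimal sketch using the three customers from the scene; the helper name `euclidean_distance` is introduced here only for illustration:

```python
import math

# (age in years, income in $k) for the three customers in the animation
customer_a = (25, 40)
customer_b = (30, 45)
customer_c = (50, 80)


def euclidean_distance(p, q):
    """Straight-line distance between two (age, income) pairs."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


dist_ab = euclidean_distance(customer_a, customer_b)
dist_ac = euclidean_distance(customer_a, customer_c)

print(f"A to B: {dist_ab:.2f}")  # sqrt(5^2 + 5^2)   = sqrt(50)   ≈ 7.07
print(f"A to C: {dist_ac:.2f}")  # sqrt(25^2 + 40^2) = sqrt(2225) ≈ 47.17
```

Customer B is much closer to A than C is, which is why A and B would end up in the same segment.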

### Animation 3: K-Means Algorithm Step by Step

In [None]:
%%manim -qm -v WARNING KMeansAlgorithm

class KMeansAlgorithm(Scene):
    def construct(self):
        # Title
        title = Text("K-Means Algorithm in Action", font_size=42, color=BLUE)
        self.play(Write(title))
        self.wait(1)
        self.play(title.animate.to_edge(UP))
        
        # Create data points
        np.random.seed(42)
        data_points = []
        true_centers = [[-2, 1.5], [2, 1.5], [0, -1.5]]
        colors = [RED, BLUE, GREEN]
        
        for i, center in enumerate(true_centers):
            for _ in range(10):
                x = center[0] + np.random.normal(0, 0.4)
                y = center[1] + np.random.normal(0, 0.4)
                point = Dot([x, y, 0], color=WHITE, radius=0.06)
                data_points.append((point, i))  # Store true cluster
        
        # Show data points
        for point, _ in data_points:
            self.add(point)
        
        step_text = Text("Step 1: Start with data points", font_size=24, color=WHITE)
        step_text.to_edge(DOWN)
        self.play(Write(step_text))
        self.wait(1)
        
        # Step 2: Place initial centroids randomly
        initial_centroids = [
            Dot([-1, 0, 0], color=RED, radius=0.15),
            Dot([1, 2, 0], color=BLUE, radius=0.15),
            Dot([0.5, -0.5, 0], color=GREEN, radius=0.15)
        ]
        
        # Keep the caption at the bottom edge so it does not jump to the screen center
        new_step = Text("Step 2: Place cluster centers randomly", font_size=24, color=WHITE).to_edge(DOWN)
        self.play(ReplacementTransform(step_text, new_step))
        
        for centroid in initial_centroids:
            self.play(Create(centroid), run_time=0.5)
        
        self.wait(1)
        
        # Step 3: Assign points to nearest centroid
        assign_step = Text("Step 3: Assign each point to nearest center",
                           font_size=24, color=WHITE).to_edge(DOWN)
        self.play(ReplacementTransform(new_step, assign_step))
        
        # Color points based on nearest centroid and remember each assignment
        assignments = []
        for point, _ in data_points:
            point_pos = point.get_center()
            distances = [np.linalg.norm(point_pos - cent.get_center()) for cent in initial_centroids]
            nearest = int(np.argmin(distances))
            assignments.append(nearest)
            self.play(point.animate.set_color(colors[nearest]), run_time=0.1)
        
        self.wait(1)
        
        # Step 4: Move centroids
        move_step = Text("Step 4: Move centers to middle of their groups",
                         font_size=24, color=WHITE).to_edge(DOWN)
        self.play(ReplacementTransform(assign_step, move_step))
        
        # Move each centroid to the mean of the points assigned to it.
        # Using the stored assignments avoids comparing colors, which is fragile.
        for i, centroid in enumerate(initial_centroids):
            assigned_points = [
                point.get_center()
                for (point, _), label in zip(data_points, assignments)
                if label == i
            ]
            
            if assigned_points:
                avg_pos = np.mean(assigned_points, axis=0)
                self.play(centroid.animate.move_to(avg_pos), run_time=1)
        
        # Final message
        final_step = Text("Repeat until centers stop moving!", font_size=24, color=YELLOW).to_edge(DOWN)
        self.play(ReplacementTransform(move_step, final_step))
        self.wait(3)

### Animation 4: Real Customer Example

In [None]:
%%manim -qm -v WARNING CustomerSegmentation

class CustomerSegmentation(Scene):
    def construct(self):
        # Title
        title = Text("Customer Segmentation Example", font_size=42, color=BLUE)
        self.play(Write(title))
        self.wait(1)
        self.play(title.animate.to_edge(UP))
        
        # Create coordinate system
        axes = Axes(
            x_range=[20, 70, 10],
            y_range=[20, 100, 20],
            x_length=8,
            y_length=6,
            axis_config={"color": GREY}
        )
        
        x_label = Text("Age", font_size=20).next_to(axes.x_axis, DOWN)
        y_label = Text("Income ($k)", font_size=20).next_to(axes.y_axis, LEFT)
        
        self.play(Create(axes), Write(x_label), Write(y_label))
        
        # Create customer data points
        np.random.seed(42)
        customers = {
            "Young Professionals": {"center": [30, 70], "color": RED, "n": 8},
            "Budget Conscious": {"center": [45, 35], "color": BLUE, "n": 8},
            "High Earners": {"center": [55, 85], "color": GREEN, "n": 8},
            "Retirees": {"center": [65, 45], "color": YELLOW, "n": 6}
        }
        
        all_points = []
        
        # Create and show all points initially as white
        for segment, data in customers.items():
            center = data["center"]
            for _ in range(data["n"]):
                age = center[0] + np.random.normal(0, 3)
                income = center[1] + np.random.normal(0, 8)
                
                point_coord = axes.coords_to_point(age, income)
                point = Dot(point_coord, color=WHITE, radius=0.08)
                all_points.append((point, segment, data["color"]))
        
        # Show all points
        explanation = Text("Our customers before clustering", font_size=24, color=WHITE)
        explanation.to_edge(DOWN)
        self.play(Write(explanation))
        
        for point, _, _ in all_points:
            self.add(point)
        
        self.wait(2)
        
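        # Apply clustering colors (illustrative: each point gets the color of the
        # segment it was generated from; K-Means is not actually run in this scene)
        new_explanation = Text("After K-Means clustering - distinct customer segments!", 
                             font_size=24, color=WHITE).to_edge(DOWN)
        self.play(ReplacementTransform(explanation, new_explanation))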
        
        for point, segment, color in all_points:
            self.play(point.animate.set_color(color), run_time=0.1)
        
        # Add segment labels
        labels = [
            ("Young Professionals\n(High income, young)", [30, 70], RED),
            ("Budget Conscious\n(Lower income, middle age)", [45, 35], BLUE),
            ("High Earners\n(High income, older)", [55, 85], GREEN),
            ("Retirees\n(Moderate income, older)", [65, 45], YELLOW)
        ]
        
        for label_text, pos, color in labels:
            label = Text(label_text, font_size=14, color=color)
            label_pos = axes.coords_to_point(pos[0], pos[1] - 12)
            label.move_to(label_pos)
            self.play(Write(label), run_time=1)
        
        self.wait(3)

### Animation 5: Choosing the Right Number of Clusters

In [None]:
%%manim -qm -v WARNING ChoosingK

class ChoosingK(Scene):
    def construct(self):
        # Title
        title = Text("How Many Clusters Should We Use?", font_size=38, color=BLUE)
        self.play(Write(title))
        self.wait(1)
        self.play(title.animate.to_edge(UP))
        
        # Create sample data
        np.random.seed(42)
        centers = [[-2, 1], [2, 1], [0, -1.5]]
        data_points = []
        
        for center in centers:
            for _ in range(8):
                x = center[0] + np.random.normal(0, 0.3)
                y = center[1] + np.random.normal(0, 0.3)
                point = Dot([x, y, 0], color=WHITE, radius=0.06)
                data_points.append(point)
        
        # Show data
        for point in data_points:
            self.add(point)
        
        # Scenario 1: Too few clusters (K=1)
        scenario1 = Text("K=1: Too few clusters - all points in one group", 
                        font_size=24, color=RED).to_edge(DOWN)
        self.play(Write(scenario1))
        
        # Color all points red
        for point in data_points:
            self.play(point.animate.set_color(RED), run_time=0.05)
        
        # Add single centroid (white, so it is visible on Manim's default black background)
        centroid1 = Dot([0, 0, 0], color=WHITE, radius=0.12)
        self.play(Create(centroid1))
        self.wait(2)
        
        # Reset points to white
        self.play(FadeOut(centroid1))
        for point in data_points:
            point.set_color(WHITE)
        
        # Scenario 2: Just right (K=3)
        scenario2 = Text("K=3: Just right - natural groups revealed!", 
                        font_size=24, color=GREEN).to_edge(DOWN)
        self.play(ReplacementTransform(scenario1, scenario2))
        
        colors = [RED, BLUE, GREEN]
        centroids = []
        
        for i, center in enumerate(centers):
            # Color nearby points
            for point in data_points:
                x_pos, y_pos = point.get_center()[:2]
                if abs(x_pos - center[0]) < 1.2 and abs(y_pos - center[1]) < 1.2:
                    self.play(point.animate.set_color(colors[i]), run_time=0.1)
            
            # Add centroid (white, so it shows up on Manim's default black background)
            centroid = Dot([center[0], center[1], 0], color=WHITE, radius=0.12)
            centroids.append(centroid)
            self.play(Create(centroid), run_time=0.5)
        
        self.wait(2)
        
        # Scenario 3: Too many clusters (K=6)
        scenario3 = Text("K=6: Too many clusters - overfitting!", 
                        font_size=24, color=ORANGE).to_edge(DOWN)
        self.play(ReplacementTransform(scenario2, scenario3))
        
        # Remove old centroids
        self.play(*[FadeOut(c) for c in centroids])
        
        # Add many small clusters
        many_colors = [RED, BLUE, GREEN, YELLOW, PURPLE, PINK]
        point_groups = [data_points[i:i+4] for i in range(0, len(data_points), 4)]
        
        for i, group in enumerate(point_groups[:6]):
            color = many_colors[i % len(many_colors)]
            for point in group:
                self.play(point.animate.set_color(color), run_time=0.1)
            
            # Add tiny centroid (white, so it shows up on the black background)
            if group:
                avg_pos = np.mean([p.get_center() for p in group], axis=0)
                tiny_centroid = Dot(avg_pos, color=WHITE, radius=0.08)
                self.play(Create(tiny_centroid), run_time=0.3)
        
        # Final message
        final_msg = Text("The key is finding the right balance!", 
                        font_size=28, color=YELLOW)
        final_msg.to_edge(LEFT).shift(DOWN*2)
        self.play(Write(final_msg))
        self.wait(3)

## Step 7: Finding the Optimal Number of Clusters

In practice, we use the "Elbow Method" to find the best number of clusters. Let's implement this!

In [None]:
# Elbow Method - find optimal K
def calculate_wcss(data, max_k=10):
    """
    Calculate Within-Cluster Sum of Squares for different K values
    """
    wcss = []
    k_range = range(1, max_k + 1)
    
    for k in k_range:
        kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
        kmeans.fit(data)
        wcss.append(kmeans.inertia_)
    
    return k_range, wcss

# Calculate WCSS for our customer data
k_range, wcss_values = calculate_wcss(data_points, max_k=8)

# Create elbow plot
fig = go.Figure()
fig.add_trace(go.Scatter(x=list(k_range), y=wcss_values, 
                        mode='lines+markers',
                        line=dict(color='blue', width=3),
                        marker=dict(size=8, color='red')))

fig.update_layout(
    title='Elbow Method - Finding Optimal Number of Clusters',
    xaxis_title='Number of Clusters (K)',
    yaxis_title='Within-Cluster Sum of Squares (WCSS)',
    showlegend=False
)

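# Annotate K=4, where the elbow is expected because the data was generated
# with 4 blobs (the position is hard-coded, not detected from the curve)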
fig.add_annotation(
    x=4, y=wcss_values[3],
    text="Elbow Point<br>(Optimal K=4)",
    showarrow=True,
    arrowhead=2,
    arrowcolor="red",
    font=dict(size=12, color="red")
)

fig.show()

print("WCSS values for each K:")
# Use distinct loop names so the global `k` from Step 4 is not overwritten
for k_value, wcss_value in zip(k_range, wcss_values):
    print(f"K={k_value}: WCSS={wcss_value:.2f}")
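
The quantity on the y-axis can be reproduced by hand. The sketch below sums the squared distance from each point to its assigned centroid and compares the result with scikit-learn's `inertia_`. It assumes the same blob settings as Step 2, regenerated here so the block runs on its own.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed demo data: the same blob settings as Step 2
X_demo, _ = make_blobs(n_samples=150, centers=4, cluster_std=1.5,
                       center_box=(20.0, 80.0), random_state=42)

model = KMeans(n_clusters=4, random_state=42, n_init=10).fit(X_demo)

# Squared distance from every point to the centroid of its own cluster
assigned_centroids = model.cluster_centers_[model.labels_]
squared_distances = ((X_demo - assigned_centroids) ** 2).sum(axis=1)
manual_wcss = squared_distances.sum()

print(f"Manual WCSS: {manual_wcss:.4f}")
print(f"inertia_:    {model.inertia_:.4f}")
```

The two numbers should agree up to floating-point rounding. WCSS can only decrease or stay the same as K grows, which is why the elbow method looks for the point of diminishing returns instead of the minimum.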

## Step 8: Business Applications

Now let's interpret our customer segments and think about how a business might use this information.

In [None]:
# Analyze our customer segments
def analyze_segments(df):
    """
    Provide business insights for each customer segment
    """
    segment_insights = {}
    
    for cluster in df['Cluster'].unique():
        cluster_data = df[df['Cluster'] == cluster]
        
        insights = {
            'count': len(cluster_data),
            'avg_age': cluster_data['Age'].mean(),
            'avg_income': cluster_data['Annual_Income'].mean(),
            'age_range': (cluster_data['Age'].min(), cluster_data['Age'].max()),
            'income_range': (cluster_data['Annual_Income'].min(), 
                           cluster_data['Annual_Income'].max())
        }
        
        # Business interpretation
        if insights['avg_age'] < 35 and insights['avg_income'] > 60:
            insights['segment_name'] = "Young Professionals"
            insights['marketing_strategy'] = "Tech products, career development, premium services"
        elif insights['avg_age'] > 50 and insights['avg_income'] > 60:
            insights['segment_name'] = "Affluent Seniors"
            insights['marketing_strategy'] = "Luxury goods, travel, health & wellness"
        elif insights['avg_income'] < 50:
            insights['segment_name'] = "Budget Conscious"
            insights['marketing_strategy'] = "Value products, discounts, essential services"
        else:
            insights['segment_name'] = "Middle Market"
            insights['marketing_strategy'] = "Quality products, family-oriented services"
        
        segment_insights[f'Cluster {cluster + 1}'] = insights
    
    return segment_insights

# Get business insights
business_insights = analyze_segments(customers_df)

print("\n" + "="*60)
print("CUSTOMER SEGMENT ANALYSIS & MARKETING RECOMMENDATIONS")
print("="*60)

for cluster_name, insights in business_insights.items():
    print(f"\n{cluster_name}: {insights['segment_name']}")
    print(f"  • Size: {insights['count']} customers")
    print(f"  • Average Age: {insights['avg_age']:.1f} years")
    print(f"  • Average Income: ${insights['avg_income']:.1f}k")
    print(f"  • Marketing Focus: {insights['marketing_strategy']}")
    print("-" * 40)

## Step 9: Interactive Clustering Tool

Let's create an interactive Plotly figure that shows the clustering results for several values of K side by side, so you can compare them and zoom or hover to inspect each one!

In [None]:
def create_interactive_clustering(data, k_values=(2, 3, 4, 5, 6)):
    """
    Create subplots showing clustering results for different K values
    """
    from plotly.subplots import make_subplots
    
    n_plots = len(k_values)
    cols = min(3, n_plots)
    rows = (n_plots + cols - 1) // cols
    
    fig = make_subplots(
        rows=rows, cols=cols,
        subplot_titles=[f'K = {k}' for k in k_values],
        horizontal_spacing=0.1,
        vertical_spacing=0.15
    )
    
    colors = ['red', 'blue', 'green', 'orange', 'purple', 'brown']
    
    for idx, k in enumerate(k_values):
        row = (idx // cols) + 1
        col = (idx % cols) + 1
        
        # Perform clustering
        kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
        labels = kmeans.fit_predict(data)
        centroids = kmeans.cluster_centers_
        
        # Add data points
        for cluster in range(k):
            cluster_data = data[labels == cluster]
            fig.add_trace(
                go.Scatter(
                    x=cluster_data[:, 0],
                    y=cluster_data[:, 1],
                    mode='markers',
                    marker=dict(color=colors[cluster % len(colors)], size=6),
                    name=f'Cluster {cluster + 1}' if idx == 0 else None,
                    showlegend=idx == 0,
                    legendgroup=f'cluster_{cluster}'
                ),
                row=row, col=col
            )
        
        # Add centroids
        fig.add_trace(
            go.Scatter(
                x=centroids[:, 0],
                y=centroids[:, 1],
                mode='markers',
                marker=dict(color='black', size=12, symbol='x'),
                name='Centroids' if idx == 0 else None,
                showlegend=idx == 0
            ),
            row=row, col=col
        )
    
    fig.update_layout(
        title_text="K-Means Clustering: Comparing Different Numbers of Clusters",
        height=400 * rows,
        showlegend=True
    )
    
    # Update axis labels
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            fig.update_xaxes(title_text="Age", row=i, col=j)
            fig.update_yaxes(title_text="Income ($k)", row=i, col=j)
    
    return fig

# Create interactive comparison
interactive_fig = create_interactive_clustering(data_points, [2, 3, 4, 5, 6])
interactive_fig.show()

print("\nCompare the different clustering results above!")
print("Notice how K=4 seems to capture the natural groups best.")

## How to Run the Animations

To run these Manim animations:

1. **Install Manim** if you haven't already:
   ```bash
   pip install manim
   ```

2. **Run each animation cell** one by one. The `%%manim` magic command will:
   - Generate the animation
   - Save it as a video file
   - Display it in the notebook

3. **Animation Quality**: The animations are set to medium quality (`-qm`) for a good balance of visual quality and file size.

**Note**: `pip install manim` does not install every dependency. `MathTex` requires a LaTeX distribution, and some Manim versions also require FFmpeg. The first render of each scene can also take a while.
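
A small sketch for checking whether the external programs Manim commonly relies on are available. It assumes the executables are called `latex` (needed by `MathTex` in Animation 2) and `ffmpeg` (needed by some Manim versions); adjust the names for your setup.

```python
import shutil

# External programs Manim may rely on (assumed executable names)
external_tools = ["latex", "ffmpeg"]

# shutil.which returns the full path of the executable, or None if it is missing
tool_paths = {tool: shutil.which(tool) for tool in external_tools}

for tool, path in tool_paths.items():
    status = f"found at {path}" if path else "not found on PATH"
    print(f"{tool}: {status}")
```

If a tool is reported as missing, install it with your system's package manager before running the animation cells.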

## Summary

🎯 **What we learned about K-Means Clustering:**

- **Clustering groups similar data points together automatically**
- **K-Means uses distance to measure similarity**
- **The algorithm iteratively improves cluster assignments**
- **Choosing the right number of clusters is crucial**
- **Business applications include customer segmentation, market research, and more**

The animations help visualize these abstract concepts, making machine learning more accessible and intuitive!