# Practical 3: Vector Quantization

**Course:** WBCS032-05 Introduction to Machine Learning  
**Student Names:**  
**Student Numbers:**  

---

## Assignment Overview

In this assignment, you will implement Winner-Takes-All unsupervised competitive learning (VQ) as discussed in class, using the Euclidean distance measure. You will work with the dataset `simplevqdata.csv`, which contains 1000 unlabeled two-dimensional data points.  

## 1. Introduction (1 point)

Describe the goal of this assignment.

**Your answer here:**

## 2. Methods (3 points)

### 2.1 Explain Vector Quantization (0.5 points)

Explain the algorithm in a general manner.
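For reference, one common way to write a single Winner-Takes-All step is shown below. The notation is an assumption and is not used elsewhere in this template: $\boldsymbol{\xi}^\mu$ is the presented example, $\mathbf{w}_k$ is prototype $k = 1, \dots, K$, and $\eta$ is the learning rate.

$$
\begin{aligned}
k^* &= \arg\min_{k} \, \lVert \boldsymbol{\xi}^\mu - \mathbf{w}_k \rVert, \\
\mathbf{w}_{k^*} &\leftarrow \mathbf{w}_{k^*} + \eta \, \left(\boldsymbol{\xi}^\mu - \mathbf{w}_{k^*}\right),
\end{aligned}
$$

All other prototypes stay where they are.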

**Your answer here:**

### 2.2 Implementation (2.5 points)

You need to implement the VQ algorithm **yourself**. Both the code quality and correctness will be graded.

*__Note:__* **Do not change the cell labels! Themis will use them to automatically grade your submission.**

In [None]:
# Load required libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Path to the file containing the data
data_file_path = 'simplevqdata.csv'

# The values below are sample values
num_prototypes = 2  # the number of prototypes K
learning_rate = 0.1  # the learning rate eta
t_max = 100  # the maximum number of epochs

In [None]:
# Read input data: P = 1000 rows of 2D points, no header row
df = pd.read_csv(data_file_path, header=None)
data = df.to_numpy()

#### VQ Algorithm Steps

Implement the following steps:

1. **Initialize the prototypes, e.g., by random selection of $K$ data points.**

2. **Repeat for epochs $t = 1$ to $t_{max}$:**

    *   In each epoch, present the entire dataset in a randomized order: every example is presented exactly once, and the order differs from epoch to epoch.

    *   Perform an epoch of training using all $P$ examples ($P = 1000$ for this dataset). At each individual step, present a single example, compute its distances to all prototypes, and update the winning prototype.

    *   Plot the data and the prototype positions after each epoch, so you can observe and show how the prototypes approach their final positions.

    *   Evaluate the quantization error $H_{VQ}$ after each epoch (not after each individual update step!).
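A minimal sketch of one epoch of step 2 follows. It assumes NumPy, a `numpy.random.Generator` for the shuffling, and prototypes stored as a float array of shape (K, 2). The name `train_one_epoch` is hypothetical and not part of the template; plotting and the $H_{VQ}$ evaluation would live in the surrounding loop over epochs.

```python
import numpy as np


def train_one_epoch(data, prototypes, learning_rate, rng):
    """Run one Winner-Takes-All epoch over the whole dataset.

    Every example is presented exactly once, in a fresh random order,
    and only the closest (winning) prototype is moved. `prototypes` is a
    float array of shape (K, 2); it is updated in place and returned.
    """
    for mu in rng.permutation(len(data)):
        xi = data[mu]
        # Squared Euclidean distances to all K prototypes; the argmin is the
        # same as for the plain Euclidean distance, so the sqrt is skipped.
        sq_dists = np.sum((prototypes - xi) ** 2, axis=1)
        winner = np.argmin(sq_dists)
        # Move only the winner a fraction eta of the way towards xi.
        prototypes[winner] += learning_rate * (xi - prototypes[winner])
    return prototypes
```

Because the update happens in place, appending `prototypes.copy()` (not `prototypes` itself) to a trace after each epoch keeps the recorded positions from changing later.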


In [None]:
def quantization_error(data, prototypes):
    """
    Compute quantization error

    Args:
        data (ndarray): Data points.
        prototypes (ndarray): The prototypes

    Returns:
        HVQ (float): quantization error
    """
    pass
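The template does not spell out $H_{VQ}$. A common choice, assumed in the sketch below, is $H_{VQ} = \sum_{\mu=1}^{P} \lVert \boldsymbol{\xi}^\mu - \mathbf{w}_{k^*(\mu)} \rVert^2$, where $k^*(\mu)$ is the index of the prototype closest to example $\boldsymbol{\xi}^\mu$. Some course notes include a factor $\tfrac{1}{2}$ or divide by $P$, which only rescales the learning curve. The name `quantization_error_sketch` is hypothetical.

```python
import numpy as np


def quantization_error_sketch(data, prototypes):
    """Summed squared Euclidean distance of each example to its winner."""
    # Broadcast (P, 1, 2) - (1, K, 2) -> (P, K, 2), then sum over coordinates.
    diff = data[:, np.newaxis, :] - prototypes[np.newaxis, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)  # shape (P, K)
    # Only the closest prototype counts for each example.
    return float(np.sum(np.min(sq_dists, axis=1)))
```

The vectorized form builds a (P, K, 2) array, which is negligible for 1000 points and a handful of prototypes.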

In [None]:
def vector_quantization(data, num_prototypes, learning_rate, max_epoch, init_type='random'):
    """
    Competitive learning with vector quantization

    Args:
        data (ndarray): Data points.
        num_prototypes (int): The number of prototypes
        learning_rate (float): The learning rate
        max_epoch (int): The maximum number of epochs
        init_type (str): Initialization type ('random' or 'stupid')

    Returns:
        prototype_trace (list): the trace of the prototypes over all epochs
        HVQ_trace (list): the quantization error over all epochs
    """
    pass

## 3. Experimental Results (4 points)

### 3.1 Learning curves

*__Note:__* This section **is graded** by Themis.

Implement the function below to plot the quantization error $H_{VQ}$ as a function of the number of epochs $t$. Include learning curves for three different learning rates $\eta$, for both $K = 2$ and $K = 4$ prototypes.

Initialize the prototypes using randomly selected data points.
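A minimal plotting sketch, assuming you have already collected one $H_{VQ}$ trace per learning rate (for example from the `HVQ_trace` returned by `vector_quantization`) into a dictionary keyed by $\eta$. The helper name `plot_learning_curves` is hypothetical.

```python
import matplotlib.pyplot as plt


def plot_learning_curves(hvq_traces, num_prototypes):
    """Plot H_VQ against epoch t, one curve per learning rate.

    hvq_traces: dict mapping a learning rate eta to the list of H_VQ
    values recorded after epochs t = 1, ..., t_max.
    """
    fig, ax = plt.subplots(figsize=(6, 4))
    for eta, trace in sorted(hvq_traces.items()):
        epochs = range(1, len(trace) + 1)
        ax.plot(epochs, trace, label=rf"$\eta = {eta}$")
    ax.set_xlabel("epoch $t$")
    ax.set_ylabel(r"$H_{VQ}$")
    ax.set_title(f"Learning curves for K = {num_prototypes}")
    ax.legend()
    return fig, ax
```

Calling it once for $K = 2$ and once for $K = 4$ gives one figure per prototype count with the learning rates overlaid, which makes their convergence speed and final level easy to compare.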

In [None]:
def plot_quantization_error(data, num_prototypes, learning_rate, t_max):
    """
    Plot quantization error over epochs.

    Args:
        data (ndarray): Data points.
        num_prototypes (int): Number of prototypes.
        learning_rate (float): Learning rate.
        t_max (int): Maximum number of epochs.
    """
    pass

### 3.2 Trajectories of prototypes during learning

*__Note:__* This section is graded **both by Themis and manually**.

Implement the function below to plot the trajectories of the prototypes during learning for $K = 2$ and $K = 4$. The plots should show the path (trace) of each prototype from its initial to its final position, together with the data points.
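One way to draw such a plot is sketched below, assuming the prototype trace is a sequence of (K, 2) arrays with the initial positions first, as a list of per-epoch copies would give. The helper name `plot_trajectories` and the marker choices are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_trajectories(data, prototype_trace, colors):
    """Draw the data and the path of every prototype across epochs.

    prototype_trace: sequence of (K, 2) arrays, the initial positions first.
    colors: one matplotlib color per prototype.
    """
    trace = np.asarray(prototype_trace)  # shape (T, K, 2)
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(data[:, 0], data[:, 1], s=5, c="lightgray")
    for k in range(trace.shape[1]):
        path = trace[:, k, :]
        # Line through all recorded positions of prototype k.
        ax.plot(path[:, 0], path[:, 1], "-o", color=colors[k], markersize=3)
        # Hollow square at the start, star at the final position.
        ax.scatter(path[0, 0], path[0, 1], marker="s", s=60,
                   facecolors="none", edgecolors=colors[k])
        ax.scatter(path[-1, 0], path[-1, 1], marker="*", s=150, color=colors[k])
    ax.set_xlabel("$x_1$")
    ax.set_ylabel("$x_2$")
    ax.set_aspect("equal")
    return fig, ax
```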

Use a learning rate that produces non-trivial and meaningful learning behavior.

Perform the training using at least two different initialization strategies:
- One where the prototypes are initialized with randomly selected data points.
- One using a deliberately poor (“stupid”) initialization.
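The assignment leaves the "stupid" initialization open. The sketch below assumes one possible choice: all prototypes placed at a single point well outside the data. The function `init_prototypes` and its `rng` parameter are hypothetical.

```python
import numpy as np


def init_prototypes(data, num_prototypes, init_type="random", rng=None):
    """Return a float array of shape (K, 2) with initial prototype positions."""
    rng = np.random.default_rng() if rng is None else rng
    if init_type == "random":
        # K distinct data points, chosen uniformly at random.
        idx = rng.choice(len(data), size=num_prototypes, replace=False)
        return data[idx].astype(float)
    if init_type == "stupid":
        # Deliberately poor (assumed choice): all K prototypes at one point,
        # half a data range beyond the lower-left corner of the bounding box.
        lo, hi = data.min(axis=0), data.max(axis=0)
        corner = lo - 0.5 * (hi - lo)
        return np.tile(corner, (num_prototypes, 1)).astype(float)
    raise ValueError(f"unknown init_type: {init_type!r}")
```

With identical starting positions, `np.argmin` breaks ties by returning the first index, so prototype 0 wins the first updates and moves into the data. The remaining prototypes may never win afterwards, which is exactly the kind of failure such an initialization is meant to expose.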

In [None]:
def plot_prototype_trajectories(
    data, num_prototypes, learning_rate, t_max, colors, init_type="random"
):
    """
    Plot the trajectories of prototypes during VQ learning.

    Args:
        data (ndarray): Data points.
        num_prototypes (int): Number of prototypes.
        learning_rate (float): Learning rate.
        t_max (int): Maximum number of epochs.
        colors (list): List of colors, one per prototype.
        init_type (str): Initialization type ('random' or 'stupid').
    """
    pass


## 4. Discussion (2 points)

Discuss your results, in particular the role of the learning rate, addressing the following questions:

- How does the final value of the cost function change with $\eta$?
- What happens if $\eta$ is too large or too small?
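A one-dimensional toy, not the assignment's setup, isolates how $\eta$ enters a single update: one prototype $w$ repeatedly receives the same example $\xi$. Since $\xi - w_{\text{new}} = (1 - \eta)(\xi - w_{\text{old}})$, the gap shrinks by a factor $|1 - \eta|$ per step. The helper `remaining_gap` is hypothetical, and the toy ignores the interplay between different examples and prototypes, so it can support but not replace what the learning curves show.

```python
def remaining_gap(eta, steps, xi=1.0, w=0.0):
    """Distance |xi - w| after presenting the same example `steps` times.

    Each update w <- w + eta * (xi - w) multiplies the gap by (1 - eta):
    slow shrinking for small eta, overshooting but shrinking for
    1 < eta < 2, and growing without bound for eta > 2.
    """
    for _ in range(steps):
        w = w + eta * (xi - w)
    return abs(xi - w)
```

For example, `remaining_gap(0.01, 10)` is still above 0.9, while `remaining_gap(0.5, 10)` equals $0.5^{10}$.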

**Your answer here:**



## Contribution

State your individual contribution.

**Your answer here:**