# Self Organising Map Challenge

## The Kohonen Network

The Kohonen Self Organising Map (SOM) is a data visualization technique that helps make sense of high dimensional data by reducing its dimensions to a map. A SOM also acts as a clustering method, because it groups similar data together.

Unlike other learning techniques in neural networks, training a SOM requires no target vector. A SOM learns to classify the training data without any external supervision.

![Network](http://www.pitt.edu/~is2470pb/Spring05/FinalProjects/Group1a/tutorial/kohonen1.gif)

### Structure
A network has a width and a height that describe the grid of nodes.  For example, the grid may be 4x4, and so there would be 16 nodes.

Each node has a weight for each value in the input vector.  A weight is simply a float value; a node's weights are compared with the input values to determine how closely the node matches the input (see below).

Each node has a set of weights that match the size of the input vector.  For example, if the input vector has 10 elements, each node would have 10 weights.
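
As a minimal sketch of this structure, the whole grid can be held in one NumPy array with one weight vector per node. The sizes below (a 4x4 grid and 10-element inputs) and the random initialisation are illustrative assumptions, not requirements.

```python
import numpy as np

# Illustrative sizes (assumptions): a 4x4 grid and 10-element input vectors.
width, height, input_dim = 4, 4, 10

rng = np.random.default_rng(seed=0)

# One weight vector per node: weights[x, y] holds the weights of node (x, y).
weights = rng.random((width, height, input_dim))

n_nodes = width * height
print(n_nodes)              # 16 nodes in a 4x4 grid
print(weights[0, 0].shape)  # (10,) -> 10 weights per node
```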

### Training 
To train the network:

1. Each node's weights are initialized.
2. We enumerate through the training data for some number of iterations (repeating if necessary).  The current value we are training against will be referred to as the `current input vector`.
3. Every node is examined to calculate which one's weights are most like the input vector. The winning node is commonly known as the Best Matching Unit (BMU).
4. The radius of the neighbourhood of the BMU is now calculated. This is a value that starts large, typically set to the 'radius' of the lattice,  but diminishes each time-step. Any nodes found within this radius are deemed to be inside the BMU's neighbourhood.
5. Each neighbouring node's (the nodes found in step 4) weights are adjusted to make them more like the input vector. The closer a node is to the BMU, the more its weights get altered.
6. Go to step 2 until we've completed N iterations.
    

### Calculating the Best Matching Unit (BMU)

To determine the best matching unit, one method is to iterate through all the nodes and calculate the Euclidean distance between each node's weight vector and the current input vector. The node with a weight vector closest to the input vector is tagged as the BMU.

The Euclidean distance $\mathsf{distance}_{i}$ (from the input vector $V$ to the $i$th node's weights $W_i$) is given as (using Pythagoras):

$$ \mathsf{distance}_{i}=\sqrt{\sum_{k=1}^{n}(V_k - W_{i_k})^2}$$

where $V$ is the current input vector and $W_i$ is the node's weight vector.  $n$ is the size of the input & weight vector.

*Note*: $V$ and $W$ are vectors.  $V$ is the input vector, and $W_i$ is the weight vector of the $i$th node.  $V_k$ and $W_{i_k}$ represent the $k$th value within those vectors.  

The BMU is the node with the minimal distance for the current input vector.
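
One way to sketch the BMU search in NumPy is shown below. The helper name `find_bmu` and the `(width, height, n)` weight layout are assumptions made for illustration. The square root is left out because it does not change which node has the smallest distance.

```python
import numpy as np

def find_bmu(weights: np.ndarray, v: np.ndarray) -> tuple[int, int]:
    """Return the (x, y) grid position of the node whose weights are closest to v."""
    # Squared Euclidean distance from v to every node's weight vector.
    squared_distances = np.sum((weights - v) ** 2, axis=-1)
    # argmin works on the flattened grid, so convert back to (x, y).
    flat_index = np.argmin(squared_distances)
    x, y = np.unravel_index(flat_index, squared_distances.shape)
    return int(x), int(y)

# Tiny 2x2 grid: only node (1, 0) is close to the input vector.
weights = np.zeros((2, 2, 3))
weights[1, 0] = [0.9, 0.1, 0.1]
print(find_bmu(weights, np.array([1.0, 0.0, 0.0])))  # (1, 0)
```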

### Calculating the Neighbourhood Radius

The next step is to calculate which of the other nodes are within the BMU's neighbourhood. All these nodes will have their weight vectors altered.

First we calculate what the radius of the neighbourhood should be and then use Pythagoras to determine if each node is within the radial distance or not.

A unique feature of the Kohonen learning algorithm is that the area of the neighbourhood shrinks over time. To do this we use the exponential decay function:

Given a desired number of training iterations $n_{\mathsf{max iterations}}$, for example:
$$n_{\mathsf{max iterations}} = 100$$

Calculate the radius $\sigma_t$ at iteration number $t$:

$$\sigma_t = \sigma_0 \exp\left(- \frac{t}{\lambda} \right) \qquad t = 0,1,2,3... $$

Where $\sigma_0$ denotes the neighbourhood radius at iteration $t=0$, $t$ is the current iteration. We define $\sigma_0$ (the initial radius) and $\lambda$ (the time constant) as below:

$$\sigma_0 = \frac{\max(width,height)}{2} \qquad \lambda = \frac{n_{\mathsf{max iterations}}}{\log(\sigma_0)} $$

Where $width$ & $height$ are the width and height of the grid, and $\log$ is the natural logarithm.
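
The radius schedule can be sketched as a small function. The name `neighbourhood_radius` and the 10x10 grid used in the example are assumptions. With this choice of $\lambda$, the radius shrinks from $\sigma_0$ to 1 by the time $t$ reaches the maximum number of iterations.

```python
import math

def neighbourhood_radius(t: int, width: int, height: int, n_max_iterations: int) -> float:
    """Radius sigma_t at iteration t, using exponential decay."""
    sigma_0 = max(width, height) / 2
    # Assumes sigma_0 > 1: log(sigma_0) is zero for a 2x2 grid, which would
    # make the time constant undefined.
    time_constant = n_max_iterations / math.log(sigma_0)
    return sigma_0 * math.exp(-t / time_constant)

# Illustrative 10x10 grid trained for 100 iterations.
print(neighbourhood_radius(0, 10, 10, 100))              # 5.0
print(round(neighbourhood_radius(50, 10, 10, 100), 4))   # 2.2361
print(round(neighbourhood_radius(100, 10, 10, 100), 4))  # 1.0
```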

### Calculating the Learning Rate

We define the initial learning rate $\alpha_0$ at iteration $t = 0$ as:
$$\alpha_0 = 0.1$$

So, we can calculate the learning rate at a given iteration $t$ as:

$$\alpha_t = \alpha_0 \exp \left(- \frac{t}{\lambda} \right) $$

where $t$ is the iteration number and $\lambda$ is the time constant (calculated above).
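
As a worked check of this schedule, assume purely for illustration a 10x10 grid, so $\sigma_0 = 5$. Substituting $\lambda = n_{\mathsf{max iterations}} / \log(\sigma_0)$ at the final iteration $t = n_{\mathsf{max iterations}}$ gives:

$$\alpha_{n_{\mathsf{max iterations}}} = \alpha_0 \exp\left(- \frac{n_{\mathsf{max iterations}} \log(\sigma_0)}{n_{\mathsf{max iterations}}} \right) = \frac{\alpha_0}{\sigma_0} = \frac{0.1}{5} = 0.02$$

So with this time constant the learning rate ends at $\alpha_0 / \sigma_0$, whatever the number of iterations.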
        
### Calculating the Influence

As well as the learning rate, we need to calculate the influence $\theta_t$ of the learning/training at a given iteration $t$.  

So for each node, we need to calculate the Euclidean distance $d_i$ from the BMU to that node.  Similar to when we calculate the distance to find the BMU, we use Pythagoras.  The current ($i$th) node's x position is given by $x(W_i)$, and the BMU's x position is, likewise, given by $x(Z)$, where $Z$ denotes the BMU.  Similarly, $y()$ returns the y position of a node.

$$ d_{i}=\sqrt{(x(W_i) - x(Z))^2 + (y(W_i) - y(Z))^2} $$

Then, the influence falls off with distance from the BMU, and narrows over time as $\sigma_t$ shrinks, according to:

$$\theta_t = \exp \left( - \frac{d_{i}^2}{2\sigma_t^2} \right) $$

Where $\sigma_t$ is the neighbourhood radius at iteration $t$ as calculated above. 

Note: You will need to come up with an approach to x() and y().
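
One possible approach to `x()` and `y()`, offered as a sketch rather than the expected answer, is to take a node's position to be its index in the grid and build all positions at once with `np.meshgrid`. The function name `influence` and its parameters are illustrative assumptions.

```python
import numpy as np

def influence(bmu: tuple[int, int], width: int, height: int, sigma_t: float) -> np.ndarray:
    """Influence theta_t of the BMU on every node, as a (width, height) array."""
    bmu_x, bmu_y = bmu
    # x() and y() for every node at once: xs[i, j] == i and ys[i, j] == j.
    xs, ys = np.meshgrid(np.arange(width), np.arange(height), indexing="ij")
    # d_i squared; the square root would only be squared again in the exponent.
    squared_distance = (xs - bmu_x) ** 2 + (ys - bmu_y) ** 2
    return np.exp(-squared_distance / (2 * sigma_t ** 2))

theta = influence(bmu=(1, 1), width=3, height=3, sigma_t=1.0)
print(theta[1, 1])            # 1.0 at the BMU itself
print(round(theta[0, 1], 4))  # 0.6065 one step away, i.e. exp(-0.5)
```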


### Updating the Weights

To update the weights of a given node, we use:

$$W_{i_{t+1}} = W_{i_t} + \alpha_t \theta_t (V_t - W_{i_t})$$
        
So $W_{i_{t+1}}$ is the new value of the weight for the $i$th node, $V_t$ is the current value of the training data, $W_{i_t}$ is the current weight and $\alpha_t$ and $\theta_t$ are the learning rate and influence calculated above.

*Note*: $W$ and $V$ are vectors.
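
Because $W$ and $V$ are vectors, the update can be applied to every node in one expression. The sketch below assumes the weights are stored as a `(width, height, n)` array and the influence as a `(width, height)` array; `update_weights` is a hypothetical helper name.

```python
import numpy as np

def update_weights(weights: np.ndarray, v: np.ndarray, alpha_t: float, theta_t: np.ndarray) -> np.ndarray:
    """Apply W(t+1) = W(t) + alpha_t * theta_t * (V - W(t)) to every node."""
    # theta_t has shape (width, height); the extra trailing axis lets it
    # broadcast against the (width, height, n) weights.
    return weights + alpha_t * theta_t[..., np.newaxis] * (v - weights)

weights = np.zeros((2, 2, 3))
v = np.ones(3)
theta_t = np.array([[1.0, 0.5], [0.5, 0.0]])
new_weights = update_weights(weights, v, alpha_t=0.1, theta_t=theta_t)
print(new_weights[0, 0])  # [0.1 0.1 0.1]  full influence
print(new_weights[1, 1])  # [0. 0. 0.]     zero influence, unchanged
```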

## Challenge

Sam has written an implementation of a Self Organising Map. Consider the following criteria when assessing Sam's code:

- Could the code be made more efficient? A literal interpretation of the instructions above is not necessary.
  - It uses too many Python `for` loops. It should use NumPy vector operations instead. Vectorised code would also be easier to move to a GPU array library if one is available. **Saad**
- Is the code best structured for later use by other developers and in anticipation of productionisation?
  - No. With this code it is very hard to understand what is happening at a glance.
    - Make a class
    - Make internal functions for BMU, neighbourhood radius, learning rate and influence
- How would you approach productionising this application?
- Anything else you think is relevant.

In [1]:
# kohonen.py
import matplotlib.pyplot as plt
import numpy as np


def train(input_data, n_max_iterations, width, height, init_weights=None):
    σ0 = max(width, height) / 2
    α0 = 0.1
    weights = (
        init_weights
        if init_weights is not None
        else np.random.random((width, height, 3))
    )
    λ = n_max_iterations / np.log(σ0)
    for t in range(n_max_iterations):
        σt = σ0 * np.exp(-t / λ)
        αt = α0 * np.exp(-t / λ)
        for vt in input_data:
            bmu = np.argmin(np.sum((weights - vt) ** 2, axis=2))
            bmu_x, bmu_y = np.unravel_index(bmu, (width, height))
            for x in range(width):
                for y in range(height):
                    di = np.sqrt(((x - bmu_x) ** 2) + ((y - bmu_y) ** 2))
                    θt = np.exp(-(di**2) / (2 * (σt**2)))
                    weights[x, y] += αt * θt * (vt - weights[x, y])
    return weights

## Increasing efficiency 

- Use vector broadcasting.
- Use NumPy vector operations instead of loops.
- Do the update without looping over x and y, using `meshgrid` and broadcasting.

In [2]:
def e_train(input_data, n_max_iterations, width, height, init_weights=None):
    init_learning_rate = 0.1
    init_radius = max(width, height) / 2
    time_constant = n_max_iterations / np.log(init_radius)
    
    weights = (
        init_weights
        if init_weights is not None
        else np.random.random((width, height, input_data.shape[-1]))
    )
    coord_x, coord_y = np.meshgrid(np.arange(width), np.arange(height), indexing="ij")
    for t in range(n_max_iterations):
        current_radius = init_radius * np.exp(-t / time_constant)
        current_learning_rate = init_learning_rate * np.exp(-t / time_constant)
        for vt in input_data:
            bmu = np.argmin(np.sum((weights - vt) ** 2, axis=2))
            bmu_x, bmu_y = np.unravel_index(bmu, (width, height))

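            # Grid distance from every node to the BMU
            distance = np.sqrt((coord_x - bmu_x) ** 2 + (coord_y - bmu_y) ** 2)
            influence_decay = np.exp(-(distance**2) / (2 * current_radius**2))
            # Add a trailing axis so the (width, height) influence broadcasts
            # against the (width, height, input_dim) weights
            influence_decay = influence_decay.reshape(
                influence_decay.shape + (1,) * (weights.ndim - influence_decay.ndim)
            )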

            weights += current_learning_rate * influence_decay * (vt - weights)
    return weights
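
A quick way to gain confidence in a vectorised rewrite is to compare one update step against the explicit loops. The sketch below is self-contained and uses two hypothetical helpers, `loop_step` and `vectorised_step`, with arbitrary test values. It checks a single step, not the full training run.

```python
import numpy as np

def loop_step(weights, v, bmu, sigma_t, alpha_t):
    """One weight update written with explicit loops over the grid."""
    out = weights.copy()
    width, height, _ = weights.shape
    for x in range(width):
        for y in range(height):
            d_squared = (x - bmu[0]) ** 2 + (y - bmu[1]) ** 2
            theta = np.exp(-d_squared / (2 * sigma_t ** 2))
            out[x, y] += alpha_t * theta * (v - weights[x, y])
    return out

def vectorised_step(weights, v, bmu, sigma_t, alpha_t):
    """The same update using meshgrid and broadcasting."""
    width, height, _ = weights.shape
    xs, ys = np.meshgrid(np.arange(width), np.arange(height), indexing="ij")
    d_squared = (xs - bmu[0]) ** 2 + (ys - bmu[1]) ** 2
    theta = np.exp(-d_squared / (2 * sigma_t ** 2))
    return weights + alpha_t * theta[..., np.newaxis] * (v - weights)

rng = np.random.default_rng(seed=0)
weights = rng.random((5, 4, 3))
v = rng.random(3)
a = loop_step(weights, v, bmu=(2, 1), sigma_t=2.0, alpha_t=0.1)
b = vectorised_step(weights, v, bmu=(2, 1), sigma_t=2.0, alpha_t=0.1)
print(np.allclose(a, b))  # True
```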

# Production worthy - Kohonen Map

- Logger class
- Kohonen Map class
- Model registry
- Build, retraining, and inference pipelines

## Logger

A `Telemetry` helper class that writes log messages to a file (`.logs/telemetry.log` by default), one formatted line per record. It only supports the telemetry calls used below, so it does not need a close reading.

In [3]:
import time, os, logging, json

class Telemetry:
    def __init__(self, level=logging.INFO, log_path='.logs/telemetry.log'):
        """Initializes the telemetry system with a logger."""
        self.__logger = logging.getLogger('TelemetryLogger')
        self.__logger.setLevel(level)
        os.makedirs(os.path.dirname(log_path), exist_ok=True)
        handler = logging.FileHandler(log_path)
        formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s', datefmt='%Y-%m-%d %H:%M:%S')
        handler.setFormatter(formatter)
        self.__logger.addHandler(handler)

    def log(self, message: str, level=logging.INFO, **kwargs):
        """Logs a message with a given level."""
        self.__logger.log(level, message, extra=kwargs)

    def error(self, error: Exception):
        """Logs an exception at ERROR level."""
        self.__logger.log(logging.ERROR, f"[Training Error] {error}", extra=dict(error=error))
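
The `Telemetry` class above writes plain formatted lines. If one JSON object per line (JSON Lines) is wanted instead, one option is a custom `logging.Formatter`. The class name `JsonLinesFormatter` is hypothetical, and the extra keys it copies (`input`, `params`, `error`) are simply the ones passed through `extra` in this notebook.

```python
import json
import logging

class JsonLinesFormatter(logging.Formatter):
    """Hypothetical formatter: renders each log record as one JSON object."""

    # Keys this notebook passes via `extra`; an assumption, adjust as needed.
    EXTRA_KEYS = ("input", "params", "error")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record, "%Y-%m-%d %H:%M:%S"),
            "name": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        }
        for key in self.EXTRA_KEYS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        # default=str stops non-JSON values (arrays, exceptions) from raising.
        return json.dumps(payload, default=str)

# Format a hand-built record to show the output shape.
record = logging.LogRecord(
    name="TelemetryLogger", level=logging.INFO, pathname="example.py",
    lineno=1, msg="Iteration %d complete", args=(3,), exc_info=None,
)
line = JsonLinesFormatter().format(record)
print(json.loads(line)["message"])  # Iteration 3 complete
```

Such a formatter would be attached with `handler.setFormatter(JsonLinesFormatter())` in place of the plain `logging.Formatter`.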

## Production worthy Kohonen Maps

- Make it a class
- Attach telemetry
- The private methods must reflect the steps in the algorithm
- Use NumPy broadcasts and vector algorithms for faster iterations
- Add inference endpoint.
- Add versioning for model code, param change, and retraining

In [9]:
from typing import Tuple, Optional, Callable
import numpy as np
from itertools import product
from tqdm import tqdm
import os, pickle


class KohonenMap:
    """
    Implementation of a Kohonen Self-Organizing Map (SOM).
    """

    def __init__(
        self,
        width: int,
        height: int,
        input_dim: int,
        model_version: int = 0, # Model build/ code version
        model_last_revision: int = 0, # Model training version
        max_iterations: int = 1000,
        learning_rate: float = 0.1,
        weights: Optional[np.ndarray] = None,
        telemetry: Telemetry = Telemetry(),
        checkpoint_dir: str = './checkpoints'
    ):
        """
        Initializes a Kohonen Map with the given dimensions and training parameters.

        :param width: Width of the map.
        :param height: Height of the map.
        :param input_dim: Number of dimensions of the input vectors.
        :param max_iterations: Maximum number of iterations for training.
        :param learning_rate: Initial learning rate.
        :param weights: Initial weights of the SOM. Randomly initialized if None.
        :param telemetry: Telemetry object for logging.
        :param checkpoint_dir: Directory where checkpoints will be saved.
        """
        self.model_code_version: int = 0 # Update this when model code changes
        self.model_version = model_version
        self.model_last_revision = model_last_revision
        self.width = width
        self.height = height
        self.input_dim = input_dim
        self.learning_rate = learning_rate
        self.weights = (
            weights
            if weights is not None
            else np.random.random((width, height, input_dim))
        )
        assert self.weights.shape == (
            width,
            height,
            input_dim,
        ), f"The weights must be of shape {(width, height, input_dim)}. The given is {self.weights.shape}"
        self.max_iterations = max_iterations
        self.meshgrid = np.meshgrid(np.arange(width), np.arange(height), indexing="ij")
        self.init_radius = max(width, height) / 2
        self.time_constant = max_iterations / np.log(self.init_radius)
        self.telemetry = telemetry
        self.checkpoint_dir = checkpoint_dir
        os.makedirs(self.checkpoint_dir, exist_ok=True)

    def __get_bmu(self, vector: np.ndarray) -> Tuple[int, int]:
        """
        Identifies the best matching unit (BMU) for a given input vector.

        :param vector: Input vector.
        :return: Tuple of indices for the BMU.
        """
        self.telemetry.log(
            "Entered __get_bmu function", logging.DEBUG, input=dict(vector=vector)
        )
        bmu = np.argmin(np.sum((self.weights - vector) ** 2, axis=2))
        return np.unravel_index(bmu, (self.width, self.height))

    def __calculate_influence(
        self, bmu: Tuple[int, int], current_radius: float
    ) -> np.ndarray:
        """
        Calculates the influence of the BMU over the map's neurons.

        :param bmu: Best matching unit (BMU) indices.
        :param current_radius: Current neighborhood radius.
        :return: Influence matrix.
        """
        self.telemetry.log(
            "Entered __calculate_influence function",
            logging.DEBUG,
            input=dict(bmu=bmu, current_radius=current_radius),
        )
        bmu_x, bmu_y = bmu
        coord_x, coord_y = self.meshgrid
        # Grid distance from every node to the BMU
        distance = np.sqrt((coord_x - bmu_x) ** 2 + (coord_y - bmu_y) ** 2)
        influence_decay = np.exp(-(distance**2) / (2 * current_radius**2))
        return influence_decay.reshape(
            influence_decay.shape + (1,) * (self.weights.ndim - influence_decay.ndim)
        )

    def __save_checkpoint(self, iteration: int):
        """
        Saves a checkpoint of the current model state.

        :param iteration: The current training iteration.
        """
        checkpoint_file = os.path.join(
            self.checkpoint_dir, f"checkpoint_{self.model_code_version}.{self.model_version}.{self.model_last_revision}.{iteration}.pkl"
        )
        checkpoint_data = {
            "iteration": iteration,
            "weights": self.weights,
            "learning_rate": self.learning_rate,
            "model_code_version": self.model_code_version,
            "model_version": self.model_version,
            "model_last_revision": self.model_last_revision,
        }
        with open(checkpoint_file, 'wb') as f:
            pickle.dump(checkpoint_data, f)
        self.telemetry.log(f"Checkpoint saved at iteration {iteration}", logging.INFO)

    def __load_checkpoint(self) -> Optional[int]:
        """
        Loads the most recent checkpoint if available.

        :return: The iteration number from the checkpoint or None if no checkpoint exists.
        """
        checkpoint_files = [f for f in os.listdir(self.checkpoint_dir) if f.startswith("checkpoint_")]
        if not checkpoint_files:
            return None

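        # os.listdir returns bare file names, so join with the directory before stat-ing
        latest_checkpoint = max(
            checkpoint_files,
            key=lambda f: os.path.getctime(os.path.join(self.checkpoint_dir, f)),
        )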
        checkpoint_file = os.path.join(self.checkpoint_dir, latest_checkpoint)
        with open(checkpoint_file, 'rb') as f:
            checkpoint_data = pickle.load(f)
        
        self.weights = checkpoint_data["weights"]
        self.learning_rate = checkpoint_data["learning_rate"]
        self.model_code_version = checkpoint_data["model_code_version"]
        self.model_version = checkpoint_data["model_version"]
        self.model_last_revision = checkpoint_data["model_last_revision"]
        
        iteration = checkpoint_data["iteration"]
        self.telemetry.log(f"Loaded checkpoint from iteration {iteration}", logging.INFO)
        return iteration

    def __delete_checkpoints(self):
        """
        Deletes all checkpoint files in the checkpoint directory.
        """
        try:
            checkpoint_files = [f for f in os.listdir(self.checkpoint_dir) if f.startswith(f"checkpoint_{self.model_code_version}.{self.model_version}.{self.model_last_revision}")]
            for file_name in checkpoint_files:
                os.remove(os.path.join(self.checkpoint_dir, file_name))
            self.telemetry.log("All checkpoints have been deleted.", logging.INFO)
        except Exception as e:
            self.telemetry.error(e)

    def __train(self, input_data: np.ndarray, training_checkpoint: int):
        """
        Trains the Kohonen map using the provided input data.

        :param input_data: Input data array.
        :param training_checkpoint: Save a checkpoint every this many iterations.
        """
        assert (
            len(input_data.shape) == 2 and input_data.shape[-1] == self.input_dim
        ), f"The input_data must be of shape (N, {self.input_dim}). The given is {input_data.shape}"

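        # Resume after the last completed iteration if a checkpoint exists
        last_completed = self.__load_checkpoint()
        start_iteration = 0 if last_completed is None else last_completed + 1

        for iteration in tqdm(
            range(start_iteration, self.max_iterations),
            total=self.max_iterations,
            initial=start_iteration,
            desc="Kohonen fitting iterations",
        ):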
            start_time = time.time()
            ## Kohonen fitting - start
            current_radius = self.init_radius * np.exp(-iteration / self.time_constant)
            current_learning_rate = self.learning_rate * np.exp(
                -iteration / self.time_constant
            )
            for vector in input_data:
                bmu = self.__get_bmu(vector)
                influence_decay = self.__calculate_influence(bmu, current_radius)
                self.weights += (
                    current_learning_rate * influence_decay * (vector - self.weights)
                )
            ## Kohonen fitting - end
            if iteration % training_checkpoint == 0:
                self.__save_checkpoint(iteration)
            elapsed_time = time.time() - start_time
            telemetry_message = f"Iteration {iteration}/{self.max_iterations} complete in {elapsed_time:.5f}s"
            self.telemetry.log(telemetry_message)
            self.telemetry.log(
                telemetry_message,
                level=logging.DEBUG,
                params=dict(
                    iteration=iteration,
                    total_iterations=self.max_iterations,
                    kohonen_map_size=(self.width, self.height),
                    input_data_shape=input_data.shape,
                    init_radius=self.init_radius,
                    time_constant=self.time_constant,
                    current_radius=current_radius,
                    current_learning_rate=current_learning_rate,
                    elapsed_time=elapsed_time,
                ),
            )
        self.__delete_checkpoints()
        return self.weights

    def train(self, input_data: np.ndarray, training_checkpoint: int = 100):
        """
        Public method to start the training process.

        :param input_data: Input data array.
        """
        try:
            self.telemetry.log(
                "Entered train function",
                level=logging.DEBUG,
                input=dict(
                    input_data_shape=input_data.shape,
                    training_checkpoint=training_checkpoint,
                ),
            )
            start_time = time.time()
            
            result = self.__train(input_data, training_checkpoint)
            self.model_last_revision += 1

            elapsed_time = time.time() - start_time
            self.telemetry.log(f"Training completed in {elapsed_time}s")
            return result
        except Exception as e:
            self.telemetry.error(e)
            raise e
            

    def infer(self, input_data: np.ndarray) -> list[Tuple[int, int, float]]:
        """
        Infers the best matching unit (BMU) for each input vector.

        :param input_data: Input data array of shape (N, input_dim).
        :return: One tuple per input vector: the BMU's position and the Euclidean distance from the vector to the BMU.
        """
        try:
            self.telemetry.log(
                f"Entered infer function with model ({self.model_version}.{self.model_last_revision})",
                level=logging.DEBUG,
                input=dict(input_data_shape=input_data.shape),
            )
            assert len(input_data.shape) == 2 and input_data.shape[-1] == self.input_dim, f'The input vector MUST be of shape (N, {self.input_dim}). The provided is {input_data.shape}'
            result: list[Tuple[int, int, float]] = []
            for vector in input_data:
                bmu_index = np.argmin(np.linalg.norm(self.weights - vector, axis=2))
                bmu_coords = np.unravel_index(bmu_index, (self.width, self.height))
                euclidean_distance = np.linalg.norm(self.weights[bmu_coords] - vector)
                result.append((*bmu_coords, euclidean_distance))
            return result
        except Exception as e:
            self.telemetry.error(e)
            raise e
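
The per-vector loop in `infer` could also be vectorised. The sketch below is a standalone function rather than a method of the class; the name `infer_batch` and its return layout (three arrays instead of a list of tuples) are assumptions. It builds an `(N, nodes, input_dim)` intermediate array, so memory use grows with both the batch size and the map size.

```python
import numpy as np

def infer_batch(weights: np.ndarray, input_data: np.ndarray):
    """Return BMU x indices, y indices and distances for every input vector."""
    width, height, input_dim = weights.shape
    flat_weights = weights.reshape(-1, input_dim)  # (nodes, input_dim)
    # Pairwise differences: (N, 1, dim) - (1, nodes, dim) -> (N, nodes, dim)
    diffs = input_data[:, np.newaxis, :] - flat_weights[np.newaxis, :, :]
    distances = np.linalg.norm(diffs, axis=2)      # (N, nodes)
    flat_bmu = np.argmin(distances, axis=1)        # (N,)
    xs, ys = np.unravel_index(flat_bmu, (width, height))
    bmu_distances = distances[np.arange(len(input_data)), flat_bmu]
    return xs, ys, bmu_distances

weights = np.zeros((2, 2, 3))
weights[0, 1] = [1.0, 0.0, 0.0]
weights[1, 1] = [0.0, 1.0, 0.0]
inputs = np.array([[1.0, 0.0, 0.0], [0.0, 0.9, 0.0]])
xs, ys, dists = infer_batch(weights, inputs)
print(list(zip(xs.tolist(), ys.tolist())))  # [(0, 1), (1, 1)]
```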

## Model Registry

Store the model based on its version, environment, and revision.

In [11]:
import pickle

class KohonenMapRegistry:
    def __init__(
        self,
        dir_path: str,
        environment: str = 'dev',
        artefact_base_name: str = 'kohonen',
    ):
        # Registry directory
        self.__dir_path = os.path.join(dir_path, environment)
        self.__environment = environment
        self.__artefact_base_name = artefact_base_name
        os.makedirs(self.__dir_path, exist_ok=True)

    def save(self, kohonenMap: KohonenMap):
        version = kohonenMap.model_version
        revision = kohonenMap.model_last_revision
        code_version = kohonenMap.model_code_version
        model_artefact = dict(
            width=kohonenMap.width,
            height=kohonenMap.height,
            input_dim=kohonenMap.input_dim,
            max_iterations=kohonenMap.max_iterations,
            learning_rate=kohonenMap.learning_rate,
            weights=kohonenMap.weights,
            build_version=version,
            code_version=code_version,
            training_revision=revision,
            execution_environment=self.__environment,
        )
        model_artefact_file_name = f"{self.__artefact_base_name}-{self.__environment}-{code_version}.{version}.{revision}.pkl"
        file_path = os.path.join(self.__dir_path, model_artefact_file_name)
        if os.path.exists(file_path):
            raise FileExistsError(f"The file {file_path} already exists.")
        with open(file_path, 'wb') as f:
            pickle.dump(model_artefact, f)
        print(f"Model artefact saved at {file_path}")
        
    def load(self, code_version: int, version: int, revision: int) -> KohonenMap:
        model_artefact_file_name = f"{self.__artefact_base_name}-{self.__environment}-{code_version}.{version}.{revision}.pkl"
        file_path = os.path.join(self.__dir_path, model_artefact_file_name)
        if not os.path.exists(file_path):
            raise FileNotFoundError(f"The file {file_path} does not exist.")
        with open(file_path, 'rb') as f:
            model_artefact = pickle.load(f)
        kohonenMap = KohonenMap(
            width=model_artefact['width'],
            height=model_artefact['height'],
            input_dim=model_artefact['input_dim'],
            max_iterations=model_artefact['max_iterations'],
            learning_rate=model_artefact['learning_rate'],
            weights=model_artefact['weights'],
            model_version=version,
            model_last_revision=revision
        )
        print(f"Model artefact loaded from {file_path}")
        return kohonenMap

    def list(self) -> list[Tuple[int, int, int]]:
        artefacts = []
        for file_name in os.listdir(self.__dir_path):
            if file_name.startswith(self.__artefact_base_name) and file_name.endswith('.pkl'):
                parts = file_name.split('-')
                env, version_revision = parts[1], parts[2]
                if env == self.__environment:
                    code_version, version, revision = map(int, version_revision.split('.')[0:3])
                    artefacts.append((code_version, version, revision))
        
        # Newest first: order by code version, then version, then revision
        artefacts.sort(reverse=True)
        return artefacts
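
The artefact naming scheme used by the registry (`base-environment-code.version.revision.pkl`) can also be parsed with a regular expression, which tolerates hyphens in the base name. The sketch below is standalone; `ARTEFACT_PATTERN` and `latest_artefact` are hypothetical names, and the file names are made up for the example. Comparing integer tuples rather than strings keeps revision 10 ahead of revision 1.

```python
import re

# Matches names such as "kohonen-dev-0.1.2.pkl".
ARTEFACT_PATTERN = re.compile(
    r"^(?P<base>.+)-(?P<env>[^-]+)-(?P<code>\d+)\.(?P<version>\d+)\.(?P<revision>\d+)\.pkl$"
)

def latest_artefact(file_names, base_name="kohonen", environment="dev"):
    """Return the highest (code_version, version, revision), or None if none match."""
    versions = []
    for file_name in file_names:
        match = ARTEFACT_PATTERN.match(file_name)
        if match and match["base"] == base_name and match["env"] == environment:
            versions.append(
                (int(match["code"]), int(match["version"]), int(match["revision"]))
            )
    return max(versions, default=None)

names = [
    "kohonen-dev-0.0.1.pkl",
    "kohonen-dev-0.0.10.pkl",
    "kohonen-prod-1.0.0.pkl",
    "notes.txt",
]
print(latest_artefact(names))  # (0, 0, 10)
```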

In [7]:
if __name__ == "__main__":
    # Generate data
    w, h = (400, 400)
    iterations = 100
    input_data = np.random.random((40, 3))
    init_weights = np.random.random((w, h, 3))
    plt.imsave(f"images/init{w}.png", init_weights)
    
    # p_train is not defined in the cells above; e_train (In [2]) has the same signature
    image_data = p_train(input_data, iterations, w, h, init_weights)
    plt.imsave(f"images/p{w}.png", image_data)

    # image_data = e_train(input_data, iterations, w, h, init_weights)
    # plt.imsave("e100.png", image_data)

    # image_data = train(input_data, iterations, w, h, init_weights)
    # plt.imsave("100.png", image_data)

    # Generate data
    # input_data = np.random.random((10,3))
    # image_data = train(input_data, 1000, 100, 100)

    # plt.imsave('1000.png', image_data)
