
<table style="border-collapse: collapse; width: 50%; font-family: Arial;">
  <tr style="background-color: #4CAF50; color: white;">
    <th style="border: 1px solid #ddd; padding: 8px;">Subject</th>
    <th style="border: 1px solid #ddd; padding: 8px;">Group</th>
  </tr>
  <tr>
    <td style="border: 1px solid #ddd; padding: 8px;">Subject 1</td>
    <td style="border: 1px solid #ddd; padding: 8px;">Healthy Control</td>
  </tr>
  <tr style="background-color: #f2f2f2;">
    <td style="border: 1px solid #ddd; padding: 8px;">Subject 2</td>
    <td style="border: 1px solid #ddd; padding: 8px;">Patient</td>
  </tr>
  <tr>
    <td style="border: 1px solid #ddd; padding: 8px;">Subject 3</td>
    <td style="border: 1px solid #ddd; padding: 8px;">Healthy Control</td>
  </tr>
  <tr style="background-color: #f2f2f2;">
    <td style="border: 1px solid #ddd; padding: 8px;">Subject 4</td>
    <td style="border: 1px solid #ddd; padding: 8px;">Patient</td>
  </tr>
</table>


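```python
import pickle

import numpy as np
from sklearn.preprocessing import MinMaxScaler


class SOM:
    """Minimal rectangular Self-Organizing Map with a Gaussian neighborhood."""

    def __init__(self, grid_size, input_size, lr=0.5, sigma=None):
        self.grid_size = grid_size
        self.input_size = input_size
        self.lr = lr
        # Default neighborhood radius: half of the larger grid dimension.
        self.sigma = sigma if sigma is not None else max(grid_size) / 2
        # One weight vector per node, initialized uniformly in [0, 1)
        # to match the MinMaxScaler-scaled inputs.
        self.weights = np.random.rand(*grid_size, input_size)
        # (row, col) coordinates of every node, in row-major order.
        self.coords = np.array([[i, j] for i in range(grid_size[0]) for j in range(grid_size[1])])

    def find_bmu(self, sample):
        """Return the (row, col) index of the best-matching unit (closest node) for `sample`."""
        dist = np.linalg.norm(self.weights - sample, axis=2)
        return np.unravel_index(np.argmin(dist), dist.shape)

    def train(self, data, epochs=100, show_progress=True):
        for epoch in range(epochs):
            # Learning rate and neighborhood radius decay exponentially per epoch.
            lr = self.lr * np.exp(-epoch / epochs)
            sigma = self.sigma * np.exp(-epoch / epochs)

            for sample in data:
                bmu = self.find_bmu(sample)
                # Gaussian influence from each node's grid distance to the BMU.
                dists = np.linalg.norm(self.coords - bmu, axis=1)
                influence = np.exp(-dists**2 / (2 * sigma**2))

                # Pull every node toward the sample, weighted by its influence.
                # self.coords is row-major, so reshaping restores the grid layout.
                self.weights += lr * influence.reshape(self.grid_size)[..., np.newaxis] * (sample - self.weights)

            if show_progress and (epoch + 1) % 10 == 0:
                print(f"Epoch {epoch + 1}/{epochs} completed")


def load_data(healthy_file, patient_file):
    """Load both groups, label them (0 = healthy, 1 = patient) and min-max scale the features."""
    # ndmin=2 keeps a single-row file as a 2-D (1, n_features) array.
    healthy = np.loadtxt(healthy_file, ndmin=2)
    patient = np.loadtxt(patient_file, ndmin=2)
    labels = np.concatenate([np.zeros(len(healthy)), np.ones(len(patient))])
    data = np.vstack([healthy, patient])
    scaler = MinMaxScaler()
    return scaler.fit_transform(data), labels, scaler


def assign_labels_to_bmu(som, data, labels):
    """Label each node by majority vote of the training samples mapped to it.

    Ties (a mean of exactly 0.5) round to 0 (Healthy), because round() rounds
    half to even. Nodes that no training sample maps to receive no label.
    """
    bmu_map = {}
    for sample, label in zip(data, labels):
        bmu = som.find_bmu(sample)
        bmu_map.setdefault(bmu, []).append(label)
    return {bmu: int(round(np.mean(lbls))) for bmu, lbls in bmu_map.items()}


def predict_new_data(txt_file, model_path="trained-som.pkl"):
    """Classify each row of `txt_file` with its BMU's label (-1 if that node is unlabeled)."""
    # Unpickling the SOM requires the SOM class to be defined in the current session.
    with open(model_path, "rb") as f:
        saved = pickle.load(f)

    # ndmin=2 keeps a single-sample file 2-D, as MinMaxScaler.transform expects.
    new_data = np.loadtxt(txt_file, ndmin=2)
    scaled = saved["scaler"].transform(new_data)
    preds = [saved["bmu_label_map"].get(saved["som"].find_bmu(sample), -1) for sample in scaled]

    for i, pred in enumerate(preds):
        label = {0: "Healthy", 1: "Patient"}.get(pred, "Unknown")
        print(f"Sample {i + 1}: Predicted = {label}")

    return preds


if __name__ == "__main__":
    GRID_SIZE = (3, 3)
    EPOCHS = 100
    LEARNING_RATE = 0.5

    # One sample per row, whitespace-separated features (np.loadtxt's default).
    data, labels, scaler = load_data("control.txt", "patient.txt")
    som = SOM(GRID_SIZE, data.shape[1], LEARNING_RATE)
    som.train(data, epochs=EPOCHS)

    bmu_label_map = assign_labels_to_bmu(som, data, labels)

    print("\nBMU to Label Mapping:")
    for bmu, label in bmu_label_map.items():
        print(f"BMU at {bmu}: {'Patient' if label else 'Healthy'}")

    # Save everything needed for prediction: the SOM, the fitted scaler and the node labels.
    with open("trained-som.pkl", "wb") as f:
        pickle.dump({"som": som, "scaler": scaler, "bmu_label_map": bmu_label_map}, f)

    print("\nModel saved")
```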


```text
Epoch 10/100 completed
Epoch 20/100 completed
Epoch 30/100 completed
Epoch 40/100 completed
Epoch 50/100 completed
Epoch 60/100 completed
Epoch 70/100 completed
Epoch 80/100 completed
Epoch 90/100 completed
Epoch 100/100 completed

BMU to Label Mapping:
BMU at (np.int64(2), np.int64(0)): Healthy
BMU at (np.int64(2), np.int64(1)): Healthy
BMU at (np.int64(1), np.int64(0)): Healthy
BMU at (np.int64(0), np.int64(2)): Patient
BMU at (np.int64(0), np.int64(1)): Patient
BMU at (np.int64(1), np.int64(2)): Patient

Model saved
```


```python
predict_new_data("khalil.txt")
```

```text
Sample 1: Predicted = Patient
Sample 2: Predicted = Patient
Sample 3: Predicted = Healthy
Sample 4: Predicted = Healthy
```

```text
[1, 1, 0, 0]
```
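In the training run above, only six of the nine nodes in the 3x3 grid were the BMU for at least one training sample. Nodes (0, 0), (1, 1) and (2, 2) therefore have no label, and `predict_new_data` would report a new sample landing on one of them as `-1` ("Unknown"). The toy sketch below isolates the labeling and lookup steps to show this. It uses a hypothetical hand-written 1x3 codebook, not the trained model, and the helper names are illustrative.

```python
import numpy as np

# Hypothetical 1x3 codebook (one 2-feature weight vector per node).
# In the notebook these values would come from a trained SOM's `weights`.
weights = np.array([[[0.1, 0.1], [0.5, 0.5], [0.9, 0.9]]])


def find_bmu(weights, sample):
    """Return the (row, col) grid index of the node closest to `sample`."""
    dist = np.linalg.norm(weights - sample, axis=2)
    row, col = np.unravel_index(np.argmin(dist), dist.shape)
    return int(row), int(col)


def label_nodes(weights, data, labels):
    """Majority-vote label per node; nodes never chosen as BMU get no entry."""
    votes = {}
    for sample, label in zip(data, labels):
        votes.setdefault(find_bmu(weights, sample), []).append(label)
    return {node: int(round(np.mean(v))) for node, v in votes.items()}


def predict(weights, node_labels, samples):
    """Look up each sample's BMU label, falling back to -1 for unlabeled nodes."""
    return [node_labels.get(find_bmu(weights, s), -1) for s in samples]


# The training samples only ever land on the two outer nodes.
train_x = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.0]])
train_y = np.array([0, 0, 1, 1])
node_labels = label_nodes(weights, train_x, train_y)
print(node_labels)  # {(0, 0): 0, (0, 2): 1}

# A new sample closest to the unlabeled middle node gets -1 ("Unknown").
print(predict(weights, node_labels, np.array([[0.05, 0.05], [0.5, 0.45]])))  # [0, -1]
```

With a small grid and few samples this gap is easy to hit; a larger training set or a fallback such as using the nearest *labeled* node would reduce "Unknown" predictions, at the cost of changing the method.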

# Comparing Key Learning Techniques in Machine Learning

## 1. **Self-Organizing Map (SOM / Kohonen Network)**
- **Type**: Unsupervised learning, great for **dimensionality reduction** and **data visualization**.
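- **How it learns**: Uses **competitive learning**. For each input, the node whose weight vector is closest (the best-matching unit, or BMU) wins, and the BMU together with its neighbors on the 2D grid is moved toward the input. Because neighbors move together, nearby nodes come to represent similar inputs, which preserves the **topological relationships** between data points.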
- **Use case**: Ideal when you want to **visualize clusters** and **detect patterns** in high-dimensional data.
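The `SOM.train` method in the notebook implements this competitive, neighborhood-weighted update. Written out from that code, with $\eta_0$ = `lr`, $\sigma_0$ = `sigma`, $T$ = `epochs`, $t$ the epoch index, $\mathbf{w}_k$ the weight vector of node $k$ and $\mathbf{r}_k$ its grid coordinates, one update for a sample $\mathbf{x}$ reads:

$$
\begin{aligned}
c &= \arg\min_k \lVert \mathbf{x} - \mathbf{w}_k \rVert \\
\eta(t) &= \eta_0\, e^{-t/T}, \qquad \sigma(t) = \sigma_0\, e^{-t/T} \\
h_{ck}(t) &= \exp\!\left(-\frac{\lVert \mathbf{r}_k - \mathbf{r}_c \rVert^2}{2\,\sigma(t)^2}\right) \\
\mathbf{w}_k &\leftarrow \mathbf{w}_k + \eta(t)\, h_{ck}(t)\,(\mathbf{x} - \mathbf{w}_k)
\end{aligned}
$$

Because $h_{ck}$ equals 1 at the BMU $c$ and decays with grid distance, grid neighbors are pulled toward similar inputs. With this schedule, $\eta$ and $\sigma$ only decay to about $e^{-1} \approx 0.37$ of their initial values by the last epoch.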

---

## 2. **K-Means Clustering**
- **Type**: Classic **unsupervised clustering** algorithm.
- **How it works**: Assigns data points to **K clusters** based on their distance to **cluster centroids**, which are updated iteratively.
- **Topology awareness**: Unlike SOM, K-Means doesn't preserve **spatial structure** or **topology**: its centroids have no neighborhood relation to one another, so it is purely about **grouping by similarity**.
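For comparison with the SOM code above, the iterative centroid update can be sketched in NumPy using Lloyd's algorithm. This is a minimal illustration under stated assumptions: random-sample initialization, a fixed iteration cap, and a hypothetical `kmeans` helper. It is not scikit-learn's `KMeans`, which uses k-means++ initialization by default.

```python
import numpy as np


def kmeans(data, k, n_iter=100, seed=0):
    """Minimal Lloyd's-algorithm K-Means. Returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct samples chosen at random.
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Update step: move each centroid to the mean of its points
        # (a centroid with no points keeps its previous position).
        new_centroids = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, labels


# Two well-separated blobs: points in the same blob should share a label.
points = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                   [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
centroids, labels = kmeans(points, k=2)
print(labels)
```

Only the winning centroid of each point is affected by that point; there is no grid and no neighborhood term, which is exactly the topology difference described above. The numeric cluster IDs depend on the initialization, so only the grouping is meaningful.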

---

## 3. **K-Nearest Neighbors (KNN)**
- **Type**: **Supervised learning** used for both **classification** and **regression**.
- **How it works**: Given a new point, it looks at the **K closest labeled examples** and predicts their majority class (for regression, the average of their target values).
- **Lazy learner**: KNN doesn't learn an internal model; it simply stores the training data and does all of its work (distance computation and voting) at prediction time.
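The prediction-time behavior described above can be sketched as a small NumPy function. There is no training step beyond keeping the labeled data, and each query is classified by a majority vote among its K nearest stored points, assuming Euclidean distance. `knn_predict` is a hypothetical helper, not scikit-learn's `KNeighborsClassifier`.

```python
import numpy as np


def knn_predict(train_x, train_y, queries, k=3):
    """Classify each query by majority vote among its k nearest training points."""
    preds = []
    for q in queries:
        # Distance from the query to every stored training example.
        dists = np.linalg.norm(train_x - q, axis=1)
        nearest = np.argsort(dists)[:k]
        # Majority vote; bincount + argmax breaks ties toward the smaller label.
        preds.append(int(np.argmax(np.bincount(train_y[nearest]))))
    return preds


# "Training" is just keeping the labeled examples around.
train_x = np.array([[0.0, 0.0], [0.1, 0.1], [0.0, 0.2],
                    [1.0, 1.0], [0.9, 1.0], [1.0, 0.8]])
train_y = np.array([0, 0, 0, 1, 1, 1])  # 0 = Healthy, 1 = Patient, as in the notebook

print(knn_predict(train_x, train_y, np.array([[0.05, 0.1], [0.95, 0.9]])))  # [0, 1]
```

All the cost sits in `knn_predict`: every query is compared against every stored example, which is why KNN is called a lazy learner.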

---

> **Summary**:
- **SOM** = maps structure + patterns (unsupervised + topology-aware)
- **K-Means** = finds groups without preserving structure (unsupervised)
- **KNN** = compares with known labels at prediction time (supervised)
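The SOM versus K-Means line in this summary comes down to the neighborhood term. The sketch below applies a single step of the same Gaussian-neighborhood rule used in `SOM.train`, on a hypothetical 1x3 grid with 1-D weights. A wide neighborhood also pulls the BMU's grid neighbors toward the sample. A vanishingly small one moves only the BMU, which is a winner-take-all update in the style of online K-Means, with no topology.

```python
import numpy as np


def som_step(weights, coords, sample, lr, sigma):
    """One Gaussian-neighborhood SOM update on a flat (n_nodes, n_features) codebook."""
    bmu = np.argmin(np.linalg.norm(weights - sample, axis=1))
    grid_dist = np.linalg.norm(coords - coords[bmu], axis=1)
    influence = np.exp(-grid_dist**2 / (2 * sigma**2))
    return weights + lr * influence[:, None] * (sample - weights)


coords = np.array([[0, 0], [0, 1], [0, 2]])  # node positions on the grid
weights = np.array([[0.0], [0.5], [1.0]])    # hypothetical 1-D weight per node
sample = np.array([0.1])                     # closest to node 0, so node 0 is the BMU

wide = som_step(weights, coords, sample, lr=0.5, sigma=1.0)
narrow = som_step(weights, coords, sample, lr=0.5, sigma=1e-3)

print(wide.ravel())    # all three nodes move toward 0.1
print(narrow.ravel())  # only node 0 moves (to 0.05); nodes 1 and 2 are unchanged
```

Shrinking `sigma` over training, as `SOM.train` does, moves the map from the first regime toward the second.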
