This file implements training and testing of a Kohonen network. The instructor provided three files. The ‘healthy.mat’ file contains data from healthy subjects and the ‘patient.mat’ file contains patient data; ‘ubaid.mat’, loaded below, is used for testing. Each row corresponds to the data (a time series) from one subject. The time series is made up of the displacements of markers placed on the subject's joints. There are ten subjects in each of the healthy and patient files, and the same markers are used for all subjects. No per-series adjustment is needed, because the information from each marker has already been placed at the correct position in the time series.



First, let's import all the relevant libraries.

In [1]:
import numpy as np
from minisom import MiniSom
from scipy.io import loadmat

Then I loaded all the .mat files and extracted the relevant arrays.

In [2]:
data = loadmat('healthy.mat')
data2 = loadmat('patient.mat')
testing = loadmat('ubaid.mat')

# Uncomment to inspect the variable names stored in a .mat file
#keys = testing.keys()
#print(keys)

healthy_data = data['x']
patient_data = data2['x']
testing_data = testing['z']

print(testing_data)
print(patient_data)
print(healthy_data)


[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]


Then I concatenated healthy_data and patient_data into a single array, all_data, and took the length of each time series from the second dimension of its shape. I normalized all_data and testing_data by dividing each array by its own norm. For a 2D array, np.linalg.norm returns the Frobenius norm by default, i.e. the Euclidean norm of the whole array treated as one long vector, so each array is scaled by a single constant.
To define the dimensions of the Kohonen network, I set input_shape to the length of each time series and network_shape to a tuple giving the grid size (8 x 8 here). I then created an instance of the MiniSom class, which represents the Kohonen network.

In [3]:
all_data = np.concatenate((healthy_data, patient_data), axis=0)

# Get the length of each time series
time_series_length = all_data.shape[1]

# Normalize each array by its own Frobenius norm (one scalar per array)
all_data = all_data / np.linalg.norm(all_data)
normalized_testing = testing_data / np.linalg.norm(testing_data)

# Define the dimensions of the Kohonen network
input_shape = time_series_length
network_shape = (8, 8)  # Adjust the network shape as needed

# Create the Kohonen network
network = MiniSom(network_shape[0], network_shape[1], input_shape, sigma=0.5, learning_rate=0.5)
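
The normalization above divides each array by one scalar, so the training and testing arrays end up scaled by different constants. A common alternative, shown here only as a sketch and not as what this notebook does, is to scale each time series (row) to unit length. The helper name `normalize_rows` and the toy values are assumptions made for illustration.

```python
import numpy as np

def normalize_rows(x, eps=1e-12):
    """Scale each row (one time series) to unit Euclidean length."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Guard against all-zero rows to avoid division by zero
    return x / np.maximum(norms, eps)

# Toy data standing in for the real recordings (values are made up)
toy = np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 0.0]])

whole = toy / np.linalg.norm(toy)  # as in the cell above: one scalar for the whole array
per_row = normalize_rows(toy)      # alternative: one scalar per time series
```

With per-row scaling, the first two toy rows become identical because they differ only in amplitude; whether that is desirable depends on whether amplitude carries information in the marker data.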

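With everything set up, the next step is training. I first initialized the weights with random_weights_init, which picks random samples from the data as the starting weights, set the number of iterations to 1000 (this value is adjustable), and then trained the network on all_data.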

In [4]:
# Initialize the weights from randomly chosen training samples
network.random_weights_init(all_data)

# Train the network
num_iterations = 1000  # Adjust the number of iterations as needed
network.train_batch(all_data, num_iterations)

Next comes testing. I initialized an array of zeros with one entry per testing sample and looped over the normalized testing samples. Each sample is flattened to a 1D array with flatten() to match the input shape expected by the winner method of the Kohonen network, which returns the (row, col) grid coordinates of the winning neuron. I then converted those coordinates to a single index in a flattened (row-major) view of the network: the row index multiplied by the number of columns (network_shape[1]), plus the column index.
Lastly, I stored the calculated winner_index in the i-th element of the winners array, so that it records the winning neuron for each testing sample.
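
The index arithmetic described above can be checked in isolation. The small sketch below uses two hypothetical helpers, `flatten_index` and `unflatten_index`, which are not part of MiniSom, and assumes an 8-column grid as in this notebook.

```python
def flatten_index(row, col, n_cols):
    """Row-major index of grid cell (row, col) in a grid with n_cols columns."""
    return row * n_cols + col

def unflatten_index(index, n_cols):
    """Inverse of flatten_index: recover (row, col) from the flat index."""
    return divmod(index, n_cols)

n_cols = 8  # matches network_shape[1] in this notebook
idx = flatten_index(2, 3, n_cols)
print(idx)                           # 19
print(unflatten_index(idx, n_cols))  # (2, 3)
```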

In [9]:
winners = np.zeros(len(normalized_testing), dtype=int)  # one flat neuron index per testing sample
for i, sample in enumerate(normalized_testing):
    row, col = network.winner(sample.flatten())  # Get the row and column coordinates of the winner
    winner_index = row * network_shape[1] + col  # Calculate the index of the winner in a flattened network
    winners[i] = winner_index

# Define the class labels for healthy and patient
class_labels = ['Healthy', 'Patient']

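# Map the winners to the class labels.
# Note: the modulo rule below (even index -> 'Healthy', odd -> 'Patient') is a placeholder;
# it does not use where the healthy and patient training samples fall on the map.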
predicted_labels = [class_labels[int(winner_index) % len(class_labels)] for winner_index in winners]

print("Predicted labels for testing samples:")
print(predicted_labels)

Predicted labels for testing samples:
['Healthy', 'Healthy', 'Healthy', 'Healthy']
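
The modulo rule in the cell above ties the label to the parity of the neuron index, which is unrelated to the training labels. One common alternative is to label each neuron by majority vote of the training samples that win it, and then look up the winning neuron of each testing sample. The sketch below is an assumption about how that could be done, not the method used in this notebook; `label_neurons` and `predict` are hypothetical helpers, and the winner coordinates are made up to stand in for the results of network.winner(sample).

```python
from collections import Counter, defaultdict

def label_neurons(train_winners, train_labels):
    """Map each winning neuron (row, col) to the majority label of its training samples."""
    votes = defaultdict(Counter)
    for coord, label in zip(train_winners, train_labels):
        votes[coord][label] += 1
    return {coord: counts.most_common(1)[0][0] for coord, counts in votes.items()}

def predict(test_winners, neuron_labels, default="Unknown"):
    """Look up each test winner; neurons that no training sample hit get `default`."""
    return [neuron_labels.get(coord, default) for coord in test_winners]

# Made-up winner coordinates, standing in for network.winner(sample) results
train_winners = [(0, 0), (0, 0), (0, 1), (7, 7), (7, 7), (7, 6)]
train_labels = ["Healthy", "Healthy", "Healthy", "Patient", "Patient", "Patient"]

neuron_labels = label_neurons(train_winners, train_labels)
predictions = predict([(0, 0), (7, 7), (3, 3)], neuron_labels)
print(predictions)  # ['Healthy', 'Patient', 'Unknown']
```

Returning a default for unlabeled neurons makes it visible when a testing sample lands on a part of the map that no training sample reached, instead of silently assigning a class.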


The main differences between the Kohonen network, K-means, and K-nearest neighbor (KNN) techniques are as follows:

### Kohonen Network

*   Also known as a self-organizing map (SOM); it uses unsupervised learning to produce a low-dimensional (typically 2D) representation of the input space.
*   It clusters similar data points together in the same region of the map.
*   It creates a topological structure that preserves the neighborhood relationships between the input data points: inputs that are close in the original space tend to map to nearby neurons.
*   It is useful for visualizing high-dimensional data and identifying patterns in the data.
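
The points above can be made concrete with a sketch of a single SOM training step: find the best matching unit (BMU) for a sample, then pull the BMU and its grid neighbors toward that sample. This is a simplified illustration with a Gaussian neighborhood, not MiniSom's internal code; the function name `som_step` and the toy grid are assumptions.

```python
import numpy as np

def som_step(weights, sample, learning_rate=0.5, sigma=1.0):
    """One SOM update on a (rows, cols, dim) weight grid; returns the BMU and new weights."""
    rows, cols, _ = weights.shape
    # Best matching unit: the neuron whose weight vector is closest to the sample
    dists = np.linalg.norm(weights - sample, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    # Gaussian neighborhood measured on the grid, not in input space
    r, c = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    grid_dist_sq = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
    h = np.exp(-grid_dist_sq / (2 * sigma ** 2))
    # Move every neuron toward the sample, scaled by its neighborhood weight
    new_weights = weights + learning_rate * h[..., None] * (sample - weights)
    return bmu, new_weights

rng = np.random.default_rng(0)
weights = rng.random((4, 4, 3))      # toy 4x4 map with 3-dimensional inputs
sample = np.array([0.2, 0.9, 0.4])
bmu, updated = som_step(weights, sample)
```

Because neighbors of the BMU are updated together, nearby neurons end up with similar weight vectors, which is where the topology-preserving property comes from.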

### K-means

*   A clustering algorithm that partitions data into K distinct clusters based on their similarity.
*   It requires the user to specify the number of clusters (K) in advance.
*   It tries to minimize the sum of squared distances between points and their assigned cluster centers.
*   It is computationally efficient and works well with large datasets.
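
As a rough illustration of the points above, the following sketch implements the basic K-means loop (Lloyd's algorithm) with NumPy: assign each point to its nearest center, then move each center to the mean of its points. The function name `kmeans` and the toy points are assumptions for illustration; a library implementation would normally be preferred in practice.

```python
import numpy as np

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assignment = dists.argmin(axis=1)
        # Move each center to the mean of its points (keep the old center if it has none)
        centers = np.array([
            points[assignment == j].mean(axis=0) if np.any(assignment == j) else centers[j]
            for j in range(k)
        ])
    # Final assignment against the final centers
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)

# Two well-separated toy groups (values are made up)
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, assignment = kmeans(points, k=2)
```

Unlike the Kohonen network, the centers here have no grid structure: each one is updated independently of the others.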

### K-nearest neighbor (KNN)

*   A classification algorithm that classifies data points based on their proximity to other data points in the feature space.
*   It requires the user to specify the number of nearest neighbors (K) to consider.
*   It assigns the class label of the majority of the K nearest neighbors to the new data point.
*   It is a lazy learning algorithm, meaning it has no explicit training phase; it stores the training dataset and relies on all of it at classification time.
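
The voting rule described above fits in a few lines of plain Python. This is a minimal sketch using Euclidean distance and an unweighted majority vote; `knn_predict` and the toy points are hypothetical and only meant to illustrate the idea.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort training indices by Euclidean distance to the query
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Made-up labeled points in a 2D feature space
train_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
train_labels = ["Healthy", "Healthy", "Healthy", "Patient", "Patient", "Patient"]

print(knn_predict(train_points, train_labels, (0.5, 0.5)))  # Healthy
print(knn_predict(train_points, train_labels, (9, 9)))      # Patient
```

Note that all the work happens inside the prediction call, which is what "lazy learning" refers to; unlike the Kohonen network and K-means, KNN needs labeled training data.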

In summary, the Kohonen network is a self-organizing map that creates a low-dimensional representation of the input space, while K-means is a clustering algorithm that partitions data into K distinct clusters based on similarity, and KNN is a classification algorithm that assigns the class label of the majority of the K nearest neighbors to a new data point.
