# Self-Organizing Map

To train a SOM, we follow these steps:

1. Choose the grid size.
2. Initialize each node with a random weight vector W of shape (1, n_cols), where n_cols is the number of features.
3. Choose an input vector X from the data.
4. Calculate the Euclidean distance between the input components Xi and the weights Wj,i of each node j; the node with the minimum distance is the BMU.
5. Update the weights: w(t+1) = w(t) + h(t) * lr(t) * (x - w(t))
6. Repeat steps 3-5 for a specified number of iterations, adjusting the learning rate and neighborhood function over time as needed.


Where:
* **Wj,i**: the weights of node j (for node 1: W1,1, W1,2, ..., W1,n_cols).
* **BMU** (Best Matching Unit): the node with the minimum Euclidean distance to the input X.
* **h**: the neighborhood value. The closer a node is to the BMU, the higher h will be.
* **lr**: the learning rate.
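
The steps above can be sketched as a short NumPy training loop. This is an illustrative sketch, not MiniSom's implementation: the function name `train_som`, the linear decay of the learning rate and radius, and the choice to measure neighborhood distance between grid coordinates are all assumptions made for the example.

```python
import numpy as np

def train_som(data, grid_x, grid_y, n_iterations=1000, lr0=0.5, radius0=None, seed=0):
    """Minimal SOM training loop (hypothetical helper, for illustration only)."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    if radius0 is None:
        radius0 = max(grid_x, grid_y) / 2  # assumed starting radius

    # Step 2: one random weight vector per node
    weights = rng.random((grid_x, grid_y, n_features))

    # Grid coordinates of every node, shape (grid_x, grid_y, 2)
    coords = np.stack(
        np.meshgrid(np.arange(grid_x), np.arange(grid_y), indexing="ij"), axis=-1
    )

    for t in range(n_iterations):
        # Step 6: shrink learning rate and radius over time (assumed linear decay)
        frac = 1 - t / n_iterations
        lr = lr0 * frac
        radius = max(radius0 * frac, 1e-3)

        # Step 3: pick a random input vector
        x = data[rng.integers(len(data))]

        # Step 4: the BMU is the node whose weights are closest to x
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)

        # Step 5: w(t+1) = w(t) + h(t) * lr(t) * (x - w(t))
        grid_dist = np.linalg.norm(coords - np.array(bmu), axis=-1)
        h = np.where(grid_dist <= radius,
                     np.exp(-grid_dist**2 / (2 * radius**2)), 0.0)
        weights += lr * h[..., None] * (x - weights)

    return weights
```

Calling `train_som(X, 12, 12)` on a scaled feature matrix `X` would return an array of shape `(12, 12, n_features)`. Because every update moves a weight part of the way toward an input, the weights stay within the range of the data when the inputs are scaled to [0, 1].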


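**Neighborhood Function**

The neighborhood function calculates the distance between the BMU and another neuron and turns it into an update strength: close to 1 for neurons near the BMU, smaller for neurons farther away, and 0 for neurons outside the radius.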

In [None]:
def neighborhood_function(winning_neuron, current_neuron, radius):
    # Euclidean distance between the BMU and another neuron
    # D = sqrt(sum((Xi - Wi) ** 2))
    distance = np.linalg.norm(winning_neuron - current_neuron)

    # Only neurons within the radius are updated.
    # The smaller the distance between the BMU and the neuron, the higher
    # the returned value, which means a stronger update.
    if distance <= radius:
        return np.exp(-distance**2 / (2*radius**2))
    else:
        return 0
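
To see how the neighborhood value falls off around the BMU, the same Gaussian rule can be evaluated for every node of a grid at once. The sketch below is an assumption-laden illustration: `neighborhood_grid` is a hypothetical helper, and it measures distance between grid coordinates (row, column) rather than between weight vectors.

```python
import numpy as np

def neighborhood_grid(bmu, grid_shape, radius):
    """Neighborhood value h for every node, given the BMU's grid position.

    Hypothetical helper: distance is measured between grid coordinates.
    """
    rows, cols = np.indices(grid_shape)
    distance = np.sqrt((rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2)
    # Gaussian falloff inside the radius, zero outside it
    return np.where(distance <= radius,
                    np.exp(-distance**2 / (2 * radius**2)), 0.0)

# 5 x 5 grid with the BMU in the center and a radius of 2
h = neighborhood_grid(bmu=(2, 2), grid_shape=(5, 5), radius=2)
print(np.round(h, 3))
```

In this example the BMU itself gets h = 1, its direct neighbors get exp(-1/8), and the corner nodes, which lie farther than 2 grid steps away, get 0.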


## Install MiniSom Package

In [None]:
# !pip install MiniSom

## Importing the libraries


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math

## Importing the dataset

This dataset contains information about customers who applied for a credit card. In the Class column, 0 means rejected and 1 means approved.
Our goal is to detect the outliers, i.e. the approved customers who are potentially fraudulent.

Since SOMs are unsupervised, we won't use the labels for training. Instead, they will be compared to the SOM results.

In [None]:
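dataset = pd.read_csv('Credit_Card_Applications.csv')
# Features: every column except the last one (CustomerID, A1 ... A14)
X = dataset.iloc[:, :-1].values
# Labels: the last column (Class), kept aside and not used for training
y = dataset.iloc[:, -1].values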

In [None]:
print(dataset.info())
dataset.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 690 entries, 0 to 689
Data columns (total 16 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   CustomerID  690 non-null    int64  
 1   A1          690 non-null    int64  
 2   A2          690 non-null    float64
 3   A3          690 non-null    float64
 4   A4          690 non-null    int64  
 5   A5          690 non-null    int64  
 6   A6          690 non-null    int64  
 7   A7          690 non-null    float64
 8   A8          690 non-null    int64  
 9   A9          690 non-null    int64  
 10  A10         690 non-null    int64  
 11  A11         690 non-null    int64  
 12  A12         690 non-null    int64  
 13  A13         690 non-null    int64  
 14  A14         690 non-null    int64  
 15  Class       690 non-null    int64  
dtypes: float64(3), int64(13)
memory usage: 86.4 KB
None


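|   | CustomerID | A1 | A2 | A3 | A4 | A5 | A6 | A7 | A8 | A9 | A10 | A11 | A12 | A13 | A14 | Class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15776156 | 1 | 22.08 | 11.46 | 2 | 4 | 4 | 1.585 | 0 | 0 | 0 | 1 | 2 | 100 | 1213 | 0 |
| 1 | 15739548 | 0 | 22.67 | 7.0 | 2 | 8 | 4 | 0.165 | 0 | 0 | 0 | 0 | 2 | 160 | 1 | 0 |
| 2 | 15662854 | 0 | 29.58 | 1.75 | 1 | 4 | 4 | 1.25 | 0 | 0 | 0 | 1 | 2 | 280 | 1 | 0 |
| 3 | 15687688 | 0 | 21.67 | 11.5 | 1 | 5 | 3 | 0.0 | 1 | 1 | 11 | 1 | 2 | 0 | 1 | 1 |
| 4 | 15715750 | 1 | 20.17 | 8.17 | 2 | 6 | 4 | 1.96 | 1 | 1 | 14 | 0 | 2 | 60 | 159 | 1 |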


## Feature Scaling


In [None]:
from sklearn.preprocessing import MinMaxScaler
sc = MinMaxScaler(feature_range = (0,1))
X = sc.fit_transform(X)

In [None]:
X[0]

array([0.84268147, 1.        , 0.12526316, 0.40928571, 0.5       ,
       0.23076923, 0.375     , 0.05561404, 0.        , 0.        ,
       0.        , 1.        , 0.5       , 0.05      , 0.01212   ])
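
For intuition, min-max scaling maps each column to [0, 1] with (x - min) / (max - min). The sketch below reproduces that formula by hand on a toy array; `min_max_scale` and the toy values are made up for illustration, and mapping constant columns to 0 is a choice made here to avoid division by zero.

```python
import numpy as np

def min_max_scale(values):
    """Scale each column to [0, 1] (hypothetical helper for illustration)."""
    values = np.asarray(values, dtype=float)
    col_min = values.min(axis=0)
    col_range = values.max(axis=0) - col_min
    # A constant column has range 0; use 1 instead so it maps to 0
    col_range[col_range == 0] = 1.0
    return (values - col_min) / col_range

# Toy data: two columns with different ranges
toy = np.array([[10.0, 1.0],
                [20.0, 3.0],
                [30.0, 5.0]])
scaled = min_max_scale(toy)
print(scaled)
```

After scaling, both columns run from 0 to 1, so no single feature dominates the Euclidean distance used to find the BMU.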

## Training the SOM


According to the MiniSom documentation, a general rule of thumb for sizing the grid in a dimensionality reduction task is that it should contain about 5 * sqrt(N) neurons, where N is the number of samples.

With N = 690 samples, the target is 5 * sqrt(690) ≈ 131 neurons.
For a square grid, each side is ceil(sqrt(131)) = 12, so (grid_x, grid_y) = (12, 12), i.e. 144 neurons.

In [None]:
grid_size = 5 * math.sqrt(len(X))
grid_x = grid_y = math.ceil(math.sqrt(grid_size))

In [None]:
print(f'(grid_x, grid_y) = ({grid_x}, {grid_y})')
print(f'Number of features: {X.shape[1]}')

(grid_x, grid_y) = (12, 12)
Number of features: 15
