# <center>Data Mining Project Code</center>

** **
## <center>*05 - Neural Network-based Notebook*</center>

** **

In this notebook, we continue our customer segmentation with a neural network-based clustering method, the Self-Organizing Map (SOM). Additional clustering algorithms are then applied to the SOM results, each on a differently transformed version of the dataset.
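
The notebook does not say which algorithms are applied to the SOM output. One common pattern, shown here only as an assumption, is two-level clustering: cluster the SOM's prototype (weight) vectors, then give each customer the label of its best-matching unit (BMU). The sketch below uses plain NumPy, a minimal k-means, and random stand-in arrays. `kmeans_labels` and `som_two_level_labels` are hypothetical helpers, not functions from `functions.py`.

```python
import numpy as np

def kmeans_labels(points, k, n_iter=50, seed=0):
    """Minimal k-means (Lloyd's algorithm) returning one label per point."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = points[labels == c].mean(axis=0)
    return labels

def som_two_level_labels(weights, X, k):
    """Cluster the SOM prototypes, then label each row of X via its best-matching unit."""
    n_features = weights.shape[-1]
    prototypes = weights.reshape(-1, n_features)   # one row per SOM unit
    unit_labels = kmeans_labels(prototypes, k)     # cluster the units themselves
    # BMU = index of the closest prototype for each observation
    bmu = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2).argmin(axis=1)
    return unit_labels[bmu]

# Random stand-ins for som.get_weights() (15x15 units, 30 features) and a feature matrix
rng = np.random.default_rng(42)
weights = rng.normal(size=(15, 15, 30))
X = rng.normal(size=(100, 30))
customer_labels = som_two_level_labels(weights, X, k=4)
```

For the full customer table the `(n_customers, 225, 30)` distance array gets large; computing BMUs in chunks, or row by row with MiniSom's `som.winner`, avoids that.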


The team members are:
- Ana Farinha  - 20211514
- António Oliveira - 20211595
- Mariana Neto - 20211527
- Salvador Domingues - 20240597


# Table of Contents

<a class="anchor" id="top"></a>


1. [Importing Libraries & Data](#1.-Importing-Libraries-&-Data) <br><br>
2. [Neural Network-based](#2.-Neural-Network-based) <br><br>



# 1. Importing Libraries & Data

In [None]:
# Data manipulation
import pandas as pd
import numpy as np

# Clustering algorithms
from minisom import MiniSom


# Visualizations
import matplotlib.pyplot as plt
from matplotlib.patches import RegularPolygon
from matplotlib import cm, colors as mpl_colors, colorbar
from mpl_toolkits.axes_grid1 import make_axes_locatable

# Utils
from functions import *

In [2]:
# Load the capped dataset (change this path to use a differently transformed version)
data = pd.read_csv('data/data_capped.csv', index_col = "customer_id")
data.head(3)

| customer_id | customer_age | vendor_count | product_count | is_chain | first_order | last_order | CUI_American | CUI_Asian | CUI_Beverages | CUI_Cafe | ... | 20_23h | customer_region | last_promo | payment_method | promo_DELIVERY | promo_DISCOUNT | promo_FREEBIE | pay_CARD | pay_CASH | is_repeat_customer |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1b8f824d5e | 18.0 | 2.0 | 5.0 | 1.0 | 0 | 1 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 2360 | DELIVERY | DIGI | 1 | 0 | 0 | 0 | 0 | 1 |
| 5d272b9dcb | 17.0 | 2.0 | 2.0 | 2.0 | 0 | 1 | 12.82 | 6.39 | 0.0 | 0.0 | ... | 0.0 | 8670 | DISCOUNT | DIGI | 0 | 1 | 0 | 0 | 0 | 1 |
| f6d1b2ba63 | 38.0 | 1.0 | 2.0 | 2.0 | 0 | 1 | 9.2 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 4660 | DISCOUNT | CASH | 0 | 1 | 0 | 0 | 1 | 1 |


In [3]:
num_variables = ['customer_age', 'vendor_count', 'product_count', 'is_chain',
       'first_order', 'last_order', 'CUI_American', 'CUI_Asian',
       'CUI_Beverages', 'CUI_Cafe', 'CUI_Chicken Dishes', 'CUI_Chinese',
       'CUI_Desserts', 'CUI_Healthy', 'CUI_Indian', 'CUI_Italian',
       'CUI_Japanese', 'CUI_Noodle Dishes', 'CUI_OTHER',
       'CUI_Street Food / Snacks', 'CUI_Thai', 'days_between', 'total_orders',
       'avg_order_hour', 'total_spend', 'avg_spend_prod',
       '1_7h', '8_14h', '15_19h', '20_23h']
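
MiniSom compares customers to prototypes with Euclidean distance, so features on large scales (for example `total_spend` against 0/1 flags such as `is_chain`) can dominate both training and the quantization error. The preview above shows raw values (ages, euro amounts), so if the capped data is not scaled elsewhere (this notebook does not say), a z-score step like the following sketch could be applied to `data[num_variables]` before training. `standardize` is a hypothetical helper.

```python
import numpy as np

def standardize(X):
    """Z-score each column; constant columns become 0 instead of dividing by 0."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (X - mean) / std

# Toy example: age in years next to a 0/1 flag
X = np.array([[18.0, 0.0],
              [38.0, 1.0],
              [28.0, 1.0]])
Z = standardize(X)  # each column of Z now has mean 0 and standard deviation 1
```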

# 2. Neural Network-based

<a href="#top">Top &#129033;</a>

## 2.1 SOM

In [None]:
# Initialise a Self-Organizing Map on a 15 by 15 grid.
# input_len must equal the number of features passed to the SOM (len(num_variables) = 30).
som = MiniSom(
    15,
    15,
    len(num_variables),
    sigma=0.5,
    learning_rate=1,
    neighborhood_function='gaussian',
    random_seed=42
    )
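
To make `sigma` and `learning_rate` concrete, here is a simplified NumPy sketch of one online SOM update with a Gaussian neighbourhood: find the best-matching unit (BMU), then pull every unit's weights towards the sample, scaled by exp(-d²/(2σ²)), where d is the grid distance to the BMU. MiniSom additionally decays the learning rate and sigma over the iterations; that decay is omitted here, and `som_update_step` is a hypothetical name.

```python
import numpy as np

def som_update_step(weights, x, sigma=0.5, learning_rate=1.0):
    """One online SOM update without decay. weights has shape (rows, cols, n_features)."""
    rows, cols, _ = weights.shape
    # Best-matching unit: grid position whose weight vector is closest to x
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(dists.argmin(), dists.shape)
    # Squared grid distance from every unit to the BMU
    ii, jj = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    grid_d2 = (ii - bmu[0]) ** 2 + (jj - bmu[1]) ** 2
    # Gaussian neighbourhood: 1 at the BMU, shrinking with grid distance
    h = np.exp(-grid_d2 / (2 * sigma ** 2))
    # Move each unit towards x in proportion to its neighbourhood weight
    new_weights = weights + learning_rate * h[:, :, None] * (x - weights)
    return new_weights, bmu, h

rng = np.random.default_rng(42)
weights = rng.random((15, 15, 30))
x = rng.random(30)
new_weights, bmu, h = som_update_step(weights, x)
```

With `sigma=0.5`, a direct neighbour of the BMU gets h = exp(-2) ≈ 0.135, so each update mostly moves the BMU itself; with `learning_rate=1` and no decay, the BMU's weights are copied onto the sample.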

In [None]:
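# Seed NumPy for replicability (MiniSom itself is seeded through random_seed)
np.random.seed(42)

# MiniSom indexes its input by position, so pass a NumPy array rather than a DataFrame
X = data[num_variables].to_numpy(dtype=float)

num_iterations = 1000

# Quantization error after training a fresh SOM (same settings) for i iterations
q_errors = []
for i in range(1, num_iterations):
    som_i = MiniSom(15, 15, len(num_variables), sigma=0.5, learning_rate=1,
                    neighborhood_function='gaussian', random_seed=42)
    som_i.train_batch(X, i)
    q_errors.append(som_i.quantization_error(X))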
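
The quantization error summarises how well the map fits the data: it is the average Euclidean distance between each observation and the weight vector of its BMU. A NumPy version of that definition, assuming weights of shape `(rows, cols, n_features)` as returned by `som.get_weights()`, could look like this; `quantization_error` here is our own function, not MiniSom's method.

```python
import numpy as np

def quantization_error(weights, X):
    """Mean distance from each row of X to its closest SOM prototype."""
    prototypes = weights.reshape(-1, weights.shape[-1])
    # Distances from every observation to every prototype: shape (n_samples, n_units)
    dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
    return dists.min(axis=1).mean()

weights = np.array([[[0.0, 0.0], [10.0, 0.0]]])  # a 1x2 map with 2 features
X = np.array([[1.0, 0.0], [10.0, 2.0]])
print(quantization_error(weights, X))  # 1.5
```

The first sample is 1 away from its BMU and the second is 2 away, so the mean is 1.5.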

In [None]:
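# Quantization error as a function of the number of training iterations
plt.plot(range(1, len(q_errors) + 1), q_errors)
plt.xlabel("Training iterations")
plt.ylabel("Quantization error")
plt.show()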

In [None]:
# Set up the same SOM again and train it for 800 iterations
som = MiniSom(
    15, 15, len(num_variables), sigma=0.5,
    learning_rate=1, neighborhood_function='gaussian', random_seed=42)
som.train(data[num_variables].to_numpy(dtype=float), 800)

In [None]:
def plot_som_hexagons(som,
                      matrix,
                      cmap=cm.Blues,
                      figsize=(20,20),
                      annotate=True,
                      title="SOM Matrix",
                      cbar_label="Color Scale"
                ):
    """Draw one value per SOM unit (e.g. a U-matrix or hit counts) as coloured hexagons.

    Rows are spaced sqrt(3)/2 apart, which tiles cleanly for a SOM created with
    topology='hexagonal'; with MiniSom's default rectangular topology the rows are
    not offset, so neighbouring hexagons will not fit together exactly.
    """
    xx, yy = som.get_euclidean_coordinates()

    f = plt.figure(figsize=figsize)
    ax = f.add_subplot(111)

    ax.set_aspect('equal')
    ax.set_title(title, fontsize=20)

    # Map the matrix values onto the colormap's [0, 1] range
    colornorm = mpl_colors.Normalize(vmin=np.min(matrix),
                                     vmax=np.max(matrix))

    for i in range(xx.shape[0]):
        for j in range(xx.shape[1]):
            # Hexagon centre: grid x, with rows compressed by sqrt(3)/2
            wy = yy[(i, j)] * np.sqrt(3) / 2
            hexagon = RegularPolygon((xx[(i, j)], wy),
                                     numVertices=6,
                                     radius=.95 / np.sqrt(3),
                                     facecolor=cmap(colornorm(matrix[i, j])),
                                     alpha=1)
            ax.add_patch(hexagon)

            if annotate:
                # Values above 1 (e.g. hit counts) are shown as integers, others with two decimals
                annot_vals = np.round(matrix[i, j], 2)
                if annot_vals > 1:
                    annot_vals = int(annot_vals)

                ax.text(xx[(i, j)], wy, annot_vals,
                        ha='center', va='center',
                        fontsize=figsize[1],
                        )

    ax.margins(.05)
    ax.axis("off")

    # Colorbar drawn in a thin axes appended to the right of the map
    cmap_sm = plt.cm.ScalarMappable(cmap=cmap, norm=colornorm)
    cmap_sm.set_array([])

    divider = make_axes_locatable(ax)
    ax_cb = divider.append_axes("right", size="2%", pad=0.05)
    cb1 = f.colorbar(cmap_sm, cax=ax_cb, orientation='vertical')
    cb1.ax.get_yaxis().labelpad = 16
    cb1.ax.set_ylabel(cbar_label, fontsize=18)

    return f
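
The function places hexagon centres at `(x, y * sqrt(3) / 2)` and draws them with circumradius `.95 / sqrt(3)`. The short check below shows where those constants come from: `RegularPolygon` draws a pointy-top hexagon by default, and with circumradius r = 1/√3 such a hexagon is exactly 1 unit wide across its flat sides, so unit-spaced columns touch. Rows offset by half a cell then tile when their centres are 1.5·r = √3/2 apart. The 0.95 factor shrinks each hexagon slightly to leave a visible gap.

```python
import math

r = 1 / math.sqrt(3)                        # circumradius before the 0.95 shrink
flat_width = 2 * r * math.cos(math.pi / 6)  # distance between opposite flat sides
row_spacing = 1.5 * r                       # vertical centre spacing for offset rows

print(round(flat_width, 6))                               # 1.0
print(round(row_spacing, 6), round(math.sqrt(3) / 2, 6))  # 0.866025 0.866025
```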

In [None]:
umatrix = som.distance_map(scaling='mean')

fig = plot_som_hexagons(som, umatrix, cmap=cm.RdYlBu_r, title="SOM U-Matrix")
plt.show()
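
`distance_map(scaling='mean')` returns the U-matrix: for each unit, the mean distance between its weight vector and those of its neighbouring units, divided by the largest such value so the map lies in [0, 1]. High values mark borders between groups of similar units. Below is a NumPy sketch for the default rectangular topology used by the SOM above, where, as far as we can tell from the MiniSom source, each unit is compared with its up to 8 surrounding units; `u_matrix_mean` is a hypothetical name.

```python
import numpy as np

def u_matrix_mean(weights):
    """Mean distance from each unit's weights to its (up to 8) grid neighbours, scaled to max 1."""
    rows, cols, _ = weights.shape
    um = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = []
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    # Skip the unit itself and positions outside the grid
                    if (di, dj) != (0, 0) and 0 <= ni < rows and 0 <= nj < cols:
                        dists.append(np.linalg.norm(weights[i, j] - weights[ni, nj]))
            um[i, j] = np.mean(dists)
    return um / um.max()

# A 3x3 map where only the centre unit differs from the rest
weights = np.zeros((3, 3, 1))
weights[1, 1] = 1.0
um = u_matrix_mean(weights)
```

In this toy map the centre unit scores 1, each corner 1/3 (one of its three neighbours differs) and each edge unit 0.2 (one of five).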