## Sample MPC Notebook

This notebook demonstrates a sample use of the MultiPolytope Clustering (MPC) algorithm on the Iris dataset. The functionality used here is implemented in the following files:

- `mpc.py`: Contains the main function for MultiPolytope Clustering
- `mpc_helpers.py`: Helper functions supporting the main MPC function
- `mpc_init.py`: Code to run the initialization scheme (alternating minimization)
- `cluster_assignment.py`: Code for the clustering sub-routine of the initialization scheme
- `pairwise_integer_cut.py`: Code for the separating-hyperplane sub-routine of the initialization scheme

### Code Demonstration

We'll start by importing the main function from `mpc.py`.

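```python
from mpc import *  # provides MPCPolytopeOpt
import pandas as pd
from sklearn.metrics import silhouette_score  # used below to score the clustering
```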

```python
# Read in the data ('uci_clean' contains pre-normalized data, so no min-max scaling is needed)
data = pd.read_csv('uci_clean/iris.csv')

# The data has 4 features, each scaled to the range [0, 1]
data.describe()
```

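|       | 0        | 1        | 2        | 3        |
|-------|----------|----------|----------|----------|
| count | 150.0    | 150.0    | 150.0    | 150.0    |
| mean  | 0.428704 | 0.439167 | 0.467571 | 0.457778 |
| std   | 0.230018 | 0.180664 | 0.299054 | 0.317984 |
| min   | 0.0      | 0.0      | 0.0      | 0.0      |
| 25%   | 0.222222 | 0.333333 | 0.101695 | 0.083333 |
| 50%   | 0.416667 | 0.416667 | 0.567797 | 0.5      |
| 75%   | 0.583333 | 0.541667 | 0.694915 | 0.708333 |
| max   | 1.0      | 1.0      | 1.0      | 1.0      |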
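The `uci_clean` files are already scaled, but if you start from raw feature values, min-max scaling maps each column to [0, 1] via x' = (x - min) / (max - min). The sketch below assumes a purely numeric array; `minmax_scale` is a hypothetical helper, not part of the MPC code (scikit-learn's `MinMaxScaler` does the same job).

```python
import numpy as np

def minmax_scale(X):
    """Scale each column of X to [0, 1]; constant columns are mapped to 0."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    # Avoid division by zero for constant columns
    col_range[col_range == 0] = 1.0
    return (X - col_min) / col_range

# A few sample rows with two raw (unscaled) features
raw = np.array([[5.1, 3.5], [4.9, 3.0], [7.0, 3.2]])
scaled = minmax_scale(raw)
print(scaled.min(axis=0), scaled.max(axis=0))  # [0. 0.] [1. 1.]
```

After scaling, `describe()` should report a `min` of 0 and a `max` of 1 for every column, as in the table above.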


### Run MPC

To perform clustering, we only need to call the `MPCPolytopeOpt` function. It takes the following inputs:

- `data`: The data as a NumPy array (rows are samples, columns are features)
- `k`: Number of clusters for the initialization scheme
- `metric`: Clustering metric to optimize
- `card`: Number of non-zero coefficients in the separating hyperplanes
- `M`: Maximum integer value allowed in the separating hyperplanes
- `max_k`: Maximum number of clusters that can be generated during local search
- `verbose`: Whether to print intermediate progress updates
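To get a feel for how `card` and `M` restrict the hyperplanes, the sketch below enumerates every candidate weight vector with between 1 and `card` non-zero entries, each an integer in [-M, M]. That coefficient range is an assumption based on the parameter descriptions above, and `candidate_weights` is a hypothetical helper, not a function from the MPC source files. Under this reading, `card = 1, M = 1` (the setting used in the example below) leaves only axis-aligned hyperplanes, each testing a single feature.

```python
from itertools import combinations, product

def candidate_weights(n_features, card, M):
    """Hypothetical helper: list integer weight vectors with 1..card non-zero
    entries, each non-zero entry in [-M, M] (assumed coefficient range)."""
    nonzero_values = [v for v in range(-M, M + 1) if v != 0]
    weights = []
    for k in range(1, card + 1):
        # Choose which k features get a non-zero coefficient
        for support in combinations(range(n_features), k):
            # Assign an allowed non-zero integer to each chosen feature
            for values in product(nonzero_values, repeat=k):
                w = [0] * n_features
                for idx, val in zip(support, values):
                    w[idx] = val
                weights.append(tuple(w))
    return weights

ws = candidate_weights(4, card=1, M=1)
print(len(ws))   # 8
print(ws[:2])    # [(-1, 0, 0, 0), (1, 0, 0, 0)]
```

Raising `card` or `M` grows this set combinatorially (for 4 features, `card = 2, M = 1` already gives 32 vectors), which is one reason to keep both small.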

The function returns the reordered (rearranged) data, the cluster labels, and the separating hyperplanes; the example below unpacks these as `X`, `labels`, `w`, and `b`.
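As a rough illustration of what a separating hyperplane means here, the sketch below checks whether a hyperplane with weights `w` and offset `b` places two groups of points on opposite sides, using the convention `w · x + b < 0` for the first group and `w · x + b >= 0` for the second. Both the side convention and the shapes (one weight vector and one scalar offset per pair of clusters) are assumptions, and `separates` is a hypothetical helper; check the MPC source files for how `MPCPolytopeOpt` actually stores `w` and `b`.

```python
import numpy as np

def separates(w, b, A, B):
    """Return True if every row of A satisfies w.x + b < 0 and every row of B
    satisfies w.x + b >= 0 (side convention assumed for illustration)."""
    w = np.asarray(w, dtype=float)
    side_A = np.asarray(A, dtype=float) @ w + b
    side_B = np.asarray(B, dtype=float) @ w + b
    return bool(np.all(side_A < 0) and np.all(side_B >= 0))

# Axis-aligned cut on feature 0 at 0.5 (a single non-zero integer weight)
w_cut = [1, 0, 0, 0]
b_cut = -0.5
A = [[0.2, 0.4, 0.1, 0.0], [0.3, 0.9, 0.2, 0.1]]
B = [[0.7, 0.1, 0.8, 0.9], [0.5, 0.5, 0.5, 0.5]]
print(separates(w_cut, b_cut, A, B))  # True
```

The demo uses `w_cut` and `b_cut` rather than `w` and `b` so it does not clobber the values returned by `MPCPolytopeOpt`.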

```python
# Cluster the Iris data, initializing with k = 4 clusters and optimizing the silhouette score
X, labels, w, b = MPCPolytopeOpt(data.to_numpy(), 4,
                                 metric='silhouette',
                                 card=1,
                                 M=1,
                                 verbose=True)

print('Silhouette: ', silhouette_score(X, labels))
```

```text
Starting sil (unfiltered):  0.3828936107757873
Filtered points:  1
Starting sil (filtered):  0.3723508872545404
New perf:  0.3886936058106495
New perf:  0.3916194856119582
New perf:  0.42810005005290064
New perf:  0.43524633748330815
New perf:  0.43639426387667335
New perf:  0.45476875579708675
New perf:  0.4771240519783585
minor improvement
Last sil 0.4771240519783585
Current sil 0.4771240519783585
Silhouette:  0.47663104192059375
```
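For reference, the silhouette score reported above averages, over all points, s(i) = (b_i - a_i) / max(a_i, b_i), where a_i is the mean distance from point i to the other members of its cluster and b_i is the smallest mean distance from i to the points of any other cluster. The from-scratch NumPy version below is a sketch for intuition only; the notebook's `silhouette_score` call (presumably scikit-learn's) is what produced the numbers above. Like scikit-learn, this sketch uses Euclidean distance and assigns s(i) = 0 to points in singleton clusters.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distance matrix, shape (n, n)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself from a_i
        if not same.any():
            continue  # singleton cluster: s(i) = 0 by convention
        a_i = D[i, same].mean()
        b_i = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a_i, b_i)
        scores[i] = (b_i - a_i) / denom if denom > 0 else 0.0
    return scores.mean()

demo_X = np.array([[0.0], [1.0], [10.0], [11.0]])
print(round(silhouette(demo_X, [0, 0, 1, 1]), 4))  # 0.8997
```

Since MPC may change the number of clusters during local search, `np.unique(labels, return_counts=True)` on the returned labels is a quick way to see how many clusters it ended with and how large each one is.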
