# <center>Data Mining Project Code</center>

** **
## <center>*06 - Model-based Notebook*</center>

** **

In this notebook, we continue our customer segmentation with two model-based clustering methods: Gaussian Mixture Models and Hidden Markov Models. Each algorithm is applied to several versions of the dataset, each produced by a different set of transformations.


The team members are:
- Ana Farinha  - 20211514
- António Oliveira - 20211595
- Mariana Neto - 20211527
- Salvador Domingues - 20240597


# Table of Contents

<a class="anchor" id="top"></a>


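1. [Importing Libraries & Data](#1.-Importing-Libraries-&-Data) <br><br>
2. [Model-based](#2.-Model-based) <br>
    2.1 [Gaussian Mixture Models](#2.1-Gaussian-Mixture-Models) <br>
    2.2 [Hidden Markov Models](#2.2-Hidden-Markov-Models) <br><br>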



# 1. Importing Libraries & Data

In [1]:
# Data manipulation
import pandas as pd

# Clustering algorithms
#%pip install hmmlearn
from sklearn.mixture import GaussianMixture
from hmmlearn.hmm import GaussianHMM

# Utils
from functions import *

In [2]:
# Change the file path below to run the notebook on a different version of the dataset
data = pd.read_csv('data/data_capped.csv', index_col = "customer_id")
data.head(3)

customer_id,customer_age,vendor_count,product_count,is_chain,first_order,last_order,CUI_American,CUI_Asian,CUI_Beverages,CUI_Cafe,...,20_23h,customer_region,last_promo,payment_method,promo_DELIVERY,promo_DISCOUNT,promo_FREEBIE,pay_CARD,pay_CASH,is_repeat_customer
1b8f824d5e,18.0,2.0,5.0,1.0,0,1,0.0,0.0,0.0,0.0,...,0.0,2360,DELIVERY,DIGI,1,0,0,0,0,1
5d272b9dcb,17.0,2.0,2.0,2.0,0,1,12.82,6.39,0.0,0.0,...,0.0,8670,DISCOUNT,DIGI,0,1,0,0,0,1
f6d1b2ba63,38.0,1.0,2.0,2.0,0,1,9.2,0.0,0.0,0.0,...,0.0,4660,DISCOUNT,CASH,0,1,0,0,1,1


In [3]:
num_variables = ['customer_age', 'vendor_count', 'product_count', 'is_chain',
       'first_order', 'last_order', 'CUI_American', 'CUI_Asian',
       'CUI_Beverages', 'CUI_Cafe', 'CUI_Chicken Dishes', 'CUI_Chinese',
       'CUI_Desserts', 'CUI_Healthy', 'CUI_Indian', 'CUI_Italian',
       'CUI_Japanese', 'CUI_Noodle Dishes', 'CUI_OTHER',
       'CUI_Street Food / Snacks', 'CUI_Thai', 'days_between', 'total_orders',
       'avg_order_hour', 'total_spend', 'avg_spend_prod',
       '1_7h', '8_14h', '15_19h', '20_23h']

# 2. Model-based

<a href="#top">Top &#129033;</a>

## 2.1 Gaussian Mixture Models

In [4]:
# Fit a 3-component GMM with an unconstrained (full) covariance matrix per component
gmm = GaussianMixture(n_components=3, covariance_type='full')
gmm.fit(data[num_variables])
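The number of components above is fixed at 3. One common way to check that choice is to compare the Bayesian Information Criterion (BIC) across several candidate values. The block below is a minimal sketch of that idea, not the procedure used in this project: it runs on synthetic data (`X` stands in for `data[num_variables]`), and the candidate range, `n_init` and `random_state` values are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for data[num_variables]: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(200, 2)) for c in centers])

# Fit one model per candidate component count and record its BIC (lower is better).
candidates = range(1, 7)  # assumed search range, adjust as needed
bic_scores = {}
for k in candidates:
    model = GaussianMixture(n_components=k, covariance_type='full',
                            n_init=3, random_state=0)
    model.fit(X)
    bic_scores[k] = model.bic(X)

# Pick the component count with the lowest BIC.
best_k = min(bic_scores, key=bic_scores.get)
print(bic_scores)
print("Component count with the lowest BIC:", best_k)
```

On the real data, replacing `X` with `data[num_variables]` would give a BIC curve that can be inspected alongside the cluster profiles before settling on a value.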

In [5]:
# Assign each customer to the component with the highest posterior probability
data['gmm_clusters'] = gmm.predict(data[num_variables])

In [6]:
# plot_dim_reduction(umap_embedding, targets=data['gmm_clusters'], technique='UMAP')
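`predict` returns only a hard label per customer, while a fitted `GaussianMixture` also exposes per-component membership probabilities through `predict_proba`. The sketch below, on synthetic data, shows how those probabilities could be used to flag points that sit between clusters. The 0.9 threshold and the names `gmm_demo`, `membership` and `ambiguous` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-in: two overlapping 2-D Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(150, 2)),
    rng.normal(loc=3.0, scale=1.0, size=(150, 2)),
])

gmm_demo = GaussianMixture(n_components=2, covariance_type='full', random_state=0)
gmm_demo.fit(X)

# Hard labels versus soft memberships (one probability per component, rows sum to 1).
hard_labels = gmm_demo.predict(X)
membership = gmm_demo.predict_proba(X)

# Flag points whose strongest membership falls below an assumed 0.9 threshold.
confidence = membership.max(axis=1)
ambiguous = confidence < 0.9
print("Share of ambiguous points:", ambiguous.mean())
```

A high share of ambiguous points would suggest that the components overlap heavily and that the hard labels should be interpreted with care.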

## 2.2 Hidden Markov Models

In [7]:
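# Fit a 3-state Gaussian HMM; without a `lengths` argument, all rows are treated as one ordered sequence
hmm = GaussianHMM(n_components=3, covariance_type='full')
hmm.fit(data[num_variables])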

In [8]:
# Decode the most likely hidden state for each row and use it as the cluster label
data['hmm_clusters'] = hmm.predict(data[num_variables])

In [9]:
# plot_dim_reduction(umap_embedding, targets = data['hmm_clusters'], 
#                    technique='UMAP')