# Task 2 (Unsupervised Learning) - Characterizing Adopted Pets and Adoption Speed

In this task you should **use unsupervised learning algorithms and try to characterize pets that were actually adopted and their adoption speed**. You can use:
* **Association rule mining** to find **associations between the features and the target Adoption/AdoptionSpeed**.
* **Clustering algorithms to find similar groups of pets**. Is it possible to find groups of pets with the same or similar adoption speed?
* **Be creative and define your own unsupervised analysis!** What would be interesting to find out?

## Loading Datasets

In [1]:
# Load data
import pandas as pd

Bl = pd.read_csv("Balanced_Dataset.csv", index_col=0)
Bn = pd.read_csv('Binary_Dataset.csv', index_col=0)
Bin = pd.read_csv('Binary_Imbalanced_Dataset.csv', index_col=0)
# Create multi-class target
my = Bl['Target'].values
Bl = Bl.drop(['Target'], axis=1)
#Create Binary Target
by = Bn['Target'].values
Bn = Bn.drop(['Target'], axis=1)
#Create Binary Imbalanced Target
iby = Bin['Target'].values
Bin = Bin.drop(['Target'], axis=1)

## Importing Modules and Functions

In [2]:
##Import all Modules
import sys
import numpy as np
import math
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn import metrics
from IPython.display import display
## Import Functions
from Task2_functions import *
from Task1_functions import feature_selector

## 2.1. Preprocessing Data for Association Rule Mining

Association rule mining requires a binary (transactional) dataset. Binary features only need their 0 and 1 values mapped to False and True, respectively, so that they are consistent with the other columns. Non-binary features must first be turned into items, with one indicator column per category value, and then arranged into the same binary dataset.
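`create_bin` is defined in `Task2_functions` and is not shown here. The following minimal sketch illustrates the encoding described above. It reflects an assumption about the general approach and does not reconstruct the helper. The hypothetical function `to_transactions` maps 0/1 columns to booleans. It then expands every other categorical column into one indicator per value with `pandas.get_dummies`, which yields column names such as `Color1` or `Target0`.

```python
import pandas as pd

# Hypothetical toy frame: 'Type' is already binary (0/1), 'Color' is an
# integer-coded category and 'Target' is the adoption-speed class.
df = pd.DataFrame({
    "Type": [1, 0, 1, 1],
    "Color": [1, 3, 2, 1],
    "Target": [0, 2, 2, 1],
})

def to_transactions(frame, binary_cols):
    """Return a boolean item matrix suitable for association rule mining (sketch)."""
    # Binary features: 0/1 -> False/True
    out = frame[binary_cols].astype(bool)
    # Every other column: one indicator column per category value (e.g. Color1, Color2)
    other = [c for c in frame.columns if c not in binary_cols]
    dummies = pd.get_dummies(frame[other].astype(str), prefix=other, prefix_sep="")
    return pd.concat([out, dummies.astype(bool)], axis=1)

items = to_transactions(df, ["Type"])
print(items.columns.tolist())
```

Each row of `items` can then be read as one transaction, namely the set of columns that are True for that pet.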

### Multi-Class Target

In [8]:
MT_db = create_bin(Bl,my)
MT_db.head()

| | Gender1 | Gender2 | Gender3 | Photocat0 | Photocat1 | Photocat2 | Photocat3 | Photocat4 | Photocat5 | Target0 | ... | Type | FurLength | namecat | descriptcat | colorcat2 | healthcat | statecat | MaturitySizecat | quantitycat | paidcat |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | False | False | False | True | False | False | False | False | False | ... | True | True | False | False | True | False | True | False | False | True |
| 1 | True | False | False | False | False | True | False | False | False | True | ... | True | True | False | False | True | False | False | True | False | False |
| 2 | True | False | False | False | False | False | False | False | True | False | ... | True | True | False | False | True | True | True | True | False | False |
| 3 | False | True | False | False | False | False | False | False | True | False | ... | True | True | True | False | True | True | False | True | False | True |
| 4 | True | False | False | False | False | False | True | False | False | False | ... | True | True | True | False | True | False | True | True | False | False |


### Binary Target

In [9]:
BT_db = create_bin(Bn,by)
BT_db.head()

| | Gender1 | Gender2 | Gender3 | Photocat0 | Photocat1 | Photocat2 | Photocat3 | Photocat4 | Photocat5 | agecat0 | ... | breedcat | namecat | descriptcat | colorcat2 | healthcat | statecat | MaturitySizecat | quantitycat | paidcat | Target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | False | False | False | True | False | False | False | False | False | ... | False | False | False | True | False | True | False | False | True | True |
| 1 | True | False | False | False | False | True | False | False | False | True | ... | False | False | False | True | False | False | True | False | False | True |
| 2 | True | False | False | False | False | False | False | False | True | True | ... | False | False | False | True | True | True | True | False | False | True |
| 3 | False | True | False | False | False | False | False | False | True | False | ... | False | True | False | True | True | False | True | False | True | True |
| 4 | True | False | False | False | False | False | True | False | False | True | ... | False | True | False | True | False | True | True | False | False | True |


## 2.2 Association Rules

In [12]:
# Multi-Class Target
rules = get_rules(MT_db)
if type(rules) is str: print('Warning: ',rules)
else: display(rules.head())



In [15]:
#Binary Target
rules = get_rules(BT_db)
if type(rules) is str: print('Warning: ',rules)
else: display(rules.head())

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction |
|---|---|---|---|---|---|---|---|---|---|
| 42 | (statecat) | (Target) | 0.564807 | 0.5 | 0.304417 | 0.538974 | 1.077949 | 0.022013 | 1.084538 |
| 132 | (statecat, Type) | (Target) | 0.564807 | 0.5 | 0.304417 | 0.538974 | 1.077949 | 0.022013 | 1.084538 |
| 159 | (statecat, FurLength) | (Target) | 0.564807 | 0.5 | 0.304417 | 0.538974 | 1.077949 | 0.022013 | 1.084538 |
| 243 | (statecat, Type, FurLength) | (Target) | 0.564807 | 0.5 | 0.304417 | 0.538974 | 1.077949 | 0.022013 | 1.084538 |
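The metric columns follow the standard association-rule definitions:

- confidence = support(A and C) / support(A)
- lift = confidence / support(C)
- leverage = support(A and C) - support(A) * support(C)
- conviction = (1 - support(C)) / (1 - confidence)

As a sanity check, the sketch below recomputes these metrics from the three reported supports of rule 42. It also shows how the same numbers follow directly from a boolean item matrix. The helper names `metrics_from_supports` and `rule_metrics` are hypothetical.

```python
import pandas as pd

def metrics_from_supports(s_a, s_c, s_ac):
    """Standard rule metrics from antecedent, consequent and joint support."""
    confidence = s_ac / s_a
    lift = confidence / s_c
    leverage = s_ac - s_a * s_c
    conviction = (1 - s_c) / (1 - confidence) if confidence < 1 else float("inf")
    return {"confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}

def rule_metrics(items, antecedent, consequent):
    """Metrics for the rule antecedent -> consequent on a boolean item matrix."""
    a = items[list(antecedent)].all(axis=1)  # rows containing every antecedent item
    c = items[list(consequent)].all(axis=1)  # rows containing every consequent item
    return metrics_from_supports(a.mean(), c.mean(), (a & c).mean())

# Rule 42, (statecat) -> (Target), rebuilt from the rounded supports in the table
m = metrics_from_supports(0.564807, 0.5, 0.304417)
print({k: round(v, 6) for k, v in m.items()})
```

The four rules share identical antecedent support. Every transaction containing `statecat` must therefore also contain `Type` and `FurLength`, so the longer rules add no information beyond `(statecat) -> (Target)`.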


## 2.3 Association Rules - Results and Discussion 

...

## 2.4. Preprocessing Data for Clustering

Under normal circumstances, categorical data would have to be binarized (one-hot encoded) before clustering. Integer-coded categorical features do not support meaningful distance calculations.

For example, if colors are coded as integers, Euclidean distance treats color 1 as closer to color 2 than to color 3, even though the codes have no natural order.

In this case no preprocessing was needed, since our dataset contains only binary features or features on which distances are meaningful (e.g., age).
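The following small sketch makes the distance argument concrete. It compares Euclidean distances between three hypothetical color codes under integer coding and under one-hot coding.

```python
import numpy as np

# Hypothetical integer-coded colors 1, 2, 3 (one feature column)
codes = np.array([[1.0], [2.0], [3.0]])
# The same three colors one-hot encoded: one indicator column per color
onehot = np.eye(3)

def pairwise_euclidean(x):
    """Euclidean distance matrix between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

d_codes = pairwise_euclidean(codes)
d_onehot = pairwise_euclidean(onehot)
# Integer coding: color 1 looks closer to color 2 than to color 3
print(d_codes[0, 1], d_codes[0, 2])
# One-hot coding: every pair of distinct colors is equally far apart
print(d_onehot[0, 1], d_onehot[0, 2])
```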

## 2.5. Finding Groups

### 2.5.1 Multi-Class Target

#### 2.5.1.1 Hierarchical Clustering

##### Basic Training

In [3]:
#Train Cluster
# Ward linkage always uses Euclidean distance; the `affinity` argument was removed in scikit-learn 1.4
cluster = AgglomerativeClustering(n_clusters=5, linkage='ward')
cluster.fit(Bl)
#Get score
cm, metric = model_metrics(my, cluster)
print('Score: ',metric)
print('Confusion Matrix: \n', cm)

0.21552098157769733
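`model_metrics` is defined in `Task2_functions`, and its scoring rule is not shown. Cluster ids are arbitrary, so cluster 0 need not correspond to adoption-speed class 0. One common way to compare cluster labels with known classes handles this in two steps. First, build the contingency (confusion) matrix. Then match clusters to classes with the Hungarian algorithm before computing accuracy. A permutation-invariant index such as the adjusted Rand index can be reported alongside. The sketch below is an illustration under that assumption, with the hypothetical helper `matched_accuracy`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.cluster import contingency_matrix

def matched_accuracy(y_true, labels):
    """Accuracy after optimally mapping cluster ids to class ids (Hungarian method)."""
    cm = contingency_matrix(y_true, labels)  # rows: classes, columns: clusters
    rows, cols = linear_sum_assignment(cm, maximize=True)
    return cm[rows, cols].sum() / cm.sum(), cm

# Hypothetical example: the clusters recover the classes, but with permuted ids
y_true = np.array([0, 0, 1, 1, 2, 2])
labels = np.array([1, 1, 0, 0, 2, 2])
acc, cm = matched_accuracy(y_true, labels)
print(acc, adjusted_rand_score(y_true, labels))
```

With the models in this notebook, the call would be `matched_accuracy(my, cluster.labels_)` or `matched_accuracy(my, kmeans.labels_)`. Both `AgglomerativeClustering` and `KMeans` expose `labels_` after `fit`.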


In [6]:
# Takes too long to compute on the full dataset
# Plot dendrogram
# plot the top three levels of the dendrogram
plot_dendo(cluster, Bl.values, truncate_mode='level')
plt.xlabel("Number of points in node (or index of point if no parenthesis).")
plt.show()

KeyboardInterrupt: 
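Hierarchical clustering works from pairwise distances over all rows, so cost grows at least quadratically with the number of pets. One workaround for a visual check assumes that a random subsample is representative enough. It runs SciPy's Ward linkage on that subsample and draws only a few levels with `truncate_mode="level", p=3`. In the sketch below, the data is a random binary stand-in for `Bl.values`, not the real dataset.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

rng = np.random.default_rng(0)
# Hypothetical stand-in for Bl.values: 20,000 rows x 12 binary features
X_full = rng.integers(0, 2, size=(20000, 12)).astype(float)

# Ward linkage on a random subsample of 500 rows instead of all rows
idx = rng.choice(len(X_full), size=500, replace=False)
Z = linkage(X_full[idx], method="ward")

# Draw at most p=3 levels of the merge tree
fig, ax = plt.subplots(figsize=(10, 4))
tree = dendrogram(Z, truncate_mode="level", p=3, ax=ax)
ax.set_xlabel("Number of points in node (or index of point if no parenthesis).")
plt.show()
```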

##### Feature Selection

In [3]:
X = feature_selector(Bl, my)

dropping 'Type' at index: 0
dropping 'Gender' at index: 0
dropping 'FurLength' at index: 0
Remaining variables:
Index(['agecat', 'breedcat', 'namecat', 'descriptcat', 'colorcat', 'colorcat2',
       'healthcat', 'statecat', 'MaturitySizecat', 'Photocat', 'quantitycat',
       'paidcat'],
      dtype='object')
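`feature_selector` is imported from `Task1_functions` and is not shown, so its selection criterion is unknown here. Its printed messages suggest a step-wise elimination loop. The sketch below gives one hypothetical example of such a loop, which should not be read as a reconstruction of the helper. It repeatedly drops the column with the highest variance inflation factor (VIF) until all VIFs fall below an assumed threshold of 5.

```python
import numpy as np
import pandas as pd

def vif(frame):
    """VIF of each column: 1 / (1 - R^2) from regressing it on the other columns."""
    X = frame.to_numpy(dtype=float)
    out = {}
    for j, name in enumerate(frame.columns):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(X)), others])  # add an intercept term
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        ss_res = ((y - A @ coef) ** 2).sum()
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
        out[name] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(out)

def drop_high_vif(frame, threshold=5.0):
    """Drop the column with the largest VIF until every VIF is below threshold."""
    kept = frame.copy()
    while kept.shape[1] > 1:
        scores = vif(kept)
        if scores.max() < threshold:
            break
        worst = scores.idxmax()
        print(f"dropping {worst!r} (VIF={scores[worst]:.1f})")
        kept = kept.drop(columns=worst)
    return kept

# Hypothetical toy data: 'c' is almost exactly a + b
rng = np.random.default_rng(0)
a, b = rng.normal(size=200), rng.normal(size=200)
toy = pd.DataFrame({"a": a, "b": b, "c": a + b + rng.normal(scale=0.01, size=200)})
print(drop_high_vif(toy).columns.tolist())
```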


In [4]:
#Train Cluster
cluster = AgglomerativeClustering(n_clusters=5, linkage='ward')
cluster.fit(X)
#Get score
cm, metric = model_metrics(my, cluster)
print('Score: ',metric)
print('Confusion Matrix: \n', cm)

Score:  0.2270526610466766
Confusion Matrix: 
       0     1     2     3     4
0  1618  1518  1374  1125   868
1   650   587   747   923   724
2  1078   999  1018  1077  1504
3   385   623   613   610   225
4   462   344   265   304   696


##### Linkage Selection

In [6]:
#Find best linkage function
cluster, score, cm = ACparm(X, my, 5)
print('Linkage: ', cluster.linkage)
print('Affinity Metric: ', cluster.affinity)
print('Score: ', round(score, 2))
print('Confusion Matrix: \n', cm)

Linkage:  ward
Affinity Metric:  euclidean
Score:  0.23
Confusion Matrix: 
       0     1     2     3     4
0  1618  1518  1374  1125   868
1   650   587   747   923   724
2  1078   999  1018  1077  1504
3   385   623   613   610   225
4   462   344   265   304   696
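`ACparm` is also not shown. A hypothetical version of such a search fits one `AgglomerativeClustering` per linkage and keeps the best-scoring one. The sketch scores each fit with the adjusted Rand index, which is an assumption and may differ from what `model_metrics` reports. scikit-learn's `ward` linkage only accepts Euclidean distances, so for Ward the distance metric does not need to be searched.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Hypothetical stand-in for (X, my): three Gaussian blobs
X_demo, y_demo = make_blobs(n_samples=300, centers=3, random_state=0)

def compare_linkages(X, y, n_clusters, linkages=("ward", "complete", "average", "single")):
    """Fit one AgglomerativeClustering per linkage and score it against y with ARI."""
    scores = {}
    for link in linkages:
        model = AgglomerativeClustering(n_clusters=n_clusters, linkage=link)
        scores[link] = adjusted_rand_score(y, model.fit_predict(X))
    return scores

scores = compare_linkages(X_demo, y_demo, 3)
best = max(scores, key=scores.get)
print(scores, "best:", best)
```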


#### 2.5.1.2 K-Means

##### Basic Training

In [28]:
kmeans = KMeans(n_clusters=5, random_state=0)
kmeans.fit(Bl)
#Get score
cm, metric = model_metrics(my, kmeans)  # score the K-Means model, not the earlier hierarchical one
print('Score: ',metric)
print('Confusion Matrix: \n', cm)

0.21210000021079156


##### Feature Selection

In [29]:
kmeans = KMeans(n_clusters=5, random_state=0)
kmeans.fit(X)
#Get score
cm, metric = model_metrics(my, kmeans)  # score the K-Means model, not the earlier hierarchical one
print('Score: ',metric)
print('Confusion Matrix: \n', cm)

0.21502359490997358


### 2.5.2 Binary Target

#### 2.5.2.1 Hierarchical Clustering

##### Basic Training

In [30]:
#Train Cluster
# Ward linkage always uses Euclidean distance; the `affinity` argument was removed in scikit-learn 1.4
cluster = AgglomerativeClustering(n_clusters=2, linkage='ward')
cluster.fit(Bn)
#Get score
cm, metric = model_metrics(by, cluster)
print('Score: ',metric)
print('Confusion Matrix: \n', cm)

0.5248270376213008


##### Feature Selection

In [3]:
X = feature_selector(Bn, by)
#Train Cluster
cluster = AgglomerativeClustering(n_clusters=2, linkage='ward')
cluster.fit(X)
#Get score
cm, metric = model_metrics(by, cluster)
print('Score: ',metric)
print('Confusion Matrix: \n', cm)

dropping 'Type' at index: 0
dropping 'Gender' at index: 0
dropping 'FurLength' at index: 0
Remaining variables:
Index(['agecat', 'breedcat', 'namecat', 'descriptcat', 'colorcat', 'colorcat2',
       'healthcat', 'statecat', 'MaturitySizecat', 'Photocat', 'quantitycat',
       'paidcat'],
      dtype='object')
0.5185210072083354


##### Linkage Selection

In [5]:
#Find best linkage function
cluster, score, cm = ACparm(X, by, 2)
print('Linkage: ', cluster.linkage)
print('Affinity Metric: ', cluster.affinity)
print('Score: ', round(score, 2))
print('Confusion Matrix: \n', cm)

Linkage:  single
Score:  0.71
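Single linkage is known to be prone to chaining. This can put almost all points into one cluster and leave the other nearly empty. Checking how many pets fall in each cluster may therefore help before interpreting the score. The sketch below compares cluster sizes for single and Ward linkage on hypothetical overlapping stand-in data. On the real data, `np.bincount(cluster.labels_)` would give the same counts, assuming `ACparm` returns the fitted model.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

# Hypothetical overlapping stand-in data (not the adoption dataset)
X_demo, _ = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=1)

sizes = {}
for link in ("single", "ward"):
    labels = AgglomerativeClustering(n_clusters=2, linkage=link).fit_predict(X_demo)
    sizes[link] = np.bincount(labels).tolist()  # number of points per cluster
print(sizes)
```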


#### 2.5.2.2 K-Means

##### Basic Training

In [6]:
kmeans = KMeans(n_clusters=2, random_state=0)
kmeans.fit(Bn)
#Get score
cm, metric = model_metrics(by, kmeans)  # score the K-Means model, not the earlier hierarchical one
print('Score: ',metric)
print('Confusion Matrix: \n', cm)

0.528874856207255


##### Feature Selection

In [7]:
kmeans = KMeans(n_clusters=2, random_state=0)
kmeans.fit(X)
#Get score
cm, metric = model_metrics(by, kmeans)  # score the K-Means model, not the earlier hierarchical one
print('Score: ',metric)
print('Confusion Matrix: \n', cm)

0.5185335525691381


## 2.6. Clustering - Results and Discussion 

...