#### Case Study
##### _Clustering_
1. Cluster the data in `abalone_cluster.csv`.
2. Try the following models, each with 2 _clusters_:
 * _K-Means_
 * _Agglomerative Hierarchical Clustering_
3. Evaluate the models with the _silhouette score_.


##### _Association rules_
1. _Load_ the data from `flowers.txt`.
2. Transform the data into _one-hot encoding_ format.
3. Find the _frequent itemsets_ with _support_ above 20%.
4. Build _rules_ using the following _metrics_:
 * _Confidence_ > 0.6
 * _Lift_ > 1.5

# _Clustering_
1. Cluster the data in `abalone_cluster.csv`.

In [1]:
import pandas as pd 

data = pd.read_csv("..\\data\\input\\abalone_cluster.csv")
data.head()

Unnamed: 0,length,diameter,height,whole_weight,shucked_weight,viscera_weight,shell_weight,ring
0,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15
1,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7
2,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9
3,0.44,0.365,0.125,0.516,0.2155,0.114,0.155,10
4,0.33,0.255,0.08,0.205,0.0895,0.0395,0.055,7


2. Try the following models, each with 2 _clusters_:
 * _K-Means_
 * _Agglomerative Hierarchical Clustering_

In [2]:
# Normalize every attribute to the [0, 1] range
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
scaler.fit(data)

MinMaxScaler()

In [3]:
data_new = scaler.transform(data)
data_new

array([[0.51351351, 0.5210084 , 0.0840708 , ..., 0.1323239 , 0.14798206,
        0.5       ],
       [0.37162162, 0.35294118, 0.07964602, ..., 0.06319947, 0.06826109,
        0.21428571],
       [0.61486486, 0.61344538, 0.11946903, ..., 0.18564845, 0.2077728 ,
        0.28571429],
       ...,
       [0.70945946, 0.70588235, 0.18141593, ..., 0.37788018, 0.30543099,
        0.28571429],
       [0.74324324, 0.72268908, 0.13274336, ..., 0.34298881, 0.29347285,
        0.32142857],
       [0.85810811, 0.84033613, 0.17256637, ..., 0.49506254, 0.49177877,
        0.39285714]])

In [4]:
data_normalized = pd.DataFrame(data_new, columns=data.columns)
data_normalized.head()

Unnamed: 0,length,diameter,height,whole_weight,shucked_weight,viscera_weight,shell_weight,ring
0,0.513514,0.521008,0.084071,0.181335,0.150303,0.132324,0.147982,0.5
1,0.371622,0.352941,0.079646,0.079157,0.066241,0.063199,0.068261,0.214286
2,0.614865,0.613445,0.119469,0.239065,0.171822,0.185648,0.207773,0.285714
3,0.493243,0.521008,0.110619,0.182044,0.14425,0.14944,0.152965,0.321429
4,0.344595,0.336134,0.070796,0.071897,0.059516,0.05135,0.053313,0.214286
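Min-max scaling maps each column linearly so that its minimum becomes 0 and its maximum becomes 1, i.e. `x' = (x - min) / (max - min)`. The sketch below applies that formula in plain Python to the five `length` values shown by `data.head()`. The helper name `min_max_scale` is an assumption made for illustration, and because it only sees five rows, its results differ from the table above, where the scaler was fitted on the full dataset.

```python
# Toy sample: the `length` values from the five rows printed by data.head().
lengths = [0.455, 0.35, 0.53, 0.44, 0.33]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map everything to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled_lengths = min_max_scale(lengths)
print([round(v, 3) for v in scaled_lengths])
```

Scaling matters here because both clustering models rely on distances, so an attribute with a wide range such as `ring` would otherwise dominate the narrow-range measurements.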


In [5]:
from sklearn.cluster import AgglomerativeClustering, KMeans

agglo = AgglomerativeClustering(n_clusters=2)
kmeans = KMeans(n_clusters=2)

In [6]:
# K-Means: fit the model on the normalized data
model_kmeans = kmeans.fit(data_normalized)
model_kmeans

KMeans(n_clusters=2)

In [7]:
model_kmeans.predict(data_normalized)

array([1, 1, 1, ..., 0, 0, 0])

In [8]:
# Agglomerative Hierarchical Clustering: fit and return the cluster labels
agglo.fit_predict(data_normalized)

array([1, 1, 0, ..., 0, 0, 0], dtype=int64)
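The cluster ids returned by the two models are arbitrary: cluster `1` from K-Means need not correspond to cluster `1` from the agglomerative model. The following sketch shows one hedged way to compare two 2-cluster labelings while allowing for swapped ids. The function name `two_cluster_agreement` and the six sample labels are hypothetical and are not taken from the abalone results.

```python
def two_cluster_agreement(labels_a, labels_b):
    """Fraction of samples grouped the same way by two 2-cluster labelings.

    Labels are assumed to be 0 or 1. Because cluster ids are arbitrary, the
    match rate is also evaluated with the ids of labels_b swapped, and the
    better of the two rates is returned.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    match_rate = same / len(labels_a)
    # With exactly two ids, swapping them turns every mismatch into a match.
    return max(match_rate, 1.0 - match_rate)


# Hypothetical labels for six samples.
kmeans_labels = [1, 1, 1, 0, 0, 0]
agglo_labels = [0, 0, 1, 1, 1, 1]
agreement = two_cluster_agreement(kmeans_labels, agglo_labels)
print(round(agreement, 3))
```

For more than two clusters this shortcut no longer applies, and a permutation-invariant measure would be needed instead.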

3. Evaluate the models with the _silhouette score_.

In [9]:
from sklearn.metrics import silhouette_score

print("KMeans")
print(silhouette_score(data_normalized, model_kmeans.predict(data_normalized)))
print("\n")
print("Agglomerative Hierarchical Clustering")
print(silhouette_score(data_normalized, agglo.fit_predict(data_normalized)))

KMeans
0.49179955956504934


Agglomerative Hierarchical Clustering
0.4765689292371433
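The silhouette score is the mean of a per-sample coefficient `s = (b - a) / max(a, b)`, where `a` is the mean distance from a sample to the other members of its own cluster and `b` is the mean distance to the members of the nearest other cluster. Values close to 1 indicate compact, well-separated clusters, so the two scores above suggest K-Means separates this data slightly better. The sketch below is a plain-Python illustration of the formula on four made-up points; it is not the scikit-learn implementation, and the name `silhouette` is an assumption.

```python
import math


def silhouette(points, labels):
    """Mean silhouette coefficient using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Convention: a sample alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from the point to itself is 0, hence len(own) - 1).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Made-up points: two tight pairs that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good_score = silhouette(points, [0, 0, 1, 1])
bad_score = silhouette(points, [0, 1, 0, 1])
print(round(good_score, 3), round(bad_score, 3))
```

Grouping the nearby points together gives a high score, while the labeling that splits each pair gives a negative one.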


# _Association rules_
1. _Load_ the data from `flowers.txt`.

In [10]:
with open("..\\data\\input\\flowers.txt") as f:
    flowers= f.read()
print(flowers)

Bougenvile, Dandelion, Lavender
Orchid, Tulip, Bougenvile, Lotus
Lotus
Orchid, Tulip, Sakura, Lotus, Dandelion, Lavender
Orchid, Rose, Tulip, Bougenvile, Lavender
Sunflower, Sakura, Bougenvile, Dandelion
Jasmine, Rose, Tulip, Sunflower, Lavender
Orchid, Jasmine, Rose, Sunflower, Sakura, Bougenvile, Lotus, Dandelion
Orchid, Tulip, Sunflower, Lotus, Lavender
Orchid, Jasmine, Sunflower, Bougenvile
Orchid, Jasmine, Tulip, Sunflower, Bougenvile, Lotus, Lavender
Orchid, Jasmine, Rose, Dandelion
Jasmine, Rose, Sunflower, Bougenvile, Lotus, Lavender
Orchid, Jasmine, Tulip, Sakura, Lavender
Jasmine, Rose, Lotus, Dandelion
Jasmine, Tulip, Sunflower, Sakura, Dandelion, Lavender
Orchid, Jasmine, Sunflower, Sakura, Lavender
Jasmine, Rose, Sunflower, Sakura, Bougenvile, Lotus, Dandelion
Tulip, Sakura, Bougenvile, Lotus, Dandelion
Sakura, Lotus, Dandelion, Lavender
Jasmine, Rose, Tulip, Lotus, Dandelion
Jasmine, Tulip, Sunflower, Bougenvile, Lavender
Jasmine, Sunflower, Bougenvile, Lavender
Orchid, T

In [11]:
flowers = flowers.split("\n")
flowers = [flw.split(",") for flw in flowers]
flowers

[['Bougenvile', ' Dandelion', ' Lavender'],
 ['Orchid', ' Tulip', ' Bougenvile', ' Lotus'],
 ['Lotus'],
 ['Orchid', ' Tulip', ' Sakura', ' Lotus', ' Dandelion', ' Lavender'],
 ['Orchid', ' Rose', ' Tulip', ' Bougenvile', ' Lavender'],
 ['Sunflower', ' Sakura', ' Bougenvile', ' Dandelion'],
 ['Jasmine', ' Rose', ' Tulip', ' Sunflower', ' Lavender'],
 ['Orchid',
  ' Jasmine',
  ' Rose',
  ' Sunflower',
  ' Sakura',
  ' Bougenvile',
  ' Lotus',
  ' Dandelion'],
 ['Orchid', ' Tulip', ' Sunflower', ' Lotus', ' Lavender'],
 ['Orchid', ' Jasmine', ' Sunflower', ' Bougenvile'],
 ['Orchid',
  ' Jasmine',
  ' Tulip',
  ' Sunflower',
  ' Bougenvile',
  ' Lotus',
  ' Lavender'],
 ['Orchid', ' Jasmine', ' Rose', ' Dandelion'],
 ['Jasmine', ' Rose', ' Sunflower', ' Bougenvile', ' Lotus', ' Lavender'],
 ['Orchid', ' Jasmine', ' Tulip', ' Sakura', ' Lavender'],
 ['Jasmine', ' Rose', ' Lotus', ' Dandelion'],
 ['Jasmine', ' Tulip', ' Sunflower', ' Sakura', ' Dandelion', ' Lavender'],
 ['Orchid', ' Jasmi
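Splitting on `","` alone keeps the space that follows each comma, so `' Jasmine'` and `'Jasmine'` are treated as different items. This is why the itemsets further down contain names both with and without a leading space, and why their supports are split between the two spellings. A minimal sketch of whitespace-tolerant parsing follows, using only the first three lines of the file as a sample; the names `raw` and `transactions` are illustrative.

```python
# First three lines of flowers.txt as printed above.
raw = "Bougenvile, Dandelion, Lavender\nOrchid, Tulip, Bougenvile, Lotus\nLotus"

transactions = [
    # Strip surrounding whitespace from every item and drop empty strings.
    [item.strip() for item in line.split(",") if item.strip()]
    for line in raw.splitlines()
    if line.strip()  # skip blank lines such as a trailing newline
]
print(transactions)
# [['Bougenvile', 'Dandelion', 'Lavender'], ['Orchid', 'Tulip', 'Bougenvile', 'Lotus'], ['Lotus']]
```

Parsing the file this way would change the one-hot columns and every support, confidence, and lift value reported in the remaining steps.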

2. Transform the data into _one-hot encoding_ format.

In [13]:
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()

flowers_mlb = mlb.fit_transform(flowers)
columns_name = mlb.classes_

data = pd.DataFrame(flowers_mlb, columns=columns_name)
data

Unnamed: 0,Bougenvile,Dandelion,Jasmine,Lavender,Lotus,Rose,Sakura,Sunflower,Tulip,Bougenvile.1,Jasmine.1,Lotus.1,Orchid,Rose.1,Sakura.1,Sunflower.1,Tulip.1
0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0
1,1,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
3,0,1,0,1,1,0,1,0,1,0,0,0,1,0,0,0,0
4,1,0,0,1,0,1,0,0,1,0,0,0,1,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
70,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0
71,0,1,0,0,1,1,1,0,0,0,1,0,0,0,0,0,0
72,1,0,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0
73,0,1,0,0,0,0,1,1,0,0,0,0,1,0,0,0,0


3. Find the _frequent itemsets_ with _support_ above 20%.

In [14]:
from mlxtend.frequent_patterns import apriori

frequent_flowers = apriori(data, min_support= 0.2, use_colnames= True)
frequent_flowers

Unnamed: 0,support,itemsets
0,0.493333,( Bougenvile)
1,0.546667,( Dandelion)
2,0.28,( Jasmine)
3,0.453333,( Lavender)
4,0.466667,( Lotus)
5,0.346667,( Rose)
6,0.546667,( Sakura)
7,0.48,( Sunflower)
8,0.48,( Tulip)
9,0.226667,(Jasmine)
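The support of an itemset is the fraction of transactions that contain every item in it. As a rough illustration of what the `support` column measures, the sketch below computes it by hand on the first five transactions of `flowers.txt`, with item names stripped of whitespace. The `support` helper and the 0.5 threshold are assumptions chosen for this tiny sample, not the 20% used above.

```python
# First five transactions of flowers.txt, with item names stripped.
transactions = [
    {"Bougenvile", "Dandelion", "Lavender"},
    {"Orchid", "Tulip", "Bougenvile", "Lotus"},
    {"Lotus"},
    {"Orchid", "Tulip", "Sakura", "Lotus", "Dandelion", "Lavender"},
    {"Orchid", "Rose", "Tulip", "Bougenvile", "Lavender"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


# Single items that reach the illustrative threshold.
min_support = 0.5
all_items = sorted(set().union(*transactions))
frequent_items = [i for i in all_items if support({i}, transactions) >= min_support]

pair_support = support({"Orchid", "Tulip"}, transactions)
print(frequent_items)
print(round(pair_support, 2))
```

Apriori builds on the same quantity: it only extends itemsets whose support already meets the threshold, since adding an item can never increase support.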


4. Build _rules_ using the following _metrics_:
 * _Confidence_ > 0.6
 * _Lift_ > 1.5

In [18]:
# Rules with a confidence of at least 0.6
from mlxtend.frequent_patterns import association_rules

rules_conf = association_rules(frequent_flowers, metric="confidence", min_threshold=0.6)
rules_conf

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,( Rose),( Dandelion),0.346667,0.546667,0.253333,0.730769,1.336773,0.063822,1.68381
1,( Sakura),( Dandelion),0.546667,0.546667,0.346667,0.634146,1.160024,0.047822,1.239111
2,( Dandelion),( Sakura),0.546667,0.546667,0.346667,0.634146,1.160024,0.047822,1.239111
3,( Jasmine),(Orchid),0.28,0.506667,0.28,1.0,1.973684,0.138133,inf
4,( Rose),( Lotus),0.346667,0.466667,0.24,0.692308,1.483516,0.078222,1.733333
5,( Lotus),( Tulip),0.466667,0.48,0.28,0.6,1.25,0.056,1.3
6,( Lotus),(Orchid),0.466667,0.506667,0.28,0.6,1.184211,0.043556,1.233333
7,( Rose),(Orchid),0.346667,0.506667,0.253333,0.730769,1.442308,0.077689,1.832381
8,( Sunflower),(Orchid),0.48,0.506667,0.32,0.666667,1.315789,0.0768,1.48
9,(Orchid),( Sunflower),0.506667,0.48,0.32,0.631579,1.315789,0.0768,1.411429
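The metric columns can be reproduced from the three support values alone. The sketch below recomputes them for the first row of the table (Rose as antecedent, Dandelion as consequent) using the standard definitions; the variable names are illustrative, and because the inputs are the rounded values printed above, the results match the table only to about four decimal places.

```python
# Supports copied from the first row of the rules table (rounded to 6 decimals).
support_a = 0.346667   # antecedent support: Rose
support_c = 0.546667   # consequent support: Dandelion
support_ac = 0.253333  # support of antecedent and consequent together

# confidence: how often the consequent appears when the antecedent does.
confidence = support_ac / support_a
# lift: confidence relative to the consequent's baseline frequency.
lift = confidence / support_c
# leverage: observed joint support minus the support expected under independence.
leverage = support_ac - support_a * support_c
# conviction: how much more often the rule would fail under independence.
conviction = (1 - support_c) / (1 - confidence)

print(round(confidence, 4), round(lift, 4), round(leverage, 4), round(conviction, 4))
```

A lift above 1 means the antecedent makes the consequent more likely than its baseline, and a confidence of exactly 1 makes the conviction denominator zero, which is why one row reports `inf`.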


In [19]:
# Rules with a lift of at least 1.5

rules_lift = association_rules(frequent_flowers, metric="lift", min_threshold=1.5)
rules_lift

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,( Jasmine),(Orchid),0.28,0.506667,0.28,1.0,1.973684,0.138133,inf
1,(Orchid),( Jasmine),0.506667,0.28,0.28,0.552632,1.973684,0.138133,1.609412
2,"( Rose, Dandelion)",(Orchid),0.253333,0.506667,0.2,0.789474,1.558172,0.071644,2.343333
3,"(Orchid, Dandelion)",( Rose),0.293333,0.346667,0.2,0.681818,1.966783,0.098311,2.053333
4,( Rose),"(Orchid, Dandelion)",0.346667,0.293333,0.2,0.576923,1.966783,0.098311,1.670303
5,(Orchid),"( Rose, Dandelion)",0.506667,0.253333,0.2,0.394737,1.558172,0.071644,1.233623
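The two tables apply each threshold separately, and both include rules that sit exactly on the threshold (for example, rules with a confidence of 0.6 appear in the confidence table). If the task is read as requiring both conditions strictly, the rules can be filtered once more. The sketch below does this in plain Python on the six rows of the lift table; the dictionary layout is an assumption, and item names are written without the leading space for readability.

```python
# Rows copied from the lift table: antecedents, consequents, confidence, lift.
rules = [
    {"antecedents": ("Jasmine",), "consequents": ("Orchid",), "confidence": 1.0, "lift": 1.973684},
    {"antecedents": ("Orchid",), "consequents": ("Jasmine",), "confidence": 0.552632, "lift": 1.973684},
    {"antecedents": ("Rose", "Dandelion"), "consequents": ("Orchid",), "confidence": 0.789474, "lift": 1.558172},
    {"antecedents": ("Orchid", "Dandelion"), "consequents": ("Rose",), "confidence": 0.681818, "lift": 1.966783},
    {"antecedents": ("Rose",), "consequents": ("Orchid", "Dandelion"), "confidence": 0.576923, "lift": 1.966783},
    {"antecedents": ("Orchid",), "consequents": ("Rose", "Dandelion"), "confidence": 0.394737, "lift": 1.558172},
]

# Keep only the rules that strictly exceed both thresholds.
strong_rules = [r for r in rules if r["confidence"] > 0.6 and r["lift"] > 1.5]

for rule in strong_rules:
    print(rule["antecedents"], "->", rule["consequents"])
```

On the `rules_lift` DataFrame itself, the same selection can be written as a boolean mask: `rules_lift[(rules_lift["confidence"] > 0.6) & (rules_lift["lift"] > 1.5)]`.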
