
#Data preprocessing
The apriori function expects data in a "one-hot encoded" format. The TransactionEncoder converts transactions, each given as a list of items, into a format that the Apriori implementation in MLxtend can process.

In [1]:
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)  # fit() learns the distinct items; transform() marks each item True/False per transaction
df = pd.DataFrame(te_ary, columns=te.columns_)  # convert the NumPy array into a DataFrame
df

Unnamed: 0,Apple,Corn,Dill,Eggs,Ice cream,Kidney Beans,Milk,Nutmeg,Onion,Unicorn,Yogurt
0,False,False,False,True,False,True,True,True,True,False,True
1,False,False,True,True,False,True,False,True,True,False,True
2,True,False,False,True,False,True,True,False,False,False,False
3,False,True,False,False,False,True,True,False,False,True,True
4,False,True,False,True,True,True,False,False,True,False,False
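To make explicit what the encoded table contains, here is a minimal pure-Python sketch of the same one-hot encoding. It is not how TransactionEncoder is implemented; the helper name `one_hot_encode` is hypothetical and exists only for this illustration.

```python
# Illustrative sketch only: `one_hot_encode` is a hypothetical helper,
# not part of MLxtend.
def one_hot_encode(transactions):
    """Return (columns, rows): sorted item names and one boolean row per transaction."""
    # Collect every distinct item and sort it to get a stable column order.
    columns = sorted({item for transaction in transactions for item in transaction})
    # For each transaction, record whether each column's item is present.
    rows = [[item in transaction for item in columns] for transaction in transactions]
    return columns, rows


dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

columns, rows = one_hot_encode(dataset)
print(columns)
print(rows[0])
```

Each row has one boolean per distinct item, which is the shape the encoded DataFrame above has.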


Alternatively, the data may come as nominal attributes. In that case, we can use the *get_dummies* function provided by pandas to generate the "one-hot encoded" representation. We tell get_dummies which columns to transform (*columns=*) and which prefix the new columns will carry (*prefix=*).

In [2]:
# The columns to encode are named explicitly (columns= in the next cell) because we do not always want to encode every column, e.g. in classification.
brand = pd.Series(['Ferrari', 'Honda', 'Fiat'])
price = pd.Series(['high', 'medium', 'low'])
df_brand = pd.DataFrame({'Brand': brand, 'Price': price})
df_brand

Unnamed: 0,Brand,Price
0,Ferrari,high
1,Honda,medium
2,Fiat,low


In [3]:
df_brand_dummies = pd.get_dummies(df_brand, prefix=['brand', 'price'], columns=['Brand', 'Price'])  # convert to one-hot format
df_brand_dummies

Unnamed: 0,brand_Ferrari,brand_Fiat,brand_Honda,price_high,price_low,price_medium
0,1,0,0,1,0,0
1,0,0,1,0,0,1
2,0,1,0,0,1,0
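The output above shows 0/1 indicators, but depending on the pandas version get_dummies may return booleans instead. The following is a minimal sketch, assuming a pandas version that supports the `dtype=` argument of `get_dummies`, that requests boolean columns explicitly so the result matches the True/False format produced by TransactionEncoder. The variable name `df_brand_bool` is introduced only for this example.

```python
import pandas as pd

# Same toy data as in the cell above.
df_brand = pd.DataFrame({'Brand': ['Ferrari', 'Honda', 'Fiat'],
                         'Price': ['high', 'medium', 'low']})

# dtype=bool asks for True/False indicator columns regardless of the
# version-dependent default.
df_brand_bool = pd.get_dummies(df_brand,
                               columns=['Brand', 'Price'],
                               prefix=['brand', 'price'],
                               dtype=bool)
print(df_brand_bool.dtypes)
print(df_brand_bool)
```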


#Frequent itemset extraction
We continue with the first dataset. To extract frequent itemsets we use the *apriori()* function, passing it the data to process and the minimum support.

In [4]:
from mlxtend.frequent_patterns import apriori

apriori(df, min_support=0.6)

Unnamed: 0,support,itemsets
0,0.8,(3)
1,1.0,(5)
2,0.6,(6)
3,0.6,(8)
4,0.6,(10)
5,0.8,"(3, 5)"
6,0.6,"(8, 3)"
7,0.6,"(5, 6)"
8,0.6,"(8, 5)"
9,0.6,"(10, 5)"
10,0.6,"(8, 3, 5)"


To display item names instead of column indices, we pass *use_colnames=True*.

In [5]:
apriori(df, min_support=0.6, use_colnames=True)

Unnamed: 0,support,itemsets
0,0.8,(Eggs)
1,1.0,(Kidney Beans)
2,0.6,(Milk)
3,0.6,(Onion)
4,0.6,(Yogurt)
5,0.8,"(Eggs, Kidney Beans)"
6,0.6,"(Eggs, Onion)"
7,0.6,"(Milk, Kidney Beans)"
8,0.6,"(Kidney Beans, Onion)"
9,0.6,"(Yogurt, Kidney Beans)"
10,0.6,"(Eggs, Kidney Beans, Onion)"
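The support values in this table can be checked by hand: the support of an itemset is the fraction of transactions that contain all of its items. The sketch below recomputes them by exhaustive enumeration in pure Python. It is an illustration of the definition, not the Apriori algorithm itself, which avoids enumerating every candidate; the function name `support` and the dictionary `frequent` are names chosen for this example.

```python
from itertools import combinations

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]
transactions = [set(t) for t in dataset]


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return hits / len(transactions)


items = sorted(set().union(*transactions))
min_support = 0.6

# Exhaustive enumeration: acceptable for 11 items, but not what Apriori does.
# Apriori skips candidates that have an infrequent subset.
frequent = {}
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        s = support(candidate)
        if s >= min_support:
            frequent[frozenset(candidate)] = s

for itemset, s in frequent.items():
    print(sorted(itemset), s)
```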


#Selecting and filtering frequent itemsets
We can analyze and filter the frequent itemsets. Since the set of frequent itemsets is a pandas DataFrame, we can use pandas functionality to filter the results.
First, we add a new column that stores the length of each itemset.

In [6]:
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)  # number of items in each itemset
frequent_itemsets

Unnamed: 0,support,itemsets,length
0,0.8,(Eggs),1
1,1.0,(Kidney Beans),1
2,0.6,(Milk),1
3,0.6,(Onion),1
4,0.6,(Yogurt),1
5,0.8,"(Eggs, Kidney Beans)",2
6,0.6,"(Eggs, Onion)",2
7,0.6,"(Milk, Kidney Beans)",2
8,0.6,"(Kidney Beans, Onion)",2
9,0.6,"(Yogurt, Kidney Beans)",2
10,0.6,"(Eggs, Kidney Beans, Onion)",3


We then select the itemsets that satisfy a given criterion, here those of length 3 with a support of at least 0.6.

In [7]:
frequent_itemsets[ (frequent_itemsets['length'] == 3) &
                   (frequent_itemsets['support'] >= 0.6) ]

Unnamed: 0,support,itemsets,length
10,0.6,"(Eggs, Kidney Beans, Onion)",3


Similarly, we can select itemsets by the items they contain. Comparing against a set selects the itemset made up of exactly those items.

In [8]:
frequent_itemsets[ frequent_itemsets['itemsets'] == {'Onion', 'Eggs'} ]

Unnamed: 0,support,itemsets,length
6,0.6,"(Eggs, Onion)",2
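The equality filter above does not depend on the order in which the items are written. A short sketch of why, assuming (as the MLxtend documentation states) that each entry of the itemsets column is a Python frozenset; the variable names are chosen for this example:

```python
# Assumption: one entry of the 'itemsets' column, represented as a frozenset.
itemset = frozenset(['Eggs', 'Onion'])

same_items = itemset == {'Onion', 'Eggs'}      # same elements, different order
subset_only = itemset == {'Eggs'}              # a proper subset is not equal
tuple_compare = itemset == ('Eggs', 'Onion')   # a tuple never equals a set

print(same_items, subset_only, tuple_compare)  # True False False
```

Equality therefore matches only the itemset with exactly those items; itemsets that merely include them need a membership or subset test.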


or, using a lambda function, select every itemset that contains a given item

In [9]:
frequent_itemsets[ frequent_itemsets['itemsets'].apply(lambda x: 'Onion' in x)]

Unnamed: 0,support,itemsets,length
3,0.6,(Onion),1
6,0.6,"(Eggs, Onion)",2
8,0.6,"(Kidney Beans, Onion)",2
10,0.6,"(Eggs, Kidney Beans, Onion)",3
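The same idea extends to several required items at once by using a subset test. The sketch below works on a plain list of frozensets so that it runs without pandas; `contains_all` is a hypothetical helper written for this example, not an MLxtend or pandas function.

```python
# Hypothetical helper for illustration; not part of MLxtend or pandas.
def contains_all(required):
    """Build a predicate that is True for itemsets including every required item."""
    required = frozenset(required)
    return lambda itemset: required <= itemset  # subset test


# The four itemsets that contain 'Onion', taken from the table above.
itemsets = [frozenset(['Onion']),
            frozenset(['Eggs', 'Onion']),
            frozenset(['Kidney Beans', 'Onion']),
            frozenset(['Eggs', 'Kidney Beans', 'Onion'])]

has_eggs_and_onion = contains_all(['Eggs', 'Onion'])
selected = [s for s in itemsets if has_eggs_and_onion(s)]
print([sorted(s) for s in selected])
```

With the DataFrame, the same predicate could be passed to apply, for example `frequent_itemsets[frequent_itemsets['itemsets'].apply(contains_all(['Eggs', 'Onion']))]`.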


#Generating association rules from the frequent itemsets
The *association_rules* function generates a set of association rules from a set of frequent itemsets. To do so, we specify the metric we want to use and its minimum threshold.

In [10]:
from mlxtend.frequent_patterns import association_rules

# use a minimum confidence of 0.7
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(Eggs),(Kidney Beans),0.8,1.0,0.8,1.0,1.0,0.0,inf
1,(Kidney Beans),(Eggs),1.0,0.8,0.8,0.8,1.0,0.0,1.0
2,(Eggs),(Onion),0.8,0.6,0.6,0.75,1.25,0.12,1.6
3,(Onion),(Eggs),0.6,0.8,0.6,1.0,1.25,0.12,inf
4,(Milk),(Kidney Beans),0.6,1.0,0.6,1.0,1.0,0.0,inf
5,(Onion),(Kidney Beans),0.6,1.0,0.6,1.0,1.0,0.0,inf
6,(Yogurt),(Kidney Beans),0.6,1.0,0.6,1.0,1.0,0.0,inf
7,"(Eggs, Kidney Beans)",(Onion),0.8,0.6,0.6,0.75,1.25,0.12,1.6
8,"(Eggs, Onion)",(Kidney Beans),0.6,1.0,0.6,1.0,1.0,0.0,inf
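9,"(Kidney Beans, Onion)",(Eggs),0.6,0.8,0.6,1.0,1.25,0.12,inf
10,(Eggs),"(Kidney Beans, Onion)",0.8,0.6,0.6,0.75,1.25,0.12,1.6
11,(Onion),"(Eggs, Kidney Beans)",0.6,0.8,0.6,1.0,1.25,0.12,inf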
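The metric columns can be reproduced by hand from the supports. The sketch below uses the standard definitions (confidence = support(A ∪ C) / support(A), lift = confidence / support(C), leverage = support(A ∪ C) − support(A)·support(C), conviction = (1 − support(C)) / (1 − confidence)) and exact fractions to avoid rounding noise. The function names `support` and `rule_metrics` are chosen for this example and are not MLxtend functions.

```python
from fractions import Fraction

dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]
transactions = [set(t) for t in dataset]


def support(items):
    """Exact fraction of transactions containing every item in `items`."""
    hits = sum(1 for t in transactions if set(items) <= t)
    return Fraction(hits, len(transactions))


def rule_metrics(antecedent, consequent):
    """Metrics for the rule antecedent -> consequent (illustrative helper)."""
    s_a = support(antecedent)
    s_c = support(consequent)
    s_ac = support(set(antecedent) | set(consequent))
    confidence = s_ac / s_a
    lift = confidence / s_c
    leverage = s_ac - s_a * s_c
    # Conviction divides by (1 - confidence); it is reported as infinity
    # when the confidence is exactly 1.
    conviction = float('inf') if confidence == 1 else (1 - s_c) / (1 - confidence)
    return {'confidence': confidence, 'lift': lift,
            'leverage': leverage, 'conviction': conviction}


m = rule_metrics(['Eggs'], ['Onion'])
print({name: float(value) for name, value in m.items()})
```

For (Eggs) → (Onion) this gives confidence 0.75, lift 1.25, leverage 0.12 and conviction 1.6, matching row 2 of the table.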


#Post-processing of association rules
We add a column with the length of the antecedent.

In [11]:
rules["antecedent_len"] = rules["antecedents"].apply(len)  # number of items in each antecedent
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
0,(Eggs),(Kidney Beans),0.8,1.0,0.8,1.0,1.0,0.0,inf,1
1,(Kidney Beans),(Eggs),1.0,0.8,0.8,0.8,1.0,0.0,1.0,1
2,(Eggs),(Onion),0.8,0.6,0.6,0.75,1.25,0.12,1.6,1
3,(Onion),(Eggs),0.6,0.8,0.6,1.0,1.25,0.12,inf,1
4,(Milk),(Kidney Beans),0.6,1.0,0.6,1.0,1.0,0.0,inf,1
5,(Onion),(Kidney Beans),0.6,1.0,0.6,1.0,1.0,0.0,inf,1
6,(Yogurt),(Kidney Beans),0.6,1.0,0.6,1.0,1.0,0.0,inf,1
7,"(Eggs, Kidney Beans)",(Onion),0.8,0.6,0.6,0.75,1.25,0.12,1.6,2
8,"(Eggs, Onion)",(Kidney Beans),0.6,1.0,0.6,1.0,1.0,0.0,inf,2
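9,"(Kidney Beans, Onion)",(Eggs),0.6,0.8,0.6,1.0,1.25,0.12,inf,2
10,(Eggs),"(Kidney Beans, Onion)",0.8,0.6,0.6,0.75,1.25,0.12,1.6,1
11,(Onion),"(Eggs, Kidney Beans)",0.6,0.8,0.6,1.0,1.25,0.12,inf,1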


We keep the rules that have at least two items in their antecedent, a confidence greater than 0.75 and a lift greater than 1.2.

In [12]:
rules[ (rules['antecedent_len'] >= 2) &
       (rules['confidence'] > 0.75) &
       (rules['lift'] > 1.2) ]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
9,"(Kidney Beans, Onion)",(Eggs),0.6,0.8,0.6,1.0,1.25,0.12,inf,2
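Rule 7 is displayed with a confidence of 0.75 and does not appear in this result. A strict `>` excludes a value of exactly 0.75, but even `>=` may exclude it, because a confidence computed as a ratio of floating-point supports can fall just below the displayed value. The sketch below shows the effect with plain Python floats. Whether the value stored by MLxtend is exactly this float is an assumption, since the table only shows rounded numbers; `at_least` is a hypothetical helper for this example.

```python
import math

# support(antecedent and consequent) / support(antecedent) for rule 7,
# computed with ordinary floats.
confidence = 0.6 / 0.8

print(confidence >= 0.75)              # False: the ratio is slightly below 0.75
print(math.isclose(confidence, 0.75))  # True: equal within the default tolerance


def at_least(value, threshold):
    """Tolerant 'greater than or equal' comparison (illustrative helper)."""
    return value > threshold or math.isclose(value, threshold)


print(at_least(confidence, 0.75))      # True
```

When a boundary value should be kept, a tolerant comparison of this kind is a safer choice than a bare `>=`.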


Finally, just as we did with the frequent itemsets, we can filter the rules by the items in their antecedent or consequent.

In [13]:
# the antecedent consists of exactly 'Eggs' and 'Kidney Beans'
rules[rules['antecedents'] == {'Eggs', 'Kidney Beans'}]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
7,"(Eggs, Kidney Beans)",(Onion),0.8,0.6,0.6,0.75,1.25,0.12,1.6,2


In [14]:
# the consequent contains 'Onion'
rules[rules['consequents'].apply(lambda x: 'Onion' in x)]

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,antecedent_len
2,(Eggs),(Onion),0.8,0.6,0.6,0.75,1.25,0.12,1.6,1
7,"(Eggs, Kidney Beans)",(Onion),0.8,0.6,0.6,0.75,1.25,0.12,1.6,2
10,(Eggs),"(Kidney Beans, Onion)",0.8,0.6,0.6,0.75,1.25,0.12,1.6,1
