# Lab 4: Interesting Patterns

Team members:
- Josef Ruzicka - B87095
- Julián Solís - B97634
- Derek Suarez - B97775
- Emmanuel Zúñiga - B98729

## Loading Libraries 📚

The libraries used to search for association rules in the transactional dataset are imported below.

In [None]:
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.preprocessing import TransactionEncoder
from google.colab import drive 

## Loading the Dataset 📅

The *Pharma sales data* dataset contains drug sales recorded over a period of time. This lab analyzes the daily sales data (`salesdaily.csv`). The dataset is available at:
https://www.kaggle.com/datasets/milanzdravkovic/pharma-sales-data?select=salesdaily.csv

In [None]:
# Mount Google Drive storage
drive.mount('/content/drive')
# Read the dataset; it must be stored in a folder named "datasets" under "My Drive" in Google Drive
df = pd.read_csv('/content/drive/MyDrive/datasets/salesdaily.csv')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Once the dataset is loaded, the first 10 records are inspected.

In [None]:
df.head(10)

Unnamed: 0,datum,M01AB,M01AE,N02BA,N02BE,N05B,N05C,R03,R06,Year,Month,Hour,Weekday Name
0,1/2/2014,0.0,3.67,3.4,32.4,7.0,0.0,0.0,2.0,2014,1,248,Thursday
1,1/3/2014,8.0,4.0,4.4,50.6,16.0,0.0,20.0,4.0,2014,1,276,Friday
2,1/4/2014,2.0,1.0,6.5,61.85,10.0,0.0,9.0,1.0,2014,1,276,Saturday
3,1/5/2014,4.0,3.0,7.0,41.1,8.0,0.0,3.0,0.0,2014,1,276,Sunday
4,1/6/2014,5.0,1.0,4.5,21.7,16.0,2.0,6.0,2.0,2014,1,276,Monday
5,1/7/2014,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2014,1,276,Tuesday
6,1/8/2014,5.33,3.0,10.5,26.4,19.0,1.0,10.0,0.0,2014,1,276,Wednesday
7,1/9/2014,7.0,1.68,8.0,25.0,16.0,0.0,3.0,2.0,2014,1,276,Thursday
8,1/10/2014,5.0,2.0,2.0,53.3,15.0,2.0,0.0,2.0,2014,1,276,Friday
9,1/11/2014,5.0,4.34,10.4,52.3,14.0,0.0,1.0,0.2,2014,1,276,Saturday


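## Data Preprocessing 🧹

The data are binarized: any value greater than or equal to 1 becomes 1 and any value less than or equal to 0 becomes 0, marking the presence (1) or absence (0) of a product in a transaction. Values strictly between 0 and 1 match neither condition, so they become missing values and the corresponding rows are dropped afterwards.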

In [None]:
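def encode_units(x):
    # Zero or negative sales: the product is absent from the transaction
    if x <= 0:
        return 0
    # At least one unit sold: the product is present
    if x >= 1:
        return 1
    # Values strictly between 0 and 1 fall through and return None (NaN in the DataFrame)

# Keep only the eight sales columns (M01AB ... R06): drop the date and the last four columns
df = df.drop('datum', axis = 1)
df = df.drop(df.columns[-4:], axis=1)
# Note: DataFrame.applymap is deprecated in pandas >= 2.1 in favor of DataFrame.map
df = df.applymap(encode_units)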


In [None]:
df.head(10)

Unnamed: 0,M01AB,M01AE,N02BA,N02BE,N05B,N05C,R03,R06
0,0.0,1.0,1.0,1,1,0.0,0.0,1.0
1,1.0,1.0,1.0,1,1,0.0,1.0,1.0
2,1.0,1.0,1.0,1,1,0.0,1.0,1.0
3,1.0,1.0,1.0,1,1,0.0,1.0,0.0
4,1.0,1.0,1.0,1,1,1.0,1.0,1.0
5,0.0,0.0,0.0,0,0,0.0,0.0,0.0
6,1.0,1.0,1.0,1,1,1.0,1.0,0.0
7,1.0,1.0,1.0,1,1,0.0,1.0,1.0
8,1.0,1.0,1.0,1,1,1.0,0.0,1.0
9,1.0,1.0,1.0,1,1,0.0,1.0,


In [None]:
df = df.dropna()
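
The combined effect of the binarization and `dropna` can be sketched in plain Python. This is only an illustration, not the notebook's code: the `rows` list is hypothetical toy data, and `None` stands in for the NaN that pandas would store.

```python
def encode_units(x):
    """Same rule as the notebook's function: 0 for x <= 0, 1 for x >= 1, None otherwise."""
    if x <= 0:
        return 0
    if x >= 1:
        return 1
    return None  # values strictly between 0 and 1 have no encoding

# Hypothetical toy rows (not taken from the dataset)
rows = [
    {"R03": 1.0, "R06": 0.2},   # 0.2 has no encoding, so this row is dropped
    {"R03": 0.0, "R06": 2.0},
    {"R03": 20.0, "R06": 4.0},
]

# Plain-Python counterpart of df.applymap(encode_units)
encoded = [{col: encode_units(v) for col, v in row.items()} for row in rows]

# Plain-Python counterpart of df.dropna(): keep only rows without missing values
complete = [row for row in encoded if None not in row.values()]

print(complete)
```

A day on which any product sold a fractional quantity below 1 is therefore removed from the analysis entirely, rather than being counted as a presence or an absence.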

## Generating Association Rules 🔢


Association rules are generated for the transactional dataset using the apriori algorithm from the mlxtend.frequent_patterns module.
The minimum support threshold is set to 6%.

In [None]:
frequent_itemsets = apriori(df, min_support=0.06, use_colnames=True)
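
To make the role of `min_support` concrete, here is a minimal level-wise sketch of the Apriori idea in plain Python. It is not mlxtend's implementation; the function name `frequent_itemsets_sketch` and the toy `transactions` are hypothetical and only reuse the column names of the dataset.

```python
from itertools import combinations

def frequent_itemsets_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    current = [frozenset([item]) for item in items]  # candidates of size 1
    k = 1
    while current:
        # Keep the candidates whose support reaches the threshold
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        result.update(level)
        # Join frequent k-itemsets into candidates of size k + 1
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori pruning: every (k - 1)-subset of a candidate must be frequent
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result

# Hypothetical toy transactions (one set of sold products per day)
transactions = [
    {"M01AB", "N02BE"},
    {"M01AB", "N02BE", "N05C"},
    {"N02BE"},
    {"M01AB", "N02BE"},
]

for itemset, support in frequent_itemsets_sketch(transactions, min_support=0.5).items():
    print(sorted(itemset), support)
```

With a threshold of 0.5, `N05C` (present in one of four transactions) is discarded at the first level, so no larger itemset containing it is ever counted.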

In [None]:
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

In [None]:
rules.head(5)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(M01AE),(M01AB),0.981501,0.979445,0.974306,0.99267,1.013503,0.01298,2.804287
1,(M01AB),(M01AE),0.979445,0.981501,0.974306,0.994753,1.013503,0.01298,3.526002
2,(N02BA),(M01AB),0.963515,0.979445,0.957348,0.9936,1.014452,0.013639,3.211716
3,(M01AB),(N02BA),0.979445,0.963515,0.957348,0.97744,1.014452,0.013639,1.617223
4,(N02BE),(M01AB),0.986639,0.979445,0.979445,0.992708,1.013542,0.013086,2.818969
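
The metric columns follow from the three support values. As a check, the sketch below recomputes them for the first rule shown, (M01AE) -> (M01AB), using the standard definitions of confidence, lift, leverage and conviction. The inputs are the rounded values from the table, so the results agree only up to rounding.

```python
# Values copied from the first rule shown above: (M01AE) -> (M01AB)
antecedent_support = 0.981501
consequent_support = 0.979445
support = 0.974306

# confidence: how often the consequent appears when the antecedent does
confidence = support / antecedent_support
# lift: confidence relative to the baseline frequency of the consequent
lift = confidence / consequent_support
# leverage: observed joint support minus the support expected under independence
leverage = support - antecedent_support * consequent_support
# conviction: ratio of expected to observed frequency of the rule failing
conviction = (1 - consequent_support) / (1 - confidence)

print(confidence, lift, leverage, conviction)
```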


In [None]:
print(f"Frequency of N02BE: {df['N02BE'].sum()}")

print(f"Frequency of N05C: {df['N05C'].sum()}")

Frequency of N02BE: 1920
Frequency of N05C: 616.0
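
These counts relate to support through support = count / number of transactions. The sketch below uses the printed counts and the support of (N02BE) from the rules table (0.986639) to recover the number of transactions, which the notebook does not print; it is therefore an inferred value, not a reported one.

```python
count_n02be = 1920         # printed above
count_n05c = 616           # printed above
support_n02be = 0.986639   # antecedent support of (N02BE) in the rules table

# support = count / number_of_transactions, so the row count can be recovered
n_transactions = round(count_n02be / support_n02be)

# Support of N05C implied by its count
support_n05c = count_n05c / n_transactions

print(n_transactions, support_n05c)
```

The implied support of N05C matches the antecedent support of (N05C) reported in the confidence-based rules later in the notebook (0.316547).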


In [None]:
apriori(df, min_support=0.9, use_colnames=True)

Unnamed: 0,support,itemsets
0,0.979445,(M01AB)
1,0.981501,(M01AE)
2,0.963515,(N02BA)
3,0.986639,(N02BE)
4,0.979445,(N05B)
5,0.974306,"(M01AE, M01AB)"
6,0.957348,"(N02BA, M01AB)"
7,0.979445,"(N02BE, M01AB)"
8,0.972251,"(N05B, M01AB)"
9,0.958376,"(M01AE, N02BA)"


## Selection with Other Measures 📏

Two sets of association rules are generated below, selected by the Confidence and Conviction measures.
They are then examined in search of interesting patterns.


In [None]:
confidence_rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=1)

In [None]:
conviction_rules = association_rules(frequent_itemsets, metric="conviction", min_threshold=1)

In [None]:
print("Association rules selected by Confidence")
confidence_rules.head(5)

Association rules selected by Confidence


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(M01AB),(N02BE),0.979445,0.986639,0.979445,1.0,1.013542,0.013086,inf
1,(M01AE),(N02BE),0.981501,0.986639,0.981501,1.0,1.013542,0.013114,inf
2,(N02BA),(N02BE),0.963515,0.986639,0.963515,1.0,1.013542,0.012873,inf
3,(N05B),(N02BE),0.979445,0.986639,0.979445,1.0,1.013542,0.013086,inf
4,(N05C),(N02BE),0.316547,0.986639,0.316547,1.0,1.013542,0.004229,inf


In [None]:
print("Association rules selected by Conviction")
conviction_rules.head(5)

Association rules selected by Conviction


Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(M01AE),(M01AB),0.981501,0.979445,0.974306,0.99267,1.013503,0.01298,2.804287
1,(M01AB),(M01AE),0.979445,0.981501,0.974306,0.994753,1.013503,0.01298,3.526002
2,(N02BA),(M01AB),0.963515,0.979445,0.957348,0.9936,1.014452,0.013639,3.211716
3,(M01AB),(N02BA),0.979445,0.963515,0.957348,0.97744,1.014452,0.013639,1.617223
4,(N02BE),(M01AB),0.986639,0.979445,0.979445,0.992708,1.013542,0.013086,2.818969


Minimum and maximum 'lift' of the association rules generated with each of the measures used.

In [None]:
print(min(confidence_rules['lift']))
print(max(confidence_rules['lift']))

1.0135416666666666
1.0135416666666666
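
The identical minimum and maximum follow from the definition of lift. In the rows shown, every rule with confidence 1 has (N02BE) as its consequent, whose support is 0.986639, so:

$$
\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{confidence}(A \Rightarrow B)}{\mathrm{support}(B)} = \frac{1}{0.986639} \approx 1.013542
$$

Whenever the confidence is exactly 1, the lift depends only on the support of the consequent, regardless of the antecedent.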


In [None]:
print(min(conviction_rules['lift']))
print(max(conviction_rules['lift']))

1.000035267148651
1.0235706897434584


## Conclusions 💡

The association rules obtained with the measures used reveal neither interesting nor misleading patterns in the transactional dataset, because the 'lift' values are very close to 1 (between about 1.00004 and 1.0236 for the rules selected by conviction). A lift near 1 indicates that antecedent and consequent occur together about as often as would be expected if they were independent, so the rules carry little information.
Since lift is an important measure when searching for interesting patterns, the significance of its value must be considered when making decisions based on the transactions.
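
One simple way to quantify how close the lift values are to 1 is to measure the largest deviation from 1 in percent. The sketch below uses the minimum and maximum lift printed for the conviction-based rules; the variable names are illustrative only.

```python
# Minimum and maximum lift printed above for the conviction-based rules
lift_min = 1.000035267148651
lift_max = 1.0235706897434584

# Largest deviation from independence (lift = 1), in percent
max_deviation_pct = max(abs(lift_min - 1), abs(lift_max - 1)) * 100

print(f"Largest deviation from lift = 1: {max_deviation_pct:.2f}%")
```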