# Association rules

Association rules are a fundamental data-mining technique for discovering interesting relationships between variables in large datasets. Originally developed to analyze supermarket shopping baskets, they identify patterns of the form "if A, then B" (A → B), where A is the antecedent and B is the consequent. The strength of a rule is measured with three main metrics: support (how often the itemset {A, B} appears), confidence (the conditional probability of B given A), and lift (how much more frequent B is when A is present than it is overall). The best-known algorithm for finding these rules is Apriori, which works level by level to generate frequent itemsets and then derives the association rules from them.
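To make the three metrics concrete, here is a minimal sketch that computes them by hand. The toy baskets, the item names, and the helper names `support` and `rule_metrics` are assumptions made for illustration; they are not part of the notebook's data or of any library.

```python
# Toy transactions; the item names are made up for illustration.
transactions = [
    {"whisky", "cola"},
    {"whisky", "cola", "ice"},
    {"whisky", "ice"},
    {"cola"},
    {"vodka", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    s_ab = support(set(antecedent) | set(consequent), transactions)
    s_a = support(antecedent, transactions)
    s_b = support(consequent, transactions)
    confidence = s_ab / s_a      # P(B | A)
    lift = confidence / s_b      # P(B | A) / P(B)
    return s_ab, confidence, lift

s, c, l = rule_metrics({"whisky"}, {"cola"}, transactions)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
# support=0.40 confidence=0.67 lift=0.83
```

In this toy data the lift is below 1, so buying whisky makes cola slightly less likely than it is overall, even though the confidence looks reasonably high. This is why confidence alone can be misleading when the consequent is very common.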

## Installing Python packages

In [7]:
%%capture
!pip install mlxtend psycopg2-binary 

## Reading data from the data lake

We start by importing some useful functions from PySpark. PySpark is used to read the files because it has built-in support for partitioned datasets and can filter by partition, which improves read performance.

In [8]:
from pyspark.sql import SparkSession
from pyspark.sql.functions import year, month, day
from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType

Next, the event files are read. Only events that ended in a transaction are kept.

In [9]:
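spark = SparkSession.builder.appName("filtroEventos").config("spark.jars.packages", "org.postgresql:postgresql:42.6.0").getOrCreate()
path = "parquet_transformado/eventos"
evento_nombre = 'transaction'
# Default to the base path so final_path is always defined, even without an event filter.
final_path = path
if evento_nombre is not None:
    final_path = f"{path}/event={evento_nombre}"
# basePath tells Spark where partition discovery starts, so the `event` column is kept.
data = spark.read.option("header", True)\
    .option("inferSchema", "true") \
    .option("basePath", path) \
    .csv(final_path)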

In [10]:
data.show(5)

+-------------+---------+------+-------------+-----------+
|    timestamp|visitorid|itemid|transactionid|      event|
+-------------+---------+------+-------------+-----------+
|1433222276276|   599528|356475|       4000.0|transaction|
|1433193500981|   121688| 15335|      11117.0|transaction|
|1433193915008|   552148| 81345|       5444.0|transaction|
|1433176736375|   102019|150318|      13556.0|transaction|
|1433174518180|   189384|310791|       7244.0|transaction|
+-------------+---------+------+-------------+-----------+
only showing top 5 rows



Next, the product files are read. Filters by category or brand could be applied here; in this scenario all categories and all brands are read.

In [11]:
spark = SparkSession.builder.appName("filtroProductos").config("spark.jars.packages", "org.postgresql:postgresql:42.6.0").getOrCreate()
path = "parquet_transformado/productos"
categoria_id = '*'
marca_id = '*'
# Start from the base path so final_path is defined even if categoria_id is None.
final_path = path
if categoria_id is not None:
    final_path = f"{path}/categoria_id={categoria_id}"
if marca_id is not None:
    final_path = f"{final_path}/marca_id={marca_id}"
data_productos = spark.read.option("header", True)\
    .option("inferSchema", "true") \
    .option("basePath", path) \
    .csv(final_path)
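The path-building logic in the two read cells can be factored into a small helper. The sketch below is only an illustration: `build_partition_path` is a hypothetical name, and it assumes the directory layout nests one `key=value` folder per partition column in the order given. A missing filter becomes the glob `*`, so the path still has one segment per partition level.

```python
def build_partition_path(base_path, **partitions):
    """Append one `key=value` segment per partition; None means "all values" (`*`)."""
    segments = [
        f"{key}={'*' if value is None else value}"
        for key, value in partitions.items()
    ]
    return "/".join([base_path, *segments])

# All categories and all brands, as in the cell above.
print(build_partition_path("parquet_transformado/productos", categoria_id="*", marca_id="*"))
# A single category, any brand.
print(build_partition_path("parquet_transformado/productos", categoria_id=9, marca_id=None))
```

Keyword arguments keep their order in Python 3.7+, so the segments come out in the order the partitions are passed.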

In [12]:
data_productos.show(5)

+------+-------------+-------+------+--------+---------+------------+--------+
|itemid|       nombre|volumen|precio|   marca|categoria|categoria_id|marca_id|
+------+-------------+-------+------+--------+---------+------------+--------+
|410389|Generic Drink|    750| 22.49|generico| generico|           0|       0|
|196180|Generic Drink|    300| 11.03|generico| generico|           0|       0|
|   199|Generic Drink|    200|  7.08|generico| generico|           0|       0|
| 40702|Generic Drink|    750| 98.99|generico| generico|           0|       0|
| 12728|Generic Drink|    300| 11.03|generico| generico|           0|       0|
+------+-------------+-------+------+--------+---------+------------+--------+
only showing top 5 rows



To manipulate the data, the PySpark DataFrames are converted to pandas, since the data volume is small.

In [13]:
df_productos = data_productos.toPandas()
df = data.toPandas()



Before continuing, both datasets must be joined on **itemid**.

In [14]:
df.set_index('timestamp', inplace=True)
data_merged = df.join(
    df_productos.reset_index().set_index('itemid'), 
    on='itemid', 
    how='left'
)

In [15]:
data_merged.head()

timestamp,visitorid,itemid,transactionid,event,index,nombre,volumen,precio,marca,categoria,categoria_id,marca_id
1433222276276,599528,356475,4000.0,transaction,4009,Crown Royal Honey,750,22.49,Diageo Americas,CANADIAN WHISKIES,9,1
1433222276276,599528,356475,4000.0,transaction,6621,Dekuyper Sour Peach Pucker,200,2.27,Jim Beam Brands,PEACH SCHNAPPS,74,23
1433193500981,121688,15335,11117.0,transaction,4010,Crown Royal Regal Apple Mini,300,11.03,Diageo Americas,CANADIAN WHISKIES,9,1
1433193915008,552148,81345,5444.0,transaction,4011,Crown Royal Regal Apple,200,7.08,Diageo Americas,CANADIAN WHISKIES,9,1
1433176736375,102019,150318,13556.0,transaction,4012,Crown Royal Xr Canadian Whiskey,750,98.99,Diageo Americas,CANADIAN WHISKIES,9,1
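In the output above, the event for item 356475 appears twice with two different products, which suggests that `itemid` is not unique in the products table. The sketch below reproduces that effect on made-up data and shows one way to detect it with the `validate` argument of `DataFrame.merge`. The toy frames and the variable names are assumptions for illustration only.

```python
import pandas as pd

events = pd.DataFrame({"itemid": [1, 2], "transactionid": [100, 101]})
# Hypothetical catalogue in which itemid 1 appears twice.
products = pd.DataFrame({
    "itemid": [1, 1, 2],
    "categoria": ["CANADIAN WHISKIES", "PEACH SCHNAPPS", "TEQUILA"],
})

# A left join repeats the event row once per matching product row.
merged = events.merge(products, on="itemid", how="left")
print(len(events), len(merged))

# validate="many_to_one" makes pandas raise if the right-hand key is not unique.
try:
    events.merge(products, on="itemid", how="left", validate="many_to_one")
    duplicated_key = False
except pd.errors.MergeError:
    duplicated_key = True
print("duplicated itemid in products:", duplicated_key)
```

Whether the duplicated rows are wanted depends on the catalogue. If each item should map to a single category, deduplicating the products table before the join would avoid inflating the baskets.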


## Preparing the data for association rules

In [16]:
# Keep only the columns needed for the rules: the transaction id and the category.
df_reglas_asociacion = data_merged[["transactionid","categoria"]]
# Preview the reduced dataset.
df_reglas_asociacion.head()

timestamp,transactionid,categoria
1433222276276,4000.0,CANADIAN WHISKIES
1433222276276,4000.0,PEACH SCHNAPPS
1433193500981,11117.0,CANADIAN WHISKIES
1433193915008,5444.0,CANADIAN WHISKIES
1433176736375,13556.0,CANADIAN WHISKIES


In [17]:
# Compute the size of each basket (number of rows per transaction).
basket_sizes = df_reglas_asociacion.groupby('transactionid').size()
# Keep only the baskets with more than one product.
valid_baskets = basket_sizes[basket_sizes > 1].index
# Drop the baskets that contain a single product.
df_filtered = df_reglas_asociacion[df_reglas_asociacion['transactionid'].isin(valid_baskets)]
# Show the result sorted by transaction id in ascending order (df_filtered itself is not modified).
df_filtered.sort_values(by="transactionid",ascending=True)

timestamp,transactionid,categoria
1431978994534,12.0,SINGLE MALT SCOTCH
1431978994534,12.0,FLAVORED RUM
1433448499642,23.0,MISC. IMPORTED CORDIALS & LIQUEURS
1433448499720,23.0,IMPORTED VODKA - MISC
1433448499720,23.0,IMPORTED VODKA - MISC
...,...,...
1439949233505,17662.0,CINNAMON SCHNAPPS
1439924299698,17663.0,DECANTERS & SPECIALTY PACKAGES
1439924299494,17663.0,MISC. IMPORTED CORDIALS & LIQUEURS
1432072691718,17669.0,VODKA FLAVORED
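Note that `groupby(...).size()` counts rows, so a basket with the same category repeated still counts as having more than one product. If the goal were to keep only baskets with more than one distinct category, `nunique()` could be used instead. The sketch below contrasts both on made-up data; the frame `toy` and the variable names are assumptions.

```python
import pandas as pd

toy = pd.DataFrame({
    "transactionid": [23, 23, 23, 40, 40, 51],
    "categoria": [
        "IMPORTED VODKA - MISC", "IMPORTED VODKA - MISC", "TEQUILA",
        "TEQUILA", "TEQUILA", "FLAVORED RUM",
    ],
})

# Rows per basket versus distinct categories per basket.
rows_per_basket = toy.groupby("transactionid").size()
categories_per_basket = toy.groupby("transactionid")["categoria"].nunique()

kept_by_rows = set(rows_per_basket[rows_per_basket > 1].index)
kept_by_categories = set(categories_per_basket[categories_per_basket > 1].index)
print(sorted(kept_by_rows), sorted(kept_by_categories))
```

Basket 40 holds two rows of the same category: it passes the row-count filter but cannot contribute to any rule between different categories.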


## Apriori
The Apriori algorithm is used to generate the association rules.
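To illustrate the level-by-level idea behind Apriori, here is a minimal pure-Python sketch. It is not how mlxtend implements `apriori`; the function name `apriori_sketch` and the toy baskets are assumptions, and the code favors readability over speed.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: supp(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([i]) for i in items})
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent

toy_baskets = [
    {"whisky", "cola"},
    {"whisky", "cola", "ice"},
    {"whisky", "ice"},
    {"cola"},
    {"vodka", "cola"},
]
frequent = apriori_sketch(toy_baskets, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

The prune step is what keeps the search tractable: the candidate {whisky, cola, ice} is discarded without counting it, because its subset {cola, ice} is not frequent.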

In [18]:
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import pandas as pd

# One-hot encode the categories.
one_hot = pd.get_dummies(df_filtered['categoria'])
# Attach the transaction id to each row.
one_hot['transactionid'] = df_filtered['transactionid']
# Take the maximum per transaction: True if the category appears in the basket.
basket = one_hot.groupby('transactionid').max()
# Show the first rows.
basket.head()

transactionid,100 PROOF VODKA,AMARETTO - IMPORTED,AMERICAN ALCOHOL,AMERICAN AMARETTO,AMERICAN COCKTAILS,AMERICAN DRY GINS,AMERICAN GRAPE BRANDIES,AMERICAN RUMS,AMERICAN SLOE GINS,APPLE SCHNAPPS,...,TEQUILA,TRIPLE SEC,TROPICAL FRUIT SCHNAPPS,VODKA 80 PROOF,VODKA FLAVORED,WATERMELON SCHNAPPS,WHISKEY LIQUEUR,WHITE CREME DE CACAO,WHITE CREME DE MENTHE,generico
12.0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
23.0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,True,False,False,False,False,False
27.0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
28.0,False,False,False,False,False,False,False,False,False,True,...,False,False,False,False,False,False,False,False,False,False
37.0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
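The same transaction-by-category boolean matrix can also be built in one step with `pd.crosstab`. This is an alternative to the `get_dummies` plus `groupby().max()` approach used in the cell, not what the notebook does. The toy frame and the name `basket_toy` are assumptions.

```python
import pandas as pd

toy = pd.DataFrame({
    "transactionid": [12, 12, 23, 23],
    "categoria": ["SINGLE MALT SCOTCH", "FLAVORED RUM", "VODKA FLAVORED", "VODKA FLAVORED"],
})

# crosstab counts rows per (transaction, category); "> 0" turns the counts into booleans.
basket_toy = pd.crosstab(toy["transactionid"], toy["categoria"]) > 0
print(basket_toy)
```

Each row is one basket and each column one category, which is the input layout that `apriori` from mlxtend expects.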


In [53]:
# Compute the sets of categories that are frequently bought together.
frequent_itemsets = apriori(basket, min_support=0.001, use_colnames=True)
frequent_itemsets.head()

In [54]:
# Generate the association rules.
rules = association_rules(frequent_itemsets, num_itemsets=2, metric="confidence", min_threshold=0.7)
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
0,(LEMONADE SCHNAPPS),(SINGLE MALT SCOTCH),0.001106,0.108407,0.001106,1.0,9.22449,1.0,0.000986,inf,0.89258,0.010204,1.0,0.505102
1,(PRIVATE LABEL TEQUILA),(MISC. IMPORTED CORDIALS & LIQUEURS),0.001844,0.122788,0.001475,0.8,6.515315,1.0,0.001249,4.386062,0.848079,0.011976,0.772005,0.406006
2,(PRIVATE LABEL BOURBON),(SINGLE MALT SCOTCH),0.001475,0.108407,0.001106,0.75,6.918367,1.0,0.000946,3.566372,0.856721,0.010169,0.719603,0.380102
3,"(BLENDED WHISKIES, AMERICAN DRY GINS)",(AMERICAN COCKTAILS),0.001844,0.119469,0.001475,0.8,6.696296,1.0,0.001255,4.402655,0.852235,0.012308,0.772864,0.406173
4,"(SINGLE BARREL BOURBON WHISKIES, IRISH WHISKIES)",(AMERICAN COCKTAILS),0.001475,0.119469,0.001475,1.0,8.37037,1.0,0.001299,inf,0.881832,0.012346,1.0,0.506173


In [55]:
# Show the top rules, sorted by support, confidence and lift.
rules.sort_values(["support", "confidence","lift"],axis = 0, ascending = False).head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
6,"(MISC. IMPORTED CORDIALS & LIQUEURS, WHISKEY L...",(AMERICAN COCKTAILS),0.00295,0.119469,0.002212,0.75,6.277778,1.0,0.00186,3.522124,0.843195,0.018405,0.71608,0.384259
99,"(SINGLE MALT SCOTCH, FLAVORED RUM, STRAIGHT BO...",(AMERICAN COCKTAILS),0.001844,0.119469,0.001844,1.0,8.37037,1.0,0.001623,inf,0.882157,0.015432,1.0,0.507716
122,"(SINGLE MALT SCOTCH, STRAIGHT BOURBON WHISKIES...",(AMERICAN COCKTAILS),0.001844,0.119469,0.001844,1.0,8.37037,1.0,0.001623,inf,0.882157,0.015432,1.0,0.507716
127,"(SINGLE MALT SCOTCH, STRAIGHT BOURBON WHISKIES...",(AMERICAN COCKTAILS),0.001844,0.119469,0.001844,1.0,8.37037,1.0,0.001623,inf,0.882157,0.015432,1.0,0.507716
237,"(IMPORTED VODKA, IMPORTED GRAPE BRANDIES, VODK...",(DECANTERS & SPECIALTY PACKAGES),0.001844,0.193584,0.001844,1.0,5.165714,1.0,0.001487,inf,0.807905,0.009524,1.0,0.504762


In [56]:
# Print the initial number of transactions and the number used to generate the association rules.
print(f"Original number of transactions: {df['transactionid'].nunique()}")
print(f"Number of transactions after removing single-item baskets: {df_filtered['transactionid'].nunique()}")

Original number of transactions: 17672
Number of transactions after removing single-item baskets: 2712


In [57]:
# Save the one-hot encoded baskets (not the rules) to a CSV file.
basket.to_csv('reglas_asociacion_canastas.csv')

In [58]:
# Show all the association rules, sorted by support, confidence and lift.
rules.sort_values(["support", "confidence","lift"],axis = 0, ascending = False)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
6,"(MISC. IMPORTED CORDIALS & LIQUEURS, WHISKEY L...",(AMERICAN COCKTAILS),0.002950,0.119469,0.002212,0.75,6.277778,1.0,0.001860,3.522124,0.843195,0.018405,0.716080,0.384259
99,"(SINGLE MALT SCOTCH, FLAVORED RUM, STRAIGHT BO...",(AMERICAN COCKTAILS),0.001844,0.119469,0.001844,1.00,8.370370,1.0,0.001623,inf,0.882157,0.015432,1.000000,0.507716
122,"(SINGLE MALT SCOTCH, STRAIGHT BOURBON WHISKIES...",(AMERICAN COCKTAILS),0.001844,0.119469,0.001844,1.00,8.370370,1.0,0.001623,inf,0.882157,0.015432,1.000000,0.507716
127,"(SINGLE MALT SCOTCH, STRAIGHT BOURBON WHISKIES...",(AMERICAN COCKTAILS),0.001844,0.119469,0.001844,1.00,8.370370,1.0,0.001623,inf,0.882157,0.015432,1.000000,0.507716
237,"(IMPORTED VODKA, IMPORTED GRAPE BRANDIES, VODK...",(DECANTERS & SPECIALTY PACKAGES),0.001844,0.193584,0.001844,1.00,5.165714,1.0,0.001487,inf,0.807905,0.009524,1.000000,0.504762
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
147,"(STRAIGHT BOURBON WHISKIES, AMERICAN DRY GINS,...",(DECANTERS & SPECIALTY PACKAGES),0.001475,0.193584,0.001106,0.75,3.874286,1.0,0.000821,3.225664,0.742984,0.005703,0.689986,0.377857
188,"(BLENDED WHISKIES, STRAIGHT BOURBON WHISKIES, ...",(DECANTERS & SPECIALTY PACKAGES),0.001475,0.193584,0.001106,0.75,3.874286,1.0,0.000821,3.225664,0.742984,0.005703,0.689986,0.377857
209,"(SINGLE MALT SCOTCH, CANADIAN WHISKIES, STRAIG...",(DECANTERS & SPECIALTY PACKAGES),0.001475,0.193584,0.001106,0.75,3.874286,1.0,0.000821,3.225664,0.742984,0.005703,0.689986,0.377857
242,"(IMPORTED VODKA, VODKA FLAVORED, IRISH WHISKIES)",(DECANTERS & SPECIALTY PACKAGES),0.001475,0.193584,0.001106,0.75,3.874286,1.0,0.000821,3.225664,0.742984,0.005703,0.689986,0.377857


In [59]:
# Save the association rules to a CSV file.
rules.to_csv('reglas_asociacion_reglas.csv')

In [60]:
# To make the association rules easier to interpret, classify them by confidence level.
def asignar_categoria(confidence):
    if confidence < 0.5:
        return 'low'
    elif confidence < 0.75:
        return 'medium'
    else:
        return 'high'
rules['categoriaconfidence'] = rules['confidence'].apply(asignar_categoria)

In [61]:
# To make the association rules easier to interpret, classify them by lift.
def asignar_categoria_lift(lift):
    if lift < 1:
        return 'negative'
    elif lift == 1:
        return 'independent'
    else:
        return 'positive'
rules['categorialift'] = rules['lift'].apply(asignar_categoria_lift)
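Lift is a ratio of floating-point numbers, so an exact comparison with 1 will rarely match. A tolerance-based variant of the classification is sketched below. The function name `classify_lift` and the tolerance value are assumptions, not part of the notebook.

```python
import math

def classify_lift(lift, tol=1e-9):
    """Classify a lift value, treating values within `tol` of 1.0 as independence."""
    if math.isclose(lift, 1.0, abs_tol=tol):
        return "independent"
    return "negative" if lift < 1.0 else "positive"

# A value that should be 1 mathematically but may carry a rounding error.
almost_one = 0.1 * 3 / 0.3
print(classify_lift(almost_one))
print(classify_lift(9.22449))
print(classify_lift(0.83))
```

A wider tolerance (for example 0.05) could be reasonable if lifts very close to 1 should be reported as "no meaningful association"; that threshold is a modelling choice rather than a rule.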

## Writing the results to the database

In [62]:
from sqlalchemy import create_engine
import numpy as np
import psycopg2

engine = create_engine('postgresql://data_analytics:data_analytics@mypostgres:5432/data_analytics')
rules['new_antecedents'] = rules['antecedents'].apply(lambda x: np.array(list(x)))
rules['new_antecedents'] = rules['new_antecedents'].astype(str)
rules['new_consequents'] = rules['consequents'].apply(lambda x: np.array(list(x)))
rules['new_consequents'] = rules['new_consequents'].astype(str)

rules[["new_antecedents","new_consequents","antecedent support","consequent support","support","confidence","lift",'categoriaconfidence','categorialift']].to_sql('reglas_asociacion_categorias', engine, if_exists='replace', index=False)
rules

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski,categoriaconfidence,categorialift,new_antecedents,new_consequents
0,(LEMONADE SCHNAPPS),(SINGLE MALT SCOTCH),0.001106,0.108407,0.001106,1.00,9.224490,1.0,0.000986,inf,0.892580,0.010204,1.000000,0.505102,high,positive,['LEMONADE SCHNAPPS'],['SINGLE MALT SCOTCH']
1,(PRIVATE LABEL TEQUILA),(MISC. IMPORTED CORDIALS & LIQUEURS),0.001844,0.122788,0.001475,0.80,6.515315,1.0,0.001249,4.386062,0.848079,0.011976,0.772005,0.406006,high,positive,['PRIVATE LABEL TEQUILA'],['MISC. IMPORTED CORDIALS & LIQUEURS']
2,(PRIVATE LABEL BOURBON),(SINGLE MALT SCOTCH),0.001475,0.108407,0.001106,0.75,6.918367,1.0,0.000946,3.566372,0.856721,0.010169,0.719603,0.380102,high,positive,['PRIVATE LABEL BOURBON'],['SINGLE MALT SCOTCH']
3,"(BLENDED WHISKIES, AMERICAN DRY GINS)",(AMERICAN COCKTAILS),0.001844,0.119469,0.001475,0.80,6.696296,1.0,0.001255,4.402655,0.852235,0.012308,0.772864,0.406173,high,positive,['BLENDED WHISKIES' 'AMERICAN DRY GINS'],['AMERICAN COCKTAILS']
4,"(SINGLE BARREL BOURBON WHISKIES, IRISH WHISKIES)",(AMERICAN COCKTAILS),0.001475,0.119469,0.001475,1.00,8.370370,1.0,0.001299,inf,0.881832,0.012346,1.000000,0.506173,high,positive,['SINGLE BARREL BOURBON WHISKIES' 'IRISH WHISK...,['AMERICAN COCKTAILS']
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
494,"(FLAVORED RUM, STRAIGHT BOURBON WHISKIES, SCOT...","(SINGLE MALT SCOTCH, AMERICAN COCKTAILS)",0.001106,0.009218,0.001106,1.00,108.480000,1.0,0.001096,inf,0.991879,0.120000,1.000000,0.560000,high,positive,['FLAVORED RUM' 'STRAIGHT BOURBON WHISKIES' 'S...,['SINGLE MALT SCOTCH' 'AMERICAN COCKTAILS']
495,"(TEQUILA, FLAVORED RUM, STRAIGHT BOURBON WHISK...","(SINGLE MALT SCOTCH, SCOTCH WHISKIES)",0.001475,0.009587,0.001106,0.75,78.230769,1.0,0.001092,3.961652,0.988676,0.111111,0.747580,0.432692,high,positive,['TEQUILA' 'FLAVORED RUM' 'STRAIGHT BOURBON WH...,['SINGLE MALT SCOTCH' 'SCOTCH WHISKIES']
496,"(TEQUILA, FLAVORED RUM, SCOTCH WHISKIES, AMERI...","(SINGLE MALT SCOTCH, STRAIGHT BOURBON WHISKIES)",0.001106,0.006637,0.001106,1.00,150.666667,1.0,0.001099,inf,0.994463,0.166667,1.000000,0.583333,high,positive,['TEQUILA' 'FLAVORED RUM' 'SCOTCH WHISKIES' 'A...,['SINGLE MALT SCOTCH' 'STRAIGHT BOURBON WHISKI...
497,"(FLAVORED RUM, SCOTCH WHISKIES, AMERICAN COCKT...","(SINGLE MALT SCOTCH, STRAIGHT BOURBON WHISKIES...",0.001475,0.001844,0.001106,0.75,406.800000,1.0,0.001103,3.992625,0.999015,0.500000,0.749538,0.675000,high,positive,['FLAVORED RUM' 'SCOTCH WHISKIES' 'AMERICAN CO...,['SINGLE MALT SCOTCH' 'STRAIGHT BOURBON WHISKI...
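The string form stored above, such as `['BLENDED WHISKIES' 'AMERICAN DRY GINS']`, is the printed representation of a NumPy array, which is awkward to parse back into a list. One possible alternative, sketched here under the assumption that the consumer of the table can read JSON, is to store each itemset as a sorted JSON array. The helper names `itemset_to_json` and `json_to_itemset` are hypothetical.

```python
import json

def itemset_to_json(itemset):
    """Serialise an itemset as a JSON array with its items in sorted order."""
    return json.dumps(sorted(itemset))

def json_to_itemset(text):
    """Rebuild the frozenset from its JSON representation."""
    return frozenset(json.loads(text))

original = frozenset({"BLENDED WHISKIES", "AMERICAN DRY GINS"})
encoded = itemset_to_json(original)
print(encoded)  # ["AMERICAN DRY GINS", "BLENDED WHISKIES"]
print(json_to_itemset(encoded) == original)  # True
```

Sorting makes the text deterministic, so the same itemset always produces the same string regardless of the iteration order of the frozenset.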


## Logging experiments with MLflow

In [70]:
%%capture
!pip install mlxtend mlflow

In [71]:
import mlflow

mlflow.set_tracking_uri('http://localhost:5000')

In [72]:
import mlflow
import pandas as pd
from mlxtend.frequent_patterns import association_rules
import pickle

class AssociationRulesModel(mlflow.pyfunc.PythonModel):
    def __init__(self, frequent_itemsets, num_itemsets=2, metric="confidence", min_threshold=0.7):
        self.frequent_itemsets = frequent_itemsets
        self.num_itemsets = num_itemsets
        self.metric = metric
        self.min_threshold = min_threshold
        self.rules = None
        
    def fit(self):
        """Generate association rules from frequent itemsets"""
        self.rules = association_rules(
            self.frequent_itemsets,
            num_itemsets=self.num_itemsets,
            metric=self.metric,
            min_threshold=self.min_threshold
        )
        ## Filter for specified number of itemsets if needed
        #if self.num_itemsets:
        #    self.rules = self.rules.head(self.num_itemsets)
        return self.rules
    
    def predict(self, context, model_input):
        """
        Given an antecedent, predict the consequent based on learned rules
        model_input should be a DataFrame with items to check
        """
        predictions = []
        for _, row in model_input.iterrows():
            items = set(row[row == 1].index)
            matching_rules = self.rules[self.rules['antecedents'].apply(lambda x: x.issubset(items))]
            if not matching_rules.empty:
                # Get the consequent with highest confidence
                best_rule = matching_rules.loc[matching_rules['confidence'].idxmax()]
                predictions.append(list(best_rule['consequents']))
            else:
                predictions.append([])
        return predictions

def log_association_rules_model(frequent_itemsets, num_itemsets=2, metric="confidence", min_threshold=0.7):
    """
    Log association rules model and its metrics to MLflow
    """
    with mlflow.start_run() as run:
        # Create and fit the model
        model = AssociationRulesModel(
            frequent_itemsets=frequent_itemsets,
            num_itemsets=num_itemsets,
            metric=metric,
            min_threshold=min_threshold
        )
        rules = model.fit()
        
        # Log parameters
        mlflow.log_params({
            "num_itemsets": num_itemsets,
            "metric": metric,
            "min_threshold": min_threshold,
            "total_rules_generated": len(rules)
        })
        
        # Log metrics
        mlflow.log_metrics({
            "avg_confidence": rules['confidence'].mean(),
            "avg_support": rules['support'].mean(),
            "avg_lift": rules['lift'].mean()
        })
        
        # Save rules as CSV artifact
        rules.to_csv("association_rules.csv")
        mlflow.log_artifact("association_rules.csv")
        
        # Log and register the model once, using the pyfunc flavor that matches
        # AssociationRulesModel (an mlflow.pyfunc.PythonModel, not a scikit-learn estimator).
        mlflow.pyfunc.log_model(
            artifact_path="association_rules_model",
            python_model=model,
            #artifacts={"frequent_itemsets": frequent_itemsets},
            conda_env={
                'channels': ['defaults', 'conda-forge'],
                'dependencies': [
                    'python=3.8.0',
                    'pandas',
                    'mlxtend'
                ]
            },
            input_example=basket,
            registered_model_name="association_rules_mode",
        )
        
        print(f"Model logged to MLflow with run_id: {run.info.run_id}")
        return run.info.run_id
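The matching logic inside `predict` can be tried without MLflow. The sketch below mirrors it on plain Python data: among the rules whose antecedent is contained in the basket, it returns the consequent of the one with the highest confidence. The function name `recommend` is hypothetical, and the two rules are copied from the first rows of the rules table shown earlier.

```python
def recommend(items, rules):
    """Return the consequent of the highest-confidence rule whose antecedent is in `items`."""
    items = set(items)
    matching = [r for r in rules if r["antecedents"] <= items]
    if not matching:
        return []
    best = max(matching, key=lambda r: r["confidence"])
    return sorted(best["consequents"])

toy_rules = [
    {
        "antecedents": frozenset({"PRIVATE LABEL TEQUILA"}),
        "consequents": frozenset({"MISC. IMPORTED CORDIALS & LIQUEURS"}),
        "confidence": 0.8,
    },
    {
        "antecedents": frozenset({"LEMONADE SCHNAPPS"}),
        "consequents": frozenset({"SINGLE MALT SCOTCH"}),
        "confidence": 1.0,
    },
]

print(recommend({"LEMONADE SCHNAPPS", "PRIVATE LABEL TEQUILA"}, toy_rules))
print(recommend({"TEQUILA"}, toy_rules))
```

Choosing by confidence alone ignores support and lift. Ranking by lift, or breaking ties with support, would be equally defensible depending on how the recommendations are used.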


In [73]:
confidence_experiments = [0.7,0.8,0.9]
for confidence in confidence_experiments:
    rules_run_id = log_association_rules_model(
        frequent_itemsets=frequent_itemsets,  # Your frequent_itemsets DataFrame
        num_itemsets=2,
        metric="confidence",
        min_threshold=confidence
    )
    
# Load the model back
#loaded_model = mlflow.pyfunc.load_model(f"runs:/{rules_run_id}/association_rules_model")

Registered model 'association_rules_mode' already exists. Creating a new version of this model...
2024/11/11 04:16:13 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: association_rules_mode, version 7
Created version '7' of model 'association_rules_mode'.


Downloading artifacts:   0%|          | 0/8 [00:00<?, ?it/s]

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)

2024/11/11 04:16:13 INFO mlflow.tracking._tracking_service.client: 🏃 View run luxuriant-fish-956 at: http://localhost:5000/#/experiments/0/runs/e5225d79f4844e29b4e0315941a0a4d7.
2024/11/11 04:16:13 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://localhost:5000/#/experiments/0.


Model logged to MLflow with run_id: e5225d79f4844e29b4e0315941a0a4d7


Registered model 'association_rules_mode' already exists. Creating a new version of this model...
2024/11/11 04:16:15 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: association_rules_mode, version 8
Created version '8' of model 'association_rules_mode'.


2024/11/11 04:16:16 INFO mlflow.tracking._tracking_service.client: 🏃 View run victorious-seal-243 at: http://localhost:5000/#/experiments/0/runs/a4024c98758f4f058526c09a48ffc8f8.
2024/11/11 04:16:16 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://localhost:5000/#/experiments/0.


Model logged to MLflow with run_id: a4024c98758f4f058526c09a48ffc8f8


Registered model 'association_rules_mode' already exists. Creating a new version of this model...
2024/11/11 04:16:17 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: association_rules_mode, version 9
Created version '9' of model 'association_rules_mode'.


2024/11/11 04:16:18 INFO mlflow.tracking._tracking_service.client: 🏃 View run powerful-ram-662 at: http://localhost:5000/#/experiments/0/runs/3d212db7458944f0b11af35306d07474.
2024/11/11 04:16:18 INFO mlflow.tracking._tracking_service.client: 🧪 View experiment at: http://localhost:5000/#/experiments/0.


Model logged to MLflow with run_id: 3d212db7458944f0b11af35306d07474
