# Apriori Algorithm 

This notebook uses the Apriori algorithm to find frequently purchased aisles in a transaction dataset and to generate association rules between them.

## Steps

1. **Import Libraries**: Import the necessary libraries for data manipulation and the Apriori algorithm.
2. **Load Dataset**: Load the transaction dataset into a pandas DataFrame.
3. **Generate Frequent Itemsets**: Use the `apriori` function to generate frequent itemsets with a minimum support threshold.
4. **Generate Association Rules**: Use the `association_rules` function to generate association rules from the frequent itemsets, using lift as the metric.
5. **Sort Rules**: Sort the generated rules by the lift metric in ascending order.
6. **Display Rules**: Display the sorted rules.

In [3]:
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.preprocessing import TransactionEncoder

root = 'C:/Users/cabre/PycharmProjects/Market-Basket-Analysis/Data/'

In [4]:
orders = pd.read_csv(root + 'orders.csv')
aisles = pd.read_csv(root + 'aisles.csv')
order_products_prior = pd.read_csv(root + 'order_products__prior.csv')
order_products_train = pd.read_csv(root + 'order_products__train.csv')
products = pd.read_csv(root + 'products.csv')

In [5]:
order_products = pd.concat([order_products_prior, order_products_train])
order_products.shape

(33819106, 4)

## Merge the order_products DataFrame with the products DataFrame on the product_id column

In [6]:
order_products = order_products.merge(products, on = 'product_id', how = 'left')

## Merge the order_products DataFrame with the aisles DataFrame on the aisle_id column

In [7]:
order_products_with_aisles = order_products.merge(aisles, on = 'aisle_id', how='left')
order_products_with_aisles.head()

| | order_id | product_id | add_to_cart_order | reordered | product_name | aisle_id | department_id | aisle |
|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 33120 | 1 | 1 | Organic Egg Whites | 86 | 16 | eggs |
| 1 | 2 | 28985 | 2 | 1 | Michigan Organic Kale | 83 | 4 | fresh vegetables |
| 2 | 2 | 9327 | 3 | 0 | Garlic Powder | 104 | 13 | spices seasonings |
| 3 | 2 | 45918 | 4 | 1 | Coconut Butter | 19 | 13 | oils vinegars |
| 4 | 2 | 30035 | 5 | 0 | Natural Sweetener | 17 | 13 | baking ingredients |
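
The two `how='left'` merges keep every order line and attach product and aisle columns wherever the key matches. The sketch below mimics that behaviour in plain Python on two rows copied from the output above plus one hypothetical unmatched row. The dictionary lookups and the `left_join` helper are illustrative assumptions, not how pandas implements `merge`, and `None` stands in for the `NaN` pandas would produce.

```python
# Two order lines from the output above, plus one hypothetical line
# whose product_id (99999) is assumed to have no match.
order_lines = [
    {"order_id": 2, "product_id": 33120},
    {"order_id": 2, "product_id": 28985},
    {"order_id": 2, "product_id": 99999},
]
products = {
    33120: {"product_name": "Organic Egg Whites", "aisle_id": 86},
    28985: {"product_name": "Michigan Organic Kale", "aisle_id": 83},
}
aisles = {86: "eggs", 83: "fresh vegetables"}


def left_join(lines, products, aisles):
    """Keep every order line; unmatched product or aisle fields become None."""
    joined = []
    for line in lines:
        product = products.get(line["product_id"], {})
        aisle_id = product.get("aisle_id")
        joined.append({
            **line,
            "product_name": product.get("product_name"),
            "aisle_id": aisle_id,
            "aisle": aisles.get(aisle_id),
        })
    return joined


joined_rows = left_join(order_lines, products, aisles)
for row in joined_rows:
    print(row)
```

A left join never drops rows from the left table, so the row count after both merges should equal the row count of `order_products`.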


In [9]:
transactions_list = order_products_with_aisles.groupby('order_id')['aisle'].apply(list).tolist()
transactions_list[:10]

[['yogurt',
  'other creams cheeses',
  'fresh vegetables',
  'fresh vegetables',
  'canned meat seafood',
  'fresh fruits',
  'fresh fruits',
  'packaged cheese'],
 ['eggs',
  'fresh vegetables',
  'spices seasonings',
  'oils vinegars',
  'baking ingredients',
  'fresh vegetables',
  'doughs gelatins bake mixes',
  'spreads',
  'packaged vegetables fruits'],
 ['yogurt',
  'soy lactosefree',
  'packaged vegetables fruits',
  'packaged vegetables fruits',
  'soy lactosefree',
  'fresh vegetables',
  'poultry counter',
  'bread'],
 ['breakfast bakery',
  'cold flu allergy',
  'energy granola bars',
  'breakfast bars pastries',
  'breakfast bars pastries',
  'breakfast bars pastries',
  'breakfast bars pastries',
  'chips pretzels',
  'trail mix snack mix',
  'crackers',
  'refrigerated',
  'energy sports drinks',
  'energy sports drinks'],
 ['fresh fruits',
  'salad dressing toppings',
  'prepared soups salads',
  'packaged vegetables fruits',
  'milk',
  'paper goods',
  'water seltzer
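
Grouping by `order_id` yields one list of aisles per order, with an aisle repeated once for every product bought from it, which is why `'fresh vegetables'` appears twice in some baskets above. A minimal standard-library sketch of the same grouping follows; the order ids and rows are made up for illustration.

```python
from collections import defaultdict

# (order_id, aisle) pairs; the order ids are hypothetical.
rows = [
    (1, "yogurt"), (1, "fresh vegetables"), (1, "fresh vegetables"),
    (2, "eggs"), (2, "spreads"),
]

# Collect the aisles of each order, preserving row order.
baskets = defaultdict(list)
for order_id, aisle in rows:
    baskets[order_id].append(aisle)

transactions = list(baskets.values())

# Duplicates survive the grouping; a set per basket removes them if desired.
unique_transactions = [sorted(set(basket)) for basket in transactions]

print(transactions)
print(unique_transactions)
```

Deduplicating is optional here, since a one-hot encoding only records whether an aisle is present in a basket.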

## Transform the transactions_list into a one-hot encoded NumPy boolean array

In [10]:
transaction_encoder = TransactionEncoder()
transaction_array = transaction_encoder.fit(transactions_list).transform(transactions_list)
transaction_df = pd.DataFrame(transaction_array, columns=transaction_encoder.columns_)
transaction_df

| | air fresheners candles | asian foods | baby accessories | baby bath body care | baby food formula | bakery desserts | baking ingredients | baking supplies decor | beauty | beers coolers | ... | spreads | tea | tofu meat alternatives | tortillas flat bread | trail mix snack mix | trash bags liners | vitamins supplements | water seltzer sparkling water | white wines | yogurt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
| 1 | False | False | False | False | False | False | True | False | False | False | ... | True | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
| 3 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | True | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | True | False | False | False | False | False | False | True | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3346078 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3346079 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3346080 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| 3346081 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
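
To make the encoding step concrete, the sketch below builds the same kind of boolean table in plain Python: one column per distinct aisle, one row per basket, `True` where the basket contains the aisle. The `one_hot_encode` helper is a hypothetical illustration of the idea, not `TransactionEncoder`'s actual implementation.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted item names and one boolean row per basket."""
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)  # repeated items collapse to a single flag
        rows.append([item in present for item in columns])
    return columns, rows


# Toy baskets using aisle names from the data above.
toy_transactions = [
    ["yogurt", "fresh fruits", "fresh fruits"],
    ["eggs", "spreads"],
]
columns, rows = one_hot_encode(toy_transactions)
print(columns)
for row in rows:
    print(row)
```

Each row has one flag per distinct aisle regardless of basket size, so the full table grows with the number of orders times the number of aisles.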


## Generate frequent aisles using the Apriori algorithm

In [11]:
frequent_aisles = apriori(transaction_df, min_support=0.01, use_colnames=True, low_memory=True)
frequent_aisles.head()

| | support | itemsets |
|---|---|---|
| 0 | 0.043115 | (asian foods) |
| 1 | 0.045884 | (baby food formula) |
| 2 | 0.010255 | (bakery desserts) |
| 3 | 0.076715 | (baking ingredients) |
| 4 | 0.011175 | (body lotions soap) |
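
The support of an itemset is the fraction of transactions that contain all of its items, and `min_support=0.01` keeps only itemsets present in at least 1% of orders. The following is a minimal pure-Python sketch of the level-wise Apriori search on toy baskets. It illustrates the idea under simplified assumptions (the `frequent_itemsets` helper and the toy data are hypothetical) and is not mlxtend's implementation.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: return {frozenset: support} for itemsets meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = {item for basket in baskets for item in basket}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Apriori property: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


toy_baskets = [
    ["bread", "milk"],
    ["bread", "crackers", "milk"],
    ["bread", "crackers"],
    ["milk", "yogurt"],
]
result = frequent_itemsets(toy_baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The pruning step is what keeps the search tractable: a candidate is only counted if all of its smaller subsets were already frequent.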


## Association rules between the frequent aisles

In [17]:
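rules = association_rules(frequent_aisles, metric="lift", min_threshold=1)
# sort_values returns a new DataFrame, so keep the sorted result under its own name
rules_by_lift = rules.sort_values('lift', ascending=True)
# Preview the first 100 rules in their original (unsorted) order
rules[:100]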

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | representativity | leverage | conviction | zhangs_metric | jaccard | certainty | kulczynski |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (asian foods) | (fresh fruits) | 0.043115 | 0.556755 | 0.028041 | 0.650359 | 1.168124 | 1.0 | 0.004036 | 1.267714 | 0.150412 | 0.049037 | 0.211179 | 0.350362 |
| 1 | (fresh fruits) | (asian foods) | 0.556755 | 0.043115 | 0.028041 | 0.050364 | 1.168124 | 1.0 | 0.004036 | 1.007633 | 0.324711 | 0.049037 | 0.007575 | 0.350362 |
| 2 | (asian foods) | (fresh vegetables) | 0.043115 | 0.444341 | 0.028292 | 0.656188 | 1.476767 | 1.0 | 0.009134 | 1.616172 | 0.337392 | 0.061616 | 0.381254 | 0.359930 |
| 3 | (fresh vegetables) | (asian foods) | 0.444341 | 0.043115 | 0.028292 | 0.063672 | 1.476767 | 1.0 | 0.009134 | 1.021954 | 0.581013 | 0.061616 | 0.021482 | 0.359930 |
| 4 | (asian foods) | (milk) | 0.043115 | 0.243671 | 0.011269 | 0.261368 | 1.072623 | 1.0 | 0.000763 | 1.023958 | 0.070757 | 0.040901 | 0.023398 | 0.153807 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 95 | (cookies cakes) | (bread) | 0.059127 | 0.163958 | 0.014275 | 0.241429 | 1.472504 | 1.0 | 0.004581 | 1.102127 | 0.341050 | 0.068363 | 0.092664 | 0.164247 |
| 96 | (bread) | (crackers) | 0.163958 | 0.114835 | 0.029106 | 0.177519 | 1.545864 | 1.0 | 0.010278 | 1.076214 | 0.422362 | 0.116569 | 0.070816 | 0.215488 |
| 97 | (crackers) | (bread) | 0.114835 | 0.163958 | 0.029106 | 0.253457 | 1.545864 | 1.0 | 0.010278 | 1.119884 | 0.398923 | 0.116569 | 0.107051 | 0.215488 |
| 98 | (bread) | (cream) | 0.163958 | 0.091588 | 0.019771 | 0.120589 | 1.316643 | 1.0 | 0.004755 | 1.032977 | 0.287656 | 0.083858 | 0.031925 | 0.168231 |
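
As a sanity check on how to read the table, the sketch below recomputes confidence, lift, leverage and conviction for rule 0, (asian foods) → (fresh fruits), from its three support values using the standard definitions. The `rule_metrics` helper is a hypothetical name introduced here. Because the inputs are the rounded values displayed above, the results match the table only to about four decimal places.

```python
def rule_metrics(support_ab, support_a, support_b):
    """Standard rule metrics for A -> B from the three support values."""
    confidence = support_ab / support_a           # P(B | A)
    lift = confidence / support_b                 # > 1 means A and B co-occur more than by chance
    leverage = support_ab - support_a * support_b
    # Conviction is infinite for a rule that always holds (confidence == 1).
    conviction = float("inf") if confidence == 1 else (1 - support_b) / (1 - confidence)
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Rule 0 in the table: support, antecedent support, consequent support.
metrics = rule_metrics(support_ab=0.028041, support_a=0.043115, support_b=0.556755)
for name, value in metrics.items():
    print(f"{name}: {value:.6f}")
```

Lift is symmetric in antecedent and consequent, which is why rules 0 and 1 share the same lift while their confidence values differ.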
