# Apriori Algorithm

This notebook uses the Apriori algorithm to find aisles that are frequently purchased together in a transaction dataset and to generate association rules between them.

## Steps

1. **Import Libraries**: Import pandas and the mlxtend functions used for encoding, frequent-itemset mining, and rule generation.
2. **Load Dataset**: Load the order, product, and aisle CSV files into pandas DataFrames.
3. **Build Transactions**: Merge the order lines with the products and aisles, then group the aisles by order.
4. **Encode Transactions**: One-hot encode the transactions with `TransactionEncoder`.
5. **Generate Frequent Itemsets**: Use the `apriori` function to generate frequent itemsets with a minimum support threshold of 0.1.
6. **Generate Association Rules**: Use the `association_rules` function to generate association rules from the frequent itemsets, using lift as the metric with a minimum threshold of 1.
7. **Sort Rules**: Sort the generated rules by the lift metric in descending order.
8. **Display Rules**: Display the sorted rules.
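
To make the frequent-itemset step concrete, here is a minimal standard-library sketch of the level-wise idea behind Apriori. It is not mlxtend's implementation. The function name `frequent_itemsets` and the toy baskets are assumptions made up for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    # Repeated items inside a basket collapse, since only presence matters.
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset.
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items.
    items = sorted({item for basket in baskets for item in basket})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    result = dict(current)
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


# Hypothetical toy baskets (not taken from the dataset).
toy_baskets = [
    ["fresh fruits", "fresh vegetables", "yogurt"],
    ["fresh fruits", "fresh vegetables"],
    ["fresh fruits", "yogurt"],
    ["bread"],
]
toy_frequent = frequent_itemsets(toy_baskets, min_support=0.5)
```

The prune step is what keeps the search tractable: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before their support is ever counted.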

In [17]:
import numpy as np
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
from mlxtend.preprocessing import TransactionEncoder

root = 'C:/Users/cabre/PycharmProjects/Market-Basket-Analysis/Data/'

In [18]:
orders = pd.read_csv(root + 'orders.csv')
aisles = pd.read_csv(root + 'aisles.csv')
order_products_prior = pd.read_csv(root + 'order_products__prior.csv')
order_products_train = pd.read_csv(root + 'order_products__train.csv')
products = pd.read_csv(root + 'products.csv')

In [19]:
order_products = pd.concat([order_products_prior, order_products_train])
order_products.shape

(33819106, 4)

## Merge the order_products DataFrame with the products DataFrame on the product_id column

In [20]:
order_products = order_products.merge(products, on = 'product_id', how = 'left')

## Merge the order_products DataFrame with the aisles DataFrame on the aisle_id column

In [21]:
order_products_with_aisles = order_products.merge(aisles, on = 'aisle_id', how='left')
order_products_with_aisles.head()

| | order_id | product_id | add_to_cart_order | reordered | product_name | aisle_id | department_id | aisle |
|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 33120 | 1 | 1 | Organic Egg Whites | 86 | 16 | eggs |
| 1 | 2 | 28985 | 2 | 1 | Michigan Organic Kale | 83 | 4 | fresh vegetables |
| 2 | 2 | 9327 | 3 | 0 | Garlic Powder | 104 | 13 | spices seasonings |
| 3 | 2 | 45918 | 4 | 1 | Coconut Butter | 19 | 13 | oils vinegars |
| 4 | 2 | 30035 | 5 | 0 | Natural Sweetener | 17 | 13 | baking ingredients |


In [22]:
transactions_list = order_products_with_aisles.groupby('order_id')['aisle'].apply(list).tolist()
transactions_list[:10]

[['yogurt',
  'other creams cheeses',
  'fresh vegetables',
  'fresh vegetables',
  'canned meat seafood',
  'fresh fruits',
  'fresh fruits',
  'packaged cheese'],
 ['eggs',
  'fresh vegetables',
  'spices seasonings',
  'oils vinegars',
  'baking ingredients',
  'fresh vegetables',
  'doughs gelatins bake mixes',
  'spreads',
  'packaged vegetables fruits'],
 ['yogurt',
  'soy lactosefree',
  'packaged vegetables fruits',
  'packaged vegetables fruits',
  'soy lactosefree',
  'fresh vegetables',
  'poultry counter',
  'bread'],
 ['breakfast bakery',
  'cold flu allergy',
  'energy granola bars',
  'breakfast bars pastries',
  'breakfast bars pastries',
  'breakfast bars pastries',
  'breakfast bars pastries',
  'chips pretzels',
  'trail mix snack mix',
  'crackers',
  'refrigerated',
  'energy sports drinks',
  'energy sports drinks'],
 ['fresh fruits',
  'salad dressing toppings',
  'prepared soups salads',
  'packaged vegetables fruits',
  'milk',
  'paper goods',
  'water seltzer
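
The lists above repeat an aisle whenever an order contains several products from it. The sketch below uses only the standard library to show what the group-by step produces and how repeats could be dropped. The `rows` data is hypothetical. Deduplication is optional here, since the one-hot encoding in the next step records only whether an aisle is present.

```python
from collections import defaultdict

# Hypothetical (order_id, aisle) rows standing in for order_products_with_aisles.
rows = [
    (2, "eggs"),
    (2, "fresh vegetables"),
    (2, "fresh vegetables"),
    (3, "yogurt"),
    (3, "bread"),
]

# Collect the aisles of each order, preserving row order within an order.
baskets = defaultdict(list)
for order_id, aisle in rows:
    baskets[order_id].append(aisle)

# Same shape as transactions_list: one list of aisles per order, repeats included.
toy_transactions = [baskets[order_id] for order_id in sorted(baskets)]

# Optional: drop repeated aisles while keeping first-seen order.
deduplicated = [list(dict.fromkeys(basket)) for basket in toy_transactions]
```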

## Transform the transactions_list into a one-hot encoded NumPy boolean array

In [23]:
transaction_encoder = TransactionEncoder()
transaction_array = transaction_encoder.fit(transactions_list).transform(transactions_list)
transaction_df = pd.DataFrame(transaction_array, columns=transaction_encoder.columns_)
transaction_df

| | air fresheners candles | asian foods | baby accessories | baby bath body care | baby food formula | bakery desserts | baking ingredients | baking supplies decor | beauty | beers coolers | ... | spreads | tea | tofu meat alternatives | tortillas flat bread | trail mix snack mix | trash bags liners | vitamins supplements | water seltzer sparkling water | white wines | yogurt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
| 1 | False | False | False | False | False | False | True | False | False | False | ... | True | False | False | False | False | False | False | False | False | False |
| 2 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | True |
| 3 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | True | False | False | False | False | False |
| 4 | False | False | False | False | False | False | False | False | False | False | ... | True | False | False | False | False | False | False | True | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3346078 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3346079 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
| 3346080 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | True | False | False |
| 3346081 | False | False | False | False | False | False | False | False | False | False | ... | False | False | False | False | False | False | False | False | False | False |
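
As a rough illustration of what the encoding step produces, the sketch below builds the same kind of boolean matrix with plain Python: one column per distinct aisle in sorted order, one row per order, and `True` where the order contains that aisle. The helper `one_hot_encode` and the toy orders are assumptions for illustration, not part of mlxtend.

```python
def one_hot_encode(transactions):
    """Return (columns, rows), where rows[i][j] is True if transaction i contains columns[j]."""
    columns = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)  # repeats within a transaction make no difference
        rows.append([col in present for col in columns])
    return columns, rows


# Hypothetical toy orders.
toy_orders = [
    ["yogurt", "fresh fruits", "fresh fruits"],
    ["bread"],
]
toy_columns, toy_rows = one_hot_encode(toy_orders)
```

Because each cell only records presence, an order with two products from `fresh fruits` encodes the same way as an order with one.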


## Generate frequent aisles using the Apriori algorithm

In [30]:
frequent_aisles = apriori(transaction_df, min_support=0.1, use_colnames=True, low_memory=True)
frequent_aisles.head()

| | support | itemsets |
|---|---|---|
| 0 | 0.163958 | (bread) |
| 1 | 0.167729 | (chips pretzels) |
| 2 | 0.114835 | (crackers) |
| 3 | 0.137402 | (eggs) |
| 4 | 0.556755 | (fresh fruits) |


## Association rules between the frequent aisles

In [33]:
rules = association_rules(frequent_aisles, metric="lift", min_threshold=1)
rules.sort_values('lift', ascending=False)

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | representativity | leverage | conviction | zhangs_metric | jaccard | certainty | kulczynski |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 39 | (fresh vegetables, fresh fruits) | (packaged vegetables fruits) | 0.318137 | 0.367445 | 0.187271 | 0.588649 | 1.602006 | 1.0 | 0.070373 | 1.537749 | 0.551112 | 0.375812 | 0.349699 | 0.549153 |
| 42 | (packaged vegetables fruits) | (fresh vegetables, fresh fruits) | 0.367445 | 0.318137 | 0.187271 | 0.509658 | 1.602006 | 1.0 | 0.070373 | 1.390585 | 0.594071 | 0.375812 | 0.280878 | 0.549153 |
| 40 | (packaged vegetables fruits, fresh fruits) | (fresh vegetables) | 0.270937 | 0.444341 | 0.187271 | 0.691198 | 1.555556 | 1.0 | 0.066882 | 1.7994 | 0.489866 | 0.354675 | 0.444259 | 0.556328 |
| 41 | (fresh vegetables) | (packaged vegetables fruits, fresh fruits) | 0.444341 | 0.270937 | 0.187271 | 0.421457 | 1.555556 | 1.0 | 0.066882 | 1.260172 | 0.642738 | 0.354675 | 0.206457 | 0.556328 |
| 54 | (packaged vegetables fruits) | (yogurt, fresh fruits) | 0.367445 | 0.187954 | 0.105528 | 0.287196 | 1.528007 | 1.0 | 0.036466 | 1.139226 | 0.54628 | 0.234575 | 0.122211 | 0.424327 |
| 51 | (yogurt, fresh fruits) | (packaged vegetables fruits) | 0.187954 | 0.367445 | 0.105528 | 0.561458 | 1.528007 | 1.0 | 0.036466 | 1.442405 | 0.425534 | 0.234575 | 0.306713 | 0.424327 |
| 36 | (fresh vegetables) | (packaged cheese, fresh fruits) | 0.444341 | 0.15521 | 0.10465 | 0.235516 | 1.517407 | 1.0 | 0.035684 | 1.105047 | 0.613652 | 0.211455 | 0.095061 | 0.454881 |
| 33 | (packaged cheese, fresh fruits) | (fresh vegetables) | 0.15521 | 0.444341 | 0.10465 | 0.674247 | 1.517407 | 1.0 | 0.035684 | 1.705765 | 0.403628 | 0.211455 | 0.413753 | 0.454881 |
| 50 | (yogurt, packaged vegetables fruits) | (fresh fruits) | 0.127933 | 0.556755 | 0.105528 | 0.824872 | 1.48157 | 1.0 | 0.034301 | 2.530974 | 0.372724 | 0.18221 | 0.604895 | 0.507207 |
| 55 | (fresh fruits) | (yogurt, packaged vegetables fruits) | 0.556755 | 0.127933 | 0.105528 | 0.189542 | 1.48157 | 1.0 | 0.034301 | 1.076017 | 0.73332 | 0.18221 | 0.070647 | 0.507207 |
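
To help read the table, the sketch below recomputes four of the metrics for rule 39 from its three support values, using the standard definitions of confidence, lift, leverage, and conviction. The helper `rule_metrics` is a hypothetical name introduced here. The inputs are the rounded values from the table, so the results match only to about three decimal places.

```python
import math


def rule_metrics(antecedent_support, consequent_support, joint_support):
    """Standard rule metrics for A -> C, computed from three support values."""
    confidence = joint_support / antecedent_support   # P(C | A)
    lift = confidence / consequent_support            # > 1 means A and C co-occur more than if independent
    leverage = joint_support - antecedent_support * consequent_support
    conviction = (
        (1 - consequent_support) / (1 - confidence) if confidence < 1 else math.inf
    )
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
    }


# Rule 39: (fresh vegetables, fresh fruits) -> (packaged vegetables fruits)
metrics = rule_metrics(
    antecedent_support=0.318137,
    consequent_support=0.367445,
    joint_support=0.187271,
)
```

A lift of about 1.6 means that orders containing both fresh vegetables and fresh fruits include packaged vegetables fruits about 1.6 times as often as orders in general. The lift is the same for a rule and its reverse, so the rules appear in pairs with equal lift.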
