#### Market Basket Analysis using the Apriori Algorithm with the `mlxtend` Library

In [1]:
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder
import pandas as pd
import numpy as np

In [2]:
# Sample dataset: each inner list is one transaction (market basket)
dataset = [
    ['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
    ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
]

### Transaction Encoder in `mlxtend` Library

The `TransactionEncoder` in the `mlxtend.preprocessing` module is a utility that transforms a dataset of transactions (lists of items) into a one-hot encoded boolean array, which is then typically wrapped in a pandas DataFrame. This is a crucial preprocessing step for algorithms like Apriori and FP-Growth, which require the input data in a binary format (`True`/`1` for presence, `False`/`0` for absence of an item).

#### Why Use `TransactionEncoder`?
- Market basket datasets are typically represented as lists of transactions, where each transaction is a list of items purchased together.
- Algorithms like Apriori cannot directly process this format. They require a binary matrix where:
  - Rows represent transactions.
  - Columns represent items.
  - Values are `1` if the item is present in the transaction, otherwise `0`.

#### How `TransactionEncoder` Works
1. **Fit the Encoder**: The `TransactionEncoder` identifies all unique items in the dataset.
2. **Transform the Data**: It converts the list of transactions into a binary matrix.
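
To make the two steps concrete, here is a minimal pure-Python sketch of the same idea, without `mlxtend`. The names `columns` and `rows` are illustrative only, and the sorted column order is an assumption about how the encoder orders items; it is not the library's implementation.

```python
# Same transactions as the notebook's `dataset`
dataset = [
    ['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
    ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
]

# "Fit": collect every unique item (sorted here so the columns are stable)
columns = sorted({item for transaction in dataset for item in transaction})

# "Transform": one boolean row per transaction, one entry per known item.
# Converting each transaction to a set means a repeated item
# (such as 'Onion' in the last basket) still counts only once.
rows = [
    [item in set(transaction) for item in columns]
    for transaction in dataset
]

for row in rows:
    print(row)
```

Each printed row corresponds to one transaction, and each position in the row to one entry of `columns`.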

#### Example Usage


In [3]:
# Initialize and fit the TransactionEncoder
te = TransactionEncoder()
te_array = te.fit(dataset).transform(dataset)

# Convert to a DataFrame
df = pd.DataFrame(te_array, columns=te.columns_)

print(df)

   Apple   Corn   Dill   Eggs  Ice cream  Kidney Beans   Milk  Nutmeg  Onion  \
0  False  False  False   True      False          True   True    True   True   
1  False  False   True   True      False          True  False    True   True   
2   True  False  False   True      False          True   True   False  False   
3  False   True  False  False      False          True   True   False  False   
4  False   True  False   True       True          True  False   False   True   

   Unicorn  Yogurt  
0    False    True  
1    False    True  
2    False   False  
3     True    True  
4    False   False  




#### Output
Written with `1` for `True` and `0` for `False`, the resulting DataFrame is:

| Apple | Corn | Dill | Eggs | Ice cream | Kidney Beans | Milk | Nutmeg | Onion | Unicorn | Yogurt |
|-------|------|------|------|-----------|--------------|------|--------|-------|---------|--------|
| 0     | 0    | 0    | 1    | 0         | 1            | 1    | 1      | 1     | 0       | 1      |
| 0     | 0    | 1    | 1    | 0         | 1            | 0    | 1      | 1     | 0       | 1      |
| 1     | 0    | 0    | 1    | 0         | 1            | 1    | 0      | 0     | 0       | 0      |
| 0     | 1    | 0    | 0    | 0         | 1            | 1    | 0      | 0     | 1       | 1      |
| 0     | 1    | 0    | 1    | 1         | 1            | 0    | 0      | 1     | 0       | 0      |

#### Key Points
- **`fit()`**: Identifies all unique items in the dataset.
- **`transform()`**: Converts the dataset into a binary matrix.
- **`te.columns_`**: Contains the list of unique items (column names for the DataFrame).

#### Use Case
The one-hot encoded DataFrame can now be passed to the `apriori` or `fpgrowth` functions in `mlxtend.frequent_patterns` to generate frequent itemsets, from which `association_rules` derives the association rules.

This preprocessing step ensures that the dataset is in the correct format for association rule mining algorithms.


In [4]:
# Apply the Apriori algorithm to the encoded DataFrame to generate frequent itemsets with min_support=0.6
fq_6 = apriori(df, min_support=0.6, use_colnames=True)


In [5]:
fq_6

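|    | support | itemsets                    |
|----|---------|-----------------------------|
| 0  | 0.8     | (Eggs)                      |
| 1  | 1.0     | (Kidney Beans)              |
| 2  | 0.6     | (Milk)                      |
| 3  | 0.6     | (Onion)                     |
| 4  | 0.6     | (Yogurt)                    |
| 5  | 0.8     | (Eggs, Kidney Beans)        |
| 6  | 0.6     | (Eggs, Onion)               |
| 7  | 0.6     | (Kidney Beans, Milk)        |
| 8  | 0.6     | (Kidney Beans, Onion)       |
| 9  | 0.6     | (Kidney Beans, Yogurt)      |
| 10 | 0.6     | (Eggs, Kidney Beans, Onion) |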


In [6]:
# Generate frequent itemsets with a minimum support of 0.7
fq_7 = apriori(df, min_support=0.7, use_colnames=True)

In [7]:
fq_7

|   | support | itemsets             |
|---|---------|----------------------|
| 0 | 0.8     | (Eggs)               |
| 1 | 1.0     | (Kidney Beans)       |
| 2 | 0.8     | (Eggs, Kidney Beans) |
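
The effect of `min_support` can be reproduced by hand. The following is a simplified level-wise sketch of the Apriori idea (count 1-itemsets, join frequent k-itemsets into (k+1)-candidates, prune candidates that have an infrequent subset). The function names `support` and `apriori_sketch` are hypothetical, and this is not `mlxtend`'s implementation.

```python
from itertools import combinations

dataset = [
    ['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
    ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def apriori_sketch(raw_transactions, min_support):
    """Return {frozenset: support} for all itemsets meeting `min_support`."""
    transactions = [frozenset(t) for t in raw_transactions]
    items = {item for t in transactions for item in t}

    # Level 1: frequent single items
    current = {frozenset([i]) for i in items
               if support([i], transactions) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset, transactions)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates
                   if support(c, transactions) >= min_support}
    return frequent

for itemset, sup in sorted(apriori_sketch(dataset, 0.7).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With `min_support=0.7` the sketch keeps the same three itemsets as `fq_7`; lowering the threshold admits less common itemsets such as `Milk` or `(Eggs, Onion)`.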


#### Different `min_threshold` values for the chosen metric (here `confidence`) produce different sets of association rules

In [8]:
# Generate association rules from the min_support=0.6 itemsets, keeping rules with confidence >= 0.6
ass_rules = association_rules(fq_6, metric='confidence', min_threshold=0.6)



In [9]:
ass_rules

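|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | representativity | leverage | conviction | zhangs_metric | jaccard | certainty | kulczynski |
|---|-------------|-------------|--------------------|--------------------|---------|------------|------|------------------|----------|------------|---------------|---------|-----------|------------|
| 0 | (Eggs) | (Kidney Beans) | 0.8 | 1.0 | 0.8 | 1.0 | 1.0 | 1.0 | 0.0 | inf | 0.0 | 0.8 | 0.0 | 0.9 |
| 1 | (Kidney Beans) | (Eggs) | 1.0 | 0.8 | 0.8 | 0.8 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.8 | 0.0 | 0.9 |
| 2 | (Eggs) | (Onion) | 0.8 | 0.6 | 0.6 | 0.75 | 1.25 | 1.0 | 0.12 | 1.6 | 1.0 | 0.75 | 0.375 | 0.875 |
| 3 | (Onion) | (Eggs) | 0.6 | 0.8 | 0.6 | 1.0 | 1.25 | 1.0 | 0.12 | inf | 0.5 | 0.75 | 1.0 | 0.875 |
| 4 | (Kidney Beans) | (Milk) | 1.0 | 0.6 | 0.6 | 0.6 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.6 | 0.0 | 0.8 |
| 5 | (Milk) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 1.0 | 0.0 | inf | 0.0 | 0.6 | 0.0 | 0.8 |
| 6 | (Kidney Beans) | (Onion) | 1.0 | 0.6 | 0.6 | 0.6 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.6 | 0.0 | 0.8 |
| 7 | (Onion) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 1.0 | 0.0 | inf | 0.0 | 0.6 | 0.0 | 0.8 |
| 8 | (Kidney Beans) | (Yogurt) | 1.0 | 0.6 | 0.6 | 0.6 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.6 | 0.0 | 0.8 |
| 9 | (Yogurt) | (Kidney Beans) | 0.6 | 1.0 | 0.6 | 1.0 | 1.0 | 1.0 | 0.0 | inf | 0.0 | 0.6 | 0.0 | 0.8 |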
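
Several columns in the rules table above follow directly from transaction counts. The sketch below recomputes support, confidence, lift, leverage, and conviction for a single rule using the standard textbook definitions. The helper names `count` and `rule_metrics` are hypothetical, and the code assumes the antecedent occurs in at least one transaction.

```python
import math

dataset = [
    ['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
    ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
]
transactions = [set(t) for t in dataset]
n = len(transactions)

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

def rule_metrics(antecedent, consequent):
    """Metrics for the rule antecedent -> consequent (antecedent count must be > 0)."""
    n_a = count(antecedent)
    n_c = count(consequent)
    n_both = count(set(antecedent) | set(consequent))

    support = n_both / n                      # P(A and C)
    confidence = n_both / n_a                 # P(C | A)
    lift = confidence / (n_c / n)             # confidence relative to P(C)
    leverage = support - (n_a / n) * (n_c / n)
    # Conviction is infinite when the rule never fails (confidence == 1)
    conviction = math.inf if confidence == 1 else (1 - n_c / n) / (1 - confidence)
    return {
        'support': support,
        'confidence': confidence,
        'lift': lift,
        'leverage': leverage,
        'conviction': conviction,
    }

print(rule_metrics({'Eggs'}, {'Onion'}))
print(rule_metrics({'Kidney Beans'}, {'Milk'}))
```

For `(Eggs) -> (Onion)` the values agree, up to floating-point rounding, with row 2 of the table. The rule `(Kidney Beans) -> (Milk)` has a confidence of 0.6, so it passes `min_threshold=0.6` but would be dropped at a stricter threshold such as 0.7.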
