# 6.3.1 Apriori Algorithm

## Explanation of the Apriori Algorithm

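The Apriori Algorithm is a classic data mining algorithm for learning association rules. It operates on a database of transactions, where each transaction is a set of items. Apriori first finds the frequent itemsets, which are the itemsets whose support (the fraction of transactions containing them) meets a minimum threshold. It then uses these itemsets to generate association rules. The search proceeds level by level: frequent single items first, then candidate pairs built from them, then candidate triples, and so on. It relies on the *Apriori property*: every subset of a frequent itemset must itself be frequent. Any candidate with an infrequent subset can therefore be discarded without scanning the data.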
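
The level-wise search can be sketched from scratch in plain Python. This is a minimal illustration, not the mlxtend implementation used later in this notebook. The function name `apriori_frequent_itemsets` is hypothetical. For clarity rather than speed, support is recomputed with a scan over the transactions for each candidate.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with support >= min_support.

    Hypothetical helper: a from-scratch sketch of Apriori's level-wise search.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain every item of `itemset`
        return sum(1 for basket in baskets if itemset <= basket) / n

    # Level 1: frequent single items
    singletons = {frozenset([item]) for basket in baskets for item in basket}
    current = {c: support(c) for c in singletons}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count: support of each surviving candidate (a per-candidate scan here;
        # the classic algorithm counts all candidates of a level in one pass)
        current = {c: support(c) for c in candidates}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent


transactions = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread", "butter", "cereal"],
    ["bread", "butter", "cereal"],
    ["milk", "bread"],
    ["butter", "cereal"],
]
result = apriori_frequent_itemsets(transactions, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 3))
```

With the six transactions from this section and `min_support=0.5`, the sketch should return the same seven itemsets that mlxtend reports further down. No 3-itemset survives, because each of the two 3-item candidates contains an infrequent pair: {butter, milk} or {bread, cereal}.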

## Transactional Database

A **transactional database** is a collection of transactions, where each transaction is a set of items bought together. In the context of market basket analysis, a transaction typically represents a customer's purchase, and the items in the transaction are the products bought. Here's an example of a simple transactional database:

| Transaction ID | Items                           |
|----------------|---------------------------------|
| 1              | Milk, Bread, Butter             |
| 2              | Bread, Butter                   |
| 3              | Milk, Bread, Butter, Cereal     |
| 4              | Bread, Butter, Cereal           |
| 5              | Milk, Bread                     |
| 6              | Butter, Cereal                  |

In this example, each row represents a transaction, and the items column lists the products purchased in that transaction.
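
Two measures computed on such a table underpin the rest of this section. The **support** of an itemset X is the fraction of transactions that contain all of X. The **confidence** of a rule X → Y is support(X ∪ Y) / support(X), which is the share of X-transactions that also contain Y. Below is a minimal sketch on the six transactions above. `support` and `confidence` are hypothetical helper names, not library functions.

```python
# Transactions from the table above, one set of items per transaction ID
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk", "Bread", "Butter", "Cereal"},
    {"Bread", "Butter", "Cereal"},
    {"Milk", "Bread"},
    {"Butter", "Cereal"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with `antecedent` that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print(support({"Bread", "Butter"}, transactions))     # 4 of 6 transactions, about 0.667
print(confidence({"Milk"}, {"Bread"}, transactions))  # 1.0: every Milk basket has Bread
```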

___
___

### Readings:
- [Apriori — Association Rule Mining In-depth Explanation and Python Implementation](https://towardsdatascience.com/apriori-association-rule-mining-explanation-and-python-implementation-290b42afdfc6)
- [Apriori Algorithm In Data Mining: Implementation, Examples, and More](https://www.analytixlabs.co.in/blog/apriori-algorithm-in-data-mining/)
- [Apriori Algorithm in Python (Recommendation Engine)](https://deepak6446.medium.com/apriori-algorithm-in-python-recommendation-engine-5ba89bd1a6da)
- [Apriori Algorithm](https://athena.ecs.csus.edu/~mei/associationcw/Apriori.html)
___
___

## Benefits and Scenarios for Using the Apriori Algorithm

**Benefits:**

- **Simplicity**: Easy to understand and implement.
- **Effectiveness**: Uses the Apriori property to prune candidate itemsets, so it finds frequent itemsets and association rules without counting every possible combination of items.
- **Applicability**: Can be used in various domains such as market basket analysis, recommendation systems, and web usage mining.

**Scenarios:**

- **Market Basket Analysis**: Finding products that frequently co-occur in transactions.
- **Recommendation Systems**: Suggesting items that are frequently bought together.
- **Web Usage Mining**: Identifying patterns in user navigation behavior on websites.
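
For the recommendation scenario, one simple approach is to fire every rule whose antecedent is contained in the current basket and suggest its consequent items, ranked by confidence. This approach is an assumption; Apriori itself does not prescribe it. The four rules are the ones the mlxtend example in this section produces at a confidence threshold of 0.7. `recommend` is a hypothetical helper.

```python
# Rules as (antecedent, consequent, confidence), taken from the example
# dataset's association rules at min_threshold=0.7
rules = [
    (frozenset({"butter"}), frozenset({"bread"}), 0.8),
    (frozenset({"bread"}), frozenset({"butter"}), 0.8),
    (frozenset({"milk"}), frozenset({"bread"}), 1.0),
    (frozenset({"cereal"}), frozenset({"butter"}), 1.0),
]


def recommend(basket, rules, top_n=3):
    """Suggest items not already in `basket`, ranked by the best matching rule's confidence."""
    basket = set(basket)
    scores = {}
    for antecedent, consequent, conf in rules:
        if antecedent <= basket:  # the rule fires only if all antecedent items are present
            for item in consequent - basket:
                scores[item] = max(scores.get(item, 0.0), conf)
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend({"milk"}, rules))            # ['bread']
print(recommend({"milk", "cereal"}, rules))  # ['bread', 'butter']
```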

___
- ### Example Dataset

```python
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder
import pandas as pd
```

```python
# Example transaction data
dataset = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread', 'butter', 'cereal'],
    ['bread', 'butter', 'cereal'],
    ['milk', 'bread'],
    ['butter', 'cereal']
]
```

```python
# One-hot encode the transactions with TransactionEncoder (one boolean column per item)
te = TransactionEncoder()
te_ary = te.fit(dataset).transform(dataset)
df = pd.DataFrame(te_ary, columns=te.columns_)
```
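
`TransactionEncoder` turns the list of transactions into a boolean one-hot table. The table has one column per distinct item, in sorted order, and one row per transaction, which is the layout `apriori` expects. Below is a plain-Python equivalent of that layout. It illustrates the idea and is not mlxtend's code.

```python
dataset = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread', 'butter', 'cereal'],
    ['bread', 'butter', 'cereal'],
    ['milk', 'bread'],
    ['butter', 'cereal'],
]

# One column per distinct item, in sorted order
columns = sorted({item for transaction in dataset for item in transaction})

# One boolean row per transaction: True where the item was bought
one_hot = []
for transaction in dataset:
    present = set(transaction)
    one_hot.append([item in present for item in columns])

print(columns)     # ['bread', 'butter', 'cereal', 'milk']
print(one_hot[0])  # [True, True, False, True]
```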

```python
# Generate frequent itemsets
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
print("Frequent Itemsets:")
print(frequent_itemsets)
```

```
Frequent Itemsets:
    support          itemsets
0  0.833333           (bread)
1  0.833333          (butter)
2  0.500000          (cereal)
3  0.500000            (milk)
4  0.666667   (butter, bread)
5  0.500000     (milk, bread)
6  0.500000  (cereal, butter)
```
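
Association rules are derived from these frequent itemsets. Each frequent itemset with at least two items is split into a non-empty antecedent X and consequent Y. The rule X → Y is kept if support(X ∪ Y) / support(X) meets the confidence threshold. Because X is a subset of a frequent itemset, the Apriori property guarantees its support is already known. Below is a from-scratch sketch of that step, with the supports copied from the output above. `generate_rules` is a hypothetical helper, not mlxtend's implementation.

```python
from itertools import combinations

# Frequent itemsets and their supports, copied from the output above
frequent = {
    frozenset({"bread"}): 5 / 6,
    frozenset({"butter"}): 5 / 6,
    frozenset({"cereal"}): 3 / 6,
    frozenset({"milk"}): 3 / 6,
    frozenset({"bread", "butter"}): 4 / 6,
    frozenset({"bread", "milk"}): 3 / 6,
    frozenset({"butter", "cereal"}): 3 / 6,
}


def generate_rules(frequent, min_confidence):
    rules = []
    for itemset, itemset_support in frequent.items():
        if len(itemset) < 2:
            continue
        # Try every non-empty proper subset of the itemset as the antecedent
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                consequent = itemset - antecedent
                # The antecedent is frequent (Apriori property), so its support is known
                conf = itemset_support / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((set(antecedent), set(consequent), conf))
    return rules


for antecedent, consequent, conf in generate_rules(frequent, min_confidence=0.7):
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2))
```

The sketch keeps the same four rules as the `association_rules` call that follows. The reverse directions bread → milk and butter → cereal have confidence 0.6 and are dropped.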


```python
# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
print("\nAssociation Rules:")
print(rules)
```

```
Association Rules:
  antecedents consequents  antecedent support  consequent support   support  \
0    (butter)     (bread)            0.833333            0.833333  0.666667   
1     (bread)    (butter)            0.833333            0.833333  0.666667   
2      (milk)     (bread)            0.500000            0.833333  0.500000   
3    (cereal)    (butter)            0.500000            0.833333  0.500000   

   confidence  lift  leverage  conviction  zhangs_metric  
0         0.8  0.96 -0.027778    0.833333      -0.200000  
1         0.8  0.96 -0.027778    0.833333      -0.200000  
2         1.0  1.20  0.083333         inf       0.333333  
3         1.0  1.20  0.083333         inf       0.333333  
```
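
The extra columns are all functions of three supports: s(X), s(Y) and s(X ∪ Y). The formulas below are the standard definitions and reproduce the rows printed above. `rule_metrics` is a hypothetical helper, not part of mlxtend.

```python
import math


def rule_metrics(s_x, s_y, s_xy):
    """Metrics for a rule X -> Y from s(X), s(Y) and s(X ∪ Y)."""
    confidence = s_xy / s_x
    lift = confidence / s_y              # > 1: X and Y co-occur more than if independent
    leverage = s_xy - s_x * s_y          # observed minus expected joint support
    # Conviction is infinite when the rule never fails (confidence == 1)
    conviction = math.inf if confidence >= 1 else (1 - s_y) / (1 - confidence)
    # Zhang's metric (assumes the denominator is non-zero)
    zhang = leverage / max(s_xy * (1 - s_x), s_x * (s_y - s_xy))
    return {"confidence": confidence, "lift": lift, "leverage": leverage,
            "conviction": conviction, "zhangs_metric": zhang}


butter_bread = rule_metrics(s_x=5 / 6, s_y=5 / 6, s_xy=4 / 6)  # row 0 above
milk_bread = rule_metrics(s_x=3 / 6, s_y=5 / 6, s_xy=3 / 6)    # row 2 above
print({k: round(v, 6) for k, v in butter_bread.items()})
print({k: round(v, 6) for k, v in milk_bread.items()})
```

A lift below 1 means the pair co-occurs slightly less often than it would if the items were independent, as for butter → bread at 0.96. Conviction is infinite whenever confidence is 1, as for milk → bread.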


___
- ### Using Groceries dataset

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder
```

```python
# Load the dataset
groceries = pd.read_csv("Groceries_dataset.csv")
print(groceries.shape)
print(groceries.head())
```

```
(38765, 3)
   Member_number        Date   itemDescription
0           1808  21-07-2015    tropical fruit
1           2552  05-01-2015        whole milk
2           2300  19-09-2015         pip fruit
3           1187  12-12-2015  other vegetables
4           3037  01-02-2015        whole milk
```


```python
# Treat all items bought by one member on one date as a single transaction
all_transactions = [
    group['itemDescription'].tolist()
    for _, group in groceries.groupby(['Member_number', 'Date'])
]
print(all_transactions[:3])

# Transform data using TransactionEncoder
te = TransactionEncoder()
te_ary = te.fit(all_transactions).transform(all_transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)
```

```
[['sausage', 'whole milk', 'semi-finished bread', 'yogurt'], ['whole milk', 'pastry', 'salty snack'], ['canned beer', 'misc. beverages']]
```
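
The `groupby` step treats every (Member_number, Date) pair as one basket. It assumes that a member's purchases on a given day form a single transaction. The same grouping can be written in plain Python. The rows below are hypothetical, laid out like the dataset's (Member_number, Date, itemDescription) columns.

```python
from collections import defaultdict

# Hypothetical rows in the (Member_number, Date, itemDescription) layout
rows = [
    (1, "15-03-2015", "sausage"),
    (1, "15-03-2015", "whole milk"),
    (1, "24-06-2014", "whole milk"),
    (2, "15-03-2015", "soda"),
    (1, "15-03-2015", "yogurt"),
]

# Collect items per (member, date) key; each key becomes one transaction
baskets = defaultdict(list)
for member, date, item in rows:
    baskets[(member, date)].append(item)

# Sort the keys (groupby sorts group keys by default; the dates compare as
# strings here), then keep only the item lists
all_transactions = [baskets[key] for key in sorted(baskets)]
print(all_transactions)  # [['sausage', 'whole milk', 'yogurt'], ['whole milk'], ['soda']]
```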


```python
print(df.head(2))
```

```
   Instant food products  UHT-milk  abrasive cleaner  artif. sweetener  \
0                  False     False             False             False   
1                  False     False             False             False   

   baby cosmetics   bags  baking powder  bathroom cleaner   beef  berries  \
0           False  False          False             False  False    False   
1           False  False          False             False  False    False   

   ...  turkey  vinegar  waffles  whipped/sour cream  whisky  white bread  \
0  ...   False    False    False               False   False        False   
1  ...   False    False    False               False   False        False   

   white wine  whole milk  yogurt  zwieback  
0       False        True    True     False  
1       False        True   False     False  

[2 rows x 167 columns]
```


```python
# Generate frequent itemsets with a lower minimum support
frequent_itemsets = apriori(df, min_support=0.01, use_colnames=True)
print("Frequent Itemsets:")
print(frequent_itemsets)

# Generate association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.1)
print("\nAssociation Rules:")
print(rules)
```

```
Frequent Itemsets:
     support                        itemsets
0   0.021386                      (UHT-milk)
1   0.033950                          (beef)
2   0.021787                       (berries)
3   0.016574                     (beverages)
4   0.045312                  (bottled beer)
..       ...                             ...
64  0.010559  (rolls/buns, other vegetables)
65  0.014837  (other vegetables, whole milk)
66  0.013968        (rolls/buns, whole milk)
67  0.011629              (whole milk, soda)
68  0.011161            (yogurt, whole milk)

[69 rows x 2 columns]

Association Rules:
          antecedents   consequents  antecedent support  consequent support  \
0  (other vegetables)  (whole milk)            0.122101            0.157923   
1        (rolls/buns)  (whole milk)            0.110005            0.157923   
2              (soda)  (whole milk)            0.097106            0.157923   
3            (yogurt)  (whole milk)            0.085879            0.157923   
```
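
Confidence alone can be misleading on this data. Whole milk appears in about 15.8% of baskets. Each rule above predicts it with a lower confidence than that baseline, so every rule's lift is below 1. The check below uses only supports copied from the printed output: the antecedent support from the rules table and the joint support from the frequent-itemset list. The dictionary layout is illustrative.

```python
# Supports copied from the output above
whole_milk = 0.157923  # consequent support
rules = {
    # antecedent: (antecedent support, support of {antecedent, whole milk})
    "other vegetables": (0.122101, 0.014837),
    "rolls/buns": (0.110005, 0.013968),
    "soda": (0.097106, 0.011629),
    "yogurt": (0.085879, 0.011161),
}

for antecedent, (s_x, s_xy) in rules.items():
    confidence = s_xy / s_x
    lift = confidence / whole_milk
    print(f"{antecedent} -> whole milk: confidence={confidence:.3f}, lift={lift:.2f}")
```

On the mlxtend result itself, `rules[rules['lift'] > 1]` keeps only positively associated rules. Alternatively, pass `metric="lift", min_threshold=1.0` to `association_rules`.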

  

___
___
## Conclusion

The Apriori Algorithm is a fundamental method for mining frequent itemsets and discovering association rules in transaction datasets. Its simplicity and effectiveness make it a popular choice in various domains such as market basket analysis and recommendation systems.

### Key Points

- **Identifying Patterns**: Apriori helps in identifying patterns and relationships between items in large datasets.
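- **Efficiency**: Pruning with the Apriori property keeps the number of candidate itemsets manageable. However, the algorithm scans the data once per itemset size, so very large or dense datasets with low support thresholds can still be costly.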
- **Versatility**: Can be applied in various fields beyond market basket analysis, such as bioinformatics and web usage mining.

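The provided Python implementation applies the Apriori Algorithm to a small example dataset and to the Groceries dataset. It shows how frequent itemsets are identified and how association rules are generated from them. Lowering the minimum support threshold surfaces more (rarer) itemsets and rules, but a larger rule set is not automatically more meaningful. On the Groceries data, each of the four rules predicting whole milk has a confidence of about 0.12 to 0.13. That is below whole milk's own support of about 0.158, so each rule's lift is below 1. Checking lift alongside confidence helps separate useful rules from rules that merely reflect a popular consequent.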
