# **Apriori Algorithm**

### **Introduction and a step-by-step worked example of the Apriori Algorithm**

- ### **Introduction**
  - The Apriori Algorithm is a well-known method for mining frequent patterns (frequent itemsets).
  - It scans the dataset repeatedly and generates itemsets using a bottom-up approach, extending frequent itemsets one item at a time.
  - The name is Apriori because it uses prior knowledge of frequent itemset properties.
    - #### **Situations Where Apriori is Used**
      - **Retail** Local sellers bundle items like onions and potatoes or offer discounts because customers who buy one often buy the other, boosting sales.
      - **Supermarkets** Bundling items like bread, butter, and jam for customer convenience and increased sales.
      - **Healthcare** The algorithm can help detect adverse drug reactions (ADRs) by producing association rules that indicate which combinations of medications and patient characteristics could lead to ADRs.
        - #### **Components**
          - The core components of the Apriori Algorithm include
            - **Support** The frequency of an itemset appearing in the dataset.
            - **Confidence** The probability that an item is purchased given that another item is purchased.
            - **Lift** The ratio of the observed support to that expected if the items were independent.
        - #### **Example Scenario**
          - Suppose we have 2000 customer transactions in a supermarket.
          - We want to find the Support, Confidence and Lift for two items, say bread and jam, because customers frequently buy these two items together.
          - Out of the 2000 transactions, 200 contain jam and 300 contain bread. Of these, 100 transactions contain both bread and jam.
          - Using this data, we shall find out the Support, Confidence and Lift.
          - **Calculations**
            - **Support (Jam)**
              - The Support of an item is the number of transactions containing that item divided by the total number of transactions.
              - For our example -
                - Support(Jam) = (Transactions involving jam)/(Total transactions)
                  - = 200/2000
                  - = 10%
            - **Confidence (Bread and Jam)**
              - Confidence is the likelihood that a customer who bought jam also bought bread, that is, the confidence of the rule Jam → Bread.
              - Dividing the number of transactions that include both bread and jam by the number of transactions that include jam gives the confidence figure.
              - Confidence (Bread and Jam)
                - = (Transactions involving both bread and jam)/(Total transactions involving jam)
                - = 100/200 = 50%
              - It implies that 50% of customers who bought jam bought bread as well.
            - **Lift**
              - Lift measures how much more likely a customer is to buy bread when they buy jam, compared with how often bread is bought overall.
              - The lift can be calculated as the confidence of bread and jam divided by the support of bread, where Support(Bread) = 300/2000 = 15%.
              - For our example -
              - Lift = (Confidence (Bread and Jam))/(Support(Bread))
                - = 50/15 ≈ 3.33
                - It says that a customer who buys jam is about 3.33 times as likely to buy bread as a customer chosen at random. A Lift of 1 means the two items are bought independently of each other. If the Lift value is less than 1, customers are less likely to buy the two items together than independence would predict.
                - **The greater the value above 1, the stronger the association.**
          - #### **How does Apriori Algorithm Work?**
            - We shall explain this using a very simple example.
            - Consider a supermarket scenario where the set of items is I = {Onion, Burger, Potato, Milk, Beer}.
            - The database consists of six transactions where 1 represents the presence of the item and 0 the absence.
              - ![Apriori_Algorithm](<./Images/Apriori_Algorithm (1).png>)
          - #### **Assumptions**
            - The Apriori Algorithm makes the following assumptions.
              - All subsets of a frequent itemset must be frequent.
              - All supersets of an infrequent itemset must be infrequent.
              - Set a threshold support level. In our case, we shall fix it at 50%.
          - ### **Flowchart of Apriori Algorithm**
            - The flowchart visualizes the step-by-step process of the Apriori algorithm, guiding through the process of generating frequent itemsets and association rules.
            - ![Apriori_Algorithm](<./Images/Apriori_Algorithm (2).png>)
          - #### **Taking the given example**
            - ![Apriori_Algorithm](<./Images/Apriori_Algorithm (3).png>)
            - ![Apriori_Algorithm](<./Images/Apriori_Algorithm (4).png>)
            - ![Apriori_Algorithm](<./Images/Apriori_Algorithm (5).png>)
          - ### **Subset Creation**
            - The subset creation process involves generating combinations of items and filtering them based on the support threshold. The visuals illustrate how subsets are created and evaluated.
            - ![Apriori_Algorithm](<./Images/Apriori_Algorithm (6).png>)
            - ![Apriori_Algorithm](<./Images/Apriori_Algorithm (7).png>)
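
The bread-and-jam figures from the example scenario can be checked with a few lines of Python. This is only a sketch: the variable names are made up for illustration, and lift is computed from its definition in the Components list (observed support divided by the support expected if the two items were independent).

```python
# Counts taken from the bread-and-jam example scenario.
total_transactions = 2000
jam_transactions = 200
bread_transactions = 300
both_transactions = 100

# Support: share of all transactions that contain the item(s).
support_jam = jam_transactions / total_transactions
support_bread = bread_transactions / total_transactions
support_both = both_transactions / total_transactions

# Confidence of the rule Jam -> Bread: share of jam buyers who also bought bread.
confidence_jam_to_bread = both_transactions / jam_transactions

# Lift: observed joint support divided by the support expected under independence.
lift = support_both / (support_jam * support_bread)

print(f"Support(Jam) = {support_jam:.0%}")
print(f"Confidence(Jam -> Bread) = {confidence_jam_to_bread:.0%}")
print(f"Lift = {lift:.2f}")
```

Dividing the confidence by Support(Bread) gives the same lift value, since the two expressions are algebraically identical.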

---

## **Project- Market Basket Analysis with Apriori Algorithm**

- **Objective** Analyze transaction data using the Apriori algorithm to identify frequent itemsets and generate association rules.
- **Description** For simplicity, we'll use a synthetic dataset created within the code rather than importing from an external file.

### **Implementation**

#### **Import necessary libraries**


```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
```

#### **Create a synthetic transaction dataset**

```python
data = {
    'TransactionID': [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'Item': ['Milk', 'Bread', 'Milk', 'Bread', 'Butter', 'Milk', 'Bread', 'Butter', 'Milk', 'Bread', 'Butter']
}
df = pd.DataFrame(data)
```

#### **Preprocess the data**

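```python
def preprocess_data(data):
    # Count each (TransactionID, Item) pair, then pivot the items into columns
    basket = (data.groupby(['TransactionID', 'Item'])['Item']
              .count().unstack().reset_index().fillna(0)
              .set_index('TransactionID'))
    # Convert counts to booleans: True if the item appears in the transaction
    basket = basket.apply(lambda x: x > 0)
    return basket

basket_data = preprocess_data(df)
```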
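
To make the shape of the preprocessed data concrete, the sketch below rebuilds the same transaction-by-item boolean table using only the standard library. It is an illustration rather than part of the project code, and the names `rows`, `items_by_transaction` and `basket_matrix` are hypothetical.

```python
from collections import defaultdict

# Same long-format rows as the synthetic dataset: (TransactionID, Item).
rows = list(zip(
    [1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    ['Milk', 'Bread', 'Milk', 'Bread', 'Butter', 'Milk', 'Bread', 'Butter',
     'Milk', 'Bread', 'Butter'],
))

# Group the items of each transaction into a set.
items_by_transaction = defaultdict(set)
for transaction_id, item in rows:
    items_by_transaction[transaction_id].add(item)

# One boolean per (transaction, item): True when the item is in the basket.
all_items = sorted({item for _, item in rows})
basket_matrix = {
    tid: {item: item in items for item in all_items}
    for tid, items in sorted(items_by_transaction.items())
}

for tid, flags in basket_matrix.items():
    print(tid, flags)
```

Transaction 1 lacks Butter, while transactions 2 to 4 contain all three items.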

#### **Apply the Apriori algorithm**

```python
frequent_itemsets = apriori(basket_data, min_support=0.5, use_colnames=True)
```
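
For intuition about the level-wise search, here is a minimal pure-Python sketch of the join and prune steps on the same four baskets. It is not how `mlxtend` implements `apriori`, and the function names `support` and `apriori_sketch` are assumptions introduced for illustration.

```python
from itertools import combinations

# The four baskets of the synthetic dataset.
transactions = [
    {'Milk', 'Bread'},
    {'Milk', 'Bread', 'Butter'},
    {'Milk', 'Bread', 'Butter'},
    {'Milk', 'Bread', 'Butter'},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for all itemsets meeting `min_support`."""
    items = sorted({item for t in transactions for item in t})
    # Level 1: frequent single items.
    current = {}
    for item in items:
        s = support(frozenset([item]), transactions)
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        next_level = {}
        for candidate in candidates:
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(candidate, k - 1)):
                s = support(candidate, transactions)
                if s >= min_support:
                    next_level[candidate] = s
        frequent.update(next_level)
        current = next_level
        k += 1
    return frequent

frequent = apriori_sketch(transactions, min_support=0.5)
for itemset, s in sorted(frequent.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a 50% threshold this sketch finds seven frequent itemsets, from the three single items up to the full set of Milk, Bread and Butter.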

#### **Generate association rules**

```python
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
```
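
Rule generation can also be sketched by hand: split a frequent itemset into an antecedent and a consequent in every possible way, then compute confidence and lift for each split. The helper `rules_from_itemset` below is hypothetical and only illustrates the idea. It does not reproduce the full set of columns that `association_rules` returns.

```python
from itertools import combinations

# The four baskets of the synthetic dataset.
transactions = [
    {'Milk', 'Bread'},
    {'Milk', 'Bread', 'Butter'},
    {'Milk', 'Bread', 'Butter'},
    {'Milk', 'Bread', 'Butter'},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules_from_itemset(itemset, min_lift=1.0):
    """Return (antecedent, consequent, confidence, lift) for each split."""
    itemset = frozenset(itemset)
    found = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            # Confidence: support of the whole itemset relative to the antecedent.
            confidence = support(itemset) / support(antecedent)
            # Lift: confidence relative to how common the consequent is overall.
            lift = confidence / support(consequent)
            if lift >= min_lift:
                found.append((antecedent, consequent, confidence, lift))
    return found

example_rules = rules_from_itemset({'Milk', 'Bread', 'Butter'})
for antecedent, consequent, confidence, lift in example_rules:
    print(sorted(antecedent), '->', sorted(consequent),
          f"confidence={confidence:.2f}", f"lift={lift:.2f}")
```

On this small dataset each of the six rules derived from the three-item set has a lift of exactly 1, so none of them indicates an association stronger than chance.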

#### **Display the results**

```python
print("Frequent Itemsets:")
print(frequent_itemsets)

print("\nAssociation Rules:")
print(rules)
```

```text
Frequent Itemsets:
   support               itemsets
0     1.00                (Bread)
1     0.75               (Butter)
2     1.00                 (Milk)
3     0.75        (Bread, Butter)
4     1.00          (Milk, Bread)
5     0.75         (Milk, Butter)
6     0.75  (Milk, Bread, Butter)

Association Rules:
        antecedents      consequents  antecedent support  consequent support  \
0           (Bread)         (Butter)                1.00                0.75
1          (Butter)          (Bread)                0.75                1.00
2            (Milk)          (Bread)                1.00                1.00
3           (Bread)           (Milk)                1.00                1.00
4            (Milk)         (Butter)                1.00                0.75
5          (Butter)           (Milk)                0.75                1.00
6     (Milk, Bread)         (Butter)                1.00                0.75
7    (Milk, Butter)          (Bread)
```





---
<div align="center">
    Created by Ankit Dimri  
    © 2024
</div>
