<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Python-Notebook-Banners/Examples.png"  style="display: block; margin-left: auto; margin-right: auto;";/>
</div>

# Examples: Association
© ExploreAI Academy

In this train, we will explore the fundamentals of association rule learning and the Apriori algorithm, including their definitions, steps, and practical applications in data mining.

## Learning objectives

By the end of this train, you should be able to:
- Understand the concept and significance of association rules in data mining.
- Learn the steps and mechanics of the Apriori algorithm for finding frequent item sets.
- Apply the Apriori algorithm using Python to uncover patterns in transactional data.

## Overview

Association rule learning is a critical data mining method used to **uncover interesting relationships among variables within large datasets**. This technique is particularly effective in **market basket analysis**, where it aims to find sets of products that frequently co-occur in transactions. For instance, understanding that customers who buy bread often buy butter can help businesses improve product placement, enhance cross-selling strategies, and optimise inventory management.

The Apriori algorithm is one of the most well-known algorithms for **mining frequent item sets and generating association rules**. It operates on the principle that if an item set is frequent, then all of its subsets must also be frequent. Equivalently, any item set that contains an infrequent subset cannot itself be frequent, so it can be discarded without being counted. This property, known as the **Apriori property**, helps reduce the computational complexity involved in discovering frequent item sets.

## Association rule

### Definition

Association rules identify relationships among items in transactional data. Each rule is written in the form \(X \rightarrow Y\), where \(X\) is the antecedent (left-hand side) and \(Y\) is the consequent (right-hand side). The antecedent is a set of items found in the transaction data, and the consequent is another set of items, sharing no items with the antecedent, that often occurs with it.

### Example

Consider a retail scenario where we have transactional data of customer purchases. An example of an association rule could be:
- **Rule:** {milk, bread} → {butter}
- **Interpretation:** If a customer buys milk and bread, they are likely to also buy butter.

### Metrics

To evaluate the strength of an association rule, we use several metrics:

- **Support:** This measures how frequently the item set appears in the dataset. For an item set \(X\), support is defined as:

$$
\text{Support}(X) = \frac{\text{Number of transactions containing } X}{\text{Total number of transactions}}
$$


- **Confidence:** This measures how often the items in \(Y\) appear in transactions that contain \(X\). For a rule X → Y, confidence is defined as:
  
$$
\text{Confidence}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X)}
$$


- **Lift:** This measures the ratio of the observed support to that expected if \(X\) and \(Y\) were independent. For a rule X → Y, lift is defined as:
  
$$
\text{Lift}(X \rightarrow Y) = \frac{\text{Support}(X \cup Y)}{\text{Support}(X) \times \text{Support}(Y)}
$$


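Higher support means the rule applies to more transactions, and higher confidence means the consequent appears more reliably when the antecedent is present. Lift is read relative to 1: a value above 1 indicates that \(X\) and \(Y\) occur together more often than expected under independence, a value of 1 indicates no association, and a value below 1 indicates that they occur together less often than expected.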
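
As a minimal sketch, the three metrics can be computed directly from their definitions. The helper names `support`, `confidence`, and `lift`, and the five sample transactions, are illustrative assumptions rather than part of any library.

```python
# Illustrative transactions (assumed for this sketch)
transactions = [
    {'milk', 'bread', 'butter'},
    {'beer', 'bread'},
    {'milk', 'bread', 'butter', 'beer'},
    {'bread', 'butter'},
    {'milk', 'bread', 'beer'},
]

def support(item_set):
    """Fraction of transactions that contain every item in item_set."""
    item_set = set(item_set)
    return sum(item_set <= t for t in transactions) / len(transactions)

def confidence(x, y):
    """Support(X ∪ Y) / Support(X)."""
    return support(set(x) | set(y)) / support(x)

def lift(x, y):
    """Support(X ∪ Y) / (Support(X) * Support(Y))."""
    return support(set(x) | set(y)) / (support(x) * support(y))

# Evaluate the rule {milk, bread} -> {butter}
x, y = {'milk', 'bread'}, {'butter'}
print(f"Support:    {support(x | y):.2f}")
print(f"Confidence: {confidence(x, y):.2f}")
print(f"Lift:       {lift(x, y):.2f}")
```

For this sample, the rule has a support of 0.40, a confidence of about 0.67, and a lift of about 1.11, which suggests only a weak positive association.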

## Apriori algorithm

### Definition

The Apriori algorithm is a fundamental algorithm used to find frequent item sets and generate association rules. It uses a breadth-first search strategy and applies a principle known as the "Apriori property." This property states that if an item set is frequent, then all of its non-empty subsets must also be frequent.
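
One consequence of the Apriori property is that an item set can never appear in more transactions than any of its subsets. The sketch below checks this on a small illustrative dataset; the `count` helper and the sample transactions are assumptions made for the example.

```python
from itertools import combinations

# Illustrative transactions (assumed for this sketch)
transactions = [
    {'milk', 'bread', 'butter'},
    {'beer', 'bread'},
    {'milk', 'bread', 'butter', 'beer'},
    {'bread', 'butter'},
    {'milk', 'bread', 'beer'},
]

def count(item_set):
    """Number of transactions that contain every item in item_set."""
    return sum(set(item_set) <= t for t in transactions)

item_set = ('bread', 'butter', 'milk')

# Every proper, non-empty subset occurs at least as often as the full item set
for size in range(1, len(item_set)):
    for subset in combinations(item_set, size):
        assert count(subset) >= count(item_set)
        print(subset, count(subset))

print(item_set, count(item_set))
```

Read in the other direction, this is what makes pruning safe: once a subset falls below the minimum support, no superset of it needs to be counted.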

### Steps

The Apriori algorithm consists of the following steps:

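1. **Generate candidate item sets:** Start with all individual items (item sets of length 1). For each longer length, build the candidates by combining the frequent item sets found at the previous length.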
2. **Prune:** Remove item sets that do not meet the minimum support threshold. This step reduces the number of candidate item sets to consider.
3. **Repeat:** Increase the length of item sets by one and repeat the process of generating and pruning until no more frequent item sets are found.
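
The steps above can be sketched as a single level-wise loop. This is a minimal illustration rather than an optimised implementation; the function name `apriori_frequent_item_sets`, the parameter `min_count` (an absolute support count), and the sample transactions are assumptions made for the example.

```python
from itertools import combinations

def apriori_frequent_item_sets(transactions, min_count):
    """Return {frozenset: count} for item sets found in at least min_count transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Candidates of length 1: every individual item
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Prune: count each candidate and keep those meeting the threshold
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Generate: join frequent item sets of length k into candidates of length k + 1
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori property: keep a candidate only if all its (k - 1)-subsets are frequent
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Illustrative transactions (assumed for this sketch)
sample = [
    ['milk', 'bread', 'butter'],
    ['beer', 'bread'],
    ['milk', 'bread', 'butter', 'beer'],
    ['bread', 'butter'],
    ['milk', 'bread', 'beer'],
]

result = apriori_frequent_item_sets(sample, min_count=3)
for item_set, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(item_set), n)
```

The loop ends when a level produces no candidates, which corresponds to the "repeat until no more frequent item sets are found" step.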

### Example 1

To better understand the Apriori algorithm, let's go through a detailed example. 

**Dataset:**
Imagine we have the following transactions in a supermarket:

```
T1: {milk, bread, butter}
T2: {beer, bread}
T3: {milk, bread, butter, beer}
T4: {bread, butter}
T5: {milk, bread, beer}
```

Here's how we can implement the Apriori algorithm step-by-step:

**Step 1: Generate candidate item sets of length 1**
- This means we look at all the individual items in the transactions and count their occurrences.
- Candidate item sets of length 1: {milk}, {bread}, {butter}, {beer}

```python
from collections import Counter

# Sample dataset
transactions = [
    ['milk', 'bread', 'butter'],
    ['beer', 'bread'],
    ['milk', 'bread', 'butter', 'beer'],
    ['bread', 'butter'],
    ['milk', 'bread', 'beer']
]

# Step 1: Generate candidate item sets of length 1.
# Counter tallies every item across all transactions. Because no item repeats
# within a transaction, each tally equals the number of transactions containing it.
items = Counter(item for transaction in transactions for item in transaction)

print("Candidate item sets of length 1:")
for item, count in items.items():
    print(f"{item}: {count}")
```

```
Candidate item sets of length 1:
milk: 3
bread: 5
butter: 3
beer: 3
```


**Step 2: Prune non-frequent item sets**
- Set a minimum support threshold (e.g., 0.6) and prune item sets that do not meet this threshold.
- Expressed as a count, the minimum support is 3 transactions (since we have 5 transactions, 0.6 * 5 = 3).

```python
# Minimum support expressed as a count: 0.6 * 5 transactions = 3
min_support = 3

# Pruning step: keep only items that appear in at least min_support transactions
frequent_items = {item: count for item, count in items.items() if count >= min_support}
print("Frequent item sets of length 1:")
for item, count in frequent_items.items():
    print(f"{item}: {count}")
```

```
Frequent item sets of length 1:
milk: 3
bread: 5
butter: 3
beer: 3
```
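
The same pruning can also be expressed with relative support instead of a count. The sketch below assumes the five transactions above and uses `fractions.Fraction` so that the comparison against the threshold is not affected by floating-point rounding; the names `min_support_ratio` and `relative_support` are illustrative.

```python
from collections import Counter
from fractions import Fraction

# The five sample transactions, repeated so the block runs on its own
transactions = [
    ['milk', 'bread', 'butter'],
    ['beer', 'bread'],
    ['milk', 'bread', 'butter', 'beer'],
    ['bread', 'butter'],
    ['milk', 'bread', 'beer'],
]

n = len(transactions)
min_support_ratio = Fraction(3, 5)  # the 0.6 threshold as an exact fraction

# Count each item once per transaction, then divide by the number of transactions
counts = Counter(item for t in transactions for item in set(t))
relative_support = {item: Fraction(c, n) for item, c in counts.items()}

frequent = {item for item, s in relative_support.items() if s >= min_support_ratio}
for item in sorted(frequent):
    print(f"{item}: {float(relative_support[item]):.1f}")
```

All four items meet the threshold, matching the count-based result.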


**Step 3: Generate candidate item sets of length 2**
- Combine frequent item sets of length 1 to generate item sets of length 2.

```python
from itertools import combinations

# Generate candidate item sets of length 2 from every pair of frequent items
candidate_item_sets_2 = list(combinations(frequent_items, 2))
print("Candidate item sets of length 2:")
for item_set in candidate_item_sets_2:
    print(item_set)
```

```
Candidate item sets of length 2:
('milk', 'bread')
('milk', 'butter')
('milk', 'beer')
('bread', 'butter')
('bread', 'beer')
('butter', 'beer')
```


**Step 4: Prune non-frequent item sets of length 2**
- Count the occurrences of each item set in the transactions and prune those that do not meet the minimum support threshold.

```python
# Count the transactions that contain every item of each candidate item set
item_set_counts_2 = Counter()
for transaction in transactions:
    for item_set in candidate_item_sets_2:
        if all(item in transaction for item in item_set):
            item_set_counts_2[item_set] += 1

# Pruning step
frequent_item_sets_2 = {item_set: count for item_set, count in item_set_counts_2.items() if count >= min_support}
print("Frequent item sets of length 2:")
for item_set, count in frequent_item_sets_2.items():
    print(f"{item_set}: {count}")
```

```
Frequent item sets of length 2:
('milk', 'bread'): 3
('bread', 'butter'): 3
('bread', 'beer'): 3
```


**Step 5: Generate candidate item sets of length 3**
- Combine frequent item sets of length 2 to generate item sets of length 3.

```python
# Generate candidate item sets of length 3: take the union of every pair of
# frequent item sets of length 2 and keep the unions that contain exactly 3 items.
# The set comprehension removes duplicates.
candidate_item_sets_3 = {
    tuple(sorted(set(a) | set(b)))
    for a in frequent_item_sets_2
    for b in frequent_item_sets_2
    if len(set(a) | set(b)) == 3
}
candidate_item_sets_3 = sorted(candidate_item_sets_3)  # Sort for a stable print order
print("Candidate item sets of length 3:")
for item_set in candidate_item_sets_3:
    print(item_set)
```

```
Candidate item sets of length 3:
('beer', 'bread', 'butter')
('beer', 'bread', 'milk')
('bread', 'butter', 'milk')
```
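
Before counting these candidates, the Apriori property allows one more check: a candidate of length 3 can only be frequent if every one of its length-2 subsets is frequent. The sketch below applies that check, with the frequent pairs from Step 4 and the candidates from Step 5 hard-coded so the block runs on its own; the helper name `all_pairs_frequent` is illustrative.

```python
from itertools import combinations

# Frequent item sets of length 2 from Step 4 (hard-coded for this sketch)
frequent_pairs = {
    frozenset(('milk', 'bread')),
    frozenset(('bread', 'butter')),
    frozenset(('bread', 'beer')),
}

# Candidate item sets of length 3 from Step 5 (hard-coded for this sketch)
candidates = [
    ('beer', 'bread', 'butter'),
    ('beer', 'bread', 'milk'),
    ('bread', 'butter', 'milk'),
]

def all_pairs_frequent(candidate):
    """True if every 2-item subset of the candidate is a frequent pair."""
    return all(frozenset(pair) in frequent_pairs for pair in combinations(candidate, 2))

surviving = [c for c in candidates if all_pairs_frequent(c)]
print("Candidates that survive subset pruning:", surviving)
```

In this dataset every candidate contains at least one pair that is not frequent, so none survives the check and none would need to be counted against the transactions.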


**Step 6: Prune non-frequent item sets of length 3**
- Count the occurrences of each item set in the transactions and prune those that do not meet the minimum support threshold.


```python
# Count the transactions that contain every item of each candidate item set
item_set_counts_3 = Counter()
for transaction in transactions:
    for item_set in candidate_item_sets_3:
        if all(item in transaction for item in item_set):
            item_set_counts_3[item_set] += 1

# Pruning step
frequent_item_sets_3 = {item_set: count for item_set, count in item_set_counts_3.items() if count >= min_support}
print("Frequent item sets of length 3:")
for item_set, count in frequent_item_sets_3.items():
    print(f"{item_set}: {count}")
```

```
Frequent item sets of length 3:
```


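No item set of length 3 meets the minimum support count, so no longer candidates can be generated and the algorithm stops here. The frequent item sets are the four single items from Step 2 and the three pairs from Step 4.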
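
Frequent item sets can then be turned into association rules by splitting each set into an antecedent and a consequent and keeping the splits whose confidence is high enough. The sketch below does this for the frequent item sets of Example 1, with their counts hard-coded so the block runs on its own; the `min_confidence` value of 0.7 is an assumption chosen for illustration.

```python
from itertools import combinations

n_transactions = 5

# Frequent item sets and their counts from Steps 2 and 4 (hard-coded for this sketch)
counts = {
    frozenset(['milk']): 3,
    frozenset(['bread']): 5,
    frozenset(['butter']): 3,
    frozenset(['beer']): 3,
    frozenset(['milk', 'bread']): 3,
    frozenset(['bread', 'butter']): 3,
    frozenset(['bread', 'beer']): 3,
}
min_confidence = 0.7  # assumed threshold

rules = []
for item_set, count in counts.items():
    if len(item_set) < 2:
        continue  # a rule needs at least one item on each side
    for size in range(1, len(item_set)):
        for antecedent in map(frozenset, combinations(sorted(item_set), size)):
            consequent = item_set - antecedent
            # Confidence = count(X ∪ Y) / count(X); lift = confidence / support(Y)
            confidence = count / counts[antecedent]
            lift = confidence / (counts[consequent] / n_transactions)
            if confidence >= min_confidence:
                rules.append((sorted(antecedent), sorted(consequent), confidence, lift))

for antecedent, consequent, confidence, lift in rules:
    print(f"{antecedent} -> {consequent}: confidence={confidence:.2f}, lift={lift:.2f}")
```

Three rules pass the confidence threshold, each with bread as the consequent and a lift of exactly 1. Because bread appears in every transaction, these rules carry no information beyond its base rate, which illustrates why confidence alone is not enough to judge a rule.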

### Example 2

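Here's a practical implementation of the Apriori algorithm using Python with the `apyori` library. It uses a larger sample of ten transactions and filters the rules by minimum support, confidence, and lift: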

```python
# Install the required library (comment out this line if it is already installed)
!pip install apyori
```


```python
# Import the required modules
from apyori import apriori

# Sample dataset
transactions = [
    ['milk', 'bread', 'butter'],
    ['beer', 'bread', 'butter'],
    ['milk', 'bread', 'butter', 'beer'],
    ['bread', 'butter'],
    ['milk', 'bread', 'beer'],
    ['milk', 'butter'],
    ['bread', 'butter', 'beer'],
    ['milk', 'bread', 'butter', 'beer'],
    ['bread', 'butter'],
    ['milk', 'bread', 'beer']
]

# Applying Apriori
rules = apriori(transactions, min_support=0.3, min_confidence=0.8, min_lift=1.2)

# Collecting results and converting them into a list
results = list(rules)

# Filtering results to include only item sets with at least 2 items
filtered_results = [result for result in results if len(result.items) >= 2]

# Displaying results
if not filtered_results:
    print("No rules found with the specified criteria.")
else:
    for result in filtered_results:
        # Print the items in the item set
        print(f"Final set: {list(result.items)}")

        # Print the support of the item set
        print(f"Support: {result.support}")

        # Print the confidence and lift of each rule derived from the item set
        for stat in result.ordered_statistics:
            print(f"Confidence: {stat.confidence}")
            print(f"Lift: {stat.lift}")
            items_base1 = stat.items_base
            items_add1 = stat.items_add
            print('Potential Apriori Rule: ', list(items_base1), '  -> ', list(items_add1))

        print("====================================")
```

```
Final set: ['bread', 'beer', 'milk']
Support: 0.4
Confidence: 0.8
Lift: 1.3333333333333335
Potential Apriori Rule:  ['bread', 'milk']   ->  ['beer']
====================================
```
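
The reported values can be checked by hand from the metric definitions, without the library. The sketch below repeats the ten transactions so it runs on its own; the `support` helper is an illustrative assumption.

```python
# The ten sample transactions from Example 2
transactions = [
    ['milk', 'bread', 'butter'],
    ['beer', 'bread', 'butter'],
    ['milk', 'bread', 'butter', 'beer'],
    ['bread', 'butter'],
    ['milk', 'bread', 'beer'],
    ['milk', 'butter'],
    ['bread', 'butter', 'beer'],
    ['milk', 'bread', 'butter', 'beer'],
    ['bread', 'butter'],
    ['milk', 'bread', 'beer'],
]

def support(items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

# Rule under test: {bread, milk} -> {beer}
antecedent, consequent = {'bread', 'milk'}, {'beer'}

rule_support = support(antecedent | consequent)       # 4 of 10 transactions
rule_confidence = rule_support / support(antecedent)  # 4 of the 5 with bread and milk
rule_lift = rule_confidence / support(consequent)     # beer is in 6 of 10 transactions

print(f"Support:    {rule_support:.2f}")
print(f"Confidence: {rule_confidence:.2f}")
print(f"Lift:       {rule_lift:.2f}")
```

This reproduces the support of 0.4, confidence of 0.8, and lift of about 1.33 reported for the rule.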


## Conclusion

The Apriori algorithm plays a crucial role in data mining for discovering frequent item sets and generating association rules. It leverages the Apriori property to efficiently prune the search space, making it feasible to mine large datasets. This algorithm is widely used in various applications, such as market basket analysis, to uncover significant associations and patterns that can inform strategic decisions.

Understanding and implementing the Apriori algorithm equips data scientists with a powerful tool for uncovering hidden patterns in transactional data, ultimately driving better business insights and decision-making processes.


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>