# Lab 08: Association Rule Mining

**Objective:** This lab aims to introduce the concept of association rule mining, a popular technique for discovering interesting relationships hidden in large datasets. We will explore how to use existing Python packages to perform association rule mining and interpret the results. Finally, you will apply these techniques to a new dataset.

## 1. Introduction to Association Rule Mining

Association rule mining is a rule-based machine learning method for discovering interesting relations between variables in large databases. It aims to identify strong rules using measures of interestingness such as support, confidence, and lift.

The classic example is the "market basket analysis," where a retailer tries to understand purchasing behaviors of customers. For example, an association rule might be: `{Diapers} -> {Beer}`. This rule suggests that customers who buy diapers also tend to buy beer.

Key concepts in association rule mining include:

* **Itemset:** A collection of one or more items. E.g., `{Milk, Bread, Diaper}`.
* **Support:** The fraction of transactions that contain an itemset. It indicates the popularity of an itemset. 
    $$Support(X) = \frac{\text{Number of transactions containing X}}{\text{Total number of transactions}}$$
* **Confidence:** Measures how often items in Y appear in transactions that contain X. It indicates the likelihood of item Y being purchased when item X is purchased.
    $$Confidence(X \rightarrow Y) = \frac{Support(X \cup Y)}{Support(X)}$$
* **Lift:** Measures how much more often X and Y occur together than expected if they were statistically independent. A lift greater than 1 suggests a positive association.
    $$Lift(X \rightarrow Y) = \frac{Support(X \cup Y)}{Support(X) \times Support(Y)}$$
* **Antecedent (LHS):** The itemset on the left-hand side of the rule (e.g., `{Diapers}`).
* **Consequent (RHS):** The itemset on the right-hand side of the rule (e.g., `{Beer}`).
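
The three formulas above can be checked by hand on a few transactions. The following is a minimal pure-Python sketch; the toy transactions and the helper names `support`, `confidence`, and `lift` are hypothetical and are not part of the lab's datasets or of `mlxtend`.

```python
# Hypothetical toy transactions, used only to illustrate the formulas.
transactions = [
    {"Milk", "Bread", "Diapers"},
    {"Diapers", "Beer"},
    {"Milk", "Diapers", "Beer"},
    {"Bread", "Beer"},
    {"Milk", "Bread"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset.issubset(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Support(X u Y) / Support(X)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)

def lift(antecedent, consequent):
    """Support(X u Y) / (Support(X) * Support(Y))."""
    joint = support(set(antecedent) | set(consequent))
    return joint / (support(antecedent) * support(consequent))

X, Y = {"Diapers"}, {"Beer"}
print(f"support    = {support(X | Y):.3f}")
print(f"confidence = {confidence(X, Y):.3f}")
print(f"lift       = {lift(X, Y):.3f}")
```

In this toy data, Diapers and Beer each appear in 3 of 5 transactions and together in 2, so the lift comes out slightly above 1, indicating a weak positive association.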

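The best-known algorithm for association rule mining is the **Apriori algorithm**, which finds frequent itemsets level by level and prunes any candidate itemset that has an infrequent subset.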
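
To make the level-wise idea concrete, here is a simplified sketch of an Apriori-style search in plain Python. It is an illustration under stated assumptions, not the implementation used by `mlxtend`; the function name `apriori_sketch` and the toy transactions are hypothetical.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset.issubset(b) for b in baskets) / n

    # Level 1 candidates: every individual item.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that meet the support threshold.
        supports = {c: support(c) for c in candidates}
        level = {c: s for c, s in supports.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates that have any infrequent (k-1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent

# Hypothetical toy transactions, used only for illustration.
toy = [
    {"Milk", "Bread", "Diapers"},
    {"Diapers", "Beer"},
    {"Milk", "Diapers", "Beer"},
    {"Bread", "Beer"},
    {"Milk", "Bread"},
]
result = apriori_sketch(toy, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On this toy data, candidates such as `{Milk, Bread, Diapers}` are pruned without being counted, because `{Bread, Diapers}` is already known to be infrequent.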

## 2. Setup

First, let's install and import the necessary Python libraries. We'll primarily use `pandas` for data manipulation and `mlxtend` for association rule mining.

In [2]:
# Install mlxtend if you haven't already
!pip install mlxtend

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder



## 3. Example 1: Basic Association Rule Mining

Let's start with a simple dataset of transactions.

In [3]:
# Sample transaction data
dataset1 = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
            ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
            ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
            ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
            ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

# Print the dataset
print("Raw Dataset 1:")
for transaction in dataset1:
    print(transaction)

Raw Dataset 1:
['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt']
['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt']
['Milk', 'Apple', 'Kidney Beans', 'Eggs']
['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt']
['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']


### 3.1. Data Preprocessing

The Apriori algorithm expects data in a one-hot encoded format, where each row represents a transaction and each column represents an item. The value is `True` or `1` if the item is in the transaction, and `False` or `0` otherwise.

In [4]:
te = TransactionEncoder()
te_ary = te.fit(dataset1).transform(dataset1)
df1 = pd.DataFrame(te_ary, columns=te.columns_)

print("\nOne-Hot Encoded DataFrame 1:")
df1


One-Hot Encoded DataFrame 1:


|   | Apple | Corn | Dill | Eggs | Ice cream | Kidney Beans | Milk | Nutmeg | Onion | Unicorn | Yogurt |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | False | False | False | True | False | True | True | True | True | False | True |
| 1 | False | False | True | True | False | True | False | True | True | False | True |
| 2 | True | False | False | True | False | True | True | False | False | False | False |
| 3 | False | True | False | False | False | True | True | False | False | True | True |
| 4 | False | True | False | True | True | True | False | False | True | False | False |
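
Conceptually, one-hot encoding a list of transactions amounts to the following. This is a plain-Python sketch of the idea only, not how `TransactionEncoder` is implemented; the helper name `one_hot_rows` and the two sample transactions are hypothetical.

```python
def one_hot_rows(transactions):
    """Return (columns, rows): sorted item names and one boolean row per transaction."""
    columns = sorted({item for t in transactions for item in t})
    rows = [[item in set(t) for item in columns] for t in transactions]
    return columns, rows

# A repeated item (here 'Onion') still yields a single True in its column.
columns, rows = one_hot_rows([["Milk", "Eggs"], ["Eggs", "Onion", "Onion"]])
print(columns)
for row in rows:
    print(row)
```

This also shows why the duplicated `'Onion'` in the last transaction of `dataset1` has no effect on the encoded table: the encoding records presence, not quantity.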


### 3.2. Apply Apriori Algorithm

Now, we apply the Apriori algorithm to find frequent itemsets. The `min_support` parameter specifies the minimum support threshold for an itemset to be considered frequent.

In [5]:
# Find frequent itemsets with min_support = 0.6 (i.e., itemset appears in at least 60% of transactions)
frequent_itemsets1 = apriori(df1, min_support=0.6, use_colnames=True)

print("Frequent Itemsets (min_support=0.6):")
frequent_itemsets1

Frequent Itemsets (min_support=0.6):


|   | support | itemsets |
|---|---|---|
| 0 | 0.8 | (Eggs) |
| 1 | 1.0 | (Kidney Beans) |
| 2 | 0.6 | (Milk) |
| 3 | 0.6 | (Onion) |
| 4 | 0.6 | (Yogurt) |
| 5 | 0.8 | (Eggs, Kidney Beans) |
| 6 | 0.6 | (Eggs, Onion) |
| 7 | 0.6 | (Kidney Beans, Milk) |
| 8 | 0.6 | (Kidney Beans, Onion) |
| 9 | 0.6 | (Kidney Beans, Yogurt) |


### 3.3. Generate Association Rules

Once we have the frequent itemsets, we can generate association rules. We'll use `confidence` as the metric and set a minimum threshold (e.g., `min_threshold=0.7`).

In [6]:
# Generate association rules with min_confidence = 0.7
rules1 = association_rules(frequent_itemsets1, metric="confidence", min_threshold=0.7)

print("Association Rules (min_confidence=0.7):")
rules1[['antecedents', 'consequents', 'support', 'confidence', 'lift']]

Association Rules (min_confidence=0.7):




|   | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 0 | (Eggs) | (Kidney Beans) | 0.8 | 1.0 | 1.0 |
| 1 | (Kidney Beans) | (Eggs) | 0.8 | 0.8 | 1.0 |
| 2 | (Eggs) | (Onion) | 0.6 | 0.75 | 1.25 |
| 3 | (Onion) | (Eggs) | 0.6 | 1.0 | 1.25 |
| 4 | (Milk) | (Kidney Beans) | 0.6 | 1.0 | 1.0 |
| 5 | (Onion) | (Kidney Beans) | 0.6 | 1.0 | 1.0 |
| 6 | (Yogurt) | (Kidney Beans) | 0.6 | 1.0 | 1.0 |
| 7 | (Eggs, Kidney Beans) | (Onion) | 0.6 | 0.75 | 1.25 |
| 8 | (Kidney Beans, Onion) | (Eggs) | 0.6 | 1.0 | 1.25 |
| 9 | (Eggs, Onion) | (Kidney Beans) | 0.6 | 1.0 | 1.0 |
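
To see where such rules come from, the sketch below enumerates every way of splitting one frequent itemset into an antecedent and a consequent, and keeps the splits that meet the confidence threshold. It is a simplified illustration rather than `mlxtend`'s implementation; the helper names `count` and `rules_from_itemset` are hypothetical, and confidence is computed from raw transaction counts.

```python
from itertools import combinations

# Copy of dataset1 from this example, so the sketch is self-contained.
transactions = [
    ['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
    ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
]
baskets = [set(t) for t in transactions]

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(set(itemset).issubset(b) for b in baskets)

def rules_from_itemset(itemset, min_confidence):
    """Rules X -> Y with X u Y == itemset; assumes itemset occurs at least once."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            conf = count(itemset) / count(antecedent)
            if conf >= min_confidence:
                rules.append((antecedent, itemset - antecedent, conf))
    return rules

for a, c, conf in rules_from_itemset({"Eggs", "Kidney Beans", "Onion"}, 0.7):
    print(sorted(a), "->", sorted(c), f"confidence={conf:.2f}")
```

Of the six possible splits of `{Eggs, Kidney Beans, Onion}`, only the one with `Kidney Beans` alone as antecedent falls below 0.7, because Kidney Beans appears in every transaction while the full itemset appears in just three.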


### 3.4. Interpret the Results

Let's look at one of the rules:
- **Rule:** `{Onion} -> {Eggs}`
- **Support:** This value (e.g., 0.6) means that 60% of all transactions contain both Onion and Eggs.
- **Confidence:** If the confidence is 1.0, it means that 100% of the transactions that contain Onion also contain Eggs.
- **Lift:** If the lift is, for example, 1.25, it means that customers are 1.25 times more likely to buy Eggs if they buy Onion, compared to if the purchase of Eggs was independent of the purchase of Onion. A lift > 1 indicates a positive correlation.
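
As a worked check of these numbers: in `dataset1`, Onion appears in 3 of the 5 transactions, Eggs in 4, and both together in 3, so the formulas from Section 1 give:

$$Support(\{Onion, Eggs\}) = \frac{3}{5} = 0.6$$

$$Confidence(Onion \rightarrow Eggs) = \frac{Support(\{Onion, Eggs\})}{Support(Onion)} = \frac{0.6}{0.6} = 1.0$$

$$Lift(Onion \rightarrow Eggs) = \frac{Support(\{Onion, Eggs\})}{Support(Onion) \times Support(Eggs)} = \frac{0.6}{0.6 \times 0.8} = 1.25$$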

## 4. Example 2: Exploring Different Parameters

Let's use another small dataset and see how changing `min_support` and `min_threshold` for confidence affects the rules generated. This dataset is already in a list of lists format, suitable for `TransactionEncoder`.

In [7]:
dataset2 = [['bread', 'milk', 'butter'],
            ['bread', 'butter', 'cheese', 'jam'],
            ['milk', 'butter', 'cheese'],
            ['bread', 'milk', 'jam'],
            ['bread', 'milk', 'butter', 'cheese'],
            ['tea', 'milk'],
            ['bread', 'butter', 'jam']]

te2 = TransactionEncoder()
te_ary2 = te2.fit(dataset2).transform(dataset2)
df2 = pd.DataFrame(te_ary2, columns=te2.columns_)

print("One-Hot Encoded DataFrame 2:")
df2

One-Hot Encoded DataFrame 2:


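|   | bread | butter | cheese | jam | milk | tea |
|---|---|---|---|---|---|---|
| 0 | True | True | False | False | True | False |
| 1 | True | True | True | True | False | False |
| 2 | False | True | True | False | True | False |
| 3 | True | False | False | True | True | False |
| 4 | True | True | True | False | True | False |
| 5 | False | False | False | False | True | True |
| 6 | True | True | False | True | False | False |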


### 4.1. Experimenting with `min_support`

Let's try a lower `min_support` first.

In [8]:
# Lower min_support
frequent_itemsets2_low_support = apriori(df2, min_support=0.2, use_colnames=True) # itemset appears in at least ~20% of transactions
print("Frequent Itemsets (min_support=0.2):")
print(frequent_itemsets2_low_support)

# Generate rules with min_confidence = 0.6
rules2_low_support = association_rules(frequent_itemsets2_low_support, metric="confidence", min_threshold=0.6)
print("\nAssociation Rules (min_support=0.2, min_confidence=0.6):")
rules2_low_support[['antecedents', 'consequents', 'support', 'confidence', 'lift']]

Frequent Itemsets (min_support=0.2):
     support                 itemsets
0   0.714286                  (bread)
1   0.714286                 (butter)
2   0.428571                 (cheese)
3   0.428571                    (jam)
4   0.714286                   (milk)
5   0.571429          (butter, bread)
6   0.285714          (cheese, bread)
7   0.428571             (jam, bread)
8   0.428571            (milk, bread)
9   0.428571         (cheese, butter)
10  0.285714            (jam, butter)
11  0.428571           (milk, butter)
12  0.285714           (milk, cheese)
13  0.285714  (cheese, butter, bread)
14  0.285714     (jam, butter, bread)
15  0.285714    (milk, butter, bread)
16  0.285714   (milk, cheese, butter)

Association Rules (min_support=0.2, min_confidence=0.6):


|   | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 0 | (butter) | (bread) | 0.571429 | 0.8 | 1.12 |
| 1 | (bread) | (butter) | 0.571429 | 0.8 | 1.12 |
| 2 | (cheese) | (bread) | 0.285714 | 0.666667 | 0.933333 |
| 3 | (jam) | (bread) | 0.428571 | 1.0 | 1.4 |
| 4 | (bread) | (jam) | 0.428571 | 0.6 | 1.4 |
| 5 | (milk) | (bread) | 0.428571 | 0.6 | 0.84 |
| 6 | (bread) | (milk) | 0.428571 | 0.6 | 0.84 |
| 7 | (cheese) | (butter) | 0.428571 | 1.0 | 1.4 |
| 8 | (butter) | (cheese) | 0.428571 | 0.6 | 1.4 |
| 9 | (jam) | (butter) | 0.285714 | 0.666667 | 0.933333 |


Now, let's try a higher `min_support`.

In [9]:
# Higher min_support
frequent_itemsets2_high_support = apriori(df2, min_support=0.5, use_colnames=True) # itemset appears in at least 50% of transactions
print("Frequent Itemsets (min_support=0.5):")
print(frequent_itemsets2_high_support)

# Generate rules with min_confidence = 0.6
rules2_high_support = association_rules(frequent_itemsets2_high_support, metric="confidence", min_threshold=0.6)
print("\nAssociation Rules (min_support=0.5, min_confidence=0.6):")
rules2_high_support[['antecedents', 'consequents', 'support', 'confidence', 'lift']]

Frequent Itemsets (min_support=0.5):
    support         itemsets
0  0.714286          (bread)
1  0.714286         (butter)
2  0.714286           (milk)
3  0.571429  (butter, bread)

Association Rules (min_support=0.5, min_confidence=0.6):


|   | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 0 | (butter) | (bread) | 0.571429 | 0.8 | 1.12 |
| 1 | (bread) | (butter) | 0.571429 | 0.8 | 1.12 |


**Observation:**
You should notice that a lower `min_support` results in more frequent itemsets and potentially more rules. A higher `min_support` leads to fewer, more common itemsets and rules.
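
The effect of `min_support` can be reproduced without `mlxtend` by brute force on a dataset this small. The sketch below simply tests every possible itemset, which is feasible only because there are six distinct items; the helper name `frequent_itemsets` is hypothetical.

```python
from itertools import combinations

# Copy of dataset2 from this section, so the sketch is self-contained.
transactions = [
    ['bread', 'milk', 'butter'],
    ['bread', 'butter', 'cheese', 'jam'],
    ['milk', 'butter', 'cheese'],
    ['bread', 'milk', 'jam'],
    ['bread', 'milk', 'butter', 'cheese'],
    ['tea', 'milk'],
    ['bread', 'butter', 'jam'],
]
baskets = [set(t) for t in transactions]
items = sorted(set().union(*baskets))

def frequent_itemsets(min_support):
    """Brute force: check the support of every non-empty combination of items."""
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = sum(set(combo).issubset(b) for b in baskets) / len(baskets)
            if s >= min_support:
                found[combo] = s
    return found

for threshold in (0.2, 0.5):
    print(threshold, len(frequent_itemsets(threshold)))
```

The counts agree with the outputs above (17 itemsets at 0.2, 4 at 0.5), and every itemset that is frequent at the higher threshold is also frequent at the lower one.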

### 4.2. Experimenting with `min_threshold` for Confidence

Let's use the frequent itemsets from `min_support=0.2` and vary the `min_threshold` for confidence.

In [10]:
# Using frequent_itemsets2_low_support (min_support=0.2)

# Lower min_confidence
rules2_low_confidence = association_rules(frequent_itemsets2_low_support, metric="confidence", min_threshold=0.5)
print("Association Rules (min_support=0.2, min_confidence=0.5):")
print(rules2_low_confidence[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

# Higher min_confidence
rules2_high_confidence = association_rules(frequent_itemsets2_low_support, metric="confidence", min_threshold=0.8)
print("\nAssociation Rules (min_support=0.2, min_confidence=0.8):")
rules2_high_confidence[['antecedents', 'consequents', 'support', 'confidence', 'lift']]

Association Rules (min_support=0.2, min_confidence=0.5):
         antecedents      consequents   support  confidence      lift
0           (butter)          (bread)  0.571429    0.800000  1.120000
1            (bread)         (butter)  0.571429    0.800000  1.120000
2           (cheese)          (bread)  0.285714    0.666667  0.933333
3              (jam)          (bread)  0.428571    1.000000  1.400000
4            (bread)            (jam)  0.428571    0.600000  1.400000
5             (milk)          (bread)  0.428571    0.600000  0.840000
6            (bread)           (milk)  0.428571    0.600000  0.840000
7           (cheese)         (butter)  0.428571    1.000000  1.400000
8           (butter)         (cheese)  0.428571    0.600000  1.400000
9              (jam)         (butter)  0.285714    0.666667  0.933333
10            (milk)         (butter)  0.428571    0.600000  0.840000
11          (butter)           (milk)  0.428571    0.600000  0.840000
12          (cheese)           (m

|   | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 0 | (jam) | (bread) | 0.428571 | 1.0 | 1.4 |
| 1 | (cheese) | (butter) | 0.428571 | 1.0 | 1.4 |
| 2 | (cheese, bread) | (butter) | 0.285714 | 1.0 | 1.4 |
| 3 | (jam, butter) | (bread) | 0.285714 | 1.0 | 1.4 |
| 4 | (milk, cheese) | (butter) | 0.285714 | 1.0 | 1.4 |

**Observation:**
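A lower `min_threshold` for confidence will generate more rules, including those where the antecedent doesn't strongly imply the consequent. A higher `min_threshold` will yield fewer, but stronger, rules where the implication is more certain. Note that `(butter) -> (bread)` and `(bread) -> (butter)`, whose confidence displays as 0.8, are absent from the `min_threshold=0.8` output; the most likely cause is floating-point rounding, with the computed confidence falling marginally below 0.8.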
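
The same trend can be checked with a small brute-force sketch that counts how many rules survive at two confidence thresholds. The thresholds 0.5 and 0.9, the constant `MIN_COUNT`, and the helper names `count` and `rules` are choices made for this illustration; confidence is computed from integer transaction counts, so results at a threshold that sits exactly on a rule's confidence may differ from a floating-point implementation.

```python
from itertools import combinations

# Copy of dataset2 from this section, so the sketch is self-contained.
transactions = [
    ['bread', 'milk', 'butter'],
    ['bread', 'butter', 'cheese', 'jam'],
    ['milk', 'butter', 'cheese'],
    ['bread', 'milk', 'jam'],
    ['bread', 'milk', 'butter', 'cheese'],
    ['tea', 'milk'],
    ['bread', 'butter', 'jam'],
]
baskets = [set(t) for t in transactions]
items = sorted(set().union(*baskets))

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(set(itemset).issubset(b) for b in baskets)

# With 7 transactions, min_support=0.2 is equivalent to "at least 2 transactions".
MIN_COUNT = 2
frequent = [
    frozenset(combo)
    for k in range(2, len(items) + 1)
    for combo in combinations(items, k)
    if count(combo) >= MIN_COUNT
]

def rules(min_confidence):
    """All rules X -> Y from the frequent itemsets with confidence >= threshold."""
    found = []
    for itemset in frequent:
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                conf = count(itemset) / count(antecedent)
                if conf >= min_confidence:
                    found.append((frozenset(antecedent), itemset - frozenset(antecedent), conf))
    return found

for threshold in (0.5, 0.9):
    print(threshold, len(rules(threshold)))
```

At the 0.9 threshold only the rules with confidence 1.0 remain, which are the same five rules listed in the table above.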

## 5. Task: Groceries Dataset

Now it's your turn! We have a small dataset of grocery items from a few transactions. Your task is to perform association rule mining on this dataset.

In [58]:
# Student dataset
student_dataset = [
    ['Apples', 'Bananas', 'Cereal'],
    ['Milk', 'Bread', 'Butter'],
    ['Apples', 'Bread', 'Eggs'],
    ['Bananas', 'Milk', 'Cereal', 'Sugar'],
    ['Apples', 'Milk', 'Bread', 'Butter'],
    ['Coffee', 'Sugar', 'Cookies'],
    ['Apples', 'Bananas', 'Bread'],
    ['Milk', 'Cereal', 'Sugar'],
    ['Apples', 'Bread', 'Butter', 'Cheese'],
    ['Bananas', 'Cereal', 'Yogurt']
]

# Print the student dataset
print("Student Dataset:")
for transaction in student_dataset:
    print(transaction)

Student Dataset:
['Apples', 'Bananas', 'Cereal']
['Milk', 'Bread', 'Butter']
['Apples', 'Bread', 'Eggs']
['Bananas', 'Milk', 'Cereal', 'Sugar']
['Apples', 'Milk', 'Bread', 'Butter']
['Coffee', 'Sugar', 'Cookies']
['Apples', 'Bananas', 'Bread']
['Milk', 'Cereal', 'Sugar']
['Apples', 'Bread', 'Butter', 'Cheese']
['Bananas', 'Cereal', 'Yogurt']


### Your Tasks:

1.  **Load and Preprocess the Data:**
    * Use `TransactionEncoder` to transform `student_dataset` into a one-hot encoded pandas DataFrame.

In [59]:
# Your code here for Task 1

te_student = TransactionEncoder()
te_ary_student = te_student.fit(student_dataset).transform(student_dataset)
df_student = pd.DataFrame(te_ary_student, columns=te_student.columns_)

print("One-Hot Encoded Student DataFrame:")
df_student

One-Hot Encoded Student DataFrame:


|   | Apples | Bananas | Bread | Butter | Cereal | Cheese | Coffee | Cookies | Eggs | Milk | Sugar | Yogurt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | True | True | False | False | True | False | False | False | False | False | False | False |
| 1 | False | False | True | True | False | False | False | False | False | True | False | False |
| 2 | True | False | True | False | False | False | False | False | True | False | False | False |
| 3 | False | True | False | False | True | False | False | False | False | True | True | False |
| 4 | True | False | True | True | False | False | False | False | False | True | False | False |
| 5 | False | False | False | False | False | False | True | True | False | False | True | False |
| 6 | True | True | True | False | False | False | False | False | False | False | False | False |
| 7 | False | False | False | False | True | False | False | False | False | True | True | False |
| 8 | True | False | True | True | False | True | False | False | False | False | False | False |
| 9 | False | True | False | False | True | False | False | False | False | False | False | True |


2.  **Apply the Apriori Algorithm:**
    * Find frequent itemsets using the Apriori algorithm. Choose a `min_support` value that you think is reasonable for this dataset (e.g., an itemset should appear in at least 2 or 3 transactions). Justify your choice briefly.

In [62]:
# Your code here for Task 2
# Justification for min_support:
# I chose a min_support of 0.3 because a min_support of 0.2 led to too many itemsets being considered frequent (around 20). Many of those itemsets did not
# seem to make sense logically (such as buying both apples and butter), while a min_support of 0.3 still kept itemsets whose items pair together logically (such as bread
# and butter, or bananas and cereal, which are commonly eaten together). Also, in the next part, a min_support of 0.2 gave me 32 rules for 0.5 min_confidence and 21 rules for 0.6 min_confidence, which
# seems like too many rules to be useful (especially compared with the 6 rules I get later with a min_support of 0.3). Since a min_support of 0.3 gives fewer but clearer rules,
# I decided to use it to get more common, safer association rules.

# The dataset has 10 transactions. A min_support of 0.2 means an itemset must appear in at least 10 * 0.2 = 2 transactions.
# A min_support of 0.3 means an itemset must appear in at least 10 * 0.3 = 3 transactions.
# We use min_support = 0.3, as justified above.

min_support = 0.3

frequent_itemsets_student_data = apriori(df_student, min_support=min_support, use_colnames=True)

print(f"Frequent Itemsets (min_support={min_support}):")
print(frequent_itemsets_student_data)

Frequent Itemsets (min_support=0.3):
   support           itemsets
0      0.5           (Apples)
1      0.4          (Bananas)
2      0.5            (Bread)
3      0.3           (Butter)
4      0.4           (Cereal)
5      0.4             (Milk)
6      0.3            (Sugar)
7      0.4    (Bread, Apples)
8      0.3  (Bananas, Cereal)
9      0.3    (Bread, Butter)
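
The conversion between "appears in at least N transactions" and a fractional `min_support` used in Task 2 can be sketched as two small helpers. Both function names are hypothetical and not part of `mlxtend`; the rounding step is a defensive assumption to guard against floating-point noise before taking the ceiling.

```python
import math

def min_support_for_count(min_count, n_transactions):
    """Fractional support equivalent to 'appears in at least min_count transactions'."""
    return min_count / n_transactions

def min_count_for_support(min_support, n_transactions):
    """Smallest number of transactions an itemset needs to reach min_support."""
    # Round first so tiny floating-point errors do not push the ceiling up by one.
    return math.ceil(round(min_support * n_transactions, 9))

n = 10  # number of transactions in student_dataset
print(min_support_for_count(3, n))
print(min_count_for_support(0.2, n), min_count_for_support(0.3, n))
```

For the 10-transaction student dataset, thresholds of 0.2 and 0.3 correspond to 2 and 3 transactions respectively, matching the comments in the Task 2 cell.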


3.  **Generate Association Rules:**
    * Generate association rules from the frequent itemsets. Choose a `min_threshold` for confidence (e.g., 0.5 or 0.6). Justify your choice briefly.

In [63]:
# Your code here for Task 3
# Justification for min_threshold (confidence): 
# Since I chose a min_support of 0.3, all association rules generated had a confidence of 0.6 or higher, so using 0.5 vs 0.6 for min_confidence had no impact.
# However, with a min_support of 0.2, I got 32 association rules with a min_confidence of 0.5 vs 21 with a min_confidence of 0.6. That seemed like too many rules for
# easy analysis at a confidence of 0.5, so a confidence of 0.6 seemed better in that case. Although with a min_support of 0.3 the choice between
# 0.5 and 0.6 makes no difference, I went with 0.6 because it worked better in the 0.2 min_support case.

# A min_confidence of 0.6 means that in at least 60% of the cases where the antecedent is present, the consequent is also present.
# For a small dataset, this might reveal some initial interesting rules without being too restrictive.

min_confidence = 0.6
rules_confidence = association_rules(frequent_itemsets_student_data, metric="confidence", min_threshold=min_confidence)
print(f"Association Rules (min_confidence={min_confidence}):")
rules_confidence[['antecedents', 'consequents', 'support', 'confidence', 'lift']]

Association Rules (min_confidence=0.6):


|   | antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|---|
| 0 | (Bread) | (Apples) | 0.4 | 0.8 | 1.6 |
| 1 | (Apples) | (Bread) | 0.4 | 0.8 | 1.6 |
| 2 | (Bananas) | (Cereal) | 0.3 | 0.75 | 1.875 |
| 3 | (Cereal) | (Bananas) | 0.3 | 0.75 | 1.875 |
| 4 | (Bread) | (Butter) | 0.3 | 0.6 | 2.0 |
| 5 | (Butter) | (Bread) | 0.3 | 1.0 | 2.0 |


4.  **Identify and Interpret Interesting Rules:**
    * From the generated rules, select 2-3 rules that you find interesting.
    * For each selected rule, explain what it means in the context of the grocery data. Discuss its support, confidence, and lift.

*(Double-click here to edit and write your interpretation for Task 4)*

**Example Interpretation (you will pick your own rules from your results):**

**Rule 1: Butter -> Bread**
* **Support:** 0.3 - This means that 30% of all transactions contain both Butter and Bread.
* **Confidence:** 1.00 - This means that 100% of the transactions that contain Butter also contain Bread.
* **Lift:** 2.00 - This means that customers are 2.00 times more likely to buy Bread if they buy Butter, compared to if the purchases were independent.
* **Interestingness:** I find this rule interesting not only because it has a confidence of 100%, but also since it makes sense that most people would buy bread and butter together, as many people eat toast with butter and thus it's a very logical combination. It's also interesting that it has such a high lift.

**Rule 2: Bread -> Apples**
* **Support:** 0.4 - This means that 40% of all transactions contain both Bread and Apples.
* **Confidence:** 0.80 - This means that 80% of the transactions that contain Bread also contain Apples.
* **Lift:** 1.60 - This means that customers are 1.60 times more likely to buy Apples if they buy Bread, compared to if the purchases were independent.
* **Interestingness:** I find this rule interesting since it has a very high support of 0.4, although the combination itself isn't inherently intuitive. Eating apples with bread isn't a very common snack. However, this could reveal more subtle patterns in the data: perhaps people who buy bread are making sandwiches for school lunches and are also putting apples in those lunches, which would make the combination much more logical. The lift isn't as high as for the other rules, although it's still fascinating to see how association rules help us see logical patterns that aren't always obvious.

**Rule 3: Bread -> Butter**
* **Support:** 0.3 - This means that 30% of all transactions contain both Bread and Butter.
* **Confidence:** 0.60 - This means that 60% of the transactions that contain Bread also contain Butter.
* **Lift:** 2.00 - This means that customers are 2.00 times more likely to buy Butter if they buy Bread, compared to if the purchases were independent.
* **Interestingness:** I find this rule interesting because, although the Butter -> Bread rule has a confidence of 1.00, the Bread -> Butter rule has a much lower confidence (while the lift and support are exactly the same). This makes sense, as people have many uses for bread that don't involve butter (like making sandwiches without butter) but fewer uses for butter that don't involve bread. It shows that swapping the antecedent and consequent does not generally preserve confidence: it is harder to predict one item from another when many transactions contain the first item but not the second.
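
The asymmetry follows directly from the formulas. Using the Task 2 output, where $Support(Bread) = 0.5$, $Support(Butter) = 0.3$, and $Support(\{Bread, Butter\}) = 0.3$, confidence divides by the support of the antecedent only, while lift divides by both supports and is therefore the same in either direction:

$$Confidence(Bread \rightarrow Butter) = \frac{0.3}{0.5} = 0.6 \qquad Confidence(Butter \rightarrow Bread) = \frac{0.3}{0.3} = 1.0$$

$$Lift(Bread \rightarrow Butter) = Lift(Butter \rightarrow Bread) = \frac{0.3}{0.5 \times 0.3} = 2.0$$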


5.  **Experiment (Optional but Recommended):**
    * Try changing your `min_support` and `min_threshold` (confidence) values. For example, make `min_support` lower or higher, and `min_threshold` for confidence lower or higher.
    * How does this affect the number and type of rules generated? Briefly describe your observations.

In [64]:
# Your code here for Task 5 (Optional)

# Example: Lowering min_support further and keeping confidence moderate
min_support_v2 = 0.1 # At least 1 transaction

min_confidence_v2 = 0.5

frequent_itemsets_v2 = apriori(df_student, min_support=min_support_v2, use_colnames=True)
rules_student_v2 = association_rules(frequent_itemsets_v2, metric="confidence", min_threshold=min_confidence_v2)
print(f"\n--- Experiment: min_support={min_support_v2}, min_confidence={min_confidence_v2} ---")
print("Frequent Itemsets:")
print(frequent_itemsets_v2)
pd.set_option('display.max_rows', None)
print("\nAssociation Rules:")
print(rules_student_v2[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

# Example: Using min_support = 0.2 (lower than the 0.3 chosen in Task 2) but increasing confidence
min_confidence_v3 = 0.8

min_support_v3 = 0.2

frequent_itemsets_v3 = apriori(df_student, min_support=min_support_v3, use_colnames=True)
rules_v3 = association_rules(frequent_itemsets_v3, metric="confidence", min_threshold=min_confidence_v3)

print(f"\n--- Experiment: min_support={min_support_v3}, min_confidence={min_confidence_v3} ---")
print("Frequent Itemsets:")
print(frequent_itemsets_v3)
print("\nAssociation Rules:")
print(rules_v3[['antecedents', 'consequents', 'support', 'confidence', 'lift']])


--- Experiment: min_support=0.1, min_confidence=0.5 ---
Frequent Itemsets:
    support                         itemsets
0       0.5                         (Apples)
1       0.4                        (Bananas)
2       0.5                          (Bread)
3       0.3                         (Butter)
4       0.4                         (Cereal)
5       0.1                         (Cheese)
6       0.1                         (Coffee)
7       0.1                        (Cookies)
8       0.1                           (Eggs)
9       0.4                           (Milk)
10      0.3                          (Sugar)
11      0.1                         (Yogurt)
12      0.2                (Bananas, Apples)
13      0.4                  (Bread, Apples)
14      0.2                 (Apples, Butter)
15      0.1                 (Cereal, Apples)
16      0.1                 (Cheese, Apples)
17      0.1                   (Eggs, Apples)
18      0.1                   (Apples, Milk)
19      0.1             

*(Double-click here to edit and write your observations for Task 5)*

**Student's Observations for Task 5:**

* **Lowering `min_support` (e.g., to 0.1 from 0.2) while keeping `min_confidence` constant (e.g., at 0.5):**
Lowering min_support while keeping min_confidence constant leads to a very large number of frequent itemsets (56 in total). This also leads to a very large number of association rules (106), since with a very low min_support and only a moderate min_confidence many rules clear both thresholds. The number of items in both the antecedent and the consequent varies widely, and the lift can get quite high (up to 10.0) and never dips below 1.0 (so no negative association). In general, there are also many single-item consequents. Overall, though, there are far too many rules and frequent itemsets to extract useful patterns, making a min_support of 0.1 with a min_confidence of 0.5 not very useful for analysis.
    
* **Increasing `min_confidence` (e.g., to 0.8 from 0.5) while keeping `min_support` constant (e.g., at 0.2):**
Increasing min_confidence to 0.8 while keeping min_support at 0.2 leads to a somewhat large number of frequent itemsets (20 in total). However, the high min_confidence value leaves only 9 association rules. All of the consequents have only 1 item, and the antecedents mostly have 2 items (except for the top 3). The lift can get somewhat high for these rules, ranging from 1.60 to 3.33. The slightly larger min_support and higher min_confidence are much more useful for finding patterns in the data, as there are far fewer rules to consider and all of them seem to make sense logically (e.g., it makes sense that buying butter would make one more likely to buy bread).
    

## 6. Conclusion

In this lab, we explored association rule mining using the Apriori algorithm. We learned how to:
* Prepare data for association rule mining.
* Use the `mlxtend` library to find frequent itemsets and generate rules.
* Interpret key metrics like support, confidence, and lift.
* Understand how parameters like `min_support` and `min_threshold` (for confidence) influence the outcome.

Association rule mining is a powerful tool for uncovering hidden patterns in transactional data, with applications in retail, e-commerce, healthcare, and more.