### **Lab: Introduction to Association Rules Using Python and Pandas**

#### **Objective**:
This lab will guide you through the process of generating and analyzing **association rules** using a dataset. Association rules are used to discover relationships between items in large datasets, like in market basket analysis. You will learn to compute frequent itemsets and generate rules based on **support** and **confidence**.

---

### **Dataset**:
We'll create a simple market basket dataset where each transaction contains a set of products bought together. The dataset will be used to compute itemsets and extract association rules.

---

### **Steps to Follow:**

#### **Step 1: Install Required Libraries**

We need to install and import the `mlxtend` library, which helps with frequent itemset generation and rule mining.

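```python
# In Jupyter/Colab, "!" runs a shell command; from a terminal, run `pip install mlxtend`
!pip install mlxtend
```

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
```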

#### **Step 2: Create a Market Basket Dataset**

In this lab, we simulate a dataset representing transactions where each transaction contains a combination of items.

```python
# Create a simple dataset of transactions
data = {'Milk': [1, 1, 0, 1, 0, 0],
        'Bread': [1, 0, 0, 1, 1, 1],
        'Butter': [0, 1, 1, 0, 1, 1],
        'Cheese': [0, 1, 1, 0, 1, 1],
        'Apples': [1, 0, 0, 1, 1, 0],
        'Bananas': [1, 1, 0, 0, 0, 0]}

# Convert the dictionary into a Pandas DataFrame
basket = pd.DataFrame(data)

basket
```

|   | Milk | Bread | Butter | Cheese | Apples | Bananas |
|---|------|-------|--------|--------|--------|---------|
| 0 | 1    | 1     | 0      | 0      | 1      | 1       |
| 1 | 1    | 0     | 1      | 1      | 0      | 1       |
| 2 | 0    | 0     | 1      | 1      | 0      | 0       |
| 3 | 1    | 1     | 0      | 0      | 1      | 0       |
| 4 | 0    | 1     | 1      | 1      | 1      | 0       |
| 5 | 0    | 1     | 1      | 1      | 0      | 0       |


Each row represents a transaction where `1` means the product was purchased and `0` means it wasn't.

#### **Step 3: Data Preprocessing**

Before applying the Apriori algorithm, make sure the data is in the one-hot format that `mlxtend` expects: each column represents an item, each row represents a transaction, and every value is either `True`/`1` (purchased) or `False`/`0` (not purchased).

```python
# Convert to a boolean one-hot DataFrame (True if purchased, False otherwise).
# The vectorized comparison replaces DataFrame.applymap, which is deprecated in recent pandas.
basket = basket > 0
```


#### **Step 4: Generate Frequent Itemsets Using Apriori Algorithm**

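The **Apriori algorithm** finds every itemset whose support meets a minimum threshold. It builds larger candidate itemsets only from smaller itemsets that are already frequent, because every subset of a frequent itemset must itself be frequent.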

- **Support**: The fraction of transactions that contain a specific itemset.

We will generate itemsets that occur in at least **50%** of transactions (i.e., a minimum support of 0.5).
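To see what `min_support=0.5` selects, support can also be computed directly with pandas. A minimal sketch, using the Step 2 toy dataset; `support` is a hypothetical helper written for illustration, not part of `mlxtend`:

```python
import pandas as pd

# Same toy transactions as in Step 2, converted to booleans
data = {'Milk':    [1, 1, 0, 1, 0, 0],
        'Bread':   [1, 0, 0, 1, 1, 1],
        'Butter':  [0, 1, 1, 0, 1, 1],
        'Cheese':  [0, 1, 1, 0, 1, 1],
        'Apples':  [1, 0, 0, 1, 1, 0],
        'Bananas': [1, 1, 0, 0, 0, 0]}
basket = pd.DataFrame(data) > 0

def support(df, itemset):
    """Fraction of transactions (rows) that contain every item in `itemset`."""
    return df[list(itemset)].all(axis=1).mean()

for itemset in [{'Bread'}, {'Apples', 'Bread'}, {'Bananas'}]:
    print(sorted(itemset), round(support(basket, itemset), 3))
```

Bananas appears in only 2 of the 6 transactions (support of about 0.33), so it falls below the 0.5 threshold and cannot be part of any frequent itemset.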

```python
# Generate frequent itemsets with minimum support of 50%
frequent_itemsets = apriori(basket, min_support=0.5, use_colnames=True)

print("Frequent Itemsets:")
frequent_itemsets
```

Frequent Itemsets:

|   | support  | itemsets         |
|---|----------|------------------|
| 0 | 0.5      | (Milk)           |
| 1 | 0.666667 | (Bread)          |
| 2 | 0.666667 | (Butter)         |
| 3 | 0.666667 | (Cheese)         |
| 4 | 0.5      | (Apples)         |
| 5 | 0.5      | (Apples, Bread)  |
| 6 | 0.666667 | (Cheese, Butter) |


#### **Step 5: Generate Association Rules**

Once we have frequent itemsets, we can generate association rules of the form A → B, where the antecedent A and the consequent B are disjoint itemsets. Each rule is evaluated using **support**, **confidence**, and **lift**.

- **Confidence**: The probability that a transaction contains B given that it contains A, computed as support(A ∪ B) / support(A).
- **Lift**: How much more often A and B occur together than expected if they were independent, computed as confidence(A → B) / support(B).

We’ll set a minimum confidence of 70%.


```python
# Generate association rules with a minimum confidence of 70%
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

print("Association Rules:")
rules
```

Association Rules:

|   | antecedents | consequents | antecedent support | consequent support | support  | confidence | lift | leverage | conviction | zhangs_metric |
|---|-------------|-------------|--------------------|--------------------|----------|------------|------|----------|------------|---------------|
| 0 | (Apples)    | (Bread)     | 0.5                | 0.666667           | 0.5      | 1.0        | 1.5  | 0.166667 | inf        | 0.666667      |
| 1 | (Bread)     | (Apples)    | 0.666667           | 0.5                | 0.5      | 0.75       | 1.5  | 0.166667 | 2.0        | 1.0           |
| 2 | (Cheese)    | (Butter)    | 0.666667           | 0.666667           | 0.666667 | 1.0        | 1.5  | 0.222222 | inf        | 1.0           |
| 3 | (Butter)    | (Cheese)    | 0.666667           | 0.666667           | 0.666667 | 1.0        | 1.5  | 0.222222 | inf        | 1.0           |
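The confidence and lift formulas can be checked by hand against the first two rows of this table. A minimal sketch in plain Python, using the supports from Step 4 (Apples 3/6, Bread 4/6, Apples and Bread together 3/6); `rule_metrics` is a hypothetical helper, not an `mlxtend` function:

```python
def rule_metrics(support_a, support_b, support_ab):
    """Return (confidence, lift) for the rule A -> B.

    support_a  : fraction of transactions containing A
    support_b  : fraction of transactions containing B
    support_ab : fraction of transactions containing both A and B
    """
    confidence = support_ab / support_a   # P(B | A)
    lift = confidence / support_b         # P(B | A) / P(B)
    return confidence, lift

# Supports taken from the frequent-itemset table (6 transactions in total)
sup_apples, sup_bread, sup_both = 3 / 6, 4 / 6, 3 / 6

conf, lift = rule_metrics(sup_apples, sup_bread, sup_both)
print(f"Apples -> Bread: confidence={conf:.2f}, lift={lift:.2f}")

conf, lift = rule_metrics(sup_bread, sup_apples, sup_both)
print(f"Bread -> Apples: confidence={conf:.2f}, lift={lift:.2f}")
```

Both results match the table (1.0 and 0.75 for confidence, 1.5 for lift). Confidence depends on the direction of the rule, while lift equals support(A ∪ B) / (support(A) × support(B)) and is therefore the same for A → B and B → A.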


#### **Step 6: Evaluate and Interpret Results**

The generated association rules describe relationships between items. Each rule's metrics can be read as follows:
- **Support**: The fraction of transactions that contain both the antecedent and the consequent.
- **Confidence**: How likely the consequent (right-hand side) items are to be purchased, given that the antecedent (left-hand side) items are purchased.
- **Lift**: A value greater than 1 means the antecedent and consequent occur together more often than expected if they were independent; a value of 1 indicates independence, and a value below 1 means they occur together less often than expected.
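With many rules, a common next step is to keep only rules with lift above 1 and rank the rest. A minimal pandas sketch; to stay self-contained it rebuilds a few columns of the Step 5 output by hand (in the notebook, use the `rules` DataFrame directly), and the `lift > 1` filter and the sort order are illustrative choices, not fixed rules:

```python
import pandas as pd

# A few columns copied from the Step 5 rules table
rules = pd.DataFrame({
    'antecedents': [frozenset({'Apples'}), frozenset({'Bread'}),
                    frozenset({'Cheese'}), frozenset({'Butter'})],
    'consequents': [frozenset({'Bread'}), frozenset({'Apples'}),
                    frozenset({'Butter'}), frozenset({'Cheese'})],
    'support':    [0.5, 0.5, 0.666667, 0.666667],
    'confidence': [1.0, 0.75, 1.0, 1.0],
    'lift':       [1.5, 1.5, 1.5, 1.5],
})

# Keep rules whose items co-occur more often than chance (lift > 1),
# then list the most confident (and, on ties, most frequent) rules first
strong = (rules[rules['lift'] > 1]
          .sort_values(['confidence', 'support'], ascending=False))

for _, row in strong.iterrows():
    lhs = ', '.join(sorted(row['antecedents']))
    rhs = ', '.join(sorted(row['consequents']))
    print(f"{lhs} -> {rhs}: support={row['support']:.2f}, "
          f"confidence={row['confidence']:.2f}, lift={row['lift']:.2f}")
```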

#### **Example Output**:

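Two of the rules produced in Step 5:

| antecedents | consequents | support | confidence | lift |
|-------------|-------------|---------|------------|------|
| {Apples}    | {Bread}     | 0.50    | 1.00       | 1.5  |
| {Cheese}    | {Butter}    | 0.67    | 1.00       | 1.5  |

For example, the rule Apples → Bread has confidence 1.00 because every transaction containing apples also contains bread, and its lift of 1.5 means bread is 1.5 times as likely in those transactions as in the dataset overall. Reading the rules this way shows which products are likely to be bought together.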



### **Assignment: Association Rule Mining Using the Groceries Dataset**

#### **Objective**:
In this assignment, you will apply association rule mining techniques to the **Groceries dataset** to discover relationships between products bought together. Using the Apriori algorithm, you will compute frequent itemsets and extract association rules based on **support** and **confidence** metrics.

---

### **Dataset**:
The dataset can be accessed via the following link:

[Groceries Dataset CSV](https://raw.githubusercontent.com/9meo/bas240/refs/heads/main/LAB8/Groceries_dataset.csv)

The dataset contains transactional data of groceries purchased by customers. Each row records one product bought by one customer.

---

### **Assignment Steps**:

#### **Step 1: Load the Dataset**

Download and load the **Groceries dataset** into a Pandas DataFrame:

```python
import pandas as pd

# Load the Groceries dataset
url = "https://raw.githubusercontent.com/9meo/bas240/refs/heads/main/LAB8/Groceries_dataset.csv"
groceries = pd.read_csv(url)

# Show the first few rows of the dataset
print(groceries.head())
```

- The dataset contains two main columns:
  - **Member_number**: Identifies each unique customer.
  - **itemDescription**: Describes the products purchased in each transaction.

#### **Step 2: Data Preprocessing**

Before applying the Apriori algorithm, we need to transform the dataset into the right format. The data needs to be represented as **one-hot encoded** transactions, where each column represents a product, and each row represents a transaction (1 if the product was purchased, 0 if not).

Steps:
- Group products by customer (`Member_number`), so that all of a customer's purchases form one transaction.
- Convert the data into a **basket format** (one-hot encoded) where rows represent transactions and columns represent products.

```python
# Count how often each member bought each item (members x items table)
basket = (groceries.groupby(['Member_number', 'itemDescription'])
          .size()
          .unstack(fill_value=0))

# Convert to boolean one-hot format (True if purchased, False otherwise);
# the vectorized comparison avoids the deprecated DataFrame.applymap
basket = basket > 0

# Show the first few rows of the basket
basket.head()
```

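Grouping by `Member_number` alone merges all of a customer's purchases into one basket. If the CSV also has a per-visit column such as `Date` (an assumption here; check `groceries.columns` before relying on it), grouping by member and date instead treats each shopping trip as a separate transaction. A minimal sketch; `one_hot_baskets` is a hypothetical helper, and the small `toy` DataFrame only mimics the Groceries column names:

```python
import pandas as pd

def one_hot_baskets(df, keys, item_col='itemDescription'):
    """Boolean basket matrix: one row per unique combination of `keys`,
    one column per item, True if that item appears in the basket."""
    counts = df.groupby(keys + [item_col]).size().unstack(fill_value=0)
    return counts > 0

# Tiny stand-in with the Groceries column names ('Date' is an assumed column)
toy = pd.DataFrame({
    'Member_number':   [1, 1, 1, 2, 2],
    'Date':            ['01-01-2015', '01-01-2015', '05-03-2015',
                        '02-01-2015', '02-01-2015'],
    'itemDescription': ['whole milk', 'rolls/buns', 'yogurt',
                        'whole milk', 'soda'],
})

per_customer = one_hot_baskets(toy, ['Member_number'])       # one basket per customer
per_visit = one_hot_baskets(toy, ['Member_number', 'Date'])  # one basket per visit
print(per_customer.shape, per_visit.shape)
```

`one_hot_baskets(groceries, ['Member_number'])` builds the same per-customer basket as above (as booleans). Whichever grouping you choose, state it when justifying your thresholds, since it changes what support measures.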
### **Assignment Tasks**:

1. **Load the Dataset**:
   - Download and load the dataset into a Pandas DataFrame.
   
2. **Preprocess the Data**:
   - Group products by customer (`Member_number`) and convert the result into a **basket format** where rows represent transactions and columns represent products.

3. **Generate Frequent Itemsets**:
   - Use the **Apriori algorithm** to identify frequent itemsets. Choose a minimum support threshold and justify your choice.

4. **Generate Association Rules**:
   - Generate association rules based on **support**, **confidence**, and **lift**. Choose a minimum confidence threshold and justify your choice.

5. **Analyze the Results**:
   - Interpret the association rules generated and discuss how these rules can help a grocery store understand customer purchasing patterns.
   - Highlight a few important rules and explain their significance.

6. **Visualize** (Optional):
   - Visualize the relationship between support, confidence, and lift using a scatter plot.
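For Task 3, one quick way to gauge how restrictive a candidate `min_support` is: every item in a frequent itemset must itself be frequent, so only the single items that reach the threshold can appear in any frequent itemset. A minimal sketch; `items_above_support` is a hypothetical helper, and the toy basket stands in for the Groceries basket:

```python
import pandas as pd

def items_above_support(basket, thresholds):
    """For each candidate min_support, count how many single items reach it.
    `basket` is a boolean one-hot DataFrame (rows = transactions)."""
    item_support = basket.mean()  # fraction of rows containing each item
    return pd.Series({t: int((item_support >= t).sum()) for t in thresholds},
                     name='frequent_items')

# Toy basket from the lab; in the assignment, pass the Groceries basket instead
toy = pd.DataFrame({'Milk':    [1, 1, 0, 1, 0, 0],
                    'Bread':   [1, 0, 0, 1, 1, 1],
                    'Bananas': [1, 1, 0, 0, 0, 0]}) > 0

print(items_above_support(toy, [0.1, 0.4, 0.6, 0.9]))
```

A threshold that leaves only a handful of items can produce at most a few multi-item rules, which is one concrete argument to use when justifying your choice.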


Optionally, visualize the association rules by plotting **support** vs **confidence** or other metrics.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# `rules` is the DataFrame returned by association_rules() in Task 4.
# Plot support vs confidence for each rule; marker size encodes lift.
plt.figure(figsize=(10, 6))
sns.scatterplot(x='support', y='confidence', size='lift', data=rules, legend=False, sizes=(100, 1000))
plt.title('Support vs Confidence of Association Rules')
plt.xlabel('Support')
plt.ylabel('Confidence')
plt.show()
```
