# Apriori and Association Rules
**Authors:** Magudeshwaran and Senthilkumaran

**Goal:** Find frequent itemsets and association rules in a sample dataset using the Apriori algorithm.

### Step 1: Import Libraries
We need `pandas` to create our DataFrame and `mlxtend` for the Apriori algorithm.

In [None]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

### Step 2: Create the Dataset
We will create a small, sample dataset of transactions. Each row represents a transaction, and each column represents an item. A `1` means the item was in the transaction, and a `0` means it was not.

In [None]:
data = {
    'apple': [1, 0, 1, 1, 0],
    'banana': [1, 1, 1, 0, 1],
    'carrot': [0, 1, 0, 1, 1],
    'spinach': [0, 0, 1, 1, 1],
    'tomato': [1, 1, 0, 0, 1]
}
df = pd.DataFrame(data)
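
Transaction data often arrives as a list of baskets rather than a ready-made 0/1 table. The sketch below shows one way such a table could be built by hand using only the standard library. The `transactions` list is our own reconstruction of the five rows above (it is not part of the original dataset definition), and `one_hot` is an illustrative name. mlxtend also ships a `TransactionEncoder` class in `mlxtend.preprocessing` for the same job.

```python
# Assumed raw input: one list of item names per transaction,
# reconstructed from the rows of the table above.
transactions = [
    ["apple", "banana", "tomato"],
    ["banana", "carrot", "tomato"],
    ["apple", "banana", "spinach"],
    ["apple", "carrot", "spinach"],
    ["banana", "carrot", "spinach", "tomato"],
]

# Collect every distinct item, sorted so the column order is stable.
items = sorted({item for basket in transactions for item in basket})

# One column per item; entry i is 1 if transaction i contains that item.
one_hot = {
    item: [1 if item in basket else 0 for basket in transactions]
    for item in items
}

for item, column in one_hot.items():
    print(item, column)
```

The resulting dictionary has the same shape as `data`, so it could be passed to `pd.DataFrame` in the same way.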

### Step 3: Convert Data to Boolean Type
mlxtend's `apriori` function expects boolean (`True`/`False`) values. Integer `0`/`1` columns still work, but recent mlxtend versions emit a deprecation warning for non-boolean DataFrames and process them less efficiently, so we convert the data to booleans first.

In [None]:
df_bool = df.astype(bool)
df_bool.head()

### Step 4: Find Frequent Itemsets
Now, we run the **Apriori algorithm** to find itemsets that appear frequently in our data. We set `min_support=0.4`, so only itemsets that appear in at least 40% of the transactions (here, at least 2 of the 5) are kept. Setting `use_colnames=True` makes the output show item names instead of column indices.

In [None]:
frequent_itemsets = apriori(df_bool, min_support=0.4, use_colnames=True)
print("Frequent Itemsets:")
print(frequent_itemsets)
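
To make the level-wise idea behind Apriori concrete, here is a minimal pure-Python sketch run on the same five transactions. It is not how mlxtend implements `apriori`; it only illustrates the join and prune steps under simple assumptions. The names `transactions`, `support`, `frequent`, and `current` are our own.

```python
from itertools import combinations

# The five transactions from Step 2, written as sets of item names.
transactions = [
    {"apple", "banana", "tomato"},
    {"banana", "carrot", "tomato"},
    {"apple", "banana", "spinach"},
    {"apple", "carrot", "spinach"},
    {"banana", "carrot", "spinach", "tomato"},
]
min_support = 0.4
n = len(transactions)


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in transactions) / n


# Level 1: single items that meet the support threshold.
items = sorted(set().union(*transactions))
current = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]

frequent = {}  # maps each frequent itemset to its support
k = 1
while current:
    for itemset in current:
        frequent[itemset] = support(itemset)
    k += 1
    # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
    candidates = {a | b for a in current for b in current if len(a | b) == k}
    # Prune step: every (k-1)-subset of a candidate must itself be frequent.
    candidates = {
        c for c in candidates
        if all(frozenset(s) in frequent for s in combinations(c, k - 1))
    }
    # Keep only the candidates that meet the support threshold.
    current = [c for c in candidates if support(c) >= min_support]

# Print results ordered by itemset size, then alphabetically.
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On this dataset the sketch finds 13 frequent itemsets: all five single items, seven pairs, and one triple (`banana`, `carrot`, `tomato`).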

### Step 5: Generate Association Rules
From the frequent itemsets, we can generate **association rules** of the form "IF antecedent THEN consequent". We will keep only rules with a **confidence** of at least 0.7. Confidence is the fraction of transactions containing the antecedent that also contain the consequent.

In [None]:
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
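
As a sanity check on the definition of confidence, the sketch below computes it by hand for a few candidate rules on our five transactions. The helper names `count` and `confidence` are our own and are not part of mlxtend.

```python
# The five transactions from Step 2, written as sets of item names.
transactions = [
    {"apple", "banana", "tomato"},
    {"banana", "carrot", "tomato"},
    {"apple", "banana", "spinach"},
    {"apple", "carrot", "spinach"},
    {"banana", "carrot", "spinach", "tomato"},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in transactions)


def confidence(antecedent, consequent):
    """Share of transactions with the antecedent that also hold the consequent."""
    return count(antecedent | consequent) / count(antecedent)


print(confidence({"tomato"}, {"banana"}))  # 3 of 3 tomato baskets contain banana
print(confidence({"banana"}, {"tomato"}))  # 3 of 4 banana baskets contain tomato
print(confidence({"apple"}, {"banana"}))   # 2 of 3 apple baskets contain banana
```

The first two rules clear the 0.7 threshold, while `apple -> banana` (2/3) does not. Note also that confidence is not symmetric: swapping antecedent and consequent changes the denominator.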

### Step 6: View the Results
Let's look at the generated rules. We will display the most important columns:
- **antecedents:** The item(s) on the left side of the rule (IF this...).
- **consequents:** The item(s) on the right side of the rule (...THEN this).
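- **support:** The fraction of transactions that contain both the antecedents and the consequents.
- **confidence:** The fraction of transactions containing the antecedents that also contain the consequents.
- **lift:** The confidence divided by the support of the consequents. A value above 1 means the consequent appears more often alongside the antecedent than its overall frequency would suggest; a value of 1 suggests independence.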

In [None]:
print("Association Rules:")
rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]
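
To connect the three columns to their definitions, here is a small hand calculation for two rules on our five transactions. This is only an illustrative sketch; `count` and `rule_metrics` are names we made up, and the values it prints can be compared against the table produced by mlxtend.

```python
# The five transactions from Step 2, written as sets of item names.
transactions = [
    {"apple", "banana", "tomato"},
    {"banana", "carrot", "tomato"},
    {"apple", "banana", "spinach"},
    {"apple", "carrot", "spinach"},
    {"banana", "carrot", "spinach", "tomato"},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in transactions)


def rule_metrics(antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    both = count(antecedent | consequent)
    support = both / n                           # share of baskets with all items
    confidence = both / count(antecedent)        # share of antecedent baskets
    lift = confidence / (count(consequent) / n)  # confidence vs. baseline frequency
    return support, confidence, lift


for antecedent, consequent in [
    ({"tomato"}, {"banana"}),
    ({"banana", "carrot"}, {"tomato"}),
]:
    s, c, l = rule_metrics(antecedent, consequent)
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

Both rules have a confidence of 1.0, but `{banana, carrot} -> tomato` has the higher lift because `tomato` is less common overall (3 of 5 baskets) than `banana` (4 of 5).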