# Association Rule Mining

This assignment introduces association rule mining, with a focus on market basket analysis, and provides hands-on experience applying it to retail transaction data.

## Dataset

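Apply association rule mining to the Online retail dataset (`Online retail.xlsx`), in which each row lists the items of one transaction as a comma-separated string.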

## Data Preprocessing

Preprocess the dataset so that it is suitable for association rule mining. This includes handling missing values, removing duplicates, and converting the data to an appropriate format.

### Loading Dataset

In [3]:
import pandas as pd

# Load the dataset; header=None because the first row is already a transaction, not a header
df = pd.read_excel('Online retail.xlsx', header=None)

df.head()

| | 0 |
|---|---|
| 0 | shrimp,almonds,avocado,vegetables mix,green gr... |
| 1 | burgers,meatballs,eggs |
| 2 | chutney |
| 3 | turkey,avocado |
| 4 | mineral water,milk,energy bar,whole wheat rice... |


### Handle Missing Values and Remove Duplicates

In [7]:
# Handle missing values by dropping rows with any missing values
df.dropna(inplace=True)

# Remove duplicate entries
df.drop_duplicates(inplace=True)

df.head()

| | 0 |
|---|---|
| 0 | shrimp,almonds,avocado,vegetables mix,green gr... |
| 1 | burgers,meatballs,eggs |
| 2 | chutney |
| 3 | turkey,avocado |
| 4 | mineral water,milk,energy bar,whole wheat rice... |


### Convert Data to Appropriate Format

In [8]:
# Convert the dataset to a list of lists
transactions = df.iloc[:, 0].apply(lambda x: x.split(',')).tolist()

# Display the first few transactions
transactions[:5]

[['shrimp',
  'almonds',
  'avocado',
  'vegetables mix',
  'green grapes',
  'whole weat flour',
  'yams',
  'cottage cheese',
  'energy drink',
  'tomato juice',
  'low fat yogurt',
  'green tea',
  'honey',
  'salad',
  'mineral water',
  'salmon',
  'antioxydant juice',
  'frozen smoothie',
  'spinach',
  'olive oil'],
 ['burgers', 'meatballs', 'eggs'],
 ['chutney'],
 ['turkey', 'avocado'],
 ['mineral water', 'milk', 'energy bar', 'whole wheat rice', 'green tea']]
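
The list-of-lists shape matters because the encoding step that follows expects one basket per inner list. As a rough illustration of what a one-hot encoding of such a list looks like, here is a minimal pure-Python sketch. The names `toy_transactions`, `items`, and `one_hot` are illustrative and not part of the notebook, and the baskets are copied from the first rows shown above.

```python
# Toy transactions in the same list-of-lists shape as `transactions`
# (illustrative subset, not the full dataset)
toy_transactions = [
    ["burgers", "meatballs", "eggs"],
    ["chutney"],
    ["turkey", "avocado"],
]

# Sorted vocabulary of every distinct item: one column per item
items = sorted({item for basket in toy_transactions for item in basket})

# One boolean row per transaction: True where the basket contains the item
one_hot = [[item in basket for item in items] for basket in toy_transactions]

print(items)
for row in one_hot:
    print(row)
```

Each row has exactly as many `True` values as the basket has distinct items, which is why wide, sparse matrices are typical for retail data.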

## Implement the Apriori Algorithm

Run the Apriori algorithm in Python using pandas and mlxtend. Apriori first finds the itemsets whose support meets a minimum threshold; association rules are then derived from those frequent itemsets to reveal products that tend to be purchased together.

In [11]:
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Convert the transactions into a one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)

# Apply the Apriori algorithm: keep itemsets found in at least 1% of transactions
frequent_itemsets = apriori(df_encoded, min_support=0.01, use_colnames=True)

# Generate the association rules, keeping only rules with lift >= 1.0
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1.0)

# Display the first few rules
rules.head()

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (almonds) | (mineral water) | 0.029366 | 0.299845 | 0.011012 | 0.375 | 1.250644 | 0.002207 | 1.120247 | 0.206476 |
| 1 | (mineral water) | (almonds) | 0.299845 | 0.029366 | 0.011012 | 0.036727 | 1.250644 | 0.002207 | 1.007641 | 0.28624 |
| 2 | (chocolate) | (avocado) | 0.205178 | 0.045981 | 0.01024 | 0.049906 | 1.085347 | 0.000805 | 1.004131 | 0.098935 |
| 3 | (avocado) | (chocolate) | 0.045981 | 0.205178 | 0.01024 | 0.222689 | 1.085347 | 0.000805 | 1.022528 | 0.082426 |
| 4 | (french fries) | (avocado) | 0.19262 | 0.045981 | 0.011592 | 0.060181 | 1.3088 | 0.002735 | 1.015108 | 0.292231 |
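
To make the frequent-itemset step less of a black box, the following is a minimal pure-Python sketch of the level-wise idea behind Apriori: count single items, keep those meeting the support threshold, then build larger candidates only from itemsets that survived the previous level. It is an illustration under simplifying assumptions, not mlxtend's implementation; the function name `apriori_sketch` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    singles = {frozenset([i]) for b in baskets for i in b}
    current = {s: support(s) for s in singles}
    current = {s: v for s, v in current.items() if v >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: v for c, v in current.items() if v >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy baskets, chosen so pairs are frequent but the triple is not
toy = [
    ["milk", "bread", "eggs"],
    ["milk", "bread"],
    ["milk", "eggs"],
    ["bread", "eggs"],
    ["milk", "bread", "eggs"],
]
result = apriori_sketch(toy, min_support=0.6)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

The pruning step is what keeps the search tractable: an itemset cannot be frequent unless all of its subsets are, so most candidates are discarded before their support is ever counted.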


## Set Thresholds and Analyze Rules

Set appropriate thresholds for support, confidence, and lift to extract meaningful rules. Analyze the generated rules to identify interesting patterns and relationships between the products.

In [12]:
# Set thresholds for support, confidence, and lift
min_support = 0.01
min_confidence = 0.2
min_lift = 1.2

# Filter the rules based on the thresholds
filtered_rules = rules[(rules['support'] >= min_support) &
                       (rules['confidence'] >= min_confidence) &
                       (rules['lift'] >= min_lift)]

# Display the filtered rules
filtered_rules

| | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (almonds) | (mineral water) | 0.029366 | 0.299845 | 0.011012 | 0.375000 | 1.250644 | 0.002207 | 1.120247 | 0.206476 |
| 5 | (avocado) | (french fries) | 0.045981 | 0.192620 | 0.011592 | 0.252101 | 1.308800 | 0.002735 | 1.079531 | 0.247313 |
| 7 | (avocado) | (milk) | 0.045981 | 0.170015 | 0.010819 | 0.235294 | 1.383957 | 0.003002 | 1.085364 | 0.290806 |
| 14 | (brownies) | (french fries) | 0.045015 | 0.192620 | 0.011206 | 0.248927 | 1.292323 | 0.002535 | 1.074969 | 0.236862 |
| 25 | (burgers) | (eggs) | 0.113794 | 0.208076 | 0.036128 | 0.317487 | 1.525826 | 0.012450 | 1.160307 | 0.388868 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 824 | (mineral water, shrimp) | (spaghetti) | 0.033617 | 0.229521 | 0.012365 | 0.367816 | 1.602539 | 0.004649 | 1.218758 | 0.389069 |
| 828 | (soup, spaghetti) | (mineral water) | 0.020672 | 0.299845 | 0.010819 | 0.523364 | 1.745448 | 0.004621 | 1.468952 | 0.436096 |
| 829 | (soup, mineral water) | (spaghetti) | 0.033423 | 0.229521 | 0.010819 | 0.323699 | 1.410327 | 0.003148 | 1.139255 | 0.301005 |
| 835 | (spaghetti, tomatoes) | (mineral water) | 0.029946 | 0.299845 | 0.013524 | 0.451613 | 1.506152 | 0.004545 | 1.276752 | 0.346431 |
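
A filtered table is easier to read when the rules are ranked. The sketch below applies the same three thresholds and then sorts by lift, using a few rows copied from the tables above as plain dictionaries so that it runs without the dataset. The name `rules_sample` is illustrative; in the notebook itself, `filtered_rules.sort_values('lift', ascending=False)` would give the equivalent ordering.

```python
# A few rules copied from the output tables (values as displayed there)
rules_sample = [
    {"antecedents": ("almonds",), "consequents": ("mineral water",),
     "support": 0.011012, "confidence": 0.375000, "lift": 1.250644},
    {"antecedents": ("chocolate",), "consequents": ("avocado",),
     "support": 0.010240, "confidence": 0.049906, "lift": 1.085347},
    {"antecedents": ("burgers",), "consequents": ("eggs",),
     "support": 0.036128, "confidence": 0.317487, "lift": 1.525826},
    {"antecedents": ("soup", "spaghetti"), "consequents": ("mineral water",),
     "support": 0.010819, "confidence": 0.523364, "lift": 1.745448},
]

# Same thresholds as in the notebook cell
min_support, min_confidence, min_lift = 0.01, 0.2, 1.2

# Keep only rules that pass all three thresholds
kept = [
    r for r in rules_sample
    if r["support"] >= min_support
    and r["confidence"] >= min_confidence
    and r["lift"] >= min_lift
]

# Rank the surviving rules from strongest to weakest association
ranked = sorted(kept, key=lambda r: r["lift"], reverse=True)

for r in ranked:
    lhs = ", ".join(r["antecedents"])
    rhs = ", ".join(r["consequents"])
    print(f"{lhs} -> {rhs}: confidence={r['confidence']:.2%}, lift={r['lift']:.2f}")
```

The chocolate -> avocado rule drops out because its confidence (about 5%) and lift (about 1.09) both fall below the thresholds.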


## Interpretation and Insights

The rules that pass all three thresholds point to the following patterns. Confidence and lift values are read from the filtered rules table.

### Key Insights

1. **Almonds and Mineral Water**: 37.50% of transactions containing almonds also contain mineral water. The lift of 1.25 means this happens about 1.25 times as often as expected if the two items were independent, a modest positive association.
2. **Avocado and French Fries**: 25.21% of transactions containing avocado also contain french fries. The lift of 1.31 indicates a positive association between these products.
3. **Burgers and Eggs**: 31.75% of transactions containing burgers also contain eggs. The lift of 1.53 indicates a comparatively strong positive association between these products.
4. **Soup, Spaghetti, and Mineral Water**: 52.34% of transactions containing both soup and spaghetti also contain mineral water. The lift of 1.75 is the highest among the rows displayed, indicating a strong positive association.

These insights can help retailers understand customer purchasing behavior and optimize product placement, promotions, and inventory management.
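
Confidence alone can flatter a rule whose consequent is popular anyway: mineral water appears in roughly 30% of all transactions, so a 37.50% confidence is only a modest increase over that baseline. Lift makes this explicit, because it equals confidence divided by the consequent's support. The sketch below re-derives the lift column from the other two columns for three rules in the filtered table; the helper name `lift_from_confidence` is illustrative.

```python
def lift_from_confidence(confidence, consequent_support):
    """Lift of A -> B, computed as Confidence(A -> B) / Support(B)."""
    return confidence / consequent_support


# (confidence, consequent support, reported lift) copied from the filtered table
checks = {
    "almonds -> mineral water": (0.375000, 0.299845, 1.250644),
    "burgers -> eggs": (0.317487, 0.208076, 1.525826),
    "soup, spaghetti -> mineral water": (0.523364, 0.299845, 1.745448),
}

for rule, (conf, cons_sup, reported) in checks.items():
    computed = lift_from_confidence(conf, cons_sup)
    # Small differences come from the table values being rounded to six decimals
    print(f"{rule}: computed lift={computed:.4f}, reported lift={reported:.4f}")
```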

## Interview Questions

1. **What is lift and why is it important in association rules?**
   - **Lift**: Lift measures how much more likely two items are to be bought together than would be expected if they were independent. It is calculated as the ratio of the observed support to the expected support under independence.
     - Formula: `Lift(A -> B) = (Support(A ∪ B)) / (Support(A) × Support(B))`
   - **Importance**: Confidence alone can be high simply because the consequent is popular; lift corrects for this by comparing against the consequent's baseline frequency. A lift greater than 1 indicates a positive association, a lift of 1 indicates independence, and a lift less than 1 indicates a negative association.

2. **What are support and confidence, and how do you calculate them?**
   - **Support**: Support is the proportion of transactions in the dataset that contain a particular itemset. It is calculated as the number of transactions containing the itemset divided by the total number of transactions.
     - Formula: `Support(A) = (Number of transactions containing A) / (Total number of transactions)`
   - **Confidence**: The confidence of a rule A -> B is the proportion of transactions containing A that also contain B. It is calculated as the number of transactions containing both A and B divided by the number of transactions containing A.
     - Formula: `Confidence(A -> B) = (Support(A ∪ B)) / (Support(A))`

3. **What are some limitations or challenges of association rule mining?**
   - **Scalability**: Association rule mining can be computationally expensive, especially with large datasets, as the number of possible itemsets grows exponentially.
   - **Interpretability**: The large number of generated rules can make it challenging to interpret and identify the most meaningful rules.
   - **Sparsity**: In datasets with many items, the data can be sparse, leading to fewer frequent itemsets and potentially missing interesting associations.
   - **Threshold Selection**: Choosing appropriate thresholds for support, confidence, and lift can be challenging and may require domain knowledge and experimentation.
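
The support and confidence formulas from question 2, together with lift from question 1, can be checked by hand on a very small example. The following is a minimal sketch on five made-up baskets; the data and the function names `support`, `confidence`, and `lift` are hypothetical and chosen only to mirror the formulas.

```python
# Five made-up baskets (hypothetical data, small enough to verify by hand)
toy = [
    {"burgers", "eggs"},
    {"burgers", "eggs", "milk"},
    {"eggs"},
    {"milk"},
    {"burgers", "milk"},
]


def support(itemset, transactions):
    """Support(A): fraction of transactions containing every item in A."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(a, b, transactions):
    """Confidence(A -> B) = Support(A ∪ B) / Support(A)."""
    return support(set(a) | set(b), transactions) / support(a, transactions)


def lift(a, b, transactions):
    """Lift(A -> B) = Confidence(A -> B) / Support(B)."""
    return confidence(a, b, transactions) / support(b, transactions)


a, b = {"burgers"}, {"eggs"}
print("support(burgers)         =", support(a, toy))       # 3 of 5 baskets
print("support(burgers ∪ eggs)  =", support(a | b, toy))   # 2 of 5 baskets
print("confidence(burgers->eggs)=", round(confidence(a, b, toy), 4))
print("lift(burgers->eggs)      =", round(lift(a, b, toy), 4))
```

Here burgers appear in 3 of 5 baskets and burgers with eggs in 2 of 5, so the confidence is 2/3; dividing by the support of eggs (3/5) gives a lift slightly above 1.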

####  **Author Information:**
- **Author:-**  Er.Pradeep Kumar
- **LinkedIn:-**  [https://www.linkedin.com/in/pradeep-kumar-1722b6123/](https://www.linkedin.com/in/pradeep-kumar-1722b6123/)

#### **Disclaimer:**
This Jupyter Notebook and its contents are shared for educational purposes. The author, Pradeep Kumar, retains ownership and rights to the original content. Any modifications or adaptations should be made with proper attribution and permission from the author.