# ASSOCIATION RULES

### The objective of this assignment is to introduce students to rule mining techniques, with a focus on market basket analysis, and to provide hands-on experience.

### Dataset:

#### Use the Online retail dataset to apply the association rules.

In [1]:
import pandas as pd

# Load the dataset
file_path = 'Online retail.xlsx'
df = pd.read_excel(file_path, sheet_name='Sheet1')

# Split the transactions into lists
df['Transaction'] = df.iloc[:, 0].apply(lambda x: x.split(','))

# Create a basket-like format by one-hot encoding the transactions
all_items = sorted(set(item for sublist in df['Transaction'] for item in sublist))
basket = pd.DataFrame(0, index=range(len(df)), columns=all_items)

for i, transaction in enumerate(df['Transaction']):
    for item in transaction:
        basket.at[i, item] = 1

# Display the first few rows of the basket format
basket.head()

Unnamed: 0,asparagus,almonds,antioxydant juice,asparagus.1,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,...,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,1,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
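
The `asparagus.1` column in the output above hints that the same item appears under two spellings, most likely with and without a leading space left over from the comma split. The following is a minimal sketch of a more defensive encoding, assuming that stray whitespace is the cause. The three sample transactions are made up for illustration and are not rows from the dataset. The sketch strips each item and produces a boolean basket, since recent mlxtend versions warn about non-boolean input to `apriori`.

```python
import pandas as pd

# Hypothetical sample rows; note the stray spaces around some items
raw = pd.Series(["burgers, eggs", "eggs,asparagus", " asparagus,turkey"])

# Split on commas and strip whitespace so " asparagus" and "asparagus" match
transactions = raw.apply(
    lambda s: [item.strip() for item in s.split(",") if item.strip()]
)

# One row per (transaction, item), then one-hot encode and collapse back
# to one row per transaction
basket = (
    pd.get_dummies(transactions.explode())
    .groupby(level=0)
    .max()
    .astype(bool)
)

print(basket)
```

It is also worth checking whether the sheet has a header row: `pd.read_excel` treats the first row as column names by default, so a sheet without a header would lose its first transaction unless `header=None` is passed.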


In [2]:
!pip install mlxtend



### Association Rule Mining:

#### • Implement the Apriori algorithm using a tool such as Python with libraries such as pandas and mlxtend.

#### • Apply association rule mining to the pre-processed dataset to discover interesting relationships between products purchased together.

#### • Set appropriate thresholds for support, confidence, and lift to extract meaningful rules.

In [3]:
from mlxtend.frequent_patterns import apriori, association_rules

# Keep itemsets of up to 3 items that appear in at least 0.5% of transactions
frequent_itemsets = apriori(basket, min_support=0.005, max_len=3, use_colnames=True)
frequent_itemsets



Unnamed: 0,support,itemsets
0,0.020267,(almonds)
1,0.008800,(antioxydant juice)
2,0.033200,(avocado)
3,0.008667,(bacon)
4,0.010800,(barbecue sauce)
...,...,...
716,0.007467,"(mineral water, spaghetti, soup)"
717,0.009333,"(mineral water, tomatoes, spaghetti)"
718,0.006400,"(mineral water, spaghetti, turkey)"
719,0.006267,"(mineral water, spaghetti, whole wheat rice)"
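
To make the `support` column concrete, here is a small sketch that computes support by hand with pandas. The five-transaction basket is hypothetical and only serves to show the arithmetic; it does not reproduce the numbers in the table above.

```python
import pandas as pd

# Hypothetical 5-transaction basket (True = item present)
basket = pd.DataFrame(
    {
        "mineral water": [True, True, False, True, False],
        "spaghetti": [True, False, False, True, True],
        "eggs": [False, True, True, False, False],
    }
)

# Support of a single item = share of transactions containing it
item_support = basket.mean()

# Support of an itemset = share of transactions containing every item in it
pair = ["mineral water", "spaghetti"]
pair_support = basket[pair].all(axis=1).mean()

print(item_support)
print(pair_support)
```

An itemset is "frequent" when this share reaches `min_support`, which is 0.005 in the cell above.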


In [4]:
# Sort itemsets from most to least frequent
frequent_itemsets.sort_values('support', ascending=False, inplace=True)
frequent_itemsets

      support                                     itemsets
60   0.238267                              (mineral water)
27   0.179733                                       (eggs)
83   0.174133                                  (spaghetti)
33   0.170933                               (french fries)
20   0.163867                                  (chocolate)
..        ...                                          ...
642  0.005067              (mineral water, tomatoes, eggs)
644  0.005067                 (eggs, spaghetti, olive oil)
670  0.005067     (mineral water, frozen vegetables, soup)
676  0.005067  (mineral water, grated cheese, ground beef)
720  0.005067             (spaghetti, pancakes, olive oil)

[721 rows x 2 columns]

In [5]:
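# Keep rules with lift >= 1, i.e. items that co-occur at least as often as expected under independence
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
# Show the 10 rules with the highest lift
rules.sort_values('lift', ascending=False).head(10)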

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
1266,(escalope),(pasta),0.079333,0.015733,0.005867,0.07395,4.700185,0.004618,1.062865,0.855079
1267,(pasta),(escalope),0.015733,0.079333,0.005867,0.372881,4.700185,0.004618,1.46809,0.799826
1785,(pasta),(shrimp),0.015733,0.071333,0.005067,0.322034,4.514494,0.003944,1.369783,0.790935
1784,(shrimp),(pasta),0.071333,0.015733,0.005067,0.071028,4.514494,0.003944,1.059522,0.838289
662,(whole wheat pasta),(olive oil),0.029467,0.065733,0.008,0.271493,4.130221,0.006063,1.282441,0.780893
663,(olive oil),(whole wheat pasta),0.065733,0.029467,0.008,0.121704,4.130221,0.006063,1.105018,0.811205
1118,"(spaghetti, herb & pepper)",(ground beef),0.016267,0.098267,0.0064,0.393443,4.003826,0.004802,1.486641,0.762645
1119,(ground beef),"(spaghetti, herb & pepper)",0.098267,0.016267,0.0064,0.065129,4.003826,0.004802,1.052266,0.831996
952,"(mineral water, herb & pepper)",(ground beef),0.017067,0.098267,0.006667,0.390625,3.975153,0.00499,1.479768,0.761432
957,(ground beef),"(mineral water, herb & pepper)",0.098267,0.017067,0.006667,0.067843,3.975153,0.00499,1.054471,0.829999
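
The metric columns can be checked by hand from the three support values. The sketch below does this for the pasta and escalope pair, using the rounded supports from the table above and the standard definitions of confidence, lift, leverage, and conviction. The variable names are illustrative, and the results agree with the table only up to rounding.

```python
# Supports copied (rounded) from the rules table above
support_pasta = 0.015733
support_escalope = 0.079333
support_both = 0.005867

# Confidence depends on direction: P(consequent | antecedent)
conf_pasta_to_escalope = support_both / support_pasta
conf_escalope_to_pasta = support_both / support_escalope

# Lift is symmetric: joint support relative to what independence predicts
lift = support_both / (support_pasta * support_escalope)

# Leverage: joint support minus the support expected under independence
leverage = support_both - support_pasta * support_escalope

# Conviction for {pasta} -> {escalope}
conviction = (1 - support_escalope) / (1 - conf_pasta_to_escalope)

print(round(conf_pasta_to_escalope, 3), round(conf_escalope_to_pasta, 3))
print(round(lift, 2), round(leverage, 4), round(conviction, 2))
```

Both directions of a pair share the same lift and leverage, while confidence and conviction differ, which is why each pair appears as two rows in the table.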


### Analysis and Interpretation:

#### •	Analyse the generated rules to identify interesting patterns and relationships between the products.

#### •	Interpret the results and provide insights into customer purchasing behaviour based on the discovered rules.

### 1. High Lift Values:

#### The rules with the highest lift, such as {pasta} -> {escalope} (4.70) and {shrimp} -> {pasta} (4.51), indicate a strong association between these product pairs. A lift of 1 means the two sides occur together exactly as often as expected if they were independent; a lift of 4.7 means that baskets containing pasta include escalope about 4.7 times as often as baskets overall. Lift is symmetric, so each pair appears twice in the table (once per direction) with the same lift but a different confidence.

### 2. Confidence and Support:

#### Low Confidence: Despite its high lift, {pasta} -> {escalope} has a confidence of only 0.37. Escalope appears in about 37% of baskets containing pasta, so most pasta purchases still occur without it. The reverse rule, {escalope} -> {pasta}, has a confidence of just 0.07 because pasta itself is rare (support 0.016).
#### Low Support: The support values of these rules are low (about 0.005 to 0.008), close to the `min_support` threshold of 0.005. These combinations occur in well under 1% of transactions, so the associations are strong but rest on a small number of baskets and should be interpreted with care.
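
One way to act on these observations is to filter on confidence and lift together, so that only the stronger direction of each pair survives. The sketch below transcribes the ten highest-lift rules (rounded) into a plain DataFrame with string labels and applies two thresholds. The cut-offs of 0.3 for confidence and 3 for lift are illustrative assumptions, not values prescribed by the assignment.

```python
import pandas as pd

# The ten highest-lift rules, transcribed (rounded) from the results table
rows = [
    ("escalope", "pasta", 0.005867, 0.073950, 4.700185),
    ("pasta", "escalope", 0.005867, 0.372881, 4.700185),
    ("pasta", "shrimp", 0.005067, 0.322034, 4.514494),
    ("shrimp", "pasta", 0.005067, 0.071028, 4.514494),
    ("whole wheat pasta", "olive oil", 0.008000, 0.271493, 4.130221),
    ("olive oil", "whole wheat pasta", 0.008000, 0.121704, 4.130221),
    ("spaghetti, herb & pepper", "ground beef", 0.006400, 0.393443, 4.003826),
    ("ground beef", "spaghetti, herb & pepper", 0.006400, 0.065129, 4.003826),
    ("mineral water, herb & pepper", "ground beef", 0.006667, 0.390625, 3.975153),
    ("ground beef", "mineral water, herb & pepper", 0.006667, 0.067843, 3.975153),
]
rules = pd.DataFrame(
    rows, columns=["antecedents", "consequents", "support", "confidence", "lift"]
)

# Illustrative thresholds (assumptions): keep rules that are both
# reasonably reliable (confidence) and clearly above chance (lift)
MIN_CONFIDENCE = 0.3
MIN_LIFT = 3.0

strong = rules[(rules["confidence"] >= MIN_CONFIDENCE) & (rules["lift"] >= MIN_LIFT)]
print(strong.sort_values("confidence", ascending=False))
```

The same boolean mask can be applied directly to the `rules` DataFrame returned by `association_rules`, where the antecedents and consequents are frozensets rather than strings.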

### 3. Specific Product Relationships:

#### Ground Beef: Ground beef is strongly associated with both {spaghetti, herb & pepper} and {mineral water, herb & pepper} (lift of about 4.0). The direction matters: about 39% of baskets containing spaghetti and herb & pepper also contain ground beef, whereas only about 7% of ground beef baskets contain both spaghetti and herb & pepper. The grouping suggests a possible recipe combination.
#### Pasta and Escalope: The rule {pasta} -> {escalope} suggests that customers buying pasta may be preparing meals that include escalope, which would explain the high lift despite the modest confidence.

### 4. Multiple Product Combinations:

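#### Herb & Pepper, Mineral Water, Ground Beef: The rule {ground beef} -> {mineral water, herb & pepper} has a lift of 3.98, which suggests that these products are bought together more often than chance would predict, perhaps as ingredients for a particular type of dish. Its confidence is low (0.07), while the reverse rule {mineral water, herb & pepper} -> {ground beef} reaches 0.39. Because mineral water is the most frequent item overall (support 0.24), its presence in the itemset may reflect general popularity rather than a recipe link.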

## Interpretation and Insights:

### 1. Meal Preparation Patterns:

#### The rules suggest that certain meal preparation patterns are common among customers. For example, baskets containing herb & pepper together with spaghetti or mineral water include ground beef far more often than average. This indicates that customers might be buying ingredients for specific recipes, such as pasta dishes.

### 2. Cross-Selling Opportunities:

#### Products like pasta and escalope or ground beef and herb & pepper can be marketed together. Retailers could create combo offers or place these products near each other in stores to encourage joint purchases.

### 3. Targeted Promotions:

#### Given the high lift values, promotions that offer customers who buy one product a discount on, or a recommendation for, the associated product could drive additional sales. For instance, offering a discount on herb & pepper when ground beef is purchased might be an effective strategy.

### 4. Niche Market Segments:

#### The combination of low support, high lift, and at best moderate confidence (up to 0.39) suggests that these product combinations are not universally popular but are significant within certain customer segments. Identifying these niche segments and targeting them with tailored promotions could be profitable.