# ASSOCIATION RULES

The objective of this assignment is to introduce students to rule mining techniques, with a focus on market basket analysis, and to provide hands-on experience.
### Dataset:
Use the Online retail dataset to mine association rules.

### Data Preprocessing:
Pre-process the dataset so that it is suitable for association rule mining. This may include handling missing values, removing duplicates, and converting the data to an appropriate format.

### Association Rule Mining:
•	Implement the Apriori algorithm using a tool such as Python, with libraries such as pandas and mlxtend.
•	Apply association rule mining techniques to the pre-processed dataset to discover interesting relationships between products purchased together.
•	Set appropriate thresholds for support, confidence, and lift to extract meaningful rules.

### Analysis and Interpretation:
•	Analyse the generated rules to identify interesting patterns and relationships between the products.
•	Interpret the results and provide insights into customer purchasing behaviour based on the discovered rules.



In [2]:
pip install pandas mlxtend openpyxl


Note: you may need to restart the kernel to use updated packages.


In [3]:
# Import the basic libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

### Data Loading: Loads the Excel file using pandas.

In [4]:
# Load the dataset
file_path = 'Online retail.xlsx'  # Update the file path if needed
data = pd.read_excel(file_path)

### Data Preprocessing:

In [5]:
data.head()

Unnamed: 0,"shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil"
0,"burgers,meatballs,eggs"
1,chutney
2,"turkey,avocado"
3,"mineral water,milk,energy bar,whole wheat rice..."
4,low fat yogurt


In [6]:
data.tail()

Unnamed: 0,"shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil"
7495,"butter,light mayo,fresh bread"
7496,"burgers,frozen vegetables,eggs,french fries,ma..."
7497,chicken
7498,"escalope,green tea"
7499,"eggs,frozen smoothie,yogurt cake,low fat yogurt"


In [7]:
data.shape

(7500, 1)
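The column header shown by `data.head()` is itself a basket of products, which suggests that the file has no header row and that pandas used the first transaction as the column name. The sketch below illustrates the effect, assuming that is the case. It uses a small in-memory CSV as a hypothetical stand-in for the Excel file; `pd.read_excel` accepts the same `header=None` and `names` arguments. The column name `items` is an illustrative choice, not part of the original notebook.

```python
import io
import pandas as pd

# Hypothetical three-basket file with no header row (stand-in for the Excel sheet)
raw = '"shrimp,almonds"\n"burgers,eggs"\nchutney\n'

# Default behaviour: the first basket is consumed as the column name
with_header = pd.read_csv(io.StringIO(raw))

# header=None keeps every row as data; `names` labels the single column
no_header = pd.read_csv(io.StringIO(raw), header=None, names=["items"])

print("rows with default header handling:", len(with_header))
print("rows with header=None:", len(no_header))
```

Loading the real file this way would keep one more transaction than the 7500 shown above, so the outputs in the rest of the notebook would shift slightly.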

In [8]:
data.describe()

Unnamed: 0,"shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil"
count,7500
unique,5175
top,cookies
freq,223


In [9]:
data.sample()

Unnamed: 0,"shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil"
1824,energy bar


### Splits product transactions into lists.
### Removes rows with missing values and duplicates.

In [16]:
# Drop any rows with missing values
data.dropna(inplace=True)

In [17]:
# Remove duplicate transactions
data = data.drop_duplicates()

### Transaction Encoding: Converts the dataset into a one-hot encoded format suitable for Apriori.

In [19]:
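# Split each comma-separated transaction string into a list of items
transactions = data.iloc[:, 0].str.split(',').tolist()

# Apply TransactionEncoder to one-hot encode the list of transactions
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)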

### Apriori Algorithm: Finds frequent itemsets with a minimum support of 0.01.

In [20]:
# Apply the Apriori algorithm with a minimum support threshold
frequent_itemsets = apriori(df, min_support=0.01, use_colnames=True)

### Association Rules: Extracts association rules with a confidence threshold of 0.2.

In [21]:
#  Generate the association rules with a minimum confidence threshold
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.2)

In [22]:
# Display the generated rules
print(rules)

                   antecedents      consequents  antecedent support  \
0                    (almonds)  (mineral water)            0.029179   
1                    (avocado)      (chocolate)            0.045797   
2                    (avocado)   (french fries)            0.045797   
3                    (avocado)           (milk)            0.045797   
4                    (avocado)  (mineral water)            0.045797   
..                         ...              ...                 ...   
350        (shrimp, spaghetti)  (mineral water)            0.030338   
351      (mineral water, soup)      (spaghetti)            0.033430   
352          (spaghetti, soup)  (mineral water)            0.020676   
353  (tomatoes, mineral water)      (spaghetti)            0.034589   
354      (tomatoes, spaghetti)  (mineral water)            0.029952   

     consequent support   support  confidence      lift  leverage  conviction  \
0              0.299710  0.010821    0.370861  1.237399  0.002076 
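The metric columns can be checked by hand. The sketch below recomputes confidence, lift, and leverage for the first rule, (almonds) -> (mineral water), from the three support values printed above, using the standard definitions: confidence is the rule support divided by the antecedent support, lift is confidence divided by the consequent support, and leverage is the rule support minus the product of the two individual supports. The variable names are illustrative.

```python
# Values copied from rule 0 in the printed output: (almonds) -> (mineral water)
antecedent_support = 0.029179  # share of baskets containing almonds
consequent_support = 0.299710  # share of baskets containing mineral water
rule_support = 0.010821        # share of baskets containing both

# Standard definitions of the rule metrics
confidence = rule_support / antecedent_support
lift = confidence / consequent_support
leverage = rule_support - antecedent_support * consequent_support

print(f"confidence = {confidence:.4f}")
print(f"lift       = {lift:.4f}")
print(f"leverage   = {leverage:.6f}")
```

Up to rounding of the printed inputs, these agree with the reported values (0.370861, 1.237399, and 0.002076). A lift above 1 means that baskets with almonds contain mineral water more often than baskets in general.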

Association rule mining is a data mining technique used to discover relationships, correlations, or patterns between items in large datasets, typically transactional databases such as market baskets or e-commerce orders.

The Apriori algorithm is one of the foundational algorithms used in association rule mining. It generates frequent itemsets iteratively, level by level, starting from single items and extending them one item at a time. It relies on two ideas:

Pruning Strategy: Apriori is based on the principle that all subsets of a frequent itemset must also be frequent. It therefore only counts candidate itemsets whose subsets have all already been identified as frequent, which significantly reduces the search space.
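The pruning idea can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code mlxtend uses internally; the function name `generate_candidates` and the example itemsets are hypothetical.

```python
from itertools import combinations


def generate_candidates(frequent_k, k):
    """Join frequent k-itemsets into (k+1)-item candidates and keep only
    those whose every k-item subset is itself frequent."""
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != k + 1:
                continue
            # Prune: a single infrequent subset rules the candidate out
            if all(frozenset(sub) in frequent_k for sub in combinations(union, k)):
                candidates.add(union)
    return candidates


# Hypothetical frequent pairs; note that {bread, tea} and {milk, tea} are absent
frequent_2 = {
    frozenset({"bread", "eggs"}),
    frozenset({"bread", "milk"}),
    frozenset({"eggs", "milk"}),
    frozenset({"eggs", "tea"}),
}
candidates_3 = generate_candidates(frequent_2, 2)
print(candidates_3)
```

Only `{bread, eggs, milk}` survives. The triples containing `tea` are discarded without ever being counted against the data, because at least one of their pairs is not frequent.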

Support Threshold: Apriori discards itemsets that do not meet the minimum support threshold, so only itemsets that appear in at least that fraction of transactions (0.01 in this notebook) are kept.
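A minimal sketch of the support filter, assuming a hypothetical list of four baskets and a threshold of 0.5 chosen only to keep the example small:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


# Hypothetical baskets
toy_transactions = [
    ["milk", "eggs"],
    ["eggs", "bread", "milk"],
    ["bread", "tea"],
    ["eggs", "bread"],
]
min_support = 0.5

# Support of every single item, then keep only those at or above the threshold
items = sorted({item for t in toy_transactions for item in t})
supports = {item: support([item], toy_transactions) for item in items}
frequent_1 = {item: s for item, s in supports.items() if s >= min_support}

print(supports)
print(frequent_1)
```

`tea` appears in one basket out of four, so it falls below the threshold and is removed before any larger itemsets are built from it.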