# **Association Rules**

The objective of this assignment is to introduce students to rule mining techniques, with a focus on market basket analysis, and to provide hands-on experience.

**Data Preprocessing:**

Pre-process the dataset so that it is suitable for association rule mining. This may include handling missing values, removing duplicates, and converting the data to an appropriate format.

**Association Rule Mining:**

Implement the Apriori algorithm using a tool such as Python, with libraries such as Pandas and Mlxtend.

Apply association rule mining techniques to the pre-processed dataset to discover interesting relationships between products purchased together.

Set appropriate thresholds for support, confidence and lift to extract meaningful rules.
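For reference, the three measures can be computed directly from their usual definitions. The sketch below uses a hypothetical five-basket toy dataset, and the helper names (`support`, `rule_metrics`) are illustrative only, not part of any library:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs -> rhs.

    Assumes lhs and rhs each occur in at least one transaction,
    so neither division is by zero.
    """
    s_both = support(transactions, set(lhs) | set(rhs))
    s_lhs = support(transactions, lhs)
    s_rhs = support(transactions, rhs)
    confidence = s_both / s_lhs   # P(rhs | lhs)
    lift = confidence / s_rhs     # > 1 means more often than under independence
    return s_both, confidence, lift


# Hypothetical toy baskets, not taken from the assignment dataset
toy = [
    {"pasta", "escalope"},
    {"pasta", "escalope", "milk"},
    {"pasta", "milk"},
    {"milk"},
    {"eggs"},
]

s, c, l = rule_metrics(toy, {"pasta"}, {"escalope"})
print(f"support={s:.3f} confidence={c:.3f} lift={l:.3f}")
```

A lift above 1 means the two sides co-occur more often than independence would predict, which is why a minimum lift is often used alongside support and confidence.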

**Analysis and Interpretation:**

Analyse the generated rules to identify interesting patterns and relationships between the products.

Interpret the results and provide insights into customer purchasing behaviour based on the discovered rules.

**Task 1: Data Preprocessing:**

Pre-process the dataset so that it is suitable for association rule mining. This may include handling missing values, removing duplicates, and converting the data to an appropriate format.

In [2]:
!pip install apyori



In [16]:
# Step 1: Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from apyori import apriori
# from mlxtend.preprocessing import TransactionEncoder
# from mlxtend.frequent_patterns import apriori, association_rules

# Step 2: Load the dataset
df = pd.read_csv('/content/Online retail.csv', header=None)
df




Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,butter,light mayo,fresh bread,,,,,,,,,,,,,,,,,
7497,burgers,frozen vegetables,eggs,french fries,magazines,green tea,,,,,,,,,,,,,,
7498,chicken,,,,,,,,,,,,,,,,,,,
7499,escalope,green tea,,,,,,,,,,,,,,,,,,


In [17]:
num_records = len(df)
print(num_records)

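# Build one list of items per row (the file has 20 item columns).
# Note: empty cells are read as NaN, and str() turns them into the literal item 'nan'.
transactions = []
for i in range(0, num_records):
  transactions.append([str(df.values[i,j]) for j in range(0, 20)])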



7501
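Because most rows have fewer than 20 items, the padding cells are missing values. One way to handle them during pre-processing is to drop them while building each transaction. The following is a minimal sketch, assuming missing cells arrive as `NaN`, `None` or empty strings; `row_to_transaction` is a hypothetical helper, and the two sample rows are copied from the table above with `NaN` padding added by hand:

```python
import math


def row_to_transaction(row):
    """Keep only real item names from one row, dropping NaN/None/empty cells."""
    items = []
    for cell in row:
        if cell is None:
            continue
        if isinstance(cell, float) and math.isnan(cell):
            continue
        text = str(cell).strip()
        if text:
            items.append(text)
    return items


# Two sample rows, padded with NaN the way pandas pads short CSV rows
sample_rows = [
    ["burgers", "meatballs", "eggs", float("nan")],
    ["chutney", float("nan"), float("nan"), float("nan")],
]

clean_transactions = [row_to_transaction(r) for r in sample_rows]
print(clean_transactions)
```

On the notebook's data the same helper could be applied as `[row_to_transaction(r) for r in df.values]`. Rules mined from such a list would differ from the outputs shown in this notebook, which were produced from the unfiltered transactions.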


In [18]:
df.head()



Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
0,shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
1,burgers,meatballs,eggs,,,,,,,,,,,,,,,,,
2,chutney,,,,,,,,,,,,,,,,,,,
3,turkey,avocado,,,,,,,,,,,,,,,,,,
4,mineral water,milk,energy bar,whole wheat rice,green tea,,,,,,,,,,,,,,,


**Task 2: Association Rule Mining:**

Implement the Apriori algorithm using a tool such as Python, with libraries such as Pandas and Mlxtend.

Apply association rule mining techniques to the pre-processed dataset to discover interesting relationships between products purchased together.
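To make the idea behind Apriori concrete, here is a minimal level-wise sketch of the frequent-itemset search in plain Python. It is an illustration under simplifying assumptions (small in-memory data, no performance tuning), not the implementation used by apyori or Mlxtend; `frequent_itemsets` and the toy baskets are hypothetical:

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search. Returns {frozenset(items): support}."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    all_items = {item for t in transactions for item in t}
    current = {}
    for item in all_items:
        single = frozenset([item])
        s = supp(single)
        if s >= min_support:
            current[single] = s

    result = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep candidates that meet the support threshold
        current = {}
        for c in candidates:
            s = supp(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


# Hypothetical toy baskets
baskets = [
    ["milk", "bread"],
    ["milk", "bread", "eggs"],
    ["milk", "eggs"],
    ["bread"],
]
found = frequent_itemsets(baskets, min_support=0.5)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what distinguishes Apriori from brute-force enumeration: an itemset can only be frequent if all of its subsets are frequent, so most candidates are discarded before their support is ever counted.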

In [60]:
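# Mine rules with apyori: support >= 0.3%, confidence >= 20%, lift >= 3.
# Note: apyori documents min_support, min_confidence, min_lift and max_length;
# min_length is not among them and appears to have no effect.
association_rules = apriori(transactions, min_support=0.003, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)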



In [20]:
association_results



[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'fromage blanc', 'honey'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

Set appropriate thresholds for support, confidence and lift to extract meaningful rules.

In [58]:
def inspect(association_results):
  # Note: only the first item of each side and the first ordered statistic
  # of each record are kept, so multi-item rules are shortened.
  lhs = [tuple(result[2][0][0])[0] for result in association_results]
  rhs = [tuple(result[2][0][1])[0] for result in association_results]
  supports = [result[1] for result in association_results]
  confidences = [result[2][0][2] for result in association_results]
  lifts = [result[2][0][3] for result in association_results]
  return list(zip(lhs, rhs, supports, confidences, lifts))

# Put the results into a well-organised pandas DataFrame
resultsinDataFrame = pd.DataFrame(inspect(association_results), columns=['Left hand side','Right hand side','Support','Confidence','Lift'])
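Since `inspect` keeps only the first item of each side, a variant that keeps every item and every ordered statistic may be easier to interpret. The sketch below is one possible shape for such a helper. The namedtuples are stand-ins that mirror the field names visible in the printed `RelationRecord` output above, so the example runs without apyori installed; `flatten_rules` and the demo record are hypothetical:

```python
from collections import namedtuple

# Stand-ins mirroring the field names seen in the printed apyori output
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)


def flatten_rules(records):
    """Return one dict per rule, keeping all items on both sides."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append({
                "Left hand side": ", ".join(sorted(stat.items_base)),
                "Right hand side": ", ".join(sorted(stat.items_add)),
                "Support": record.support,
                "Confidence": stat.confidence,
                "Lift": stat.lift,
            })
    return rows


# Hypothetical record with a two-item antecedent and two ordered statistics
demo = [
    RelationRecord(
        items=frozenset({"a", "b", "c"}),
        support=0.1,
        ordered_statistics=[
            OrderedStatistic(frozenset({"a", "b"}), frozenset({"c"}), 0.5, 2.0),
            OrderedStatistic(frozenset({"c"}), frozenset({"a", "b"}), 0.4, 2.0),
        ],
    )
]

rows = flatten_rules(demo)
for row in rows:
    print(row)
```

With the notebook's variables, `pd.DataFrame(flatten_rules(association_results))` should yield a table with the same five columns, assuming the records expose these attribute names.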



In [55]:
print(resultsinDataFrame)

           Left hand side Right hand side   Support  Confidence      Lift
0             light cream         chicken  0.004533    0.290598  4.843951
1    mushroom cream sauce        escalope  0.005733    0.300699  3.790833
2                   pasta        escalope  0.005866    0.372881  4.700812
3           fromage blanc           honey  0.003333    0.245098  5.164271
4           herb & pepper     ground beef  0.015998    0.323450  3.291994
..                    ...             ...       ...         ...       ...
155             olive oil   mineral water  0.003066    0.216981  3.632981
156              pancakes   mineral water  0.003066    0.211009  3.532991
157              tomatoes   mineral water  0.003066    0.261364  4.376091
158         mineral water       olive oil  0.003333    0.211864  3.223519
159              tomatoes   mineral water  0.003333    0.238095  3.986501

[160 rows x 5 columns]




In [57]:
# Display the 10 rules with the highest lift
resultsinDataFrame.nlargest(n=10, columns='Lift')



Unnamed: 0,Left hand side,Right hand side,Support,Confidence,Lift
97,frozen vegetables,mineral water,0.003066,0.383333,7.987176
150,frozen vegetables,mineral water,0.003066,0.383333,7.987176
96,frozen vegetables,mineral water,0.003333,0.294118,6.128268
149,frozen vegetables,mineral water,0.003333,0.294118,6.128268
132,mineral water,olive oil,0.003866,0.402778,6.128268
59,mineral water,olive oil,0.003866,0.402778,6.115863
50,tomato sauce,spaghetti,0.003066,0.216981,5.535971
122,tomato sauce,ground beef,0.003066,0.216981,5.535971
28,fromage blanc,honey,0.003333,0.245098,5.178818
3,fromage blanc,honey,0.003333,0.245098,5.164271


In [59]:
resultsinDataFrame.nlargest(n=10, columns='Confidence')



Unnamed: 0,Left hand side,Right hand side,Support,Confidence,Lift
14,cereals,spaghetti,0.003066,0.676471,3.885303
70,cereals,spaghetti,0.003066,0.676471,3.885303
63,olive oil,spaghetti,0.004399,0.611111,3.509912
134,olive oil,spaghetti,0.004399,0.611111,3.509912
22,cooking oil,spaghetti,0.004799,0.571429,3.281995
85,cooking oil,spaghetti,0.004799,0.571429,3.281995
76,frozen vegetables,spaghetti,0.003066,0.534884,3.0721
93,frozen vegetables,spaghetti,0.003066,0.534884,3.0721
138,frozen vegetables,spaghetti,0.003066,0.534884,3.0721
147,frozen vegetables,spaghetti,0.003066,0.534884,3.0721


**Task 3: Analysis and Interpretation**

Analyse the generated rules to identify interesting patterns and relationships between the products.

Interpret the results and provide insights into customer purchasing behaviour based on the discovered rules.

**Insights:**

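1) Rules with high confidence: cereals → spaghetti has the highest confidence in the table (about 0.68, with a lift of about 3.89), which suggests that a customer who buys cereals is very likely to also buy spaghetti.

2) Rules with high lift: frozen vegetables → mineral water has the highest lift in the table (about 7.99), which suggests that these items are bought together far more often than would be expected if the purchases were independent.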
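As a quick sanity check on insight 1, lift is confidence divided by the overall support of the right-hand side, so the two reported values imply how common the right-hand side is across all baskets. A small sketch using the rounded figures from the cereals/spaghetti row of the Confidence table above:

```python
# Values copied from the cereals -> spaghetti row of the Confidence table
confidence = 0.676471
lift = 3.885303

# lift = confidence / support(RHS)  =>  support(RHS) = confidence / lift
baseline_rhs_support = confidence / lift

print(f"Implied share of all baskets containing the right-hand side: {baseline_rhs_support:.4f}")
print(f"Share among baskets matching the left-hand side: {confidence:.4f}")
```

Comparing the two shares shows what the lift value expresses: the right-hand side is several times more common among baskets matching the left-hand side than among baskets overall.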