# ASSOCIATION RULES
The objective of this assignment is to introduce students to association rule mining techniques, with a particular focus on market basket analysis, and to give them hands-on experience.

In [1]:
#Import Libraries
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import association_rules,apriori
import warnings
warnings.filterwarnings('ignore')

# 1. Dataset
Use the Online Retail dataset to mine association rules.


In [2]:
df = pd.read_excel('Online retail.xlsx', header=None)
df.head()

|   | 0 |
|---|---|
| 0 | shrimp,almonds,avocado,vegetables mix,green gr... |
| 1 | burgers,meatballs,eggs |
| 2 | chutney |
| 3 | turkey,avocado |
| 4 | mineral water,milk,energy bar,whole wheat rice... |


In [4]:
df.describe()

|   | 0 |
|---|---|
| count | 7501 |
| unique | 5176 |
| top | cookies |
| freq | 223 |


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7501 entries, 0 to 7500
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   0       7501 non-null   object
dtypes: object(1)
memory usage: 58.7+ KB


In [8]:
df.columns = ['Products']

In [7]:
df.head()

|   | Products |
|---|---|
| 0 | shrimp,almonds,avocado,vegetables mix,green gr... |
| 1 | burgers,meatballs,eggs |
| 2 | chutney |
| 3 | turkey,avocado |
| 4 | mineral water,milk,energy bar,whole wheat rice... |


In [9]:
df.index.name = 'ID'
df

| ID | Products |
|---|---|
| 0 | shrimp,almonds,avocado,vegetables mix,green gr... |
| 1 | burgers,meatballs,eggs |
| 2 | chutney |
| 3 | turkey,avocado |
| 4 | mineral water,milk,energy bar,whole wheat rice... |
| ... | ... |
| 7496 | butter,light mayo,fresh bread |
| 7497 | burgers,frozen vegetables,eggs,french fries,ma... |
| 7498 | chicken |
| 7499 | escalope,green tea |


In [10]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7501 entries, 0 to 7500
Data columns (total 1 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   Products  7501 non-null   object
dtypes: object(1)
memory usage: 58.7+ KB


In [11]:
df.isnull().sum()

Products    0
dtype: int64

# 2. Data Preprocessing
Pre-process the dataset so that it is suitable for association rule mining. This may include handling missing values, removing duplicates, and converting the data to an appropriate format.

In [12]:
df.dtypes

Products    object
dtype: object

In [13]:
df.iloc[[0]]

| ID | Products |
|---|---|
| 0 | shrimp,almonds,avocado,vegetables mix,green gr... |


In [14]:
def txt_split(txt):
    return txt.split(',')

In [15]:
data = df['Products'].apply(txt_split)
data

ID
0       [shrimp, almonds, avocado, vegetables mix, gre...
1                              [burgers, meatballs, eggs]
2                                               [chutney]
3                                       [turkey, avocado]
4       [mineral water, milk, energy bar, whole wheat ...
                              ...                        
7496                    [butter, light mayo, fresh bread]
7497    [burgers, frozen vegetables, eggs, french frie...
7498                                            [chicken]
7499                                [escalope, green tea]
7500    [eggs, frozen smoothie, yogurt cake, low fat y...
Name: Products, Length: 7501, dtype: object

In [16]:
from mlxtend.preprocessing import TransactionEncoder

In [17]:
te = TransactionEncoder()
encoded_df = te.fit_transform(data)
encoded_df

array([[False,  True,  True, ...,  True, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       ...,
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False, False, False],
       [False, False, False, ..., False,  True, False]])
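`TransactionEncoder` turns the list of baskets into a boolean matrix with one column per distinct item. Conceptually it does something like the hypothetical `one_hot_encode` below: build a sorted vocabulary (the `te.columns_` list printed next is alphabetical) and mark, for each basket, which items it contains. This is a sketch of the idea under that assumption, not mlxtend's source code.

```python
def one_hot_encode(transactions):
    """Conceptual stand-in for TransactionEncoder: sorted vocabulary + boolean rows."""
    # One column per distinct item, in sorted order
    columns = sorted({item for basket in transactions for item in basket})
    rows = []
    for basket in transactions:
        present = set(basket)  # build the lookup set once per basket
        rows.append([col in present for col in columns])
    return columns, rows


cols, rows = one_hot_encode([["burgers", "meatballs", "eggs"], ["chutney"], ["turkey", "avocado"]])
print(cols)     # ['avocado', 'burgers', 'chutney', 'eggs', 'meatballs', 'turkey']
print(rows[1])  # [False, False, True, False, False, False]
```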

In [18]:
te.columns_

[' asparagus',
 'almonds',
 'antioxydant juice',
 'asparagus',
 'avocado',
 'babies food',
 'bacon',
 'barbecue sauce',
 'black tea',
 'blueberries',
 'body spray',
 'bramble',
 'brownies',
 'bug spray',
 'burger sauce',
 'burgers',
 'butter',
 'cake',
 'candy bars',
 'carrots',
 'cauliflower',
 'cereals',
 'champagne',
 'chicken',
 'chili',
 'chocolate',
 'chocolate bread',
 'chutney',
 'cider',
 'clothes accessories',
 'cookies',
 'cooking oil',
 'corn',
 'cottage cheese',
 'cream',
 'dessert wine',
 'eggplant',
 'eggs',
 'energy bar',
 'energy drink',
 'escalope',
 'extra dark chocolate',
 'flax seed',
 'french fries',
 'french wine',
 'fresh bread',
 'fresh tuna',
 'fromage blanc',
 'frozen smoothie',
 'frozen vegetables',
 'gluten free bar',
 'grated cheese',
 'green beans',
 'green grapes',
 'green tea',
 'ground beef',
 'gums',
 'ham',
 'hand protein bar',
 'herb & pepper',
 'honey',
 'hot dogs',
 'ketchup',
 'light cream',
 'light mayo',
 'low fat yogurt',
 'magazines',
 'mashe
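The list above contains both `' asparagus'` (with a leading space, which is why it sorts first) and `'asparagus'`. `txt_split` splits on commas without trimming whitespace, so the same product ends up in two columns (shown as `asparagus` and `asparagus.1` in the table below). A minimal sketch of a splitter that trims each name and drops empty strings left by stray commas; `split_items` is a hypothetical name, and using it in place of `txt_split` in `df['Products'].apply(...)` should merge the two asparagus columns.

```python
def split_items(txt):
    """Split a comma-separated basket and trim whitespace around each item name."""
    items = (part.strip() for part in txt.split(","))
    return [item for item in items if item]  # skip empty names from stray commas


print(split_items("shrimp, almonds,avocado"))  # ['shrimp', 'almonds', 'avocado']
print(split_items(" asparagus"))               # ['asparagus']
```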

In [19]:
data = pd.DataFrame(encoded_df, columns=te.columns_)
data.head()

Unnamed: 0,asparagus,almonds,antioxydant juice,asparagus.1,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,...,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
0,False,True,True,False,True,False,False,False,False,False,...,False,True,False,False,True,False,False,True,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,True,False,False,False,False,False,...,True,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False


In [20]:
data.shape

(7501, 120)

In [21]:
data = data.astype(int)  # encode True/False as 1/0 (apriori also accepts the boolean columns directly)
data.head()

Unnamed: 0,asparagus,almonds,antioxydant juice,asparagus.1,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,...,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
0,0,1,1,0,1,0,0,0,0,0,...,0,1,0,0,1,0,0,1,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,1,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0


In [22]:
pd.set_option('display.max_columns', None)
data[data.duplicated()]

Unnamed: 0,asparagus,almonds,antioxydant juice,asparagus.1,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,body spray,bramble,brownies,bug spray,burger sauce,burgers,butter,cake,candy bars,carrots,cauliflower,cereals,champagne,chicken,chili,chocolate,chocolate bread,chutney,cider,clothes accessories,cookies,cooking oil,corn,cottage cheese,cream,dessert wine,eggplant,eggs,energy bar,energy drink,escalope,extra dark chocolate,flax seed,french fries,french wine,fresh bread,fresh tuna,fromage blanc,frozen smoothie,frozen vegetables,gluten free bar,grated cheese,green beans,green grapes,green tea,ground beef,gums,ham,hand protein bar,herb & pepper,honey,hot dogs,ketchup,light cream,light mayo,low fat yogurt,magazines,mashed potato,mayonnaise,meatballs,melons,milk,mineral water,mint,mint green tea,muffins,mushroom cream sauce,napkins,nonfat milk,oatmeal,oil,olive oil,pancakes,parmesan cheese,pasta,pepper,pet food,pickles,protein bar,red wine,rice,salad,salmon,salt,sandwich,shallot,shampoo,shrimp,soda,soup,spaghetti,sparkling water,spinach,strawberries,strong cheese,tea,tomato juice,tomato sauce,tomatoes,toothpaste,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
34,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
42,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
64,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
65,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7491,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7492,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7495,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7498,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
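These duplicated rows should not be dropped. Each row is a separate transaction, so two identical rows are two customers buying the same basket; `df.describe()` already showed that the most common basket, `cookies` on its own, occurs 223 times. Removing duplicates would undercount such baskets and distort every support value. A toy sketch with hypothetical baskets and a hypothetical `item_support` helper:

```python
def item_support(item, transactions):
    """Fraction of transactions that contain `item`."""
    return sum(item in basket for basket in transactions) / len(transactions)


baskets = [("cookies",), ("cookies",), ("cookies", "eggs"), ("eggs",)]
deduplicated = list(dict.fromkeys(baskets))  # keeps the first copy of each basket

print(item_support("cookies", baskets))       # 0.75
print(item_support("cookies", deduplicated))  # 0.6666666666666666
```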


In [40]:
data.describe()

Unnamed: 0,asparagus,almonds,antioxydant juice,asparagus.1,avocado,babies food,bacon,barbecue sauce,black tea,blueberries,body spray,bramble,brownies,bug spray,burger sauce,burgers,butter,cake,candy bars,carrots,cauliflower,cereals,champagne,chicken,chili,chocolate,chocolate bread,chutney,cider,clothes accessories,cookies,cooking oil,corn,cottage cheese,cream,dessert wine,eggplant,eggs,energy bar,energy drink,escalope,extra dark chocolate,flax seed,french fries,french wine,fresh bread,fresh tuna,fromage blanc,frozen smoothie,frozen vegetables,gluten free bar,grated cheese,green beans,green grapes,green tea,ground beef,gums,ham,hand protein bar,herb & pepper,honey,hot dogs,ketchup,light cream,light mayo,low fat yogurt,magazines,mashed potato,mayonnaise,meatballs,melons,milk,mineral water,mint,mint green tea,muffins,mushroom cream sauce,napkins,nonfat milk,oatmeal,oil,olive oil,pancakes,parmesan cheese,pasta,pepper,pet food,pickles,protein bar,red wine,rice,salad,salmon,salt,sandwich,shallot,shampoo,shrimp,soda,soup,spaghetti,sparkling water,spinach,strawberries,strong cheese,tea,tomato juice,tomato sauce,tomatoes,toothpaste,turkey,vegetables mix,water spray,white wine,whole weat flour,whole wheat pasta,whole wheat rice,yams,yogurt cake,zucchini
count,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0,7501.0
mean,0.000133,0.020397,0.008932,0.004666,0.033329,0.004533,0.008666,0.010799,0.014265,0.009199,0.011465,0.001866,0.033729,0.008666,0.005866,0.087188,0.030129,0.081056,0.009732,0.015331,0.004799,0.02573,0.046794,0.059992,0.006133,0.163845,0.004266,0.004133,0.010532,0.008399,0.080389,0.05106,0.004799,0.031862,0.000933,0.004399,0.013198,0.179709,0.027063,0.026663,0.079323,0.011998,0.009065,0.170911,0.02253,0.043061,0.022264,0.013598,0.063325,0.095321,0.006932,0.052393,0.008666,0.009065,0.132116,0.098254,0.013465,0.02653,0.005199,0.04946,0.04746,0.032396,0.004399,0.015598,0.027196,0.076523,0.010932,0.004133,0.006133,0.020931,0.011998,0.129583,0.238368,0.017464,0.005599,0.02413,0.019064,0.000667,0.010399,0.004399,0.023064,0.065858,0.095054,0.019864,0.015731,0.02653,0.006532,0.005999,0.018531,0.02813,0.018797,0.004933,0.042528,0.009199,0.004533,0.007732,0.004933,0.071457,0.006266,0.050527,0.17411,0.006266,0.007066,0.02133,0.007732,0.003866,0.030396,0.014131,0.068391,0.008132,0.062525,0.02573,0.0004,0.016531,0.009332,0.029463,0.058526,0.011465,0.02733,0.009465
std,0.011546,0.141364,0.094093,0.068153,0.179506,0.067177,0.092691,0.10336,0.118588,0.095474,0.106467,0.043165,0.180542,0.092691,0.076369,0.28213,0.170954,0.272939,0.098176,0.122875,0.069116,0.158339,0.211211,0.237488,0.078075,0.370159,0.06518,0.064158,0.10209,0.091266,0.271913,0.220135,0.069116,0.175645,0.030536,0.066186,0.114131,0.383971,0.162278,0.161108,0.27026,0.108885,0.094786,0.376456,0.14841,0.203008,0.14755,0.115823,0.243563,0.293677,0.082978,0.222833,0.092691,0.094786,0.338639,0.297677,0.115262,0.160715,0.071923,0.216841,0.212636,0.17706,0.066186,0.123922,0.162666,0.265851,0.103989,0.064158,0.078075,0.143161,0.108885,0.335866,0.426114,0.131002,0.074623,0.153463,0.13676,0.025811,0.101449,0.066186,0.150116,0.24805,0.293309,0.139542,0.124442,0.160715,0.080565,0.077227,0.13487,0.165354,0.135818,0.070064,0.201803,0.095474,0.067177,0.087599,0.070064,0.257604,0.078914,0.219044,0.379229,0.078914,0.083766,0.144493,0.087599,0.062062,0.171686,0.118041,0.252432,0.089818,0.242123,0.158339,0.019996,0.127515,0.096157,0.169111,0.23475,0.106467,0.163053,0.096835
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


* #### We will remove items that are bought very rarely (in less than 2% of transactions) and items that are bought very often (in more than 80% of transactions)

In [52]:
series = data.sum()

In [54]:
series.sort_values() 

 asparagus          1
water spray         3
napkins             5
cream               7
bramble            14
                 ... 
chocolate        1229
french fries     1282
spaghetti        1306
eggs             1348
mineral water    1788
Length: 120, dtype: int64

In [57]:
7500*0.05

375.0

In [58]:
series[(series >= 7500*0.02) & (series <= 7500*0.80)]

burgers               654
cake                  608
chicken               450
chocolate            1229
cookies               603
cooking oil           383
eggs                 1348
escalope              595
french fries         1282
frozen smoothie       475
frozen vegetables     715
grated cheese         393
green tea             991
ground beef           737
low fat yogurt        574
milk                  972
olive oil             494
pancakes              713
shrimp                536
soup                  379
spaghetti            1306
tomatoes              513
turkey                469
whole wheat rice      439
dtype: int64
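The printed output of `In [58]` does not match its code. Mineral water (1,788 baskets, about 24%) lies inside the 2% to 80% band but is missing from the list. The smallest count shown (379) sits just above the 375 computed in `In [57]`, so that output appears to come from an earlier run with different bounds. The `new_data` table shown later does include mineral water. The band filter itself is simple; below is a sketch with a hypothetical helper, `items_in_support_band`. It uses the actual row count (7,501, as reported by `df.info()`) instead of 7,500, so the 2% cut-off is 150.02 baskets. The counts are copied from the `series.sort_values()` output.

```python
def items_in_support_band(item_counts, n_transactions, low=0.02, high=0.80):
    """Keep items bought in at least `low` and at most `high` of all transactions."""
    lo_count = low * n_transactions
    hi_count = high * n_transactions
    return sorted(item for item, count in item_counts.items() if lo_count <= count <= hi_count)


counts = {" asparagus": 1, "water spray": 3, "napkins": 5, "cream": 7, "bramble": 14,
          "chocolate": 1229, "french fries": 1282, "spaghetti": 1306, "eggs": 1348,
          "mineral water": 1788}
print(items_in_support_band(counts, 7501))
# ['chocolate', 'eggs', 'french fries', 'mineral water', 'spaghetti']
```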

In [103]:
n_transactions = len(data)  # 7501 rows, not 7500
keep = series[(series >= n_transactions*0.02) & (series <= n_transactions*0.80)].index
new_data = data[keep]

In [104]:
new_data

Unnamed: 0,almonds,avocado,brownies,burgers,butter,cake,cereals,champagne,chicken,chocolate,cookies,cooking oil,cottage cheese,eggs,energy bar,energy drink,escalope,french fries,french wine,fresh bread,fresh tuna,frozen smoothie,frozen vegetables,grated cheese,green tea,ground beef,ham,herb & pepper,honey,hot dogs,light mayo,low fat yogurt,meatballs,milk,mineral water,muffins,oil,olive oil,pancakes,pepper,red wine,salmon,shrimp,soup,spaghetti,strawberries,tomato juice,tomatoes,turkey,vegetables mix,whole wheat pasta,whole wheat rice,yogurt cake
0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,1,1,0,0,0,1,0,0,1,0,0,0
1,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7496,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7497,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7498,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7499,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


# 3. Association Rule Mining

* #### Implement the Apriori algorithm in Python, using libraries such as pandas and mlxtend.

* #### Apply association rule mining techniques to the pre-processed dataset to discover interesting relationships between products purchased together.

* #### Set appropriate thresholds for support, confidence, and lift to extract meaningful rules.

In [131]:
scores = apriori(new_data, min_support=0.001, use_colnames=True)  # support >= 0.001 means at least 8 of the 7501 baskets
scores

|   | support | itemsets |
|---|---|---|
| 0 | 0.020397 | (almonds) |
| 1 | 0.033329 | (avocado) |
| 2 | 0.033729 | (brownies) |
| 3 | 0.087188 | (burgers) |
| 4 | 0.030129 | (butter) |
| ... | ... | ... |
| 5490 | 0.001466 | (spaghetti, frozen vegetables, mineral water, ... |
| 5491 | 0.001200 | (spaghetti, frozen vegetables, mineral water, ... |
| 5492 | 0.001067 | (spaghetti, frozen vegetables, olive oil, mine... |
| 5493 | 0.001067 | (spaghetti, ground beef, mineral water, herb &... |
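`apriori` above is mlxtend's implementation. To show what it computes, here is a minimal pure-Python sketch of the level-wise Apriori idea (not mlxtend's code): count single items, then repeatedly join frequent (k-1)-itemsets into k-item candidates, prune any candidate that has an infrequent subset (the downward-closure property), and count the survivors in one pass over the baskets. The function name `apriori_sketch` and the toy baskets are hypothetical.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    # Level 1: count individual items
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step (downward closure): every (k-1)-subset must be frequent
            if all(frozenset(sub) in frequent for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one pass over the baskets for this level
        counts = {cand: 0 for cand in candidates}
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1
        frequent = {s: c / n for s, c in counts.items() if c / n >= min_support}
        result.update(frequent)
        k += 1
    return result


toy = [["milk", "bread"], ["milk", "eggs"], ["milk", "bread", "eggs"], ["bread"]]
for itemset, sup in sorted(apriori_sketch(toy, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

Here {bread, eggs} appears in only one of four baskets and is dropped, and {milk, bread, eggs} is pruned without being counted because one of its subsets is infrequent.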


In [132]:
rules = association_rules(scores, metric='confidence', min_threshold=0.8)  # mlxtend's defaults, made explicit
rules

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | (meatballs, whole wheat pasta) | (milk) | 0.0016 | 0.129583 | 0.001333 | 0.833333 | 6.430898 | 0.001126 | 5.222504 | 0.845854 |
| 1 | (red wine, soup) | (mineral water) | 0.002 | 0.238368 | 0.001866 | 0.933333 | 3.915511 | 0.00139 | 11.424477 | 0.746097 |
| 2 | (turkey, whole wheat pasta) | (mineral water) | 0.001733 | 0.238368 | 0.001466 | 0.846154 | 3.549776 | 0.001053 | 4.950607 | 0.719539 |
| 3 | (eggs, ground beef, brownies) | (mineral water) | 0.0012 | 0.238368 | 0.001067 | 0.888889 | 3.729058 | 0.000781 | 6.854686 | 0.732715 |
| 4 | (frozen vegetables, olive oil, burgers) | (mineral water) | 0.001466 | 0.238368 | 0.0012 | 0.818182 | 3.432428 | 0.00085 | 4.188975 | 0.709702 |
| 5 | (pancakes, frozen vegetables, burgers) | (spaghetti) | 0.001733 | 0.17411 | 0.001466 | 0.846154 | 4.859877 | 0.001165 | 5.368284 | 0.795612 |
| 6 | (milk, salmon, burgers) | (spaghetti) | 0.0012 | 0.17411 | 0.001067 | 0.888889 | 5.105326 | 0.000858 | 7.433009 | 0.805092 |
| 7 | (mineral water, cake, meatballs) | (milk) | 0.001067 | 0.129583 | 0.001067 | 1.0 | 7.717078 | 0.000928 | inf | 0.871347 |
| 8 | (cake, milk, meatballs) | (mineral water) | 0.0012 | 0.238368 | 0.001067 | 0.888889 | 3.729058 | 0.000781 | 6.854686 | 0.732715 |
| 9 | (olive oil, cake, shrimp) | (mineral water) | 0.0012 | 0.238368 | 0.0012 | 1.0 | 4.19519 | 0.000914 | inf | 0.762547 |


In [133]:
rules.sort_values(by = 'lift', ascending=False)

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
|---|---|---|---|---|---|---|---|---|---|---|
| 33 | (spaghetti, ground beef, chocolate, mineral wa... | (frozen vegetables) | 0.0012 | 0.095321 | 0.001067 | 0.888889 | 9.325253 | 0.000952 | 8.142114 | 0.893837 |
| 7 | (mineral water, cake, meatballs) | (milk) | 0.001067 | 0.129583 | 0.001067 | 1.0 | 7.717078 | 0.000928 | inf | 0.871347 |
| 14 | (mineral water, escalope, hot dogs) | (milk) | 0.0012 | 0.129583 | 0.001067 | 0.888889 | 6.859625 | 0.000911 | 7.833755 | 0.855246 |
| 0 | (meatballs, whole wheat pasta) | (milk) | 0.0016 | 0.129583 | 0.001333 | 0.833333 | 6.430898 | 0.001126 | 5.222504 | 0.845854 |
| 12 | (escalope, shrimp, french fries) | (chocolate) | 0.0012 | 0.163845 | 0.001067 | 0.888889 | 5.425188 | 0.00087 | 7.525397 | 0.816654 |
| 6 | (milk, salmon, burgers) | (spaghetti) | 0.0012 | 0.17411 | 0.001067 | 0.888889 | 5.105326 | 0.000858 | 7.433009 | 0.805092 |
| 21 | (salmon, ground beef, shrimp) | (spaghetti) | 0.0012 | 0.17411 | 0.001067 | 0.888889 | 5.105326 | 0.000858 | 7.433009 | 0.805092 |
| 28 | (frozen vegetables, mineral water, ground beef... | (spaghetti) | 0.002 | 0.17411 | 0.001733 | 0.866667 | 4.977693 | 0.001385 | 6.194174 | 0.800705 |
| 5 | (pancakes, frozen vegetables, burgers) | (spaghetti) | 0.001733 | 0.17411 | 0.001466 | 0.846154 | 4.859877 | 0.001165 | 5.368284 | 0.795612 |
| 18 | (frozen vegetables, olive oil, tomatoes) | (spaghetti) | 0.002533 | 0.17411 | 0.002133 | 0.842105 | 4.836624 | 0.001692 | 5.230636 | 0.795259 |
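`association_rules` turns the frequent itemsets into rules. Called without extra arguments, mlxtend filters on confidence with a minimum of 0.8, and the notebook never applies a lift threshold even though the task asks for one. The hypothetical helper below sketches the rule-generation step from a `{frozenset: support}` dictionary. It computes confidence, lift, leverage, and conviction with their usual formulas and adds an assumed `min_lift` cut-off. It is an illustration, not mlxtend's implementation, and it omits Zhang's metric.

```python
import math
from itertools import combinations


def rules_from_supports(supports, min_confidence=0.8, min_lift=1.0):
    """Derive rules A -> C from a {frozenset: support} dict of frequent itemsets."""
    rules = []
    for itemset, sup_ac in supports.items():
        if len(itemset) < 2:
            continue
        # Every non-empty proper subset of the itemset can serve as an antecedent
        for size in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), size):
                a = frozenset(ante)
                c = itemset - a
                sup_a, sup_c = supports[a], supports[c]  # present by downward closure
                confidence = sup_ac / sup_a
                lift = confidence / sup_c
                if confidence < min_confidence or lift < min_lift:
                    continue
                leverage = sup_ac - sup_a * sup_c
                conviction = math.inf if confidence == 1 else (1 - sup_c) / (1 - confidence)
                rules.append({"antecedents": a, "consequents": c, "support": sup_ac,
                              "confidence": confidence, "lift": lift,
                              "leverage": leverage, "conviction": conviction})
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


# Frequent itemsets of four toy baskets (hypothetical data)
supports = {
    frozenset({"milk"}): 0.75, frozenset({"bread"}): 0.75, frozenset({"eggs"}): 0.5,
    frozenset({"milk", "bread"}): 0.5, frozenset({"milk", "eggs"}): 0.5,
}
example_rules = rules_from_supports(supports)
for r in example_rules:
    print(set(r["antecedents"]), "->", set(r["consequents"]), round(r["lift"], 3))
```

Only eggs → milk survives here; milk → bread and the other pairs have confidence 0.67, below the 0.8 threshold.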


# 4. Analysis and Interpretation
* Analyse the generated rules to identify interesting patterns and relationships between the products.
* Interpret the results and provide insights into customer purchasing behaviour based on the discovered rules.

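1) Mineral water and spaghetti are the most frequent consequents in the rules shown: when customers buy several items together, the rules most often predict one of these two. Mineral water is also the best-selling item (1,788 baskets) and spaghetti the third (1,306), which partly explains why they recur.
2) Customers who buy chocolate, ground beef, milk, mineral water, and spaghetti also tend to buy frozen vegetables (confidence 0.89, lift 9.33, the strongest rule by lift).
3) Customers who buy escalope, shrimp, and french fries also tend to buy chocolate (confidence 0.89, lift 5.43).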
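These patterns rest on very little data. With `min_support=0.001`, a rule needs only about 8 of the 7,501 baskets, and converting the supports in the sorted table back to counts shows that the top ten rules by lift are each backed by 8 to 16 baskets. A small sketch of that conversion; the supports are copied from the table, and `support_to_count` is a hypothetical helper.

```python
N_TRANSACTIONS = 7501  # number of rows in the dataset (df.info())


def support_to_count(support, n=N_TRANSACTIONS):
    """Approximate number of baskets behind a (rounded) relative support."""
    return round(support * n)


# (rule, support) pairs copied from the rules sorted by lift
sample_rules = [
    ("chocolate, ground beef, milk, mineral water, spaghetti -> frozen vegetables", 0.001067),
    ("escalope, shrimp, french fries -> chocolate", 0.001067),
    ("frozen vegetables, olive oil, tomatoes -> spaghetti", 0.002133),
]
for rule, support in sample_rules:
    print(f"{support_to_count(support):>3} baskets  {rule}")
```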

# 5. Interview Questions

Que. 1. What is lift, and why is it important in association rules?

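Ans: Lift measures how much more often the antecedent $A$ and the consequent $B$ occur together than they would if they were statistically independent:

$$\text{lift}(A \implies B) = \frac{\text{support}(A \cup B)}{\text{support}(A)\,\text{support}(B)} = \frac{\text{confidence}(A \implies B)}{\text{support}(B)}$$

A lift above 1 means $A$ and $B$ are bought together more often than chance would predict, a lift of 1 means no association, and a lift below 1 means buying $A$ makes $B$ less likely. Lift matters because confidence alone favours popular consequents: mineral water appears in about 24% of baskets, so many rules predict it with high confidence simply because it is common, and lift corrects for that baseline.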
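As a worked example, take rule 1 from the rules table, (red wine, soup) → (mineral water). Multiplying its rounded supports by the 7,501 transactions gives roughly 15 baskets with red wine and soup, 14 of which also contain mineral water, and mineral water appears in 1,788 baskets overall. These reconstructed counts (an approximation, since the table's supports are rounded) reproduce the table's support, confidence, and lift:

```python
N = 7501                 # transactions in the dataset
count_antecedent = 15    # baskets with red wine and soup (0.002 x 7501 ~ 15)
count_rule = 14          # of those, baskets that also contain mineral water (0.001866 x 7501 ~ 14)
count_consequent = 1788  # baskets with mineral water (series.sort_values() output)

support_rule = count_rule / N
confidence = count_rule / count_antecedent
lift = confidence / (count_consequent / N)

print(round(support_rule, 6), round(confidence, 6), round(lift, 6))  # 0.001866 0.933333 3.915511
```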

Que. 2. What are support and confidence, and how do you calculate them?

Ans: Support and Confidence are two fundamental metrics used in association rule learning to measure the usefulness and reliability of rules discovered in data mining.

i) The support of an itemset $A$ (a set of items) is the proportion of transactions in the dataset that contain $A$: $\text{support}(A) = \frac{\text{number of transactions containing } A}{\text{total number of transactions}}$. It indicates how frequently the itemset appears in the dataset.

ii) The confidence of an association rule $A \implies B$ measures the reliability of the rule. It is the proportion of transactions containing $A$ that also contain $B$: $\text{confidence}(A \implies B) = \frac{\text{support}(A \cup B)}{\text{support}(A)}$.
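Both definitions translate directly into code. A minimal sketch on hypothetical toy baskets, with `support` and `confidence` as illustrative helper names:

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A u B) / support(A): share of baskets with A that also contain B."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


baskets = [
    ["burgers", "meatballs", "eggs"],
    ["turkey", "avocado"],
    ["burgers", "eggs"],
    ["chutney"],
]
print(support({"burgers", "eggs"}, baskets))         # 0.5
print(confidence({"burgers"}, {"eggs"}, baskets))    # 1.0
print(confidence({"eggs"}, {"meatballs"}, baskets))  # 0.5
```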

Que. 3. What are some limitations or challenges of association rule mining?

Ans: i) Mining large datasets can be computationally intensive and resource-demanding, because the number of candidate itemsets grows exponentially with the number of distinct items.

ii) Setting appropriate minimum support and confidence thresholds is subjective and can either miss important rules or generate too many irrelevant ones.

iii) Many generated rules can be redundant or fit the noise in the data, leading to overfitting.

iv) Understanding and interpreting a large number of complex rules can be challenging.
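The first limitation can be made concrete. With $d$ distinct items there are $2^d - 1$ non-empty itemsets and $3^d - 2^{d+1} + 1$ possible rules (pairs of non-empty, disjoint antecedent and consequent), which is why Apriori prunes with a minimum support instead of enumerating everything. A short sketch using the item counts from this notebook (120 encoded columns, 53 after filtering); the function names are illustrative.

```python
def possible_itemsets(d):
    """Non-empty itemsets that can be formed from d distinct items."""
    return 2 ** d - 1


def possible_rules(d):
    """Rules A -> B with A and B non-empty and disjoint, drawn from d items."""
    return 3 ** d - 2 ** (d + 1) + 1


for d in (3, 53, 120):
    print(d, possible_itemsets(d), possible_rules(d))
```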