# Association rules (Apriori Algorithm)
Association rules generated by the Apriori algorithm are usually evaluated with a small set of measures. These measures are used to select the most interesting and useful rules from the full set of generated rules. The most commonly used ones are:

Support: The support of a rule is the proportion of transactions in the dataset that contain both the antecedent and the consequent of the rule. A high support indicates that the rule is applicable to a large number of transactions in the dataset.

Confidence: The confidence of a rule is the proportion of transactions that contain the antecedent of the rule, which also contain the consequent of the rule. A high confidence indicates that the rule is likely to be true for a transaction that contains the antecedent.

Lift: The lift of a rule measures the degree of association between the antecedent and the consequent of the rule, relative to what would be expected if they were independent. A lift greater than 1 indicates a positive association between the antecedent and the consequent, while a lift less than 1 indicates a negative association.

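Conviction: The conviction of a rule measures the degree to which the consequent depends on the antecedent, taking into account the frequency of the consequent. It compares how often the rule would be wrong if the antecedent and consequent were independent with how often it is actually wrong. A conviction of 1 indicates independence, while a high conviction indicates that the consequent depends strongly on the antecedent and that the rule rarely fails.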

Leverage: The leverage of a rule measures the difference between the observed frequency of the antecedent and consequent occurring together, and the expected frequency if they were independent. A high leverage value indicates that the antecedent and consequent are strongly associated.

These measurements are useful for evaluating the quality and usefulness of association rules generated by the Apriori algorithm. By using these measurements, it is possible to identify the most interesting and meaningful rules for a given dataset and application.


-------------------------------------------------------------------------


Support, Lift, Confidence, and Conviction are measures used in association rule mining, a data mining technique for identifying relationships or associations between different items in a dataset. LHS and RHS are not measures; they name the two sides of a rule, and the measures are defined in terms of them.

Support: The support of an itemset is the proportion of transactions in the dataset that contain that itemset. It measures how frequently an itemset appears in the dataset, and is calculated as the number of transactions containing the itemset divided by the total number of transactions.
Support {X} = (number of transactions containing X) / (total number of transactions)

LHS: LHS refers to the left-hand side of an association rule. It is the set of items that appear on the left-hand side of the "=>" symbol in an association rule. For example, in the association rule {Milk, Bread} => {Butter}, Milk and Bread are on the LHS.

RHS: RHS refers to the right-hand side of an association rule. It is the set of items that appear on the right-hand side of the "=>" symbol in an association rule. For example, in the association rule {Milk, Bread} => {Butter}, Butter is on the RHS.

Lift: The lift of an association rule measures the strength of the association between the LHS and the RHS, taking into account the frequency of occurrence of both the LHS and the RHS. It is calculated as the ratio of the support of the itemset {LHS, RHS} to the product of the supports of the itemsets LHS and RHS.
Lift = (support {LHS, RHS}) / (support {LHS} x support {RHS})

Confidence: The confidence of an association rule measures the proportion of transactions that contain the RHS, given that they also contain the LHS. It is calculated as the support of the itemset {LHS, RHS} divided by the support of the itemset LHS.

Conviction: The conviction of an association rule measures how strongly the RHS depends on the LHS, taking into account how frequent the RHS is on its own. It is calculated as the ratio of the complement of the support of the RHS to the complement of the confidence of the rule, that is, how often the rule would be wrong under independence divided by how often it is actually wrong.
Conviction = (1 - support {RHS}) / (1 - confidence {LHS => RHS})

All of these measures are important in association rule mining, as they help to identify the most interesting and meaningful relationships between items in a dataset. By analyzing these measures, analysts can gain insights into the behavior and preferences of consumers, as well as identify potential cross-selling and upselling opportunities for businesses.
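
As a minimal sketch of the definitions above, the measures can be computed by hand for the rule {Milk, Bread} => {Butter}. The five transactions below are made up for illustration, and `support` is a hypothetical helper, not part of any library. Leverage (from the first list) is included alongside the four measures defined in this section.

```python
# Toy transactions (made-up data, for illustration only)
transactions = [
    {"Milk", "Bread", "Butter"},
    {"Milk", "Bread"},
    {"Milk", "Bread", "Butter"},
    {"Bread", "Butter"},
    {"Milk"},
]


def support(itemset):
    """Proportion of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


lhs, rhs = {"Milk", "Bread"}, {"Butter"}

supp_rule = support(lhs | rhs)                       # support {LHS, RHS}
confidence = supp_rule / support(lhs)                # support {LHS, RHS} / support {LHS}
lift = supp_rule / (support(lhs) * support(rhs))     # > 1 means positive association
conviction = (1 - support(rhs)) / (1 - confidence)   # 1 means independence
leverage = supp_rule - support(lhs) * support(rhs)   # 0 means independence

print(f"support={supp_rule:.3f} confidence={confidence:.3f} lift={lift:.3f} "
      f"conviction={conviction:.3f} leverage={leverage:.3f}")
```

With this toy data the rule holds in 2 of 5 transactions, and lift is only slightly above 1, so the association is weak even though confidence is about two thirds.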

https://www.youtube.com/watch?v=qMQfUy8ndco&ab_channel=EzzaAk

## Loading the dataset

In [None]:
import pandas as pd
import numpy as np
import re

from google.colab import drive
drive.mount('/content/drive')

!pip install efficient-apriori==2.0.1
from efficient_apriori import apriori

Mounted at /content/drive
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting efficient-apriori==2.0.1
  Downloading efficient_apriori-2.0.1-py3-none-any.whl (14 kB)
Installing collected packages: efficient-apriori
Successfully installed efficient-apriori-2.0.1


In [None]:
Data = pd.read_excel('https://archive.ics.uci.edu/ml/machine-learning-databases/00502/online_retail_II.xlsx')

In [None]:
Data

Unnamed: 0,Invoice,StockCode,Description,Quantity,InvoiceDate,Price,Customer ID,Country
0,489434,85048,15CM CHRISTMAS GLASS BALL 20 LIGHTS,12,2009-12-01 07:45:00,6.95,13085.0,United Kingdom
1,489434,79323P,PINK CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
2,489434,79323W,WHITE CHERRY LIGHTS,12,2009-12-01 07:45:00,6.75,13085.0,United Kingdom
3,489434,22041,"RECORD FRAME 7"" SINGLE SIZE",48,2009-12-01 07:45:00,2.10,13085.0,United Kingdom
4,489434,21232,STRAWBERRY CERAMIC TRINKET BOX,24,2009-12-01 07:45:00,1.25,13085.0,United Kingdom
...,...,...,...,...,...,...,...,...
525456,538171,22271,FELTCRAFT DOLL ROSIE,2,2010-12-09 20:01:00,2.95,17530.0,United Kingdom
525457,538171,22750,FELTCRAFT PRINCESS LOLA DOLL,1,2010-12-09 20:01:00,3.75,17530.0,United Kingdom
525458,538171,22751,FELTCRAFT PRINCESS OLIVIA DOLL,1,2010-12-09 20:01:00,3.75,17530.0,United Kingdom
525459,538171,20970,PINK FLORAL FELTCRAFT SHOULDER BAG,2,2010-12-09 20:01:00,3.75,17530.0,United Kingdom


## Data Selection

In [None]:
Selected_Data = Data[['Invoice', 'Description', 'Country', 'StockCode']]
Selected_Data

Unnamed: 0,Invoice,Description,Country,StockCode
0,489434,15CM CHRISTMAS GLASS BALL 20 LIGHTS,United Kingdom,85048
1,489434,PINK CHERRY LIGHTS,United Kingdom,79323P
2,489434,WHITE CHERRY LIGHTS,United Kingdom,79323W
3,489434,"RECORD FRAME 7"" SINGLE SIZE",United Kingdom,22041
4,489434,STRAWBERRY CERAMIC TRINKET BOX,United Kingdom,21232
...,...,...,...,...
525456,538171,FELTCRAFT DOLL ROSIE,United Kingdom,22271
525457,538171,FELTCRAFT PRINCESS LOLA DOLL,United Kingdom,22750
525458,538171,FELTCRAFT PRINCESS OLIVIA DOLL,United Kingdom,22751
525459,538171,PINK FLORAL FELTCRAFT SHOULDER BAG,United Kingdom,20970


## Data Exploration

In [None]:
Selected_Data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 525461 entries, 0 to 525460
Data columns (total 4 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   Invoice      525461 non-null  object
 1   Description  522533 non-null  object
 2   Country      525461 non-null  object
 3   StockCode    525461 non-null  object
dtypes: object(4)
memory usage: 16.0+ MB


In [None]:
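# Note: isna() and isnull() are aliases, so the two "missing" columns are always identical.
# duplicated().sum() is a single count of fully duplicated rows, repeated on every line.
Exploration = pd.DataFrame({
    'No. Unique': Selected_Data.nunique(),
    'NaN (Number)': Selected_Data.isna().sum(),
    'Missing (Object)': Selected_Data.isnull().sum(),
    'Duplicated': Selected_Data.duplicated().sum()
})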

Exploration

Unnamed: 0,No. Unique,NaN (Number),Missing (Object),Duplicated
Invoice,28816,0,0,13335
Description,4681,2928,2928,13335
Country,40,0,0,13335
StockCode,4632,0,0,13335


## Data Cleansing

### Dropping all rows with N/A is acceptable

In [None]:
# Reassign instead of using inplace=True, which triggers SettingWithCopyWarning on a column slice
Selected_Data = Selected_Data.dropna()
Selected_Data


Unnamed: 0,Invoice,Description,Country,StockCode
0,489434,15CM CHRISTMAS GLASS BALL 20 LIGHTS,United Kingdom,85048
1,489434,PINK CHERRY LIGHTS,United Kingdom,79323P
2,489434,WHITE CHERRY LIGHTS,United Kingdom,79323W
3,489434,"RECORD FRAME 7"" SINGLE SIZE",United Kingdom,22041
4,489434,STRAWBERRY CERAMIC TRINKET BOX,United Kingdom,21232
...,...,...,...,...
525456,538171,FELTCRAFT DOLL ROSIE,United Kingdom,22271
525457,538171,FELTCRAFT PRINCESS LOLA DOLL,United Kingdom,22750
525458,538171,FELTCRAFT PRINCESS OLIVIA DOLL,United Kingdom,22751
525459,538171,PINK FLORAL FELTCRAFT SHOULDER BAG,United Kingdom,20970


### Delete rows where StockCode does not start with a digit
Stock codes that start with a letter do not correspond to products sold, so those rows are removed.

In [None]:
# assign() returns a new DataFrame, which avoids SettingWithCopyWarning
Selected_Data = Selected_Data.assign(StockCode=Selected_Data['StockCode'].astype('str'))
# Keep only rows whose StockCode begins with a digit
Data_Digit = Selected_Data[Selected_Data['StockCode'].str[0].str.isdigit()]
Data_Digit


Unnamed: 0,Invoice,Description,Country,StockCode
0,489434,15CM CHRISTMAS GLASS BALL 20 LIGHTS,United Kingdom,85048
1,489434,PINK CHERRY LIGHTS,United Kingdom,79323P
2,489434,WHITE CHERRY LIGHTS,United Kingdom,79323W
3,489434,"RECORD FRAME 7"" SINGLE SIZE",United Kingdom,22041
4,489434,STRAWBERRY CERAMIC TRINKET BOX,United Kingdom,21232
...,...,...,...,...
525456,538171,FELTCRAFT DOLL ROSIE,United Kingdom,22271
525457,538171,FELTCRAFT PRINCESS LOLA DOLL,United Kingdom,22750
525458,538171,FELTCRAFT PRINCESS OLIVIA DOLL,United Kingdom,22751
525459,538171,PINK FLORAL FELTCRAFT SHOULDER BAG,United Kingdom,20970
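
To make the filter rule concrete, here is a small pure-Python sketch of the same first-character test. The codes `'POST'` and `'M'` are assumed examples of letter-prefixed codes used only for illustration; the other values come from the rows shown above.

```python
# Mixed stock codes; 'POST' and 'M' are assumed, illustrative non-product codes
codes = ["85048", "79323P", "POST", "M", 22041]

# Convert to str first (as the notebook does), then test the first character
kept = [code for code in codes if str(code)[:1].isdigit()]

print(kept)
```

Codes with a letter suffix such as `79323P` are kept, because only the first character is tested.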


In [None]:
Data_Digit.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 519484 entries, 0 to 525460
Data columns (total 4 columns):
 #   Column       Non-Null Count   Dtype 
---  ------       --------------   ----- 
 0   Invoice      519484 non-null  object
 1   Description  519484 non-null  object
 2   Country      519484 non-null  object
 3   StockCode    519484 non-null  object
dtypes: object(4)
memory usage: 19.8+ MB


## Data Preparation

### Work with a smaller subset of the data
Create a DataFrame of the transactions where Country is United Kingdom.

In [None]:
Data_UK = Data_Digit[Data_Digit['Country'] == 'United Kingdom']
Data_UK

Unnamed: 0,Invoice,Description,Country,StockCode
0,489434,15CM CHRISTMAS GLASS BALL 20 LIGHTS,United Kingdom,85048
1,489434,PINK CHERRY LIGHTS,United Kingdom,79323P
2,489434,WHITE CHERRY LIGHTS,United Kingdom,79323W
3,489434,"RECORD FRAME 7"" SINGLE SIZE",United Kingdom,22041
4,489434,STRAWBERRY CERAMIC TRINKET BOX,United Kingdom,21232
...,...,...,...,...
525456,538171,FELTCRAFT DOLL ROSIE,United Kingdom,22271
525457,538171,FELTCRAFT PRINCESS LOLA DOLL,United Kingdom,22750
525458,538171,FELTCRAFT PRINCESS OLIVIA DOLL,United Kingdom,22751
525459,538171,PINK FLORAL FELTCRAFT SHOULDER BAG,United Kingdom,20970


### Create a list of products for each transaction (Invoice)

In [None]:
transactions = Data_UK.groupby('Invoice')['Description'].apply(list)
transactions

Invoice
489434     [15CM CHRISTMAS GLASS BALL 20 LIGHTS, PINK CHE...
489435     [CAT BOWL , DOG BOWL , CHASING BALL DESIGN, HE...
489436     [DOOR MAT BLACK FLOCK , LOVE BUILDING BLOCK WO...
489437     [CHRISTMAS CRAFT HEART DECORATIONS, CHRISTMAS ...
489438     [DINOSAURS  WRITING SET , SET OF MEADOW  FLOWE...
                                 ...                        
C538119    [HAND WARMER UNION JACK, HAND WARMER OWL DESIG...
C538121                               [SAVOY ART DECO CLOCK]
C538122                      [GROW YOUR OWN PLANT IN A CAN ]
C538124    [ROSES REGENCY TEACUP AND SAUCER , REGENCY CAK...
C538164                        [SET OF 3 BLACK FLYING DUCKS]
Name: Description, Length: 23080, dtype: object
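
Several descriptions in the output above carry trailing or doubled spaces (for example `'CAT BOWL '` and `'DINOSAURS  WRITING SET '`). If the same product ever appears with different spacing, it would be counted as two different items. The following is an optional sketch, not part of the original pipeline, of how each transaction could be normalised first; `clean_transaction` is a hypothetical helper name.

```python
def clean_transaction(items):
    """Collapse whitespace in each item and drop duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for item in items:
        name = " ".join(str(item).split())  # trims ends and collapses inner runs of spaces
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    return cleaned


raw = ["CAT BOWL ", "DOG BOWL , CHASING BALL DESIGN", "CAT BOWL ", "DINOSAURS  WRITING SET "]
print(clean_transaction(raw))
```

Under that assumption, the helper could be applied to every list in `transactions` before mining, at the cost of changing the item names that appear in the resulting rules.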

In [None]:
# Get the list of products from each transaction (invoice)

trans = transactions.to_list()
trans

[['15CM CHRISTMAS GLASS BALL 20 LIGHTS',
  'PINK CHERRY LIGHTS',
  ' WHITE CHERRY LIGHTS',
  'RECORD FRAME 7" SINGLE SIZE ',
  'STRAWBERRY CERAMIC TRINKET BOX',
  'PINK DOUGHNUT TRINKET POT ',
  'SAVE THE PLANET MUG',
  'FANCY FONT HOME SWEET HOME DOORMAT'],
 ['CAT BOWL ',
  'DOG BOWL , CHASING BALL DESIGN',
  'HEART MEASURING SPOONS LARGE',
  'LUNCHBOX WITH CUTLERY FAIRY CAKES '],
 ['DOOR MAT BLACK FLOCK ',
  'LOVE BUILDING BLOCK WORD',
  'HOME BUILDING BLOCK WORD',
  'ASSORTED COLOUR BIRD ORNAMENT',
  ' PEACE WOODEN BLOCK LETTERS',
  'CHRISTMAS CRAFT WHITE FAIRY ',
  'HEART IVORY TRELLIS LARGE',
  'HEART FILIGREE DOVE LARGE',
  'FULL ENGLISH BREAKFAST PLATE',
  'PIZZA PLATE IN BOX',
  'BLACK DINER WALL CLOCK',
  'SET OF 3 BLACK FLYING DUCKS',
  'AREA PATROLLED METAL SIGN',
  'PLEASE ONE PERSON  METAL SIGN',
  'BATH BUILDING BLOCK WORD',
  'CLASSIC WHITE FRAME',
  'SMALL MARSHMALLOWS PINK BOWL',
  'BISCUITS SMALL BOWL LIGHT BLUE',
  'SCOTTIE DOG HOT WATER BOTTLE'],
 ['CHRISTMAS CRAFT 

## Modelling

In [None]:
# Find frequent itemsets and association rules

itemsets, rules = apriori(trans, min_support=0.005, min_confidence=0.5)
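
To get a feel for what `min_support=0.005` means here, the threshold can be converted into a transaction count. This is a rough sketch that assumes the 23080 invoices reported for `transactions` above are exactly the transactions passed to `apriori`.

```python
import math

n_transactions = 23080  # length of `transactions` reported above
min_support = 0.005     # value passed to apriori()

# Smallest number of invoices an itemset must appear in to reach min_support
min_count = math.ceil(min_support * n_transactions)
print(min_count)
```

So an itemset has to occur in at least 116 invoices to be considered, and `min_confidence=0.5` then keeps only rules whose RHS appears in at least half of the invoices that contain the LHS.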

## To DataFrame
[Reference](https://stackoverflow.com/questions/70256325/apriori-rule-to-pandas-dataframe)

In [None]:
# Collect one row per rule, then build the DataFrame in a single call
# (DataFrame.append was removed in pandas 2.0 and is slow inside a loop)
rows = []
for rule in rules:
    rows.append({'Antecedents': list(rule.lhs),
                 'Consequents': list(rule.rhs),
                 'Support': rule.support,
                 'Confidence': rule.confidence,
                 'Lift': rule.lift,
                 'Conviction': rule.conviction})

DF_Rules = pd.DataFrame(rows, columns=['Antecedents', 'Consequents', 'Support',
                                       'Confidence', 'Lift', 'Conviction'])

# Display the DataFrame
DF_Rules

Unnamed: 0,Antecedents,Consequents,Support,Confidence,Lift,Conviction
0,[12 PENCIL SMALL TUBE WOODLAND],[12 PENCILS SMALL TUBE SKULL],0.006932,0.547945,26.129288,2.165732
1,[12 PENCILS SMALL TUBE RED RETROSPOT],[12 PENCILS SMALL TUBE SKULL],0.005589,0.648241,30.911998,2.783241
2,[3 STRIPEY MICE FELTCRAFT],[FELTCRAFT 6 FLOWER FRIENDS],0.008232,0.527778,14.449717,2.040300
3,[CHOCOLATE BOX RIBBONS ],[6 RIBBONS RUSTIC CHARM],0.005113,0.510823,15.657083,1.977553
4,[60 CAKE CASES VINTAGE CHRISTMAS],[SET OF 20 VINTAGE CHRISTMAS NAPKINS],0.009489,0.504608,23.719673,1.975661
...,...,...,...,...,...,...
1108,"[WOOD S/3 CABINET ANT WHITE FINISH, WOODEN PIC...","[WOOD 2 DRAWER CABINET WHITE FINISH, WOODEN FR...",0.008276,0.712687,53.930511,3.434525
1109,"[WOOD S/3 CABINET ANT WHITE FINISH, WOODEN FRA...","[WOOD 2 DRAWER CABINET WHITE FINISH, WOODEN PI...",0.008276,0.584098,46.646985,2.374305
1110,"[WOOD 2 DRAWER CABINET WHITE FINISH, WOODEN PI...","[WOOD S/3 CABINET ANT WHITE FINISH, WOODEN FRA...",0.008276,0.660900,46.646985,2.907198
1111,"[WOOD 2 DRAWER CABINET WHITE FINISH, WOODEN FR...","[WOOD S/3 CABINET ANT WHITE FINISH, WOODEN PIC...",0.008276,0.626230,53.930511,2.644372
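
As a sanity check on the formulas given at the top of this notebook, conviction can be re-derived from the confidence and lift columns alone, since lift = confidence / support {RHS}. The sketch below uses the first five rows printed above; `conviction_from` is a hypothetical helper, and small differences are expected because the printed values are rounded to six decimals.

```python
# (confidence, lift, reported conviction) copied from rows 0-4 of the table above
rows = [
    (0.547945, 26.129288, 2.165732),
    (0.648241, 30.911998, 2.783241),
    (0.527778, 14.449717, 2.040300),
    (0.510823, 15.657083, 1.977553),
    (0.504608, 23.719673, 1.975661),
]


def conviction_from(confidence, lift):
    """Derive conviction from confidence and lift."""
    support_rhs = confidence / lift  # because lift = confidence / support {RHS}
    return (1 - support_rhs) / (1 - confidence)


for confidence, lift, reported in rows:
    derived = conviction_from(confidence, lift)
    print(f"reported={reported:.6f} derived={derived:.6f}")
```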


## DataFrame Decoration

In [None]:
DF_Rules['Antecedents'] = DF_Rules['Antecedents'].astype('str')
DF_Rules['Consequents'] = DF_Rules['Consequents'].astype('str')
DF_Rules.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1113 entries, 0 to 1112
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Antecedents  1113 non-null   object 
 1   Consequents  1113 non-null   object 
 2   Support      1113 non-null   float64
 3   Confidence   1113 non-null   float64
 4   Lift         1113 non-null   float64
 5   Conviction   1113 non-null   float64
dtypes: float64(4), object(2)
memory usage: 52.3+ KB


In [None]:
# String cleaning: strip the brackets and quote characters left over from astype('str').
# regex=True is passed explicitly because the pattern is a regular expression.
DF_Rules['Antecedents'] = DF_Rules['Antecedents'].str.replace(r"[\[\]\'\"]", "", regex=True).str.rstrip(',')
DF_Rules['Consequents'] = DF_Rules['Consequents'].str.replace(r"[\[\]\'\"]", "", regex=True).str.rstrip(',')


DF_Rules


Unnamed: 0,Antecedents,Consequents,Support,Confidence,Lift,Conviction
0,12 PENCIL SMALL TUBE WOODLAND,12 PENCILS SMALL TUBE SKULL,0.006932,0.547945,26.129288,2.165732
1,12 PENCILS SMALL TUBE RED RETROSPOT,12 PENCILS SMALL TUBE SKULL,0.005589,0.648241,30.911998,2.783241
2,3 STRIPEY MICE FELTCRAFT,FELTCRAFT 6 FLOWER FRIENDS,0.008232,0.527778,14.449717,2.040300
3,CHOCOLATE BOX RIBBONS,6 RIBBONS RUSTIC CHARM,0.005113,0.510823,15.657083,1.977553
4,60 CAKE CASES VINTAGE CHRISTMAS,SET OF 20 VINTAGE CHRISTMAS NAPKINS,0.009489,0.504608,23.719673,1.975661
...,...,...,...,...,...,...
1108,"WOOD S/3 CABINET ANT WHITE FINISH, WOODEN PICT...","WOOD 2 DRAWER CABINET WHITE FINISH, WOODEN FRA...",0.008276,0.712687,53.930511,3.434525
1109,"WOOD S/3 CABINET ANT WHITE FINISH, WOODEN FRAM...","WOOD 2 DRAWER CABINET WHITE FINISH, WOODEN PIC...",0.008276,0.584098,46.646985,2.374305
1110,"WOOD 2 DRAWER CABINET WHITE FINISH, WOODEN PIC...","WOOD S/3 CABINET ANT WHITE FINISH, WOODEN FRAM...",0.008276,0.660900,46.646985,2.907198
1111,"WOOD 2 DRAWER CABINET WHITE FINISH, WOODEN FRA...","WOOD S/3 CABINET ANT WHITE FINISH, WOODEN PICT...",0.008276,0.626230,53.930511,2.644372
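
One caveat with the regex approach is that it also removes quote characters that belong to a product name, such as the `"` in `RECORD FRAME 7" SINGLE SIZE`. A possible alternative, sketched here under the assumption that the columns still hold Python lists (that is, before the `astype('str')` step), is to join the items directly. `format_items` is a hypothetical helper name.

```python
def format_items(items):
    """Join a list of item names into one comma-separated string."""
    return ", ".join(str(item).strip() for item in items)


example = ['RECORD FRAME 7" SINGLE SIZE ', 'CHOCOLATE BOX RIBBONS ']
print(format_items(example))
```

With pandas this could be applied per cell, for example through `Series.apply`, in place of the string conversion and regex replacement.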


## To CSV

In [None]:
DF_Rules.to_csv('/content/drive/MyDrive/Data Master/Colab/Class 5/Apriori.csv', index=False)