# A3Q2: Frequent Itemset and Association Rules Mining using Apriori Algorithm

In this part, you will build a system which can help make recommendations like the one below:

<img src="attachment:df2529e6-04ef-4583-be75-356f168f8eee.png" width="500">

To solve this assignment you will need to go through these pages:

* https://rasbt.github.io/mlxtend/user_guide/preprocessing/TransactionEncoder/
* https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/
* https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/
* https://rasbt.github.io/mlxtend/user_guide/frequent_patterns/fpgrowth/

The `apply` function in `pandas` can prove very useful for this assignment. See https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html

**Source**: Online Retail. (2015). UCI Machine Learning Repository. https://doi.org/10.24432/C5BW33.

In [1]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

## Load and Inspect Data

In [2]:
invoices = pd.read_csv('data/data.csv')
invoices.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom


## Data Transformation

Drop every column except `InvoiceNo` and `StockCode`: `InvoiceNo` serves as the transaction id and `StockCode` as the item name.
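
As a hint, one possible way to keep only two columns is to index the frame with a list of column names. The sketch below uses a small made-up frame (`toy_invoices` and its values are illustrative assumptions, not the assignment data).

```python
import pandas as pd

# Toy stand-in for the invoices frame (values are illustrative only).
toy_invoices = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2"],
    "StockCode": ["x", "y", "x"],
    "Description": ["ITEM X", "ITEM Y", "ITEM X"],
    "Quantity": [1, 2, 3],
})

# Indexing with a list of names returns a frame with just those columns.
toy_data = toy_invoices[["InvoiceNo", "StockCode"]]
print(toy_data)
```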

In [27]:
# Your code here

In [4]:
data.head()

Unnamed: 0,InvoiceNo,StockCode
0,536365,85123A
1,536365,71053
2,536365,84406B
3,536365,84029G
4,536365,84029E


Group the data by `InvoiceNo` and build a list of `StockCode` values for each invoice.
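
A minimal sketch of this grouping step, assuming a toy two-column frame (`toy_data` is hypothetical). It relies on `groupby` followed by `apply(list)`, which is one of several ways to collect the items of each group.

```python
import pandas as pd

# Toy two-column frame: one row per (invoice, item) pair.
toy_data = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A3", "A3"],
    "StockCode": ["x", "y", "x", "y", "z", "x"],
})

# Collect the stock codes of each invoice into a Python list,
# then convert the resulting Series into a list of lists.
toy_transactions = (
    toy_data.groupby("InvoiceNo")["StockCode"].apply(list).tolist()
)
print(toy_transactions)
```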

In [29]:
# Your code here

In [6]:
transactions[0:4]

[['85123A', '71053', '84406B', '84029G', '84029E', '22752', '21730'],
 ['22633', '22632'],
 ['84879',
  '22745',
  '22748',
  '22749',
  '22310',
  '84969',
  '22623',
  '22622',
  '21754',
  '21755',
  '21777',
  '48187'],
 ['22960', '22913', '22912', '22914']]

Using TransactionEncoder, convert the transactions into a dataset where each row represents a transaction and each column represents an item. The values will be True or False depending on whether the item is present in that specific transaction.

In [28]:
# Your code here

In [8]:
transactions_df.head()

Unnamed: 0,10002,10080,10120,10123C,10123G,10124A,10124G,10125,10133,10134,...,M,PADS,POST,S,gift_0001_10,gift_0001_20,gift_0001_30,gift_0001_40,gift_0001_50,m
0,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
3,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False


## Use Apriori to get the frequent itemsets and inspect the results

Use `apriori` to find the `frequent_itemsets` with a minimum support (`min_sup`) of 1%.

In [30]:
# Your code here

In [10]:
frequent_itemsets.shape

(1087, 2)

In [11]:
frequent_itemsets.head()

Unnamed: 0,support,itemsets
0,0.020193,(15036)
1,0.012587,(15056BL)
2,0.017876,(15056N)
3,0.011236,(16237)
4,0.01251,(20675)


Add an additional column called `items_count` to the dataframe which represents the number of items in the itemset.
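
Since each entry in the `itemsets` column is a `frozenset`, its size can be obtained with `len`. A small sketch using `apply` on a made-up frame (`toy_itemsets` is hypothetical):

```python
import pandas as pd

# Toy frequent-itemsets frame shaped like the output of apriori.
toy_itemsets = pd.DataFrame({
    "support": [0.75, 0.75, 0.5],
    "itemsets": [frozenset({"x"}), frozenset({"y"}), frozenset({"x", "y"})],
})

# apply(len) calls len() on every frozenset in the column.
toy_itemsets["items_count"] = toy_itemsets["itemsets"].apply(len)
print(toy_itemsets)
```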

In [12]:
# Your code here

In [13]:
frequent_itemsets.head()

Unnamed: 0,support,itemsets,items_count
0,0.020193,(15036),1
1,0.012587,(15056BL),1
2,0.017876,(15056N),1
3,0.011236,(16237),1
4,0.01251,(20675),1


Display the generated itemsets sorted in descending order of `items_count`.
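
One way to do this is `sort_values` with `ascending=False`, sketched here on a hypothetical toy frame. The original row labels are kept after sorting.

```python
import pandas as pd

# Toy frame with a precomputed items_count column (illustrative only).
toy_itemsets = pd.DataFrame({
    "support": [0.75, 0.75, 0.5],
    "itemsets": [frozenset({"x"}), frozenset({"y"}), frozenset({"x", "y"})],
    "items_count": [1, 1, 2],
})

# Largest itemsets first; the index still shows the original row labels.
toy_sorted = toy_itemsets.sort_values("items_count", ascending=False)
print(toy_sorted)
```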

In [14]:
# Your code here

Unnamed: 0,support,itemsets,items_count
1086,0.011699,"(22697, 22699, 22423, 22698)",4
1085,0.010386,"(22386, 22411, 21931, 85099B)",4
1084,0.010077,"(20719, 20723, 20724, 22355)",4
1032,0.012548,"(22384, 20725, 20728)",3
1024,0.011042,"(22384, 20726, 20725)",3


Show how many itemsets exist for each value of `items_count`.
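
A possible approach is `value_counts`, which tallies how often each distinct value occurs in a column. The frame below is a made-up example.

```python
import pandas as pd

# Toy frame: two single-item itemsets and one pair (illustrative only).
toy_itemsets = pd.DataFrame({
    "itemsets": [frozenset({"x"}), frozenset({"y"}), frozenset({"x", "y"})],
    "items_count": [1, 1, 2],
})

# Number of itemsets for each itemset size.
toy_counts = toy_itemsets["items_count"].value_counts()
print(toy_counts)
```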

In [26]:
# Your code here

## Generate association rules
Generate all association rules using the `lift` metric with a minimum value of 2
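
To make the `lift` threshold concrete, the sketch below computes support, confidence and lift by hand for a single rule on made-up transactions. It is plain Python rather than a call to `association_rules`, and all names in it are illustrative. The same arithmetic can be checked against the first row of the expected `rules` output: 0.011158 / 0.033668 ≈ 0.3314 for confidence, and 0.3314 / 0.020541 ≈ 16.13 for lift.

```python
# Toy transactions (illustrative, not taken from the dataset).
toy_transactions = [
    {"a", "b"},
    {"a", "b"},
    {"c"},
    {"c"},
    {"c", "d"},
    {"d"},
]
n = len(toy_transactions)


def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in toy_transactions) / n


# Rule under study: {a} -> {b}
sup_a = support({"a"})          # antecedent support: 2 of 6
sup_b = support({"b"})          # consequent support: 2 of 6
sup_ab = support({"a", "b"})    # support of the whole rule: 2 of 6

confidence = sup_ab / sup_a     # how often b appears when a does
lift = confidence / sup_b       # confidence relative to b's base rate

print(confidence, lift)
```

A lift of about 3 here means `b` appears roughly three times as often in transactions containing `a` as it does overall, so this rule would pass a minimum lift of 2.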

In [25]:
# Your code here

In [17]:
rules.shape

(1338, 10)

In [18]:
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric
0,(20712),(20711),0.033668,0.020541,0.011158,0.331422,16.135019,0.010467,1.464989,0.970705
1,(20711),(20712),0.020541,0.033668,0.011158,0.543233,16.135019,0.010467,2.115591,0.957695
2,(21931),(20711),0.046371,0.020541,0.011506,0.248127,12.079846,0.010553,1.302692,0.961818
3,(20711),(21931),0.020541,0.046371,0.011506,0.56015,12.079846,0.010553,2.16808,0.936453
4,(22386),(20711),0.047529,0.020541,0.010888,0.229082,11.152679,0.009912,1.270511,0.955762


In [19]:
invoices.head()

Unnamed: 0,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom


Add the names of the items back into the data frame and save all rules in a CSV file.
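
One possible approach is to build a `StockCode` to `Description` lookup from the invoices and apply it to each itemset. The sketch below uses toy frames; the helper `describe`, the new column name, and the choice to keep the first description seen for each code are all assumptions, so follow the column headings of the expected output in your own solution.

```python
import pandas as pd

# Toy invoices and rules (illustrative only).
toy_invoices = pd.DataFrame({
    "StockCode": ["x", "x", "y"],
    "Description": ["ITEM X", "ITEM X", "ITEM Y"],
})
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"x"}), frozenset({"y"})],
    "consequents": [frozenset({"y"}), frozenset({"x"})],
})

# Lookup table: one description per stock code (the first one seen).
code_to_name = (
    toy_invoices.drop_duplicates("StockCode")
    .set_index("StockCode")["Description"]
    .to_dict()
)


def describe(itemset):
    """Translate a frozenset of stock codes into a list of descriptions."""
    return [code_to_name[code] for code in itemset]


toy_rules["antecedents_description"] = toy_rules["antecedents"].apply(describe)
print(toy_rules)
```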

In [31]:
# Your code here

In [21]:
rules.head()

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction,zhangs_metric,antecedents_desciption,consequents_description
0,(20712),(20711),0.033668,0.020541,0.011158,0.331422,16.135019,0.010467,1.464989,0.970705,[JUMBO BAG WOODLAND ANIMALS],[JUMBO BAG TOYS ]
1,(20711),(20712),0.020541,0.033668,0.011158,0.543233,16.135019,0.010467,2.115591,0.957695,[JUMBO BAG TOYS ],[JUMBO BAG WOODLAND ANIMALS]
2,(21931),(20711),0.046371,0.020541,0.011506,0.248127,12.079846,0.010553,1.302692,0.961818,[JUMBO STORAGE BAG SUKI],[JUMBO BAG TOYS ]
3,(20711),(21931),0.020541,0.046371,0.011506,0.56015,12.079846,0.010553,2.16808,0.936453,[JUMBO BAG TOYS ],[JUMBO STORAGE BAG SUKI]
4,(22386),(20711),0.047529,0.020541,0.010888,0.229082,11.152679,0.009912,1.270511,0.955762,[JUMBO BAG PINK POLKADOT],[JUMBO BAG TOYS ]


In [24]:
rules.shape

(1338, 12)

In [23]:
# I used the following line to create the rules_100.csv file which only gives you 100 rules.
# rules.sample(100).to_csv('rules_100.csv', index=False)

# You must submit the rules.csv file that contains all the 1338 rules by running the following command
rules.to_csv('rules.csv', index=False) # 20 Marks