In [1]:
from mlxtend.frequent_patterns import fpgrowth, association_rules, apriori
import timeit
import pandas as pd
import json
import csv

In [6]:
#! pip install mlxtend



In [29]:
#! python -m wget 'https://github.com/dbdmg/data-science-lab/raw/master/datasets/online_retail.csv' -o online_retail.csv

In [16]:
! python -m wget "https://raw.githubusercontent.com/dbdmg/data-science-lab/master/datasets/modified_coco.json" -o coco.json


Saved under coco.json


In [19]:
file = "coco.json"
with open(file) as f:
    coco_data = json.load(f)


In [22]:
coco_data[0]

{'file_name': '000000095096.png',
 'image_id': 95096,
 'annotations': ['car', 'car', 'train', 'stop sign']}

{  
"file_name": "000000465265.png",  
"image_id": 465265,  
"annotations": [  
    "person",  
    "person",  
    "person",  
    "fire hydrant",  
    "handbag",  
    "chair",  
    "cell phone"  
]  
}  
This means that the image contains 3 people, a fire hydrant, a handbag, a chair and a cell phone. 

In [23]:
len(coco_data)

5000

## 2.1 Association rules from frequent itemsets

This exercise will work on the Online Retail Data Set.  
In particular, you will do some data preprocessing on the dataset to extract all itemsets available (where each itemset is the collection of items contained in a single invoice).  
Then, using FP-Growth and Apriori implementations, you will extract a list of frequent itemsets.  
From those, you will finally extract several different association rules.

1. First, you need to load the dataset into memory, using the csv module.  
Make sure you identify all valid rows.  
Also consider that rows having an InvoiceNo that starts with C should be discarded, as they indicate that the invoice is about a cancelled purchase.

• InvoiceNo: Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.  
• StockCode: Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.  
• Description: Product (item) name. Nominal.  
• Quantity: The quantities of each product (item) per transaction. Numeric.  
• InvoiceDate: Invoice date and time. Numeric, the day and time when each transaction was generated.  
• UnitPrice: Unit price. Numeric, product price per unit in sterling.  
• CustomerID: Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer.  
• Country: Country name. Nominal, the name of the country where each customer resides.  
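The loading rules above can also be wrapped in a reusable function. This is a minimal sketch, assuming the file starts with the header row described above; the name `load_valid_rows` is hypothetical, and it treats a row as valid when it has one field per column, a non-empty InvoiceNo that does not start with 'C' or 'c', and numeric Quantity and UnitPrice values.

```python
import csv

def load_valid_rows(lines):
    """Parse Online Retail CSV lines, skipping the header, malformed rows and cancellations.

    `lines` is any iterable of text lines, e.g. a file opened with newline="".
    Quantity and UnitPrice are converted to float; other fields stay strings.
    """
    reader = csv.reader(lines)
    header = next(reader)
    rows = []
    for row in reader:
        # a valid row has one field per column and a non-empty invoice number
        if len(row) != len(header) or not row[0]:
            continue
        # invoice numbers starting with 'C' mark cancelled purchases
        if row[0][0] in ("c", "C"):
            continue
        try:
            row[3] = float(row[3])  # Quantity
            row[5] = float(row[5])  # UnitPrice
        except ValueError:
            continue  # non-numeric Quantity or UnitPrice
        rows.append(row)
    return header, rows
```

Usage would look like `with open("online_retail.csv", newline="") as f: header, retail_data = load_valid_rows(f)`.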

In [2]:
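file = "online_retail.csv"
retail_data = []
with open(file, newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    labels_index = dict(enumerate(header))
    for row in reader:
        # keep complete rows (8 fields) with a non-empty InvoiceNo,
        # and drop cancellations (InvoiceNo starting with 'C')
        if len(row) == 8 and row[0] and row[0][0] not in ("c", "C"):
            retail_data.append([
                row[0],         # InvoiceNo
                row[1],         # StockCode
                row[2],         # Description
                float(row[3]),  # Quantity
                row[4],         # InvoiceDate
                float(row[5]),  # UnitPrice
                row[6],         # CustomerID
                row[7],         # Country
            ])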

labels_index

{0: 'InvoiceNo',
 1: 'StockCode',
 2: 'Description',
 3: 'Quantity',
 4: 'InvoiceDate',
 5: 'UnitPrice',
 6: 'CustomerID',
 7: 'Country'}

In [24]:
len(retail_data)

532621

2. Now that you have a dataset of items, you should aggregate it at an “invoice” level.  
For each invoice (identified by InvoiceNo) there can be multiple items (from multiple rows) in the dataset.  
For each invoice, you should build a list of all items belonging to it.

In [25]:
# distinct characters appearing in InvoiceNo values
set(ch for row in retail_data for ch in row[0])

{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A'}

In [3]:
invoice_itemset = {}    # InvoiceNo -> list of product descriptions
global_itemset = set()  # all distinct product descriptions
for row in retail_data:
    invoice = row[0]
    stock_desc = row[2]
    if invoice in invoice_itemset:
        invoice_itemset[invoice].append(stock_desc)
    else:
        invoice_itemset[invoice] = [stock_desc]
    global_itemset.add(stock_desc)

In [4]:
invoice_itemset["574021"]

['GARDENERS KNEELING PAD KEEP CALM ',
 'HOT WATER BOTTLE KEEP CALM',
 'DOORMAT KEEP CALM AND COME IN']

In [5]:
len(global_itemset), len(invoice_itemset)

(4208, 22064)

3. You should now have a list (one for each invoice) of lists (each list containing the items bought for that invoice).  
Now, we need to convert this into a matrix form.  
Of the many possible formats, we will use the one expected by the Mlxtend library, which is as follows.  
Given an ordered list of M possible items (in this case, all possible products that can be bought), and given N itemsets (in this case, invoices), we should build a matrix of N rows and M columns.  
The element at the i-th row and j-th column should be 1 if the i-th itemset (invoice) contains the j-th item (product), 0 otherwise.
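The layout described above can be sketched in plain Python. This version (the `one_hot` helper is hypothetical) sorts the items so that the column order is reproducible across runs; as a library alternative, mlxtend also provides `mlxtend.preprocessing.TransactionEncoder`, which builds a boolean matrix of the same shape.

```python
def one_hot(transactions):
    """Encode transactions as an N x M 0/1 matrix (rows = transactions, columns = items).

    Columns are the sorted distinct items, so the order is reproducible
    (iterating a plain set gives no such guarantee across runs).
    """
    columns = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)  # O(1) membership; repeated items collapse to a single 1
        rows.append([1 if item in present else 0 for item in columns])
    return columns, rows
```

With pandas, `columns, rows = one_hot(invoice_itemset.values())` followed by `pd.DataFrame(rows, columns=columns)` would give a DataFrame in the format Mlxtend expects.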

In [6]:
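# one row per invoice, one 0/1 column per product;
# iterating the same unmodified set yields the same column order within a session
mat = []
for inv in invoice_itemset.values():
    inv_items = set(inv)  # faster membership tests than on the list
    mat.append([1 if gx in inv_items else 0 for gx in global_itemset])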

In [7]:
len(mat[0]), len(mat)

(4208, 22064)

In [8]:
df = pd.DataFrame(data=mat, columns=list(global_itemset))  # pandas does not accept a set as columns

In [9]:
df.head()

Unnamed: 0,Unnamed: 1,SILVER LATTICE VANILLA CANDLE POT,MAGNETS PACK OF 4 RETRO PHOTO,IVORY PAPER CUP CAKE CASES,BINGO SET,CREAM CLIMBING HYDRANGA ART FLOWER,ALPHABET HEARTS STICKER SHEET,BLUE BUNNY EASTER EGG BASKET,3 TIER CAKE TIN RED AND CREAM,ENCHANTED BIRD COATHANGER 5 HOOK,...,VICTORIAN METAL POSTCARD SPRING,PACKING CHARGE,ROSE FLOWER CANDLE+INCENSE 16X16CM,VINTAGE UNION JACK DOORSTOP,LOVE GARLAND PAINTED ZINC,FOLKART ZINC STAR CHRISTMAS DEC,REGENCY SUGAR BOWL GREEN,NUMBER TILE VINTAGE FONT 0,REGENCY TEA PLATE GREEN,ETCHED GLASS STAR TREE DECORATION
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


4. With the df that you defined in the previous exercise,  
you can now use the fpgrowth function, which is described in detail in the official documentation.  
The first argument required is the previously built DataFrame, df.  
The second is the minimum support (minsup), i.e. the minimum fraction of the entire dataset in which the itemset should show up for it to be considered “frequent”.  
Try using different values of minsup, such as 0.5, 0.1, 0.05, 0.02, 0.01.  
How many results do you obtain as minsup varies?  
You can check the number of frequent itemsets identified and print them all with the following code snippet:
```
    fi = fpgrowth(df, 0.05)
    print(len(fi))
    print(fi.to_string())
```


In [10]:
for x in (0.5,0.1,0.05,0.02,0.01):
    print(x, len(fpgrowth(df, x)))

0.5 0
0.1 1
0.05 23
0.02 303
0.01 1472


In [11]:
fi = fpgrowth(df, 0.1, use_colnames= True)


In [12]:
fi

Unnamed: 0,support,itemsets
0,0.102429,(WHITE HANGING HEART T-LIGHT HOLDER)


In [101]:
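# StockCode -> first Description seen for it (built once, so each lookup is O(1))
code_to_desc = {}
for row in retail_data:
    code_to_desc.setdefault(row[1], row[2])

def get_desc(item):
    return code_to_desc.get(item)

def itemCode_to_desc(items_Set):
    return [get_desc(i) for i in items_Set]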
    

In [102]:
# let's add the description of each item (the itemsets must contain StockCodes)
fi["description"] = fi.itemsets.apply(itemCode_to_desc)

In [115]:
pd.set_option("max_colwidth", 100)


In [116]:
fi

Unnamed: 0,support,itemsets,description
0,0.086718,(85123A),[WHITE HANGING HEART T-LIGHT HOLDER]
1,0.056680,(84879),[ASSORTED COLOUR BIRD ORNAMENT]
2,0.030386,(21754),[HOME BUILDING BLOCK WORD]
3,0.024363,(21755),[LOVE BUILDING BLOCK WORD]
4,0.023243,(48187),[DOORMAT NEW ENGLAND]
...,...,...,...
215,0.021197,"(22697, 22699, 22698)","[GREEN REGENCY TEACUP AND SAUCER, ROSES REGENCY TEACUP AND SAUCER , PINK REGENCY TEACUP AND SAUCER]"
216,0.021429,"(23199, 85099B)","[JUMBO BAG APPLES, JUMBO BAG RED RETROSPOT]"
217,0.020000,"(23202, 23203)","[mailout, mailout]"
218,0.022471,"(85099B, 23203)","[JUMBO BAG RED RETROSPOT, mailout]"


In [142]:
filt = fi.itemsets.apply(len) > 1
fi[filt].shape

(36, 3)

Given a pair such as (23199, 85099B), we can check whether either rule holds:  
23199 ==> 85099B  
or  
85099B ==> 23199  

Let's say minconf = 0.5.
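For reference, the confidence of a rule $A \Rightarrow C$ divides the support of the combined itemset by the support of the antecedent, so it estimates the conditional probability $P(C \mid A)$; a rule is kept when its confidence is at least minconf:

$$\mathrm{conf}(A \Rightarrow C) = \frac{\mathrm{sup}(A \cup C)}{\mathrm{sup}(A)}$$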

In [133]:
# Get all the supports
filt = fi.itemsets == frozenset(["23199", "85099B"])
P_23199_85099B = fi.loc[filt, "support"].values[0]
filt = fi.itemsets == frozenset(["23199"])
P_23199 = fi.loc[filt, "support"].values[0]
filt = fi.itemsets == frozenset(["85099B"])
P_85099B = fi.loc[filt, "support"].values[0]


In [135]:
# 23199 ==> 85099B
conf = P_23199_85099B / P_23199
conf

0.5555555555555556

In [136]:
# 85099B ==> 23199
conf = P_23199_85099B / P_85099B
conf

0.25995316159250587

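The rule 23199 ==> 85099B has confidence of about 0.56 ≥ minconf, so it holds: roughly 56% of the invoices containing 23199 also contain 85099B. The reverse rule 85099B ==> 23199 (confidence of about 0.26) does not reach the threshold.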

In [151]:
association_rules(fi, min_threshold=0.5)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,(22726),(22727),0.038726,0.041737,0.024942,0.644068,15.431412,0.023326,2.692261
1,(22727),(22726),0.041737,0.038726,0.024942,0.597595,15.431412,0.023326,2.388821
2,(22386),(85099B),0.047529,0.082432,0.032162,0.676686,8.208973,0.028244,2.838004
3,(21931),(85099B),0.046371,0.082432,0.028301,0.610325,7.403939,0.024479,2.354698
4,(85099C),(85099B),0.036564,0.082432,0.022896,0.626188,7.596379,0.019882,2.454623
5,(21929),(85099B),0.033861,0.082432,0.020154,0.595211,7.220592,0.017363,2.26678
6,(22411),(85099B),0.04583,0.082432,0.026371,0.5754,6.980264,0.022593,2.161017
7,(22910),(22086),0.032124,0.045174,0.021429,0.667067,14.766704,0.019977,2.867926
8,(22384),(20725),0.042857,0.062085,0.023668,0.552252,8.895108,0.021007,2.09474
9,(20726),(20725),0.040039,0.062085,0.020541,0.513018,8.263168,0.018055,1.925976


7. Extract the association rules from the frequent itemsets extracted with minsup = 0.01.  
association_rules() is described in the official documentation.  
You can use the confidence as the metric to identify the rules, and a minimum threshold of 0.85  
(feel free to vary these values and observe how the results change).
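To see what association_rules() computes, here is a minimal pure-Python sketch of rule generation (an illustration, not mlxtend's implementation): every frequent itemset is split into a non-empty antecedent and consequent, and the rule is kept if its confidence reaches the threshold. The `generate_rules` name and the tuple output format are hypothetical; with an mlxtend result, the input map could be built as `dict(zip(fi2["itemsets"], fi2["support"]))`, since the `itemsets` column holds frozensets.

```python
from itertools import combinations

def generate_rules(supports, min_confidence):
    """Derive association rules from frequent itemsets, filtered by confidence.

    `supports` maps frozenset itemsets to their support. Every subset of a
    frequent itemset is frequent too, so antecedent and consequent supports are in the map.
    Returns (antecedent, consequent, support, confidence, lift) tuples.
    """
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        # every non-empty proper subset can serve as the antecedent
        for k in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), k):
                antecedent = frozenset(ante)
                consequent = itemset - antecedent
                confidence = sup / supports[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / supports[consequent]
                    rules.append((antecedent, consequent, sup, confidence, lift))
    return rules
```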


In [143]:
fi2 = fpgrowth(df, min_support= 0.01, use_colnames= True)

In [145]:
? association_rules

Signature:
association_rules(
    df,
    metric='confidence',
    min_threshold=0.8,
    support_only=False,
)
Docstring:
Generates a DataFrame of association rules including the
metrics 'score', 'confidence', and 'lift'

Parameters
-----------
df : pandas DataFrame
  pandas DataFrame of frequent itemsets
  with columns ['support', 'itemsets']

metric : string (default: 'confidence')
  Metric to evaluate if a rule is of interest.
  **Automatically set to 'support' if `support_only=True`.**
  Otherwise, supported metrics are 'support', 'confidence', 'lift',
  'leverage', and 'conviction'
  These metrics are computed as follows:

  - support(A->C) = support(A+C) [aka 'support'], range: [0, 1]

  - confidence(A->C) =

In [156]:
association_rules(fi2, min_threshold=0.85)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,leverage,conviction
0,"(20723, 22356)",(20724),0.014749,0.040541,0.012664,0.858639,21.179756,0.012066,6.787287
1,"(20723, 22355, 20719)",(20724),0.011737,0.040541,0.010077,0.858553,21.177632,0.009601,6.783155
2,"(21931, 22386, 22411)",(85099B),0.011969,0.082432,0.010386,0.867742,10.526705,0.009399,6.937706
3,"(21086, 21080)",(21094),0.011429,0.020347,0.010232,0.89527,43.999051,0.009999,9.354101
4,"(22698, 22423)",(22697),0.015521,0.040811,0.013359,0.860697,21.089915,0.012726,6.885608
5,"(22697, 22698, 22423)",(22699),0.013359,0.043243,0.011699,0.875723,20.251084,0.011121,7.698554
6,"(22698, 22699, 22423)",(22697),0.013012,0.040811,0.011699,0.89911,22.031167,0.011168,9.507258
7,"(22697, 22698)",(22699),0.024865,0.043243,0.021197,0.852484,19.713703,0.020122,6.485804
8,"(22698, 22699)",(22697),0.023707,0.040811,0.021197,0.894137,21.909313,0.020229,9.060649
9,(23172),(23171),0.012124,0.014903,0.010888,0.898089,60.260387,0.010707,9.66626


8. (*) Rerun the experiments from point 4 with apriori().  
Do the results match the ones found by FP-Growth?  
Is Apriori faster or slower than FP-Growth?  
You can measure how long a function call takes with the following code snippet:
```
    import timeit
    # number=1 means that it executes the function only once
    timeit.timeit(lambda: apriori(df, 0.01), number=1)
```


In [158]:
? apriori

Signature:
apriori(
    df,
    min_support=0.5,
    use_colnames=False,
    max_len=None,
    verbose=0,
    low_memory=False,
)
Docstring:
Get frequent itemsets from a one-hot DataFrame

Parameters
-----------
df : pandas DataFrame
  pandas DataFrame the encoded format. Also supports
  DataFrames with sparse data; for more info, please
  see (https://pandas.pydata.org/pandas-docs/stable/
       user_guide/sparse.html#sparse-data-structures)

  Please note that the old pandas SparseDataFrame format
  is no longer supported in mlxtend >= 0.17.2.

  The allowed values are either 0/1 or True/False.
  For example,



In [161]:
ap_fi = apriori(df, min_support= 0.02)
ap_fi

Unnamed: 0,support,itemsets
0,0.033861,(0)
1,0.020077,(8)
2,0.021158,(9)
3,0.020193,(15)
4,0.048880,(23)
...,...,...
215,0.021081,"(2698, 3366)"
216,0.032162,"(2698, 3371)"
217,0.021429,"(3666, 2884)"
218,0.030270,"(4057, 3972)"


In [162]:
? timeit.timeit

Signature:
timeit.timeit(
    stmt='pass',
    setup='pass',
    timer=<built-in function perf_counter>,
    number=1000000,
    globals=None,
)
Docstring: Convenience function to create Timer object and call timeit method.
File:      c:\programdata\anaconda3\lib\timeit.py
Type:      function


In [166]:
timeit.timeit(lambda: apriori(df, 0.02), number=1)


44.723913699999684

In [167]:
timeit.timeit(lambda: fpgrowth(df, 0.02), number=1)


11.427067700000407
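On this single run at minsup = 0.02, Apriori (about 44.7 s) was roughly four times slower than FP-Growth (about 11.4 s); one timing is only a rough measurement. To check whether the results match, the two outputs can be compared as sets of itemsets, because the functions list them in different orders. A minimal sketch, assuming only that each result exposes `itemsets` and `support` columns (as the mlxtend DataFrames do); the helper names are hypothetical.

```python
import math

def to_support_map(fi):
    """Map each itemset (as a frozenset) to its support.

    `fi` only needs "itemsets" and "support" columns, such as the DataFrames
    returned by apriori() and fpgrowth().
    """
    return {frozenset(s): float(sup) for s, sup in zip(fi["itemsets"], fi["support"])}

def same_frequent_itemsets(fi_a, fi_b, tol=1e-9):
    """True if both results hold the same itemsets with (nearly) equal supports.

    Row order is ignored, since the two algorithms enumerate itemsets differently.
    """
    a, b = to_support_map(fi_a), to_support_map(fi_b)
    if a.keys() != b.keys():
        return False
    return all(math.isclose(a[k], b[k], abs_tol=tol) for k in a)
```

For example, `same_frequent_itemsets(apriori(df, 0.02), fpgrowth(df, 0.02))` should return True if both implementations find the same frequent itemsets.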

## 2.2 Apriori implementation

In [2]:
from itertools import combinations, permutations
from collections import defaultdict

In [3]:
data = [
    {'a','b'},
    {'b','c','d'},
    {'a','c','d','e'},
    {'a','d','e'},
    {'a','b','c'},
    {'a','b','c','d'},
    {'b','c'},
    {'a','b','c'},
    {'a','b','d'},
    {'b','c','e'}
]

In [229]:
def scan(data, itemset, limit):
    """Scans the dataset and computes the support of the candidate itemsets.

    Parameters:
    data: list of all transactions (each a set of items)
    itemset: list of candidate itemsets (tuples) to be evaluated
    limit: float; minsup threshold

    Returns:
    a dict mapping each frequent candidate (tuple) to its support
    """
    N = len(data)
    C = {}
    for items in itemset:
        support_count = 0
        for t in data:
            # only transactions with at least as many items can contain `items`
            if len(t) >= len(items):
                support_count += all(x in t for x in items)
        if support_count / N >= limit:
            C[tuple(items)] = support_count / N
    return C


In [220]:
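def prune_by_subset(f_subsets, new_cands):
    """Prune step: keep a candidate only if all of its (k-1)-item subsets are frequent."""
    frequent = set(f_subsets)
    # build a new list: removing from new_cands while iterating over it would skip elements
    pruned = []
    for items in new_cands:
        subsets = combinations(items, len(items) - 1)
        if all(tuple(c) in frequent for c in subsets):
            pruned.append(items)
    return pruned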


In [187]:
def pref_matching(itemset):
    """Join step: build (k+1)-item candidates from frequent k-itemsets (sorted tuples)
    that share the same first k-1 items."""
    new_cands = []

    # Edge case, when creating candidates for level 2
    if len(itemset[0]) == 1:
        combs = combinations([t for l in itemset for t in l], 2)
        return list(combs)

    for i in range(len(itemset)):
        main_pref = itemset[i][:-1]
        for j in range(i+1, len(itemset)):
            pref = itemset[j][:-1]
            if set(pref) == set(main_pref):
                # same prefix: extend itemset[i] with the last item of itemset[j]
                new_cand = itemset[i] + (itemset[j][-1],)
                new_cands.append(new_cand)
    return new_cands

#pref_matching([('a', 'b'), ('a', 'c'), ('a', 'd'), ('a', 'e'), ('b', 'c'), ('b', 'd'), ('c', 'd'), ('c', 'e'), ('d', 'e')])



In [210]:
def main_apriori(data, minsup=0.1):
    """Apriori algorithm to find frequent itemsets.

    Parameters:
        data: unordered list of transactions, each a set of unique items
        minsup: minimum support for an itemset to be frequent,
            where support(X) = freq(X) / |data|

    Returns:
        A list of dictionaries, one per itemset size (1, 2, ...), where keys
        are tuples representing frequent itemsets and values are their support.
    """
    # level 1: every distinct item, in sorted order, is a candidate
    new_cands = [(w,) for w in sorted({x for subset in data for x in subset})]
    fr_items = [scan(data, new_cands, minsup)]

    # levels > 1: join, prune, count; stop at the first level with no frequent itemsets
    i = 0
    while fr_items[i] != {}:
        i += 1
        prev_fr_items = list(fr_items[i - 1].keys())
        new_cands = pref_matching(prev_fr_items)
        new_cands = prune_by_subset(prev_fr_items, new_cands)
        fr_items.append(scan(data, new_cands, minsup))

    # the last level is always empty, so drop it
    return fr_items[:-1]



In [8]:
from pprint import PrettyPrinter
pp = PrettyPrinter(width=20, compact=True, indent = 4)

In [230]:
fi = main_apriori(data, 0.2)
pp.pprint(fi)

[   {   ('a',): 0.7,
        ('b',): 0.8,
        ('c',): 0.7,
        ('d',): 0.5,
        ('e',): 0.3},
    {   ('a', 'b'): 0.5,
        ('a', 'c'): 0.4,
        ('a', 'd'): 0.4,
        ('a', 'e'): 0.2,
        ('b', 'c'): 0.6,
        ('b', 'd'): 0.3,
        ('c', 'd'): 0.3,
        ('c', 'e'): 0.2,
        ('d', 'e'): 0.2},
    {   ('a', 'b', 'c'): 0.3,
        ('a', 'b', 'd'): 0.2,
        ('a', 'c', 'd'): 0.2,
        ('a', 'd', 'e'): 0.2,
        ('b', 'c', 'd'): 0.2}]
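Because the toy dataset is tiny, the output above can be cross-checked by brute force: enumerate every possible itemset and compute its support directly. A minimal sketch (the `brute_force_frequent` helper is hypothetical); it is exponential in the number of distinct items, so it is only a testing aid, not an alternative to Apriori.

```python
from itertools import combinations

def brute_force_frequent(data, minsup):
    """Enumerate every itemset over the distinct items and keep those with support >= minsup.

    `data` is a list of sets. Keys are sorted tuples, values are supports.
    """
    items = sorted({x for t in data for x in t})
    n = len(data)
    result = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            # count the transactions that contain every item of the candidate
            sup = sum(set(cand) <= t for t in data) / n
            if sup >= minsup:
                result[cand] = sup
    return result
```

Flattening the Apriori output, e.g. `{k: v for level in fi for k, v in level.items()}`, and comparing it with `brute_force_frequent(data, 0.2)` should give equal dictionaries, since both use sorted tuples as keys.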


In [66]:
file = "coco.json"
with open(file) as f:
    coco_data = json.load(f)

In [67]:
coco_data_items = [set(image['annotations']) for image in coco_data]

In [68]:
coco_data_items[:2]

[{'car', 'stop sign', 'train'},
 {'bench', 'chair', 'dining table', 'person', 'potted plant'}]

In [222]:
coco_fi = main_apriori(coco_data_items, 0.02)
#pp.pprint(coco_fi)
sum([len(x) for x in coco_fi])

144

In [75]:
global_itemset = set([x for t in coco_data_items for x in t ])

In [76]:
len(global_itemset)

78

In [77]:

mat = []
for inv in coco_data_items:
    row = []
    for gx in global_itemset:
        if gx in inv:
            row.append(1)
        else:
            row.append(0)
    mat.append(row)   

In [78]:
df = pd.DataFrame(data=mat, columns=list(global_itemset))  # pandas does not accept a set as columns
df.head()

Unnamed: 0,person,scissors,mouse,tennis racket,banana,car,vase,skateboard,fire hydrant,oven,...,umbrella,bird,fork,couch,handbag,toothbrush,airplane,bear,clock,horse
0,0,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,1,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0


In [88]:
print(fpgrowth(df, 0.1, use_colnames= True).to_string())

    support                      itemsets
0    0.3704                         (car)
1    0.1332                   (stop sign)
2    0.5886                      (person)
3    0.4338                       (bench)
4    0.1346                (fire hydrant)
5    0.1286                       (truck)
6    0.3230               (traffic light)
7    0.1230                     (handbag)
8    0.2386                 (car, person)
9    0.3208               (person, bench)
10   0.1032                  (car, truck)
11   0.1978          (traffic light, car)
12   0.1902       (traffic light, person)
13   0.1362  (traffic light, car, person)
14   0.1224             (person, handbag)


In [83]:
print(apriori(df, 0.02, use_colnames= True).to_string())

     support                                       itemsets
0     0.5886                                       (person)
1     0.0214                                (tennis racket)
2     0.3704                                          (car)
3     0.0344                                   (skateboard)
4     0.1346                                 (fire hydrant)
5     0.3230                                (traffic light)
6     0.0354                                   (cell phone)
7     0.4338                                        (bench)
8     0.0602                                        (chair)
9     0.0852                                     (backpack)
10    0.0300                               (baseball glove)
11    0.1286                                        (truck)
12    0.0368                                  (sports ball)
13    0.0762                                      (bicycle)
14    0.0276                                          (dog)
15    0.0410                            

In [233]:
timeit.timeit(lambda: main_apriori(coco_data_items, 0.02), number= 10)/10

3.0361205699999116

In [224]:
timeit.timeit(lambda: apriori(df, 0.02, use_colnames= True), number= 1)

0.38145710000026156

In [225]:
timeit.timeit(lambda: fpgrowth(df, 0.02, use_colnames= True), number= 1)

0.22929500000100234
