## Example of using the "Trie of rules" library

There are two ways to use the library:

1) given a dataset and a minimum support value, create a Trie of rules from association rules generated automatically with the popular mlxtend library (http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/);

2) given the original dataset and previously mined association rules (or frequent sequences), create a Trie of rules from them. Each rule in the rule set should be given as a list of its items.

Both approaches are provided below.
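
To make the role of the minimum support value in option 1 concrete, here is a minimal pure-Python sketch of how the support of an itemset can be computed. It uses neither the library nor mlxtend; the helper `support` and the variable `sample_transactions` are hypothetical names introduced only for this illustration.

```python
# Illustrative only: a hand-rolled support computation, not the library's code.
sample_transactions = [
    ['f', 'c', 'a', 'm', 'p', 'q'],
    ['f', 'c', 'a', 'b', 'm'],
    ['f', 'b', 'e'],
    ['c', 'b', 'p'],
    ['f', 'c', 'a', 'm', 'p'],
]


def support(itemset, transactions):
    """Return the fraction of transactions that contain every item of itemset."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted.issubset(t))
    return hits / len(transactions)


print(support(['f', 'c'], sample_transactions))       # 3 of 5 transactions -> 0.6
print(support(['f', 'c', 'q'], sample_transactions))  # 1 of 5 transactions -> 0.2
```

With a minimum support of 0.3, an itemset such as `['f', 'c']` would count as frequent, while `['f', 'c', 'q']` would not.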


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from trieofrules import trieofrules

#original dataset
data = [
        ['f','c','a','m','p','q'],
        ['f','c','a','b','m'],
        ['f','b','e'],
        ['c','b','p'],
        ['f','c','a','m','p'] ]

#association rules for visualisation presented as a list of frequent sequences
rules = [['f','c','a','m','p'],
        ['f','b'],
        ['c','b'],
        ['f','c','q']
        ]

#1st approach using only data
TOR = trieofrules(data = data, min_support = 0.3, alg='FP-max' ) #supported algorithms: FP-max, FP-growth, Apriori

#2nd approach using data and pre-mined frequent sequences
TOR_premined = trieofrules(data = data, alg = 'Apriori', frequent_sequences = rules)

#draw the trie of rules without metrics
TOR_premined.draw()

#save as a graph file. Supported formats: gexf, gml, graphml
TOR.save_graph(filename = 'Trie of rules example.gml',fileformat = 'gml')
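
The idea of storing rules in a trie can be sketched in a few lines of plain Python: sequences that start with the same items share a path from the root. This is only a conceptual illustration using nested dictionaries. It is not the internal data structure of `trieofrules`, and `build_trie` and `example_rules` are hypothetical names.

```python
# Conceptual sketch only; the real library may represent its trie differently.
example_rules = [
    ['f', 'c', 'a', 'm', 'p'],
    ['f', 'b'],
    ['c', 'b'],
    ['f', 'c', 'q'],
]


def build_trie(sequences):
    """Insert each sequence into a nested-dict trie, one level per item."""
    root = {}
    for sequence in sequences:
        node = root
        for item in sequence:
            # Reuse the child if this prefix already exists, else create it.
            node = node.setdefault(item, {})
    return root


trie = build_trie(example_rules)
print(sorted(trie))            # items that start a rule: ['c', 'f']
print(sorted(trie['f']))       # items that follow 'f': ['b', 'c']
print(sorted(trie['f']['c']))  # items that follow 'f', 'c': ['a', 'q']
```

The sequences `['f', 'c', 'a', 'm', 'p']` and `['f', 'c', 'q']` share the prefix `f`, `c`, so they branch only at the third level.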
 

## Bakery dataset example

In [3]:
import pandas as pd

data = pd.read_csv('datasets/BreadBasket_DMS.csv')
data.head()
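# Group the rows by transaction ID and collect the items of each transaction,
# skipping the "NONE" placeholder entries
transactions = []
items = set()
number_of_dirty_transactions = 0
for _, group in data.groupby('Transaction'):
    list_of_items = []

    for item in group['Item']:
        if item != "NONE":
            list_of_items.append(item)
            items.add(item)
    transactions.append(list_of_items)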
 

# Remove repeated items within each transaction, keeping first-occurrence order
transactions_without_dublicates = []
for transaction in transactions:
    transactions_without_dublicates.append(list(dict.fromkeys(transaction)))
print(len(transactions_without_dublicates))


9531
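
The de-duplication step relies on `dict.fromkeys`, which keeps only the first occurrence of each key and, since Python 3.7, preserves insertion order. A small stand-alone illustration follows; the values in `transaction` are chosen for illustration only and are not read from the dataset.

```python
# Example values chosen for illustration only.
transaction = ['Coffee', 'Bread', 'Coffee', 'Tea', 'Bread']

# dict keys are unique and keep insertion order (guaranteed since Python 3.7),
# so converting back to a list drops repeats without reordering the items.
deduplicated = list(dict.fromkeys(transaction))
print(deduplicated)  # ['Coffee', 'Bread', 'Tea']

# set() also removes repeats, but does not guarantee any particular order.
assert set(deduplicated) == set(transaction)
```

Preserving order matters here only if the order of items within a transaction is meaningful downstream; for pure itemset mining, a `set` would give the same itemsets.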


In [4]:
TOR_bakery = trieofrules(data = transactions_without_dublicates, min_support = 0.02, alg='FP-max' ) 
TOR_bakery.save()


In [5]:
import math
for i in [i/1000 for i in range(1,1000,10)]:
    print(math.log(i+1))

0.0009995003330834232
0.010939940038334263
0.02078253918252841
0.03052920503482279
0.04018178963283176
0.04974209189481401
0.05921185963184603
0.06859279146561167
0.0778865386570712
0.08709470685093373
0.0962188577405429
0.10526051065749294
0.11422114409002286
0.1231021971339834
0.1319050708799386
0.14063112973974562
0.14928170271575447
0.15785808461558032
0.16636153721522529
0.1747932903731631
0.18315454309784654
0.19144646457095527
0.1996701951285677
0.20782684720231653
0.21591750622247025
0.223943231484774
0.2319050569827826
0.23980399220731693
0.2476410229145973
0.2554171118645054
0.2631331995303682
0.27079020478156274
0.27838902554018824
0.2859305394129745
0.29341560429954144
0.3008450589780618
0.30821972366932904
0.31554040058017735
0.3228078744271551
0.33002291294130587
0.33718626735486995
0.34429867287067695
0.3513608491149636
0.3583735005743139
0.3653373170173851
0.37225297390205087
0.37912113276856246
0.3859424416193005
0.39271753528566167
0.39944703578260143
0.40613155265132