# Association Rule Analysis
Association Rule - simple I1, I2, I3, I4, I5
Association rule mining is a technique used to uncover hidden relationships between variables in large datasets. It is a popular method in data mining and machine learning and has a wide range of applications in various fields, such as market basket analysis, customer segmentation, and fraud detection.
https://www.datacamp.com/tutorial/association-rule-mining-python
Association rule mining is a technique used to identify patterns in large data sets. It involves finding relationships between variables in the data and using those relationships to make predictions or decisions. The goal of association rule mining is to uncover rules that describe the relationships between different items in the data set.
In retail, it is used to identify items that are frequently purchased together. For example, the rule "If a customer buys bread, they are also likely to buy milk" is an association rule that could be mined from a store's transaction data. We can use such rules to inform decisions about store layout, product placement, and marketing efforts.
It typically involves using algorithms to analyze the data and identify the relationships. These algorithms can be based on statistical methods or machine learning techniques. The resulting rules are often expressed in the form of "if-then" statements, where the "if" part represents the antecedent (the condition being tested) and the "then" part represents the consequent (the outcome that occurs if the condition is met).

## Use Cases
### Market Basket Analysis
One of the most well-known applications of association rule mining is in market basket analysis. This involves analyzing the items customers purchase together to understand their purchasing habits and preferences. 
For example, a retailer might use association rule mining to discover that customers who purchase diapers are also likely to purchase baby formula. We can use this information to optimize product placements and promotions to increase sales.
https://www.datacamp.com/tutorial/market-basket-analysis-r

### Customer Segmentation
Association rule mining can also be used to segment customers based on their purchasing habits. 
For example, a company might use association rule mining to discover that customers who purchase certain types of products are more likely to be younger. Similarly, they could learn that customers who purchase certain combinations of products are more likely to be located in specific geographic regions. 
We can use this information to tailor marketing campaigns and personalized recommendations to specific customer segments.
https://www.datacamp.com/tutorial/introduction-customer-segmentation-python

### Fraud Detection
You can also use association rule mining to detect fraudulent activity. For example, a credit card company might use association rule mining to identify patterns of fraudulent transactions, such as multiple purchases from the same merchant within a short period of time. 
We can then use this information to flag potentially fraudulent activity and take preventative measures to protect customers.
https://www.datacamp.com/blog/data-science-in-banking

### Social Network Analysis
Various companies use association rule mining to identify patterns in social media data that can inform the analysis of social networks. 
For example, an analysis of Twitter data might reveal that users who tweet about a particular topic are also likely to tweet about other related topics, which could inform the identification of groups or communities within the network.

### Recommendation Systems
Association rule mining can be used to suggest items that a customer might be interested in based on their past purchases or browsing history. For example, a music streaming service might use association rule mining to recommend new artists or albums to a user based on their listening history.
https://www.datacamp.com/tutorial/recommender-systems-python

## AR Algorithms : Py(apyori), R (arules)
### Apriori Algorithm
The Apriori algorithm is one of the most widely used algorithms for association rule mining. It works by first identifying the frequent itemsets in the dataset (itemsets that appear in at least a minimum number of transactions). It then uses these frequent itemsets to generate association rules, which are statements of the form "if item A is purchased, then item B is also likely to be purchased." The Apriori algorithm uses a bottom-up approach, starting with individual items and gradually building up to larger itemsets.
 
### FP-growth Algorithm 
The FP-Growth (Frequent Pattern Growth) algorithm is another popular algorithm for association rule mining. It works by constructing a tree-like structure called an FP-tree, which stores the transactions in a compressed form. The frequent itemsets are mined directly from the FP-tree without generating candidate itemsets, and association rules are then derived from them in the same way as with the Apriori algorithm. The FP-Growth algorithm is generally faster than the Apriori algorithm, especially for large datasets.

### ECLAT Algorithm
The ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal) algorithm is an alternative to the Apriori algorithm that searches the itemset lattice depth-first rather than level by level. It stores the data in a vertical format, where each item is mapped to the set of IDs of the transactions that contain it, and groups itemsets that share a common prefix into equivalence classes. The support of a larger itemset is obtained by intersecting the transaction-ID sets of its parts, so the database does not have to be rescanned for every candidate. This often makes it faster than the Apriori algorithm, although the transaction-ID sets can require a lot of memory on large datasets.
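ECLAT is usually described in terms of a vertical layout (item to transaction IDs) and depth-first extension by intersecting those ID sets. The sketch below illustrates that idea on the I1..I5 toy transactions used in this notebook. The function name `eclat` and the `min_count` parameter are illustrative choices, not part of any library.

```python
def eclat(transactions, min_count):
    """Return {frozenset(itemset): support count} via tidset intersection."""
    # Vertical layout: item -> set of IDs of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that may extend the current prefix
        for i, (item, tids) in enumerate(candidates):
            if len(tids) < min_count:
                continue
            itemset = prefix | {item}
            frequent[frozenset(itemset)] = len(tids)
            # Intersect with the later items to build the next equivalence class.
            narrower = [(other, tids & other_tids)
                        for other, other_tids in candidates[i + 1:]]
            extend(itemset, narrower)

    extend(set(), sorted(tidsets.items()))
    return frequent


transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
result = eclat(transactions, min_count=2)
print(result[frozenset({"I1", "I2"})])  # support count of {I1, I2}
```

No transaction is rescanned after the vertical layout is built; every support count comes from a set intersection.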

## Apriori Algorithm
### min_sp
It starts by setting the minimum support threshold. This is the minimum number (or fraction) of transactions in which an itemset must occur in order to be considered a frequent itemset. The algorithm then filters out any candidate itemsets that do not meet the minimum support threshold.
### combination & their counts
The algorithm then combines the frequent itemsets found so far into larger candidate itemsets and counts the number of times each candidate appears in the database, repeating this until no larger frequent itemsets are found. It then generates a list of association rules based on the frequent itemsets.
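The thresholding and level-wise counting described so far can be sketched in plain Python. This is a minimal illustration on the I1..I5 toy transactions, not the implementation used by any of the libraries discussed here. The name `apriori_frequent_itemsets` is hypothetical.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for itemsets meeting min_support."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while candidates:
        # Count each candidate and keep those meeting the threshold.
        level = {}
        for candidate in candidates:
            count = sum(1 for basket in baskets if candidate <= basket)
            if count / n >= min_support:
                level[candidate] = count / n
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
itemsets = apriori_frequent_itemsets(transactions, min_support=0.2)
for itemset, sup in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(sup, 3))
```

The pruning step is what gives Apriori its name: a candidate is only counted if all of its smaller subsets are already known to be frequent.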
### if-then rules
An association rule is a statement of the form "if item A is present in a transaction, then item B is also likely to be present". The strength of the association is measured using the confidence of the rule, which is the probability that item B is present given that item A is present.
### min_conf
The algorithm then filters out any association rules that do not meet a minimum confidence threshold. The rules that remain are referred to as strong association rules. Finally, the algorithm returns the list of strong association rules as output.
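To make the rule-generation and confidence-filtering steps concrete, here is a small sketch that splits one frequent itemset into every antecedent/consequent pair and keeps the rules that meet a minimum confidence. The helper names `count` and `rules_from_itemset` are made up for this illustration, and the data are the I1..I5 toy transactions.

```python
from itertools import combinations

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def count(itemset):
    # Number of transactions containing every item of `itemset`.
    return sum(1 for t in transactions if itemset <= t)


def rules_from_itemset(itemset, min_confidence):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            # confidence = P(consequent | antecedent)
            confidence = count(itemset) / count(antecedent)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, confidence))
    return rules


strong = rules_from_itemset({"I1", "I2", "I5"}, min_confidence=0.6)
for antecedent, consequent, confidence in strong:
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 3))
```

A full implementation would run this over every frequent itemset with at least two items.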

## Metrics for Evaluating Association Rules
Metrics can be used to evaluate the quality and importance of association rules and to select the most relevant rules for a given application. It is important to note that the appropriate choice of metric will depend on the specific goals and requirements of the application. Interpreting the results of association rule mining metrics requires understanding the meaning and implications of each metric, as well as how to use them to evaluate the quality and importance of the discovered association rules. 
### Support
Support is a measure of how frequently an item or itemset appears in the dataset. It is calculated as the number of transactions containing the item(s) divided by the total number of transactions in the dataset. High support indicates that an item or itemset is common in the dataset, while low support indicates that it is rare.

$$\mathrm{Support}(X \rightarrow Y) = \frac{n(X \cup Y)}{N}$$

Here $n(X \cup Y)$ is the number of transactions that contain all items of both $X$ and $Y$, and $N$ is the total number of transactions.
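As a quick check of the definition, support can be computed by direct counting. The sketch below uses the I1..I5 toy transactions from this notebook. The function name `support` is just an illustrative helper.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
print(round(support({"I1", "I2"}, transactions), 3))  # 4 of 9 transactions
```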

### Confidence
Confidence is a measure of the strength of the association between two items. It is calculated as the number of transactions containing both items divided by the number of transactions containing the first item. High confidence indicates that the presence of the first item is a strong predictor of the presence of the second item.

$$\mathrm{Confidence}(X \rightarrow Y) = \frac{n(X \cup Y)}{n(X)}, \qquad \mathrm{Confidence}(Y \rightarrow X) = \frac{n(X \cup Y)}{n(Y)}$$

### Lift
Lift is a measure of the strength of the association between two items, taking into account the frequency of both items in the dataset. It is calculated as the confidence of the association divided by the support of the second item. Lift is used to compare the strength of the association between two items to the expected strength of the association if the items were independent.

$$\mathrm{Lift}(X \rightarrow Y) = \frac{n(X \cup Y) / n(X)}{n(Y) / N}$$

lift > 1 : Positive association between X & Y
A lift value greater than 1 indicates that the two items occur together more often than expected based on the frequency of the individual items. This suggests that the association may be meaningful and worth further investigation.
A lift value of 1 indicates that the items occur together about as often as expected if they were independent, so the rule adds no information.
A lift value less than 1 indicates that the items occur together less often than expected, that is, a negative association.
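The sketch below computes confidence and lift by counting on the I1..I5 toy transactions and labels each rule according to where its lift falls relative to 1. The helper names (`count`, `confidence`, `lift`, `interpret`) and the tolerance are assumptions made for this illustration.

```python
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
N = len(transactions)


def count(itemset):
    # Number of transactions containing every item of `itemset`.
    return sum(1 for t in transactions if itemset <= t)


def confidence(x, y):
    return count(x | y) / count(x)


def lift(x, y):
    # Confidence of X -> Y divided by the support of Y.
    return confidence(x, y) / (count(y) / N)


def interpret(value, tol=1e-9):
    if abs(value - 1) <= tol:
        return "independent"
    return "positive" if value > 1 else "negative"


for x, y in [({"I5"}, {"I1"}), ({"I1"}, {"I3"}), ({"I1"}, {"I2"})]:
    value = lift(x, y)
    print(sorted(x), "->", sorted(y),
          "confidence", round(confidence(x, y), 3),
          "lift", round(value, 3), interpret(value))
```

Note that I1 -> I2 has a fairly high confidence and still a lift below 1, because I2 is present in most transactions anyway.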

## Libraries in Python
### apyori
A small library implementing the Apriori algorithm in Python. It works directly on a list of transactions and returns association rules together with their support, confidence, and lift.
### mlxtend
A library of machine learning algorithms and utilities in Python, including association rule mining. It provides `TransactionEncoder` for encoding transactional data, `apriori` for finding frequent itemsets, and `association_rules` for generating rules and evaluating their quality.
### pycaret
An open-source, low-code machine learning library in Python for automating machine learning workflows. It provides a wrapper on top of mlxtend for easy implementation of the Apriori algorithm.

## Libraries - Using mlxtend

In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sys
pd.set_option('display.max_columns', None)
import mlxtend
print("AR Library mlxtend present ", 'mlxtend' in sys.modules, ' Version', mlxtend.__version__)

AR Library mlxtend present  True  Version 0.21.0


### AR Libraries

In [4]:
#pip install mlxtend   #install this from anaconda prompt as Admin
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
import time
import logging

## Create Transactional Data

In [5]:
# this is list type, each [] represents 1 transaction with set of items
# if item is present its item ID is in the list
transactions = [['I1','I2','I5'],
                ['I2','I4'],
                ['I2','I3'] ,
                ['I1','I2','I4'],
                ['I1','I3'], 
                ['I2','I3'],
                ['I1','I3'], 
                ['I1','I2','I3','I5'],
                ['I1','I2','I3']]
transactions

[['I1', 'I2', 'I5'],
 ['I2', 'I4'],
 ['I2', 'I3'],
 ['I1', 'I2', 'I4'],
 ['I1', 'I3'],
 ['I2', 'I3'],
 ['I1', 'I3'],
 ['I1', 'I2', 'I3', 'I5'],
 ['I1', 'I2', 'I3']]

In [6]:
# Convert to one-hot (transaction) format, which apriori() requires
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
te_ary  # one column per distinct item, one row per transaction;
# True means the item is present in that transaction

array([[ True,  True, False, False,  True],
       [False,  True, False,  True, False],
       [False,  True,  True, False, False],
       [ True,  True, False,  True, False],
       [ True, False,  True, False, False],
       [False,  True,  True, False, False],
       [ True, False,  True, False, False],
       [ True,  True,  True, False,  True],
       [ True,  True,  True, False, False]])

In [7]:
te.columns_  #column order

['I1', 'I2', 'I3', 'I4', 'I5']
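To see what the encoding step produces without relying on mlxtend, the same boolean matrix can be built in a few lines of plain Python. This is only a sketch of the idea. `one_hot` is a hypothetical helper, and the sorted column order is chosen to mirror the `te.columns_` output shown above.

```python
def one_hot(transactions):
    """Return (columns, rows): rows[i][j] is True if item j is in transaction i."""
    columns = sorted({item for t in transactions for item in t})
    rows = [[item in t for item in columns] for t in transactions]
    return columns, rows


transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
columns, rows = one_hot(transactions)
print(columns)
print(rows[0])  # first transaction: I1, I2 and I5 present
```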

In [8]:
df = pd.DataFrame(te_ary, columns=te.columns_)
df  # view transactions in data frame format; True means item present

      I1     I2     I3     I4     I5
0   True   True  False  False   True
1  False   True  False   True  False
2  False   True   True  False  False
3   True   True  False   True  False
4   True  False   True  False  False
5  False   True   True  False  False
6   True  False   True  False  False
7   True   True   True  False   True
8   True   True   True  False  False


In [9]:
# matrix of transactions: True/False indicates whether each item is present in each transaction
df.shape

(9, 5)

In [10]:
# get back the original transactions
orgtrans1 = te_ary[:]
te.inverse_transform(orgtrans1)

[['I1', 'I2', 'I5'],
 ['I2', 'I4'],
 ['I2', 'I3'],
 ['I1', 'I2', 'I4'],
 ['I1', 'I3'],
 ['I2', 'I3'],
 ['I1', 'I3'],
 ['I1', 'I2', 'I3', 'I5'],
 ['I1', 'I2', 'I3']]

## Frequent Item Set

In [11]:
#%%% frequent itemsets - the most important step
# first find the frequent itemsets in their different combinations, then derive the rules from them
support_threshold = 0.01  # fraction of transactions
# e.g. a threshold of .1 on 9 transactions is .1 * 9 = 0.9 transactions,
# so any itemset that occurs at least once qualifies
#https://github.com/rasbt/mlxtend/blob/master/mlxtend/frequent_patterns/apriori.py
frequent_itemsets = apriori(df, min_support=support_threshold, use_colnames=True)
frequent_itemsets
# each itemset is listed with its support value, subject to the min support condition
# .44 for (I1, I2) means I1 and I2 are present together in 44% of the transactions

    support  itemsets
0  0.666667      (I1)
1  0.777778      (I2)
2  0.666667      (I3)
3  0.222222      (I4)
4  0.222222      (I5)
5  0.444444  (I1, I2)
6  0.444444  (I3, I1)
7  0.111111  (I4, I1)
8  0.222222  (I5, I1)
9  0.444444  (I3, I2)


In [12]:
print(frequent_itemsets) #dataframe with the itemsets

     support          itemsets
0   0.666667              (I1)
1   0.777778              (I2)
2   0.666667              (I3)
3   0.222222              (I4)
4   0.222222              (I5)
5   0.444444          (I1, I2)
6   0.444444          (I3, I1)
7   0.111111          (I4, I1)
8   0.222222          (I5, I1)
9   0.444444          (I3, I2)
10  0.222222          (I4, I2)
11  0.222222          (I5, I2)
12  0.111111          (I3, I5)
13  0.222222      (I3, I1, I2)
14  0.111111      (I4, I1, I2)
15  0.222222      (I5, I1, I2)
16  0.111111      (I3, I1, I5)
17  0.111111      (I3, I2, I5)
18  0.111111  (I3, I1, I2, I5)


In [13]:
#help(association_rules)

## Association Rules

### Support
Number of occurrences of the item (or itemset) / total number of transactions.
The higher the value, the more frequently the items were bought in that combination.

In [14]:
#output - DF with antecedents -> consequent
supportRules3 = association_rules(frequent_itemsets, metric="support", min_threshold = .4)
print(supportRules3)
# the 5th column is the support of the rule
# .44 in row 0 states that I1 and I2 are present together in 44% of transactions
# with metric="support" and min_threshold=.4, only rules with support >= .4 are listed

  antecedents consequents  antecedent support  consequent support   support  \
0        (I1)        (I2)            0.666667            0.777778  0.444444   
1        (I2)        (I1)            0.777778            0.666667  0.444444   
2        (I3)        (I1)            0.666667            0.666667  0.444444   
3        (I1)        (I3)            0.666667            0.666667  0.444444   
4        (I3)        (I2)            0.666667            0.777778  0.444444   
5        (I2)        (I3)            0.777778            0.666667  0.444444   

   confidence      lift  leverage  conviction  
0    0.666667  0.857143 -0.074074    0.666667  
1    0.571429  0.857143 -0.074074    0.777778  
2    0.666667  1.000000  0.000000    1.000000  
3    0.666667  1.000000  0.000000    1.000000  
4    0.666667  0.857143 -0.074074    0.666667  
5    0.571429  0.857143 -0.074074    0.777778  
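The `leverage` and `conviction` columns in this output are not covered in the metrics section above. As a sketch, assuming the commonly used definitions (leverage = support(X and Y) minus support(X) times support(Y); conviction = (1 minus support(Y)) divided by (1 minus confidence)), the values in row 0 can be reproduced by hand. The function names below are illustrative, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction


def leverage(sup_xy, sup_x, sup_y):
    # Difference between the observed joint support and the support
    # expected if X and Y were independent.
    return sup_xy - sup_x * sup_y


def conviction(sup_y, conf):
    # Undefined division when confidence is 1; reported as infinity.
    if conf == 1:
        return float("inf")
    return (1 - sup_y) / (1 - conf)


# Rule I1 -> I2 on the 9 toy transactions:
sup_x = Fraction(6, 9)    # support of I1
sup_y = Fraction(7, 9)    # support of I2
sup_xy = Fraction(4, 9)   # support of {I1, I2}
conf = sup_xy / sup_x     # confidence of I1 -> I2

print(round(float(leverage(sup_xy, sup_x, sup_y)), 6))
print(round(float(conviction(sup_y, conf)), 6))
```

A leverage of 0 and a conviction of 1 both correspond to independence, which is what rows 2 and 3 (I3 and I1) show.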


In [15]:
supportRules3.head()

  antecedents consequents  antecedent support  consequent support   support  confidence      lift  leverage  conviction
0        (I1)        (I2)            0.666667            0.777778  0.444444    0.666667  0.857143 -0.074074    0.666667
1        (I2)        (I1)            0.777778            0.666667  0.444444    0.571429  0.857143 -0.074074    0.777778
2        (I3)        (I1)            0.666667            0.666667  0.444444    0.666667  1.000000  0.000000    1.000000
3        (I1)        (I3)            0.666667            0.666667  0.444444    0.666667  1.000000  0.000000    1.000000
4        (I3)        (I2)            0.666667            0.777778  0.444444    0.666667  0.857143 -0.074074    0.666667


In [16]:
print(supportRules3[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
# Generally support, confidence and lift are used

  antecedents consequents   support  confidence      lift
0        (I1)        (I2)  0.444444    0.666667  0.857143
1        (I2)        (I1)  0.444444    0.571429  0.857143
2        (I3)        (I1)  0.444444    0.666667  1.000000
3        (I1)        (I3)  0.444444    0.666667  1.000000
4        (I3)        (I2)  0.444444    0.666667  0.857143
5        (I2)        (I3)  0.444444    0.571429  0.857143


In [17]:
### Support with different threshold values

In [18]:
supportRules2 = association_rules(frequent_itemsets, metric="support", min_threshold = .2)
print(supportRules2[['antecedents', 'consequents', 'support', 'confidence','lift']])

   antecedents consequents   support  confidence      lift
0         (I1)        (I2)  0.444444    0.666667  0.857143
1         (I2)        (I1)  0.444444    0.571429  0.857143
2         (I3)        (I1)  0.444444    0.666667  1.000000
3         (I1)        (I3)  0.444444    0.666667  1.000000
4         (I5)        (I1)  0.222222    1.000000  1.500000
5         (I1)        (I5)  0.222222    0.333333  1.500000
6         (I3)        (I2)  0.444444    0.666667  0.857143
7         (I2)        (I3)  0.444444    0.571429  0.857143
8         (I4)        (I2)  0.222222    1.000000  1.285714
9         (I2)        (I4)  0.222222    0.285714  1.285714
10        (I5)        (I2)  0.222222    1.000000  1.285714
11        (I2)        (I5)  0.222222    0.285714  1.285714
12    (I3, I1)        (I2)  0.222222    0.500000  0.642857
13    (I3, I2)        (I1)  0.222222    0.500000  0.750000
14    (I1, I2)        (I3)  0.222222    0.500000  0.750000
15        (I3)    (I1, I2)  0.222222    0.333333  0.750000

### Confidence

In [19]:
#%%%% Confidence
confidence6 = association_rules(frequent_itemsets, metric="confidence", min_threshold=.6)
#print(confidence6)
print(confidence6[['antecedents', 'consequents', 'support','confidence']])

     antecedents consequents   support  confidence
0           (I1)        (I2)  0.444444    0.666667
1           (I3)        (I1)  0.444444    0.666667
2           (I1)        (I3)  0.444444    0.666667
3           (I5)        (I1)  0.222222    1.000000
4           (I3)        (I2)  0.444444    0.666667
5           (I4)        (I2)  0.222222    1.000000
6           (I5)        (I2)  0.222222    1.000000
7       (I4, I1)        (I2)  0.111111    1.000000
8       (I5, I1)        (I2)  0.222222    1.000000
9       (I5, I2)        (I1)  0.222222    1.000000
10          (I5)    (I1, I2)  0.222222    1.000000
11      (I3, I5)        (I1)  0.111111    1.000000
12      (I3, I5)        (I2)  0.111111    1.000000
13  (I3, I1, I5)        (I2)  0.111111    1.000000
14  (I3, I2, I5)        (I1)  0.111111    1.000000
15      (I3, I5)    (I1, I2)  0.111111    1.000000


### Lift

In [20]:
#%%%% Lift  : generally > 1 for strong associations
lift1 = association_rules(frequent_itemsets, metric="lift", min_threshold=1)
#print(lift1)
print(lift1[['antecedents', 'consequents', 'support', 'lift', 'confidence']])

     antecedents   consequents   support      lift  confidence
0           (I3)          (I1)  0.444444  1.000000    0.666667
1           (I1)          (I3)  0.444444  1.000000    0.666667
2           (I5)          (I1)  0.222222  1.500000    1.000000
3           (I1)          (I5)  0.222222  1.500000    0.333333
4           (I4)          (I2)  0.222222  1.285714    1.000000
5           (I2)          (I4)  0.222222  1.285714    0.285714
6           (I5)          (I2)  0.222222  1.285714    1.000000
7           (I2)          (I5)  0.222222  1.285714    0.285714
8       (I4, I1)          (I2)  0.111111  1.285714    1.000000
9       (I1, I2)          (I4)  0.111111  1.125000    0.250000
10          (I4)      (I1, I2)  0.111111  1.125000    0.500000
11          (I2)      (I4, I1)  0.111111  1.285714    0.142857
12      (I5, I1)          (I2)  0.222222  1.285714    1.000000
13      (I5, I2)          (I1)  0.222222  1.500000    1.000000
14      (I1, I2)          (I5)  0.222222  2.250000    0.500000

In [21]:
# Lift with different threshold
lift2 = association_rules(frequent_itemsets, metric="lift", min_threshold=2)
#print(lift2)  #high positive correlation
print(lift2[['antecedents', 'consequents', 'support', 'lift', 'confidence']])

    antecedents   consequents   support  lift  confidence
0      (I1, I2)          (I5)  0.222222  2.25        0.50
1          (I5)      (I1, I2)  0.222222  2.25        1.00
2  (I3, I1, I2)          (I5)  0.111111  2.25        0.50
3      (I3, I5)      (I1, I2)  0.111111  2.25        1.00
4      (I1, I2)      (I3, I5)  0.111111  2.25        0.25
5          (I5)  (I3, I1, I2)  0.111111  2.25        0.50
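Rows 0 and 1 above share the same lift but have different confidence values. That follows from the definitions: lift is symmetric in X and Y, while confidence depends on the direction of the rule. A minimal check on the toy transactions, using exact fractions (the helper names are illustrative):

```python
from fractions import Fraction

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
N = len(transactions)


def count(itemset):
    return sum(1 for t in transactions if itemset <= t)


def confidence(x, y):
    return Fraction(count(x | y), count(x))


def lift(x, y):
    return confidence(x, y) / Fraction(count(y), N)


x, y = {"I1", "I2"}, {"I5"}
print("confidence:", float(confidence(x, y)), float(confidence(y, x)))
print("lift:", float(lift(x, y)), float(lift(y, x)))
```

For this reason, lift alone does not say which side of a rule should be the antecedent. Confidence is needed for that.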


In [22]:
# Confidence and support thresholds combined
# combined condition: lift >= 2 (already applied in lift2), confidence > .5, support > .2
lift2[(lift2.confidence > .5) & (lift2.support > .2)]

  antecedents consequents  antecedent support  consequent support   support  confidence      lift  leverage  conviction
1        (I5)    (I1, I2)            0.222222            0.444444  0.222222    1.000000  2.250000  0.123457         inf


#### Different Threshold Values

In [23]:
#min support =.3
association_rules(frequent_itemsets, metric="support", min_threshold = .3)[
    [ 'antecedents','consequents','support', 'confidence','lift']]

  antecedents consequents   support  confidence      lift
0        (I1)        (I2)  0.444444    0.666667  0.857143
1        (I2)        (I1)  0.444444    0.571429  0.857143
2        (I3)        (I1)  0.444444    0.666667  1.000000
3        (I1)        (I3)  0.444444    0.666667  1.000000
4        (I3)        (I2)  0.444444    0.666667  0.857143
5        (I2)        (I3)  0.444444    0.571429  0.857143


In [24]:
#min lift =1 
association_rules(frequent_itemsets, metric="lift", min_threshold = 1)[
    [ 'antecedents','consequents','support', 'confidence','lift']]

  antecedents consequents   support  confidence      lift
0        (I3)        (I1)  0.444444    0.666667  1.000000
1        (I1)        (I3)  0.444444    0.666667  1.000000
2        (I5)        (I1)  0.222222    1.000000  1.500000
3        (I1)        (I5)  0.222222    0.333333  1.500000
4        (I4)        (I2)  0.222222    1.000000  1.285714
5        (I2)        (I4)  0.222222    0.285714  1.285714
6        (I5)        (I2)  0.222222    1.000000  1.285714
7        (I2)        (I5)  0.222222    0.285714  1.285714
8    (I4, I1)        (I2)  0.111111    1.000000  1.285714
9    (I1, I2)        (I4)  0.111111    0.250000  1.125000


In [25]:
#min confidence =.6 
association_rules(frequent_itemsets, metric="confidence", min_threshold = .6)[
    ['antecedents','consequents','support', 'confidence','lift']]

  antecedents consequents   support  confidence      lift
0        (I1)        (I2)  0.444444    0.666667  0.857143
1        (I3)        (I1)  0.444444    0.666667  1.000000
2        (I1)        (I3)  0.444444    0.666667  1.000000
3        (I5)        (I1)  0.222222    1.000000  1.500000
4        (I3)        (I2)  0.444444    0.666667  0.857143
5        (I4)        (I2)  0.222222    1.000000  1.285714
6        (I5)        (I2)  0.222222    1.000000  1.285714
7    (I4, I1)        (I2)  0.111111    1.000000  1.285714
8    (I5, I1)        (I2)  0.222222    1.000000  1.285714
9    (I5, I2)        (I1)  0.222222    1.000000  1.500000


### Part-1 Over : Interpret the results 

In [26]:
### Analysis

In [27]:
frequent_itemsets = apriori(df, min_support=0.2, use_colnames = True)
frequent_itemsets

    support  itemsets
0  0.666667      (I1)
1  0.777778      (I2)
2  0.666667      (I3)
3  0.222222      (I4)
4  0.222222      (I5)
5  0.444444  (I1, I2)
6  0.444444  (I3, I1)
7  0.222222  (I5, I1)
8  0.444444  (I3, I2)
9  0.222222  (I4, I2)


In [28]:
frequent_itemsets[ frequent_itemsets['itemsets'] == {'I1', 'I2'} ]

    support  itemsets
5  0.444444  (I1, I2)


In [29]:
frequent_itemsets[ frequent_itemsets['itemsets'] == {'I1'} ]

    support itemsets
0  0.666667     (I1)


In [30]:
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(lambda x: len(x))
frequent_itemsets

    support  itemsets  length
0  0.666667      (I1)       1
1  0.777778      (I2)       1
2  0.666667      (I3)       1
3  0.222222      (I4)       1
4  0.222222      (I5)       1
5  0.444444  (I1, I2)       2
6  0.444444  (I3, I1)       2
7  0.222222  (I5, I1)       2
8  0.444444  (I3, I2)       2
9  0.222222  (I4, I2)       2


In [31]:
frequent_itemsets[ (frequent_itemsets['length'] >= 1) & (frequent_itemsets[ 'support'] >= 0.3) ]

    support  itemsets  length
0  0.666667      (I1)       1
1  0.777778      (I2)       1
2  0.666667      (I3)       1
5  0.444444  (I1, I2)       2
6  0.444444  (I3, I1)       2
8  0.444444  (I3, I2)       2


In [32]:
frequent_itemsets[ (frequent_itemsets['length'] == 2) & (frequent_itemsets[ 'support'] >= 0.3) ]

    support  itemsets  length
5  0.444444  (I1, I2)       2
6  0.444444  (I3, I1)       2
8  0.444444  (I3, I2)       2


##  Another Example with Item Names

Links

http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/apriori/
https://www.kaggle.com/datatheque/association-rules-mining-market-basket-analysis
http://rasbt.github.io/mlxtend/user_guide/frequent_patterns/association_rules/

Summary
- Metrics: support, confidence, lift
- Workflow: find the frequent itemsets, then derive rules (thresholds on support, confidence, lift)
- X -> Y: decide which rules are interesting
- Applications: combo plans, store relayout, discounts, advertising, recommendation systems

transactions = [['Bread','Butter','Jam'],['Butter','Cheese'],['Butter','Egg'] ,['Bread','Butter','Cheese'],['Bread','Egg'], ['Butter','Egg'],['Bread','Egg'], ['Bread','Butter','Egg','Jam'],['Bread','Butter','Egg']]
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
te_ary
te.columns_
df = pd.DataFrame(te_ary, columns=te.columns_)
df

#https://pypi.org/project/efficient-apriori/
#https://stackabuse.com/association-rule-mining-via-apriori-algorithm-in-python
#pip install apyori
from apyori import apriori as apyori_apriori  # alias keeps mlxtend's apriori usable
# apyori takes the raw list of transactions, not the one-hot DataFrame.
# min_lift=3 (as in the linked tutorial) leaves no rules on this 9-transaction
# example, where the highest lift is 2.25, so 1.2 is used here instead.
apyori_rules = apyori_apriori(transactions, min_support=0.0045, min_confidence=0.2, min_lift=1.2, min_length=2)
association_results = list(apyori_rules)
print(len(association_results))
if association_results:
    print(association_results[0])
for record in association_results:
    # record.items: the itemset; record.support: its support
    # record.ordered_statistics: one entry per rule derived from the itemset
    for stat in record.ordered_statistics:
        # items_base is the antecedent, items_add the consequent
        print("Rule: " + str(sorted(stat.items_base)) + " -> " + str(sorted(stat.items_add)))
        print("Support: " + str(record.support))
        print("Confidence: " + str(stat.confidence))
        print("Lift: " + str(stat.lift))
        print("=====================================")


    

#%%%method3  : under draft
#https://pypi.org/project/efficient-apriori/
#pip install efficient_apriori
from efficient_apriori import apriori
transactions = [('eggs', 'bacon', 'soup'),   ('eggs', 'bacon', 'apple'), ('soup', 'bacon', 'banana')]
transactions
itemsets, rules = apriori(transactions, min_support=0.5, min_confidence=1)
print(rules)  # [{eggs} -> {bacon}, {soup} -> {bacon}]

itemsets, rules = apriori(transactions, min_support=0.2, min_confidence=1)
### Print out every rule with 2 items on the left hand side,
### 1 item on the right hand side, sorted by lift
rules_rhs = filter(lambda rule: len(rule.lhs) == 2 and len(rule.rhs) == 1, rules)
for rule in sorted(rules_rhs, key=lambda rule: rule.lift):
  print(rule)  # Prints the rule and its confidence, support, lift, ...

#with ids
from efficient_apriori import apriori
transactions = [('eggs', 'bacon', 'soup'),   ('eggs', 'bacon', 'apple'), ('soup', 'bacon', 'banana')]
itemsets, rules = apriori(transactions, output_transaction_ids=True)
print(itemsets)

transactions
help(apriori)
itemsets2, rules2 = apriori(transactions, min_support=0.2, min_confidence = .3)
itemsets2
rules2

### Print out every rule with 1 item on the left hand side and 1 item on the right hand side, sorted by lift
rules_rhs = filter(lambda rule: len(rule.lhs) == 1 and len(rule.rhs) == 1, rules2)
for rule in sorted(rules_rhs, key=lambda rule: rule.lift):
    print(rule)  # prints the rule and its confidence, support, lift, ...

#%%%

transactions = [['I1','I2','I5'],['I2','I4'],['I2','I3'] ,['I1','I2','I4'],['I1','I3'], ['I2','I3'],['I1','I3'], ['I1','I2','I3','I5'],['I1','I2','I3']]
transactions
#----
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
te_ary
te.columns_
df = pd.DataFrame(te_ary, columns=te.columns_)
df
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import apriori
support_threshold = 0.01
frequent_itemsets = apriori(df, min_support= support_threshold, use_colnames = True)
frequent_itemsets
rules4 = association_rules(frequent_itemsets, metric="lift", min_threshold =1.2)
rules4
#no of items - left and right side
rules4["ant_len"] = rules4["antecedents"].apply(lambda x: len(x))
rules4
rules4["con_len"] = rules4["consequents"].apply(lambda x: len(x))
rules4
rules4[(rules4['ant_len'] >= 1) & (rules4['confidence'] > 0.75) & (rules4['lift'] > 1.2) ]
rules4[rules4['antecedents'] == {'I1','I2'}]

#%%%
transactions = [['I1','I2','I5'],['I2','I4'],['I2','I3'] ,['I1','I2','I4'],['I1','I3'], ['I2','I3'],['I1','I3'], ['I1','I2','I3','I5'],['I1','I2','I3']]
transactions

from efficient_apriori import apriori
#transactions = [('eggs', 'bacon', 'soup'),   ('eggs', 'bacon', 'apple'), ('soup', 'bacon', 'banana')]
itemsets, rules = apriori(transactions, output_transaction_ids=True)
print(itemsets)
itemsets, rules = apriori(transactions, min_support=0.4, min_confidence=.6)
print(rules)