# Association analysis

**Question 1**

---


Let L2 = {{1,2},{1,5},{2,3},{3,4},{3,5}}. Compute the set of candidates C3 that is obtained by joining every pair of joinable itemsets from L2.

**Solution**

---

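Apriori uses an iterative, level-wise search in which frequent k-itemsets are used to generate candidate (k+1)-itemsets. It relies on the Apriori property: every nonempty subset of a frequent itemset must also be frequent. In the join step, two k-itemsets are joinable when their first k-1 items (in sorted order) are identical. In the prune step, any candidate that has an infrequent k-subset is discarded, which reduces the search space.

For L2 (k = 2), two itemsets are joinable when they share their first item. {1,2} and {1,5} join to give {1,2,5}, and {3,4} and {3,5} join to give {3,4,5}; {2,3} has no partner. Therefore C3 = {{1,2,5},{3,4,5}}. A subsequent prune step would discard both candidates, because {2,5} and {4,5} are not in L2.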
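The join step can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used by any library; the function name `apriori_join` is hypothetical, and itemsets are assumed to be stored as sorted tuples.

```python
from itertools import combinations

def apriori_join(frequent_k):
    """Join step: merge pairs of sorted k-itemsets that agree on their first k-1 items."""
    # Represent every itemset as a sorted tuple so prefixes can be compared.
    itemsets = sorted(tuple(sorted(s)) for s in frequent_k)
    candidates = []
    for a, b in combinations(itemsets, 2):
        # Joinable: identical prefix, different last item.
        if a[:-1] == b[:-1]:
            candidates.append(a + (b[-1],))
    return candidates

L2 = [{1, 2}, {1, 5}, {2, 3}, {3, 4}, {3, 5}]
C3 = apriori_join(L2)
print(C3)  # [(1, 2, 5), (3, 4, 5)]
```

The sketch performs only the join; it does not prune candidates whose 2-subsets are missing from L2.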


**Question 2**

---

Let 𝑆1 denote the support of the association rule {popcorn, soda}⇒{movie}. Let 𝑆2 denote the support of the association rule {popcorn}⇒{movie}. What is the relationship between 𝑆1 and 𝑆2?

**Solution**

---

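Let N be the total number of transactions, N1 the support count of the itemset {popcorn, soda, movie}, and N2 the support count of the itemset {popcorn, movie}. Since {popcorn, movie} is a subset of {popcorn, soda, movie}, every transaction that contains the larger itemset also contains the smaller one, so N2 >= N1.

𝑆1 = N1 / N and 𝑆2 = N2 / N. So, 𝑆1 <= 𝑆2.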
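The inequality can be checked on a small example. The transactions below are invented purely for illustration and do not come from the tutorial; the helper `support` is likewise a hypothetical name.

```python
# Hypothetical transactions, invented purely for illustration.
transactions = [
    {"popcorn", "soda", "movie"},
    {"popcorn", "movie"},
    {"popcorn", "soda"},
    {"soda", "movie"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

s1 = support({"popcorn", "soda", "movie"}, transactions)  # rule {popcorn, soda} => {movie}
s2 = support({"popcorn", "movie"}, transactions)          # rule {popcorn} => {movie}
print(s1, s2)  # 0.25 0.5
```

Adding an item to an itemset can only keep or shrink the set of transactions that contain it, which is why the first value never exceeds the second.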


**Question 3**

---


What is the support of the rule {}⇒{Kidney Beans} in the transaction dataset used in the tutorial presented above?


**Solution**

---



In [1]:
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

from mlxtend.preprocessing import TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit_transform(dataset)
print(te_ary)

[[False False False  True False  True  True  True  True False  True]
 [False False  True  True False  True False  True  True False  True]
 [ True False False  True False  True  True False False False False]
 [False  True False False False  True  True False False  True  True]
 [False  True False  True  True  True False False  True False False]]


In [2]:
import pandas as pd

df = pd.DataFrame(te_ary, columns=te.columns_)
display(df)

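|   | Apple | Corn | Dill | Eggs | Ice cream | Kidney Beans | Milk | Nutmeg | Onion | Unicorn | Yogurt |
|---|-------|------|------|------|-----------|--------------|------|--------|-------|---------|--------|
| 0 | False | False | False | True | False | True | True | True | True | False | True |
| 1 | False | False | True | True | False | True | False | True | True | False | True |
| 2 | True | False | False | True | False | True | True | False | False | False | False |
| 3 | False | True | False | False | False | True | True | False | False | True | True |
| 4 | False | True | False | True | True | True | False | False | True | False | False |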


Inspecting the transaction dataset shows that every transaction contains ‘Kidney Beans’. The rule {}⇒{Kidney Beans} has an empty antecedent, so its support equals the support of the itemset {Kidney Beans}:

Support of the ‘Kidney Beans’ rule = 5 / 5 = 1
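As a cross-check, the same count can be reproduced without any library. The sketch below repeats the tutorial dataset so that it runs on its own; the helper name `item_support` is an assumption made for this example.

```python
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]

def item_support(item, transactions):
    """Fraction of transactions that contain `item` at least once."""
    return sum(1 for t in transactions if item in t) / len(transactions)

kidney_beans_support = item_support('Kidney Beans', dataset)
print(kidney_beans_support)  # 1.0
```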

**Question 4**

---


In the transaction dataset used in the tutorial presented above, what is the maximum length of a frequent itemset for a support threshold of 0.2?

**Solution**

---



In [3]:
from mlxtend.frequent_patterns import apriori

# frequent itemsets with support threshold 0.2
frequent_itemsets = apriori(df, min_support=0.2)
# length of each frozenset
frequent_itemsets['length'] = frequent_itemsets['itemsets'].apply(len)
print('Maximum length of a frequent itemset:')
display(max(frequent_itemsets['length']))


Maximum length of a frequent itemset:


6
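The result can also be reasoned out directly. With five transactions and a threshold of 0.2, an itemset is frequent as soon as it occurs in a single transaction, so the longest frequent itemset is simply the largest set of distinct items in any one transaction. The following sketch, which repeats the dataset to stay self-contained and uses a hypothetical helper `support`, illustrates this:

```python
dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]
min_support = 0.2

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in dataset if itemset <= set(t)) / len(dataset)

# Duplicates inside a transaction (e.g. 'Onion' twice) collapse in a set.
longest = max((frozenset(t) for t in dataset), key=len)
max_length = len(longest)
is_frequent = support(longest) >= min_support
print(max_length, is_frequent)  # 6 True
```

No frequent itemset can be longer than the longest transaction, so 6 is an upper bound as well as an attained value.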

**Question 5**

---


Implement a function that receives a DataFrame of frequent itemsets and a strong association rule (represented by a frozenset of antecedents and a frozenset of consequents). This function should return the corresponding Kulczynski measure.


**Solution**

---



In [9]:
frequent_itemsets = apriori(df, use_colnames=True)

def kulczynski_measure(frequent_itemsets, frozen_antecedents, frozen_consequents):
  # itemset formed by the union of antecedents and consequents
  items = frozenset(frozen_antecedents).union(frozenset(frozen_consequents))

  # antecedents support
  supp_left = frequent_itemsets[frequent_itemsets['itemsets']==frozen_antecedents]['support'].iloc[0]

  # consequents support
  supp_right = frequent_itemsets[frequent_itemsets['itemsets']==frozen_consequents]['support'].iloc[0]

  # support of the union itemset
  support = frequent_itemsets[frequent_itemsets['itemsets']==items]['support'].iloc[0]

  # average of the two conditional probabilities P(B|A) and P(A|B)
  kulc = ((support / supp_left) + (support / supp_right)) / 2

  # return the value itself (print() returns None)
  return kulc

kulc = kulczynski_measure(frequent_itemsets, frozenset({'Onion'}), frozenset({'Kidney Beans'}))
print('Kulczynski measure is: {}'.format(kulc))

Kulczynski measure is: 0.8
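For reference, the value above can be verified by hand. Writing $s(\cdot)$ for support (a notation assumed here, not taken from the tutorial), with $A = \{\text{Onion}\}$ and $B = \{\text{Kidney Beans}\}$, the Kulczynski measure is the average of the two conditional probabilities:

$$
\mathrm{Kulc}(A, B) = \frac{1}{2}\bigl(P(A \mid B) + P(B \mid A)\bigr)
= \frac{1}{2}\left(\frac{s(A \cup B)}{s(B)} + \frac{s(A \cup B)}{s(A)}\right)
= \frac{1}{2}\left(\frac{0.6}{1.0} + \frac{0.6}{0.6}\right) = 0.8
$$

Here $s(A) = 3/5 = 0.6$ because Onion occurs in three of the five transactions, $s(B) = 1.0$, and $s(A \cup B) = 0.6$.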


**Question 6**

---


Implement a function that receives a DataFrame of frequent itemsets and a strong association rule (represented by a frozenset of antecedents and a frozenset of consequents). This function should return the corresponding imbalance ratio. 



**Solution**

---



In [11]:
def imbalance_ratio(frequent_itemsets, frozen_antecedents, frozen_consequents):
  # itemset formed by the union of antecedents and consequents
  items = frozenset(frozen_antecedents).union(frozenset(frozen_consequents))

  # number of transactions in the tutorial dataset (this factor cancels in the ratio)
  n_transactions = 5

  # support count of antecedents
  suppCount_left = n_transactions * frequent_itemsets[frequent_itemsets['itemsets']==frozen_antecedents]['support'].iloc[0]

  # support count of consequents
  suppCount_right = n_transactions * frequent_itemsets[frequent_itemsets['itemsets']==frozen_consequents]['support'].iloc[0]

  # support count of the union itemset
  supportCount = n_transactions * frequent_itemsets[frequent_itemsets['itemsets']==items]['support'].iloc[0]

  # |count(A) - count(B)| / (count(A) + count(B) - count(A and B together))
  ir = abs(suppCount_left - suppCount_right) / (suppCount_left + suppCount_right - supportCount)

  # return the value itself (print() returns None)
  return ir

ir = imbalance_ratio(frequent_itemsets, frozenset({'Onion'}), frozenset({'Kidney Beans'}))
print('Imbalance ratio is: {}'.format(ir))

Imbalance ratio is: 0.4
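The same number follows from the definition of the imbalance ratio. Using $s(\cdot)$ for support (an assumed notation), with $A = \{\text{Onion}\}$ and $B = \{\text{Kidney Beans}\}$:

$$
\mathrm{IR}(A, B) = \frac{\lvert s(A) - s(B) \rvert}{s(A) + s(B) - s(A \cup B)}
= \frac{\lvert 0.6 - 1.0 \rvert}{0.6 + 1.0 - 0.6} = \frac{0.4}{1.0} = 0.4
$$

Multiplying every term by the number of transactions, as the function does to obtain support counts, scales the numerator and the denominator equally, so the ratio is the same whether supports or counts are used.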
