# Market Basket Analysis (MBA)
Market basket analysis is a data mining technique that identifies patterns or relationships between items purchased together in a transaction. It helps businesses analyze customer purchasing behavior by finding associations between products, commonly expressed as rules such as "If a customer buys item A, they are likely to buy item B."

In [19]:
import pandas as pd

from mlxtend.frequent_patterns import apriori, association_rules

# Sample transactional data
data = {'Milk': [1, 1, 0, 1, 0],
        'Bread': [1, 1, 1, 0, 1],
        'Butter': [0, 1, 1, 1, 1],
        'Eggs': [1, 0, 0, 1, 1]}

# Create a DataFrame
df = pd.DataFrame(data)

# Generate frequent itemsets
frequent_itemsets = apriori(df, min_support=0.4, use_colnames=True)

print(frequent_itemsets)


   support         itemsets
0      0.6           (Milk)
1      0.8          (Bread)
2      0.8         (Butter)
3      0.6           (Eggs)
4      0.4    (Bread, Milk)
5      0.4   (Butter, Milk)
6      0.4     (Eggs, Milk)
7      0.6  (Bread, Butter)
8      0.4    (Bread, Eggs)
9      0.4   (Eggs, Butter)
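
The supports in this table can be checked without mlxtend. The following is a minimal pure-Python sketch that counts, for every candidate itemset, the fraction of the five transactions containing all of its items. The helper name `support` is an illustrative choice, not part of mlxtend, and the brute-force enumeration is only sensible for a toy dataset like this one (it does not implement Apriori's pruning).

```python
from itertools import combinations

# Same toy data as above: one 0/1 list per item, one position per transaction.
data = {'Milk': [1, 1, 0, 1, 0],
        'Bread': [1, 1, 1, 0, 1],
        'Butter': [0, 1, 1, 1, 1],
        'Eggs': [1, 0, 0, 1, 1]}
n_transactions = len(data['Milk'])

def support(items):
    """Fraction of transactions containing every item in `items` (hypothetical helper)."""
    hits = sum(all(data[item][t] for item in items) for t in range(n_transactions))
    return hits / n_transactions

# Brute force: test every itemset of every size against min_support.
min_support = 0.4
frequent = {}
for size in range(1, len(data) + 1):
    for items in combinations(data, size):
        s = support(items)
        if s >= min_support:
            frequent[frozenset(items)] = s

for items, s in frequent.items():
    print(f"{s:.1f}  {sorted(items)}")
```

Every three-item combination occurs in only one of the five transactions (support 0.2), which is why the table stops at pairs.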




In [21]:
df.head()

   Milk  Bread  Butter  Eggs
0     1      1       0     1
1     1      1       1     0
2     0      1       1     0
3     1      0       1     1
4     0      1       1     1


In [23]:
# num_itemsets is the number of transactions in the original data, so pass len(df)
rules = association_rules(frequent_itemsets, num_itemsets=len(df), metric="lift", min_threshold=1.0)

# Display rules
print(rules)


  antecedents consequents  antecedent support  consequent support  support  \
0      (Eggs)      (Milk)                 0.6                 0.6      0.4   
1      (Milk)      (Eggs)                 0.6                 0.6      0.4   

   confidence      lift  representativity  leverage  conviction  \
0    0.666667  1.111111               1.0      0.04         1.2   
1    0.666667  1.111111               1.0      0.04         1.2   

   zhangs_metric  jaccard  certainty  kulczynski  
0           0.25      0.5   0.166667    0.666667  
1           0.25      0.5   0.166667    0.666667  


In [25]:
# num_itemsets is the number of transactions in the original data, so pass len(df)
rules = association_rules(frequent_itemsets, num_itemsets=len(df), metric="confidence", min_threshold=1.0)

# Display rules
print(rules)

Empty DataFrame
Columns: [antecedents, consequents, antecedent support, consequent support, support, confidence, lift, representativity, leverage, conviction, zhangs_metric, jaccard, certainty, kulczynski]
Index: []
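
The result is empty because no rule reaches a confidence of 1.0. As a rough check, the sketch below takes the single-item and pair supports from the frequent-itemsets table above and computes the confidence of every pair rule by hand. The dictionary names are illustrative.

```python
from itertools import permutations

# Supports copied from the frequent-itemsets table above.
item_support = {'Milk': 0.6, 'Bread': 0.8, 'Butter': 0.8, 'Eggs': 0.6}
pair_support = {frozenset(['Bread', 'Milk']): 0.4,
                frozenset(['Butter', 'Milk']): 0.4,
                frozenset(['Eggs', 'Milk']): 0.4,
                frozenset(['Bread', 'Butter']): 0.6,
                frozenset(['Bread', 'Eggs']): 0.4,
                frozenset(['Eggs', 'Butter']): 0.4}

# confidence(A -> B) = support(A and B) / support(A), for both rule directions.
confidences = {}
for a, b in permutations(item_support, 2):
    pair = frozenset([a, b])
    if pair in pair_support:
        confidences[(a, b)] = pair_support[pair] / item_support[a]

best_rule = max(confidences, key=confidences.get)
print(best_rule, round(confidences[best_rule], 4))
```

The strongest rule is Bread → Butter (tied with Butter → Bread) at a confidence of 0.75, so any `min_threshold` above that value filters out every rule.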


In [31]:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Define dataset
data = {'Milk': [1, 1, 0, 1, 0],
        'Bread': [1, 1, 1, 0, 1],
        'Butter': [0, 1, 1, 1, 1],
        'Eggs': [1, 0, 0, 1, 1]}

# Convert to DataFrame
df = pd.DataFrame(data)

# Generate frequent itemsets using apriori algorithm
frequent_itemsets = apriori(df, min_support=0.4, use_colnames=True)

# Generate association rules
rules = association_rules(frequent_itemsets, num_itemsets=len(df), metric="confidence", min_threshold=0.6)

# Recompute a few metrics by hand (Zhang's metric, Jaccard, Kulczynski)
s_a = rules['antecedent support']
s_b = rules['consequent support']
s_ab = rules['support']
# Zhang's metric: leverage divided by the larger of the two normalizing terms
zhang_denominator = pd.concat([s_ab * (1 - s_a), s_a * (s_b - s_ab)], axis=1).max(axis=1)
rules['zhangs_metric'] = (s_ab - s_a * s_b) / zhang_denominator
rules['jaccard'] = s_ab / (s_a + s_b - s_ab)
rules['kulczynski'] = 0.5 * (rules['confidence'] + s_ab / s_b)

# Display the results
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift',
             'leverage', 'conviction', 'zhangs_metric', 'jaccard', 'kulczynski']])


  antecedents consequents  support  confidence      lift  leverage  \
0      (Milk)     (Bread)      0.4    0.666667  0.833333     -0.08   
1      (Milk)    (Butter)      0.4    0.666667  0.833333     -0.08   
2      (Eggs)      (Milk)      0.4    0.666667  1.111111      0.04   
3      (Milk)      (Eggs)      0.4    0.666667  1.111111      0.04   
4     (Bread)    (Butter)      0.6    0.750000  0.937500     -0.04   
5    (Butter)     (Bread)      0.6    0.750000  0.937500     -0.04   
6      (Eggs)     (Bread)      0.4    0.666667  0.833333     -0.08   
7      (Eggs)    (Butter)      0.4    0.666667  0.833333     -0.08   

   conviction  zhangs_metric  jaccard  kulczynski  
0         0.6      -0.333333      0.4    0.583333  
1         0.6      -0.333333      0.4    0.583333  
2         1.2       0.250000      0.5    0.666667  
3         1.2       0.250000      0.5    0.666667  
4         0.8      -0.250000      0.6    0.750000  
5         0.8      -0.250000      0.6    0.750000  
6         0.6      -0.333333      0.4    0.583333  
7         0.6      -0.333333      0.4    0.583333  



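Column Explanations

The examples below refer to the rules table produced with metric="lift", where row 0 is (Eggs → Milk) and row 1 is (Milk → Eggs).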
# 1 antecedents:

The item(s) on the left-hand side (LHS) of the rule.
Example: In row 0, the antecedent is (Eggs).

# 2 consequents:

The item(s) on the right-hand side (RHS) of the rule.
Example: In row 0, the consequent is (Milk).

# 3 antecedent support:

The proportion of transactions containing the antecedent.
Example: For (Eggs), the support is 0.6 (60% of transactions).

# 4 consequent support:

The proportion of transactions containing the consequent.
Example: For (Milk), the support is 0.6 (60% of transactions).

# 5 support:

The proportion of transactions containing both the antecedent and the consequent (i.e., their union as an itemset).
Example: For (Eggs → Milk), the support is 0.4 (40% of transactions).

# 6 confidence:

Measures how often the consequent appears when the antecedent is present.
Formula:
$$\text{Confidence}(A \to B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}$$
Example: For (Eggs → Milk), confidence is:
$$\frac{0.4}{0.6} = 0.6667 \ (66.67\%)$$
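
Confidence can also be read as a conditional frequency: among the transactions that contain the antecedent, the share that also contain the consequent. A small sketch using the Milk and Eggs columns of the toy dataset (the variable names are illustrative):

```python
# Milk and Eggs columns from the toy dataset, one entry per transaction.
milk = [1, 1, 0, 1, 0]
eggs = [1, 0, 0, 1, 1]

# Keep only the transactions that contain the antecedent (Eggs).
milk_given_eggs = [m for m, e in zip(milk, eggs) if e == 1]

# Share of those transactions that also contain the consequent (Milk).
confidence = sum(milk_given_eggs) / len(milk_given_eggs)
print(round(confidence, 4))
```

Three transactions contain Eggs and two of them also contain Milk, which gives the same 2/3 as the support ratio.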
 
# 7 lift:

Measures how much more likely the consequent occurs given the antecedent, compared to random chance.
Formula:
$$\text{Lift}(A \to B) = \frac{\text{Confidence}(A \to B)}{\text{Support}(B)}$$
Example: For (Eggs → Milk), lift is:
$$\frac{0.6667}{0.6} = 1.1111$$
Interpretation: A lift of $1.11$ suggests a weak positive association.
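
Because confidence is Support(A∪B) / Support(A), lift can be rewritten as Support(A∪B) / (Support(A) · Support(B)), which is symmetric in A and B. That is why Eggs → Milk and Milk → Eggs share the same lift. A short sketch, with `lift` as an illustrative function name and supports taken from the tables above:

```python
def lift(s_a, s_b, s_ab):
    """Lift from antecedent support, consequent support and joint support."""
    confidence = s_ab / s_a
    return confidence / s_b

# Eggs -> Milk (identical to Milk -> Eggs because both supports are 0.6).
print(round(lift(0.6, 0.6, 0.4), 4))

# Milk -> Bread versus Bread -> Milk: different confidences, same lift.
print(round(lift(0.6, 0.8, 0.4), 4), round(lift(0.8, 0.6, 0.4), 4))
```

A lift below 1, as for Milk and Bread here, means the two items appear together less often than independence would predict.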

# 8 representativity:

Reflects how representative the rule is of the entire dataset, on a scale from 0 to 1. In the output above, both rules have a representativity of 1.0.

# 9 leverage:

Measures the difference between the observed support of $A \cup B$ and what would be expected if $A$ and $B$ were independent.
Formula:
$$\text{Leverage} = \text{Support}(A \cup B) - \text{Support}(A) \cdot \text{Support}(B)$$
Example: For (Eggs → Milk):
$$0.4 - (0.6 \times 0.6) = 0.04$$
Interpretation: A leverage of $0.04$ suggests a small positive association.
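
The sign of leverage carries the same message as lift being above or below 1. The sketch below applies the formula to three rules, using supports taken from the tables above; the function and dictionary names are illustrative.

```python
def leverage(s_a, s_b, s_ab):
    """Observed joint support minus the joint support expected under independence."""
    return s_ab - s_a * s_b

# (antecedent support, consequent support, joint support) per rule.
rule_supports = {
    'Eggs -> Milk': (0.6, 0.6, 0.4),
    'Milk -> Bread': (0.6, 0.8, 0.4),
    'Bread -> Butter': (0.8, 0.8, 0.6),
}

for name, (s_a, s_b, s_ab) in rule_supports.items():
    value = leverage(s_a, s_b, s_ab)
    direction = 'more often' if value > 0 else 'less often'
    print(f"{name}: leverage {value:+.2f}, together {direction} than independence predicts")
```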

# 10 conviction:

Measures the strength of implication, factoring in how often the rule would be wrong.
Formula:
$$\text{Conviction}(A \to B) = \frac{1 - \text{Support}(B)}{1 - \text{Confidence}(A \to B)}$$
Example: For (Eggs → Milk):
$$\frac{1 - 0.6}{1 - 0.6667} = 1.2$$
Interpretation: A conviction of $1.2$ indicates a weak positive dependence.
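
The conviction formula divides by zero when a rule has a confidence of exactly 1. Returning infinity in that case is one common convention and is an assumption of this sketch; the function name is illustrative.

```python
import math

def conviction(confidence, s_b):
    """(1 - Support(B)) / (1 - Confidence); infinite when the rule never fails."""
    if confidence == 1.0:
        return math.inf
    return (1 - s_b) / (1 - confidence)

print(round(conviction(0.4 / 0.6, 0.6), 4))  # Eggs -> Milk
print(round(conviction(0.4 / 0.6, 0.8), 4))  # Milk -> Bread
print(conviction(1.0, 0.6))                  # a rule with no counterexamples
```

Values below 1, as for Milk → Bread, indicate that the rule is wrong more often than it would be if the two sides were independent.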

# 11 zhangs_metric:

Measures the dependence between $A$ and $B$, ranging between $-1$ (negative dependence) and $+1$ (positive dependence). A value of $0.25$ indicates mild positive dependence.
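
A commonly cited definition of Zhang's metric divides leverage by the larger of two normalizing terms built from the supports. The sketch below assumes that definition; it reproduces the 0.25 reported for Eggs → Milk. The function name is illustrative, and there is no guard for a zero denominator.

```python
def zhangs_metric(s_a, s_b, s_ab):
    """Leverage scaled into [-1, 1] (assumed definition, see the text above)."""
    numerator = s_ab - s_a * s_b
    denominator = max(s_ab * (1 - s_a), s_a * (s_b - s_ab))
    return numerator / denominator

print(round(zhangs_metric(0.6, 0.6, 0.4), 4))  # Eggs -> Milk
print(round(zhangs_metric(0.6, 0.8, 0.4), 4))  # Milk -> Bread
```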

# 12 jaccard:

Measures the similarity between antecedent and consequent as the ratio of their intersection to their union.
Formula:
$$\text{Jaccard} = \frac{\text{Support}(A \cup B)}{\text{Support}(A) + \text{Support}(B) - \text{Support}(A \cup B)}$$
Example: For (Eggs → Milk):
$$\frac{0.4}{0.6 + 0.6 - 0.4} = 0.5$$
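
The notation can be confusing here: Support(A∪B) is the support of the combined itemset, which corresponds to the intersection of the transaction sets. Working directly with transaction ids makes this explicit. In the sketch below, the ids are read off the toy dataset and the variable names are illustrative.

```python
# Transaction ids (0-4) that contain each item in the toy dataset.
eggs_ids = {0, 3, 4}
milk_ids = {0, 1, 3}

both = eggs_ids & milk_ids    # transactions containing Eggs and Milk
either = eggs_ids | milk_ids  # transactions containing Eggs or Milk

jaccard = len(both) / len(either)
print(jaccard)
```

Two transactions contain both items and four contain at least one of them, which matches the 0.5 obtained from the support formula.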
 
# 13 certainty:

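Measures how far the antecedent shifts the probability of the consequent away from its baseline, scaled by the room left above that baseline. The values in the table are consistent with:
$$\text{Certainty} = \frac{\text{Confidence}(A \to B) - \text{Support}(B)}{1 - \text{Support}(B)}$$
Example: For (Eggs → Milk):
$$\frac{0.6667 - 0.6}{1 - 0.6} = 0.1667$$
Interpretation: A certainty of $0.1667$ indicates a weak positive association; a negative value means the antecedent makes the consequent less likely.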

# 14 kulczynski:

The average of the confidence of the rule $A \to B$ and the confidence of its inverse $B \to A$.
Formula:
$$\text{Kulczynski} = \frac{1}{2}\left(\frac{\text{Support}(A \cup B)}{\text{Support}(A)} + \frac{\text{Support}(A \cup B)}{\text{Support}(B)}\right)$$
Example: For (Eggs → Milk):
$$\frac{1}{2}\left(\frac{0.4}{0.6} + \frac{0.4}{0.6}\right) = 0.6667$$
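
When the two items have different supports, the two confidences differ and the Kulczynski measure sits between them. A minimal sketch with supports taken from the tables above (the function name is illustrative):

```python
def kulczynski(s_a, s_b, s_ab):
    """Mean of Confidence(A -> B) and Confidence(B -> A)."""
    return 0.5 * (s_ab / s_a + s_ab / s_b)

print(round(kulczynski(0.6, 0.6, 0.4), 4))  # Eggs -> Milk
print(round(kulczynski(0.6, 0.8, 0.4), 4))  # Milk -> Bread
```

For Milk → Bread the two confidences are 0.6667 and 0.5, so the result is their midpoint, 0.5833.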