# Simple Affinity Analysis

<p>Source: <em>Python: Real-World Data Science [pp. 635-643]</em></p>

<p>data source: <a href='./affinity_dataset.txt'>affinity_dataset</a></p>

In [35]:
from collections import defaultdict
from operator import itemgetter
import numpy as np

In [43]:
# human-readable names for the binary feature columns, in column order
features = ['bread', 'milk', 'cheese', 'apples', 'bananas']

# load dataset
X = np.loadtxt('affinity_dataset.txt')

n_samples, n_features = X.shape

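A view of the first five rows of the dataset. Each row is one sample, and each column marks whether the corresponding item in `features` was bought (1) or not (0).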

In [15]:
print(X[:5])

[[0. 1. 0. 0. 0.]
 [1. 1. 0. 0. 0.]
 [0. 0. 1. 0. 1.]
 [1. 1. 0. 0. 0.]
 [0. 0. 1. 1. 1.]]


In [36]:
valid_rules   = defaultdict(int)
invalid_rules = defaultdict(int)
occurrences   = defaultdict(int)
confidence    = defaultdict(float)

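A helper function that prints the top rules from a list of `((premise, conclusion), value)` pairs that has already been sorted, best first.

In [39]:
def calculate(sorted_X, n_rules=5):
    # `support`, `confidence`, and `features` are the notebook-level globals.
    # The number of rules shown is a parameter rather than n_features, which
    # only happened to equal 5 for this dataset.
    for index, (rule, _) in enumerate(sorted_X[:n_rules]):
        print(f'Rule: #{index + 1}')
        premise, conclusion = rule
        print_rule(premise, conclusion, support, confidence, features)
        print()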

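Here we count, for every ordered pair of items (premise, conclusion), how often the rule holds and how often it fails among the samples that contain the premise.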

In [46]:
for sample in X:
    for premise in range(n_features):
        # skip items that were not bought in this sample
        if sample[premise] == 0:
            continue
        occurrences[premise] += 1
        for conclusion in range(n_features):
            # a rule from an item to itself is not meaningful;
            # compare by value (==), not identity (is)
            if premise == conclusion:
                continue
            if sample[conclusion] == 1:
                valid_rules[(premise, conclusion)] += 1
            else:
                invalid_rules[(premise, conclusion)] += 1
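
The same counts can also be obtained without explicit loops. The following is a minimal sketch, not the book's method: it assumes the data is a 0/1 matrix and uses only the five rows shown earlier as toy input. The names `X_small`, `co_counts`, `item_counts`, and `invalid_counts` are illustrative and do not appear elsewhere in this notebook.

```python
import numpy as np

# Toy data: the five rows printed earlier
# (columns: bread, milk, cheese, apples, bananas).
X_small = np.array([
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
])

# co_counts[i, j] = number of rows in which items i and j were both bought.
# Off-diagonal entries play the role of valid_rules[(i, j)].
co_counts = X_small.T @ X_small

# The diagonal is the number of rows containing each item (occurrences).
item_counts = np.diag(co_counts)

# Rows that contain the premise i but not the conclusion j
# (the role of invalid_rules[(i, j)]).
invalid_counts = item_counts[:, None] - co_counts

print(co_counts[1, 0], invalid_counts[1, 0])  # milk -> bread
```

Only the off-diagonal entries correspond to rules, since a rule from an item to itself is skipped in the loop version.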

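Calculating support and confidence for the rules. Support is the raw number of samples in which a rule holds (a count, not a fraction). Confidence is that count divided by the number of samples that contain the premise.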

In [45]:
support = valid_rules
for premise, conclusion in valid_rules.keys():
    rule = (premise, conclusion)
    confidence[rule] = valid_rules[rule] / occurrences[premise]
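
To make the two definitions concrete, here is a small sketch that computes them for a single rule on the five rows shown earlier. The helper `rule_stats` and the array `X_small` are hypothetical names introduced only for this illustration, and the result applies to the toy rows, not to the full dataset.

```python
import numpy as np

def rule_stats(X, premise, conclusion):
    """Return (support, confidence) for the rule premise -> conclusion."""
    bought_premise = X[:, premise] == 1
    bought_both = bought_premise & (X[:, conclusion] == 1)
    n_premise = int(bought_premise.sum())
    n_both = int(bought_both.sum())
    # Guard against a premise that never occurs in the data.
    conf = n_both / n_premise if n_premise else 0.0
    return n_both, conf

# Toy data: the five rows printed earlier
# (columns: bread, milk, cheese, apples, bananas).
X_small = np.array([
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1],
])

milk, bread = 1, 0
support_mb, confidence_mb = rule_stats(X_small, milk, bread)
print(support_mb, round(confidence_mb, 3))  # 2 0.667
```

In these five rows milk is bought three times and bread accompanies it twice, so the rule "milk, then bread" has support 2 and confidence 2/3.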

Helper function for displaying rules, support, and confidence

In [40]:
def print_rule(premise, conclusion, support, confidence, features):
    premise_name = features[premise]
    conclusion_name = features[conclusion]
    s = support[(premise, conclusion)]
    print(f'Rule: If a person buys {premise_name}, then they will also buy {conclusion_name}')
    print(f' - Support: {s}')
    print(f' - Confidence: {confidence[(premise, conclusion)]:.3f}')

Top 5 rules, ordered by support (highest first)

In [41]:
sorted_support = sorted(support.items(), key=itemgetter(1), reverse=True)
calculate(sorted_support)

Rule: #1
Rule: If a person buys apples, then they will also buy bananas
 - Support: 27
 - Confidence: 0.628

Rule: #2
Rule: If a person buys bananas, then they will also buy apples
 - Support: 27
 - Confidence: 0.474

Rule: #3
Rule: If a person buys milk, then they will also buy bananas
 - Support: 27
 - Confidence: 0.519

Rule: #4
Rule: If a person buys bananas, then they will also buy milk
 - Support: 27
 - Confidence: 0.474

Rule: #5
Rule: If a person buys cheese, then they will also buy apples
 - Support: 22
 - Confidence: 0.564



Top 5 rules, ordered by confidence (highest first)

In [42]:
sorted_confidence = sorted(confidence.items(), key=itemgetter(1), reverse=True)
calculate(sorted_confidence)

Rule: #1
Rule: If a person buys apples, then they will also buy bananas
 - Support: 27
 - Confidence: 0.628

Rule: #2
Rule: If a person buys bread, then they will also buy bananas
 - Support: 16
 - Confidence: 0.571

Rule: #3
Rule: If a person buys cheese, then they will also buy apples
 - Support: 22
 - Confidence: 0.564

Rule: #4
Rule: If a person buys milk, then they will also buy bananas
 - Support: 27
 - Confidence: 0.519

Rule: #5
Rule: If a person buys cheese, then they will also buy bananas
 - Support: 20
 - Confidence: 0.513

