# Affinity Analysis

What is affinity analysis? It is a type of data mining that measures the similarity between samples (objects). This could be similarity between:

- users on a website, in order to provide services or targeted advertising
- items to sell to those users, in order to recommend movies or products
- human genes, in order to find people who share a common ancestor

In [209]:
import numpy as np

from collections import namedtuple, defaultdict

In [210]:
dataset_filepath = 'affinity_dataset.txt'

In [211]:
X = np.loadtxt(dataset_filepath, dtype=int)
X[:5]

array([[0, 0, 1, 1, 1],
       [1, 1, 0, 1, 0],
       [1, 0, 1, 1, 0],
       [0, 0, 1, 1, 1],
       [0, 1, 0, 0, 1]])

In [212]:
# ^ Each row represents a transaction.
# Each column represents an item: bread, milk, cheese, apples and bananas, in that order.

In [213]:
n_samples, n_features = X.shape
f'This dataset has {n_samples} samples and {n_features} features'

'This dataset has 100 samples and 5 features'

In [214]:
# A lightweight enum that maps each item name to its column index.
features = 'bread milk cheese apple banana'.split(' ')
Item = namedtuple('Item', features)
item = Item(*range(5))

In [215]:
# Find the number of people who purchased apples.

num_apples_purchased = len(X[X[:, item.apple] == 1])
f'{num_apples_purchased} people bought apples'

'36 people bought apples'

In [216]:
# Of the people who bought apples, how many also bought bananas?
purchased_apple = X[X[:, item.apple] == 1]
purchased_apple_and_banana = purchased_apple[purchased_apple[:, item.banana] == 1]

print(f'Support is {len(purchased_apple_and_banana)}')
print(f'Confidence is {len(purchased_apple_and_banana) / len(purchased_apple) * 100:.2f}%')

Support is 21
Confidence is 58.33%
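
The two numbers above follow the usual reading of a rule "if premise then conclusion": support, as used in this notebook, is the raw count of transactions that contain both items, and confidence is that count divided by the number of transactions that contain the premise. The following is a minimal sketch of those definitions, assuming a small made-up matrix (`toy`) and a hypothetical helper `rule_stats`, neither of which is part of the original notebook:

```python
import numpy as np

# Hypothetical toy data: rows are transactions, columns are items.
toy = np.array([
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 0],
])

def rule_stats(data, premise, conclusion):
    """Return (support, confidence) for the rule premise -> conclusion."""
    has_premise = data[:, premise] == 1
    has_both = has_premise & (data[:, conclusion] == 1)
    support = int(has_both.sum())
    n_premise = int(has_premise.sum())
    # Confidence is undefined when the premise never occurs; report NaN.
    confidence = support / n_premise if n_premise else float('nan')
    return support, confidence

support_02, confidence_02 = rule_stats(toy, premise=0, conclusion=2)
print(support_02, round(confidence_02, 3))
```

In the toy data, item 0 appears in three transactions and two of those also contain item 2, so the rule has a support of 2 and a confidence of 2/3.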


In [217]:
def print_rule(features, support, confidence, i, j):
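    """
    Print the rule "premise -> conclusion" with its confidence and support.

    Parameters:
        i: int
            Index of the premise item
        j: int
            Index of the conclusion item
    """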
    print(f'Rule: If a person buy {features[i]} they will also buy {features[j]}')
    print(f' - Confidence: {confidence[(i, j)]:.3f}')
    print(f' - Support: {support[(i, j)]}')
    print('')

In [218]:
support = defaultdict(int)
confidence = defaultdict(float)

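for i in range(len(features)):
    # Transactions that contain the premise item i.
    a = X[X[:, i] == 1]
    for j in range(len(features)):
        if i == j:
            continue
        # Of those, the transactions that also contain the conclusion item j.
        b = a[a[:, j] == 1]
        support[(i, j)] = len(b)
        confidence[(i, j)] = len(b) / len(a)
        print_rule(features, support, confidence, i, j)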


Rule: If a person buy bread they will also buy milk
 - Confidence: 0.519
 - Support: 14

Rule: If a person buy bread they will also buy cheese
 - Confidence: 0.148
 - Support: 4

Rule: If a person buy bread they will also buy apple
 - Confidence: 0.185
 - Support: 5

Rule: If a person buy bread they will also buy banana
 - Confidence: 0.630
 - Support: 17

Rule: If a person buy milk they will also buy bread
 - Confidence: 0.304
 - Support: 14

Rule: If a person buy milk they will also buy cheese
 - Confidence: 0.152
 - Support: 7

Rule: If a person buy milk they will also buy apple
 - Confidence: 0.196
 - Support: 9

Rule: If a person buy milk they will also buy banana
 - Confidence: 0.413
 - Support: 19

Rule: If a person buy cheese they will also buy bread
 - Confidence: 0.098
 - Support: 4

Rule: If a person buy cheese they will also buy milk
 - Confidence: 0.171
 - Support: 7

Rule: If a person buy cheese they will also buy apple
 - Confidence: 0.610
 - Support: 25

Rule: If a pers

In [219]:
# Sort the rules by confidence, highest first.
confidence = dict(sorted(confidence.items(), key=lambda x: x[1], reverse=True))
confidence

{(3, 2): 0.6944444444444444,
 (2, 4): 0.6585365853658537,
 (0, 4): 0.6296296296296297,
 (2, 3): 0.6097560975609756,
 (3, 4): 0.5833333333333334,
 (0, 1): 0.5185185185185185,
 (4, 2): 0.4576271186440678,
 (1, 4): 0.41304347826086957,
 (4, 3): 0.3559322033898305,
 (4, 1): 0.3220338983050847,
 (1, 0): 0.30434782608695654,
 (4, 0): 0.288135593220339,
 (3, 1): 0.25,
 (1, 3): 0.1956521739130435,
 (0, 3): 0.18518518518518517,
 (2, 1): 0.17073170731707318,
 (1, 2): 0.15217391304347827,
 (0, 2): 0.14814814814814814,
 (3, 0): 0.1388888888888889,
 (2, 0): 0.0975609756097561}
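
Ranking by confidence alone can favour rules that rest on only a few transactions, so it may help to require a minimum support first. The sketch below uses a handful of values copied (rounded) from the outputs in this notebook; the names `confidence_sample`, `support_sample` and `names`, and the `min_support` threshold of 20, are assumptions chosen only for illustration:

```python
from operator import itemgetter

# A few (premise, conclusion) pairs copied from the notebook output, rounded.
names = ['bread', 'milk', 'cheese', 'apple', 'banana']
confidence_sample = {(3, 2): 0.694, (2, 4): 0.659, (0, 4): 0.630, (0, 1): 0.519}
support_sample = {(3, 2): 25, (2, 4): 27, (0, 4): 17, (0, 1): 14}

min_support = 20  # hypothetical threshold, not taken from the notebook

# Keep rules with enough supporting transactions, then rank by confidence.
ranked = sorted(
    ((rule, conf) for rule, conf in confidence_sample.items()
     if support_sample[rule] >= min_support),
    key=itemgetter(1),
    reverse=True,
)

for (i, j), conf in ranked:
    print(f'{names[i]} -> {names[j]}: confidence {conf:.3f}, '
          f'support {support_sample[(i, j)]}')
```

With this threshold, only the apple to cheese and cheese to banana rules remain, because the two bread rules have a support below 20.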

In [224]:
top_n = 5
for i, j in list(confidence.keys())[:top_n]:
    print_rule(features, support, confidence, i, j)

Rule: If a person buy apple they will also buy cheese
 - Confidence: 0.694
 - Support: 25

Rule: If a person buy cheese they will also buy banana
 - Confidence: 0.659
 - Support: 27

Rule: If a person buy bread they will also buy banana
 - Confidence: 0.630
 - Support: 17

Rule: If a person buy cheese they will also buy apple
 - Confidence: 0.610
 - Support: 25

Rule: If a person buy apple they will also buy banana
 - Confidence: 0.583
 - Support: 21
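
For a 0/1 matrix like `X`, the pairwise supports and confidences can also be computed without explicit loops. This is one possible vectorized sketch rather than the notebook's own method; it runs on a made-up matrix (`toy`), and the names `co_counts`, `item_counts` and `confidence_matrix` are hypothetical:

```python
import numpy as np

# Hypothetical toy data standing in for X: rows are transactions, columns are items.
toy = np.array([
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 0],
])

# co_counts[i, j] = number of transactions containing both item i and item j.
# The diagonal holds the number of transactions that contain each single item.
co_counts = toy.T @ toy
item_counts = np.diag(co_counts)

# confidence_matrix[i, j] = co_counts[i, j] / item_counts[i] (rule i -> j).
# Assumes every item is bought at least once, so no division by zero occurs.
confidence_matrix = co_counts / item_counts[:, None]

print(co_counts)
print(confidence_matrix.round(3))
```

The diagonal of `confidence_matrix` is always 1 and corresponds to the `i == j` case that the loop version skips.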

