# Apriori Algorithm

- a classic data mining algorithm for learning association rules
- it can be used to find sets of items that are frequently purchased together

## Useful tips

### Mapping through rows and columns in an array

Wrap a function with `np.vectorize` to apply it element-wise. Suppose we are given an array of items and, for every row and column, we want to keep only the initial:

```python
import numpy as np

# The rows have different lengths, so dtype=object is needed:
# recent NumPy versions reject ragged nested lists otherwise
items = np.array([['Mango', 'Onion', 'Nintendo', 'Key-chain', 'Eggs', 'Yo-yo'],
                  ['Doll', 'Onion', 'Nintendo', 'Key-chain', 'Eggs', 'Yo-yo'],
                  ['Mango', 'Apple', 'Key-chain', 'Eggs'],
                  ['Mango', 'Umbrella', 'Corn', 'Key-chain', 'Yo-yo'],
                  ['Corn', 'Onion', 'Onion', 'Key-chain', 'Ice-cream', 'Eggs']],
                 dtype=object)

# Wrap a function that takes the first character of each item
# in the array (for simplification)
take_first = lambda x: x[0]
f = np.vectorize(take_first)

# Apply the function to every transaction
# Note that we also use frozenset to remove duplicates from each transaction
data = [frozenset(f(i)) for i in items]
```

Output:

```python
[frozenset({'E', 'K', 'M', 'N', 'O', 'Y'}),
 frozenset({'D', 'E', 'K', 'N', 'O', 'Y'}),
 frozenset({'A', 'E', 'K', 'M'}),
 frozenset({'C', 'K', 'M', 'U', 'Y'}),
 frozenset({'C', 'E', 'I', 'K', 'O'})]
```
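
`np.vectorize` is a convenience wrapper rather than a performance tool (the NumPy documentation describes its implementation as essentially a for loop), so for small ragged data like this a plain comprehension gives the same result. A minimal sketch, assuming the transactions are kept as ordinary Python lists; the names `transactions` and `initials` are illustrative and not part of the original code:

```python
# Transactions as plain Python lists (rows may have different lengths).
transactions = [
    ['Mango', 'Onion', 'Nintendo', 'Key-chain', 'Eggs', 'Yo-yo'],
    ['Doll', 'Onion', 'Nintendo', 'Key-chain', 'Eggs', 'Yo-yo'],
    ['Mango', 'Apple', 'Key-chain', 'Eggs'],
    ['Mango', 'Umbrella', 'Corn', 'Key-chain', 'Yo-yo'],
    ['Corn', 'Onion', 'Onion', 'Key-chain', 'Ice-cream', 'Eggs'],
]

# Take the first character of every item; frozenset drops duplicates
# within a transaction (e.g. the repeated 'Onion' in the last row).
initials = [frozenset(item[0] for item in row) for row in transactions]
```

This avoids NumPy entirely for this step, which may be simpler when the rows are not all the same length.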

In [1]:
import numpy as np
import pandas as pd

from functools import reduce
from itertools import combinations

from collections import Counter
from sklearn.preprocessing import LabelEncoder

In [2]:
# Every row is a transaction, and every entry in a row is an item bought
# Note that the same item can appear more than once in a single transaction
# dtype=object is needed because the rows have different lengths
items = np.array([['Mango', 'Onion', 'Nintendo', 'Key-chain', 'Eggs', 'Yo-yo'],
                  ['Doll', 'Onion', 'Nintendo', 'Key-chain', 'Eggs', 'Yo-yo'],
                  ['Mango', 'Apple', 'Key-chain', 'Eggs'],
                  ['Mango', 'Umbrella', 'Corn', 'Key-chain', 'Yo-yo'],
                  ['Corn', 'Onion', 'Onion', 'Key-chain', 'Ice-cream', 'Eggs']],
                 dtype=object)

In [34]:
# Create a new label encoder to learn the mappings
le = LabelEncoder()

# Fit the encoder on all items, flattened into a single 1-D array
le.fit(np.hstack(items))
# le.fit(list(set(np.hstack(items)))) # Will this be more performant?

print('Each index `i` will represent a class:\n')
for i, v in enumerate(le.classes_):
    print('{} => {}'.format(i, v))

# le.transform(['A', 'M'])

Each index `i` will represent a class:

0 => Apple
1 => Corn
2 => Doll
3 => Eggs
4 => Ice-cream
5 => Key-chain
6 => Mango
7 => Nintendo
8 => Onion
9 => Umbrella
10 => Yo-yo
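
For intuition, the mapping listed above can be reproduced with a plain dictionary: sort the distinct item names and number them in order. This is only a sketch of the idea, not a replacement for `LabelEncoder`; the names `transactions`, `encode`, `decode` and `encoded` are illustrative assumptions:

```python
transactions = [
    ['Mango', 'Onion', 'Nintendo', 'Key-chain', 'Eggs', 'Yo-yo'],
    ['Doll', 'Onion', 'Nintendo', 'Key-chain', 'Eggs', 'Yo-yo'],
    ['Mango', 'Apple', 'Key-chain', 'Eggs'],
    ['Mango', 'Umbrella', 'Corn', 'Key-chain', 'Yo-yo'],
    ['Corn', 'Onion', 'Onion', 'Key-chain', 'Ice-cream', 'Eggs'],
]

# Distinct item names in sorted order, mirroring the class listing above.
classes = sorted({item for row in transactions for item in row})

# Forward (name -> index) and reverse (index -> name) lookups.
encode = {name: idx for idx, name in enumerate(classes)}
decode = dict(enumerate(classes))

# Encode every transaction; frozenset removes duplicate items.
encoded = [frozenset(encode[item] for item in row) for row in transactions]
```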


In [4]:
# Now that we have learned the mappings, let's apply them to our dataset
# We also remove duplicate items from each transaction using frozenset
# frozenset additionally supports set operations such as issubset and difference, which we use later
encoded_items = [frozenset(le.transform(i)) for i in items]
encoded_items

[frozenset({3, 5, 6, 7, 8, 10}),
 frozenset({2, 3, 5, 7, 8, 10}),
 frozenset({0, 3, 5, 6}),
 frozenset({1, 5, 6, 9, 10}),
 frozenset({1, 3, 4, 5, 8})]

In [5]:
# Minimum support threshold, expressed as a number of transactions
# The filters below use a strict comparison (v > MIN_SUPPORT), so an itemset
# must appear in more than 2 transactions (i.e. at least 3) to be kept
MIN_SUPPORT = 2

In [6]:
# Get all the individual items so that we can count them
singles = map(lambda x: list(x), encoded_items)
singles = reduce(lambda x, y: y + x, singles)
singles

[1,
 3,
 4,
 5,
 8,
 1,
 5,
 6,
 9,
 10,
 0,
 3,
 5,
 6,
 2,
 3,
 5,
 7,
 8,
 10,
 3,
 5,
 6,
 7,
 8,
 10]

In [7]:
# Get the number of occurrences of each item across the transactions
singles_cnt = Counter(singles)
singles_cnt

Counter({0: 1, 1: 2, 2: 1, 3: 4, 4: 1, 5: 5, 6: 3, 7: 2, 8: 3, 9: 1, 10: 3})
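
The flatten-then-count steps above can also be done in one pass. A minimal sketch, assuming the same encoded transactions as above; `item_counts` is an illustrative name and `itertools.chain` is swapped in for the `map`/`reduce` pair:

```python
from collections import Counter
from itertools import chain

encoded_items = [
    frozenset({3, 5, 6, 7, 8, 10}),
    frozenset({2, 3, 5, 7, 8, 10}),
    frozenset({0, 3, 5, 6}),
    frozenset({1, 5, 6, 9, 10}),
    frozenset({1, 3, 4, 5, 8}),
]

# chain.from_iterable walks every item of every transaction lazily,
# so no intermediate flattened list is built before counting.
item_counts = Counter(chain.from_iterable(encoded_items))
```

Because each transaction is a set, every item is counted at most once per transaction, so each count is the item's support.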

In [8]:
# Once we have the counter, we keep only the items that appear in more than
# MIN_SUPPORT transactions (strict comparison, so at least 3 with MIN_SUPPORT = 2)
singles_cnt = {k: v for k, v in singles_cnt.items()
               if v > MIN_SUPPORT}
singles_cnt

{3: 4, 5: 5, 6: 3, 8: 3, 10: 3}

In [9]:
# Now that we have the frequent single items, we
# check the support for pairs of those items
data_singles = list(singles_cnt.keys())

# We use itertools.combinations to ensure only unique combinations are created
# r is the number of items in each combination
pairs = list(combinations(data_singles, r=2))
pairs

[(3, 5),
 (3, 8),
 (3, 6),
 (3, 10),
 (5, 8),
 (5, 6),
 (5, 10),
 (8, 6),
 (8, 10),
 (6, 10)]

In [10]:
# For each candidate pair, record one entry per transaction that contains it
pairs_fs = []
for i in pairs:
    for j in encoded_items:
        if frozenset(i).issubset(j):
            pairs_fs.append(i)

pairs_fs = Counter(pairs_fs)
pairs_fs

Counter({(3, 5): 4,
         (3, 6): 2,
         (3, 8): 3,
         (3, 10): 2,
         (5, 6): 3,
         (5, 8): 3,
         (5, 10): 3,
         (6, 10): 2,
         (8, 6): 1,
         (8, 10): 2})
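
The same counting loop is needed again for triples, so it can be factored into a helper. A minimal sketch; `count_support` is a hypothetical helper name, not something defined in the original notebook:

```python
from collections import Counter
from itertools import combinations

def count_support(candidates, transactions):
    """Count, for each candidate itemset, the transactions that contain it."""
    counts = Counter()
    for candidate in candidates:
        candidate_set = frozenset(candidate)
        # `<=` on sets is the subset test; True counts as 1 in sum().
        counts[candidate] = sum(candidate_set <= t for t in transactions)
    return counts

encoded_items = [
    frozenset({3, 5, 6, 7, 8, 10}),
    frozenset({2, 3, 5, 7, 8, 10}),
    frozenset({0, 3, 5, 6}),
    frozenset({1, 5, 6, 9, 10}),
    frozenset({1, 3, 4, 5, 8}),
]

# Candidate pairs built from the frequent single items found earlier.
pair_counts = count_support(combinations([3, 5, 6, 8, 10], 2), encoded_items)
```

Note that the candidates here are generated from a sorted list, so the pair of items 6 and 8 appears as `(6, 8)` rather than `(8, 6)`.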

In [11]:
# Again, we keep only the pairs that appear in more than MIN_SUPPORT transactions
pairs_fs = {k: v for k, v in list(pairs_fs.items()) if v > MIN_SUPPORT}
pairs_fs

{(3, 5): 4, (3, 8): 3, (5, 6): 3, (5, 8): 3, (5, 10): 3}

In [18]:
# Next, we prepare the candidates for triples from the frequent pairs
data_pairs = [list(i) for i in pairs_fs.keys()]

# Flatten it to get all the possible values
data_pairs = list(set(reduce(lambda x, y: x + y, data_pairs)))

data_pairs

[3, 5, 6, 8, 10]

In [25]:
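data_pairs_recommend = [list(i) for i in pairs_fs.keys()]
# Provide recommendations for each frequent pair: for every transaction that
# contains the pair, suggest the remaining items of that transaction
# Only transactions with fewer than 3 remaining items produce a suggestion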
for i in data_pairs_recommend:
    for j in encoded_items:
        if frozenset(i).issubset(j):
            items_to_recommend = list(j.difference(i))
            if len(items_to_recommend) < 3:
                items_to_recommend_classes = ', '.join(le.inverse_transform(items_to_recommend))
                items_current_classes = ', '.join(le.inverse_transform(i))
                print('If you buy {}, you might like {} too.'.format(items_current_classes, items_to_recommend_classes))

If you buy Eggs, Key-chain, you might like Apple, Mango too.
If you buy Key-chain, Mango, you might like Apple, Eggs too.
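
The suggestions above simply list whatever else appeared in a matching transaction. A common way to rank such suggestions is the confidence of a rule A → B, the support of A ∪ B divided by the support of A. The original cells do not compute this; the following is only a sketch, and `support_count` and `confidence` are hypothetical helper names:

```python
def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    target = frozenset(itemset)
    return sum(target <= t for t in transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    base = support_count(antecedent, transactions)
    both = frozenset(antecedent) | frozenset(consequent)
    return support_count(both, transactions) / base if base else 0.0

encoded_items = [
    frozenset({3, 5, 6, 7, 8, 10}),
    frozenset({2, 3, 5, 7, 8, 10}),
    frozenset({0, 3, 5, 6}),
    frozenset({1, 5, 6, 9, 10}),
    frozenset({1, 3, 4, 5, 8}),
]

# Rule {Eggs, Key-chain} -> {Onion}, i.e. {3, 5} -> {8}:
# 3 of the 4 transactions containing {3, 5} also contain 8.
conf = confidence({3, 5}, {8}, encoded_items)  # 0.75
```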


In [13]:
triples = list(combinations(data_pairs, r=3))
triples

[(3, 5, 6),
 (3, 5, 8),
 (3, 5, 10),
 (3, 6, 8),
 (3, 6, 10),
 (3, 8, 10),
 (5, 6, 8),
 (5, 6, 10),
 (5, 8, 10),
 (6, 8, 10)]
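
The cell above builds every 3-combination of the remaining items. The Apriori property (every subset of a frequent itemset must itself be frequent) allows most of these candidates to be discarded before any counting. A minimal sketch of that pruning step; `generate_candidates` is a hypothetical helper, and the frequent pairs are copied from the filtered result shown earlier:

```python
from itertools import combinations

def generate_candidates(frequent_itemsets, k):
    """Build size-k candidates whose (k-1)-item subsets are all frequent."""
    frequent = {frozenset(s) for s in frequent_itemsets}
    items = sorted(set().union(*frequent)) if frequent else []
    candidates = []
    for combo in combinations(items, k):
        # Keep the candidate only if every (k-1)-subset is known to be frequent.
        if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1)):
            candidates.append(combo)
    return candidates

frequent_pairs = [(3, 5), (3, 8), (5, 6), (5, 8), (5, 10)]
triple_candidates = generate_candidates(frequent_pairs, 3)  # [(3, 5, 8)]
```

With this data, only one of the ten triples survives pruning, so only one candidate needs to be counted against the transactions.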

In [14]:
# For each candidate triple, record one entry per transaction that contains it
triples_fs = []
for i in triples:
    for j in encoded_items:
        if frozenset(i).issubset(j):
            triples_fs.append(i)

triples_fs = Counter(triples_fs)
triples_fs

Counter({(3, 5, 6): 2,
         (3, 5, 8): 3,
         (3, 5, 10): 2,
         (3, 6, 8): 1,
         (3, 6, 10): 1,
         (3, 8, 10): 2,
         (5, 6, 8): 1,
         (5, 6, 10): 2,
         (5, 8, 10): 2,
         (6, 8, 10): 1})

In [15]:
# Again, we keep only the triples that appear in more than MIN_SUPPORT transactions
triples_fs = {k: v for k, v in list(triples_fs.items()) if v > MIN_SUPPORT}
triples_fs

{(3, 5, 8): 3}

In [17]:
# Recommend items to users who purchased 3, 5 and 8 (Eggs, Key-chain, Onion)
data_triples = [list(i) for i in triples_fs.keys()]
data_triples

for i in data_triples:
    for j in encoded_items:
        if frozenset(i).issubset(j):
            items_to_recommend = list(j.difference(i))
            items_to_recommend_classes = ', '.join(le.inverse_transform(items_to_recommend))
            items_current_classes = ', '.join(le.inverse_transform(i))
            print('If you buy {}, you might like {} too.'.format(items_current_classes, items_to_recommend_classes))

If you buy Eggs, Key-chain, Onion, you might like Mango, Nintendo, Yo-yo too.
If you buy Eggs, Key-chain, Onion, you might like Doll, Nintendo, Yo-yo too.
If you buy Eggs, Key-chain, Onion, you might like Corn, Ice-cream too.
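
The singles, pairs and triples steps above follow the same pattern: count candidates, filter by support, build larger candidates. They can be folded into one level-wise loop. The following is a minimal sketch under stated assumptions, not a definitive implementation: the function name `apriori` and the parameter `min_count` are illustrative, and `min_count` is inclusive, so `min_count=3` corresponds to the strict `v > MIN_SUPPORT` filter used above.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: count} for itemsets found in at least `min_count` transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: every distinct item is a candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Build size k+1 candidates from the items still frequent at this level,
        # keeping only those whose size-k subsets are all frequent.
        k += 1
        items = sorted(set().union(*level)) if level else []
        candidates = {
            frozenset(combo)
            for combo in combinations(items, k)
            if all(frozenset(sub) in level for sub in combinations(combo, k - 1))
        }
    return frequent

encoded_items = [
    frozenset({3, 5, 6, 7, 8, 10}),
    frozenset({2, 3, 5, 7, 8, 10}),
    frozenset({0, 3, 5, 6}),
    frozenset({1, 5, 6, 9, 10}),
    frozenset({1, 3, 4, 5, 8}),
]

frequent = apriori(encoded_items, min_count=3)
```

On this dataset the result contains the five frequent single items, the five frequent pairs and the single frequent triple `{3, 5, 8}` found step by step above.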
