# Pattern mining: basic concepts and methods

## 4.1 - Basic concepts

### 4.1.1 - Market basket analysis: a motivating example

- A set of items is referred to as an itemset.
- An example of frequent itemset mining is <b>market basket analysis</b>.
- Each item can be treated as a Boolean variable indicating its presence or absence, so each basket can be represented by a Boolean vector of values assigned to these variables.
- These patterns can be represented in the form of <b>association rules</b>.
- Rule <b>support</b> and <b>confidence</b> are two measures of rule interestingness.
    - A support of 2% means that 2% of all the transactions under analysis show that computer and antivirus software are purchased together.
    - A confidence of 60% means that 60% of the customers who purchased a computer also bought the software.
    - <b>minimum support threshold</b>
    - <b>minimum confidence threshold</b>
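
The Boolean-vector representation and the two measures above can be illustrated with a minimal sketch. The five transactions below are hypothetical, made up only for illustration, and the helper names `to_boolean_vector`, `support`, and `confidence` are assumptions of this sketch rather than part of any library.

```python
# Hypothetical toy transactions, made up for illustration only.
transactions = [
    {"computer", "antivirus"},
    {"computer", "antivirus", "mouse"},
    {"computer", "mouse"},
    {"keyboard"},
    {"computer", "antivirus", "keyboard"},
]

# Fix an item order so every basket maps to a Boolean vector of equal length.
item_order = sorted(set().union(*transactions))


def to_boolean_vector(basket):
    """Return one Boolean per item: True if the item is in the basket."""
    return [item in basket for item in item_order]


vectors = [to_boolean_vector(t) for t in transactions]


def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


rule_support = support({"computer", "antivirus"})
rule_confidence = confidence({"computer"}, {"antivirus"})

print(item_order)
print(vectors[0])
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

In this toy data, three of the five transactions contain both items and four contain a computer, so the rule computer => antivirus has a support of 60% and a confidence of 75%.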

In [10]:
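import numpy as np

np.random.seed(123)  # fixed seed so the generated baskets are reproducible

items = ["computer", "keyboard", "mouse", "monitor", "chair", "energy drink"]
# Caution: this draws a random upper bound in [0, len(items)) rather than using len(items).
n_items = np.random.randint(len(items))

# Generate 200 random baskets.
baskets = []
for i in range(200):
    b = []
    # Number of draws for this basket, in [1, n_items).
    j = np.random.randint(1, n_items)
    for k in range(j):
        # Index in [1, n_items): index 0 ("computer") is never drawn.
        idx = np.random.randint(1, n_items)
        if items[idx] not in b:  # skip duplicates within a basket
            b.append(items[idx])
    baskets.append(b)

# Count identical baskets; the key preserves item order, so
# "monitor, keyboard" and "keyboard, monitor" are counted separately.
basket_cnt = {}
for b in baskets:
    b_str = ', '.join(b)
    basket_cnt[b_str] = basket_cnt.get(b_str, 0) + 1

print(basket_cnt)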

{'monitor, keyboard': 7, 'mouse, chair, monitor': 2, 'mouse, monitor, keyboard': 3, 'monitor, chair': 8, 'keyboard, monitor': 6, 'chair': 17, 'chair, monitor': 6, 'keyboard': 17, 'keyboard, chair, monitor': 5, 'mouse': 18, 'chair, keyboard': 6, 'mouse, monitor': 11, 'chair, monitor, mouse': 3, 'monitor, mouse, keyboard, chair': 1, 'monitor': 15, 'chair, mouse, keyboard': 1, 'keyboard, monitor, mouse': 3, 'mouse, chair': 5, 'mouse, monitor, chair': 2, 'monitor, chair, mouse': 5, 'keyboard, monitor, chair': 1, 'chair, mouse': 14, 'mouse, keyboard': 5, 'mouse, chair, keyboard': 3, 'chair, mouse, monitor': 2, 'monitor, keyboard, mouse': 1, 'keyboard, chair': 7, 'monitor, mouse, keyboard': 3, 'monitor, mouse': 3, 'monitor, keyboard, chair': 1, 'monitor, mouse, chair': 2, 'chair, keyboard, mouse': 2, 'chair, keyboard, monitor': 1, 'keyboard, mouse': 5, 'keyboard, mouse, monitor': 1, 'mouse, keyboard, chair, monitor': 1, 'keyboard, chair, mouse': 1, 'mouse, keyboard, monitor, chair': 2, 'chai
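
The printed dictionary keeps 'monitor, keyboard' and 'keyboard, monitor' as separate keys, although both describe the same itemset. As a hedged sketch of one way to count itemsets regardless of order, the block below uses `frozenset` keys with `collections.Counter`. The list `example_baskets` is hypothetical and stands in for the generated `baskets`.

```python
from collections import Counter

# Hypothetical baskets, made up for illustration; two hold the same items in a different order.
example_baskets = [
    ["monitor", "keyboard"],
    ["keyboard", "monitor"],
    ["chair"],
    ["mouse", "chair", "monitor"],
    ["chair", "mouse"],
]

# frozenset is hashable and ignores order, so it can serve as a dictionary key.
itemset_cnt = Counter(frozenset(b) for b in example_baskets)

# Count how many baskets contain each individual item.
item_cnt = Counter(item for b in example_baskets for item in set(b))

for itemset, cnt in itemset_cnt.most_common():
    print(sorted(itemset), cnt)
```

Applying the same idea to the generated data would amount to replacing `example_baskets` with `baskets`; the resulting counts are not shown here because they depend on the random draws.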

### 4.1.2 - Frequent items, closed itemsets, and association rules

- The rule A => B holds in the transaction set D with <b>support</b> s, where s is the percentage of transactions in D that contain A ∪ B (that is, every item in A and every item in B).
- The rule A => B has <b>confidence</b> c in the transaction set D, where c is the percentage of transactions in D containing A that also contain B.
    - support(A => B) = P(A ∪ B)
    - confidence(A => B) = P(B | A)
- Rules that satisfy both a minimum support threshold (min_sup) and minimum confidence threshold (min_conf) are called <b>strong</b