# Association analyses - Theory
---

This type of analysis is often referred to as *market basket analysis*,
because the most common initial use case is to find out which items from a range of products are purchased together particularly frequently in order to derive business-enhancing activities.

These analyses therefore look for *rules* of the form "If $X$, then $Y$"
($X \rightarrow Y$ for short), where $X$ and $Y$ are so-called *item sets*. *Item* is the technical term for an
object of investigation. In market basket analysis, the items are, for example, articles in a supermarket or at an online retailer. Almost everyone has encountered such
rules in use: when shopping online, you receive *purchase recommendations* for products that
are similar to or complement the one you are interested in.
The business objective of such market basket analyses is therefore *cross-selling*.

The rules sought must be filtered out of the existing sales data with virtually no prior knowledge. The association analysis is therefore a method of *unsupervised learning*.

To illustrate this, let's look at the following table. When looking at ReceiptNo 47110816, the items with ItemNo 2 and 530 are purchased together (is it beer and diapers?...).


  ReceiptNo | CustomerNo | ItemNo | Quantity
:---------:|:--------:|:---------:|:-------
47110815 | 649 | 30 | 4
47110816 | 563 | 2 | 1
47110816 | 563 | 530 | 2
47110817 | 43 | 122 | 3
... | ... | ... | ...


 In the following, we will take a closer look at the
calculations that market basket analyses require.
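
As a minimal sketch of how such a table turns into shopping baskets, the rows can be grouped by receipt number. The tuple layout and the names `rows` and `baskets` are assumptions made for illustration; only the four rows shown in the table are used. Quantities are ignored here, because the analysis only asks whether an item occurs in a basket.

```python
from collections import defaultdict

# Rows as (ReceiptNo, CustomerNo, ItemNo, Quantity), copied from the table above.
rows = [
    (47110815, 649, 30, 4),
    (47110816, 563, 2, 1),
    (47110816, 563, 530, 2),
    (47110817, 43, 122, 3),
]

# Collect the item numbers of each receipt into one set (one basket per receipt).
baskets = defaultdict(set)
for receipt_no, _customer_no, item_no, _quantity in rows:
    baskets[receipt_no].add(item_no)

for receipt_no, items in sorted(baskets.items()):
    print(receipt_no, sorted(items))
```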

## Terms of market basket analyses
---

The analyses are always based on a set $I = \{ i_1, i_2, \dots, i_m\}$ of $m$ items
$i_k$, with $k=1,\dots,m$. On top of this, *transactions* $T\subseteq I$ are considered. In classic market basket analysis, a transaction
represents one customer's purchase, in which (usually) several items are bought together. More generally, transactions represent events in which the items under investigation occur together. Finally, there is a
*database* $D = \{ T_1, T_2, \dots, T_n\}$ of $n$ transactions $T_j,\ j=1,\dots,n$, which is also called the
*population*. The goal is then to find rules

$$
 X \rightarrow Y\quad\text{with}\quad X,Y \subseteq I\quad\text{and}\quad X\cap Y = \emptyset
$$

i.e. $X$ and $Y$ must be *disjoint*. $X$ is called the *body*
(or the *antecedent*) of the rule, while $Y$ is the *head* (or the *consequent*,
in the sense of logical reasoning) of the rule.

In the context of classical market basket analysis, the requirement that $X$ and $Y$ have an empty intersection
means that an item may not occur in both the body and the head of a rule.
Allowing this would make no sense anyway: for an item $i$, the rule $\{i\}\rightarrow\{i\}$ is always fulfilled, because $i$ trivially occurs together with itself.
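
The definitions above map quite directly onto Python sets. The following is only a sketch: the item names, the toy baskets and the helper `is_valid_rule` are hypothetical and serve purely as illustration.

```python
# Hypothetical toy data: item names and baskets are made up for illustration.
items = frozenset({"beer", "diapers", "chips", "milk"})  # the item set I

database = [  # the database D, one frozenset per transaction T
    frozenset({"beer", "diapers"}),
    frozenset({"beer", "chips"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"milk"}),
]

def is_valid_rule(body, head, items):
    """Check the conditions X, Y ⊆ I and X ∩ Y = ∅ for a rule X -> Y."""
    body, head = frozenset(body), frozenset(head)
    return body <= items and head <= items and body.isdisjoint(head)

# Every transaction must itself be a subset of I.
print(all(t <= items for t in database))
print(is_valid_rule({"beer"}, {"diapers"}, items))       # disjoint item sets
print(is_valid_rule({"beer"}, {"beer", "milk"}, items))  # "beer" on both sides
```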

## Key figures of market basket analyses
---

### Support and Confidence
The two most important key figures are
* the *support* ($=$ relative frequency in the population)
$$
       \begin{aligned}
         \text{sup}(X) & = \frac{\lvert\lbrace T \in D \mid X \subseteq T \rbrace\rvert}{\lvert D \rvert}\\
         \text{sup}(X \rightarrow Y) & = \frac{\lvert\lbrace T \in D \mid X \cup Y \subseteq T \rbrace\rvert}{\lvert D \rvert}
         = \text{sup}(X\cup Y) = \text{sup}(Y \rightarrow X)
       \end{aligned}
$$
* the *confidence* ($=$ relative frequency within the part of the population that contains $X$)
$$
       \text{conf}(X \rightarrow Y) =
         \frac{\lvert\lbrace T \in D \mid X \cup Y \subseteq T \rbrace\rvert}{\lvert\lbrace T \in D \mid X \subseteq T \rbrace\rvert} =
         \frac{\text{sup}(X \rightarrow Y)}{\text{sup}(X)}
$$

 
Looking at these formulas, the key observation is that the following **inequality** always holds:
 
$$
\text{sup}(X\rightarrow Y) \leq \text{sup}(X)\qquad(1)
$$
 
because $X \cup Y \subseteq T$ is a more restrictive condition than $X \subseteq T$. Obviously, the set $X \cup Y$ contains at least as many items as $X$ (or $Y$) alone. Thus, the statement of the *inequality* is simply that all items from $X$ and $Y$ together are purchased at most as often as the items from $X$ alone. In particular, this means that
 
$$
0 \leq \text{sup}(X),\quad \text{conf}(X\rightarrow Y) \leq 1
$$

i.e. both *Support* and *Confidence* can only assume values between 0 and 1
(which of course also follows from their definition as relative frequencies).
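
A minimal sketch of both key figures in plain Python, assuming transactions are stored as frozensets. The toy database and the function names `support` and `confidence` are illustrative assumptions, not taken from any particular library.

```python
def support(itemset, database):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in database if itemset <= t) / len(database)

def confidence(body, head, database):
    """sup(X ∪ Y) / sup(X); undefined if the body X never occurs."""
    sup_body = support(body, database)
    if sup_body == 0:
        raise ValueError("body never occurs, confidence is undefined")
    return support(frozenset(body) | frozenset(head), database) / sup_body

# Hypothetical toy database with four transactions.
database = [
    frozenset({"beer", "diapers"}),
    frozenset({"beer", "chips"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"milk"}),
]

sup_x = support({"beer"}, database)              # 3 of 4 transactions
sup_xy = support({"beer", "diapers"}, database)  # 2 of 4 transactions
print(sup_x, sup_xy)
print(confidence({"beer"}, {"diapers"}, database))
print(confidence({"diapers"}, {"beer"}, database))

# Inequality (1): the rule's support never exceeds the support of its body.
print(sup_xy <= sup_x)
```

Note that the two confidences differ although both rules share the same support, which is exactly the asymmetry of the confidence.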

### Lift
Note that *support* can also be interpreted as *expected confidence*. Under an *empty assumption* (i.e. with no more detailed information about the population), the entire population must be considered, so the confidence of finding $X$ is simply its relative frequency in $D$. Thus one can state

$$
\text{sup}(X) = \text{conf}_{\text{exp}}(X)
$$

From this, another *measure of interest* for a rule can be derived, the so-called
*lift*. The concept originally comes from marketing and tries to answer the question of how much "better" (in whatever sense)
off we are when certain information is available to us compared with not having it
(ultimately a similar concept to the *information gain* in decision trees).
In our context, the lift is calculated as the ratio of the confidence to the expected confidence, i.e.

$$
\begin{aligned}
 \text{lift}(X \rightarrow Y) = \frac{\text{conf}(X \rightarrow Y)}{\text{conf}_{\text{exp}}(Y)}
                         &= \frac{\text{conf}(X \rightarrow Y)}{\text{sup}(Y)} \\
                         &= \frac{\text{sup}(X \cup Y)}{\text{sup}(X)\cdot\text{sup}(Y)}
                          = \text{lift}(Y \rightarrow X)
\end{aligned}
$$

This immediately yields the following interpretation:
* $\text{lift}(X \rightarrow Y ) > 1$: *Complementary effect*
* $\text{lift}(X \rightarrow Y ) < 1$: *Substitution effect*

i.e. if the lift is greater than 1, the chance of also selling the items from $Y$ improves
once you know that the items in $X$ were purchased; if it is less than 1,
the chance deteriorates. A lift of exactly 1 means that $\text{sup}(X \cup Y) = \text{sup}(X)\cdot\text{sup}(Y)$, i.e. $X$ and $Y$ occur independently of each other.

**Note:** Looking at the formulas for support, confidence and lift again, you can see that support
and lift are *symmetrical* with regard to the body and the head of the rule, whereas
confidence depends very much on which item set is the body and which is the head.
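
The lift formula and its symmetry can be checked with a few lines of Python. This is again only a sketch on a hypothetical toy database; the function names are assumptions, and both item sets are assumed to occur at least once so that no division by zero arises.

```python
def support(itemset, database):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in database if itemset <= t) / len(database)

def lift(body, head, database):
    """sup(X ∪ Y) / (sup(X) * sup(Y)); assumes sup(X) > 0 and sup(Y) > 0."""
    body, head = frozenset(body), frozenset(head)
    return support(body | head, database) / (
        support(body, database) * support(head, database)
    )

# Hypothetical toy database with four transactions.
database = [
    frozenset({"beer", "diapers"}),
    frozenset({"beer", "chips"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"milk"}),
]

print(lift({"beer"}, {"diapers"}, database))  # greater than 1: complementary effect
print(lift({"beer"}, {"milk"}, database))     # less than 1: substitution effect

# Swapping body and head does not change the lift.
print(lift({"beer"}, {"diapers"}, database) == lift({"diapers"}, {"beer"}, database))
```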

## Example
---
We take the famous beer-and-diapers example and back it up with numbers. We assume that the *database* has already been evaluated and has yielded the numbers given in the following table.


   Transactions | Number
:--------------- | ----------------:
Total | 3,000,000
Beer (B) | 300,000
Diapers (D) | 500,000
B & D | 150,000

The parameters *Support, Confidence* and *Lift* can therefore be calculated from this.

The result is

$$
\begin{aligned}
\text{sup}(B) &= \frac{300{,}000}{3{,}000{,}000} = 10\%\\
\text{sup}(D) &= \frac{500{,}000}{3{,}000{,}000} = 16.67\%\\
\text{sup}(D\rightarrow B) &= \frac{150{,}000}{3{,}000{,}000} = 5\% = \text{sup}(B\rightarrow D)\\
\text{conf}(D \rightarrow B) & = \frac{150{,}000}{500{,}000} = 30\%\\
\text{conf}(B \rightarrow D) & = \frac{150{,}000}{300{,}000} = 50\%\\
\text{lift}(D \rightarrow B) & = \frac{\text{conf}(D \rightarrow B)}{\text{conf}_{\text{exp}}(B)}
                               = \frac{30\%}{10\%} = 3 = \text{lift}(B \rightarrow D)
\end{aligned}
$$

The confidence of being able to sell diapers when beer is bought (50%) is therefore higher than
in the opposite case (30%). However, the chance of selling beer triples (from 10% to 30%) when
diapers are bought!
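
The arithmetic of this example can be reproduced in a few lines. The sketch below uses `fractions.Fraction` from the standard library so that the ratios stay exact; the variable names are chosen for illustration only.

```python
from fractions import Fraction

# Transaction counts from the table above.
n_total = 3_000_000
n_beer = 300_000
n_diapers = 500_000
n_both = 150_000

sup_beer = Fraction(n_beer, n_total)
sup_diapers = Fraction(n_diapers, n_total)
sup_both = Fraction(n_both, n_total)      # sup(D -> B) = sup(B -> D)

conf_d_to_b = Fraction(n_both, n_diapers)  # conf(D -> B)
conf_b_to_d = Fraction(n_both, n_beer)     # conf(B -> D)

lift = sup_both / (sup_beer * sup_diapers)  # identical for both directions

for name, value in [
    ("sup(B)", sup_beer),
    ("sup(D)", sup_diapers),
    ("sup(D -> B)", sup_both),
    ("conf(D -> B)", conf_d_to_b),
    ("conf(B -> D)", conf_b_to_d),
]:
    print(f"{name}: {float(value):.2%}")
print(f"lift: {float(lift):.2f}")
```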

## A-priori algorithm
---
As trivial as the *inequality* (1) and its interpretation may seem, it forms the basis of the so-called *A-priori algorithm*, on which many
algorithms for association analysis are based. The idea is quite simple. First of all,
a *lower limit for the support* (the *minimum support*) is specified that a rule must reach to be considered at all. The algorithm then searches for so-called *large item sets*. An item set is large if its support is at least as large as the required minimum support.

Approaching this search naively would lead to an algorithm with very high complexity, since $m$ items already give rise to $2^m - 1$ non-empty candidate item sets,
i.e. the runtimes increase immensely. This is where the *inequality* (1) comes into play again, because
it ensures that an item set can only be large if *every subset* of this
item set is *large* as well. With this, we can first find the large item sets consisting of one element, i.e. single products that were bought often enough. Each two-element large item set must then be a combination of these items, although not all such combinations will be large as well. Larger item sets are built up level by level in the same way. The algorithm therefore stops quite soon, and all disjoint subsets of the
large item sets can be examined for rule generation.
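
The level-wise search described above can be sketched as follows. This is a deliberately simple illustration, not an optimized implementation: the function name `large_itemsets`, the toy database and the threshold are assumptions, and support is recomputed by scanning all transactions for every candidate.

```python
from itertools import combinations

def support(itemset, database):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in database if itemset <= t) / len(database)

def large_itemsets(database, min_support):
    """Level-wise search; returns {item set: support} for all large item sets."""
    all_items = sorted({item for t in database for item in t})
    # Level 1: single items that reach the minimum support.
    current = {}
    for item in all_items:
        s = support(frozenset({item}), database)
        if s >= min_support:
            current[frozenset({item})] = s
    result = dict(current)
    k = 2
    while current:
        # Candidates: unions of two large (k-1)-item sets with exactly k items ...
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # ... whose (k-1)-subsets are all large (pruning based on inequality (1)).
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c, database)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result

# Hypothetical toy database with five transactions.
database = [
    frozenset({"beer", "diapers", "chips"}),
    frozenset({"beer", "diapers"}),
    frozenset({"beer", "chips"}),
    frozenset({"milk", "diapers", "beer"}),
    frozenset({"milk"}),
]

found = large_itemsets(database, min_support=0.4)
for itemset, s in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In this toy example the three-element candidate is discarded without counting its support, because one of its two-element subsets is not large.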

To determine the actual rules, a *minimum confidence* and often also a maximum number of items in the large item sets
are specified in addition to the minimum support.
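
Rule generation from the large item sets might look like the sketch below. The dictionary `large` is a hypothetical result of a previous search, and `generate_rules` with its parameters `min_confidence` and `max_items` is an illustrative helper, not a library function. Each large item set is split into a body and a head; rules on smaller item sets are covered because every subset of a large item set is itself large and therefore also appears in the dictionary.

```python
from itertools import combinations

# Hypothetical result of a large-item-set search: {item set: support}.
large = {
    frozenset({"beer"}): 0.8,
    frozenset({"diapers"}): 0.6,
    frozenset({"chips"}): 0.4,
    frozenset({"milk"}): 0.4,
    frozenset({"beer", "diapers"}): 0.6,
    frozenset({"beer", "chips"}): 0.4,
}

def generate_rules(large, min_confidence, max_items=None):
    """Return (body, head, support, confidence) for all sufficiently confident rules."""
    rules = []
    for itemset, sup in large.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        if max_items is not None and len(itemset) > max_items:
            continue
        # Split the item set into every non-empty body X and head Y = itemset \ X.
        for r in range(1, len(itemset)):
            for body_items in combinations(sorted(itemset), r):
                body = frozenset(body_items)
                head = itemset - body
                conf = sup / large[body]  # the body is large, so its support is known
                if conf >= min_confidence:
                    rules.append((body, head, sup, conf))
    return rules

rules = generate_rules(large, min_confidence=0.7)
for body, head, sup, conf in rules:
    print(sorted(body), "->", sorted(head), f"sup={sup:.2f}", f"conf={conf:.2f}")
```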

Next, let's take a look at the [practical implementation in Python](2.2.a_Assoc_Practical.ipynb).