# 10. Association rule mining

Association rule mining is a method for discovering interesting relationships between variables in large datasets. The method is designed for categorical data and is used to identify frequent patterns, associations, or correlations among sets of items in transaction databases or other data repositories.

The most common example of association rule mining is market basket analysis. In this analysis, the goal is to identify items that are frequently purchased together. For example, a customer who buys bread may also be more likely to buy butter.

## Association rules

Association rule mining works with categorical data. As an example, consider the following dataset, where each row represents a transaction. In each transaction, the customer buys a set of products.

| TID | Items                        |
| --- | ---------------------------- |
| 1   | {soap, milk, candy, fish}    |
| 2   | {milk, candy, fish}          |
| 3   | {fruit, milk, candy}         |
| 4   | {fruit, soap, milk, fish}    |
| 5   | {soap, milk, candy}          |
| 6   | {soap, milk, fish}           |
| 7   | {soap, fish}                 |


In this simple example, there are seven transactions, and the items come from five categories: {fruit, soap, milk, candy, fish}.

> This is just a tiny example to illustrate the concept. In real-world applications, there can be thousands or even millions of transactions and a large number of items. Efficient algorithms, such as the Apriori algorithm discussed below, make association rule mining feasible on such large datasets.

An association rule is a directed implication of the form X -> Y, where X and Y are disjoint itemsets (sets of items). X is called the antecedent and Y the consequent of the rule. The rule X -> Y states that transactions containing X tend to also contain Y.

Some examples of association rules derived from the above dataset are:
- {milk} -> {candy}
- {milk, candy} -> {fish}

The first rule states that if a customer buys milk, they are likely to buy candy as well. The second rule states that if a customer buys both milk and candy, they are likely to buy fish as well.

Pay attention to the fact that the rules are directed. The first rule does not imply that if a customer buys candy, he will buy milk. There may be a large number of candy-buyers who do not buy milk, even in the case that almost all milk-buyers also buy candy.

Moreover, association rules do not generally imply causation. The rule X -> Y does not mean that X causes Y. It only means that X and Y are associated, i.e. transactions that contain X tend to also contain Y.

## Support and confidence: measuring the goodness of rules

So far, we have constructed a couple of association rules, but we have no way of telling whether the rules are interesting or not. To evaluate the quality of an association rule, we use two metrics: support and confidence.

### Support

Support is always defined for an itemset. Whenever we talk about the support of a rule X -> Y, we are referring to the support of the itemset X ∪ Y, i.e. all items of the rule taken together. Support is the proportion of transactions in the database that contain the itemset.

For example, the support of the rule {milk, candy} -> {fish} is the proportion of transactions that contain all three items: milk, candy, and fish. In this case, these are transactions 1 and 2, so the support is 2/7, or 29%.

Intuitively speaking, a high support for a rule, or set of items, means that the rule is interesting because it can be applied to a large number of transactions. Thus, the rule is not marginal.
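
The definition of support translates directly into a few lines of Python. The following is a minimal sketch, assuming each transaction is stored as a Python set; the names `TRANSACTIONS` and `support` are illustrative and not part of any library. Fractions are used so that the result matches the hand calculation exactly.

```python
from fractions import Fraction

# The seven example transactions from the table above, one set per TID
TRANSACTIONS = [
    {"soap", "milk", "candy", "fish"},
    {"milk", "candy", "fish"},
    {"fruit", "milk", "candy"},
    {"fruit", "soap", "milk", "fish"},
    {"soap", "milk", "candy"},
    {"soap", "milk", "fish"},
    {"soap", "fish"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


# The support of the rule {milk, candy} -> {fish} is the support of X ∪ Y
x, y = {"milk", "candy"}, {"fish"}
print(support(x | y, TRANSACTIONS))  # 2/7
```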

### Confidence

Confidence is a measure of the reliability of the rule. It is the proportion of transactions containing the itemset X that also contain the itemset Y. It can also be interpreted as the conditional probability of Y given X.

For example, the confidence of the rule {milk, candy} -> {fish} is the proportion of transactions containing milk and candy that also contain fish. Milk and candy appear together in transactions 1, 2, 3, and 5, and fish appears in two of them (1 and 2), so the confidence is 2/4, or 50%.

Intuitively speaking, a high confidence for a rule means that the rule is interesting because it is reliable, or trustworthy.
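
Since confidence(X -> Y) = support(X ∪ Y) / support(X), confidence can be computed from two support values. The sketch below again uses illustrative names and assumes the transactions are Python sets. It reproduces the 2/4 = 1/2 computed above and also shows that rules are directed: {milk} -> {candy} and {candy} -> {milk} have different confidence.

```python
from fractions import Fraction

# The seven example transactions, one set per TID
TRANSACTIONS = [
    {"soap", "milk", "candy", "fish"},
    {"milk", "candy", "fish"},
    {"fruit", "milk", "candy"},
    {"fruit", "soap", "milk", "fish"},
    {"soap", "milk", "candy"},
    {"soap", "milk", "fish"},
    {"soap", "fish"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))


def confidence(antecedent, consequent, transactions):
    """supp(X ∪ Y) / supp(X); the antecedent X must occur in at least one transaction."""
    x, y = set(antecedent), set(consequent)
    return support(x | y, transactions) / support(x, transactions)


print(confidence({"milk", "candy"}, {"fish"}, TRANSACTIONS))  # 1/2
print(confidence({"milk"}, {"candy"}, TRANSACTIONS))          # 2/3
print(confidence({"candy"}, {"milk"}, TRANSACTIONS))          # 1
```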

## Apriori algorithm

The Apriori algorithm is a popular algorithm for mining frequent itemsets for boolean association rules. The algorithm is designed to operate on databases containing transactions, such as the ones we have seen above.

> Original citation: Agrawal, R. and Srikant, R. (1994) Fast Algorithms for Mining Association Rules in Large Databases. Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, Santiago de Chile, 12-15 September 1994, 487-499.

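The algorithm searches for frequent itemsets level by level: it uses a candidate generation function to build candidate itemsets of size k from the frequent itemsets of size k - 1, counts their support, and keeps the frequent ones. The algorithm terminates when no more frequent itemsets can be found.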

Let's see how the Apriori algorithm works in practice, using the example dataset above.

The algorithm has two hyperparameters:
- the minimum support threshold for an itemset to be considered frequent. This parameter controls the size of the search space of the algorithm. A lower value means a longer execution time and potentially many marginal rules.
- the minimum confidence threshold for an association rule to be considered interesting. This parameter filters out the rules that are not reliable. A lower value means more rules, but many of them will be unreliable.

In our example, let's set the minimum support threshold to 0.4 and the minimum confidence threshold to 0.75.

The Apriori algorithm proceeds in two phases. In the first phase, it finds all frequent itemsets in the database. In the second phase, it generates the association rules from the frequent itemsets.

### Phase 1: Find all frequent itemsets

In the first phase, the algorithm finds all itemsets whose support is at least the minimum support threshold. The algorithm starts with the itemsets of size 1 and iteratively increases the size of the itemsets until no more frequent itemsets can be found.

Let's see how the algorithm works with the example dataset.

The first thing to do is to find all itemsets of size 1, and calculate their support:

| Itemset | Support |
| ------- |---------|
| {fruit} | 2/7     |
| {soap}  | 5/7     |
| {milk}  | 6/7     |
| {candy} | 4/7     |
| {fish}  | 5/7     |

This is called the L1 candidate set.

At this point, the algorithm filters out the itemsets that do not meet the minimum support threshold. In this case, the itemset {fruit} is filtered out, as its support of 2/7 ≈ 0.29 is below the minimum support threshold of 0.4. As a result, the L1 frequent set is:

| Itemset | Support |
| ------- |---------|
| {soap}  | 5/7     |
| {milk}  | 6/7     |
| {candy} | 4/7     |
| {fish}  | 5/7     |
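
The first step can be sketched in Python by counting, for every single item, the number of transactions that contain it. This is a minimal illustration under the assumption that transactions are Python sets; the variable names are hypothetical. It prints the four frequent items with their supports and confirms that {soap} appears in 5 of the 7 transactions.

```python
from collections import Counter
from fractions import Fraction

# The seven example transactions, one set per TID
TRANSACTIONS = [
    {"soap", "milk", "candy", "fish"},
    {"milk", "candy", "fish"},
    {"fruit", "milk", "candy"},
    {"fruit", "soap", "milk", "fish"},
    {"soap", "milk", "candy"},
    {"soap", "milk", "fish"},
    {"soap", "fish"},
]
MIN_SUPPORT = Fraction(2, 5)  # the threshold 0.4 as an exact fraction

n = len(TRANSACTIONS)

# L1 candidate set: support of every single item
# (each transaction is a set, so an item is counted at most once per transaction)
counts = Counter(item for t in TRANSACTIONS for item in t)
l1_candidates = {frozenset([item]): Fraction(c, n) for item, c in counts.items()}

# L1 frequent set: keep the items whose support reaches the threshold
l1_frequent = {s: sup for s, sup in l1_candidates.items() if sup >= MIN_SUPPORT}

for itemset, sup in sorted(l1_frequent.items(), key=lambda kv: sorted(kv[0])):
    print(next(iter(itemset)), sup)
```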

Next, the algorithm generates the L2 candidate set by combining every pair of items in the L1 frequent set. The L2 candidate set is:

| Itemset         | Support |
| --------------- |---------|
| {soap, milk}    | 4/7     |
| {soap, candy}   | 2/7     |
| {soap, fish}    | 4/7     |
| {milk, candy}   | 4/7     |
| {milk, fish}    | 4/7     |
| {candy, fish}   | 2/7     |

Again, the algorithm filters out the itemsets that do not meet the minimum support threshold. In this case, the itemsets {soap, candy} and {candy, fish} are filtered out. The L2 frequent set is:

| Itemset         | Support |
| --------------- |---------|
| {soap, milk}    | 4/7     |
| {soap, fish}    | 4/7     |
| {milk, candy}   | 4/7     |
| {milk, fish}    | 4/7     |

The algorithm continues to generate larger itemsets until no more frequent itemsets can be found. In this case, the L3 candidate set is computed next. From this point onwards, a specific optimization is applied: the algorithm prunes the candidate set by removing the itemsets that have an infrequent subset. This pruning relies on the Apriori property: every subset of a frequent itemset must itself be frequent, so an itemset with an infrequent subset cannot be frequent.


The L3 candidate set, without support calculation, is:

| Itemset             | All size 2 subsets found in L2?   |
|---------------------|-----------------------------------|
| {soap, milk, fish}  | Yes                               |
| {soap, milk, candy} | No, as {soap, candy} is not in L2 |
| {milk, candy, fish} | No, as {candy, fish} is not in L2 |
| {soap, candy, fish} | No, as {soap, candy} is not in L2 |

This allows us to immediately discard the itemsets {soap, milk, candy}, {milk, candy, fish}, and {soap, candy, fish} from the candidate set, as each of them has an infrequent subset. Only {soap, milk, fish} remains, so its support is the only one that needs to be calculated:

| Itemset             | Support |
|---------------------|---------|
| {soap, milk, fish}  | 3/7     |

> The reason for utilizing the Apriori property is straightforward: as the previous frequent set was just computed and stored, looking up the subsets is very fast, whereas calculating the support of a candidate requires a pass over the whole database. This allows the algorithm to prune the candidate set efficiently. While not strictly necessary, pruning with the Apriori property is a significant optimization that makes the algorithm faster.
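
The pruning step can be sketched as a set-membership test: a candidate of size k is kept only if all its subsets of size k - 1 are in the previous frequent set. In this minimal example (hypothetical names; the L2 frequent set and the four L3 candidates are typed in from the tables above), only {soap, milk, fish} survives.

```python
from itertools import combinations

# L2 frequent set from the table above; a set gives fast membership lookups
L2_FREQUENT = {
    frozenset({"soap", "milk"}),
    frozenset({"soap", "fish"}),
    frozenset({"milk", "candy"}),
    frozenset({"milk", "fish"}),
}


def has_infrequent_subset(candidate, previous_frequent):
    """Apriori property check: True if some (k-1)-subset of `candidate` is not frequent."""
    k = len(candidate)
    return any(frozenset(sub) not in previous_frequent
               for sub in combinations(candidate, k - 1))


# The four size-3 candidates listed in the table above
l3_candidates = [
    frozenset({"soap", "milk", "fish"}),
    frozenset({"soap", "milk", "candy"}),
    frozenset({"milk", "candy", "fish"}),
    frozenset({"soap", "candy", "fish"}),
]

# Only the candidates that survive pruning need their support counted
l3_pruned = [c for c in l3_candidates if not has_infrequent_subset(c, L2_FREQUENT)]
print([sorted(c) for c in l3_pruned])  # [['fish', 'milk', 'soap']]
```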


The support of the only itemset in the L3 candidate set is 3/7 ≈ 0.43, which is higher than the minimum support threshold of 0.4. Therefore, the L3 frequent set is:

| Itemset             | Support |
|---------------------|---------|
| {soap, milk, fish}  | 3/7     |

As it is not possible to generate any candidate itemsets of size 4, the algorithm terminates. The frequent itemsets are:

| Itemset             | Support |
|---------------------|---------|
| {soap}              | 5/7     |
| {milk}              | 6/7     |
| {candy}             | 4/7     |
| {fish}              | 5/7     |
| {soap, milk}        | 4/7     |
| {soap, fish}        | 4/7     |
| {milk, candy}       | 4/7     |
| {milk, fish}        | 4/7     |
| {soap, milk, fish}  | 3/7     |

This concludes the first phase of the Apriori algorithm.
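
The whole first phase can be put together in a short level-wise loop. This is a simplified sketch, not the reference implementation from the original paper: as an assumption for brevity, candidates of size k are generated as unions of two frequent (k-1)-itemsets that have exactly k items, which yields every itemset that can pass the pruning step, although the intermediate candidate lists may differ from the tables above. The function name `find_frequent_itemsets` is illustrative. With a minimum support of 0.4, it finds the same nine frequent itemsets as the hand calculation.

```python
from fractions import Fraction
from itertools import combinations

# The seven example transactions, one set per TID
TRANSACTIONS = [
    {"soap", "milk", "candy", "fish"},
    {"milk", "candy", "fish"},
    {"fruit", "milk", "candy"},
    {"fruit", "soap", "milk", "fish"},
    {"soap", "milk", "candy"},
    {"soap", "milk", "fish"},
    {"soap", "fish"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))


def find_frequent_itemsets(transactions, min_support):
    """Level-wise search for all itemsets with support >= min_support."""
    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]), transactions)
        if s >= min_support:
            current[frozenset([item])] = s
    result = dict(current)

    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset must be frequent (Apriori property)
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support only for the candidates that survived pruning
        current = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


frequent = find_frequent_itemsets(TRANSACTIONS, Fraction(2, 5))  # minimum support 0.4
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```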

### Phase 2: Generate association rules

In the second phase of the algorithm, the association rules are generated from the frequent itemsets. The algorithm generates all possible rules from the frequent itemsets and filters out the rules that do not meet the minimum threshold for the selected metric (usually confidence). The rules are created by brute force: each frequent itemset of size two or more is split in every possible way into a non-empty antecedent X and a non-empty consequent Y.

In the example dataset, the following association rules are generated from the frequent itemsets, together with their confidence:

| Itemset        | Rule                   | Confidence |
|----------------|------------------------|------------|
| {soap, milk}   | {soap} -> {milk}       | 4/5        |
| {soap, milk}   | {milk} -> {soap}       | 4/6        |
| {soap, fish}   | {soap} -> {fish}       | 4/5        |
| {soap, fish}   | {fish} -> {soap}       | 4/5        |
| {milk, candy}  | {milk} -> {candy}      | 4/6        |
| {milk, candy}  | {candy} -> {milk}      | 4/4        |
| {milk, fish}   | {milk} -> {fish}       | 4/6        |
| {milk, fish}   | {fish} -> {milk}       | 4/5        |
| {soap, milk, fish} | {soap, milk} -> {fish} | 3/4        |
| {soap, milk, fish} | {soap, fish} -> {milk} | 3/4        |
| {soap, milk, fish} | {milk, fish} -> {soap} | 3/4        |
| {soap, milk, fish} | {soap} -> {milk, fish} | 3/5        |
| {soap, milk, fish} | {milk} -> {soap, fish} | 3/6        |
| {soap, milk, fish} | {fish} -> {soap, milk} | 3/5        |

Next, the algorithm filters out the rules that do not meet the minimum confidence threshold of 0.75 (a rule with a confidence of exactly 3/4 = 0.75 meets the threshold). The final set of association rules, together with the support and confidence, is:

| Rule              | Support | Confidence |
|-------------------|---------|------------|
| {soap} -> {milk}  | 4/7     | 4/5        |
| {soap} -> {fish}  | 4/7     | 4/5        |
| {fish} -> {soap}  | 4/7     | 4/5        |
| {candy} -> {milk} | 4/7     | 4/4        |
| {fish} -> {milk}  | 4/7     | 4/5        |
| {soap, milk} -> {fish} | 3/7 | 3/4        |
| {soap, fish} -> {milk} | 3/7 | 3/4        |
| {milk, fish} -> {soap} | 3/7 | 3/4        |
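
The second phase can be sketched as follows, assuming the frequent itemsets and their supports from Phase 1 are available in a dictionary (here typed in from the table of frequent itemsets above). The function name `generate_rules` is illustrative. Because every subset of a frequent itemset is itself frequent, the support of each antecedent can be looked up instead of recounted. With a threshold of 3/4, the sketch returns the same eight rules as the table above; with a threshold of 0, it returns all 14 candidate rules.

```python
from fractions import Fraction
from itertools import combinations

# Frequent itemsets and their supports, typed in from the Phase 1 result above
FREQUENT = {
    frozenset({"soap"}): Fraction(5, 7),
    frozenset({"milk"}): Fraction(6, 7),
    frozenset({"candy"}): Fraction(4, 7),
    frozenset({"fish"}): Fraction(5, 7),
    frozenset({"soap", "milk"}): Fraction(4, 7),
    frozenset({"soap", "fish"}): Fraction(4, 7),
    frozenset({"milk", "candy"}): Fraction(4, 7),
    frozenset({"milk", "fish"}): Fraction(4, 7),
    frozenset({"soap", "milk", "fish"}): Fraction(3, 7),
}


def generate_rules(frequent, min_confidence):
    """Split every frequent itemset into rules X -> Y and keep the confident ones.

    Returns a list of (antecedent, consequent, support, confidence) tuples.
    """
    rules = []
    for itemset, itemset_support in frequent.items():
        # Every non-empty proper subset of the itemset is a possible antecedent X
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                # X is a subset of a frequent itemset, so its support is already known
                conf = itemset_support / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, consequent, itemset_support, conf))
    return rules


for x, y, sup, conf in generate_rules(FREQUENT, Fraction(3, 4)):
    print(sorted(x), "->", sorted(y), "support:", sup, "confidence:", conf)
```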

This is the final output of the Apriori algorithm. The algorithm has found the frequent itemsets and generated the association rules that meet the minimum support and confidence thresholds. In practice, these rules may be sorted by support, confidence, or other metrics to identify the most interesting ones.

## Python implementation

The **sklearn** library does not provide an implementation of the Apriori algorithm. However, there are other libraries that do. One of them is the **mlxtend** library, which extends **sklearn** with this functionality. A benefit of choosing **mlxtend** is that it is easy to use and integrates well with the **pandas** library.

> There are also other Python libraries that implement the Apriori algorithm, such as **apyori**. There is no **pandas** support, however.

Let's apply the library to an example dataset that contains information about the breakfast clients of a coffee shop.

> As the **mlxtend** library is not included in the Anaconda distribution, you need to install it. In PyCharm, click Python packages in the bottom left corner, search for **mlxtend**, and install the package. Alternatively, run `pip install mlxtend` in a terminal.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# read dataframe from local file
df = pd.read_csv('datasets/coffee_shop/coffee_shop.csv', sep=';')
df
```

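|   | Id | Coffee | Tea | Hot_chocolate | Cheese | Sausage | Veggie_deli_slices | Cucumber | Tomato | Juice | Milk | Dark_bread | White_bread | Sweet_bun | Croissant | Fruit_chunks | Smoothie |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | 1.0 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 1 | 2 | 1.0 |  |  | 1.0 | 1.0 |  |  |  |  |  |  | 1.0 |  |  |  |  |
| 2 | 3 | 1.0 |  |  |  | 1.0 |  |  |  |  | 1.0 |  | 1.0 |  |  |  |  |
| 3 | 4 |  | 1.0 |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| 4 | 5 | 1.0 |  |  | 1.0 |  |  | 1.0 | 1.0 |  |  | 1.0 |  |  |  |  |  |
| 5 | 6 | 1.0 |  |  | 1.0 |  |  | 1.0 | 1.0 |  |  | 1.0 |  |  |  |  |  |
| 6 | 7 |  |  | 1.0 | 1.0 |  |  | 1.0 |  |  |  | 1.0 |  |  |  |  |  |
| 7 | 8 |  |  | 1.0 | 1.0 |  | 1.0 |  |  |  |  | 1.0 |  |  |  |  | 1.0 |
| 8 | 9 | 1.0 |  |  |  |  |  |  |  |  |  |  |  | 1.0 | 1.0 |  |  |
| 9 | 10 |  |  | 1.0 |  |  |  |  |  |  |  |  |  | 1.0 | 1.0 |  |  |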


In the dataset (only the first ten rows are shown above), the items that the client has selected are represented by 1, whereas the value is empty if the client has not selected the item. As the **mlxtend** library requires the data to be in a specific format, we need to preprocess the data before applying the Apriori algorithm. We replace the empty values with False and the 1 values with True, resulting in a dataset that contains only boolean values. At the same time, we drop the Id column, as it is not needed for the analysis.

```python
# drop the Id column, as it is not an item
df = df.drop(columns='Id')

# selected items are marked with 1.0 and unselected items are empty (NaN);
# comparing with 1 yields True for selected items and False otherwise
df = df.eq(1)
df
```

|   | Coffee | Tea | Hot_chocolate | Cheese | Sausage | Veggie_deli_slices | Cucumber | Tomato | Juice | Milk | Dark_bread | White_bread | Sweet_bun | Croissant | Fruit_chunks | Smoothie |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 1 | True | False | False | True | True | False | False | False | False | False | False | True | False | False | False | False |
| 2 | True | False | False | False | True | False | False | False | False | True | False | True | False | False | False | False |
| 3 | False | True | False | False | False | False | False | False | False | False | False | False | False | False | False | False |
| 4 | True | False | False | True | False | False | True | True | False | False | True | False | False | False | False | False |
| 5 | True | False | False | True | False | False | True | True | False | False | True | False | False | False | False | False |
| 6 | False | False | True | True | False | False | True | False | False | False | True | False | False | False | False | False |
| 7 | False | False | True | True | False | True | False | False | False | False | True | False | False | False | False | True |
| 8 | True | False | False | False | False | False | False | False | False | False | False | False | True | True | False | False |
| 9 | False | False | True | False | False | False | False | False | False | False | False | False | True | True | False | False |


At this point, the dataset is ready for the Apriori algorithm. We can apply the algorithm to find the frequent itemsets and generate the association rules.

```python
# find frequent itemsets
frequent_itemsets = apriori(df, min_support=0.1, use_colnames=True)
frequent_itemsets
```

|   | support | itemsets |
| --- | --- | --- |
| 0 | 0.633333 | (Coffee) |
| 1 | 0.116667 | (Tea) |
| 2 | 0.15 | (Hot_chocolate) |
| 3 | 0.366667 | (Cheese) |
| 4 | 0.233333 | (Sausage) |
| 5 | 0.233333 | (Veggie_deli_slices) |
| 6 | 0.316667 | (Cucumber) |
| 7 | 0.2 | (Tomato) |
| 8 | 0.116667 | (Juice) |
| 9 | 0.183333 | (Milk) |


```python
# generate association rules
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.5)

# sort in descending order of confidence
rules = rules.sort_values(by='confidence', ascending=False)

rules
```

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 39 | (Veggie_deli_slices, Cucumber) | (Coffee) | 0.100000 | 0.633333 | 0.100000 | 1.0 | 1.578947 | 0.036667 | inf | 0.407407 |
| 18 | (Smoothie) | (Veggie_deli_slices) | 0.166667 | 0.233333 | 0.166667 | 1.0 | 4.285714 | 0.127778 | inf | 0.920000 |
| 76 | (Coffee, Dark_bread, Veggie_deli_slices) | (Cucumber) | 0.100000 | 0.316667 | 0.100000 | 1.0 | 3.157895 | 0.068333 | inf | 0.759259 |
| 71 | (Tomato, Cucumber) | (Dark_bread) | 0.166667 | 0.266667 | 0.166667 | 1.0 | 3.750000 | 0.122222 | inf | 0.880000 |
| 70 | (Tomato, Dark_bread) | (Cucumber) | 0.166667 | 0.316667 | 0.166667 | 1.0 | 3.157895 | 0.113889 | inf | 0.820000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 31 | (Sausage) | (Coffee, Cheese) | 0.233333 | 0.266667 | 0.116667 | 0.5 | 1.875000 | 0.054444 | 1.466667 | 0.608696 |
| 16 | (Dark_bread) | (Veggie_deli_slices) | 0.266667 | 0.233333 | 0.133333 | 0.5 | 2.142857 | 0.071111 | 1.533333 | 0.727273 |
| 14 | (Dark_bread) | (Cheese) | 0.266667 | 0.366667 | 0.133333 | 0.5 | 1.363636 | 0.035556 | 1.266667 | 0.363636 |
| 11 | (Cheese) | (Cucumber) | 0.366667 | 0.316667 | 0.183333 | 0.5 | 1.578947 | 0.067222 | 1.366667 | 0.578947 |


The output shows the association rules in decreasing order of confidence. It could be used to create a recommendation system for the coffee shop, suggesting items that the client might be interested in based on their previous selections. The idea is to go through the association rules in decreasing order of confidence and, if all items on the left-hand side of a rule are in the client's basket, recommend the items on the right-hand side of the rule. For instance, if the client has selected veggie deli slices and cucumber, the first row is the first rule that matches the basket, and the recommendation would be to suggest coffee.
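
The recommendation idea can be sketched in plain Python. In this sketch, the function name `recommend` and the hard-coded rules (the first five rows of the output above, in the same order) are illustrative. The rules are (antecedent, consequent) pairs of frozensets, already sorted by decreasing confidence; since mlxtend stores the `antecedents` and `consequents` columns as frozensets, such a list could be obtained with `list(zip(rules['antecedents'], rules['consequents']))`. One design choice goes slightly beyond the description: rules whose consequent is entirely in the basket already are skipped, so the client is never offered something they have already selected.

```python
def recommend(basket, rules):
    """Return the items suggested by the first matching rule, or an empty set.

    `rules` is a list of (antecedent, consequent) frozenset pairs, assumed to be
    sorted in decreasing order of confidence.
    """
    basket = set(basket)
    for antecedent, consequent in rules:
        # The rule applies when all its left-hand-side items are already selected
        if antecedent <= basket:
            suggestion = consequent - basket  # do not suggest items already chosen
            if suggestion:
                return suggestion
    return set()


# First rules of the confidence-sorted output above, in the same order
RULES = [
    (frozenset({"Veggie_deli_slices", "Cucumber"}), frozenset({"Coffee"})),
    (frozenset({"Smoothie"}), frozenset({"Veggie_deli_slices"})),
    (frozenset({"Coffee", "Dark_bread", "Veggie_deli_slices"}), frozenset({"Cucumber"})),
    (frozenset({"Tomato", "Cucumber"}), frozenset({"Dark_bread"})),
    (frozenset({"Tomato", "Dark_bread"}), frozenset({"Cucumber"})),
]

print(recommend({"Veggie_deli_slices", "Cucumber"}, RULES))  # {'Coffee'}
print(recommend({"Tomato", "Cucumber", "Tea"}, RULES))       # {'Dark_bread'}
```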

The strength of association rule mining as a recommendation system is that it considers not only individual items but also combinations of items. This allows for more sophisticated recommendations that take into account previously hidden relationships between items.


## Lift: a derived metric

Previously, we have discussed the support and confidence metrics for evaluating association rules. Another important metric is lift. Lift measures how much more likely the antecedent (left-hand side) and the consequent (right-hand side) of a rule are to occur together than if they were statistically independent. It is computed as lift(X -> Y) = confidence(X -> Y) / support(Y) = support(X ∪ Y) / (support(X) × support(Y)). In other words, lift is the factor by which the probability of the consequent being present increases when the antecedent is present. A lift of 1 indicates independence, a lift above 1 a positive association, and a lift below 1 a negative association.
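
To make the metric concrete, the following minimal sketch computes lift as support(X ∪ Y) divided by support(X) × support(Y) for the small transaction dataset from the beginning of this chapter (names are illustrative). For example, lift({candy} -> {milk}) = (4/7) / ((4/7) × (6/7)) = 7/6 ≈ 1.17, a mild positive association, whereas lift({milk} -> {soap, fish}) = (3/7) / ((6/7) × (4/7)) = 7/8 is below 1. The formula is symmetric in X and Y, so a rule and its reverse always have the same lift.

```python
from fractions import Fraction

# The seven example transactions, one set per TID
TRANSACTIONS = [
    {"soap", "milk", "candy", "fish"},
    {"milk", "candy", "fish"},
    {"fruit", "milk", "candy"},
    {"fruit", "soap", "milk", "fish"},
    {"soap", "milk", "candy"},
    {"soap", "milk", "fish"},
    {"soap", "fish"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(1 for t in transactions if itemset <= t), len(transactions))


def lift(antecedent, consequent, transactions):
    """supp(X ∪ Y) / (supp(X) * supp(Y)); a value of 1 means X and Y look independent."""
    x, y = set(antecedent), set(consequent)
    return support(x | y, transactions) / (support(x, transactions) * support(y, transactions))


print(lift({"candy"}, {"milk"}, TRANSACTIONS))         # 7/6
print(lift({"milk"}, {"soap", "fish"}, TRANSACTIONS))  # 7/8
```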

Let's generate the association rules from the frequent itemsets once more, this time keeping only the rules with a lift of at least 2, and sort the rules in descending order of lift.

```python
# generate association rules with a lift of at least 2
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=2)

# sort in descending order of lift
rules = rules.sort_values(by='lift', ascending=False)
rules
```

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 17 | (Croissant) | (Sweet_bun) | 0.100000 | 0.150000 | 0.100000 | 1.000000 | 6.666667 | 0.085000 | inf | 0.944444 |
| 16 | (Sweet_bun) | (Croissant) | 0.150000 | 0.100000 | 0.100000 | 0.666667 | 6.666667 | 0.085000 | 2.700000 | 1.000000 |
| 67 | (Coffee, Dark_bread) | (Veggie_deli_slices, Cucumber) | 0.166667 | 0.100000 | 0.100000 | 0.600000 | 6.000000 | 0.083333 | 2.250000 | 1.000000 |
| 72 | (Veggie_deli_slices, Cucumber) | (Coffee, Dark_bread) | 0.100000 | 0.166667 | 0.100000 | 1.000000 | 6.000000 | 0.083333 | inf | 0.925926 |
| 55 | (Smoothie) | (Dark_bread, Veggie_deli_slices) | 0.166667 | 0.133333 | 0.116667 | 0.700000 | 5.250000 | 0.094444 | 2.888889 | 0.971429 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 0 | (Juice) | (Cheese) | 0.116667 | 0.366667 | 0.100000 | 0.857143 | 2.337662 | 0.057222 | 4.433333 | 0.647799 |
| 5 | (Veggie_deli_slices) | (Dark_bread) | 0.233333 | 0.266667 | 0.133333 | 0.571429 | 2.142857 | 0.071111 | 1.711111 | 0.695652 |
| 4 | (Dark_bread) | (Veggie_deli_slices) | 0.266667 | 0.233333 | 0.133333 | 0.500000 | 2.142857 | 0.071111 | 1.533333 | 0.727273 |
| 45 | (Cheese, Cucumber) | (Dark_bread) | 0.183333 | 0.266667 | 0.100000 | 0.545455 | 2.045455 | 0.051111 | 1.613333 | 0.625850 |


This output tells us interesting things about the relationships between the items. Looking at the first row, we see that selecting a croissant increases the probability of selecting a sweet bun by a factor of about 6.7: every client who selected a croissant also selected a sweet bun (confidence 1.0), whereas only 15% of all clients selected a sweet bun (1.0 / 0.15 ≈ 6.7). This is a strong relationship, and the coffee shop might want to consider bundling these items together to increase sales.

## Multi-class variables and numerical variables

In the previous example, all variables were binary. However, the Apriori algorithm can also be used with multi-class variables. From earlier examples we know that, in the preprocessing phase, each multi-class variable can be transformed into multiple binary variables, one for each class. This is called one-hot encoding, and it is implemented in the **OneHotEncoder** class of the **sklearn** library.

The requirement of having only binary variables in association rule mining is a technical limitation of the **mlxtend** library. Some other libraries may be able to handle multi-class variables directly.

For numerical variables, things are a little bit more complicated. One approach is to discretize the numerical variables into categories and then apply the Apriori algorithm.
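
One way to discretize is to split the numerical variable into non-overlapping intervals with pandas' `pd.cut` and then one-hot encode the intervals with `pd.get_dummies`. The following minimal sketch uses the cigarette counts from the example below; the bin edges, the labels, and the `cig` prefix are arbitrary choices for illustration. Each person ends up with exactly one True value among the interval columns, which differs from the cumulative "at least" variables used in the example below.

```python
import pandas as pd

cigarettes = pd.Series([10, 20, 0, 5, 15, 30, 0, 25, 10, 5, 20, 15, 30], name="cigarettes")

# Non-overlapping intervals [0, 5), [5, 15), [15, 25), [25, inf)
intervals = pd.cut(
    cigarettes,
    bins=[0, 5, 15, 25, float("inf")],
    right=False,  # left-closed intervals, so 0 falls into the first bin
    labels=["0-4", "5-14", "15-24", "25+"],
)

# One column per interval, ready to be used as items in association rule mining
encoded = pd.get_dummies(intervals, prefix="cig")
print(encoded.sum())
```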

Let's consider the following example, where we have one numerical variable that represents the number of cigarettes that a person smokes per day. In addition, we have one multi-class variable that represents a genetic variant (A, B, and C), and one binary variable that represents the presence of a disease. The goal is to find the association rules that predict the disease based on the genetic variant and the number of cigarettes smoked per day. That is, we are interested only in association rules where the disease is in the consequent.

```python
# create dataframe from dictionary
data = {'cigarettes': [10, 20, 0, 5, 15, 30, 0, 25, 10, 5, 20, 15, 30],
        'genetic_variant': ['A', 'B', 'A', 'C', 'C', 'C', 'A', 'B', 'A', 'B', 'C', 'B', 'A'],
        'disease': [False, False, False, False, True, True, False, False, False, False, True, False, False]}
df = pd.DataFrame(data)
df
```

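|   | cigarettes | genetic_variant | disease |
| --- | --- | --- | --- |
| 0 | 10 | A | False |
| 1 | 20 | B | False |
| 2 | 0 | A | False |
| 3 | 5 | C | False |
| 4 | 15 | C | True |
| 5 | 30 | C | True |
| 6 | 0 | A | False |
| 7 | 25 | B | False |
| 8 | 10 | A | False |
| 9 | 5 | B | False |
| 10 | 20 | C | True |
| 11 | 15 | B | False |
| 12 | 30 | A | False |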


As we want to utilize all variables in association rule mining, we need some preprocessing. We can transform the numerical variable into a fixed number of binary variables, where each variable uses a higher threshold than the previous one. In this example, we will create three binary variables: one that indicates whether the person smokes at least 5 cigarettes per day, one for at least 15 cigarettes per day, and one for at least 25 cigarettes per day. For example, a person who smokes 20 cigarettes per day has the first two variables set to True and the third set to False.

In addition, we will transform the genetic_variant variable into binary variables using one-hot encoding. Finally, we will drop the original numerical variable, as it is no longer needed.

```python
# create binary variables from numerical variable
df['cig_at_least_5'] = df['cigarettes'] >= 5
df['cig_at_least_15'] = df['cigarettes'] >= 15
df['cig_at_least_25'] = df['cigarettes'] >= 25

# one-hot encode the genetic_variant variable
df = pd.get_dummies(df, columns=['genetic_variant'])

# drop the original numerical variable
df = df.drop(columns='cigarettes')
df
```

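|   | disease | cig_at_least_5 | cig_at_least_15 | cig_at_least_25 | genetic_variant_A | genetic_variant_B | genetic_variant_C |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | False | True | False | False | True | False | False |
| 1 | False | True | True | False | False | True | False |
| 2 | False | False | False | False | True | False | False |
| 3 | False | True | False | False | False | False | True |
| 4 | True | True | True | False | False | False | True |
| 5 | True | True | True | True | False | False | True |
| 6 | False | False | False | False | True | False | False |
| 7 | False | True | True | True | False | True | False |
| 8 | False | True | False | False | True | False | False |
| 9 | False | True | False | False | False | True | False |
| 10 | True | True | True | False | False | False | True |
| 11 | False | True | True | False | False | True | False |
| 12 | False | True | True | True | True | False | False |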


Now, we can apply the Apriori algorithm to find the frequent itemsets and generate the association rules:


```python
# find frequent itemsets
frequent_itemsets = apriori(df, min_support=0.1, use_colnames=True)

# generate association rules
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.5)

# only list rules where disease is in the consequent
rules = rules[rules['consequents'].apply(lambda x: 'disease' in x)]

# sort in descending order of lift
rules = rules.sort_values(by='lift', ascending=False)
rules
```

|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift | leverage | conviction | zhangs_metric |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 22 | (cig_at_least_15, genetic_variant_C) | (disease) | 0.230769 | 0.230769 | 0.230769 | 1.0 | 4.333333 | 0.177515 | inf | 1.0 |
| 37 | (cig_at_least_15, cig_at_least_5, genetic_vari... | (disease) | 0.230769 | 0.230769 | 0.230769 | 1.0 | 4.333333 | 0.177515 | inf | 1.0 |
| 40 | (cig_at_least_15, genetic_variant_C) | (disease, cig_at_least_5) | 0.230769 | 0.230769 | 0.230769 | 1.0 | 4.333333 | 0.177515 | inf | 1.0 |
| 3 | (genetic_variant_C) | (disease) | 0.307692 | 0.230769 | 0.230769 | 0.75 | 3.25 | 0.159763 | 3.076923 | 1.0 |
| 18 | (cig_at_least_5, genetic_variant_C) | (disease) | 0.307692 | 0.230769 | 0.230769 | 0.75 | 3.25 | 0.159763 | 3.076923 | 1.0 |
| 20 | (genetic_variant_C) | (disease, cig_at_least_5) | 0.307692 | 0.230769 | 0.230769 | 0.75 | 3.25 | 0.159763 | 3.076923 | 1.0 |
| 25 | (genetic_variant_C) | (cig_at_least_15, disease) | 0.307692 | 0.230769 | 0.230769 | 0.75 | 3.25 | 0.159763 | 3.076923 | 1.0 |
| 43 | (cig_at_least_5, genetic_variant_C) | (cig_at_least_15, disease) | 0.307692 | 0.230769 | 0.230769 | 0.75 | 3.25 | 0.159763 | 3.076923 | 1.0 |
| 45 | (genetic_variant_C) | (cig_at_least_15, disease, cig_at_least_5) | 0.307692 | 0.230769 | 0.230769 | 0.75 | 3.25 | 0.159763 | 3.076923 | 1.0 |


The algorithm is able to detect the combination of the genetic_variant_C and smoking at least 15 cigarettes per day as the strongest predictor for the disease. This is a simple (and, in reality, far too small) example, but it illustrates how numerical variables can be included in association rule mining.