**Numerical association rules** are easy-to-understand non-trivial patterns that can be discovered in your data. Let's try them with [Desbordante](https://github.com/Desbordante/desbordante-core)!

# Install necessary dependencies

First, let's install Desbordante:

In [None]:
!pip install desbordante==2.3.2

Collecting desbordante==2.3.2
  Downloading desbordante-2.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (19 kB)
Downloading desbordante-2.3.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.0 MB)
Installing collected packages: desbordante
Successfully installed desbordante-2.3.2


The Desbordante library will be used to discover numerical association rules, and the pandas library will be used to load and display the data:

In [None]:
import desbordante
import pandas as pd

Let's download the example data:

In [None]:
!wget https://raw.githubusercontent.com/Desbordante/desbordante-core/refs/heads/main/examples/datasets/dog_breeds.csv

--2025-03-20 13:13:33--  https://raw.githubusercontent.com/Desbordante/desbordante-core/refs/heads/main/examples/datasets/dog_breeds.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11999 (12K) [text/plain]
Saving to: ‘dog_breeds.csv’


2025-03-20 13:13:33 (18.5 MB/s) - ‘dog_breeds.csv’ saved [11999/11999]



# Numerical association rules: an example

Suppose we have a table containing students' exam grades and how many hours they studied for the exam. Such a table might hold the following numerical association rule:

```
Study_Hours[15.5 - 30.2] ⎤-Antecedent
Subject[Topology]        ⎦
      |
      |
      V
Grade[3 - 5]             ]-Consequent
   support = 0.21
   confidence = 0.93
```

This rule states that students who study Topology for between 15.5 and 30.2 hours tend to receive a grade between 3 and 5. The rule has a support of 0.21, which means that 21% of the rows in the dataset satisfy both the antecedent and the consequent. It also has a confidence of 0.93, meaning that 93% of the rows that satisfy the antecedent also satisfy the consequent. Note that attribute values can be integers, floating-point numbers, or strings.
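
To make the two metrics concrete, here is a minimal sketch that computes support and confidence for a rule of the same shape on a tiny table. The rows are invented for illustration, and the helper names (`matches_antecedent`, `matches_consequent`, `support_and_confidence`) are hypothetical; none of this is part of Desbordante's API.

```python
# Hypothetical toy data: (subject, study_hours, grade). Invented for illustration.
rows = [
    ("Topology", 16.0, 4),
    ("Topology", 20.5, 5),
    ("Topology", 29.0, 2),   # satisfies the antecedent but not the consequent
    ("Topology", 10.0, 3),   # too few study hours: antecedent not satisfied
    ("Algebra", 18.0, 4),    # different subject: antecedent not satisfied
]

def matches_antecedent(row):
    """Study_Hours[15.5 - 30.2] and Subject[Topology]."""
    subject, hours, _ = row
    return subject == "Topology" and 15.5 <= hours <= 30.2

def matches_consequent(row):
    """Grade[3 - 5]."""
    _, _, grade = row
    return 3 <= grade <= 5

def support_and_confidence(rows):
    ante = [r for r in rows if matches_antecedent(r)]
    both = [r for r in ante if matches_consequent(r)]
    # Support: share of ALL rows that satisfy antecedent and consequent.
    support = len(both) / len(rows)
    # Confidence: share of ANTECEDENT rows that also satisfy the consequent.
    confidence = len(both) / len(ante) if ante else 0.0
    return support, confidence

support, confidence = support_and_confidence(rows)
print(f"support = {support:.2f}, confidence = {confidence:.2f}")
# support = 0.40, confidence = 0.67
```

Here three rows satisfy the antecedent and two of those also satisfy the consequent, so support is 2/5 and confidence is 2/3.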

Numerical association rules (NARs) are an extension of traditional association rules (ARs), which help to discover patterns in data. Unlike ARs, which work with binary attributes (e.g., whether an item was purchased or not), NARs can handle numerical data (e.g., how many units of an item were purchased). This makes NARs more flexible for discovering relationships in datasets with numerical data. To learn more about traditional association rules, as well as support and confidence, see [this notebook](./Association_Rules.ipynb).

# Explore data

Let's have a look at the dataset:

In [None]:
dataset = pd.read_csv('dog_breeds.csv')
dataset

Unnamed: 0,Name,Origin,Type,Friendliness,Life Span,Size,Grooming Needs,Exercise Requirements,Good with Children,Intelligence,Shedding,Health Issues Risk,Weight,Training Difficulty
0,Affenpinscher,Germany,Toy,7,14,1,High,1.5,Yes,8,Moderate,Low,4.0,6
1,Afghan Hound,Afghanistan,Hound,5,13,3,Very High,2.0,No,4,High,Moderate,25.0,8
2,Airedale Terrier,England,Terrier,8,12,2,High,2.0,Yes,7,Moderate,Low,21.0,6
3,Akita,Japan,Working,6,11,3,Moderate,2.0,With Training,7,High,High,45.0,9
4,Alaskan Malamute,Alaska USA,Working,7,11,3,High,3.0,Yes,6,Very High,Moderate,36.0,8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154,Wire Fox Terrier,England,Terrier,7,14,1,Moderate,2.0,Yes,7,Moderate,Moderate,8.0,7
155,Wirehaired Dachshund,Germany,Hound,7,13,1,Moderate,1.5,With Training,7,Moderate,High,8.0,7
156,Wirehaired Pointing Griffon,Netherlands,Sporting,7,13,2,High,2.0,Yes,7,Moderate,Moderate,20.0,6
157,Xoloitzcuintli,Mexico,Non-Sporting,7,15,3,Low,2.0,With Training,8,Low,Moderate,25.0,6


The dataset contains information about 159 dog breeds. Now, let's discover NARs in this table.

# Find numerical association rules

Desbordante implements an algorithm called "Differential Evolution Solver" (DES). It is a nature-inspired stochastic optimization algorithm that imitates the evolution process for NARs.

We will use a minimum support of 0.1 and a minimum confidence of 0.7. We will also use a population size of 500 and max_fitness_evaluations of 700. Larger values of max_fitness_evaluations tend to produce larger rules encompassing more attributes. The population size parameter affects the number of NARs being generated and mutated: larger populations make the search slower but output more NARs.

Finally, since DES is a randomized algorithm, we set the seed parameter to a specially selected value in order to:

1.   present an interesting and illustrative example of a NAR, and
2.   ensure the repeatability of this example (i.e., that the NARs found stay the same across runs).


Note that if you do not set the seed parameter, the default value will be used.
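
To build some intuition for how the population size, the evaluation budget, and the seed interact, here is a minimal sketch of a textbook differential evolution loop (the DE/rand/1/bin variant) that minimises a toy numeric function. This is an assumption-laden illustration, not Desbordante's DES implementation: DES evolves candidate rules rather than plain vectors, and the function `differential_evolution`, the `sphere` objective, and the `scale` and `crossover_rate` values are all hypothetical. Only the parameter names `population_size`, `max_fitness_evaluations`, and `seed` are borrowed from the call used in this notebook.

```python
import random

def differential_evolution(fitness, bounds, population_size,
                           max_fitness_evaluations, seed,
                           scale=0.5, crossover_rate=0.9):
    """Minimise `fitness` with a basic DE/rand/1/bin loop (generic sketch).

    Requires population_size >= 4 so that three distinct partners exist.
    """
    rng = random.Random(seed)  # a fixed seed makes the whole run repeatable
    dim = len(bounds)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(population_size)]
    scores = [fitness(ind) for ind in population]
    evaluations = population_size  # the initial population uses up budget too

    while evaluations < max_fitness_evaluations:
        for i in range(population_size):
            if evaluations >= max_fitness_evaluations:
                break
            # Mutation: pick three distinct individuals other than i.
            a, b, c = rng.sample(
                [j for j in range(population_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < crossover_rate:
                    gene = population[a][d] + scale * (
                        population[b][d] - population[c][d])
                    gene = min(max(gene, lo), hi)  # clip to the search bounds
                else:
                    gene = population[i][d]
                trial.append(gene)
            # Selection: the trial replaces its parent only if it is no worse.
            trial_score = fitness(trial)
            evaluations += 1
            if trial_score <= scores[i]:
                population[i], scores[i] = trial, trial_score

    best = min(range(population_size), key=scores.__getitem__)
    return population[best], scores[best]

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

run_1 = differential_evolution(sphere, [(-5.0, 5.0)] * 2, population_size=20,
                               max_fitness_evaluations=2000, seed=5854)
run_2 = differential_evolution(sphere, [(-5.0, 5.0)] * 2, population_size=20,
                               max_fitness_evaluations=2000, seed=5854)
print(run_1 == run_2)  # True: identical seeds give identical results
```

In this sketch, a larger population means more candidates per generation, while the evaluation budget caps the total amount of work, so the two values have to be chosen together.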

Now, let's find NARs with Desbordante:

In [None]:
algo = desbordante.nar.algorithms.DES()
algo.load_data(table=dataset)
algo.execute(minconf=0.7, minsup=0.1, population_size=500, seed=5854,
             max_fitness_evaluations=700)
nars = algo.get_nars()
for nar in nars:
  print(nar)

{7: [1.506209 - 2.725071], 8: [With Training]} ===> {9: [4 - 9], 12: [3.956433 - 71.077742]}
{2: [Hound]} ===> {9: [5 - 8], 3: [5 - 9]}


The DES algorithm has found two NARs!

Let's print the second NAR in a more readable way:

In [None]:
DOWN_ARROW = "      |\n      |\n      V"

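def print_rule_part(rule_part, columns):
    # rule_part maps a column index to the value range (or category) the rule requires
    for column_index, value in rule_part.items():
        print(f'{columns[column_index]}{value}')

def print_nar(nar, df_columns):
    # Print the antecedent, an arrow, the consequent, and then the rule's metrics
    print_rule_part(nar.ante, df_columns)
    print(DOWN_ARROW)
    print_rule_part(nar.cons, df_columns)
    print(f"   support = {nar.support}")
    print(f"   confidence = {nar.confidence}")

# nars[1] is the second rule found above
print_nar(nars[1], dataset.columns)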

Type[Hound]
      |
      |
      V
Intelligence[5 - 8]
Friendliness[5 - 9]
   support = 0.16352201257861634
   confidence = 0.9629629629629629


The above NAR states that about 96% of all dog breeds of type 'Hound' have an intelligence rating between 5 and 8 out of 10 and have a friendliness rating between 5 and 9 out of 10. This suggests that, in general, hounds are intelligent dogs and are mostly friendly. Let's see if that is true:

In [None]:
example_nar = nars[1]
# Column indices 9 and 3 correspond to 'Intelligence' and 'Friendliness'
min_intelligence = example_nar.cons[9].lower_bound
max_intelligence = example_nar.cons[9].upper_bound
min_friendliness = example_nar.cons[3].lower_bound
max_friendliness = example_nar.cons[3].upper_bound

VIOLATION_STYLE = 'background-color:red;color:white;font-weight:bold'

def color_cells(x):
  # Build a same-shaped frame of CSS strings; an empty string means "no styling"
  styles = pd.DataFrame('', index=x.index, columns=x.columns)
  for i, (_, row) in enumerate(x.iterrows()):
    intelligence = row['Intelligence']
    friendliness = row['Friendliness']
    if (intelligence < min_intelligence or intelligence > max_intelligence or
        friendliness < min_friendliness or friendliness > max_friendliness):
      # Highlight the whole row when it violates either bound of the consequent
      styles.iloc[i, :] = VIOLATION_STYLE
  return styles

# Keep only the rows that satisfy the antecedent, and only the relevant columns
hound_rows = dataset[dataset['Type'] == 'Hound']
hound_rows = hound_rows[['Name', 'Type', 'Intelligence', 'Friendliness']]
hound_rows.style.apply(color_cells, axis=None)

Unnamed: 0,Name,Type,Intelligence,Friendliness
1,Afghan Hound,Hound,4,5
7,American Foxhound,Hound,6,8
11,Basenji,Hound,6,6
12,Basset Hound,Hound,5,8
13,Beagle,Hound,7,9
18,Bloodhound,Hound,6,7
21,Borzoi,Hound,6,6
44,Dachshund,Hound,7,7
50,English Foxhound,Hound,6,7
70,Greyhound,Hound,7,7


As observed, only one of the 27 rows with 'Type' equal to 'Hound' falls outside the intelligence or friendliness bounds. This single exception is why the rule's confidence is $\frac{27-1}{27} \approx 96\%$ rather than 100%.
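
As a quick sanity check, both metrics reported for this rule can be recomputed from the counts mentioned in this notebook (159 breeds, 27 hounds, 1 violating hound). The variable names below are illustrative only.

```python
total_rows = 159        # breeds in the dataset
hound_rows = 27         # rows satisfying the antecedent Type[Hound]
violating_hounds = 1    # hounds outside the intelligence or friendliness bounds

# Rows that satisfy both the antecedent and the consequent
satisfying_both = hound_rows - violating_hounds

support = satisfying_both / total_rows      # share of all rows
confidence = satisfying_both / hound_rows   # share of antecedent rows

print(f"support = {support:.4f}")        # support = 0.1635
print(f"confidence = {confidence:.4f}")  # confidence = 0.9630
```

Both values agree with the support and confidence printed for the rule.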

# Second example

Let's try again with different settings. This time, the minimum support will have a more lenient value of 0.05 and the population size will be 700, which helps discover more NARs. The value of max_fitness_evaluations also needs to be increased to 1500, in line with the larger population size, to produce a non-empty result.

In [None]:
algo.execute(minconf=0.7, minsup=0.05, population_size=700,
             max_fitness_evaluations=1500, seed=10)
nars = algo.get_nars()
for i, nar in enumerate(nars, start=1):
  print(f"NAR {i}:")
  print_nar(nar,dataset.columns)
  print()

NAR 1:
Intelligence[4 - 10]
Shedding[Moderate]
      |
      |
      V
Friendliness[6 - 10]
Life Span[9 - 16]
   support = 0.5660377358490566
   confidence = 0.9574468085106383

NAR 2:
Health Issues Risk[Moderate]
Life Span[8 - 14]
      |
      |
      V
Friendliness[5 - 8]
   support = 0.33962264150943394
   confidence = 0.7714285714285715

NAR 3:
Size[1 - 2]
Intelligence[5 - 8]
Grooming Needs[Moderate]
Weight[15.246273 - 68.261820]
      |
      |
      V
Shedding[Moderate]
   support = 0.05660377358490566
   confidence = 0.9

NAR 4:
Friendliness[5 - 10]
Exercise Requirements[1.708423 - 2.261994]
Type[Working]
      |
      |
      V
Life Span[10 - 16]
Training Difficulty[4 - 9]
   support = 0.08176100628930817
   confidence = 0.7222222222222222



These NARs are less striking, but they still capture some thought-provoking facts.
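
One simple way to decide which rules deserve a closer look is to rank them by their metrics. The sketch below does this for the four rules printed in this section, using the support and confidence values copied from that output. `Rule` is a hypothetical stand-in for illustration; with real results you could sort the objects returned by `get_nars()` on their `confidence` and `support` attributes in the same way.

```python
from collections import namedtuple

# Stand-in for the rule objects; only the two printed metrics are modelled.
Rule = namedtuple("Rule", ["label", "support", "confidence"])

rules = [
    Rule("NAR 1", 0.5660377358490566, 0.9574468085106383),
    Rule("NAR 2", 0.33962264150943394, 0.7714285714285715),
    Rule("NAR 3", 0.05660377358490566, 0.9),
    Rule("NAR 4", 0.08176100628930817, 0.7222222222222222),
]

# Sort by confidence first, breaking ties by support (both descending).
ranked = sorted(rules, key=lambda r: (r.confidence, r.support), reverse=True)
for rule in ranked:
    print(f"{rule.label}: confidence = {rule.confidence:.2f}, "
          f"support = {rule.support:.2f}")
```

NAR 3 ranks second by confidence even though it covers fewer than 6% of the rows, which is why it helps to look at both metrics together.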

# Conclusion

If you are reading this, then you have learnt about numerical association rules. Congratulations!

We have explored the data and found that hounds are friendly and intelligent dogs.
We have also found a few more facts about different dog breeds using the DES algorithm.

If you wish to find these patterns in your own data, now you know how to do it 🙂
You can also learn more about the other pattern types supported by [Desbordante](https://github.com/Desbordante/desbordante-core).