# Example of using CN2SD
## About this document
The purpose of this document is to show an example of using the CN2SD algorithm.

The next section introduces the CN2SD algorithm, followed by instructions for installing the `subgroups` library in this environment. The document then describes how to run the algorithm, including the required steps and the most important parameters.

Finally, the results of applying CN2SD are presented, covering both the information written to the output file and the information available through the model properties.

## CN2SD Algorithm
CN2SD is a subgroup discovery algorithm obtained by adapting CN2, a rule induction system for classification.
The modifications are:

- Replacement of the precision-based search heuristic with a new unusualness heuristic that combines the generality and the precision of a rule.
- Incorporation of example weights into the covering algorithm.
- Incorporation of example weights into the unusualness search heuristic.
- Use of probabilistic classification based on the class distribution of the examples covered by individual rules.
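To make the example weighting more concrete, here is a minimal sketch of the two weighting schemes usually described in the CN2-SD literature. The function names and the `gamma` parameter are illustrative assumptions and are not taken from the `subgroups` library, whose implementation may differ in its details.

```python
def additive_weight(times_covered):
    """Additive scheme (assumed form): weight = 1 / (i + 1),
    where i is the number of rules that already cover the example."""
    if times_covered < 0:
        raise ValueError("times_covered must be non-negative")
    return 1.0 / (times_covered + 1)


def multiplicative_weight(times_covered, gamma):
    """Multiplicative scheme (assumed form): weight = gamma ** i, with 0 < gamma < 1."""
    if times_covered < 0:
        raise ValueError("times_covered must be non-negative")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    return gamma ** times_covered


# An example that has not been covered yet keeps its full weight;
# each additional covering rule reduces its influence on the search.
for i in range(4):
    print(i, additive_weight(i), multiplicative_weight(i, gamma=0.5))
```

In both schemes covered examples are never removed outright; they only count less when later rules are evaluated.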

The WRAcc (weighted relative accuracy) quality measure is used to evaluate the subgroups.
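For reference, and assuming the standard definition of weighted relative accuracy, WRAcc can be written in terms of the counts reported in the Results section, where $tp$ and $fp$ are the (possibly weighted) numbers of covered examples with and without the target value, and $TP$ and $FP$ are the same quantities over the whole dataset:

$$
\mathrm{WRAcc} = \frac{tp + fp}{TP + FP}\left(\frac{tp}{tp + fp} - \frac{TP}{TP + FP}\right)
$$

The first factor rewards generality (how much of the dataset the subgroup covers) and the second rewards unusualness (how far the target share inside the subgroup departs from the share in the whole dataset).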

The CN2 algorithm is defined as follows [[1]](#1):

![CN2](https://ibb.co/xhDd98b)


## Installing the `subgroups` library

To install the `subgroups` library in this environment, just run the following cell:

In [None]:
!pip install subgroups --no-cache-dir

Collecting subgroups
  Downloading subgroups-0.0.10-py3-none-any.whl (116 kB)
Collecting bitarray>=2.3.0 (from subgroups)
  Downloading bitarray-2.7.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (273 kB)
Installing collected packages: bitarray, subgroups
Successfully installed bitarray-2.7.6 subgroups-0.0.10


To check that the installation has been successful, the following cell can be run:

In [None]:
import subgroups.tests as st
st.run_all_tests()
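As a lighter complementary check, the installed version can be read from the package metadata using only the Python standard library. This is a small sketch; `installed_version` is a hypothetical helper name, not part of `subgroups`.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("subgroups"))
```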

## Running the algorithm

To run the CN2SD algorithm on a dataset, the following steps are necessary:

- Load the dataset into a pandas `DataFrame` and choose the target column.
- Create the CN2SD model with the desired parameters and run it.

Below is an example of running this algorithm on a small dataset:


In [None]:
from pandas import DataFrame
from subgroups.algorithms.individual_subgroups.nominal_target.cn2sd import CN2SD

# Toy market-basket dataset: one row per transaction, one column per item.
input_dataframe = DataFrame({
    'bread': ['yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes'],
    'milk': ['yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'beer': ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'no'],
    'coke': ['no', 'no', 'yes', 'no', 'yes', 'no', 'yes'],
    'diaper': ['no', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes'],
})
target = "diaper"  # target column
model = CN2SD(beam_width=2, weighting_scheme='aditive', write_results_in_file=True, file_path="./results.txt")
binary_attributes = ["bread", "milk", "beer", "coke", "diaper"]
result = model.fit(input_dataframe, target, binary_attributes)


## Results

Running the cell above yields the following output (abridged), which lists the subgroups found by the algorithm:

```
Description: [beer = 'no', coke = 'no'], Target: diaper = 'no' ; Quality Measure WRACC = 0.12244897959183673 ; tp = 1 ; fp = 0 ; TP = 1 ; FP = 6
Description: [beer = 'no', coke = 'no'], Target: diaper = 'no' ; Quality Measure WRACC = 0.12244897959183673 ; tp = 1 ; fp = 0 ; TP = 1 ; FP = 6
Description: [beer = 'no'], Target: diaper = 'no' ; Quality Measure WRACC = 0.04733727810650888 ; tp = 0.5 ; fp = 2 ; TP = 0.5 ; FP = 6
.
.
.
Description: [coke = 'yes'], Target: diaper = 'yes' ; Quality Measure WRACC = 0.16000000000000003 ; tp = 0.25 ; fp = 0 ; TP = 0.25 ; FP = 1
Description: [coke = 'yes'], Target: diaper = 'yes' ; Quality Measure WRACC = 0.16000000000000003 ; tp = 0.25 ; fp = 0 ; TP = 0.25 ; FP = 1
Description: [coke = 'yes'], Target: diaper = 'yes' ; Quality Measure WRACC = 0.1487603305785124 ; tp = 0.2222222222222222 ; fp = 0 ; TP = 0.2222222222222222 ; FP = 1
Description: [coke = 'yes'], Target: diaper = 'yes' ; Quality Measure WRACC = 0.1487603305785124 ; tp = 0.2222222222222222 ; fp = 0 ; TP = 0.2222222222222222 ; FP = 1
Description: [coke = 'yes'], Target: diaper = 'yes' ; Quality Measure WRACC = 0.1487603305785124 ; tp = 0.2222222222222222 ; fp =
```
Each of these lines represents a subgroup discovered by the algorithm. Taking the first result as an example, we have the following characteristics:
- The subgroup is described by the condition `[beer = 'no', coke = 'no']`.
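- The target is `diaper = 'no'`, a value of the `diaper` column chosen as the target earlier.
- The quality of the subgroup is measured with WRAcc, which has a value of approximately 0.122.
- The counts are tp = 1, fp = 0, TP = 1, and FP = 6. Lowercase counts refer to the examples covered by the subgroup and uppercase counts to the whole dataset: only one transaction has `diaper = 'no'`, and the subgroup covers it without covering any of the other six.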
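The reported quality values can be reproduced by hand. The sketch below assumes the standard WRAcc definition, (tp + fp) / (TP + FP) * (tp / (tp + fp) - TP / (TP + FP)), and applies it to the counts copied from the first and third output lines; `wracc` is a helper written for this illustration, not a function of the `subgroups` library.

```python
def wracc(tp, fp, TP, FP):
    """Weighted relative accuracy computed from (possibly weighted) counts."""
    covered = tp + fp  # weight of the examples covered by the subgroup
    total = TP + FP    # weight of all examples in the dataset
    if covered == 0 or total == 0:
        return 0.0
    return (covered / total) * (tp / covered - TP / total)


# Counts copied from the output lines above.
first = wracc(tp=1, fp=0, TP=1, FP=6)      # [beer = 'no', coke = 'no']
third = wracc(tp=0.5, fp=2, TP=0.5, FP=6)  # [beer = 'no']
print(first)
print(third)
```

Both values agree with the reported ones (about 0.1224 and 0.0473). The fractional counts in the third line suggest that the counts are sums of example weights rather than plain tallies.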

These results match the output file written when running the CN2SD algorithm on the toy dataset above.
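For further processing, each line can be turned into a dictionary. The sketch below assumes that every line follows exactly the layout shown in the output above; the pattern and the `parse_result_line` helper are illustrative and not part of the `subgroups` library.

```python
import re

# Pattern written against the line layout shown in the output above (an assumption).
LINE_PATTERN = re.compile(
    r"Description: \[(?P<description>.*?)\], "
    r"Target: (?P<target>.*?) ; "
    r"Quality Measure (?P<measure>\w+) = (?P<quality>[-\d.eE+]+) ; "
    r"tp = (?P<tp>[\d.]+) ; fp = (?P<fp>[\d.]+) ; "
    r"TP = (?P<TP>[\d.]+) ; FP = (?P<FP>[\d.]+)"
)


def parse_result_line(line):
    """Parse one result line into a dict, or return None if it does not match."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("quality", "tp", "fp", "TP", "FP"):
        fields[key] = float(fields[key])
    return fields


example = ("Description: [beer = 'no', coke = 'no'], Target: diaper = 'no' ; "
           "Quality Measure WRACC = 0.12244897959183673 ; tp = 1 ; fp = 0 ; TP = 1 ; FP = 6")
parsed = parse_result_line(example)
print(parsed["description"], parsed["target"], parsed["quality"])
```

Incomplete lines, such as a line cut off before its last fields, do not match the pattern and yield `None` instead of a partial record.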

We can also access different statistics about the result:


In [None]:
print(model.selected_subgroups)    # Number of selected subgroups
print(model.unselected_subgroups)  # Number of unselected subgroups
print(model.visited_nodes)         # Number of generated subgroups

## References
<a id="1">[1]</a>
Clark, Peter & Niblett, Tim. (2000). Induction in Noisy Domains.