# SD usage example

## About this document

The purpose of this document is to show an example of how to use the SD algorithm.

The following section gives a detailed introduction to the SD algorithm, followed by instructions for installing the `subgroups` library in this environment. Then, the process of executing the algorithm is described, including the points that need to be considered.

Finally, the results obtained from the application of the SD algorithm will be presented, highlighting the information obtained in the output file and the information that can be found by accessing the properties of the model.

## The SD algorithm

The SD algorithm is a subgroup discovery algorithm based on a rule induction system.
Candidate subgroups must reach a minimum value of the Support quality measure. The algorithm keeps the best subgroup descriptions in a fixed-width beam. At each iteration, every subgroup description in the beam is extended with one more condition (a conjunction), and the refined description replaces the worst subgroup in the beam if its quality is higher.

The quality measure Qg is used to evaluate the quality of the subgroups. High-quality subgroups should cover as many examples with the target value and as few examples without the target value as possible.
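For reference, both measures can be written in terms of the counts that appear in the output file. The expression for Qg follows the definition given in [[1]](#1). The expression for Support is an assumption inferred from the values reported in the Results section, not taken from the library documentation:

$$
Q_g = \frac{tp}{fp + g}, \qquad \text{Support} = \frac{tp}{TP + FP}
$$

Here, $tp$ and $fp$ are the numbers of examples covered by the subgroup with and without the target value, $TP$ and $FP$ are the corresponding totals in the whole dataset, and $g$ is the generalization parameter (`g_parameter` in the example below).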


The SD algorithm is defined as follows [[1]](#1):

[![SD](https://ibb.co/CvPQZnV)](https://ibb.co/CvPQZnV)
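The beam search described above can be illustrated with a short, simplified sketch in plain Python. This is not the implementation used by the `subgroups` library: the function names (`count_tp_fp`, `beam_search`), the representation of a description as a `frozenset` of (attribute, value) pairs, and the stopping rule (stop when no description enters the beam) are all assumptions made for illustration.

```python
def count_tp_fp(rows, target, description):
    """Count covered examples with (tp) and without (fp) the target value."""
    target_attribute, target_value = target
    tp = fp = 0
    for row in rows:
        if all(row[attribute] == value for attribute, value in description):
            if row[target_attribute] == target_value:
                tp += 1
            else:
                fp += 1
    return tp, fp


def beam_search(rows, target, selectors, min_support, g, beam_width):
    """Simplified beam search; returns (Qg, description) pairs, best first."""
    beam = []
    improved = True
    while improved:
        improved = False
        new_beam = list(beam)
        # On the first pass the only description to refine is the empty one.
        for _, description in beam or [(0.0, frozenset())]:
            for selector in selectors:
                refined = description | {selector}
                if refined == description or any(refined == d for _, d in new_beam):
                    continue  # nothing new to evaluate
                tp, fp = count_tp_fp(rows, target, refined)
                if tp / len(rows) < min_support:
                    continue  # support below the minimum threshold
                qg = tp / (fp + g)
                if len(new_beam) < beam_width:
                    new_beam.append((qg, refined))
                    improved = True
                else:
                    # Replace the worst description in the beam if the new one is better.
                    worst = min(range(len(new_beam)), key=lambda i: new_beam[i][0])
                    if qg > new_beam[worst][0]:
                        new_beam[worst] = (qg, refined)
                        improved = True
        beam = sorted(new_beam, key=lambda pair: pair[0], reverse=True)
    return beam


# Toy dataset used later in this document, one list per column.
columns = {
    "bread":  ["yes", "yes", "no", "yes", "yes", "yes", "yes"],
    "milk":   ["yes", "no", "yes", "yes", "yes", "yes", "yes"],
    "beer":   ["no", "yes", "yes", "yes", "no", "yes", "no"],
    "coke":   ["no", "no", "yes", "no", "yes", "no", "yes"],
    "diaper": ["no", "yes", "yes", "yes", "yes", "yes", "yes"],
}
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]
selectors = [(name, "yes") for name in columns if name != "diaper"]

beam = beam_search(rows, ("diaper", "yes"), selectors,
                   min_support=0.57, g=1, beam_width=2)
for qg, description in beam:
    print(qg, sorted(description))
```

The sketch only returns the final contents of the beam, so its output is not expected to match the library's output file line by line.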



## Installing the `subgroups` library

To install the `subgroups` library in this environment, simply execute the following cell:

In [None]:
!pip install subgroups --no-cache-dir

Collecting subgroups
  Downloading subgroups-0.0.10-py3-none-any.whl (116 kB)
Collecting bitarray>=2.3.0 (from subgroups)
  Downloading bitarray-2.7.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (273 kB)
Installing collected packages: bitarray, subgroups
Successfully installed bitarray-2.7.6 subgroups-0.0.10


To verify that the installation was successful, you can run the following cell:


In [None]:
import subgroups.tests as st
st.run_all_tests()

test_Operator_evaluate_method (tests.core.test_operator.TestOperator) ... ok
test_Operator_evaluate_method_with_pandasSeries (tests.core.test_operator.TestOperator) ... ok
test_Operator_generate_from_str_method (tests.core.test_operator.TestOperator) ... ok
test_Operator_string_representation (tests.core.test_operator.TestOperator) ... ok
test_Pattern_general (tests.core.test_pattern.TestPattern) ... ok
test_Pattern_is_contained_method (tests.core.test_pattern.TestPattern) ... ok
test_Pattern_is_refinement_method (tests.core.test_pattern.TestPattern) ... ok
test_Selector_attributes (tests.core.test_selector.TestSelector) ... ok
test_Selector_comparisons (tests.core.test_selector.TestSelector) ... ok
test_Selector_creation_process (tests.core.test_selector.TestSelector) ... ok
test_Selector_deletion_process (tests.core.test_selector.TestSelector) ... ok
test_Selector_generate_from_str_method (tests.core.test_selector.TestSelector) ... ok
test_Selector_match_method (tests.core.test_selec



##################################
########## CORE PACKAGE ##########
##################################


##############################################
########## QUALITY MEASURES PACKAGE ##########
##############################################


#############################################
########## DATA STRUCTURES PACKAGE ##########
#############################################


ok
test_FPTreeForSDMap_build_tree_2 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap) ... ok
test_FPTreeForSDMap_build_tree_3 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap) ... ok
test_FPTreeForSDMap_build_tree_4 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap) ... ok
test_FPTreeForSDMap_generate_conditional_fp_tree_1 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap) ... ok
test_FPTreeForSDMap_generate_conditional_fp_tree_2 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap) ... ok
test_FPTreeForSDMap_generate_set_of_frequent_selectors_1 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap) ... ok
test_FPTreeForSDMap_generate_set_of_frequent_selectors_2 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap) ... ok
test_FPTreeForSDMap_generate_set_of_frequent_selectors_3 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap) ... ok
test_FPTreeNode_general (tests.data_struc



########################################
########## ALGORITHMS PACKAGE ##########
########################################


ok
test_SDMap_fpgrowth_method_2 (tests.algorithms.individual_subgroups.nominal_target.test_sdmap.TestSDMap) ... ok
test_SDMap_fpgrowth_method_3 (tests.algorithms.individual_subgroups.nominal_target.test_sdmap.TestSDMap) ... ok
test_SDMap_fpgrowth_method_4 (tests.algorithms.individual_subgroups.nominal_target.test_sdmap.TestSDMap) ... ok
test_SDMap_init_method_1 (tests.algorithms.individual_subgroups.nominal_target.test_sdmap.TestSDMap) ... ok
test_SDMap_init_method_2 (tests.algorithms.individual_subgroups.nominal_target.test_sdmap.TestSDMap) ... ok
test_SDMap_unselected_and_selected_subgroups (tests.algorithms.individual_subgroups.nominal_target.test_sdmap.TestSDMap) ... ok
test_VLSD_additional_parameters_in_fit_method (tests.algorithms.individual_subgroups.nominal_target.test_vlsd.TestVLSD) ... ok
test_VLSD_fit_method_1 (tests.algorithms.individual_subgroups.nominal_target.test_vlsd.TestVLSD) ... ok
test_VLSD_fit_method_2 (tests.algorithms.individual_subgroups.nominal_target.test_vlsd



###################################
########## UTILS PACKAGE ##########
###################################


## Running the algorithm

To run the SD algorithm on a dataset, the following steps are necessary:

- Load the dataset into a pandas `DataFrame` object and define the target as a (column, value) tuple.
- Create the SD model with the desired parameters and run it.

Below is an example of running this algorithm on a small dataset:

In [None]:

from pandas import DataFrame
from subgroups.algorithms.individual_subgroups.nominal_target.sd import SD

# Toy dataset: one row per transaction, one column per item.
input_dataframe = DataFrame({
    'bread':  ['yes', 'yes', 'no', 'yes', 'yes', 'yes', 'yes'],
    'milk':   ['yes', 'no', 'yes', 'yes', 'yes', 'yes', 'yes'],
    'beer':   ['no', 'yes', 'yes', 'yes', 'no', 'yes', 'no'],
    'coke':   ['no', 'no', 'yes', 'no', 'yes', 'no', 'yes'],
    'diaper': ['no', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes'],
})
# The target is a (column, value) tuple.
target = ("diaper", "yes")
# Create the model and write the discovered subgroups to ./results.txt.
model = SD(minimum_quality_measure_value=0.57, g_parameter=1, beam_width=2, write_results_in_file=True, file_path="./results.txt")
binary_attributes = ["bread", "milk", "beer", "coke", "diaper"]
result = model.fit(input_dataframe, target, binary_attributes)



## Results

After running the cell above, the output file contains the subgroups found by the algorithm. The first ones are:

```
[Description: [bread = 'yes'], Target: diaper = 'yes'] ; Quality Measure : Support = 0.71 , Qg = 2.5; tp = 5 ; fp = 1 ; TP = 6 ; FP = 1
[Description: [milk = 'yes'], Target: diaper = 'yes'] ; Quality Measure : Support = 0.71 , Qg = 2.5; tp = 5 ; fp = 1 ; TP = 6 ; FP = 1
[Description: [beer = 'yes'], Target: diaper = 'yes'] ; Quality Measure : Support = 0.57 , Qg = 4.0; tp = 4 ; fp = 0 ; TP = 6 ; FP = 1

```
Each of these lines represents a subgroup discovered by the algorithm. Taking the first result as an example, we have the following characteristics:
- The subgroup is described by the condition `[bread = 'yes']`.
- The target is the one we defined in the first place, i.e., `diaper = 'yes'`.
- Subgroup support is measured with the Support measure, which has a value of 0.71.
- The quality of the subgroup is measured with the Qg measure, which has a value of 2.5.
- The values of tp, fp, TP, and FP are as follows: tp = 5 ; fp = 1 ; TP = 6 ; FP = 1.
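These values can be recomputed by hand from the toy dataset. The following is a minimal sketch in plain Python. The helper name `subgroup_stats` is hypothetical, and the formulas Support = tp / (TP + FP) and Qg = tp / (fp + g) are assumed from the reported values and from [[1]](#1) rather than taken from the library's source code.

```python
# Only the columns needed for this check, copied from the example dataset.
data = {
    "bread":  ["yes", "yes", "no", "yes", "yes", "yes", "yes"],
    "beer":   ["no", "yes", "yes", "yes", "no", "yes", "no"],
    "diaper": ["no", "yes", "yes", "yes", "yes", "yes", "yes"],
}
target_column, target_value = "diaper", "yes"
g = 1  # same value as g_parameter in the example


def subgroup_stats(column, value):
    """Return (tp, fp, TP, FP, Support, Qg) for the subgroup `column = value`."""
    targets = data[target_column]
    # Target values of the examples covered by the subgroup description.
    covered = [t for x, t in zip(data[column], targets) if x == value]
    tp = sum(1 for t in covered if t == target_value)
    fp = len(covered) - tp
    TP = sum(1 for t in targets if t == target_value)
    FP = len(targets) - TP
    support = round(tp / (TP + FP), 2)
    qg = tp / (fp + g)
    return tp, fp, TP, FP, support, qg


print(subgroup_stats("bread", "yes"))  # (5, 1, 6, 1, 0.71, 2.5)
```

The printed tuple matches the first line of the output file shown above.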

These results have been verified against the output file produced by running the SD algorithm on this toy dataset.

We can also access different statistics about the result:


In [None]:
print(model.selected_subgroups) # Number of selected subgroups
print(model.unselected_subgroups) # Number of unselected subgroups
print(model.visited_nodes) # Number of generated subgroups


## References
<a id="1">[1]</a>
Gamberger, D., & Lavrač, N. (2002). Expert-Guided Subgroup Discovery: Methodology and Application. Journal of Artificial Intelligence Research, 17. doi:10.1613/jair.1089.