# Example of use of the BSD algorithm

## About this document

This document shows an example of how to use the BSD algorithm.

The following sections introduce the BSD algorithm and explain how to install the `subgroups` library. The execution of the algorithm is then described step by step.

Finally, the results obtained by applying the BSD algorithm are presented, covering both the information written to the output file and the information available through the model properties.

## The BSD algorithm

BSD is a subgroup discovery algorithm that introduces the concept of a dominance relation between subgroups. It also keeps a list of the $k$ best subgroups found so far and uses an optimistic estimate to prune the search space.
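
The pruning idea can be illustrated with a minimal sketch. It assumes that a quality value and an optimistic estimate are supplied by the caller; the `TopK` class and its method names are hypothetical and are not part of the `subgroups` library.

```python
import heapq


class TopK:
    """Hypothetical helper: keeps the k best (quality, description) pairs."""

    def __init__(self, k):
        self.k = k
        self._heap = []  # min-heap, so the worst stored quality is at index 0

    def threshold(self):
        # Until the list is full, nothing can be pruned.
        if len(self._heap) < self.k:
            return float("-inf")
        return self._heap[0][0]

    def offer(self, quality, description):
        # Insert while there is room; afterwards replace the worst entry
        # only when the new quality is strictly better.
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (quality, description))
        elif quality > self._heap[0][0]:
            heapq.heapreplace(self._heap, (quality, description))

    def can_prune(self, optimistic_estimate):
        # The optimistic estimate bounds the quality of every refinement,
        # so a branch whose bound is below the threshold cannot contribute.
        return optimistic_estimate < self.threshold()

    def best(self):
        return sorted(self._heap, reverse=True)


top = TopK(k=2)
top.offer(0.10, "a")
top.offer(0.30, "b")
top.offer(0.20, "c")  # replaces the entry with quality 0.10
print(top.best())
print(top.can_prune(0.15), top.can_prune(0.25))
```

Once the list holds $k$ subgroups, any branch whose optimistic estimate falls below the lowest stored quality can be skipped, because no refinement in that branch could enter the list.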

To handle dominance between subgroups, BSD uses a bitset that stores, for each discovered pattern and each row of the dataset, whether the pattern appears in the row.
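
A minimal sketch of such a bitset is given below. It assumes plain Python integers as bitsets, where bit $i$ corresponds to row $i$; the toy dataset and the `selector_bitset` helper are hypothetical and do not reflect the internal data structures of the `subgroups` library.

```python
# Toy dataset (hypothetical): one dict per row.
rows = [
    {"weather": "sunny", "wind": "weak"},
    {"weather": "sunny", "wind": "strong"},
    {"weather": "rainy", "wind": "weak"},
    {"weather": "rainy", "wind": "strong"},
    {"weather": "sunny", "wind": "weak"},
]


def selector_bitset(rows, attribute, value):
    """Return an int whose bit i is 1 when row i has attribute == value."""
    bits = 0
    for i, row in enumerate(rows):
        if row[attribute] == value:
            bits |= 1 << i
    return bits


sunny = selector_bitset(rows, "weather", "sunny")  # rows 0, 1 and 4
weak = selector_bitset(rows, "wind", "weak")       # rows 0, 2 and 4

# The bitset of a conjunction of selectors is the bitwise AND of their bitsets.
sunny_and_weak = sunny & weak                      # rows 0 and 4

# The number of covered rows is the number of bits set to 1.
cover_count = bin(sunny_and_weak).count("1")
print(format(sunny_and_weak, "05b"), cover_count)
```

Representing coverage this way means that refining a pattern with one more selector costs a single bitwise AND rather than a new pass over the dataset.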

A subgroup $S$ makes another subgroup $S'$ irrelevant by dominance if and only if the positive instances in the bitset of $S'$ are included in the positive instances of $S$, and the negative instances of $S$ are included in the negative instances of $S'$. In other words, $S$ covers at least the same positive instances as $S'$ without covering any additional negative ones.
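
The definition above can be sketched with integer bitsets as follows. This is an illustration under stated assumptions, not the library's implementation: `target_mask` (the rows that belong to the target class), `is_subset`, and `makes_irrelevant` are hypothetical names introduced here.

```python
def is_subset(a, b):
    """True when every bit set in a is also set in b."""
    return (a & ~b) == 0


def makes_irrelevant(cover_s, cover_s_prime, target_mask):
    """Check whether S makes S' irrelevant by dominance.

    cover_s and cover_s_prime are bitsets of the rows covered by S and S';
    target_mask is the bitset of the rows that are positive instances.
    """
    pos_s = cover_s & target_mask              # positives covered by S
    neg_s = cover_s & ~target_mask             # negatives covered by S
    pos_s_prime = cover_s_prime & target_mask  # positives covered by S'
    neg_s_prime = cover_s_prime & ~target_mask # negatives covered by S'
    return is_subset(pos_s_prime, pos_s) and is_subset(neg_s, neg_s_prime)


# Six rows; rows 0, 1 and 2 are positive instances.
target_mask = 0b000111
cover_s = 0b001111        # S covers rows 0, 1, 2, 3
cover_s_prime = 0b011011  # S' covers rows 0, 1, 3, 4

print(makes_irrelevant(cover_s, cover_s_prime, target_mask))
print(makes_irrelevant(cover_s_prime, cover_s, target_mask))
```

In this example $S$ covers every positive row that $S'$ covers and its only negative row (row 3) is also covered by $S'$, so $S$ makes $S'$ irrelevant. The reverse does not hold, because $S'$ misses the positive row 2.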

BSD was introduced in [[1]](#1) and can be described as follows:

[![BSD Algorithm](https://i.imgur.com/ms2ruHz.png)](https://i.imgur.com/ms2ruHz.png)

## Installing the `subgroups` library

To install the `subgroups` library, run the following command:

In [1]:
!pip install subgroups

Collecting subgroups
  Downloading subgroups-0.1.0-py3-none-any.whl (171 kB)
Installing collected packages: subgroups
Successfully installed subgroups-0.1.0
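
As an optional quick check, the installed version can be read from the package metadata using only the Python standard library. The `installed_version` helper below is a hypothetical name introduced for this sketch.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the library is installed, otherwise None.
print(installed_version("subgroups"))
```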


To verify that the installation was successful, run the library's tests with the following Python code:

In [2]:
import subgroups.tests as st
st.run_all_tests()

test_Operator_evaluate_method (tests.core.test_operator.TestOperator) ... ok
test_Operator_evaluate_method_with_pandasSeries (tests.core.test_operator.TestOperator) ... ok
test_Operator_generate_from_str_method (tests.core.test_operator.TestOperator) ... ok
test_Operator_string_representation (tests.core.test_operator.TestOperator) ... ok
test_Pattern_contains_method (tests.core.test_pattern.TestPattern) ... ok
test_Pattern_general (tests.core.test_pattern.TestPattern) ... ok
test_Pattern_is_contained_method (tests.core.test_pattern.TestPattern) ... ok
test_Pattern_is_refinement_method (tests.core.test_pattern.TestPattern) ... ok
test_Selector_attributes (tests.core.test_selector.TestSelector) ... ok
test_Selector_comparisons (tests.core.test_selector.TestSelector) ... ok
test_Selector_creation_process (tests.core.test_selector.TestSelector) ... ok
test_Selector_deletion_process (tests.core.test_selector.TestSelector) ... ok
test_Selector_generate_from_str_method (tests.core.test_selec



##################################
########## CORE PACKAGE ##########
##################################


##############################################
########## QUALITY MEASURES PACKAGE ##########
##############################################


#############################################
########## DATA STRUCTURES PACKAGE ##########
#############################################


ok
test_FPTreeForSDMap_generate_conditional_fp_tree_2 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap) ... ok
test_FPTreeForSDMap_generate_set_of_frequent_selectors_1 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap) ... ok
test_FPTreeForSDMap_generate_set_of_frequent_selectors_2 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap) ... ok
test_FPTreeForSDMap_generate_set_of_frequent_selectors_3 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap) ... ok
test_FPTreeNode_general (tests.data_structures.test_fp_tree_node.TestFPTreeNode) ... ok
test_subgroup_list_1 (tests.data_structures.test_subgroup_list.TestSubgroupList) ... ok
test_subgroup_list_2 (tests.data_structures.test_subgroup_list.TestSubgroupList) ... ok
test_subgroup_list_3 (tests.data_structures.test_subgroup_list.TestSubgroupList) ... ok
test_subgroup_list_4 (tests.data_structures.test_subgroup_list.TestSubgroupList) ... ok
test_vertical_list_1 (tests.data_structures



########################################
########## ALGORITHMS PACKAGE ##########
########################################


ok
test_BSD_cardinality (tests.algorithms.subgroup_sets.test_bsd.TestBSD) ... ok
test_BSD_checkRel (tests.algorithms.subgroup_sets.test_bsd.TestBSD) ... ok
test_BSD_checkRelevancies (tests.algorithms.subgroup_sets.test_bsd.TestBSD) ... ok
test_BSD_fit1 (tests.algorithms.subgroup_sets.test_bsd.TestBSD) ... ok
test_BSD_fit2 (tests.algorithms.subgroup_sets.test_bsd.TestBSD) ... ok
test_BSD_fit3 (tests.algorithms.subgroup_sets.test_bsd.TestBSD) ... ok
test_BSD_fit4 (tests.algorithms.subgroup_sets.test_bsd.TestBSD) ... ok
test_BSD_init_method (tests.algorithms.subgroup_sets.test_bsd.TestBSD) ... ok
test_BSD_logicalAnd (tests.algorithms.subgroup_sets.test_bsd.TestBSD) ... ok
test_CBSD_checkRel (tests.algorithms.subgroup_sets.test_cbsd.TestCBSD) ... ok
test_CBSD_checkRelevancies (tests.algorithms.subgroup_sets.test_cbsd.TestCBSD) ... ok
test_CBSD_fit1 (tests.algorithms.subgroup_sets.test_cbsd.TestCBSD) ... ok
test_CBSD_fit2 (tests.algorithms.subgroup_sets.test_cbsd.TestCBSD) ... ok
test_CBSD_

test_VLSD_fit_method_7 (tests.algorithms.subgroup_sets.test_vlsd.TestVLSD) ... ok
test_VLSD_init_method_1 (tests.algorithms.subgroup_sets.test_vlsd.TestVLSD) ... ok
test_VLSD_init_method_2 (tests.algorithms.subgroup_sets.test_vlsd.TestVLSD) ... ok

ERROR: test_CN2SD_fit_method_4 (tests.algorithms.subgroup_sets.test_cn2sd.TestCN2SD)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\almc\anaconda3\lib\site-packages\subgroups\tests\algorithms\subgroup_sets\test_cn2sd.py", line 127, in test_CN2SD_fit_method_4
    input_dataframe = pandas.read_csv("subgroups/datasets/csv/shop.csv")
  File "C:\Users\almc\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\almc\anaconda3\lib\site-packages\pandas\io\parsers\readers.py", line 577, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Users\almc\anaconda3\lib\site-



###################################
########## UTILS PACKAGE ##########
###################################


ok

----------------------------------------------------------------------
Ran 1 test in 0.007s

OK

Note that the truncated output above reports an error in `test_CN2SD_fit_method_4`, raised while the test reads `subgroups/datasets/csv/shop.csv` through a relative path. The traceback is cut off before the exception message, so the exact cause is not visible here.


## References

<a id="1">[1]</a>
Florian Lemmerich, Mathias Rohlfs, & Martin Atzmueller. (2010, May). Fast discovery of relevant subgroup patterns. In Twenty-Third International FLAIRS Conference, pp. 428-433.