# Example of using the SDMap algorithm

## About this document

This document shows an example of using the SDMap algorithm.

The following sections introduce the algorithm and give instructions for installing the `subgroups` library. They then describe how to run the SDMap algorithm, including the steps required. Finally, they present the results, covering both the information written to the output file and the statistics available through the model properties.

## SDMap algorithm

SDMap [[1]](#1) is an exhaustive subgroup discovery algorithm based on the FP-Growth [[2]](#2) algorithm for frequent pattern mining. It uses the FPTree data structure to represent the complete dataset and mines subgroups in two steps: a complete FPTree is first built from the input dataset, and then successive conditional FPTrees are built recursively to mine the subgroups.
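To make the first step concrete, here is a minimal sketch of building an FP-tree-like structure from a small toy dataset. It is not the `subgroups` implementation: the `FPNode` class and `build_fp_tree` function are hypothetical names, and storing `tp`/`fp` counts in each node is a choice made for this sketch so that subgroup statistics can be read off the tree. The recursive construction of conditional trees is omitted.

```python
from collections import Counter


class FPNode:
    """One node of a simplified FP-tree (hypothetical, not the library's class)."""

    def __init__(self, selector=None):
        self.selector = selector  # (attribute, value) pair; None for the root
        self.tp = 0               # rows passing through this node that match the target
        self.fp = 0               # rows passing through this node that do not
        self.children = {}


def build_fp_tree(rows, target):
    """Build the tree in two passes over the rows, FP-Growth style."""
    target_attr, target_value = target
    # First pass: count how many rows contain each selector.
    counts = Counter(
        (attr, value)
        for row in rows
        for attr, value in row.items()
        if attr != target_attr
    )
    root = FPNode()
    # Second pass: insert each row as a path, most frequent selectors first,
    # so that rows sharing frequent selectors share a prefix of the tree.
    for row in rows:
        selectors = [(a, v) for a, v in row.items() if a != target_attr]
        selectors.sort(key=lambda s: (-counts[s], s))
        is_positive = row[target_attr] == target_value
        node = root
        for selector in selectors:
            node = node.children.setdefault(selector, FPNode(selector))
            if is_positive:
                node.tp += 1
            else:
                node.fp += 1
    return root


rows = [
    {"att1": "v3", "att2": "v1", "att3": "v2", "class": "no"},
    {"att1": "v2", "att2": "v2", "att3": "v1", "class": "yes"},
    {"att1": "v1", "att2": "v3", "att3": "v1", "class": "no"},
]
tree = build_fp_tree(rows, ("class", "yes"))
shared = tree.children[("att3", "v1")]
print(shared.tp, shared.fp)  # 1 1
```

The selector `att3 = 'v1'` appears in two rows, so both rows share that node as a common prefix, which is what keeps the tree compact on larger datasets.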

## Installing the `subgroups` library

To install the `subgroups` library, execute the following cell:

In [1]:
!pip install subgroups



After that, to verify that the installation was successful, you can run the following cell:

In [2]:
import subgroups.tests as st
st.run_all_tests()

test_Operator_evaluate_method (tests.core.test_operator.TestOperator.test_Operator_evaluate_method) ... ok
test_Operator_evaluate_method_with_pandasSeries (tests.core.test_operator.TestOperator.test_Operator_evaluate_method_with_pandasSeries) ... ok
test_Operator_generate_from_str_method (tests.core.test_operator.TestOperator.test_Operator_generate_from_str_method) ... ok
test_Operator_string_representation (tests.core.test_operator.TestOperator.test_Operator_string_representation) ... ok
test_Pattern_contains_method (tests.core.test_pattern.TestPattern.test_Pattern_contains_method) ... ok
test_Pattern_general (tests.core.test_pattern.TestPattern.test_Pattern_general) ... ok
test_Pattern_is_contained_method (tests.core.test_pattern.TestPattern.test_Pattern_is_contained_method) ... ok
test_Pattern_is_refinement_method (tests.core.test_pattern.TestPattern.test_Pattern_is_refinement_method) ... ok
test_Selector_attributes (tests.core.test_selector.TestSelector.test_Selector_attributes) ..



##################################
########## CORE PACKAGE ##########
##################################


##############################################
########## QUALITY MEASURES PACKAGE ##########
##############################################


#############################################
########## DATA STRUCTURES PACKAGE ##########
#############################################


test_FPTreeForSDMap_generate_conditional_fp_tree_2 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap.test_FPTreeForSDMap_generate_conditional_fp_tree_2) ... ok
test_FPTreeForSDMap_generate_set_of_frequent_selectors_1 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap.test_FPTreeForSDMap_generate_set_of_frequent_selectors_1) ... ok
test_FPTreeForSDMap_generate_set_of_frequent_selectors_2 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap.test_FPTreeForSDMap_generate_set_of_frequent_selectors_2) ... ok
test_FPTreeForSDMap_generate_set_of_frequent_selectors_3 (tests.data_structures.test_fp_tree_for_sdmap.TestFPTreeForSDMap.test_FPTreeForSDMap_generate_set_of_frequent_selectors_3) ... ok
test_FPTreeNode_general (tests.data_structures.test_fp_tree_node.TestFPTreeNode.test_FPTreeNode_general) ... ok
test_subgroup_list_1 (tests.data_structures.test_subgroup_list.TestSubgroupList.test_subgroup_list_1) ... ok
test_subgroup_list_2 (tests.data_structures



########################################
########## ALGORITHMS PACKAGE ##########
########################################


ok
test_BSD_cardinality (tests.algorithms.subgroup_sets.test_bsd.TestBSD.test_BSD_cardinality) ... ok
test_BSD_checkRel (tests.algorithms.subgroup_sets.test_bsd.TestBSD.test_BSD_checkRel) ... ok
test_BSD_checkRelevancies (tests.algorithms.subgroup_sets.test_bsd.TestBSD.test_BSD_checkRelevancies) ... ok
test_BSD_fit1 (tests.algorithms.subgroup_sets.test_bsd.TestBSD.test_BSD_fit1) ... ok
test_BSD_fit2 (tests.algorithms.subgroup_sets.test_bsd.TestBSD.test_BSD_fit2) ... ok
test_BSD_fit3 (tests.algorithms.subgroup_sets.test_bsd.TestBSD.test_BSD_fit3) ... ok
test_BSD_fit4 (tests.algorithms.subgroup_sets.test_bsd.TestBSD.test_BSD_fit4) ... ok
test_BSD_init_method (tests.algorithms.subgroup_sets.test_bsd.TestBSD.test_BSD_init_method) ... ok
test_BSD_logicalAnd (tests.algorithms.subgroup_sets.test_bsd.TestBSD.test_BSD_logicalAnd) ... ok
test_CBSD_checkRel (tests.algorithms.subgroup_sets.test_cbsd.TestCBSD.test_CBSD_checkRel) ... ok
test_CBSD_checkRelevancies (tests.algorithms.subgroup_sets.test

test_VLSD_fit_method_3 (tests.algorithms.subgroup_sets.test_vlsd.TestVLSD.test_VLSD_fit_method_3) ... ok
test_VLSD_fit_method_4 (tests.algorithms.subgroup_sets.test_vlsd.TestVLSD.test_VLSD_fit_method_4) ... ok
test_VLSD_fit_method_5 (tests.algorithms.subgroup_sets.test_vlsd.TestVLSD.test_VLSD_fit_method_5) ... ok
test_VLSD_fit_method_6 (tests.algorithms.subgroup_sets.test_vlsd.TestVLSD.test_VLSD_fit_method_6) ... ok
test_VLSD_fit_method_7 (tests.algorithms.subgroup_sets.test_vlsd.TestVLSD.test_VLSD_fit_method_7) ... ok
test_VLSD_init_method_1 (tests.algorithms.subgroup_sets.test_vlsd.TestVLSD.test_VLSD_init_method_1) ... ok
test_VLSD_init_method_2 (tests.algorithms.subgroup_sets.test_vlsd.TestVLSD.test_VLSD_init_method_2) ... ok

----------------------------------------------------------------------
Ran 91 tests in 2.525s

OK
test_dataframe_filters_general (tests.utils.test_dataframe_filters.TestDataFrameFilter.test_dataframe_filters_general) ... ok
test_to_input_format_for_subgroup_li



###################################
########## UTILS PACKAGE ##########
###################################


## Running the SDMap algorithm

To run the SDMap algorithm on a dataset, follow these steps:

- Load the dataset into a Pandas `DataFrame` object.
- Set the target, which must be a tuple of the form `(column_name, value)`.
- Select the quality measure to use.
- Create the SDMap model with the desired parameters and run it.

The following is an example of running this algorithm on a dataset:

In [3]:
import pandas as pd
from subgroups.quality_measures import WRAcc
from subgroups.algorithms import SDMap

# Toy dataset: three rows, three descriptive attributes and a class column.
dataset = pd.DataFrame({
    'att1': ['v3', 'v2', 'v1'],
    'att2': ['v1', 'v2', 'v3'],
    'att3': ['v2', 'v1', 'v1'],
    'class': ['no', 'yes', 'no'],
})
target = ('class', 'yes')  # (column_name, value)

# Permissive thresholds: no subgroup is filtered out in this example.
model = SDMap(quality_measure=WRAcc(), minimum_quality_measure_value=-1, minimum_n=0,
              write_results_in_file=True, file_path="./results.txt")
model.fit(dataset, target)

## Results

Running the following cell prints the subgroups obtained by the algorithm:

In [4]:
with open("./results.txt", "r") as file:
    for current_line in file:
        print(current_line.strip())

Description: [att1 = 'v3'], Target: class = 'yes' ; Quality Measure WRAcc = -0.1111111111111111 ; tp = 0 ; fp = 1 ; TP = 1 ; FP = 2
Description: [att2 = 'v1'], Target: class = 'yes' ; Quality Measure WRAcc = -0.1111111111111111 ; tp = 0 ; fp = 1 ; TP = 1 ; FP = 2
Description: [att1 = 'v3', att2 = 'v1'], Target: class = 'yes' ; Quality Measure WRAcc = -0.1111111111111111 ; tp = 0 ; fp = 1 ; TP = 1 ; FP = 2
Description: [att3 = 'v2'], Target: class = 'yes' ; Quality Measure WRAcc = -0.1111111111111111 ; tp = 0 ; fp = 1 ; TP = 1 ; FP = 2
Description: [att2 = 'v1', att3 = 'v2'], Target: class = 'yes' ; Quality Measure WRAcc = -0.1111111111111111 ; tp = 0 ; fp = 1 ; TP = 1 ; FP = 2
Description: [att1 = 'v3', att3 = 'v2'], Target: class = 'yes' ; Quality Measure WRAcc = -0.1111111111111111 ; tp = 0 ; fp = 1 ; TP = 1 ; FP = 2
Description: [att1 = 'v3', att2 = 'v1', att3 = 'v2'], Target: class = 'yes' ; Quality Measure WRAcc = -0.1111111111111111 ; tp = 0 ; fp = 1 ; TP = 1 ; FP = 2
Description

Each of these lines represents a subgroup discovered by the algorithm along with some of its characteristics. If we take the first result as an example, we have the following characteristics:

- The subgroup is described by the pattern `[att1 = 'v3']`.
- The target is the one we defined initially, i.e., `class = 'yes'`.
- The quality of the subgroup is measured by the WRAcc measure, which has a value of -0.1111111111111111.
- The values of tp, fp, TP, and FP are as follows: tp = 0 ; fp = 1 ; TP = 1 ; FP = 2.

These results were read from the output file written by the SDMap run on the toy dataset above.
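One way to double-check a line of the output file is to parse it and recompute WRAcc from its counts. The sketch below assumes the line layout shown above; the regular expression, `parse_result_line`, and `wracc` are hypothetical helpers written for this example, not part of the `subgroups` API. WRAcc is computed with its usual definition, coverage times the difference between the subgroup's precision and the base rate of the target.

```python
import math
import re

# Pattern for one result line, inferred from the lines printed above (an assumption).
LINE_RE = re.compile(
    r"Description: \[(?P<description>.*)\], Target: (?P<target>.*?) ; "
    r"Quality Measure (?P<measure>\w+) = (?P<quality>\S+) ; "
    r"tp = (?P<tp>\d+) ; fp = (?P<fp>\d+) ; TP = (?P<TP>\d+) ; FP = (?P<FP>\d+)"
)


def parse_result_line(line):
    """Return the fields of a result line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["quality"] = float(fields["quality"])
    for key in ("tp", "fp", "TP", "FP"):
        fields[key] = int(fields[key])
    return fields


def wracc(tp, fp, TP, FP):
    """Weighted relative accuracy: (n / N) * (tp / n - TP / N)."""
    n, N = tp + fp, TP + FP
    if n == 0 or N == 0:
        return 0.0  # convention chosen for this sketch when nothing is covered
    return (n / N) * (tp / n - TP / N)


line = ("Description: [att1 = 'v3'], Target: class = 'yes' ; "
        "Quality Measure WRAcc = -0.1111111111111111 ; tp = 0 ; fp = 1 ; TP = 1 ; FP = 2")
fields = parse_result_line(line)
recomputed = wracc(fields["tp"], fields["fp"], fields["TP"], fields["FP"])
print(math.isclose(recomputed, fields["quality"]))  # True
```

For the first subgroup, the coverage is 1/3, the precision is 0, and the base rate is 1/3, so WRAcc = (1/3) * (0 - 1/3) = -1/9, which matches the value in the file.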

The model also exposes statistics about the run through its properties:

In [5]:
print("Selected subgroups: ", model.selected_subgroups) # Number of selected subgroups
print("Unselected subgroups: ", model.unselected_subgroups) # Number of unselected subgroups due to not meeting the minimum quality threshold
print("Visited nodes: ", model.visited_nodes) # Number of nodes (subgroups) visited from the search space

Selected subgroups:  20
Unselected subgroups:  0
Visited nodes:  20
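The count of 20 can be cross-checked by brute force. The sketch below is an independent check that uses only the standard library, not the `subgroups` API, and it rests on an assumption: with thresholds this permissive, the selected subgroups are the distinct descriptions that cover at least one row of the toy dataset. It makes no claim about how the library traverses its search space.

```python
from itertools import combinations

# Descriptive attributes of the toy dataset (the class column is left out).
rows = [
    {"att1": "v3", "att2": "v1", "att3": "v2"},
    {"att1": "v2", "att2": "v2", "att3": "v1"},
    {"att1": "v1", "att2": "v3", "att3": "v1"},
]

# Every non-empty combination of a row's selectors is a description covering that row.
patterns = set()
for row in rows:
    selectors = sorted(row.items())
    for size in range(1, len(selectors) + 1):
        for combo in combinations(selectors, size):
            patterns.add(combo)

print(len(patterns))  # 20
```

Each row yields 7 descriptions, and only `att3 = 'v1'` is shared between two rows, so the total is 7 + 7 + 7 - 1 = 20.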


## References

<a id="1">[1]</a>
Atzmueller, M., Puppe, F. (2006). SD-Map – A Fast Algorithm for Exhaustive Subgroup Discovery. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Knowledge Discovery in Databases: PKDD 2006. PKDD 2006. Lecture Notes in Computer Science, vol 4213. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871637_6

<a id="2">[2]</a> 
Han, J., Pei, J., Yin, Y. et al. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach. Data Mining and Knowledge Discovery 8, 53–87 (2004). https://doi.org/10.1023/B:DAMI.0000005258.31418.83