# Example of using the SDMap* algorithm

## About this document

The purpose of this document is to show an example of using the SDMap* algorithm.

The following sections first introduce the algorithm and then give instructions for installing the `subgroups` library. Next, the execution process of the SDMap* algorithm is described, including the steps it requires. Finally, the results of applying the algorithm are presented, covering both the information written to the output file and the information available through the model properties.

## SDMap* algorithm

SDMap* [[1]](#1) is a subgroup discovery algorithm that modifies the original design of SDMap to add some pruning techniques that improve performance. These changes are:

- A list of best subgroups is used to prune patterns whose optimistic estimate is worse than the quality of the worst subgroup in the list.
- In each call to FpTree where the tree does not have a single path, selectors are sorted so that those with a better optimistic estimate are processed first.
- Each new pattern generated after this sorting is checked so that its optimistic estimate is better than the quality of the worst subgroup in the list of best subgroups. If it is not, the pattern is discarded.
- In the construction of conditional FpTrees, branches whose optimistic estimate is worse than the quality of the worst subgroup in the list of best subgroups are discarded.
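
The pruning rules above can be illustrated with a minimal sketch. It is not the `subgroups` implementation: the `BestSubgroups` class, the candidate tuples, and their quality and optimistic estimate values are all hypothetical, and the FpTree itself is left out. The sketch only shows how a bounded list of best subgroups yields a threshold, and how processing candidates in decreasing order of optimistic estimate lets later candidates be discarded.

```python
import heapq


class BestSubgroups:
    """Hypothetical bounded list keeping the k best (quality, description) pairs."""

    def __init__(self, k):
        self.k = k
        self._heap = []  # min-heap: the worst kept subgroup is at index 0

    def worst_quality(self):
        # While the list is not full, nothing can be pruned yet.
        if len(self._heap) < self.k:
            return float("-inf")
        return self._heap[0][0]

    def can_prune(self, optimistic_estimate):
        # Prune when even the best possible refinement cannot enter the list.
        return optimistic_estimate < self.worst_quality()

    def add(self, quality, description):
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (quality, description))
        elif quality > self._heap[0][0]:
            heapq.heapreplace(self._heap, (quality, description))

    def sorted_best(self):
        return sorted(self._heap, reverse=True)


# Made-up candidates: (description, quality, optimistic estimate).
candidates = [
    ("c", 0.05, 0.08),
    ("a", 0.10, 0.30),
    ("d", 0.02, 0.04),
    ("b", 0.20, 0.25),
]

best = BestSubgroups(k=2)
pruned = []
# Candidates with a better optimistic estimate are processed first.
for description, quality, estimate in sorted(candidates, key=lambda c: c[2], reverse=True):
    if best.can_prune(estimate):
        pruned.append(description)
        continue
    best.add(quality, description)

print("Best subgroups:", best.sorted_best())
print("Pruned candidates:", pruned)
```

Because the candidates with the highest optimistic estimates fill the list first, the threshold rises early and the remaining candidates can be discarded without further exploration.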

## Installing the `subgroups` library

To install the `subgroups` library, simply execute the following cell:

In [None]:
!pip install subgroups

To verify that the installation was successful, we may run the following cell:

In [None]:
import subgroups.tests as st
st.run_all_tests()
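
As a lighter check, the installed version can be queried with the standard library. This is a minimal sketch that assumes the distribution name matches the one passed to `pip` above; `installed_version` is a hypothetical helper and not part of the `subgroups` library.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if the library is installed, otherwise None.
print(installed_version("subgroups"))
```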

## Running the SDMap* algorithm

To run the SDMap* algorithm on a dataset, it is necessary to follow these steps:

- Load the dataset into a Pandas `DataFrame` object.
- Set the target, which must be a tuple of the form `(column_name, value)`.
- Select the quality measure and optimistic estimate to use.
- Create the SDMap* model with the desired parameters and run it.

The following is an example of running this algorithm on a small dataset:

In [None]:
from pandas import DataFrame
from subgroups.algorithms import SDMapStar
from subgroups.quality_measures import WRAcc
from subgroups.quality_measures import WRAccOptimisticEstimate1

dataset = DataFrame({'att1': ['v3', 'v2', 'v1', 'v3', 'v4', 'v4'], 'att2': ['1', '2', '3', '3', '5', '6'], 'att3': ['B', 'A', 'A', 'B', 'A', 'B'], 'class': ['0', '1', '0', '0', '1', '1']})
target = ("class", "1")

model = SDMapStar(WRAcc(), WRAccOptimisticEstimate1(), 0.01, num_subgroups=3, minimum_n=0, write_results_in_file=True, file_path="./results.txt")
model.fit(dataset, target)
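
To make the quality measure concrete, the following sketch computes WRAcc by hand for one subgroup of the dataset above, `att1 = 'v4'`. It uses the usual definition WRAcc = (n / N) * (p / n - P / N), where N is the number of rows, P the number of rows matching the target, n the number of rows covered by the subgroup and p the number of covered rows matching the target. The `wracc` helper is hypothetical, works on plain lists rather than a `DataFrame`, and is not necessarily how the library computes the measure internally.

```python
# Two columns of the example dataset, kept as plain lists.
data = {
    'att1': ['v3', 'v2', 'v1', 'v3', 'v4', 'v4'],
    'class': ['0', '1', '0', '0', '1', '1'],
}
target_column, target_value = ("class", "1")


def wracc(data, attribute, value, target_column, target_value):
    """WRAcc = (n / N) * (p / n - P / N) for the subgroup attribute == value."""
    targets = data[target_column]
    N = len(targets)
    P = sum(1 for t in targets if t == target_value)
    # Target values of the rows covered by the subgroup.
    covered = [t for a, t in zip(data[attribute], targets) if a == value]
    n = len(covered)
    if n == 0:
        return 0.0  # An empty subgroup is given quality 0 in this sketch.
    p = sum(1 for t in covered if t == target_value)
    return (n / N) * (p / n - P / N)


quality = wracc(data, 'att1', 'v4', target_column, target_value)
print(quality)
```

Here N = 6, P = 3, n = 2 and p = 2, so the result is (2/6) * (1 - 1/2) = 1/6, roughly 0.167, which is above the 0.01 threshold passed to the model.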

## Results

Running the following cell, we get the output of the first subgroups found by the algorithm:

In [None]:
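from itertools import islice

subgroups_to_read = 10  # Maximum number of lines to print
with open("./results.txt", "r") as file:
    # islice stops at the end of the file, so no empty lines are printed
    # when the file holds fewer lines than requested.
    for current_line in islice(file, subgroups_to_read):
        print(current_line.strip())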

We can also access different statistics about the result:

In [None]:
print("Pruned subgroups: ", model.pruned_subgroups) # Number of subgroups pruned by the threshold of the best subgroups
print("Conditional pruned branches: ", model.conditional_pruned_branches) # Number of branches pruned from some conditional FPTree by the threshold of the best subgroups
print("Selected subgroups: ", model.selected_subgroups) # Number of selected subgroups
print("Unselected subgroups: ", model.unselected_subgroups) # Number of unselected subgroups due to not meeting the minimum quality threshold
print("Visited nodes: ", model.visited_nodes) # Number of nodes (subgroups) visited from the search space

## References

<a id="1">[1]</a>
Atzmueller, M., Lemmerich, F. (2009). Fast Subgroup Discovery for Continuous Target Concepts. In: Rauch, J., Raś, Z.W., Berka, P., Elomaa, T. (eds) Foundations of Intelligent Systems. ISMIS 2009. Lecture Notes in Computer Science, vol 5722. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04125-9_7