# Quick start -- Use Case Example
This example uses the [UCI Adult dataset](https://archive.ics.uci.edu/ml/datasets/Adult), where the objective is to predict whether a person earns more (label 1) or less (label 0) than $50,000 a year.

In [1]:
import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.tree import DecisionTreeClassifier

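# Import the dataset (OpenML id 1590 is the Adult dataset)
d = fetch_openml(data_id=1590, as_frame=True)
X = d.data
# One-hot encode the categorical columns so the tree can consume them
d_train = pd.get_dummies(X)
# Binary target: 1 if income is '>50K', 0 otherwise
y_true = (d.target == '>50K') * 1

# Train the classifier
classifier = DecisionTreeClassifier(min_samples_leaf=10, max_depth=4)
classifier.fit(d_train, y_true)

# Produce y_pred (predictions on the training data)
y_pred = classifier.predict(d_train)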
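
The one-hot encoding step can be illustrated on a small frame. This is a minimal sketch with made-up values (the frame `toy` is hypothetical, not part of the dataset); it shows that `pd.get_dummies` passes numeric columns through, creates one indicator column per category, and leaves all indicators at zero for a missing value.

```python
import pandas as pd

# Hypothetical miniature frame: one numeric column, one categorical column
# with a missing value in the last row.
toy = pd.DataFrame({
    "age": [25, 38, 18],
    "workclass": ["Private", "Local-gov", None],
})

encoded = pd.get_dummies(toy)

# Numeric columns are kept; each category becomes its own indicator column.
print(list(encoded.columns))
# The row with the missing workclass has no indicator set.
print(encoded.astype(int))
```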

In [2]:
X.head()

| | age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 25 | Private | 226802 | 11th | 7 | Never-married | Machine-op-inspct | Own-child | Black | Male | 0 | 0 | 40 | United-States |
| 1 | 38 | Private | 89814 | HS-grad | 9 | Married-civ-spouse | Farming-fishing | Husband | White | Male | 0 | 0 | 50 | United-States |
| 2 | 28 | Local-gov | 336951 | Assoc-acdm | 12 | Married-civ-spouse | Protective-serv | Husband | White | Male | 0 | 0 | 40 | United-States |
| 3 | 44 | Private | 160323 | Some-college | 10 | Married-civ-spouse | Machine-op-inspct | Husband | Black | Male | 7688 | 0 | 40 | United-States |
| 4 | 18 | | 103497 | Some-college | 10 | Never-married | | Own-child | White | Female | 0 | 0 | 30 | United-States |


In [4]:
y_pred

array([0, 0, 0, ..., 0, 0, 1])

In [5]:
y_true

0        0
1        0
2        1
3        1
4        0
        ..
48837    0
48838    1
48839    0
48840    0
48841    1
Name: class, Length: 48842, dtype: int64

## Use of the FairSD package
Here we use the DSSD (Diverse Subgroup Set Discovery) algorithm and the `demographic_parity_difference` metric (from Fairlearn) to find the top-k (k = 5 by default) subgroups that exhibit the greatest disparity.<br/>
The `execute` method returns a **ResultSet object**.

In [None]:
import sys
sys.path.append('../..')
import faid.metrics.subgroupdiscovery as fsd
task=fsd.SubgroupDiscoveryTask(X, y_true, y_pred, qf = "demographic_parity_difference")
result_set=fsd.DSSD().execute(task)
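
To make the quality measure concrete, the sketch below computes a demographic parity difference by hand for a single subgroup. It assumes the measure compares the selection rate (share of positive predictions) of subgroup members with that of everyone else; the helper names `positive_rate` and `parity_difference` and the toy data are hypothetical, and the package's exact computation may differ.

```python
import numpy as np

def positive_rate(y_pred):
    """Fraction of instances predicted positive (label 1)."""
    return float(np.mean(y_pred))

def parity_difference(y_pred, in_subgroup):
    """Absolute gap in selection rate between subgroup members and the rest."""
    y_pred = np.asarray(y_pred)
    in_subgroup = np.asarray(in_subgroup, dtype=bool)
    inside = positive_rate(y_pred[in_subgroup])
    outside = positive_rate(y_pred[~in_subgroup])
    return abs(inside - outside)

# Hypothetical predictions: 3 of 4 members selected, 1 of 4 non-members selected.
y_pred_toy = [1, 1, 1, 0, 1, 0, 0, 0]
member = [True, True, True, True, False, False, False, False]

gap = parity_difference(y_pred_toy, member)
print(gap)  # 0.75 - 0.25
```

A larger gap means the classifier selects members of the subgroup at a rate that differs more from the rest of the population.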

### ResultSet object

We can transform the result set into a dataframe as shown below. Each row of this dataframe represents a subgroup.

In [3]:
df=result_set.to_dataframe()
display(df)

| | quality | description | size | proportion |
|---|---|---|---|---|
| 0 | 0.641066 | education-num = (10, 13] AND marital-status = ... | 5846 | 0.119692 |
| 1 | 0.635219 | education-num = (10, 13] AND relationship = "H... | 5100 | 0.104418 |
| 2 | 0.588991 | education = "Bachelors" AND sex = "Male" AND r... | 4983 | 0.102023 |
| 3 | 0.583581 | education = "Bachelors" AND sex = "Male" | 5548 | 0.113591 |
| 4 | 0.454152 | education = "Bachelors" AND race = "White" | 7034 | 0.144015 |
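
The `proportion` column appears to be the subgroup size divided by the number of rows in the dataset (48842, the length reported for `y_true`). The short check below uses only the numbers from the table above; it is an observation about this output, not a statement about how the package computes the column.

```python
# Values copied from the table above; n_rows is the length reported for y_true.
n_rows = 48842
sizes = [5846, 5100, 4983, 5548, 7034]
reported = [0.119692, 0.104418, 0.102023, 0.113591, 0.144015]

for size, prop in zip(sizes, reported):
    computed = size / n_rows
    print(f"{size:>5} / {n_rows} = {computed:.6f} (reported {prop})")
```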


We can also print the result set or convert it into a string as shown below.

In [4]:
resultset_string = result_set.to_string()
print(result_set)

education-num = (10, 13] AND marital-status = "Married-civ-spouse" 
education-num = (10, 13] AND relationship = "Husband" 
education = "Bachelors" AND sex = "Male" AND race = "White" 
education = "Bachelors" AND sex = "Male" 
education = "Bachelors" AND race = "White" 



### Generate a feature from a subgroup
A ResultSet essentially holds a list of subgroup descriptions ([Description](https://github.com/MaurizioPulizzi/fairsd/blob/main/fairsd/sgdescription.py#L80) objects).<br/>
Another useful method of the ResultSet object lets us **select a subgroup X from the result set and automatically generate the feature "Belongs to subgroup X"**. This is useful for analyzing the discovered subgroups in more depth, for example with the Fairlearn library.<br/>
An example is shown below:

In [None]:
from fairlearn.metrics import MetricFrame
from fairlearn.metrics import selection_rate

# Generate the feature "Belongs to subgroup no. 0".
# The result is a pandas Series named "sg0" with one element per instance of the dataset.
# Each element is True if and only if the instance belongs to subgroup sg0.
sg_feature = result_set.sg_feature(sg_index=0, X=X)

# Use Fairlearn to analyze subgroup sg0 further.
# The result gets its own name so it does not shadow the imported selection_rate function.
# Recent Fairlearn releases expect keyword arguments here.
sr_frame = MetricFrame(metrics=selection_rate, y_true=y_true, y_pred=y_pred, sensitive_features=sg_feature)
print(sr_frame.by_group)
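
For intuition, a subgroup membership feature can be reproduced by hand with plain pandas. The sketch below is only an illustration on made-up data (`toy_X` and `toy_pred` are hypothetical); it builds the boolean Series for the description `education = "Bachelors" AND sex = "Male"` and computes the selection rate inside and outside the subgroup, which is the kind of per-group breakdown a `MetricFrame` reports.

```python
import pandas as pd

# Hypothetical miniature of the dataset and of the classifier output.
toy_X = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "Bachelors", "Bachelors"],
    "sex": ["Male", "Male", "Female", "Male"],
})
toy_pred = pd.Series([1, 0, 0, 1])

# Membership in the subgroup: education = "Bachelors" AND sex = "Male".
sg = ((toy_X["education"] == "Bachelors") & (toy_X["sex"] == "Male")).rename("sg0")

# Selection rate (share of positive predictions) outside and inside the subgroup.
by_group = toy_pred.groupby(sg).mean()
print(by_group)
```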

### Description object
We can also obtain the subgroup feature by first retrieving the corresponding Description object:

In [6]:
description0 = result_set.get_description(0)
sg_feature = description0.to_boolean_array(dataset = X)
print(sg_feature)

0        False
1        False
2        False
3        False
4        False
         ...  
48837    False
48838    False
48839    False
48840    False
48841    False
Length: 48842, dtype: bool


Once we have the Description object of a subgroup, we can also extract further information about it.<br/>
We can:
 * convert the Description object into a string
 * retrieve the size of the subgroup
 * retrieve the quality (fairness measure) of the subgroup
 * retrieve the names of the attributes that compose the subgroup description

In [7]:
# String conversion
str_descr = description0.to_string()
print( str_descr ) # print(description0) also works

# Size
print( description0.size() )

# Quality
print( description0.get_quality() )

# Attribute names
print( description0.get_attributes() )

education = "Bachelors" AND marital-status = "Married-civ-spouse" 
4136
0.913501543416991
['education', 'marital-status']
