# Fairness in Machine Learning (ML) for classification problems


* In ML, a model is ***fair*** (it ***satisfies fairness***) if ***its predictions are independent of a given set of variables*** that we consider ***sensitive*** (e.g., gender, ethnicity, religion, age, marital status, sexual orientation, etc.).


* In classification problems, models learn a function $h(X)$ that predicts a discrete variable $Y$ from a set of known features $X$.


## Fairness criteria


* Three fairness criteria (independence, separation and sufficiency) have been defined to assess whether a classifier is fair, that is, whether its predictions are free from the influence of one or more of the sensitive variables.


* To evaluate these 3 criteria we use the following notation:

    + $X$: set of variables (features) that describe an element.
    + $A$: protected or sensitive random variable (gender, ethnicity, age, etc.).
    + $h(X)$: classification model (predictor function).
    + $Y = h(X)$: prediction of the classifier (y_predict).
    + $T$: target (y_true).
    

* To illustrate how these criteria are computed, we use the following toy dataset, where:

    + $A$ (gender): takes the values $\{'Hombre', 'Mujer'\}$ (man, woman)
    + $T$ (target): binary, takes the values $\{0: negative, 1: positive\}$
    + $Y$ (prediction): binary, takes the values $\{0: negative, 1: positive\}$
    

|id|gender|T: target|Y: prediction|
|--|--|--|--|
|1|Hombre|1|1|
|2|Hombre|1|1|
|3|Mujer|0|0|
|4|Hombre|0|1|
|5|Mujer|1|0|
|6|Hombre|1|0|
|7|Hombre|1|1|
|8|Mujer|1|1|
|9|Hombre|0|0|
|10|Mujer|0|0|




### 1.- Independence

* The random variables $(Y,A)$ satisfy independence if the sensitive variable $A$ is statistically independent of the prediction $Y$ ($Y \perp A$), that is, for every value $y$ and every pair of groups $a, b$:


$$P(Y=y \mid  A=a) = P(Y=y \mid  A=b)$$

#### Does the model satisfy independence?


* In any real case the probabilities will never be exactly equal, so we need a threshold $\epsilon$ that decides whether the independence criterion is considered satisfied.


* The model therefore satisfies independence if:


$$\left | \  P(Y=y \mid  A=a) - P(Y=y \mid  A=b) \ \right | \leq  \epsilon$$
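A minimal NumPy sketch of this check, assuming the sensitive attribute and the predictions are given as arrays. With more than two groups it takes the largest gap between any two of them (a choice made for this sketch; with two groups it is exactly the expression above). The helper names and the threshold `epsilon = 0.1` are hypothetical; the data are the toy table above.

```python
import numpy as np


def independence_gap(y_pred, a, positive=1):
    """Return the largest |P(Y=positive | A=g) - P(Y=positive | A=h)| and the per-group rates."""
    y_pred = np.asarray(y_pred)
    a = np.asarray(a)
    # Rate of the chosen prediction value inside each group of the sensitive attribute
    rates = {str(g): float(np.mean(y_pred[a == g] == positive)) for g in np.unique(a)}
    return max(rates.values()) - min(rates.values()), rates


def satisfies_independence(y_pred, a, epsilon, positive=1):
    """Relaxed independence: the gap between groups must not exceed epsilon."""
    gap, _ = independence_gap(y_pred, a, positive)
    return gap <= epsilon


# Toy dataset (ids 1..10 of the table above)
gender = ["Hombre", "Hombre", "Mujer", "Hombre", "Mujer",
          "Hombre", "Hombre", "Mujer", "Hombre", "Mujer"]
y_pred = [1, 1, 0, 1, 0, 0, 1, 1, 0, 0]

gap, rates = independence_gap(y_pred, gender)
print(rates, round(gap, 3))
print(satisfies_independence(y_pred, gender, epsilon=0.1))  # hypothetical epsilon -> False
```

The rates are 4/6 for `Hombre` and 1/4 for `Mujer`, so the gap is 5/12 and the check fails for any $\epsilon$ below that value.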


#### Example:


* *The probability of being assigned by the algorithm to each group* $\{0: negative, 1: positive\}$ *must be the same for two elements (individuals) with different sensitive characteristics* $\{'Hombre', 'Mujer'\}$.


$$P(Y=1 \mid  A=Hombre) = P(Y=1 \mid  A=Mujer)$$


* $P(Y=1 \mid A=Hombre) = \frac{4}{6} = \frac{4 \ \text{with prediction} = 1}{6 \ \text{men}} \approx 0.67$:

|id|gender|Y: prediction|
|--|--|--|
|1|Hombre|1|
|2|Hombre|1|
|4|Hombre|1|
|6|Hombre|0|
|7|Hombre|1|
|9|Hombre|0|

* $P(Y=1 \mid A=Mujer) = \frac{1}{4} = \frac{1 \ \text{with prediction} = 1}{4 \ \text{women}} = 0.25$:

|id|gender|Y: prediction|
|--|--|--|
|3|Mujer|0|
|5|Mujer|0|
|8|Mujer|1|
|10|Mujer|0|


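* Therefore:

$$ \frac{4}{6} \neq \frac{1}{4} \rightarrow 0.67 \neq 0.25$$

* The gap is $\left| \frac{4}{6} - \frac{1}{4} \right| = \frac{5}{12} \approx 0.42$, so the model only satisfies independence for a threshold $\epsilon \geq 0.42$.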



* Similarly, we can check independence with respect to the target (y_true) to find out whether the ground truth itself is biased:


$$P(T=t \mid  A=a) = P(T=t \mid  A=b)$$


$$P(T=1 \mid  A=Hombre) = P(T=1 \mid  A=Mujer)$$ 


$$\frac{4}{6} \neq \frac{2}{4} \rightarrow 0.67 \neq 0.5$$
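Both comparisons can be computed at once with a pandas `groupby`, because the mean of a 0/1 column within a group is the conditional probability of a 1 in that group. A sketch on the toy data (the names `df_toy`, `rates` and `gaps` are hypothetical). On this data the base-rate gap in $T$ is about 0.17 while the gap in $Y$ is about 0.42, so the predictions widen a difference that already exists in the target.

```python
import pandas as pd

# Toy dataset from the tables above
df_toy = pd.DataFrame({
    "gender": ["Hombre", "Hombre", "Mujer", "Hombre", "Mujer",
               "Hombre", "Hombre", "Mujer", "Hombre", "Mujer"],
    "T": [1, 1, 0, 0, 1, 1, 1, 1, 0, 0],   # target (y_true)
    "Y": [1, 1, 0, 1, 0, 0, 1, 1, 0, 0],   # prediction (y_predict)
})

# The mean of a 0/1 column inside a group is P(column = 1 | A = group)
rates = df_toy.groupby("gender")[["T", "Y"]].mean()

# Gap between groups for the target (base rates) and for the predictions
gaps = rates.max() - rates.min()
print(rates)
print(gaps)
```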





### 2.- Separation


* The random variables $(Y,A,T)$ satisfy separation if the sensitive variable $A$ is statistically independent of the prediction $Y$ given the target value $T$ ($Y \perp A \mid T$).


* "*The true positive rate and the false positive rate must be the same for each group*":

$$P(Y=1 \mid T=1, A=a) = P(Y=1 \mid T=1, A=b)$$
$$P(Y=1 \mid T=0, A=a) = P(Y=1 \mid T=0, A=b)$$


* A slight simplification of this criterion considers only the true positive rate (the first equation), assuming that "similar elements" must be treated equally. This relaxation is known as *equal opportunity*.
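Both rates in the definition can be read from the mean of $Y$ per (group, target) pair. A pandas sketch on the toy data (`df_toy`, `rates` and `gaps` are hypothetical names). On this data the true positive rate gap is 0.25, as in the example below, but the false positive rate $P(Y=1 \mid T=0)$ is 0.5 for men and 0.0 for women, so checking only the true positive rate can hide part of the disparity.

```python
import pandas as pd

# Toy dataset from the tables above
df_toy = pd.DataFrame({
    "gender": ["Hombre", "Hombre", "Mujer", "Hombre", "Mujer",
               "Hombre", "Hombre", "Mujer", "Hombre", "Mujer"],
    "T": [1, 1, 0, 0, 1, 1, 1, 1, 0, 0],   # target (y_true)
    "Y": [1, 1, 0, 1, 0, 0, 1, 1, 0, 0],   # prediction (y_predict)
})

# P(Y=1 | T=t, A=g) for every (group, target) pair:
# the T=1 column is the true positive rate, the T=0 column the false positive rate
rates = (df_toy.groupby(["gender", "T"])["Y"].mean()
               .unstack("T")
               .rename(columns={0: "FPR", 1: "TPR"}))

# Separation asks for both gaps to be (close to) zero
gaps = rates.max() - rates.min()
print(rates)
print(gaps)
```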

#### Does the model satisfy separation?


* A relaxed version of separation requires that the difference between the rates does not exceed a given threshold $\epsilon$:


$$\left | \  P(Y=1 \mid T=1, A=a) - P(Y=1 \mid T=1, A=b) \ \right | \leq  \epsilon$$



#### Example:


$$P(Y=1 \mid T=1, A=Hombre) = P(Y=1 \mid T=1, A=Mujer)$$


* $P(Y=1 \mid T=1, A=Hombre) = \frac{3}{4} = \frac{3 \ \text{with prediction} = 1}{4 \ \text{men with target} = 1} = 0.75$:


|id|gender|T: target|Y: prediction|
|--|--|--|--|
|1|Hombre|1|1|
|2|Hombre|1|1|
|6|Hombre|1|0|
|7|Hombre|1|1|


* $P(Y=1 \mid T=1, A=Mujer) = \frac{1}{2} = \frac{1 \ \text{with prediction} = 1}{2 \ \text{women with target} = 1} = 0.5$:


|id|gender|T: target|Y: prediction|
|--|--|--|--|
|5|Mujer|1|0|
|8|Mujer|1|1|


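* Therefore:

$$ \frac{3}{4} \neq \frac{1}{2} \rightarrow 0.75 \neq 0.5$$

* The true positive rate gap is $0.75 - 0.5 = 0.25$, so the relaxed separation criterion only holds for $\epsilon \geq 0.25$.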


### 3.- Sufficiency


* The random variables $(Y,A,T)$ satisfy sufficiency if the sensitive variable $A$ is statistically independent of the target value $T$ given the prediction $Y$ ($T \perp A \mid Y$).


$$P(T=1 \mid Y=1, A=a) = P(T=1 \mid Y=1, A=b)$$


* This means that, for two individuals with different sensitive characteristics whom the model assigns to the same group, the probability of actually belonging to that group is the same.


#### Does the model satisfy sufficiency?


* A relaxed version of sufficiency requires that the difference between the rates does not exceed a given threshold $\epsilon$:


$$\left | \  P(T=1 \mid Y=1, A=a) - P(T=1 \mid Y=1, A=b) \ \right | \leq  \epsilon$$
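A pandas sketch of the relaxed check (the names and `epsilon = 0.1` are hypothetical). It also reports how many predicted positives each estimate rests on: in the toy data the rate for `Mujer` comes from a single row, which shows how noisy these gaps become for small groups.

```python
import pandas as pd

# Toy dataset from the tables above
df_toy = pd.DataFrame({
    "gender": ["Hombre", "Hombre", "Mujer", "Hombre", "Mujer",
               "Hombre", "Hombre", "Mujer", "Hombre", "Mujer"],
    "T": [1, 1, 0, 0, 1, 1, 1, 1, 0, 0],   # target (y_true)
    "Y": [1, 1, 0, 1, 0, 0, 1, 1, 0, 0],   # prediction (y_predict)
})

# Keep only predicted positives (Y=1); the mean of T per group is P(T=1 | Y=1, A)
stats = df_toy[df_toy["Y"] == 1].groupby("gender")["T"].agg(["mean", "count"])

gap = stats["mean"].max() - stats["mean"].min()
epsilon = 0.1   # hypothetical tolerance
print(stats)
print(gap, gap <= epsilon)
```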


#### Example:


$$P(T=1 \mid Y=1, A=Hombre) = P(T=1 \mid Y=1, A=Mujer)$$



* $P(T=1 \mid Y=1, A=Hombre) = \frac{3}{4} = \frac{3 \ \text{with target} = 1}{4 \ \text{men with prediction} = 1} = 0.75$:

|id|gender|T: target|Y: prediction|
|--|--|--|--|
|1|Hombre|1|1|
|2|Hombre|1|1|
|4|Hombre|0|1|
|7|Hombre|1|1|


* $P(T=1 \mid Y=1, A=Mujer) = \frac{1}{1} = \frac{1 \ \text{with target} = 1}{1 \ \text{woman with prediction} = 1} = 1.0$:


|id|gender|T: target|Y: prediction|
|--|--|--|--|
|8|Mujer|1|1|


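* Therefore:

$$ \frac{3}{4} \neq \frac{1}{1} \rightarrow 0.75 \neq 1.0$$

* The gap is $|0.75 - 1.0| = 0.25$, so the relaxed sufficiency criterion only holds for $\epsilon \geq 0.25$.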
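The three criteria compare different ratios of the same per-group confusion matrix: independence uses $(TP+FP)/n$, separation uses $TP/(TP+FN)$ and $FP/(FP+TN)$, and sufficiency uses $TP/(TP+FP)$. A plain-Python sketch on the toy data (the helpers `confusion_by_group` and `criteria` are hypothetical) reproduces the values of the three examples; it raises `ZeroDivisionError` if a group has an empty cell in a denominator.

```python
from collections import Counter

# Toy dataset from the tables above
gender = ["Hombre", "Hombre", "Mujer", "Hombre", "Mujer",
          "Hombre", "Hombre", "Mujer", "Hombre", "Mujer"]
y_true = [1, 1, 0, 0, 1, 1, 1, 1, 0, 0]   # T
y_pred = [1, 1, 0, 1, 0, 0, 1, 1, 0, 0]   # Y


def confusion_by_group(groups, t, y):
    """Count TP, FP, FN and TN separately for each value of the sensitive attribute."""
    counts = {}
    for g, ti, yi in zip(groups, t, y):
        cell = ("TP" if yi == 1 else "FN") if ti == 1 else ("FP" if yi == 1 else "TN")
        counts.setdefault(g, Counter())[cell] += 1
    return counts


def criteria(c):
    """Quantity compared by each criterion, computed from one group's confusion counts."""
    n = c["TP"] + c["FP"] + c["FN"] + c["TN"]
    return {
        "P(Y=1|A)": (c["TP"] + c["FP"]) / n,    # independence
        "TPR": c["TP"] / (c["TP"] + c["FN"]),   # separation, T=1
        "FPR": c["FP"] / (c["FP"] + c["TN"]),   # separation, T=0
        "PPV": c["TP"] / (c["TP"] + c["FP"]),   # sufficiency
    }


for group, c in confusion_by_group(gender, y_true, y_pred).items():
    print(group, dict(c), criteria(c))
```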



<hr>

# IMPLEMENTATION



```python
import pandas as pd

# Define the example dataset
df_dataset = pd.DataFrame(
    {
        'Genero': ['Hombre', 'Hombre', 'Mujer', 'Hombre', 'Mujer', 'Hombre', 'Hombre', 'Mujer', 'Hombre', 'Mujer'],
        'y_true':    ['SI', 'SI', 'NO', 'NO', 'SI', 'SI', 'SI', 'SI', 'NO', 'NO'],
        'y_predict': ['SI', 'SI', 'NO', 'SI', 'NO', 'NO', 'SI', 'SI', 'NO', 'NO']},
    columns=['Genero', 'y_true', 'y_predict'])

df_dataset
```

| |Genero|y_true|y_predict|
|--|--|--|--|
|0|Hombre|SI|SI|
|1|Hombre|SI|SI|
|2|Mujer|NO|NO|
|3|Hombre|NO|SI|
|4|Mujer|SI|NO|
|5|Hombre|SI|NO|
|6|Hombre|SI|SI|
|7|Mujer|SI|SI|
|8|Hombre|NO|NO|
|9|Mujer|NO|NO|


```python
from typing import Dict, List, Optional

import pandas as pd

OTHER = 'OTHER'


class Fairness:

    def __init__(self, fairness_params: Dict):
        self.fairness_params: Dict = fairness_params

    def fit_fairness(self, df_dataset: pd.DataFrame, sensitive_cols: List[str],
                     target_col: str, predict_col: str) -> List[Dict]:

        metrics = list()

        # Distinct target values: each one is treated in turn as the positive class
        ground_truth_values = df_dataset[target_col].unique()
        for ground_truth in ground_truth_values:

            # For each sensitive feature
            for column in sensitive_cols:

                # Distinct values of the sensitive feature, to check whether it is binary
                sensitive_values = df_dataset[column].unique()
                is_sensitive_col_binary = len(sensitive_values) == 2

                # For each sensitive value
                for sensitive_value in sensitive_values:

                    # Binarize target and prediction: ground_truth vs. OTHER
                    # (Series.where keeps values where the condition holds, else OTHER)
                    df_process = df_dataset.copy()
                    for col in (target_col, predict_col):
                        df_process[col] = df_process[col].where(df_process[col] == ground_truth, OTHER)

                    if is_sensitive_col_binary:
                        # Binary sensitive feature: one comparison, first value vs. second value
                        values = list(sensitive_values)
                        label = ", ".join(map(str, values))
                        groupby_cols, weight_value = [predict_col], None
                    else:
                        # Non-binary sensitive feature: one-vs-rest, sensitive_value vs. OTHER
                        df_process[column] = df_process[column].where(
                            df_process[column] == sensitive_value, OTHER)
                        values = [sensitive_value, OTHER]
                        label = sensitive_value
                        groupby_cols, weight_value = [column, predict_col], sensitive_value

                    print("\n{} - {} - {}".format(column, label, ground_truth))
                    independence, separation, sufficience = self.fit_metrics(
                        df=df_process, sensitive_col=column, target_col=target_col,
                        predict_col=predict_col, ground_truth=ground_truth,
                        sensitive_values=values)
                    score_weight = self.score_weight(
                        df=df_process, groupby_cols=groupby_cols, sensitive_col=column,
                        predict_col=predict_col, ground_truth=ground_truth,
                        sensitive_value=weight_value)
                    metrics.append({'Sensitive_Feature': column,
                                    'is_Binary_Sensitive_feature': is_sensitive_col_binary,
                                    'Sensitive_Value': label,
                                    'Ground_Truth': ground_truth,
                                    'Independence_score': independence,
                                    'Separation_score': separation,
                                    'Sufficience_score': sufficience,
                                    'Score_weight': score_weight})

                    # A binary sensitive feature needs a single comparison
                    if is_sensitive_col_binary:
                        break

        return metrics

    def fit_metrics(self, df: pd.DataFrame, sensitive_col: str, target_col: str, predict_col: str,
                    ground_truth: str, sensitive_values: List[str]):
        independence = self.fit_independence(df, sensitive_col, predict_col,
                                             ground_truth, sensitive_values)
        print("Independence: {}\n".format(independence))
        separation = self.fit_separation(df, sensitive_col, target_col, predict_col,
                                         ground_truth, sensitive_values)
        print("Separation: {}\n".format(separation))
        sufficience = self.fit_sufficiency(df, sensitive_col, target_col, predict_col,
                                           ground_truth, sensitive_values)
        print("Sufficience: {}\n\n".format(sufficience))
        return independence, separation, sufficience

    @staticmethod
    def _gap(df: pd.DataFrame, sensitive_col: str, sensitive_values: List[str],
             event: pd.Series, condition: pd.Series) -> float:
        """
        |P(event | condition, A=a) - P(event | condition, A=b)| from boolean masks.
        A probability is NaN when no row of that group meets the condition.
        """
        probs = []
        for value in sensitive_values[:2]:
            group = condition & (df[sensitive_col] == value)
            n = group.sum()
            probs.append((event & group).sum() / n if n > 0 else float('nan'))
        print('\tProb A = {}'.format(probs[0]))
        print('\tProb B = {}'.format(probs[1]))
        return abs(probs[0] - probs[1])

    def fit_independence(self, df: pd.DataFrame, sensitive_col: str, predict_col: str,
                         ground_truth: str, sensitive_values: List[str]) -> float:
        """
        A -> sensitive feature, Y -> prediction
        P(Y=y | A=a) == P(Y=y | A=b)
        """
        everyone = pd.Series(True, index=df.index)
        return self._gap(df, sensitive_col, sensitive_values,
                         event=df[predict_col] == ground_truth, condition=everyone)

    def fit_separation(self, df: pd.DataFrame, sensitive_col: str, target_col: str, predict_col: str,
                       ground_truth: str, sensitive_values: List[str]) -> float:
        """
        A -> sensitive feature, Y -> prediction, T -> target
        P(Y=1 | T=1, A=a) == P(Y=1 | T=1, A=b)   (true positive rate)
        """
        return self._gap(df, sensitive_col, sensitive_values,
                         event=df[predict_col] == ground_truth,
                         condition=df[target_col] == ground_truth)

    def fit_sufficiency(self, df: pd.DataFrame, sensitive_col: str, target_col: str, predict_col: str,
                        ground_truth: str, sensitive_values: List[str]) -> float:
        """
        A -> sensitive feature, Y -> prediction, T -> target
        P(T=1 | Y=1, A=a) == P(T=1 | Y=1, A=b)   (positive predictive value)
        """
        return self._gap(df, sensitive_col, sensitive_values,
                         event=df[target_col] == ground_truth,
                         condition=df[predict_col] == ground_truth)

    def score_weight(self, df: pd.DataFrame, sensitive_col: str, predict_col: str,
                     ground_truth: str, groupby_cols: List[str],
                     sensitive_value: Optional[object] = None) -> float:
        """
        Share (weight) of the dataset's rows behind a criterion: rows predicted as
        ground_truth and, in the one-vs-rest case, holding sensitive_value
        """
        dfp = df.groupby(groupby_cols, sort=False).size().reset_index(name='count')
        dfp['pct'] = dfp['count'] / dfp['count'].sum()
        mask = dfp[predict_col] == ground_truth
        if sensitive_value is not None:
            mask &= dfp[sensitive_col] == sensitive_value
        return dfp.loc[mask, 'pct'].iloc[0]


fairness = Fairness(fairness_params={})
dict_result = fairness.fit_fairness(df_dataset=df_dataset, sensitive_cols=['Genero'],
                                    target_col='y_true', predict_col='y_predict')
df_result = pd.DataFrame.from_dict(dict_result)
df_result
```


```
Genero - Hombre, Mujer - SI
    Prob A = 0.6666666666666666
    Prob B = 0.25
Independence: 0.41666666666666663

    Prob A = 0.75
    Prob B = 0.5
Separation: 0.25

    Prob A = 0.75
    Prob B = 1.0
Sufficience: 0.25


Genero - Hombre, Mujer - NO
    Prob A = 0.3333333333333333
    Prob B = 0.75
Independence: 0.4166666666666667

    Prob A = 0.5
    Prob B = 1.0
Separation: 0.5

    Prob A = 0.5
    Prob B = 0.6666666666666666
Sufficience: 0.16666666666666663
```






| |Ground_Truth|Independence_score|Score_weight|Sensitive_Feature|Sensitive_Value|Separation_score|Sufficience_score|is_Binary_Sensitive_feature|
|--|--|--|--|--|--|--|--|--|
|0|SI|0.416667|0.5|Genero|Hombre, Mujer|0.25|0.25|True|
|1|NO|0.416667|0.5|Genero|Hombre, Mujer|0.5|0.166667|True|


<hr>

# Real-world application example: binary classification



## 1.- Reading the dataset

```python
import pandas as pd

# Read the dataset
df = pd.read_csv('../../datasets/bodyPerformance.csv')
print('Dataset size {}'.format(df.shape))
df.sample(3)
```

```
Dataset size (13393, 12)
```

| |age|gender|height_cm|weight_kg|body fat_%|diastolic|systolic|gripForce|sit and bend forward_cm|sit-ups counts|broad jump_cm|class|
|--|--|--|--|--|--|--|--|--|--|--|--|--|
|394|46.0|M|163.5|71.1|28.6|85.0|144.0|37.9|14.9|35.0|204.0|D|
|11441|26.0|F|164.2|58.86|31.7|67.0|128.0|23.8|4.8|26.0|163.0|D|
|8092|45.0|F|154.7|41.8|15.9|72.0|119.0|18.9|22.1|28.0|146.0|D|


```python
# Age range (decade) of each individual: <20 -> 10, [20, 30) -> 20, ..., >=60 -> 60
df['rango_edad'] = df['age'].apply(lambda x: 10 if x < 20 
                                   else (20 if (x >= 20 and x < 30) 
                                         else (30 if (x >= 30 and x < 40) 
                                               else (40 if (x >= 40 and x < 50) 
                                                     else (50 if (x >= 50 and x < 60) 
                                                           else 60)))))
df.sample(5)
```

| |age|gender|height_cm|weight_kg|body fat_%|diastolic|systolic|gripForce|sit and bend forward_cm|sit-ups counts|broad jump_cm|class|rango_edad|
|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
|8015|39.0|F|159.9|49.8|32.9|62.0|116.0|17.1|9.0|38.0|137.0|D|30|
|7900|39.0|F|165.0|68.6|38.6|65.0|113.0|33.1|16.9|14.0|94.0|D|30|
|2365|41.0|F|163.0|64.2|34.8|70.0|116.0|24.6|4.0|17.0|131.0|D|40|
|11996|28.0|M|177.4|87.0|21.4|84.0|151.0|45.7|21.2|48.0|225.0|C|20|
|10461|26.0|M|170.6|59.4|17.7|81.0|141.0|35.4|16.5|58.0|235.0|B|20|
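The nested conditional expression above can also be written with `pd.cut`, which states the bin edges explicitly. A sketch on a few hypothetical ages, assuming `age` has no missing values: the lambda maps a missing age to 60, whereas `pd.cut` yields NaN and the final `astype(int)` would then fail. In the notebook the call would be `pd.cut(df['age'], ...)`.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, chosen to hit the bucket edges
ages = pd.Series([15.0, 20.0, 29.9, 46.0, 60.0, 75.0])

# right=False gives half-open intervals [a, b): 20 -> 20, 29.9 -> 20, 60 -> 60
rango_edad = pd.cut(
    ages,
    bins=[-np.inf, 20, 30, 40, 50, 60, np.inf],
    labels=[10, 20, 30, 40, 50, 60],
    right=False,
).astype(int)

print(rango_edad.tolist())  # [10, 20, 20, 40, 60, 60]
```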


### Target distribution by gender

```python
# Number of rows per (gender, class) and percentage of each class within each gender
dfp = df.groupby(['gender', 'class'])['age'].count().reset_index(name='count')
dfp['perc'] = dfp['count'] * 100 / dfp.groupby('gender')['count'].transform('sum')
dfp
```

| |gender|class|count|perc|
|--|--|--|--|--|
|0|F|A|1484|30.125863|
|1|F|B|1185|24.056029|
|2|F|C|1112|22.574097|
|3|F|D|1145|23.244011|
|4|M|A|1864|22.014881|
|5|M|B|2162|25.534428|
|6|M|C|2237|26.42022|
|7|M|D|2204|26.030471|
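The same per-gender percentages can be obtained in wide format with `pd.crosstab(..., normalize='index')`. A sketch on a hypothetical mini-sample (`df_demo`); with the notebook's data it would be `pd.crosstab(df['gender'], df['class'], normalize='index') * 100`. Note that `crosstab` counts rows, whereas the cell above counts non-missing `age` values, so both agree only when `age` has no missing values.

```python
import pandas as pd

# Hypothetical mini-sample with the same column names as bodyPerformance.csv
df_demo = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M", "M", "M"],
    "class":  ["A", "B", "A", "B", "C", "D", "D", "A"],
})

# normalize="index" divides each row by its total: share of each class within each gender
perc = pd.crosstab(df_demo["gender"], df_demo["class"], normalize="index") * 100
print(perc.round(2))
```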


### For binary classification, keep only classes B and D

```python
# Keep only classes B and D and rename the target column to y_true
df = df[df['class'].isin(['B', 'D'])].rename({'class': 'y_true'}, axis=1)
# df = df[df['class'].isin(['A', 'B', 'C' , 'D'])].rename({'class': 'y_true'}, axis=1)
print('Dataset size {}'.format(df.shape))
df.sample(5)
```

```
Dataset size (6696, 13)
```

| |age|gender|height_cm|weight_kg|body fat_%|diastolic|systolic|gripForce|sit and bend forward_cm|sit-ups counts|broad jump_cm|y_true|rango_edad|
|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
|11882|21.0|F|160.2|56.3|39.6|69.0|115.0|19.2|5.4|20.0|142.0|D|20|
|3827|46.0|F|162.0|63.2|25.9|62.0|110.0|23.3|0.7|12.0|138.0|D|40|
|12187|33.0|F|153.0|53.2|35.5|53.0|99.0|18.9|6.4|11.0|124.0|D|30|
|5861|46.0|M|172.6|89.0|23.3|79.0|130.0|41.3|-7.4|49.0|218.0|D|40|
|6535|28.0|M|165.5|56.1|14.5|69.0|125.0|35.2|11.4|51.0|202.0|B|20|


<hr>

## 2.- Model & prediction

```python
from sklearn.preprocessing import LabelEncoder

# Encode the categorical variables as integers (gender: F -> 0, M -> 1; y_true: B -> 0, D -> 1)
lb_gen = LabelEncoder()
lb_y = LabelEncoder()
df['gender'] = lb_gen.fit_transform(df['gender'])
df['y_true'] = lb_y.fit_transform(df['y_true'])
df.sample(5)
```

| |age|gender|height_cm|weight_kg|body fat_%|diastolic|systolic|gripForce|sit and bend forward_cm|sit-ups counts|broad jump_cm|y_true|rango_edad|
|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
|1670|44.0|1|170.1|64.3|19.0|82.0|143.0|45.3|14.4|29.0|221.0|1|40|
|418|50.0|0|154.6|53.2|25.1|69.0|114.0|27.9|20.2|23.0|158.0|0|50|
|938|35.0|1|180.6|78.0|23.9|90.0|133.0|39.1|-10.7|14.0|179.0|1|30|
|6774|25.0|0|166.5|60.7|35.2|69.0|107.0|28.3|19.3|44.0|134.0|0|20|
|5487|46.0|0|162.3|68.2|36.3|94.0|137.0|24.1|4.6|36.0|168.0|1|40|


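```python
from sklearn.linear_model import LogisticRegression

# Create and train the model
# Features: every column except the last two (y_true, rango_edad); target: y_true
model = LogisticRegression()
model.fit(df[df.columns[:-2]], df[df.columns[-2]])
# Note: this accuracy is measured on the same rows used for training
print('Accuracy: {}'.format(model.score(df[df.columns[:-2]], df[df.columns[-2]])))
```

```
Accuracy: 0.892921146953405
```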
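The accuracy above is computed on the same rows used to fit the model, so it is a training accuracy and probably optimistic; the fairness criteria below are also computed on these in-sample predictions. A sketch of a held-out evaluation with scikit-learn's `train_test_split`, run on synthetic data from `make_classification` so that it is self-contained (in the notebook, `X` and `y` would be `df[df.columns[:-2]]` and `df['y_true']`). The split size and `max_iter=1000`, which gives the solver room to converge, are assumptions of this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the notebook's features and y_true (hypothetical data)
X, y = make_classification(n_samples=1000, n_features=11, random_state=0)

# Hold out 30% of the rows, keeping the class proportions (stratify=y)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"train accuracy: {train_acc:.3f} / test accuracy: {test_acc:.3f}")
```

The fairness metrics could then be computed on the test rows only, with `model.predict(X_test)` as `y_predict`.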


```python
# Compute the predictions (same feature columns as in training: all but y_true and rango_edad)
df['y_predict'] = model.predict(df[df.columns[:-2]])
df.sample(5)
```

| |age|gender|height_cm|weight_kg|body fat_%|diastolic|systolic|gripForce|sit and bend forward_cm|sit-ups counts|broad jump_cm|y_true|rango_edad|y_predict|
|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
|10682|55.0|1|164.0|68.2|22.7|81.0|121.0|43.1|16.0|33.0|183.0|0|50|0|
|11291|56.0|0|162.7|57.8|24.4|73.0|128.0|28.8|27.0|19.0|175.0|0|50|0|
|4305|56.0|1|174.4|72.5|20.7|98.0|151.0|34.1|18.5|30.0|176.0|0|50|0|
|12098|21.0|1|180.6|86.5|22.9|79.0|123.0|37.2|9.2|41.0|227.0|1|20|1|
|2677|28.0|1|173.4|72.8|14.1|82.0|121.0|41.4|18.4|59.0|242.0|0|20|0|


```python
# Undo the LabelEncoder on gender, target and prediction
df['gender'] = lb_gen.inverse_transform(df['gender'])
df['y_true'] = lb_y.inverse_transform(df['y_true'])
df['y_predict'] = lb_y.inverse_transform(df['y_predict'])
df.sample(5)
```

| |age|gender|height_cm|weight_kg|body fat_%|diastolic|systolic|gripForce|sit and bend forward_cm|sit-ups counts|broad jump_cm|y_true|rango_edad|y_predict|
|--|--|--|--|--|--|--|--|--|--|--|--|--|--|--|
|2061|40.0|F|158.5|56.9|28.6|84.0|135.0|30.1|16.1|17.0|161.0|D|40|D|
|6073|35.0|F|162.0|53.0|24.7|68.0|107.0|29.1|21.4|40.0|155.0|B|30|B|
|991|42.0|M|168.7|68.5|27.7|91.0|156.0|36.5|10.2|44.0|203.0|D|40|D|
|2309|55.0|F|155.1|66.5|39.5|77.0|135.0|24.7|18.2|13.0|102.0|D|50|D|
|5659|27.0|M|176.4|85.5|32.9|88.0|148.0|41.2|7.9|36.0|200.0|D|20|D|


## 3.- Fairness criteria: gender (binary) and age range (non-binary)

```python
fairness = Fairness(fairness_params={})
# dict_result = fairness.fit_fairness(df_dataset=df, sensitive_cols=['gender'], target_col='y_true', predict_col='y_predict')
# dict_result = fairness.fit_fairness(df_dataset=df, sensitive_cols=['rango_edad'], target_col='y_true', predict_col='y_predict')
dict_result = fairness.fit_fairness(df_dataset=df, sensitive_cols=['gender', 'rango_edad'], target_col='y_true', predict_col='y_predict')
df_result = pd.DataFrame.from_dict(dict_result)
df_result
```


```
gender - M, F - B
    Prob A = 0.516262024736601
    Prob B = 0.534763948497854
Independence: 0.018501923761253036

    Prob A = 0.9167437557816837
    Prob B = 0.9139240506329114
Separation: 0.002819705148772278

    Prob A = 0.8793256433007985
    Prob B = 0.8691813804173355
Sufficience: 0.01014426288346304


rango_edad - 30 - B
    Prob A = 0.5584615384615385
    Prob B = 0.5140845070422535
Independence: 0.04437703141928495

    Prob A = 0.9103840682788051
    Prob B = 0.9171709531013615
Separation: 0.006786884822556405

    Prob A = 0.8815426997245179
    Prob B = 0.8741888968997837
Sufficience: 0.00735380282473419


rango_edad - 20 - B
    Prob A = 0.5376146788990825
    Prob B = 0.5124653739612188
Independence: 0.025149304937863715

    Prob A = 0.9319148936170213
    Prob B = 0.903975219411461
Separation: 0.027939674205560316

    Prob A = 0.8969283276450511
    Prob B = 0.8604422604422605
Sufficience: 0.036486067202790684


rango_edad - 40 - B
    Prob A = 0.4820717131474104
    Prob B = 0.5298664792691496
Independence: 0.047794766121739274

    Prob A = 0.9468822170900693
    Prob B = 0.9111187371310913
Separation: 0.035763479958977995

    Prob A = 0.8471074380165289
    Prob B = 0.8803050397877984
Sufficience: 0.033197601771269514


rango_edad - 50 - B
    Prob A = 0.5005128205128205
    Prob B = 0.5264813843733613
Independence: 0.025968563860540805

    Prob A = 0.9376443418013857
    Prob B = 0.9124914207275223
Separation: 0.025152921073863355

    Prob A = 0.8319672131147541
    Prob B = 0.8828021248339973
Sufficience: 0.05083491171924326


rango_edad - 60 - B
    Prob A = 0.4869942196531792
    Prob B = 0.5268154563624251
Independence: 0.03982123670924592

    Prob A = 0.8016304347826086
    Prob B = 0.9298422289358845
Separation: 0.12821179415327588

    Prob A =
```

| |Ground_Truth|Independence_score|Score_weight|Sensitive_Feature|Sensitive_Value|Separation_score|Sufficience_score|is_Binary_Sensitive_feature|
|--|--|--|--|--|--|--|--|--|
|0|B|0.018502|0.5227|gender|M, F|0.00282|0.010144|True|
|1|B|0.044377|0.108423|rango_edad|30|0.006787|0.007354|False|
|2|B|0.025149|0.218787|rango_edad|20|0.02794|0.036486|False|
|3|B|0.047795|0.072282|rango_edad|40|0.035763|0.033198|False|
|4|B|0.025969|0.072879|rango_edad|50|0.025153|0.050835|False|
|5|B|0.039821|0.050329|rango_edad|60|0.128212|0.00038|False|
|6|D|0.018502|0.4773|gender|M, F|0.018946|0.008869|True|
|7|D|0.044377|0.085723|rango_edad|30|0.017237|0.026232|False|
|8|D|0.025149|0.188172|rango_edad|20|0.024797|0.019884|False|
|9|D|0.047795|0.077658|rango_edad|40|0.000352|0.052555|False|
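The scores only become a verdict once a threshold $\epsilon$ is chosen. A sketch of that last step, assuming a hypothetical $\epsilon = 0.1$ (the notebook does not fix one) and using the `Ground_Truth = B` rows copied from the table above; in the notebook the same filter would apply directly to `df_result`. With this threshold only the 60+ age range is flagged, through its separation score (0.128): the true positive rate for class B is about 0.80 for that group versus about 0.93 for the rest.

```python
import pandas as pd

# Scores for Ground_Truth = B, copied from the results table above
scores = pd.DataFrame({
    "Sensitive_Feature": ["gender"] + ["rango_edad"] * 5,
    "Sensitive_Value": ["M, F", "30", "20", "40", "50", "60"],
    "Independence_score": [0.018502, 0.044377, 0.025149, 0.047795, 0.025969, 0.039821],
    "Separation_score": [0.002820, 0.006787, 0.027940, 0.035763, 0.025153, 0.128212],
    "Sufficience_score": [0.010144, 0.007354, 0.036486, 0.033198, 0.050835, 0.000380],
})

epsilon = 0.1  # hypothetical tolerance
score_cols = ["Independence_score", "Separation_score", "Sufficience_score"]

# Rows where at least one criterion exceeds the tolerance
flagged = scores[(scores[score_cols] > epsilon).any(axis=1)]
print(flagged)
```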
