# Working on Group-CFEs

### Using datasets from *Retiring Adult: New Datasets for Fair Machine Learning* (https://papers.nips.cc/paper/2021/file/32e54441e6382a7fbacbbbaf3c450059-Paper.pdf)


## Data Prep

In [1]:
import numpy as np 
import pandas as pd
import random
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn_extra.cluster import KMedoids
from sklearn.neighbors import NearestNeighbors
from sklearn.ensemble import GradientBoostingClassifier
from scipy.spatial import distance
from collections import Counter

In [2]:
X_train = np.load('X_train_CA.npy')
y_train = np.load('y_train_CA.npy')
X_test = np.load('X_test_CA.npy')
y_test = np.load('y_test_CA.npy')

In [3]:
model = make_pipeline(StandardScaler(), GradientBoostingClassifier())
model.fit(X_train, y_train)

yhat = model.predict(X_test)

### Understanding Features:

#### Employment Type (COW)

- 1 .Employee of a private for-profit company or business, or of an individual, for wages, salary, or commissions
- 2 .Employee of a private not-for-profit, tax-exempt, or charitable organization
- 3 .Local government employee (city, county, etc.)
- 4 .State government employee
- 5 .Federal government employee
- 6 .Self-employed in own not incorporated business, professional practice, or farm
- 7 .Self-employed in own incorporated business, professional practice or farm
- 8 .Working without pay in family business or farm
- 9 .Unemployed and last worked 5 years ago or earlier or never worked


#### Education

- bb .N/A (less than 3 years old)
- 01 .No schooling completed
- 02 .Nursery school, preschool
- 03 .Kindergarten
- 04 .Grade 1
- 05 .Grade 2
- 06 .Grade 3
- 07 .Grade 4
- 08 .Grade 5
- 09 .Grade 6
- 10 .Grade 7
- 11 .Grade 8
- 12 .Grade 9
- 13 .Grade 10
- 14 .Grade 11
- 15 .12th grade - no diploma
- 16 .Regular high school diploma
- 17 .GED or alternative credential
- 18 .Some college, but less than 1 year
- 19 .1 or more years of college credit, no degree
- 20 .Associate's degree
- 21 .Bachelor's degree
- 22 .Master's degree
- 23 .Professional degree beyond a bachelor's degree
- 24 .Doctorate degree
 

#### Marital status
- 1 .Married
- 2 .Widowed
- 3 .Divorced
- 4 .Separated
- 5 .Never married or under 15 years old

#### Career 

See Page 84/85: https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMS_Data_Dictionary_2014-2018.pdf

#### POB  

See Page 96: https://www2.census.gov/programs-surveys/acs/tech_docs/pums/data_dict/PUMS_Data_Dictionary_2014-2018.pdf

#### Age
Self Explanatory

#### Weekly Hours
Self Explanatory

#### Gender

- 1 Male
- 2 Female

####  Recoded detailed race code (RAC1P)
- 1 .White alone
- 2 .Black or African American alone
- 3 .American Indian alone
- 4 .Alaska Native alone
- 5 .American Indian and Alaska Native tribes specified; or American Indian or Alaska Native, not specified and no other races
- 6 .Asian alone
- 7 .Native Hawaiian and Other Pacific Islander alone
- 8 .Some Other Race alone
- 9 .Two or More Races

## Counterfactuals

### A simple baseline: NUNs (Nearest Unlike Neighbors)

In [4]:
pd.DataFrame(X_train, columns = ['Employment Type', 'Qualification', 'Marital status', 'Career', 'POB','AGE', 'Weekly Hours', 'Gender', 'Race'])

Unnamed: 0,Employment Type,Qualification,Marital status,Career,POB,AGE,Weekly Hours,Gender,Race
0,2.0,22.0,1.0,1821.0,6.0,46.0,45.0,2.0,9.0
1,1.0,21.0,3.0,4850.0,12.0,45.0,50.0,2.0,1.0
2,1.0,21.0,5.0,1021.0,215.0,40.0,40.0,2.0,6.0
3,1.0,24.0,1.0,300.0,210.0,59.0,40.0,1.0,6.0
4,1.0,19.0,5.0,3401.0,6.0,23.0,40.0,2.0,1.0
...,...,...,...,...,...,...,...,...,...
156527,1.0,16.0,1.0,3645.0,6.0,29.0,40.0,2.0,1.0
156528,1.0,21.0,1.0,2640.0,6.0,42.0,40.0,2.0,1.0
156529,1.0,21.0,5.0,630.0,24.0,60.0,60.0,1.0,6.0
156530,3.0,22.0,1.0,230.0,6.0,47.0,60.0,1.0,1.0


#### NUN instances where people make 50k + in the training data

In [5]:
# Test-set instances predicted to make less than 50k
negative_outcome = X_test[yhat == False]
# Test-set instances predicted to make more than 50k
positive_outcome = X_test[yhat == True]

In [6]:
# Training instances who make more than 50k
positive_train_set = X_train[y_train == True]
# Training instances who don't make more than 50k
negative_train_set = X_train[y_train == False]

In [7]:
index = 0
neighbors = NearestNeighbors(n_neighbors=30, metric='hamming').fit(positive_train_set) # NB: Hamming only counts mismatched features, so a better distance function is needed
distances, indices = neighbors.kneighbors(X_test[index].reshape(1,-1))

list(X_test[index]), list(positive_train_set[indices[0][0]]) # a NUN

([1.0, 18.0, 1.0, 5840.0, 21.0, 62.0, 40.0, 1.0, 1.0],
 [1.0, 18.0, 1.0, 7340.0, 360.0, 62.0, 40.0, 1.0, 1.0])

#### Generalizing over the NUNs: medoid of the 30 NUNs (NB: a medoid is always a point in the dataset, so GCFE will need an average or a smarter method)

In [8]:
# Fit a single medoid to the 30 NUNs of the query (array of shape 30 x 9)
kmedoids = KMedoids(n_clusters=1, random_state=0).fit(positive_train_set[indices[0]])

In [9]:
(list(kmedoids.cluster_centers_[0]))

[1.0, 18.0, 1.0, 220.0, 6.0, 45.0, 40.0, 1.0, 1.0]
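
The medoid above is always an actual training instance. One possible alternative for a group-level summary is a per-feature aggregate: the mode for categorical codes and the median for numeric features. The sketch below is only an illustration. It assumes the column order used in this notebook and that `AGE` and `Weekly Hours` (column indices 5 and 6) are the only numeric features; `aggregate_nuns` is a hypothetical helper, and the toy rows stand in for `positive_train_set[indices[0]]`.

```python
from collections import Counter

import numpy as np

# Assumption: columns follow the order used in this notebook, and only
# AGE (5) and Weekly Hours (6) are treated as numeric.
NUMERIC_COLUMNS = (5, 6)


def aggregate_nuns(nuns, numeric_columns=NUMERIC_COLUMNS):
    """Summarise a set of NUNs: median for numeric columns, mode otherwise."""
    nuns = np.asarray(nuns, dtype=float)
    summary = np.empty(nuns.shape[1])
    for col in range(nuns.shape[1]):
        values = nuns[:, col]
        if col in numeric_columns:
            summary[col] = np.median(values)
        else:
            # Most frequent code; ties resolve to the first value seen.
            summary[col] = Counter(values.tolist()).most_common(1)[0][0]
    return summary


# Toy data standing in for positive_train_set[indices[0]].
toy_nuns = [
    [1, 18, 1, 220, 6, 45, 40, 1, 1],
    [1, 18, 1, 7340, 360, 62, 40, 1, 1],
    [1, 21, 1, 220, 6, 50, 45, 1, 1],
]
prototype = aggregate_nuns(toy_nuns)
```

Because the aggregate is a synthetic point rather than a real instance, it would need to be checked with `model.predict` before being treated as a counterfactual.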

#### Borderline Cases

In [10]:
np.where(model.predict_proba(negative_outcome)[:, 0] < 0.6)[0][0:200] # First 200 predicted-negative instances whose class-0 probability is below 0.6 (close to the decision boundary)

array([   2,   19,   24,   25,   27,   38,   40,   45,   57,   99,  117,
        119,  130,  155,  166,  173,  175,  210,  211,  221,  224,  227,
        230,  237,  247,  268,  275,  277,  291,  300,  301,  313,  321,
        324,  327,  337,  343,  345,  349,  370,  377,  390,  391,  401,
        403,  404,  407,  410,  421,  423,  426,  436,  445,  454,  456,
        458,  467,  469,  481,  482,  483,  489,  492,  496,  517,  523,
        524,  533,  538,  555,  587,  594,  596,  626,  631,  656,  669,
        672,  674,  676,  682,  703,  707,  718,  729,  744,  759,  761,
        774,  786,  788,  800,  804,  837,  849,  852,  863,  865,  867,
        870,  871,  880,  885,  886,  894,  909,  915,  919,  920,  931,
        936,  940,  945,  978,  985,  997,  999, 1014, 1015, 1018, 1029,
       1030, 1055, 1057, 1066, 1077, 1098, 1099, 1100, 1101, 1127, 1138,
       1168, 1181, 1210, 1220, 1222, 1224, 1229, 1235, 1249, 1272, 1273,
       1275, 1294, 1303, 1313, 1317, 1320, 1329, 13

In [11]:
query_instance = 1406
list(negative_outcome[query_instance])

[1.0, 19.0, 1.0, 5860.0, 217.0, 56.0, 40.0, 1.0, 6.0]

In [12]:
model.predict_proba(negative_outcome[query_instance].reshape(1,-1))

array([[0.59370397, 0.40629603]])
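
The filter used above keeps predicted-negative instances whose class-0 probability is below 0.6. A minimal sketch of the same idea with the probability band made explicit is shown here; `borderline_indices` and its `low` and `high` parameters are hypothetical names, and the toy array only mimics the `[P(class 0), P(class 1)]` layout returned by `predict_proba`.

```python
import numpy as np


def borderline_indices(probabilities, low=0.5, high=0.6):
    """Return row indices whose class-0 probability lies in [low, high)."""
    p_negative = np.asarray(probabilities, dtype=float)[:, 0]
    return np.where((p_negative >= low) & (p_negative < high))[0]


# Toy probabilities in the same [P(class 0), P(class 1)] layout as predict_proba.
toy_proba = [
    [0.59370397, 0.40629603],  # borderline negative, as in the cell above
    [0.92, 0.08],              # confident negative
    [0.51, 0.49],              # borderline negative
    [0.30, 0.70],              # predicted positive, outside the band
]
selected = borderline_indices(toy_proba)
```

Narrowing `high` towards 0.5 would restrict the selection to instances even closer to the decision boundary.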

In [13]:
distances, indices = neighbors.kneighbors(negative_outcome[query_instance].reshape(1,-1))
indices

array([[ 7419, 34773, 34108, 47523,  3808, 22996, 62130,  8099, 49605,
        12337,  3645,  7248, 17137, 22720,  7134, 12294, 40799, 11327,
        11779, 32610, 61606, 62883,  8281, 48677, 29626, 57882,  5871,
        44230, 62732, 50021]], dtype=int64)

In [14]:
list(negative_outcome[query_instance])

[1.0, 19.0, 1.0, 5860.0, 217.0, 56.0, 40.0, 1.0, 6.0]

In [15]:
list(positive_train_set[indices[0][0]])

[1.0, 19.0, 1.0, 5860.0, 217.0, 51.0, 40.0, 1.0, 6.0]

In [16]:
# Medoid of the 30 NUNs of the borderline query
kmedoids = KMedoids(n_clusters=1, random_state=0).fit(positive_train_set[indices[0]])
(list(kmedoids.cluster_centers_[0]))

[1.0, 19.0, 1.0, 860.0, 217.0, 37.0, 40.0, 1.0, 6.0]

#### Custom Distance Function

In [17]:
#Should create one here to be better than hamming

#def custom_distance(instance):
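
# Hamming distance treats a one-year age gap the same as a forty-year gap.
# One hedged option is a Gower-style mixed distance: 0/1 mismatch for
# categorical codes plus range-normalised differences for numeric features.
# The sketch below is an illustration, not the distance used in this notebook.
# `mixed_distance`, the column split, and the ranges are all assumptions;
# in particular, qualification is ordinal but is treated as categorical here.

```python
import numpy as np

# Assumptions: column order as in this notebook; AGE (5) and Weekly Hours (6)
# are numeric, everything else is a categorical code. The ranges below are
# illustrative placeholders and should be computed from X_train.
NUMERIC_COLUMNS = np.array([5, 6])
CATEGORICAL_COLUMNS = np.array([0, 1, 2, 3, 4, 7, 8])
NUMERIC_RANGES = np.array([80.0, 98.0])


def mixed_distance(x, y):
    """Gower-style distance: mean of per-feature dissimilarities in [0, 1]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Categorical features contribute 1 on mismatch, 0 on match.
    categorical = (x[CATEGORICAL_COLUMNS] != y[CATEGORICAL_COLUMNS]).astype(float)
    # Numeric features contribute the range-normalised absolute difference.
    numeric = np.abs(x[NUMERIC_COLUMNS] - y[NUMERIC_COLUMNS]) / NUMERIC_RANGES
    numeric = np.clip(numeric, 0.0, 1.0)
    return float((categorical.sum() + numeric.sum()) / x.size)


query = [1, 19, 1, 5860, 217, 56, 40, 1, 6]
near = [1, 19, 1, 5860, 217, 51, 40, 1, 6]   # differs only by 5 years of age
far = [1, 19, 1, 5860, 217, 16, 40, 1, 6]    # differs by 40 years of age
```

# A callable like this could be passed as `metric=mixed_distance` to
# `NearestNeighbors`, at the cost of much slower queries than built-in metrics.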
     

#### Finding NNs

NB might use a custom distance function

In [18]:
neighbors_negative = NearestNeighbors(n_neighbors=30, metric='hamming').fit(negative_train_set) # other instances that don't make 50k

In [19]:
def NUN_finder(query):
    
    distances, indices = neighbors.kneighbors(query.reshape(1,-1))
    NUN = positive_train_set[indices[0][0]]
    
    return list(NUN)

In [20]:
def explanation_generator(query): # a query predicted to be under 50k
    
    # Five nearest neighbours of the query within the negative (under 50k) class
    distances_neg, indices_neg = neighbors_negative.kneighbors(query.reshape(1,-1))
    NNs = negative_train_set[indices_neg[0][0:5]]
    
    # Nearest unlike neighbour of the query itself
    distances, indices = neighbors.kneighbors(query.reshape(1,-1))
    NUN = positive_train_set[indices[0][0]]
    
    # Nearest unlike neighbour of each same-class neighbour
    NUNs = []
    for instance in NNs:
        NUNs.append(NUN_finder(instance))
    
    # Return the query, its NUN, its NNs in the same class, their NUNs,
    # and the indices of all 30 same-class neighbours in negative_train_set
    return query, NUN, NNs, NUNs, indices_neg
    

In [21]:
def boarderline_cases_negative():
    
    return np.where(model.predict_proba(negative_outcome)[:, 0] < 0.6)[0][0:200] # First 200 instances whose class-0 probability is below 0.6 (i.e. close to the decision boundary)

In [22]:
boarderline_cases_negative()

array([   2,   19,   24,   25,   27,   38,   40,   45,   57,   99,  117,
        119,  130,  155,  166,  173,  175,  210,  211,  221,  224,  227,
        230,  237,  247,  268,  275,  277,  291,  300,  301,  313,  321,
        324,  327,  337,  343,  345,  349,  370,  377,  390,  391,  401,
        403,  404,  407,  410,  421,  423,  426,  436,  445,  454,  456,
        458,  467,  469,  481,  482,  483,  489,  492,  496,  517,  523,
        524,  533,  538,  555,  587,  594,  596,  626,  631,  656,  669,
        672,  674,  676,  682,  703,  707,  718,  729,  744,  759,  761,
        774,  786,  788,  800,  804,  837,  849,  852,  863,  865,  867,
        870,  871,  880,  885,  886,  894,  909,  915,  919,  920,  931,
        936,  940,  945,  978,  985,  997,  999, 1014, 1015, 1018, 1029,
       1030, 1055, 1057, 1066, 1077, 1098, 1099, 1100, 1101, 1127, 1138,
       1168, 1181, 1210, 1220, 1222, 1224, 1229, 1235, 1249, 1272, 1273,
       1275, 1294, 1303, 1313, 1317, 1320, 1329, 13

In [23]:
negative_train_set

array([[ 1., 19.,  5., ..., 40.,  2.,  1.],
       [ 7., 21.,  5., ..., 40.,  2.,  1.],
       [ 1., 19.,  5., ..., 15.,  1.,  9.],
       ...,
       [ 1., 16.,  1., ..., 40.,  2.,  1.],
       [ 1., 21.,  1., ..., 40.,  2.,  1.],
       [ 3., 16.,  1., ..., 30.,  2.,  8.]])

In [24]:
#Counter(negative_outcome.T[3]) # job IDs of people predicted to make less than 50k (career, column index 3)

## DiCE Counterfactuals

In [25]:
# DiCE imports
import dice_ml
from dice_ml.utils import helpers  # helper functions

In [26]:
# Getting dataset ready using pandas

feature_names = ['employment type', 'qualification', 'marital status', 'career', 'pob','age', 'weekly hours', 'gender', 'race']

# Feature-only frames: these are what gets passed to DiCE as query instances
x_train = pd.DataFrame(X_train, columns=feature_names)
x_test = pd.DataFrame(X_test, columns=feature_names)

In [27]:
x_train

Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race
0,2.0,22.0,1.0,1821.0,6.0,46.0,45.0,2.0,9.0
1,1.0,21.0,3.0,4850.0,12.0,45.0,50.0,2.0,1.0
2,1.0,21.0,5.0,1021.0,215.0,40.0,40.0,2.0,6.0
3,1.0,24.0,1.0,300.0,210.0,59.0,40.0,1.0,6.0
4,1.0,19.0,5.0,3401.0,6.0,23.0,40.0,2.0,1.0
...,...,...,...,...,...,...,...,...,...
156527,1.0,16.0,1.0,3645.0,6.0,29.0,40.0,2.0,1.0
156528,1.0,21.0,1.0,2640.0,6.0,42.0,40.0,2.0,1.0
156529,1.0,21.0,5.0,630.0,24.0,60.0,60.0,1.0,6.0
156530,3.0,22.0,1.0,230.0,6.0,47.0,60.0,1.0,1.0


Given the train dataset, we construct a data object for DiCE. Since continuous and discrete features have different ways of perturbation, we need to specify the names of the continuous features. DiCE also requires the name of the output variable that the ML model will predict.

In [28]:
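# Step 1: dice_ml.Data
# x_train holds features only, so the outcome column is attached here for DiCE
d = dice_ml.Data(dataframe=x_train.assign(income=y_train), continuous_features=['age', 'weekly hours'], outcome_name='income')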

In [29]:
m = dice_ml.Model(model=model, backend="sklearn")

In [30]:
exp = dice_ml.Dice(d, m, method="random")

In [31]:
e1 = exp.generate_counterfactuals(x_train[5:6], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,7.0,21.0,5.0,440.0,6.0,23.0,40.0,2.0,1.0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,-,-,-,-,-,83.3,-,-,-,1
1,1.0,-,-,-,-,71.0,-,-,-,1
2,-,-,4.0,-,-,92.3,-,-,-,1
3,-,-,-,-,-,61.8,-,-,-,1
4,-,-,-,-,-,34.4,55.0,-,-,1


### A Sample Material

#### Material and its Counterfactual

In [32]:
e1 = exp.generate_counterfactuals(x_test[23:24], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,6.0,21.0,1.0,440.0,207.0,45.0,20.0,2.0,6.0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,-,-,-,-,-,-,39.8,-,-,1.0
1,-,-,-,-,-,-,70.0,-,-,1.0
2,-,-,-,-,-,-,59.2,-,-,1.0
3,-,-,-,-,-,70.2,-,1.0,-,1.0
4,-,-,-,-,-,-,78.7,-,-,1.0


### Material NNs and Corresponding Counterfactuals

In [33]:
model.predict(np.array(NUN_finder(np.array(x_test[23:24]))).reshape(1,-1))

array([ True])

In [34]:
model.predict((np.array(x_test[23:24])).reshape(1,-1))

array([False])

In [35]:
explanation_generator((np.array(x_test[23:24])).reshape(1,-1))[4][0]

array([72214, 36303, 82230, 13715, 91697,  3293, 20683, 79366,  9667,
       33736,  3960, 40618, 69556,  2663, 17934, 31037, 14349, 68810,
       58221, 36300, 70238,  6279, 66701, 60287, 31950, 32986, 52426,
       59920, 30402, 30508], dtype=int64)

In [36]:
# Map positions in negative_train_set back to row indices of X_train
negative_train_indices = np.where(y_train == False)[0]
indices_cf_example = tuple(negative_train_indices[[72214, 36303, 82230, 13715, 91697]])

In [37]:
e1 = exp.generate_counterfactuals(x_train[indices_cf_example[0]:indices_cf_example[0]+1], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,6.0,21.0,1.0,800.0,207.0,52.0,20.0,2.0,6.0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,-,-,-,-,-,-,42.2,-,-,1
1,-,-,-,-,-,-,34.8,-,-,1
2,-,-,-,-,-,-,69.6,-,-,1
3,-,-,3.0,-,-,-,44.8,-,-,1
4,-,-,-,-,-,-,63.6,-,-,1


In [38]:
e1 = exp.generate_counterfactuals(x_train[indices_cf_example[1]:indices_cf_example[1]+1], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,6.0,22.0,1.0,440.0,207.0,32.0,20.0,2.0,6.0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,-,-,-,-,-,69.2,-,-,-,1
1,-,-,-,2861.0,-,-,50.7,-,-,1
2,-,-,-,-,-,-,97.0,-,-,1
3,-,-,-,-,-,-,82.3,-,-,1
4,-,-,-,-,-,55.2,44.1,-,-,1


In [39]:
e1 = exp.generate_counterfactuals(x_train[indices_cf_example[2]:indices_cf_example[2]+1], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,6.0,21.0,1.0,6410.0,207.0,50.0,20.0,2.0,6.0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,-,-,-,-,-,-,99.1,-,-,1
1,-,-,-,-,-,-,64.9,-,-,1
2,-,-,-,-,-,-,94.4,-,-,1
3,-,-,-,-,-,-,42.5,-,-,1
4,-,-,-,-,-,79.8,66.7,-,-,1


In [40]:
e1 = exp.generate_counterfactuals(x_train[indices_cf_example[3]:indices_cf_example[3]+1], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,6.0,21.0,1.0,4760.0,207.0,49.0,20.0,2.0,6.0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,7.0,-,-,-,-,-,56.9,-,-,1
1,-,-,-,-,-,70.1,69.0,-,-,1
2,-,-,-,-,-,-,67.7,1.0,-,1
3,-,-,-,1745.0,-,-,37.4,-,-,1
4,-,-,-,-,160.0,-,44.4,-,-,1


In [41]:
e1 = exp.generate_counterfactuals(x_train[indices_cf_example[4]:indices_cf_example[4]+1], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,1.0,21.0,1.0,5860.0,207.0,45.0,20.0,2.0,6.0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,-,-,-,-,-,-,51.7,-,-,1
1,-,-,-,-,-,-,66.5,-,-,1
2,-,-,-,-,-,88.2,51.9,-,-,1
3,-,-,3.0,-,-,-,84.8,-,-,1
4,-,-,-,-,-,-,85.8,-,-,1
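
To compare the counterfactual sets for a query and its neighbours at a group level, one option is to tally which features each counterfactual changes. The sketch below uses a hypothetical helper, `count_changed_features`, applied to the feature columns of the last table above, where `-` marks an unchanged feature. It is not part of DiCE and only illustrates the bookkeeping.

```python
from collections import Counter

FEATURES = ['employment type', 'qualification', 'marital status', 'career',
            'pob', 'age', 'weekly hours', 'gender', 'race']


def count_changed_features(counterfactual_rows, features=FEATURES):
    """Count how often each feature is changed; '-' means unchanged."""
    counts = Counter()
    for row in counterfactual_rows:
        for name, value in zip(features, row):
            if value != '-':
                counts[name] += 1
    return counts


# Rows copied from the last table above (feature columns only).
rows = [
    ['-', '-', '-', '-', '-', '-', '51.7', '-', '-'],
    ['-', '-', '-', '-', '-', '-', '66.5', '-', '-'],
    ['-', '-', '-', '-', '-', '88.2', '51.9', '-', '-'],
    ['-', '-', '3.0', '-', '-', '-', '84.8', '-', '-'],
    ['-', '-', '-', '-', '-', '-', '85.8', '-', '-'],
]
changes = count_changed_features(rows)
```

For this table the tally is dominated by `weekly hours`, which every counterfactual changes. Repeating the count over the tables for the neighbours would show whether the same features recur across the group.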


### Another Material: Close to the Decision Boundary

In [42]:
explanation_generator((np.array(x_test[1098:1099])).reshape(1,-1))[4][0]

array([ 1450,  6628, 69383, 78056,  4494,  4640, 72378, 18885, 29624,
       28788, 46057, 59393, 40521, 49966, 59754, 20949,  4175, 71327,
       11787, 80290,   170, 68139, 55276,  6821, 44054, 61494, 52216,
       41364, 70984, 60473], dtype=int64)

In [43]:
# Map positions in negative_train_set back to row indices of X_train
negative_train_indices = np.where(y_train == False)[0]
indices_cf_example = tuple(negative_train_indices[[1450, 6628, 69383, 78056, 4494]])

In [44]:
e1 = exp.generate_counterfactuals(x_test[1098:1099], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,1.0,18.0,1.0,8225.0,6.0,35.0,40.0,1.0,2.0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,-,23.0,-,-,-,-,-,-,-,1.0
1,-,-,-,-,-,-,56.9,-,-,1.0
2,-,-,-,-,-,-,81.6,-,8.0,1.0
3,-,-,-,3010.0,-,-,-,-,-,1.0
4,6.0,-,-,-,-,62.9,-,-,-,1.0


In [45]:
e1 = exp.generate_counterfactuals(x_train[indices_cf_example[0]:indices_cf_example[0]+1], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,1.0,18.0,1.0,5260.0,6.0,35.0,40.0,1.0,2.0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,-,-,-,-,-,62.7,-,-,-,1
1,-,-,-,-,-,79.5,-,-,-,1
2,3.0,20.0,-,-,-,-,-,-,-,1
3,-,-,-,-,-,81.8,92.5,-,-,1
4,-,-,-,2310.0,-,-,-,-,-,1


In [46]:
e1 = exp.generate_counterfactuals(x_train[indices_cf_example[1]:indices_cf_example[1]+1], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,1.0,18.0,1.0,9142.0,6.0,35.0,40.0,1.0,2.0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,-,-,-,5120.0,-,93.6,-,-,-,1
1,-,-,-,-,41.0,-,53.4,-,-,1
2,-,-,-,-,-,89.7,-,-,1.0,1
3,-,9.0,-,1021.0,-,-,-,-,-,1
4,-,-,-,425.0,-,-,-,-,-,1


In [47]:
e1 = exp.generate_counterfactuals(x_train[indices_cf_example[2]:indices_cf_example[2]+1], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 1)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,1.0,18.0,1.0,700.0,6.0,35.0,40.0,1.0,8.0,1



Diverse Counterfactual set (new outcome: 0.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,6.0,-,-,4220.0,-,-,-,-,-,0
1,-,6.0,-,-,-,-,11.4,-,-,0
2,4.0,-,-,3610.0,-,-,-,-,-,0
3,-,-,-,3423.0,-,-,-,-,-,0
4,-,-,-,6850.0,-,-,-,-,6.0,0


In [48]:
e1 = exp.generate_counterfactuals(x_train[indices_cf_example[3]:indices_cf_example[3]+1], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,1.0,18.0,1.0,9720.0,6.0,58.0,40.0,1.0,2.0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,-,-,-,8000.0,-,-,-,-,-,1
1,-,-,-,10.0,-,-,-,-,-,1
2,-,-,-,-,-,-,88.3,-,-,1
3,-,-,3.0,1200.0,-,-,-,-,-,1
4,7.0,-,-,1320.0,-,-,-,-,-,1


In [49]:
e1 = exp.generate_counterfactuals(x_train[indices_cf_example[4]:indices_cf_example[4]+1], total_CFs=5, desired_class="opposite")
e1.visualize_as_dataframe(show_only_changes=True)

Query instance (original outcome : 0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,1.0,18.0,1.0,3930.0,6.0,30.0,40.0,1.0,2.0,0



Diverse Counterfactual set (new outcome: 1.0)


Unnamed: 0,employment type,qualification,marital status,career,pob,age,weekly hours,gender,race,income
0,5.0,-,-,-,-,62.0,-,-,-,1
1,4.0,-,-,-,-,66.4,-,-,-,1
2,-,-,-,3220.0,-,-,-,-,-,1
3,-,-,-,1555.0,-,-,91.3,-,-,1
4,-,-,-,3040.0,-,-,-,-,-,1
