In this notebook, we examine **how a tabular dataset such as Census can be used to build a structural causal model (or a model based on a causally structured representation of features and data) to estimate the wealth of individuals**.

We proceed in 4 steps:

**1. Find the 'latent' variables**
E.g. 'seniority' that could be an unobserved confounder else, if we draw a  

**2. Constitute a directed acyclic graph with all features (Census + latent-reconstituted)**
Based on the latent features found (and, above all, on their interpretation), we build a coherent representation of the systematic influences between the Census features (here, to estimate the income of clients).

**3. Build a Structural Causal Model computing their influences**

**4. We inspect if the graph-based AI indeed reflects common & expert knowledge on**
In particular, nonsensical inferences should absolutely be avoided (e.g. education may influence occupation, but not the reverse).

Through Input Intervention Changes?

Or build a model that learns from (i) the causal paths stated in (2) and (ii) the data? This would be step (3').

In [1]:
%load_ext autoreload
%autoreload 2

import warnings
warnings.filterwarnings('ignore')

# Find the 'latent' variables (Census)
To 'complete' the causal paths between the features used to infer the incomes of individuals, we use factor analysis.

## General data preparation - handle categorical features
Here, we handle the categorical features through label-encoding. 

In [2]:
import sys
sys.path.append("../")

import time
from sklearn import datasets

from sklearn.preprocessing import LabelEncoder

import torch
from torch_geometric.data import Data

import tensorflow as tf

import itertools
import numpy as np
import pandas as pd

from classif_basic.data_preparation import train_valid_test_split, set_target_if_feature, automatic_preprocessing

from classif_basic.graph import table_to_graph, add_new_edge



### Prepare data

Fix the exact share of each sex (Male, Female) in the population and the share of wealthy individuals within each sex. In that way, we can inspect whether the structure of the model (here based on a graph) absorbs this "sexist" representation of the world.

In [3]:
# preparing the dataset on clients for binary classification
from sklearn.datasets import fetch_openml
data = fetch_openml(data_id=1590, as_frame=True)

t0 = time.time()

X = data.data
Y = (data.target)# == '>50K') * 1

### Train-test-split, to prepare for 3 graphs representing data

In [4]:
model_task = "regression" #"classification"
preprocessing_cat_features = "label_encoding"

X_train, X_valid, X_train_valid, X_test, Y_train, Y_valid, Y_train_valid, Y_test = train_valid_test_split(
    X=X,
    Y=Y, 
    model_task=model_task,
    preprocessing_cat_features=preprocessing_cat_features)

## Factor Analysis
Here, we rely on the scikit-learn implementation (https://scikit-learn.org/stable/modules/decomposition.html#fa).

Read more there; maybe consider a Bayesian approach?

In [5]:
# start from the number of features already present; the factor analysis below extracts one factor fewer
nb_features_census = X.shape[1]

In [6]:
from sklearn.decomposition import FactorAnalysis
from sklearn import datasets

my_fa = FactorAnalysis(n_components=nb_features_census-1, rotation='varimax') 

X_transformed = my_fa.fit_transform(X_valid)

In [7]:
X_valid.shape

(6228, 14)

In [8]:
X_transformed.shape

(6228, 13)

Reduce dimensionality for features carrying redundant information (e.g. "education" and "education-num"), to obtain a more straightforward graph?

In [9]:
X_valid["education"]

8218     11
39620    12
14122    11
29881     3
25989    11
         ..
42023     9
15862     1
35399    11
28047    11
1588      9
Name: education, Length: 6228, dtype: int64

In [10]:
X_valid["education-num"]

8218      9.0
39620    14.0
14122     9.0
29881     2.0
25989     9.0
         ... 
42023    13.0
15862     7.0
35399     9.0
28047     9.0
1588     13.0
Name: education-num, Length: 6228, dtype: float64

In [11]:
two_to_one_factor = FactorAnalysis(n_components=1, rotation='varimax') 

unique_col_education = two_to_one_factor.fit_transform(X_valid.filter(items = ["education","education-num"]))

But this leads to an interpretability problem: what do the values of the new feature mean? Do they increase with the level of education? Nothing here tells us.

In [12]:
np.unique(unique_col_education)

array([-1.13539022, -0.98097948, -0.49438924, -0.47984356, -0.1506015 ,
       -0.03115703,  0.26311877,  0.53991143,  0.76425469,  1.12846302,
        1.38777255,  1.64708208,  1.90639161,  2.04331921,  2.30262874,
        2.56193827])

Now, how to interpret it? A new column was built; we could validate it through its correlation with the other columns, checking whether this correlation makes sense for an intermediate variable (based on our domain knowledge).
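One way to make the extracted factor interpretable is to look at its loadings and at its correlation with each input column. The sketch below uses synthetic data (the latent `schooling` variable and the three observed columns are assumptions made for illustration, not Census columns) and only shows the mechanics of the check.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n = 500

# Hypothetical latent "schooling" variable driving three observed columns (synthetic data)
schooling = rng.normal(size=n)
observed = np.column_stack([
    2.0 * schooling + rng.normal(scale=0.5, size=n),
    1.0 * schooling + rng.normal(scale=0.5, size=n),
    1.5 * schooling + rng.normal(scale=0.5, size=n),
])

fa = FactorAnalysis(n_components=1, random_state=0)
factor = fa.fit_transform(observed)[:, 0]

# Loadings: how strongly each observed column loads on the single factor
loadings = fa.components_[0]

# Correlation of the factor score with each input gives the direction and strength of the link
corr_with_inputs = np.array(
    [np.corrcoef(factor, observed[:, j])[0, 1] for j in range(observed.shape[1])]
)
print(loadings)
print(corr_with_inputs)
```

If the factor correlates with the inputs in the direction that domain knowledge expects, it can tentatively be given a name; the overall sign of a factor is arbitrary, so only the relative signs across columns carry meaning.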

# High-Level Causal Representation (Model) of data - Constitute a directed acyclic graph with all features

Structural Causal Model to quantify the influences *given* directed causal paths.
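As a rough illustration of quantifying influences along fixed directed paths, the sketch below fits one least-squares regression per hypothesised equation on synthetic data and compares two candidate specifications. It is a simplification with observed variables only and is not the estimator used by semopy; the helper `fit_paths` and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Synthetic data generated from a known chain: education -> occupation -> income
education = rng.normal(size=n)
occupation = 0.8 * education + rng.normal(scale=0.5, size=n)
income = 0.6 * occupation + rng.normal(scale=0.5, size=n)
columns = {"education": education, "occupation": occupation, "income": income}

def fit_paths(spec, columns):
    """Fit one least-squares regression per equation {target: [parents]}.

    Returns the path coefficients and the total residual sum of squares.
    """
    coefs, rss = {}, 0.0
    for target, parents in spec.items():
        y = columns[target]
        # design matrix: one column per parent, plus an intercept
        X = np.column_stack([columns[p] for p in parents] + [np.ones(len(y))])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss += float(np.sum((y - X @ beta) ** 2))
        coefs[target] = dict(zip(parents, beta[:-1]))
    return coefs, rss

# Two hypothesised structures for the same targets
spec_mediated = {"occupation": ["education"], "income": ["occupation"]}
spec_direct = {"occupation": ["education"], "income": ["education"]}

coefs_mediated, rss_mediated = fit_paths(spec_mediated, columns)
coefs_direct, rss_direct = fit_paths(spec_direct, columns)
print(coefs_mediated, rss_mediated)
print(coefs_direct, rss_direct)
```

The coefficients are only estimated for the paths that were written down, and the specification closer to the data-generating structure leaves a smaller residual.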

Here, the goal is to **build a structural model that will then be learnt by AI** (from data, then with AI).

This helps to better investigate the paths inside the data (and to confirm them against experience and business knowledge).

(!) Discovering new relationships (and also new features) will be a further step.

We rely on this [introductory article](https://towardsdatascience.com/structural-equation-modeling-dca298798f4d):

"Structural Equation Models give you estimates of coefficients based on the hypothesized relationships between variables. It cannot find other relationships than those that you specify... A great way to use Structural Equation Models is to provide multiple hypothetical models, estimate each of them, and then analyze the differences between them to work towards a better and better model."

In [13]:
!pip install fsspec
!pip install s3fs
!pip install boto

!pip install semopy


In [14]:
import boto
import semopy

In cases of strong correlation, reduce to a single column? And control for native country (mostly US, hence keep only the US rows and delete the column?)

In [15]:
# for comparison, a case where the correlation should be irrelevant
X_valid['education'].corr(X_valid['relationship'])

-0.020516691257224565

In [16]:
X_valid['education'].corr(X_valid['education-num'])

0.35942598961877636

In [17]:
X_valid['relationship'].corr(X_valid['marital-status'])

0.17850429871918275

In [18]:
X_valid['workclass'].corr(X_valid['occupation'])

0.30823817891355526

In [19]:
X_valid['hours-per-week'].corr(X_valid['capital-loss'])

0.06752629072839836
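A caveat when reading the correlations above: they are computed on label-encoded integer codes, and a Pearson correlation on such codes depends on the order in which the codes were assigned. The toy example below (hypothetical values, with alphabetical codes standing in for what a label encoder would produce) compares alphabetical codes with codes that follow the real ordering of the levels.

```python
import numpy as np
import pandas as pd

# Toy ordinal feature: level name and a matching numeric level (hypothetical values)
levels = {"7th-8th": 4, "HS-grad": 9, "Bachelors": 13, "Doctorate": 16}
names = ["HS-grad", "Bachelors", "7th-8th", "Doctorate",
         "HS-grad", "Bachelors", "7th-8th", "Doctorate"]
toy = pd.DataFrame({"education": names})
toy["education_num"] = toy["education"].map(levels)

# Alphabetical integer codes, i.e. an order unrelated to the level itself
alphabetical_codes = toy["education"].astype("category").cat.codes

# Codes that respect the real ordering of the levels
ordered_codes = toy["education"].map(
    {"7th-8th": 0, "HS-grad": 1, "Bachelors": 2, "Doctorate": 3}
)

corr_alphabetical = alphabetical_codes.corr(toy["education_num"])
corr_ordered = ordered_codes.corr(toy["education_num"])
print(corr_alphabetical, corr_ordered)
```

This may explain why "education" and "education-num" appear only moderately correlated even though they describe the same information.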

In [20]:
# we begin to test this Structural Equation Modelling (SEM) approach on a small sample (X_valid)

# and add the target (revenue) for future predictions
# TODO why not a regression?
data_valid = X_valid.copy()
data_valid['income'] = Y_valid

# adapt names of columns, to be compatible with semopy
list_cols = data_valid.columns.to_list()
list_cols = [feat.replace('-','_') for feat in list_cols]
data_valid.columns = list_cols
data_valid

Unnamed: 0,age,fnlwgt,education_num,capital_gain,capital_loss,hours_per_week,workclass,education,marital_status,occupation,relationship,race,sex,native_country,income
8218,59.0,172667.0,9.0,0.0,0.0,40.0,3,11,2,13,0,4,1,38,<=50K
39620,29.0,204516.0,14.0,0.0,0.0,15.0,6,12,2,9,0,4,1,38,<=50K
14122,60.0,495366.0,9.0,0.0,0.0,38.0,3,11,0,0,1,4,0,38,<=50K
29881,57.0,253914.0,2.0,0.0,0.0,35.0,5,3,2,11,0,4,1,25,<=50K
25989,19.0,128453.0,9.0,0.0,0.0,28.0,8,11,4,14,3,4,0,38,<=50K
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
42023,40.0,53835.0,13.0,0.0,0.0,50.0,3,9,2,11,0,4,1,38,>50K
15862,22.0,289982.0,7.0,0.0,0.0,40.0,3,1,4,7,1,4,0,38,<=50K
35399,44.0,249332.0,9.0,0.0,0.0,40.0,3,11,2,13,0,4,1,6,<=50K
28047,44.0,204235.0,9.0,0.0,0.0,40.0,3,11,2,13,0,4,1,38,<=50K


In [21]:
model_spec = """
  # measurement part
    labour_income =~ marital_status + occupation + hours_per_week
    capital_income =~ capital_loss + capital_gain 
    
  # structural part

    relationship ~ education + sex
    marital_status ~ relationship
    
    fnlwgt ~ race + sex + native_country + age
    occupation ~ education + age + fnlwgt 
    workclass ~ occupation 
    hours_per_week ~ occupation + workclass
    capital_loss ~ age + fnlwgt + occupation
    capital_gain ~ age + fnlwgt + occupation + hours_per_week
    
    income ~ labour_income + capital_income
"""

# Instantiate the model
model = semopy.Model(model_spec)

In [22]:
# Fit the model using the data
model.fit(data_valid)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
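The `TypeError` above is consistent with the `income` column still holding the raw labels ('<=50K', '>50K') shown in the table, which NumPy cannot test for NaN. A possible remedy, given here as an assumption rather than a confirmed fix, is to make every column numeric before fitting. The sketch uses a small stand-in frame instead of `data_valid`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for data_valid: numeric features plus a string-labelled target
toy = pd.DataFrame({
    "age": [59.0, 29.0, 40.0],
    "hours_per_week": [40.0, 15.0, 50.0],
    "income": pd.Categorical(["<=50K", "<=50K", ">50K"]),
})

# Columns that are not numeric are the ones np.isnan would reject
non_numeric = [c for c in toy.columns if not pd.api.types.is_numeric_dtype(toy[c])]
print(non_numeric)

# Encode the target as 0/1 so that the whole frame is numeric
toy["income"] = (toy["income"].astype(str).str.strip() == ">50K").astype(int)

# Every value can now be checked for NaN / infinity without a type error
all_finite = bool(np.isfinite(toy.to_numpy(dtype=float)).all())
print(all_finite)
```

Treating a 0/1 target as a continuous outcome in a linear structural model is itself an approximation, which relates to the TODO about regression above.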

In [None]:
# Show the results using the inspect method
model.inspect()

In [None]:
import graphviz

g = graphviz.Digraph('G')#, format=ext, engine=engine)

In [None]:
def semplot(mod, filename, 
            plot_exos=True, engine='dot', latshape='circle',
            plot_ests=True, show=False):

    inspection = mod.inspect(std_est=False)
    
    images = dict()
    
    t = filename.split('.')
    filename, ext = '.'.join(t[:-1]), t[-1]
    g = graphviz.Digraph('G', format=ext, engine=engine)
    
    g.attr(overlap='scale', splines='true')
    g.attr('edge', fontsize='12')
    g.attr('node', shape=latshape, fillcolor='#cae6df', style='filled')
    for lat in mod.vars['latent']:
        if lat in images:
            g.node(lat, label='', image=images[lat])
        else:
            g.node(lat, label=lat)
    
    g.attr('node', shape='box', style='')
    for obs in mod.vars['observed']:
        if obs in images:
            g.node(obs, label='', image=images[obs])
        else:
            g.node(obs, label=obs)

    regr = inspection[inspection['op'] == '~']
    all_vars = mod.vars['all']
    try:
        exo_vars = mod.vars['observed_exogenous']
    except KeyError:
        exo_vars = set()
    for _, row in regr.iterrows():
        lval, rval, est = row['lval'], row['rval'], row['Estimate']
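        # skip unknown variables, intercepts and, when plot_exos is False, exogenous regressors
        # (`not` is required here: `~` on a Python bool is always truthy)
        if (rval not in all_vars) or (not plot_exos and rval in exo_vars) or\
            (rval == '1'):
            continue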
        if plot_ests:
            pval = row['p-value']
            label = '{:.3f}'.format(float(est))
            if pval !='-':
                label += r'\np-val: {:.2f}'.format(float(pval))
        else:
            label = str()
        g.edge(rval, lval, label=label)

    #g.render(filename, view=show)        
    #g.render(view=show)
    return g

In [None]:
semplot(model, "high_level_model.png")

## Previous Work (Causal Graph) - Reshape (by interpreting) data to a graph

From this dataset (where we selectively introduced a "sexist" effect against women), let's see how we could switch from the tabular data to a graph representation.

The point is that all our features X seem to be attributes of the clients, whereas we need a way of representing the interactions between clients.

X = {age, sex, race, final weight (depends on age, sex, Hispanic origin, race), education, education number, marital status, relationship, occupation, hours per week, workclass, capital gain, capital loss, native country}

**Nodes** 
Bank clients (by ID)

**Edges** 
Here, we should find one or several ways of connecting the clients

Should be occupation → if a client changes occupation (or for a similar client with a new occupation), what is the impact on the revenue? (Compare: change of football team => impact on the football rate.)
(pers) actionable => predict the revenue when the client switches to a new job?
→ maybe "hours per week" <=> inspect the change of revenue if the client switches to more hours per week?

**Node Features** 
Attributes of the nodes, i.e. characteristics of the clients (here, hard to separate from what "connects" them...)

Age, sex, race, final weight (depends on age, sex, Hispanic origin, race), education, education number, marital status, relationship, hours per week, workclass, capital gain, capital loss, native country

**Label (here at a node-level?)** 
Income (Y = income > $50 000)
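To make the node/edge design above concrete, here is a minimal sketch of one possible way to connect clients: link every pair of rows that share the same value of a column. The helper `edges_from_shared_value` is hypothetical and is not the implementation of `add_new_edge`; it only illustrates the (source, target) layout that an edge index uses.

```python
import numpy as np

def edges_from_shared_value(values):
    """Hypothetical helper: connect every ordered pair of distinct rows
    that share the same value.

    Returns an integer array of shape (2, n_edges): row 0 holds the source
    node ids, row 1 the target node ids.
    """
    values = np.asarray(values)
    sources, targets = [], []
    for v in np.unique(values):
        members = np.flatnonzero(values == v)
        # all ordered pairs (i, j) inside the group, self-loops excluded
        src, dst = np.meshgrid(members, members, indexing="ij")
        keep = src != dst
        sources.append(src[keep])
        targets.append(dst[keep])
    return np.vstack([np.concatenate(sources), np.concatenate(targets)])

occupation = np.array([3, 3, 7, 3, 7])  # toy label-encoded column, one entry per client
edge_index = edges_from_shared_value(occupation)
print(edge_index.shape)
```

With this scheme a group of k clients contributes k(k-1) directed edges, so the edge count grows quadratically with group size.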

In [None]:
# compute edges by hand: create our own edge combination, to predict the income - with directed paths
# first edge joins "occupation" -> "hours-per-week"
# second edge joins "sex" -> "education"

edges_train = add_new_edge(data=X_train, previous_edge=None, list_col_names=["occupation", "hours-per-week"])
#edges_train = add_new_edge(data=X_train, previous_edge=edges_train, list_col_names=["sex","education"])

edges_valid = add_new_edge(data=X_valid, previous_edge=None, list_col_names=["occupation", "hours-per-week"])
#edges_valid = add_new_edge(data=X_valid, previous_edge=edges_valid, list_col_names=["sex","education"])

edges_test = add_new_edge(data=X_test, previous_edge=None, list_col_names=["occupation", "hours-per-week"])
#edges_test = add_new_edge(data=X_test, previous_edge=edges_test, list_col_names=["sex","education"])

# specify the feature(s) used to connect the clients in couples, i.e. to build the edge of the data graph

list_col_names = ["occupation", "hours-per-week"]#, "sex","education"]

data_train = table_to_graph(X=X_train, Y=Y_train, list_col_names=list_col_names, edges=edges_train)
data_valid = table_to_graph(X=X_valid, Y=Y_valid, list_col_names=list_col_names, edges=edges_valid)
data_test = table_to_graph(X=X_test, Y=Y_test, list_col_names=list_col_names, edges=edges_test)

In [None]:
X.columns

In [None]:
data_train

In [None]:
data_valid

In [None]:
data_test

# Train a basic Graph Neural Network on the graph-shaped data

## Build a basic convolutional GNN with torch

In [None]:
# here intervenes the quick "introduction by example" of GCN by torch
# in 'https://pytorch-geometric.readthedocs.io/en/latest/notes/introduction.html'

import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, data):
        super().__init__()
        self.conv1 = GCNConv(data.num_node_features, 16)
        self.conv2 = GCNConv(16, data.num_classes)

    def forward(self, data):
        x, edge_index = data.x, data.edge_index

        x = self.conv1(x, edge_index)
        x = F.relu(x)
        x = F.dropout(x, training=self.training)
        x = self.conv2(x, edge_index)
        
        return F.log_softmax(x, dim=1)

In [None]:
batch_nb = 200

t_basic_1 = time.time()

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = GCN(data=data_train).to(device)
data_train = data_train.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)

model.double()

model.train()
for epoch in range(batch_nb): 
    # each iteration is one full pass over the whole graph (i.e. an epoch rather than a mini-batch)
    # better with 200 passes (with only feature "occupation" as edge, 70% accuracy vs 50% accuracy with 50 passes)
    optimizer.zero_grad()
    out = model(data_train)
    loss = F.nll_loss(out, data_train.y)
    loss.backward()
    optimizer.step()

t_basic_2 = time.time()

print(f"Training of the basic GCN on Census with {batch_nb} batches took {(t_basic_2 - t_basic_1)/60} mn")

Finally, we can evaluate our model on the validation nodes. Linking the clients only through their job yields less than 70% accuracy even on the train set, so we need to look for other ways of connecting them.

By creating an edge only from the combination of sex and education, we observe an accuracy of 61% on train that does not drop on valid (65%). Moreover, **when the graph is directed (sex -> education), the accuracy seems to increase** without degrading the validation performance: +11% on train (76%), +2% on valid (67%), and 70% on test.

These results were obtained by training the GCN with 200 batches, which however took 20 mn for 15_000 rows and 2 classes (admittedly with edge_index=[2, 10813909]).

**Other observations (tests of combinations of features as edges)**

With our own edge index combining (sex & education) and (occupation), the training took 7 mn longer (27 mn) but the performance did not improve (61% on the train set).

Adding the combination (occupation -> hours-per-week) to (sex -> education) does not improve the performance; it even decreases it (60 ± 2% on train and valid). Maybe because (i) it makes the network too complex, or (ii) the model or (iii) its hyperparameters (batches, layers...) are too simple to capture these relations?

**Constitution of couples graph data - graph networks to be tested, with input intervention changes...**

In [None]:
# switch to evaluation mode (disables dropout) before predicting
model.eval()

with torch.no_grad():
    pred_train = model(data_train).argmax(dim=1)
nb_indivs_train = data_train.x.shape[0]

correct_train = (pred_train == data_train.y).sum()
acc = int(correct_train) / nb_indivs_train
print(f'Accuracy on train data: {acc:.4f}')

In [None]:
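# switch to evaluation mode (disables dropout) before predicting
model.eval()
data_valid = data_valid.to(device)  # same device as the model

with torch.no_grad():
    pred_valid = model(data_valid).argmax(dim=1)
nb_indivs_valid = data_valid.x.shape[0]

correct_valid = (pred_valid == data_valid.y).sum()
acc = int(correct_valid) / nb_indivs_valid
print(f'Accuracy on valid data: {acc:.4f}')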

Let's inspect the model on test data, to check that the stability of the performance is not a coincidence:

In [None]:
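# switch to evaluation mode (disables dropout) before predicting
model.eval()
data_test = data_test.to(device)  # same device as the model

with torch.no_grad():
    pred_test = model(data_test).argmax(dim=1)
nb_indivs_test = data_test.x.shape[0]

correct_test = (pred_test == data_test.y).sum()
acc = int(correct_test) / nb_indivs_test
print(f'Accuracy on test data: {acc:.4f}')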

# Visual Representation of the Graph
Here, we look for a visual representation of the (directed acyclic?) graph. The goal is to check whether it matches the users' intuition, at least regarding the "nonsensical" causal paths.

Here, the edges have been built with the directed path **sex -> education** (recall that the link [potentially] exists, because we deliberately biased the data to be "sexist" regarding the distribution of incomes). Hence, the nonsense we do not want to find is an impact of education on sex.
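Besides a visual inspection, the same sanity check can be automated at the level of features: verify that the hypothesised paths form a directed acyclic graph and contain none of the directions ruled out by domain knowledge. The sketch below is plain Python; the edge lists are illustrative assumptions, not the notebook's final graph.

```python
from collections import defaultdict, deque

def is_acyclic(edges):
    """Kahn's algorithm: return True if the directed graph has no cycle."""
    children, indegree, nodes = defaultdict(list), defaultdict(int), set()
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))
    # start from the nodes without any incoming edge
    queue = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # a cycle leaves some nodes with a non-zero in-degree, hence unvisited
    return visited == len(nodes)

# Hypothesised feature-level paths (illustrative only)
causal_edges = [("sex", "education"), ("education", "occupation"), ("occupation", "income")]
# Directions that domain knowledge rules out
forbidden = {("education", "sex"), ("occupation", "education")}

violations = forbidden.intersection(causal_edges)
print(is_acyclic(causal_edges), violations)
```

An empty `violations` set together with `is_acyclic(...)` being true means the stated paths are at least internally consistent with the expert constraints.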

In [None]:
import networkx as nx

from torch_geometric.utils import to_networkx
import matplotlib.pyplot as plt

In [None]:
network_valid = to_networkx(data=data_valid)

# subax1 = plt.subplot(121)

# graph 
nx.draw(network_valid, with_labels=True, font_weight='bold')

In [None]:
list_col_names = ["occupation", "hours-per-week"]#, "sex","education"]

data_job_valid = table_to_graph(X=X_valid, Y=Y_valid, list_col_names=list_col_names, edges=edges_valid)

network_job_valid = to_networkx(data=data_job_valid)
nx.draw(network_job_valid, with_labels=True, font_weight='bold')

In [None]:
data_valid.x.shape[0]

In [None]:
# create a representation of the edge ("sex -> education") 
# with only 2 values of education and 20 individuals (min, max)

X_valid.reset_index(drop=True, inplace=True)
Y_valid.reset_index(drop=True, inplace=True)

df_education_max = X_valid.loc[X_valid["education"]==X_valid["education"].max()].iloc[:10]
#df_education_min = X_valid.loc[X_valid["education"]==X_valid["education"].min()].iloc[:10]

X_education_extreme = df_education_max#.append(df_education_min).sort_index()
Y_education_extreme = Y_valid.iloc[X_education_extreme.index]

In [None]:
# here, gain a representation with only 10 individuals 

t_graph_0 = time.time()

list_col_names = ["sex", "education"]

edges_sex_valid = add_new_edge(data=X_education_extreme, previous_edge=None, list_col_names=list_col_names)

data_sex_valid = table_to_graph(X=X_education_extreme, Y=Y_education_extreme, list_col_names=list_col_names, 
                                edges=edges_sex_valid)

# draw the graph that was just built (data_sex_valid), not the earlier job graph
network_sex_valid = to_networkx(data=data_sex_valid)
nx.draw(network_sex_valid, with_labels=True, font_weight='bold')

t_graph_1 = time.time()

print(f"Plotting the graph with {data_sex_valid.x.shape[0]} individuals took {(t_graph_1 - t_graph_0)/60} mn")

In [None]:
# here, gain a representation with only 10 individuals (and only 'sex' as edge)

t_graph_0 = time.time()

list_col_names = ["sex"]

edges_sex_valid = add_new_edge(data=X_education_extreme, previous_edge=None, list_col_names=list_col_names)

data_sex_valid = table_to_graph(X=X_education_extreme, Y=Y_education_extreme, list_col_names=list_col_names, 
                                edges=edges_sex_valid)

# draw the graph that was just built (data_sex_valid), not the earlier job graph
network_sex_valid = to_networkx(data=data_sex_valid)
nx.draw(network_sex_valid, with_labels=True, font_weight='bold')

t_graph_1 = time.time()

print(f"Plotting the graph with {data_sex_valid.x.shape[0]} individuals took {(t_graph_1 - t_graph_0)/60} mn")

In [None]:
# here, gain a representation with only 10 individuals (and only 'sex' as edge)

t_graph_0 = time.time()

list_col_names = ["sex"]

edges_sex_valid = add_new_edge(data=X_education_extreme, previous_edge=None, list_col_names=list_col_names)

data_sex_valid = table_to_graph(X=X_education_extreme, Y=Y_education_extreme, list_col_names=list_col_names, 
                                edges=edges_sex_valid)

# draw the graph that was just built (data_sex_valid), not the earlier job graph
network_sex_valid = to_networkx(data=data_sex_valid)
nx.draw(network_sex_valid)

t_graph_1 = time.time()

print(f"Plotting the graph with {data_sex_valid.x.shape[0]} individuals took {(t_graph_1 - t_graph_0)/60} mn")

In [None]:
# here, gain a representation with only 10 individuals (and only 'sex' as edge)

t_graph_0 = time.time()

list_col_names = ['capital-gain', 'capital-loss',
       'hours-per-week', 'workclass', 'education', 'marital-status',
       'occupation', 'relationship', 'race', 'sex', 'native-country',
       'clients_id'] # to take only the likely 'relevant' features 'age', 'fnlwgt', 'education-num' as node

edges_sex_valid = add_new_edge(data=X_education_extreme, previous_edge=None, list_col_names=['sex'])

data_sex_valid = table_to_graph(X=X_education_extreme, Y=Y_education_extreme, list_col_names=list_col_names, 
                                edges=edges_sex_valid)

# draw the graph that was just built (data_sex_valid), not the earlier job graph
network_sex_valid = to_networkx(data=data_sex_valid)
nx.draw(network_sex_valid)

t_graph_1 = time.time()

print(f"Plotting the graph with {data_sex_valid.x.shape[0]} individuals took {(t_graph_1 - t_graph_0)/60} mn")

In [None]:
# here, gain a representation with only 10 individuals (and only 'sex' as edge)

t_graph_0 = time.time()

list_col_names = ['age', 'fnlwgt', 'capital-gain', 'capital-loss',
       'hours-per-week', 'workclass', 'education', 'marital-status',
       'occupation', 'relationship', 'race', 'sex', 'native-country',
       'clients_id'] # to take only the likely 'relevant' feature 'education-num' as node

edges_sex_valid = add_new_edge(data=X_education_extreme, previous_edge=None, list_col_names=['sex'])

data_sex_valid = table_to_graph(X=X_education_extreme, Y=Y_education_extreme, list_col_names=list_col_names, 
                                edges=edges_sex_valid)

# draw the graph that was just built (data_sex_valid), not the earlier job graph
network_sex_valid = to_networkx(data=data_sex_valid)
nx.draw(network_sex_valid)

t_graph_1 = time.time()

print(f"Plotting the graph with {data_sex_valid.x.shape[0]} individuals took {(t_graph_1 - t_graph_0)/60} mn")

Obviously, we have no clear intuition of what these links correspond to... For each individual, a path from sex to income? But there are more groups than the individuals selected here (10)...

## Constitute a graph - Try to connect the features 

Here, we proceed in 2 steps (back and forth)

1. **Detect the relations**
We use partial dependence plots to inspect the correlations. (pers) Is this sufficient? Input intervention changes?

2. **Select the causal direction**
Based on the user's experience and expertise (e.g. sex -> education, because the reverse would be logically and temporally impossible)

At first sight, look at the correlated features. (!) There may be hidden correlations, so experience is still required at this stage:

In [None]:
# reconstitute the dataset to check the correlations

data_train_valid = X_train_valid.copy()
data_train_valid['target'] = Y_train_valid
data_train_valid

In [None]:
# keep the numeric columns only, so that the matrix and the tick labels match
numeric_df = data_train_valid.select_dtypes(['number'])

f = plt.figure(figsize=(19, 15))
plt.matshow(numeric_df.corr(), fignum=f.number)
plt.xticks(range(numeric_df.shape[1]), numeric_df.columns, fontsize=14, rotation=45)
plt.yticks(range(numeric_df.shape[1]), numeric_df.columns, fontsize=14)
cb = plt.colorbar()
cb.ax.tick_params(labelsize=14)
plt.title('Correlation Matrix', fontsize=16)

In [None]:
import seaborn as sns

sns.set(style="ticks", color_codes=True)    
g = sns.pairplot(X_train_valid.filter(items=['education-num','education']))
plt.show()

In [None]:
g = sns.pairplot(X_train_valid.filter(items=['sex','age']))
plt.show()

In [None]:
from sklearn.inspection import PartialDependenceDisplay

# detect the relations: show the changes in predictions for the combinations of 2 features
fig, ax = plt.subplots(figsize=(8, 6))
f_names = [('sex', 'education')]
# Similar to previous PDP plot except we use tuple of features instead of single feature
disp4 = PartialDependenceDisplay.from_estimator(model, X_valid, f_names, ax=ax)
plt.show()
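Note that `PartialDependenceDisplay.from_estimator` is designed for fitted scikit-learn estimators, whereas `model` at this point refers to the torch GCN, so the call above is unlikely to work as written. The sketch below shows the same two-way partial dependence on a scikit-learn classifier fitted on synthetic stand-ins for `sex` and `education`; the data, the choice of classifier and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import partial_dependence

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins for two label-encoded features
sex = rng.integers(0, 2, size=n)
education = rng.integers(0, 5, size=n)
X_toy = np.column_stack([sex, education]).astype(float)

# Synthetic binary target that depends on both features
y_toy = ((education + 2 * sex + rng.normal(scale=0.5, size=n)) > 3).astype(int)

clf = GradientBoostingClassifier(random_state=0).fit(X_toy, y_toy)

# Two-way partial dependence over the grid of (sex, education) values
pd_result = partial_dependence(clf, X_toy, features=[0, 1], kind="average")
grid_average = pd_result["average"][0]  # rows: sex values, columns: education values
print(grid_average.shape)
```

Each cell of `grid_average` is the model's average response when both features are forced to the corresponding pair of values, which is the quantity the two-feature plot displays.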