### What about the sociodemographic background?

So far, the linear regression models rely almost entirely on the features `salary beginn`, `jobcatagory` and `education degree` as the most significant ones. But what about `gender` and the `minority` classification?

In [15]:
import pandas as pd
import numpy as np

import seaborn as sns
import plotly.express as plx

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import statsmodels.api as sms
import scipy.stats as stats

In [16]:
%store -r df

In [3]:
# Mean salary per gender; selecting SALARY first avoids averaging non-numeric columns
df.groupby("GENDER_DENOTING")["SALARY"].mean()

GENDER_DENOTING
Female    26031.921296
Male      41441.782946
Name: SALARY, dtype: float64

How can this difference in mean salary between female and male employees be explained?

In [4]:
# Mean salary per minority classification
df.groupby("MINORITY_DENOTING")["SALARY"].mean()

MINORITY_DENOTING
Minority    28713.942308
White       36023.310811
Name: SALARY, dtype: float64

The same holds for the difference in mean salary between minority and white employees.
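One way to approach this question is to compare the raw gap in group means with the gap that remains once another feature is held fixed. The following is a minimal sketch on synthetic data, not on `df`: the arrays `group`, `start` and `salary` are hypothetical, and the data are deliberately constructed so that group membership only affects the starting salary. It illustrates the technique and says nothing about whether the same mechanism is at work in the bank data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Hypothetical data, NOT the bank data set: group membership (0/1) shifts
# the starting salary, and the current salary depends on the starting
# salary alone.
group = rng.integers(0, 2, size=n)
start = 12000 + 6000 * group + rng.normal(0, 2000, size=n)
salary = 2000 + 1.9 * start + rng.normal(0, 1500, size=n)

# Raw gap: difference in group means, ignoring every other feature.
raw_gap = salary[group == 1].mean() - salary[group == 0].mean()

# Adjusted gap: coefficient of `group` in the least-squares fit
# salary ~ intercept + group + start.
X = np.column_stack([np.ones(n), group, start])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
adjusted_gap = coef[1]

print(f"raw gap:      {raw_gap:.0f}")
print(f"adjusted gap: {adjusted_gap:.0f}")
```

In this constructed example the raw gap is large while the adjusted gap is close to zero, because the whole difference runs through the starting salary. On the real data the adjusted gap may well stay large, which is exactly what such a comparison is meant to reveal.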

In [17]:
# Mean salary per combined gender/minority group
df.groupby("SOCIODEMOGRAPHY_DENOTING")["SALARY"].mean()

SOCIODEMOGRAPHY_DENOTING
Minority_Female    23062.500000
Minority_Male      32246.093750
White_Female       26706.789773
White_Male         44475.412371
Name: SALARY, dtype: float64

Here we can see that white male employees benefit most from the bank's salary policy: on average they earn almost twice as much (about 1.9 times) as employees classified as minority and female.
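The size of this gap can be checked directly from the group means printed above. A small sketch, with the values copied by hand from the output (in the notebook the same Series would come from the `groupby` call); the name `ratio_to_lowest` is only illustrative.

```python
import pandas as pd

# Mean salaries copied from the groupby output above.
mean_salary = pd.Series(
    {
        "Minority_Female": 23062.500000,
        "Minority_Male": 32246.093750,
        "White_Female": 26706.789773,
        "White_Male": 44475.412371,
    },
    name="SALARY",
)

# Express every group mean as a multiple of the lowest group mean.
ratio_to_lowest = (mean_salary / mean_salary.min()).round(2)
print(ratio_to_lowest)
```

For `White_Male` this gives a factor of 1.93 relative to `Minority_Female`.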

Looking at the median instead of the mean narrows the gaps, but the ordering of the groups stays the same:

In [22]:
# Median salary per gender ("50%" is the median row of describe())
df.groupby("GENDER_DENOTING")["SALARY"].describe()["50%"]

GENDER_DENOTING
Female    24300.0
Male      32850.0
Name: 50%, dtype: float64

In [23]:
# Median salary per minority classification
df.groupby("MINORITY_DENOTING")["SALARY"].describe()["50%"]

MINORITY_DENOTING
Minority    26625.0
White       29925.0
Name: 50%, dtype: float64

In [24]:
# Median salary per combined gender/minority group
df.groupby("SOCIODEMOGRAPHY_DENOTING")["SALARY"].describe()["50%"]

SOCIODEMOGRAPHY_DENOTING
Minority_Female    23775.0
Minority_Male      29025.0
White_Female       24450.0
White_Male         36000.0
Name: 50%, dtype: float64
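To see how much the choice of statistic matters, the means and medians reported above can be put side by side. This is a sketch using the printed values, typed in by hand; the names `summary` and `relative` are illustrative. In the notebook itself, something like `df.groupby("SOCIODEMOGRAPHY_DENOTING")["SALARY"].agg(["mean", "median"])` should yield the same two columns.

```python
import pandas as pd

# Mean and median salaries copied from the outputs above.
summary = pd.DataFrame(
    {
        "mean": [23062.500000, 32246.093750, 26706.789773, 44475.412371],
        "median": [23775.0, 29025.0, 24450.0, 36000.0],
    },
    index=["Minority_Female", "Minority_Male", "White_Female", "White_Male"],
)

# Gap of each group relative to Minority_Female, under both statistics.
relative = (summary / summary.loc["Minority_Female"]).round(2)

# A mean well above the median hints at a right-skewed salary distribution,
# i.e. a few high earners pulling the mean upwards.
summary["mean_minus_median"] = summary["mean"] - summary["median"]

print(relative)
print(summary)
```

For white male employees the factor relative to `Minority_Female` drops from 1.93 (mean) to 1.51 (median), and their mean lies far above their median, which suggests that a comparatively small number of high salaries drives part of the difference in means.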