**This notebook demonstrates the use of [Equity Lens](https://github.com/xie1027/DEI_Toolbox).**





 

**`Date created`**: June 2, 2022

**`Date updated`**: September 5, 2022

**`Version`**: EquityLens 0.0.1


# Install the package

In [1]:
# private
#!pip install git+ssh://git@github.com/Citi-Ventures/EquityLens.git

# public
!pip3 install git+https://github.com/Citi-Ventures/EquityLens.git 

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/Citi-Ventures/EquityLens.git
  Cloning https://github.com/Citi-Ventures/EquityLens.git to /tmp/pip-req-build-rqpa8ryx
  Running command git clone -q https://github.com/Citi-Ventures/EquityLens.git /tmp/pip-req-build-rqpa8ryx
Building wheels for collected packages: EquityLens
  Building wheel for EquityLens (setup.py) ... done
  Created wheel for EquityLens: filename=EquityLens-0.0.1-py3-none-any.whl size=6661 sha256=166f420769f538b26aebd46b35fa822972582b076c41e7b86a2347217b4abe39
  Stored in directory: /tmp/pip-ephem-wheel-cache-6iddfl3x/wheels/58/53/d4/94daa04b8e3cef2a9918357fcdba9d69a67e0b39120213a649
Successfully built EquityLens
Installing collected packages: EquityLens
Successfully installed EquityLens-0.0.1


In [2]:
import EquityLens
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)

# **Feature 1:** Evaluate a company's DEI efforts in three dimensions


In [3]:
# Private Repo
# replace this by your token
# token = 'replace-with-your-token'
# EquityLens.functions.company_dei_score('1PM Industries, Inc.', token)
# EquityLens.functions.company_dei_score('Citigroup',token)

In [5]:
# Public Repo
EquityLens.functions.company_dei_score('1PM Industries, Inc.')

## 1PM Industries, Inc. 
      Diversity and Inclusion Index, 1-100:

+----------------------------+-------+
|                            | Score |
+----------------------------+-------+
| DEI                        |   42  |
| Gender Diversity           |   39  |
| Racial Diversity           |   44  |
| Gender Attrition Inclusion |   41  |
| Racial Attrition Inclusion |   55  |
| Gender Resource Disparity  |   39  |
| Racial Resource Disparity  |   59  |
+----------------------------+-------+


In [6]:
EquityLens.functions.company_dei_score('Citigroup')

## Citigroup Inc. 
      Diversity and Inclusion Index, 1-100:

+----------------------------+-------+
|                            | Score |
+----------------------------+-------+
| DEI                        |   98  |
| Gender Diversity           |   92  |
| Racial Diversity           |   95  |
| Gender Attrition Inclusion |   74  |
| Racial Attrition Inclusion |   80  |
| Gender Resource Disparity  |   92  |
| Racial Resource Disparity  |   99  |
+----------------------------+-------+


In [7]:
EquityLens.functions.company_dei_score('facebook')

## Facebook, Inc. 
      Diversity and Inclusion Index, 1-100:

+----------------------------+-------+
|                            | Score |
+----------------------------+-------+
| DEI                        |   98  |
| Gender Diversity           |   62  |
| Racial Diversity           |   98  |
| Gender Attrition Inclusion |   74  |
| Racial Attrition Inclusion |   80  |
| Gender Resource Disparity  |   62  |
| Racial Resource Disparity  |   97  |
+----------------------------+-------+




---






# **Feature 2:** Evaluate a data sample's representation and generate proxies for protected-class indicators

In [8]:
company_list = ['FLWS', 'ATNF', 'RETC', 'ONCP', 'RTNB', 'C']
EquityLens.functions.sample_dei_score(company_list)

### Diversity and Inclusion Median Scores, 1-100:

+----------------------------+-------+
|                            | Score |
+----------------------------+-------+
| Gender Diversity           |   73  |
| Racial Diversity           |   35  |
| Gender Attrition Inclusion |   49  |
| Racial Attrition Inclusion |   46  |
| Gender Resource Disparity  |   73  |
| Racial Resource Disparity  |   43  |
+----------------------------+-------+


<font color='green'>Pre-processing: Potential biases in the dataset

1.   Lack of racial diversity
2.   Attrition rates have large variance across groups
3.   Disparity in racial resources
   
</font> 









# **Feature 3:** Compute DEI scores

This function takes a list of group-level aggregate statistics and calculates a DEI score under a chosen definition of diversity and inclusion.

Input: 
1.   *var_list* (list): a list of group-level values, one per group
2.   *definition* (str): 'variety', 'separation', or 'disparity'
3.   *industry* (str): industry name

Output: DEI score



In [9]:
var_list = [0.11853823, 0.061011526, 0.114244634, 2.617285942]
print(EquityLens.functions.compute_score(var_list, 'variety' , 'Finance'))
print(EquityLens.functions.compute_score(var_list, 'separation' , 'Tech'))
print(EquityLens.functions.compute_score(var_list, 'disparity' , 'Health Care'))

0.8119773001900322
1.2599489516147993
1.7312458715272572
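
The three values printed above are consistent with standard dispersion measures applied directly to `var_list`: the sum of squared shares for 'variety', the sample standard deviation for 'separation', and the coefficient of variation for 'disparity'. The sketch below is inferred from the printed output only. The helper names are hypothetical, the `industry` argument is ignored, and it should not be read as the EquityLens implementation.

```python
import statistics


def variety(values):
    """Sum of squared shares (a Herfindahl-style concentration measure)."""
    total = sum(values)
    return sum((v / total) ** 2 for v in values)


def separation(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    return statistics.stdev(values)


def disparity(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    return statistics.stdev(values) / statistics.mean(values)


var_list = [0.11853823, 0.061011526, 0.114244634, 2.617285942]

# Hypothetical stand-ins for the three `definition` options.
for name, measure in [("variety", variety),
                      ("separation", separation),
                      ("disparity", disparity)]:
    print(name, measure(var_list))
```

Under this reading, all three measures grow as one group dominates the others, so a higher raw value would indicate a less even distribution across groups.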


**Next step: calculate the index within its industry and compare it with the baseline**

# **Feature 4**: EquityLens

## **Equity Algorithm:**

1. Given a protected class P.
2. Identify the top 10% of features whose means differ significantly between the groups of P.
3. Apply an equity correction, either 'premium' or 'swap':


*   *Premium*: add the difference in means (the premium) back to the feature for the unprivileged group.
*   *Swap*: a counterfactual transformation that replaces each value in the unprivileged group with the value of its closest neighbor in the privileged group (matching via kNN or synthetic control).
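
The two corrections can be sketched on a toy table as follows. This is a minimal illustration under stated assumptions, not the EquityLens implementation: the function names, the toy columns `tenure` and `age`, the coding of the unprivileged group as `1`, and the use of a plain 1-nearest-neighbor match on squared Euclidean distance are all hypothetical.

```python
import pandas as pd


def premium_correction(df, protected, feature, unprivileged=1):
    """Add the gap in group means (the 'premium') to the unprivileged group."""
    is_unpriv = df[protected] == unprivileged
    gap = df.loc[~is_unpriv, feature].mean() - df.loc[is_unpriv, feature].mean()
    corrected = df[feature].astype(float).copy()
    corrected.loc[is_unpriv] += gap
    return corrected


def swap_correction(df, protected, feature, match_on, unprivileged=1):
    """Replace each unprivileged value with that of its nearest privileged
    neighbor, where 'nearest' is measured on the `match_on` columns."""
    is_unpriv = df[protected] == unprivileged
    priv = df[~is_unpriv]
    corrected = df[feature].astype(float).copy()
    for idx, row in df[is_unpriv].iterrows():
        # Squared Euclidean distance from this row to every privileged row.
        distance = ((priv[match_on] - row[match_on]) ** 2).sum(axis=1)
        corrected.loc[idx] = priv.loc[distance.idxmin(), feature]
    return corrected


# Toy data (made up for illustration): sex 0 = privileged, 1 = unprivileged.
toy = pd.DataFrame({
    "sex": [0, 0, 0, 1, 1, 1],
    "tenure": [4.0, 3.0, 5.0, 2.0, 1.0, 3.0],
    "age": [40, 30, 50, 31, 29, 52],
})
toy["tenure_premium"] = premium_correction(toy, "sex", "tenure")
toy["tenure_swap"] = swap_correction(toy, "sex", "tenure", match_on=["age"])
print(toy)
```

After the premium correction the two group means of the feature coincide by construction, which is the property a follow-up t-test on the transformed feature would check.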

## **Example: German Credit Data**

In [10]:
import pandas as pd
import numpy as np
from scipy import stats
from IPython.display import Markdown, display

In [11]:
# use german credit data: https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)
german = pd.read_csv('https://raw.githubusercontent.com/Citi-Ventures/EquityLens/main/EquityLens/datasets/german.csv')

In [12]:
# Simple feature engineering: convert categorical variables to numeric codes.
# Values missing from a mapping become NaN.
# Protected class: sex (0 = male, 1 = female)
german['sex'] = german['Personal status and sex'].map({'A91': 0, 'A93': 0, 'A94': 0, 'A92': 1, 'A95': 1})

german['foreign_worker_n'] = german['foreign worker'].map({'A201': 1, 'A202': 0})
german['checking_n'] = german['checking'].map({'A14': 0, 'A11': 1, 'A12': 2, 'A13': 3})
german['employment_since_n'] = german['employment since'].map({'A71': 0, 'A72': 1, 'A73': 2, 'A74': 3, 'A75': 4})
german['Property_n'] = german['Property'].map({'A124': 0, 'A123': 1, 'A122': 2, 'A121': 3})
german['Credit_history_n'] = german['Credit history'].map({'A34': 0, 'A33': 1, 'A32': 2, 'A31': 3, 'A30': 4})



**1. T-test on the protected class**

In [13]:
# label map: 1.0 = 'Good Credit', 2.0 = 'Bad Credit'
protected_class = 'sex'
outcome_var = 'label'
df = german

p_val, diff = EquityLens.functions.ttest_var(protected_class, outcome_var, df)
display(Markdown("#### Original dataset"))
print("T-test result between unprivileged and privileged groups:  p-value = %g" % ( round(p_val,2)),  '; Diff:', round(diff,2))

#### Original dataset

T-test result between unprivileged and privileged groups:  p-value = 0.02 ; Diff: 0.07
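
A minimal sketch of what a group-wise t-test such as `ttest_var` might compute, assuming it returns a p-value together with the difference in means (unprivileged minus privileged). The function name `ttest_by_group`, the toy data, the 0/1 group coding, and the choice of Welch's unequal-variance test are assumptions made for illustration, not the EquityLens implementation.

```python
import pandas as pd
from scipy import stats


def ttest_by_group(protected, outcome, df):
    """Two-sample t-test of `outcome` between protected == 1 and protected == 0."""
    unpriv = df.loc[df[protected] == 1, outcome].dropna()
    priv = df.loc[df[protected] == 0, outcome].dropna()
    # Welch's t-test: does not assume equal variances in the two groups.
    result = stats.ttest_ind(unpriv, priv, equal_var=False)
    return result.pvalue, unpriv.mean() - priv.mean()


# Toy data (made up): label 1 = good credit, 2 = bad credit.
toy = pd.DataFrame({
    "sex":   [0, 0, 0, 0, 1, 1, 1, 1],
    "label": [1, 1, 2, 1, 2, 2, 1, 2],
})
p_val, diff = ttest_by_group("sex", "label", toy)
print("p-value:", round(p_val, 2), "; Diff:", round(diff, 2))
```

With labels coded 1 (good) and 2 (bad), a positive difference under this convention means the unprivileged group has a higher share of bad-credit labels.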


**2. Equity Lens: Identify the bias and Root Cause Analysis**

In [14]:
import warnings
warnings.filterwarnings("ignore")

protected_class = 'sex'
features = ['checking_n', 'Duration in month', 'Credit_history_n',  'Credit amount', 'employment_since_n', 'Installment rate ',  
            'Present residence since', 'Property_n', 'Number of existing credits', 'Number of people',  'foreign_worker_n']
df = german

new_features = EquityLens.functions.Equity_Lens(protected_class, features, df)

The differences in sex are significant
Most differences in sex come from:

   Number of people
employment_since_n
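
The selection step (keep the features whose group means differ most significantly) could look roughly like the sketch below. Everything here is an assumption for illustration: the function name `top_differing_features`, the 0.05 significance level, ranking by Welch t-test p-value, and rounding the 10% share up to a whole number of features. It is not how `Equity_Lens` is known to work internally.

```python
import math

import pandas as pd
from scipy import stats


def top_differing_features(protected, features, df, alpha=0.05, top_share=0.10):
    """Return the most significantly differing features, most significant first."""
    significant = []
    for feature in features:
        unpriv = df.loc[df[protected] == 1, feature].dropna()
        priv = df.loc[df[protected] == 0, feature].dropna()
        p_value = stats.ttest_ind(unpriv, priv, equal_var=False).pvalue
        if p_value < alpha:
            significant.append((feature, p_value))
    significant.sort(key=lambda item: item[1])
    # Keep at most top_share of all candidate features, rounded up.
    keep = math.ceil(top_share * len(features))
    return [name for name, _ in significant[:keep]]


# Toy data (made up): 'tenure' differs sharply by group, 'age' barely at all.
toy = pd.DataFrame({
    "sex":    [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
    "tenure": [5, 6, 5, 6, 5, 1, 2, 1, 2, 1],
    "age":    [30, 40, 50, 35, 45, 31, 41, 49, 36, 44],
})
selected = top_differing_features("sex", ["tenure", "age"], toy)
print(selected)
```

Under the round-up assumption, a list of 11 candidate features would keep at most 2.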


**Root Cause Analysis**

In [15]:
display(Markdown("#### Top features that show variations:"))
for key in new_features.keys():
    print(key)
    german[key] = new_features[key]

#### Top features that show variations:

Number of people_new
employment_since_n_new


**3. Equity Correction (premium): Counterfactual Prediction**

In [16]:
p_val, diff = EquityLens.functions.ttest_var('sex', 'employment_since_n', german)
display(Markdown("#### Original dataset in protected class {}:".format('[Sex]')))
print("ttest_ind_from_stats:  p-value = %g" % ( round(p_val,2)),  '\nDiff:', round(diff,2))

#### Original dataset in protected class [Sex]:

ttest_ind_from_stats:  p-value = 0 
Diff: -0.51


**Use the new feature**

In [17]:
p_val, diff = EquityLens.functions.ttest_var('sex', 'employment_since_n_new', german)
display(Markdown("#### Transformed dataset in protected class {}:".format('[Sex]')))
print("ttest_ind_from_stats:  p-value = %g" % ( round(p_val,2)),  '\nDiff:', round(diff,2))

#### Transformed dataset in protected class [Sex]:

ttest_ind_from_stats:  p-value = 1 
Diff: 0.0


**<h1><center>Voilà!</center></h1>**


