<a href="https://colab.research.google.com/github/alexandershopski/equalitychecker/blob/main/AD_Hackathon_Team_9.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Questions to answer, given hiring funnel data
1. How biased is the hiring funnel?
2. Which step in the flow introduces most of the bias?
3. What can I do to reduce the bias introduced by this step?

In [None]:
import pandas as pd
from scipy import stats
from plotly import graph_objects as go

In [None]:
# Data courtesy of Omer Koren, CEO of Webiks: https://webiks.com/
# Trans & Non-Binary data added for illustration

recruitment_finnel_dict = {
  'Fullstack Dev': {
    'Male': {
      'CV': 1448,
      'Phone Interview': 40,
      'Professional Interview 1': 23,
      'Professional Interview 2': 13,
      'CEO Interview': 11,
      'Offered': 8,
      'Signed': 4,
    },
    'Female': {
      'CV': 493,
      'Phone Interview': 24,
      'Professional Interview 1': 12,
      'Professional Interview 2': 2,
      'CEO Interview': 2,
      'Offered': 2,
      'Signed': 1,
    },
    'Non-Binary': {
      'CV': 50,
      'Phone Interview': 2,
      'Professional Interview 1': 1,
      'Professional Interview 2': 1,
      'CEO Interview': 1,
      'Offered': 0,
      'Signed': 0,
    },
    'Trans': {
      'CV': 50,
      'Phone Interview': 2,
      'Professional Interview 1': 1,
      'Professional Interview 2': 1,
      'CEO Interview': 1,
      'Offered': 1,
      'Signed': 1,
    },
  },
  'Data Scientist': {
    'Male': {
      'CV': 22,
      'Phone Interview': 7,
      'Professional Interview 1': 1,
      'Professional Interview 2': 0,
      'CEO Interview': 3,
      'Offered': 0,
      'Signed': 0,
    },
    'Female': {
      'CV': 5,
      'Phone Interview': 0,
      'Professional Interview 1': 0,
      'Professional Interview 2': 0,
      'CEO Interview': 0,
      'Offered': 0,
      'Signed': 0,
    },
  },
}
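A funnel count should never grow from one stage to the next, yet in the Data Scientist data the Male `CEO Interview` count (3) is higher than the `Professional Interview 2` count (0). A minimal consistency check, sketched here as a hypothetical helper `non_monotonic_stages` (not part of the original notebook), flags such stages before any bias metric is computed:

```python
import pandas as pd

def non_monotonic_stages(role_dict):
    """Hypothetical helper: list (group, stage) pairs where the count
    is higher than at the previous stage of the funnel."""
    problems = []
    for group, counts in role_dict.items():
        series = pd.Series(counts)        # dict insertion order keeps the funnel order
        increases = series.diff() > 0     # first stage diffs to NaN, which compares as False
        for stage in series[increases].index:
            problems.append((group, stage))
    return problems

# Data Scientist counts, copied from the data above
data_scientist = {
    'Male': {'CV': 22, 'Phone Interview': 7, 'Professional Interview 1': 1,
             'Professional Interview 2': 0, 'CEO Interview': 3, 'Offered': 0, 'Signed': 0},
    'Female': {'CV': 5, 'Phone Interview': 0, 'Professional Interview 1': 0,
               'Professional Interview 2': 0, 'CEO Interview': 0, 'Offered': 0, 'Signed': 0},
}

print(non_monotonic_stages(data_scientist))  # [('Male', 'CEO Interview')]
```

A non-empty result suggests either a data-entry error or a stage that candidates can enter without passing the previous one; either way, stage-to-stage rates for that role should be read with care.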

In [None]:
fullstack_flow_df = pd.DataFrame(recruitment_finnel_dict['Fullstack Dev'])

fullstack_flow_df.eval("Percent_Female = Female * 100.0 / (Female + Male)")

| Stage | Male | Female | Non-Binary | Trans | Percent_Female |
|---|---|---|---|---|---|
| CV | 1448 | 493 | 50 | 50 | 25.399279 |
| Phone Interview | 40 | 24 | 2 | 2 | 37.5 |
| Professional Interview 1 | 23 | 12 | 1 | 1 | 34.285714 |
| Professional Interview 2 | 13 | 2 | 1 | 1 | 13.333333 |
| CEO Interview | 11 | 2 | 1 | 1 | 15.384615 |
| Offered | 8 | 2 | 0 | 1 | 20.0 |
| Signed | 4 | 1 | 0 | 1 | 20.0 |


1. Only 20% of the hires (1 of 5) are Female, even though 25.4% of applicants were Female (counting Male and Female candidates only).
2. The original data has no representation of Non-Binary genders (the Non-Binary and Trans columns were added for illustration), which points either to the data collection process or to a lack of participation by non-binary people in the process.
3. What are the reasons for disqualifying Male vs. Female candidates?
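The `Percent_Female` column compares Female candidates with Male candidates only. A minimal sketch (not in the original notebook) of each group's share of all candidates at every stage, across all four columns including the illustrative Non-Binary and Trans ones, divides each row by its total:

```python
import pandas as pd

# Fullstack Dev funnel counts, copied from the data above
stages = ['CV', 'Phone Interview', 'Professional Interview 1',
          'Professional Interview 2', 'CEO Interview', 'Offered', 'Signed']
fullstack = pd.DataFrame({
    'Male':       [1448, 40, 23, 13, 11, 8, 4],
    'Female':     [493, 24, 12, 2, 2, 2, 1],
    'Non-Binary': [50, 2, 1, 1, 1, 0, 0],
    'Trans':      [50, 2, 1, 1, 1, 1, 1],
}, index=stages)

# Divide each row by its total so that every stage sums to 100%
shares = fullstack.div(fullstack.sum(axis=1), axis=0) * 100.0
print(shares.round(1))
```

Because the denominator now includes every group, these Female shares are slightly lower than the `Percent_Female` values, which leave out the Non-Binary and Trans columns.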

In [None]:
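def generate_funnel(flow_df):
  """Plot one funnel trace per column (group) of flow_df, with one bar per stage."""
  fig = go.Figure()

  for gender in flow_df.columns:
    fig.add_trace(go.Funnel(
        name = gender,
        y = flow_df.index,
        x = flow_df[gender],
        # Label each bar with its count and its % of the previous stage
        textinfo = 'value+percent previous',
    ))

  fig.show()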

In [None]:
generate_funnel(fullstack_flow_df)

In [None]:
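def hiring_prob(flow_df, gender):
  """Percentage of `gender` candidates who went from the first stage (CV) to the last (Signed)."""
  # Use the flow_df argument (not the global DataFrame) and positional .iloc access
  top_of_funnel = flow_df[gender].iloc[0]
  end_of_funnel = flow_df[gender].iloc[-1]

  return end_of_funnel * 100.0 / top_of_funnel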

In [None]:
for gender in fullstack_flow_df.columns:
  print(f"Prob. of getting hired, given that a {gender} person sent a CV: {round(hiring_prob(fullstack_flow_df, gender),2)}%")

Prob. of getting hired, given that a Male person sent a CV: 0.28%
Prob. of getting hired, given that a Female person sent a CV: 0.2%
Prob. of getting hired, given that a Non-Binary person sent a CV: 0.0%
Prob. of getting hired, given that a Trans person sent a CV: 2.0%


In [None]:
def compute_score(flow_df, marginalized_group, hegemonic_group):
  marg_group_prob = hiring_prob(flow_df, marginalized_group)
  heg_group_prob = hiring_prob(flow_df, hegemonic_group)

  return marg_group_prob * 100.0 / heg_group_prob
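Written out, `compute_score` divides the two hiring probabilities returned by `hiring_prob` (the factors of 100 inside each probability cancel). For the Fullstack Dev data, Female vs. Male, it works out as:

$$
\text{score} = 100 \cdot \frac{P(\text{hired} \mid \text{CV},\ \text{Female})}{P(\text{hired} \mid \text{CV},\ \text{Male})}
= 100 \cdot \frac{1/493}{4/1448}
= 100 \cdot \frac{1448}{4 \cdot 493}
= 100 \cdot \frac{1448}{1972}
\approx 73.43
$$

A score of 100 means both groups are hired at the same rate; a score below 100 means the first group is hired less often than the baseline group.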

In [None]:
check_disc = 'Female'
baseline_reference = 'Male'

discrimination_score = compute_score(fullstack_flow_df, check_disc, baseline_reference)
print(f"Your score: {round(discrimination_score,2)}")
print(f"How to interpret your score? In your hiring flow, {check_disc} candidates who send a CV are {round(100-discrimination_score,2)}% less likely to get hired than {baseline_reference} candidates.")

Your score: 73.43
How to interpret your score? In your hiring flow, Female candidates who send a CV are 26.57% less likely to get hired than Male candidates.
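The score rests on very small counts (1 Female and 4 Male hires). As a sketch of one way to gauge whether a gap like this could be due to chance, assuming Fisher's exact test is an appropriate choice (it is not part of the original notebook), `scipy.stats.fisher_exact` can be run on a 2x2 table of hired vs. not hired:

```python
from scipy import stats

# Fullstack Dev counts from the data above: CVs received and contracts signed
female_cv, female_signed = 493, 1
male_cv, male_signed = 1448, 4

# 2x2 contingency table: rows = group, columns = (hired, not hired)
table = [[female_signed, female_cv - female_signed],
         [male_signed, male_cv - male_signed]]

# Two-sided test of whether the odds of being hired differ between the groups
odds_ratio, p_value = stats.fisher_exact(table, alternative='two-sided')
print(f"Odds ratio: {odds_ratio:.2f}, p-value: {p_value:.3f}")
```

With counts this small, a large p-value means the data cannot distinguish the observed gap from chance; it does not show that the funnel is unbiased. Pooling data across more hiring rounds would tighten the estimate.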


# Remaining questions
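1. How many "eligible" candidates are there before the CV stage, and how much does the drop-off in responding to the job ad differ between groups?
2. What happens around "Professional Interview 2"? Only 2 of the 12 Female candidates who reached Professional Interview 1 made it to Professional Interview 2, compared with 13 of 23 Male candidates.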
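One way to start on "Which step in the flow introduces most of the bias?" is to compare stage-to-stage pass-through rates per group. The sketch below (a hypothetical helper `stage_pass_rates`, not part of the original notebook) divides each stage's count by the previous stage's count and then takes the Female-to-Male ratio of those rates; in this data the lowest ratio falls on the Professional Interview 1 -> Professional Interview 2 transition:

```python
import pandas as pd

# Fullstack Dev funnel counts for Male and Female candidates, copied from the data above
stages = ['CV', 'Phone Interview', 'Professional Interview 1',
          'Professional Interview 2', 'CEO Interview', 'Offered', 'Signed']
funnel = pd.DataFrame({
    'Male':   [1448, 40, 23, 13, 11, 8, 4],
    'Female': [493, 24, 12, 2, 2, 2, 1],
}, index=stages)

def stage_pass_rates(flow_df):
    """Hypothetical helper: % of candidates at each stage who reach the next stage."""
    idx = list(flow_df.index)
    # shift(-1) aligns each stage with the count at the following stage
    rates = (flow_df.shift(-1) / flow_df * 100.0).iloc[:-1].copy()
    rates.index = [f"{a} -> {b}" for a, b in zip(idx[:-1], idx[1:])]
    return rates

rates = stage_pass_rates(funnel)
# Below 100: Female candidates advance less often than Male candidates at that step
rates['Female_vs_Male'] = rates['Female'] * 100.0 / rates['Male']
print(rates.round(1))
print("Step with the lowest ratio:", rates['Female_vs_Male'].idxmin())
```

With only a handful of candidates in the later stages, a single person changes these rates a lot, so the ratio points at where to look rather than proving where the bias comes from.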

## Biases in job ads (based on the [Gender Decoder](http://gender-decoder.katmatfield.com/))

### Methodology
1. Get the job ad.
2. If the ad is not in English, translate it to English (via Google Translate).
3. Feed the ad to Gender Decoder and identify biases.
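Step 3 delegates the analysis to Gender Decoder, whose general idea is to count masculine- and feminine-coded words in the ad text. The toy sketch below only illustrates that idea: the word stems and the classification rule are hypothetical placeholders, not the tool's actual word lists, thresholds, or "strongly"/"subtly" categories.

```python
import re

# Hypothetical, tiny word-stem lists for illustration only (not Gender Decoder's lists)
MASCULINE_STEMS = ["compet", "dominant", "leader", "aggress", "ambitio"]
FEMININE_STEMS = ["collab", "support", "commit", "understand", "interpersonal"]

def count_stems(text, stems):
    """Count words in text that start with any of the given stems."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(1 for w in words for s in stems if w.startswith(s))

def code_ad(text):
    """Hypothetical rule: label the ad by which kind of coded word appears more often."""
    diff = count_stems(text, MASCULINE_STEMS) - count_stems(text, FEMININE_STEMS)
    if diff > 0:
        return "masculine-coded"
    if diff < 0:
        return "feminine-coded"
    return "neutral"

ad = "We want an ambitious, competitive leader who thrives in a supportive team."
print(code_ad(ad))  # masculine-coded
```

Prefix matching is used so that one stem covers related word forms (e.g. "compet" matches both "competitive" and "competition").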

### Use case 1: DS @ the Ministry of Justice
1. Ad: [DS @ the Ministry of Justice](https://www.linkedin.com/jobs/view/2228989246/?refId=2c659612-86f0-3204-854d-b67d06d53b1a)
2. Result: the Ministry of Justice ad is **strongly masculine-coded** (see [full report](http://gender-decoder.katmatfield.com/results/e2fcad8c-bea6-4539-99f1-0e14c6589d50))

### Use case 2: Computer Vision Engineer @ Webiks
1. Ad: [Computer Vision Engineer @ Webiks](https://www.linkedin.com/jobs/view/2227800643/?refId=3845504191604142176566&trackingId=19bD1N3ahvrh%2BBsENcDXkw%3D%3D)
2. Result: the Webiks ad is **subtly feminine-coded** (see [full report](http://gender-decoder.katmatfield.com/results/b844b2db-2d94-4e51-a4b3-aae06f00c37f))