# Week 4 Activity: Reflection on a Hiring Filter Algorithm 
Activity adapted from exercise developed by [Evan Peck](https://evanpeck.github.io/)

## Scenario: Moogle’s Hiring Filter
Imagine you are working for Moogle, a well-known tech company that receives tens of thousands of job applications from graduating seniors every year. Since the company receives too many job applications for HR to individually assess in a reasonable amount of time, you are asked to create a program that algorithmically analyzes applications and selects the ones most worth passing onto HR.



### Applicant Data
It’s difficult to create these first-pass cuts, so Moogle designs their application forms to get some numerical data about their applicants’ education. Job applicants must enter the grades they received in 6 core CS courses, as well as their overall GPA. For your convenience, this will be stored in a Python `list` that you can access.

For example, a student who received the following scores. . .
- Intro to CS: 100
- Data Structures: 95
- Software Engineering: 80
- Algorithms: 89
- Computer Organization: 91
- Operating Systems: 75
- Overall GPA: 83

. . . would result in the following list: `[100, 95, 80, 89, 91, 75, 83]`. You can assume that index `0` is always Intro to CS, `1` is always Data Structures, and so on.

Because you are processing many applications, your program will receive a list of lists. For example, this would be the information for 3 applicants:

`[ [100, 95, 80, 89, 91, 75, 83], [75, 80, 85, 90, 85, 88, 90], [85, 70, 99, 100, 81, 82, 91] ]`
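
To make the list-of-lists layout concrete, here is a minimal sketch of reaching into it with two indexes: the first picks the applicant, the second picks the score. The names `COURSES` and `applicants` are made up for illustration; only the index order comes from the description above.

```python
# Hypothetical names for illustration; the index order follows the text above.
COURSES = [
    "Intro to CS",
    "Data Structures",
    "Software Engineering",
    "Algorithms",
    "Computer Organization",
    "Operating Systems",
    "Overall GPA",
]

applicants = [
    [100, 95, 80, 89, 91, 75, 83],
    [75, 80, 85, 90, 85, 88, 90],
    [85, 70, 99, 100, 81, 82, 91],
]

first_applicant = applicants[0]      # the inner list for the first applicant
data_structures = applicants[0][1]   # first applicant, index 1 (Data Structures)
third_gpa = applicants[2][-1]        # third applicant, last item (Overall GPA)

print(data_structures)  # 95
print(third_gpa)        # 91

# Pair each score with its course name for one applicant.
for course, score in zip(COURSES, first_applicant):
    print(course, score)
```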

Remember that a list is a collection of items that you can access by selecting the index of the item you want. You can also change a list, iterate over a list, and more. For this exercise we will be focusing on how to access items within a list and iterate over a list so that you can access each item individually. Let's practice that again here. 

In [5]:
## Given this example list below, how would you access the last item in the list?

example_list = ['apple', 'banana', 'cherry', 'blueberry', 'kiwi', 'mango']

# Your code here

last_fruit = example_list[-1]
print(last_fruit)

mango


In [6]:
## Okay, given that same list, how could you select the third item in the list?

# Your code here 

third_fruit = example_list[2]
print(third_fruit)

cherry


In [13]:
## Finally let's try to do something with the items in the list. Let's say we want to print each item in the list 
# along with how many characters are in that item. For example, for "apple" we would want to print "apple has 5 characters". 
# How could we do that?

# Your code here

for fruit in example_list:
    print(fruit, "has", len(fruit), "characters")

apple has 5 characters
banana has 6 characters
cherry has 6 characters
blueberry has 9 characters
kiwi has 4 characters
mango has 5 characters


Okay, back to the task at hand!

Your job is to:
1. Determine how you are going to select the top applicants to pass onto HR.
2. Given a list of applicant data (a list of lists), write code to identify a new list of worthwhile candidates.

### The Data 

Before we use the entire dataset of applications, we're going to write and test our code using a much smaller sample of the dataset. This will be saved in `sample_data` and contain only ten applicant lists. Notice how this is just a list of lists with each list being a unique applicant. 

In [16]:
sample_data = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]

In [17]:
len(sample_data)

10

### Algorithms 

Your supervisor at Moogle has some ideas on how they'd like to filter out applicants, but they're not sure how useful each idea is. We are first going to write algorithms to select applicants based on a variety of decisions. For each one, take note of how many applicants are passed onto the next stage of the application process. 

1. Selects applicants that have an overall GPA above 80
2. Selects applicants that have no grade below 65
3. Selects applicants that have at least 3 grades above 80
4. Selects applicants that have an average of the six classes above 85
5. Your own algorithm to select applicants 

#### Scenario 1: Applicants that have an overall GPA above 80. 

Let's walk through this one together. We want to select just the students who have an overall GPA above 80. Which index in the list contains the GPA?

In [18]:
gpa_index = -1 # which index contains the GPA?

In [22]:
## Now let's write a loop to go through each applicant in the sample data and check if their GPA is above 80. 
# If it is, we will add them to a new list called selection.

selection = list() ## create an empty list to hold the selected applicants

for applicant in sample_data:  ## loop through each applicant in the sample data
    gpa = applicant[gpa_index] ## get the GPA for the applicant using the gpa_index variable you defined above
    if gpa > 80: ## check if the GPA is above 80
        selection.append(applicant) ## if it is, add the applicant to the selection list

print(f"Number of applicants selected: {len(selection)}")


Number of applicants selected: 4


#### How many applicants did we select with this filter? 


    We selected 4 applicants.


What percentage of our total sample size is that?


    That's 40% of our total sample size.
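
The count and percentage above can also be computed in code rather than by hand. The following is a minimal sketch; the helper names `select` and `percent_kept` are made up for illustration and are not part of the original activity.

```python
sample_data = [
    [93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78],
    [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66],
    [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90],
    [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83],
    [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76],
]


def select(applicants, keep):
    """Return the applicants for which keep(applicant) is True."""
    return [applicant for applicant in applicants if keep(applicant)]


def percent_kept(selection, applicants):
    """Percentage of applicants that ended up in the selection."""
    return 100 * len(selection) / len(applicants)


def gpa_above_80(applicant):
    return applicant[-1] > 80  # the GPA is the last item in each list


selection = select(sample_data, gpa_above_80)
print(len(selection))                        # 4
print(percent_kept(selection, sample_data))  # 40.0
```

Passing the rule in as a function means the same two helpers can be reused for the other scenarios by swapping out `gpa_above_80`.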




#### Scenario 2: Applicants that have no grade below a 65. 
Let's try another potential algorithm where we select applicants that have no grade below a 65. 

Remember we want to collect our selected applicants in a list. 

In [38]:
## YOUR CODE HERE 
gpa_index = -1

selection = list() ## create a list to hold the selected applicants

for application in sample_data: ## loops through every applicant in sample data 
    grades = application[0:gpa_index] ## the six course grades, without the GPA
    below_65 = False ## reset the flag for each applicant

    for grade in grades:
        if grade < 65:
            below_65 = True ## one low grade is enough to rule the applicant out
            break

    if not below_65:
        selection.append(application)
    

print(f"Number of applicants selected: {len(selection)}")

selection

Number of applicants selected: 0


[]

How many applicants did we select with this filter? 


    No Applicants


What percentage of our total sample size is that?


    0%
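
The "no grade below 65" rule can also be written with Python's built-in `all()`, which removes the need for a flag variable. This is only a sketch of an alternative; the function name `no_grade_below` and its `cutoff` parameter are illustrative choices, not part of the activity.

```python
def no_grade_below(applicant, cutoff=65):
    """True if none of the six course grades is below the cutoff."""
    course_grades = applicant[:-1]  # drop the GPA, which is the last item
    return all(grade >= cutoff for grade in course_grades)


# First applicant from the sample data: the 63 and the 60 rule them out.
print(no_grade_below([93, 89, 63, 88, 60, 73, 80]))  # False

# An applicant from the earlier three-applicant example: every grade is 75+.
print(no_grade_below([75, 80, 85, 90, 85, 88, 90]))  # True

# A grade of exactly 65 is not "below 65", so it passes.
print(no_grade_below([65, 65, 65, 65, 65, 65, 65]))  # True
```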



#### Scenario 3: Applicants that have at least 3 grades above an 80.

**Hint**: A counter can be useful here! 

In [47]:
gpa_index = -1

selection = list() ## create a list to hold the selected applicants

for application in sample_data: ## loops through every applicant in sample data 
    grades = application[0:gpa_index]
    counter = 0
    
    for grade in grades:
        if grade > 80:
            counter += 1
           
    if counter >= 3:
        selection.append(application)
    

print(f"Number of applicants selected: {len(selection)}")

selection

Number of applicants selected: 6


[[93, 89, 63, 88, 60, 73, 80],
 [81, 91, 99, 78, 57, 87, 86],
 [81, 73, 100, 57, 91, 60, 66],
 [86, 89, 64, 81, 69, 93, 92],
 [78, 63, 88, 95, 59, 98, 90],
 [94, 78, 84, 83, 68, 63, 76]]

#### How many applicants did we select with this filter? 


    6 applicants


What percentage of our total sample size is that?


    60% of the total sample size
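
The counter pattern used for this scenario can be condensed with `sum()` over a generator expression. A minimal sketch, assuming the same list layout as above; `count_above` is a made-up helper name.

```python
def count_above(applicant, threshold=80):
    """Count how many of the six course grades are above the threshold."""
    return sum(1 for grade in applicant[:-1] if grade > threshold)


# First two applicants from the sample data.
print(count_above([93, 89, 63, 88, 60, 73, 80]))   # 3 (93, 89, 88)
print(count_above([100, 63, 57, 96, 58, 71, 78]))  # 2 (100, 96)
```

An applicant passes this scenario when `count_above(applicant) >= 3`.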



#### Scenario 4: Applicants that have an average of the six classes above 85.

In [49]:
## Your code here
gpa_index = -1

selection = list() ## create an empty list to hold the selected applicants

for applicant in sample_data:  ## loop through each applicant in the sample data
    grades = applicant[0:gpa_index] ## the six course grades, without the GPA
    average = sum(grades) / len(grades) ## average of the six classes
    if average > 85: ## check if the average is above 85
        selection.append(applicant) ## if it is, add the applicant to the selection list

print(f"Number of applicants selected: {len(selection)}")
selection


Number of applicants selected: 0


[]

#### How many applicants did we select with this filter? 


    0 applicants (the highest six-class average in the sample is about 82.2)


What percentage of our total sample size is that?


    0% of total sample size
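
Because this scenario is about the average of the six course grades rather than the `Overall GPA` entry, it can help to print the two side by side for each applicant. A minimal sketch (the variable names are illustrative):

```python
sample_data = [
    [93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78],
    [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66],
    [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90],
    [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83],
    [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76],
]

averages = []
for applicant in sample_data:
    course_grades = applicant[:-1]  # the six classes, without the GPA
    average = sum(course_grades) / len(course_grades)
    averages.append(average)
    print(f"course average: {average:.1f}   reported GPA: {applicant[-1]}")

print(f"highest course average: {max(averages):.1f}")  # 82.2
```

In this sample the highest six-class average is about 82.2, so a cutoff of 85 on the course average keeps nobody, while three applicants report a GPA above 85. The two numbers measure different things, so a filter should be explicit about which one it uses.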



Now let's take a look at the entire dataset of 5,000 applications. This is saved in another file so we're going to go ahead and load this into memory and then take a look at the data. Notice that it's formatted exactly the same as the sample data as a list of lists!

In [58]:
from applications import *

In [59]:
#applications

In the space below, work in groups to decide what types of criteria you want to use to write an algorithm to select applicants. First test it with the sample data, then run it with the entire set of applicants.

A useful piece of code that will give you the percentage of applicants you kept is: 

`print("Your algorithm kept", round(len(selection)/len(applications)*100), "percent of applicants")`

In [62]:
## your algorithm - your code here. 
gpa_index = -1

selection = list() ## create a list to hold the selected applicants

for application in applications: 
    grades = application[0:gpa_index] ## the six course grades, without the GPA
    below_75 = False ## reset the flag for each applicant

    for grade in grades:
        if grade < 75:
            below_75 = True ## one grade below 75 rules the applicant out
            break

    if not below_75:
        selection.append(application)

print("Percentage of applicants with no grade below 75:", round(len(selection)/len(applications)*100), "percent of applicants")
selection

Percentage of applicants with no grade below 75: 3 percent of applicants


[[82, 96, 79, 89, 87, 93, 61],
 [87, 100, 93, 82, 87, 81, 83],
 [93, 91, 83, 75, 94, 81, 89],
 [92, 85, 90, 99, 87, 85, 79],
 [93, 99, 92, 91, 88, 91, 100],
 [79, 95, 95, 99, 87, 75, 57],
 [96, 91, 85, 91, 92, 80, 93],
 [98, 94, 95, 95, 76, 99, 90],
 [98, 84, 98, 77, 91, 94, 94],
 [100, 75, 92, 81, 79, 99, 83],
 [83, 79, 90, 82, 82, 91, 60],
 [94, 78, 79, 100, 76, 76, 91],
 [77, 83, 88, 91, 95, 81, 94],
 [93, 82, 78, 97, 92, 86, 81],
 [87, 99, 87, 80, 89, 95, 61],
 [95, 80, 84, 79, 85, 78, 99],
 [90, 96, 76, 94, 75, 82, 55],
 [96, 92, 96, 87, 84, 98, 100],
 [98, 80, 77, 99, 76, 77, 87],
 [88, 86, 77, 82, 91, 99, 61],
 [75, 95, 80, 82, 85, 80, 78],
 [84, 76, 90, 85, 85, 87, 82],
 [82, 78, 84, 98, 75, 89, 84],
 [99, 95, 80, 90, 77, 84, 63],
 [81, 89, 100, 89, 79, 77, 66],
 [75, 97, 76, 77, 97, 96, 66],
 [79, 75, 88, 99, 92, 95, 70],
 [78, 100, 97, 99, 75, 85, 71],
 [94, 90, 87, 100, 89, 94, 83],
 [95, 100, 98, 89, 84, 87, 90],
 [92, 76, 98, 75, 77, 82, 97],
 [76, 80, 78, 86, 75, 86, 88],

Questions to Answer: 
1. What criteria did you choose to select finalists? How did you choose that criteria?

I chose to keep only applicants with no course grade below 75. This is a challenging course load, so staying at or above 75 in every class reflects well on an applicant overall. I looped over the six course grades for each applicant and, if none of them was below 75, added that applicant to the list.


2. Roughly what percentage of applicants does your algorithm pass on as finalists? Is that enough? If Moogle asked you to take a more aggressive approach with your algorithm, are there any tradeoffs?

Only 3% are finalists. I think that's pretty good overall: although 3% sounds low, with 5,000 applicants it is still roughly 150 finalists. If I had to take a more aggressive approach, an even lower percentage of applicants would be chosen, which isn't ideal because more potentially qualified applicants would be cut.

___
While our data seemed to be perfectly reported and without any inconsistencies, the world is less perfect. Consider the following scenarios: 

___
*Story 1*: Misread the Instructions
What if an excellent applicant thinks they should put in letter grades?

`['A', 'A', 'A', 'A', 'A', 'A', 'A']`

. . . or how about their grades on a 4-point scale?

`[4, 3.9, 4, 4, 3.95, 4, 3.9]`
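
One way to notice both kinds of entry before filtering is a rough format check. The sketch below is a heuristic under a stated assumption: that a list in which every value is 4 or less is more likely a 4-point scale than a set of percent grades. The function name `looks_like_percent_grades` is made up for illustration, and a flagged application would probably be better routed to a person than rejected.

```python
def looks_like_percent_grades(applicant):
    """Heuristic: every entry is a number and at least one is above 4."""
    if not applicant:
        return False
    for value in applicant:
        # Letter grades such as 'A' are strings, not numbers.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
    # Assumption: all values <= 4 suggests a 4-point scale.
    return max(applicant) > 4


print(looks_like_percent_grades(['A', 'A', 'A', 'A', 'A', 'A', 'A']))  # False
print(looks_like_percent_grades([4, 3.9, 4, 4, 3.95, 4, 3.9]))         # False
print(looks_like_percent_grades([100, 95, 80, 89, 91, 75, 83]))        # True
```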

___
*Story 2*: Bad Assumptions
What if one of your applicants skipped Intro to Computer Science? When they saw your form, they froze, and decided that putting -1 in the input field would make it obvious. . .

`[-1, 95, 99, 94, 96, 98, 95]`

___
*Story 3*: Mistake in the Input
What if one of your applicants accidentally put in a number > 100?

`[681, 68, 73, 70, 81, 91, 59]`

That might seem easy enough for a program to catch, but what if they accidentally dropped a 0?

`[100, 100, 100, 100, 100, 100, 10]`

A person would catch that mistake easily. Does your algorithm?
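
A simple range check illustrates the difference between these cases. The sketch below assumes valid scores lie between 0 and 100; `out_of_range` and `gpa_gap` are hypothetical helpers. Comparing the GPA with the course average is only a rough signal, since an overall GPA presumably covers more than these six courses.

```python
def out_of_range(applicant, low=0, high=100):
    """Return the values that fall outside the expected score range."""
    return [value for value in applicant if not low <= value <= high]


def gpa_gap(applicant):
    """Absolute gap between the six-course average and the reported GPA."""
    course_grades = applicant[:-1]
    return abs(sum(course_grades) / len(course_grades) - applicant[-1])


print(out_of_range([681, 68, 73, 70, 81, 91, 59]))       # [681]
print(out_of_range([-1, 95, 99, 94, 96, 98, 95]))        # [-1]
print(out_of_range([100, 100, 100, 100, 100, 100, 10]))  # []

# The dropped zero passes the range check but stands out here.
print(gpa_gap([100, 100, 100, 100, 100, 100, 10]))       # 90.0
```

The range check catches the 681 and the -1, but the GPA of 10 is a legal value, so only a consistency check (or a human reader) would notice it.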

___
*Story 4*: The Awful Semester
What if your applicant had a medical emergency one semester? Or a personal tragedy?

`[95, 93, 50, 91, 98, 90, 90]`
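
To see how much a single grade can drive the outcome, compare the lowest course grade with the median for this applicant. This is a small illustration using the standard library's `statistics.median`, not a recommendation that a median-based rule is fairer.

```python
from statistics import median

awful_semester = [95, 93, 50, 91, 98, 90, 90]
course_grades = awful_semester[:-1]  # the six classes, without the GPA

print(min(course_grades))     # 50
print(median(course_grades))  # 92.0
```

Any rule based on the minimum, such as "no grade below 65", rejects this applicant on the strength of one course, while the median barely registers it.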

___
*Story 5*: Inverse Trajectories
What if one of your applicants came from an underprivileged background and really struggled at the beginning of college. . . but showed extraordinary growth by the end?

`[65, 75, 85, 95, 100, 100, 80]`

What if one of your applicants came to college with extraordinary potential? They easily aced their first few classes and then gradually grew apathetic about their education - getting nothing but barely-passing grades by the time they were a senior?

`[100, 100, 95, 85, 75, 65, 80]`

Does your algorithm treat them equally?
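
The filters written earlier look only at which grades appear, not at their order, so they cannot tell these two applicants apart. The sketch below shows this and adds a rough `trend` measure. It assumes that the six courses are listed in the order they were taken, which the form does not guarantee; `trend` is a made-up helper.

```python
growth = [65, 75, 85, 95, 100, 100, 80]
decline = [100, 100, 95, 85, 75, 65, 80]

# Same six grades and same GPA, just in a different order.
print(sorted(growth[:-1]) == sorted(decline[:-1]))  # True
print(growth[-1] == decline[-1])                    # True


def trend(applicant):
    """Average of the last three courses minus average of the first three."""
    course_grades = applicant[:-1]
    first_half = course_grades[:3]
    second_half = course_grades[3:]
    return sum(second_half) / 3 - sum(first_half) / 3


print(round(trend(growth), 1))   # 23.3
print(round(trend(decline), 1))  # -23.3
```

Any rule built from thresholds, counts, or averages gives both applicants the same result; only something order-aware, like `trend`, separates them.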

___

Complete the following questions reflecting on these scenarios:

3. What systemic advantages/disadvantages are your algorithms likely to amplify?

The systemic disadvantage my algorithm is likely to amplify comes from judging an applicant on a single grade. One grade below 75 removes an applicant even when it is not reflected in their other grades, so it may simply have been their lowest grade in an otherwise strong record.



4. If you had access to additional data beyond grades (e.g., extracurricular activities, internships, letters of recommendation), how might you incorporate it into your selection process? Would it make your algorithm fairer or introduce new biases?


I would like to see the number of letters of recommendation and look at the positive characteristics mentioned in those letters to get a better understanding of how each student is described. However, I think this may introduce new biases, since some students may not have been able to take advantage of the same opportunities, or as many opportunities, as other students because of family responsibilities.



5. How do current hiring filter algorithms work? What problems do they encounter? How do these algorithms broadly compare to the ones we wrote today? (Some example articles discussing hiring filters linked below in citations but you're not limited to these examples) 


In the example of the Amazon recruiting tool, there was a bias against women: more male applicants were in the pool, and the algorithm trained itself to believe that male applicants were superior to female applicants. This is a huge problem because the algorithm learns through patterns, and these patterns reflect discrimination in workplaces. The hiring process is an important step in working against that discrimination.


6. What does it mean to design a fair algorithm?


Designing a fair algorithm involves taking into account the existing patterns in the world that result from under-resourced communities, or simply from the lack of a certain group of people in specific fields. Those patterns may train an algorithm to treat one group as a better fit for a job than another, weighing them more heavily than other factors. Fair algorithms should account for the various factors and challenges that people face, which depend on the individual case.


Citations: 

Stacy A. Doore, Casey Fiesler, Michael S. Kirkpatrick, Evan Peck, and Mehran Sahami. 2020. Assignments that Blend Ethics and Technology. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE ‘20). Association for Computing Machinery, New York, NY, USA, 475–476. DOI:https://doi.org/10.1145/3328778.3366994

[Amazon scraps secret AI recruiting tool that showed bias against women](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G/)

[Hiring Algorithms Are Not Neutral](https://hbr.org/2016/12/hiring-algorithms-are-not-neutral)

[Can an Algorithm Hire Better Than a Human?](https://www.nytimes.com/2015/06/26/upshot/can-an-algorithm-hire-better-than-a-human.html)

[Algorithms in Hiring](https://blog.learningcollider.org/algorithms-in-hiring-6760ea8869b)

[Exploration-based algorithms can improve hiring quality and diversity](https://mitsloan.mit.edu/ideas-made-to-matter/exploration-based-algorithms-can-improve-hiring-quality-and-diversity)

[AI hiring tools may be filtering out the best job applicants](https://www.bbc.com/worklife/article/20240214-ai-recruiting-hiring-software-bias-discrimination)

[Challenges for mitigating bias in algorithmic hiring](https://www.brookings.edu/articles/challenges-for-mitigating-bias-in-algorithmic-hiring/)

