# Week 3 Activity: Reflection on a Hiring Filter Algorithm 
Activity adapted from an exercise developed by [Evan Peck](https://evanpeck.github.io/)

## Scenario: Moogle’s Hiring Filter
Imagine you are working for Moogle, a well-known tech company that receives tens of thousands of job applications from graduating seniors every year. Because the company receives too many applications for HR to assess individually in a reasonable amount of time, you are asked to create a program that algorithmically analyzes applications and selects the ones most worth passing on to HR.



### Applicant Data
It’s difficult to create these first-pass cuts, so Moogle designs its application forms to collect numerical data about applicants’ education. Job applicants must enter the grades they received in 6 core CS courses, as well as their overall GPA. For your convenience, this data will be stored in a Python `list` that you can access.

For example, a student who received the following scores. . .
- Intro to CS: 100
- Data Structures: 95
- Software Engineering: 80
- Algorithms: 89
- Computer Organization: 91
- Operating Systems: 75
- Overall GPA: 83

. . . would result in the following list: `[100, 95, 80, 89, 91, 75, 83]`. You can assume that index `0` is always Intro to CS, `1` is always Data Structures, and so on.
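A quick sketch of reading values out of one applicant's list using the positions described above. The variable names are illustrative only; the pattern to notice is that index `6` is the overall GPA and the slice `[0:6]` gives just the six course grades, which is how the algorithms later in this notebook separate the two.

```python
# The example student from the list above
applicant = [100, 95, 80, 89, 91, 75, 83]

intro_to_cs = applicant[0]        # index 0: Intro to CS
operating_systems = applicant[5]  # index 5: Operating Systems
gpa = applicant[6]                # index 6: overall GPA
course_grades = applicant[0:6]    # indices 0-5: the six core CS courses (GPA excluded)

print(intro_to_cs, operating_systems, gpa)  # 100 75 83
print(course_grades)                        # [100, 95, 80, 89, 91, 75]
```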

Because you are processing many applications, your program will receive a list of lists. For example, this would be the information for 3 applicants:

`[ [100, 95, 80, 89, 91, 75, 83], [75, 80, 85, 90, 85, 88, 90], [85, 70, 99, 100, 81, 82, 91] ]`
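With a list of lists, two indexes are needed to reach a single grade, and a `for` loop visits one applicant (one inner list) at a time. A minimal sketch using the three applicants above; `applicants` and `gpas` are illustrative names, not part of the activity's provided code.

```python
# The three-applicant example above
applicants = [[100, 95, 80, 89, 91, 75, 83],
              [75, 80, 85, 90, 85, 88, 90],
              [85, 70, 99, 100, 81, 82, 91]]

# The first index picks the applicant, the second picks the grade
print(applicants[2][3])  # 100: the third applicant's Algorithms grade

# A loop visits one applicant at a time
gpas = []
for applicant in applicants:
    gpas.append(applicant[6])  # index 6 is the overall GPA
print(gpas)  # [83, 90, 91]
```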

### Your Task
Your job is to:
1. Determine how you are going to select the top applicants to pass on to HR.
2. Given a list of applicant data (a list of lists), write code to identify a new list of worthwhile candidates.

### The Data 

Before we use the entire dataset of applications, we're going to write and test our code using a much smaller sample of the dataset. This will be saved in `sample_data` and contain only ten applicant lists. Notice how this is just a list of lists with each list being a unique applicant. 

```python
sample_data = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]
```

```python
len(sample_data)
```

```
10
```

Now let's take a look at the entire dataset of 5,000 applications. It is saved in another file, so we'll load it into memory and then look at the data. Notice that it's formatted exactly like the sample data, as a list of lists! (In fact, its first ten applicants are the same as `sample_data`.)

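```python
# The `applications` module provided with this activity defines the full `applications` list
from applications import applications
```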

```python
applications
```

```
[[93, 89, 63, 88, 60, 73, 80],
 [100, 63, 57, 96, 58, 71, 78],
 [81, 91, 99, 78, 57, 87, 86],
 [81, 73, 100, 57, 91, 60, 66],
 [86, 89, 64, 81, 69, 93, 92],
 [78, 63, 88, 95, 59, 98, 90],
 [55, 74, 68, 55, 69, 94, 80],
 [64, 77, 75, 92, 77, 72, 83],
 [95, 58, 92, 62, 77, 64, 59],
 [94, 78, 84, 83, 68, 63, 76],
 [82, 96, 79, 89, 87, 93, 61],
 [63, 92, 79, 86, 58, 79, 69],
 [87, 73, 62, 59, 77, 94, 82],
 [92, 60, 81, 85, 61, 58, 81],
 [99, 66, 98, 60, 96, 80, 91],
 [56, 76, 76, 88, 73, 72, 91],
 [77, 75, 83, 60, 95, 75, 95],
 [55, 90, 80, 90, 78, 99, 70],
 [59, 57, 61, 69, 93, 88, 96],
 [80, 76, 91, 71, 89, 78, 59],
 [96, 66, 91, 95, 55, 77, 90],
 [68, 77, 70, 79, 59, 88, 97],
 [93, 78, 78, 71, 58, 92, 72],
 [84, 71, 69, 99, 63, 100, 67],
 [79, 81, 74, 91, 66, 89, 62],
 [80, 88, 80, 60, 81, 72, 66],
 [70, 63, 57, 88, 81, 61, 92],
 [81, 100, 86, 97, 72, 71, 58],
 [79, 99, 70, 72, 76, 66, 70],
 [94, 97, 68, 55, 88, 91, 70],
 [66, 89, 97, 66, 90, 71, 100],
 [61, 55, 76, 56, 59, 73, 71],
 ...
```

```python
len(applications)
```

```
5000
```

### Algorithms 

Now we're going to write algorithms that select applicants based on a variety of criteria. For each one, take note of how many applicants are passed on to the next stage of the application process. Think through the decisions being made and how they might reinforce or reduce systemic and cultural oppression.

1. Selects applicants that have an overall GPA above 80
2. Selects applicants that have no grade below 65
3. Selects applicants that have at least 5 grades above 85
4. Selects applicants that have an average of at least 85 across the six classes
5. Your own algorithm to select applicants

```python
## Scenario 1: overall GPA above 80
selection = []  # will become the final list of selected applicants

for applicant in sample_data:  # loop through each applicant in the sample
    if applicant[6] > 80:  # index 6 is the overall GPA
        print(applicant)  # print only the applicants who meet the criterion
        selection.append(applicant)  # add this applicant to the list of selections

# print the finalized selections
print(selection)
```

```
[81, 91, 99, 78, 57, 87, 86]
[86, 89, 64, 81, 69, 93, 92]
[78, 63, 88, 95, 59, 98, 90]
[64, 77, 75, 92, 77, 72, 83]
[[81, 91, 99, 78, 57, 87, 86], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [64, 77, 75, 92, 77, 72, 83]]
```


How many applicants did we select with this filter? What percentage of our total sample size is that?  
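One way to answer both questions in code, sketched here with Scenario 1 rewritten as a list comprehension so the block runs on its own. `count` and `percent` are illustrative names; the same `len(...)` ratio appears in the percentage one-liner later in this notebook.

```python
sample_data = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]

# Scenario 1 again: keep applicants whose GPA (index 6) is above 80
selection = [applicant for applicant in sample_data if applicant[6] > 80]

count = len(selection)
percent = count * 100 / len(sample_data)
print(count, "applicants selected:", percent, "percent of the sample")
# 4 applicants selected: 40.0 percent of the sample
```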

Let's try another potential algorithm, where we select applicants that have no grade below 65.

```python
## Scenario 2: no grade below 65
selection = []  # holds the selected applicants

for applicant in sample_data:
    no_grade_below_65 = True  # assume the applicant qualifies until a low grade is found
    for grade in applicant[0:6]:  # check each of the six course grades
        if grade < 65:
            no_grade_below_65 = False  # a single grade below 65 disqualifies the applicant
            break  # no need to check the remaining grades
    if no_grade_below_65:
        selection.append(applicant)

print(selection)
```
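The flag-and-loop pattern in the cell above can also be written with Python's built-in `all()`, which returns `True` only when every item passes the test. This is a sketch of an equivalent version, not the only correct one; running it shows that every applicant in the sample has at least one course grade below 65.

```python
sample_data = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]

# Keep an applicant only if all six course grades (indices 0-5) are 65 or higher
selection = [applicant for applicant in sample_data
             if all(grade >= 65 for grade in applicant[0:6])]

print(selection)  # []
```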


Now let's try to create a filter for selecting applicants that have at least five grades above 85. **Hint**: A counter can be useful here! 

```python
## Scenario 3: at least 5 grades above 85
selectApp = []

for applicant in sample_data:
    above85 = 0  # reset the counter for each applicant
    for grade in applicant[0:6]:  # loop through the six course grades
        if grade > 85:
            above85 += 1  # count each grade above 85
    if above85 >= 5:
        selectApp.append(applicant)

selectApp
```
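The counter in the cell above can be expressed with `sum()` over a generator that yields 1 for every grade above the cutoff. In this sketch, `count_above` is a hypothetical helper name. Printing each applicant's count shows why the sample produces no selections: no one has more than three grades above 85.

```python
sample_data = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]


def count_above(grades, cutoff):
    """Count how many grades are strictly above cutoff."""
    return sum(1 for grade in grades if grade > cutoff)


counts = [count_above(applicant[0:6], 85) for applicant in sample_data]
print(counts)  # [3, 2, 3, 2, 3, 3, 1, 1, 2, 1]

selectApp = [applicant for applicant in sample_data if count_above(applicant[0:6], 85) >= 5]
print(selectApp)  # []
```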

Finally, let's write an algorithm that selects applicants if they have an average grade of at least 85 across their six classes.

```python
## Scenario 4: average grade of at least 85 across the six classes
selection = []

for applicant in sample_data:
    total = 0
    for grade in applicant[0:6]:  # add up the six course grades
        total += grade
    avgGrade = total / len(applicant[0:6])  # average of the six classes
    if avgGrade >= 85:
        selection.append(applicant)

selection
```

```
[]
```
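To see why Scenario 4 returns an empty list, it can help to print every applicant's average. A minimal sketch using `sum()` instead of the manual running total; `averages` is an illustrative name.

```python
sample_data = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]

# Average of the six course grades (indices 0-5) for each sample applicant
averages = [sum(applicant[0:6]) / 6 for applicant in sample_data]
print([round(avg, 1) for avg in averages])
print(round(max(averages), 1))  # 82.2

selection = [applicant for applicant in sample_data if sum(applicant[0:6]) / 6 >= 85]
print(selection)  # []
```

The highest average in the sample is about 82.2, still below the 85 cutoff, so no one is selected.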

In the space below, work in groups to decide what criteria you want to use in an algorithm that selects applicants. First test it with the sample data, then run it on the entire set of applicants.

A useful piece of code that will give you the percentage of applicants you kept is: 

`print("Your algorithm kept", round(len(selection)/len(applications)*100), "percent of applicants")`

```python
## Personal Hiring Algorithm

## Criterion 1: any applicant with an overall GPA of 80 or below is cut
finalist = []

for applicant in applications:
    if applicant[6] > 80:  # index 6 is the overall GPA
        finalist.append(applicant)

## Criterion 2: applicant must have at least 2 course grades of 90 or higher
finalist = []  # note: this starts over from all applications, discarding Criterion 1's result

for applicant in applications:
    above90 = 0
    for grade in applicant[0:6]:  # count only the six course grades, not the GPA
        if grade >= 90:
            above90 += 1
    if above90 >= 2:
        finalist.append(applicant)

## Criterion 3: the last three classes taken are weighted 10% higher;
## the weighted total divided by 6 must be at least 85
finalist = []  # note: this again starts over from all applications

for applicant in applications:
    total = 0
    for position, grade in enumerate(applicant[0:6]):  # only the grades, disregarding GPA
        if position < 3:
            total += grade  # first three classes count at face value
        else:
            total += grade * 1.1  # last three classes get a 10% boost
    appAvg = total / 6  # weighted score for this applicant
    if appAvg >= 85:
        finalist.append(applicant)

## See which applicants made it to the end
print(finalist)

## Print how many applicants are left
print("Your algorithm kept", round(len(finalist)/len(applications)*100), "percent of applicants")
```

```
[[81, 91, 99, 78, 57, 87, 86], [82, 96, 79, 89, 87, 93, 61], [99, 66, 98, 60, 96, 80, 91], [55, 90, 80, 90, 78, 99, 70], [84, 71, 69, 99, 63, 100, 67], [81, 100, 86, 97, 72, 71, 58], [94, 97, 68, 55, 88, 91, 70], [94, 78, 70, 88, 76, 93, 70], [87, 100, 93, 82, 87, 81, 83], [69, 96, 70, 90, 91, 91, 70], [97, 78, 81, 92, 61, 91, 84], [93, 58, 97, 85, 92, 85, 79], [85, 85, 93, 77, 100, 59, 100], [92, 60, 84, 100, 94, 63, 60], [87, 67, 92, 88, 98, 76, 76], [75, 99, 67, 79, 86, 93, 95], [72, 76, 99, 88, 88, 76, 56], [96, 85, 61, 81, 87, 90, 98], [88, 68, 100, 69, 91, 77, 77], [93, 91, 83, 75, 94, 81, 89], [70, 75, 82, 86, 97, 84, 76], [74, 61, 100, 91, 84, 89, 96], [76, 89, 78, 98, 70, 97, 77], [100, 95, 61, 99, 91, 98, 75], [69, 65, 100, 70, 87, 97, 71], [57, 83, 99, 95, 92, 79, 62], [95, 96, 96, 85, 60, 69, 79], [90, 81, 71, 89, 65, 94, 66], [95, 99, 85, 98, 59, 93, 73], [92, 85, 90, 99, 87, 85, 79], [88, 70, 99, 92, 67, 96, 89], [91, 57, 88, 96, 90, 90, 80], [89, 66, 92, 69, 79, 99, 79], ...
```
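In the cell above, each criterion resets `finalist` to an empty list and loops over all of `applications`, so only Criterion 3 decides who is kept. The output reflects this: it includes applicants such as `[82, 96, 79, 89, 87, 93, 61]`, whose GPA of 61 would have been cut by Criterion 1. A minimal sketch of applying the three criteria in sequence, where each stage filters only the survivors of the previous one, might look like the following. The helper names (`passes_gpa`, `passes_two_90s`, `weighted_score`, `personal_filter`) are hypothetical, and Criterion 2 counts only the six course grades, matching "in at least 2 classes" in the reflection below.

```python
# The ten-applicant sample from earlier in the notebook
sample_data = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]


def passes_gpa(applicant):
    """Criterion 1: overall GPA (index 6) above 80."""
    return applicant[6] > 80


def passes_two_90s(applicant):
    """Criterion 2: at least two of the six course grades are 90 or higher."""
    return sum(1 for grade in applicant[0:6] if grade >= 90) >= 2


def weighted_score(applicant):
    """Criterion 3 score: last three classes weighted by 1.1, total divided by 6."""
    grades = applicant[0:6]
    return (sum(grades[0:3]) + 1.1 * sum(grades[3:6])) / 6


def personal_filter(pool):
    """Apply the criteria in order; each stage only sees the survivors of the previous one."""
    stage1 = [a for a in pool if passes_gpa(a)]
    stage2 = [a for a in stage1 if passes_two_90s(a)]
    return [a for a in stage2 if weighted_score(a) >= 85]


finalist = personal_filter(sample_data)
print(finalist)  # [[81, 91, 99, 78, 57, 87, 86]]
print("Your algorithm kept", round(len(finalist)/len(sample_data)*100), "percent of applicants")
```

Because each stage can only remove applicants, running `personal_filter(applications)` would keep a subset of what the Criterion 3 cell kept, so at most the roughly 28% reported below; the exact figure depends on the full dataset and is not shown here.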

Questions to Answer: 
1. What criteria did you choose to select finalists? How did you choose that criteria?

I chose to have three criteria for my hiring algorithm. First, I decided that anyone with a GPA of 80 or below was immediately removed. I chose this because I thought that GPA, which encompasses all classes taken, should still be high, so that you are only looking at applicants who did well not only in their CS courses but in college overall. I think most companies want someone who can be good at multiple things, and GPA may reflect that to some degree.

Then I made another filter that required applicants to get at least a 90 in at least 2 classes. This criterion was designed to see whether applicants could do exceedingly well in multiple courses. It did not specify which courses, just that applicants needed a 90 or higher in two or more of them.

My final criterion required that the average of an applicant's six classes be at least 85. I also weighted the last three classes so that their grades were 10% more impactful than those of the first three. I chose this because I figured the later classes you take are not only more specialized but also more difficult, so doing well in them should count for more. Furthermore, students are older by this point, so their recent coursework may be more indicative of how they would perform at this company.
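Written out as a formula, with $g_0, \dots, g_5$ standing for the six course grades in list order (indices 0 to 5), the Criterion 3 rule used in the code above is:

$$
S = \frac{(g_0 + g_1 + g_2) + 1.1\,(g_3 + g_4 + g_5)}{6} \ge 85
$$

For example, the first finalist above, `[81, 91, 99, 78, 57, 87, 86]`, scores $S = (271 + 1.1 \times 222)/6 = 515.2/6 \approx 85.87$. Because the divisor is 6 rather than the total weight $3 + 3 \times 1.1 = 6.3$, $S$ is a boosted score rather than a weighted average in the strict sense: dividing by 6.3 instead would give about 81.78 for the same applicant, below the cutoff.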

2. Roughly what percentage of applicants does your algorithm pass on as finalists? Is that enough? If Moogle asked you to take a more aggressive approach with your algorithm, are there any tradeoffs?

Only about 28% of the original applicants made it through, which might be on the low side. However, most companies only want a small percentage of the original applicant pool, so this may fairly reflect how people are weeded out. Because this algorithm is aggressive, it loses a lot of nuance. It really only favors applicants who were able to maintain high grades consistently across college, without much wiggle room. While you will get people who did well in school on paper, you might have to wonder whether these grades are genuine, and consider whether grades are the only thing you are looking for in an applicant.

___
While our data seemed to be perfectly reported and free of inconsistencies, the world is less perfect. Consider the following scenarios:

___
*Story 1*: Misread the Instructions
What if an excellent applicant thinks they should put in letter grades?

`['A', 'A', 'A', 'A', 'A', 'A', 'A']`

. . . or how about their grades on a 4-point scale?

`[4, 3.9, 4, 4, 3.95, 4, 3.9]`

___
*Story 2*: Bad Assumptions
What if one of your applicants skipped Intro to Computer Science? When they saw your form, they froze, and decided that putting -1 in the input field would make it obvious. . .

`[-1, 95, 99, 94, 96, 98, 95]`

___
*Story 3*: Mistake in the Input
What if one of your applicants accidentally put in a number > 100?

`[681, 68, 73, 70, 81, 91, 59]`

That might seem easy enough for a program to catch, but what if they accidentally dropped a 0?

`[100, 100, 100, 100, 100, 100, 10]`

A person would catch that mistake easily. Does your algorithm?
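Stories 1 to 3 share a common safeguard: check each applicant's list before scoring it, and route anything suspicious to a person instead of silently cutting it. A minimal sketch, with a hypothetical `validation_problems` helper. The 0 to 100 range comes from the scenario; the "GPA far below every course grade" check and its 30-point gap are assumptions chosen only to catch the dropped-zero case.

```python
def validation_problems(applicant):
    """Return a list of reasons this applicant's data looks suspicious (empty if none)."""
    problems = []
    if len(applicant) != 7:
        return ["expected 7 values"]
    if not all(isinstance(value, (int, float)) for value in applicant):
        return ["non-numeric value (e.g. a letter grade)"]  # Story 1: letter grades
    if any(value < 0 or value > 100 for value in applicant):
        problems.append("value outside 0-100")  # Story 2 (-1) and Story 3 (681)
    if max(applicant) <= 4:
        problems.append("looks like a 4-point scale")  # Story 1: 4-point grades
    # Assumed heuristic: a GPA far below every course grade suggests a typo (Story 3)
    if min(applicant[0:6]) - applicant[6] > 30:
        problems.append("GPA inconsistent with course grades")
    return problems


stories = {
    "letters": ['A', 'A', 'A', 'A', 'A', 'A', 'A'],
    "4-point": [4, 3.9, 4, 4, 3.95, 4, 3.9],
    "skipped course": [-1, 95, 99, 94, 96, 98, 95],
    "typo over 100": [681, 68, 73, 70, 81, 91, 59],
    "dropped zero": [100, 100, 100, 100, 100, 100, 10],
}
for name, data in stories.items():
    print(name, validation_problems(data))
```

Every story applicant gets at least one flag, while an ordinary list such as `[100, 95, 80, 89, 91, 75, 83]` gets none. A check like this cannot say what the applicant meant; it can only say that a person should look.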

___
*Story 4*: The Awful Semester
What if your applicant had a medical emergency one semester? Or a personal tragedy?

`[95, 93, 50, 91, 98, 90, 90]`
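For a concrete comparison, the Story 4 applicant can be run through two of the earlier rules: the no-grade-below-65 rule (Scenario 2) and the average-of-at-least-85 rule (Scenario 4). A short sketch:

```python
applicant = [95, 93, 50, 91, 98, 90, 90]
courses = applicant[0:6]  # the six course grades, GPA excluded

no_grade_below_65 = all(grade >= 65 for grade in courses)  # Scenario 2 rule
average = sum(courses) / 6                                  # Scenario 4 rule

print(no_grade_below_65)  # False: the single 50 disqualifies them
print(round(average, 2))  # 86.17
print(average >= 85)      # True
```

One bad semester is enough to fail a minimum-grade rule, while an average absorbs it.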

___
*Story 5*: Inverse Trajectories
What if one of your applicants came from an underprivileged background and really struggled at the beginning of college. . . but showed extraordinary growth by the end?

`[65, 75, 85, 95, 100, 100, 80]`

What if one of your applicants came to college with extraordinary potential? They easily aced their first few classes and then gradually grew apathetic about their education, getting nothing but barely passing grades by the time they were a senior?

`[100, 100, 95, 85, 75, 65, 80]`

Does your algorithm treat them equally?
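A quick way to check: compare the two Story 5 applicants under a GPA-only view, an unweighted average, and a score that weights the last three classes by 1.1 (the style used in the personal algorithm above). The function names are hypothetical.

```python
# The two Story 5 applicants
improving = [65, 75, 85, 95, 100, 100, 80]
declining = [100, 100, 95, 85, 75, 65, 80]


def plain_average(applicant):
    """Unweighted mean of the six course grades."""
    return sum(applicant[0:6]) / 6


def late_weighted_score(applicant):
    """Last three classes weighted by 1.1, total divided by 6."""
    grades = applicant[0:6]
    return (sum(grades[0:3]) + 1.1 * sum(grades[3:6])) / 6


print(improving[6] == declining[6])                          # True: same GPA (80)
print(plain_average(improving) == plain_average(declining))  # True: both 520 / 6
print(round(late_weighted_score(improving), 2))              # 91.58
print(round(late_weighted_score(declining), 2))              # 90.42
```

Any rule built only on GPA or on the unweighted average treats these two applicants identically; a score that weights later classes more ranks the improving student higher, though both clear an 85 cutoff.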

___

Complete the following questions reflecting on these scenarios:

3. What systemic advantages/disadvantages are your algorithms likely to amplify?

My algorithm likely only pulls applicants who consistently got high grades throughout college. This means that those who had the luxury of attending better-funded colleges, or who did not have to worry about part-time jobs, would likely do better under this algorithm, because they would have a leg up in getting better grades and more time to dedicate to their classes. So my algorithm likely amplifies economic advantages, both by favoring students from better schools and by favoring students who are able to focus solely on school or afford tutors over those who may be juggling more responsibilities. Since grades are the only threshold, the algorithm assumes that those with good grades will automatically make better candidates. This assumption is also biased, because we know that grades are not the only measure of how successful a person will be at a particular job.


4. What does it mean to design a fair algorithm?

In all honesty, I don't think it is possible to design a fully fair algorithm, because a computer program cannot ascertain things like context. Even algorithms that try to be more inclusive can fall short simply because it is difficult to account for every type of person. Algorithms that try to account for both academic achievement and social circumstances are a good starting place, but ultimately people should make some of the more crucial decisions. As Mann and O'Neil note in "Hiring Algorithms Are Not Neutral," the hiring process should be overseen by an algorithm-informed person who can make sure the algorithm is not perpetuating bias and who is mindful of how it sorts applicants. A combination of human and computer labor is likely needed to lessen the bias inherent in the hiring process.

5. If you had access to additional data beyond grades (e.g., extracurricular activities, internships, letters of recommendation), how might you incorporate it into your selection process? Would it make your algorithm fairer or introduce new biases?

I think that if you had more information, especially about not only extracurricular activities but also things like part-time jobs and other potential pressures, the process could be fairer. Information beyond grades would be helpful to incorporate early in the process, even if it is more tedious for recruiters, because it lets you see an applicant's context. You can consider what other responsibilities they have and how they may fare when handling different tasks. However, this could also introduce more bias, depending on how much weight you give certain activities. For example, applicants who can afford to do unpaid internships that may be more prestigious may look more impressive than someone who had to work at a retail store over the summer to make enough money for the next semester. In that way, extracurriculars and jobs can still be discriminatory.

6. How do current hiring filter algorithms work? What problems do they encounter? How do these algorithms broadly compare to the ones we wrote today? (Some example articles discussing hiring filters are linked in the citations below, but you're not limited to these.)

It seems that current hiring algorithms are typically trained on data from previous hiring decisions. This can be problematic because companies often already have established hiring biases, which are only amplified when they unintentionally train their algorithms to replicate those same biases. For example, Amazon's AI recruiting tool was biased against female applicants because it learned from Amazon's historically male-dominated hiring. The same problem occurs with race: applicants whose names are not perceived as white can face bias from hiring algorithms because of historically biased hiring practices that favored white applicants. I think these algorithms and the one created in class have some similarities, especially in their concrete decision making. The algorithm created in class and for homework focuses solely on academic performance, which in itself can be biased. Furthermore, like the algorithms used by various companies, our algorithm has no ability to consider other context such as jobs, activities, or letters of recommendation. This means that you are choosing people based on only a few numbers, which introduces new biases, as discussed previously.


Citations: 

Stacy A. Doore, Casey Fiesler, Michael S. Kirkpatrick, Evan Peck, and Mehran Sahami. 2020. Assignments that Blend Ethics and Technology. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE ’20). Association for Computing Machinery, New York, NY, USA, 475–476. DOI: https://doi.org/10.1145/3328778.3366994

[Amazon scraps secret AI recruiting tool that showed bias against women](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G/)

[Hiring Algorithms Are Not Neutral](https://hbr.org/2016/12/hiring-algorithms-are-not-neutral)

[Can an Algorithm Hire Better Than a Human?](https://www.nytimes.com/2015/06/26/upshot/can-an-algorithm-hire-better-than-a-human.html)

[Algorithms in Hiring](https://blog.learningcollider.org/algorithms-in-hiring-6760ea8869b)

[Exploration-based algorithms can improve hiring quality and diversity](https://mitsloan.mit.edu/ideas-made-to-matter/exploration-based-algorithms-can-improve-hiring-quality-and-diversity)

[AI hiring tools may be filtering out the best job applicants](https://www.bbc.com/worklife/article/20240214-ai-recruiting-hiring-software-bias-discrimination)

[Challenges for mitigating bias in algorithmic hiring](https://www.brookings.edu/articles/challenges-for-mitigating-bias-in-algorithmic-hiring/)

