# Week 3 Activity: Reflection on a Hiring Filter Algorithm 
Activity adapted from exercise developed by [Evan Peck](https://evanpeck.github.io/)

## Scenario: Moogle’s Hiring Filter
Imagine you are working for Moogle, a well-known tech company that receives tens of thousands of job applications from graduating seniors every year. Since the company receives too many job applications for HR to individually assess in a reasonable amount of time, you are asked to create a program that algorithmically analyzes applications and selects the ones most worth passing on to HR.



### Applicant Data
It’s difficult to create these first-pass cuts, so Moogle designs their application forms to get some numerical data about their applicants’ education. Job applicants must enter the grades they received in 6 core CS courses, as well as their overall GPA. For your convenience, this will be stored in a Python `list` that you can access.

For example, a student who received the following scores. . .
- Intro to CS: 100
- Data Structures: 95
- Software Engineering: 80
- Algorithms: 89
- Computer Organization: 91
- Operating Systems: 75
- Overall GPA: 83

. . . would result in the following list: `[100, 95, 80, 89, 91, 75, 83]`. You can assume that index `0` is always Intro to CS, `1` is always Data Structures, and so on.

Because you are processing many applications, your program will receive a list of lists. For example, this would be the information for 3 applicants:

`[ [100, 95, 80, 89, 91, 75, 83], [75, 80, 85, 90, 85, 88, 90], [85, 70, 99, 100, 81, 82, 91] ]`
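
As a small sketch of how this layout can be read in code, the snippet below names each position so that an index such as `6` does not have to be memorized. The names `COURSES`, `GPA_INDEX`, and `describe` are illustrative assumptions and are not part of the activity's data files.

```python
# Hypothetical labels for the seven positions in each applicant list.
COURSES = [
    "Intro to CS",
    "Data Structures",
    "Software Engineering",
    "Algorithms",
    "Computer Organization",
    "Operating Systems",
]
GPA_INDEX = 6  # the overall GPA is always the last entry


def describe(applicant):
    """Return a dict mapping each course name (and 'Overall GPA') to its score."""
    record = dict(zip(COURSES, applicant[:GPA_INDEX]))
    record["Overall GPA"] = applicant[GPA_INDEX]
    return record


applicants = [
    [100, 95, 80, 89, 91, 75, 83],
    [75, 80, 85, 90, 85, 88, 90],
    [85, 70, 99, 100, 81, 82, 91],
]

first = describe(applicants[0])
print(first["Data Structures"])  # 95
print(first["Overall GPA"])      # 83
```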

### Your Task
Your job is to:
1. Determine how you are going to select the top applicants to pass on to HR.
2. Given a list of applicant data (a list of lists), write code to identify a new list of worthwhile candidates.

### The Data 

Before we use the entire dataset of applications, we're going to write and test our code using a much smaller sample of the dataset. This will be saved in `sample_data` and contain only ten applicant lists. Notice how this is just a list of lists with each list being a unique applicant. 

In [12]:
sample_data = [[93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83], [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76]]

In [13]:
len(sample_data)

10

Now let's take a look at the entire dataset of 5,000 applications. This is saved in another file so we're going to go ahead and load this into memory and then take a look at the data. Notice that it's formatted exactly the same as the sample data as a list of lists!

In [14]:
from applications import applications

In [15]:
applications

[[93, 89, 63, 88, 60, 73, 80],
 [100, 63, 57, 96, 58, 71, 78],
 [81, 91, 99, 78, 57, 87, 86],
 [81, 73, 100, 57, 91, 60, 66],
 [86, 89, 64, 81, 69, 93, 92],
 [78, 63, 88, 95, 59, 98, 90],
 [55, 74, 68, 55, 69, 94, 80],
 [64, 77, 75, 92, 77, 72, 83],
 [95, 58, 92, 62, 77, 64, 59],
 [94, 78, 84, 83, 68, 63, 76],
 [82, 96, 79, 89, 87, 93, 61],
 [63, 92, 79, 86, 58, 79, 69],
 [87, 73, 62, 59, 77, 94, 82],
 [92, 60, 81, 85, 61, 58, 81],
 [99, 66, 98, 60, 96, 80, 91],
 [56, 76, 76, 88, 73, 72, 91],
 [77, 75, 83, 60, 95, 75, 95],
 [55, 90, 80, 90, 78, 99, 70],
 [59, 57, 61, 69, 93, 88, 96],
 [80, 76, 91, 71, 89, 78, 59],
 [96, 66, 91, 95, 55, 77, 90],
 [68, 77, 70, 79, 59, 88, 97],
 [93, 78, 78, 71, 58, 92, 72],
 [84, 71, 69, 99, 63, 100, 67],
 [79, 81, 74, 91, 66, 89, 62],
 [80, 88, 80, 60, 81, 72, 66],
 [70, 63, 57, 88, 81, 61, 92],
 [81, 100, 86, 97, 72, 71, 58],
 [79, 99, 70, 72, 76, 66, 70],
 [94, 97, 68, 55, 88, 91, 70],
 [66, 89, 97, 66, 90, 71, 100],
 [61, 55, 76, 56, 59, 73, 71],
 [5

In [16]:
len(applications)

5000

### Algorithms 

Now we're going to write algorithms that select applicants based on a variety of criteria. For each one, take note of how many applicants are passed on to the next stage of the application process. Think through the decisions being made and how these decisions might reinforce or reduce systemic and cultural oppression.

1. Selects applicants that have an overall GPA above 80
2. Selects applicants that have no grade below 65
3. Selects applicants that have at least 5 grades above 85
4. Selects applicants that have an average of at least 85 across the six classes
5. Your own algorithm to select applicants 

In [17]:
selection = list()

for app in sample_data: 
    if app[6] > 80:
        selection += [app]

print(len(selection))
print(selection)

4
[[81, 91, 99, 78, 57, 87, 86], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [64, 77, 75, 92, 77, 72, 83]]


How many applicants did we select with this filter? What percentage of our total sample size is that?  
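
One way to answer the percentage question is to compute the share directly. The sketch below repeats the ten sample applicants so that it runs on its own; the variable name `percent_kept` is an assumption made for illustration.

```python
sample_data = [
    [93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78],
    [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66],
    [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90],
    [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83],
    [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76],
]

# Keep applicants whose overall GPA (index 6) is strictly above 80.
selection = [app for app in sample_data if app[6] > 80]

# Multiply before dividing so the share is computed from whole numbers.
percent_kept = len(selection) * 100 / len(sample_data)

print(len(selection), "of", len(sample_data), "applicants kept")  # 4 of 10 applicants kept
print(percent_kept, "percent")  # 40.0 percent
```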

Let's try another potential algorithm where we select applicants that have no grade below a 65. 

In [18]:
selection = list() ## create a list to hold the selected applicants


for app in sample_data: ## loops through every applicant in sample data 
    above65 = True ## assume every grade is 65 or above until a lower one is found
    for grade in app[0:6]: ## doesn't check gpa
        if grade < 65:
            above65 = False
    if above65:
        selection.append(app)

print(len(selection))
print(selection)

0
[]
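
The flag-based loop in the cell above can also be written with Python's built-in `all()`. This is only an alternative sketch: the helper name `no_grade_below` and its `cutoff` parameter are assumptions, and the sample data is repeated so the snippet runs on its own.

```python
sample_data = [
    [93, 89, 63, 88, 60, 73, 80], [100, 63, 57, 96, 58, 71, 78],
    [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66],
    [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90],
    [55, 74, 68, 55, 69, 94, 80], [64, 77, 75, 92, 77, 72, 83],
    [95, 58, 92, 62, 77, 64, 59], [94, 78, 84, 83, 68, 63, 76],
]


def no_grade_below(applicant, cutoff=65):
    """Return True if none of the six course grades is below `cutoff`."""
    # Indices 0-5 are the course grades; index 6 (overall GPA) is skipped.
    return all(grade >= cutoff for grade in applicant[:6])


selection = [app for app in sample_data if no_grade_below(app)]
print(len(selection))  # 0, every sample applicant has at least one grade below 65

# An applicant from the earlier three-applicant example passes the filter.
print(no_grade_below([75, 80, 85, 90, 85, 88, 90]))  # True
```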


Now let's try to create a filter for selecting applicants that have at least five grades above 85. **Hint**: A counter can be useful here! 

In [19]:
selection = list() ## create a list to hold the selected applicants

for app in sample_data:
    above85 = 0
    for grade in app[0:6]:
        if grade > 85:
            above85 += 1
    if above85 >= 5:
        selection.append(app)

print(len(selection))
print(selection)
    

0
[]
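
The counter idea from the cell above can be condensed with `sum()` over a generator expression. This is a sketch under assumed names (`count_above`, `cutoff`); it uses the three example applicants from the Applicant Data section rather than the sample data.

```python
def count_above(applicant, cutoff=85):
    """Count how many of the six course grades are strictly above `cutoff`."""
    return sum(1 for grade in applicant[:6] if grade > cutoff)


applicants = [
    [100, 95, 80, 89, 91, 75, 83],
    [75, 80, 85, 90, 85, 88, 90],
    [85, 70, 99, 100, 81, 82, 91],
]

counts = [count_above(app) for app in applicants]
print(counts)  # [4, 2, 2]

# Keep applicants with at least five grades above the cutoff.
selection = [app for app in applicants if count_above(app) >= 5]
print(len(selection))  # 0
```

Note that a grade of exactly 85 is not counted, because the comparison is strict.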


Finally, let's write an algorithm that selects applicants if they have an average grade of at least 85 across their six classes.

In [25]:
selection = list() ## create a list to hold the selected applicants

for app in sample_data:
    gradeSum = 0 ## stores six grades added together
    for grade in app[0:6]:
        gradeSum += grade
    avg = gradeSum/6 ## calculate average grade
    if avg >= 85:
        selection += [app]

print(len(selection))
print(selection)

0
[]
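
The same average can be computed without an explicit accumulator by calling `sum()` on a slice. The sketch below assumes a helper named `course_average` and runs on the three example applicants from the Applicant Data section, two of whom clear the 85 cutoff.

```python
def course_average(applicant):
    """Average of the six course grades; the overall GPA at index 6 is ignored."""
    return sum(applicant[:6]) / 6


applicants = [
    [100, 95, 80, 89, 91, 75, 83],
    [75, 80, 85, 90, 85, 88, 90],
    [85, 70, 99, 100, 81, 82, 91],
]

averages = [round(course_average(app), 2) for app in applicants]
print(averages)  # [88.33, 83.83, 86.17]

selection = [app for app in applicants if course_average(app) >= 85]
print(len(selection))  # 2
```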


In the space below, work in a group to decide what criteria you want to use in an algorithm that selects applicants. First test it with the sample data, then run it with the entire set of applicants.

A useful piece of code that will give you the percentage of applicants you kept is: 

`print("Your algorithm kept", round(len(selection)/len(applications)*100), "percent of applicants")`

In [29]:
selection = list() ## create a list to hold the selected applicants

for app in applications:
    gradeSum = 0 ## stores six grades added together
    minGrade = 100 ## stores an app's lowest grade
    for grade in app[0:6]: ## calculate min grade
        if grade < minGrade:
            minGrade = grade
    minGradeFound = False ## for case where there's 2 or more equal min grades
    for grade in app[0:6]: ## sum up grades excluding the min grade
        if grade == minGrade and not minGradeFound: ## skip only the first copy of the min grade
            minGradeFound = True
            continue
        gradeSum += grade
    avg = gradeSum/5 ## calculate average grade for highest 5 grades
    if avg >= 80:
        selection += [app]

print(len(selection))
print("Your algorithm kept", round(len(selection)/len(applications)*100), "percent of applicants")
print(selection)

2846
Your algorithm kept 57 percent of applicants
[[93, 89, 63, 88, 60, 73, 80], [81, 91, 99, 78, 57, 87, 86], [81, 73, 100, 57, 91, 60, 66], [86, 89, 64, 81, 69, 93, 92], [78, 63, 88, 95, 59, 98, 90], [94, 78, 84, 83, 68, 63, 76], [82, 96, 79, 89, 87, 93, 61], [99, 66, 98, 60, 96, 80, 91], [77, 75, 83, 60, 95, 75, 95], [55, 90, 80, 90, 78, 99, 70], [80, 76, 91, 71, 89, 78, 59], [96, 66, 91, 95, 55, 77, 90], [93, 78, 78, 71, 58, 92, 72], [84, 71, 69, 99, 63, 100, 67], [79, 81, 74, 91, 66, 89, 62], [80, 88, 80, 60, 81, 72, 66], [81, 100, 86, 97, 72, 71, 58], [94, 97, 68, 55, 88, 91, 70], [66, 89, 97, 66, 90, 71, 100], [55, 100, 77, 93, 63, 73, 81], [70, 98, 69, 81, 84, 70, 98], [94, 78, 70, 88, 76, 93, 70], [72, 77, 91, 97, 59, 89, 68], [97, 88, 60, 63, 75, 94, 58], [92, 58, 97, 83, 84, 70, 95], [79, 64, 88, 96, 55, 82, 65], [90, 84, 55, 81, 80, 89, 93], [88, 58, 93, 98, 58, 76, 60], [60, 80, 97, 64, 98, 79, 74], [87, 100, 93, 82, 87, 81, 83], [78, 89, 85, 55, 72, 85, 75], [80, 88, 88, 
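
The drop-the-lowest-grade rule in the cell above can be expressed more compactly by subtracting the minimum from the total. Subtracting `min()` once removes exactly one copy of the lowest grade, so ties are handled the same way as with the boolean flag. The helper name `drop_lowest_average` is an assumption, and the four applicants below are copied from the data shown earlier.

```python
def drop_lowest_average(applicant):
    """Average of the five highest course grades (one lowest grade dropped)."""
    grades = applicant[:6]
    return (sum(grades) - min(grades)) / 5


examples = [
    [93, 89, 63, 88, 60, 73, 80],
    [100, 63, 57, 96, 58, 71, 78],
    [81, 91, 99, 78, 57, 87, 86],
    [88, 58, 93, 98, 58, 76, 60],  # lowest grade (58) appears twice
]

for app in examples:
    print(drop_lowest_average(app), drop_lowest_average(app) >= 80)
# 81.2 True
# 77.6 False
# 87.2 True
# 82.6 True
```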

Questions to Answer: 
1. What criteria did you choose to select finalists? How did you choose those criteria?

    I chose to drop the lowest grade, then select applicants whose highest 5 grades had an average of 80 or higher. I chose these criteria because I wanted to give a chance to applicants who had one bad grade but otherwise did well, since a variety of circumstances may have caused one low grade (health problems, personal emergency, bad professor, etc.). I implemented this by first finding the lowest grade, then going through the grades again and excluding the lowest grade from the summation. I used a boolean to mark when the lowest grade was found so that I did not exclude multiple classes if there were multiple equal lowest grades.

2. Roughly what percentage of applicants does your algorithm pass on as finalists? Is that enough? If Moogle asked you to take a more aggressive approach with your algorithm, are there any tradeoffs?

    My algorithm kept about 57% of applicants. I think this is enough because it is slightly above half, which seems reasonable to me. It also improved upon the previous algorithm that looked at the average of all six classes, which only kept around 30% of applicants when checking for an average of at least 80. If Moogle wanted a more aggressive algorithm that took fewer applicants, a potential tradeoff would be having to rely on potentially more biased metrics to determine who to exclude, leading to the rejection of qualified applicants and upholding of systems of oppression.

___
While our data seemed to be perfectly reported and free of inconsistencies, the world is less perfect. Consider the following scenarios:

___
*Story 1*: Misread the Instructions
What if an excellent applicant thinks they should put in letter grades?

`['A', 'A', 'A', 'A', 'A', 'A', 'A']`

. . . or how about their grades on a 4-point scale?

`[4, 3.9, 4, 4, 3.95, 4, 3.9]`

___
*Story 2*: Bad Assumptions
What if one of your applicants skipped Intro to Computer Science? When they saw your form, they froze, and decided that putting -1 in the input field would make it obvious. . .

`[-1, 95, 99, 94, 96, 98, 95]`

___
*Story 3*: Mistake in the Input
What if one of your applicants accidentally put in a number > 100?

`[681, 68, 73, 70, 81, 91, 59]`

That might seem easy enough for a program to catch, but what if they accidentally dropped a 0?

`[100, 100, 100, 100, 100, 100, 10]`

A person would catch that mistake easily. Does your algorithm?
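
To make the first three stories concrete, here is a minimal validation sketch. The function `find_problems` and its rules (seven entries, numeric values, a 0-100 range, and a guess at a 4-point scale) are assumptions for illustration, not a complete answer to these scenarios.

```python
def find_problems(applicant):
    """Return a list of problems found in one applicant list (empty if none)."""
    problems = []
    if len(applicant) != 7:
        problems.append(f"expected 7 entries, got {len(applicant)}")
    for index, value in enumerate(applicant):
        if not isinstance(value, (int, float)):
            problems.append(f"entry {index} is not a number: {value!r}")
        elif not 0 <= value <= 100:
            problems.append(f"entry {index} is outside 0-100: {value}")
    # Heuristic: if every numeric value is 4 or lower, suspect a 4-point scale.
    numbers = [v for v in applicant if isinstance(v, (int, float))]
    if numbers and max(numbers) <= 4:
        problems.append("every value is 4 or lower; possibly a 4-point scale")
    return problems


stories = {
    "letter grades": ["A", "A", "A", "A", "A", "A", "A"],
    "4-point scale": [4, 3.9, 4, 4, 3.95, 4, 3.9],
    "skipped course": [-1, 95, 99, 94, 96, 98, 95],
    "extra digit": [681, 68, 73, 70, 81, 91, 59],
    "dropped zero": [100, 100, 100, 100, 100, 100, 10],
}

for name, applicant in stories.items():
    print(name, "->", len(find_problems(applicant)), "problem(s)")
# letter grades -> 7 problem(s)
# 4-point scale -> 1 problem(s)
# skipped course -> 1 problem(s)
# extra digit -> 1 problem(s)
# dropped zero -> 0 problem(s)
```

The dropped zero passes every check because 10 is a legal value in the 0-100 range. Simple range validation cannot tell a typo from a genuinely low number, which is the point of the question above.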

___
*Story 4*: The Awful Semester
What if your applicant had a medical emergency one semester? Or a personal tragedy?

`[95, 93, 50, 91, 98, 90, 90]`

___
*Story 5*: Inverse Trajectories
What if one of your applicants came from an underprivileged background and really struggled at the beginning of college. . . but showed extraordinary growth by the end?

`[65, 75, 85, 95, 100, 100, 80]`

What if one of your applicants came to college with extraordinary potential? They easily aced their first few classes and then gradually grew apathetic about their education - getting nothing but barely-passing grades by the time they were a senior?

`[100, 100, 95, 85, 75, 65, 80]`

Does your algorithm treat them equally?
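
A short sketch can show why the filters written earlier cannot tell these two applicants apart: they contain the same grades in a different order, so any rule based on averages, minimums, or counts gives both the same result. The `trend` helper below is a hypothetical measure, and it assumes that the index order roughly matches the order in which courses were taken, which the application form does not guarantee.

```python
def course_average(applicant):
    """Average of the six course grades; the overall GPA at index 6 is ignored."""
    return sum(applicant[:6]) / 6


def trend(applicant):
    """Mean of the last three course grades minus the mean of the first three."""
    grades = applicant[:6]
    return (sum(grades[3:]) - sum(grades[:3])) / 3


improving = [65, 75, 85, 95, 100, 100, 80]
declining = [100, 100, 95, 85, 75, 65, 80]

# Order-insensitive measures see no difference between the two.
print(course_average(improving) == course_average(declining))  # True
print(sorted(improving[:6]) == sorted(declining[:6]))          # True

# An order-aware measure separates them.
print(round(trend(improving), 1))  # 23.3
print(round(trend(declining), 1))  # -23.3
```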

___

Complete the following questions reflecting on these scenarios:

3. What systemic advantages/disadvantages are your algorithms likely to amplify?

    The algorithms are likely to amplify the struggles of oppressed groups because they do not take into account the story behind the data. For example, story 5 shows someone who struggled in their CS classes at first but improved over time. Drawing from my own experience, I was able to take AP computer science in high school because my socio-economic status allowed me to live in a wealthier school district that offered good CS classes. Because of that, intro CS in college was much easier for me, since I already knew a lot of the topics from high school. Someone who did not have the privilege of living in a wealthier school district would not have this advantage and may do worse in intro CS. If the algorithm only considers the raw numbers and does not drop the lowest grade like the one I designed did, it would overlook an applicant who may be equally or more skilled than someone like me.

4. What does it mean to design a fair algorithm?

    A fair algorithm should take measures to avoid reinforcing biases perpetuated by systems of oppression, ensuring that applicants are given equitable consideration. It should not advantage or disadvantage certain groups, and should also consider a wide variety of factors from the data to try and get as complete of a picture as possible.  

5. If you had access to additional data beyond grades (e.g., extracurricular activities, internships, letters of recommendation), how might you incorporate it into your selection process? Would it make your algorithm fairer or introduce new biases?

    I would incorporate it by giving higher weights to applicants with more relevant experience, internships, recommendations, etc. This could give more opportunities to those who struggle in an academic setting but show their qualifications in other ways. However, additional biases could also come into play, such as in deciding which kinds of experience count as "better" than others. Additionally, oppression plays a role in who is able to get internships and other experience. If other biased algorithms are used to select applicants for internships, then favoring those with internships reinforces this systemic oppression. An applicant may also not have time for extracurriculars or internships, or may be unable to afford them, which the algorithm would likely not take into consideration.

6. How do current hiring filter algorithms work? What problems do they encounter? How do these algorithms broadly compare to the ones we wrote today? (Some example articles discussing hiring filters linked below in citations but you're not limited to these examples) 

    One common type is the filtering algorithm: the recruiter inputs a set of traits they want in an employee, and the algorithm ranks applications by how well they fit those criteria, sometimes rejecting applications outright based on knockout criteria. This is most similar to what we wrote and, like our algorithms, it is very rigid and does not consider what other factors contributed to the data. Other algorithms use machine learning to score applicants by learning from previously chosen applicants and their performance. However, this can perpetuate biases present in past hiring decisions and exclude oppressed groups, as with Amazon's AI recruiting algorithm, which taught itself to prefer male applicants.

Citations: 

Stacy A. Doore, Casey Fiesler, Michael S. Kirkpatrick, Evan Peck, and Mehran Sahami. 2020. Assignments that Blend Ethics and Technology. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (SIGCSE '20). Association for Computing Machinery, New York, NY, USA, 475–476. DOI: https://doi.org/10.1145/3328778.3366994

[Amazon scraps secret AI recruiting tool that showed bias against women](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G/)

[Hiring Algorithms Are Not Neutral](https://hbr.org/2016/12/hiring-algorithms-are-not-neutral)

[Can an Algorithm Hire Better Than a Human?](https://www.nytimes.com/2015/06/26/upshot/can-an-algorithm-hire-better-than-a-human.html)

[Algorithms in Hiring](https://blog.learningcollider.org/algorithms-in-hiring-6760ea8869b)

[Exploration-based algorithms can improve hiring quality and diversity](https://mitsloan.mit.edu/ideas-made-to-matter/exploration-based-algorithms-can-improve-hiring-quality-and-diversity)

[AI hiring tools may be filtering out the best job applicants](https://www.bbc.com/worklife/article/20240214-ai-recruiting-hiring-software-bias-discrimination)

[Challenges for mitigating bias in algorithmic hiring](https://www.brookings.edu/articles/challenges-for-mitigating-bias-in-algorithmic-hiring/)

