<a href="https://colab.research.google.com/github/SneakyAsH311/Mentor-Match/blob/master/MentorMatch.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Mentor Match:
Using a data-driven approach to match mentors to students with similar interests will improve the mentorship program!







# Introduction:
Mentoring is a crucial component of the P-TECH 9-14 model. Through mentoring, industry professionals are invited into the school
community. They offer students meaningful academic, workplace learning, and social/emotional support. Likewise, mentoring gives
students an adult role model and a guide who works in the field they are studying. Mentors can also provide emotional support,
encouragement, and meaningful feedback on coursework.

## Problem:
While mentors play an important role in the P-TECH model, the way mentors are chosen for students has several problems. Mentors are matched manually by the program manager, almost completely at random. This is not only very time-consuming for the program manager, but it also won't scale as P-TECH inevitably expands. From my experience, and from my conversations with both students and mentors, I have found that the mentors assigned to students often have little in common with them. Through my research, I have found that this leads to unfulfilling mentor-student relationships and reduces overall engagement from both parties.

### Problem (Summarized):
The method of matching mentors at P-TECH is unscalable and can be improved. Poorly matched mentors have a negative effect on mentor-mentee relationships.

## Solution:

Using a data-driven approach to match mentors to students with similar interests will improve the quality of the mentorship program! Giving students mentors who share their interests is a natural fit for a relationship that focuses on guidance. An algorithmic approach also saves the program manager valuable time!

The steps to a better mentor matching framework are outlined below:



1.   Survey
2.   Matching algorithm
3.   Happy mentors and students :)



In [0]:
%pip install names
import pandas as pd
import numpy as np
import names



# Step 1: Survey

*This is probably the most important step*

We need to ask every person (students and mentors) the right questions about what we want to get out of the mentor-mentee relationship. If the goal is career advice, career-focused questions might be best; if it is school advice, subject-focused questions might be best!

 

Create student and mentor datasets in the form of:
{Name : [rating1, rating2, rating3, rating4, rating5, ... ratingN]} for N categories 

## Example survey questions:
### Enter name: Abdullah Saleh
### How interested are you in the following topics, on a scale from 1 to 5 (1 being the least interested, 5 being the most)?

1. Computer Science: 5

2. Business: 5

3. History: 1

4. Design: 3

5. Biology: 3

6. Physics: 5

7. Chemistry: 4









---

## output: {Abdullah Saleh: [5,5,1,3,3,5,4]}
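A minimal sketch of turning one survey response into this format. The category order (copied from the example questions above) and the helper name `survey_to_entry` are assumptions for illustration; a real survey export would need its own parsing.

```python
# Hypothetical category order, taken from the example survey above
CATEGORIES = ["Computer Science", "Business", "History", "Design",
              "Biology", "Physics", "Chemistry"]

def survey_to_entry(name, answers):
    """Convert one response (category -> rating from 1 to 5) into {name: [ratings]}."""
    ratings = []
    for category in CATEGORIES:
        rating = answers[category]
        # Reject answers outside the 1-5 scale used by the survey
        if not 1 <= rating <= 5:
            raise ValueError(f"{category} rating must be between 1 and 5, got {rating}")
        ratings.append(rating)
    return {name: ratings}

response = {"Computer Science": 5, "Business": 5, "History": 1, "Design": 3,
            "Biology": 3, "Physics": 5, "Chemistry": 4}
print(survey_to_entry("Abdullah Saleh", response))
# {'Abdullah Saleh': [5, 5, 1, 3, 3, 5, 4]}
```

Keeping a fixed category order matters: the distance calculation in Step 2 compares ratings position by position, so every person's list has to use the same order.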








---

# Mock Data:
Random names with random scores.

In this example we will create mock data for 300 mentors and 600 students, each rated on 5 interest categories!


In [0]:

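# Create a 300x5 array for mentors and a 600x5 array for students, filled with
# random integer ratings from 1 to 5 (randint's upper bound, 6, is exclusive)
mentor = np.random.randint(1,6,size=[300,5])
student = np.random.randint(1,6,size=[600,5])

def create_dataset(data):
    # Map a random full name to each row of ratings
    dataset = {}
    for i in data:
        name = names.get_full_name()
        # Draw again if the name is already taken, so no row is silently overwritten
        while name in dataset:
            name = names.get_full_name()
        dataset[name] = i
    return dataset
mentor_dataset = create_dataset(mentor)
student_dataset = create_dataset(student)
mentor_dataset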


    

{'Aaron Gomes': array([2, 3, 3, 5, 2]),
 'Aaron Saunders': array([5, 4, 3, 1, 3]),
 'Adam Butler': array([2, 2, 2, 2, 4]),
 'Aileen Newman': array([1, 1, 2, 2, 1]),
 'Alice Gates': array([4, 5, 3, 2, 3]),
 'Alice Tanner': array([2, 1, 1, 2, 2]),
 'Alvin Buford': array([5, 2, 3, 5, 5]),
 'Alvin Carlyle': array([2, 2, 2, 4, 2]),
 'Alvina Plues': array([4, 2, 1, 5, 4]),
 'Amy Williams': array([4, 1, 1, 2, 5]),
 'Andrew Cooper': array([3, 1, 4, 1, 5]),
 'Ann Martinez': array([2, 3, 1, 3, 1]),
 'Ann Rosenthal': array([5, 3, 1, 4, 2]),
 'Anna Baker': array([3, 1, 4, 4, 1]),
 'Antonio Macias': array([4, 4, 5, 4, 1]),
 'Arthur Clark': array([3, 1, 3, 4, 2]),
 'Ashley Abernethy': array([3, 2, 5, 4, 3]),
 'Ashley Martone': array([5, 3, 4, 3, 5]),
 'August Littlefield': array([1, 4, 2, 3, 4]),
 'Barbara Green': array([3, 4, 4, 4, 3]),
 'Benjamin Bryce': array([3, 2, 5, 1, 3]),
 'Berry Romero': array([1, 3, 5, 1, 2]),
 'Bessie Bradshaw': array([3, 5, 2, 2, 4]),
 'Bethany Guedjian': array([3, 4, 4,

# Step 2: Math, baby!
Now that we have the survey data, how can we get meaningful insight from it? If we think of each person's array of interest scores as a **coordinate** that represents their overall interests, then we can look for the closest coordinate and assume that it belongs to the person with the most similar interests!

So how do we find the closest person? 


We have established that each person can be represented as a coordinate of their interests. From there, we can measure the distance to all the other coordinates and pick the one that is closest.

There are many ways to measure distance, but we will use the straight-line distance between two points. This is called Euclidean distance; for two people with rating vectors $x$ and $y$ over $n$ categories, it can be written as:

$d(x, y) = \sqrt{\sum_{i=1}^n (x_i-y_i)^2}$
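As a quick worked example, take Abdullah's survey vector $x = (5, 5, 1, 3, 3, 5, 4)$ and a hypothetical mentor $y = (4, 5, 2, 3, 3, 5, 4)$ made up for illustration. Only the first and third categories differ, each by 1:

$$
d(x, y) = \sqrt{(5-4)^2 + (5-5)^2 + (1-2)^2 + (3-3)^2 + (3-3)^2 + (5-5)^2 + (4-4)^2} = \sqrt{1 + 0 + 1} = \sqrt{2} \approx 1.414
$$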


*Feel free to try other distance or similarity metrics*
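As one possible alternative, cosine similarity compares the *direction* of two rating vectors rather than how far apart they are, so it ignores whether someone rates everything high or low. This is a hedged sketch; the helper name `cosine_similarity` is hypothetical (it is not a library call), and it assumes no vector is all zeros, which holds for 1-5 ratings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same proportions, different intensity: similarity of 1 (up to rounding),
# even though the Euclidean distance between them is about 5.2
print(cosine_similarity([2, 2, 2], [5, 5, 5]))
# Opposite favorite topics: noticeably lower similarity
print(cosine_similarity([5, 1, 1], [1, 1, 5]))
```

Because a higher cosine similarity means *more* similar, a nearest-neighbor search built on it would sort in descending order instead of ascending.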

Now that we have a way to measure the distance between people's interest coordinates, we can find the closest match for each person by listing the distances from that person to everyone else. In this case we record the distance from a mentor to every student and then sort the list in ascending order. Since the sorted list starts with the closest students (let's call them neighbors), the closest student is the first entry, the second closest is the second entry, and so on. In general, we can find any number of nearest neighbors this way!

In this case we only look for the closest student on the list, but you can play around with the number of nearest neighbors you look for.


Google these for more info:
Euclidean Distance

K-Nearest-Neighbors
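The distance-and-sort procedure described above can also be written with NumPy so that all distances are computed at once. This is a hedged sketch, not the notebook's own `knn`: the function name `k_nearest`, the toy names, and the data layout (a list of names plus an aligned (m, n) score array) are assumptions made for illustration.

```python
import numpy as np

def k_nearest(query, names_list, scores, k=1):
    """Return the k (distance, name) pairs closest to `query`, nearest first.

    `scores` is an (m, n) array: one row of n ratings per person,
    in the same order as `names_list`.
    """
    query = np.asarray(query, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Euclidean distance from the query to every row at once
    dists = np.linalg.norm(scores - query, axis=1)
    # Indices of the k smallest distances, in ascending order
    order = np.argsort(dists)[:k]
    return [(float(dists[i]), names_list[i]) for i in order]

students = ["Ana", "Ben", "Cy"]
ratings = np.array([[5, 5, 5], [1, 1, 1], [4, 5, 5]])
print(k_nearest([5, 5, 4], students, ratings, k=2))
```

`np.argsort` orders the distances from smallest to largest, and slicing `[:k]` keeps the k nearest neighbors, which is the same idea as sorting the list and taking its first k entries.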








In [0]:
def distance(pers1, pers2):
  # Euclidean distance between two equal-length rating vectors
  squared_difference = 0
  for i in range(len(pers1)):
    squared_difference += (pers1[i] - pers2[i]) ** 2
  final_distance = squared_difference ** 0.5
  return final_distance

def knn(unknown, dataset, k=1):
  distances = []
  # Loop through every person in the dataset
  for name in dataset:
    scores = dataset[name]
    distance_to_point = distance(scores, unknown)
    # Record the distance together with the person it belongs to
    distances.append([distance_to_point, name])
  # Sort by distance, closest first
  distances.sort()
  # Keep only the k closest people
  neighbors = distances[0:k]
  # With the default k=1, return the single [distance, name] pair;
  # otherwise return the list of the k closest pairs
  return neighbors[0] if k == 1 else neighbors

knn([5,5,5,5,5], mentor_dataset)
        
        

[1.4142135623730951, 'David Womack']

# Matching process


The matching process is quite simple. We go through the dataset of mentors and match every mentor to the closest student who has not been matched yet, removing that student from the pool. If any students are still unmatched after a full pass, we go through the mentors again, repeating until every student has been matched!


Steps:

1. Make an empty dictionary recording mentors and their mentee(s).
2. While there are students who are not matched to a mentor:
   - For each mentor, match them to the most similar unmatched student and remove that student from the pool.
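To see how the rounds play out, here is a small self-contained sketch of the same idea on made-up toy data. The helper `greedy_rounds` is hypothetical; it uses Python's `math.dist` (Python 3.8+) for Euclidean distance and, unlike the notebook cell, simply skips the remaining mentors instead of padding with `''` once students run out.

```python
import math

def greedy_rounds(mentors, students):
    """Each round, every mentor (in dict order) takes the closest unmatched student."""
    matches = {m: [] for m in mentors}
    unmatched = dict(students)  # copy, so the caller's dict is left untouched
    while unmatched:
        for m, m_scores in mentors.items():
            if not unmatched:
                break
            # Closest remaining student by Euclidean distance
            best = min(unmatched, key=lambda s: math.dist(m_scores, unmatched[s]))
            matches[m].append(best)
            del unmatched[best]
    return matches

mentors = {"M1": [5, 1], "M2": [1, 5]}
students = {"S1": [5, 2], "S2": [2, 5], "S3": [3, 1]}
print(greedy_rounds(mentors, students))
# {'M1': ['S1', 'S3'], 'M2': ['S2']}
```

Note that mentors earlier in the dictionary get first pick in every round, so the result depends on mentor order.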


In [0]:
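# Map each mentor to the list of students assigned to them
matched_list = {mentor_name: [] for mentor_name in mentor_dataset}
# Work on a copy so the original student data is not destroyed
unmatched_students = dict(student_dataset)
while unmatched_students:
    # One round: every mentor takes the closest student who is still unmatched
    for mentor_name, mentor_scores in mentor_dataset.items():
        if unmatched_students:
            matched_student = knn(mentor_scores, unmatched_students)[1]
            matched_list[mentor_name].append(matched_student)
            del unmatched_students[matched_student]
        else:
            # Pad with '' so every mentor's list has the same length (needed for the DataFrame below)
            matched_list[mentor_name].append('')

print(matched_list)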



        
    

{'Sarah Shafer': ['Shirley Lawrence', 'Christina Piere', 'Jon Wojcik'], 'Diane Mccants': ['Laurence France', 'Joseph Brockhaus', 'Stephen Morales'], 'Connie Watson': ['Lisa Lang', 'Elizabeth Brayman', ''], 'Robert Lorenzen': ['Michael Plowman', 'Cindy Ness', ''], 'George White': ['Don Wilburn', 'Ethel Howell', ''], 'Larry Lacroix': ['Dedra Blais', 'Luvenia Silver', ''], 'Edward Cramer': ['Annmarie Reasoner', 'Michael Dietz', ''], 'Neva Milburn': ['Sharon Castellanos', 'Otto Huerta', ''], 'Shiela Frank': ['Darlene Clifton', 'Sharon Boyd', ''], 'Shannon Troendle': ['Davis Finke', 'Denise Butler', ''], 'Greg Erdmann': ['Adrian Morris', 'Susan Stewart', ''], 'Mary Cormier': ['Joan Parks', 'Emil Morrison', ''], 'Reginald Spindler': ['Bradley Southard', 'James Cason', ''], 'Maria Overbaugh': ['Tyrone Davis', 'Ophelia Toney', ''], 'Carlos Stroud': ['Jonathan Gonzalez', 'Lidia Demery', ''], 'Eleanor Perryman': ['Patricia Luckman', 'Ophelia Kostura', ''], 'Bonnie Shute': ['Eric Williams', 'Andr

In [0]:
df = pd.DataFrame(matched_list)
# Optional: save the matches to a spreadsheet (needs an Excel engine such as openpyxl installed)
# df.to_excel('output.xlsx')
df

|   | Aaron Gomes | Aaron Saunders | Adam Butler | Aileen Newman | Alice Gates | Alice Tanner | Alvin Buford | Alvin Carlyle | Alvina Plues | Amy Williams | ... | Tommy Harner | Tony Watkins | Trisha Salazar | Vanessa Holmes | Wayne Turbacuski | William Hsu | William Metz | William Sands | Willie Hollingsworth | Wilton Lewis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Neal Bratton | Mary Sexton | Thomas Mcdowell | James Benjamin | Ralph Hart | Barbara Rich | Eleanor Turner | Carmen Bentley | Donna Debartolo | Dennis Lum | ... | Luis Gould | Jessie Clough | John Mccasland | Darlene Aguirre | Patrick Odom | Alberto Leverett | Eric Barnes | Esther Grotts | Joel Armstrong | Frank Castaneda |
| 1 | Gordon Stiner | Jerry Lira | Randy Gaddie | Maggie Wilson | Tara Sullivan | Claudia Thomas | Salvador Espino | Monica Ronson | Mary Klena | Glenn Pullen | ... | Kenneth Lane | Kira Strom | Robert Tully | Cecile Whiteis | Jeffrey Dietz | Paulette Olson | Lorrie Amaya | Ruben Harris | Thomas Trim | Jose Otero |
| 2 | | | | | | | | | | | ... | | | | | | | | | | |
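With hundreds of mentors, a wide table with one column per mentor is hard to scan. A hedged alternative, assuming the same shape as `matched_list` (mentor name mapped to a list of student names, padded with `''`), is a long table with one row per mentor-student pair. The toy `example_matches` below is made up for illustration and deliberately named so it does not overwrite the real `matched_list`.

```python
import pandas as pd

# Hypothetical data in the same shape as matched_list: mentor -> students, '' = padding
example_matches = {
    "Mentor A": ["Student 1", "Student 3"],
    "Mentor B": ["Student 2", ""],
}

# One row per real (mentor, student) pair; padding entries are skipped
pairs = pd.DataFrame(
    [(mentor, student)
     for mentor, students in example_matches.items()
     for student in students if student],
    columns=["mentor", "student"],
)
print(pairs)
```

A long table like this is also easier to filter, for example `pairs[pairs["mentor"] == "Mentor A"]` lists one mentor's mentees.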


# How can you make this better?

The mentorship problem is important to P-TECH, and I am curious how you could make this better. I invite anyone interested in this problem to improve this work or try different approaches!

There is ample room for improvement in the mentor program, and I am excited to see other approaches to this as well.

Feel free to play with this notebook! Try some of the other distance or similarity metrics to see if you can get better results.


