# CSUEB Data Science Club Fall 2020 Project

This project is for undergraduate and graduate students who are looking for an extracurricular project to sharpen their data science skills. The problem is based on a real mentorship program offered by the San Francisco Professional Chapter of ALPFA in partnership with the CSUEB Student Chapter of ALPFA. The program launched in summer 2020 and will have recurring enrollment going forward. This project seeks to automate the matching of mentors and mentees, a process that is currently done manually. The work is broken up into three sections, beginning with the creation of our mock survey results below. The first task, Mentee Ranking, will be solved at our club's second live event later this semester, and the last task, Stable Matching, will be solved at our club's final event at the end of the semester. Direct any questions to info.csueb.dsc@gmail.com. Happy problem solving!

#### We begin by importing some basic packages

In [2]:
import random
import pandas as pd

This function generates a list of 10 random integers from 0 to a specified number "num", inclusive:

In [3]:
def surveyCol(num):
    return [random.randint(0, num) for _ in range(10)]
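
Because the survey answers are random, every run of the notebook produces different mock data. A minimal sketch of one way to make the data repeatable is shown below. The name `survey_col`, the `rng` parameter, and the seed value 2020 are assumptions made for illustration and are not part of the original notebook.

```python
import random

def survey_col(num, rng):
    """Return 10 random integers between 0 and num (inclusive) drawn from rng."""
    return [rng.randint(0, num) for _ in range(10)]

# Two generators created with the same (arbitrary) seed produce the same answers
first = survey_col(5, random.Random(2020))
second = survey_col(5, random.Random(2020))
print(first == second)  # True
```

Passing the generator in explicitly keeps the seeded data separate from any other use of the `random` module.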

This script defines a class called "Participant". Its objects store their attributes much like the key-value pairs of a dictionary, but they need to be converted to a "dict" type before we can perform dictionary operations on them.

In [4]:
class Participant:
    def __init__(self, name):
        self.name = name
        self.primary = surveyCol(5)
        self.ideal_match = surveyCol(5)
        self.level_of_importance = surveyCol(2)

Here are our lists of participating mentors and mentees.

In [5]:
mentor_names = ['Jose', 'Amanda', 'Francisco', 'Megan', 'Phil', 'Carla']
mentee_names = ['Chris', 'Kevin', 'Rachel', 'Monica', 'Emily', 'William']

These are two functions. The first, "convert", takes a dictionary-structured object and converts it to an actual dictionary; it is called in our second function. "surveyGroup" takes a list of strings as an argument, creates a "Participant" object from each string, and calls the "convert" function on each object. A new list of dictionaries is returned.

In [6]:
def convert(participant):
    # An object's attributes are stored in its __dict__, which is a regular dictionary
    return participant.__dict__

def surveyGroup(names):
    user_list = []
    for name in names:
        user_list.append(convert(Participant(name)))
    return user_list
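
The same conversion can be written more compactly with the built-in `vars()`, which returns an object's attribute dictionary. The sketch below is only an illustration: `survey_group` is a hypothetical name, and the `Participant` class is cut down to two attributes so the block runs on its own.

```python
import random

class Participant:
    """Cut-down version of the notebook's class, for illustration only."""
    def __init__(self, name):
        self.name = name
        self.primary = [random.randint(0, 5) for _ in range(10)]

def survey_group(names):
    # vars(obj) returns the same attribute dictionary as obj.__dict__
    return [vars(Participant(name)) for name in names]

group = survey_group(['Jose', 'Amanda'])
print(type(group[0]).__name__, group[0]['name'])  # dict Jose
```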

Here we pass our lists of participating mentor and mentee names to the functions above. Each returned list holds one dictionary per participant, with the name, primary survey answers, ideal match survey answers, and level of importance survey answers as keys with their respective values.

In [7]:
mentors = surveyGroup(mentor_names)
mentees = surveyGroup(mentee_names)

Here we print out our newly created lists:

In [8]:
for mentor in mentors:
    print(mentor)

{'name': 'Jose', 'primary': [1, 3, 2, 3, 4, 2, 1, 3, 5, 0], 'ideal_match': [1, 4, 4, 2, 4, 3, 4, 1, 5, 1], 'level_of_importance': [1, 2, 2, 0, 2, 1, 0, 0, 0, 1]}
{'name': 'Amanda', 'primary': [1, 1, 1, 2, 4, 2, 2, 1, 2, 5], 'ideal_match': [0, 3, 5, 4, 5, 4, 5, 2, 5, 4], 'level_of_importance': [1, 1, 1, 0, 1, 2, 2, 1, 0, 1]}
{'name': 'Francisco', 'primary': [2, 2, 1, 4, 1, 1, 4, 1, 5, 2], 'ideal_match': [0, 4, 5, 1, 0, 1, 0, 4, 3, 0], 'level_of_importance': [2, 0, 2, 1, 2, 2, 1, 2, 2, 0]}
{'name': 'Megan', 'primary': [3, 1, 2, 3, 0, 2, 5, 4, 3, 3], 'ideal_match': [4, 5, 1, 5, 4, 3, 0, 3, 5, 0], 'level_of_importance': [1, 1, 1, 2, 2, 0, 0, 1, 1, 2]}
{'name': 'Phil', 'primary': [1, 2, 1, 5, 5, 1, 3, 4, 5, 2], 'ideal_match': [2, 1, 4, 1, 3, 1, 1, 3, 1, 3], 'level_of_importance': [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]}
{'name': 'Carla', 'primary': [3, 5, 0, 4, 0, 0, 3, 5, 0, 1], 'ideal_match': [0, 5, 3, 0, 3, 4, 4, 2, 5, 1], 'level_of_importance': [0, 2, 2, 2, 1, 1, 1, 2, 2, 1]}


In [9]:
for mentee in mentees:
    print(mentee)

{'name': 'Chris', 'primary': [0, 0, 5, 1, 4, 2, 4, 1, 0, 5], 'ideal_match': [1, 4, 3, 4, 3, 4, 2, 3, 5, 4], 'level_of_importance': [0, 0, 1, 1, 0, 2, 0, 1, 0, 0]}
{'name': 'Kevin', 'primary': [0, 0, 2, 1, 1, 5, 2, 3, 5, 4], 'ideal_match': [0, 0, 2, 1, 2, 3, 2, 5, 2, 2], 'level_of_importance': [2, 0, 1, 2, 2, 2, 2, 0, 2, 0]}
{'name': 'Rachel', 'primary': [1, 3, 4, 2, 0, 0, 5, 3, 0, 5], 'ideal_match': [3, 0, 5, 5, 2, 5, 0, 4, 5, 2], 'level_of_importance': [1, 2, 2, 2, 0, 1, 1, 0, 0, 0]}
{'name': 'Monica', 'primary': [0, 0, 1, 3, 5, 1, 0, 1, 1, 1], 'ideal_match': [3, 4, 1, 4, 2, 2, 0, 2, 2, 1], 'level_of_importance': [1, 1, 2, 1, 0, 2, 1, 0, 0, 0]}
{'name': 'Emily', 'primary': [1, 0, 1, 1, 4, 0, 1, 4, 3, 0], 'ideal_match': [5, 1, 5, 0, 1, 5, 2, 5, 4, 4], 'level_of_importance': [1, 0, 1, 0, 1, 2, 1, 1, 2, 1]}
{'name': 'William', 'primary': [3, 2, 1, 0, 5, 1, 2, 2, 5, 5], 'ideal_match': [0, 3, 5, 2, 5, 1, 1, 4, 5, 5], 'level_of_importance': [2, 0, 1, 1, 2, 0, 2, 2, 0, 2]}


#### In this step we want to convert our list items (dictionaries) to data frames to make them easier to work with when performing analysis. The 'name' key gets in the way because, unlike the others, its value is a string and not a list of 10 integers. We remove it with the pop() method.

In [10]:
for mentor in mentors:
    mentor.pop('name')

Now we must assign the corresponding survey response key-value pairs to a variable with the participating mentor's name:

In [11]:
Jose = pd.DataFrame.from_dict(mentors[0])
Amanda = pd.DataFrame.from_dict(mentors[1])
Francisco = pd.DataFrame.from_dict(mentors[2])
Megan = pd.DataFrame.from_dict(mentors[3])
Phil = pd.DataFrame.from_dict(mentors[4])
Carla = pd.DataFrame.from_dict(mentors[5])

We repeat this process for the mentees:

In [12]:
for mentee in mentees:
    mentee.pop('name')

In [13]:
Chris = pd.DataFrame.from_dict(mentees[0])
Kevin = pd.DataFrame.from_dict(mentees[1])
Monica = pd.DataFrame.from_dict(mentees[2])
Rachel = pd.DataFrame.from_dict(mentees[3])
Emily = pd.DataFrame.from_dict(mentees[4])
William = pd.DataFrame.from_dict(mentees[5])

Finally, we create two lists of data frames so that we can loop over them during analysis:

In [14]:
df_mentors = [Jose, Amanda, Francisco, Megan, Phil, Carla]
df_mentees = [Chris, Kevin, Monica, Rachel, Emily, William]
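
Assigning each data frame by list index relies on typing the names in the same order as `mentor_names` and `mentee_names`. In cell In [13], `Monica` is built from `mentees[2]` and `Rachel` from `mentees[3]`, while `mentee_names` lists Rachel before Monica, so those two labels appear to be swapped. A minimal sketch of one way to avoid index bookkeeping is to key the responses by name before removing it. The helper `by_name` is hypothetical, and the answer lists are shortened to three values each for brevity.

```python
def by_name(participants):
    """Map each participant's name to their responses, leaving the input untouched."""
    return {
        p['name']: {key: value for key, value in p.items() if key != 'name'}
        for p in participants
    }

# Shortened copies of three mentee records (first three answers only)
participants = [
    {'name': 'Chris', 'primary': [0, 0, 5], 'ideal_match': [1, 4, 3]},
    {'name': 'Rachel', 'primary': [1, 3, 4], 'ideal_match': [3, 0, 5]},
    {'name': 'Monica', 'primary': [0, 0, 1], 'ideal_match': [3, 4, 1]},
]

responses = by_name(participants)
print(responses['Rachel']['primary'])  # [1, 3, 4]
```

Each value in `responses` could then be passed to `pd.DataFrame.from_dict` as in the cells above, with the name carried along as the dictionary key.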

### Task 1: Create a compatibility ranking system for mentors & mentees and return a dictionary with the name of each mentor as the key and, as the value, a list of mentees sorted from most compatible to least compatible.

In [15]:
#Your code here
#Tip: Use the geometric mean of the mentor/mentee survey scores to determine compatibility score 
# used for ranking potential matches.

In [16]:
print(Jose['primary'])

0    1
1    3
2    2
3    3
4    4
5    2
6    1
7    3
8    5
9    0
Name: primary, dtype: int64


In [17]:
print(Chris['ideal_match'])

0    1
1    4
2    3
3    4
4    3
5    4
6    2
7    3
8    5
9    4
Name: ideal_match, dtype: int64


This function takes two dataframes as arguments. The 'ideal_match' column of the first is compared against the 'primary' column of the second. Each answer scores 1 if the values match exactly, 0.5 if they are off by 1, and 0 otherwise. Each score is then multiplied by a weight taken from the first dataframe's 'level_of_importance' column (4 for a level of 2, 2 for a level of 1, and 0 for a level of 0), and the list of weighted scores is returned.

In [18]:
def listComp(df1, df2):
    n = 2
    list_match = []
    l_of_imp = []
    list_of_diffs = df1.ideal_match - df2.primary
    for i in list_of_diffs:
        if abs(i) == 0:
            list_match.append(1)
        elif abs(i) == 1:
            list_match.append(0.5)
        else:
            list_match.append(0)
            
    for j in df1.level_of_importance:
        if j == 2:
            l_of_imp.append(n**2)
        elif j == 1:
            l_of_imp.append(n)
        else:
            l_of_imp.append(0)
    return [list_match[k] * l_of_imp[k] for k in range(len(list_match))]
    

In [19]:
# Testing the function
listComp(Jose, Chris)

[1.0, 0, 2.0, 0.0, 4, 1.0, 0, 0, 0, 0]

In [20]:
listComp(Chris, Jose)

[0, 0.0, 1.0, 1.0, 0.0, 0, 0.0, 2, 0, 0]
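
To make the scoring rule easier to follow, here is a sketch of the same calculation on plain lists, using Jose's and Chris's survey answers printed earlier. The function name `score_answers` and the two lookup dictionaries are assumptions made for illustration; the notebook's own implementation is `listComp`.

```python
def score_answers(ideal_match, primary, importance):
    """Weighted closeness of one person's ideal answers to another's actual answers."""
    closeness = {0: 1, 1: 0.5}   # exact match, off by one; any other gap scores 0
    weight = {2: 4, 1: 2, 0: 0}  # importance 2 -> n**2, 1 -> n, 0 -> 0, with n = 2
    return [
        closeness.get(abs(want - have), 0) * weight[imp]
        for want, have, imp in zip(ideal_match, primary, importance)
    ]

jose_ideal = [1, 4, 4, 2, 4, 3, 4, 1, 5, 1]
jose_importance = [1, 2, 2, 0, 2, 1, 0, 0, 0, 1]
chris_primary = [0, 0, 5, 1, 4, 2, 4, 1, 0, 5]

scores = score_answers(jose_ideal, chris_primary, jose_importance)
print(scores)  # [1.0, 0, 2.0, 0.0, 4, 1.0, 0, 0, 0, 0]
```

The result agrees with the output of `listComp(Jose, Chris)` above.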

This function takes two lists as arguments and returns the geometric mean of their averages, that is, the square root of the product of the two lists' mean scores.

In [21]:
def geoMean(list1, list2):
    match_score1 = sum(list1)/len(list1)
    match_score2 = sum(list2)/len(list2)
    return (match_score1*match_score2)**0.5

In [22]:
# Here we test the function
geoMean(listComp(Jose, Chris), listComp(Chris, Jose))

0.5656854249492381
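
The arithmetic behind this result can be traced by hand from the two `listComp` outputs above. The following sketch repeats it step by step; the variable names are illustrative only.

```python
import math

jose_to_chris = [1.0, 0, 2.0, 0.0, 4, 1.0, 0, 0, 0, 0]    # output of listComp(Jose, Chris)
chris_to_jose = [0, 0.0, 1.0, 1.0, 0.0, 0, 0.0, 2, 0, 0]  # output of listComp(Chris, Jose)

mean_1 = sum(jose_to_chris) / len(jose_to_chris)  # 8 / 10 = 0.8
mean_2 = sum(chris_to_jose) / len(chris_to_jose)  # 4 / 10 = 0.4

# Geometric mean of the two averages: sqrt(0.8 * 0.4) = sqrt(0.32)
score = math.sqrt(mean_1 * mean_2)
print(round(score, 2))  # 0.57
```

One consequence of multiplying the two averages is that the score is 0 whenever either side scores 0, no matter how well the other side matches.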

This function takes a dataframe as an argument and returns, as a string, the name of the global variable that refers to it.

In [23]:
def get_df_name(df):
    name = [x for x in globals() if globals()[x] is df][0]
    return name

This function takes two lists of dataframes as arguments, combines the previous functions we have created, and returns a dictionary that maps each name in the first list to its matches from the second list, sorted by score in descending order.

In [24]:
def matching(df_list1, df_list2):
    dict1 = {}
    dict2 = {}
    for i in df_list1:
        for j in df_list2:
            dict2[get_df_name(j)] = round(geoMean(listComp(i, j), listComp(j, i)), 2)
        dict1[get_df_name(i)] = sorted(dict2.items(), key=lambda x: x[1], reverse=True)
    return dict1
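
The ranking step on its own can be sketched without looking names up in `globals()`, by passing the names along with the scores. In the example below, `rank_matches` is a hypothetical helper, and the three scores are taken from the notebook's results for Jose.

```python
def rank_matches(scores):
    """Turn {person: {candidate: score}} into {person: [(candidate, score), ...]}, best first."""
    return {
        person: sorted(candidates.items(), key=lambda pair: pair[1], reverse=True)
        for person, candidates in scores.items()
    }

# Three of Jose's compatibility scores from the notebook's results
scores = {'Jose': {'Chris': 0.57, 'Kevin': 0.28, 'Rachel': 0.67}}

ranking = rank_matches(scores)
print(ranking['Jose'])  # [('Rachel', 0.67), ('Chris', 0.57), ('Kevin', 0.28)]
```

Python's sort is stable, so candidates with equal scores keep the order in which they were inserted.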

Here we test the function for the list of mentors matched to mentees and vice versa.

In [25]:
optimal_mentor_matches = matching(df_mentors, df_mentees)

In [26]:
optimal_mentee_matches = matching(df_mentees, df_mentors)

In [27]:
optimal_mentor_matches

{'Jose': [('Rachel', 0.67),
  ('Chris', 0.57),
  ('William', 0.47),
  ('Emily', 0.46),
  ('Kevin', 0.28),
  ('Monica', 0.28)],
 'Amanda': [('Kevin', 1.02),
  ('William', 0.85),
  ('Rachel', 0.63),
  ('Monica', 0.45),
  ('Emily', 0.24),
  ('Chris', 0.0)],
 'Francisco': [('Rachel', 0.95),
  ('Emily', 0.77),
  ('Monica', 0.62),
  ('Kevin', 0.55),
  ('Chris', 0.49),
  ('William', 0.0)],
 'Megan': [('Emily', 0.74),
  ('Rachel', 0.73),
  ('William', 0.63),
  ('Kevin', 0.55),
  ('Chris', 0.35),
  ('Monica', 0.28)],
 'Phil': [('Monica', 0.49),
  ('Kevin', 0.45),
  ('William', 0.45),
  ('Emily', 0.4),
  ('Rachel', 0.37),
  ('Chris', 0.32)],
 'Carla': [('Rachel', 0.53),
  ('William', 0.49),
  ('Kevin', 0.47),
  ('Monica', 0.45),
  ('Emily', 0.4),
  ('Chris', 0.37)]}

In [28]:
optimal_mentee_matches

{'Chris': [('Jose', 0.57),
  ('Francisco', 0.49),
  ('Carla', 0.37),
  ('Megan', 0.35),
  ('Phil', 0.32),
  ('Amanda', 0.0)],
 'Kevin': [('Amanda', 1.02),
  ('Francisco', 0.55),
  ('Megan', 0.55),
  ('Carla', 0.47),
  ('Phil', 0.45),
  ('Jose', 0.28)],
 'Monica': [('Francisco', 0.62),
  ('Phil', 0.49),
  ('Amanda', 0.45),
  ('Carla', 0.45),
  ('Jose', 0.28),
  ('Megan', 0.28)],
 'Rachel': [('Francisco', 0.95),
  ('Megan', 0.73),
  ('Jose', 0.67),
  ('Amanda', 0.63),
  ('Carla', 0.53),
  ('Phil', 0.37)],
 'Emily': [('Francisco', 0.77),
  ('Megan', 0.74),
  ('Jose', 0.46),
  ('Phil', 0.4),
  ('Carla', 0.4),
  ('Amanda', 0.24)],
 'William': [('Amanda', 0.85),
  ('Megan', 0.63),
  ('Carla', 0.49),
  ('Jose', 0.47),
  ('Phil', 0.45),
  ('Francisco', 0.0)]}

### Task 2: Based on the sorted list of potential matches, pair every mentor with their best available mentee match.

In [29]:
#Your code here
#Tip: Use the "Stable Matching" Algorithm.

We can start by looking at the first indexed value of our first mentor.

In [30]:
print(optimal_mentor_matches['Jose'][0])

('Rachel', 0.67)


Now we can create a list of all the possible matches, 36 in all. However, though they are sorted, they are sorted according to each individual mentor.

In [31]:
for key, value in optimal_mentor_matches.items():
    for k, v in value:
        print(key, k, v)

Jose Rachel 0.67
Jose Chris 0.57
Jose William 0.47
Jose Emily 0.46
Jose Kevin 0.28
Jose Monica 0.28
Amanda Kevin 1.02
Amanda William 0.85
Amanda Rachel 0.63
Amanda Monica 0.45
Amanda Emily 0.24
Amanda Chris 0.0
Francisco Rachel 0.95
Francisco Emily 0.77
Francisco Monica 0.62
Francisco Kevin 0.55
Francisco Chris 0.49
Francisco William 0.0
Megan Emily 0.74
Megan Rachel 0.73
Megan William 0.63
Megan Kevin 0.55
Megan Chris 0.35
Megan Monica 0.28
Phil Monica 0.49
Phil Kevin 0.45
Phil William 0.45
Phil Emily 0.4
Phil Rachel 0.37
Phil Chris 0.32
Carla Rachel 0.53
Carla William 0.49
Carla Kevin 0.47
Carla Monica 0.45
Carla Emily 0.4
Carla Chris 0.37


We need a way to sort all of the possible matches together, so we write a loop that appends every possible match from the nested dictionary to one large master list.

In [36]:
list1 = []
for key, value in optimal_mentor_matches.items():
    for k, v in value:
        # Store each pair as [score, mentor, mentee] so we can sort on the score
        list1.append([v, key, k])

Now we can simply call the sort() method on the list.

In [37]:
list1.sort(key = lambda i: i[0], reverse=True)
list1

[[1.02, 'Amanda', 'Kevin'],
 [0.95, 'Francisco', 'Rachel'],
 [0.85, 'Amanda', 'William'],
 [0.77, 'Francisco', 'Emily'],
 [0.74, 'Megan', 'Emily'],
 [0.73, 'Megan', 'Rachel'],
 [0.67, 'Jose', 'Rachel'],
 [0.63, 'Amanda', 'Rachel'],
 [0.63, 'Megan', 'William'],
 [0.62, 'Francisco', 'Monica'],
 [0.57, 'Jose', 'Chris'],
 [0.55, 'Francisco', 'Kevin'],
 [0.55, 'Megan', 'Kevin'],
 [0.53, 'Carla', 'Rachel'],
 [0.49, 'Francisco', 'Chris'],
 [0.49, 'Phil', 'Monica'],
 [0.49, 'Carla', 'William'],
 [0.47, 'Jose', 'William'],
 [0.47, 'Carla', 'Kevin'],
 [0.46, 'Jose', 'Emily'],
 [0.45, 'Amanda', 'Monica'],
 [0.45, 'Phil', 'Kevin'],
 [0.45, 'Phil', 'William'],
 [0.45, 'Carla', 'Monica'],
 [0.4, 'Phil', 'Emily'],
 [0.4, 'Carla', 'Emily'],
 [0.37, 'Phil', 'Rachel'],
 [0.37, 'Carla', 'Chris'],
 [0.35, 'Megan', 'Chris'],
 [0.32, 'Phil', 'Chris'],
 [0.28, 'Jose', 'Kevin'],
 [0.28, 'Jose', 'Monica'],
 [0.28, 'Megan', 'Monica'],
 [0.24, 'Amanda', 'Emily'],
 [0.0, 'Amanda', 'Chris'],
 [0.0, 'Francisco', 'William']]

Now we write a function that takes our master list as its only argument and walks through it from the highest score down. It keeps track of the mentors and the mentees that have already been matched, and it accepts and prints a pair only if neither participant has been matched yet. This ensures that once a participant has been matched they cannot be matched with anyone else, and everyone is paired with their best available match.

In [65]:
def stableMatch(list):
    mr_matched = []
    me_matched = []
    final_match = []
    for i in list:
        if i[1] not in mr_matched and i[2] not in me_matched:
            print(i[1], 'will be mentoring '+i[2]+', their compatibility score was',i[0])
            mr_matched.append(i[1])
            me_matched.append(i[2])
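
If the matches are needed for further processing rather than only printed, a variant that returns them may be useful. The sketch below assumes its input is already sorted by score in descending order, as the master list is. The name `greedy_match` is hypothetical, and the data is limited to the first six rows of the master list.

```python
def greedy_match(ranked_pairs):
    """Accept pairs from best score to worst, skipping anyone already matched."""
    matched_mentors, matched_mentees = set(), set()
    final_match = []
    for score, mentor, mentee in ranked_pairs:
        if mentor not in matched_mentors and mentee not in matched_mentees:
            final_match.append((mentor, mentee, score))
            matched_mentors.add(mentor)
            matched_mentees.add(mentee)
    return final_match

# First six rows of the sorted master list
top_pairs = [
    [1.02, 'Amanda', 'Kevin'],
    [0.95, 'Francisco', 'Rachel'],
    [0.85, 'Amanda', 'William'],
    [0.77, 'Francisco', 'Emily'],
    [0.74, 'Megan', 'Emily'],
    [0.73, 'Megan', 'Rachel'],
]

matches = greedy_match(top_pairs)
print(matches)
# [('Amanda', 'Kevin', 1.02), ('Francisco', 'Rachel', 0.95), ('Megan', 'Emily', 0.74)]
```

Sets are used for the "already matched" checks because membership tests on a set do not slow down as the number of participants grows.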

Now we pass our master list to get our best matches.

In [66]:
stableMatch(list1)

Amanda will be mentoring Kevin, their compatibility score was 1.02
Francisco will be mentoring Rachel, their compatibility score was 0.95
Megan will be mentoring Emily, their compatibility score was 0.74
Jose will be mentoring Chris, their compatibility score was 0.57
Phil will be mentoring Monica, their compatibility score was 0.49
Carla will be mentoring William, their compatibility score was 0.49
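
The function above picks pairs greedily from a single sorted list. The classic stable matching procedure, the Gale-Shapley algorithm, works instead from each participant's own preference list. The sketch below is a minimal version that assumes both groups are the same size and that every list ranks everyone on the other side. The function name `gale_shapley` is hypothetical, and the preference lists are the notebook's rankings restricted to three mentors and three mentees.

```python
def gale_shapley(proposer_prefs, receiver_prefs):
    """Return {proposer: receiver} for a stable matching in which proposers propose."""
    # rank[r][p] is the position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)} for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # index of the next receiver to try
    engaged_to = {}                               # receiver -> current proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p                     # receiver was unmatched: accept
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p                     # receiver prefers the new proposer
            free.append(current)
        else:
            free.append(p)                        # rejected: try the next choice later
    return {p: r for r, p in engaged_to.items()}

mentor_prefs = {
    'Amanda': ['Kevin', 'Rachel', 'Emily'],
    'Francisco': ['Rachel', 'Emily', 'Kevin'],
    'Megan': ['Emily', 'Rachel', 'Kevin'],
}
mentee_prefs = {
    'Kevin': ['Amanda', 'Francisco', 'Megan'],
    'Rachel': ['Francisco', 'Megan', 'Amanda'],
    'Emily': ['Francisco', 'Megan', 'Amanda'],
}

stable = gale_shapley(mentor_prefs, mentee_prefs)
print(stable == {'Amanda': 'Kevin', 'Francisco': 'Rachel', 'Megan': 'Emily'})  # True
```

For this subset the result agrees with the pairs printed above, since each of the three mentors is accepted by their first choice.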


The tutorial ended here, but further analysis is possible. 