# CSUEB Data Science Club Fall 2020 Project

This project is for undergraduate and graduate students who are looking for an extracurricular project to sharpen their data science skills. The problem is based off a real mentorship program being offered by the San Francisco Professional Chapter of ALPFA in partnership with the CSUEB Student Chapter of ALPFA. The program was launched over the summer 2020 and will have recurring periodic enrollment moving forward, this project seeks to automate the matching of mentors/mentees, a process which is being done manually. The tasks will be broken up into three sections, beginning with the creation of our mock survey results below. The first task: Mentee Ranking, will be solved at our club's second live event later this semester, and the last task: Stable Matching, will be solved at our club's final event at the end of the semester. Direct any questions to info.csueb.dsc@gmail.com. Happy problem solving!

#### We begin by importing some basic packages

In [1]:
import random
import pandas as pd

This is a function to generate a list of 10 random numbers from 0 to a specified number "num":

In [2]:
def surveyCol(num):
    return [random.randint(0, num) for _ in range(10)]

This is script to create objects from a class called "Participants," these objects are structured like a dictionary with key-value pairs but will need to be converted to a "dict" type for us to perform dictionary operations on them.

In [3]:
class Participant:
    def __init__(self, name):
        self.name = name
        self.primary = surveyCol(5)
        self.ideal_match = surveyCol(5)
        self.level_of_importance = surveyCol(2)

Here is our lists of participating mentors and mentees.

In [4]:
mentor_names = ['Jose', 'Amanda', 'Francisco', 'Megan', 'Phil', 'Carla']
mentee_names = ['Chris', 'Kevin', 'Rachel', 'Monica', 'Emily', 'William']

These are two functions, the first takes dictionary structured objects converts them to a dictionary format, and will be called in our second function. "surveyGroup" takes a list of strings as an argument and creates a "Participant" object from each string, and calls the "convert" function on each object. A new list of dictionaries is returned.

In [5]:
def convert(dict):
    dict = dict.__dict__
    return dict

def surveyGroup(list):
    user_list = []
    for i in range(len(list)):
        user_list.append(convert(Participant(list[i])))
    return user_list

Here we pass our lists of participating mentor and mentee names to the above functions and get our new lists with each name, primary survey answers, ideal matches survey answers and a level of importance survey responses as keys with their respective values.

In [6]:
mentors = surveyGroup(mentor_names)
mentees = surveyGroup(mentee_names)

Here we print out our newly created lists

In [7]:
for mentor in mentors:
    print(mentor)

{'name': 'Jose', 'primary': [2, 1, 3, 0, 1, 3, 5, 5, 4, 3], 'ideal_match': [1, 1, 3, 2, 2, 0, 5, 0, 5, 0], 'level_of_importance': [0, 1, 2, 2, 1, 2, 2, 0, 1, 2]}
{'name': 'Amanda', 'primary': [1, 1, 2, 5, 5, 2, 0, 4, 4, 3], 'ideal_match': [0, 0, 2, 5, 1, 1, 3, 5, 3, 3], 'level_of_importance': [1, 2, 2, 2, 2, 0, 2, 2, 1, 0]}
{'name': 'Francisco', 'primary': [0, 0, 1, 5, 2, 2, 4, 0, 2, 2], 'ideal_match': [5, 4, 4, 0, 4, 1, 1, 3, 3, 3], 'level_of_importance': [2, 0, 1, 1, 0, 0, 2, 2, 0, 0]}
{'name': 'Megan', 'primary': [5, 1, 0, 3, 2, 2, 1, 3, 1, 1], 'ideal_match': [5, 0, 2, 2, 2, 2, 0, 0, 0, 2], 'level_of_importance': [2, 0, 0, 1, 0, 1, 0, 1, 0, 1]}
{'name': 'Phil', 'primary': [1, 0, 5, 0, 5, 4, 5, 2, 3, 0], 'ideal_match': [1, 3, 4, 4, 3, 4, 3, 2, 2, 3], 'level_of_importance': [2, 2, 2, 1, 2, 1, 2, 2, 2, 1]}
{'name': 'Carla', 'primary': [4, 5, 2, 3, 5, 5, 2, 2, 3, 0], 'ideal_match': [1, 2, 0, 1, 4, 3, 3, 3, 5, 2], 'level_of_importance': [1, 1, 2, 0, 2, 0, 0, 1, 1, 0]}


In [8]:
for mentee in mentees:
    print(mentee)

{'name': 'Chris', 'primary': [5, 4, 4, 3, 4, 4, 1, 2, 1, 4], 'ideal_match': [3, 3, 4, 2, 2, 3, 0, 3, 5, 5], 'level_of_importance': [0, 1, 2, 1, 0, 2, 1, 2, 2, 2]}
{'name': 'Kevin', 'primary': [5, 1, 3, 0, 4, 1, 2, 5, 3, 0], 'ideal_match': [4, 0, 5, 5, 2, 5, 2, 4, 0, 1], 'level_of_importance': [0, 2, 2, 1, 0, 2, 0, 2, 1, 1]}
{'name': 'Rachel', 'primary': [5, 2, 0, 5, 1, 1, 2, 1, 3, 5], 'ideal_match': [2, 2, 1, 5, 1, 4, 0, 0, 3, 2], 'level_of_importance': [2, 1, 0, 1, 0, 2, 2, 0, 1, 0]}
{'name': 'Monica', 'primary': [2, 0, 1, 3, 2, 2, 4, 2, 0, 5], 'ideal_match': [4, 4, 3, 1, 1, 4, 1, 4, 3, 3], 'level_of_importance': [0, 2, 1, 1, 2, 2, 0, 2, 0, 2]}
{'name': 'Emily', 'primary': [0, 3, 5, 1, 5, 4, 4, 0, 2, 0], 'ideal_match': [0, 0, 3, 5, 2, 2, 4, 1, 1, 0], 'level_of_importance': [1, 2, 1, 2, 1, 0, 0, 0, 0, 0]}
{'name': 'William', 'primary': [0, 1, 1, 1, 2, 1, 0, 0, 4, 4], 'ideal_match': [2, 1, 1, 4, 5, 4, 3, 0, 1, 5], 'level_of_importance': [1, 2, 1, 0, 0, 2, 0, 1, 1, 2]}


#### In this step we want to convert our list items(dictionaries), to data frames to make them easier to work with in performing analysis. This is a problem because the first key is not like the others in that it is a string and not a list of 10 integers. We remove it with the pop() function.

In [9]:
for mentor in mentors:
    mentor.pop('name')

Now we must assign the corresponding survey response key-value pairs to a variable with the participating mentor's name:

In [10]:
Jose = pd.DataFrame.from_dict(mentors[0])
Amanda = pd.DataFrame.from_dict(mentors[1])
Francisco = pd.DataFrame.from_dict(mentors[2])
Megan = pd.DataFrame.from_dict(mentors[3])
Phil = pd.DataFrame.from_dict(mentors[4])
Carla = pd.DataFrame.from_dict(mentors[5])

We repeat this process for the mentees:

In [11]:
for mentee in mentees:
    mentee.pop('name')

In [12]:
Chris = pd.DataFrame.from_dict(mentees[0])
Kevin = pd.DataFrame.from_dict(mentees[1])
Monica = pd.DataFrame.from_dict(mentees[2])
Rachel = pd.DataFrame.from_dict(mentees[3])
Emily = pd.DataFrame.from_dict(mentees[4])
William = pd.DataFrame.from_dict(mentees[5])

Finally we create a list of data frames to make parsing through them for analysis more efficient:

In [13]:
df_mentors = [Jose, Amanda, Francisco, Megan, Phil, Carla]
df_mentees = [Chris, Kevin, Monica, Rachel, Emily, William]

### Task 1: Create a compatibility ranking system for mentors & mentees and return a dictionary with the name of each mentor as the value and a sorted list of mentees matched from most compatible to least compatible. 

In [None]:
#Your code here
#Tip: Use the geometric mean of the mentor/mentee survey scores to determine compatibility score 
# used for ranking potential matches.

In [14]:
print(Jose['primary'])

0    2
1    1
2    3
3    0
4    1
5    3
6    5
7    5
8    4
9    3
Name: primary, dtype: int64


In [15]:
print(Chris['ideal_match'])

0    3
1    3
2    4
3    2
4    2
5    3
6    0
7    3
8    5
9    5
Name: ideal_match, dtype: int64


This function takes two dataframes as arguements, the first of which is whose 'ideal_match' we will match against the second's 'primary' column. We then return a new list of values according to the matches or near matches from column value comparison: 1 if the values match exactly and 0.5 if they are off by 1.

In [16]:
def listComp(df1, df2):
    n = 2
    list_match = []
    l_of_imp = []
    list_of_diffs = df1.ideal_match - df2.primary
    for i in list_of_diffs:
        if abs(i) == 0:
            list_match.append(1)
        elif abs(i) == 1:
            list_match.append(0.5)
        else:
            list_match.append(0)
            
    for j in df1.level_of_importance:
        if j == 2:
            l_of_imp.append(n**2)
        elif j == 1:
            l_of_imp.append(n)
        else:
            l_of_imp.append(0)
    return [list_match[k] * l_of_imp[k] for k in range(len(list_match))]
    

In [17]:
# Testing the function
listComp(Jose, Chris)

[0, 0, 2.0, 2.0, 0, 0, 0, 0, 0, 0]

In [18]:
listComp(Chris, Jose)

[0.0, 0, 2.0, 0, 0.0, 4, 0, 0, 2.0, 0]

This function takes two lists as arguements and returns their geometric mean, or the squared root of the product of the two lists.

In [19]:
def geoMean(list1, list2):
    match_score1 = sum(list1)/len(list1)
    match_score2 = sum(list2)/len(list2)
    return (match_score1*match_score2)**0.5

In [20]:
# Here we test the fucntion
geoMean(listComp(Jose, Chris), listComp(Chris, Jose))

0.5656854249492381

This function takes a dataframe as an arguement and returns the name as a string.

In [21]:
def get_df_name(df):
    name = [x for x in globals() if globals()[x] is df][0]
    return name

This function takes two dataframes as arguements combining the previous functions we have created and returns a dictionary with the first dataframe's matches sorted in descending order.

In [22]:
def matching(df_list1, df_list2):
    dict1 = {}
    dict2 = {}
    for i in df_list1:
        for j in df_list2:
            dict2[get_df_name(j)] = round(geoMean(listComp(i, j), listComp(j, i)), 2)
        dict1[get_df_name(i)] = sorted(dict2.items(), key=lambda x: x[1], reverse=True)
    return dict1

Here we test the function for the list of mentors matched to mentees and vice versa.

In [23]:
optimal_mentor_matches = matching(df_mentors, df_mentees)

In [24]:
optimal_mentee_matches = matching(df_mentees, df_mentors)

In [25]:
optimal_mentor_matches

{'Jose': [('Rachel', 1.02),
  ('William', 0.85),
  ('Kevin', 0.69),
  ('Emily', 0.63),
  ('Chris', 0.57),
  ('Monica', 0.57)],
 'Amanda': [('Monica', 1.1),
  ('Kevin', 0.98),
  ('Rachel', 0.95),
  ('William', 0.73),
  ('Emily', 0.63),
  ('Chris', 0.0)],
 'Francisco': [('Kevin', 0.79),
  ('Chris', 0.49),
  ('Emily', 0.49),
  ('William', 0.46),
  ('Monica', 0.42),
  ('Rachel', 0.28)],
 'Megan': [('Chris', 0.63),
  ('Kevin', 0.59),
  ('William', 0.53),
  ('Monica', 0.42),
  ('Rachel', 0.35),
  ('Emily', 0.35)],
 'Phil': [('Chris', 1.04),
  ('Kevin', 0.94),
  ('Emily', 0.89),
  ('Monica', 0.85),
  ('Rachel', 0.74),
  ('William', 0.59)],
 'Carla': [('Kevin', 0.5),
  ('Monica', 0.49),
  ('Rachel', 0.45),
  ('Chris', 0.39),
  ('William', 0.39),
  ('Emily', 0.2)]}

In [26]:
optimal_mentee_matches

{'Chris': [('Phil', 1.04),
  ('Megan', 0.63),
  ('Jose', 0.57),
  ('Francisco', 0.49),
  ('Carla', 0.39),
  ('Amanda', 0.0)],
 'Kevin': [('Amanda', 0.98),
  ('Phil', 0.94),
  ('Francisco', 0.79),
  ('Jose', 0.69),
  ('Megan', 0.59),
  ('Carla', 0.5)],
 'Monica': [('Amanda', 1.1),
  ('Phil', 0.85),
  ('Jose', 0.57),
  ('Carla', 0.49),
  ('Francisco', 0.42),
  ('Megan', 0.42)],
 'Rachel': [('Jose', 1.02),
  ('Amanda', 0.95),
  ('Phil', 0.74),
  ('Carla', 0.45),
  ('Megan', 0.35),
  ('Francisco', 0.28)],
 'Emily': [('Phil', 0.89),
  ('Jose', 0.63),
  ('Amanda', 0.63),
  ('Francisco', 0.49),
  ('Megan', 0.35),
  ('Carla', 0.2)],
 'William': [('Jose', 0.85),
  ('Amanda', 0.73),
  ('Phil', 0.59),
  ('Megan', 0.53),
  ('Francisco', 0.46),
  ('Carla', 0.39)]}

### Task 2: Based on the sorted list of potential matches pair every mentor with their best available mentee match.

In [None]:
#Your code here
#Tip: Use the "Stable Matching" Algorithm.