# Pairing script for VAMP

## 1. Set up

- Load the CSV file into a dataframe
    - Gender, Location, and Nationality are loaded as categorical
    - The cell below applies only when the Python script is run locally and not in Google Colab. See section 1.1
- Initialize variables
    - *teamNumberIndex* keeps track of which team is currently being built
    - *teamSize* is the number of students in a team
    - *df_teams* is the dataframe where the computed teams are saved.

In [1]:
import math
import pandas as pd
import numpy as np

#The following 3 lines deal with displaying the results neatly
pd.options.display.max_columns = None
pd.options.display.max_rows = None
pd.options.display.width = None 

#Initialize variables
teamNumberIndex = 0
teamSize = 4

#Load the data (this line will fail if the script is run on Google Colaboratory, see next section)
df = pd.read_csv('MOCK_DATA.csv', dtype={'Gender' : 'category', 'Location' : 'category', 'Nationality' : 'category'})

#Create a dataframe to store team results
df_teams = pd.DataFrame(columns=['first_name', 'last_name', 'Team'])

#Add a new column with a unique ID per student; it is used to select students and remove them once they are in a team
df['studentID'] = np.arange(len(df))
#del df['id']

### 1.1 Uploading files to Google Colab
The previous cell loads the data from a local file only. If the script is run on Google Colab, the file must first be uploaded and then loaded into a dataframe.


In [None]:
from google.colab import files
#Upload the file
uploaded = files.upload()

In [None]:
import io
#Load the uploaded file into a dataframe
df = pd.read_csv(io.BytesIO(uploaded['MOCK_DATA.csv']), dtype={'Gender' : 'category', 'Location' : 'category', 'Nationality' : 'category'})
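The two loading paths above could be folded into one helper. The following is only a sketch: the function name `load_students` and the constant `CATEGORICAL` are hypothetical and not part of the original notebook. It assumes that `google.colab` can be imported only inside Colab, so a failed import is treated as a local run.

```python
import io
import pandas as pd

# Columns the notebook reads as categorical
CATEGORICAL = {'Gender': 'category', 'Location': 'category', 'Nationality': 'category'}

def load_students(source='MOCK_DATA.csv'):
    """Hypothetical helper: load the CSV locally, or via an upload dialog in Colab.

    `source` is a path (or file-like object) locally, and the uploaded
    file name inside Colab.
    """
    try:
        from google.colab import files  # assumed importable only inside Colab
    except ImportError:
        # Local run: read straight from the path or file-like object
        return pd.read_csv(source, dtype=CATEGORICAL)
    uploaded = files.upload()  # opens the browser upload dialog
    return pd.read_csv(io.BytesIO(uploaded[source]), dtype=CATEGORICAL)
```

With this helper, `df = load_students()` would replace both loading cells, at the cost of hiding the upload step inside a function.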

## 2. Understand the data

The cells below are commands used to understand the data. For cate

In [2]:
df.Age.describe()

count    100.000000
mean      22.130000
std        2.376888
min       18.000000
25%       20.000000
50%       22.000000
75%       23.000000
max       29.000000
Name: Age, dtype: float64

In [3]:
df['Location'].value_counts()

South west    23
Center        22
South east    20
North west    20
North east    15
Name: Location, dtype: int64

In [4]:
df.Location.describe()

count            100
unique             5
top       South west
freq              23
Name: Location, dtype: object

In [5]:
df.Nationality.describe()

count       100
unique       40
top       China
freq         16
Name: Nationality, dtype: object

## 3. Search for a team
The following were the criteria for a team:

- Gender diversity
- Diversity in nationalities
- Students who live near each other should be paired up

Getting students from the same location is easy in pandas. The operation keeps only the rows (students) of the dataframe that satisfy a boolean condition. In other words, it considers only the subset of students who are in the same location. The following Python line creates a new dataframe with all of the students who are in the same location.

*df_condition = df[df.Location == 'South east']*
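To make the filtering step concrete, here is a small sketch on made-up data (the names in `toy` are invented for illustration). It shows that the comparison produces a boolean Series and that the filtered dataframe keeps the original index labels.

```python
import pandas as pd

# Made-up rows for illustration only
toy = pd.DataFrame({
    'first_name': ['Ana', 'Ben', 'Cho', 'Dee'],
    'Location': ['South east', 'Center', 'South east', 'North west'],
})

mask = toy.Location == 'South east'   # boolean Series, one value per row
subset = toy[mask]                    # keeps only the rows where mask is True

print(mask.tolist())           # [True, False, True, False]
print(subset.index.tolist())   # [0, 2]
```

The preserved index labels matter here, because the notebook later selects and drops students by label.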

The other two conditions are harder to satisfy. A randomized approach is taken: students within the same location are matched at random, and the user manually checks whether the resulting team is satisfactory. If not, re-run the cell and another set of students from the same location will be randomly matched.
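The manual check could be partly supported by a small helper. The following is a sketch under assumptions: the function `looks_diverse` and its thresholds are hypothetical and not part of the original notebook, and the `Gender` and `Nationality` values are copied from the sample team printed later in this notebook.

```python
import pandas as pd

def looks_diverse(team, min_nationalities=2, min_genders=2):
    """Hypothetical check: enough distinct nationalities and genders in the team."""
    return (team['Nationality'].nunique() >= min_nationalities
            and team['Gender'].nunique() >= min_genders)

# Gender and Nationality of the sample team shown in section 3
toy_team = pd.DataFrame({
    'Gender': ['Female', 'Female', 'Female', 'Male'],
    'Nationality': ['Kazakhstan', 'China', 'China', 'China'],
})

print(looks_diverse(toy_team))                       # True with the default thresholds
print(looks_diverse(toy_team, min_nationalities=3))  # False, only two nationalities
```

The thresholds are a judgment call; the helper only flags candidates and does not replace looking at the team.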


In [6]:


print("There are {0} students left".format(len(df)))

#Keep only the students located in South east
df_condition = df[df.Location == 'South east']

#Select teamSize random students from the new dataframe, which includes only students from South east
selectedStudents = np.random.choice(df_condition['studentID'].values, teamSize, replace = False)
team = df.loc[selectedStudents]
print("\n \n Random search gave the following potential team \n")
print(team)

There are 100 students left

 
 Random search gave the following potential team 

     first_name last_name  Gender    Location Nationality  Age  studentID
64  Mariejeanne  Lindmark  Female  South east  Kazakhstan   22         64
46       Lynett   Postans  Female  South east       China   24         46
21         Mari      Jory  Female  South east       China   22         21
20         Boyd     Ingle    Male  South east       China   23         20
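One edge case is worth noting: when fewer than *teamSize* students remain in a location, `np.random.choice` with `replace = False` raises a `ValueError`. A guarded version of the selection step might look like the sketch below. The function name `pick_team` is hypothetical, and it selects by index label rather than by the `studentID` column, which is equivalent in this notebook because both hold the same values.

```python
import numpy as np
import pandas as pd

def pick_team(df, location, team_size, rng=None):
    """Return team_size random rows from one location, or None if too few remain."""
    if rng is None:
        rng = np.random.default_rng()
    candidates = df[df.Location == location]
    if len(candidates) < team_size:
        return None  # not enough students left in this location
    chosen = rng.choice(candidates.index.to_numpy(), size=team_size, replace=False)
    return df.loc[chosen]
```

A `None` result signals that the remaining students of that location have to be grouped manually or merged with another location.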


## 4. Team creation

If the team above looks acceptable, the next cell:

- Adds the team of students to the final dataframe containing all of the teams (*df_teams*)
- Removes the team of students from the dataframe that holds all of the unteamed students



In [7]:
teamNumberIndex = teamNumberIndex + 1

#Remove students from the dataframe containing all of the unteamed students
df = df.drop(selectedStudents)

#Create a new column where we denote the team number
team['Team'] = [teamNumberIndex]*teamSize
#Remove unnecessary columns from the final team output
#team = team.drop(['Gender', 'Nationality', 'Age', 'Location', 'studentID'],axis=1)


#Add the found team to our dataframe containing all of the teams
#(pd.concat is used because DataFrame.append was removed in pandas 2.0)
df_teams = pd.concat([df_teams, team])

print("Teams found so far \n ")
print(df_teams)

print("\n \n unteamed students \n ")
print(df)

Teams found so far 
 
     first_name last_name Team  Gender    Location Nationality   Age  \
64  Mariejeanne  Lindmark    1  Female  South east  Kazakhstan  22.0   
46       Lynett   Postans    1  Female  South east       China  24.0   
21         Mari      Jory    1  Female  South east       China  22.0   
20         Boyd     Ingle    1    Male  South east       China  23.0   

    studentID  
64       64.0  
46       46.0  
21       21.0  
20       20.0  

 
 unteamed students 
 
    first_name   last_name  Gender    Location  \
0        Ellis      Mossop    Male      Center   
1        Neale     Millins    Male  South west   
2       Leelah    Baudains  Female  North west   
3      Mattias      Grimme    Male  North west   
4       Lucila     Wornham  Female  South east   
5     Christen       Scare  Female  North west   
6      Jsandye    Dwelling  Female  North west   
7       Tracey      Downey  Female  North east   
8        Lukas      Casari    Male      Center   
9      Doral
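The bookkeeping of this section can also be written as a function that returns new dataframes instead of modifying the notebook's globals. This is a sketch only: `commit_team` is a hypothetical name. It skips empty frames before concatenating, which is an assumption made here to avoid the integer columns being turned into floats (as seen in the `Age` and `studentID` columns of the output above).

```python
import pandas as pd

def commit_team(df, df_teams, team, team_number):
    """Return (remaining students, updated teams) after accepting `team`."""
    team = team.assign(Team=team_number)   # copy of the team with a Team column
    remaining = df.drop(team.index)        # remove the accepted students
    # Skip empty frames so the first concat does not alter column dtypes
    frames = [f for f in (df_teams, team) if not f.empty]
    return remaining, pd.concat(frames)
```

A call such as `df, df_teams = commit_team(df, df_teams, team, teamNumberIndex)` would then replace the body of the cell.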

## 5. Output results 
- Output results to an Excel file
- The Excel file is written to a folder called 'results' (the **results** folder is located in the same folder as the Jupyter notebook)

In [8]:
#The .xlsx extension is used because recent pandas versions can no longer write .xls files
df_teams.to_excel("results/teams.xlsx")
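The export fails if the **results** folder does not exist yet, or if no Excel engine is installed. A more defensive variant is sketched below. The function `save_teams` is hypothetical, and the CSV fallback is an assumption added for illustration, not part of the original workflow.

```python
from pathlib import Path
import pandas as pd

def save_teams(df_teams, folder='results'):
    """Write the teams into `folder`, creating it if needed; return the file path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / 'teams.xlsx'
    try:
        df_teams.to_excel(target)   # needs an Excel engine such as openpyxl
    except ImportError:
        # Assumed fallback when no Excel engine is installed
        target = out_dir / 'teams.csv'
        df_teams.to_csv(target)
    return target
```

The returned path tells the caller which of the two formats was actually written.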

## 6. Conclusion
Re-run sections 3 and 4 until all students have been assigned to a team. If the remaining unteamed students cannot fill a complete team, group them manually.
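For comparison, the repeated manual loop could be approximated by a fully automatic pass. The sketch below is not the method used in this notebook: it only enforces the location criterion and skips the manual diversity check. The function name `random_teams_by_location` and the `seed` parameter are hypothetical.

```python
import numpy as np
import pandas as pd

def random_teams_by_location(df, team_size, seed=None):
    """Shuffle students within each location and cut them into full teams.

    Returns (teams, leftovers); leftovers are students who did not fill a team.
    """
    rng = np.random.default_rng(seed)
    teams, leftovers = [], []
    team_number = 0
    for _, group in df.groupby('Location', observed=True):
        shuffled = group.loc[rng.permutation(group.index.to_numpy())]
        n_full = len(shuffled) // team_size * team_size  # students that fit full teams
        for start in range(0, n_full, team_size):
            team_number += 1
            teams.append(shuffled.iloc[start:start + team_size].assign(Team=team_number))
        leftovers.append(shuffled.iloc[n_full:])
    teams_df = pd.concat(teams) if teams else df.iloc[0:0]
    leftovers_df = pd.concat(leftovers) if leftovers else df.iloc[0:0]
    return teams_df, leftovers_df
```

The leftovers dataframe corresponds to the students who would be grouped manually at the end.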
