# Grouping script for VAMP

## 1. Set up

- Load the CSV file into a dataframe
    - Gender, Location, and Nationality are loaded as categorical
- Initialize variables
    - *teamNumberIndex* keeps track of which team is currently being built
    - *teamSize* is the number of students in a team
    - *df_teams* is the dataframe where the computed teams are saved

In [1]:
import math
import pandas as pd
import numpy as np

teamNumberIndex = 0
teamSize = 4

#Create a dataframe to store team results
df_teams = pd.DataFrame(columns=['first_name', 'last_name', 'Team'])

#Load the data
df = pd.read_csv('MOCK_DATA.csv', dtype={'Gender' : 'category', 'Location' : 'category', 'Nationality' : 'category'})

#Add a new column that gives each student a unique ID, used to select students and to remove them once they are in a team
df['studentID'] = np.arange(len(df))
#del df['id']
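
To illustrate what loading a column as categorical means, here is a minimal sketch. It assumes a hypothetical three-row sample in place of MOCK_DATA.csv (the names and rows are made up for illustration) and only shows that `read_csv` stores the listed columns with a `category` dtype.

```python
import io
import pandas as pd

# Hypothetical three-row sample standing in for MOCK_DATA.csv
sample_csv = io.StringIO(
    "first_name,Gender,Location\n"
    "Ana,Female,Center\n"
    "Ben,Male,South east\n"
    "Cai,Female,Center\n"
)

# Same dtype argument style as the notebook, restricted to two columns
sample = pd.read_csv(sample_csv, dtype={'Gender': 'category', 'Location': 'category'})

print(sample.dtypes)
# Each categorical column stores its distinct values once
print(sample['Location'].cat.categories.tolist())
```

Categorical columns keep the set of distinct values separately from the rows, which suits columns such as Location that have only a handful of possible values.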

## 2. Understand the data

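The cells below are commands used to understand the data. For categorical columns, `describe()` reports the count, the number of unique values, the most frequent value, and its frequency.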

In [2]:
df.Age.describe()

count    100.000000
mean      22.130000
std        2.376888
min       18.000000
25%       20.000000
50%       22.000000
75%       23.000000
max       29.000000
Name: Age, dtype: float64

In [3]:
df['Location'].value_counts()

South west    23
Center        22
South east    20
North west    20
North east    15
Name: Location, dtype: int64

In [4]:
df.Location.describe()

count            100
unique             5
top       South west
freq              23
Name: Location, dtype: object

In [5]:
df.Nationality.describe()

count       100
unique       40
top       China
freq         16
Name: Nationality, dtype: object

## 3. Search for a team
The following were the criteria for a team:

- Gender diversity
- Diversity in nationalities
- Students who live near each other should be grouped together

Getting students from the same location is easy in pandas. The operation keeps only the rows (students) of the dataframe that satisfy a boolean condition and drops the rest. In other words, it considers only the subset of students who are in the same location. The following Python line creates a new dataframe with all of the students who are in the same location.

*df_condition = df[df.Location == 'South east']*
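
As a small sketch of how this filtering works, the example below uses a hypothetical four-row dataframe (`toy`, with made-up names) rather than the real data. The comparison produces a boolean Series, and indexing with it keeps only the rows marked `True`.

```python
import pandas as pd

# Hypothetical toy data, not taken from MOCK_DATA.csv
toy = pd.DataFrame({
    'first_name': ['Ana', 'Ben', 'Cai', 'Dee'],
    'Location': ['Center', 'South east', 'Center', 'South east'],
})

# One True/False value per row
mask = toy.Location == 'South east'
# Keep only the rows where the mask is True
subset = toy[mask]

print(mask.tolist())          # [False, True, False, True]
print(subset.index.tolist())  # [1, 3]
```

The filtered dataframe keeps the original index labels, which is why rows selected from the subset can later be looked up in the full dataframe by label.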

The other two conditions are harder to satisfy. A randomized approach is taken: students within the same location are matched at random, and a user manually checks whether the resulting team is satisfactory. If not, re-run the cell and another set of students from the same location will be randomly matched.
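
The manual check could be supported by a short summary of the candidate team. The sketch below is one possible aid, not part of the original workflow: `summarize_team` is a hypothetical helper, and what counts as diverse enough is still left to the reviewer. The example rows reuse the attributes of the team shown in the output of this section.

```python
import pandas as pd

def summarize_team(team):
    """Count distinct values per criterion (hypothetical helper)."""
    return {
        'genders': team['Gender'].nunique(),
        'nationalities': team['Nationality'].nunique(),
        'locations': team['Location'].nunique(),
    }

# Attributes of the randomly drawn team printed in this section
example_team = pd.DataFrame({
    'Gender': ['Female', 'Male', 'Male', 'Male'],
    'Nationality': ['Kazakhstan', 'United States', 'Greece', 'Chile'],
    'Location': ['South east'] * 4,
})

summary = summarize_team(example_team)
print(summary)  # {'genders': 2, 'nationalities': 4, 'locations': 1}
```

A team with more than one gender, several nationalities, and a single location would match the three criteria listed above.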


In [6]:
print("There are {0} students left".format(len(df)))

#Only consider students from one location
df_condition = df[df.Location == 'South east']

#Select teamSize random students for a group from the new dataframe, which includes only students from South east
selectedStudents = np.random.choice(df_condition['studentID'].values, teamSize, replace=False)
#Copy the selected rows so that adding a column to the team later does not touch df
team = df.loc[selectedStudents].copy()
print("\n \n Random search gave the following potential team \n")
print(team)

There are 100 students left

 
 Random search gave the following potential team 

   first_name  last_name  Gender    Location    Nationality  Age  studentID
91   Jessalyn   Thickens  Female  South east     Kazakhstan   23         91
53      Sibyl     Ambrus    Male  South east  United States   22         53
40    Filippo  Llewellyn    Male  South east         Greece   25         40
65     Elliot     McGuff    Male  South east          Chile   20         65


## 4. Team creation

If the attributes of the team above look acceptable, then we:

- Add the team of students to our final dataframe containing all of our teams (*df_teams*)
- Remove the team of students from the dataframe containing all of the unteamed students



In [7]:
teamNumberIndex = teamNumberIndex + 1

#Remove students from the dataframe containing all of the unteamed students
df = df.drop(selectedStudents)

#Create a new column where we denote the team number
team['Team'] = [teamNumberIndex]*teamSize
#Remove unnecessary columns from the final team output
team = team.drop(['Gender', 'Nationality', 'Age', 'Location', 'studentID'], axis=1)

#Add the found team to our dataframe containing all of the teams
#(pd.concat is used because DataFrame.append is not available in pandas 2.0 and later)
df_teams = pd.concat([df_teams, team])

print("Teams found so far \n ")
print(df_teams)

Teams found so far 
 
   first_name  last_name Team
91   Jessalyn   Thickens    1
53      Sibyl     Ambrus    1
40    Filippo  Llewellyn    1
65     Elliot     McGuff    1


## 5. Output results 
- Output the results to an Excel file
- The Excel file is written to a folder called 'results' (the **results** folder is located in the same folder as the Jupyter notebook)

In [8]:
df_teams.to_excel("results/teams.xlsx")
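
The export fails if the **results** folder does not exist yet. The sketch below shows one way to create it first. It is an illustration under assumptions: the `teams` dataframe is a two-row stand-in, and it writes a CSV file (`teams.csv`, a hypothetical name) so that it runs without an Excel writer engine installed; the same folder handling would apply before calling `to_excel`.

```python
from pathlib import Path
import pandas as pd

# Two-row stand-in for df_teams
teams = pd.DataFrame({
    'first_name': ['Jessalyn', 'Sibyl'],
    'last_name': ['Thickens', 'Ambrus'],
    'Team': [1, 1],
})

# Create the output folder next to the notebook if it is missing
out_dir = Path('results')
out_dir.mkdir(exist_ok=True)

# CSV needs no extra dependency; the file name is an assumption
out_file = out_dir / 'teams.csv'
teams.to_csv(out_file, index=False)
print(out_file.exists())
```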

## 6. Conclusion
Re-run sections 3 and 4 until every student has been assigned to a team. If the remaining students cannot be split evenly into teams of *teamSize*, group those students manually.
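
To estimate in advance how many students will need manual grouping, the location counts from section 2 can be divided by the team size. This is a rough sketch that assumes every team is drawn from a single location, as in section 3; the variable names are illustrative.

```python
import pandas as pd

team_size = 4

# Location counts as reported by value_counts() in section 2
location_counts = pd.Series({
    'South west': 23,
    'Center': 22,
    'South east': 20,
    'North west': 20,
    'North east': 15,
})

# Full teams per location and students left over in each location
full_teams = location_counts // team_size
leftover = location_counts % team_size

print(pd.DataFrame({'full_teams': full_teams, 'leftover': leftover}))
print(int(full_teams.sum()), int(leftover.sum()))  # 23 8
```

Under this assumption, 23 single-location teams can be formed and 8 students remain, enough for two additional teams that mix locations.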