# QMUL Careers Team Python Script
### A detailed guide on how to run this programme.



Each text box describes what the code in the cell below it does.

The first cell imports some of the software that the programme needs. To run a cell, click the small play button above or press Shift and Enter.

In [75]:
import pandas as pd
import openpyxl
import numpy as np
import os
import ipywidgets as widgets

The programme needs an Excel workbook that contains a worksheet with the student application information. Please name that worksheet "data" and save the workbook on your computer.

The video below shows how to add this to the programme.

In [22]:
from IPython.display import Video, display

# Show the walkthrough video stored alongside this notebook.
vid = Video(filename="media/add_applications.mov", width=800, height=600)
display(vid)



Next, point the programme at the Excel workbook you just uploaded. The cell below picks up the first .xlsx file it finds in the current folder.

In [23]:
def get_sheet():
    for file in os.listdir('.'):
        if file.endswith(".xlsx"):
            return file

filename = get_sheet()
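
If no workbook is present, `get_sheet()` returns `None` and the failure only shows up later, when the spreadsheet is read. One possible stricter variant is sketched below. The helper name `find_workbook` is hypothetical and not part of the original script. It assumes you would rather stop early with a clear message, and that Excel's temporary lock files (names starting with `~$`) should be skipped.

```python
import os

def find_workbook(folder="."):
    """Return the name of an .xlsx file in `folder` (hypothetical helper)."""
    # Sort so the choice is predictable when several workbooks are present,
    # and skip Excel's temporary lock files, which start with "~$".
    candidates = sorted(
        f for f in os.listdir(folder)
        if f.endswith(".xlsx") and not f.startswith("~$")
    )
    if not candidates:
        raise FileNotFoundError(
            f"No .xlsx workbook found in {os.path.abspath(folder)}"
        )
    return candidates[0]
```

With this variant, `filename = find_workbook()` would replace the call to `get_sheet()`.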

Read the spreadsheet in, and perform some data cleaning.

In [26]:
df = pd.read_excel(filename, sheet_name="data")


Any missing scores will be replaced with the average score of the dataset.

In [27]:
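# Assign the result back instead of using inplace=True on a single column,
# which newer pandas versions warn about and may not apply to df.
df['Total score'] = df['Total score'].fillna(df['Total score'].mean())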
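
As a small worked example of this mean imputation, the sketch below uses a made-up four-row table in place of the real spreadsheet. The values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real spreadsheet (values are made up).
toy = pd.DataFrame({"Total score": [6.0, 8.0, np.nan, 7.0]})

# mean() ignores missing values: (6 + 8 + 7) / 3 = 7.0
mean_score = toy["Total score"].mean()

# Fill the gap with the mean and assign the result back to the column.
toy["Total score"] = toy["Total score"].fillna(mean_score)
print(toy["Total score"].tolist())  # [6.0, 8.0, 7.0, 7.0]
```

Because every missing score receives the same value, students with missing scores rank in the middle of the cohort when the groups are formed.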

The next two cells do some exploratory data analysis: they count the applicants by gender and by year of study.

In [104]:
df['Gender'].value_counts()

F    42
M    18
Name: Gender, dtype: int64

In [106]:
df['YearOfStudy'].value_counts()

2    32
3    20
1     5
4     3
Name: YearOfStudy, dtype: int64

Take only the columns important to establishing the groups. 

In [89]:
group_df = df[[
'email', 
'Course', 
'YearOfStudy',
'Gender', 
'Bursary holder', 
'Client meetings delivery', 
'Total score',
'Personality type']]


In [90]:
group_df = group_df.assign(groupno='')


Choose which variables the sort should prioritise.

In [99]:
the_vars = widgets.SelectMultiple(
    options=[
'Course', 
'YearOfStudy',
'Gender', 
'Bursary holder', 
'Client meetings delivery', 
'Personality type'],
    value=['Gender'],
    #rows=10,
    description='Scores',
    disabled=False
)
the_vars

SelectMultiple(description='Scores', index=(2,), options=('Course', 'YearOfStudy', 'Gender', 'Bursary holder',…

The script will always prioritise total score first and foremost.

In [101]:
to_sort = list(the_vars.value)
to_sort.insert(0,'Total score')
to_sort

['Total score', 'Gender']
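
To illustrate what this list of sort keys does, the sketch below sorts a made-up four-student table by `Total score` first and uses `Gender` only to break ties. The emails and scores are invented for the example, and `sort_keys` stands in for `to_sort`.

```python
import pandas as pd

# Toy applicants (made-up data) with two students tied on a score of 7.
toy = pd.DataFrame({
    "email": ["a@x", "b@x", "c@x", "d@x"],
    "Total score": [7, 9, 7, 8],
    "Gender": ["F", "M", "M", "F"],
})

sort_keys = ["Total score", "Gender"]  # same shape as to_sort

# Earlier keys take priority; later keys only order rows that tie on them.
ranked = toy.sort_values(by=sort_keys, ascending=False)
print(ranked["email"].tolist())  # ['b@x', 'd@x', 'c@x', 'a@x']
```

The two students with a score of 7 are ordered by `Gender` in descending alphabetical order, so "M" comes before "F".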


#### The Sorting Algorithm

In [108]:
def assign_group(df):
    # Rank students by the chosen variables, highest values first.
    df = df.sort_values(by=to_sort, ascending=False)
    n_groups = round(len(df) / 6)  # aim for about six students per group
    count = 1   # group number for the current student
    up = True   # True while counting up through the groups
    # Iterate over the data frame passed in, not the global group_df.
    for i in df.index:
        df.at[i, 'groupno'] = count
        if up:
            if count < n_groups:
                count += 1
            else:
                up = False  # top reached: the next student joins the same group
        else:
            if count > 1:
                count -= 1
            else:
                up = True   # bottom reached: the next student joins group 1 again
    return df

assigned = assign_group(group_df)
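
The loop above deals the ranked students out in a back-and-forth ("snake") order: group numbers count up to the last group, repeat it, count back down to 1, repeat that, and so on. This keeps the strongest and weakest applicants spread across the groups. The order is easier to see in isolation, as in the minimal sketch below. The function name `snake_groups` is hypothetical, and it works on rank positions only rather than on the data frame.

```python
def snake_groups(n_students, n_groups):
    """Group number for each rank position, dealt out in snake order."""
    groups = []
    count, up = 1, True
    for _ in range(n_students):
        groups.append(count)
        if up:
            if count < n_groups:
                count += 1
            else:
                up = False  # turn around at the last group
        else:
            if count > 1:
                count -= 1
            else:
                up = True   # turn around at group 1
    return groups

print(snake_groups(8, 3))  # [1, 2, 3, 3, 2, 1, 1, 2]
```

With 60 students and 10 groups, this pattern gives each group exactly six members.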

    

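The next three cells check how balanced the groups are: the average year of study and total score per group, then the number of students per group by year of study and by gender.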

In [109]:
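# numeric_only=True skips text columns such as email and Course,
# which newer pandas versions refuse to average.
assigned.groupby('groupno').mean(numeric_only=True)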

| groupno | YearOfStudy | Total score |
|---|---|---|
| 1 | 2.166667 | 7.0 |
| 2 | 2.833333 | 7.083333 |
| 3 | 2.5 | 7.25 |
| 4 | 2.5 | 7.416667 |
| 5 | 2.166667 | 7.333333 |
| 6 | 2.333333 | 7.25 |
| 7 | 2.5 | 7.25 |
| 8 | 2.166667 | 7.083333 |
| 9 | 2.166667 | 7.028736 |
| 10 | 2.166667 | 7.028736 |


In [110]:
assigned.pivot_table(index=['groupno'], columns=['YearOfStudy'], aggfunc='size', fill_value=0)


| groupno / YearOfStudy | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 0 | 5 | 1 | 0 |
| 2 | 0 | 2 | 3 | 1 |
| 3 | 0 | 4 | 1 | 1 |
| 4 | 0 | 3 | 3 | 0 |
| 5 | 1 | 3 | 2 | 0 |
| 6 | 0 | 4 | 2 | 0 |
| 7 | 0 | 3 | 3 | 0 |
| 8 | 2 | 2 | 1 | 1 |
| 9 | 1 | 3 | 2 | 0 |
| 10 | 1 | 3 | 2 | 0 |


In [111]:
assigned.pivot_table(index=['groupno'], columns=['Gender'], aggfunc='size', fill_value=0)


| groupno / Gender | F | M |
|---|---|---|
| 1 | 5 | 1 |
| 2 | 4 | 2 |
| 3 | 4 | 2 |
| 4 | 4 | 2 |
| 5 | 3 | 3 |
| 6 | 5 | 1 |
| 7 | 6 | 0 |
| 8 | 5 | 1 |
| 9 | 3 | 3 |
| 10 | 3 | 3 |


### Send the groups back into the original spreadsheet

In [112]:
subset = assigned[['email','groupno']]

final = pd.merge(df,subset, how='inner', on='email')
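
The merge above matches rows on `email`, so a duplicated email on either side would silently duplicate rows in `final`. One possible safeguard is pandas' `validate` argument, sketched below on made-up tables. The names `left`, `right` and `dupes` are illustrative only.

```python
import pandas as pd

# Toy stand-ins for df and subset (made-up data).
left = pd.DataFrame({"email": ["a@x", "b@x"], "Course": ["Maths", "Law"]})
right = pd.DataFrame({"email": ["a@x", "b@x"], "groupno": [1, 2]})

# validate="one_to_one" raises MergeError if an email repeats on either side.
merged = pd.merge(left, right, how="inner", on="email", validate="one_to_one")
assert len(merged) == len(left)  # no rows gained or lost

# The same merge with a repeated email is rejected instead of duplicating rows.
dupes = pd.DataFrame({"email": ["a@x", "a@x"], "groupno": [1, 2]})
try:
    pd.merge(left, dupes, how="inner", on="email", validate="one_to_one")
except pd.errors.MergeError as err:
    print("Duplicate emails detected:", err)
```

Whether to stop on duplicates or to clean them first depends on how the application data is collected.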



Write the result into a new sheet, "New Sheet Groups", in the same workbook, ready for download.

In [113]:
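# Append a sheet to the existing workbook. mode='a' keeps the other sheets,
# and if_sheet_exists='replace' lets this cell be re-run safely.
# The with-block saves and closes the file; writer.save() no longer exists in pandas 2.x.
with pd.ExcelWriter(filename, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
    final.to_excel(writer, sheet_name='New Sheet Groups')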