# QMUL Careers Team Python Script
This is a detailed guide on how to run this programme.


## Excel File Housekeeping



We need the Excel workbook that the programme will use and, within it, the worksheet containing the student information.

Please name the file <b>'applications.xlsx'</b>, put the application information on a sheet called <b>data</b>, and save it on your computer.


It is really important that the column headers used for sorting are named correctly. Before you upload the spreadsheet, please ensure its columns match the headers below; you can paste them into the spreadsheet exactly as they appear:

student number <br>
year of study<br>
gender<br>
faculty<br>
total <i> (referring to total score) </i><br>
course<br>
personality type<br>

The video below shows how to add the workbook to the programme.

Each text box describes what the code in the cell below it does.

The first cell imports some of the software that the programme needs. Click the small play button above, or press Shift and Enter, to run the cell.

In [None]:
import pandas as pd
import openpyxl
import numpy as np
import os
import ipywidgets

In [None]:
from IPython.display import Video; from ipywidgets import interactive, IntSlider

vid = Video(filename="media/add_applications.mov",data="", width=800, height = 600)
display(vid)

Next, point the programme at the Excel file you just uploaded.

In [None]:
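def get_sheet():
    # Look in the current folder for the uploaded workbook. The name is matched
    # exactly, because endswith("applications.xlsx") would also pick up the
    # output file 'grouped_student_applications.xlsx' on a second run.
    myfile = None
    for file in os.listdir('.'):
        if file == "applications.xlsx":
            myfile = file
            print(f'{file} has been selected.')
    if myfile is None:
        print('No file has been uploaded')
    return myfile

filename = get_sheet()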

Read the spreadsheet in, and perform some data cleaning.

In [None]:
df = pd.read_excel(filename, sheet_name="data")
# Lower-case every column header so it matches the names in `criteria`.
df = df.rename(columns=str.lower)

criteria = [ 
'student number',
'year of study',
'gender',
'faculty',
'total',
'course',
'personality type']
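
Because the later cells look columns up by exact name, it can help to check the headers before going further. The sketch below is one possible check and is not part of the original programme: `missing_columns` is a hypothetical helper, and the example header list is invented for illustration.

```python
def missing_columns(columns, required):
    """Return the required headers that are absent, ignoring upper/lower case."""
    present = {str(c).lower() for c in columns}
    return [name for name in required if name not in present]

# Hypothetical headers, as they might appear in a spreadsheet.
example_headers = ['Student Number', 'Year of Study', 'Gender', 'Faculty', 'Total']
required = ['student number', 'year of study', 'gender', 'faculty',
            'total', 'course', 'personality type']

print(missing_columns(example_headers, required))
```

In the notebook this could be called as `missing_columns(df.columns, criteria)`; an empty list would mean every expected header was found.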

Please use the list below to select the criteria for sorting the groups. Hold Ctrl (or Cmd on a Mac) while clicking to select more than one.

In [None]:
mywidge1 = ipywidgets.widgets.SelectMultiple(
    options=criteria,
    value=['total'],
    rows=len(criteria),
    disabled=False
)

class SelectMultipleInteract(ipywidgets.widgets.HBox):

    def __init__(self):
        self.W1 = ipywidgets.widgets.SelectMultiple(
        options=criteria,
        value=['total'],
        rows=len(criteria),
        disabled=False
)

        self.selectors = [self.W1]
        super().__init__(children=self.selectors)
        self._set_observes()

    def _set_observes(self):
        for widg in self.selectors:
            widg.observe(self._observed_function, names='value')

    def _observed_function(self, widg):
        for widg in self.selectors:
            # print(widg.get_interact_value())
            return list(widg.get_interact_value())

mywidge = SelectMultipleInteract()
mywidge

In [None]:
# Read the current selection from the widget above.
options = list(mywidge.W1.value)
# 'total' always comes first; any other selected criteria follow it.
for_sorting = ['total'] + [x for x in options if x != 'total']
print(f'The algorithm will sort using {for_sorting} from left to right')
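
Sorting "from left to right" means the first column decides the order, and each later column only breaks ties in the columns before it. A small illustration with invented data (the student numbers, scores, and years are made up):

```python
import pandas as pd

toy = pd.DataFrame({
    'student number': [101, 102, 103, 104],
    'total': [70, 85, 70, 85],
    'year of study': [1, 2, 3, 1],
})

# 'total' is compared first; 'year of study' only breaks ties in 'total'.
ranked = toy.sort_values(by=['total', 'year of study'], ascending=False)
print(ranked['student number'].tolist())
```

Students 102 and 104 share the top score, so the year of study decides which of them comes first.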

Any missing scores will be replaced with the average score of the dataset.

In [None]:
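# Assign the result back: calling fillna(..., inplace=True) on a selected column
# is deprecated in recent pandas versions and may not update df.
df['total'] = df['total'].fillna(df['total'].mean())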
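
As a minimal illustration of this step, using an invented series of scores: the average is taken over the scores that are present, and that value is written into the gaps.

```python
import pandas as pd

scores = pd.Series([60.0, None, 80.0, 70.0], name='total')

# mean() skips missing values by default, so the average here is 70.0.
filled = scores.fillna(scores.mean())
print(filled.tolist())
```

Filling with the mean leaves the overall average unchanged, but it places students with no score in the middle of the ranking, which is worth keeping in mind when reading the groups.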

The next two cells do some exploratory data analysis: they count the students by gender and by year of study.

In [None]:
df['gender'].value_counts()

In [None]:
df['year of study'].value_counts()

Take only the columns important to establishing the groups. 

In [None]:
group_df = df[criteria]

In [None]:
group_df = group_df.assign(groupno='')



## The Sorting Algorithm

In [None]:
def assign_group(df):
    # Rank students by the chosen criteria, highest first.
    df = df.sort_values(by=for_sorting, ascending=False)
    # Aim for about five students per group, with at least one group.
    n_groups = max(1, round(len(df) / 5))
    # Deal students out in "snake" order: 1..n, then n..1, and repeat,
    # so each group receives a mix of high and low ranks.
    cycle = list(range(1, n_groups + 1)) + list(range(n_groups, 0, -1))
    df['groupno'] = [cycle[pos % len(cycle)] for pos in range(len(df))]
    return df

assigned = assign_group(group_df)
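
Read from the code, the function above deals the ranked students into groups in a back-and-forth ("snake") order: 1, 2, ..., n, then n, ..., 2, 1, and so on, with roughly one group per five students. The sketch below shows that ordering on its own, in plain Python; `snake_groups` is a hypothetical helper and the scores are invented.

```python
def snake_groups(n_items, n_groups):
    """Group numbers for a ranked list: 1..n_groups, then n_groups..1, repeated."""
    cycle = list(range(1, n_groups + 1)) + list(range(n_groups, 0, -1))
    return [cycle[i % len(cycle)] for i in range(n_items)]

# Ten made-up scores, already sorted from highest to lowest.
scores = [95, 90, 85, 80, 75, 70, 65, 60, 55, 50]
groups = snake_groups(len(scores), n_groups=2)

# Add up the scores that land in each group.
totals = {}
for score, group in zip(scores, groups):
    totals[group] = totals.get(group, 0) + score

print(groups)
print(totals)
```

Reversing direction at each end keeps the group totals close together (365 and 360 here), whereas always dealing 1, 2, 1, 2 would give the first group the higher score in every pair.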

    

## Data Analysis

In [None]:
assigned.head()

In [None]:
assigned.groupby(['groupno'])[['total']].mean().round(2).min()

In [None]:
assigned.pivot_table(index=['groupno'], columns=['year of study'], aggfunc='size', fill_value=0)


In [None]:
assigned.pivot_table(index=['groupno'], columns=['gender'], aggfunc='size', fill_value=0)


### Create a new spreadsheet to download

In [None]:
subset = assigned[['student number','groupno']]

final = pd.merge(df,subset, how='inner', on='student number')
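
An inner merge on `student number` repeats rows if a student number appears more than once on either side. One way to catch this, sketched here on invented data, is the `validate` argument of `pd.merge`, which raises an error when the key is not unique as declared.

```python
import pandas as pd

left = pd.DataFrame({'student number': [1, 2, 2], 'course': ['A', 'B', 'B']})
right = pd.DataFrame({'student number': [1, 2], 'groupno': [1, 2]})

try:
    pd.merge(left, right, how='inner', on='student number', validate='one_to_one')
    duplicate_found = False
except pd.errors.MergeError:
    # Raised because student number 2 appears twice in `left`.
    duplicate_found = True

print(duplicate_found)
```

Whether duplicate student numbers should be treated as an error depends on how the applications were collected, so this is a suggestion rather than a required step.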



In [None]:
final.to_excel("grouped_student_applications.xlsx",sheet_name='Grouped Students') 

In [None]:
from IPython.display import Video; from ipywidgets import interactive, IntSlider

vid = Video(filename="media/download_file.mov",data="", width=800, height = 600)
display(vid)

## Make sure you download your new spreadsheet. You can copy and paste it into your previous working document.