# Comparator computation

This notebook computes the Euclidean distance between each establishment and every other establishment of the same type (Academies, Maintained schools, SEN). Each establishment is weighted against every other establishment, and the top 60 form the comparator set for that establishment. This is repeated for every establishment within each establishment type, and finally across all establishments of all types.
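
As a rough illustration of the idea described above, the sketch below computes pairwise Euclidean distances over a small feature matrix and keeps the `k` closest rows for each row. The function name `top_k_nearest`, the toy feature values and `k=3` are assumptions made for illustration; the actual weighting lives in the `comparator_sets` module and may well differ.

```python
import numpy as np

def top_k_nearest(features, k):
    """Return, for each row, the indices of the k closest rows (Euclidean distance)."""
    features = np.asarray(features, dtype=float)
    # Pairwise differences via broadcasting: shape (n, n, d)
    diffs = features[:, None, :] - features[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=-1))
    # Sort each row by ascending distance and keep the first k indices.
    # Each row is at distance zero from itself, so it ranks first in its own set.
    return np.argsort(distances, axis=1, kind="stable")[:, :k]

# Hypothetical two-feature data for four establishments
toy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [10.0, 10.0]])
nearest = top_k_nearest(toy, k=3)
print(nearest)
```

In this sketch every row appears first in its own result, which matches the way the target school shows up in its own comparator set in the example outputs further down.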

In [7]:
import glob
import os
import pickle
import time
from pathlib import Path

import numpy as np
import pandas as pd

import comparator_sets as comparators

start_time = time.time()

# Create the output directory, then remove any files left over from a previous run
Path("output/comparator-sets").mkdir(parents=True, exist_ok=True)

files = glob.glob("output/comparator-sets/*")
for f in files:
    os.remove(f)

# Prepare Academy and School Data

Here we prepare the academy and maintained school data by filling in missing values in NumberOfPupils, % Free School Meals and % SEN. For now, missing values are replaced with the mean.
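
The mean-filling step can be sketched as follows. This is a minimal illustration only: `fill_with_mean` is a hypothetical helper, not the body of `comparators.prepare_data`, and the sample values are made up.

```python
import pandas as pd

def fill_with_mean(df, columns):
    """Return a copy of df with NaNs in the given columns replaced by that column's mean."""
    filled = df.copy()
    for col in columns:
        # Series.mean() skips NaN by default, so the mean uses only observed values
        filled[col] = filled[col].fillna(filled[col].mean())
    return filled

# Hypothetical sample with one missing value per column
sample = pd.DataFrame({
    "NumberOfPupils": [200.0, None, 400.0],
    "Percentage Free school meals": [10.0, 20.0, None],
})
filled = fill_with_mean(sample, ["NumberOfPupils", "Percentage Free school meals"])
print(filled)
```

Working on a copy leaves the input frame untouched, which makes it easier to compare the data before and after imputation.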

In [8]:
academy_data = comparators.prepare_data(pd.read_csv("output/pre-processing/academies.csv"))
ms_data = comparators.prepare_data(pd.read_csv("output/pre-processing/maintained_schools.csv", low_memory=False))
all_schools = pd.concat([academy_data, ms_data])

with open('output/comparator-sets/schools.pkl', 'wb') as school_file:
    # The with block closes the file on exit, so no explicit close() is needed
    pickle.dump(all_schools, school_file, protocol=pickle.HIGHEST_PROTOCOL)

# Compute the pupil and building comparators

This creates the comparator sets across both academies and maintained schools.

In [9]:
pupil_comparators = comparators.compute_comparator_matrix(all_schools, comparators.compute_pupils_comparator)
building_comparators = comparators.compute_comparator_matrix(all_schools, comparators.compute_buildings_comparator)

Save both sets of comparators to disk.

In [10]:
# The with blocks close each file on exit, so no explicit close() is needed
with open('output/comparator-sets/pupil_comparators.pkl', 'wb') as pupil_file:
    pickle.dump(pupil_comparators, pupil_file, protocol=pickle.HIGHEST_PROTOCOL)

with open('output/comparator-sets/building_comparators.pkl', 'wb') as build_file:
    pickle.dump(building_comparators, build_file, protocol=pickle.HIGHEST_PROTOCOL)
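
For completeness, a saved pickle can be read back with `pickle.load`. The sketch below round-trips a toy object through a temporary file; the helper names `save_pickle` and `load_pickle`, the file name and the toy data are all assumptions for illustration, not part of the notebook.

```python
import pickle
import tempfile
from pathlib import Path

def save_pickle(obj, path):
    """Write obj to path using the highest available pickle protocol."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)

def load_pickle(path):
    """Read a pickled object back from path."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

# Round trip through a temporary directory so nothing is left on disk
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.pkl"  # hypothetical file name
    example = {"a": [1, 2, 3], "b": (4.5, "text")}
    save_pickle(example, target)
    restored = load_pickle(target)

print(restored == example)
```

Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.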

Below is an example of extracting a school by name, to show how the data structures work.

In [11]:
target_school = 'Glebe Primary School'

comparator_set = comparators.get_comparator_set_by(lambda s: s['EstablishmentName'] == target_school, all_schools, pupil_comparators)
comparator_set[['URN']]

Unnamed: 0,URN
0,145110
3,117156
6,136967
7,117400
16,142036
1,118393
2,101601
4,134681
5,132251
8,105766


# Example using a custom comparator set

The example below selects a set of URNs using a defined filter (here, PFI schools plus the target school) and computes a comparator set within that selection.

In [12]:
# Specify custom selection criteria for schools.
target_urn = 145110
custom_comparator_schools = all_schools[(all_schools['PFI School'] == 'PFI School') | (all_schools.index == target_urn)]
custom_comparators = comparators.compute_custom_comparator('PFI Comparator', custom_comparator_schools, comparators.compute_pupils_comparator)
cust_set = comparators.get_comparator_set_by(lambda s: s.index == target_urn, all_schools, custom_comparators, is_custom=True, comparator_key='PFI Comparator')

cust_set[['URN', 'Percentage SEN', 'Percentage Free school meals']]

Unnamed: 0,URN,Percentage SEN,Percentage Free school meals
0,145110,5.164319,18.7
58,145856,0.613497,29.4
59,139491,1.86722,33.0
1,140242,2.347418,17.6
2,140362,2.449889,20.9
3,138160,2.163462,16.1
4,142807,1.398601,16.1
5,137012,3.067485,16.0
6,139113,1.923077,21.2
7,148019,3.537736,22.8
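
The selection step in the cell above combines two boolean masks with `|`, so a row is kept if it matches the filter or is the target school. A minimal sketch on toy data, assuming (as the cell does) that the frame is indexed by URN; the URNs and the `Non-PFI School` label are made up for illustration:

```python
import pandas as pd

# Hypothetical frame indexed by URN
toy = pd.DataFrame(
    {"PFI School": ["PFI School", "Non-PFI School", "PFI School", "Non-PFI School"]},
    index=pd.Index([101, 102, 103, 104], name="URN"),
)
target = 104

is_pfi = toy["PFI School"] == "PFI School"  # rows matching the custom filter
is_target = toy.index == target             # the target row, kept even if it fails the filter
selected = toy[is_pfi | is_target]
print(selected.index.tolist())
```

Including the target explicitly means it always appears in the custom set, even when it does not satisfy the filter itself.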


### Timing (keep at the bottom)

In [13]:
print(f'Processing Time: {time.time() - start_time} seconds')

Processing Time: 51.56545090675354 seconds
