# Comparator computation

This notebook computes the Euclidean distance of each establishment from a base establishment of a consistent type (academies, maintained schools, SEN). Each establishment is given a weight against every other establishment, and the top 60 for each establishment form its comparator set. This is repeated for every establishment within each establishment type, and then finally across all establishments of all types.
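
As a rough illustration of the idea (not the `comparator_sets` implementation), the sketch below computes pairwise Euclidean distances over a small feature matrix with NumPy and picks the `k` nearest rows for each row. The function name `nearest_k`, the feature values and `k=3` are hypothetical; the notebook itself keeps the top 60. Any feature scaling or weighting the real module applies is not reproduced here.

```python
import numpy as np

def nearest_k(features: np.ndarray, k: int) -> np.ndarray:
    """Return, for each row, the indices of the k closest rows by Euclidean distance."""
    # Pairwise differences via broadcasting: shape (n, n, d)
    diff = features[:, None, :] - features[None, :, :]
    # Euclidean distance matrix: shape (n, n)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row by ascending distance; a row is at distance 0 from itself
    order = np.argsort(dist, axis=1, kind="stable")
    return order[:, :k]

# Hypothetical, already-scaled features per establishment: (pupils, % FSM, % SEN)
features = np.array([
    [0.50, 0.20, 0.05],
    [0.52, 0.22, 0.04],
    [0.90, 0.60, 0.30],
    [0.45, 0.15, 0.08],
])
neighbours = nearest_k(features, k=3)
```

Each row of `neighbours` lists the row positions of the closest establishments, starting with the establishment itself when there are no exact duplicates.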

In [1]:
import glob
import os
from pathlib import Path

import numpy as np
import pandas as pd

import comparator_sets as comparators

# Create the output directory if needed, then clear files left over from a previous run
Path("output/comparator-sets").mkdir(parents=True, exist_ok=True)

files = glob.glob("output/comparator-sets/*")
for f in files:
    os.remove(f)

# Prepare Academy and School Data

Here we prepare the academy and maintained school data by filling in missing values in NumberOfPupils, % Free School Meals and % SEN with the column mean (for now).

In [2]:
academy_data = comparators.prepare_data(pd.read_csv("output/pre-processing/academies.csv"))
ms_data = comparators.prepare_data(pd.read_csv("output/pre-processing/maintained_schools.csv", low_memory=False))
all_schools = pd.concat([academy_data, ms_data])
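
The mean imputation described for this step can be sketched as follows. This is a minimal illustration and not the body of `comparators.prepare_data`, which is not shown in this notebook; the helper `fill_with_mean` and the column names are placeholders modelled on the description.

```python
import numpy as np
import pandas as pd

# Placeholder column names; the real CSV headers may differ
FILL_COLUMNS = ["NumberOfPupils", "% Free School Meals", "% SEN"]

def fill_with_mean(df: pd.DataFrame, columns=FILL_COLUMNS) -> pd.DataFrame:
    """Return a copy of df with NaNs in the given columns replaced by the column mean."""
    out = df.copy()
    for col in columns:
        # Series.mean() skips NaN by default, so the mean uses observed values only
        out[col] = out[col].fillna(out[col].mean())
    return out

# Made-up sample with one missing value per column
sample = pd.DataFrame({
    "NumberOfPupils": [200.0, np.nan, 400.0],
    "% Free School Meals": [10.0, 20.0, np.nan],
    "% SEN": [np.nan, 2.0, 4.0],
})
filled = fill_with_mean(sample)
```

Working on a copy leaves the input frame untouched, which makes it easier to compare the data before and after imputation.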

# Compute the pupil and building comparators

This creates the comparator sets across both academies and maintained schools.

In [3]:
pupil_comparators = comparators.compute_comparator_matrix(all_schools, comparators.compute_pupils_comparator)
building_comparators = comparators.compute_comparator_matrix(all_schools, comparators.compute_buildings_comparator)

Save the pupil and building comparators to disk.

In [4]:
import pickle
with open('output/comparator-sets/pupil_comparators.pkl', 'wb') as outfile:
    pickle.dump(pupil_comparators, outfile, protocol=pickle.HIGHEST_PROTOCOL)

with open('output/comparator-sets/building_comparators.pkl', 'wb') as outfile:
    pickle.dump(building_comparators, outfile, protocol=pickle.HIGHEST_PROTOCOL)
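
A later notebook could read the saved files back with `pickle.load`. The sketch below shows the round trip using a stand-in dictionary in a temporary directory; the helper `load_comparators` and the contents of `demo` are hypothetical, and the real comparator objects may have a different structure.

```python
import pickle
import tempfile
from pathlib import Path

def load_comparators(path):
    """Load a pickled comparator object from disk."""
    with open(path, "rb") as infile:
        return pickle.load(infile)

# Round trip with a stand-in object (values are illustrative only)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_comparators.pkl"
    demo = {"urns": [145110, 117156], "distances": [0.0, 0.03]}
    with open(demo_path, "wb") as outfile:
        pickle.dump(demo, outfile, protocol=pickle.HIGHEST_PROTOCOL)
    restored = load_comparators(demo_path)
```

With the files written above, the call would look like `load_comparators('output/comparator-sets/pupil_comparators.pkl')`. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.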

Below is an example of extracting a school by name to show how the data structures work:

In [5]:
target_school = 'Glebe Primary School'

comparator_set = comparators.get_comparator_set_by(lambda s: s['EstablishmentName'] == target_school, all_schools, pupil_comparators)
comparator_set[['URN']]

Unnamed: 0,URN
0,145110
3,117156
6,136967
7,117400
16,142036
1,118393
2,101601
4,134681
5,132251
8,105766


# Example using a custom comparator set

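The example below selects a set of URNs based on a defined filter (here, PFI schools plus the target school), computes a custom comparator over that subset, and then retrieves the comparator set for the target URN.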

In [7]:
# Specify custom selection criteria for schools.
target_urn = 145110
custom_comparator_schools = all_schools[(all_schools['PFI School'] == 'PFI School') | (all_schools.index == target_urn)]
custom_comparators = comparators.compute_custom_comparator('PFI Comparator', custom_comparator_schools, comparators.compute_pupils_comparator)
cust_set = comparators.get_comparator_set_by(lambda s: s.index == target_urn, all_schools, custom_comparators, is_custom=True, comparator_key='PFI Comparator')

cust_set[['URN', 'Percentage SEN', 'Percentage Free school meals']]

Unnamed: 0,URN,Percentage SEN,Percentage Free school meals
0,145110,5.164319,18.7
58,145856,0.613497,29.4
59,139491,1.86722,33.0
1,140242,2.347418,17.6
2,140362,2.449889,20.9
3,138160,2.163462,16.1
4,142807,1.398601,16.1
5,137012,3.067485,16.0
6,139113,1.923077,21.2
7,148019,3.537736,22.8


# Computing RAG

RAG 

Unnamed: 0,URN,Company Registration Number,Academy Trust UPIN,Academy UKPRN,Academy UPIN,Valid to,Date left or closed if in period,Number of Academies in Trust,Number of pupils,LA (code),...,"Total Expenditure E01 to E29 and E31 to E32 minus I9, I10, I16 and I17",Total Expenditure E01 to E32,Revenue Reserve B01 plus B02 plus B06,In-year Balance Total Income (I01 to I18) minus Total Expenditure (E01 to E32),% of pupils known to be eligible for free school meals (Performa,School Balance,School Financial Position,Partial Years Present,Did Not Submit,Distance
count,30.0,30.0,30.0,30.0,30.0,0.0,0.0,30.0,30.0,30.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,30.0
mean,141274.666667,8041527.0,136419.5,10058240.0,129371.633333,,,16.033333,372.266667,780.4,...,,,,,,,,,,0.047423
std,2692.629907,950567.2,1253.082537,5380.045,8840.358334,,,13.220891,86.725064,234.310522,...,,,,,,,,,,0.025316
min,136849.0,6182612.0,134889.0,10039240.0,119849.0,,,1.0,167.0,320.0,...,,,,,,,,,,0.0
25%,139207.5,7695641.0,135294.25,10058260.0,123050.5,,,6.0,330.5,861.0,...,,,,,,,,,,0.032868
50%,140703.0,8033790.0,135947.0,10059360.0,128684.5,,,9.0,415.5,879.5,...,,,,,,,,,,0.043422
75%,142752.5,8628471.0,137509.5,10060550.0,132196.5,,,29.0,431.5,941.0,...,,,,,,,,,,0.06119
max,148019.0,10482810.0,139699.0,10064190.0,162519.0,,,44.0,459.0,941.0,...,,,,,,,,,,0.11349
