In [1]:
from pathlib import Path
import sys  

# Get my_package directory path from Notebook
parent_dir = str(Path().resolve().parents[0])
print(parent_dir)
# Add to sys.path if it is not already present
if parent_dir not in sys.path:
    sys.path.insert(0, parent_dir)

print(sys.path)

/Users/colinbull/appdev/dfe/sfb/education-benchmarking-and-insights/data-pipeline
['/Users/colinbull/appdev/dfe/sfb/education-benchmarking-and-insights/data-pipeline', '/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python312.zip', '/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12', '/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/lib-dynload', '', '/Users/colinbull/Library/Caches/pypoetry/virtualenvs/fbit-data-pipeline-aJYNke-B-py3.12/lib/python3.12/site-packages']


# VMFI Data processing pipeline

This workbook aims to emulate the data processing that currently occurs in the VMFI pipeline. The logic and processing are largely based on the document [Insights data portal - Data sources and sql analysis](https://educationgovuk.sharepoint.com.mcas.ms/:w:/r/sites/VMFI/_layouts/15/Doc.aspx?sourcedoc=%7B38C1DC37-7CDB-48B8-9E22-284F4F311C0B%7D&file=1.%20Insights%20portal%20-%20data%20sources%20and%20sql%20analysis%20v010%20-%20Copy.docx&action=default&mobileredirect=true). The workbook stays true to that document even where the existing stored procedures do something different, and it will form the basis of a gap analysis going forward.

All data loaded in this workbook comes from the set of CSV files in the `data` folder alongside it. These datasets are, for the most part, taken from the list at the start of the linked document. However, fully implementing the pipeline requires additional standing data, which has been exported from the development VMFI pipeline database. These files are currently:

| File name | DB Table |
|:----------|----------|
|standing_data_cdc.csv | standing_data.cdc |

In [2]:
import src.pipeline.pre_processing as pre_processing
import src.pipeline.output_schemas as output_schemas
import pandas as pd
import time
import glob
import os

In [3]:
# Create and clean directory
from pathlib import Path
Path("output/pre-processing").mkdir(parents=True, exist_ok=True)

files = glob.glob("output/pre-processing/*")
for f in files:
    os.remove(f)

In [4]:
start_time = time.time()
current_year = 2022

## CDC data load and preparation

School buildings condition dataset, based on the surveys performed throughout 2018-2019.

The data in the file `data/standing_data_cdc.csv` is an export of the `standing_data.cdc` table without the Year and Import ID fields. In future this will likely have to be read directly from the source database as per [this document.](https://educationgovuk.sharepoint.com.mcas.ms/:w:/r/sites/VMFI/_layouts/15/Doc.aspx?sourcedoc=%7B38C1DC37-7CDB-48B8-9E22-284F4F311C0B%7D&file=1.%20Insights%20portal%20-%20data%20sources%20and%20sql%20analysis%20v010%20-%20Copy.docx&action=default&mobileredirect=true) 

In [5]:
cdc = pre_processing.prepare_cdc_data('data/cdc.csv', current_year)

In [6]:
#cdc.to_csv('output/pre-processing/cdc.csv')
cdc

URN,Total Internal Floor Area,Age Average Score
100150,2803.0,48.358188
100162,2105.0,133.162945
100164,2934.0,97.0
100166,2040.0,91.705882
105304,1602.0,35.752809
...,...,...
144913,3111.0,16.704275
144917,2620.0,78.412214
105623,3382.0,7.0
144918,4733.0,19.009296


## School Census data load

*Pupil Census* - DfE data collection providing information about school and pupil characteristics, for example the percentage of pupils claiming free school meals or having English as their second language.

*Workforce census* - Single reference for all school workforce statistics based on staff working in publicly funded schools in England.

The following code loads both the workforce and pupil census data and performs an `inner` join by URN on the data sets.
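
As a minimal sketch of what an `inner` join by URN does, the example below uses two small hypothetical frames rather than the real census files. The column selection and the URNs `999999` and `888888` are invented for illustration, and the actual logic inside `prepare_census_data` may differ.

```python
import pandas as pd

# Hypothetical miniature extracts, indexed by URN (not the real census files).
workforce = pd.DataFrame(
    {"URN": [141334, 141396, 999999],
     "Total School Workforce (Headcount)": [48, 118, 10]}
).set_index("URN")
pupils = pd.DataFrame(
    {"URN": [141334, 141396, 888888],
     "full time pupils": [325.0, 599.0, 50.0]}
).set_index("URN")

# An inner join keeps only the URNs that appear in both data sets.
census_sketch = pupils.join(workforce, how="inner")
print(census_sketch)
```

Schools that appear in only one of the two collections are dropped by this kind of join, which is worth bearing in mind when comparing row counts against the source files.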

In [7]:
census = pre_processing.prepare_census_data('data/census_workforce.xlsx', 'data/census_pupils.csv')

#todo - add logic for highest / lowest age from census data

In [8]:
#census.to_csv('output/pre-processing/census.csv')
census

URN,region_name,district_administrative_name,ward_name,full time pupils,Percentage claiming Free school meals,Percentage Free school meals,number of pupils whose first language is known or believed to be other than English,Total School Workforce (Headcount),Total Number of Teachers in the Leadership Group (Headcount),Total Number of Teachers (Headcount),...,Total Number of Auxiliary Staff (Headcount),Total School Workforce (Full-Time Equivalent),Total Number of Teachers in the Leadership Group (Full-time Equivalent),Total Number of Teachers (Full-Time Equivalent),Total Number of Teaching Assistants (Full-Time Equivalent),FullTimeOther,Total Number of Auxiliary Staff (Full-Time Equivalent),Pupil: Teacher Ratio (Full-Time Equivalent of qualified and unqualified teachers),Teachers with Qualified Teacher Status (%) (Headcount),Number of Vacant Teacher Posts
141334,East Midlands,Nottingham,Bilborough,325.0,33.8,52.3,93.0,48,3,15,...,13,34.17,2.64,13.11,10.29,4.82,5.95,24.8,100.000000,0
141396,East Midlands,Nottingham,Aspley,599.0,23.4,60.3,236.0,118,4,39,...,34,82.47,4.00,34.00,29.55,10.13,8.79,18.3,100.000000,0
141397,East Midlands,Nottingham,Bilborough,465.0,33.2,47.7,127.0,105,5,27,...,27,72.81,4.24,24.55,31.84,6.55,9.87,19.7,100.000000,0
142223,East Midlands,Nottingham,Wollaton West,1050.0,5.1,8.7,343.0,156,5,56,...,47,99.66,4.16,47.12,33.07,6.57,12.90,23.0,100.000000,0
144396,East Midlands,Nottingham,Bulwell,196.0,56.7,64.8,29.0,37,2,13,...,11,25.57,2.00,11.39,7.36,4.00,2.82,18.1,100.000000,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
104642,North West,Liverpool,Church,421.0,2.4,2.6,14.0,52,4,18,...,18,34.47,3.60,15.80,6.27,5.22,7.18,26.6,100.000000,0
104643,North West,Liverpool,Cressington,426.0,3.5,8.5,13.0,68,3,19,...,19,39.89,3.00,17.40,11.74,3.34,7.41,24.7,100.000000,0
104645,North West,Liverpool,Tuebrook and Stoneycroft,237.0,32.9,33.8,43.0,37,3,13,...,10,26.47,3.00,12.40,6.78,3.19,4.10,19.1,92.307692,0
104646,North West,Liverpool,St Michael's,185.0,29.9,31.9,20.0,29,2,12,...,5,22.36,2.00,12.00,6.24,1.44,2.68,15.8,100.000000,0


## Special Education Needs (SEN) data load and preparation

Special educational needs dataset, containing information about the number of pupils who require various SEN provisions. This loads the `SEN` data, which originates from [here](https://explore-education-statistics.service.gov.uk/find-statistics/special-educational-needs-in-england#dataDownloads-1)

In [9]:
sen = pre_processing.prepare_sen_data('data/sen.csv')

In [10]:
#sen.to_csv("output/pre-processing/sen.csv")
sen

URN,Total pupils,EHC plan,Percentage SEN,Primary Need SPLD,Primary Need MLD,Primary Need SLD,Primary Need PMLD,Primary Need SEMH,Primary Need SLCN,Primary Need HI,...,Percentage Primary Need SLD,Percentage Primary Need PMLD,Percentage Primary Need SEMH,Percentage Primary Need SLCN,Percentage Primary Need HI,Percentage Primary Need VI,Percentage Primary Need MSI,Percentage Primary Need PD,Percentage Primary Need ASD,Percentage Primary Need OTH
100000,271.0,8.0,2.952030,2.0,4.0,0.0,0.0,9.0,31.0,2.0,...,0.0,0.000000,3.321033,11.439114,0.738007,0.000000,0.738007,0.000000,3.321033,0.369004
100001,739.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
100002,269.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
100003,1045.0,0.0,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
100005,136.0,2.0,1.470588,0.0,0.0,0.0,1.0,0.0,0.0,0.0,...,0.0,0.735294,0.000000,0.000000,0.000000,0.000000,0.000000,0.735294,16.176471,0.735294
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
149557,41.0,3.0,7.317073,2.0,0.0,0.0,0.0,0.0,1.0,0.0,...,0.0,0.000000,0.000000,2.439024,0.000000,0.000000,0.000000,0.000000,4.878049,0.000000
149632,1291.0,58.0,4.492641,31.0,15.0,0.0,0.0,20.0,25.0,8.0,...,0.0,0.000000,1.549187,1.936483,0.619675,0.309837,0.077459,0.542215,2.013943,1.781565
149633,86.0,0.0,0.000000,2.0,1.0,0.0,0.0,1.0,1.0,0.0,...,0.0,0.000000,1.162791,1.162791,0.000000,0.000000,0.000000,0.000000,1.162791,1.162791
149635,654.0,10.0,1.529052,15.0,1.0,0.0,0.0,12.0,2.0,2.0,...,0.0,0.000000,1.834862,0.305810,0.305810,0.305810,0.000000,0.000000,0.305810,0.611621
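
The percentage columns in the output above appear consistent with each count divided by `Total pupils` and multiplied by 100. The sketch below reproduces that arithmetic for two rows copied from the table (URNs 100000 and 149557). It is an inference from the displayed values, not necessarily how `prepare_sen_data` is implemented.

```python
import pandas as pd

# Counts copied from two rows of the output above.
sen_sketch = pd.DataFrame(
    {
        "URN": [100000, 149557],
        "Total pupils": [271.0, 41.0],
        "Primary Need SEMH": [9.0, 0.0],
        "Primary Need SLCN": [31.0, 1.0],
    }
).set_index("URN")

# Assumed rule: percentage = count / total pupils * 100.
for need in ["SEMH", "SLCN"]:
    sen_sketch[f"Percentage Primary Need {need}"] = (
        sen_sketch[f"Primary Need {need}"] / sen_sketch["Total pupils"] * 100.0
    )

print(sen_sketch.filter(like="Percentage"))
```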


## KS2 and KS4 processing

In [11]:
#ks2 = pre_processing.prepare_ks2_data('data/ks2.xlsx')
ks2 = pd.DataFrame({'URN':[100,200]})

In [12]:
#ks2.to_csv('output/pre-processing/ks2.csv')
ks2

,URN
0,100
1,200


In [13]:
#ks4 = pre_processing.prepare_ks4_data('data/ks4.xlsx')
ks4 = pd.DataFrame({'URN':[100,200]})

In [14]:
#ks4.to_csv('output/pre-processing/ks4.csv')
ks4

,URN
0,100
1,200


## AR Data load and preparation

This loads the Annual accounts return dataset and the corresponding mapping file. This extract contains only the benchmarking section, which consists of submissions of costs, income, and balances of individual academies.

The mapping file contains the mapping from AR4 cell references to cost categories and descriptions.

In [15]:
academy_ar = pre_processing.prepare_aar_data('data/academy_ar.xlsx')

In [16]:
#academy_ar.to_csv('output/pre-processing/academy_ar.csv')
academy_ar

Academy UPIN,Trust UPIN,Date joined or opened if in period,London Weighting,PFI School,DFE/EFA Revenue grants (includes Coronavirus Government Funding,of which: Coronavirus Government Funding,SEN funding,Other DfE/EFA Revenue Grants,Other income - LA & other Government grants,"Government source, non-grant",...,Trust_Income from catering,Trust_Receipts from supply teacher insurance claims,Trust_Donations and/or voluntary funds,Trust_Other self-generated income,Trust_Investment income,Central Services Balance,Central Services Financial Position,Academy Financial Position,Trust Financial Position,Is PFI
111443,137157,,Neither,Non-PFI school,7967000.0,41000.0,153000.0,262000.0,0.0,0.0,...,1063000.0,0.0,127000.0,473000.0,0.0,-1830000.0,Deficit,Deficit,Deficit,False
111451,138199,,Neither,Non-PFI school,6342000.0,80000.0,222000.0,7000.0,203000.0,0.0,...,1000.0,0.0,702000.0,0.0,0.0,-8541000.0,Deficit,Deficit,Surplus,False
111453,135112,,Neither,Non-PFI school,2798000.0,25000.0,162000.0,63000.0,0.0,0.0,...,82000.0,0.0,426000.0,0.0,0.0,-3191000.0,Deficit,Surplus,Surplus,False
111710,135428,,Neither,Non-PFI school,7685000.0,79000.0,323000.0,215000.0,83000.0,0.0,...,252000.0,0.0,19000.0,9000.0,15000.0,0.0,Deficit,Deficit,Deficit,False
113087,136879,,Neither,Non-PFI school,8021000.0,0.0,93000.0,45000.0,81000.0,0.0,...,0.0,0.0,14000.0,666000.0,26000.0,0.0,Deficit,Deficit,Deficit,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
164644,151923,2023-07-01 00:00:00,Neither,Non-PFI school,227000.0,0.0,18000.0,2000.0,0.0,0.0,...,81000.0,0.0,5000.0,99000.0,0.0,-253000.0,Deficit,Deficit,Surplus,False
164745,136351,2023-07-01 00:00:00,Neither,Non-PFI school,671000.0,15000.0,183000.0,44000.0,0.0,0.0,...,62000.0,83000.0,71000.0,267000.0,0.0,-989000.0,Deficit,Surplus,Surplus,False
164746,135065,2023-08-01 00:00:00,Neither,Non-PFI school,83000.0,0.0,127000.0,0.0,0.0,0.0,...,334000.0,0.0,525000.0,0.0,0.0,-8019000.0,Deficit,Surplus,Surplus,False
164811,135065,2023-08-01 00:00:00,Neither,Non-PFI school,54000.0,0.0,33000.0,0.0,1000.0,0.0,...,334000.0,0.0,525000.0,0.0,0.0,-8019000.0,Deficit,Deficit,Surplus,False


Create a summary table for the AR stance of each distinct academy in the table.

Now compute the trust financial position in the same manner as the individual academy position.

## Academy and maintained schools data load and preparation

This reads the main GIAS data (edubasealldataYYYYMMDD file) and the associated links file (links_edubasealldataYYYYMMDD file). This is taken from the [GIAS Service](https://get-information-schools.service.gov.uk/help)

Other columns are tidied up by asserting the correct type for each column. This tidying phase is needed largely because, on load, integer columns that contain missing values are inferred as floats rather than integers.
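
The type inference issue can be seen in a small sketch. The two-row extract below is hypothetical, and whether `prepare_schools_data` uses the nullable `Int64` dtype or some other approach is an assumption.

```python
import io
import pandas as pd

# Hypothetical extract: the missing value forces pandas to infer float64.
raw = io.StringIO("URN,PreviousEstablishmentNumber\n100000,\n100001,6005\n")
gias_sketch = pd.read_csv(raw)
print(gias_sketch.dtypes)

# The nullable Int64 dtype keeps whole numbers while still allowing missing values.
gias_sketch["PreviousEstablishmentNumber"] = (
    gias_sketch["PreviousEstablishmentNumber"].astype("Int64")
)
print(gias_sketch.dtypes)
```

Note the capital `I` in `Int64`: the lowercase NumPy `int64` dtype cannot hold missing values, so casting to it would fail on this column.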

In [18]:
schools = pre_processing.prepare_schools_data('data/gias.csv', 'data/gias_links.csv')


In [19]:
#schools.to_csv('output/pre-processing/schools.csv')
schools.sort_index()

URN,LA (code),LA (name),EstablishmentNumber,EstablishmentName,TypeOfEstablishment (code),TypeOfEstablishment (name),EstablishmentStatus (code),EstablishmentStatus (name),OpenDate,CloseDate,...,PreviousLA (code),PreviousLA (name),PreviousEstablishmentNumber,OfstedRating (name),MSOA (code),LSOA (code),LA Establishment Number,Has Nursery,Has Sixth Form,HeadName
100000,201,City of London,3614,The Aldgate School,2,Voluntary aided school,1,Open,NaT,NaT,...,999,,,Outstanding,E02000001,E01032739,201-3614,True,False,Miss Alexandra Allan
100001,201,City of London,6005,City of London School for Girls,11,Other independent school,1,Open,1920-01-01,NaT,...,999,,,,E02000001,E01000002,201-6005,False,True,Mrs Jenny Brown
100002,201,City of London,6006,St Paul's Cathedral School,11,Other independent school,1,Open,1939-01-01,NaT,...,999,,,,E02000001,E01032739,201-6006,False,False,
100003,201,City of London,6007,City of London School,11,Other independent school,1,Open,1919-01-01,NaT,...,999,,,,E02000001,E01032739,201-6007,False,True,Mr Alan Bird
100005,202,Camden,1048,Thomas Coram Centre,15,Local authority nursery school,1,Open,NaT,NaT,...,999,,,Outstanding,E02007115,E01000937,202-1048,True,False,Ms Perina Holness
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
402468,679,Monmouthshire,5500,King Henry viii 3-19 School,30,Welsh establishment,1,Open,2023-09-01,NaT,...,999,,,,999999999,999999999,679-5500,False,False,
402469,681,Cardiff,2333,Ysgol Gynradd Groes-Wen Primary,30,Welsh establishment,1,Open,2023-09-01,NaT,...,999,,,,W02000380,W01001729,681-2333,False,False,
402470,668,Pembrokeshire,2398,Ysgol Bro Penfro,30,Welsh establishment,4,Proposed to open,2024-09-01,NaT,...,999,,,,W02000140,W01000607,668-2398,False,False,
402471,679,Monmouthshire,2325,Ysgol Gymraeg Trefynwy,30,Welsh establishment,4,Proposed to open,2024-09-01,NaT,...,999,,,,W02000339,W01001978,679-2325,False,False,


Merge the required GIAS, census, SEN, CDC, PFI, and AR data with the base academy data.
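
A rough sketch of how several data sets might be merged onto a base list follows. It assumes left joins on `URN`, so that every academy in the base list is retained even when a supporting data set has no row for it. The frames and values are hypothetical, and `build_academy_data` may join on different keys or with different join types.

```python
import pandas as pd

# Hypothetical base list and supporting extracts.
academy_list = pd.DataFrame({"URN": [1, 2, 3], "Academy Name": ["A", "B", "C"]})
census_part = pd.DataFrame({"URN": [1, 2], "full time pupils": [100.0, 200.0]})
cdc_part = pd.DataFrame({"URN": [1, 3], "Total Internal Floor Area": [1500.0, 900.0]})

# Chain left joins so the base list determines which rows survive.
merged = academy_list
for part in (census_part, cdc_part):
    merged = merged.merge(part, on="URN", how="left")

print(merged)
```

Rows without a match carry `NaN` in the joined columns, which later calculations on those columns need to handle.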

In [20]:
academies = pre_processing.build_academy_data('data/academy_master_list.csv', 
                                              current_year, schools, census, sen, cdc, 
                                              academy_ar, ks2, ks4)

In [21]:
#academies.to_csv('output/pre-processing/academies.csv', columns=output_schemas.academies_output)
academies.sort_values(by="Academy UPIN")

UKPRN,URN,Company Registration Number,Incorporation Date,Academy Trust UPIN,Trust UKPRN,Trust Name,Academy Name,Academy UPIN,Trust Type,Date Opened,...,Other costs_Staff development and training_Per Unit,Other costs_Staff-related insurance_Per Unit,Other costs_Supply teacher insurance_Per Unit,Other costs_Rent and rates_Per Unit,Other costs_Special facilities_Per Unit,Other costs_Other insurance premiums_Per Unit,Other costs_Interest charges for loan and bank_Per Unit,Other costs_Direct revenue financing_Per Unit,Other costs_PFI charges_Per Unit,Other costs_Total_Per Unit
10038597,138623,08146633,2012-07-17 00:00:00.0000000,137157,10059937,Excalibur Academies Trust,St John's Marlborough,111443,Multi Academy Trust (MAT),2012-09-01 00:00:00.0000000,...,7.013442,0.0,0.0,29.80713,0.0,21.040327,2.337814,-9.351257,0.0,65.458796
10038652,138630,08075785,2012-05-18 00:00:00.0000000,138199,10048061,The White Horse Federation,Devizes School,111451,Multi Academy Trust (MAT),2012-09-01 00:00:00.0000000,...,14.925373,0.0,0.0,2.487562,0.0,0.0,0.0,78.772803,0.0,105.306799
10057185,143005,07654902,2011-06-01 00:00:00.0000000,135112,10058819,Acorn Education Trust,Avon Valley Academy,111453,Multi Academy Trust (MAT),2016-07-01 00:00:00.0000000,...,9.925558,0.0,0.0,37.220844,0.0,22.332506,0.0,0.0,0.0,89.330025
10031360,136296,07344277,2010-08-12 00:00:00.0000000,135428,10058329,Hardenhuish School Limited,Hardenhuish School,111710,Single Academy Trust (SAT),2010-09-01 00:00:00.0000000,...,34.816248,0.0,0.0,0.0,0.0,30.30303,0.0,9.67118,0.0,74.790458
10036860,137982,07959980,2012-02-22 00:00:00.0000000,136879,10059615,Mark Education Trust,Beacon Academy,113087,Multi Academy Trust (MAT),2012-04-01 00:00:00.0000000,...,13.218771,0.0,0.0,33.046927,0.0,0.0,0.0,-148.71117,0.0,-70.720423
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10090869,149040,08515149,2013-05-02 00:00:00.0000000,137975,10060434,Scholars Academy Trust,Foxbridge Primary School,163906,Multi Academy Trust (MAT),2022-08-30 00:00:00.0000000,...,inf,,inf,,,,,,,inf
10090824,149221,08566185,2013-06-12 00:00:00.0000000,137607,10060493,Perry Hall Multi-academy Trust,Sledmere Primary School,163968,Multi Academy Trust (MAT),2017-11-01 00:00:00.0000000,...,inf,,,,,inf,,inf,,inf
10090825,149222,10192252,2016-05-20 00:00:00.0000000,139657,10064173,Connect Academy Trust,Cockington Primary School,163969,Multi Academy Trust (MAT),2013-09-01 00:00:00.0000000,...,inf,,,,,,,,,inf
10090826,149205,10265276,2016-07-06 00:00:00.0000000,139732,10064251,Coast And Vale Learning Trust,Filey School,163970,Multi Academy Trust (MAT),2015-09-01 00:00:00.0000000,...,inf,,,inf,,inf,,inf,,inf


Merge the required census, SEN, and CDC data into the maintained schools data set.

In [23]:
# Load raw list from CSV
maintained_schools = pre_processing.build_maintained_school_data('data/maintained_schools_master_list.csv', 'data/gias_all_links.csv', current_year, schools, census, sen, cdc, ks2, ks4)

In [None]:
# maintained_schools.to_csv('output/pre-processing/maintained_schools.csv', columns=output_schemas.maintained_schools_output)
maintained_schools.dtypes.sort_index()

In [None]:
academies.dtypes.sort_index()

In [27]:
import pyodbc

pyodbc.drivers()

os.environ["DATABASE_CONNECTION_STRING"] = "Driver={ODBC Driver 18 for SQL Server};Server=localhost,1433;Database=Core;UID=sa;PWD=mystrong!Pa55word;Encrypt=no;TrustServerCertificate=yes;Connection Timeout=30"

all_school = pd.concat([academies, maintained_schools])

import src.pipeline.database as db

db.insert_school("Default", "2022", all_school)


IntegrityError: (pyodbc.IntegrityError) ('23000', "[23000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Cannot insert the value NULL into column 'FederationLeadUKPRN', table 'Core.dbo.School'; column does not allow nulls. INSERT fails. (515) (SQLExecDirectW)")
[SQL: INSERT INTO dbo.[School] ([UKPRN], [URN], [SchoolName], [TrustUKPRN], [TrustName], [FederationLeadUKPRN], [FederationLeadName], [LACode], [LAName], [LondonWeighting], [FinanceType], [OverallPhase], [SchoolType], [HasSixthForm], [HasNursery], [IsPFISc ... 6482 characters truncated ... ?, ?, ?, ?, ?, ?, ?, ?, ?, ?), (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)]
[parameters: (10089550, 148853, 'Evenwood Church of England Primary School', 10064612, '1Excellence Multi Academy Trust', None, None, '840', 'County Durham', 'Neither', 'Academy', None, 'Primary', 0, 1, 0, None, '', '01388832047', 'https://www.evenwood.durham.sch.uk/', '', 'Ms Debbie Hamilton', '', 10064014, 144542, 'Pentland Primary School', 10064612, '1Excellence Multi Academy Trust', None, None, '808', 'Stockton-on-Tees', 'Neither', 'Academy', None, 'Primary', 0, 1, 0, '22/09/2021', 'Good', '01642559609', 'https://pentlandprimary.org.uk', '', 'Miss Stephanie Robinson', '', 10064040, 144551, "St Mark's Church of England Primary School", 10064612 ... 1993 parameters truncated ... 'https://www.nwpa.attrust.org.uk', '', 'Mr N Bradnick-Thompson', '', 10057653, 143382, 'Phoenix Academy', 10039859, 'Academy Transformation Trust', None, None, '335', 'Walsall', 'Neither', 'Academy', None, 'Special', 0, 0, 0, '23/11/2022', 'Good', '01922712834', 'https://www.phoenix.attrust.org.uk', '', 'Miss Elyse Phillips', '', 10055379, 142594, 'Pool Hayes Academy', 10039859, 'Academy Transformation Trust', None, None, '335', 'Walsall', 'Neither', 'Academy', None, 'Secondary', 1, 0, 0, '27/04/2022', 'Good', '01902368147', 'https://www.poolhayes.attrust.org.uk', '', 'Mr Andrew Lawrence', '')]
(Background on this error at: https://sqlalche.me/e/20/gkpj)
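
The error above reports a NULL being inserted into `FederationLeadUKPRN`, which the table does not allow. One way to surface this before the insert is to count missing values in the columns expected to be non-null. The sketch below uses a hypothetical frame whose column names mirror the SQL columns in the error message. The real DataFrame columns, and the mapping applied inside `db.insert_school`, may differ.

```python
import pandas as pd

# Hypothetical frame; column names mirror the SQL columns in the error,
# not necessarily the DataFrame columns used by the pipeline.
school_sketch = pd.DataFrame(
    {
        "UKPRN": [10089550, 10064014],
        "FederationLeadUKPRN": [None, None],
        "SchoolName": [
            "Evenwood Church of England Primary School",
            "Pentland Primary School",
        ],
    }
)

# Assumed list of columns that the target table declares NOT NULL.
required = ["UKPRN", "FederationLeadUKPRN"]

null_counts = school_sketch[required].isna().sum()
offending = null_counts[null_counts > 0]
print(offending)
```

Whether the right fix is to fill such values or to relax the column constraint depends on the schema's intent, which this notebook does not state.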