#### Energy Efficiency Installation Checker External Scheme Raw to Base Pipeline

##### Description: 
This notebook processes ingested external energy efficiency schemes that are not on the DMS by:
1. reading onboarded files from this location
    - `/dbfs/mnt/datalake/Raw/PMDS/OfficialSensitive/External/`
2. selecting only the latest submission file for processing (historical submissions are ignored)
3. cleaning the dataframes to standardise null values
4. standardising mandatory scheme column names to the internal naming convention
<!-- 5. mapping the Installation Measures to EEL and SAP standards. -->
6. adding the scheme name (db_type) to each dataframe.
<!-- 7. separating valid and invalid records using the uprn/postcode pair join with AddressBase Premium dataset -->
<!-- 8. saving the invalid records to a quarantine folder in the raw layer in preparation for Splink address matching
    - `/mnt/datalake/Raw/PMDS/OfficialSensitive/external_quarantine/` -->
9. standardising the valid records and saving them in the Base layer
    - `/mnt/datalake/Base/PMDS/OfficialSensitive/External/valid_measures/`
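
The steps above (latest-submission selection, null standardisation and adding `db_type`) can be sketched in plain Python. This is an illustration only, not the code in `EEIC-Checker-External-r2b-Parent`: the function names, the `NULL_TOKENS` set, the `20241115` folder and the assumption that each submission folder is named `YYYYMMDD` (as in `.../ECO4/2024/202412/20241230`) are all hypothetical, and the real pipeline presumably works on dataframes rather than dictionaries.

```python
import re
from datetime import datetime

# Assumed set of strings treated as null; the real pipeline's list is not shown.
NULL_TOKENS = {"", "null", "none", "nan", "n/a", "na"}


def latest_submission(paths):
    """Return (date, path) for the newest folder whose last segment is YYYYMMDD."""
    dated = []
    for path in paths:
        match = re.search(r"(?:^|/)(\d{8})/?$", path)
        if not match:
            continue  # not a dated submission folder
        try:
            day = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that are not a real calendar date
        dated.append((day, path))
    if not dated:
        raise FileNotFoundError("no dated submission folders found")
    return max(dated)  # tuples compare by date first


def standardise_nulls(row):
    """Map assumed null-like strings (and None) to None, leaving other values as-is."""
    return {
        key: None if value is None or str(value).strip().lower() in NULL_TOKENS else value
        for key, value in row.items()
    }


def add_db_type(row, scheme):
    """Attach the scheme name under the db_type key."""
    return {**row, "db_type": scheme}


base = "/dbfs/mnt/datalake/Raw/PMDS/OfficialSensitive/External/ECO4"
submission_date, folder = latest_submission(
    [f"{base}/2024/202411/20241115", f"{base}/2024/202412/20241230"]
)
record = add_db_type(standardise_nulls({"postcode": "PO2 7NA", "subname": "NULL"}), "ECO4")
```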
---
##### Inputs:
1. **Onboarding scheme** (ECO4, BUS): each onboarding scheme file ingested into the following location
    - `/dbfs/mnt/datalake/Raw/PMDS/OfficialSensitive/External/`
<!-- 2. **AddressBase Premium** catalog table: 
    - `hive_metastore.eeic_db.pmcv2_addressbase_premium` -->
<!-- 3. **EEL & SAP** lookup file: 
    - `/dbfs/mnt/datalake/Base/PMDS/NonSensitive/EEL/eel_lookups_v1.csv` -->
4. **External reference** files:
    - `/dbfs/mnt/datalake/Base/PMDS/NonSensitive/ExternalSchemes/`
    - `mandatory_column_mapping.csv` and `onboarded_scheme_column_list.csv`
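
As a rough illustration of how a mapping file such as `mandatory_column_mapping.csv` could drive the column renaming, the sketch below loads a mapping and applies it to one record. The layout of the real file is not shown in this notebook, so the three-column layout (`scheme`, `source_column`, `internal_column`), the source column names and both helper functions are assumptions; only the internal names (`postcode`, `uprn`, `date_of_install`) are taken from the pipeline output.

```python
import csv
import io

# Hypothetical mapping file contents; the real CSV layout may differ.
MAPPING_CSV = """scheme,source_column,internal_column
ECO4,Postcode,postcode
ECO4,UPRN,uprn
ECO4,Date_of_Completed_Installation,date_of_install
BUS,Property_Postcode,postcode
"""


def load_mapping(handle, scheme):
    """Build {source_column: internal_column} for a single scheme."""
    reader = csv.DictReader(handle)
    return {
        row["source_column"]: row["internal_column"]
        for row in reader
        if row["scheme"] == scheme
    }


def rename_columns(row, mapping):
    """Rename mapped keys; keys without a mapping keep their original name."""
    return {mapping.get(key, key): value for key, value in row.items()}


mapping = load_mapping(io.StringIO(MAPPING_CSV), "ECO4")
renamed = rename_columns({"Postcode": "PO2 7NA", "Measure_Group": "Boiler"}, mapping)
```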
---
##### Outputs: 
**valid measures** (Delta parquet files):
    - `/mnt/datalake/Base/PMDS/OfficialSensitive/External/valid_measures/`
---

##### Notes:
- The files are read using the validation schema agreed with the front end. If the front-end and back-end schemas do not match, the pipeline will fail.
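
One way to make such a mismatch fail early with a readable message is to compare the incoming header against the expected column list before any processing. The sketch below is a hypothetical check, not the pipeline's own validation: `check_schema` and its arguments are illustrative names, and the expected list is assumed to come from a reference file such as `onboarded_scheme_column_list.csv`, whose layout is not shown here.

```python
def check_schema(actual_columns, expected_columns):
    """Raise ValueError when the two column sets differ (order is ignored)."""
    actual, expected = set(actual_columns), set(expected_columns)
    missing = sorted(expected - actual)      # agreed columns absent from the file
    unexpected = sorted(actual - expected)   # columns in the file that were not agreed
    if missing or unexpected:
        raise ValueError(
            f"schema mismatch: missing={missing}, unexpected={unexpected}"
        )


# Matching headers pass silently; a mismatch stops the run.
check_schema(["postcode", "uprn"], ["uprn", "postcode"])
```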
---


In [0]:
%run ./EEIC-Checker-External-r2b-Parent

Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.
Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.
Using Python 3.10.12 environment at: /local_disk0/.ephemeral_nfs/envs/pythonEnv-f999ee4c-8130-40fd-abd5-ebd7dc37f7a9
Audited 1 package in 38ms
Using Python 3.10.12 environment at: /local_disk0/.ephemeral_nfs/envs/pythonEnv-f999ee4c-8130-40fd-abd5-ebd7dc37f7a9
Audited 1 package in 12ms
Using Python 3.10.12 environment at: /local_disk0/.ephemeral_nfs/envs/pythonEnv-f999ee4c-8130-40fd-abd5-ebd7dc37f7a9
Audited 1 package in 140ms


Imports Loaded


In [0]:
generic_pipeline(defined_schema_path="/dbfs/mnt/datalake/Base/PMDS/NonSensitive/ExternalSchemes/")

given a scheme name, read the scheme ECO4
latest submission: 2024-12-30
file location path: /dbfs/mnt/datalake/Raw/PMDS/OfficialSensitive/External/ECO4/2024/202412/20241230
the file being processed: eco4_test2_upload_20241230.csv
cleaning dataframe
mapping mandatory columns to internal naming
adding db_type
processing for curated


Overall_Obligation_Period,subname,number,name,streetname,townname,postcode,uprn,Measure_Reference_Number,measure_type,Measure_Group,Measure_Group2,date_of_install,TrustmarkUniqueMeasureReference,TrustmarkLodgedCertificateID,MCSInstallationCertificationNumber,Pre_Main_Heating_Source_for_the_Property,Post_Main_Heating_Source_for_the_Property,db_type,measure_details
ECO3i,FLAT 19,15.0,Westfield Oaks,Westfield Avenue,HAYLING ISLAND,PO11 9AQ,100062000000.0,SHL7093742,B_First_time_CH_cavity,Boiler,First time Central Heating,17/06/2022 00:00,85al27ZO7Nd,618315,,Electric Storage Heaters,Gas boiler,ECO4,"{""Measure_Group"": ""Boiler"", ""Pre_Main_Heating_Source_for_the_Property"": ""Electric Storage Heaters"", ""Measure_Group2"": ""First time Central Heating"", ""Overall_Obligation_Period"": ""ECO3i"", ""MCSInstallationCertificationNumber"": null, ""Measure_Reference_Number"": ""SHL7093742"", ""TrustmarkLodgedCertificateID"": ""618315"", ""Post_Main_Heating_Source_for_the_Property"": ""Gas boiler"", ""TrustmarkUniqueMeasureReference"": ""85al27ZO7Nd""}"
ECO3i,,60.0,,QUEENS ROAD,PORTSMOUTH,PO2 7NA,1775067323.0,SHL7093779,B_First_time_CH_cavity,Boiler,First time Central Heating,29/06/2022 00:00,86np1EmjmN2,618813,,Electric Storage Heaters,Gas boiler,ECO4,"{""Measure_Group"": ""Boiler"", ""Pre_Main_Heating_Source_for_the_Property"": ""Electric Storage Heaters"", ""Measure_Group2"": ""First time Central Heating"", ""Overall_Obligation_Period"": ""ECO3i"", ""MCSInstallationCertificationNumber"": null, ""Measure_Reference_Number"": ""SHL7093779"", ""TrustmarkLodgedCertificateID"": ""618813"", ""Post_Main_Heating_Source_for_the_Property"": ""Gas boiler"", ""TrustmarkUniqueMeasureReference"": ""86np1EmjmN2""}"
ECO3 SA,,,GALVESTON,,CARMARTHEN,SA33 4HL,200002000000.0,EON7986855,LI_lessequal100,Loft Insulation,Loft Insulation Ceiling Level Virgin,27/08/2021 00:00,d5nj9AXp6ac,436400,,Oil boiler,,ECO4,"{""Measure_Group"": ""Loft Insulation"", ""Pre_Main_Heating_Source_for_the_Property"": ""Oil boiler"", ""Measure_Group2"": ""Loft Insulation Ceiling Level Virgin"", ""Overall_Obligation_Period"": ""ECO3 SA"", ""MCSInstallationCertificationNumber"": null, ""Measure_Reference_Number"": ""EON7986855"", ""TrustmarkLodgedCertificateID"": ""436400"", ""Post_Main_Heating_Source_for_the_Property"": null, ""TrustmarkUniqueMeasureReference"": ""d5nj9AXp6ac""}"
ECO3 SA,,,16G,ROSEFIELD STREET,Dundee,DD1 5PS,9059049417.0,EON7596330,B_First_time_CH_solid,Boiler,First time Central Heating,23/01/2021 00:00,33NxmxAeMBc,238173,,Electric room heaters*,Gas boiler,ECO4,"{""Measure_Group"": ""Boiler"", ""Pre_Main_Heating_Source_for_the_Property"": ""Electric room heaters*"", ""Measure_Group2"": ""First time Central Heating"", ""Overall_Obligation_Period"": ""ECO3 SA"", ""MCSInstallationCertificationNumber"": null, ""Measure_Reference_Number"": ""EON7596330"", ""TrustmarkLodgedCertificateID"": ""238173"", ""Post_Main_Heating_Source_for_the_Property"": ""Gas boiler"", ""TrustmarkUniqueMeasureReference"": ""33NxmxAeMBc""}"
ECO3 SA,FLAT 3,8.0,,Clifton Road,Weston-Super-Mare,BS23 1BL,24086439.0,EON7616239,ESH_Upgrades_HHR_solid,Other Heating,Electric Storage Heater upgrade (single measure),20/01/2021 00:00,d5nb2r259nb,247770,,Electric Room Heaters,,ECO4,"{""Measure_Group"": ""Other Heating"", ""Pre_Main_Heating_Source_for_the_Property"": ""Electric Room Heaters"", ""Measure_Group2"": ""Electric Storage Heater upgrade (single measure)"", ""Overall_Obligation_Period"": ""ECO3 SA"", ""MCSInstallationCertificationNumber"": null, ""Measure_Reference_Number"": ""EON7616239"", ""TrustmarkLodgedCertificateID"": ""247770"", ""Post_Main_Heating_Source_for_the_Property"": null, ""TrustmarkUniqueMeasureReference"": ""d5nb2r259nb""}"
ECO3 SA,,14.0,,AVONBANK CRESCENT,HAMILTON,ML3 7PD,484072149.0,BGT7739903,CWI_0.033,Cavity wall insulation,Standard CWI,21/02/2022 00:00,55nYoPjDbB9,566825,,Gas boiler,,ECO4,"{""Measure_Group"": ""Cavity wall insulation"", ""Pre_Main_Heating_Source_for_the_Property"": ""Gas boiler"", ""Measure_Group2"": ""Standard CWI"", ""Overall_Obligation_Period"": ""ECO3 SA"", ""MCSInstallationCertificationNumber"": null, ""Measure_Reference_Number"": ""BGT7739903"", ""TrustmarkLodgedCertificateID"": ""566825"", ""Post_Main_Heating_Source_for_the_Property"": null, ""TrustmarkUniqueMeasureReference"": ""55nYoPjDbB9""}"
ECO4,,,MUIRLANDS,,KIRKBY-IN-FURNESS,LA17 7TT,0.0,EON8843176,LI_greater100,Loft Insulation,Loft Insulation Ceiling Level Top-up,30/06/2024 00:00,P190315W8WM,P190315-1,,,,ECO4,"{""Measure_Group"": ""Loft Insulation"", ""Pre_Main_Heating_Source_for_the_Property"": null, ""Measure_Group2"": ""Loft Insulation Ceiling Level Top-up"", ""Overall_Obligation_Period"": ""ECO4"", ""MCSInstallationCertificationNumber"": null, ""Measure_Reference_Number"": ""EON8843176"", ""TrustmarkLodgedCertificateID"": ""P190315-1"", ""Post_Main_Heating_Source_for_the_Property"": null, ""TrustmarkUniqueMeasureReference"": ""P190315W8WM""}"
ECO3 SA,,9.0,,WILLIAM TURNER COURT,DUMFRIES,DG1 1XP,137053530.0,BGT7732888,CWI_0.033,Cavity wall insulation,Standard CWI,11/02/2022 00:00,0bN0LVdMYNa,562902,,Electric Storage Heaters,,ECO4,"{""Measure_Group"": ""Cavity wall insulation"", ""Pre_Main_Heating_Source_for_the_Property"": ""Electric Storage Heaters"", ""Measure_Group2"": ""Standard CWI"", ""Overall_Obligation_Period"": ""ECO3 SA"", ""MCSInstallationCertificationNumber"": null, ""Measure_Reference_Number"": ""BGT7732888"", ""TrustmarkLodgedCertificateID"": ""562902"", ""Post_Main_Heating_Source_for_the_Property"": null, ""TrustmarkUniqueMeasureReference"": ""0bN0LVdMYNa""}"
ECO3 SA,,16.0,,DELPH HOLLOW WAY,ST. HELENS,WA9 5GP,39083023.0,BGT7742545,CWI_0.033,Cavity wall insulation,Standard CWI,15/02/2022 00:00,9dB7wZdoJB4,565765,,Gas boiler,,ECO4,"{""Measure_Group"": ""Cavity wall insulation"", ""Pre_Main_Heating_Source_for_the_Property"": ""Gas boiler"", ""Measure_Group2"": ""Standard CWI"", ""Overall_Obligation_Period"": ""ECO3 SA"", ""MCSInstallationCertificationNumber"": null, ""Measure_Reference_Number"": ""BGT7742545"", ""TrustmarkLodgedCertificateID"": ""565765"", ""Post_Main_Heating_Source_for_the_Property"": null, ""TrustmarkUniqueMeasureReference"": ""9dB7wZdoJB4""}"
ECO4,,,UPPER INSHULL,,HEREFORD,HR4 8JN,10007371910.0,EON8808213,IWI_solid_1.7_0.3,Solid Wall Insulation,Internal wall insulation: Solid Walls,08/05/2024 00:00,P165825LHL7,P165825-1,,,,ECO4,"{""Measure_Group"": ""Solid Wall Insulation"", ""Pre_Main_Heating_Source_for_the_Property"": null, ""Measure_Group2"": ""Internal wall insulation: Solid Walls"", ""Overall_Obligation_Period"": ""ECO4"", ""MCSInstallationCertificationNumber"": null, ""Measure_Reference_Number"": ""EON8808213"", ""TrustmarkLodgedCertificateID"": ""P165825-1"", ""Post_Main_Heating_Source_for_the_Property"": null, ""TrustmarkUniqueMeasureReference"": ""P165825LHL7""}"
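
In the output above, `measure_details` holds a JSON object of the scheme-specific columns, with empty values written as `null`, alongside the original columns. A minimal sketch of that packing step follows. It is inferred from the output rather than taken from the pipeline code, so the `MANDATORY` tuple (the columns that do not appear inside `measure_details`) and the function name are assumptions.

```python
import json

# Assumed from the output: columns that are NOT packed into measure_details.
MANDATORY = (
    "subname", "number", "name", "streetname", "townname", "postcode",
    "uprn", "measure_type", "date_of_install", "db_type",
)


def pack_measure_details(row, mandatory=MANDATORY):
    """Return the row plus a JSON string of its non-mandatory columns."""
    extras = {
        key: (None if value in ("", None) else value)  # empty values become JSON null
        for key, value in row.items()
        if key not in mandatory
    }
    return {**row, "measure_details": json.dumps(extras)}


packed = pack_measure_details(
    {
        "postcode": "LA17 7TT",
        "db_type": "ECO4",
        "Measure_Group": "Loft Insulation",
        "MCSInstallationCertificationNumber": "",
    }
)
details = json.loads(packed["measure_details"])
```

Keeping the scheme-specific fields in one JSON column means schemes with different extra columns can share one set of base columns, at the cost of parsing the JSON whenever those fields are queried.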
