# Data Preprocessing for LAEI 2019 Emissions Data

In this notebook, we preprocess the LAEI 2019 emissions data: we load the **"Emissions by Grid ID"** sheet, keep only the rows for the year 2019, and save the result to the `interim` folder, leaving the raw data intact.

---

## 1. Importing Necessary Libraries

We need `pandas` for data manipulation and `os` for handling file paths.


In [1]:
# Step 1: Import necessary libraries
import os

import pandas as pd


---

## 2. Defining File Paths and Variables

Define the path to the raw workbook and the output path for the processed data, along with the sheet name, the name of the year column, and the target year (2019) used for filtering.


In [2]:
# Step 2: Define file paths and variables
RAW_FILE_PATH = os.path.join('..', '..', 'data', 'raw', 'LAEI-2019-Emissions-Summary-including-Forecast.xlsx')
INTERIM_FILE_PATH = os.path.join('..', '..', 'data', 'interim', 'LAEI-2019-Emissions-2019-Only.xlsx')
SHEET_NAME = 'Emissions by Grid ID'
YEAR_COLUMN = 'Year'
TARGET_YEAR = 2019

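Both paths are relative, so they resolve against the notebook's working directory, and a quick check before the slow load can save time. The following is a minimal sketch, not part of the original notebook: `check_paths` is a hypothetical helper that raises early if the raw file is missing and creates the output folder if it does not yet exist.

```python
import os


def check_paths(raw_path, interim_path):
    """Hypothetical helper: validate the input path and prepare the output folder."""
    # Fail early, before the slow Excel load, if the raw file is not where we expect.
    if not os.path.isfile(raw_path):
        raise FileNotFoundError(f"Raw data file not found: {os.path.abspath(raw_path)}")

    # Create the output folder if needed; exist_ok avoids an error when it already exists.
    out_dir = os.path.dirname(interim_path)
    if out_dir:
        os.makedirs(out_dir, exist_ok=True)
    return out_dir
```

In this notebook it could be called as `check_paths(RAW_FILE_PATH, INTERIM_FILE_PATH)` right after the variables are defined.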

---

## 3. Loading the "Emissions by Grid ID" Sheet

Load the **"Emissions by Grid ID"** sheet from the raw Excel file using `pandas`. Note: the sheet is large, so loading can take a minute or so.


In [3]:
# Step 3: Load the "Emissions by Grid ID" sheet from the raw Excel file
raw_data = pd.read_excel(RAW_FILE_PATH, sheet_name=SHEET_NAME)
raw_data.head()  # Display the first few rows of the dataset


Unnamed: 0,Year,Grid ID 2019,LAEI 1km2 ID,Easting,Northing,Borough,Zone,Main Source Category,Sector,Source,...,n2o,nh3,nmvoc,nox,pb,pcb,pm10,pm2.5,so2,Emissions Unit
0,2030,1,5910,510500,203500,Non GLA,Non GLA,Domestic,Biomass,Wood Burning,...,,,,,,,0.019183,0.019183,,tonnes/annum
1,2030,2,5911,511500,203500,Non GLA,Non GLA,Domestic,Biomass,Wood Burning,...,,,,,,,0.015719,0.015719,,tonnes/annum
2,2030,3,5912,512500,203500,Non GLA,Non GLA,Domestic,Biomass,Wood Burning,...,,,,,,,0.019878,0.019878,,tonnes/annum
3,2030,4,5915,515500,203500,Non GLA,Non GLA,Domestic,Biomass,Wood Burning,...,,,,,,,0.020946,0.020946,,tonnes/annum
4,2030,5,5916,516500,203500,Non GLA,Non GLA,Domestic,Biomass,Wood Burning,...,,,,,,,0.020105,0.020105,,tonnes/annum

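The preview shows only rows for 2030, so `head()` alone does not reveal which years the sheet contains. Below is a minimal sketch of one way to check, using a hypothetical helper `rows_per_year` and a small made-up DataFrame standing in for `raw_data`:

```python
import pandas as pd


def rows_per_year(df, year_column="Year"):
    """Hypothetical helper: count rows for each distinct year, sorted by year."""
    if year_column not in df.columns:
        raise KeyError(f"Column {year_column!r} not found; available: {list(df.columns)}")
    return df[year_column].value_counts().sort_index()


# Made-up stand-in for raw_data (values are illustrative, not taken from the LAEI file).
toy = pd.DataFrame({"Year": [2030, 2030, 2019, 2025], "pm10": [0.1, 0.2, 0.3, 0.4]})
counts = rows_per_year(toy)
print(counts)
```

On the real data, `rows_per_year(raw_data, YEAR_COLUMN)` would show whether 2019 is present and how many rows the filter in the next step should return.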

---

## 4. Filtering Rows for the Year 2019

The preview above starts with rows for 2030, so we now filter the dataset with a boolean mask on the `Year` column, keeping only the rows where the year is 2019.


In [4]:
# Step 4: Filter rows to keep only data from 2019
filtered_data = raw_data[raw_data[YEAR_COLUMN] == TARGET_YEAR]
filtered_data.head()  # Display the first few rows of the filtered dataset


Unnamed: 0,Year,Grid ID 2019,LAEI 1km2 ID,Easting,Northing,Borough,Zone,Main Source Category,Sector,Source,...,n2o,nh3,nmvoc,nox,pb,pcb,pm10,pm2.5,so2,Emissions Unit
285264,2019,1,5910,510500,203500,Non GLA,Non GLA,Domestic,Biomass,Wood Burning,...,,,,,,,0.021923,0.021923,,tonnes/annum
285265,2019,2,5911,511500,203500,Non GLA,Non GLA,Domestic,Biomass,Wood Burning,...,,,,,,,0.017965,0.017965,,tonnes/annum
285266,2019,3,5912,512500,203500,Non GLA,Non GLA,Domestic,Biomass,Wood Burning,...,,,,,,,0.022718,0.022718,,tonnes/annum
285267,2019,4,5915,515500,203500,Non GLA,Non GLA,Domestic,Biomass,Wood Burning,...,,,,,,,0.023939,0.023939,,tonnes/annum
285268,2019,5,5916,516500,203500,Non GLA,Non GLA,Domestic,Biomass,Wood Burning,...,,,,,,,0.022977,0.022977,,tonnes/annum

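The comparison `raw_data[YEAR_COLUMN] == TARGET_YEAR` produces a boolean Series, and indexing with it keeps the rows where it is `True`. The filtered rows keep their original index labels (285264 onward in the output above). Below is a minimal sketch, assuming a hypothetical helper `filter_by_year` and made-up data, that also guards against an empty result (for instance if the year column were stored as text) and renumbers the index:

```python
import pandas as pd


def filter_by_year(df, year_column, target_year):
    """Hypothetical helper: keep rows for one year and renumber the index from 0."""
    mask = df[year_column] == target_year  # boolean Series, True where the year matches
    result = df.loc[mask].copy()           # copy so later edits do not touch the source frame
    if result.empty:
        # An empty result often points to a dtype mismatch, e.g. "2019" (text) vs 2019 (int).
        raise ValueError(f"No rows found where {year_column} == {target_year!r}")
    return result.reset_index(drop=True)


# Made-up stand-in for raw_data (values are illustrative only).
toy = pd.DataFrame({"Year": [2030, 2019, 2019], "pm10": [0.5, 0.25, 0.75]})
filtered = filter_by_year(toy, "Year", 2019)
print(filtered)
```

Resetting the index is optional here, since the notebook writes the file with `index=False` and the labels are not saved either way.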

---

## 5. Saving the Processed Data

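Save the filtered data as a new Excel file in the `interim` folder, keeping the original sheet name and omitting the DataFrame index. The `xlsxwriter` engine used here is a separate package and must be installed (for example with `pip install xlsxwriter`).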


In [5]:
# Step 5: Save the processed data to the interim folder as a new Excel file
with pd.ExcelWriter(INTERIM_FILE_PATH, engine='xlsxwriter') as writer:
    filtered_data.to_excel(writer, sheet_name=SHEET_NAME, index=False)

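One constraint worth keeping in mind when writing to `.xlsx`: a worksheet holds at most 1,048,576 rows and 16,384 columns, and the header occupies one of those rows. Below is a minimal sketch of a pre-save check, using a hypothetical helper `fits_in_excel_sheet` that is not part of the original notebook:

```python
import pandas as pd

EXCEL_MAX_ROWS = 1_048_576  # maximum rows in one .xlsx worksheet
EXCEL_MAX_COLS = 16_384     # maximum columns in one .xlsx worksheet


def fits_in_excel_sheet(df, include_header=True):
    """Hypothetical helper: True if df (plus an optional header row) fits in one sheet."""
    n_rows = len(df) + (1 if include_header else 0)
    n_cols = df.shape[1]
    return n_rows <= EXCEL_MAX_ROWS and n_cols <= EXCEL_MAX_COLS


# Made-up example frame; the real call would pass filtered_data instead.
small = pd.DataFrame({"Year": [2019, 2019], "pm10": [0.25, 0.75]})
print(fits_in_excel_sheet(small))
```

If `fits_in_excel_sheet(filtered_data)` returned `False`, a format without this limit, such as CSV, would be one option.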

---

## 6. Confirming the Process

Finally, print a confirmation message showing the target year and the output path.


In [6]:
# Step 6: Confirmation message
print(f"Processed data for the year {TARGET_YEAR} has been saved to {INTERIM_FILE_PATH}")


Processed data for the year 2019 has been saved to ..\..\data\interim\LAEI-2019-Emissions-2019-Only.xlsx

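The message above is printed unconditionally. Below is a minimal sketch of a slightly stricter confirmation, assuming a hypothetical helper `confirm_saved` that checks the file exists and is non-empty before building the message:

```python
import os


def confirm_saved(path, year):
    """Hypothetical helper: return a confirmation message only if the file was written."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise FileNotFoundError(f"Expected output file is missing or empty: {path}")
    size_kib = os.path.getsize(path) / 1024  # file size in kibibytes
    return f"Processed data for the year {year} has been saved to {path} ({size_kib:.1f} KiB)"
```

In the notebook this could replace the final cell as `print(confirm_saved(INTERIM_FILE_PATH, TARGET_YEAR))`.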