# LFB Incidents Record - Data Loading & Initial Preparation

In this file, we will prepare our fire data by selecting and filtering the relevant columns and attributes.

We will export a CSV file containing only the fire incidents that occurred in residential dwellings.

In [1]:
# importing libraries
import os
import pandas as pd

In [2]:
# set the working directory (change this path to your own directory)
os.chdir('/home/jovyan/work/quant_methods/Assessment/Written Investigation/London Dwelling Fires Analysis')
print(os.getcwd())

/home/jovyan/work/quant_methods/Assessment/Written Investigation/London Dwelling Fires Analysis


Download the London Fire Brigade Incident Records data from the [London Datastore](https://data.london.gov.uk/dataset/london-fire-brigade-incident-records), then save it as a CSV file named `LFB_incident_data_2018-Nov_2024.csv` in the working directory.

In [3]:
# first pass: load every column so we can inspect what is available
incidents = pd.read_csv('LFB_incident_data_2018-Nov_2024.csv', encoding="utf-8")
print(f"Data frame 1 is {incidents.shape[0]:,} x {incidents.shape[1]}")

Data frame 1 is 794,155 x 39


In [4]:
# list of columns
print(incidents.columns.to_list())

['IncidentNumber', 'DateOfCall', 'CalYear', 'TimeOfCall', 'HourOfCall', 'IncidentGroup', 'StopCodeDescription', 'SpecialServiceType', 'PropertyCategory', 'PropertyType', 'AddressQualifier', 'Postcode_full', 'Postcode_district', 'UPRN', 'USRN', 'IncGeo_BoroughCode', 'IncGeo_BoroughName', 'ProperCase', 'IncGeo_WardCode', 'IncGeo_WardName', 'IncGeo_WardNameNew', 'Easting_m', 'Northing_m', 'Easting_rounded', 'Northing_rounded', 'Latitude', 'Longitude', 'FRS', 'IncidentStationGround', 'FirstPumpArriving_AttendanceTime', 'FirstPumpArriving_DeployedFromStation', 'SecondPumpArriving_AttendanceTime', 'SecondPumpArriving_DeployedFromStation', 'NumStationsWithPumpsAttending', 'NumPumpsAttending', 'PumpCount', 'PumpMinutesRounded', 'Notional Cost (£)', 'NumCalls']
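
Listing the column names does not strictly require loading every row. The following is a minimal sketch, assuming a small in-memory CSV in place of the real LFB file (the sample rows are made up for illustration). Passing `nrows=0` asks pandas to parse only the header row.

```python
import io
import pandas as pd

# Hypothetical stand-in for the LFB CSV; only the header matters here.
sample_csv = io.StringIO(
    "IncidentNumber,CalYear,IncidentGroup,PropertyCategory\n"
    "A001,2018,Fire,Dwelling\n"
    "A002,2019,False Alarm,Road Vehicle\n"
)

# nrows=0 parses the header only, so no data rows are loaded
header = pd.read_csv(sample_csv, nrows=0)
column_names = header.columns.to_list()
print(column_names)
```

On the real file, the same call with the file path in place of `sample_csv` would return the column names without reading the full data set into memory.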


In [5]:
# columns to keep for the analysis
cols = ['IncidentNumber', 'CalYear', 'HourOfCall', 'IncidentGroup', 'StopCodeDescription', 'PropertyCategory',  
        'Postcode_district', 'IncGeo_BoroughCode', 'ProperCase', 'IncGeo_WardCode', 'IncGeo_WardNameNew', 
        'Easting_m', 'Northing_m', 'Easting_rounded', 'Northing_rounded', 'Latitude', 'Longitude', 
        'IncidentStationGround', 'FirstPumpArriving_AttendanceTime', 'FirstPumpArriving_DeployedFromStation', 
        'SecondPumpArriving_AttendanceTime', 'SecondPumpArriving_DeployedFromStation', 'NumStationsWithPumpsAttending', 
        'NumPumpsAttending', 'PumpCount', 'PumpMinutesRounded', 'Notional Cost (£)', 'NumCalls']

In [6]:
# reload the file, keeping only the columns listed in cols
incidents = pd.read_csv('LFB_incident_data_2018-Nov_2024.csv', encoding="utf-8",
                        low_memory=False, usecols=cols)
print(f"Data frame 1 is {incidents.shape[0]:,} x {incidents.shape[1]}")

Data frame 1 is 794,155 x 28


In [7]:
incidents.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 794155 entries, 0 to 794154
Data columns (total 28 columns):
 #   Column                                  Non-Null Count   Dtype  
---  ------                                  --------------   -----  
 0   IncidentNumber                          794155 non-null  object 
 1   CalYear                                 794155 non-null  int64  
 2   HourOfCall                              794155 non-null  int64  
 3   IncidentGroup                           794149 non-null  object 
 4   StopCodeDescription                     794155 non-null  object 
 5   PropertyCategory                        794149 non-null  object 
 6   Postcode_district                       794155 non-null  object 
 7   IncGeo_BoroughCode                      794155 non-null  object 
 8   ProperCase                              794155 non-null  object 
 9   IncGeo_WardCode                         793613 non-null  object 
 10  IncGeo_WardNameNew                      7936

Our data contains false alarms, actual fires and special services (other non-fire emergencies). We will keep only the actual fires, that is, the rows where `IncidentGroup` is `Fire`.

In [8]:
fires = incidents[incidents['IncidentGroup'] == 'Fire'] # subset of incidents with 'Fire' in 'IncidentGroup'
print(f"Fires data frame is {fires.shape[0]:,} x {fires.shape[1]}")

Fires data frame is 120,404 x 28
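
Before filtering, it can help to see how many rows fall in each incident group, including rows where `IncidentGroup` is missing (the `info()` output above shows a few such rows). This is a minimal sketch on a made-up frame; the group labels other than `Fire` are assumptions used for illustration.

```python
import pandas as pd

# Made-up rows for illustration; labels other than 'Fire' are assumed.
toy = pd.DataFrame({
    "IncidentGroup": ["Fire", "False Alarm", "Special Service", "Fire", None],
})

# dropna=False keeps missing values as their own category
group_counts = toy["IncidentGroup"].value_counts(dropna=False)
print(group_counts)

# an equality filter silently excludes rows with a missing IncidentGroup
toy_fires = toy[toy["IncidentGroup"] == "Fire"]
print(len(toy_fires))
```

Checking the counts first makes it clear that the equality filter drops rows with a missing group along with the other categories.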


This leaves 120,404 fire incidents. Our next task is to keep only the fires that occurred in residential dwellings, that is, those with a `PropertyCategory` of `Dwelling` or `Other Residential`.

In [9]:
res_fires = fires[fires['PropertyCategory'].isin(['Dwelling', 'Other Residential'])]
print(f"Filtered fires data frame is {res_fires.shape[0]:,} x {res_fires.shape[1]}")

Filtered fires data frame is 36,572 x 28


In [10]:
res_fires.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Index: 36572 entries, 19 to 794122
Data columns (total 28 columns):
 #   Column                                  Non-Null Count  Dtype  
---  ------                                  --------------  -----  
 0   IncidentNumber                          36572 non-null  object 
 1   CalYear                                 36572 non-null  int64  
 2   HourOfCall                              36572 non-null  int64  
 3   IncidentGroup                           36572 non-null  object 
 4   StopCodeDescription                     36572 non-null  object 
 5   PropertyCategory                        36572 non-null  object 
 6   Postcode_district                       36572 non-null  object 
 7   IncGeo_BoroughCode                      36572 non-null  object 
 8   ProperCase                              36572 non-null  object 
 9   IncGeo_WardCode                         36546 non-null  object 
 10  IncGeo_WardNameNew                      36546 non-null  objec

This leaves 36,572 fires that occurred in residential dwellings. If we need approximate locations for the fire incidents, we can use the `Easting_rounded` and `Northing_rounded` columns.

Our analysis will be carried out at ward level, and the data set already contains ward information in the `IncGeo_WardCode` and `IncGeo_WardNameNew` columns. The non-null counts above show that 26 incidents (36,572 - 36,546) do not have a ward code. We will explore further to decide whether to keep or remove these incidents.
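
One way to look more closely at the incidents without a ward code is to build a boolean mask with `isna()` and summarise the affected rows. This is a minimal sketch on a made-up frame; the ward codes and station names are placeholders, not real records.

```python
import pandas as pd

# Made-up rows; station and ward values are placeholders.
toy = pd.DataFrame({
    "IncidentNumber": ["A1", "A2", "A3", "A4"],
    "IncGeo_WardCode": ["W01", None, None, "W02"],
    "IncidentStationGround": ["Station A", "Station B", "Station C", "Station A"],
})

# boolean mask of rows with no ward code
missing_ward = toy["IncGeo_WardCode"].isna()
n_missing = int(missing_ward.sum())
print(n_missing)

# which station grounds do the affected incidents belong to?
stations = toy.loc[missing_ward, "IncidentStationGround"].value_counts()
print(stations)
```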

Below, we narrow the columns further to those relevant to our analysis.

In [11]:
res_cols = ['IncidentNumber', 'CalYear', 'PropertyCategory', 'IncGeo_WardCode', 'IncGeo_WardNameNew', 'Easting_rounded', 'Northing_rounded', 'IncidentStationGround']

In [12]:
# selecting columns, just those listed in res_cols
res_fires = res_fires[res_cols]
print(f"Filtered fires data frame is {res_fires.shape[0]:,} x {res_fires.shape[1]}")

Filtered fires data frame is 36,572 x 8


In [13]:
res_fires.info(verbose=True)

<class 'pandas.core.frame.DataFrame'>
Index: 36572 entries, 19 to 794122
Data columns (total 8 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   IncidentNumber         36572 non-null  object
 1   CalYear                36572 non-null  int64 
 2   PropertyCategory       36572 non-null  object
 3   IncGeo_WardCode        36546 non-null  object
 4   IncGeo_WardNameNew     36546 non-null  object
 5   Easting_rounded        36572 non-null  int64 
 6   Northing_rounded       36572 non-null  int64 
 7   IncidentStationGround  36572 non-null  object
dtypes: int64(3), object(5)
memory usage: 2.5+ MB


All looks good. Let us now export the data to a CSV file for further analysis.

In [14]:
res_fires.to_csv('residential_fires2.csv', index=False)

We carried out a separate analysis to try to identify the missing ward codes. We found that the 26 incidents all occurred in the Borough of Newham. A spatial join assigned all of them to just two wards, Little Ilford and Manor Park. However, this conflicts with the home fire stations recorded for these incidents, which span four different fire stations. We therefore decided to remove these incidents from our data set.

The discrepancy can plausibly be attributed to two sources of positional error: our eastings and northings are rounded to the nearest 50 m, and the shapefile used had ward boundaries generalised by 20 m. Together, these could have caused the spatial join to assign the incidents to the wrong wards.
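
To get a feel for the size of these errors, here is a rough worked example. It assumes the coordinates are rounded independently on each axis to the nearest multiple of 50 m, and that a 20 m generalisation means a boundary may be displaced by up to 20 m; both are simplifying assumptions, and the coordinate pair is hypothetical.

```python
import math

GRID = 50            # rounding step in metres, as stated above
GENERALISATION = 20  # boundary generalisation in metres, as stated above

def round_to_grid(value, step=GRID):
    """Round a coordinate to the nearest multiple of step."""
    return int(math.floor(value / step + 0.5)) * step

# worst case: a point half a step from the grid on both axes
max_axis_error = GRID / 2
max_point_shift = math.hypot(max_axis_error, max_axis_error)

# naive upper bound, assuming the two errors simply add
worst_case = max_point_shift + GENERALISATION

# hypothetical coordinate pair for illustration
x, y = 542_524.0, 185_476.0
xr, yr = round_to_grid(x), round_to_grid(y)
shift = math.hypot(xr - x, yr - y)
print(xr, yr, round(shift, 1), round(max_point_shift, 1), round(worst_case, 1))
```

Under these assumptions a rounded point can sit up to about 35 m from the true location, so an incident close to a ward boundary could easily land on the wrong side of it.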

We will drop these 26 incidents in our next analysis.
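
Dropping the incidents without a ward code could look like the following minimal sketch, shown on a made-up frame with placeholder ward codes. It uses `dropna` with `subset` so that only missing values in `IncGeo_WardCode` trigger removal.

```python
import pandas as pd

# Made-up rows; ward codes are placeholders.
toy = pd.DataFrame({
    "IncidentNumber": ["A1", "A2", "A3"],
    "IncGeo_WardCode": ["W01", None, "W02"],
})

# keep only rows with a ward code; reset_index gives a clean 0..n-1 index
with_ward = toy.dropna(subset=["IncGeo_WardCode"]).reset_index(drop=True)

n_dropped = len(toy) - len(with_ward)
print(n_dropped)
```

Comparing the row counts before and after is a quick check that exactly the expected number of incidents was removed.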

As a bonus, we will also export the data to a GeoPackage file in case we need it for further spatial analysis.

In [15]:
import geopandas as gpd

# build point geometries from the rounded coordinates and declare
# the CRS (EPSG:27700, British National Grid) in the constructor
gdf = gpd.GeoDataFrame(
    res_fires,
    geometry=gpd.points_from_xy(res_fires['Easting_rounded'], res_fires['Northing_rounded']),
    crs='EPSG:27700',
)

# export to a GeoPackage file
gdf.to_file('residential_fires2.gpkg', layer='residential_fires', driver='GPKG')