In [2]:
import pandas as pd
import geopandas as gpd
from datetime import datetime
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns           



Let's look at the column legend below to get an idea of what data we're working with! 

|Name|Description|
|-----|---------|
|YEAR|Fire year|
|FIRE_ID|A uniquely assigned ID for each fire event over a spatial region and for a specific year. It is the common ID used to link a fire|
|FIRENAME|Agency firename|
|MONTH|Month of fire as provided by individual agencies|
|DAY|Day of fire as provided by individual agencies |
|FIRECAUS|describes the ignition source of the fire recorded by the agency.|
|DATE_TYPE| Type of date. |
|DECADE|Decade in which the fire took place|
|OUT_DATE|Date fire was reported OUT as reported by individual agency|
|REP_DATE|Date associated with fire as reported by individual agency. Refer to DATE_TYPE for more categories of report dates.|
|SIZE_HA|fire size (hectares) as reported in agency source data|
|CALC_HA| fire size (hectares) as calculated using GIS, using Canada Albers Equal Area projection.|
|CAUSE|Cause of fire as reported by agency. Classifications: L (lightning), H (human), H-PB (prescribed burn), U (unknown)|
|MAP_SOURCE|Source refers to WHAT data source was used to identify and map the fire perimeter. This attribute, together with the Method (which describes HOW the fire perimeter was created), provides insight into data quality and accuracy.*|
|SOURCE_KEY|Source_key may include keywords to provide additional information about the data source.|
|MAP_METHOD|Map_Method refers to the process used to create or derive the fire polygon (the HOW). This attribute, together with Source (which describes WHAT data source was used), provides insight into data quality and accuracy. Note this attribute is in the process of being reviewed for national standards.|
|WATER_REM|Indicates if water areas have been removed from the polygon area.*|
|UNBURN_REM|Indicates if unburned islands have been removed from the polygon area.*|
|MORE_INFO|Additional attributes provided by agency|
|CFS_REF_ID| CFS|
|ACQ_DATE|Date that fire data was acquired from agency|
|CFS_NOTE1|Information noted by CFS|
|AG_SRCFILE|source data file provided by agency|
|CFS_NOTE2|Additional information noted by CFS|
|POLY_DATE|Date fire polygon was captured, created or loaded by agency system|
|SRC_AGY2|General source agency; does not distinguish between individual Parks Canada parks|


*Note this attribute is in the process of being reviewed for national standards.

In [25]:
fire_shp=gpd.read_file('/Users/sarismac/Downloads/NFDB_poly/NFDB_poly_20210707.shp') #found new set that has all the dates needed

In [31]:
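# sorted() returns a new list; list.sort() returns None, and naming a variable `list` shadows the builtin
years = sorted(fire_shp['YEAR'])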

In [3]:
fire_shp.crs

<Projected CRS: PROJCS["NAD_1983_Lambert_Conformal_Conic",GEOGCS[" ...>
Name: NAD_1983_Lambert_Conformal_Conic
Axis Info [cartesian]:
- [east]: Easting (metre)
- [north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: unnamed
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

In [24]:
fire_shp['YEAR']

0        2004
1        2004
2        2004
3        2004
4        2004
         ... 
26870    1994
26871    1994
26872    1994
26873    1992
26874    1992
Name: YEAR, Length: 26875, dtype: int64

Above we looked at the CRS. The CRS (coordinate reference system) tells us which projection the table uses for the fire locations. This is crucial to know because we are going to be merging two geographically based tables, and their projections must match!
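As a minimal sketch of what "matching projections" means in practice, the snippet below builds two tiny made-up point layers in different coordinate systems and reprojects one onto the other with `to_crs`. The layer names, coordinates, and EPSG codes are assumptions for illustration only; they are not the fire or weather data used in this notebook.

```python
import geopandas as gpd

# Hypothetical layer in geographic coordinates (longitude/latitude, WGS 84).
layer_a = gpd.GeoDataFrame(
    {"id": [1]},
    geometry=gpd.points_from_xy([-113.5], [53.5]),
    crs="EPSG:4326",
)

# Hypothetical layer in a projected CRS (NAD83 / Statistics Canada Lambert).
layer_b = gpd.GeoDataFrame(
    {"id": [1]},
    geometry=gpd.points_from_xy([0.0], [0.0]),
    crs="EPSG:3347",
)

# The two layers start out in different coordinate systems.
crs_match_before = layer_a.crs == layer_b.crs

# Reproject layer_a into layer_b's CRS so both share one coordinate system.
layer_a_aligned = layer_a.to_crs(layer_b.crs)
crs_match_after = layer_a_aligned.crs == layer_b.crs
```

Comparing `.crs` on both tables before any spatial merge, and reprojecting one of them if they differ, is a cheap guard against silently joining mismatched coordinates.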

In [10]:
fire_shp.columns

Index(['SRC_AGENCY', 'FIRE_ID', 'FIRENAME', 'YEAR', 'MONTH', 'DAY', 'REP_DATE',
       'DATE_TYPE', 'OUT_DATE', 'DECADE', 'SIZE_HA', 'CALC_HA', 'CAUSE',
       'MAP_SOURCE', 'SOURCE_KEY', 'MAP_METHOD', 'WATER_REM', 'UNBURN_REM',
       'MORE_INFO', 'POLY_DATE', 'CFS_REF_ID', 'CFS_NOTE1', 'CFS_NOTE2',
       'AG_SRCFILE', 'ACQ_DATE', 'SRC_AGY2', 'geometry'],
      dtype='object')

In [11]:
fire_shp.isna().sum()

SRC_AGENCY        0
FIRE_ID        6369
FIRENAME      55287
YEAR              0
MONTH             0
DAY               0
REP_DATE      11994
DATE_TYPE     22671
OUT_DATE      48911
DECADE            0
SIZE_HA           0
CALC_HA           0
CAUSE             0
MAP_SOURCE    16765
SOURCE_KEY    53896
MAP_METHOD    26161
WATER_REM     54998
UNBURN_REM    54998
MORE_INFO     45887
POLY_DATE     32350
CFS_REF_ID        0
CFS_NOTE1     47633
CFS_NOTE2     52367
AG_SRCFILE    26429
ACQ_DATE         43
SRC_AGY2          0
geometry          0
dtype: int64

We can see that there are a lot of columns with missing data. Fortunately, we'll only be looking at 'YEAR', 'MONTH', 'geometry', and 'SRC_AGY2', none of which have missing data.

In [12]:
fire_shp.duplicated().value_counts()

False    59538
True         1
Name: count, dtype: int64

In [13]:
fire_shp.drop_duplicates(inplace=True)

In [14]:
fire_shp.duplicated().value_counts()

False    59538
Name: count, dtype: int64

In [15]:
fire_shp['MONTH'].value_counts()

MONTH
7     13887
0     11970
6     10018
8      8714
5      8097
4      3156
9      2406
10      812
3       256
11       90
2        50
12       47
1        35
Name: count, dtype: int64

There appear to be about 12,000 rows labelled as month 0. Let's see if we can recover them by using either the reporting date or the out date.
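One way to check this without scanning the table by eye is to count how many month-0 rows have any date at all, and to derive a month from whichever date exists. The sketch below runs on a small made-up table (`toy` and its values are hypothetical stand-ins, not the fire data), using the same column names as the legend above.

```python
import pandas as pd

# Hypothetical stand-in for the fire table; values are invented for illustration.
toy = pd.DataFrame({
    "MONTH": [0, 0, 7, 0],
    "REP_DATE": [None, "1994-06-15", "2004-07-01", None],
    "OUT_DATE": [None, None, "2004-07-20", "1992-08-03"],
})

month_zero = toy[toy["MONTH"] == 0]

# A month-0 row is recoverable if at least one of the two dates is present.
has_any_date = month_zero[["REP_DATE", "OUT_DATE"]].notna().any(axis=1)
recoverable = int(has_any_date.sum())

# Prefer REP_DATE, fall back to OUT_DATE, then extract the month.
fallback_date = pd.to_datetime(
    month_zero["REP_DATE"].combine_first(month_zero["OUT_DATE"])
)
recovered_month = fallback_date.dt.month
```

If `recoverable` comes out as 0 on the real table, there is nothing to salvage and dropping the rows is the only option.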

In [16]:
fire_shp[fire_shp['MONTH']==0]

Unnamed: 0,SRC_AGENCY,FIRE_ID,FIRENAME,YEAR,MONTH,DAY,REP_DATE,DATE_TYPE,OUT_DATE,DECADE,...,UNBURN_REM,MORE_INFO,POLY_DATE,CFS_REF_ID,CFS_NOTE1,CFS_NOTE2,AG_SRCFILE,ACQ_DATE,SRC_AGY2,geometry
21250,AB,00119A,,1954,0,0,,,,1950-1959,...,,,,AB-1954-00119A,,,WildfirePerimeters1931to2011,2012-05-08,AB,"POLYGON Z ((-1315065.000 925631.188 0.000, -13..."
21251,AB,002002,,1956,0,0,,,,1950-1959,...,,,,AB-1956-002002,,,WildfirePerimeters1931to2011,2012-05-08,AB,"POLYGON Z ((-1310576.375 598147.375 0.000, -13..."
21252,AB,002004,Open Creek Wildfire,1958,0,0,,,,1950-1959,...,,map includes areas of partial burn. Refer to A...,,AB-1958-002004,,,WildfirePerimeters1931to2011,2012-05-08,AB,"POLYGON Z ((-1297601.051 605680.875 0.000, -12..."
21253,AB,003001,,1955,0,0,,,,1950-1959,...,,,,AB-1955-003001,,,WildfirePerimeters1931to2011,2012-05-08,AB,"POLYGON Z ((-1311286.750 641550.626 0.000, -13..."
21254,AB,003001,,1960,0,0,,,,1960-1969,...,,,,AB-1960-003001,,,WildfirePerimeters1931to2011,2012-05-08,AB,"POLYGON Z ((-1293034.149 612751.447 0.000, -12..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
59534,PC-NI,94NI001,,1994,0,0,,,,1990-1999,...,,,,PC-NI-1994-94NI001,included from PC-NA/NI,,,2017-11-29,PC,MULTIPOLYGON Z (((-1614593.075 1938374.232 0.0...
59535,PC-NI,04NI001,,2004,0,0,,,,2000-2009,...,,,,PC-NI-2004-04NI001,included from PC-NA/NI,,,2017-11-29,PC,"POLYGON Z ((-1610555.586 1924687.995 0.000, -1..."
59536,PC-NI,05NI001,,2005,0,0,,,,2000-2009,...,,,,PC-NI-2005-05NI001,included from PC-NA/NI,,,2017-11-29,PC,"POLYGON Z ((-1623563.822 1973289.242 0.000, -1..."
59537,PC-NI,09NI001,,2009,0,0,,,,2000-2009,...,,,,PC-NI-2009-09NI001,included from PC-NA/NI,,,2017-11-29,PC,"POLYGON Z ((-1627684.086 1900898.042 0.000, -1..."


It seems there is no reporting date or out date for the rows where the month is labelled 0, so we cannot recover the month. Let's go ahead and drop those rows.

In [17]:
fire_shp = fire_shp[fire_shp['MONTH'] != 0]  # keep only rows with a valid month

In [18]:
fire_shp['MONTH'].value_counts()

MONTH
7     13887
6     10018
8      8714
5      8097
4      3156
9      2406
10      812
3       256
11       90
2        50
12       47
1        35
Name: count, dtype: int64

The rows with the month labelled as 0 have been removed. Let's clean up the table and keep only the necessary columns.
Let's drop the columns we will not need for the final project. The only information we need is the geometry, year, and month columns, plus the source agency (SRC_AGY2). We will join this information to the weather data to determine whether there was a fire within a given month and what the weather was at the time.

In [19]:
fire_shp = fire_shp.loc[:,['YEAR','MONTH','geometry','SRC_AGY2']]

In [20]:
fire_shp.isna().sum()

YEAR        0
MONTH       0
geometry    0
SRC_AGY2    0
dtype: int64

Let's now keep only fires after 1990, since that is the scope of our project. Note that the filter below uses `> 1990`, so fires from 1990 itself are excluded:

In [21]:
fire_shp = fire_shp[fire_shp['YEAR'] > 1990]

In [22]:
fire_shp.reset_index(drop=True,inplace=True)

In [23]:
fire_shp.shape

(26875, 4)
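The boundary condition in a year filter is easy to get wrong. A minimal sketch on made-up years (the `years` series is illustrative, not the fire data) shows how `>` and `>=` differ at 1990:

```python
import pandas as pd

# Hypothetical years, chosen to straddle the 1990 boundary.
years = pd.Series([1989, 1990, 1991, 2004], name="YEAR")

# Strictly after 1990: the year 1990 itself is dropped.
after_1990 = years[years > 1990].tolist()

# From 1990 onward: the year 1990 is kept.
from_1990 = years[years >= 1990].tolist()
```

Which operator is right depends on whether the project scope includes 1990 itself.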

In [16]:
fire_shp.to_file('Data/Fire_Data/fire_date_geo.shp')
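The text above mentions combining the fire perimeters with weather data to flag whether a fire occurred in a given month. One possible way to do that is a spatial join followed by a year and month comparison. The sketch below is only an illustration under assumptions: the `weather` table, its `TEMP_C` column, the coordinates, and the CRS are all hypothetical, and the real weather data may be structured differently.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical fire perimeters standing in for the cleaned fire table.
fires = gpd.GeoDataFrame(
    {"YEAR": [2004, 1994], "MONTH": [7, 6]},
    geometry=[
        Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]),
        Polygon([(20, 20), (30, 20), (30, 30), (20, 30)]),
    ],
    crs="EPSG:3347",
)

# Hypothetical monthly weather observations at point locations.
weather = gpd.GeoDataFrame(
    {"YEAR": [2004, 2004, 1994], "MONTH": [7, 8, 6], "TEMP_C": [24.1, 22.3, 19.8]},
    geometry=[Point(5, 5), Point(5, 5), Point(50, 50)],
    crs="EPSG:3347",
)

# Left spatial join: attach the fire polygon (if any) that contains each point.
# Overlapping column names receive the default _left / _right suffixes.
joined = gpd.sjoin(weather, fires, how="left", predicate="within")

# Flag a fire only when the point lies in a perimeter AND year and month agree.
joined["FIRE"] = (
    (joined["YEAR_left"] == joined["YEAR_right"])
    & (joined["MONTH_left"] == joined["MONTH_right"])
)
```

Points that fall outside every perimeter get missing values on the fire side of the join, so the comparison evaluates to `False` for them.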