# Prediction of Wildfires for Optimization of Prescribed Burns in British Columbia
**_Allan Go_**    
02/2022

## Abstract
<div style="text-align: justify"> With the increasing impacts of wildfires in British Columbia, machine learning techniques may see increased use in optimizing prescribed burns. In exploring data science workflows, I demonstrate some capabilities of linear regression, logistic regression, and random forest algorithms for wildfire prediction. Accurate predictions of wildfires in populated regions can lead to data-informed controlled burns and optimized modelling to reduce risk to human communities and natural ecosystems. </div> 

---

## Introduction

### Background:
In recent years, British Columbia (BC) has seen record-breaking wildfire seasons (Bregolisse, 2018). A variety of techniques exist to minimize risk to human life and sensitive ecosystems, including satellite imaging, fire towers, reactive wildfire fighting, and proactive techniques such as prescribed burns. Recently, news outlets and Indigenous communities have called for increased use of prescribed burns to manage forest ecosystems and preemptively reduce the severity of uncontrolled wildfires (Owen, 2021). This technique of selectively burning regions of forest has a long history of promoting healthy, sustainable ecosystems while reducing the severity of uncontrolled wildfires. Additionally, increased use of machine learning (ML) has led to applications in predicting wildfire fuel sources and wildfires themselves. By exploring some of these techniques with public data from the BC government, I aim to demonstrate my general data science project workflow while allowing for future modelling of optimal prescribed burn patterns based on predicted BC wildfires.

### Objective:
I aim to model BC wildfires based on historical wildfire, wildfire ignition source, and forestry activity data. This model can then be used to simulate controlled burns and allow preemptive measures to be taken in high-risk areas. A secondary aim of this project is to present my overall approach to the data science workflow (though many steps will be abridged or summarized for conciseness). First, I will define an objective and perform some cursory background research. Next, I will find relevant data sources, retrieve and clean them, and perform exploratory data analysis. I will then test a few in-depth analyses and machine learning models for the prediction of wildfires. This step also offers the potential for hypothesis testing of the effects of controlled burns on the severity and number of predicted high-risk wildfires. Finally, I will summarize the results, provide some visualizations of the data, and close with my conclusions and notes on future work or iterations.
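
The model-testing step could look roughly like the sketch below, which fits two of the candidate classifiers on a held-out split and compares their accuracy. It is only an illustration under stated assumptions: the features and the binary "large fire" label are synthetic, and the `models` and `scores` names are hypothetical rather than taken from the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: two made-up features and a binary "large fire" label
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# Hold out a quarter of the rows so each model is scored on unseen data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical candidate set; the real project may tune or swap these
models = {
    "logistic_regression": LogisticRegression(),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each candidate and record its accuracy on the held-out rows
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

print(scores)
```

Accuracy is used here only because it is the simplest metric to compare; for rare events such as large fires, the per-class figures from `classification_report` would likely be more informative.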

### Limits and Assumptions:
Throughout this project I will attempt to explain any simplifications I have made with respect to the quality of the data, the selection of machine learning algorithms, or other aspects of the data science workflow. For example, it would be ideal to try a large variety of ML models; however, I have selected three methods based on their general applicability, and it is likely that better-suited models exist. Additionally, I will focus primarily on the prediction of wildfires, with minimal exploration into optimizing the location and size of controlled burns. In my exploration, after predicting upcoming wildfire perimeters, I will simply select the largest predicted wildfires and assume a controlled burn sized as a percentage of each wildfire, so that the effects of prescribed burns can be compared against the unaltered model. Prescribed burns involve many factors that are outside the scope of my analysis, such as optimal timing, natural weather cycles, and the provincial or regional resources available (Chiodi et al., 2018).
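
The simplified burn scenario described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `predicted` frame and its values, the function name `simulate_prescribed_burns`, and the `top_n` and `burn_fraction` parameters are all hypothetical. Only the `SIZE_HA` and `FIRELABEL` column names come from the BC fire perimeter data.

```python
import pandas as pd


def simulate_prescribed_burns(predicted, top_n=2, burn_fraction=0.10):
    """Assign a prescribed burn to the top_n largest predicted fires.

    Each burn is assumed to cover burn_fraction of the predicted fire size.
    Fires outside the top_n receive no burn (BURN_HA stays 0).
    """
    result = predicted.copy()
    result["BURN_HA"] = 0.0
    # Index labels of the largest predicted fires by area
    largest = result["SIZE_HA"].nlargest(top_n).index
    result.loc[largest, "BURN_HA"] = result.loc[largest, "SIZE_HA"] * burn_fraction
    return result


# Made-up predicted fire sizes in hectares, for illustration only
predicted = pd.DataFrame(
    {"FIRELABEL": ["A", "B", "C", "D"], "SIZE_HA": [120.0, 5000.0, 40.0, 900.0]}
)
scenario = simulate_prescribed_burns(predicted, top_n=2, burn_fraction=0.10)
print(scenario)
```

How the burned area then feeds back into the wildfire model is left open here, since that depends on the model eventually chosen.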

### Resources and Existing Work:
1. https://storymaps.arcgis.com/stories/bf1223e6a1564ee3933aa0b3641493c1 
    - Machine learning and its potential for replacing manual wildfire fuel type identification methods. Provides insight into the BC wildfire fuel data source and outlines machine learning predictions for identifying BC fuel sources that may pose a higher risk of wildfire.
    
    
2. https://www2.gov.bc.ca/assets/gov/farming-natural-resources-and-industry/forestry/wildfire-management/fire-fuel-management/bcws_bc_provincial_fuel_type_layer_overview_2015_report.pdf 
    - Description of the wildfire fuel data layer resource.
    
    
3. https://cdnsciencepub.com/doi/full/10.1139/er-2020-0019 
    - Review of machine learning applications in wildfire science and management.
    
    
4. https://www.analyticsvidhya.com/blog/2021/10/forest-fire-prediction-using-machine-learning/ 
    - ML (random forest regressor) forest fire prediction in northeastern Australia. This worked example will serve as a reference for the random forest algorithm.

---

## Methods:

### Data Sources:
1. https://www2.gov.bc.ca/gov/content/safety/wildfire-status/about-bcws/wildfire-statistics
    - BC Wildfire PSTA Lightning Fire Density
    - BC Wildfire PSTA Human Fire Density
    - Fire Perimeters - Historical
    - Fire Perimeters - Current
2. https://cwfis.cfs.nrcan.gc.ca/ha/nfdb?type=poly&year=9999 (Canadian National Fire Database, CNFDB)

### Workflow:
![ProcessFlow.png](attachment:ProcessFlow.png)

## The Implementation

### Setup:
1. Importing Libraries
2. Defining Functions for Plotting and Aesthetics

In [5]:
# Import Libraries
import pandas as pd
import numpy as np
import datetime as dt

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier  # used by plot_variable_importance
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
import seaborn as sns

# Configure visualisations
%matplotlib inline
mpl.style.use( 'ggplot' )
sns.set_style( 'white' )
pylab.rcParams[ 'figure.figsize' ] = 8,6

In [7]:
# Functions
def plot_histograms(df, variables, n_rows, n_cols):
    fig = plt.figure(figsize = (16,12))
    
    for i, var_name in enumerate(variables):
        ax = fig.add_subplot(n_rows, n_cols, i+1)
        df[var_name].hist(bins = 10, ax = ax)
        # Title each panel with the variable name and its skewness
        ax.set_title(f'{var_name} (skew: {df[var_name].skew():.2f})')
        ax.set_xticklabels([], visible=False)
        ax.set_yticklabels([], visible=False)
        
    fig.tight_layout()
    plt.show()

def plot_distribution(df, var, target, **kwargs):
    row = kwargs.get('row', None)
    col = kwargs.get('col', None)
    facet = sns.FacetGrid(df, hue = target, aspect = 4, row = row, col = col)
    facet.map(sns.kdeplot, var, fill = True)  # 'shade' is deprecated in seaborn in favour of 'fill'
    facet.set(xlim=(0, df[var].max()))
    facet.add_legend()

def plot_categories(df, cat, target, **kwargs):
    row = kwargs.get('row', None)
    col = kwargs.get('col', None)
    facet = sns.FacetGrid(df, row = row , col = col)
    facet.map(sns.barplot, cat, target)
    facet.add_legend()

def plot_correlation_map(df):
    # Correlate the numeric columns of the DataFrame passed in
    corr = df.select_dtypes(include = 'number').corr()
    _ , ax = plt.subplots(figsize =(12,10))
    cmap = sns.diverging_palette(220, 10, as_cmap = True)
    _ = sns.heatmap(
        corr, 
        cmap = cmap,
        square = True, 
        cbar_kws = {'shrink' : .9}, 
        ax = ax, 
        annot = True, 
        annot_kws = {'fontsize' : 12 }
    )

def describe_more(df):
    var = [] ; l = [] ; t = []
    
    for x in df:
        var.append(x)
        l.append(df[x].nunique())  # number of distinct non-null values
        t.append(df[x].dtypes )
    levels = pd.DataFrame( {'Variable' : var, 'Levels' : l, 'Datatype' : t})
    levels.sort_values(by = 'Levels', inplace = True)
    return levels

def plot_variable_importance(X, y):
    tree = DecisionTreeClassifier(random_state = 99)
    tree.fit(X, y)
    plot_model_var_imp(tree, X, y)
    
def plot_model_var_imp(model, X, y):
    imp = pd.DataFrame( 
        model.feature_importances_, 
        columns = ['Importance'], 
        index = X.columns 
    )
    imp = imp.sort_values(['Importance'], ascending = True )
    imp[:10].plot(kind = 'barh')
    print(model.score(X, y))

### Data Cleaning
1. Data was previously downloaded from data source 1 (https://www2.gov.bc.ca/gov/content/safety/wildfire-status/about-bcws/wildfire-statistics).
2. Import the data from the copy I uploaded to GitHub.
3. Explore the data format and variables.
4. Clean any irregularities, such as empty values.

In [23]:
# Import Data
forest = pd.read_csv('HistFirePreimeters.csv')
print(forest.shape)
print(forest.columns)
forest.head(3)

(22479, 18)
Index(['FIRE_NO', 'VERSION_NO', 'FIRE_YEAR', 'FIRE_CAUSE', 'FIRELABEL',
       'SIZE_HA', 'SOURCE', 'TRACK_DATE', 'LOAD_DATE', 'FIRE_DATE', 'METHOD',
       'FCODE', 'SHAPE', 'OBJECTID', 'AREA_SQM', 'FEAT_LEN', 'X', 'Y'],
      dtype='object')


| | FIRE_NO | VERSION_NO | FIRE_YEAR | FIRE_CAUSE | FIRELABEL | SIZE_HA | SOURCE | TRACK_DATE | LOAD_DATE | FIRE_DATE | METHOD | FCODE | SHAPE | OBJECTID | AREA_SQM | FEAT_LEN | X | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 114 | | 1919 | Person | 1919-114 | 718.7 | linens | | 20070520000000.0 | 19190923.0 | digitised | JA70003000 | | 1960546 | 7187147.0 | 12383.3801 | | |
| 1 | 118 | | 1919 | Person | 1919-118 | 71.7 | linens | | 20070520000000.0 | 19190715.0 | assumed_shape | JA70003000 | | 1960547 | 717456.0 | 3549.4426 | | |
| 2 | 119 | | 1919 | Person | 1919-119 | 162.7 | linens | | 20070520000000.0 | 19191007.0 | digitised | JA70003000 | | 1960548 | 1627569.0 | 6136.542 | | |


In [24]:
forest.isnull().sum()

FIRE_NO           0
VERSION_NO    17939
FIRE_YEAR         0
FIRE_CAUSE        0
FIRELABEL         0
SIZE_HA           1
SOURCE           78
TRACK_DATE    17759
LOAD_DATE        78
FIRE_DATE         2
METHOD            0
FCODE             0
SHAPE         22479
OBJECTID          0
AREA_SQM          0
FEAT_LEN          0
X             22479
Y             22479
dtype: int64

In [25]:
forest = forest.drop(['X', 'Y', 'SHAPE'], axis=1)
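
The drop above hard-codes the three columns that the null counts show to be entirely empty. As a hedged alternative, the sketch below removes any fully empty column automatically and then drops the handful of rows missing a size or fire date. The `toy` frame and its values are made up for illustration, and dropping those rows (rather than imputing them) is an assumption, not a step taken in this notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the fire perimeter table (values are made up)
toy = pd.DataFrame(
    {
        "FIRE_NO": ["114", "118", "119", "120"],
        "SIZE_HA": [718.7, np.nan, 162.7, 10.0],
        "FIRE_DATE": [19190923.0, 19190715.0, np.nan, 19200601.0],
        "SHAPE": [np.nan] * 4,
        "X": [np.nan] * 4,
    }
)

# Drop columns that contain no values at all (SHAPE, X, Y in the real data)
cleaned = toy.dropna(axis=1, how="all")

# Drop rows missing a size or a fire date; assumed acceptable because the
# null counts above show only 1 and 2 such rows in the real table
cleaned = cleaned.dropna(subset=["SIZE_HA", "FIRE_DATE"]).reset_index(drop=True)

print(cleaned)
```

Columns that are mostly but not entirely empty, such as `VERSION_NO` and `TRACK_DATE`, survive this step and would need a separate decision.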

In [26]:
forest.describe()

| | VERSION_NO | FIRE_YEAR | SIZE_HA | TRACK_DATE | LOAD_DATE | FIRE_DATE | OBJECTID | AREA_SQM | FEAT_LEN |
|---|---|---|---|---|---|---|---|---|---|
| count | 4540.0 | 22479.0 | 22478.0 | 4720.0 | 22401.0 | 22477.0 | 22479.0 | 22479.0 | 22479.0 |
| mean | 2013342000.0 | 1963.073847 | 687.076804 | 9932889000000.0 | 19895890000000.0 | 101572900000.0 | 1971764.0 | 6869421.0 | 7712.008243 |
| std | 3872608.0 | 33.557405 | 5828.203621 | 10052510000000.0 | 1931089000000.0 | 1428694000000.0 | 6489.31 | 58282780.0 | 18663.558041 |
| min | 2005053000.0 | 1917.0 | 0.0 | 20030700.0 | 20011010.0 | 19170720.0 | 1960285.0 | 0.004 | 0.315 |
| 25% | 2009101000.0 | 1932.0 | 11.9 | 20170530.0 | 20070520000000.0 | 19320930.0 | 1966144.0 | 119581.8 | 1493.9746 |
| 50% | 2013562000.0 | 1957.0 | 47.75 | 20200820.0 | 20070520000000.0 | 19600710.0 | 1971764.0 | 477406.4 | 3175.5013 |
| 75% | 2017081000.0 | 1995.0 | 191.5 | 20091110000000.0 | 20070520000000.0 | 20040620.0 | 1977384.0 | 1913518.0 | 6856.7388 |
| max | 2021033000.0 | 2020.0 | 520885.2 | 20190820000000.0 | 20210320000000.0 | 20210330000000.0 | 1983003.0 | 5208852000.0 | 832168.6403 |
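
The summary shows that the date columns mix two numeric encodings: `FIRE_DATE` ranges from an 8-digit value (19170720) to a 14-digit one (20210330000000), which look like `YYYYMMDD` and `YYYYMMDDhhmmss` respectively. One possible way to normalize them is sketched below. The helper name `parse_mixed_date` is hypothetical, and treating the first eight digits as the calendar date is an assumption based on those ranges rather than on documentation of the dataset.

```python
import numpy as np
import pandas as pd


def parse_mixed_date(series):
    """Parse floats encoded as YYYYMMDD or YYYYMMDDhhmmss into datetimes.

    Only the leading eight digits (the calendar date) are kept; missing or
    unparseable values become NaT.
    """
    # Convert each value to its integer digits and keep the date portion
    text = series.map(lambda v: str(int(v))[:8] if pd.notna(v) else None)
    return pd.to_datetime(text, format="%Y%m%d", errors="coerce")


# Example values in both encodings, plus a missing entry
raw = pd.Series([19190923.0, 20210330000000.0, np.nan])
parsed = parse_mixed_date(raw)
print(parsed)
```

Applied to the real table, this might be used as `forest['FIRE_DATE'] = parse_mixed_date(forest['FIRE_DATE'])`, after which the year and month can be extracted as model features.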


## Results and Presentation

## Discussion and Conclusion

### References

GovBC. (2010). Wildland fire excellence management strategy. Retrieved February 23, 2022, from https://www2.gov.bc.ca/assets/gov/public-safety-and-emergency-services/wildfire-status/governance/bcws_wildland_fire_mngmt_strategy.pdf 

Chiodi, A. M., Larkin, N. S., & Varner, J. M. (2018). An analysis of Southeastern US prescribed burn weather windows: Seasonal variability and El Niño associations. International Journal of Wildland Fire, 27, 176-189.

Shah, S. B., Grübler, T., Krempel, L., Ernst, S., Mauracher, F., & Contractor, S. (2019). Real-time wildfire detection from space – a trade-off between sensor quality, physical limitations and payload size. Gottingen: Copernicus GmbH. doi:http://dx.doi.org/10.5194/isprs-archives-XLII-2-W16-209-2019

Brotons, L., Aquilué, N., de Cáceres, M., Fortin, M., & Fall, A. (2013). How fire history, fire suppression practices and climate change affect wildfire regimes in Mediterranean landscapes. PLoS One, 8(5). doi:http://dx.doi.org/10.1371/journal.pone.0062392

Owen, B. (2021, July 18). Fire experts prescribe indigenous cultural burns to reduce wildfire risk in B.C. CTVNews. Retrieved February 23, 2022, from https://www.ctvnews.ca/canada/fire-experts-prescribe-indigenous-cultural-burns-to-reduce-wildfire-risk-in-b-c-1.5513720 

Bregolisse, D. M. (2018, October 17). Fighting fire with fire: Forestry experts call for more controlled burning in B.C. Global News. Retrieved February 23, 2022, from https://globalnews.ca/news/4562522/forestry-experts-call-for-more-controlled-burning-in-bc-to-reduce-risk-of-wildfire/ 

Labs, I. (2019, February 16). Top 10 machine learning algorithms and its use cases. Medium. Retrieved February 23, 2022, from https://medium.com/@imaginorlabs/top-10-machine-learning-algorithms-and-its-use-cases-fc303daa2003 

### Image Citations