# Data Science Final Project 


**College/University Name**: _CICCC - Cornerstone International Community College of Canada_  
**Course**: _Final Project_  
**Instructor**: _Derrick Park_  
**Student Name**: _Amir Lima Oliveira_  
**Submission Date**: _2025-09-26_  

---

### Project Title
_Wildfire Restoration Priority Classification in Canada_

---

#### Objective
Find, structure, and analyse NASA satellite datasets of wildfire detection points, link them to satellite imagery, and engineer area-level parameters to identify which burned areas need priority restoration.

### Problem Statement or Research Question
This project aims to help direct restoration resources efficiently to the most critical burned areas, using a data-driven machine learning model to identify them.

---

#### Dataset Overview
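- **Source:** Canadian National Fire Database (NFDB) fire polygons, file `NFDB_poly_20210707.shp`, from the CWFIS Datamart (https://cwfis.cfs.nrcan.gc.ca/datamart/download/nfdbpoly)
- **Description:** 59,539 wildfire perimeter polygons with 27 columns, including source agency, fire ID, dates, reported and calculated size in hectares, cause, and mapping metadata
- **Credits:** Natural Resources Canada, Canadian Forest Service (CWFIS)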

---

## Table of Contents


1. [Import Libraries](#import-libraries)  


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import rasterio as rio
import fiona
from rasterio.plot import show
import shapely.geometry as geom
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

import urllib.request # to download the watershed gdb file

---

2. [Load & Inspect Data](#load--inspect-data)  


In [2]:
fire_perimeters = gpd.read_file('../data_raw/fire_perimeters/NFDB_poly_20210707.shp')



The watershed geodatabase (.gdb) file had to be converted to a GeoPackage (.gpkg) so that the geographic data could be loaded into GeoPandas.

   - [Shape](#shape)  

In [None]:
fire_perimeters.shape

(59539, 27)

   - [Missing Values](#missing-values)  


In [4]:
fire_perimeters.isnull().sum()

SRC_AGENCY        0
FIRE_ID        6369
FIRENAME      55287
YEAR              0
MONTH             0
DAY               0
REP_DATE      11994
DATE_TYPE     22671
OUT_DATE      48911
DECADE            0
SIZE_HA           0
CALC_HA           0
CAUSE             0
MAP_SOURCE    16765
SOURCE_KEY    53896
MAP_METHOD    26161
WATER_REM     54998
UNBURN_REM    54998
MORE_INFO     45887
POLY_DATE     32350
CFS_REF_ID        0
CFS_NOTE1     47633
CFS_NOTE2     52367
AG_SRCFILE    26429
ACQ_DATE         43
SRC_AGY2          0
geometry          0
dtype: int64

   - [Data Types](#data-types)  


In [5]:
fire_perimeters.describe()

Unnamed: 0,YEAR,MONTH,DAY,REP_DATE,OUT_DATE,SIZE_HA,CALC_HA,POLY_DATE,ACQ_DATE
count,59539.0,59539.0,59539.0,47545,10628,59539.0,59539.0,27189,59496
mean,1961.635986,5.246595,12.523909,1985-04-17 22:02:23.712000,2009-09-08 19:48:56.093000,2340.893525,2338.11,2008-12-31 15:36:35.431000,2012-08-11 15:50:16.216000
min,-9999.0,0.0,0.0,1917-07-21 00:00:00,1899-12-30 00:00:00,0.0,9.95329e-08,1981-05-06 00:00:00,2004-02-16 00:00:00
25%,1960.0,4.0,2.0,1960-07-27 00:00:00,2003-08-26 00:00:00,4.6,4.625503,2007-05-17 00:00:00,2010-11-03 00:00:00
50%,1994.0,6.0,12.0,1998-06-28 00:00:00,2013-07-18 00:00:00,65.0,64.69041,2007-05-17 00:00:00,2011-06-10 00:00:00
75%,2010.0,7.0,21.0,2011-09-29 00:00:00,2017-07-27 00:00:00,540.7,549.7889,2009-09-10 00:00:00,2015-04-22 00:00:00
max,2020.0,12.0,31.0,2020-12-06 00:00:00,2020-10-08 00:00:00,988497.2,987337.9,2020-10-20 00:00:00,2021-05-31 00:00:00
std,508.427746,2.924353,10.071721,,,14817.534794,14895.02,,


   - [Preview Data](#preview-data)


In [6]:
fire_perimeters.head()

Unnamed: 0,SRC_AGENCY,FIRE_ID,FIRENAME,YEAR,MONTH,DAY,REP_DATE,DATE_TYPE,OUT_DATE,DECADE,...,UNBURN_REM,MORE_INFO,POLY_DATE,CFS_REF_ID,CFS_NOTE1,CFS_NOTE2,AG_SRCFILE,ACQ_DATE,SRC_AGY2,geometry
0,BC,2004-C10175,,2004,6,23,2004-06-23,Report date,NaT,2000-2009,...,,,2007-05-17,BC-2004-C10175,,,H_FIRE_PLY,2011-06-10,BC,"POLYGON Z ((-1886926.467 898021.006 0, -188688..."
1,BC,2004-C10176,,2004,6,23,2004-06-23,Report date,NaT,2000-2009,...,,,2007-05-17,BC-2004-C10176,,,H_FIRE_PLY,2011-06-10,BC,"POLYGON Z ((-1880308.251 892344.865 0, -188024..."
2,BC,2004-C50114,,2004,6,20,2004-06-20,Report date,NaT,2000-2009,...,,,2007-05-17,BC-2004-C50114,,,H_FIRE_PLY,2011-06-10,BC,"POLYGON Z ((-1965048.293 820512.199 0, -196508..."
3,BC,2004-C50125,,2004,6,21,2004-06-21,Report date,NaT,2000-2009,...,,,2007-05-17,BC-2004-C50125,,,H_FIRE_PLY,2011-06-10,BC,"POLYGON Z ((-1995073.527 854615.146 0, -199507..."
4,BC,2004-C50149,,2004,6,22,2004-06-22,Report date,NaT,2000-2009,...,,,2007-05-17,BC-2004-C50149,,,H_FIRE_PLY,2011-06-10,BC,"POLYGON Z ((-1988211.829 940418.674 0, -198833..."


In [12]:
fire_perimeters.nunique()

SRC_AGENCY       31
FIRE_ID       42603
FIRENAME       3753
YEAR            105
MONTH            13
DAY              32
REP_DATE      10816
DATE_TYPE        12
OUT_DATE       3019
DECADE           12
SIZE_HA       28348
CALC_HA       59512
CAUSE             6
MAP_SOURCE      179
SOURCE_KEY       56
MAP_METHOD      246
WATER_REM         6
UNBURN_REM        6
MORE_INFO       611
POLY_DATE      2794
CFS_REF_ID    59443
CFS_NOTE1        96
CFS_NOTE2      1520
AG_SRCFILE      206
ACQ_DATE         99
SRC_AGY2         13
geometry      59515
dtype: int64

---

3. [Data Cleaning](#data-cleaning)  

   - [Drop Duplicates](#drop-duplicates)  

In [15]:
fire_perimeters.duplicated().sum()

1
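
The check above reports one fully duplicated row, but the notebook does not show it being removed. A minimal sketch of how the removal could look, using a small stand-in DataFrame rather than the real `fire_perimeters` data (the toy values and the names `toy` and `deduped` are illustrative assumptions):

```python
import pandas as pd

# Toy stand-in for fire_perimeters; the values are made up for illustration.
toy = pd.DataFrame(
    {
        "FIRE_ID": ["A-1", "A-2", "A-2", "A-3"],
        "YEAR": [2004, 2005, 2005, 2006],
        "SIZE_HA": [10.0, 25.5, 25.5, 3.2],
    }
)

# Count rows that are exact copies of an earlier row.
n_duplicates = int(toy.duplicated().sum())

# keep="first" retains the first occurrence of each duplicated row.
deduped = toy.drop_duplicates(keep="first").reset_index(drop=True)

print(n_duplicates, len(toy), len(deduped))
```

The same two calls should apply to the GeoDataFrame, since `duplicated()` already runs on it in the cell above.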

   - [Filter Irrelevant Records](#filter-irrelevant-records)  

In [18]:
fire_perimeters["SRC_AGENCY"] = fire_perimeters["SRC_AGENCY"].str.strip().str.upper()
fires_bc = fire_perimeters[fire_perimeters["SRC_AGENCY"] == "BC"].copy()

   - [Handle Inconsistent Values](#handle-inconsistent-values)  
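
The `describe()` output earlier shows a minimum `YEAR` of -9999 and minimum `MONTH` and `DAY` values of 0, which look like placeholders for unknown dates rather than real values. That reading is an assumption; the notebook does not state it. Under that assumption, one possible sketch (toy data, hypothetical helper name `mask_date_placeholders`) turns those placeholders into missing values:

```python
import pandas as pd


def mask_date_placeholders(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with assumed placeholder date parts set to NaN."""
    out = df.copy()
    # Series.where keeps values where the condition holds and sets the rest to NaN.
    out["YEAR"] = out["YEAR"].where(out["YEAR"] > 0)            # e.g. -9999 -> NaN
    out["MONTH"] = out["MONTH"].where(out["MONTH"].between(1, 12))
    out["DAY"] = out["DAY"].where(out["DAY"].between(1, 31))
    return out


# Toy rows; the second one mimics the placeholder pattern seen in describe().
toy = pd.DataFrame(
    {"YEAR": [2004, -9999, 1998], "MONTH": [6, 0, 7], "DAY": [23, 0, 15]}
)
cleaned = mask_date_placeholders(toy)
print(cleaned.isna().sum())
```

Masking instead of dropping keeps the polygons available for spatial analysis even when their dates are unknown. Note that the masked columns become floating point because NaN is not an integer value.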

---

  
   - [Handling Missing Data](#handling-missing-data)  

In [19]:
fires_bc.shape

(21250, 27)

In [20]:
fires_bc.isnull().sum()

SRC_AGENCY        0
FIRE_ID           0
FIRENAME      21250
YEAR              0
MONTH             0
DAY               0
REP_DATE          0
DATE_TYPE       974
OUT_DATE      21250
DECADE            0
SIZE_HA           0
CALC_HA           0
CAUSE             0
MAP_SOURCE      189
SOURCE_KEY    20901
MAP_METHOD       45
WATER_REM     21250
UNBURN_REM    21250
MORE_INFO     21239
POLY_DATE      1357
CFS_REF_ID        0
CFS_NOTE1     21226
CFS_NOTE2     16138
AG_SRCFILE     2694
ACQ_DATE          0
SRC_AGY2          0
geometry          0
dtype: int64
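
Several columns in the BC subset are missing in all 21,250 rows (`FIRENAME`, `OUT_DATE`, `WATER_REM`, `UNBURN_REM`) and so carry no information for this subset. As a sketch of one way to find such columns programmatically, the example below uses a toy frame and an arbitrary 90% cut-off; both are assumptions for illustration, not the notebook's method:

```python
import pandas as pd

# Toy stand-in for fires_bc; the values are made up for illustration.
toy = pd.DataFrame(
    {
        "FIRE_ID": ["A-1", "A-2", "A-3", "A-4"],
        "FIRENAME": [None, None, None, None],  # entirely missing
        "MAP_METHOD": ["digitized", None, "digitized", "digitized"],
    }
)

# Fraction of missing values per column (mean of a boolean mask).
missing_share = toy.isnull().mean()

# Hypothetical cut-off: treat columns with more than 90% missing as unusable.
threshold = 0.9
mostly_missing = missing_share[missing_share > threshold].index.tolist()

trimmed = toy.drop(columns=mostly_missing)
print(mostly_missing, list(trimmed.columns))
```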

In [21]:
fires_bc.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
Index: 21250 entries, 0 to 21249
Data columns (total 27 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   SRC_AGENCY  21250 non-null  object        
 1   FIRE_ID     21250 non-null  object        
 2   FIRENAME    0 non-null      object        
 3   YEAR        21250 non-null  int64         
 4   MONTH       21250 non-null  int64         
 5   DAY         21250 non-null  int64         
 6   REP_DATE    21250 non-null  datetime64[ms]
 7   DATE_TYPE   20276 non-null  object        
 8   OUT_DATE    0 non-null      datetime64[ms]
 9   DECADE      21250 non-null  object        
 10  SIZE_HA     21250 non-null  float64       
 11  CALC_HA     21250 non-null  float64       
 12  CAUSE       21250 non-null  object        
 13  MAP_SOURCE  21061 non-null  object        
 14  SOURCE_KEY  349 non-null    object        
 15  MAP_METHOD  21205 non-null  object        
 16  WATER_REM   0 non-n

   - [Feature Selection](#feature-selection)  

In [23]:
columns = [
    "FIRE_ID", 
    "YEAR", "MONTH", "DAY", "REP_DATE",
    "SIZE_HA", "CALC_HA", 
    "CAUSE", 
    "MAP_METHOD", "POLY_DATE", 
    "geometry"
]

fires_bc = fires_bc[columns].copy()

In [24]:
fires_bc.shape

(21250, 11)

In [25]:
fires_bc.head()

Unnamed: 0,FIRE_ID,YEAR,MONTH,DAY,REP_DATE,SIZE_HA,CALC_HA,CAUSE,MAP_METHOD,POLY_DATE,geometry
0,2004-C10175,2004,6,23,2004-06-23,1370.5,1370.507344,L,digitized,2007-05-17,"POLYGON Z ((-1886926.467 898021.006 0, -188688..."
1,2004-C10176,2004,6,23,2004-06-23,520.7,520.796287,L,digitized,2007-05-17,"POLYGON Z ((-1880308.251 892344.865 0, -188024..."
2,2004-C50114,2004,6,20,2004-06-20,268.2,268.290572,L,digitized,2007-05-17,"POLYGON Z ((-1965048.293 820512.199 0, -196508..."
3,2004-C50125,2004,6,21,2004-06-21,20506.4,20506.415129,L,Modified from Protection,2007-05-17,"POLYGON Z ((-1995073.527 854615.146 0, -199507..."
4,2004-C50149,2004,6,22,2004-06-22,2408.5,2408.587142,L,digitized,2007-05-17,"POLYGON Z ((-1988211.829 940418.674 0, -198833..."


   - [Creating New Features](#creating-new-features)  
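
The notebook does not show which features were created in this step. Purely as an illustration of what area-based parameters might look like, the sketch below derives two hypothetical columns, `LOG_SIZE_HA` and `AREA_DIFF_PCT`, from the `SIZE_HA` and `CALC_HA` values of three rows in the preview above. The column names and the choice of transformations are assumptions:

```python
import numpy as np
import pandas as pd

# Three rows copied from the fires_bc.head() preview.
sample = pd.DataFrame(
    {
        "FIRE_ID": ["2004-C10175", "2004-C10176", "2004-C50125"],
        "SIZE_HA": [1370.5, 520.7, 20506.4],
        "CALC_HA": [1370.507344, 520.796287, 20506.415129],
    }
)

# log1p compresses the wide range of fire sizes and is defined at 0.
sample["LOG_SIZE_HA"] = np.log1p(sample["SIZE_HA"])

# Percent gap between the calculated polygon area and the reported size.
# A reported size of 0 is replaced by NaN to avoid division by zero.
reported = sample["SIZE_HA"].replace(0, np.nan)
sample["AREA_DIFF_PCT"] = (sample["CALC_HA"] - reported) / reported * 100

print(sample[["FIRE_ID", "LOG_SIZE_HA", "AREA_DIFF_PCT"]])
```

A log transform is a common choice here because the full dataset's sizes range from 0 to nearly a million hectares with a median of 65, so raw values would be dominated by a few very large fires.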


---

In [26]:
# Reproject to EPSG:3005
fires_bc = fires_bc.to_crs(epsg=3005)

# Save processed dataset
fires_bc.to_file("../data_raw/fire_perimeters/fire_perimeters.gpkg", driver="GPKG")

print(fires_bc.head())
print(fires_bc.crs)

       FIRE_ID  YEAR  MONTH  DAY   REP_DATE  SIZE_HA       CALC_HA CAUSE  \
0  2004-C10175  2004      6   23 2004-06-23   1370.5   1370.507344     L   
1  2004-C10176  2004      6   23 2004-06-23    520.7    520.796287     L   
2  2004-C50114  2004      6   20 2004-06-20    268.2    268.290572     L   
3  2004-C50125  2004      6   21 2004-06-21  20506.4  20506.415129     L   
4  2004-C50149  2004      6   22 2004-06-22   2408.5   2408.587142     L   

                 MAP_METHOD  POLY_DATE  \
0                 digitized 2007-05-17   
1                 digitized 2007-05-17   
2                 digitized 2007-05-17   
3  Modified from Protection 2007-05-17   
4                 digitized 2007-05-17   

                                            geometry  
0  POLYGON Z ((1092870.828 897955.996 0, 1092917....  
1  POLYGON Z ((1101470.06 895987.632 0, 1101545.3...  
2  POLYGON Z ((1059622.287 791392.109 0, 1059646....  
3  POLYGON Z ((1016741.687 807717.542 0, 1016827....  
4  POLYGON Z ((

10. [References](#references)  


https://cwfis.cfs.nrcan.gc.ca/datamart/download/nfdbpoly