# Data Science Final Project 


**College/University Name**: _CICCC - Cornerstone International Community College of Canada_  
**Course**: _Final Project_  
**Instructor**: _Derrick Park_  
**Student Name**: _Amir Lima Oliveira_  
**Submission Date**: _2025-09-26_  

---

### Project Title
_Wildfire Restoration Priority Classification in Canada_
---

#### Objective
Find, structure, and analyse NASA datasets of satellite wildfire-detection data points, link them to satellite imagery, and engineer area-level parameters to identify which burned areas need priority restoration.
### Problem Statement or Research Question
This project aims to help direct restoration resources efficiently by using a data-driven machine learning model to identify the most critical areas.
---

#### Dataset Overview
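- **Source:** Road layer loaded in this notebook: Statistics Canada Road Network File (`lrnf000r24a_e.shp`); see References
- **Description:** National road segments stored as line geometries: 2,260,441 rows and 20 columns, including street name and type, census subdivision and province identifiers, address ranges, and road class
- **Credits:** Statistics Canada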

---

## Table of Contents


1. [Import Libraries](#import-libraries)  


In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import rasterio as rio
import fiona
from rasterio.plot import show
import shapely.geometry as geom
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix

---

2. [Load & Inspect Data](#load--inspect-data)  


In [4]:
roads = gpd.read_file('../data_raw/roads/lrnf000r24a_e.shp')

In [5]:
roads.head()

| | NGD_UID | NAME | TYPE | DIR | CSDUID_L | CSDUID_R | PRUID_L | PRUID_R | CSDNAME_L | CSDTYPE_L | PRNAME_L | CSDNAME_R | CSDTYPE_R | PRNAME_R | AFL_VAL | ATL_VAL | AFR_VAL | ATR_VAL | CLASS | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 107215 | Alec | RD | | 5917015 | 5917015 | 59 | 59 | Central Saanich | DM | British Columbia / Colombie-Britannique | Central Saanich | DM | British Columbia / Colombie-Britannique | 8126.0 | 8240.0 | 8173.0 | 8243.0 | 23 | LINESTRING (3960597.411 1952862.777, 3960605.6... |
| 1 | 3206465 | 70 | HWY | | 1001365 | 1001365 | 10 | 10 | Victoria | T | Newfoundland and Labrador / Terre-Neuve-et-Lab... | Victoria | T | Newfoundland and Labrador / Terre-Neuve-et-Lab... | 123.0 | 131.0 | 122.0 | 126.0 | 12 | LINESTRING (8935036.86 2147931.914, 8934958.03... |
| 2 | 1609984 | | | | 4708074 | 4708074 | 47 | 47 | Snipe Lake No. 259 | RM | Saskatchewan | Snipe Lake No. 259 | RM | Saskatchewan | | | | | 23 | LINESTRING (5032461.06 1824835.834, 5031609.22... |
| 3 | 5628186 | de la Rivière | RUE | | 2446080 | 2446080 | 24 | 24 | Cowansville | V | Quebec / Québec | Cowansville | V | Quebec / Québec | 717.0 | 721.0 | 700.0 | 728.0 | 23 | LINESTRING (7698238.814 1230648.543, 7698195.1... |
| 4 | 5469054 | Henri-Bourassa | BOUL | E | 2466023 | 2466023 | 24 | 24 | Montréal | V | Quebec / Québec | Montréal | V | Quebec / Québec | | | | | 23 | LINESTRING (7625178.291 1255497.869, 7625189.6... |


A conversion step from `.gdb` to `.gpkg` was needed so that the geographic data could be read into GeoPandas.

   - [Shape](#shape)  

In [6]:
roads.shape


(2260441, 20)

   - [Missing Values](#missing-values)  


In [7]:
roads.isnull().sum()

NGD_UID            0
NAME          299662
TYPE          450816
DIR          2002757
CSDUID_L           2
CSDUID_R          48
PRUID_L            2
PRUID_R           48
CSDNAME_L          2
CSDTYPE_L          2
PRNAME_L           2
CSDNAME_R         48
CSDTYPE_R         48
PRNAME_R          48
AFL_VAL      1016674
ATL_VAL      1017547
AFR_VAL      1017632
ATR_VAL      1017956
CLASS              3
geometry           0
dtype: int64
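Raw null counts are easier to judge as a share of the 2,260,441 rows. The sketch below recomputes that share from a few of the counts printed above; the same figures could presumably be obtained directly with `roads.isnull().mean()`. The selection of columns is only illustrative.

```python
import pandas as pd

# Row count and null counts copied from the outputs above (subset of columns).
total_rows = 2260441
missing = pd.Series(
    {"NAME": 299662, "TYPE": 450816, "DIR": 2002757, "AFL_VAL": 1016674, "CLASS": 3}
)

# Fraction of rows with a missing value, largest first.
share = (missing / total_rows).sort_values(ascending=False)
print(share.round(3))
```

Columns such as `DIR` and the address-range fields are mostly empty, which supports dropping them later, while `CLASS` is almost complete.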

   - [Data Types](#data-types)  


In [8]:
roads.describe()

| | NGD_UID | NAME | TYPE | DIR | CSDUID_L | CSDUID_R | PRUID_L | PRUID_R | CSDNAME_L | CSDTYPE_L | PRNAME_L | CSDNAME_R | CSDTYPE_R | PRNAME_R | AFL_VAL | ATL_VAL | AFR_VAL | ATR_VAL | CLASS | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2260441 | 1960779 | 1809625 | 257684 | 2260439 | 2260393 | 2260439 | 2260393 | 2260439 | 2260439 | 2260439 | 2260393 | 2260393 | 2260393 | 1243767 | 1242894 | 1242809 | 1242485 | 2260438 | 2260441 |
| unique | 2260441 | 133471 | 183 | 11 | 4906 | 4911 | 13 | 13 | 4759 | 54 | 13 | 4768 | 54 | 13 | 54990 | 57054 | 54836 | 56844 | 17 | 2260441 |
| top | 107215 | Main | RD | NW | 4806016 | 4806016 | 35 | 35 | Calgary | CY | Ontario | Calgary | CY | Ontario | 1 | 15 | 2 | 20 | 23 | LINESTRING (3960597.4114285717 1952862.7771428... |
| freq | 1 | 13012 | 349247 | 53715 | 58188 | 58155 | 576880 | 576875 | 58188 | 531449 | 576880 | 58155 | 531598 | 576875 | 54867 | 6629 | 57422 | 6730 | 1607724 | 1 |
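The summary above lists `count`, `unique`, `top`, and `freq`, which is what pandas reports for non-numeric columns. That suggests the attributes are not held as numeric types, although the notebook does not print the dtypes. A minimal sketch on a made-up frame (the values are hypothetical, not taken from the road file) shows how `dtypes` and `describe()` relate:

```python
import pandas as pd

# Hypothetical text-valued columns standing in for the road attributes.
demo = pd.DataFrame(
    {
        "TYPE": ["RD", "RD", "HWY", None],
        "CLASS": ["23", "23", "12", "23"],
    }
)

# dtypes answers the "Data Types" question directly.
print(demo.dtypes)

# For non-numeric columns, describe() returns count / unique / top / freq.
summary = demo.describe()
print(summary)
```

On the real data, `roads.dtypes` would be the direct way to confirm how each column was read.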


---

3. [Data Cleaning](#data-cleaning)  

- [Filter Irrelevant Records](#filter-irrelevant-records)  

In [8]:
roads = roads[
    (roads["PRNAME_L"] == "British Columbia / Colombie-Britannique") |
    (roads["PRNAME_R"] == "British Columbia / Colombie-Britannique")
].copy()
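The filter keeps a segment when either its left or its right side lies in British Columbia, so roads running along the provincial border are retained. The sketch below illustrates the difference between that "either side" rule and a stricter "both sides" rule on a made-up frame; the `NGD_UID` values and the Alberta rows are hypothetical.

```python
import pandas as pd

BC = "British Columbia / Colombie-Britannique"

# Hypothetical segments: the two sides of a road can fall in different provinces.
demo = pd.DataFrame(
    {
        "NGD_UID": ["a", "b", "c", "d"],
        "PRNAME_L": [BC, "Alberta", "Alberta", None],
        "PRNAME_R": [BC, BC, "Alberta", BC],
    }
)

# Same logic as the cell above: keep a row if either side is in BC.
either_side = demo[(demo["PRNAME_L"] == BC) | (demo["PRNAME_R"] == BC)].copy()

# Stricter alternative for comparison: both sides must be in BC.
both_sides = demo[(demo["PRNAME_L"] == BC) & (demo["PRNAME_R"] == BC)].copy()

print(either_side["NGD_UID"].tolist())
print(both_sides["NGD_UID"].tolist())
```

Rows with a missing province name on one side still pass the "either side" rule when the other side is in BC, because a comparison against a missing value evaluates to `False` rather than raising an error.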

---

- [Feature Selection](#feature-selection)  

In [9]:
roads = roads[["NGD_UID", "CLASS", "geometry"]]

- [Encoding Categorical Variables](#encoding-categorical-variables)  

   - [Creating New Features](#creating-new-features)  


- [Feature Transformation (Scaling, Normalization)](#feature-transformation-scaling-normalization)  

---

In [10]:
roads.shape

(279326, 3)

In [11]:
roads.isnull().sum()

NGD_UID     0
CLASS       1
geometry    0
dtype: int64

  
   - [Handling Missing Data](#handling-missing-data)  

In [12]:
# Preview only: fill the missing CLASS value with the mode and confirm no nulls remain
class_filled = roads['CLASS'].fillna(roads['CLASS'].mode()[0])

In [13]:
class_filled.isnull().sum()

0
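Mode imputation replaces a missing category with the most frequent one. With a single missing `CLASS` value out of 279,326 rows, the effect on the class distribution is negligible. A minimal sketch on a made-up series (the values are illustrative only):

```python
import pandas as pd

# Hypothetical road classes with one missing entry.
demo_class = pd.Series(["23", "23", None, "21", "23"], name="CLASS")

# mode() ignores missing values by default and returns a Series,
# so [0] takes the first (most frequent) value.
most_common = demo_class.mode()[0]

# fillna() returns a new Series; the original is left unchanged.
filled = demo_class.fillna(most_common)

print(most_common)
print(filled.isnull().sum())
```

If two classes were tied for most frequent, `mode()` would return both and `[0]` would pick the first of them, which is worth keeping in mind for less lopsided columns.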

In [14]:
# Replace missing CLASS values with the most frequent one (mode)
roads["CLASS"] = roads["CLASS"].fillna(roads["CLASS"].mode()[0])

# Now select the useful columns
roads_BC = roads[["NGD_UID", "CLASS", "geometry"]]

# Reproject to EPSG:3005
roads_BC = roads_BC.to_crs(epsg=3005)

# Save processed dataset
roads_BC.to_file("../data_raw/roads/roads_BC.gpkg", driver="GPKG")

print(roads_BC.head())
print(roads_BC.crs)

    NGD_UID CLASS                                           geometry
0    107215    23  LINESTRING (1187054.303 401928.001, 1187047.09...
21  2910067    23  LINESTRING (1076598.102 519287.903, 1076519.59...
22  4409173    29  LINESTRING (1271948.102 453261.799, 1271949.99...
25  5888173    26  LINESTRING (1286865.199 647664.555, 1286963.45...
29  4507098    21  LINESTRING (1318502.402 1260924.759, 1318487.6...
EPSG:3005
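EPSG:3005 (NAD83 / BC Albers) is a projected coordinate system in metres, so lengths and distances can be computed directly after reprojection. The sketch below shows the same `to_crs` call on a single made-up segment; the coordinates and attribute values are hypothetical, and the input CRS (EPSG:4326) is an assumption chosen for the example, not the CRS of the road file.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical segment in longitude/latitude (EPSG:4326), roughly on southern Vancouver Island.
segment = gpd.GeoDataFrame(
    {"NGD_UID": ["demo"], "CLASS": ["23"]},
    geometry=[LineString([(-123.40, 48.55), (-123.39, 48.56)])],
    crs="EPSG:4326",
)

# Reproject to BC Albers, as in the cell above.
projected = segment.to_crs(epsg=3005)

# Length is now expressed in metres rather than degrees.
length_m = projected.length.iloc[0]
print(projected.crs)
print(length_m)
```

Checking `.crs` after loading each layer is a cheap way to catch mismatched projections before any spatial join or distance calculation.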


In [15]:
roads_BC = gpd.read_file("../data_raw/roads/roads_BC.gpkg")
roads_BC.head()

| | NGD_UID | CLASS | geometry |
|---|---|---|---|
| 0 | 107215 | 23 | LINESTRING (1187054.303 401928.001, 1187047.09... |
| 1 | 2910067 | 23 | LINESTRING (1076598.102 519287.903, 1076519.59... |
| 2 | 4409173 | 29 | LINESTRING (1271948.102 453261.799, 1271949.99... |
| 3 | 5888173 | 26 | LINESTRING (1286865.199 647664.555, 1286963.45... |
| 4 | 4507098 | 21 | LINESTRING (1318502.402 1260924.759, 1318487.6... |


In [16]:
roads_BC.isnull().sum()

NGD_UID     0
CLASS       0
geometry    0
dtype: int64

10. [References](#references)  


- Statistics Canada. Road Network File (`lrnf000r24a_e.zip`). https://www12.statcan.gc.ca/census-recensement/alternative_alternatif.cfm?l=eng&dispext=zip&t=lrnf000r24a_e.zip&k=%20%20%20289136&loc=/census-recensement/2011/geo/RNF-FRR/files-fichiers/lrnf000r24a_e.zip