# Data Science Final Project 


**College/University Name**: _CICCC - Cornerstone International Community College of Canada_  
**Course**: _Final Project_  
**Instructor**: _Derrick Park_  
**Student Name**: _Amir Lima Oliveira_  
**Submission Date**: _2025-09-26_  

---

### Project Title
_Wildfire Restoration Priority Classification in Canada_

---

#### Objective
Find, structure, and analyse NASA satellite datasets of wildfire detection points, link them to satellite imagery, and engineer area-level features to identify which burned areas need priority restoration.
### Problem Statement or Research Question
This project aims to help direct restoration resources efficiently by using a data-driven machine learning model to identify the most critical burned areas.

---

#### Dataset Overview
- **Source:** [Dataset URL or name]
- **Description:** Short explanation of the dataset (e.g., features, size, context)
- **Credits:** Cite source or dataset author if required

---

## Table of Contents


1. [Import Libraries](#import-libraries)  


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import rasterio as rio
import fiona
from rasterio.plot import show
import shapely.geometry as geom
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

import urllib.request # to download the watershed gdb file

---

2. [Load & Inspect Data](#load--inspect-data)  


In [2]:
burn_severity = rio.open('../data_raw/burn_severity/NBAC_MRB_1972to2024_30m.tif')

In [4]:
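# Path to the burn severity raster (this rebinds the name used for the open dataset in the previous cell)
burn_severity = "../data_raw/burn_severity/NBAC_MRB_1972to2024_30m.tif"

# Open inside a context manager so the file handle is closed once the metadata is printed
with rio.open(burn_severity) as src:
    print("CRS:", src.crs)
    print("Bounds:", src.bounds)
    print("Width, Height:", src.width, src.height)
    print("Count (bands):", src.count)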


CRS: PROJCS["Canada_Lambert_Conformal_Conic",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101004,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["latitude_of_origin",49],PARAMETER["central_meridian",-95],PARAMETER["standard_parallel_1",49],PARAMETER["standard_parallel_2",77],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
Bounds: BoundingBox(left=-2307210.000000001, bottom=-716416.0226333509, right=2981039.999999999, top=2764453.977366649)
Width, Height: 176275 116029
Count (bands): 1
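
The printed metadata is enough to sanity-check the raster before reading any pixels. The sketch below only reuses the bounds and dimensions shown in the output above (no file access). The 1-byte-per-pixel memory figure is an assumption, since the raster's dtype is not printed.

```python
# Values copied from the metadata printed above
left, bottom = -2307210.000000001, -716416.0226333509
right, top = 2981039.999999999, 2764453.977366649
width, height = 176275, 116029

# Pixel size = extent / number of pixels along each axis (CRS units are metres)
res_x = (right - left) / width
res_y = (top - bottom) / height
print(f"Pixel size: {res_x:.3f} m x {res_y:.3f} m")

# Total pixel count and a rough in-memory size.
# ASSUMPTION: 1 byte per pixel (e.g. uint8); the actual dtype is not shown here.
n_pixels = width * height
approx_gib = n_pixels / 1024**3
print(f"Pixels: {n_pixels:,} (~{approx_gib:.1f} GiB at 1 byte/pixel)")
```

Both axes work out to 30 m, which matches the `30m` in the file name. At roughly 20 billion pixels, the band is better clipped or read in windows than loaded whole.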


In [1]:
from shapely.geometry import box
import geopandas as gpd

# NOTE: `fires` (GeoDataFrame of fire polygons) must already be loaded in an earlier cell.

# Step 1: Get the bounding box of all fires
minx, miny, maxx, maxy = fires.total_bounds
bbox = box(minx, miny, maxx, maxy)
bbox_gdf = gpd.GeoDataFrame({"geometry": [bbox]}, crs=fires.crs)

# Step 2: Split fires into two groups (with and without burnsev/elevation)
cols = ["burnsev_mean", "elevation_mean"]
fires_with_bs = fires.dropna(subset=cols)
fires_missing_bs = fires[fires[cols].isna().any(axis=1)]

# Step 3: Get centroids for nearest-neighbor comparison.
# Keep only the needed columns so the join does not rename overlapping
# columns to burnsev_mean_left / burnsev_mean_right.
fires_with_bs_points = fires_with_bs[cols + ["geometry"]].copy()
fires_with_bs_points["geometry"] = fires_with_bs_points.centroid

fires_missing_bs_points = fires_missing_bs[["geometry"]].copy()
fires_missing_bs_points["geometry"] = fires_missing_bs_points.centroid

# Step 4: Perform spatial nearest-neighbor join
nearest = gpd.sjoin_nearest(
    fires_missing_bs_points,
    fires_with_bs_points,
    how="left",
    distance_col="dist_to_valid"
)
# Equidistant neighbors produce duplicate rows; keep one match per fire
nearest = nearest[~nearest.index.duplicated(keep="first")]

# Step 5: Assign imputed values back (only where the original value is missing)
for col in cols:
    fires.loc[nearest.index, col] = fires.loc[nearest.index, col].fillna(nearest[col])
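
To make the imputation logic in Steps 2 to 5 easier to follow, here is a minimal pure-Python sketch of the same idea on toy data. The records, coordinates, and the `donor_id` field are hypothetical and only illustrate the technique; the notebook itself relies on `gpd.sjoin_nearest`.

```python
import math

# Toy records standing in for fire centroids (hypothetical values, not project data).
# None marks a missing attribute.
fires_toy = [
    {"id": "A", "xy": (0.0, 0.0), "burnsev_mean": 2.0, "elevation_mean": 900.0},
    {"id": "B", "xy": (10.0, 0.0), "burnsev_mean": 4.0, "elevation_mean": 1500.0},
    {"id": "C", "xy": (1.0, 1.0), "burnsev_mean": None, "elevation_mean": 950.0},
    {"id": "D", "xy": (9.0, 2.0), "burnsev_mean": None, "elevation_mean": None},
]
cols = ["burnsev_mean", "elevation_mean"]

# Donors have every attribute; receivers are missing at least one
donors = [f for f in fires_toy if all(f[c] is not None for c in cols)]
receivers = [f for f in fires_toy if any(f[c] is None for c in cols)]

for rec in receivers:
    # Nearest donor by Euclidean distance between centroids
    donor = min(donors, key=lambda d: math.dist(rec["xy"], d["xy"]))
    rec["dist_to_valid"] = math.dist(rec["xy"], donor["xy"])
    rec["donor_id"] = donor["id"]
    for c in cols:
        # Only fill gaps; values that already exist are left untouched
        if rec[c] is None:
            rec[c] = donor[c]

for rec in receivers:
    print(rec["id"], "<-", rec["donor_id"], rec["burnsev_mean"], rec["elevation_mean"])
```

Keeping the donor distance (the role `dist_to_valid` plays in the join) makes it possible to flag fires whose values were borrowed from a far-away neighbor and may be unreliable.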




Raster clipped successfully to fire polygons bounds.


   - [Shape](#shape)  

In [2]:
burn_severity_epsg = "../data_raw/burn_severity/burn_severity_bc_clipped_fires.tif"

with rio.open(burn_severity_epsg) as src:
    print("CRS:", src.crs)
    print("Bounds:", src.bounds)
    print("Width, Height:", src.width, src.height)
    print("Count (bands):", src.count)

CRS: PROJCS["Canada_Lambert_Conformal_Conic",GEOGCS["NAD83",DATUM["North_American_Datum_1983",SPHEROID["GRS 1980",6378137,298.257222101004,AUTHORITY["EPSG","7019"]],AUTHORITY["EPSG","6269"]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4269"]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["latitude_of_origin",49],PARAMETER["central_meridian",-95],PARAMETER["standard_parallel_1",49],PARAMETER["standard_parallel_2",77],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
Bounds: BoundingBox(left=408929.99999999907, bottom=370243.9773666491, right=1870589.999999999, top=1709053.977366649)
Width, Height: 48722 44627
Count (bands): 1
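
The clipped raster's metadata can be checked against the full raster's first metadata printout to confirm that the clip stayed on the original pixel grid. The sketch below uses only the printed numbers and assumes the 30 m pixel size implied by the source file name.

```python
PIXEL = 30.0  # metres; ASSUMPTION taken from "30m" in the source file name

# Full raster: top-left corner and size from the first metadata printout
full_left, full_top = -2307210.000000001, 2764453.977366649
full_w, full_h = 176275, 116029

# Clipped raster: top-left corner and size from the output above
clip_left, clip_top = 408929.99999999907, 1709053.977366649
clip_w, clip_h = 48722, 44627

# Offset of the clip's top-left corner, in pixels, from the full raster's top-left corner
col_off = (clip_left - full_left) / PIXEL
row_off = (full_top - clip_top) / PIXEL

# Whole-number offsets mean the clip is aligned to the original grid
aligned = abs(col_off - round(col_off)) < 1e-6 and abs(row_off - round(row_off)) < 1e-6

# The clip window must fit entirely inside the full raster
inside = (
    round(col_off) >= 0
    and round(row_off) >= 0
    and round(col_off) + clip_w <= full_w
    and round(row_off) + clip_h <= full_h
)

# Fraction of the original pixels that remain after clipping
share = (clip_w * clip_h) / (full_w * full_h)

print("Window offsets (col, row):", round(col_off), round(row_off))
print("Grid-aligned:", aligned, "| Inside full raster:", inside)
print(f"Pixels kept: {share:.1%}")
```

The rounded offsets, together with the clipped width and height, are the values a `rasterio.windows.Window(col_off, row_off, width, height)` would need to read the same area directly from the full file.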


10. [References](#references)  
