# Data Science Final Project 


**College/University Name**: _CICCC - Cornerstone International Community College of Canada_  
**Course**: _Final Project_  
**Instructor**: _Derrick Park_  
**Student Name**: _Amir Lima Oliveira_  
**Submission Date**: _2025-09-26_  

---

### Project Title
_Wildfire Restoration Priority Classification in Canada_

---

#### Objective
Find, structure, and analyse NASA's satellite wildfire-detection datasets, link the detection points with satellite imagery, and engineer area-level features that identify which wildfire-affected areas need priority restoration.

### Problem Statement or Research Question
This project aims to help direct restoration resources efficiently by using a data-driven machine learning model to identify the most critical wildfire-affected areas.
---

#### Dataset Overview
- **Source:** [Dataset URL or name]
- **Description:** Short explanation of the dataset (e.g., features, size, context)
- **Credits:** Cite source or dataset author if required

---

## Table of Contents


1. [Import Libraries](#import-libraries)  


```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
import rasterio as rio
import fiona
from rasterio.plot import show
import shapely.geometry as geom
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

import urllib.request  # to download the watershed gdb file
```

---

2. [Load & Inspect Data](#load--inspect-data)  


```python
# Point layer from the BC legally defined administrative areas data (see References)
community_proximity = gpd.read_file('../data_raw/community_prox/legal_admin/ABMS_LCTNS_point.shp')
```

   - [Shape](#shape)  

```python
community_proximity.shape
```

```
(6444, 20)
```

   - [Missing Values](#missing-values)  


```python
community_proximity.isnull().sum()
```

```
LGL_DMN_ID       0
AA_LOC_TYP       0
AA_LOC_LAB    3639
AA_NAME          0
AA_ABBRVN        0
AA_GRP_NM        0
ADMN_R_TYE       0
LATITUDE         0
LONGITUDE        0
UTM_ZONE         0
UTM_EAST         0
UTM_NORTH        0
CHNG_RQSTG       0
UPDT_TYPE        2
WHEN_UPDTD       0
ACCRCY_CD        0
MP_PNT_NTS    6387
SHAPE         6444
OBJECTID         0
geometry         0
dtype: int64
```
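The counts above are easier to act on as percentages: `SHAPE` is empty in all 6444 rows, `MP_PNT_NTS` in about 99%, and `AA_LOC_LAB` in about 56%. Below is a minimal sketch of a per-column report plus an optional drop of very sparse columns. The helper names `missing_report` and `drop_sparse_columns` and the 90% threshold are hypothetical choices, not part of the original notebook, and the toy frame only mimics the pattern above.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most-missing first."""
    counts = df.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "pct_missing": (counts / len(df) * 100).round(1),
    })
    return report.sort_values("pct_missing", ascending=False)


def drop_sparse_columns(df: pd.DataFrame, max_pct_missing: float = 90.0) -> pd.DataFrame:
    """Keep only columns whose share of missing values is <= max_pct_missing.

    The default of 90% is an assumed threshold, not a rule from the source data.
    """
    pct = df.isnull().mean() * 100
    keep = pct[pct <= max_pct_missing].index
    return df[keep]


# Toy frame mimicking the pattern above: one fully empty column (like SHAPE)
# and one mostly empty column (like MP_PNT_NTS).
toy = pd.DataFrame({
    "AA_NAME": ["A", "B", "C", "D"],
    "MP_PNT_NTS": [None, None, None, "note"],
    "SHAPE": [float("nan")] * 4,
})
print(missing_report(toy))
print(drop_sparse_columns(toy, max_pct_missing=50.0).columns.tolist())
```

Applied to `community_proximity` with the default threshold, this would drop `SHAPE` and `MP_PNT_NTS` and keep `AA_LOC_LAB`.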

   - [Data Types](#data-types)  


```python
community_proximity.describe()
```

| | LGL_DMN_ID | LATITUDE | LONGITUDE | UTM_ZONE | UTM_EAST | UTM_NORTH | SHAPE | OBJECTID |
|---|---|---|---|---|---|---|---|---|
| count | 6444.0 | 6444.0 | 6444.0 | 6444.0 | 6444.0 | 6444.0 | 0.0 | 6444.0 |
| mean | 3223.16108 | 51.902406 | -122.875413 | 10.006518 | 504035.300435 | 5751695.0 | NaN | 1179415.0 |
| std | 1861.431603 | 3.496365 | 4.630445 | 0.773185 | 117023.52786 | 389294.4 | NaN | 1861.432 |
| min | 1.0 | 48.310634 | -139.0613 | 7.0 | 282006.0 | 5350974.0 | NaN | 1176193.0 |
| 25% | 1611.75 | 49.276077 | -124.79739 | 10.0 | 409509.0 | 5459018.0 | NaN | 1177804.0 |
| 50% | 3222.5 | 50.262017 | -122.52991 | 10.0 | 497476.0 | 5569254.0 | NaN | 1179414.0 |
| 75% | 4833.25 | 54.009434 | -119.97812 | 11.0 | 601364.25 | 5986317.0 | NaN | 1181025.0 |
| max | 6455.0 | 60.002066 | -114.05416 | 11.0 | 719233.0 | 6655305.0 | NaN | 1182647.0 |


   - [Preview Data](#preview-data)


```python
community_proximity.head()
```

| | LGL_DMN_ID | AA_LOC_TYP | AA_LOC_LAB | AA_NAME | AA_ABBRVN | AA_GRP_NM | ADMN_R_TYE | LATITUDE | LONGITUDE | UTM_ZONE | UTM_EAST | UTM_NORTH | CHNG_RQSTG | UPDT_TYPE | WHEN_UPDTD | ACCRCY_CD | MP_PNT_NTS | SHAPE | OBJECTID | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 312 | GEO | Geographic coordinate | CRD Electoral Area G | CRD - EA G | Cariboo Regional District | ELECT | 52.07581 | -121.20488 | 10 | 623022 | 5770990 | MUNI | E | 20240628 | UNKNOWN | | | 1176504.0 | POINT (1327879.911 796066.823) |
| 1 | 313 | GEO | | CRD Electoral Area G | CRD - EA G | Cariboo Regional District | ELECT | 51.719358 | -121.23423 | 10 | 621973 | 5731301 | MUNI | E | 20240628 | UNKNOWN | | | 1176505.0 | POINT (1328548.693 756289.155) |
| 2 | 314 | GEO | Point of Commencement | CRD Electoral Area H | CRD - EA H | Cariboo Regional District | ELECT | 51.991341 | -121.21206 | 10 | 622762 | 5761585 | MUNI | E | 20240628 | UNKNOWN | | | 1176506.0 | POINT (1328026.001 786638.651) |
| 3 | 315 | GEO | Geographic coordinate | CRD Electoral Area H | CRD - EA H | Cariboo Regional District | ELECT | 51.991295 | -120.88585 | 10 | 645157 | 5762180 | MUNI | E | 20240628 | UNKNOWN | | | 1176507.0 | POINT (1350337.775 788196.424) |
| 4 | 316 | GEO | Geographic coordinate | CRD Electoral Area H | CRD - EA H | Cariboo Regional District | ELECT | 52.137933 | -120.8857 | 10 | 644693 | 5778488 | MUNI | E | 20240628 | UNKNOWN | | | 1176508.0 | POINT (1349168.327 804502.068) |


---

   - [Standardize Text and Formats](#standardize-text-and-formats)  

```python
community_proximity.info()
```

```
<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 6444 entries, 0 to 6443
Data columns (total 20 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   LGL_DMN_ID  6444 non-null   int64   
 1   AA_LOC_TYP  6444 non-null   object  
 2   AA_LOC_LAB  2805 non-null   object  
 3   AA_NAME     6444 non-null   object  
 4   AA_ABBRVN   6444 non-null   object  
 5   AA_GRP_NM   6444 non-null   object  
 6   ADMN_R_TYE  6444 non-null   object  
 7   LATITUDE    6444 non-null   float64 
 8   LONGITUDE   6444 non-null   float64 
 9   UTM_ZONE    6444 non-null   int32   
 10  UTM_EAST    6444 non-null   int64   
 11  UTM_NORTH   6444 non-null   int64   
 12  CHNG_RQSTG  6444 non-null   object  
 13  UPDT_TYPE   6442 non-null   object  
 14  WHEN_UPDTD  6444 non-null   object  
 15  ACCRCY_CD   6444 non-null   object  
 16  MP_PNT_NTS  57 non-null     object  
 17  SHAPE       0 non-null      float64 
 18  OBJECTID    6444 non-null   float64 
 19
```

- [Convert Data Types](#convert-data-types)  
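`info()` reports `WHEN_UPDTD` as `object`, while the preview shows values such as `20240628`, which look like `YYYYMMDD` dates. If that column were kept (the feature selection further down drops it), it could be parsed as in the sketch below. This assumes the `YYYYMMDD` format holds for every row; `errors="coerce"` turns any non-matching value into `NaT` instead of raising, and `parse_update_dates` is a hypothetical helper name.

```python
import pandas as pd


def parse_update_dates(df: pd.DataFrame, column: str = "WHEN_UPDTD") -> pd.DataFrame:
    """Return a copy of df with `column` parsed from YYYYMMDD values into datetime64.

    Values that do not match the assumed format become NaT rather than raising.
    """
    out = df.copy()  # leave the caller's frame untouched
    out[column] = pd.to_datetime(out[column].astype(str), format="%Y%m%d", errors="coerce")
    return out


# Toy values: two well-formed dates and one malformed entry.
toy = pd.DataFrame({"WHEN_UPDTD": ["20240628", "20231115", "not a date"]})
converted = parse_update_dates(toy)
print(converted.dtypes)
print(converted["WHEN_UPDTD"].tolist())
```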
   

- [Filter Irrelevant Records](#filter-irrelevant-records)  

---

- [Feature Selection](#feature-selection)  

```python
# Keep only the administrative area name, its point geometry and the location type
community_proximity = community_proximity[['AA_NAME', 'geometry', 'AA_LOC_TYP']]
```

  
   - [Handling Missing Data](#handling-missing-data)  

```python
community_proximity.isnull().sum()
```

```
AA_NAME       0
geometry      0
AA_LOC_TYP    0
dtype: int64
```

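```python
# Reproject to EPSG:3005 (NAD83 / BC Albers), a metre-based projection for British Columbia
com_prox_epgs = community_proximity.to_crs(epsg=3005)

# Save processed dataset
com_prox_epgs.to_file("../data_raw/community_prox/com_prox_epgs.gpkg", driver="GPKG")

print(com_prox_epgs.head())
print(com_prox_epgs.crs)
```

```
                AA_NAME                        geometry AA_LOC_TYP
0  CRD Electoral Area G  POINT (1327879.911 796066.823)        GEO
1  CRD Electoral Area G  POINT (1328548.693 756289.155)        GEO
2  CRD Electoral Area H  POINT (1328026.001 786638.651)        GEO
3  CRD Electoral Area H  POINT (1350337.775 788196.424)        GEO
4  CRD Electoral Area H  POINT (1349168.327 804502.068)        GEO
EPSG:3005
```

The coordinates printed here are identical to the `geometry` values in the earlier preview, which suggests the shapefile was already stored in EPSG:3005 and `to_crs` leaves the points unchanged.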


   - [Creating New Features](#creating-new-features)  


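Distance to fire perimeter, combined with community population into a risk score:

$$\text{risk} = \frac{\text{population}}{1 + \text{distance}_{\text{km}}}$$

Adding 1 to the distance keeps the score finite for a community on or inside a fire perimeter (distance 0).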
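A minimal sketch of the risk formula, assuming distances are measured in EPSG:3005 (BC Albers), whose unit is the metre, so they are divided by 1000 first. The populations and distances are made up for illustration and `risk_score` is a hypothetical helper. The nearest-perimeter distance itself is not computed here; one option would be geopandas `sjoin_nearest(..., distance_col="distance_m")` between the community points and fire-perimeter polygons, both in EPSG:3005.

```python
import numpy as np
import pandas as pd


def risk_score(population, distance_m):
    """risk = population / (1 + distance_km).

    distance_m is assumed to be in metres (EPSG:3005 units) and is converted to
    kilometres; the +1 keeps the score finite at distance 0.
    """
    population = np.asarray(population, dtype=float)
    distance_km = np.asarray(distance_m, dtype=float) / 1000.0
    if np.any(distance_km < 0):
        raise ValueError("distances must be non-negative")
    return population / (1.0 + distance_km)


# Hypothetical communities: names, populations and distances are invented.
communities = pd.DataFrame({
    "name": ["Town A", "Town B", "Town C"],
    "population": [5000, 5000, 200],
    "distance_m": [0.0, 9000.0, 1000.0],
})
communities["risk"] = risk_score(communities["population"], communities["distance_m"])
print(communities.sort_values("risk", ascending=False))
```

Town A and Town B have the same population, but Town A sits on the perimeter, so its score (5000) is ten times Town B's (500); Town C is close to a fire but small, so it scores lowest (100).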

---

10. [References](#references)  


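- BC Data Catalogue, *Regional Districts - Legally Defined Administrative Areas of BC*: https://catalogue.data.gov.bc.ca/dataset/regional-districts-legally-defined-administrative-areas-of-bc
- BC Data Catalogue, *Legally Defined Administrative Areas of BC - Boundary Locations*: https://catalogue.data.gov.bc.ca/dataset/legally-defined-administrative-areas-of-bc-boundary-locations
- BC Data Catalogue, *Municipalities - Legally Defined Administrative Areas of BC*: https://catalogue.data.gov.bc.ca/dataset/municipalities-legally-defined-administrative-areas-of-bc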