# **Predicting Fire Radiative Power Using Random Forest Regressor**



## **Introduction:**

This notebook explores the prediction of Fire Radiative Power (FRP) from satellite data of Australian bushfires using a Random Forest Regressor. The goal is to predict wildfire intensity from features such as brightness temperatures (`bright_ti4` and `bright_ti5`), geographical coordinates, and detection confidence levels. The analysis covers data preprocessing, feature selection, model training, and evaluation.

## **Importing Required Libraries**

The following libraries are essential for data processing, model training, and visualization:

- **`os`**: Handles file paths and directories.
- **`pandas`**: Used for data manipulation and analysis.
- **`numpy`**: Core library for numerical computing.
- **`matplotlib.pyplot`**: Creates static visualizations, like plots and charts.
- **`seaborn`**: Built on `matplotlib`, provides high-level statistical graphics.
- **`sklearn.ensemble.RandomForestRegressor`**: Random Forest model for handling non-linear relationships.
- **`sklearn.model_selection.train_test_split`**: Splits the dataset into training and testing subsets.
- **`sklearn.metrics`**: Provides performance metrics like MSE and R² score.
- **`geopandas`**: Handles geographic data for plotting maps of Australia.
- **`matplotlib.colors`**: Manages color normalization for visualizations.

In [3]:
# Import required libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import geopandas as gpd
import matplotlib.colors as mcolors

## **Loading and Examining the Dataset:**

In this step, the dataset is loaded into a Pandas DataFrame and inspected with a few simple exploration techniques:

1. Load the dataset using `pd.read_csv()`.
2. Examine the structure of the data with `df.info()` to check for any missing values and to get an overview of the column types.
3. Preview the first few rows using `df.head()` to get an initial look at the data and understand its content.

In [4]:
# Load the dataset (your own filename or path may be different)
dataset_path = "/Users/ciaranbritton/Library/Mobile Documents/com~apple~CloudDocs/Ciaran's Folder/University/Year 4/Advanced Computational Techniques/bushfires.csv"
df = pd.read_csv(dataset_path)
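
The absolute path above only exists on one machine. As a minimal sketch of a more portable alternative, the snippet below builds a relative path with `os.path.join` and fails early with a clear message if the file is missing. The `data` folder and the `load_csv_checked` helper are assumptions made for illustration; they are not part of the original notebook.

```python
import os
import pandas as pd


def load_csv_checked(path):
    """Load a CSV file, raising a clear error if the file does not exist.

    Hypothetical helper for illustration only.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Dataset not found at: {path}")
    return pd.read_csv(path)


# Assumed layout: a "data" folder inside the current working directory
dataset_path = os.path.join("data", "bushfires.csv")
```

With that layout in place, `df = load_csv_checked(dataset_path)` would replace the direct `pd.read_csv` call.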

In [5]:
# Display information on the dataset
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105713 entries, 0 to 105712
Data columns (total 14 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   latitude    105713 non-null  float64
 1   longitude   105713 non-null  float64
 2   bright_ti4  105713 non-null  float64
 3   scan        105713 non-null  float64
 4   track       105713 non-null  float64
 5   acq_date    105713 non-null  object 
 6   acq_time    105713 non-null  int64  
 7   satellite   105713 non-null  object 
 8   instrument  105713 non-null  object 
 9   confidence  105713 non-null  object 
 10  version     105713 non-null  int64  
 11  bright_ti5  105713 non-null  float64
 12  frp         105713 non-null  float64
 13  type        105713 non-null  int64  
dtypes: float64(7), int64(3), object(4)
memory usage: 11.3+ MB
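
The `df.info()` output reports 105713 non-null entries in every column, so no values appear to be missing. A more explicit check can be sketched as follows. The small `sample` frame is a stand-in built from a few rows that appear in the `df.head()` preview so that the snippet runs on its own; in the notebook, the same calls would be applied to `df`.

```python
import pandas as pd

# Stand-in for the full dataset: a few columns from the first rows of df.head()
sample = pd.DataFrame({
    "latitude": [-37.48861, -34.4611, -33.94823],
    "longitude": [149.63156, 150.88142, 151.21292],
    "bright_ti4": [341.1, 328.5, 341.1],
    "confidence": ["n", "l", "n"],
    "frp": [4.4, 2.1, 6.4],
})

# Count missing values per column; all zeros means nothing needs imputing
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Single yes/no answer for the whole frame
has_missing = bool(sample.isna().any().any())
print("Any missing values:", has_missing)

# Non-numeric columns would need encoding before being passed to a model
non_numeric_columns = sample.select_dtypes(exclude="number").columns.tolist()
print("Non-numeric columns:", non_numeric_columns)
```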


In [6]:
# Display the first five rows of the dataset
df.head()

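|   | latitude | longitude | bright_ti4 | scan | track | acq_date | acq_time | satellite | instrument | confidence | version | bright_ti5 | frp | type |
|---|----------|-----------|------------|------|-------|----------|----------|-----------|------------|------------|---------|------------|-----|------|
| 0 | -37.48861 | 149.63156 | 341.1 | 0.41 | 0.60 | 2019-09-01 | 304 | N | VIIRS | n | 1 | 293.7 | 4.4 | 0 |
| 1 | -34.46110 | 150.88142 | 328.5 | 0.33 | 0.55 | 2019-09-01 | 305 | N | VIIRS | l | 1 | 298.5 | 2.1 | 2 |
| 2 | -33.94823 | 151.21292 | 341.1 | 0.62 | 0.54 | 2019-09-01 | 305 | N | VIIRS | n | 1 | 295.3 | 6.4 | 0 |
| 3 | -34.45618 | 150.87723 | 328.5 | 0.33 | 0.55 | 2019-09-01 | 305 | N | VIIRS | n | 1 | 298.1 | 2.1 | 2 |
| 4 | -31.60223 | 150.15147 | 367.0 | 0.34 | 0.56 | 2019-09-01 | 306 | N | VIIRS | h | 1 | 302.4 | 19.3 | 0 |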


### **Dataset Overview**

After loading the dataset, we can see the following:

- **Satellite, Instrument, and Version Columns**: These columns hold the same value throughout the dataset, because every detection comes from the same satellite, instrument, and processing version.
- **acq_date**: The acquisition date stays within the same month for all observations.
- **bright_ti4 and bright_ti5**: These columns hold the brightness temperatures (in Kelvin) measured in the VIIRS I-4 and I-5 infrared channels. They carry information about the intensity of the fire, which will be crucial for predicting Fire Radiative Power (FRP).
- **frp**: This is our continuous target variable. FRP represents the intensity of the fire and is the variable we aim to predict using the features from the dataset.
- **type**: This column only contains the values 0, 2, and 3. It is not a day/night flag: in the NASA FIRMS VIIRS product, whose column layout this dataset matches, `type` is the inferred hot spot type (0 = presumed vegetation fire, 1 = active volcano, 2 = other static land source, 3 = offshore detection). Either way, this variable is not expected to affect our target variable enough to worry about.
- **confidence**: This column represents the confidence level of each detection: low (`l`), nominal (`n`), or high (`h`).
- **latitude and longitude**: These columns indicate the geographical coordinates of each fire event.
- **scan and track**: These columns give the actual pixel size in the scan and track directions of the satellite's observation. `scan` is the pixel size in the along-scan (cross-track) direction, perpendicular to the satellite's ground track, while `track` is the pixel size in the along-track direction, parallel to the orbit path.
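
The observations above can be checked directly rather than read off the preview. The following is a minimal sketch, assuming the column names shown by `df.info()`. The `sample` frame is a stand-in built from the five rows of the `df.head()` output so that the snippet runs on its own; in the notebook, the same calls would be applied to `df`. Note that on this five-row sample `acq_date` also looks constant, which need not hold for the full dataset.

```python
import pandas as pd

# Stand-in for the full dataset: selected columns from the df.head() preview
sample = pd.DataFrame({
    "acq_date": ["2019-09-01"] * 5,
    "satellite": ["N"] * 5,
    "instrument": ["VIIRS"] * 5,
    "version": [1] * 5,
    "confidence": ["n", "l", "n", "n", "h"],
    "type": [0, 2, 0, 2, 0],
    "frp": [4.4, 2.1, 6.4, 2.1, 19.3],
})

# Columns with a single distinct value carry no information for a model
unique_counts = sample.nunique()
constant_columns = unique_counts[unique_counts == 1].index.tolist()
print("Constant columns:", constant_columns)

# Distinct codes and their frequencies in the categorical-looking columns
print(sample["confidence"].value_counts())
print(sample["type"].value_counts())

# Range of acquisition dates covered by the data
dates = pd.to_datetime(sample["acq_date"])
print("Date range:", dates.min().date(), "to", dates.max().date())
```

Columns reported as constant on the full dataset are natural candidates to drop before training, since they cannot help the model distinguish one observation from another.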