# Group Project Report
# Evaluating K-Nearest Neighbours Classification of Algerian Forest Fires Based on the Fine Fuel Moisture Code (FFMC) and Drought Code (DC)
Group members: Cassie Zhong, Khoi Nguyen, Helen He, Donna Li

## Introduction

### Background information: 
While forest fires can be very destructive to the environment, they are also a natural part of forest life cycles. Being able to predict whether a forest fire will occur at a given time would help mitigate risks and allow local areas to prepare.

Researchers collected data on forest fires in two regions of Algeria, Bejaia and Sidi Bel-Abbes, from June 2012 to September 2012 (https://archive.ics.uci.edu/ml/datasets/Algerian+Forest+Fires+Dataset++). The dataset contains 244 daily observations (122 per region). Besides the date and a `Classes` label recording "fire" or "not fire", each observation has 10 variables:
- Temperature (`Temperature`), in °C
- Relative humidity (`RH`), in %
- Wind speed (`Ws`), in km/h
- Rain (`Rain`), in mm
- Fine Fuel Moisture Code (FFMC)
- Duff Moisture Code (DMC)
- Drought Code (DC)
- Initial Spread Index (ISI)
- Buildup Index (BUI)
- Fire Weather Index (FWI)

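Six of these variables (FFMC, DMC, DC, ISI, BUI and FWI) are components of the Canadian Forest Fire Weather Index System (https://cwfis.cfs.nrcan.gc.ca/background/summary/fwi), which rates the relative potential for wildfire. They are calculated, directly or indirectly, from the remaining four weather variables (temperature, relative humidity, wind speed and rain). FFMC reflects the moisture content of litter and other fine fuels, while DC reflects the moisture content of deep, compact organic layers and therefore tracks longer-term drought.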

<img src='https://cwfis.cfs.nrcan.gc.ca/images/fwi_structure.gif' width='400'>

Source: https://cwfis.cfs.nrcan.gc.ca/images/fwi_structure.gif
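The dependency structure in the figure can be written down as a small lookup table. The sketch below is a hypothetical illustration (the names `fwi_inputs` and `weather_inputs` are ours, not part of any package): it records which inputs each component uses and expands a component into the raw weather variables it ultimately depends on. It deliberately ignores that FFMC, DMC and DC also carry over the previous day's value, and it does not implement the actual FWI equations.

```r
# Hypothetical lookup: the direct inputs of each FWI System component,
# using this dataset's column names (RH = relative humidity, Ws = wind speed)
fwi_inputs <- list(
    FFMC = c("Temperature", "RH", "Ws", "Rain"),
    DMC  = c("Temperature", "RH", "Rain"),
    DC   = c("Temperature", "Rain"),
    ISI  = c("FFMC", "Ws"),
    BUI  = c("DMC", "DC"),
    FWI  = c("ISI", "BUI")
)

# Recursively expand a component into the weather variables it depends on.
# Anything not listed in fwi_inputs is treated as a raw weather variable.
weather_inputs <- function(component) {
    direct <- fwi_inputs[[component]]
    if (is.null(direct)) {
        return(component)
    }
    unique(unlist(lapply(direct, weather_inputs)))
}

weather_inputs("DC")   # "Temperature" "Rain"
weather_inputs("FWI")  # "Temperature" "RH" "Ws" "Rain"
```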

Using this dataset, we will build a K-nearest neighbours (KNN) classifier that predicts, from a new set of measurements, whether a forest fire will occur, and we will evaluate the accuracy of its predictions.
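To make the classification idea concrete, here is a minimal base-R sketch of KNN with two predictors. It is a hypothetical illustration, not the tidymodels workflow used in this report: `knn_predict`, the eight training rows (FFMC, DC and `Classes` copied from rows printed in this report) and the two new measurements are assumptions for demonstration. The predictors are standardized first because KNN relies on distances and FFMC and DC are on different scales.

```r
# Hypothetical mini training set copied from a few printed rows;
# a real analysis would use the full training data instead
train_x <- cbind(
    FFMC = c(65.7, 64.4, 47.1, 28.6, 64.8, 82.6, 83.7, 92.5),
    DC   = c( 7.6,  7.6,  7.1,  6.9, 14.2, 22.2, 26.3, 63.3)
)
train_y <- factor(c(rep("not fire", 5), rep("fire", 3)))

# Classify one new observation by majority vote among its k nearest
# training points, using Euclidean distance on standardized predictors
knn_predict <- function(train_x, train_y, new_x, k = 3) {
    centers <- colMeans(train_x)
    spreads <- apply(train_x, 2, sd)
    train_std <- scale(train_x, center = centers, scale = spreads)
    new_std <- (new_x - centers) / spreads
    # Distance from the new point to every training row
    dists <- sqrt(rowSums(sweep(train_std, 2, new_std)^2))
    nearest <- order(dists)[seq_len(k)]
    votes <- table(train_y[nearest])
    names(votes)[which.max(votes)]
}

knn_predict(train_x, train_y, new_x = c(FFMC = 85, DC = 30))  # "fire"
knn_predict(train_x, train_y, new_x = c(FFMC = 60, DC = 10))  # "not fire"
```

With an odd `k` and two classes the vote cannot tie; for even `k`, `which.max` would simply pick the first tied level.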

### Question: 
How accurately can a KNN classifier predict whether a forest fire will occur, using FFMC and DC as predictors?
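Accuracy here is the proportion of test observations whose predicted class matches the observed class. A minimal base-R sketch with hypothetical labels follows; in a tidymodels workflow the same quantity is reported by yardstick's `accuracy()` metric.

```r
# Accuracy = number of correct predictions / total number of predictions
accuracy_of <- function(truth, estimate) {
    stopifnot(length(truth) == length(estimate))
    mean(as.character(truth) == as.character(estimate))
}

# Hypothetical observed and predicted classes for four test days
truth    <- c("fire", "not fire", "fire", "not fire")
estimate <- c("fire", "not fire", "not fire", "not fire")

accuracy_of(truth, estimate)              # 0.75 (3 of 4 correct)
table(truth = truth, estimate = estimate) # confusion matrix
```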

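```r
# Load the required libraries (installing GGally only if it is missing)
if (!requireNamespace("GGally", quietly = TRUE)) {
    install.packages("GGally")
}
library(repr)
library(tidyverse)   # also attaches dplyr, tidyr, readr and ggplot2
library(tidymodels)
library(GGally)

# Show at most 6 rows when printing data frames in the notebook
options(repr.matrix.max.rows = 6)

# Fix the random seed so the train/test split is reproducible
set.seed(9999)
```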

```r
###
#   Step 1: Reading the dataset into R
###
dataset_url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/00547/Algerian_forest_fires_dataset_UPDATE.csv"

# Read the Bejaia region table (skipping the title line; 122 daily rows)
# and convert the Classes column to a factor
bajaia_data_all <- read_csv(dataset_url, skip = 1, n_max = 122) %>%
    mutate(Classes = as.factor(Classes))

# Read the first 43 rows of the Sidi-Bel Abbes region table
sidi_data_1 <- read_csv(dataset_url, skip = 126, n_max = 43) %>%
    mutate(Classes = as.factor(Classes))

# Skip line 171 of the file (the 14 July 2012 row, which does not parse
# cleanly) and read the remaining rows, reusing the column names from above
sidi_data_2 <- read_csv(dataset_url, skip = 171, col_names = colnames(sidi_data_1)) %>%
    mutate(Classes = as.factor(Classes))

# Concatenate the two Sidi-Bel Abbes data frames
sidi_data_all <- rbind(sidi_data_1, sidi_data_2)

# Display the first rows of each raw dataset
head(bajaia_data_all, 6)
head(sidi_data_all, 6)
```

```
Parsed with column specification:
cols(
  day = col_character(),
  month = col_character(),
  year = col_double(),
  Temperature = col_double(),
  RH = col_double(),
  Ws = col_double(),
  Rain = col_double(),
  FFMC = col_double(),
  DMC = col_double(),
  DC = col_double(),
  ISI = col_double(),
  BUI = col_double(),
  FWI = col_double(),
  Classes = col_character()
)
```


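| day | month | year | Temperature | RH | Ws | Rain | FFMC | DMC | DC | ISI | BUI | FWI | Classes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` |
| 1 | 6 | 2012 | 29 | 57 | 18 | 0.0 | 65.7 | 3.4 | 7.6 | 1.3 | 3.4 | 0.5 | not fire |
| 2 | 6 | 2012 | 29 | 61 | 13 | 1.3 | 64.4 | 4.1 | 7.6 | 1.0 | 3.9 | 0.4 | not fire |
| 3 | 6 | 2012 | 26 | 82 | 22 | 13.1 | 47.1 | 2.5 | 7.1 | 0.3 | 2.7 | 0.1 | not fire |
| 4 | 6 | 2012 | 25 | 89 | 13 | 2.5 | 28.6 | 1.3 | 6.9 | 0.0 | 1.7 | 0.0 | not fire |
| 5 | 6 | 2012 | 27 | 77 | 16 | 0.0 | 64.8 | 3.0 | 14.2 | 1.2 | 3.9 | 0.5 | not fire |
| 6 | 6 | 2012 | 31 | 67 | 14 | 0.0 | 82.6 | 5.8 | 22.2 | 3.1 | 7.0 | 2.5 | fire |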


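| day | month | year | Temperature | RH | Ws | Rain | FFMC | DMC | DC | ISI | BUI | FWI | Classes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` |
| 1 | 6 | 2012 | 32 | 71 | 12 | 0.7 | 57.1 | 2.5 | 8.2 | 0.6 | 2.8 | 0.2 | not fire |
| 2 | 6 | 2012 | 30 | 73 | 13 | 4.0 | 55.7 | 2.7 | 7.8 | 0.6 | 2.9 | 0.2 | not fire |
| 3 | 6 | 2012 | 29 | 80 | 14 | 2.0 | 48.7 | 2.2 | 7.6 | 0.3 | 2.6 | 0.1 | not fire |
| 4 | 6 | 2012 | 30 | 64 | 14 | 0.0 | 79.4 | 5.2 | 15.4 | 2.2 | 5.6 | 1.0 | not fire |
| 5 | 6 | 2012 | 32 | 60 | 14 | 0.2 | 77.1 | 6.0 | 17.6 | 1.8 | 6.5 | 0.9 | not fire |
| 6 | 6 | 2012 | 35 | 54 | 11 | 0.1 | 83.7 | 8.4 | 26.3 | 3.1 | 9.3 | 3.1 | fire |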
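Step 1 reads two tables stacked in one CSV file by choosing where each read starts (`skip`) and how many rows it takes (`n_max`). The same idea can be sketched with base R's `read.csv()`, whose `skip` and `nrows` arguments play the analogous roles. The miniature file below is hypothetical: its layout (title line, header, rows, blank line, second title, header, rows) is an assumption modelled on the dataset, and its values are copied from rows printed above.

```r
# Hypothetical miniature of a CSV file holding two stacked region tables
raw_lines <- c(
    "Region A",
    "day,FFMC,DC,Classes",
    "1,65.7,7.6,not fire",
    "6,82.6,22.2,fire",
    "",
    "Region B",
    "day,FFMC,DC,Classes",
    "1,57.1,8.2,not fire",
    "6,83.7,26.3,fire"
)

# Skip the title line, read the header and the next 2 data rows
region_a <- read.csv(text = raw_lines, skip = 1, nrows = 2)

# Skip everything up to and including the second title line (6 lines)
region_b <- read.csv(text = raw_lines, skip = 6, nrows = 2)

region_a
region_b
```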


```r
###
#   Step 2: Dataset Cleaning & Wrangling
###

# Add a Region column to each data frame
bajaia_data_all <- mutate(bajaia_data_all, Region = "Bajaia")
sidi_data_all <- mutate(sidi_data_all, Region = "Sidi-Bel Abbes")

# Combine the Bejaia and Sidi-Bel Abbes data frames
fire_data <- rbind(bajaia_data_all, sidi_data_all)

# Split the data into training (75%) and testing (25%) sets, stratified by
# Classes so both sets keep a similar proportion of "fire" and "not fire" days
fire_split <- initial_split(fire_data, prop = 0.75, strata = Classes)
fire_train <- training(fire_split)
fire_test <- testing(fire_split)

# Display the training and testing sets
fire_train
fire_test
```

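| day | month | year | Temperature | RH | Ws | Rain | FFMC | DMC | DC | ISI | BUI | FWI | Classes | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` | `<chr>` |
| 01 | 06 | 2012 | 29 | 57 | 18 | 0.0 | 65.7 | 3.4 | 7.6 | 1.3 | 3.4 | 0.5 | not fire | Bajaia |
| 03 | 06 | 2012 | 26 | 82 | 22 | 13.1 | 47.1 | 2.5 | 7.1 | 0.3 | 2.7 | 0.1 | not fire | Bajaia |
| 04 | 06 | 2012 | 25 | 89 | 13 | 2.5 | 28.6 | 1.3 | 6.9 | 0.0 | 1.7 | 0.0 | not fire | Bajaia |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 27 | 09 | 2012 | 28 | 87 | 15 | 4.4 | 41.1 | 6.5 | 8.0 | 0.1 | 6.2 | 0.0 | not fire | Sidi-Bel Abbes |
| 28 | 09 | 2012 | 27 | 87 | 29 | 0.5 | 45.9 | 3.5 | 7.9 | 0.4 | 3.4 | 0.2 | not fire | Sidi-Bel Abbes |
| 30 | 09 | 2012 | 24 | 64 | 15 | 0.2 | 67.3 | 3.8 | 16.5 | 1.2 | 4.8 | 0.5 | not fire | Sidi-Bel Abbes |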


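| day | month | year | Temperature | RH | Ws | Rain | FFMC | DMC | DC | ISI | BUI | FWI | Classes | Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| `<chr>` | `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<fct>` | `<chr>` |
| 02 | 06 | 2012 | 29 | 61 | 13 | 1.3 | 64.4 | 4.1 | 7.6 | 1.0 | 3.9 | 0.4 | not fire | Bajaia |
| 06 | 06 | 2012 | 31 | 67 | 14 | 0.0 | 82.6 | 5.8 | 22.2 | 3.1 | 7.0 | 2.5 | fire | Bajaia |
| 09 | 06 | 2012 | 25 | 88 | 13 | 0.2 | 52.9 | 7.9 | 38.8 | 0.4 | 10.5 | 0.3 | not fire | Bajaia |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 17 | 09 | 2012 | 34 | 44 | 12 | 0.0 | 92.5 | 25.2 | 63.3 | 11.2 | 26.2 | 17.5 | fire | Sidi-Bel Abbes |
| 24 | 09 | 2012 | 26 | 49 | 6 | 2.0 | 61.3 | 11.9 | 28.1 | 0.6 | 11.9 | 0.4 | not fire | Sidi-Bel Abbes |
| 29 | 09 | 2012 | 24 | 54 | 18 | 0.1 | 79.7 | 4.3 | 15.2 | 1.7 | 5.1 | 0.7 | not fire | Sidi-Bel Abbes |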
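The `strata = Classes` argument in Step 2 samples within each class, so the training and testing sets keep roughly the same proportion of "fire" and "not fire" days. The idea can be sketched in base R as follows. `stratified_split` and the 20 labels are hypothetical, and rsample's own implementation may round group sizes differently (it can also pool very small strata), so this is an illustration of the technique rather than a re-implementation of `initial_split()`.

```r
# Sample the same proportion of row indices within each class
stratified_split <- function(labels, prop = 0.75) {
    idx_by_class <- split(seq_along(labels), labels)
    train_idx <- unlist(lapply(idx_by_class, function(idx) {
        # Index through sample.int so classes with a single row also work
        idx[sample.int(length(idx), size = floor(prop * length(idx)))]
    }), use.names = FALSE)
    sort(train_idx)
}

set.seed(9999)
labels <- factor(c(rep("fire", 8), rep("not fire", 12)))  # hypothetical labels
train_idx <- stratified_split(labels)

table(labels[train_idx])   # fire: 6, not fire: 9
table(labels[-train_idx])  # fire: 2, not fire: 3
```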
