In [1]:
library(tidyverse)

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

✔ ggplot2 3.3.2     ✔ purrr   0.3.4
✔ tibble  3.0.3     ✔ dplyr   1.0.2
✔ tidyr   1.1.2     ✔ stringr 1.4.0
✔ readr   1.3.1     ✔ forcats 0.5.0

“package ‘ggplot2’ was built under R version 4.0.1”
“package ‘tibble’ was built under R version 4.0.2”
“package ‘tidyr’ was built under R version 4.0.2”
“package ‘dplyr’ was built under R version 4.0.2”
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()



## Preliminary Exploratory Data Analysis:
1) Reading the dataset from the web into R

In [6]:
# skip = 1 skips the first line of the file, so the second line is read as the header
forest_fire_data_raw <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00547/Algerian_forest_fires_dataset_UPDATE.csv", skip = 1)
head(forest_fire_data_raw) # preview the first 6 rows of the dataset

,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
1,1,6,2012,29,57,18,0.0,65.7,3.4,7.6,1.3,3.4,0.5,not fire
2,2,6,2012,29,61,13,1.3,64.4,4.1,7.6,1.0,3.9,0.4,not fire
3,3,6,2012,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,not fire
4,4,6,2012,25,89,13,2.5,28.6,1.3,6.9,0.0,1.7,0.0,not fire
5,5,6,2012,27,77,16,0.0,64.8,3.0,14.2,1.2,3.9,0.5,not fire
6,6,6,2012,31,67,14,0.0,82.6,5.8,22.2,3.1,7.0,2.5,fire
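
Every column in the preview above is typed `<chr>`, including those that hold only numbers in the rows shown. This usually means that at least one row somewhere in each column contains non-numeric text. One plausible cause, assumed here rather than verified against the file, is that extra title or header lines sit among the data rows. The sketch below uses a small made-up CSV (not the real dataset) to show how such rows could be located; `toy_csv`, `toy`, and `non_numeric_rows` are hypothetical names.

```r
# Made-up CSV text for illustration only: a title line, a header, two data
# rows, then a second title line and a repeated header before more data.
toy_csv <- c(
  "Region A",
  "day,month,Temperature,Classes",
  "1,6,29,not fire",
  "2,6,31,fire",
  "Region B",
  "day,month,Temperature,Classes",
  "1,6,33,fire"
)

# skip = 1 drops only the first title line; the later title and header
# lines are read as ordinary rows, which forces every column to character.
toy <- read.csv(text = toy_csv, skip = 1, stringsAsFactors = FALSE)
str(toy)

# Try the numeric conversion on one column and record which rows fail.
temp_num <- suppressWarnings(as.numeric(toy$Temperature))
non_numeric_rows <- which(is.na(temp_num))

# Show the offending rows so they can be inspected before cleaning.
toy[non_numeric_rows, ]
```

Running the same check on `forest_fire_data_raw$Temperature` would show whether the real file contains rows of this kind.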


Here is what the column headings mean:
* day/month/year: the day, month, and year on which the observation was taken.
* Temperature: Maximum Temperature on that day (Celsius)
* RH: Relative Humidity (%)
* Ws: Wind Speed (km/h)
* Rain: Total rainfall that day (mm)
* FFMC: Fine Fuel Moisture Code Index from FWI system
* DMC: Duff Moisture Code Index from FWI system
* DC: Drought Code Index from FWI system
* ISI: Initial Spread Index from FWI system
* BUI: Buildup index from FWI system
* FWI: Fire Weather Index
* Classes: either "not fire" or "fire"

2) Cleaning and Wrangling the Data into a Tidy Format
* The data is already tidy: each observation is a row, each column is a single variable, and each cell holds a single value.
* However, every column was read in as character, so to make the data more usable we convert Classes to a factor and the columns Temperature through FWI to numeric values.

In [33]:
# collect() returns an in-memory data frame unchanged, so this is just a copy
forest_fire_data <- collect(forest_fire_data_raw)

# Convert the class label to a factor and each measurement column to numeric.
# Note: as.numeric() has no na.rm argument, so `na.rm = TRUE` is silently ignored;
# entries that cannot be parsed as numbers become NA and trigger a warning.
forest_fire_data_mutated_classes <- forest_fire_data %>%
    mutate(Classes = as.factor(Classes)) %>%
    mutate(Temperature = as.numeric(Temperature, na.rm = TRUE)) %>%
    mutate(RH = as.numeric(RH, na.rm = TRUE)) %>%
    mutate(Ws = as.numeric(Ws, na.rm = TRUE)) %>%
    mutate(Rain = as.numeric(Rain, na.rm = TRUE)) %>%
    mutate(FFMC = as.numeric(FFMC, na.rm = TRUE)) %>%
    mutate(DMC = as.numeric(DMC, na.rm = TRUE)) %>%
    mutate(DC = as.numeric(DC, na.rm = TRUE)) %>%
    mutate(ISI = as.numeric(ISI, na.rm = TRUE)) %>%
    mutate(BUI = as.numeric(BUI, na.rm = TRUE)) %>%
    mutate(FWI = as.numeric(FWI, na.rm = TRUE))

head(forest_fire_data_mutated_classes)

“Problem with `mutate()` input `Temperature`.
ℹ NAs introduced by coercion
ℹ Input `Temperature` is `as.numeric(Temperature, na.rm = TRUE)`.”
“NAs introduced by coercion”
“Problem with `mutate()` input `RH`.
ℹ NAs introduced by coercion
ℹ Input `RH` is `as.numeric(RH, na.rm = TRUE)`.”
“NAs introduced by coercion”
“Problem with `mutate()` input `Ws`.
ℹ NAs introduced by coercion
ℹ Input `Ws` is `as.numeric(Ws, na.rm = TRUE)`.”
“NAs introduced by coercion”
“Problem with `mutate()` input `Rain`.
ℹ NAs introduced by coercion
ℹ Input `Rain` is `as.numeric(Rain, na.rm = TRUE)`.”
“NAs introduced by coercion”
“Problem with `mutate()` input `FFMC`.
ℹ NAs introduced by coercion
ℹ Input `FFMC` is `as.numeric(FFMC, na.rm = TRUE)`.”
“NAs introduced by coercion”
“Problem with `mutate()` input `DMC`.
ℹ NAs introduced by coercion
ℹ Input `DMC` is `as.numeric(DMC, na.rm = TRUE)`.”
“NA

,day,month,year,Temperature,RH,Ws,Rain,FFMC,DMC,DC,ISI,BUI,FWI,Classes
,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<fct>
1,1,6,2012,29,57,18,0.0,65.7,3.4,7.6,1.3,3.4,0.5,not fire
2,2,6,2012,29,61,13,1.3,64.4,4.1,7.6,1.0,3.9,0.4,not fire
3,3,6,2012,26,82,22,13.1,47.1,2.5,7.1,0.3,2.7,0.1,not fire
4,4,6,2012,25,89,13,2.5,28.6,1.3,6.9,0.0,1.7,0.0,not fire
5,5,6,2012,27,77,16,0.0,64.8,3.0,14.2,1.2,3.9,0.5,not fire
6,6,6,2012,31,67,14,0.0,82.6,5.8,22.2,3.1,7.0,2.5,fire
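
The "NAs introduced by coercion" warnings indicate that some entries could not be parsed as numbers and are now `NA`. The sketch below shows one possible way to write the same conversion more compactly with `across()` (available from dplyr 1.0.0) and to drop the rows where it fails. It runs on a small made-up data frame, not the real dataset. `toy` and `toy_clean` are hypothetical names, and whether dropping those rows suits the analysis is an assumption to check against the actual data.

```r
library(dplyr)

# Made-up stand-in for the raw data: all columns are character, and the
# last row holds header text instead of measurements.
toy <- data.frame(
  Temperature = c("29", "31", "Temperature"),
  RH          = c("57", "67", "RH"),
  Classes     = c("not fire", "fire", "Classes"),
  stringsAsFactors = FALSE
)

toy_clean <- toy %>%
  # Convert a contiguous range of columns in one step. Entries that are not
  # numbers become NA; suppressWarnings() hides the coercion warnings.
  mutate(across(Temperature:RH, ~ suppressWarnings(as.numeric(.x)))) %>%
  # Drop the rows where the conversion failed.
  filter(!is.na(Temperature)) %>%
  # Create the factor last so the stray "Classes" text never becomes a level.
  mutate(Classes = as.factor(Classes))

str(toy_clean)
```

On the real data, the column range would be `Temperature:FWI`, matching the columns converted above. Converting `Classes` after filtering keeps leftover header text out of the factor levels.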
