In [2]:
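# Run this cell to load the libraries used in this notebook.

#install.packages("tidymodels")  #install if necessary
#install.packages("tidyverse")   #install if necessary
#install.packages("repr")        #install if necessary
#install.packages("GGally")      #install if necessary
library(tidyverse)
library(repr)
library(tidymodels)
library(GGally)  # provides ggpairs(), used in the exploratory analysis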

**Title: Predicting the Probability of Wildfire in Algerian Forests**

**Introduction:** 

While the North African nation of Algeria is perhaps best known for occupying a significant part of the Sahara Desert, the country is also host to a large section of the Atlas Mountains. The elevations offered by the range provide a refuge from the aridity of the desert below for many species of flora and fauna, and consequently these areas contain the country's most productive ecosystems (Kharytonov et al., 2018). This ecological zone also provides the ideal environment for a host of human economic activities: directly through forestry, which makes up 5% of Algeria's gross domestic product, and indirectly as productive agricultural land (Berdikeeva, 2019). Because Northern Algeria is a vegetated zone in a warm alpine environment, wildfires have always been a part of its ecosystems, providing a way for ageing forests to regenerate. However, the rate at which wildfires occur in the region has accelerated as global temperatures continue to increase, causing considerable damage both to the natural ecosystems and to the livelihoods of the people who inhabit the region (Law, 2019). In response, the Algerian authorities have put significant resources into mitigating the destruction that these fires cause, but much work remains to be done (Meddour-Sahar et al., 2013).

Therefore, this project seeks to assist the General Directorate of Forestry (Algeria's national department of forestry) by creating a machine learning classification model that predicts, from a small set of weather measurements, whether a particular forest stand is likely to be the immediate scene of a fire. More specifically, this project will attempt to predict wildfires in two regions of Algeria: the northeastern region of Bejaia and the northwestern region of Sidi Bel-abbes. The two regions in the dataset will be analyzed together, so data from both regions will be used to train the model. The predictor variables are temperature and rain, as described in the Methods section. These will be used to predict our target variable, the fire status of the forest stand, which has two possible classes: fire or not fire.

This model will answer the question: 

"Can temperature and rain be used to predict whether a wildfire will or will not occur in the northern forests of Algeria?"

The classification model will be trained using the Algerian Forest Fires Dataset from the UCI Machine Learning Repository. The dataset contains 244 daily observations recorded in the two regions (Bejaia and Sidi Bel-abbes) between June and September 2012, some with a fire and some without. For each observation, 11 attributes of the forest stand in question were recorded. These attributes were:

- Temperature (Celsius)
- Relative Humidity (%)
- Wind Speed (km/h)
- Rain (mm)
- Fine Fuel Moisture Code 
- Duff Moisture Code
- Drought Code 
- Initial Spread Index
- Build Up Index
- Fire Weather Index
- If a fire was observed at the forest stand

The final column will be used as the target variable for this model. 

**Preliminary exploratory data analysis:** 

In [None]:
set.seed(1)  # fix the random seed so the train/test split is reproducible

In [None]:
forest_fire_original <- read_csv(url("https://archive.ics.uci.edu/ml/machine-learning-databases/00547/Algerian_forest_fires_dataset_UPDATE.csv"), skip = 1)
# forest_fire_original

In [None]:
forest_fire <- forest_fire_original %>%
  select(-day, -month, -year) %>%
  # keep rows 1-122, 125-167 and 169-244; rows 123-124 and 168 are left out as non-data or malformed rows
  slice(1:122, 125:167, 169:244) %>%
  # convert the ten measurement columns to numeric and the class label to a factor
  mutate(across(c(Temperature, RH, Ws, Rain, FFMC, DMC, DC, ISI, BUI, FWI), as.numeric)) %>%
  mutate(Classes = as.factor(Classes))

glimpse(forest_fire)


In [None]:
forest_fire_split <- initial_split(forest_fire, prop = 0.75, strata = Classes)  
forest_fire_train <- training(forest_fire_split)   
forest_fire_test <- testing(forest_fire_split)

In [None]:
# The data frames output by this cell show the mean temperature and rain across the training observations,
# as well as the number of fire and non-fire observations in the training set.

forest_fire_means <- forest_fire_train %>%
    select(Temperature, Rain) %>%
    map_df(mean, na.rm = TRUE)

forest_fire_means

forest_fire_classes <- forest_fire_train %>%
    group_by(Classes) %>%
    summarize(count = n())

forest_fire_classes

In [None]:
# Standardize (center and scale) each variable. Only the training data is scaled here.
# as.numeric() drops the matrix attributes that scale() returns.
forest_fire_train_scaled <- forest_fire_train %>%
  mutate(scaled_Temp = as.numeric(scale(Temperature)),
         scaled_RH = as.numeric(scale(RH)),
         scaled_Ws = as.numeric(scale(Ws)),
         scaled_Rain = as.numeric(scale(Rain)),
         scaled_FFMC = as.numeric(scale(FFMC)),
         scaled_DMC = as.numeric(scale(DMC)),
         scaled_DC = as.numeric(scale(DC)),
         scaled_ISI = as.numeric(scale(ISI)),
         scaled_BUI = as.numeric(scale(BUI)),
         scaled_FWI = as.numeric(scale(FWI)),
         Fire_Status = Classes) %>%
  select(scaled_Temp:Fire_Status)

# head(forest_fire_train_scaled)

In [None]:
# Pairwise plots of all variables, using the training set only
ggpairs(forest_fire_train)

In [None]:
options(repr.plot.width = 12, repr.plot.height = 9)

# Scatter plot of the two standardized predictors in the training set, coloured by fire status
rain_temp_plot <- ggplot(forest_fire_train_scaled, aes(x = scaled_Rain, y = scaled_Temp, colour = Fire_Status)) +
  geom_point() +
  labs(x = "Standardized rain per day", y = "Standardized temperature at noon", colour = "Fire Status") +
  ggtitle("Standardized temperature at noon vs standardized rain per day") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(text = element_text(size = 15)) +
  scale_color_discrete(labels = c("Fire", "No Fire"))

rain_temp_plot

**Methods:** 

The analysis will use two predictors, temperature and rain, standardized as `scaled_Temp` and `scaled_Rain`. Because the dataset is small, it is split 75-25 into training and testing sets, stratified by fire status. A recipe will then be created from the training set only. The number of neighbours for the K-nearest neighbours classifier will be chosen by tuning: the training set is repeatedly split into smaller training and validation sets, and cross-validation accuracy is compared across candidate values.
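
The tuning step described above could look roughly like the sketch below. It is an illustration under stated assumptions rather than the final analysis: `toy_train` is synthetic data standing in for `forest_fire_train`, and the five folds and the grid of odd `neighbors` values from 1 to 15 are arbitrary choices. It also assumes the `kknn` package is installed.

```r
library(tidymodels)

set.seed(1)

# Hypothetical stand-in for forest_fire_train: synthetic values, NOT the real data.
toy_train <- tibble(
  Temperature = rnorm(120, mean = 32, sd = 4),
  Rain = rexp(120, rate = 1.5)
) %>%
  mutate(Classes = factor(if_else(Temperature - 3 * Rain + rnorm(120) > 30,
                                  "fire", "not fire")))

# Recipe: standardize both predictors using statistics from the training data only.
fire_recipe <- recipe(Classes ~ Temperature + Rain, data = toy_train) %>%
  step_scale(all_predictors()) %>%
  step_center(all_predictors())

# K-nearest neighbours specification with the number of neighbours left to tune.
knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) %>%
  set_engine("kknn") %>%
  set_mode("classification")

# 5-fold cross-validation on the training set, stratified by class (assumed settings).
fire_vfold <- vfold_cv(toy_train, v = 5, strata = Classes)

# Candidate values of k (assumed grid).
k_grid <- tibble(neighbors = seq(1, 15, by = 2))

# Cross-validated accuracy for each candidate k.
knn_results <- workflow() %>%
  add_recipe(fire_recipe) %>%
  add_model(knn_spec) %>%
  tune_grid(resamples = fire_vfold, grid = k_grid) %>%
  collect_metrics() %>%
  filter(.metric == "accuracy")

# Pick the k with the highest mean cross-validation accuracy.
best_k <- knn_results %>%
  arrange(desc(mean)) %>%
  slice(1) %>%
  pull(neighbors)
```

Placing the standardization inside the recipe, rather than scaling columns by hand, means each cross-validation fold is scaled using only its own training portion.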

Our results will be visualized by using the geom_point() function to display the observations in the validation set with respect to temperature at noon and amount of rain per day, coloured by the classification of fire or not fire.
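
A minimal sketch of that visualization follows, again under assumptions: `toy_fire` is synthetic data rather than the real observations, `k = 5` is a placeholder for whatever value tuning selects, and the `kknn` package is assumed to be installed. The points are the held-out observations, coloured by predicted class.

```r
library(tidymodels)

set.seed(1)

# Hypothetical stand-in for the cleaned data: synthetic values, NOT the real observations.
toy_fire <- tibble(
  Temperature = rnorm(160, mean = 32, sd = 4),
  Rain = rexp(160, rate = 1.5)
) %>%
  mutate(Classes = factor(if_else(Temperature - 3 * Rain + rnorm(160) > 30,
                                  "fire", "not fire")))

# 75-25 split, stratified by class, mirroring the split used in this notebook.
toy_split <- initial_split(toy_fire, prop = 0.75, strata = Classes)
toy_train <- training(toy_split)
toy_holdout <- testing(toy_split)

# Standardize the predictors using the training data only.
fire_recipe <- recipe(Classes ~ Temperature + Rain, data = toy_train) %>%
  step_scale(all_predictors()) %>%
  step_center(all_predictors())

# k = 5 is an arbitrary placeholder, not a tuned value.
knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = 5) %>%
  set_engine("kknn") %>%
  set_mode("classification")

knn_fit <- workflow() %>%
  add_recipe(fire_recipe) %>%
  add_model(knn_spec) %>%
  fit(data = toy_train)

# Predicted class for each held-out observation, alongside the original columns.
holdout_predictions <- predict(knn_fit, toy_holdout) %>%
  bind_cols(toy_holdout)

# Held-out observations on the original measurement scale, coloured by predicted class.
prediction_plot <- ggplot(holdout_predictions,
                          aes(x = Rain, y = Temperature, colour = .pred_class)) +
  geom_point() +
  labs(x = "Rain per day (mm)", y = "Temperature at noon (C)", colour = "Predicted class")

prediction_plot
```

Mapping `shape = Classes` as well would show the true class alongside the prediction, making misclassified points easy to spot.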

**Expected Outcomes and Project Significance:**

Intuition suggests that higher noon temperatures and lower daily rain (leading to drier conditions) would lead to a higher probability of wildfire occurrence.

Establishing a reliable classification model means that, by measuring these predictive variables, more informed decisions can be made on any given day to prepare for or contain potential wildfires when and where they are more likely to occur. Resources can be allocated in advance using weather forecasts, and warnings of potential fires can be announced in a timely fashion through media such as weather channels.

Investigating the intersection of wildfire predictability and climate conditions could also inform the discussion of climate change. If the occurrence of wildfires can be accurately predicted from the chosen variables, which are natural phenomena, and climate change has a significant effect on those phenomena, an intuitive follow-up question is: “Does anthropogenic climate change increase wildfire frequency?”

**References:**

Berdikeeva, S. (2019). Burning Forests Threaten At-Risk Populations in Morocco and Algeria. Inside Arabia. Retrieved 7 April 2021, from https://insidearabia.com/burning-forests-threaten-at-risk-populations-in-morocco-and-algeria/.

Kharytonov, M., Islem, B., & Maatoug, M. (2018). Vegetation dynamics of Algerian’s steppe ecosystem. Case of the region of Tiaret. Environmental Research, Engineering And Management, 74(1). https://doi.org/10.5755/j01.erem.74.1.20095

Law, J. (2019). Algerian forest reinstated as National Park after turbulent history. BirdLife. Retrieved 7 April 2021, from https://www.birdlife.org/worldwide/news/algerian-forest-reinstated-national-park-after-turbulent-history.

Meddour-Sahar, O., González-Cabán, A., Meddour, R., & Derridj, A. (2013). Wildfire management policies in Algeria: present and future needs. In A. González-Cabán (Tech. coord.), Proceedings of the fourth international symposium on fire economics, planning, and policy: climate change and wildfires (Gen. Tech. Rep. PSW-GTR-245, pp. 382-395). Albany, CA: U.S. Department of Agriculture, Forest Service, Pacific Southwest Research Station.