06-Suitability_mapping_ecological_niche_models.Rmd

# Suitability Mapping: Part 2

## Introduction

### Lecture video (Length: 01:06:42)

```{r, warnings=FALSE, message=FALSE, echo=FALSE}
library(vembedr)
embed_youtube('uAakkG-l-r0', height=400) %>% use_align('center')
```
[[**Watch on YouTube**]](https://youtu.be/uAakkG-l-r0)

### Computer practical video (Length 02:00:01)

```{r, warnings=FALSE, message=FALSE, echo=FALSE}
library(vembedr)
embed_youtube('C_aGyS5Cmos', height=400) %>% use_align('center')
```
[[**Watch on YouTube**]](https://youtu.be/C_aGyS5Cmos)

Last week, we described how knowledge-driven methods informed by expert opinion (often drawn from experience or literature) can be used to predict locations that are suitable (or favourable) for a specific outcome given a set of environmental factors, which in turn, can be outputted as a thematic raster map. Last week's approach uses a mixed (i.e., qualitative and quantitative) methodology through the use of **Overlay Mapping** and **Analytical Hierarchy Process (AHP)** for suitability mapping, and these approaches are used strictly when there are no point occurrence data. 

This week, we are using distributional niche models, often term as either: **Ecological (or Environmental) Niche Models**, or **Environmental or Habitat Suitability Models**. These are quantitative methods that use occurrence data in conjunction with environmental data to make a correlative model of the environmental conditions that meet an outcome's environmental (or ecological) requirements, which in turn, can predict the relative suitability of habitat for that outcome. It has many applications in ecology, epidemiology, disaster risk reduction and social sciences (e.g., crime patterns). Today, we are going to use the **Maximum Entropy Model (MAXENT)** to infer zones for disaster hazards such as wildfires in California given a set of predictor variables (i.e, climate, vegetation, anthropogenic and socioeconomic risk factors which are raster).

### Learning outcomes 

We are going to learn the following steps: 

1. Handling of point locations of occurrence data (i.e., points of fires) and preparing it as testing and training data; 
2. Handling of predictor variables (i.e., raster) and compiling them into a raster stack object, as well as perform extraction of raster values to the points of fire locations;
3. Generating a Niche model using the `maxent()` from points, and use the stacked raster values to fit a model to estimate probability (or trigger points) of a fire hazard;
4. Testing for model validation using ROC and AUC curves, and producing a surface based on threshold to delineate regions for suitability (in context probable trigger points for wildfires).

### Datasets & setting up the work directory 

Before you begin do make sure to download all data by [**clicking here**](https://github.com/UCLPG-MSC-SGDS/GEOG0114/raw/main/Week%205%20-%20Dataset.zip). Create a folder on called "**Week 5**" within your "**GEOG0114**" folder stored in the desktop of your personal computer. Make sure to extract all data from the zip folder and store it into "**Week 5**" folder. Open a new R script and set the work directory to **Week 5's** folder.

For Windows, the work directory will be:
```{r, eval = FALSE}
setwd("C:/Users/AccountName/Desktop/GEOG0114/Week 5")
```

For MAC, the work directory will be:
```{r, eval = FALSE}
setwd("/Users/AccountName/Desktop/GEOG0114/Week 5")
```

### Loading and installing packages

<div class="warning">
IMPORTANT NOTE: We will need to install **Java** for the `rjava` package as well as the `maxent()` function and other functions associated with performing MAXENT to work. You can download the latest version of Java [**[HERE]**](https://www.java.com/en/download/)
</div>

Next, we will need to load the following packages:

- `raster`: Raster/gridded data analysis and manipulation
- `sf`: Simple Features
- `sp`: Package for providing classes for spatial data (points, lines, polygons and grids)
- `spdep`: Access to spatial functions such `spsample()` needed for generating background points
- `tmap`: Thematic Mapping

The above packages `raster` `sf`, `sp`, `spdep` & `tmap` should have been installed in the previous session(s). We will need to install the following new package(s):

- `dismo`: Provides access to methods for niches distribution modelling in RStudio.
- `rJava`: Low-level interface for Java. The `maxent()` function depends on this so it must be installed.

```{r, eval = FALSE}
# Install the packages with install.package():
install.packages("dismo")
install.packages("rJava")

# Load the packages with library():
library("raster")
library("dismo")
library("tmap")
library("sf")
library("rJava")
library("spdep")
```

### Loading datasets

We will be dealing with both point occurrence and raster data for this exercise. The point data are remote-sensed fire detection locations across the State of California during the summer period of 2018. We will be using predictor variables that are climatic (i.e., temperature, precipitation, dryness), environmental (vegetation and elevation) and other social-anthropogenic (socioeconomic deprivation) gridded raster data for California. 

We will combine them into a **MAXENT** model in order to quantify the areas that are potential trigger points for wildfires, and whether these variables greatly influencing the risk of a fire hazard.

Let us begin loading the following list of raster files, each is a variable of interest:

- Raster: Temperature named `California_temperature_summer_2018.tif`
- Raster: Precipitation named `California_precipitation_summer_2018.tif`
- Raster: Dryness named `California_dryness_summer_2018.tif`
- Raster: Vegetation (NDVI) named `California_vegetation_summer_2018.tif`
- Raster: Elevation named `California_elevation_summer_2018.tif`
- Raster: Deprivation Index named `California_socdeprivation_summer_2018.tif`

```{r, eval=FALSE}
# load data raster data
temp <- raster("California_temperature_summer_2018.tif")
prec <- raster("California_precipitation_summer_2018.tif")
dryn <- raster("California_dryness_summer_2018.tif")
ndvi <- raster("California_vegetation_summer_2018.tif")
elev <- raster("California_elevation_summer_2018.tif")
sevi <- raster("California_socdeprivation_summer_2018.tif")
```

Load the boundary and county shapefile for California:

- Shape file: California's boundary border named `.shp`
- Shape file: California's County borders named `.shp`

```{r, eval=FALSE}
# load shapefile data for california
california_border <- read_sf("California_boundary.shp")
california_county <- read_sf("California_county.shp")
```

Load the point occurrence data for California:

```{r, eval=FALSE}
# load occurrence fire data in California
california_fires <- read.csv("California_Fire_Summer_2018.csv")
```

<div class="note">
**IMPORTANT NOTES**: All shape file and raster data (5km resolution) were projected to the CRS: WGS84 4236.
</div>

The occurrence data imported into RStudio is a data frame object. We will need to first convert the occurrence data from data frame to a spatial points object by declaring columns `longitude` and `latitude` corresponds to `x` and `y` respectively and thus coordinates in data frame object `california_fires` 

```{r, eval=FALSE}
# code chunk format coordinates(data.frame) = ~x+y
coordinates(california_fires) = ~longitude+latitude
```

Now, we need to define its coordinate reference system for the points which should be the same as any of the raster data. You can get this information by simple typing the raster object to console and printing its info there:

```{r, eval=FALSE}
# show details of temp as example
temp
```

We need to copy the detail under the `crs` section of out `temp` output and assign that detail to our points so as it also has a CRS of `WGS84 4326`. We can use the `crs()` function to perform this action. Now copy: `+proj=longlat +datum=WGS84 +no_defs` and use the following code:

```{r, eval=FALSE}
crs(california_fires) <- "+proj=longlat +datum=WGS84 +no_defs"
```

Let us visualise the study area to examine the spatial distribution of wildfires.

```{r, eval=FALSE}
tm_shape(california_county) + tm_polygons() + tm_shape(california_fires) + tm_dots(col = "red")
```

```{r echo=FALSE, out.width = "100%", fig.align='center', cache=TRUE,}
knitr::include_graphics('images/general/week5/image1_studyarea.png', error = FALSE) 
```

Let us visualise the six predictor raster variables that will be used for the Niche modelling with MAXENT. Instead of visual each output individually - you can use `tmap_arrange()` to create a figure panel for the six images.

```{r, eval=FALSE}
# map object of temperature stored in m1
m1 <- tm_shape(temp) + tm_raster(style = "cont", title = "Celsius", palette= "Oranges") +
			tm_shape(california_county) + tm_polygons(alpha = 0, border.col = "black") +
			tm_layout(frame = FALSE, legend.position = c("right", "top"), title.position = c("left", "bottom"), title = "A")

# map object of precipitation stored in m2
m2 <- tm_shape(prec) + tm_raster(style = "cont", title = "mm", palette= "Blues") +
			tm_shape(california_county) + tm_polygons(alpha = 0, border.col = "black") +
			tm_layout(frame = FALSE, legend.position = c("right", "top"), title.position = c("left", "bottom"), title = "B")

# map object of dryness stored in m3
m3 <- tm_shape(dryn) + tm_raster(style = "cont", title = "mm/0.25yr", palette= "-Spectral") +
			tm_shape(california_county) + tm_polygons(alpha = 0, border.col = "black") +
			tm_layout(frame = FALSE, legend.position = c("right", "top"), title.position = c("left", "bottom"), title = "C")

# map object of ndvi stored in m4
m4 <- tm_shape(ndvi) + tm_raster(style = "cont", title = "Index", palette= "Greens") +
			tm_shape(california_county) + tm_polygons(alpha = 0, border.col = "black") +
			tm_layout(frame = FALSE, legend.position = c("right", "top"), title.position = c("left", "bottom"), title = "D")

# map object of elevation stored in m5
m5 <- tm_shape(elev) + tm_raster(style = "cont", title = "m", midpoint = 1500, palette= "-Spectral") +
			tm_shape(california_county) + tm_polygons(alpha = 0, border.col = "black") +
			tm_layout(frame = FALSE, legend.position = c("right", "top"), title.position = c("left", "bottom"), title = "E")

# map object of sevi stored in m6
m6 <- tm_shape(sevi) + tm_raster(style = "cont", title = "%", palette= "Reds") +
			tm_shape(california_county) + tm_polygons(alpha = 0, border.col = "black") +
			tm_layout(frame = FALSE, legend.position = c("right", "top"), title.position = c("left", "bottom"), title = "F")

# stitch the maps together using tmap_arrange() function
tmap_arrange(m1, m2, m3, m4, m5, m6, nrow = 2)
```

```{r echo=FALSE, out.width = "100%", fig.align='center', cache=TRUE,}
knitr::include_graphics('images/general/week5/image2_environmental_variables.png', error = FALSE) 
```
A: Temperature (degree Celsius); B: Precipitation (mm); C: Dryness (Evapotranspiration) (mm/0.25 year); D: Vegetation (NDVI); E: Elevation (meters [m]); & F: Socioeconomic vulnerability index (%)

<div class="note">
**IMPORTANT NOTES**: It is good practice to produce figure outputs of the study area of interest inside the methodology section of an essay, report, dissertation, research paper etc.,. This gives you the opportunity to describe the study area that is under investigation and variables in the methods section. 
</div>

## Data preparation for the MAXENT analysis

### Creating a multi-band raster using the stack() function

Basically, a band is represented by a single matrix of cell values, and a raster with multiple bands contains multiple spatially coincident matrices of cell values representing the same spatial area. For example, the raster object for `temp` (i.e., temperature) is a **single band raster object**. But, if we start to stack raster objects `prec`, `dryn`, `ndvi` and so on top of `temp`, one over the other, then we have created a **multi-band raster object**. 

We need to create this **multi-band raster object** to enable the following action needed for the analysis:

1. To perform the extraction of raster values from all 6 variables on to the occurrence points in one go;
2. The entire multi-band raster object is needed for MAXENT estimation and spatial prediction

We use the `stack()` to stack all raster grids into one object and rename the layers within the multi-band object to names that are tenable. 

```{r, eval=FALSE}
envCovariates <- stack(temp, prec, dryn, ndvi, elev, sevi)
names(envCovariates) <- c("Temperature", "Precipitation", "Dryness", "NDVI", "Elevation", "Deprivation")
```

### Preparing data for pseudo-background points as absence

We need to prepare the background data. What is the background data? With Background data we are not attempting to guess point locations where an event is absent. Here, we are rather trying to characterise the environment of the study region. In this sense, background is the same, irrespective of where the point fire are found or not. Background data establishes the environmental domain of the study, whilst presence data should establish under which conditions a fire is more likely to be present than on average.

In essence, we are creating a set of control points which we act as pseudo-absence. These are typically generated at random. There are several ways of performing this action using other functions such as `randomPoints()`, `sampleRandom` and many more. 

We are using the `spsample()` function because it allows the user to specify the boundary for which the background points (i.e., controls) should be randomly generated within. Twice the number of occurrence points are generated (the choice of twice is up to the user). For reproducibility in the random generation, we have set the `set.seed()` function to this `20000430`. 

```{r, eval=FALSE}
# set the seed
set.seed(20000430)
# we need to coerce 'sf' object california_border into 'sp' object for spsample to work
california_border_sp <- as(california_border, Class = "Spatial")
# here, spsample() generates twice number of fire occurrence points randomly within California's border
background_points <- spsample(california_border_sp, n=2*length(california_fires), "random")
```

### Extraction of all raster values from predictor variables onto presence-absence points

Now, we are going to extract information from our `envCovariates` raster stack to both the presence and background points. This can be done using the `extract()` function. 

```{r, eval=FALSE}
# perform raster extraction from the environmental covariates on to points
california_fires_env <- extract(envCovariates, california_fires)
background_points_env <- extract(envCovariates, background_points)
```

<div class="note">
**IMPORTANT NOTES**: After the extraction, the objects `california_fires_env` and `background_points_env` exist as a large matrix, and not as a data frame.
</div>

For all occurrence points (i.e., presence), we need to add an indicator of `1` to signify presence; while for all background points (i.e., absence) - we need to also add an indicator of `0` to signify absence. We do this because we are modelling a probability and such niche models take outcomes that are from a [**Bernoulli**](https://simple.wikipedia.org/wiki/Bernoulli_distribution) or [**Binomial distribution**](https://simple.wikipedia.org/wiki/Binomial_distribution).

```{r, eval=FALSE}
# convert large matrix objects to data frame objects and add outcome `fire` indicator
california_fires_env <-data.frame(california_fires_env,fire=1)
background_points_env <-data.frame(background_points_env,fire=0)

# View one of the data frame
head(california_fires_env, n=5)
head(background_points_env, n=5)
```

### Preparation of training & test data for prediction & model cross-validation

Now, we need to complete one more step before we construct our wildfire risk model. We have to come up with a way to assess how well our model can actually predict whether we will likely find a trigger point for fires in a particular location. To make this assessment, we will need to perform some ‘cross-validation’ i.e., that is setting aside some of our presence-absence locations, and using them to test the model. In our case, we will randomly withhold 25% of the observations as test data, and retain the other 75% as training data for the prediction.

This means that we will be fitting the model multiple times, withholding each fourth of the data separately, then average the results. This is called a **k-fold cross-validation** (in our case 4-fold). 

However, for our purposes of time, we will just fit the model once to demonstrate what is actually happening for you to get the gist. Ideally, we will need to perform a 4-fold cross-validation and in turn average the estimates across the values for AUC and those for the true positives and negative as described in **section 5.4.2**.

Use the `kfold()` function to split the presence data from `california_fires_env` object into 4 equal parts. This should add an index that makes four random groups of observations. You can hold 25% of the data (i.e., the first portion) by specifying `select == 1` as the test data. You can hold the remaining 75% of the data (i.e., the 2nd, 3rd and 4th portion) by specifying `select != 1` as the training data.

```{r, eval=FALSE}
# set same set.seed() as before
set.seed(20000430)
# using k-fold function to split data into 4 equal parts
select <- kfold(california_fires_env, 4)
# 25% of the fire data use for testing the model
california_fires_env_test <- california_fires_env[select==1,]
# 75% of the fire data use for training the model
california_fires_env_train <- california_fires_env[select!=1,]
```

Repeat the above process for the background points:

```{r, eval=FALSE}
# set same set.seed() as before
set.seed(20000430)
# repeat the process for the background points
select <- kfold(background_points_env, 4)
background_points_env_test <- background_points_env[select==1,]
background_points_env_train <- background_points_env[select!=1,]
```

Now, let us row bind the training and test dataset together using the `rbind()` function:

```{r eval=FALSE}
training_data <- rbind(california_fires_env_train, background_points_env_train)
testing_data <- rbind(california_fires_env_test, background_points_env_test)
```

We are now in the position to execute the distributional niche models.

## MAXENT Analysis

Now, we can fit the niche model using the Maximum Entropy (MAXENT) algorithm, which tries to define the combination of environmental risk factors that best predicts the occurrence of the wildfires in California. The `maxent()` allows the users to implement such algorithm. 

Here are some important notes on it's usage:

1. `maxent()`: This function uses environmental data for locations of known presence and for a large number of 'background' locations. It has three mandatory arguments - `x`, `p`, and `args`.
2. `x`: In this argument, you must specify the columns of the predictor variables in the training data frame. The first columns in the example are the risk factors we are interested.
3. `p`: In this argument, you must specify the column containing the presence and absence of fires in the training data frame.
4. `args`: This allows for additional arguments. 

Running the `maxent()` code should look something like:

```{r eval=FALSE}
model_training <- maxent(x=training_data[,c(1:6)], p=training_data[,7], args=c("responsecurves"))
```

### Examination of the predictor's contribution and response curves

The results are stored in the `model_training` object. We can examine which variable has the biggest contribution to the presence of wildfire presences in California:

```{r eval=FALSE}
plot(model_training, pch=19, xlab = "Percentage [%]", cex=1.2)
```

```{r echo=FALSE, out.width = "100%", fig.align='center', cache=TRUE,}
knitr::include_graphics('images/general/week5/image3_variable_contribution.png', error = FALSE) 
```

<div class="note">
**IMPORTANT NOTES**: We can view the contribution estimates for each covariate more precisely by typing in RConsole the following code: `model_training@results`. Here, we can see the following contribution estimates: NDVI (44.2321%); Elevation (23.5530%); Deprivation (12.0339%); Dryness (9.9266); Temperature (6.8892%); and Precipitation (3.3653%). The contribution estimates should sum up to 100%. 

**Interpretation**: From this plot, we can see that the model is most sensitive to variation in NDVI, followed with additional contributions from land surface elevation, and from increased levels of socioeconomic deprivation (reporting top three).
</div>

We can examine as well as the likelihood of fire occurrence and how it responds to variation in these conditions. To see the shape of the response curves estimated by the model, we can use the `response()` function:

```{r eval=FALSE}
response(model_training)
```

```{r echo=FALSE, out.width = "100%", fig.align='center', cache=TRUE,}
knitr::include_graphics('images/general/week5/image4_responsecurves.png', error = FALSE) 
```

<div class="note">
**Interpretation**: In the response plots, we are looking at how the probability of fire occurrence (Y-axes, from zero to one) varies with each the environmental predictors (X-axes). From these plots, we can see that the MAXENT models can include complex environmental responses including plateau, linear, and nonlinear shapes, and some which are utterly unclear. For example, if we look at mean temperature during the summer, we can see that the probability for fire occurrence peaks around 0.60 when temperatures are around 30 degrees Celsius. We can also see that the probability of such outcome increases with more and more vegetation during the summer period. Probability in terms of fires in relation to deprivation is a flat line. For precipitation, dryness and elevation - the patterns are unclear.
</div>

### Model validation

An important part is model validation - this involves assessing how well does the model actually predict the occurrence of wildfires. To evaluate the predictive accuracy of the model, we turn back to our test data i.e., `testing_data` object, and use cross-validation to test the model. 

In our evaluation - there are two main outputs of interest:

1. `AUC` (Area Under the Curve), which is a test of model performance where higher values indicate greater accuracy in our predictions. An AUC value of 0.5 is common cut-off point used for assessing model performance. Note that an AUC value of 0.5 or lower is the same as random guessing of presence/absence, while values towards one mean our predictions are more reliable and accurate.
2. `max TPR+TNR`, which denotes the probability threshold at which our model maximizes the True Positive Rate and the True Negative Rate. It is generally accepted that this is an optimum value at which to set the threshold for binary classification of the predicted probabilities in our mapping outputs. Anything above value is deemed as a region environmentally suitable for outcome.

We use the `evaluate()` function to perform cross-validation analysis.

```{r eval=FALSE}
# model evaluation use the test data on the trained model for validation
cross_validation <- evaluate(p=testing_data[testing_data$fire==1,], a=testing_data[testing_data$fire==0,], model = model_training)
```

Here are some important notes on it's usage:

1. `evaluate()`: This function used for model evaluation and validation. It has the following arguments - `p`, `a`, and `model`.
2. `p`: In this argument, you must specify the column of outcome and filter on the presence value e.g., `testing_data[testing_data$fire==1,]`.
3. `a`: In this argument, you must specify the column of outcome and filter on the absence value e.g., `testing_data[testing_data$fire==0,]`
4. `model`: Specify the full training model object e.g., `model_training`. 

Now call results and plot AUC curve:

```{r eval=FALSE}
cross_validation 
```

```{r eval=FALSE}
> cross_validation
class          : ModelEvaluation 
n presences    : 14120 
n absences     : 27609 
AUC            : 0.9074081 
cor            : 0.7003824 
max TPR+TNR at : 0.4054474
```

```{r eval=FALSE}
plot(cross_validation, 'ROC', cex=1.2)
```

```{r echo=FALSE, out.width = "100%", fig.align='center', cache=TRUE,}
knitr::include_graphics('images/general/week5/image5_auc_curve.png', error = FALSE) 
```

<div class="note">
**Interpretation**: On the receiver operator curve, the 1:1 line give an AUC of 0.5. From our curve and the AUC, it is clear that our model appears to do substantially better than random guessing (high AUC value = 0.907 [90.7%]). The optimal probability threshold at which our model maximizes the True Positive Rate and the True Negative Rate is 0.4054474 (40.55%). Hence, we will use predicted probability > 0.4054 to delineate areas of suitability (or trigger points) for wildfires.  
</div>

### Mapping the predicted probability and suitability 

To map the predicted probabilities use the `predict()` function:

```{r eval=FALSE}
prob_wildfire <- predict(model_training, envCovariates)
```

Generate a predicted probability map from above `prob_wildfire` object:

```{r eval=FALSE}
# generate a publication-worthy figure
# map of probability 
tm_shape(prob_wildfire) +
	tm_raster(title = "Predicted probability", palette = '-RdYlBu', style ='cont', breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1.0))+
tm_shape(california_county) + tm_polygons(alpha = 0, border.col = "black") +
	tm_layout(main.title = "Predicted Probability of Wild Fire [%]", main.title.position = c(0.2, 0.7), title.size=3, legend.text.size = 1.1, 
		legend.position = c(0.65, 0.55), legend.height= -0.3, legend.title.size = 1.1, frame='white')+
	tm_scale_bar(position=c(0.02, 0.02), text.size = 1, breaks = c(0, 100, 200, 300))+
	tm_compass(north = 0,type = 'arrow', position = c('right', 'top'), text.size = 0.9)
```

```{r echo=FALSE, out.width = "100%", fig.align='center', cache=TRUE,}
knitr::include_graphics('images/general/week5/image6_predicted_prob.png', error = FALSE) 
```

Extract the optimal threshold value from cross validation object `cross_validation` using the `threshold()` function and use it to reclassify the raster object i.e., `prob_wildfire` to a binary raster. Recall, the value was `0.4054474`. A probability estimate less than `0.4054474` is classed as `0` and anything above as `1`. The predicted probability > 0.4054 are the areas in California suitable (or expected trigger points) for wildfires.

```{r eval=FALSE}
# calculate thresholds of models
threshold_value <- threshold(cross_validation, "spec_sens")
# report value
threshold_value
```

Reclassifying raster object `prob_wildfire` with threshold value:

```{r eval=FALSE}
# prepare threshold total map 
create_classes_vector <- c(0, threshold_value, 0, threshold_value, 1, 1)
create_clasess_matrix <- matrix(create_classes_vector, ncol = 3, byrow = TRUE)
create_clasess_matrix
```

```{r eval=FALSE}
> create_clasess_matrix
          [,1]      [,2] [,3]
[1,] 0.0000000 0.4054474    0
[2,] 0.4054474 1.0000000    1
```

```{r eval=FALSE}
# create new reclassify raster based on prob_wildfires
suitability_wildfires <- reclassify(prob_wildfire, create_clasess_matrix)
```

Generate final output which shows regions as trigger points:

```{r eval=FALSE}
tm_shape(suitability_wildfires) + tm_raster(style = "cat", title = "Threshold", palette= c("lightgrey", "red"), labels = c("Safe", "Trigger Points")) +
	tm_shape(california_county) + tm_polygons(alpha = 0, border.col = "black") +
	tm_layout(frame = FALSE, legend.outside = TRUE)
```

```{r echo=FALSE, out.width = "100%", fig.align='center', cache=TRUE,}
knitr::include_graphics('images/general/week5/image7_suitability.png', error = FALSE) 
```

### Supplementary code for 4-fold analysis

<details><summary>**Click here to see code:**</summary>
```{r eval=FALSE}
# split plot panel into 4 segments for 4 AUC plots
par(mfrow=c(2,2))
# create a list() object to dump results inside `eMAX`
eMAX<-list()

# use california_fires_env
# use background_points_env
folds <- 4

kfold_pres <- kfold(california_fires_env, folds)
kfold_back <- kfold(background_points_env, folds)

set.seed(20000430)
# adapting loop code from https://rpubs.com/mlibxmda/GEOG70922_Week5
# takes a long time to run 4-fold
for (i in 1:folds) {
	train <- california_fires_env[kfold_pres!= i,]
	test <- california_fires_env[kfold_pres == i,]
	backTrain<-background_points_env[kfold_back!=i,]
	backTest<-background_points_env[kfold_back==i,]
	dataTrain<-rbind(train,backTrain)
	dataTest<-rbind(test,backTest)
	maxnet_eval <- maxent(x=dataTrain[,c(1:6)], p=dataTrain[,7], args=c("responsecurves"))
	eMAX[[i]] <- evaluate(p=dataTest[dataTest$fire==1,],a=dataTest[dataTest$fire==0,], maxnet_eval)
	plot(eMAX[[i]],'ROC')
}

aucMAX <- sapply( eMAX, function(x){slot(x, 'auc')} )
# report 4 of the AUC
aucMAX
# find the mean of AUC (and it must be > 0.50)
mean(aucMAX)

#Get maxTPR+TNR for the maxnet model
Opt_MAX<-sapply( eMAX, function(x){ x@t[which.max(x@TPR + x@TNR)] } )
Opt_MAX

Mean_OptMAX<-mean(Opt_MAX)
Mean_OptMAX
# use Mean_OptMAX as threshold for mapping suitability

#Note: that final results is AUC: 0.9059569; threshold: 0.4338418
```
</details>

## Attributions
This week’s practical uses content and inspiration from:

Taylor, L. 2022. The social side of fire: assessing the inclusion of human social factors in fire prediction models (submitted as a dissertation [for degree in MSc Social & Geographic Data Science] at UCL). [**Source**](https://github.com/lucyadtaylor/Dissertation-code-and-data)

## References (see reading list)

1. **Book**: [R Programming] Dorman, M. (2014) Learning R for Geospatial Analysis; **Chapter 3: Working with Rasters** [**Click link**](https://drive.google.com/file/d/1zkin0SY2bQFUGGZtBwXqSBoR8I2m-8SR/view?usp=sharing) (Note: **Downloadable**)
2. **Book**: [Theory] Stockwell, D. (2019) Niche Modeling: Predictions from Statistical Distributions; **Chapter 4: Topology**; CRC Press; pages: 45-63.
3. **Online**: [Tutorials] Hijmans, R.J., & Elith, J. (2021) **Species distribution modelling** [**Click link**](https://rspatial.org/raster/sdm/1_sdm_introduction.html)
4. **Online**: [Tutorials] Kerkhoff, D. (2016) **Ecological Responses to Climate Change: Species Distribution Modelling using Maxent** [**Click link**](https://rstudio-pubs-static.s3.amazonaws.com/224303_df34f170cd9144cda6477ae8232887f7.html)
5. **Online**: [Tutorials] Dennis, M. (2020) **Practical 4: Species Distribution Modelling I** [**Click link**](https://rpubs.com/mlibxmda/GEOG70922_Week5)
6. **Paper**: [Application] Escobar, L.E., (2020). Ecological Niche Modeling: An Introduction for Veterinarians and Epidemiologists, Frontiers in Veterinary Science [**Click link**](https://www.frontiersin.org/articles/10.3389/fvets.2020.519059/full)
7. **Paper**: [Application] Banks, W.E., (2017). The application of ecological niche modeling methods to archaeological data in order to examine culture-environment relationships and cultural trajectories; Quarternaire [**Click link**](https://journals.openedition.org/quaternaire/7966)
8. **Paper**: [Application] Liao, Y., Lei, Y., Ren, Z., Chen, H., & Li., D., (2017). Predicting the potential risk area of illegal vaccine trade in China; Scientific Reports, Issue 7, 3883. [**Click link**](https://www.nature.com/articles/s41598-017-03512-3)

## Data Sources

1. All shape files [Source: **California Open Data Portal**] [**Click Here**](https://data.ca.gov/dataset/ca-geographic-boundaries)
2. Global Wildfires detection points [Source: **Fire Information Resource Management System**] [**Click Here**](https://firms.modaps.eosdis.nasa.gov)
3. Environmental data for temperature & precipitation [Source: **WorldClim**] [**Click Here**](https://www.worldclim.org)
4. Socioeconomic vulnerability index (requires user login) [Source: **Socioeconomic Data and Applications Center (SEDAC)**] [**Click Here**](https://sedac.ciesin.columbia.edu)
5. Digital Elevation Model [Source: **SRTM 90m DEM Digital Elevation Database**] [**Click Here**](https://srtm.csi.cgiar.org)
6. Evapotranspiration (aridity) 1.0km [Source: **NASA MODIS MOD16A2**] [**Click Here**](https://modis.gsfc.nasa.gov/data/dataprod/mod16.php)
7. Normalised Differenced Vegetation Index (NDVI) 250m [Source: **NASA MODIS MOD13Q1**] [**Click Here**](https://modis.gsfc.nasa.gov/data/dataprod/mod13.php)

## Practical homework

**Suitability mapping of the Aedes mosquito and infestation in Brazil**

Many regions in Brazil were hit hard by the Zika virus infection outbreak in 2015. Zika infection is caused by the arboviruses transmitted by the Aedes mosquitoes which are abundant in Brazil. It is a known fact that increased abundance of the Aedes mosquito is typically associated with standing (or stagnant) water which serves as a reservoir or hotspot for breeding. Apart from the presence of standing (or stagnated) water in human dwellings, it is important to consider other intermediate factors that drive the mosquitoes to increase in population size. These factors are the following:

- Temperature
- Precipitation
- Population Density
- NDVI
- Land surface elevation
- Natural lighting
- Urban-rural classification

The above listed variables are gridded datasets which can be downloaded by clicking [**[HERE]**](https://github.com/UCLPG-MSC-SGDS/Data-Sources/raw/main/Gridded%20datasets.zip). Create a map which should the following: 1.) The predicted probability of infestation of the Aedes mosquito in Brazil; and 2.) the suitability map based on the `max TPR + TNR` threshold to illustrate where the Aedes mosquito will thrive in Brazil.

All identified points for mosquito breeding including background points can downloaded from [**[HERE]**](https://github.com/UCLPG-MSC-SGDS/Data-Sources/raw/main/Points.zip). The boundaries for Brazil can be downloaded from [**[HERE]**](https://github.com/UCLPG-MSC-SGDS/Data-Sources/raw/main/Shapefiles.zip).

The expected output should look like:

```{r echo=FALSE, out.width = "100%", fig.align='center', cache=TRUE,}
knitr::include_graphics('images/general/week5/image8_FinalPicture.png', error = FALSE) 
```

**Notes:**

- For the predicted probability map (A), reclassify the predictions to the following categories - `0-0.2`, `0.2-0.4`, `0.4-0.6`, `0.6-0.8` and `0.8-1.0`.
- For the second suitability map (B), reclassify the probabilities based on the  `max TPR + TNR` threshold i.e., below it as `0` and labelled as 'None infested areas' and above as `1` labelled as 'Infested areas'.

This week’s homework practical was based on the original research paper:

- Musah _et al_ (2023). Coalescing disparate data sources for the geospatial prediction of mosquito abundance, using Brazil as a motivating case study. **Frontiers in Tropical Diseases**. Volume 4. DOI: https://doi.org/10.3389/fitd.2023.1039735