## EDGE Scores and Conservation Strategy

In [None]:
#@title ### Run to set up notebook (Takes ~ 15 minutes).
# Clone github repository, install spatial dependencies for linux, install R packages
dir.create("My Git Repo")
git2r::clone("https://github.com/Syrph/BCB_Practicals", "My Git Repo")
setwd("My Git Repo")
system("sudo apt-get update")
system("sudo apt-get install libgdal-dev libproj-dev libgeos-dev libudunits2-dev libv8-dev libprotobuf-dev libjq-dev")
source("install.R")

# Load in all the Accipitridae maps and combine them into one big sp dataframe
library(rgdal)
library(raster)
library(dplyr)

# Get the file names that are Rdata objects
file_names <- list.files()
indices <- grep("\\.Rdata$", file_names)
file_names <- file_names[indices]

# load in the first map and create a vector ready
load("Accipiter_albogularis_maps.Rdata")
assign("Accipiter_albogularis_maps", species_map)
Accip_maps <- Accipiter_albogularis_maps

# Load in the rest of the maps and combine them
for (file in file_names[2:248]){
  name <- gsub(".Rdata", "", file)
  load(file)
  assign(name, species_map)
  Accip_maps <- rbind(Accip_maps, get(name))
}

# Save the combined maps as a single .Rdata file
save(Accip_maps, file = "Accipitridae_maps.Rdata")
rm(list=ls())

# Remove the packages for students to load back in later
detach("package:raster", unload = TRUE)
detach("package:rgdal", unload = TRUE)
detach("package:dplyr", unload= TRUE)

### 1. Introduction and resources

This practical aims to introduce you to the EDGE & FUDGE scores that you'll need for your conservation strategy coursework. Put briefly, these scores balance the distinctiveness of species against their risk of extinction to determine conservation priorities. You can find out more about EDGE scores on the ZSL website:

https://www.zsl.org/conservation/our-priorities/wildlife-back-from-the-brink/animals-on-the-edge 

We will also try plotting a simple map of IUCN categories so we can visualise the risk to our clade across the globe.

### 2. Preparing your Data

To calculate EDGE metrics, we need data on the species we're interested in, and their phylogenetic relationships. For the coursework we're interested in EDGE scores for a specific clade; however, it's also common to look at areas such as national parks.

For this practical we're going to use the same family as Practical 3, Accipitridae. We'll use the same table of traits from Practical 3 to import our data and filter it. 

In [None]:
trait_data <- read.csv("coursework_trait_data.csv")
str(trait_data)
head(trait_data)

And again filter for Accipitridae.

In [None]:
library(dplyr)
Accip_data <- trait_data %>% filter(Jetz_family == "Accipitridae")
nrow(Accip_data)

Because we're going to use EDGE scores, we should check for any extinct species we need to remove. 

In [None]:
# This operator | means OR. EW means extinct in the wild.
Accip_data %>% filter(Redlist_cat == "EX" | Redlist_cat == "EW")

Great, no extinct species in this family! There shouldn't really be many in our Jetz phylogeny, but some do turn up occasionally.

Now we need to load in our tree. For this practical we're using a random tree extracted from http://birdtree.org/

Because we're not sure of the exact placement of some species tips, the Jetz tree has multiple versions, each with a slightly different layout. Normally this only means a few species have swapped places slightly. This is why we've chosen a random tree for our analysis. There are other (better) methods for dealing with this uncertainty, but for these practicals it will be enough to use a random tree. If you're interested in these methods then this is a good paper to check out:

https://academic.oup.com/cz/article/61/6/959/1800551

In [None]:
library(ape)
library(caper)

# Load in and plot the tree
bird_tree <- read.tree("all_birds.tre")
plot(bird_tree)

### 2. ED Scores

Now that we've got our tree and our species we can start calculating our ED (Evolutionary Distinctiveness) scores. Because we are calculating the evolutionary distinctiveness of Accipitridae, we want to use the whole bird phylogeny to compare against. Then we can find out whether our species are very closely related to others in the tree, or represent distinct lineages that we might want to conserve to protect valuable evolutionary diversity.

We can do this easily using a simple function from the `caper` package. This sometimes takes a while to run.
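
To make the idea of ED more concrete, here is a minimal base R sketch of a "fair proportion" style calculation on a made-up three-species tree, `((A:1, B:1):2, C:3)`. The tree, the branch names and the lengths are all invented for illustration, and this is only a sketch of the general idea rather than the internals of `caper`: each branch length is shared equally among the tips that descend from it, and a tip's score is the sum of its shares.

```r
# Toy tree ((A:1, B:1):2, C:3); all names and branch lengths are made up.
# Rows are branches, columns are tips; 1 means the tip descends from that branch.
clade_mat <- rbind(
  branch_A  = c(A = 1, B = 0, C = 0),
  branch_B  = c(A = 0, B = 1, C = 0),
  branch_AB = c(A = 1, B = 1, C = 0),
  branch_C  = c(A = 0, B = 0, C = 1)
)
branch_length <- c(branch_A = 1, branch_B = 1, branch_AB = 2, branch_C = 3)

# Share each branch length equally among the tips that descend from it.
share_per_tip <- branch_length / rowSums(clade_mat)

# A tip's ED is the sum of its shares over all of its ancestral branches.
toy_ED <- colSums(clade_mat * share_per_tip)
toy_ED
```

Species C sits alone on a long branch, so it gets the highest score (3), while A and B split their shared branch and score 2 each. The scores add up to the total branch length of the tree.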

In [None]:
# We can first convert our tree into a clade matrix, which records which tips descend from each node. This step is optional but stops a warning message from ed.calc, which prefers a clade matrix to a tree.
bird_matrix <- clade.matrix(bird_tree)

# Now we can run the ed.calc function, which calculates ED scores for each species. The output gives two dataframes, but we only want the species names and scores so we use $spp
ED <- ed.calc(bird_matrix)$spp
head(ED)

Now that we've got our ED scores for each species, we need to log transform and normalise our scores. 
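
As a small illustration of what these two steps do, here is a sketch using four made-up ED values (the species names and numbers are invented, not taken from the bird tree). Adding 1 before logging keeps every value positive, and the min-max step rescales the logged values so the smallest becomes 0 and the largest becomes 1.

```r
# Made-up ED values for four hypothetical species.
toy_ED <- c(sp1 = 0.5, sp2 = 3, sp3 = 12, sp4 = 70)

# log(0.5) on its own would be negative; log(1 + 0.5) is not.
log(0.5)
toy_log <- log(1 + toy_ED)

# Min-max normalisation: rescale so scores run from 0 to 1.
toy_norm <- (toy_log - min(toy_log)) / (max(toy_log) - min(toy_log))
round(toy_norm, 2)
```

The ranking of the species doesn't change, only the scale.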

In [None]:
# Adding 1 to our scores prevents negative logs when our ED scores are below 1.
ED$EDlog <- log(1+ED$ED)

# We can normalise our scores so they're scaled between 0 and 1
ED$EDn <- (ED$EDlog - min(ED$EDlog)) / (max(ED$EDlog) - min(ED$EDlog))
head(ED)

Now that we have our normalised scores for all birds, we need to subset the list for just Accipitridae.
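
Subsetting with `%in%` silently drops any names that don't match, so it can be worth checking for mismatches (for example, spelling differences between taxonomies). Here is a minimal sketch with made-up species names and scores; `toy_scores` and `wanted` are hypothetical stand-ins for the ED table and the species list.

```r
# Hypothetical score table and species list (names are invented).
toy_scores <- data.frame(
  species = c("Sp_a", "Sp_b", "Sp_c", "Sp_d"),
  ED = c(4.1, 7.3, 2.2, 9.8),
  stringsAsFactors = FALSE
)
wanted <- c("Sp_b", "Sp_d", "Sp_x")

# %in% returns TRUE/FALSE for each row, which we use to subset.
keep <- toy_scores$species %in% wanted
toy_subset <- toy_scores[keep, ]

# Names we asked for that are not in the score table.
missing_names <- setdiff(wanted, toy_scores$species)
missing_names
```

If `missing_names` is not empty, the subset will have fewer rows than the species list.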

In [None]:
# Find which rows of ED belong to our species list (TRUE/FALSE for each row).
row_numbers <- (ED$species %in% Accip_data$Jetz_Name)

# Get our Accipitridae species with ED scores
Accip_ED <- ED[row_numbers,]
str(Accip_ED)

We now have the ED scores of 237 species in Accipitridae. With these scores we can see how unique our species are in terms of the evolutionary pathway.

In [None]:
# Find the highest ED score
Accip_ED[Accip_ED$EDn == max(Accip_ED$EDn),]

The highest ED scores belong to *Chelictinia riocourii*, the scissor-tailed kite, and *Gampsonyx swainsonii*, the pearl kite. Each species is the only member of a monotypic genus, and both are part of the small subfamily Elaninae, the elanine kites. This subfamily only has six species, and all the others form one genus. Therefore, with so few close relatives, we might consider these species a conservation priority to protect as much diversity as we can. However, we don't yet know if these species need conserving...

### 3. EDGE Scores

This is where EDGE scores come in. By combining ED scores with IUCN categories we can select the species that need conservation action, and represent unique evolutionary variation.

First we need to convert the IUCN status into GE (Globally Endangered) scores. This is relatively simple as we're just assigning numeric rankings, but we'll use a for loop to practice our skills! We'll also use an `if` statement because we have two categories with the same GE score (near threatened and data deficient).
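
As an aside, the mapping itself (LC = 0, NT and DD = 1, VU = 2, EN = 3, CR = 4) can also be sketched as a named lookup vector. This is just an alternative way to see the same scores; `ge_lookup` and `toy_cats` are made-up names for illustration.

```r
# Named vector: the names are Red List categories, the values are GE scores.
ge_lookup <- c(LC = 0, NT = 1, DD = 1, VU = 2, EN = 3, CR = 4)

# A made-up vector of categories, in no particular order.
toy_cats <- c("LC", "CR", "NT", "DD", "VU", "EN")

# Indexing by name looks up the score for each category.
toy_GE <- unname(ge_lookup[toy_cats])
toy_GE
```

Any category not in the lookup (for example a misspelt code) would come back as `NA`, which makes unexpected values easy to spot.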

In [None]:
# Create an empty column to store our GE scores.
Accip_data$GE <- NA

# Create a counter to increase with each new ranking, starting at 0 for least concern.
i <- 0

# Create a list to loop through in the order of GE scores.
redlist_cats <- c("LC", "NT", "DD", "VU", "EN", "CR")

# Loop through each different category in the redlist categories.
for (category in redlist_cats){

  # Add the GE score for that category.
  Accip_data[Accip_data$Redlist_cat == category, "GE"] <- i
  
  # Because DD comes after NT, and both are scored as 1, we don't want to change i if the category is NT.
  # We can use an if statement to do this. != means not equal to.
  if (category != "NT"){
    i <- i + 1
  }
}

In [None]:
unique(Accip_data$Redlist_cat)

In [None]:
unique(Accip_data$GE)

Now we'll merge our GE scores with our ED scores in one dataframe.

In [None]:
# Join our trait data (including GE scores) to the ED scores. This time we'll use the 'by' argument rather than change the column names.
Accip_EDGE <- left_join(Accip_data, Accip_ED,  by = c("Jetz_Name" = "species"))
head(Accip_EDGE)

We can now calculate our EDGE scores using some simple maths:

$$EDGE = \ln(1 + ED) + GE \times \ln(2)$$

We have already done the first half. Now we just need to multiply GE scores by the natural log of 2, and combine them.
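
To get a feel for how the two halves trade off, here is a small sketch with made-up numbers (the ED values are invented, and `edge_score` is a hypothetical helper written just for this example). Because the GE term is multiplied by ln(2), each step up in threat category has the same effect as doubling (1 + ED).

```r
# Hypothetical helper implementing the EDGE formula above.
edge_score <- function(ED, GE) log(1 + ED) + GE * log(2)

# A very distinct but least concern species (GE = 0)...
distinct_LC <- edge_score(ED = 20, GE = 0)

# ...against a less distinct but critically endangered species (GE = 4).
ordinary_CR <- edge_score(ED = 5, GE = 4)

c(distinct_LC = distinct_LC, ordinary_CR = ordinary_CR)

# Each GE step adds log(2), the same as doubling (1 + ED).
all.equal(edge_score(5, 4), log((1 + 5) * 2^4))
```

In this toy case the critically endangered species scores higher, even though its ED is four times smaller.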

In [None]:
# The log function uses natural logarithms by default.
Accip_EDGE$EDGE <- Accip_EDGE$EDlog + Accip_EDGE$GE * log(2)
head(Accip_EDGE)

Now we have our EDGE scores, we can see if our conservation priority has changed in light of IUCN categories.

In [None]:
# Find the highest EDGE score.
Accip_EDGE[Accip_EDGE$EDGE == max(Accip_EDGE$EDGE),]

# Find the EDGE score for our previous highest species.
Accip_EDGE[Accip_EDGE$Jetz_Name == "Chelictinia_riocourii",]
Accip_EDGE[Accip_EDGE$Jetz_Name == "Gampsonyx_swainsonii",]

So now we can see that the top conservation priority is *Pithecophaga jefferyi*, the Philippine Eagle. Whilst our previous kites still score highly, their low GE scores mean they're less of a priority than *P. jefferyi*, which is critically endangered.

In reality, you want to preserve more than just one species! We can see from the spread of EDGE scores that there are few species with high EDGE scores, and we would ideally like to create a plan that maximises the conservation of all of them (if it's possible). Based on your own taxa you'll decide what constitutes a high EDGE score.

In [None]:
hist(Accip_EDGE$EDGE, breaks = 20)

In [None]:
# With the filter function we can split our dataframes based on rules for certain columns.
Accip_EDGE %>% filter(EDGE > 5)

### 4. FUDGE Scores

Instead of evolutionary distinctiveness, we might instead be interested in what functional traits each species provides. Species with low functional diversity may be 'functionally redundant' in the ecosystem, whereas those with high functional diversity may provide key ecosystem services that aren't easily replaceable. 

Unlike ED, we will not calculate functional distinctiveness (FD and FDn) in relation to all species within the order worldwide. Instead, we will calculate FD and FDn for just our chosen species. The reason for this is that FD is traditionally used in the context of a specific community or radiation of species (e.g. all birds found within a national park, or all species of lemur).

We need to change row names to species names and remove all the columns except traits. Then normalise our trait data so that body_mass and beak have the same scale (the same variance). 
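
Here is a quick sketch of what putting traits on the same scale means, using made-up body mass and beak values (the numbers are invented). After `scale()`, each column has a mean of 0 and a standard deviation of 1, so a trait measured in large units no longer dominates.

```r
# Made-up trait values on very different scales.
toy_traits <- data.frame(
  body_mass = c(150, 900, 4200, 11000),
  beak = c(14, 22, 41, 60)
)

# scale() centres each column on 0 and divides by its standard deviation.
toy_scaled <- scale(toy_traits)

colMeans(toy_scaled)
apply(toy_scaled, 2, sd)
```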


In [None]:
# Make a copy of Accip Data
Accip_traits <- Accip_EDGE

# Change row names and keep just trait data.
rownames(Accip_traits) <- Accip_traits$Jetz_Name
Accip_traits <- Accip_traits[,6:7]

# Make each column have the same scale.
Accip_traits <- scale(Accip_traits, scale = TRUE)
head(Accip_traits)

To calculate functional diversity we'll create a distance matrix of our traits. Species with similar traits will have smaller 'distances'.
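
As a toy illustration, here are three made-up species placed in a two-trait space (the names and coordinates are invented). By default `dist()` gives the straight-line (Euclidean) distance between each pair of rows.

```r
# Three hypothetical species with two (already scaled) trait values each.
toy_points <- rbind(
  sp_a = c(0, 0),
  sp_b = c(3, 4),
  sp_c = c(0, 1)
)

# Pairwise Euclidean distances between the rows.
toy_dist <- dist(toy_points)

# Convert to a full matrix so it is easier to read.
as.matrix(toy_dist)
```

Here `sp_a` and `sp_c` are close together (distance 1), while `sp_a` and `sp_b` are further apart (distance 5).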

In [None]:
# Create a matrix
traits_matrix <- as.matrix(Accip_traits)

# Converts traits into 'distance' in trait space.
distance_matrix <- dist(traits_matrix)

The next step is to create a new tree using the neighbour-joining method (Saitou & Nei, 1987) (Google for more information!). This will create a tree where branch lengths show how similar species are in trait space rather than evolutionary distance. This function may take a while with more species so don't be alarmed if the group you've chosen takes much longer.

In [None]:
# Create the tree
trait_tree <- nj(distance_matrix)

# Test to see if it's worked. The tree looks different to a normal one because tips don't line up neatly at the present time period like with evolutionary relationships.
plot(trait_tree, cex=0.4)

FD trees can fail if there are too many NAs in the data. If this is the case for your taxa, either impute missing data using genus averages (following Swenson et al. 2013) or remove species or traits with high NA counts from FD analysis. Note, however, that the bird data is very complete so there should be no need to remove NA species from the dataset; this should be a last resort so only do this if the analyses are failing repeatedly.
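
If you do need to impute, one possible way to fill gaps with genus averages is sketched below in base R. This is only an illustration with made-up species and masses, and it assumes the genus can be taken from the part of the name before the underscore; it is not necessarily the exact procedure used by Swenson et al.

```r
# Made-up trait table with some missing body masses.
toy <- data.frame(
  species = c("Gyps_a", "Gyps_b", "Gyps_c", "Aquila_a", "Aquila_b"),
  mass = c(8000, NA, 9000, 4000, NA),
  stringsAsFactors = FALSE
)

# Assumption: the genus is everything before the first underscore.
toy$genus <- sub("_.*", "", toy$species)

# Within each genus, replace NAs with the mean of the known values.
toy$mass_imputed <- ave(toy$mass, toy$genus, FUN = function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})
toy
```

A genus with no known values at all would still be left without a usable number, so check for remaining gaps afterwards.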

With our tree of functional space, we can now calculate FD scores the same way we calculated ED scores. 

In [None]:
# Create a clade matrix from the trait tree, as we did for the bird tree.
tree_matrix <- clade.matrix(trait_tree)

# Calculate FD scores.
FD <- ed.calc(tree_matrix)$spp

# Change the name to FD
colnames(FD)[2] <- "FD"
head(FD)

Log and normalise the data as we did before with ED, so that we can compare FD scores from different groups.

In [None]:
FD$FDlog <- log(1+FD$FD)
FD$FDn <- (FD$FDlog - min(FD$FDlog)) / (max(FD$FDlog) - min(FD$FDlog))

# Find the highest FD score
FD[FD$FDn == max(FD$FDn),]

So the species with the largest FD score is *Gyps himalayensis*, the Himalayan Griffon. Not surprising, seeing as Himalayan Griffons are one of the heaviest flying birds alive today! We can also combine FD with GE scores to see how IUCN categories change our priorities. We use the same formula as before:

$$FUDGE = \ln(1 + FD) + GE \times \ln(2)$$

In [None]:
# Join FD and GE scores
Accip_FUDGE <- left_join(Accip_data, FD, by = c("Jetz_Name" = "species"))

# Calculate FUDGE scores
Accip_FUDGE$FUDGE <- Accip_FUDGE$FDlog + Accip_FUDGE$GE * log(2)
head(Accip_FUDGE)

And do IUCN categories change our conservation priorities?

In [None]:
# Find the highest FUDGE score
Accip_FUDGE[Accip_FUDGE$FUDGE == max(Accip_FUDGE$FUDGE),]

# Find the FUDGE score for Gyps himalayensis
Accip_FUDGE[Accip_FUDGE$Jetz_Name == "Gyps_himalayensis",]

Yes! Funnily enough, the Philippine Eagle is again the top priority. This may be because the GE component of FUDGE scores is weighted much more heavily than the FD component. In fact, looking at FD alone, the Himalayan Griffon has a higher score.

In [None]:
# Get the top 5% of FD scores.
Accip_FUDGE[Accip_FUDGE$FD > quantile(Accip_FUDGE$FD, 0.95),]

In [None]:
# Get the top 5% of FUDGE scores.
Accip_FUDGE[Accip_FUDGE$FUDGE > quantile(Accip_FUDGE$FUDGE, 0.95),]

As we can see, all of the highest FUDGE scores are critically endangered. This has been a criticism of FUDGE scores, that functional diversity isn't weighted highly enough. Of course for our taxa these are probably the species we want to protect, and maybe GE should be the more pressing issue. However, if your taxa have very few CR species, it's worth checking FD scores as well, as you may want to adjust your GE scores to give more weighting to FD.
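
One way to explore this is to put an adjustable weight on the GE term and see whether the ranking changes. The sketch below uses made-up numbers, and both `weighted_fudge` and its `ge_weight` argument are hypothetical names invented for this example, not part of any package or of the published FUDGE method.

```r
# Hypothetical helper: ge_weight = 1 gives the standard FUDGE formula.
weighted_fudge <- function(FDlog, GE, ge_weight = 1) {
  FDlog + ge_weight * GE * log(2)
}

# Two made-up species: one functionally distinct, one highly threatened.
toy <- data.frame(
  species = c("Sp_distinct", "Sp_threatened"),
  FDlog = c(3.0, 0.5),
  GE = c(0, 4),
  stringsAsFactors = FALSE
)

toy$FUDGE_full <- weighted_fudge(toy$FDlog, toy$GE)
toy$FUDGE_half <- weighted_fudge(toy$FDlog, toy$GE, ge_weight = 0.5)
toy
```

With the full weight the threatened species ranks first; with the weight halved the functionally distinct species does. If you change the weighting in your coursework, say so clearly and justify it.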

### 5. EcoEDGE Scores

So we've used EDGE scores to combine extinction risk with evolutionary diversity, and FUDGE scores to do the same with functional diversity. However, both are important, and we might want to combine all three into one metric. This is exactly what EcoEDGE scores do. And we've pretty much done all the hard work already. The equation is similar to the ones we've used, but we give ED and FD scores equal weighting:

$$EcoEDGE = (0.5 \times EDn + 0.5 \times FDn) + GE \times \ln(2)$$

And remember our EDn and FDn scores have already been logged, so we don't need to log them now.

In [None]:
# Merge FD and ED scores.
Accip_EcoEDGE <- left_join(Accip_EDGE, Accip_FUDGE)

# Calculate EcoEDGE scores
Accip_EcoEDGE$EcoEDGE <- (0.5*Accip_EcoEDGE$EDn + 0.5*Accip_EcoEDGE$FDn) + Accip_EcoEDGE$GE*log(2)
head(Accip_EcoEDGE)

We can again look at the spread and see which are the highest species.

In [None]:
# Get the highest scoring species
Accip_EcoEDGE[Accip_EcoEDGE$EcoEDGE == max(Accip_EcoEDGE$EcoEDGE),]

# Get the top 10% of EcoEDGE scores.
Accip_EcoEDGE[Accip_EcoEDGE$EcoEDGE > quantile(Accip_EcoEDGE$EcoEDGE, 0.9),]

# See the spread
hist(Accip_EcoEDGE$EcoEDGE, breaks = 20)

Unsurprisingly, the Philippine Eagle is again the highest-scoring species. However, most birds in Accipitridae are not currently threatened by extinction according to IUCN criteria. For your own taxa, this may be a very different story, and ED and FD scores may matter a lot more. It's also up to you if you want to down-weight GE scores, or you agree that conservation priority goes to those species most threatened with extinction. How you choose to interpret and present your results is up to you, and will depend on the group that you've chosen.

For the practicals and coursework we've chosen to use a simplified version of EcoEDGE scores. If you're interested in learning more, check out this paper which first proposed the use of EcoEDGE scores:

https://onlinelibrary.wiley.com/doi/full/10.1111/ddi.12320



### 6. Plotting a map of IUCN categories.

You may wish to plot maps of your IUCN redlist categories, especially if you're interested in what areas of the world are most threatened by extinction. We can do this easily using similar code from practical 3.

In [None]:
# First load in the spatial packages we'll need
library(raster)
library(rgdal)
library(sf)
library(geosphere)


# Load the data into our environment
load("Accipitridae_maps.Rdata")

# Inspect the maps
class(Accip_maps)
head(Accip_maps)

We'll run the same code as before to compile our spatial dataframe into a raster stack. The only difference is this time we're assigning a value to each species layer, corresponding with their GE rating.

In [None]:
# Start by creating an empty raster stack to store our data in.
raster_stack <- raster(ncols=2160, nrows = 900, ymn = -60)

# Open a for loop which will cycle through each row of our EcoEDGE table. We used i as rownumber because we want multiple columns in different parts of the loop.
for (i in 1:nrow(Accip_EcoEDGE)) {

  # We want to subset our range maps for each species and only the range maps in which it is present now (not historical). 
  map_data_i <- subset(Accip_maps, Accip_maps$SCINAME == Accip_EcoEDGE$Birdlife_Name[i])  
  map_data_i <- subset(map_data_i, map_data_i$PRESENCE %in% c(1,2,3))
  
  # Combine the different ranges (Shapefiles) and convert to a Spatial Polygon.
  map_i <- as_Spatial(st_combine(map_data_i$Shape))
  
  # Convert this Spatial Polygon into a raster with dimensions == raster_stack. Value = 1 if pixel is inside the polygon (range).
  raster_i <- rasterize(map_i, raster_stack)
  
  # Areas with a value of 1 are inside the range. We need to convert this to the GE score.
  raster_i[raster_i == 0] <- NA
  raster_i[raster_i == 1] <- Accip_EcoEDGE$GE[i] 
  
  # Lastly we want to add our finished range map (coded for different GE scores) to our stack to store for later.
  raster_stack <- addLayer(raster_stack, raster_i) 

}

Now we've created our stack of range maps, and each are coded for their IUCN category. In this case we'll take the maximum GE score as the one that's shown. So if two ranges overlap, we take the largest score.
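
To see what taking the maximum across overlapping layers means, here is a toy version in base R, with two small matrices standing in for raster layers. The values are made up, and this is only an illustration of the idea, not how the `raster` functions are implemented. `NA` marks cells outside a species' range.

```r
# Two made-up 2 x 2 "layers" of GE scores; NA means the species is absent.
layer_1 <- matrix(c(0, NA, 2, NA), nrow = 2)
layer_2 <- matrix(c(NA, NA, 4, 1), nrow = 2)
layers <- list(layer_1, layer_2)

# Cell-by-cell maximum across all layers, ignoring NAs where possible.
toy_max <- do.call(pmax, c(layers, list(na.rm = TRUE)))
toy_max
```

Where both species occur the larger score wins (4 rather than 2), where only one occurs its score is kept, and cells with no species stay `NA`.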

In [None]:
# Combine all layers in the stack together to produce a final raster_layer.
# By using fun = max, we're asking the function to pick the highest GE score. You can change this if you wanted the average instead.
# You could also change the indices argument and replace with Accip_EcoEDGE$GE if you wanted a separate map for each GE score.
final_layer <- stackApply(raster_stack, rep(1, nlayers(raster_stack)), fun = max)

# Resize the plot window and plot.
options(repr.plot.width=15, repr.plot.height=15)
plot(final_layer)

So now you can see the spread of GE scores throughout the globe. For your own species you may wish to focus on a specific area of Earth using the `crop()` function. Again we'll use ggplot2 to make them a little nicer to look at.

In [None]:
library(tidyr)
library(ggplot2)

# Convert the raster into a raster dataframe. This will be coordinates of the raster pixels (cols x and y) and the value of the raster pixels (col index_1). Remove rows with NA values from this dataframe.
raster_data <- as.data.frame(final_layer, xy=TRUE) %>% drop_na()
colnames(raster_data) <- c("long", "lat", "index")

# Turn the GE score values to a factor to give a discrete raster rather than continuous values.
raster_data$index <- as.factor(raster_data$index)

# we can then plot this in ggplot. We have to first create the color scheme for our map.
# The six character codes (hexcodes) signify a color. There are many stock colors (e.g. "grey80", "yellow", "orange", "red") but hexcodes give more flexibility.
# Find color hexcodes here: https://www.rapidtables.com/web/color/RGB_Color.html
myColors <- c("grey80", "grey80", "#FCF7B7", "#FFD384", "#FFA9A9")

# Assign names to these colors that correspond to each GE score. We also use the sort() function to make sure the numbers are in ascending order.
names(myColors) <- unique(sort(raster_data$index))

# Create the color scale.
colScale <- scale_fill_manual(name = "IUCN Status", values = myColors)


# Create a plot with ggplot (the plus signs at the end of a line carry over to the next line).
GE_plot <- ggplot() +
  # borders imports all the country outlines onto the map. colour changes the color of the outlines, fill changes the color of the insides of the countries
  # this will grey out any terrestrial area which isn't part of a range.
  borders(ylim = c(-60,90), fill = "grey90", colour = "grey90") +
  
  # borders() xlim is -160/200 to catch the edge of Russia. We need to reset the xlim to -180/180 to fit our raster_stack.
  xlim(-180, 180) + 

  # Add the GE information on top.
  geom_tile(aes(x = long, y = lat, fill = index), data = raster_data) +
  colScale +
  ggtitle("Accipitridae Threat Map") + 
  theme_classic() +
  ylab("Latitude") + 
  xlab("Longitude") + coord_fixed() # coord_fixed() makes ggplot keep our aspect ratio the same, rather than stretching the plot to fit all available space.

# Resize the plotting window and return the plot so we can view it.
options(repr.plot.width=15, repr.plot.height=15)
GE_plot

There's our finished map! Think how you'd change it yourself if you want to include one in your report. Maybe you want to include EDGE scores, or average values instead of max. It's up to you and what you think is the best way to visualise your data!