# Open Ocean
# Open Earth Foundation

<h1> Step 2: calculate different metrics for each modulating factor </h1>

This notebook is the second part of the workflow and follows `Step1_Curate_IUCN_RedList.ipynb`.

<h2> Modulating Factor 1: Normalize Biodiversity Score </h2>

Species diversity refers to the variety of different species present in a given area, as well as their abundance and distribution. This includes the number of species, their relative abundances, and how evenly or unevenly distributed they are.
Our proposal is to apply the Simpson and Shannon indices to obtain a local value for the MPA and to normalize the value per square kilometer.
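
For reference, the Shannon index is commonly defined as $H = -\sum_i p_i \ln p_i$, where $p_i$ is the proportion of individuals that belong to species $i$. The sketch below computes it for a toy abundance vector. The helper name `shannon_index` and the sample counts are illustrative assumptions and are not taken from `MBU_utils`.

```python
import numpy as np

def shannon_index(abundance):
    """Shannon index H = -sum(p * ln(p)) from per-species counts (hypothetical helper)."""
    counts = np.asarray(abundance, dtype=float)
    counts = counts[counts > 0]          # species with zero individuals contribute nothing
    if counts.size == 0:
        return float("nan")              # undefined for an empty community
    p = counts / counts.sum()            # relative abundance of each species
    return float(-(p * np.log(p)).sum())

# Toy example (made-up counts): four equally abundant species
h_even = shannon_index([10, 10, 10, 10])   # equals ln(4) for a perfectly even community
h_single = shannon_index([25])             # a single species gives zero diversity
```

With this formulation a single-species community evaluates to `-0.0` in floating point, which is numerically equal to zero.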

### Data needed for this project

- Species names
- Species abundance
- Species distribution

Next Steps:

1. Find a database or datasets with abundance and distribution information for the entire ACMC
2. If that isn't realistic, simulate the data

Options:
1. IUCN RED List and simulate abundance information
2. GBIF species information and simulate abundance and distribution information

### 1. Importing libraries.

In [1]:
# load basic libraries
import os
import glob
import boto3

import math
import numpy as np
import pandas as pd

# to plot
import matplotlib.pyplot as plt

# to manage shapefiles
import shapely
import geopandas as gpd
from shapely.geometry import Polygon, Point, box
from shapely.ops import linemerge, unary_union, polygonize

In [2]:
import fiona; #help(fiona.open)

**Import OEF functions**

In [3]:
%load_ext autoreload

In [4]:
#Run this to reload the python file
%autoreload 2
from MBU_utils import *

### 2. Load data

In [5]:
ACMC = gpd.read_file('https://ocean-program.s3.amazonaws.com/data/raw/MPAs/ACMC.geojson')

In [None]:
%%time
df = gpd.read_file('https://ocean-program.s3.amazonaws.com/data/processed/ACMC_IUCN_RedList/gdf_ACMC_IUCN_range_status_filtered.shp')

In [6]:
%%time
df = gpd.read_file('/Users/maureenfonseca/Desktop/Data-Oceans/ACMC_IUCN_data/gdf_ACMC_IUCN_range_status_filtered.shp')

CPU times: user 7min 20s, sys: 3.72 s, total: 7min 24s
Wall time: 7min 25s


In [7]:
grid = create_grid(ACMC, grid_shape="hexagon", grid_size_deg=1.)

### 3. Preliminary calculations


In [10]:
df = df[0:100]

In [11]:
fake_abundance = np.random.randint(50, size = (len(df)))

In [12]:
df['abundance'] = fake_abundance
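
Because `np.random.randint(50, ...)` draws integers from 0 to 49, some species can receive an abundance of zero, and the values change on every run. A minimal sketch of a reproducible alternative follows, assuming a fixed seed and a lower bound of 1 are acceptable for the simulation. The seed value and the bounds are illustrative choices, and `n_species` stands in for `len(df)`.

```python
import numpy as np

n_species = 100                       # stand-in for len(df)
rng = np.random.default_rng(seed=42)  # seed chosen arbitrarily, for reproducibility only

# integers() excludes the upper bound by default, so this draws from 1..50
fake_abundance = rng.integers(low=1, high=51, size=n_species)
```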

**Simpson Index**

$\text{D} = 1-(\frac{\sum{n(n-1)}}{N(N-1)})$

n = the total number of organisms of a particular species

N = the total number of organisms of all species

The value of ***D*** ranges between 0 and 1. With this index, 1 represents infinite diversity and 0, no diversity.

In [9]:
def simpson(gdf_abundance_col):
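    """
    Calculates the Simpson Index (D) from the given abundance values.
    
    Parameters:
        - gdf_abundance_col (array-like of int): abundance (number of organisms) of each species
        
    Returns:
        - D (float): 1 - sum(n(n-1)) / (N(N-1)), where N is the total abundance
                     (undefined when N < 2, because the denominator is zero)
    """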
    
    abundance = np.array(gdf_abundance_col)
    N = np.sum(abundance)
    
    numerator = sum([n*(n-1) for n in abundance])
    denominator = N * (N - 1)
    
    D = 1 - (numerator / denominator)
    
    return D
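
As a sanity check, the formula can be evaluated by hand on a small made-up community and compared with code. The sketch below restates the same calculation in a self-contained helper. The name `simpson_from_counts` and the guard for fewer than two organisms are additions for illustration.

```python
import numpy as np

def simpson_from_counts(abundance):
    """Simpson index D = 1 - sum(n(n-1)) / (N(N-1)) (hypothetical helper)."""
    counts = np.asarray(abundance, dtype=float)
    N = counts.sum()
    if N < 2:
        return float("nan")          # N(N-1) is zero, so D is undefined
    return float(1 - (counts * (counts - 1)).sum() / (N * (N - 1)))

# Made-up counts for five species: N = 15, sum of n(n-1) = 2 + 56 + 0 + 0 + 6 = 64
d_mixed = simpson_from_counts([2, 8, 1, 1, 3])   # 1 - 64/210, about 0.695
d_single = simpson_from_counts([15])             # one species only: D = 0
```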

In [49]:
def simpson(roi, gdf, grid_gdf, gdf_col_name):
    """
    This function calculates the Simpson Index per grid cell
    
    input(s):
    roi <shapely polygon in CRS WGS84:EPSG 4326>: region of interest or the total project area
    gdf <geopandas dataframe>: contains at least the name of the species, the distribution polygons of each of them 
                             : and their abundance
    grid_gdf <geopandas dataframe>: consists of polygons of grids typically generated by the gridding function
                                  : contains at least a geometry column and a unique grid_id
    gdf_col_name <string>: corresponds to the name of the abundance information column in the gdf
    
    output(s):
    grid_gdf <geopandas dataframe>: the input grid with an additional column ('Simpson') containing the
                                  : Simpson Index for that grid or geometry
    """
    
    #Clip all the geometries in the gdf to the ROI
    gdf = gpd.clip(gdf.set_crs(epsg=4326, allow_override=True), roi)

    #Sum the abundances of all overlapping species
    overlap = sum_values(gdf, str(gdf_col_name))

    #Merge the summed abundances of overlapping geometries with the grid gdf
    merged = gpd.sjoin(overlap, grid_gdf, how='left')
    merged['n_value'] = overlap['sum_overlaps']
    
    #Calculate the numerator per row and the denominator over all rows
    num = merged['n_value']*(merged['n_value']-1)
    den = np.sum(merged['n_value'])*(np.sum(merged['n_value'])-1)
    merged['num'] = num

    #Dissolve the DataFrame by 'index_right' (the grid cell index), summing the numerators per cell
    dissolve = merged.dissolve(by="index_right", aggfunc={'num': 'sum'})
    
    #Calculate the Simpson index per grid cell
    dissolve['simpson'] = 1-(dissolve['num']/den)

    #Write the index back to the matching grid cells
    grid_gdf.loc[dissolve.index, 'Simpson'] = dissolve.simpson.values

    #Normalization factor (computed here but not stored in grid_gdf)
    Norm_factor = grid_gdf['Simpson']/grid_gdf['Simpson'].max()
    
    return grid_gdf

In [50]:
simpson(ACMC, df, grid, 'abundance')

| | geometry | Grid_ID | Shannon | area_sqkm | mbu_shannon | mbu_shannon_n | Simpson | mbu_simpson | mbu_simpson_n |
|---|---|---|---|---|---|---|---|---|---|
| 0 | POLYGON ((-88.32201 2.15063, -88.82201 3.01666... | 0 | | 31986.471199 | | | | | |
| 1 | POLYGON ((-88.32201 3.88268, -88.82201 4.74871... | 1 | -0.0 | 31936.727495 | -0.0 | -0.0 | 1.0 | 31936.727495 | 31936.727495 |
| 2 | POLYGON ((-88.32201 5.61474, -88.82201 6.48076... | 2 | -0.0 | 31858.4651 | -0.0 | -0.0 | 1.0 | 31858.4651 | 31858.4651 |
| 3 | POLYGON ((-88.32201 7.34679, -88.82201 8.21281... | 3 | -0.0 | 31751.740608 | -0.0 | -0.0 | 1.0 | 31751.740608 | 31751.740608 |
| 4 | POLYGON ((-86.82201 1.28461, -87.32201 2.15063... | 4 | | 31954.393897 | | | | | |
| 5 | POLYGON ((-86.82201 3.01666, -87.32201 3.88268... | 5 | -0.0 | 31919.081068 | -0.0 | -0.0 | 1.0 | 31919.081068 | 31919.081068 |
| 6 | POLYGON ((-86.82201 4.74871, -87.32201 5.61474... | 6 | 1.977455 | 31855.35422 | 62992.541734 | 18531.965971 | 0.999931 | 31853.151407 | 31853.151407 |
| 7 | POLYGON ((-86.82201 6.48076, -87.32201 7.34679... | 7 | 2.477857 | 31763.257814 | 78704.808427 | 23154.40513 | 0.999918 | 31760.638419 | 31760.638419 |
| 8 | POLYGON ((-86.82201 8.21281, -87.32201 9.07884... | 8 | | 31642.856236 | | | | | |
| 9 | POLYGON ((-85.32201 2.15063, -85.82201 3.01666... | 9 | -0.0 | 31938.178096 | -0.0 | -0.0 | 1.0 | 31938.178096 | 31938.178096 |
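
The values in the table are consistent with `mbu_simpson` being the `Simpson` index multiplied by `area_sqkm`, and with the `_n` column dividing that product by the largest index value in the grid. That relationship is inferred from the numbers above, not from the `MBU_utils` source, so the following is only a sketch under that assumption. It uses a small hand-made frame in place of `grid`.

```python
import pandas as pd

# Hand-made stand-in for the grid; values are rounded copies of three rows from the table
toy_grid = pd.DataFrame({
    "Grid_ID": [5, 6, 7],
    "Simpson": [1.0, 0.999931, 0.999918],
    "area_sqkm": [31919.081068, 31855.354220, 31763.257814],
})

# Assumed definition: diversity index weighted by the cell area
toy_grid["mbu_simpson"] = toy_grid["Simpson"] * toy_grid["area_sqkm"]

# Assumed normalization: divide by the largest index value found in the grid
toy_grid["mbu_simpson_n"] = toy_grid["mbu_simpson"] / toy_grid["Simpson"].max()
```

Because the largest `Simpson` value in this sample is 1.0, the normalized column equals the unnormalized one, as it does in the output above.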
