# HW2/Final Project Template: Dataset Overview and Use Case Examples
## EDS 220, Fall 2022

Many of the resources provided are adapted from this template guide to notebook creation built for the "EarthCube" project:
https://github.com/earthcube/NotebookTemplates

## Exploring US Marine Protected Areas


#### Authors
- Colleen McCamy, Master of Environmental Data Science Student, (colleenmccamy@bren.ucsb.edu)
- Atahualpa Gomez, Master of Environmental Data Science Student, (atahualpa@ucsb.edu)
- Jared Petry, Master of Environmental Data Science Student, (jaredpetry@ucsb.edu)
- Adelaide Robinson, Master of Environmental Data Science Student, (adelaide_robinson@ucsb.edu)

## Table of Contents

[1. Purpose](#purpose)

[2. Dataset Description](#overview)

[3. Data I/O](#io)

[4. Metadata Display and Basic Visualization](#display)

[5. Use Case Examples](#usecases)

[6. Create Binder Environment](#binder)

[7. References](#references)

<a id='purpose'></a> 
### Notebook Purpose

State the overall purpose of the notebook. Again, this may seem obvious for this particular case, but can come in handy if you're using the notebook later on!


<a id='overview'></a> 
# Data Description
The dataset is part of the MPA Inventory, which is created and managed by the National Oceanic and Atmospheric Administration (NOAA). The inventory records Marine Protected Area (MPA) boundaries in US waters and provides additional classifications and attributes for each MPA. The data are updated annually and synthesize an in-depth [classification system](https://nmsmarineprotectedareas.blob.core.windows.net/marineprotectedareas-prod/media/docs/20200715-mpa-classification.pdf) to better understand the impacts and uses of MPAs.


## Scope
The data spatially cover US waters throughout the contiguous US, Hawai'i, Alaska, and US Territories. All of the MPA boundaries adhere to the IUCN protected area definition: "A clearly defined geographical space, recognized, dedicated, and managed, through legal or other effective means, to achieve the long-term conservation of nature with associated ecosystem services and cultural values." Additional data on boundaries that do not meet the IUCN definition, such as fishery management sites and water quality areas, are available in the [Protected Seas database](https://mpa.protectedseas.net/). Accessing these data at Protected Seas requires a login and approval.

The geographic extent of the data spans -180 to 180 degrees longitude and -15.386142 to 74.709769 degrees latitude. The data are projected using the World Eckert IV projection and the GCS WGS 1984 coordinate reference system.


## File Types
Downloading the data gives you a zipped folder containing a geodatabase, an Excel file, and a PDF. The geodatabase contains the MPA polygon data and associated attributes. The Excel file is an index associating site IDs with site names, and the PDF contains the metadata, including descriptions of the attributes and the meaning of each code and variable.

The analysis below uses only the geodatabase; the metadata PDF and the Excel naming index were consulted outside of Python.

## Data Retrieval
There are several ways to retrieve the data from the MPA Inventory. This notebook downloads the data directly from the [inventory website](https://marineprotectedareas.noaa.gov/dataanalysis/mpainventory/) using [this link](https://marineprotectedareas.noaa.gov/media/data/NOAA_MPAI_2020_IUCN_gdb.zip), unzips the file locally, and then reads it into the notebook with the geopandas package.

There are other ways to read the zipped data into the notebook, but we found this one to be the quickest.
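The download and unzip steps can also be scripted rather than done by hand. The following is a minimal sketch using only the Python standard library; the helper names `download_file` and `unzip_to` and the `../data` destination are illustrative assumptions, not part of the MPA Inventory tooling.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL taken from the download link in this section.
MPA_ZIP_URL = "https://marineprotectedareas.noaa.gov/media/data/NOAA_MPAI_2020_IUCN_gdb.zip"


def download_file(url, dest):
    """Download url to dest unless the file already exists; return the path."""
    dest = Path(dest)
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, dest)
    return dest


def unzip_to(zip_path, out_dir):
    """Extract every member of zip_path into out_dir; return the member names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()


# Example usage (needs network access, so it is left commented out):
# zip_path = download_file(MPA_ZIP_URL, "../data/NOAA_MPAI_2020_IUCN_gdb.zip")
# print(unzip_to(zip_path, "../data"))
```

Skipping the download when the zip file already exists keeps repeated notebook runs fast. The extracted geodatabase folder can then be passed to `gpd.read_file` as in the Data I/O section.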

## Included in the Data
The data provide simple polygons for each MPA boundary and do not include any topological information. For each MPA, additional attributes are included to describe the management and focus of the area. These attributes include which types of activities are allowed or prohibited in the MPA, whether the MPA is temporary or permanent, the level of ecological and cultural protection, and the applicable management of the MPA.


## Assumptions
The data are intended for exploring the status and trends of MPAs, creating customized data visualizations and spatial analyses, and adding MPAs to data portals or online platforms. The data in the MPA Inventory are not designed for regulatory purposes; official boundary descriptions are available through state and federal code.

It is also important to note that, since the 2020 version, only data that align with the IUCN definition are included. Before the 2020 version, the MPA Inventory included sites that do not adhere to the IUCN definition. When comparing the current MPA Inventory data with analyses based on earlier versions, it is important to take these differences into account.

Additional questions about the data? Contact the GIS Manager at the National Marine Protected Areas Center, part of NOAA, at MPAINVENTORY@NOAA.GOV.

<a id='io'></a> 
### Dataset Input/Output 

Next, provide code to read in the data necessary for your analysis. This should be in the following order:

1) Import all necessary packages (matplotlib, numpy, etc)

2) Set any parameters that will be needed during subsequent portions of the notebook. Typical examples of parameters include:
- names of any directories where data are stored
- ranges of years over which data are valid
- any thresholds or latitude/longitude ranges to be used later (e.g. dimensions of NINO3.4 region, threshold SSTA values for El Nino, etc.)

3) Read in the data! If the data files are very large, you may want to consider subsetting the portion of files to be read in (see examples of this during notebooks provided in Weeks 7 and 8).

Here is a good rule of thumb: It's good to aim for a relatively short amount of time needed to read in the data, since otherwise we'll be sitting around waiting for things to load for a long time. A  minute or two for data I/O is probably the max you'll want to target!
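As a rough illustration of steps 2 and 3, the sketch below sets a few parameters and times a block of code against the rule of thumb above. The names `DATA_DIR`, `GDB_NAME`, `MAX_LOAD_SECONDS`, and `timed` are hypothetical choices for this example, and the timed line is only a placeholder for a real read call.

```python
import time
from contextlib import contextmanager
from pathlib import Path

# Hypothetical parameters; adjust them to your own directory layout.
DATA_DIR = Path("../data")
GDB_NAME = "NOAA_MPAI_v2020.gdb"  # geodatabase name used later in this notebook
MAX_LOAD_SECONDS = 120            # the "minute or two" rule of thumb


@contextmanager
def timed(label, results):
    """Store the elapsed wall-clock seconds of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("read", timings):
    gdb_path = DATA_DIR / GDB_NAME  # placeholder: put the real read call here

print(f"read took {timings['read']:.3f} s (target: under {MAX_LOAD_SECONDS} s)")
```

Keeping paths and thresholds in one place near the top of the notebook makes them easy to change when the data move to a different machine.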

In [None]:
#import packages 
import geopandas as gpd
import folium
import matplotlib
import numpy as np
import scipy
from datetime import datetime


## Reading in the data
The data are stored in a shared folder on Taylor and read with the GeoPandas package.

In [None]:
MPA = gpd.read_file("../data/NOAA_MPAI_v2020.gdb") #need to change this to file path on taylor

<a id='display'></a> 
# Metadata and Data Visualization

## Metadata


Field details for the object NOAA_MPAI_2020_IUCN.
| Field    | Alias   | Data Type | Field Description | Description of Values |
| ---------|---------|-----------|------------------ |-------------------|
| OBJECTID |OBJECTID |OID      | Internal feature number| Sequential unique whole numbers that are automatically generated.|
| Anchor   | Anchor |String        | Describes whether anchoring in the site is permitted              |Prohibited, Unrestricted , Restricted|
|Permanence|Permanence |  String | Classification of the permanence of the site| Conditional, Permanent, Temporary. |
| Prot_Focus| Protection Focus|String| Ecological scale of site conservation targets| Focal Resource, Ecosystem.|
|Mgmt_Plan|Management Plan| String| The type of management plan developed for the site| Non-MPA Programmatic Species Management Plan, MPA Programmatic Management Plan, No Management Plan,Non-MPA Programmatic Habitat Management Plan, Site-Specific Management Plan, Non-MPA Programmatic Fisheries Management Plan, Community Agreement.|
|Prot_Lv | Level of Protection| String| Level of legal protection afforded to the site's natural and cultural resources and ecological processes| Uniform Multiple Use, Zoned Multiple Use,  No Impact, No Take,  No Access, To Be Determined, Zoned w/No Take Areas.|
|Field_Vesse| Vessel| String|Describes if vessel access is allowed within the MPA|Restricted, Prohibited, Unrestricted.|
|State|State|String|State name of MPA, or Program, if MPA is federal||
|Shape_Area|Shape Area| Double| Area of feature in internal units squared|Positive real numbers that are automatically generated.|
|Estab_Yr| Year Established| Integer| The year the site was officially designated or established|Positive numbers, or 0 if unknown.|
|Fish_Rstr| Fishing Restrictions| String|Level of restrictions on commercial and/or recreational fishing| Commercial and Recreational Fishing Restricted , Recreational Fishing Prohibited,Commercial Fishing Restricted, Commercial Fishing Prohibited and Recreational Fishing Restricted, Recreational Fishing Restricted, Commercial Fishing Restricted and Recreational Fishing Prohibited, Commercial Fishing Prohibited, Commercial and Recreational Fishing Prohibited, Unknown, No Site Restrictions.|
|Mgmt_Agen|Management Agency| String|Agency responsible for managing the site||
|Pri_Con_Fo| Primary Conservation Focus| String|Represents the primary characteristics of the area that the MPA was established to conserve| |
|Shape_Length| Shape_Length| Double| Length of feature in internal units| Positive real numbers that are automatically generated.|
|Site_ID| Site_ID| String| Unique site identifier assigned by MPAC|Consists of government level identifier combined with unique number value|
|Field_Shape| Shape| Geometry| Feature geometry| Coordinates defining the features.|
|Site_Name| Site Name| String| Official name of the MPA| |
|Constancy| Constancy| String| Classification of the constancy of the site throughout the year| Year-Round, Rotating, Seasonal.|
|IUCNcat|IUCN Category| String| IUCN category assigned to sites that meet the international IUCN protected area definition.| Ia (Strict nature reserve), Ib(Wilderness area), II (National Park), III (Natural monument and Natural feature), IV (Habitat management area and Species management area), V(Protected landscape and Protected seascape), VI (Protected Area with sustainable use of natural resources)|
|Category| Categories| String|Categorical assignment that separates sites into groups based on whether they are IUCN MPAs, Fishery Management Areas, Water Quality/Human Health, Potential "Other Effective Conservation Measure" sites or Other. This field can be used to sort and filter out IUCN MPAs from other kinds of managed areas and will be used in the future to upgrade potential OECMs.| |
|Design| Designation| String| Site designation derived from site name|Derived from name given to the site from its managing agency.|
|URL| Website| String| Website Link| |
|WDPA_Cd| WDPA_Cd| Integer|Unique identifier code assigned to the site in the World Database of Protected Areas| |
|ProSeasID|ProSeasID|String|Unique identifier code assigned to the same site in the ProtectedSeas database.|ProtectedSeas.net|
|AreaMar| Marine Area (sqkm)| Double| Area of each site that does not include any land areas (exception of small islands, seastacks, atolls, etc). Calculated in World Eckert IV by erasing the terrestrial portion as defined by a moderate resolution national shoreline vector| |
|AreaKm| AreaKm| Single| Total area of site. This includes land area when site is mixed land/sea| |
|MarPercent| Marine Percent| Integer| Calculated from total area and marine area values using World Eckert IV projection| Calculated from total area and marine area values using World Eckert IV projection.|
|AreaNT| No Take Area (sqkm)| Double| No Take area calculated for each site in square kilometers (based off World Eckert IV projection).| Calculated based on no-take area per site.|
|Marine|Marine| String| Categorical assignment based on percent area in the marine environment designed to filter out areas that are wholly marine, that are mostly land but intersect the shoreline or that are mixed marine and terrestrial| Marine, Mixed, Interface. | 
|Cons_Focus| Conservation Focus| String| All characteristics of the area the site was established to conserve| Natural Heritage; Natural Heritage, Cultural Heritage and Sustainable Production; Natural Heritage and Cultural Heritage; Natural Heritage and Sustainable Production; Sustainable Production; Cultural Heritage and Sustainable Production; Cultural Heritage|
|Gov_Level| Level of Government| String| Level of government responsible for designating and managing the site| Territorial, Local, State, Federal, Partnership.|
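One detail in the table is easy to miss: `Estab_Yr` uses 0 to mean "unknown", so a naive minimum or mean over that field would be misleading. The sketch below shows the idea in plain Python with made-up years; `known_years` is a hypothetical helper, and the same filtering would need to be applied to the real column before summarizing it.

```python
def known_years(years):
    """Drop the 0 placeholder that the metadata uses for an unknown year."""
    return [y for y in years if y != 0]


# Made-up values for illustration only (not taken from the MPA Inventory).
sample_years = [1972, 0, 1999, 2006, 0]

valid = known_years(sample_years)
earliest, latest = min(valid), max(valid)
unknown = len(sample_years) - len(valid)

print(f"earliest: {earliest}, latest: {latest}, unknown: {unknown}")
```

Without the filter, `min(sample_years)` would report 0 as the earliest establishment year.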

In [None]:
MPA.head() # shows a preview of the data

In [None]:
# describe() analyzes numeric columns by default; include=object summarizes the text columns instead
MPA.describe(include=object)

In [None]:
# Count the number of MPAs per state
MPA["State"].value_counts()

In [None]:
# Summarize by level of government
MPA["Gov_Level"].value_counts()

In [None]:
#CRS information
MPA.crs

In [None]:

# total_bounds returns one array (minx, miny, maxx, maxy) covering the whole dataset
print(MPA.total_bounds)
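`total_bounds` gives a single bounding box for the whole layer, while `bounds` gives one box per feature. The following pure-Python sketch shows the reduction from per-feature boxes to one overall box; it uses made-up boxes and is an illustration of the idea, not the geopandas implementation.

```python
def total_bounds(boxes):
    """Collapse per-feature (minx, miny, maxx, maxy) boxes into one overall box."""
    minx = min(b[0] for b in boxes)
    miny = min(b[1] for b in boxes)
    maxx = max(b[2] for b in boxes)
    maxy = max(b[3] for b in boxes)
    return (minx, miny, maxx, maxy)


# Made-up boxes for illustration only (not taken from the MPA Inventory).
example_boxes = [
    (-120.5, 33.9, -119.4, 34.5),
    (-158.3, 21.2, -157.6, 21.7),
]

overall = total_bounds(example_boxes)
print(overall)  # (-158.3, 21.2, -119.4, 34.5)
```

Note that the values printed for the MPA data are in the units of the layer's CRS, so they only read as degrees of longitude and latitude once the data are in a geographic CRS such as EPSG:4326.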

# Data Visualization
We chose folium for our data visualization. The maps below include two visualizations of the data, one zoomed out to show more of the extent of the data and another zoomed in on the Santa Barbara area.

In [None]:
# Drop the row that causes issues when drawing the map
MPA = MPA[MPA.Site_Name != "Papahanaumokuakea Marine National Monument"]

# Reproject to EPSG:4326 (WGS 84 longitude/latitude), which works with folium
MPA = MPA.to_crs(epsg=4326)

# Base map centered on the US
m = folium.Map(location=[40.956705, -100.278378], zoom_start=3.5)

# Convert each MPA to GeoJSON and add it to the map with a site-name popup
for _, r in MPA.iterrows():
    sim_geo = gpd.GeoSeries(r['geometry']).simplify(tolerance=0.001)  # simplify geometry
    geo_j = sim_geo.to_json()  # serialize to a GeoJSON string
    geo_j = folium.GeoJson(data=geo_j,
                           style_function=lambda x: {'fillColor': 'blue'})  # fill polygons in blue
    folium.Popup(r['Site_Name']).add_to(geo_j)
    geo_j.add_to(m)

m

In [None]:
ma = folium.Map(location=[34.420830, -119.698189], zoom_start=8) # zoom to Santa Barbara
for _, r in MPA.iterrows():
    sim_geo = gpd.GeoSeries(r['geometry']).simplify(tolerance=0.001)
    geo_j = sim_geo.to_json()
    geo_j = folium.GeoJson(data=geo_j,
                           style_function=lambda x: {'fillColor': 'blue'})
    folium.Popup(r['Site_Name']).add_to(geo_j)
    geo_j.add_to(ma)
ma

<a id='usecases'></a> 
### Use Case Examples

This is the "meat" of the notebook, and what will take the majority of the time to present in class. This section should provide:
1) A plain-text summary (1-2 paragraphs) of the use case example you have chosen: include the target users and audience, and potential applicability. 

2) Markdown and code blocks demonstrating how one walks through the desired use case example. This should be similar to the labs we've done in class: you might want to demonstrate how to isolate a particularly interesting time period, then create an image showing a feature you're interested in, for example.

3) A discussion of the results and how they might be extended on further analysis. For example, if there are data quality issues which impact the results, you could discuss how these might be mitigated with additional information/analysis.

Just keep in mind, you'll have roughly 20 minutes for your full presentation, and that goes surprisingly quickly! Probably 2-3 diagnostics is the most you'll be able to get through (you could try practicing with your group members to get a sense of timing).


<a id='binder'></a> 
### Create Binder Environment

The last step is to create a Binder environment for your project, so that we don't have to spend time configuring everyone's environment each time we switch between group presentations. Instructions are below:

 - Assemble all of the data needed in your Github repo: Jupyter notebooks, a README file, and any datasets needed (these should be small, if included within the repo). Larger datasets should be stored on a separate server, and access codes included within the Jupyter notebook as discussed above. 
 
 - Create an _environment_ file: this is a text file which contains information on the packages needed in order to execute your code. The filename should be "environment.yml": an example that you can use for the proper syntax is included in this template repo. To determine which packages to include, you'll probably want to start by displaying the packages loaded in your environment: you can use the command `conda list -n [environment_name]` to get a list.
 
 More information on environment files can be found here:
 https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#

 - Create Binder. Use http://mybinder.org to create a  URL for your notebook Binder (you will need to enter your GitHub repo URL). You can also add a Launch Binder button directly to your GitHub repo, by including the following in your README.md:

```
launch with myBinder
[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/<path to your repo>)
```

<a id='references'></a> 
### References

List relevant references. Here are some additional resources on creating professional, shareable notebooks you may find useful:

1. Notebook sharing guidelines from reproducible-science-curriculum: https://reproducible-science-curriculum.github.io/publication-RR-Jupyter/
2. Guide for developing shareable notebooks by Kevin Coakley, SDSC: https://github.com/kevincoakley/sharing-jupyter-notebooks/raw/master/Jupyter-Notebooks-Sharing-Recommendations.pdf
3. Guide for sharing notebooks by Andrea Zonca, SDSC: https://zonca.dev/2020/09/how-to-share-jupyter-notebooks.html
4. Jupyter Notebook Best Practices: https://towardsdatascience.com/jupyter-notebook-best-practices-f430a6ba8c69
5. Introduction to Jupyter templates nbextension: https://towardsdatascience.com/stop-copy-pasting-notebooks-embrace-jupyter-templates-6bd7b6c00b94  
    5.1. Table of Contents (Toc2) readthedocs: https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/toc2/README.html  
    5.2. Steps to install toc2: https://stackoverflow.com/questions/23435723/installing-ipython-notebook-table-of-contents
6. Rule A, Birmingham A, Zuniga C, Altintas I, Huang SC, et al. (2019) Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks. PLOS Computational Biology 15(7): e1007007. https://doi.org/10.1371/journal.pcbi.1007007. Supplementary materials: example notebooks (https://github.com/jupyter-guide/ten-rules-jupyter) and tutorial (https://github.com/ISMB-ECCB-2019-Tutorial-AM4/reproducible-computational-workflows)
7. Languages supported by Jupyter kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
8. EarthCube notebooks presented at EC Annual Meeting 2020: https://www.earthcube.org/notebooks
9. Manage your Python Virtual Environment with Conda: https://towardsdatascience.com/manage-your-python-virtual-environment-with-conda-a0d2934d5195
10. Venv - Creation of Virtual Environments: https://docs.python.org/3/library/venv.html