# HW2/Final Project Template: Dataset Overview and Use Case Examples
## EDS 220, Fall 2022

The following is a template you can use for constructing your draft Jupyter notebooks demonstrating the features and use case examples for your chosen environmental datasets. I've provided sections addressing the major themes that should be covered, but there is also room for customization.

Many of the resources provided are adapted from this template guide to notebook creation built for the "EarthCube" project:
https://github.com/earthcube/NotebookTemplates

## An Analysis of Global Flood Data 

## Authors

- Amrit Sandhu, UC Santa Barbara (aksandhu@ucsb.edu) <br>
- Elise Gonzales, UC Santa Barbara (efgonzales@ucsb.edu) <br>
- Lewis White, UC Santa Barbara (lewiswhite@ucsb.edu) <br>


## Table of Contents

Include a summary of the various sections included in your notebook, so that users can easily skip to a section of interest. It's also good to include hyperlinks to the different sections, so that clicking on the heading sends one to that section directly. Examples are below; see also this handy guide to adding hyperlinks to Jupyter notebooks:
https://medium.illumidesk.com/jupyter-notebook-little-known-tricks-b0866a558017

The major sections you'll need for HW2 - and your group project - are shown below:

[1. Purpose](#purpose)

[2. Dataset Description](#overview)

[3. Data I/O](#io)

[4. Metadata Display and Basic Visualization](#display)

[5. Use Case Examples](#usecases)

[6. Create Binder Environment](#binder)

[7. References](#references)

<a id='purpose'></a> 
### Notebook Purpose

The goal of our project is to map flood events in the Philippines from 2000 to 2018. The Philippines is an archipelago of thousands of islands in Southeast Asia with multiple climates, including tropical rainforest, tropical monsoon, tropical savanna, humid subtropical, and oceanic. The country experiences earthquakes on a daily basis and is surrounded by many active volcanoes. Climate change has led to more extreme weather patterns, including heavier rainfall and storm surges. A series of tropical storms over the years has led to mass destruction and loss of human life.


<a id='overview'></a> 
### Dataset Description

The Global Flood Database was created by the Dartmouth Flood Observatory (DFO) and Cloud to Street for use in global flood risk management and mitigation. This ImageCollection was funded by Google Earth Outreach and includes 18 years (2000-2018) of satellite-based flood data available on Google Earth Engine. The dataset includes 913 flood events, drawn from 12,719 scenes, that were successfully mapped to capture their extent and temporal distribution using the Terra and Aqua MODIS sensors. Each event is represented as an image in the ImageCollection and can be filtered by date, country, and DFO “original ID”.
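
As a rough illustration of the filtering described above, the sketch below builds a date window and applies date and country filters. The helper names `event_window` and `filter_events` are hypothetical. The sketch also assumes that each image carries a `system:time_start` timestamp and a `cc` property holding country codes such as `PHL`, so check the collection's property list before relying on either.

```python
def event_window(start_year, end_year):
    """Return (start, end) ISO dates covering start_year through end_year.

    The end date is the first day of the following year because
    ImageCollection.filterDate treats the end of the range as exclusive.
    """
    if end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")
    return f"{start_year}-01-01", f"{end_year + 1}-01-01"


def filter_events(collection, country_code, start_year, end_year):
    """Filter an ee.ImageCollection of flood events by country and year range.

    Assumption: the 'cc' image property holds country codes such as 'PHL'.
    """
    import ee  # imported here so event_window can be used without Earth Engine

    start, end = event_window(start_year, end_year)
    return (collection
            .filterDate(start, end)
            .filter(ee.Filter.stringContains('cc', country_code)))


print(event_window(2000, 2018))
```

Once the collection has been loaded (for example as `gfd` in the Data I/O section), `filter_events(gfd, 'PHL', 2000, 2018)` would return the events tagged with the Philippines, provided the assumptions above hold.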

The dataset includes many image properties, such as estimated fatalities, country code, main cause of the flood, and the centroid longitude and latitude. One image property that the visualizations will focus on is the severity of a flood event. Severity is broken down into three categories: 1 represents large floods that caused significant damage and had at least a 5-to-15-year interval since the last large flood, 1.5 is assigned to very large events with a recurrence interval greater than 15 years but less than 100 years, and 2 denotes extreme events with a recurrence interval of more than 100 years.
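
The severity codes can be easier to read in plots and tables when translated to text. The following is a minimal sketch of such a lookup. `SEVERITY_LABELS` and `describe_severity` are hypothetical names introduced here, and the labels simply paraphrase the three categories described above.

```python
# Labels paraphrase the three severity categories described in the text.
SEVERITY_LABELS = {
    1.0: "large event (significant damage, 5-to-15-year interval since the last large flood)",
    1.5: "very large event (recurrence interval over 15 and under 100 years)",
    2.0: "extreme event (recurrence interval over 100 years)",
}


def describe_severity(value):
    """Translate a severity code (1, 1.5, or 2) into a readable label."""
    try:
        return SEVERITY_LABELS[float(value)]
    except (KeyError, TypeError, ValueError):
        # Anything outside the three documented codes is reported, not guessed.
        return f"unknown severity code: {value!r}"


for code in (1, 1.5, 2):
    print(code, "->", describe_severity(code))
```

If the severity is stored in an image property (assumed here to be named `dfo_severity`), the value returned by `image.get('dfo_severity').getInfo()` could be passed straight to `describe_severity`.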

The bands in the dataset include the duration of each flood along with a value of 0 or 1 indicating whether flooding was observed. Cloud coverage is reported as the number of cloud-free observations, in days, between the start and end day of each event, and also as the percentage of clear-view observations during the event.

Useful links:

- Google Earth Engine - Global Flood Database v1 (2000-2018)
- Global Flood Database Website
- “Satellite imaging reveals increased proportion of population exposed to floods” (Tellman et al. 2021)

<a id='io'></a> 
### Dataset Input/Output 

Next, provide code to read in the data necessary for your analysis. This should be in the following order:

1) Import all necessary packages (matplotlib, numpy, etc)

2) Set any parameters that will be needed during subsequent portions of the notebook. Typical examples of parameters include:

3) Read in the data! If the data files are very large, you may want to consider subsetting the portion of files to be read in (see examples of this during notebooks provided in Weeks 7 and 8).

In [1]:
# Import packages
import ee
import geemap
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

In [2]:
#authenticate and initialize google earth engine API

#ee.Authenticate()
ee.Initialize()

In [3]:
#retrieving the data from Earth Engine API. 

#gfd stands for global flood data
gfd = ee.ImageCollection("GLOBAL_FLOOD_DB/MODIS_EVENTS/V1") 

#checking out our data
#print(gfd)

#gfd.getInfo() 


<a id='display'></a> 
### Metadata Display and Basic Visualization

Next, provide some example commands to take a quick look at what is in your dataset. We've done some things along these lines in class by now, but you should include at least one of:

- Metadata display: commands to indicate a) which variables are included in the dataset and their names; b) coordinate information associated with the data variables; c) other important metadata parameters (site names, etc); and d) any important information on missing data
- Basic visualization: a "quick and dirty" plot showing generally what the data look like. Depending on your dataset, this could be either a time series or a map (no fancy coordinate reference system/projection needed yet).
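
One possible way to handle the metadata display is sketched below. `summarize_collection` is a hypothetical helper that takes an `ee.ImageCollection` and returns its size plus the band and property names of its first image. It assumes every image in the collection shares the same bands and properties, which is worth verifying for your dataset.

```python
def summarize_collection(collection):
    """Collect basic metadata from an ee.ImageCollection as plain Python values.

    Each getInfo() call sends a request to the Earth Engine servers, so the
    summary is limited to a few small lookups instead of the full collection.
    """
    first_image = collection.first()
    return {
        "n_images": collection.size().getInfo(),
        "band_names": first_image.bandNames().getInfo(),
        "property_names": first_image.propertyNames().getInfo(),
    }
```

With the collection loaded as `gfd`, calling `summarize_collection(gfd)` would return a dictionary that can be printed or turned into a small table.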

In [4]:
## Map all floods to generate the satellite-observed historical flood plain.

#initialize map, centered around kansas ~ the geographic centroid of the USA
#geemap.Map expects center as [latitude, longitude]
Map = geemap.Map(center=[40, -100], zoom=4)

#using a satellite true color map
Map.setOptions('SATELLITE')

#set center of the map
Map.setCenter(-100, 40, 4)

#sum the 'flooded' band across all events to get the entire historical flood plain
gfdFloodedSum = gfd.select('flooded').sum()

#create a palette for the raster data. Here, darker blue means a pixel was flooded in more events.
durationPalette = ['C3EFFE', '1341E8', '051CB0', '001133', '00020A']

#add the flood data layer to our previously initialized map
Map.addLayer(
  gfdFloodedSum.selfMask(),
  {'min': 0, 'max': 10, 'palette': durationPalette},
  'GFD Satellite Observed Flood Plain')

#plot the map
Map


In [5]:
##Overlay permanent water to distinguish flood water from permanent water. 

#sum the permanent water band across all events. gte(1) keeps pixels flagged as permanent water in at least one event
perm_water = gfd.select('jrc_perm_water').sum().gte(1)

#adding the perm water layer (a light cyan color) to the map
Map.addLayer(
  perm_water.selfMask(),
  {'min': 0, 'max': 1, 'palette': 'C3EFFE'},
  'JRC Permanent Water')

#plotting the map
Map


<a id='usecases'></a> 
### Use Case Examples

This is the "meat" of the notebook, and what will take the majority of the time to present in class. This section should provide:
1) A plain-text summary (1-2 paragraphs) of the use case example you have chosen: include the target users and audience, and potential applicability. 

2) Markdown and code blocks demonstrating how one walks through the desired use case example. This should be similar to the labs we've done in class: you might want to demonstrate how to isolate a particularly interesting time period, then create an image showing a feature you're interested in, for example.

3) A discussion of the results and how they might be extended on further analysis. For example, if there are data quality issues which impact the results, you could discuss how these might be mitigated with additional information/analysis.

Just keep in mind, you'll have roughly 20 minutes for your full presentation, and that goes surprisingly quickly! Probably 2-3 diagnostics is the most you'll be able to get through (you could try practicing with your group members to get a sense of timing).


<a id='binder'></a> 
### Create Binder Environment

The last step is to create a Binder environment for your project, so that we don't have to spend time configuring everyone's environment each time we switch between group presentations. Instructions are below:

 - Assemble everything needed in your GitHub repo: Jupyter notebooks, a README file, and any datasets needed (these should be small, if included within the repo). Larger datasets should be stored on a separate server, and access codes included within the Jupyter notebook as discussed above. 
 
 - Create an _environment_ file: this is a text file which contains information on the packages needed in order to execute your code. The filename should be "environment.yml": an example that you can use for the proper syntax is included in this template repo. To determine which packages to include, you'll probably want to start by displaying the packages loaded in your environment: you can use the command `conda list -n [environment_name]` to get a list.
 
 More information on environment files can be found here:
 https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#

 - Create Binder. Use http://mybinder.org to create a URL for your notebook Binder (you will need to enter your GitHub repo URL). You can also add a Launch Binder button directly to your GitHub repo by including the following in your README.md:

```
launch with myBinder
[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/<path to your repo>)
```
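
To make the `<path to your repo>` placeholder concrete, here is a small sketch that assembles the badge markdown from a GitHub user name, repo name, and git ref. `binder_badge` is a hypothetical helper, and the `user/repo/ref` layout (with `HEAD` as the default ref) is an assumption about the mybinder.org link format, so compare the result with the URL that mybinder.org generates for your repo.

```python
def binder_badge(user, repo, ref="HEAD"):
    """Return README markdown for a Launch Binder badge for a GitHub repo."""
    # Assumed link layout: https://mybinder.org/v2/gh/<user>/<repo>/<ref>
    launch_url = f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}"
    return f"[![Binder](https://mybinder.org/badge.svg)]({launch_url})"


# 'my-user' and 'my-repo' are placeholders, not a real repository.
print(binder_badge("my-user", "my-repo"))
```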

<a id='references'></a> 
### References

List relevant references. Here are some additional resources on creating professional, shareable notebooks you may find useful:

1. Notebook sharing guidelines from reproducible-science-curriculum: https://reproducible-science-curriculum.github.io/publication-RR-Jupyter/
2. Guide for developing shareable notebooks by Kevin Coakley, SDSC: https://github.com/kevincoakley/sharing-jupyter-notebooks/raw/master/Jupyter-Notebooks-Sharing-Recommendations.pdf
3. Guide for sharing notebooks by Andrea Zonca, SDSC: https://zonca.dev/2020/09/how-to-share-jupyter-notebooks.html
4. Jupyter Notebook Best Practices: https://towardsdatascience.com/jupyter-notebook-best-practices-f430a6ba8c69
5. Introduction to Jupyter templates nbextension: https://towardsdatascience.com/stop-copy-pasting-notebooks-embrace-jupyter-templates-6bd7b6c00b94  
    5.1. Table of Contents (Toc2) readthedocs: https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/toc2/README.html  
    5.2. Steps to install toc2: https://stackoverflow.com/questions/23435723/installing-ipython-notebook-table-of-contents
6. Rule A, Birmingham A, Zuniga C, Altintas I, Huang SC, et al. (2019) Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks. PLOS Computational Biology 15(7): e1007007. https://doi.org/10.1371/journal.pcbi.1007007. Supplementary materials: example notebooks (https://github.com/jupyter-guide/ten-rules-jupyter) and tutorial (https://github.com/ISMB-ECCB-2019-Tutorial-AM4/reproducible-computational-workflows)
7. Languages supported by Jupyter kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
8. EarthCube notebooks presented at EC Annual Meeting 2020: https://www.earthcube.org/notebooks
9. Manage your Python Virtual Environment with Conda: https://towardsdatascience.com/manage-your-python-virtual-environment-with-conda-a0d2934d5195
10. Venv - Creation of Virtual Environments: https://docs.python.org/3/library/venv.html