# HW2/Final Project Template: Dataset Overview and Use Case Examples
## EDS 220, Fall 2022

The following is a template you can use for constructing your draft Jupyter notebooks demonstrating the features and use case examples for your chosen environmental datasets. I've included sections for the major themes that should be covered, but there is also room for customization.

Many of the resources provided are adapted from this template guide to notebook creation built for the "EarthCube" project:
https://github.com/earthcube/NotebookTemplates

## Just Keep Swimming: An Analysis of Global Flood Data 

## Authors

- Amrit Sandhu, UC Santa Barbara (aksandhu@ucsb.edu) <br>
- Elise Gonzales, UC Santa Barbara (efgonzales@ucsb.edu) <br>
- Lewis White, UC Santa Barbara (lewiswhite@ucsb.edu) <br>


## Table of Contents

Include a summary of the various sections included in your notebook, so that users can easily skip to a section of interest. It's also good to include hyperlinks to the different sections, so that clicking on the heading sends one to that section directly. Examples are below; see also this handy guide to adding hyperlinks to Jupyter notebooks:
https://medium.illumidesk.com/jupyter-notebook-little-known-tricks-b0866a558017

The major sections you'll need for HW2 - and your group project - are shown below:

[1. Purpose](#purpose)

[2. Dataset Description](#overview)

[3. Data I/O](#io)

[4. Metadata Display and Basic Visualization](#display)

[5. Use Case Examples](#usecases)

[6. Create Binder Environment](#binder)

[7. References](#references)
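The links above work by pairing each `[title](#anchor)` entry with an `<a id='anchor'></a>` tag placed just before the matching heading. As a minimal sketch, the helpers below generate both pieces from one list of titles and anchor ids, which can help keep them in sync. `build_toc` and `anchor_tag` are hypothetical names, not part of Jupyter or any library.

```python
# Section titles and anchor ids, copied from the table of contents above
SECTIONS = [
    ("Purpose", "purpose"),
    ("Dataset Description", "overview"),
    ("Data I/O", "io"),
    ("Metadata Display and Basic Visualization", "display"),
    ("Use Case Examples", "usecases"),
    ("Create Binder Environment", "binder"),
    ("References", "references"),
]


def build_toc(sections):
    """Return numbered markdown links, one per (title, anchor) pair."""
    return [
        f"[{number}. {title}](#{anchor})"
        for number, (title, anchor) in enumerate(sections, start=1)
    ]


def anchor_tag(anchor):
    """Return the HTML tag to paste just above the heading the link points to."""
    return f"<a id='{anchor}'></a>"


toc_lines = build_toc(SECTIONS)
print("\n\n".join(toc_lines))
```

Pasting the printed lines into a markdown cell, and each `anchor_tag(...)` result above its heading, reproduces the structure used in this template.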

<a id='purpose'></a> 
### Notebook Purpose

State the overall purpose of the notebook. Again, this may seem obvious for this particular case, but can come in handy if you're using the notebook later on!


<a id='overview'></a> 
### Dataset Description

This portion of the notebook should contain a summary description of your chosen environmental dataset. In a few paragraphs, discuss:
- The creators of the dataset: NASA/NOAA/other government agency? Nonprofit? etc.
- Major characteristics of the dataset: global coverage? Spatial resolution? Temporal resolution? Creation date? 
- The file format(s) used to store the data: netCDF? CSV? Other?
- The source/archive you will be using to retrieve the data: Google Earth Engine? Agency data portal? Other API?
- Any known issues with data quality that might be expected to impact the results

Include links to any external resources needed to access the data here, including either the location of files stored on an external server you've set up or the access URL for a pre-existing repository. You can also include any example images you find useful for motivating the choice of dataset (optional).

**Here and throughout the notebook:** use a mix of markdown cells and code blocks to demonstrate your code. Markdown cells should be used to describe the purpose of the code blocks which follow them, but _do not replace_ comments within the code block! Make sure to include comments in the code as well illustrating the specific function of the various lines of code. Your later self - and other users - will thank you!

<a id='io'></a> 
### Dataset Input/Output 

Next, provide code to read in the data necessary for your analysis. This should be in the following order:

1) Import all necessary packages (matplotlib, numpy, etc)

2) Set any parameters that will be needed during subsequent portions of the notebook. Typical examples of parameters include:
- names of any directories where data are stored
- ranges of years over which data are valid
- any thresholds or latitude/longitude ranges to be used later (e.g. dimensions of NINO3.4 region, threshold SSTA values for El Nino, etc.)

3) Read in the data! If the data files are very large, you may want to consider subsetting the portion of files to be read in (see examples of this in the notebooks provided in Weeks 7 and 8).
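As a minimal sketch of step 2 above, a parameter cell might collect the notebook's settings in one place. The collection ID, date range, and Hurricane Isaac coordinates are taken from the code cells in this notebook; the variable names, the `data` directory, and the `check_lon_lat` helper are hypothetical.

```python
from pathlib import Path

# Directory for any locally stored files (hypothetical; this notebook reads from Earth Engine)
DATA_DIR = Path("data")

# Earth Engine collection ID for the Global Flood Database
GFD_COLLECTION_ID = "GLOBAL_FLOOD_DB/MODIS_EVENTS/V1"

# Date range used when filtering flood events
START_DATE = "2010-01-01"
END_DATE = "2018-01-01"

# Map center for the Hurricane Isaac example, stored as (lon, lat)
ISAAC_LON, ISAAC_LAT = -90.2922, 29.4064
ISAAC_ZOOM = 8


def check_lon_lat(lon, lat):
    """Raise ValueError if a coordinate pair is outside valid lon/lat bounds."""
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    return lon, lat


# Catches swapped coordinates early, e.g. a latitude of -90.2922
check_lon_lat(ISAAC_LON, ISAAC_LAT)
```

Keeping coordinates in a single, validated place is one way to avoid mixing up (lon, lat) and (lat, lon) ordering between different mapping calls.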

A good rule of thumb: aim for a relatively short read time, since otherwise we'll be sitting around waiting for things to load. A minute or two for data I/O is probably the max you'll want to target!
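One way to check a notebook against this rule of thumb is to time the read step. The sketch below uses only the standard library. `load_data` is a hypothetical stand-in for whatever call actually reads your data, and the 120-second budget simply restates the "minute or two" guideline.

```python
import time

MAX_IO_SECONDS = 120  # assumed budget, taken from the "minute or two" guideline


def load_data():
    """Hypothetical placeholder for the real read step."""
    time.sleep(0.01)  # stands in for the actual I/O
    return [1, 2, 3]


def timed(func):
    """Call func() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


data, elapsed = timed(load_data)
print(f"Data I/O took {elapsed:.2f} s")
if elapsed > MAX_IO_SECONDS:
    print("Consider subsetting the data before reading it in.")
```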

In [2]:
# Import packages
import ee
import geemap
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Importing more packages
import cartopy
import cartopy.crs as ccrs
from geemap import cartoee

In [3]:
#ee.Authenticate()
ee.Initialize()

In [21]:
#retrieving the data from Earth Engine API
gfd = ee.ImageCollection("GLOBAL_FLOOD_DB/MODIS_EVENTS/V1") 

#checking out our data
#print(gfd)

#gfd.getInfo() 

In [24]:
# Map all floods to generate the satellite-observed historical flood plain.

# geemap.Map takes its center as [lat, lon]
Map = geemap.Map(center=[40, -100], zoom=4)

Map.setOptions('SATELLITE')

# setCenter takes (lon, lat, zoom)
Map.setCenter(-100, 40, 4)

# Sum the 'flooded' band across all flood events
gfdFloodedSum = gfd.select('flooded').sum()

durationPalette = ['C3EFFE', '1341E8', '051CB0', '001133', '#00020a']

Map.addLayer(
  gfdFloodedSum.selfMask(),
  {'min': 0, 'max': 10, 'palette': durationPalette},
  'GFD Satellite Observed Flood Plain')

Map


In [27]:
# Overlay permanent water to distinguish flood water.

jrc = gfd.select('jrc_perm_water').sum().gte(1)

Map.addLayer(
  jrc.selfMask(),
  {'min': 0, 'max': 1, 'palette': 'C3EFFE'},
  'JRC Permanent Water')

Map


In [32]:
# An individual flood event: flooding due to Hurricane Isaac in the USA.

hurricaneIsaacDartmouthId = 3977

hurricaneIsaacUsa = ee.Image(gfd.filterMetadata('id', 'equals', hurricaneIsaacDartmouthId).first())

# geemap.Map takes its center as [lat, lon]
Map = geemap.Map(center=[29.4064, -90.2922], zoom=9)

Map.setOptions('SATELLITE')

# setCenter takes (lon, lat, zoom)
Map.setCenter(-90.2922, 29.4064, 8)

Map.addLayer(hurricaneIsaacUsa.select('flooded').selfMask(),
             {'min': 0, 'max': 1, 'palette': ['00FFFF', '0000FF']},
             'Hurricane Isaac - Inundation Extent')

Map


In [31]:
# The duration (number of days a flood event lasted).
durationPalette = ['C3EFFE', '1341E8', '051CB0', '001133', '#00020a']  # darker blue is longer duration

Map.addLayer(
  hurricaneIsaacUsa.select('duration').selfMask(),
  {'min': 0, 'max': 4, 'palette': durationPalette},
  'Hurricane Isaac - Duration')

Map


In [55]:
gfd.propertyNames().getInfo()
#gfd.get('date_range').getInfo()


['date_range',
 'period',
 'type_name',
 'max_mirrored_version',
 'keywords',
 'thumb',
 'description',
 'source_tags',
 'system:id',
 'provider_url',
 'title',
 'sample',
 'tags',
 'product_tags',
 'provider',
 'system:version',
 'visualization_0_name',
 'visualization_0_bands']

In [54]:
hurricaneIsaacUsa.propertyNames().getInfo()

['dfo_centroid_y',
 'dfo_main_cause',
 'gfd_country_name',
 'dfo_centroid_x',
 'system:id',
 'glide_index',
 'slope_threshold',
 'dfo_severity',
 'system:footprint',
 'threshold_b1b2',
 'system:version',
 'dfo_displaced',
 'id',
 'cc',
 'began',
 'dfo_validation_type',
 'composite_type',
 'system:time_end',
 'dfo_country',
 'countries',
 'dfo_other_country',
 'system:time_start',
 'dfo_dead',
 'gfd_country_code',
 'ended',
 'threshold_type',
 'threshold_b7',
 'system:asset_size',
 'system:index',
 'system:bands',
 'system:band_names']

In [59]:
#test = ee.Image(gfd.filterMetadata('id', 'equals', hurricaneIsaacDartmouthId).first());

collection = ee.ImageCollection("GLOBAL_FLOOD_DB/MODIS_EVENTS/V1").filterDate('2010-01-01', '2018-01-01')

filtered = collection.filter(ee.Filter.eq('countries', 'philippines'))

filtered.getInfo()

{'type': 'ImageCollection',
 'bands': [],
 'id': 'GLOBAL_FLOOD_DB/MODIS_EVENTS/V1',
 'version': 1641990150503727,
 'properties': {'date_range': [950745600000, 1544400000000],
  'period': 0,
  'type_name': 'ImageCollection',
  'max_mirrored_version': 1627592385154748,
  'keywords': ['c2s',
   'cloudtostreet',
   'dartmouth',
   'dfo',
   'flood',
   'gfd',
   'inundation',
   'surface',
   'water'],
  'thumb': 'https://mw1.google.com/ges/dd/images/GLOBAL_FLOOD_DB_MODIS_EVENTS_V1_0_thumb.png',
  'description': '<p>The Global Flood Database contains maps of the extent and\ntemporal distribution of 913 flood events occurring between 2000-2018. For more\ninformation, see\n<a href="https://doi.org/10.1038/s41586-021-03695-w">the associated journal article</a>.</p><p>Flood events were collected from\nthe <a href="https://floodobservatory.colorado.edu/">Dartmouth Flood Observatory</a>\nand used to collect MODIS imagery. The selected\n913 events are those that were successfully mapped (passed q
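The `date_range` property in the output above is stored as milliseconds since the Unix epoch. A minimal sketch for turning those two values into calendar dates, using only the standard library (the helper name `ms_to_date` is hypothetical):

```python
from datetime import datetime, timezone

# 'date_range' values copied from the collection properties printed above
date_range_ms = [950745600000, 1544400000000]


def ms_to_date(ms):
    """Convert milliseconds since the Unix epoch to an ISO date string (UTC)."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date().isoformat()


start, end = (ms_to_date(ms) for ms in date_range_ms)
print(start, end)
```

The same conversion applies to the `time` column returned by `getRegion`, which is why `pd.to_datetime(..., unit='ms')` is used later in this notebook.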

## BELOW CODE DOESN'T WORK

In [33]:
# Longitude, latitude of the Philippines
philippines_lon = 121.7740
philippines_lat = 12.8797

# Create GEE point object for the Philippines lon/lat
philippines_poi = ee.Geometry.Point(philippines_lon, philippines_lat)

# Nominal pixel scale (in m) at which getRegion samples the images.
# This is a resolution, not a search radius, so a value this large
# describes a single enormous pixel.
scale = 1000000000

# Load the Global Flood Database image collection
flood_base = ee.ImageCollection("GLOBAL_FLOOD_DB/MODIS_EVENTS/V1")

# Select the 'flooded' band from each flood event image
flooding = flood_base.select('flooded')

# Store the 'flooded' values at the Philippines point as a list object
flood_philippines = flooding.getRegion(philippines_poi, scale).getInfo()

# Turn the flood information from GEE into a Pandas DataFrame
flood_philippines_df = pd.DataFrame(flood_philippines)

print(flood_philippines_df)



                                       0            1            2  \
0                                     id    longitude     latitude   
1     DFO_1586_From_20000218_to_20000301  4491.576421  4491.576421   
2     DFO_1587_From_20000217_to_20000311  4491.576421  4491.576421   
3     DFO_1595_From_20000405_to_20000425  4491.576421  4491.576421   
4     DFO_1614_From_20000711_to_20000810  4491.576421  4491.576421   
...                                  ...          ...          ...   
1822  DFO_4683_From_20180901_to_20181002 -4491.576421  4491.576421   
1823  DFO_4695_From_20181023_to_20181027 -4491.576421  4491.576421   
1824  DFO_4703_From_20181029_to_20181107 -4491.576421  4491.576421   
1825  DFO_4704_From_20181124_to_20181129 -4491.576421  4491.576421   
1826  DFO_4711_From_20181205_to_20181210 -4491.576421  4491.576421   

                  3        4  
0              time  flooded  
1      950832000000     None  
2      950745600000     None  
3      954892800000     None  
4   
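The repeated ±4491.576421 coordinates in the output above are consistent with the `scale` value of 1000000000 passed to `getRegion`. As a rough check, the sketch below assumes the sampling happens on a degree-based grid and uses the WGS84 equatorial radius to convert meters to degrees. Both are assumptions inferred from the printed numbers, not taken from the Earth Engine documentation.

```python
import math

WGS84_EQUATORIAL_RADIUS_M = 6378137.0  # assumed radius for the meter-to-degree conversion

# Meters spanned by one degree of longitude at the equator
meters_per_degree = 2 * math.pi * WGS84_EQUATORIAL_RADIUS_M / 360

scale_m = 1_000_000_000  # the scale used in the cell above
pixel_size_deg = scale_m / meters_per_degree
pixel_center_deg = pixel_size_deg / 2

print(f"pixel size:   {pixel_size_deg:.6f} degrees")
print(f"pixel center: {pixel_center_deg:.6f} degrees")
```

A pixel center near 4491.58 degrees lies far outside the valid longitude and latitude ranges, which may explain why every `flooded` value comes back as `None`. Under the same assumptions, a scale of 250 m, for example, would correspond to pixels of roughly 0.0022 degrees.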

In [34]:
# Assign the first entry in the data frame to a variable called "headers"
headers = flood_philippines_df.loc[0]  

# Look at what's in there
print(headers)     

0           id
1    longitude
2     latitude
3         time
4      flooded
Name: 0, dtype: object


In [35]:
# Make a new df out of the old one, but assigning the names we just retrieved as actual column headers
philippines_df = pd.DataFrame(flood_philippines_df.values[1:], columns=headers)      

# Make sure it worked
print(philippines_df)  

0                                     id    longitude     latitude  \
0     DFO_1586_From_20000218_to_20000301  4491.576421  4491.576421   
1     DFO_1587_From_20000217_to_20000311  4491.576421  4491.576421   
2     DFO_1595_From_20000405_to_20000425  4491.576421  4491.576421   
3     DFO_1614_From_20000711_to_20000810  4491.576421  4491.576421   
4     DFO_1627_From_20000830_to_20000910  4491.576421  4491.576421   
...                                  ...          ...          ...   
1821  DFO_4683_From_20180901_to_20181002 -4491.576421  4491.576421   
1822  DFO_4695_From_20181023_to_20181027 -4491.576421  4491.576421   
1823  DFO_4703_From_20181029_to_20181107 -4491.576421  4491.576421   
1824  DFO_4704_From_20181124_to_20181129 -4491.576421  4491.576421   
1825  DFO_4711_From_20181205_to_20181210 -4491.576421  4491.576421   

0              time flooded  
0      950832000000    None  
1      950745600000    None  
2      954892800000    None  
3      963273600000    None  
4      96

In [36]:
#adding datetime column with dates in ISO format
philippines_df['datetime'] = pd.to_datetime(philippines_df['time'], unit='ms') 

#making sure our datetime column looks good
print(philippines_df) 

0                                     id    longitude     latitude  \
0     DFO_1586_From_20000218_to_20000301  4491.576421  4491.576421   
1     DFO_1587_From_20000217_to_20000311  4491.576421  4491.576421   
2     DFO_1595_From_20000405_to_20000425  4491.576421  4491.576421   
3     DFO_1614_From_20000711_to_20000810  4491.576421  4491.576421   
4     DFO_1627_From_20000830_to_20000910  4491.576421  4491.576421   
...                                  ...          ...          ...   
1821  DFO_4683_From_20180901_to_20181002 -4491.576421  4491.576421   
1822  DFO_4695_From_20181023_to_20181027 -4491.576421  4491.576421   
1823  DFO_4703_From_20181029_to_20181107 -4491.576421  4491.576421   
1824  DFO_4704_From_20181124_to_20181129 -4491.576421  4491.576421   
1825  DFO_4711_From_20181205_to_20181210 -4491.576421  4491.576421   

0              time flooded   datetime  
0      950832000000    None 2000-02-18  
1      950745600000    None 2000-02-17  
2      954892800000    None 2000-04-

<a id='display'></a> 
### Metadata Display and Basic Visualization

Next, provide some example commands to take a quick look at what is in your dataset. We've done some things along these lines in class by now, but you should include at least one of:

- Metadata display: commands to indicate a) which variables are included in the dataset and their names; b) coordinate information associated with the data variables; c) other important metadata parameters (site names, etc); and d) any important information on missing data
- Basic visualization: a "quick and dirty" plot showing generally what the data look like. Depending on your dataset, this could be either a time series or a map (no fancy coordinate reference system/projection needed yet).
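As one small example of a metadata display, the event ids printed earlier in this notebook (such as `DFO_1586_From_20000218_to_20000301`) appear to encode a Dartmouth Flood Observatory id plus start and end dates. The sketch below parses a few of them with the standard library. The id pattern is inferred from the printed output rather than from the dataset documentation, and `parse_event_id` is a hypothetical helper.

```python
import re
from datetime import datetime

# A few event ids copied from the output printed earlier in this notebook
event_ids = [
    "DFO_1586_From_20000218_to_20000301",
    "DFO_1614_From_20000711_to_20000810",
    "DFO_4711_From_20181205_to_20181210",
]

# Pattern inferred from the ids above (an assumption, not a documented format)
ID_PATTERN = re.compile(r"^DFO_(\d+)_From_(\d{8})_to_(\d{8})$")


def parse_event_id(event_id):
    """Split an event id into its DFO number, start date, end date, and span in days."""
    match = ID_PATTERN.match(event_id)
    if match is None:
        raise ValueError(f"unexpected id format: {event_id}")
    dfo_id, start_text, end_text = match.groups()
    start = datetime.strptime(start_text, "%Y%m%d").date()
    end = datetime.strptime(end_text, "%Y%m%d").date()
    return {
        "dfo_id": int(dfo_id),
        "start": start,
        "end": end,
        "span_days": (end - start).days,
    }


events = [parse_event_id(event_id) for event_id in event_ids]
for event in events:
    print(event["dfo_id"], event["start"], event["end"], event["span_days"])
```

Note that `span_days` here is just the difference between the two dates in the id; it is not the same thing as the per-pixel `duration` band mapped earlier.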

<a id='usecases'></a> 
### Use Case Examples

This is the "meat" of the notebook, and what will take the majority of the time to present in class. This section should provide:
1) A plain-text summary (1-2 paragraphs) of the use case example you have chosen: include the target users and audience, and potential applicability. 

2) Markdown and code blocks demonstrating how one walks through the desired use case example. This should be similar to the labs we've done in class: you might want to demonstrate how to isolate a particularly interesting time period, then create an image showing a feature you're interested in, for example.

3) A discussion of the results and how they might be extended through further analysis. For example, if there are data quality issues which impact the results, you could discuss how these might be mitigated with additional information/analysis.

Just keep in mind, you'll have roughly 20 minutes for your full presentation, and that goes surprisingly quickly! Probably 2-3 diagnostics is the most you'll be able to get through (you could try practicing with your group members to get a sense of timing).


<a id='binder'></a> 
### Create Binder Environment

The last step is to create a Binder environment for your project, so that we don't have to spend time configuring everyone's environment each time we switch between group presentations. Instructions are below:

 - Assemble all of the data needed in your Github repo: Jupyter notebooks, a README file, and any datasets needed (these should be small, if included within the repo). Larger datasets should be stored on a separate server, and access codes included within the Jupyter notebook as discussed above. 
 
 - Create an _environment_ file: this is a text file which contains information on the packages needed in order to execute your code. The filename should be "environment.yml": an example that you can use for the proper syntax is included in this template repo. To determine which packages to include, you'll probably want to start by displaying the packages loaded in your environment: you can use the command `conda list -n [environment_name]` to get a list.
 
 More information on environment files can be found here:
 https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#

 - Create Binder. Use http://mybinder.org to create a URL for your notebook Binder (you will need to enter your GitHub repo URL). You can also add a Launch Binder button directly to your GitHub repo, by including the following in your README.md:

```
launch with myBinder
[![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/<path to your repo>)
```
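For the environment file step above, the sketch below builds the text of a minimal `environment.yml` from the packages imported at the top of this notebook. The environment name, the `conda-forge` channel, and the unpinned versions are assumptions. The conda package names (for example `earthengine-api` for `import ee`) should be checked against the output of `conda list` before use.

```python
# Import name -> assumed conda package name, based on this notebook's import cell
PACKAGES = {
    "ee": "earthengine-api",
    "geemap": "geemap",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
    "cartopy": "cartopy",
}


def environment_yaml(name, channels, packages):
    """Return the text of a simple conda environment file."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {package}" for package in packages]
    return "\n".join(lines) + "\n"


yaml_text = environment_yaml(
    name="gfd-notebook",          # hypothetical environment name
    channels=["conda-forge"],     # assumed channel
    packages=["python", *PACKAGES.values()],
)
print(yaml_text)
```

Saving the printed text as `environment.yml` at the top level of the repo gives Binder a starting point; pinning versions afterwards makes the environment more reproducible.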

<a id='references'></a> 
### References

List relevant references. Here are some additional resources on creating professional, shareable notebooks you may find useful:

1. Notebook sharing guidelines from reproducible-science-curriculum: https://reproducible-science-curriculum.github.io/publication-RR-Jupyter/
2. Guide for developing shareable notebooks by Kevin Coakley, SDSC: https://github.com/kevincoakley/sharing-jupyter-notebooks/raw/master/Jupyter-Notebooks-Sharing-Recommendations.pdf
3. Guide for sharing notebooks by Andrea Zonca, SDSC: https://zonca.dev/2020/09/how-to-share-jupyter-notebooks.html
4. Jupyter Notebook Best Practices: https://towardsdatascience.com/jupyter-notebook-best-practices-f430a6ba8c69
5. Introduction to Jupyter templates nbextension: https://towardsdatascience.com/stop-copy-pasting-notebooks-embrace-jupyter-templates-6bd7b6c00b94  
    5.1. Table of Contents (Toc2) readthedocs: https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/toc2/README.html  
    5.2. Steps to install toc2: https://stackoverflow.com/questions/23435723/installing-ipython-notebook-table-of-contents
6. Rule A, Birmingham A, Zuniga C, Altintas I, Huang SC, et al. (2019) Ten simple rules for writing and sharing computational analyses in Jupyter Notebooks. PLOS Computational Biology 15(7): e1007007. https://doi.org/10.1371/journal.pcbi.1007007. Supplementary materials: example notebooks (https://github.com/jupyter-guide/ten-rules-jupyter) and tutorial (https://github.com/ISMB-ECCB-2019-Tutorial-AM4/reproducible-computational-workflows)
7. Languages supported by Jupyter kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels
8. EarthCube notebooks presented at EC Annual Meeting 2020: https://www.earthcube.org/notebooks
9. Manage your Python Virtual Environment with Conda: https://towardsdatascience.com/manage-your-python-virtual-environment-with-conda-a0d2934d5195
10. Venv - Creation of Virtual Environments: https://docs.python.org/3/library/venv.html