<div><img style="float: left; padding-right: 3em;" src="https://avatars.githubusercontent.com/u/19476722" width="150" /></div>

# Earth Data Science Coding Challenge!
Before we get started, make sure to read or review the guidelines below. These will help make sure that your code is **readable** and **reproducible**. 

## Don't get **caught** by these Jupyter notebook gotchas

<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*o0HleR7BSe8W-pTnmucqHA.jpeg" width=300 style="padding: 1em; border-style: solid; border-color: grey;" />

  > *Image source: https://alaskausfws.medium.com/whats-big-and-brown-and-loves-salmon-e1803579ee36*

These are the most common issues that will keep you from getting started and delay your code review:

1. When you try to run some code on GitHub Codespaces, you may be prompted to select a **kernel**.
   * The **kernel** refers to the version of Python you are using
   * You should use the **base** kernel, which should be the default option. 
   * You can also use the `Select Kernel` menu in the upper right to select the **base** kernel
2. Before you commit your work, make sure it runs **reproducibly** by clicking:
   1. `Restart` (this button won't appear until you've run some code), then
   2. `Run All`

## Check your code to make sure it's clean and easy to read

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSO1w9WrbwbuMLN14IezH-iq2HEGwO3JDvmo5Y_hQIy7k-Xo2gZH-mP2GUIG6RFWL04X1k&usqp=CAU" height=200 />

* Format all cells prior to submitting (right click on your code).
* Use expressive names for variables so you or the reader knows what they are. 
* Use comments to explain your code -- e.g. 
  ```python
  # This is a comment, it starts with a hash sign
  ```

## Label and describe your plots

![Source: https://xkcd.com/833](https://imgs.xkcd.com/comics/convincing.png)

Make sure each plot has:
  * A title that explains where and when the data are from
  * x- and y- axis labels with **units** where appropriate
  * A legend where appropriate


## Icons: how to use this notebook
We use the following icons to let you know when you need to change something to complete the challenge:
  * &#128187; means you need to write or edit some code.
  
  * &#128214;  indicates recommended reading
  
  * &#9998; marks written responses to questions
  
  * &#127798; is an optional extra challenge
  

---

# Introduction to Multispectral Remote Sensing Data: Urban Green Space

For this assignment, you will visualize and quantify differences in vegetation health by neighborhood in Chicago, IL.

We will be developing this code over several weeks in order to practice writing **modular** code. Last week, you should have:
1. Selected two neighborhoods
2. FOR EACH neighborhood:
   1. Downloaded NAIP multispectral data for the neighborhood
   2. Calculated NDVI
   3. Calculated summary statistics and saved them to a file

This week, you will:
1. Add **caching** and **garbage collection** to your analysis from the previous week. This will make sure you are making effective use of your disk space and internet connection.
2. Modularize the workflow using **functions** and/or **classes**
3. Run the workflow for all the Chicago neighborhoods and create a choropleth plot using the summary statistics you calculated.

You should also create a portfolio piece focusing on a different city.

## STEP 1: Get set up

### Package imports
Use the cell below to import the packages you need in the rest of the notebook (and **ONLY** the packages you need in the rest of the notebook).

You may also want to use this cell to define the **earth analytics data directory** and make sure it exists.
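
One way to set up the data directory is sketched below. The folder layout (`earth-analytics/data/<project name>`) and the function name `make_data_dir` are assumptions for illustration, so adjust them to match your own setup.

```python
from pathlib import Path


def make_data_dir(base_dir, project_name):
    """Create the project data directory if needed and return its path."""
    # Hypothetical layout: <base_dir>/earth-analytics/data/<project_name>
    data_dir = Path(base_dir) / "earth-analytics" / "data" / project_name
    # exist_ok=True means re-running the cell will not raise an error
    data_dir.mkdir(parents=True, exist_ok=True)
    return data_dir
```

For example, `make_data_dir(Path.home(), "chicago-green-space")` would create the directory under your home folder the first time the cell runs and simply return the same path on later runs.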

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## STEP 2: Area of Interest

### Select a small number of neighborhoods to test your code and loops

In the cell below, download **and cache** a shapefile of the City of Chicago neighborhoods from [the City of Chicago Data Portal](https://data.cityofchicago.org/).

To cache downloads and calculations, you will need to use a **conditional statement**, like the following example code where `condition` is some boolean value you computed:

```python
if condition:
    do_something()
```

Note that, as with `for` loops, conditionals use **indentation** to determine what happens only when the condition is `True` and what happens no matter what.

Conditional statements can also have multiple parts, although you won't need that for this first caching step:

```python
if condition1:
    do_something()
elif condition2:
    do_something_else()
else:
    do_yet_another_thing()
```

YOUR TASK:
1. IF you don't have a City of Chicago neighborhood file saved already:
   1. Download and open up the shapefile
   2. Save it to a file using the `.to_file()` method of `GeoDataFrame`s (or some other method from earlier in the semester)
2. Load in the City of Chicago dataset from a file
3. Check that your caching is working. One way to do this is to make sure that you print something to indicate when the download is happening.
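
The caching pattern in the task above can be sketched with plain files. This is a generic illustration, not the solution: `load_with_cache` and the `download` argument are hypothetical names, and in your notebook the download step would more likely be `gpd.read_file()` on the data portal URL, followed by `.to_file()`.

```python
from pathlib import Path


def load_with_cache(cache_path, download):
    """Return the bytes at cache_path, downloading them first if needed.

    `download` is a hypothetical function that takes no arguments
    and returns the file contents as bytes.
    """
    cache_path = Path(cache_path)
    if not cache_path.exists():
        # Printing here lets you see whether the cache is being used
        print(f"Downloading to {cache_path}")
        cache_path.parent.mkdir(parents=True, exist_ok=True)
        cache_path.write_bytes(download())
    # This line is not indented under `if`, so it runs no matter what
    return cache_path.read_bytes()
```

If you run the cell twice, the message should print only the first time. That is a quick check that the cache is working.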

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## STEP 3: Download and process raster data

You should have three loops from last week. Convert the operations from each loop into a **function**, starting with the following sample code:

```python
def download_neighborhood_data(name, geometry, start, end):
    """
    Download NAIP raster for a given geometry, start date, and end date

    Parameters
    ----------
    name : str
      The name used to label the download
    geometry : shapely.geometry.Polygon
      The geometry to derive the download extent from.
      Must have a `.bounds` attribute.
    start : str
      The start date as 'YYYY-MM-DD'
    end : str
      The end date as 'YYYY-MM-DD'

    Returns
    -------
    downloader : earthpy.earthexplorer.EarthExplorerDownloader
      Object with information about the download, including the data directory.
    """
    <Put your code here>
    return downloader

# Assumes `start` and `end` ('YYYY-MM-DD' strings) are already defined
for neighborhood_name, details in neigh_gdf.iterrows():
    download_neighborhood_data(
        neighborhood_name, details.geometry, start, end)

```
One important step in writing a function is identifying the **Parameters** and **Returns**. In this case, I have done this for you; for later functions you will need to do this yourself. One way to identify the Parameters is to list each object or variable used in the code (note that this does not usually include imported classes and functions).

I am also supplying you with a **docstring** that explains the Parameters and Returns, and specifies their types. Update the docstring if you decide to do something different for your function. When writing docstrings, please follow the [numpy docstring style guide](https://numpydoc.readthedocs.io/en/latest/format.html#sections).
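
As a smaller example of the same documentation pattern, here is a sketch of a short function with a docstring laid out the way the numpydoc guide describes (section names underlined with hyphens). The function name `calculate_ndvi` is hypothetical. The formula is the standard NDVI definition, (NIR - Red) / (NIR + Red).

```python
def calculate_ndvi(nir, red):
    """
    Calculate the Normalized Difference Vegetation Index (NDVI)

    Parameters
    ----------
    nir : float or array-like
        Near-infrared reflectance. Must support arithmetic operators.
    red : float or array-like
        Red reflectance, with the same shape as `nir`.

    Returns
    -------
    ndvi : float or array-like
        (nir - red) / (nir + red), between -1 and 1 for
        non-negative reflectance values.
    """
    return (nir - red) / (nir + red)


print(calculate_ndvi(0.75, 0.25))  # prints 0.5
```

Notice that every name used in the body (`nir`, `red`) shows up under Parameters, and the one thing the function hands back shows up under Returns.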

YOUR TASK:

1. Replace `<Put your code here>` with the download code from last week
2. Open up your summary statistics file, if it exists.
3. Add a **conditional** to your code so that it will skip this download if the summary statistics **already exist** in your summary statistics file!
   
    > HINT: I did this using the `pass` statement, which does nothing: put `pass` in the `if` branch and the download in the `else` branch. (To skip straight to the next iteration of a loop, use `continue` instead.) This way you can test if the statistics **do** exist in the file, rather than whether they **do not**. However, there are lots of ways to do this -- do what makes sense to you!
    
4. Test that the code still works for the two-neighborhood `GeoDataFrame`. You should also check that the caching is working (although you may need to wait until you have saved some statistics to do this!)
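
One possible shape for the "skip if already done" check is sketched below using only the standard library. It assumes the statistics are stored in a CSV file with a column named `neighborhood`; that column name and both function names are hypothetical. This version uses `continue`, which skips the rest of the current loop iteration.

```python
import csv
from pathlib import Path


def get_processed_names(stats_path):
    """Return the set of neighborhood names already in the stats file."""
    stats_path = Path(stats_path)
    if not stats_path.exists():
        # No file yet means nothing has been processed
        return set()
    with open(stats_path, newline="") as stats_file:
        return {row["neighborhood"] for row in csv.DictReader(stats_file)}


def find_unprocessed(names, stats_path):
    """Return the names that still need to be downloaded and processed."""
    processed = get_processed_names(stats_path)
    to_do = []
    for name in names:
        if name in processed:
            # Statistics already saved: move on to the next neighborhood
            continue
        print(f"Still need to process {name}")
        to_do.append(name)
    return to_do
```

In your own loop, the download call would go where the `print` is. If you have been using `pandas` to read your statistics file, the same test can be written against a `DataFrame` column instead.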

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR TASK: 

1. Write a function for the loop that loads and merges the arrays.
2. Document your function with a docstring
3. Check that your function works for the Lincoln Park neighborhood

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR TASK:

1. Write a function that computes the NDVI summary statistics and adds them to the statistics file (if the statistics are not already present)
    > HINT: use `mode='a'` to *append* a line to the file instead of writing over existing content
    
2. Document your function with a docstring
3. Check that your function works for the Lincoln Park neighborhood
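
To illustrate the `mode='a'` hint, here is a minimal sketch that appends one row of statistics to a CSV file using the standard library. The function name `append_stats` and the column names in the usage note are hypothetical. The same `mode='a'` idea applies if you write your file with `pandas` instead.

```python
import csv
from pathlib import Path


def append_stats(stats_path, stats):
    """Append one row of statistics (a dict) to a CSV file."""
    stats_path = Path(stats_path)
    # Only write the column names when the file is first created
    write_header = not stats_path.exists()
    # mode="a" adds to the end of the file instead of overwriting it
    with open(stats_path, mode="a", newline="") as stats_file:
        writer = csv.DictWriter(stats_file, fieldnames=list(stats))
        if write_header:
            writer.writeheader()
        writer.writerow(stats)
```

Calling `append_stats(path, {"neighborhood": "LINCOLN PARK", "ndvi_mean": 0.2})` twice with different neighborhoods would leave you with one header line and two data lines. Note that this sketch does not check for duplicates, so you would still need to test whether the neighborhood is already in the file.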

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

Putting it all together... YOUR TASK:

1. Create a loop. Start off with just the two-neighborhood `GeoDataFrame`.
2. Run each of your functions in the loop, checking that they work. **MAKE SURE YOU INCLUDE CACHING CODE!**
3. Write a line of code at the end of your loop to **delete the raster data files** once you have saved the statistics you want, checking that it works. Use the `shutil.rmtree()` function.
4. Replace the two-neighborhood `GeoDataFrame` with the full Chicago `GeoDataFrame`
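
The cleanup step can be sketched as follows. The names `summarize_then_delete` and `summarize` are hypothetical stand-ins for your own functions. The point is only the ordering: delete the raster directory *after* the statistics have been computed.

```python
import shutil


def summarize_then_delete(raster_dir, summarize):
    """Compute statistics for the rasters in raster_dir, then delete it.

    `summarize` is a hypothetical function that takes the directory
    path and returns the statistics you want to keep.
    """
    stats = summarize(raster_dir)
    # Only reached if summarize() finished without an error, so the
    # rasters are not deleted before the statistics exist
    shutil.rmtree(raster_dir)
    return stats
```

Be careful with `shutil.rmtree()`: it deletes the directory and everything inside it without asking, so double check the path you pass in (for example, by printing it first while you test on the two-neighborhood `GeoDataFrame`).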

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## STEP 4: Plot

YOUR TASK:
1. Join your `GeoDataFrame` of Chicago neighborhoods with your NDVI statistics `DataFrame`
2. Create a choropleth plot using one of the statistics for the color scale
3. Write a plot headline and description.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()