<div><img style="float: left; padding-right: 3em;" src="https://avatars.githubusercontent.com/u/19476722" width="150" /></div>

# Earth Data Science Coding Challenge!
Before we get started, make sure to read or review the guidelines below. These will help make sure that your code is **readable** and **reproducible**. 

## Don't get **caught** by these Jupyter notebook gotchas

<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*o0HleR7BSe8W-pTnmucqHA.jpeg" width=300 style="padding: 1em; border-style: solid; border-color: grey;" />

  > *Image source: https://alaskausfws.medium.com/whats-big-and-brown-and-loves-salmon-e1803579ee36*

These are the most common issues that will keep you from getting started and delay your code review:

1. When you try to run some code on GitHub Codespaces, you may be prompted to select a **kernel**.
   * The **kernel** refers to the version of Python you are using
   * You should use the **base** kernel, which should be the default option. 
   * You can also use the `Select Kernel` menu in the upper right to select the **base** kernel
2. Before you commit your work, make sure it runs **reproducibly** by clicking:
   1. `Restart` (this button won't appear until you've run some code), then
   2. `Run All`

## Check your code to make sure it's clean and easy to read

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSO1w9WrbwbuMLN14IezH-iq2HEGwO3JDvmo5Y_hQIy7k-Xo2gZH-mP2GUIG6RFWL04X1k&usqp=CAU" height=200 />

* Format all cells prior to submitting (right click on your code).
* Use expressive names for variables so you or the reader knows what they are. 
* Use comments to explain your code -- e.g. 
  ```python
  # This is a comment, it starts with a hash sign
  ```
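
As a small illustration of the two points above, the sketch below pairs expressive variable names with a comment that explains the calculation. The variable names and the temperature value are made up for this example.

```python
# Convert a temperature reading from degrees Fahrenheit to degrees Celsius.
# The reading below is a made-up value used only for illustration.
temperature_f = 68.0
temperature_c = (temperature_f - 32) * 5 / 9

print(f"{temperature_f} F is {temperature_c:.1f} C")
```

Compare this with names like `t` and `t2`: the code would run the same way, but a reader would have to guess the units.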

## Label and describe your plots

![Source: https://xkcd.com/833](https://imgs.xkcd.com/comics/convincing.png)

Make sure each plot has:
  * A title that explains where and when the data are from
  * x- and y-axis labels with **units** where appropriate
  * A legend where appropriate


## Icons: how to use this notebook
We use the following icons to let you know when you need to change something to complete the challenge:
  * &#128187; means you need to write or edit some code.
  
  * &#128214;  indicates recommended reading
  
  * &#9998; marks written responses to questions
  
  * &#127798; is an optional extra challenge
  

---

# Introduction to Multispectral Remote Sensing Data: Urban Green Space

For this assignment, you will visualize and quantify differences in vegetation health by neighborhood in Chicago, IL.

We will be developing this code over several weeks in order to practice writing **modular** code. To start, you will:
1. Download National Agriculture Imagery Program (NAIP) multispectral data for a single neighborhood in Chicago
2. Plot True Color (RGB) and Color Infrared (CIR) images of the area
3. Calculate summary statistics of the NDVI.

Eventually, you will use modular Python code to obtain those summary statistics for every neighborhood. You will create choropleth maps of neighborhood greenery statistics, and relate those values to US Census data on income.

YOU DO NOT NEED TO COMPLETE YOUR PORTFOLIO PIECE THIS WEEK, but you will create one for the final analysis, and you can start working on it now.

## STEP 1: Get set up

### Package imports
Use the cell below to import the packages you need in the rest of the notebook (and **ONLY** the packages you need in the rest of the notebook).

In addition to packages you have already used, you will need the following submodules:
  * `earthpy.earthexplorer`
  * `rioxarray.merge`

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## STEP 2: Area of Interest

### Site Description

Research the history and context of green spaces in Chicago, IL and write a short site description. You could include information about:
  * climate and native vegetation
  * culture and history of the Humboldt Park neighborhood
  * urban greenspace development programs that may affect NDVI observations

Make sure to cite your sources.

WRITE YOUR SITE DESCRIPTION HERE

### The Humboldt Park Neighborhood

In the cell below, download a shapefile of the City of Chicago neighborhoods from [the City of Chicago Data Portal](https://data.cityofchicago.org/).

YOUR TASK:
1. Find the url for the City of Chicago Neighborhood boundaries as a Shapefile
2. Download and open up the shapefile
3. Select the 'Humboldt Park' neighborhood for this practice analysis

> HINT: The test is expecting a `GeoDataFrame`. Depending on how you get your single row, `geopandas` may turn it into a `GeoSeries`. Just as with selecting a single column as a `DataFrame`, you should be able to use double square brackets `[[]]` to get your result as a `GeoDataFrame`.
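
The single versus double bracket behavior in the hint can be sketched with plain `pandas`, assuming it behaves the same way for your `geopandas` selection. The table, index labels, and column name below are made up and are not the Chicago data.

```python
import pandas as pd

# Made-up table indexed by a hypothetical neighborhood name
neighborhoods = pd.DataFrame(
    {"area_km2": [9.3, 7.4, 4.1]},
    index=["North Side", "Riverside", "Old Town"],
)

# A single label returns a one-dimensional Series
single_brackets = neighborhoods.loc["Riverside"]

# A list containing one label keeps the result two-dimensional
double_brackets = neighborhoods.loc[["Riverside"]]

print(type(single_brackets).__name__)
print(type(double_brackets).__name__)
```

The second form still holds only one row, but it keeps the table structure that the test is looking for.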

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# BEGIN TESTS
ans_hp_gdf = _

points_hp_gdf = 0

if isinstance(ans_hp_gdf, gpd.GeoDataFrame):
    print("\u2705 Great job! Your data are stored in a GeoDataFrame!")
    points_hp_gdf += 2
else:
    print("\u274C Oops, the data are not stored in a GeoDataFrame.")

if round(ans_hp_gdf.to_crs(32616).length.sum(), 2)==14054.35:
    points_hp_gdf += 8
    print("\u2705 You downloaded the correct neighborhood boundaries!")
else:
    print("\u274C The data were not downloaded correctly.")

print('You earned {} of 10 points'.format(points_hp_gdf))
points_hp_gdf
# END TESTS

### Site Map

In the cell below, make a plot of the Humboldt Park neighborhood boundary over a tile source map of your choice to verify that your data download worked as expected.

> HINT: Reproject the neighborhood shapefile to `EPSG:3857` (Web Mercator) to get it to display on top of a tile source basemap.
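
To see why reprojection matters, the sketch below applies the spherical Mercator formulas behind `EPSG:3857` to a single longitude/latitude pair. This is only an illustration of what the coordinates turn into; in your notebook the `.to_crs()` method does the conversion for the whole boundary. The function name and the sample point (roughly in the Chicago area) are assumptions made for this example.

```python
import math

# Sphere radius used by EPSG:3857, in meters
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to Web Mercator meters."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate, made-up point in the Chicago area
x_m, y_m = lonlat_to_web_mercator(-87.70, 41.90)
print(f"x = {x_m:.0f} m, y = {y_m:.0f} m")
```

Tile basemaps are drawn in these meter-based coordinates, so a boundary left in degrees would land near the origin instead of over Chicago.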

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## STEP 3: Data Download

### NAIP Multispectral Data

Use multispectral data from the [National Agriculture Imagery Program](https://naip-usdaonline.hub.arcgis.com/) for this analysis. Multispectral imagery can be enhanced using [false color images](https://earthobservatory.nasa.gov/features/FalseColor/page6.php) or spectral indices in order to highlight phenomena such as vegetation health, wetness, or heat. In this analysis, you will produce a color infrared (CIR) false color image as well as a normalized difference vegetation index (NDVI) image. Both of these methods will enhance differences in vegetation health captured by the data.


YOUR TASK: In the cell below, describe the data you use in this analysis, including a citation. 

WRITE YOUR DATA DESCRIPTION AND CITATION HERE

### Download NAIP Data using the Earth Explorer M2M Interface
The data that you will use this week are available from Earth Explorer. However, you will need more data than you can reasonably download using the web interface for Earth Explorer. Instead, you will need to write some code to download data using the Earth Explorer [Machine to Machine (M2M) interface](https://m2m.cr.usgs.gov/).

**You will need to [sign up for access to the M2M interface](https://ers.cr.usgs.gov/profile/access) to complete this assignment -- please note that it can take a day or two to get access**

YOUR TASK:
  1. Copy the following starter code into the cell below:
     
```python
bbox = etee.BBox(*gdf.total_bounds)
naip_downloader = etee.EarthExplorerDownloader(
    dataset="NAIP", 
    label='hp-green-space', 
    bbox=bbox,
    start='2021-01-01', 
    end='2021-12-31',
    store_credential=True)
naip_downloader.submit_download_request()
naip_downloader.download(override=False)
```

Please leave the label value as `'hp-green-space'` so that I can reproduce your analysis!

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# BEGIN TESTS
points_naip_download = 0

data_dir = os.path.join(
    et.io.HOME, 'earth-analytics', 'data', 'hp-green-space', '*.tif')
if glob(data_dir):
    print("\u2705 Great job! You downloaded and unzipped your data!")
    points_naip_download += 10
else:
    print("\u274C Oops, your data didn't get downloaded.")

print('You earned {} of 10 points'.format(points_naip_download))
points_naip_download
# END TESTS

### Load in your data

YOUR TASK:
 1. Load in the data
 2. Clip the data to the `total_bounds` of the Humboldt Park neighborhood
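
Clipping to `total_bounds` means clipping to a rectangle, not to the neighborhood outline itself. The pure-Python sketch below shows the idea, assuming the `geopandas` ordering of `total_bounds` as `(minx, miny, maxx, maxy)`. The helper functions, the boundary vertices, and the grid of points are all made up; with your raster you would use the clipping tools of the library you loaded it with.

```python
def total_bounds(coords):
    """Return (minx, miny, maxx, maxy) for a list of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def clip_points_to_box(points, bounds):
    """Keep only the points that fall inside the bounding rectangle."""
    minx, miny, maxx, maxy = bounds
    return [
        (x, y) for x, y in points
        if minx <= x <= maxx and miny <= y <= maxy
    ]


# Made-up boundary vertices and a made-up grid of pixel centers
boundary = [(2.0, 1.0), (5.0, 1.5), (4.0, 4.0), (1.5, 3.0)]
pixel_centers = [(float(x), float(y)) for x in range(7) for y in range(6)]

bounds = total_bounds(boundary)
clipped = clip_points_to_box(pixel_centers, bounds)
print(bounds, len(clipped))
```

Note that some of the kept points lie outside the four-sided boundary but inside its rectangle. Also watch the ordering: the test cell compares bounds as x-min, x-max, y-min, y-max, which differs from the `total_bounds` order.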

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# BEGIN TESTS
ans_clip = _
points_clip = 0

correct_bounds = [438507, 442626, 4637577, 4640482]
ans_bounds = [
    round(float(ans_clip.x.min())),
    round(float(ans_clip.x.max())),
    round(float(ans_clip.y.min())),
    round(float(ans_clip.y.max()))]
if ans_bounds==correct_bounds:
    print("\u2705 Great job! You clipped the data!")
    points_clip += 10
else:
    print("\u274C Oops, your data didn't get clipped correctly.")

print('You earned {} of 10 points'.format(points_clip))
points_clip
# END TESTS

## STEP 4: Map NAIP Data

You will use the `.hvplot.rgb()` method to plot color images. For this to work correctly, your data MUST be formatted as expected, with three bands in Red - Green - Blue order. If you include a fourth band, it will be interpreted as transparency (alpha).
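
Selecting and reordering bands can be sketched with a plain `numpy` array shaped `(band, y, x)`. The band order used here (blue, green, red, extra) is deliberately made up and is NOT the NAIP band order, which you still need to look up. With a `DataArray` you would select along the band dimension instead of using positional indexing.

```python
import numpy as np

# Made-up 4-band image shaped (band, y, x).
# Hypothetical band order: 0=blue, 1=green, 2=red, 3=extra
image = np.arange(4 * 2 * 2).reshape(4, 2, 2)

# Pick exactly three bands, in red, green, blue order
rgb_order = [2, 1, 0]
rgb_image = image[rgb_order]

print(image.shape, "->", rgb_image.shape)
```

Dropping the fourth band here is what keeps it from being interpreted as transparency.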

YOUR TASK: reformat your data for plotting
  1. Research the NAIP data to learn which band is which
  2. Research Color InfraRed imagery to find out which band is represented with which color
  3. Create two `DataArrays` - one to plot as RGB, and one to plot as CIR

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### RGB Map

YOUR TASK: Generate a True Color RGB image of the Humboldt Park neighborhood. Make sure that your image is not overly distorted by setting the `data_aspect` parameter.

> HINT: use the `rasterize=True` parameter for dynamic zoom (and faster plotting)

OPTIONAL EXTRA CHALLENGE: overlay the Humboldt Park neighborhood boundary on your plot. NOTE that you will need to reproject the boundary to match your `DataArray` to get it to show up.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### CIR Map

Now make a Color InfraRed map.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## In the cell below, answer the following questions:

1. What does the CIR image highlight?
2. Which band, and which wavelengths contained in that band, allow a CIR image to highlight the thing that you identified above?


WRITE YOUR ANSWERS ABOUT CIR IMAGES HERE

## STEP 5: Compute NDVI and summary statistics

YOUR TASK:
  1. Compute the Normalized Difference Vegetation Index from the NAIP data. You can use the `normalized_diff` function from the `earthpy.spatial` library if you like.
  2. Plot the NDVI.
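
A normalized difference index compares two bands as `(A - B) / (A + B)`; NDVI is commonly computed with the near-infrared band as A and the red band as B. The sketch below is a hand-rolled stand-in to show the arithmetic and is not the `earthpy.spatial.normalized_diff` implementation. The function name and the sample pixel values are made up.

```python
import numpy as np


def normalized_difference(band_a, band_b):
    """Compute (a - b) / (a + b), returning NaN where a + b is zero."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    total = a + b
    # Pre-fill with NaN so pixels skipped by `where` stay NaN
    result = np.full(total.shape, np.nan)
    np.divide(a - b, total, out=result, where=total != 0)
    return result


# Made-up pixel values for three pixels
nir = [200, 50, 0]
red = [100, 50, 0]
ndvi = normalized_difference(nir, red)
print(ndvi)
```

Values fall between -1 and 1, and converting to floating point first avoids the wrap-around that can happen when subtracting unsigned integer pixel values.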

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

EXTRA CHALLENGE (5 pts extra credit): Display your NDVI plot next to the CIR image. You should notice a lot of similarities! NOTE that setting the `frame_width` and `frame_height` to match seems to be the best way to get the two images to be the same size.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

YOUR TASK:
  1. Clip your NDVI data to the boundary of Humboldt Park. HINT: quickly plot your data to make sure this step is working.
  2. Compute:
     * minimum
     * maximum
     * median
     * 25th and 75th percentiles (Use the `np.percentile(da, percentile)` function from the `numpy` package)
     * mean
     * standard deviation
       
     of the NDVI in Humboldt Park.
  3. Save your results to a `pd.DataFrame`, and then export them to a file using the `.to_csv()` method. You may want to use the `index=False` parameter to avoid an extra column.

HINT: To get a single number instead of a `DataArray` when summarizing, you can pass the result to the built-in `float()` function
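
The summary and export steps can be sketched on a small made-up array. This example assumes that pixels outside the clipped boundary are stored as NaN, which is why it uses the NaN-aware `numpy` functions (for example `np.nanpercentile` in place of `np.percentile`). If your clipped data contain no NaN values, the plain versions behave the same way. The column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up NDVI values; NaN stands in for a pixel outside the boundary
ndvi_values = np.array([[0.1, 0.4, np.nan],
                        [0.7, 0.2, 0.6]])

# float() turns each result into a plain Python number
stats = {
    "minimum": float(np.nanmin(ndvi_values)),
    "maximum": float(np.nanmax(ndvi_values)),
    "median": float(np.nanmedian(ndvi_values)),
    "percentile_25": float(np.nanpercentile(ndvi_values, 25)),
    "percentile_75": float(np.nanpercentile(ndvi_values, 75)),
    "mean": float(np.nanmean(ndvi_values)),
    "standard_deviation": float(np.nanstd(ndvi_values)),
}

# One row of statistics; index=False leaves out the row-number column
stats_df = pd.DataFrame([stats])
csv_text = stats_df.to_csv(index=False)
print(csv_text)
```

Calling `.to_csv()` without a path returns the CSV as text, which is handy for checking the layout; passing a file path as the first argument writes the same content to disk.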

In [None]:
# YOUR CODE HERE
raise NotImplementedError()