<div><img style="float: left; padding-right: 3em;" src="https://avatars.githubusercontent.com/u/19476722" width="150" /></div>

# Earth Data Science Coding Challenge!
Before we get started, make sure to read or review the guidelines below. These will help make sure that your code is **readable** and **reproducible**. 

## Don't get **caught** by these Jupyter notebook gotchas

<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*o0HleR7BSe8W-pTnmucqHA.jpeg" width=300 style="padding: 1em; border-style: solid; border-color: grey;" />

  > *Image source: https://alaskausfws.medium.com/whats-big-and-brown-and-loves-salmon-e1803579ee36*

These are the most common issues that will keep you from getting started and delay your code review:

1. When you try to run some code on GitHub Codespaces, you may be prompted to select a **kernel**.
   * The **kernel** refers to the version of Python you are using
   * You should use the **base** kernel, which should be the default option. 
   * You can also use the `Select Kernel` menu in the upper right to select the **base** kernel
2. Before you commit your work, make sure it runs **reproducibly** by clicking:
   1. `Restart` (this button won't appear until you've run some code), then
   2. `Run All`

## Check your code to make sure it's clean and easy to read

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSO1w9WrbwbuMLN14IezH-iq2HEGwO3JDvmo5Y_hQIy7k-Xo2gZH-mP2GUIG6RFWL04X1k&usqp=CAU" height=200 />

* Format all cells prior to submitting (right-click on your code).
* Use expressive variable names so that you and your readers know what each variable contains.
* Use comments to explain your code -- e.g. 
  ```python
  # This is a comment, it starts with a hash sign
  ```

## Label and describe your plots

![Source: https://xkcd.com/833](https://imgs.xkcd.com/comics/convincing.png)

Make sure each plot has:
  * A title that explains where and when the data are from
  * x- and y-axis labels with **units** where appropriate
  * A legend where appropriate


## Icons: how to use this notebook
We use the following icons to let you know when you need to change something to complete the challenge:
  * &#128187; means you need to write or edit some code.
  
  * &#128214;  indicates recommended reading
  
  * &#9998; marks written responses to questions
  
  * &#127798; is an optional extra challenge
  

---

# Introduction to Multispectral Remote Sensing Data: Urban Green Space

For this assignment, you will visualize and quantify differences in vegetation health by neighborhood in Chicago, IL.

We will be developing this code over several weeks in order to practice writing **modular** code. Last week, you should have:
1. Downloaded National Agriculture Imagery Program (NAIP) multispectral data for a single neighborhood in Chicago
2. Plotted True Color (RGB) and Color Infrared (CIR) images of the area
3. Calculated summary statistics of the Normalized Difference Vegetation Index (NDVI) and saved them to a file.

This week, you will:
1. Select two neighborhoods
2. FOR EACH neighborhood, you will then:
   1. Download NAIP multispectral data for the neighborhood
   2. Calculate NDVI
   3. Calculate summary statistics and save them to a file

Eventually, you will use modular Python code to obtain those summary statistics for every neighborhood while making efficient use of the Codespace computing resources. You will create choropleth maps of neighborhood greenery statistics and relate those values to US Census data on income.

YOU DO NOT NEED TO COMPLETE YOUR PORTFOLIO PIECE FOR THIS WEEK, but you will create one for the final analysis, which you can start working on now.

## STEP 1: Get set up

### Package imports
Use the cell below to import the packages you need in the rest of the notebook (and **ONLY** the packages you need in the rest of the notebook).

In addition to packages you have already used, you will need the following submodules:
  * `earthpy.earthexplorer`
  * `rioxarray.merge`

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## STEP 2: Area of Interest

### Select a small number of neighborhoods to test your code and loops

In the cell below, download a shapefile of the City of Chicago neighborhoods from [the City of Chicago Data Portal](https://data.cityofchicago.org/).

YOUR TASK:
1. Find the url for the City of Chicago Neighborhood boundaries as a Shapefile
2. Download and open up the shapefile
3. **Select the 'Humboldt Park' and 'Lincoln Park' neighborhoods for this practice analysis**

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# BEGIN TESTS
ans_hp_gdf = _

points_hp_gdf = 0

if isinstance(ans_hp_gdf, gpd.GeoDataFrame):
    print("\u2705 Great job! Your data are stored in a GeoDataFrame!")
    points_hp_gdf += 2
else:
    print("\u274C Oops, the data are not stored in a GeoDataFrame.")

if round(ans_hp_gdf.to_crs(32616).length.sum(), 2)==32901.1:
    points_hp_gdf += 8
    print("\u2705 You selected the correct neighborhood boundaries!")
else:
    print("\u274C The data were not selected correctly.")

print('You earned {} of 10 points'.format(points_hp_gdf))
points_hp_gdf
# END TESTS

### Site Map

In the cell below, make a plot of the Humboldt Park and Lincoln Park neighborhood boundaries over a tile source map of your choice to verify that your data download worked as expected.

> HINT: Reproject the neighborhood shapefile to `EPSG:3857` (Web Mercator) to get it to display on top of a tile source basemap.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## STEP 3: Set up your loop

To complete this analysis, you will need to run the same code on the 'Humboldt Park' and 'Lincoln Park' neighborhoods. In order to keep your code DRY (Don't Repeat Yourself), there are several code structures you could use...but this notebook will walk you through building a `for` loop.
    
I recommend building `for` loops one line at a time, testing at each additional line, which is what you will do in this notebook. This technique will help you to catch errors and incorrect code while they are small.

The syntax for a `for` loop is:

```python
for my_item in my_iterator:
    do_something()
```

Some things to keep in mind about `for` loops:
  * `my_item` is called the **looping variable**. It changes every time through the loop, cycling through each element of `my_iterator`. 
  * `my_iterator` must be an **iterable**, such as a list or tuple. To iterate through each row of a `DataFrame`, you can use the `df.iterrows()` method, which yields two values for each row: the index and the row itself.
  * Notice the **indentation**: only the indented block of code after the `for/in` statement runs repeatedly. Many other programming languages use braces `{}` or keywords, rather than indentation, to mark the block of code that should be repeated.
  * Another common error with `for` loops is to forget the colon `:` at the end of the `for/in` statement.
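Here is a small, self-contained illustration of these points in plain Python. It uses `enumerate()` on a list of names as a stand-in for `df.iterrows()`: both yield two values per item (an index and the item), so the loop unpacks them into two looping variables.

```python
# A plain list of names standing in for the rows of a GeoDataFrame
neighborhood_names = ["Humboldt Park", "Lincoln Park"]

visited = []
# enumerate() yields (index, item) pairs, much like df.iterrows() yields (index, row)
for index, neighborhood_name in enumerate(neighborhood_names):
    # Indented lines run once per item
    print(index, neighborhood_name)
    visited.append(neighborhood_name)

# This line is not indented, so it runs only once, after the loop finishes
print("Visited", len(visited), "neighborhoods")
```

Note the colon at the end of the `for` line and the consistent four-space indentation of the loop body.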



YOUR TASK:
  1. Copy the following sample code, which iterates through each row of a `GeoDataFrame`, into the cell below:
  
  ```python
  for i, row in gdf.iterrows():
      print(i)
      print(row)
  ```
  
  2. Change `gdf` to your test `GeoDataFrame` with the 'Humboldt Park' and 'Lincoln Park' neighborhoods ONLY.
  3. Run the cell and take a look at what `i` and `row` are. Give your **looping variables** descriptive names.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## STEP 4: Download Data

### Bounding Box

Next, prepare the `BBox` object for each neighborhood. 


YOUR TASK:
  1. Using last week's assignment, put the code to define an `etee.BBox` object into the cell below **inside the same `for` loop** you wrote above.
  
  > What does **inside the `for` loop** mean? It means **indented** after the `for/in` statement. In the previous example, both `print(i)` and `print(row)` were inside the `for` loop.
  
  2. Switch to using the **looping variables** in your loop instead of static variables -- Replace `GeoDataFrame.total_bounds` with the `bounds` attribute of the `geometry` of the row you are working on.

  3. Test your work! How can you make sure that your code does what it is supposed to do? For now, this could be as simple as printing out some details about the `BBox` to make sure that the bounds change each time through the loop.
  
> HINT: You'll notice that printing the `BBox` itself doesn't give you any information about the coordinates. Use the `help()` function to see all the attributes of the `BBox` object, as in:
> `help(etee.BBox)`

> HINT: print out something like `'\nCreating BBox'` as the first line of your loop to separate each iteration of the loop. The `\n` will translate into an empty line.

For an EXTRA CHALLENGE: write tests like the ones in previous assignments, or even use a testing framework like `pytest`. HINT: ChatGPT does a pretty good job of writing tests!
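One way to approach the extra challenge is sketched below in plain Python with `assert` statements. It assumes each row's geometry exposes a `bounds` tuple ordered `(minx, miny, maxx, maxy)`, which is how shapely geometries report it. The coordinates are made up for illustration (they are not real neighborhood bounds), and `check_bounds` is a hypothetical helper name.

```python
# Made-up (minx, miny, maxx, maxy) bounds for two neighborhoods, in the same
# order that shapely's `geometry.bounds` uses. Real values will differ.
example_bounds = {
    "Humboldt Park": (-87.74, 41.89, -87.69, 41.91),
    "Lincoln Park": (-87.66, 41.91, -87.62, 41.94),
}


def check_bounds(bounds):
    """Raise AssertionError if a bounds tuple is not well-formed."""
    min_x, min_y, max_x, max_y = bounds
    assert min_x < max_x, "min x should be less than max x"
    assert min_y < max_y, "min y should be less than max y"


seen_bounds = []
for neighborhood_name, bounds in example_bounds.items():
    print("\nChecking bounds for", neighborhood_name)
    check_bounds(bounds)
    seen_bounds.append(bounds)

# The bounds should change every time through the loop
assert len(set(seen_bounds)) == len(seen_bounds), "bounds repeated between iterations"
print("All bounds checks passed")
```

The same `check_bounds` function could later be moved into a `pytest` test file without changes, since `pytest` reports failing `assert` statements directly.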

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Prepare downloader

Next, prepare your `EarthExplorerDownloader` object with the bounding box and a new label.

YOUR TASK:
  1. Copy your `for` loop to the next cell
  2. Add the next line of code from the previous assignment **inside the `for` loop**. **DO NOT** add the steps for requesting the download OR downloading yet!
  3. Change the `label` parameter to a **lower case** version of the neighborhood name with **dashes `-` instead of spaces**
  4. Test your code. You should check at a minimum that the `.label` and `.bbox` attributes of the `etee.EarthExplorerDownloader` are what you expect them to be.
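A minimal sketch of the label conversion, using only string methods. `make_label` is a hypothetical helper name. For these two names, `neighborhood_name.lower().replace(" ", "-")` would work just as well; splitting and re-joining additionally handles doubled or trailing spaces.

```python
def make_label(neighborhood_name):
    """Turn a neighborhood name into a lower-case label with dashes for spaces."""
    # .split() with no arguments also collapses repeated and leading/trailing spaces
    return "-".join(neighborhood_name.lower().split())


for neighborhood_name in ["Humboldt Park", "Lincoln Park"]:
    label = make_label(neighborhood_name)
    print(label)
```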

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Download data

You know the drill...YOUR TASK:

1. Copy your loop as it stands into the cell below
2. Add the lines to request a download and download the data
3. Run your code and check if it worked, for example by printing out the paths to all the `.tif` files in your two download directories.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Iterate through the neighborhoods to load data

You could continue building the loop you have. However, you would then have to wait for the API check every time you run your loop. It's important to be efficient with your time as well as the computer's!

YOUR TASK:
 1. Copy your loop and delete all the download code
 2. For each neighborhood, generate and print out the path to the data for that neighborhood.
 3. To test, check that each path exists
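A sketch of generating and checking one data directory per neighborhood with `pathlib`. The layout (one sub-directory per label under a data root) is an assumption for illustration, not necessarily where your downloader saves files. A temporary directory stands in for the real data root so the example runs anywhere.

```python
import tempfile
from pathlib import Path


def make_label(neighborhood_name):
    """Lower-case label with dashes instead of spaces."""
    return "-".join(neighborhood_name.lower().split())


# HYPOTHETICAL layout: one sub-directory per neighborhood label under a data root.
# Replace data_root with wherever your downloads actually live.
data_root = Path(tempfile.mkdtemp())
(data_root / "humboldt-park").mkdir()  # simulate a single completed download

data_dirs = {}
for neighborhood_name in ["Humboldt Park", "Lincoln Park"]:
    data_dir = data_root / make_label(neighborhood_name)
    data_dirs[neighborhood_name] = data_dir
    print(data_dir, "exists:", data_dir.exists())
```

Here only the simulated Humboldt Park directory exists, so the check prints `False` for Lincoln Park, which is exactly the kind of problem this test is meant to catch.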

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Get paths to the data

The entire Humboldt Park neighborhood was in a single tile of NAIP data, so we didn't have to worry about spatially merging or **mosaicing** multiple tiles. Notice that the Lincoln Park neighborhood covers two tiles. You'll need to load all the tiles for each neighborhood!

YOUR TASK:
1. Start with your loop from above that generates the data directory for each neighborhood
2. Get a list of all the `.tif` files for each neighborhood
3. Test your code by printing the path to each `.tif` file and whether or not it exists. Note that you will need to use another iterative structure or function to do so; I recommend **list comprehension**.
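A sketch of collecting `.tif` paths with `pathlib` and checking them with a list comprehension. The fake tiles in a temporary directory are placeholders; point `data_dir` at one of your real neighborhood directories instead.

```python
import tempfile
from pathlib import Path

# HYPOTHETICAL example directory with two fake tiles and one non-raster file
data_dir = Path(tempfile.mkdtemp())
for file_name in ["tile_a.tif", "tile_b.tif", "notes.txt"]:
    (data_dir / file_name).touch()

# rglob() also searches sub-directories; sorted() gives a stable order
tif_paths = sorted(data_dir.rglob("*.tif"))

# List comprehension: one (file name, exists) pair per .tif file
path_checks = [(tif_path.name, tif_path.exists()) for tif_path in tif_paths]
print(path_checks)
```

On Linux, including Codespaces, the glob pattern is case-sensitive, so files ending in `.TIF` would not match `*.tif`.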

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Load data

YOUR TASK:
1. Start with your `for` loop that gets a list of paths to all the `.tif` files for each neighborhood.
2. Use **list comprehension** (or another iterative structure) to load each `.tif` file as a `DataArray`.
3. Test that your code works by printing out the coordinates for each `DataArray`, or something similar.

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Merge data

Because the Lincoln Park neighborhood covers two tiles, you will need to merge the `DataArray`s for that neighborhood. Below is an example of how to do that using the `rioxarray.merge` module:

```python
import rioxarray.merge as rxrmerge

merged_da = rxrmerge.merge_arrays(list_of_das)  # list_of_das: a list of DataArrays
```

YOUR TASK:
1. Start with your `for` loop that gets a list of `DataArray`s for each neighborhood.
2. Merge the `DataArray`s. (NOTE: It's OK to merge a single array for now. It will take longer than necessary, but it will still run.)
3. Merging also takes some time. Save your work by putting each merged array in a **Python dictionary** with the name of the neighborhood as the key, using the following code as a guide:

```python
# Create an empty dictionary
my_dict = {}
# Add da under the neighborhood name
my_dict[name] = da
```

4. Test that your code works by again printing the coordinates from each array. Did the merge work as expected?
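The dictionary also lets you skip work you have already done. The sketch below assumes you create the dictionary in its own cell, so that re-running the loop cell does not empty it. `slow_merge` is a hypothetical stand-in for `rxrmerge.merge_arrays` that operates on plain lists, and the tile values are made up.

```python
# HYPOTHETICAL stand-in for rxrmerge.merge_arrays: it records each call and
# "merges" a list of tiles by concatenating their values.
merge_calls = []


def slow_merge(tiles):
    merge_calls.append(len(tiles))
    return [value for tile in tiles for value in tile]


tiles_by_neighborhood = {
    "Humboldt Park": [[1, 2]],
    "Lincoln Park": [[3, 4], [5, 6]],
}

# Created once (in its own notebook cell), then filled by the loop
merged_by_neighborhood = {}


def merge_all():
    for neighborhood_name, tiles in tiles_by_neighborhood.items():
        # Skip neighborhoods that were already merged on an earlier run
        if neighborhood_name in merged_by_neighborhood:
            continue
        merged_by_neighborhood[neighborhood_name] = slow_merge(tiles)


merge_all()
merge_all()  # the second run finds both keys and does no new merging
print(merge_calls)
```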

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

## STEP 5: Compute NDVI and summary statistics

For this last step, I'm asking you to add a lot more code than in previous steps. Use what you learned to break down this larger step into manageable tasks! Some tips:

 * Write some pseudocode (like an outline of your code) before you start. I like to format mine as comments.
 * Write one line at a time
 * Test each line before moving on. The `print()` function is your friend!
 * Think carefully about your variable names inside the `for` loop - e.g. you don't want to name a variable in the `for` loop `humboldt_park_ndvi`, because in later iterations that won't be true -- it will be Lincoln Park NDVI.

> IMPORTANT: If you can't get this loop to work, you can still get a LOT of partial credit by including your pseudocode, comments, and explanations about what you were trying to do.

YOUR TASK:
1. Start a `for` loop iterating over your dictionary of `DataArray`s. You can use the `.items()` method of dictionaries to get an iterable version of a dictionary. For example:

```python
for key, value in my_dictionary.items():
    ...
```

2. Add the code to compute NDVI and its summary statistics for each neighborhood `DataArray`, storing each neighborhood's statistics as a `DataFrame` in a list.
3. Combine all the `DataFrame`s into one with the `pd.concat()` function
4. Save the result to a file.

> HINT: You should NOT use `et.norm_difference()` to compute NDVI. It causes problems because Lake Michigan has 0 values in both the red AND near-infrared (NIR) bands. Instead, compute NDVI yourself with Python arithmetic operators: NDVI = (NIR - red) / (NIR + red).
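The NDVI arithmetic and the zero-value problem can be sketched in plain Python on a few made-up pixel values. This illustrates the formula NDVI = (NIR - red) / (NIR + red); it is not the notebook solution. Pixels where NIR + red is 0 (like Lake Michigan) become NaN instead of raising an error, and the summary statistics skip them. `compute_ndvi` is a hypothetical helper name.

```python
import math
import statistics


def compute_ndvi(nir_values, red_values):
    """Return per-pixel NDVI, using NaN where NIR + red is 0 (e.g. open water)."""
    ndvi_values = []
    for nir, red in zip(nir_values, red_values):
        total = nir + red
        # Avoid dividing by zero: 0 / 0 has no meaningful NDVI
        ndvi_values.append((nir - red) / total if total != 0 else math.nan)
    return ndvi_values


# Made-up pixel values; the last pixel mimics water with 0 in both bands
nir_pixels = [200, 150, 0]
red_pixels = [50, 150, 0]

ndvi_values = compute_ndvi(nir_pixels, red_pixels)

# Drop NaN pixels before computing summary statistics
valid_ndvi = [value for value in ndvi_values if not math.isnan(value)]

# One record per neighborhood; in the notebook this would become a DataFrame row
summary_stats = {
    "neighborhood": "example",
    "mean_ndvi": statistics.mean(valid_ndvi),
    "median_ndvi": statistics.median(valid_ndvi),
    "n_valid_pixels": len(valid_ndvi),
}
print(summary_stats)
```

With `DataArray`s you can write the same formula directly, e.g. `(nir - red) / (nir + red)`. NAIP bands are typically stored as unsigned 8-bit integers, so convert them to float first (for example with `.astype(float)`); otherwise `nir - red` wraps around instead of going negative.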

In [None]:
# YOUR CODE HERE
raise NotImplementedError()