<img style="float: left;" src="earth-lab-logo-rgb.png" width="150" height="150" />

# Earth Analytics Education - EA Python Course Spring 2021

## Important - Assignment Guidelines

1. Before you submit your assignment to GitHub, make sure to run the entire notebook with a fresh kernel. To do this, **restart the kernel and run all cells** (in the menu bar, select Kernel$\rightarrow$Restart & Run All).
2. Always replace the `raise NotImplementedError()` code with your code that addresses the activity challenge. If you don't replace that code, your notebook will not run.

```
# YOUR CODE HERE
raise NotImplementedError()
```

3. Any open-ended questions will have a "YOUR ANSWER HERE" within a Markdown cell. Replace that text with your answer, also formatted using Markdown.
4. **DO NOT RENAME THIS NOTEBOOK FILE!** If the file name changes, the autograder will not grade your assignment properly.
5. When you create a figure, comment out `plt.show()` to ensure the autograder can grade your plots. For figure cells, DO NOT DELETE the code that says `DO NOT REMOVE LINE BELOW`.

```
### DO NOT REMOVE LINE BELOW ###
student_plot1_ax = nb.convert_axes(plt)
```

* Only include the package imports, code, and outputs that are required to run your homework assignment.
* Be sure that your code can be run on any operating system. This means that:
   1. the data should be downloaded in the notebook to ensure it's reproducible
   2. all paths should be created dynamically using `os.path.join`
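
As a minimal sketch of the path guideline above, the snippet below builds a path from its parts rather than hard-coding separators. The base directory `data_dir` is a hypothetical location chosen for illustration; the download step is not shown.

```python
import os

# Hypothetical base directory, used only for illustration
data_dir = os.path.join(os.path.expanduser("~"), "earth-analytics", "data")

# Build the path from its parts instead of hard-coding "/" or "\\"
soap_data_path = os.path.join(
    data_dir, "spatial-vector-lidar", "california", "neon-soap-site")

print(soap_data_path)
```

Because `os.path.join` inserts the separator for the current operating system, the same code should work on Windows, macOS, and Linux.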

## Follow PEP 8 Syntax Guidelines & Documentation

* Run the `autopep8` tool on all cells prior to submitting. (HINT: hold Shift and click the tool to run it on all cells at once!)
* Use clear and expressive names for variables. 
* Organize your code to support readability.
* Check for code line length
* Use comments and white space where they are needed, but use them sparingly.
* Make sure all Python imports are at the top of your notebook and follow PEP 8 order conventions.
* Spell check your Notebook before submitting it.
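
The import ordering mentioned above can be sketched as follows. PEP 8 groups imports as standard library first, then third-party packages, then local modules, with a blank line between groups. The third-party lines are commented out here only so that the sketch runs without extra packages installed; alphabetical ordering within a group is a common convention rather than a PEP 8 rule.

```python
# Group 1: standard library imports
import os
import warnings

# Group 2: third-party imports follow after one blank line, for example:
# import matplotlib.pyplot as plt
# import numpy as np

# Non-import statements come only after all of the imports
warnings.simplefilter("ignore")
print(os.getcwd())
```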

For all of the plots below, be sure to do the following:

* Make sure each plot has a clear TITLE and, where appropriate, label the x and y axes. Be sure to include UNITS in your labels.


### Add Your Name Below 
**Your Name:**

<img style="float: left;" src="colored-bar.png"/>

---

# What the fork?!  Who did this?!

You have started working on a project that was previously managed by a colleague who has left your organization.

However, after trying to run the notebook, you see that the code in this notebook has some issues with: 
* reproducibility, 
* file names, 
* and code that does not execute successfully. 

There are also some PEP 8 issues that make it hard to follow the workflow.

Your task is to:

1. Make this notebook run without errors.
2. Make sure all of the code follows PEP 8 Standards. 
3. Make the code more readable by using white space, and removing unused imports and code.
4. Ensure reproducibility by checking that ALL NEEDED imports are at the top and that the data are downloaded in the notebook. Be sure to remove unused imports.

Throughout the notebook, you will be asked to describe how you modified / improved the code. You will provide your explanation using Markdown cells. 

HINT: you may consider cleaning up the notebook first and then going back and answering the questions. Or consider taking notes about what you changed in each cell and then answering the questions. This is just an optional approach. 

HINT 2: You can use the autopep8 tool in Jupyter to automagically fix spacing issues in your code. **Shift +** hitting the tool icon in Jupyter will run it on every cell!! Use this to make your life easier! 

(However, remember that autopep8 does not fix all PEP 8 issues!)

### IMPORTANT
In the cells below you can delete all instances of: 
```
# YOUR CODE HERE
raise NotImplementedError()
```

In this notebook you will be fixing code - NOT WRITING NEW CODE

![Colored Bar](colored-bar.png)

In [None]:
# Autograding imports - do not modify this cell
import matplotcheck.notebook as nb
import matplotcheck.autograde as ag
import matplotcheck.base as pt

In [None]:
#import required libraries here
import numpy as np
import xarray as xr
import rioxarray as rxr
import warnings
import matplotlib.pyplot as plt
import rioxarray
import earthpy as et
import tweepy as tw
import os
import python
import seaborn as sns
import itertools
import collections
import tweepy as tw
import nltk
from nltk import bigrams
from nltk.corpus import stopwords
import seaborn as sns
from shapely.geometry import mapping
from matplotlib.colors import ListedColormap, BoundaryNorm
from matplotlib.patches import Patch

os.chdir("home/jovyann/earth-a") 
warnings.simplefilter('ignore')

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# Test package imports - DO NOT MODIFY THIS CELL!

try:
    crs = rxr
    print("\u2705 Score! rioxarray has been imported as rxr!")
except NameError:
    print("\u274C rioxarray has not been imported as rxr; please make sure to import it properly.")

try:
    empty_array = xr
    print("\u2705 Score! xarray has been imported as xr!")
except NameError:
    print("\u274C xarray has not been imported as xr; please make sure to import it properly.")

try:
    mapping_func = mapping
    print("\u2705 Score! mapping has been imported from shapely.geometry!")
except NameError:
    print("\u274C mapping has not been imported from shapely.geometry. You need mapping to clip your data.")

## Question 1 (8 points) 

1. What modifications did you make to the cell above to ensure it follows PEP 8 standards? 
2. What modifications did you make to the cell above to ensure it follows reproducibility conventions?

Add your answer to the Markdown cell below.

YOUR ANSWER HERE

In [None]:
#Import SOAP plot locations

soap_data_path = os.path.join("spatial-vector-lidar", "california", "neon-soap-site")

# Open up your plot locations from shapefile that is in the vector_data directory under neon_soap_site
soap_centroids_path = os.path.join(soap_data_path, "Vector_Data", "SOAP_plot_centroids.shp")

soap_plot_points = gpd.read_file(soap_centroids_path)

# YOUR CODE HERE
raise NotImplementedError()

## Question 2 (10 points)

1. Why did the cell above fail to run?
2. Were there any PEP 8 or reproducibility issues in this cell? Explain your answer.

Add your answer to the Markdown cell below.

YOUR ANSWER HERE

In [None]:
# DO NOT MODIFY - AUTOGRADE

points = 0
try:
    # assert is needed so that a wrong type raises AssertionError
    assert isinstance(soap_plot_points, gpd.GeoDataFrame)
    print("Great! Your plot locations object is the correct type - GeoDataFrame")
    points += 6
except AssertionError as message:
    print("AssertionError:",
          "Oops, your plot locations object should be a GeoDataFrame.")
points


# Part 2 - Spatial Things

Open the SOAP lidar CHM file. Read in the data.
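
As a generic illustration of opening a file with a context manager (the `with` statement), here is a minimal sketch that uses only the standard library. The temporary text file and its contents are hypothetical stand-ins for a real data file; raster files opened with rasterio follow the same `with ... as ...:` pattern.

```python
import os
import tempfile

# Create a small throwaway file to open (stands in for a real data file)
tmp_dir = tempfile.mkdtemp()
example_path = os.path.join(tmp_dir, "example.txt")

with open(example_path, "w") as dst:
    dst.write("canopy height\n")

# Read inside the with block; keep the values you need in a variable
with open(example_path) as src:
    contents = src.read()

# The file is closed automatically once the with block ends
print(src.closed)
print(contents.strip())
```

Anything you need from the file after the block ends, such as data or metadata, has to be assigned to a variable while the file is still open.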

In [None]:
soap_chm_path = "C:\\earth-analytics\\data\\spatial-vector-lidar\\california\\neon-soap-site\\2013\\lidar\\SJER_lidarCHM.tif"

# TODO in final release notebook: uncomment these lines
# with rio.open(soap_chm_path) as soap_lidar_chm_src:
#     # Create metadata object to view later
#     soap_lidar_chm_meta = soap_lidar_chm.profile
#     #plotting extent
#     soap_plot_extent = plotting_extent(soap_lidar_chm_src)

# TODO in final release notebook: uncomment this line  
#soap_lidar_chm_src.read(1, masked=True)


# YOUR CODE HERE
raise NotImplementedError()

## Question 3 (12 points)

1. Why is a context manager useful when opening up files?
2. How did you change the code above to make it run? (Note: it should read in the lidar canopy height model for the NEON SOAP field site.)

Add your answer to the Markdown cell below.

YOUR ANSWER HERE

In [None]:
# Ignore this cell for autograded tests


## Plot Soaproot Saddle (SOAP) Plot Locations on Top of the SOAP CHM

In the cell below, create a plot of the canopy height model for SOAP that you imported above. Overlay the plot locations on top of your plot. 

HINT: The lesson below has an example of vector points overlaid on top of a raster. 

https://www.earthdatascience.org/courses/use-data-open-source-python/spatial-data-applications/lidar-remote-sensing-uncertainty/summarize-and-compare-lidar-insitu-tree-height/#create-map-of-plot-locations-sized-by-tree-height

In [None]:
# #plot SOAP Canopy Height Model with the plot locations overlayed on top 
import earthpy.plot as ep

f, ax = plt.subplots(figsize=(5, 8))

ep.plot_bands(soap_lidar_chm, cmap="Greys", ax=ax, title="My Plot")
soap_plot_points.plot(color="purple")



# YOUR CODE HERE
raise NotImplementedError()

## Question 4 (15 points) 

1. How did you fix the code above? (10)
2. What does the rasterio `plotting_extent()` function do? (5)

Add your answer to the Markdown cell below.

YOUR ANSWER HERE

In [None]:
# Ignore this cell for autograded tests


# Plots of Spatial Data

In the cell below, open up the following layers:

* `data/spatial-vector-lidar/global/ne_10m_roads/ne_10m_roads.shp`
* `data/spatial-vector-lidar/california/CA_Counties/CA_Counties_TIGER2016.shp`

Clip the roads layer to the boundary of the Fresno, Madera and Tulare counties. 

In [None]:
#calculate length of roads in Fresno, Madera, and Tular Counties
import earthpy.clip as cl

#ne_roads = gpd.read_file('data//spatial-vector-lidar/global/ne_10m_roads/ne_10m_roads.shp')


#cali_county_boundary = gpd.read_file('data/spatial-vector-lidar/california/CA_Counties/CA_Counties_TIGER2016.shp')


#three_counties = cali_county_boundary[cali_county_boundary['NAME'].isin(["Fresno", "Madera", "Tulare"])]
# Clip the data
# TODO UNCOMMENT THE LINE BELOW FROM RELEASE
#county_roads = gpd.clip_shp(ne_roads, three_counties)


# YOUR CODE HERE
raise NotImplementedError()


## Question 5 (8 points)

In the Markdown cell below, answer the following:

1. What fixes did you implement to ensure the code above ran? (2)
2. Were there any PEP 8 or reproducibility issues in this cell? Explain your answer. (2)
3. Why is the CRS of a spatial data object important when processing data? (2)
4. Can you perform a **spatial join** or a **clip** on two datasets that are in different CRSs? Explain your answer. (2)

YOUR ANSWER HERE

In [None]:
# Ignore this cell for autograded tests

points = 0
try:
    # assert is needed so that a wrong type raises AssertionError
    assert isinstance(county_roads, gpd.GeoDataFrame)
    print("Great! Your clipped roads object is the correct type - GeoDataFrame")
    points += 4
except AssertionError as message:
    print("AssertionError:",
          "Oops, your clipped layer should be a GeoDataFrame.")
points


# Munging Insitu Data

In the cell below, summarize the insitu data for the NEON SOAP field site to calculate max and mean. Rename those summary columns to: `insitu_max` and `insitu_mean`.

In [None]:
# Import SOAP insitu data, calculate mean and max from insitu data, rename the calculated columns: insitu_max, insitu_mean

soap_base_path = os.path.join("spatial-vector-lidar", "california", "neon-soap-site")
soap_insitu_path = os.path.join(soap_base_path, "2013", "insitu", "veg-structure", "D17_2013_SOAP_vegStr.csv")
soap_insitu = pd.read_csv(soap_insitu_path)
soap_insitu_mean_max = soap_insitu[["siteid", "plotid", "stemheight"]]
soap_insitu_mean_max = soap_insitu_mean_max.groupby('plotid', as_index=False).stemheight.agg(['max', 'mean']).reset_index()
soap_insitu_mean_max = soap_insitu_mean_max.rename(columns={'in_max': 'insitu_max', 'in_mean': 'insitu_mean'})


# YOUR CODE HERE
raise NotImplementedError()


## Question 6 (4 points)

In the Markdown cell below, answer the following:
1. What fixes did you implement to ensure the cell above ran?
2. Were there any PEP 8 or reproducibility issues in this cell? Explain your answer.

YOUR ANSWER HERE

In [None]:
# Ignore this cell for autograded tests

points = 0
if set(['insitu_max','insitu_mean']).issubset(soap_insitu_mean_max.columns):
    print("Great! Your Data Frame has the correct columns: 'insitu_max' & 'insitu_mean' ")
    points += 4
else:
    print("Oops, looks like your column names are not correct.")
points

![Colored Bar](colored-bar.png)

## Question 7 (4 points)

In the Markdown cell below, answer the following questions.

1. What `type` of Python object is `soap_lidar_chm`? (2)
    * Your answer should look like this but be the correct type for the object above: **string**.
2. Consider the original lidar data that you imported above to create the `soap_lidar_chm` object in Python. What type of file, and which associated extension(s), are used to import the lidar data? 
    * **example answer: `text file`, `file.txt`** (2)

NOTE: The SOAP object is not a text file! I just gave you examples so you could see what I was looking for here! 

YOUR ANSWER HERE

![Colored Bar](colored-bar.png)

## Question 8 (6 points)

1. What `type` of Python object is `county_roads`? (2)
2. Consider the original roads layer that you imported above to create the `county_roads` object in Python. What type of file, and which associated extension(s), are used to import the `county_roads` layer? (2)
3. What are the 3 file extensions required for a shapefile to be opened in Python? (2)

YOUR ANSWER HERE

![Colored Bar](colored-bar.png)

## Question 9 (4 points)

1. What `type` of Python object is `soap_insitu_mean_max`? (2)
2. Consider the original insitu data that you imported above to create the `soap_insitu_mean_max` object in Python. What type of file, and which associated extension(s), are used to import the insitu data? (2)

YOUR ANSWER HERE

![Colored Bar](colored-bar.png)