<img style="float: left;" src="earth-lab-logo-rgb.png" width="150" height="150" />

# Earth Analytics Education - Bootcamp Course Fall 2020

## Important - Assignment Guidelines

1. Before you submit your assignment to GitHub, make sure to run the entire notebook with a fresh kernel. To do this, **restart the kernel and run all cells** (in the menu bar, select Kernel$\rightarrow$Restart & Run All).
2. Always replace the `raise NotImplementedError()` code with your code that addresses the activity challenge. If you don't replace that code, your notebook will not run.

```
# YOUR CODE HERE
raise NotImplementedError()
```

3. Any open-ended questions will have a "YOUR ANSWER HERE" within a Markdown cell. Replace that text with your answer, also formatted using Markdown.
4. **DO NOT RENAME THIS NOTEBOOK FILE!** If the file name changes, the autograder will not grade your assignment properly.
5. When you create a figure, comment out `plt.show()` to ensure the autograder can grade your plots. For figure cells, DO NOT DELETE the code that says `DO NOT REMOVE LINE BELOW`.

```
### DO NOT REMOVE LINE BELOW ###
student_plot1_ax = nb.convert_axes(plt)
```

* Only include the package imports, code, and outputs that are required to run your homework assignment.
* Be sure that your code can be run on any operating system. This means that:
   1. the data should be downloaded in the notebook to ensure it's reproducible
   2. all paths should be created dynamically using `os.path.join()`
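For example, here is a minimal sketch of building a data path with `os.path.join()`. It assumes the `spatial-vector-lidar` download sits in a folder of that name inside your working directory; adjust the folder names if your download is organized differently.

```python
import os

# Pass each folder and file name as its own argument; os.path.join inserts
# the separator used by the current operating system ("/" or "\\").
roads_path = os.path.join("spatial-vector-lidar", "california",
                          "madera-county-roads", "tl_2013_06039_roads.shp")

print(roads_path)
```

Because no separator is typed by hand, the same line works on Windows, macOS, and Linux.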

## Follow PEP 8 Syntax Guidelines & Documentation

* Run the `autopep8` tool on all cells prior to submitting (HINT: hit shift + the tool to run it on all cells at once!)
* Use clear and expressive names for variables. 
* Organize your code to support readability.
* Check your code line length (PEP 8 recommends a maximum of 79 characters per line).
* Use comments and white space sparingly, only where they are needed.
* Make sure all Python imports are at the top of your notebook and follow PEP 8 order conventions.
* Spell check your Notebook before submitting it.

For all of the plots below, be sure to do the following:

* Make sure each plot has a clear TITLE and, where appropriate, label the x and y axes. Be sure to include UNITS in your labels.


### Add Your Name Below 
**Your Name:**

<img style="float: left;" src="colored-bar.png"/>

---

# Week 12 Spatial Vector Data Homework Template

## Assignment Background

In this assignment, you will explore an area in California called the [San Joaquin Experimental Range](https://www.neonscience.org/field-sites/field-sites-map/SJER) known by the acronym SJER.

The data that you will use for this week is available from **earthpy** using the following download: 

`et.data.get_data('spatial-vector-lidar')`

To begin, add all of the needed package imports, download the data and set your working directory in the cells below. 
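A minimal sketch of the working-directory part of this setup, using only the standard library. It assumes your data live in `earth-analytics/data` inside your home directory, which is what the autograder check below looks for. In your own cell you would also import your other packages (for example `import earthpy as et`) and download the data with `et.data.get_data('spatial-vector-lidar')`; Earth Lab materials often use `et.io.HOME` in place of `os.path.expanduser("~")`.

```python
import os

# Build ~/earth-analytics/data in an OS-independent way.
working_dir = os.path.join(os.path.expanduser("~"), "earth-analytics", "data")

# Create the folder if it does not exist yet, then make it the working directory.
os.makedirs(working_dir, exist_ok=True)
os.chdir(working_dir)
```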

In [None]:
# Autograding imports - do not modify this cell
import matplotcheck.notebook as nb
import matplotcheck.autograde as ag
import matplotcheck.vector as vct
import pandas as pd

# This will hide one of geopandas warnings in cell 3
import warnings
warnings.filterwarnings('ignore',
                        'GeoSeries.notna',
                        UserWarning)

In [None]:
# Import packages, download data, and set working directory here

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL
# Tests that the working directory is set to earth-analytics/data

path = os.path.normpath(os.getcwd())
student_wd_parts = path.split(os.sep)

wd_import_points = 0

if student_wd_parts[-2:] == ['earth-analytics', 'data']:
    print("\u2705 Great - it looks like your working directory is set correctly to ~/earth-analytics/data")
    wd_import_points += 4
else:
    print("\u274C Oops, the autograder will not run unless your working directory is set to earth-analytics/data")

print("\n \u27A1 You received {} out of 9 points for setting your working directory and imports.".format(
    wd_import_points))
wd_import_points


## Challenge 1a: Open And Clip Your Vector Data

The NEON **SJER** field site is located in California. Your first task is to explore the area by creating a map of Madera County roads with symbology that distinguishes the different road types.

### Open the roads layer and clip it using the SJER_crop extent

In the cell below:

1. Open the `california/madera-county-roads/tl_2013_06039_roads.shp` file located in your `spatial-vector-lidar` data download using GeoPandas.
2. Open the SJER boundary layer (`california/neon-sjer-site/vector_data/SJER_crop.shp`), reproject the roads to its CRS if the two layers differ, and then clip the roads to that boundary.
3. Open the SJER plot locations data (`california/neon-sjer-site/vector_data/SJER_plot_centroids.shp`).
4. Set all missing (`None`) `RTTYP` values to "Unknown" using syntax like `roads_clip["RTTYP"] = roads_clip["RTTYP"].fillna("Unknown")`, where `roads_clip` stands for your clipped roads object.

Call the **clipped and reprojected roads shapefile geodataframe object** at the 
end of the cell to ensure the tests below run.
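The steps above can be sketched as follows. The two small GeoDataFrames are hypothetical stand-ins for the roads and `SJER_crop` layers (in your notebook you would open the real files with `gpd.read_file()`), and the sketch assumes a GeoPandas version that provides `gpd.clip()`. It uses the assignment form of `fillna()`, which avoids the chained-assignment warnings that `inplace=True` on a single column can raise in recent pandas releases.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Hypothetical stand-in for the Madera County roads layer (one road type missing).
roads = gpd.GeoDataFrame(
    {"RTTYP": ["M", None, "S"]},
    geometry=[LineString([(0, 0), (10, 0)]),
              LineString([(0, 5), (10, 5)]),
              LineString([(20, 20), (30, 20)])],
    crs="EPSG:32611",
)

# Hypothetical stand-in for the SJER_crop boundary polygon.
sjer_crop = gpd.GeoDataFrame(geometry=[box(2, -1, 8, 6)], crs="EPSG:32611")

# Reproject the roads to the boundary's CRS (a no-op here, but required
# whenever the two layers use different CRSs).
roads = roads.to_crs(sjer_crop.crs)

# Keep only the parts of each road that fall inside the boundary.
roads_clip = gpd.clip(roads, sjer_crop)

# Replace missing road types with "Unknown".
roads_clip["RTTYP"] = roads_clip["RTTYP"].fillna("Unknown")

roads_clip
```

The third road lies entirely outside the boundary, so the clip drops it; the other two are trimmed to the boundary's width.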


In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL

## Challenge 1b: Create a Figure Of Your Study Area

In the cell below, add code to create your challenge figure using the 
objects that you generated above.

Create a map that shows the Madera County roads layer, the SJER plot locations, and the SJER boundary (`california/neon-sjer-site/vector_data/SJER_crop.shp`). All data should be cropped to your
SJER boundary crop extent (your Area Of Interest, or AOI).

### Important Notes For Your Figure

1. Plot the roads so that each **road type** in the `RTTYP` attribute is represented with its own unique symbology.
2. Add the plot locations to your map. Color each location according to the attribute **plot type** using unique symbology for each `plot_type` in the data.
3. Add a **title** to your figure.
4. Be sure that your plot legend is not covering your final map.
5. **IMPORTANT:** be sure that all of the data are cropped to the **same spatial extent**
and **crs**. This means that you have to clip and reproject your data prior to plotting it.
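One way to combine several layers and a single shared legend in one map is sketched below with hypothetical stand-in data (the GeoDataFrames, colors, and title are illustrative assumptions, not the assignment data). Each category is plotted as its own labeled layer, then `ax.legend()` builds one legend from those labels; `bbox_to_anchor` places the legend outside the map so it does not cover the data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import LineString, Point, box

crs = "EPSG:32611"

# Hypothetical stand-ins for the SJER boundary, clipped roads and plot locations.
sjer_crop = gpd.GeoDataFrame(geometry=[box(0, 0, 10, 10)], crs=crs)
roads_clip = gpd.GeoDataFrame(
    {"RTTYP": ["M", "S", "Unknown"]},
    geometry=[LineString([(0, 2), (10, 2)]),
              LineString([(0, 6), (10, 6)]),
              LineString([(5, 0), (5, 10)])],
    crs=crs,
)
sjer_plots = gpd.GeoDataFrame(
    {"plot_type": ["grass", "trees", "soil"]},
    geometry=[Point(2, 4), Point(7, 8), Point(8, 3)],
    crs=crs,
)

# One color per category (the colors are arbitrary choices).
road_colors = {"M": "grey", "S": "black", "Unknown": "orange"}
plot_colors = {"grass": "limegreen", "trees": "darkgreen", "soil": "tan"}

fig, ax = plt.subplots(figsize=(10, 10))
sjer_crop.plot(ax=ax, facecolor="none", edgecolor="black")

# Plot each category separately so it gets its own labeled legend entry.
for road_type, road_group in roads_clip.groupby("RTTYP"):
    road_group.plot(ax=ax, color=road_colors[road_type], label=road_type)

for plot_type, plot_group in sjer_plots.groupby("plot_type"):
    plot_group.plot(ax=ax, color=plot_colors[plot_type], label=plot_type,
                    markersize=60)

# Anchor the legend outside the axes so it does not cover the map.
ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left")
ax.set_title("Roads and Plot Locations at the SJER Field Site")
ax.set_axis_off()
```

The boundary layer has no `label`, so matplotlib leaves it out of the legend.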

In [None]:
# PLOT 1 - Place only the code required to create a plot of your data here
# Additional processing code can go above this code cell

# YOUR CODE HERE
raise NotImplementedError()

### DO NOT REMOVE LINE BELOW ###
plot01_roads_plot_locs = nb.convert_axes(plt, which_axes="current")

In [None]:
# DO NOT MODIFY THIS CELL

hw_plot1 = vct.VectorTester(plot01_roads_plot_locs)
results = []

test_plot_title = ag.run_test(hw_plot1.assert_title_contains,
                              strings_expected='',
                              points=1,
                              correct_message="\u2705 Your plot has a title, nice!",
                              error_message="\u274C Your plot is missing a title.")


### Figure 1 - Manual Points
* Points for plot title being expressive

## Question (5 points)

What do the RTTYP road type codes **M** and **S** stand for?
Please add your answer to the Markdown cell BELOW.

HINT: use the `tl_2013_06039_roads.shp.xml` file in your data download to help you figure out the answer to this question.

HINT 2: you can also find good results by searching Google for the column name!

YOUR ANSWER HERE

## Challenge 2: Figure 2 - Roads in Del Norte, Modoc & Siskiyou Counties

Create a plot of roads that are located in: Del Norte, Modoc & Siskiyou Counties. To do this, you will need the following layers:

* Counties in California: `california/CA_Counties/CA_Counties_TIGER2016.shp`
* Roads: `global/ne_10m_roads/ne_10m_roads.shp` 

To create this plot, you will need to:

1. Reproject the roads and the county data to `epsg=5070`
2. Select the three counties that you want to work with in the counties dataset. One fast way to do this is using syntax as follows, where `counties_df` is your counties GeoDataFrame:

`counties_df[counties_df['NAME'].isin(["Siskiyou", "Modoc", "Del Norte"])]`

3. Clip the roads data to the boundary of the counties that you wish to look at.
4. Assign each road segment an attribute that identifies the county it falls within.

Color the roads in each county using a unique color.

HINT: use the `legend=True` argument in `.plot()` to create a legend.
Because you are only creating a legend for one layer, you can let `.plot()` build it
rather than calling `ax.legend()`, which is what you used to create the figure above.


### IMPORTANT: 

* Both layers need to be in the SAME coordinate reference system for you to work with them together. REPROJECT both data layers to Albers using `.to_crs(epsg=5070)`.
* Clip the roads to the boundary of the `three_counties` layer that you created, which contains only the 3 selected counties: `"Siskiyou", "Modoc", "Del Norte"`.
* To assign each road to its respective county, perform a spatial join using `gpd.sjoin()`.
* After the clip, check the CRS of the roads layer. If it is missing, redefine it using something like:

```
county_roads.crs = ca_cty_albers.crs
```

**NOTE:** you can ignore the warning from pyproj: `'+init=<authority>:<code>' syntax is deprecated.`, which will be resolved by GeoPandas in the next release.

In the cell below, add the code needed to:

* Open each layer
* Reproject the data 
* Clip and subset the data 

At the end of the cell, be sure to call the clipped roads layer.
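The reproject, subset, clip, and join steps can be sketched as follows. The county boxes and roads are hypothetical stand-ins in longitude/latitude (in your notebook you would open the two shapefiles with `gpd.read_file()`), and the sketch relies on the default `intersects` predicate of `gpd.sjoin()`. Note that a road segment crossing a county line intersects both counties and would therefore appear once per county in the join result. In recent GeoPandas releases `gpd.clip()` keeps the input CRS, so the CRS reassignment mentioned above may be unnecessary; check `county_roads.crs` to be sure.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Hypothetical stand-in for the California counties layer (lon/lat, EPSG:4326).
ca_counties = gpd.GeoDataFrame(
    {"NAME": ["Del Norte", "Siskiyou", "Modoc", "Lassen"]},
    geometry=[box(-124.2, 41.5, -123.5, 42.0),
              box(-123.5, 41.0, -121.5, 42.0),
              box(-121.5, 41.0, -120.0, 42.0),
              box(-121.5, 40.0, -120.0, 41.0)],
    crs="EPSG:4326",
)

# Hypothetical stand-in for the roads layer; road 4 lies in Lassen County.
roads = gpd.GeoDataFrame(
    {"road_id": [1, 2, 3, 4]},
    geometry=[LineString([(-124.0, 41.75), (-123.7, 41.75)]),
              LineString([(-123.0, 41.5), (-122.0, 41.5)]),
              LineString([(-121.2, 41.5), (-120.3, 41.5)]),
              LineString([(-121.0, 40.5), (-120.5, 40.5)])],
    crs="EPSG:4326",
)

# 1. Reproject both layers to CONUS Albers (EPSG:5070).
ca_cty_albers = ca_counties.to_crs(epsg=5070)
roads_albers = roads.to_crs(epsg=5070)

# 2. Keep only the three counties of interest.
three_counties = ca_cty_albers[
    ca_cty_albers["NAME"].isin(["Siskiyou", "Modoc", "Del Norte"])]

# 3. Clip the roads to the boundary of the three counties.
roads_clip = gpd.clip(roads_albers, three_counties)

# 4. Spatial join: attach the NAME of the county each road intersects.
county_roads = gpd.sjoin(roads_clip, three_counties[["NAME", "geometry"]],
                         how="inner")

county_roads
```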

In [None]:
# In this cell, add the code needed to open, reproject and clip / subset the data

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
student_three_counties = _

three_counties_points = 0

if isinstance(student_three_counties, gpd.geodataframe.GeoDataFrame):
    print("\u2705 Great! Your clipped object is a GeoDataFrame!")
    three_counties_points += 1
else:
    print("\u274C Oops, your clipped object is not a GeoDataFrame.")

if student_three_counties.crs.to_epsg() == 5070:
    print("\u2705 Great! Your clipped object has the correct CRS!")
    three_counties_points += 2
else:
    print("\u274C Oops, your clipped object does not have the correct CRS.")

In [None]:
# DO NOT MODIFY THIS CELL

## Challenge 2b: Figure

In the cell below, add code to create the figure described above.
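A minimal sketch of the one-layer legend approach from the hint in Challenge 2, using hypothetical stand-in data in EPSG:5070 (the names, geometries, and title are illustrative only). Passing `column="NAME"` with `categorical=True` gives each county its own color, and `legend_kwds` is forwarded to matplotlib's `ax.legend()` to place the legend outside the map.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import geopandas as gpd
from shapely.geometry import LineString, box

crs = "EPSG:5070"

# Hypothetical stand-ins for the three counties and the joined county roads.
three_counties = gpd.GeoDataFrame(
    {"NAME": ["Del Norte", "Siskiyou", "Modoc"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10), box(20, 0, 30, 10)],
    crs=crs,
)
county_roads = gpd.GeoDataFrame(
    {"NAME": ["Del Norte", "Siskiyou", "Modoc"]},
    geometry=[LineString([(1, 5), (9, 5)]),
              LineString([(11, 2), (19, 8)]),
              LineString([(25, 1), (25, 9)])],
    crs=crs,
)

fig, ax = plt.subplots(figsize=(10, 6))
three_counties.plot(ax=ax, facecolor="none", edgecolor="grey")

# categorical=True gives each county name its own color and legend entry.
county_roads.plot(ax=ax, column="NAME", categorical=True, legend=True,
                  legend_kwds={"loc": "upper left", "bbox_to_anchor": (1.02, 1)})

ax.set_title("Roads in Del Norte, Modoc and Siskiyou Counties, California")
ax.set_axis_off()
```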

In [None]:
# Figure 2 - Place only the code required to plot your data here
# Additional processing code can go above this code cell

# YOUR CODE HERE
raise NotImplementedError()

### DO NOT REMOVE LINE BELOW ###
plot02_county_roads_clip = nb.convert_axes(plt, which_axes="current")

In [None]:
# DO NOT MODIFY THIS CELL
hw_plot2 = vct.VectorTester(plot02_county_roads_clip)


plot_title = ag.run_test(hw_plot2.assert_title_contains,
                         strings_expected='',
                         points=1,
                         correct_message="\u2705 Plot title exists!",
                         error_message="\u274C Check your plot title exists.")

print(plot_title['message'])


## Challenge 3: Calculate Total Road Length in Siskiyou, Modoc, and Del Norte Counties, California

Create a dataframe that shows the total length of road in these counties used in plot 2: Siskiyou, Modoc, and Del Norte. To calculate this, use the data you created for plot 2.

To calculate the length of each line in your GeoDataFrame, use `gdf.length`, which returns lengths in the units of the CRS (meters for EPSG:5070). Create a new column using the syntax:

`gdf["length"] = gdf.length`

You can then calculate the total length for each county using pandas `.groupby()` on the county name column.

Note: you can use: `pd.options.display.float_format = '{:.4f}'.format` if you'd like to turn off scientific notation for your outputs.

It should look something like this:


|NAME|length|
|----|----|
|Del Norte| road length here|
|Modoc| road length here|
|Siskiyou| road length here|


At the end of the cell, call the dataframe object.
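The length calculation and `.groupby()` summary can be sketched on a hypothetical stand-in for the joined county roads (a small GeoDataFrame in EPSG:5070, whose units are meters). Selecting `[["length"]]` before `.sum()` keeps the result a one-column table like the one above instead of summing every numeric column.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical stand-in for the clipped, joined roads from Challenge 2.
# EPSG:5070 uses meters, so the lengths below are in meters.
county_roads = gpd.GeoDataFrame(
    {"NAME": ["Del Norte", "Del Norte", "Modoc", "Siskiyou"]},
    geometry=[LineString([(0, 0), (3, 4)]),      # 5 m
              LineString([(10, 0), (13, 4)]),    # 5 m
              LineString([(20, 0), (23, 0)]),    # 3 m
              LineString([(30, 0), (30, 12)])],  # 12 m
    crs="EPSG:5070",
)

# Length of each road segment, in the units of the CRS.
county_roads["length"] = county_roads.length

# Total road length per county.
roads_summary = county_roads.groupby("NAME")[["length"]].sum()

roads_summary
```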

In [None]:
# TABLE 1 - Place the code required to create the dataframe

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL
# Test that the cali_roads_summary is of type dataframe and named correctly

# Let's make sure you created an object with the correct name and of the correct type above!

student_length_dataframe = _

length_points = 0

if len(student_length_dataframe) == 3:
    print("\u2705 Correct number of entries in the dataframe, good job!")
    length_points += 2
else:
    print("\u274C Incorrect amount of entries in the dataframe.")

if student_length_dataframe.length.dtype == 'float':
    print("\u2705 Length column has the correct datatype!")
    length_points += 2
else:
    print("\u274C Length column does not have the correct datetype.")

In [None]:
# DO NOT MODIFY THIS CELL

## Challenge 4: Plot 3 - Global  Estimated Population 

Create a quantile map of the sum of estimated population for each global region. To do this:

1. Download the Natural Earth data from the following URL:
`https://ndownloader.figshare.com/files/25515986`

Recall that using `et.data.get_data(url=url)` will download the data to the following directory: `earth-analytics/data/earthpy-downloads/`

After you have downloaded the data, open it and:

1. Subset the data to include only the following columns: `["REGION_WB", "POP_EST", 'geometry']`
2. Dissolve the data by the region column (`REGION_WB`) and aggregate using `sum`.
    * HINT: you can provide the `aggfunc=` argument with a `[list]` of function names in quotes, and it will summarize the numeric columns using each function.
3. Create a figure that shows the sum of estimated population (`POP_EST`) by region, classified into quantiles (for example, with `scheme="quantiles"` in `.plot()`, which requires the `mapclassify` package).
    
**NOTE:** you can ignore the warning from merge: `UserWarning: merging between different levels can give an unintended result`.

In the cell below:

* Open up the data
* Subset the columns needed using `dataframe[["column-1", "column-2", "column-3"]]`
* Use the `.dissolve()` method to dissolve the data by the `REGION_WB` column

Call your final dissolved geodataframe at the end of the cell
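A minimal sketch of the subset-and-dissolve steps, run on a hypothetical three-country stand-in for the Natural Earth layer. It uses a single function, `aggfunc="sum"`, which keeps plain column names; passing a list such as `["sum"]` instead produces two-level column names such as `("POP_EST", "sum")`, which is where the merge warning mentioned above comes from.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for the Natural Earth countries layer.
countries = gpd.GeoDataFrame(
    {"NAME": ["Country A", "Country B", "Country C"],
     "REGION_WB": ["Europe & Central Asia", "Europe & Central Asia",
                   "Sub-Saharan Africa"],
     "POP_EST": [1000, 2500, 4000]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
    crs="EPSG:4326",
)

# Keep only the columns needed for the map.
countries = countries[["REGION_WB", "POP_EST", "geometry"]]

# Merge the geometries of each region and add up their populations.
region_pop = countries.dissolve(by="REGION_WB", aggfunc="sum")

region_pop
```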

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL
student_dissolve_dataframe = _

dissolve_points = 0

if len(student_dissolve_dataframe) == 8:
    print("\u2705 Correct number of entries in the dataframe, good job!")
    dissolve_points += 2
else:
    print("\u274C Incorrect amount of entries in the dataframe.")

if isinstance(student_dissolve_dataframe, gpd.GeoDataFrame):
    print("\u2705 Data is stored in a GeoDataFrame, good job!")
    dissolve_points += 2
else:
    print("\u274C Data is not stored in a GeoDataFrame.")

print("\n \u27A1 You received {} out of 4 points.".format(
    dissolve_points))
dissolve_points

In [None]:
# PLOT 3 - Place only the code required to plot your data here
# Additional processing code can go above this code cell

# YOUR CODE HERE
raise NotImplementedError()

### DO NOT REMOVE LINE BELOW ###
plot04_global_population = nb.convert_axes(plt, which_axes="current")

In [None]:
# DO NOT MODIFY THIS CELL

In [None]:
# DO NOT MODIFY THIS CELL

hw_plot4_rank = vct.VectorTester(plot04_global_population)

results = []

plot_title_rank = ag.run_test(hw_plot4_rank.assert_title_contains,
                              strings_expected='',
                              points=1,
                              correct_message="\u2705 Plot title exists!",
                              error_message="\u274C Check that your plot title exists.")



## Manual points for geometries in plot 

## Do not modify this cell (10 points)

* Here we will grade PEP 8 formatting and whether imports are listed at the top following PEP 8 conventions.
* Notebook begins with cell [1] and runs without modifications. 
* Be sure that your code can be run on any operating system. This means that:
    * the data should be downloaded in the notebook to ensure it's reproducible
    * all paths should be created dynamically using the os package to ensure that they work across operating systems