<div><img style="float: left; padding-right: 3em;" src="https://avatars.githubusercontent.com/u/19476722" width="150" /></div>

# Earth Data Science Coding Challenge!
Before we get started, make sure to read or review the guidelines below. These will help make sure that your code is **readable** and **reproducible**. 

## Don't get **caught** by these Jupyter notebook gotchas

<img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/1*o0HleR7BSe8W-pTnmucqHA.jpeg" width=300 style="padding: 1em; border-style: solid; border-color: grey;" />

  > *Image source: https://alaskausfws.medium.com/whats-big-and-brown-and-loves-salmon-e1803579ee36*

These are the most common issues that will keep you from getting started and delay your code review:

1. When you try to run some code on GitHub Codespaces, you may be prompted to select a **kernel**.
   * The **kernel** refers to the version of Python you are using.
   * You should use the **base** kernel, which should be the default option. 
   * You can also use the `Select Kernel` menu in the upper right to select the **base** kernel
2. Before you commit your work, make sure it runs **reproducibly** by clicking:
   1. `Restart` (this button won't appear until you've run some code), then
   2. `Run All`

## Check your code to make sure it's clean and easy to read

<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSO1w9WrbwbuMLN14IezH-iq2HEGwO3JDvmo5Y_hQIy7k-Xo2gZH-mP2GUIG6RFWL04X1k&usqp=CAU" height=200 />

* Format all cells prior to submitting (right click on your code).
* Use expressive names for variables so you or the reader knows what they are. 
* Use comments to explain your code -- e.g. 
  ```python
  # This is a comment, it starts with a hash sign
  ```

## Label and describe your plots

![Source: https://xkcd.com/833](https://imgs.xkcd.com/comics/convincing.png)

Make sure each plot has:
  * A title that explains where and when the data are from
  * x- and y- axis labels with **units** where appropriate
  * A legend where appropriate


## Icons: how to use this notebook
We use the following icons to let you know when you need to change something to complete the challenge:
  * &#128187; means you need to write or edit some code.
  
  * &#128214;  indicates recommended reading
  
  * &#9998; marks written responses to questions
  
  * &#127798; is an optional extra challenge
  

---

# Vector data enhances map-making through the inclusion of point locations, political and natural boundaries, and areas of interest

In this notebook, you will create two maps of different places in California using vector data (shapefiles). In the process, you will learn how to:
  * Project/reproject vector data to a common Coordinate Reference System (CRS)
  * Clip vector data to an area of interest
  * Display vector data on a map
  * Calculate lengths using vector data

## Set up
### Data Citation

You will use the following data from NEON and [Natural Earth Data](https://www.naturalearthdata.com/) to make your map:
  * SJER boundary (NEON)
  * SJER plot centroids (NEON)
  * Roads (Natural Earth)
  * County boundaries (Natural Earth)

[Natural Earth Data](https://www.naturalearthdata.com/) is a good open source for political and physical boundaries that are important for map-making. For this notebook, we have compiled the Natural Earth Data for you. You can also access it using the `geoviews` library, from the same developers as `hvplot` and `holoviews`. 

YOUR TASK: Look through the Natural Earth Data and NEON Spatial Data Maps websites above and cite the data as recommended by those organizations, or in APA style.

DATA CITATIONS HERE

### Set up your analysis

All of the data that you will use this week are available from **earthpy** using the following download: 

`et.data.get_data('spatial-vector-lidar')`

In the cell below the autograding imports:
  1. Add all of the needed package imports - You will need the `geopandas` package, as well as a couple of others that you've used in the past.
  2. Download the data and assign the download path a name
  3. Set your working directory:
     * Use a conditional to ensure that this code will run correctly whether or not your chosen working directory exists
     * You can choose whatever working directory works best for the analysis, but it must be reproducible on any platform. 
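One way to handle the working directory step is sketched below using only the Python standard library. The helper name `set_working_directory` is an illustrative assumption, not a name this assignment requires.

```python
import os
from pathlib import Path


def set_working_directory(path):
    """Create `path` if it does not exist yet, then make it the working directory."""
    path = Path(path)
    # The conditional keeps the code safe to re-run: the folder is only
    # created when it is missing
    if not path.exists():
        path.mkdir(parents=True)
    os.chdir(path)
    return path
```

For example, `set_working_directory(Path.home() / "earth-analytics" / "data")` builds the path from the user's home directory (the folder names here are hypothetical), so the same line should work on Windows, macOS, and Linux.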

In [None]:
# Import packages, download data, and set working directory here

# YOUR CODE HERE
raise NotImplementedError()


## Open And Clip Your Vector Data

The NEON **SJER** field site is located in California. Your first task is to explore the area by creating a map of California roads that has symbology that represents different road types.

### Open the roads layer and clip it using the SJER_crop extent

In the cell below:

1. Open the `california/madera-county-roads/tl_2013_06039_roads.shp` and `california/neon-sjer-site/vector_data/SJER_crop.shp` files located in your `spatial-vector-lidar` data download using GeoPandas. 
2. Reproject the roads data to be the same CRS as the area of interest using the `.to_crs()` method. They should both have the CRS of `EPSG:32611`.
   > HINT: You can get the CRS of the area of interest by accessing the `.crs` property of the `GeoDataFrame`.
3. Clip the roads data using the SJER boundary (`california/neon-sjer-site/vector_data/SJER_crop.shp`) layer. 
4. Open the SJER plot locations data (`california/neon-sjer-site/vector_data/SJER_plot_centroids.shp`). 
5. Set all missing (`None`) `RTTYP` values to "Unknown" using the syntax: `roads_gdf["RTTYP"] = roads_gdf["RTTYP"].fillna("Unknown")`, where `roads_gdf` stands for the name of your roads object. Assigning the result back is more reliable than `inplace=True` on a single column, which recent versions of pandas warn about.

Call the **clipped and reprojected roads shapefile geodataframe object** at the 
end of the cell to ensure the tests below run.
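The reproject, clip, and fill steps above can be sketched on a small synthetic example. The geometries and the names `roads_gdf`, `aoi_gdf`, `roads_utm`, and `roads_clip` below are made up for illustration; they are not the SJER data, and your own code will open the shapefiles instead.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Synthetic stand-ins (NOT the real SJER data): two roads in geographic
# coordinates, and a rectangular area of interest (AOI) that is
# reprojected to UTM zone 11N (EPSG:32611)
roads_gdf = gpd.GeoDataFrame(
    {"RTTYP": ["M", None]},
    geometry=[
        LineString([(-119.76, 37.08), (-119.70, 37.12)]),
        LineString([(-119.76, 37.12), (-119.70, 37.08)]),
    ],
    crs="EPSG:4326",
)
aoi_gdf = gpd.GeoDataFrame(
    geometry=[box(-119.75, 37.09, -119.71, 37.11)], crs="EPSG:4326"
).to_crs(epsg=32611)

# Reproject the roads to whatever CRS the AOI uses
roads_utm = roads_gdf.to_crs(aoi_gdf.crs)

# Keep only the parts of each road that fall inside the AOI
roads_clip = gpd.clip(roads_utm, aoi_gdf)

# Replace missing road types with a readable label
roads_clip["RTTYP"] = roads_clip["RTTYP"].fillna("Unknown")

print(roads_clip.crs)
```

Passing `aoi_gdf.crs` to `.to_crs()` rather than typing the EPSG code keeps the two layers in step even if the AOI file changes.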


In [None]:
# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL
student_sjer_roads_clip = _
initial_clip_points = 0

if isinstance(student_sjer_roads_clip, gpd.geodataframe.GeoDataFrame):
    print("\u2705 Great! Your clipped object is a GeoDataFrame!")
    initial_clip_points += 1
else:
    print("\u274C Oops, your clipped object is not a GeoDataFrame.")

if student_sjer_roads_clip.crs == 'epsg:32611':
    print("\u2705 Great! Your clipped object has the correct CRS!")
    initial_clip_points += 1
else:
    print("\u274C Oops, your clipped object does not have the correct "
          "CRS.")
    
total_bounds_student = [
    round(b, 2) for b in student_sjer_roads_clip.total_bounds]
total_bounds_ans = [254570.57, 4107303.08, 258867.41, 4112361.92]
if total_bounds_student == total_bounds_ans:
    print("\u2705 Great! Your clipped object has the correct extent.")
    initial_clip_points += 2
else:
    print("\u274C Oops, your clipped object does not have the correct extent")

print("\n \u27A1 You received {} out of 4 points.".format(
    initial_clip_points))

initial_clip_points

## Create a Figure Of the SJER Study Area

In the cell below, add code to create your challenge figure using the 
objects that you generated above.

Create a map that shows the Madera roads layer, the SJER plot locations, and the SJER boundary (`california/neon-sjer-site/vector_data/SJER_crop.shp`). All data should be cropped to your
SJER boundary crop extent (your Area Of Interest or AOI).

### Important Notes For Your Figure

1. Create a map of the plot locations. Color each location according to the attribute **plot_type**.
2. Plot the roads so different **road types** are represented using unique symbology using the `RTTYP` attribute. Setting the `line_color` by attribute will not work for non-point geometries. You will need to use a `for` loop, starting with the following example code:
    ```python
    for rttyp, gdf in roads_gdf.groupby('RTTYP'):
        madera_plot *= gdf.hvplot(line_color=roads_symb[rttyp], label=rttyp)
    ```

    > HINT: You will need to have a python dictionary called `roads_symb` that specifies a color for each road type to make this code work.
3. Add a **title** to your figure. You may also wish to set `xaxis` and `yaxis` to `'bare'` and `data_aspect` to `1` so that both axes use the same scale.

> **IMPORTANT:** be sure that all of the data are cropped to the **same spatial extent** and **crs**. You should have done this in the previous cell, but make sure to double-check if you are having trouble plotting.
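The `for` loop in step 2 depends on two things: `.groupby()` handing back one subset of rows per road type, and `roads_symb` having a color for every road type in the data. The sketch below shows both ideas with plain `pandas` and no plotting. The table, the `road_id` column, and the color choices are invented for illustration.

```python
import pandas as pd

# Synthetic road attributes (NOT the real Madera County data)
roads_df = pd.DataFrame({
    "road_id": [1, 2, 3, 4],
    "RTTYP": ["M", "S", "M", "Unknown"],
})

# Hypothetical color choices, one entry per road type in the data
roads_symb = {"M": "black", "S": "grey", "Unknown": "lightgrey"}

# Catch road types that have no color before they raise a KeyError in the loop
missing_types = set(roads_df["RTTYP"]) - set(roads_symb)
print("Road types without a color:", missing_types)

# groupby yields one (road type, subset of rows) pair per unique RTTYP value
for rttyp, subset in roads_df.groupby("RTTYP"):
    print(rttyp, roads_symb[rttyp], len(subset))
```

If `missing_types` is not empty, add the missing road types to your dictionary before plotting.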

In [None]:
# YOUR CODE HERE
raise NotImplementedError()

### Take a look at the metadata

What do the `RTTYP` road type codes **M** and **S** stand for? 
Please write your answer in the markdown cell BELOW. Use the `tl_2013_06039_roads.shp.xml` file in your data download to help you figure out the answer to this question.


## Roads in Del Norte, Modoc & Siskiyou Counties

Create a plot of roads that are located in: Del Norte, Modoc & Siskiyou Counties. To do this, you will need the following layers:

* Counties in California: `california/CA_Counties/CA_Counties_TIGER2016.shp`
* Roads: `spatial-vector-lidar/global/ne_10m_roads/ne_10m_roads.shp` 

To create this plot, you will need to:

1. Reproject the roads and the county data to `epsg=5070`
2. Select the three counties that you want to work with in the counties dataset. One fast way to do this is using syntax as follows: 

`counties_gdf[counties_gdf['NAME'].isin(["Siskiyou", "Modoc", "Del Norte"])]`

3. Clip the roads data to the boundary of the counties that you wish to look at.
4. Assign each road segment an attribute that identifies it as within each county.

Color the roads in each county using a unique color.

HINT: use the `legend=True` argument in `.plot()` to create a legend.
Because you are only creating a legend for one layer, you can quickly use `.plot()`
rather than `ax.legend()` which is what you used to create the figure above.


### IMPORTANT: 

* Both layers need to be in the SAME coordinate reference system for you to work with them together. REPROJECT both data layers to Albers using `.to_crs(epsg=5070)`.
* Clip the roads to the boundary of the three_counties layer that you created which only contains the 3 selected counties: `"Siskiyou", "Modoc", "Del Norte"`
* To assign each road to its respective county, you will perform a spatial join using `.sjoin()`.
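The select, clip, and spatial join steps can be sketched on synthetic data as follows. The square "counties", the `road_id` column, and the object names are assumptions made for illustration; they are not the real California layers.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Synthetic square "counties" and straight "roads" (NOT the real data)
counties_gdf = gpd.GeoDataFrame(
    {"NAME": ["Del Norte", "Siskiyou", "Modoc", "Shasta"]},
    geometry=[
        box(0, 0, 10, 10),
        box(10, 0, 20, 10),
        box(20, 0, 30, 10),
        box(10, -10, 20, 0),
    ],
    crs="EPSG:5070",
)
roads_gdf = gpd.GeoDataFrame(
    {"road_id": [1, 2, 3, 4]},
    geometry=[
        LineString([(2, 5), (8, 5)]),      # inside Del Norte
        LineString([(12, 2), (18, 8)]),    # inside Siskiyou
        LineString([(22, 5), (35, 5)]),    # starts in Modoc, runs past its edge
        LineString([(15, -8), (15, -2)]),  # inside Shasta only
    ],
    crs="EPSG:5070",
)

# Both layers must share a CRS (a no-op here, since they already match)
counties_gdf = counties_gdf.to_crs(epsg=5070)
roads_gdf = roads_gdf.to_crs(epsg=5070)

# Keep only the three counties of interest
three_counties = counties_gdf[
    counties_gdf["NAME"].isin(["Siskiyou", "Modoc", "Del Norte"])
]

# Clip the roads to the combined outline of those counties
roads_clip = gpd.clip(roads_gdf, three_counties)

# Attach the county attributes to every road that intersects a county
roads_by_county = gpd.sjoin(
    roads_clip, three_counties, how="inner", predicate="intersects"
)
print(roads_by_county[["road_id", "NAME"]])
```

In this toy layout every clipped road sits in exactly one county. With real data, a road that crosses a county boundary intersects more than one county, so the join can return more than one row for it.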

In the cell below, add the code needed to:

* Open each layer
* Reproject the data 
* Clip and subset the data 

At the end of the cell, be sure to call the clipped roads layer.

In [None]:
# In this cell, add the code needed to open, reproject and clip / subset the data

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
student_three_counties = _
answer_total_bounds = [-2292272.17, 2271444.08, -1965771.03, 2452647.92]
three_counties_points = 0

if isinstance(student_three_counties, gpd.geodataframe.GeoDataFrame):
    print("\u2705 Great! Your clipped object is a GeoDataFrame!")
    three_counties_points += 1
else:
    print("\u274C Oops, your clipped object is not a GeoDataFrame.")

if student_three_counties.crs.to_epsg() == 5070:
    print("\u2705 Great! Your clipped object has the correct CRS!")
    three_counties_points += 1
else:
    print("\u274C Oops, your clipped object does not have the "
          "correct CRS.")
    
student_total_bounds = [
    round(b, 2) for b in student_three_counties.total_bounds]
if student_total_bounds == answer_total_bounds:
    print("\u2705 Great! Your clipped object has the correct extent.")
    three_counties_points += 2
else:
    print("\u274C Oops, your clipped object does not have the correct "
          "extent")

print("\n \u27A1 You received {} out of 4 points."
      .format(three_counties_points))

three_counties_points

## Challenge 2b: Figure

In the cell below, add code to create the figure described above.


In [None]:
# Figure 2 - Place only the code required to plot your data here
# Additional processing code can go above this code cell

# YOUR CODE HERE
raise NotImplementedError()

## Challenge 3: Calculate Total Length of Roads in Siskiyou, Modoc, and Del Norte Counties in California

Create a dataframe that shows the total length of road in these counties used in plot 2: Siskiyou, Modoc, and Del Norte. To calculate this, use the data you created for plot 2.

To calculate length of each line in your geodataframe, you can use the syntax `gdf.length`. Create a new column **named length** using the syntax:

`gdf["length"] = gdf.length`

You can summarize the data to calculate total length using pandas `.groupby()` on the county column name.

Note: you can use: `pd.options.display.float_format = '{:.4f}'.format` if you'd like to turn off scientific notation for your outputs.

It should look something like this:


||length|
|----|----|
|NAME|| 
|Del Norte| road length here|
|Modoc| road length here|
|Siskiyou| road length here|


At the end of the cell, call the dataframe object.
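As a rough illustration of the length and `.groupby()` steps, here is a sketch on three made-up line segments whose lengths are easy to check by hand (5, 10, and 10 units). The object names `roads_by_county` and `county_lengths` are assumptions, not names the tests require.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Synthetic road segments already tagged with a county name (NOT real data);
# EPSG:5070 is a projected CRS, so lengths are in meters
roads_by_county = gpd.GeoDataFrame(
    {"NAME": ["Del Norte", "Del Norte", "Modoc"]},
    geometry=[
        LineString([(0, 0), (3, 4)]),   # length 5
        LineString([(0, 0), (0, 10)]),  # length 10
        LineString([(0, 0), (6, 8)]),   # length 10
    ],
    crs="EPSG:5070",
)

# Length of each segment, in the units of the CRS
roads_by_county["length"] = roads_by_county.length

# Total length per county, as a one-column DataFrame indexed by NAME
county_lengths = roads_by_county.groupby("NAME")[["length"]].sum()
print(county_lengths)
```

Selecting `[["length"]]` with double brackets keeps the result a DataFrame rather than a Series, which matches the table layout shown above.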

In [None]:
# TABLE 1 - Place the code required to create the dataframe

# YOUR CODE HERE
raise NotImplementedError()

In [None]:
# DO NOT MODIFY THIS CELL
# Test that the cali_roads_summary is of type dataframe and named correctly

# Let's make sure you created an object with the correct name and of the correct type above!

student_length_dataframe = _

length_points = 0

if len(student_length_dataframe) == 3:
    print("\u2705 Correct number of entries in the dataframe, good job!")
    length_points += 2
else:
    print("\u274C Incorrect number of entries in the dataframe.")

if student_length_dataframe.length.dtype == 'float':
    print("\u2705 Length column has the correct datatype!")
    length_points += 2
else:
    print("\u274C Length column does not have the correct datatype.")
    
if round(student_length_dataframe.length.sum(), 2) == 838764.66:
    length_points += 6
    print("\u2705 Great! The summary roads data are correct!")
else:
    print("\u274C Oops, the roads summary data are not correct.")

print("\n \u27A1 You received {} out of 10 points.".format(
    length_points))
length_points