# STEP 1: Site map

We’ll need some Python libraries to complete this workflow.

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It: Import necessary libraries</div></div><div class="callout-body-container callout-body"><p>In the cell below, making sure to keep the packages in order, add
packages for:</p>
<ul>
<li>Working with DataFrames</li>
<li>Working with GeoDataFrames</li>
<li>Making interactive plots of tabular and vector data</li>
</ul></div></div>

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-respond"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Reflect and Respond</div></div><div class="callout-body-container callout-body"><p>What are we using the rest of these packages for? See if you can
figure it out as you complete the notebook.</p></div></div>

The libraries that were already imported cover a few different jobs. `os` and `pathlib` manage file paths and create directories, and `os` can also change the working directory.
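
As a rough illustration of the directory handling described above, here is a minimal sketch using only the standard library. The folder name and the temporary location are assumptions for the demo, not paths this notebook actually uses.

```python
import os
import pathlib
import tempfile

# Use a throwaway temporary directory so this sketch doesn't touch real data.
base_dir = pathlib.Path(tempfile.mkdtemp())

# Hypothetical folder name, standing in for a project data directory.
data_dir = base_dir / "vegetation-data"

# Create the directory (and any missing parents); do nothing if it exists.
data_dir.mkdir(parents=True, exist_ok=True)

# os can change the working directory; remember the old one to switch back.
original_dir = os.getcwd()
os.chdir(data_dir)
working_dir = pathlib.Path.cwd()
os.chdir(original_dir)

print(data_dir.is_dir())
```

The `/` operator on a `pathlib.Path` joins path pieces in a way that works on any operating system, which is the same pattern as `project.project_dir / 'tl_2020_us_aitsn'` later in this notebook.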

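`xarray` and `rioxarray` will be used to work with raster data, and `earthpy` sets up the project and its data directory (see the `earthpy.Project` cell in this notebook).

`geopandas` and `pandas` will be used to process GeoDataFrames and DataFrames, respectively.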

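I'm not actually sure what `json` is for here. The package reads and writes JSON (JavaScript Object Notation), a plain-text data format. Despite the name, it doesn't involve running JavaScript, and I haven't spotted where this notebook uses it yet.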
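
To make the role of `json` a little more concrete, here is a small sketch of what the package does in general. The record below is a made-up example that borrows two values from the subdivision table; it is not a file this notebook loads.

```python
import json

# Hypothetical GeoJSON-style record, written as JSON text.
json_text = (
    '{"type": "Feature", '
    '"properties": {"NAMELSAD": "District 1", "AIANNHCE": "1310"}}'
)

# json.loads parses JSON text into Python dicts, lists, strings, and numbers.
record = json.loads(json_text)
print(record["properties"]["NAMELSAD"])

# json.dumps goes the other way, turning Python objects back into JSON text.
round_trip = json.dumps(record, indent=2)
print(round_trip)
```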


In [1]:
# import libraries

import json
import os
import pathlib

import geopandas as gpd
import pandas as pd
import rioxarray as rxr
import xarray as xr

import earthpy
import hvplot.pandas
import hvplot.xarray
import matplotlib


We have one more setup task. We’re not going to be able to load all our
data directly from the web to Python this time. That means we need to
set up a place for it.

> **GOTCHA ALERT!**
>
> A lot of times in Python we say “directory” to mean a “folder” on your
> computer. The two words mean the same thing in this context.

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It</div></div><div class="callout-body-container callout-body"><p>In the cell below, replace ‘my-data-folder’ with a
<strong>descriptive</strong> directory name.</p></div></div>

In [2]:
project = earthpy.Project(
    "Gila River Vegetation", dirname='vegetation-data')
project.get_data()

The cell above seems to use some kind of conditional check: I had already downloaded the data in the 00 notebook, and this cell ran very quickly, presumably because it skipped the download. Interesting...
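
I don't know how `earthpy` implements this internally, but a common way to get this behavior is to check whether the file already exists before downloading. The sketch below shows that general pattern; `fetch_bytes`, `get_data_once`, and the file name are all hypothetical and are not part of `earthpy`.

```python
import pathlib
import tempfile


def fetch_bytes():
    """Stand-in for a slow download; a real version would fetch from the web."""
    return b"placeholder data"


def get_data_once(target_path, fetch=fetch_bytes):
    """Write the data to target_path only if it isn't already there."""
    target_path = pathlib.Path(target_path)
    if target_path.exists():
        # The file is already on disk, so skip the slow step entirely.
        return "cached"
    target_path.parent.mkdir(parents=True, exist_ok=True)
    target_path.write_bytes(fetch())
    return "downloaded"


# Hypothetical location inside a temporary directory.
demo_path = pathlib.Path(tempfile.mkdtemp()) / "vegetation-data" / "demo.bin"

first_run = get_data_once(demo_path)
second_run = get_data_once(demo_path)
print(first_run, second_run)
```

The second call returns almost immediately because the only work it does is the existence check, which would match the quick run observed above.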

# STEP 0: Set up

To get started on this notebook, you’ll need to restore any variables
from previous notebooks to your workspace. To save time and memory, make
sure to specify which variables you want to load.

In [3]:
%store -r

## Study Area: **Gila River Indian Community**

### Earth Data Science data formats

In Earth Data Science, we get data in three main formats:

| Data type | Descriptions | Common file formats | Python type |
|------------------|------------------|------------------|------------------|
| Time Series | The same data points (e.g. streamflow) collected multiple times over time | Tabular formats (e.g. .csv, or .xlsx) | pandas DataFrame |
| Vector | Points, lines, and areas (with coordinates) | Shapefile (often an archive like a `.zip` file because a Shapefile is actually a collection of at least 3 files) | geopandas GeoDataFrame |
| Raster | Evenly spaced spatial grid (with coordinates) | GeoTIFF (`.tif`), NetCDF (`.nc`), HDF (`.hdf`) | rioxarray DataArray |

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-read"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Read More</div></div><div class="callout-body-container callout-body"><p>Check out the sections about <a
href="https://www.earthdatascience.org/courses/use-data-open-source-python/intro-vector-data-python/spatial-data-vector-shapefiles/">vector
data</a> and <a
href="https://www.earthdatascience.org/courses/intro-to-earth-data-science/file-formats/use-spatial-data/use-raster-data/">raster
data</a> in the textbook.</p></div></div>

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-respond"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Reflect and Respond</div></div><div class="callout-body-container callout-body"><p>For this coding challenge, we are interested in the boundary of the
Gila River Indian Community, and the health of vegetation in the
area measured on a scale from -1 to 1. In the cell below, answer the
following question: <strong>What data type do you think the boundary
will be? What about the vegetation health?</strong></p></div></div>

The boundary will be vector data: it is a polygon defined by discrete coordinates rather than a value measured continuously across space. Vegetation health will be a raster, since it is a continuous value measured on a grid across the whole area we're looking at.

### Load the **Gila River Indian Community** boundary

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It</div></div><div class="callout-body-container callout-body"><ul>
<li>Locate the Tribal Subdivision files in your download directory</li>
<li>Change <code>'subdivision-directory'</code> to the actual
location</li>
<li>Load the data into Python and check that it worked</li>
</ul></div></div>

In [5]:
# Load in the boundary data
aitsn_gdf = gpd.read_file(project.project_dir / 'tl_2020_us_aitsn')

# Check that it worked
aitsn_gdf

|  | AIANNHCE | TRSUBCE | TRSUBNS | GEOID | NAME | NAMELSAD | LSAD | CLASSFP | MTFCC | FUNCSTAT | ALAND | AWATER | INTPTLAT | INTPTLON | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2430 | 653 | 02419073 | 2430653 | Red Valley | Red Valley Chapter | T2 | D7 | G2300 | A | 922036695 | 195247 | +36.6294607 | -109.0550394 | POLYGON ((-109.2827 36.64644, -109.28181 36.65... |
| 1 | 2430 | 665 | 02419077 | 2430665 | Rock Point | Rock Point Chapter | T2 | D7 | G2300 | A | 720360268 | 88806 | +36.6598701 | -109.6166836 | POLYGON ((-109.85922 36.49859, -109.85521 36.5... |
| 2 | 2430 | 675 | 02419081 | 2430675 | Rough Rock | Rough Rock Chapter | T2 | D7 | G2300 | A | 364475668 | 216144 | +36.3976971 | -109.7695183 | POLYGON ((-109.93053 36.40672, -109.92923 36.4... |
| 3 | 2430 | 325 | 02418975 | 2430325 | Indian Wells | Indian Wells Chapter | T2 | D7 | G2300 | A | 717835323 | 133795 | +35.3248534 | -110.0855000 | POLYGON ((-110.24222 35.36327, -110.24215 35.3... |
| 4 | 2430 | 355 | 02418983 | 2430355 | Kayenta | Kayenta Chapter | T2 | D7 | G2300 | A | 1419241065 | 1982848 | +36.6884391 | -110.3045616 | POLYGON ((-110.56817 36.73489, -110.56603 36.7... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 479 | 1310 | 100 | 02418907 | 1310100 | 1 | District 1 | 28 | D7 | G2300 | N | 139902197 | 0 | +33.0600842 | -111.5806313 | POLYGON ((-111.63622 33.11798, -111.63405 33.1... |
| 480 | 4290 | 550 | 02612186 | 4290550 | Mission Highlands | Mission Highlands | 00 | D7 | G2300 | N | 6188043 | 0 | +48.0754384 | -122.2507432 | POLYGON ((-122.27579 48.07128, -122.27578 48.0... |
| 481 | 0855 | 400 | 02418941 | 0855400 | Fort Thompson | Fort Thompson District | 07 | D7 | G2300 | N | 535432708 | 38653364 | +44.1559680 | -099.4467700 | POLYGON ((-99.66452 44.25269, -99.66449 44.255... |
| 482 | 0335 | 300 | 02784108 | 0335300 | Indian Point | Indian Point Segment | T3 | D7 | G2300 | N | 326985 | 0 | +48.0604594 | -092.8466753 | POLYGON ((-92.85187 48.05944, -92.85186 48.059... |


You might notice in this dataset that some of the names are not easily
searchable. For example, the Gila River subdivisions are named “District
1-7”! So, how do we know what to search for? We recommend making an
**interactive** plot of the data so that you can find the information
you need, e.g.:

In [None]:
# Plot the aitsn_gdf to take a look at what it contains

aitsn_gdf.hvplot(
    geo=True, tiles='EsriImagery', 
    frame_width=500,
    legend=False, fill_color=None, edge_color='white',
    # This parameter makes all the column values in the dataset visible.
    hover_cols='all')



<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-respond"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Reflect and Respond</div></div><div class="callout-body-container callout-body"><p>What column could you use to uniquely identify the subdivisions of
the reservation you want to study using this interactive map? What value
do you need to use to filter the <code>GeoDataFrame</code>?</p></div></div>

It looks like the `AIANNHCE` column is what we need to filter the GeoDataFrame down to just the Gila River rows: Districts 1-7 all have the value 1310. We can use a conditional like `aitsn_gdf['AIANNHCE'] == 1310`.

Now that you have the info you need, it’s also a good idea to check the
data type. For example, we suggest looking at the `AIANNHCE` column…but
is that value some kind of **number** or an **object** like a text
string? We can’t tell just by looking, which is where our friend the
`.info()` method comes in:

In [7]:
aitsn_gdf.info()

<class 'geopandas.geodataframe.GeoDataFrame'>
RangeIndex: 484 entries, 0 to 483
Data columns (total 15 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   AIANNHCE  484 non-null    object  
 1   TRSUBCE   484 non-null    object  
 2   TRSUBNS   484 non-null    object  
 3   GEOID     484 non-null    object  
 4   NAME      484 non-null    object  
 5   NAMELSAD  484 non-null    object  
 6   LSAD      484 non-null    object  
 7   CLASSFP   484 non-null    object  
 8   MTFCC     484 non-null    object  
 9   FUNCSTAT  484 non-null    object  
 10  ALAND     484 non-null    int64   
 11  AWATER    484 non-null    int64   
 12  INTPTLAT  484 non-null    object  
 13  INTPTLON  484 non-null    object  
 14  geometry  484 non-null    geometry
dtypes: geometry(1), int64(2), object(12)
memory usage: 56.8+ KB


<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-respond"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Reflect and Respond</div></div><div class="callout-body-container callout-body"><p>What is the data type of the <code>AIANNHCE</code> column? How will
that affect your code?</p></div></div>

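The `AIANNHCE` column has the `object` dtype, which here means its values are text strings, not integers as I thought. That requires one change: the value has to be quoted, because comparing text to the integer 1310 would match no rows. Otherwise the conditional still works.

We'll need to use `aitsn_gdf.AIANNHCE == '1310'` instead.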
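
To see why the quotes matter, here is a small sketch with a toy `pandas` Series standing in for the `AIANNHCE` column. The four values are copied from the table above, but the Series itself is made up for illustration.

```python
import pandas as pd

# Toy stand-in for the AIANNHCE column, stored as text (object dtype).
codes = pd.Series(
    ["2430", "1310", "1310", "4290"], dtype=object, name="AIANNHCE")

# Comparing text to an integer matches nothing, and raises no error.
int_mask = codes == 1310

# Comparing text to text finds the rows we want.
str_mask = codes == "1310"

print(codes.dtype)
print(int_mask.sum(), str_mask.sum())
```

The integer comparison fails silently, which is why checking the dtype with `.info()` before filtering is worth the extra step.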

Let’s go ahead and select the Gila River subdivisions, and make a site
map.

<link rel="stylesheet" type="text/css" href="./assets/styles.css"><div class="callout callout-style-default callout-titled callout-task"><div class="callout-header"><div class="callout-icon-container"><i class="callout-icon"></i></div><div class="callout-title-container flex-fill">Try It</div></div><div class="callout-body-container callout-body"><ol type="1">
<li>Replace <code>identifier</code> with the value you found from
exploring the interactive map. Make sure that you are using the correct
<strong>data type</strong>!</li>
<li>Change the plot to have a web tile basemap, and look the way you
want it to.</li>
</ol></div></div>

In [8]:
# Select and merge the subdivisions you want
gdf = aitsn_gdf.loc[aitsn_gdf.AIANNHCE == '1310'].dissolve()

# Plot the results with web tile images
gdf.hvplot(
    geo=True, tiles="EsriImagery",
    frame_width=500,
    legend=False, fill_color=None, edge_color='white',
    hover_cols='all'
)
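
As a check on what the select-and-dissolve step does, here is a minimal sketch on toy data. The squares, their coordinates, and the "District 2" row are invented for the demo; only the column names and the `'1310'` filter mirror the cell above.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy subdivisions: two adjacent squares share one code, a third is elsewhere.
toy_gdf = gpd.GeoDataFrame(
    {
        "AIANNHCE": ["1310", "1310", "2430"],
        "NAMELSAD": ["District 1", "District 2", "Kayenta Chapter"],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(5, 5, 6, 6)],
    crs="EPSG:4326",
)

# Keep only the rows for one code, then merge their shapes into one row.
selected = toy_gdf.loc[toy_gdf.AIANNHCE == "1310"]
merged = selected.dissolve()

print(len(selected), len(merged))
print(merged.geometry.iloc[0].bounds)
```

Called with no arguments, `.dissolve()` combines every selected geometry into a single row, so the result is one outline rather than one shape per district. The non-geometry columns in that row come from the first selected row by default, so they no longer describe each district individually.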



# STEP -1: Wrap up

Don’t forget to store your variables so you can use them in other
notebooks! Replace `var1` and `var2` with the variables you want to save,
separated by spaces.

In [10]:
%store aitsn_gdf project gdf

Stored 'aitsn_gdf' (GeoDataFrame)
Stored 'project' (Project)
Stored 'gdf' (GeoDataFrame)


Finally, be sure to `Restart` and `Run all` to make sure your notebook
works all the way through!