# Mapping and Basic Spatial Analysis with GeoPandas
**Introduction to Python Programming for Earth Scientists**, session #13, 11 October 2023

## Goals
* Display maps from a GeoPandas data frame
* Convert data frames to other projections
* Calculate areas of rows in data frames

Last class we looked at the Pandas library, which organizes data into "data frames".  Today we are going to look at the GeoPandas library.  This extends the pandas data frame to include a 'geometry' column, which contains spatial information.  First, let's import the geopandas library and load a shapefile (a file format containing geographic information) as a data frame.

In [None]:
# set up the modules we will use
# you will need to be in the Develop kernel

import geopandas as gpd
import os
import math

## What is the GeoPandas Module?
GeoPandas is a Python module that extends the concept of a pandas dataframe with support for a column containing geometry (points, lines, or polygons) and operations for working with geospatial data.
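To make the idea of a geometry column concrete, here is a minimal sketch that builds a tiny GeoDataFrame by hand. The names and coordinates are made up for illustration and do not come from the Colorado dataset used in this session.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Hypothetical rows: one ordinary attribute column plus one shapely geometry per row
toy_gdf = gpd.GeoDataFrame(
    {"name": ["sample site", "toy unit"]},
    geometry=[
        Point(-105.0, 39.7),
        Polygon([(-106, 39), (-105, 39), (-105, 40), (-106, 40)]),
    ],
    crs="EPSG:4326",  # coordinates are longitude/latitude in degrees
)

print(toy_gdf)
# Each entry in the geometry column knows what kind of shape it is
print(toy_gdf.geometry.geom_type.tolist())
```

Apart from the geometry column, `toy_gdf` behaves like a regular pandas data frame.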

## Reading Files
Like with Pandas, we will read in a file with a function that returns a GeoPandas object.  Instead of using a `.csv` file, we will be using a "shapefile".  This typically contains:
* geometric objects
* a table with information about each object
* Spatial and other metadata information 

These are stored in separate files with the same name and different extensions.  Conventionally the files are worked with together and referenced by the `.shp` extension.

To make things simpler, geopandas can load all the files together from a zip.  Loading shapefiles in geopandas is done with the `read_file` function.
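If you are curious what a zipped shapefile holds, the standard-library `zipfile` module can list its members. The sketch below builds a stand-in archive in memory so that it runs anywhere; the member names are hypothetical, and the four extensions shown (`.shp` geometry, `.shx` index, `.dbf` attribute table, `.prj` projection) are the ones a shapefile commonly includes.

```python
import io
import zipfile

# Build a stand-in archive in memory; the member names are hypothetical
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for extension in (".shp", ".shx", ".dbf", ".prj"):
        archive.writestr("toy_geology" + extension, b"")

# List what is inside, the same way you could inspect a real zipped shapefile
with zipfile.ZipFile(buffer) as archive:
    member_names = archive.namelist()

for name in member_names:
    print(name)
```

To inspect a real archive, pass its path to `zipfile.ZipFile` instead of the in-memory buffer.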

In [None]:
# here we set the path name as a variable
geology_path="geology_a_co.zip"

In [None]:
# create a new geo dataframe
geo_df = gpd.read_file(geology_path)

Now we can view our data frame just like a pandas data frame

In [None]:
geo_df.head()

The data frame has 7335 rows, which correspond to different mapped geologic units across the state.  There are a variety of columns that describe different aspects of the units.  Some, like UNIT_AGE and ROCKTYPE1, are more obvious than others (like FIPS_C).  There is also the geometry column.  That's what makes this a geo data frame!

## Filtering data

We can filter this like we would a pandas dataframe.  This syntax is valid for Pandas AND GeoPandas dataframes.

In [None]:
# Select all of the rows of the data frame where the ROCKTYPE1 field is sandstone, 
# and print the head
sandstone_df = geo_df[geo_df['ROCKTYPE1']=='sandstone']
sandstone_df.head()

As you can see, even though we have a new data frame it keeps the row indexing of the original data frame.
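A small sketch of that index behavior, using a made-up three-row data frame rather than the geology data. If you would rather have the filtered rows renumbered from zero, `reset_index(drop=True)` is one way to do it.

```python
import pandas as pd

# Toy stand-in for the geology table
toy_df = pd.DataFrame({"ROCKTYPE1": ["sandstone", "granite", "sandstone"]})

toy_sandstone = toy_df[toy_df["ROCKTYPE1"] == "sandstone"]
# The filtered frame keeps the original row labels (0 and 2)
print(toy_sandstone.index.tolist())

# Renumber the rows from zero, discarding the old labels
renumbered = toy_sandstone.reset_index(drop=True)
print(renumbered.index.tolist())
```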

### A peek under the hood of data frame selection
This syntax `df[df["key"] == value]` is a little odd, but it helps to understand exactly what's happening, particularly what the thing we are passing as our data frame selector (`df['key'] == value`) is.

First, when we select a key in a data frame, we are returning that column:

In [None]:
# Select a key
geo_df['ROCKTYPE1']

Here we have the rock type value for every row.  Now what happens if we use this in a boolean expression, as we did when we selected all of the sandstone units?

In [None]:
geo_df['ROCKTYPE1']=='sandstone'

You can see we get a data frame column of booleans.  This acts as the filter that we pass to our data frame.  Whenever we pass a column of booleans to a data frame selection (instead of, say, a column name), it returns only the rows that are `True` in the boolean filter.  We could construct our own list of booleans and pass that, and it would work the same way.
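Here is a minimal sketch of that last point on a made-up four-row data frame: a handwritten list of booleans selects the same rows as the comparison does. The list must have one entry per row.

```python
import pandas as pd

# Toy stand-in for the geology table
toy_df = pd.DataFrame(
    {"ROCKTYPE1": ["sandstone", "granite", "shale", "sandstone"]}
)

# A boolean column produced by the comparison
mask_from_comparison = toy_df["ROCKTYPE1"] == "sandstone"
# The same filter written out by hand
handmade_mask = [True, False, False, True]

selected_a = toy_df[mask_from_comparison]
selected_b = toy_df[handmade_mask]

# Both selections contain the same rows
print(selected_a.equals(selected_b))
```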

 ## <font color = green> IN-CLASS PRACTICE </font> 
 Can you print out the number of granite units in Colorado?

In [None]:
# Select the granite units from the data frame

# Get the total number of units and print that value.

## Mapmaking with Geopandas
This is a GEOdata frame so we can do GEOGRAPHIC things, like make maps.  To do this, we call the 'plot' method of our data frame

In [None]:
geo_df.plot()

Not the most exciting map, but you can see the unit boundaries.  Also note the axes.  The numbers represent degrees of latitude and longitude, like we'd expect for a map.  

Let's get a little more detail on this by coloring by rock type.  We do this with the `column` parameter of the plot method.  We'll also include a parameter to add a legend.

In [None]:
geo_df.plot(column='ROCKTYPE1', legend=True)

Very pretty but this legend is in the way!  How could I move it?
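One possible approach, sketched on a toy data frame: `plot` accepts a `legend_kwds` dictionary that is passed along to Matplotlib's legend, so options such as `loc` and `bbox_to_anchor` can push the legend outside the map. The two square polygons here are hypothetical stand-ins for the geology units, and the exact placement values are a matter of taste.

```python
import geopandas as gpd
from shapely.geometry import box

# Two made-up square "units" standing in for the real geology polygons
toy_gdf = gpd.GeoDataFrame(
    {"ROCKTYPE1": ["sandstone", "granite"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Anchor the legend's upper-left corner just outside the right edge of the axes
ax = toy_gdf.plot(
    column="ROCKTYPE1",
    legend=True,
    legend_kwds={"loc": "upper left", "bbox_to_anchor": (1, 1)},
)
```

The same `legend_kwds` argument can be added to the `geo_df.plot(...)` call above.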

 ## <font color = green> IN-CLASS PRACTICE </font> 
Can you make a plot of just the sandstone units of Colorado colored by age?

## Applying functions to columns

This is great, and we can easily pick out rocks with the same age, but it's not immediately clear which rocks are older than the others if you're not familiar with the geologic time scale.  What if we want to assign age a NUMBER instead of a name?  I have a file mapping geologic periods to an age (in millions of years).  Note that I've chosen ages that are related to the stratigraphic ages given, but there is no perfect mapping.  The numbers should be taken with a grain of salt, and are for this exercise only.  Let's read that in.

In [None]:
# open my csv file and read the lines in as a list
# (the with block closes the file for us when we are done)
with open('timescale.csv', 'r') as timescale_file:
    ts_lines = timescale_file.readlines()
# pull out the header
header = ts_lines[0]
# make an empty dictionary
ts_dict = {}
# loop over my line list, starting with the second line (index 1)
for line in ts_lines[1:]:
    # split the line into a period and an age (both still strings)
    period, age = line.split(',')
    # add that key value pair to my dictionary, converting the age to a float
    ts_dict[period] = float(age)
print(ts_dict)
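For comparison, the same kind of dictionary can be built with the standard-library `csv` module, which handles the splitting for us. This is only a sketch: the in-memory text below is a made-up stand-in for `timescale.csv`, and its header and rows are placeholders, not the real file contents.

```python
import csv
import io

# Stand-in for timescale.csv; the header and rows are made-up placeholders
toy_csv = io.StringIO("period,age\nToy Period A,10\nToy Period B,250.5\n")

reader = csv.reader(toy_csv)
next(reader)  # skip the header row

# Each remaining row is a [period, age] pair of strings
toy_ts_dict = {period: float(age) for period, age in reader}
print(toy_ts_dict)
```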


Now we can use this dictionary to assign a new column to our data frame.  To do this we need a function that uses our dictionary to map values.  I've made a simple one below.

In [None]:
def get_my_age(age_string):
    return ts_dict[str(age_string)]

Now we can use the apply method to run the function on every row of our data frame

In [None]:
geo_df['age (ma)'] = geo_df['UNIT_AGE'].apply(get_my_age)

What this line does is:
1. `geo_df['UNIT_AGE']` - pull out the age column from the data frame
2. `.apply(get_my_age)` - run the `get_my_age` function on every row of 'UNIT_AGE' column
3. `geo_df['age (ma)'] =` - Assign the result of everything on the right side of the equals sign to the 'age (ma)' column of the geo_df data frame, making a new column if necessary.
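The three steps above can be tried on a toy data frame. The period names and ages here are made-up placeholders, and `lookup_age` is a hypothetical stand-in for `get_my_age`. The sketch also shows `Series.map`, which accepts a dictionary directly and gives the same values in this case.

```python
import pandas as pd

# Made-up placeholder ages, standing in for the real timescale dictionary
toy_ages = {"Toy Period A": 10.0, "Toy Period B": 250.5}
toy_df = pd.DataFrame(
    {"UNIT_AGE": ["Toy Period B", "Toy Period A", "Toy Period B"]}
)

def lookup_age(age_string):
    # Look up the numeric age for one period name
    return toy_ages[str(age_string)]

# Steps 1-3: pull out the column, apply the function, assign a new column
toy_df["age (ma)"] = toy_df["UNIT_AGE"].apply(lookup_age)

# Alternative: map the column through the dictionary without a helper function
mapped = toy_df["UNIT_AGE"].map(toy_ages)

print(toy_df)
print(toy_df["age (ma)"].tolist() == mapped.tolist())
```

One difference worth knowing: a period name missing from the dictionary raises a `KeyError` in the `apply` version, while `map` quietly fills in `NaN`.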

And now we can plot that numerical value

In [None]:
geo_df.plot(column='age (ma)', legend=True,cmap='OrRd')

Wow!  Look at all those OLD rocks.  It turns out the oldest rocks on the map correspond to units mapped as being from the Early Proterozoic.  We can view just them like this:

In [None]:
geo_df[geo_df['UNIT_AGE']=='Early Proterozoic'].plot(column='UNIT_AGE', legend=True)

I wonder how much of Colorado is OLD rock.  Well, it turns out that geopandas has an area attribute to get that information from each geometry!

In [None]:
geo_df.area

Pay attention to these SMALL areas and that warning!  The coordinates of our data frame are latitude and longitude.  This is telling us that a unit has an area of something like 0.012 square...degrees?  This isn't a good way to think about area.  As the warning indicates, we want to "project" this data into a coordinate system that uses meters as its unit.  We're going to use UTM zone 13N (EPSG 32613).  Don't worry too much about what this means.
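To see why the projection matters, here is a small sketch with a single made-up polygon: a box one degree wide and one degree tall, placed inside UTM zone 13N. In latitude/longitude its "area" is just 1 square degree, while after projecting it comes out as several thousand square kilometers.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical 1-degree by 1-degree box (min lon, min lat, max lon, max lat)
toy_gdf = gpd.GeoDataFrame(
    {"name": ["toy box"]},
    geometry=[box(-106, 39, -105, 40)],
    crs="EPSG:4326",
)

# Area in the geographic CRS: square degrees (this triggers the same warning)
area_in_square_degrees = toy_gdf.area.iloc[0]

# Project to UTM zone 13N, where coordinates are in meters
toy_utm = toy_gdf.to_crs(epsg=32613)
area_in_square_km = toy_utm.area.iloc[0] / 1e6  # square meters to square kilometers

print(area_in_square_degrees)
print(round(area_in_square_km))
```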

In [None]:
geo_df_utm=geo_df.to_crs(epsg=32613)

In [None]:
geo_df_utm.area

That looks much more reasonable!  Now let's get a total area of Early Proterozoic rocks.

In [None]:
early_proto_area = geo_df_utm[geo_df_utm['UNIT_AGE']=='Early Proterozoic'].area.sum()
print("There are", early_proto_area, "square meters of Early Proterozoic rocks in Colorado")

 ## <font color = green> IN-CLASS PRACTICE </font> 
I don't actually know how many square meters Colorado is.  What percentage of Colorado is that?


In [None]:
# Calculate and print the percentage of Colorado that is Early Proterozoic rock
# Print no more than two decimal places

Now what is the most common type of rock in Colorado by area?

In [None]:
# Figure out what rock type is most common