# Lab 2

## Spatial Object and Data Types (Vector)

To start off, we need to import some pivotal packages you will be working with this semester: shapely, pandas, and geopandas.

Shapely documentation: https://shapely.readthedocs.io/en/latest/ 

Pandas documentation: https://pandas.pydata.org/

GeoPandas documentation: http://geopandas.org/

Shapely is the all-around geometrical library, Pandas is great for working with tabular data, and GeoPandas is its spatial sibling!

In [None]:
import os
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point, LineString, Polygon

If you get an error like "No module named 'geopandas'", you need to install that package. Please look in the guide folder for the _download package_ file and follow its instructions for each package you need. Most likely this is just geopandas, but possibly shapely too. Use pip!

Now that we have all the packages loaded correctly, let's try accessing documentation another way. From within Python it is possible to pull documentation out of the package itself. Let's look at a few examples here.
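Here is a minimal sketch of such a cell. The particular objects passed to help() are only examples; any class, function, or method works the same way.

```python
import geopandas as gpd
from shapely.geometry import Point

# Documentation for a whole class (this prints a long page)
help(gpd.GeoDataFrame)

# Dot notation narrows help() down to one function or class
help(gpd.read_file)
help(Point)

# The same text is stored on the object itself as its docstring
print(gpd.read_file.__doc__[:300])
```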

As you can see, accessing documentation is as simple as using the help() function. You can then dive into specific classes and functions within a library by using dot notation. So to access documentation on the GeoDataFrame object we use help(gpd.GeoDataFrame); note that the name is case-sensitive.

### Spatial Objects

As we discussed in the previous lecture, vector data includes three basic types of spatial objects. Let's get started with the first one: points.

### Points

A point is, very simply, a discrete location defined geographically by its coordinates. We will discuss coordinate systems at a later point, but for now the key information that defines a point is:
* Latitude: north/south position, from -90 to 90 degrees (0 to 90 degrees north or south of the equator)
* Longitude: east/west position, from -180 to 180 degrees (0 to 180 degrees east or west of the prime meridian)


You can create a point object using Shapely.

In [None]:
my_first_point = Point(7.4, 1.3)

While this is not something you will typically do as your programming develops, it is always good to make sure you know exactly what type of object you are dealing with, particularly right after creating one. For any object in Python you can use the built-in function:

type(parameter)

This returns the type of whatever object you pass in. Pretty basic, but it helps you confirm that Python actually created the kind of object you intended.

In [None]:
print(type(my_first_point))

The Point() class from the shapely package is used to create a point. You can pass it x and y coordinates, and optionally a z coordinate for three-dimensional space. For now we will keep things simple with just x and y.

Point objects store a variety of attributes. To access the coordinates, for instance, you can use .xy, or just .x or .y.

In [None]:
my_first_point.x

In [None]:
my_first_point.y

Anyone remember junior high or high school algebra? x always goes first and y always goes second. So when we wrote Point(7.4, 1.3) we were really telling Python Point(x, y). When we start projecting points onto a coordinate plane (a map), you can think of these as longitude (x) and latitude (y). As mentioned before, longitude is east/west and latitude is north/south.
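To check the x-first order yourself, and to try the optional z coordinate, you could run something like the following. The z value of 2.0 is arbitrary and only there for illustration.

```python
from shapely.geometry import Point

my_first_point = Point(7.4, 1.3)

# x comes first, y second
print(my_first_point.x, my_first_point.y)   # 7.4 1.3
print(list(my_first_point.coords))          # [(7.4, 1.3)]

# Passing a third value creates a 3D point with a z coordinate
point_3d = Point(7.4, 1.3, 2.0)
print(point_3d.has_z, point_3d.z)           # True 2.0
```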

### Lines

Lines are the next basic type of spatial data. They can represent many things, like railroads, highways, and rivers. Let's start by making our own line. In shapely, lines are called LineString.

In [None]:
my_first_line = LineString([(0, 2), (2, 2), (4, 2)])

Let's print the type to make sure it is the right object.

In [None]:
print(type(my_first_line))

Lines also have a variety of attributes. Since a line is, in essence, an ordered list of points (stored as coordinate tuples), Python indexing works on it. The .xy attribute returns two sequences, the x values and the y values, so we can get all of the x values by taking the element at index 0. (Remember, 0 is always the first element.)

In [None]:
first_line_xs = my_first_line.xy[0]

In [None]:
first_line_xs

As you can see here all of the first values within each tuple [(x1,y1), (x2,y2), (x3,y3)] are printed.  So our array is [x1, x2, x3].
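To pull out a single vertex rather than a whole column of x values, you can index the line's .coords sequence. A short sketch using the same line:

```python
from shapely.geometry import LineString

my_first_line = LineString([(0, 2), (2, 2), (4, 2)])

# .coords holds the vertices as (x, y) tuples, in order
print(list(my_first_line.coords))   # [(0.0, 2.0), (2.0, 2.0), (4.0, 2.0)]
print(my_first_line.coords[0])      # (0.0, 2.0), the first vertex
print(my_first_line.coords[-1])     # (4.0, 2.0), the last vertex

# .xy splits the same vertices into x values and y values
xs, ys = my_first_line.xy
print(list(xs), list(ys))           # [0.0, 2.0, 4.0] [2.0, 2.0, 2.0]

# Lines also know their own length
print(my_first_line.length)         # 4.0
```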

### Polygons

Finally, a major feature in spatial data is the polygon. Polygons are the units into which most data is aggregated; as we talked about before, these include things like countries, states, or counties. Let's make our first polygon object.

In [None]:
my_first_polygon = Polygon([(2, 2), (6, 8), (-4, 8)])

But wait! I thought lines were lists of tuple pairs; how can a polygon use the same format? Good question! The answer lies in the class: LineString vs. Polygon. When we specify a LineString, we are telling Python that the path begins at the first coordinate pair, proceeds through each subsequent point, and ends at the last one. It's like a race that starts and ends in different places.

A Polygon, however, is enclosed; think of an oval track. It does not start and end in the same sense as a line because the circuit (any EE/CE people?) is closed. When we specify a Polygon, we are telling Python that these coordinates define the boundary of a bounded shape, and the last point connects back to the first.
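A quick way to see the difference: the polygon's exterior ring repeats its first coordinate at the end, while a LineString built from the same tuples stays open. This check also computes the triangle's area.

```python
from shapely.geometry import LineString, Polygon

coords = [(2, 2), (6, 8), (-4, 8)]
my_first_polygon = Polygon(coords)
open_line = LineString(coords)

# The exterior ring gains a fourth coordinate equal to the first one
print(list(my_first_polygon.exterior.coords))
# [(2.0, 2.0), (6.0, 8.0), (-4.0, 8.0), (2.0, 2.0)]

print(my_first_polygon.exterior.is_closed)   # True
print(open_line.is_closed)                   # False

# Only the closed shape encloses an area:
# base 10 (x from -4 to 6 along y = 8) times height 6 (y from 2 to 8), halved
print(my_first_polygon.area)                 # 30.0
print(open_line.area)                        # 0.0
```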

Let's check the type to make sure we got it right.

In [None]:
print(type(my_first_polygon))
print(my_first_polygon.geom_type)

Great, so we have now defined our first point, line, and polygon.

Let's put it all together and plot it!

We are going to skip over the specifics of matplotlib again for now, but just know:
1. It's a plotting library.
2. %matplotlib notebook lets you use this library in a particularly useful, interactive way when running an IPython kernel.

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt

In [None]:
fig, ax = plt.subplots(1, figsize=(4, 4))
## Point: a single x, y pair
ax.scatter(my_first_point.x, my_first_point.y)
## Line: all x values, then all y values
ax.plot(my_first_line.xy[0], my_first_line.xy[1])
## Polygon: trace the exterior ring
ax.plot(my_first_polygon.exterior.xy[0], my_first_polygon.exterior.xy[1])
ax.set_ylabel("Y - Coordinates")
ax.set_xlabel("X - Coordinates")
ax.set_title("My First Spatial Objects Plot")

So what are we looking at here? Let's put all our data together. We know that our:
* point = (7.4, 1.3)
* line = [(0, 2), (2, 2), (4, 2)]
* polygon = [(2, 2), (6, 8), (-4, 8)]

So does our plot show this? Yes! Congrats, you are now basically a GIS expert. Let's go into depth on exactly what we did when plotting these spatial objects.



Notice how for each of the objects we used an attribute to get exactly the feature we wanted.
* For the point we called .x and .y to extract its coordinates.
* For the line we called .xy[0] to get the list of x's, and .xy[1] to get the list of y's.
* For the polygon we only wanted to see the exterior of the object so we called .exterior.xy[0] for all x's and .exterior.xy[1] for all the y's.

Now that we've done the boring stuff, let's transition to some practical applications with spatial objects and different data types.

## Application: Florida

In [None]:
## Check what folder we are currently in
os.getcwd()


In [None]:
## Change into the folder that holds the data for this work.
os.chdir('data/Florida')

In [None]:
os.listdir()

The first thing we want to do is load our base file. We are going to look at Florida for this brief application. Shapefiles have the extension .shp, but a shapefile is really a set of files that share one name, and it only works if its companion files are in the same folder you load it from. The format requires .shx and .dbf files alongside the .shp; a .prj file stores the coordinate system, and others (.xml, .sbn, .sbx, .cpg) are optional.
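Because a missing companion file is a common reason a shapefile fails to load, it can help to check the folder first. The helper below, missing_shapefile_parts, is a hypothetical function written for this lab (it is not part of geopandas). It treats .shx and .dbf as required, since the shapefile format specifies .shp, .shx, and .dbf as mandatory, and .prj as recommended.

```python
import os

# Hypothetical helper, not part of geopandas: report companion files missing for a shapefile
REQUIRED = ['.shx', '.dbf']   # mandatory alongside .shp
RECOMMENDED = ['.prj']        # stores the coordinate reference system

def missing_shapefile_parts(shp_path):
    """Return (missing_required, missing_recommended) extensions for shp_path."""
    base, _ = os.path.splitext(shp_path)
    missing_required = [ext for ext in REQUIRED if not os.path.exists(base + ext)]
    missing_recommended = [ext for ext in RECOMMENDED if not os.path.exists(base + ext)]
    return missing_required, missing_recommended

# Usage, run from inside data/Florida:
# print(missing_shapefile_parts('florida.shp'))   # ([], []) when everything is present
```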

In [None]:
florida = gpd.read_file('florida.shp') 

To load our data, all we need is the gpd.read_file function. We just pass it the name of the shapefile as a string (here in single quotes).

Now that we know a little more about the file types geopandas reads, let's take a look at the data we loaded in as florida. The easiest first step is to plot it.

In [None]:
florida.plot()

Yep, that looks like Florida. Notice that longitude and latitude occupy the x and y axes. Great, we've loaded a polygon; let's see what the data inside our file looks like.

In [None]:
florida.head()

This shapefile comes from Natural Earth. Besides the spatial components of Florida, it stores a variety of other information, like an administrative code, a Wikipedia link, names in other languages, and, most importantly, the geometry column. That column holds the geospatial components. Let's see what else we can do with this data.

In [None]:
## The centroid (geometric center) of Florida
florida.centroid

In [None]:
## The outer bounds of Florida
florida.bounds
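To see what centroid and bounds return without needing the Florida file, here is a sketch on a made-up 2 x 2 square with no CRS. One caution: if florida is stored in longitude/latitude (as the plot suggests), GeoPandas warns that centroids computed in a geographic CRS are likely incorrect, and reprojecting with to_crs to a projected CRS before taking the centroid avoids that.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# A made-up 2 x 2 square with no CRS, so plain planar math applies
square = gpd.GeoDataFrame(geometry=[Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])])

# centroid returns a GeoSeries of Points, one per row
center = square.centroid.iloc[0]
print(center.x, center.y)        # 1.0 1.0

# bounds returns one row per geometry with minx, miny, maxx, maxy columns
print(square.bounds)

# total_bounds returns a single array covering every row
print(square.total_bounds)       # [0. 0. 2. 2.]
```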

Now let's load some more data.

In [None]:
roads = gpd.read_file('florida_roads.shp')

In [None]:
roads.plot()

The roads of Florida do a good job of outlining the state itself. This makes sense, as many of Florida's major cities are on the coast (Orlando being an exception), so the road system tends to connect these coastal cities. Let's see what this data looks like.

In [None]:
roads.head()

Hmm, interesting. It looks like this data contains a number of different types of roads. Let's take a look at them and narrow it down to only the major interstates (like I-4, I-95, and I-10).

In [None]:
roads['type'].unique()

In [None]:
interstates = roads.loc[roads['type'] == "Major Highway"]

In [None]:
interstates.plot()
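The interstates cell relies on a boolean mask: the comparison produces True or False for every row, and .loc keeps only the True rows. Here is a toy version; the column name type and the value "Major Highway" match the lab's data, but the road names and the other type value are made up.

```python
import geopandas as gpd
from shapely.geometry import LineString

# Made-up stand-in for the roads layer
roads = gpd.GeoDataFrame(
    {'name': ['Road A', 'Road B', 'Road C'],
     'type': ['Major Highway', 'Secondary Highway', 'Major Highway']},
    geometry=[LineString([(0, 0), (1, 1)]),
              LineString([(1, 1), (2, 1)]),
              LineString([(2, 1), (3, 3)])],
)

# The comparison yields one True/False per row
mask = roads['type'] == "Major Highway"
print(mask.tolist())                  # [True, False, True]

# .loc keeps only the rows where the mask is True (still a GeoDataFrame)
interstates = roads.loc[mask]
print(interstates['name'].tolist())   # ['Road A', 'Road C']

# .isin() keeps several categories at once; value_counts() tallies each type
both = roads.loc[roads['type'].isin(["Major Highway", "Secondary Highway"])]
print(len(both))                      # 3
print(roads['type'].value_counts())
```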

Alright, now we have a polygon file and a file that contains lines. How about a point file?

A major political issue the United States currently faces is mass shooting events. Because the United States is one of the only countries in the world that experiences these events at such a scale, the issue has become contentious and warrants serious thought.

One attempt to collect information on mass shootings has been undertaken by the investigative news organization _Mother Jones_. As a disclaimer, while I have no reason to believe this database is invalid, I always suggest being vigilant in checking the methodologies used to collect data. Not all data is created equal, and being critical of sources is a valuable quality in any researcher.

In [None]:
mass_shoot = pd.read_csv('mother_jones.csv', encoding='latin-1')

In [None]:
mass_shoot.head(5)

So we have a data set here (a pandas DataFrame) that looks a lot like our previous GeoDataFrame, but its type is not spatial. To make this data spatial like our shapefiles, we need to give it a geometry. Geometries, as you will remember, are the point, line, and polygon units. Luckily, this data set has a couple of columns that are perfect for creating a spatial data frame: longitude and latitude.

In [None]:
mass_geometry = [Point(xy) for xy in zip(mass_shoot.longitude, mass_shoot.latitude)]

Alright, this line of code is a little complex, so let's break it down to make sure it all makes sense.
* [] encloses the entire operation because we want a list of geometric points. Try printing out mass_geometry to see what it looks like.
* Point(xy): each element in the list will be a spatial Point() object.
* for xy: xy is a pair of elements, the x and y coordinates, and we are saying for each of these pairs in...
* zip(): zip pairs up items from two sequences, position by position. For example, if you had first_names = john, joe, jrue and last_names = white, blue, green, zipping them would give (john, white), (joe, blue), and (jrue, green).
* mass_shoot.longitude, mass_shoot.latitude: these are the two columns we pair up to make the geometry. Longitude comes first because it is the x coordinate.
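Here is the zip() example in runnable form, followed by the same geometry-building step on a tiny table. The column names longitude and latitude match the lab's CSV, but the values are invented. gpd.points_from_xy is a GeoPandas shortcut that builds the same points as the list comprehension.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# zip pairs items position by position
first_names = ['john', 'joe', 'jrue']
last_names = ['white', 'blue', 'green']
print(list(zip(first_names, last_names)))
# [('john', 'white'), ('joe', 'blue'), ('jrue', 'green')]

# Made-up stand-in for the Mother Jones table
table = pd.DataFrame({'longitude': [-81.4, -80.2], 'latitude': [28.5, 25.8]})

# Longitude goes first because it is the x coordinate
geometry_list = [Point(xy) for xy in zip(table.longitude, table.latitude)]
geometry_helper = gpd.points_from_xy(table.longitude, table.latitude)

print(geometry_list[0])                              # POINT (-81.4 28.5)
print(geometry_helper[0].equals(geometry_list[0]))   # True
```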

Now that we have a list of geometries, we have all the necessary components to make the GeoDataFrame!

In [None]:
mass_gdf = gpd.GeoDataFrame(mass_shoot, geometry = mass_geometry)

In [None]:
## Tell geopandas the coordinates are longitude/latitude in WGS84 (EPSG:4326)
mass_gdf = mass_gdf.set_crs(epsg=4326)

In [None]:
mass_gdf.geometry
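Layers only line up on one map if they share a coordinate reference system. Below is a sketch of a check you could run before plotting. match_crs is a hypothetical helper, and the EPSG:3857 layer is made up to show a mismatch being fixed with to_crs.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical helper: reproject layer to reference's CRS only if they differ
def match_crs(layer, reference):
    if layer.crs != reference.crs:
        return layer.to_crs(reference.crs)
    return layer

lonlat = gpd.GeoDataFrame(geometry=[Point(0, 0)], crs="EPSG:4326")
web_mercator = gpd.GeoDataFrame(geometry=[Point(0, 0)], crs="EPSG:3857")

aligned = match_crs(web_mercator, lonlat)
print(aligned.crs == lonlat.crs)   # True

# In the lab this might be, for example:
# interstates = match_crs(interstates, florida)
# mass_gdf = match_crs(mass_gdf, florida)
```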

In [None]:
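fig, ax = plt.subplots(1, figsize=(12, 8))
## Draw layers bottom-up: state polygon, then interstates, then shooting points on top
florida.plot(ax=ax, alpha=0.4)
interstates.plot(ax=ax, color='green', alpha=0.4)
## Marker size scales with the number of fatalities at each event
mass_gdf.plot(ax=ax, marker='P', markersize=mass_gdf['fatalities'] * 10, color='red', alpha=0.8)
## Zoom to Florida's extent, since the Mother Jones data covers the whole country
minx, miny, maxx, maxy = florida.total_bounds
ax.set_xlim(minx, maxx)
ax.set_ylim(miny, maxy)
ax.set_aspect('equal')
ax.set_ylabel("Y - Coordinates")
ax.set_xlabel("X - Coordinates")
ax.set_title("Florida Mass Shootings")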
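The data behind this plot still contains every event in the Mother Jones table, including those outside Florida. To keep only events inside the state, a point-in-polygon test works. Here is a sketch on made-up shapes: the square stands in for Florida, and with the real layer you would pass its polygon instead (for example florida.geometry.iloc[0], assuming the file holds a single row).

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

# Made-up stand-ins: a square "state" and three event locations
state = Polygon([(0, 0), (4, 0), (4, 4), (0, 4)])
events = gpd.GeoDataFrame(
    {'fatalities': [3, 5, 2]},
    geometry=[Point(1, 1), Point(5, 5), Point(3, 2)],
)

# within() tests each point against the single polygon, giving True/False per row
inside = events.loc[events.within(state)]
print(inside['fatalities'].tolist())   # [3, 2]
```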