Note: Many sources were used to put this notebook together, and code and comments are often included as-is from the original source. Each source is listed ahead of the content that was taken from it. Thank you to the creators of the many wonderful GeoPandas resources already in existence!

# Exploring Geopandas 

First we need to import our libraries

In [None]:
%matplotlib inline

import matplotlib as mpl
import matplotlib.pyplot as plt

from shapely.geometry import Point

import pandas as pd

import geopandas as gpd
from geopandas import GeoSeries, GeoDataFrame

Let's start by making some simple graphs
Source: https://github.com/geohackweek/tutorial_contents

In [None]:
# Let's check the version of the libraries we're using. Do yours look the same as mine?
mpl.__version__, pd.__version__, gpd.__version__

There are two main data structures in GeoPandas: the GeoSeries and the GeoDataFrame. These are subclasses of the pandas Series and DataFrame, respectively.

In [None]:
# Let's create a GeoSeries, a vector where each entry in the vector is a set of shapes corresponding to one observation.
# We'll use a list of shapely Point objects built with the Point constructor (note: you can also make LineStrings and Polygons)
gs = GeoSeries([Point(-120, 45), Point(-121.2, 46), Point(-122.9, 47.5)])
gs
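The same constructor also accepts other shapely geometry types. As a minimal sketch (the names `coords`, `shapes`, and `geom_types` are made up for illustration), a single GeoSeries can hold a Point, a LineString, and a Polygon built from the same coordinates:

```python
from shapely.geometry import LineString, Point, Polygon
from geopandas import GeoSeries

# Hypothetical coordinates, reusing the (lon, lat) pairs from the points above
coords = [(-120, 45), (-121.2, 46), (-122.9, 47.5)]

shapes = GeoSeries([
    Point(coords[0]),    # a single location
    LineString(coords),  # a path through all three locations
    Polygon(coords),     # the triangle enclosed by the three locations
])

# geom_type reports the kind of geometry stored in each entry
geom_types = list(shapes.geom_type)
print(geom_types)
```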

In [None]:
# Check the type and length of our GeoSeries
type(gs), len(gs)

In [None]:
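# Coordinates are of no use unless you know their reference system. Set the projection/crs to WGS 84, aka EPSG 4326
# (the older {'init': 'epsg:4326'} dictionary syntax is deprecated in recent versions, so we use the string form)
gs.crs = "EPSG:4326"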
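To confirm which reference system is attached, you can inspect the `crs` attribute. The sketch below assumes a recent GeoPandas release in which `crs` is a pyproj `CRS` object with a `to_epsg()` method; the names `no_crs` and `with_crs` are illustrative only.

```python
from shapely.geometry import Point
from geopandas import GeoSeries

# A GeoSeries built without a CRS has its crs attribute set to None
no_crs = GeoSeries([Point(-120, 45)])
print(no_crs.crs)

# The CRS can also be supplied at construction time
with_crs = GeoSeries([Point(-120, 45)], crs="EPSG:4326")

# to_epsg() returns the numeric EPSG code of the attached CRS
print(with_crs.crs.to_epsg())
```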

In [None]:
# We can plot our points with the plot function, with some customizations
gs.plot(marker='*', color='red', markersize=100, figsize=(4, 4))

# We limit the bounds to our area, but this will happen by default
plt.xlim([-123, -119.8])
plt.ylim([44.8, 47.7]);

In [None]:
# Now let's make a GeoDataFrame, a tabular data structure that contains a GeoSeries
# Let's define a simple dictionary of lists, that we’ll use again later.
data = {'name': ['House', 'Work', 'Pet Store'],
        'lat': [45, 46, 47.5],
        'lon': [-120, -121.2, -122.9]}
print(data)

In [None]:
# Review of using dictionaries
print(list(data.keys()))
print(list(data.values()))
print(data['name'])
print(data['name'][1])

In [None]:
# Review the built-in zip function, which pairs up the items of two lists
x = [1, 2, 3]
y = [4, 5, 6]
zipped = zip(x, y)
list(zipped)

In [None]:
# Now we create a list of Point shapely objects out of the X & Y coordinate lists
# Very important - the geometry is what makes the data spatial
geo = [Point(xy) for xy in zip(data['lon'], data['lat'])]
geo

In [None]:
type(geo)

In [None]:
# We’ll wrap up by creating a GeoSeries where we explicitly define the index values
# The index labels each row, so each point can be looked up by name
gs = GeoSeries(geo, index=data['name'])
gs
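Because the index holds labels, a geometry can be retrieved by name rather than by position. A small sketch of that lookup, using the illustrative names `labelled` and `work` (not part of the original notebook):

```python
from shapely.geometry import Point
from geopandas import GeoSeries

names = ['House', 'Work', 'Pet Store']
points = [Point(-120, 45), Point(-121.2, 46), Point(-122.9, 47.5)]

# Attach the names as index labels
labelled = GeoSeries(points, index=names)

# Look up one geometry by its label; the result is a shapely Point
work = labelled.loc['Work']
print(work.x, work.y)
```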

In [None]:
# Create a regular pandas DataFrame from our dictionary (no geometry yet)
df = pd.DataFrame(data)
print(df)

In [None]:
# The DataFrame constructor turns our plain Python list of longitudes...
print(type(data['lon']))
data['lon']

In [None]:
# ...into a column of a pandas DataFrame (each column is a pandas Series)
print(type(df))
df['lon']

In [None]:
# Finally we use the DataFrame and the “list-of-shapely-Point-objects” approach to create a GeoDataFrame. 
gdf = GeoDataFrame(df, geometry=geo)
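As an alternative to building the list of Points by hand, GeoPandas provides `points_from_xy`. The sketch below is one possible variant, not the approach from the original source; `places` and `places_gdf` are illustrative names, and passing `crs="EPSG:4326"` assumes the coordinates are WGS 84 longitude/latitude.

```python
import pandas as pd
import geopandas as gpd

places = pd.DataFrame({'name': ['House', 'Work', 'Pet Store'],
                       'lat': [45, 46, 47.5],
                       'lon': [-120, -121.2, -122.9]})

# points_from_xy takes x (longitude) first, then y (latitude)
places_gdf = gpd.GeoDataFrame(
    places,
    geometry=gpd.points_from_xy(places['lon'], places['lat']),
    crs="EPSG:4326",  # assumption: coordinates are WGS 84 lon/lat
)
print(places_gdf)
```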

In [None]:
# There’s nothing new to visualize, but this time we’re using the plot method from a GeoDataFrame, not from a GeoSeries. 
gdf.plot(marker='*', color='green', markersize=50, figsize=(3, 3))

# Using Geopandas Datasets
Source: http://geopandas.org/mapping.html

We have now made some point objects, but it's more fun to work with real data. Geopandas comes with some datasets that we can use!

In [None]:
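# Load some sample data
# (note: the bundled gpd.datasets module was deprecated in GeoPandas 0.14 and removed in 1.0, so this call needs an older version)
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))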

In [None]:
# Let's examine the top few rows
world.head()

In [None]:
# Plot it
world.plot()

In [None]:
# Let's play with plotting the head and tail 
world.head(10).plot()
# world.tail().plot()
# world.tail(5).plot()

In [None]:
# Plot a few objects by sorting by a column
world.sort_values('pop_est', ascending = False).head(3).plot()

In [None]:
# cities is another geopandas dataset. It includes points for the capitals of each country.
cities = gpd.read_file(gpd.datasets.get_path('naturalearth_cities'))

In [None]:
# Again we'll look at the top few rows
cities.head()

In [None]:
# Plot the cities using the default style
cities.plot()

In [None]:
# Plot the cities using custom style
cities.plot(marker='*', color='green', markersize=5)

YOUR TURN #1: Play around with making the city dots different colors and sizes

In [None]:
# Your code here




In [None]:
# We can exclude Antarctica by name, and drop any rows without a positive population estimate
world = world[(world.pop_est>0) & (world.name!="Antarctica") & (world.name!="Fr. S. Antarctic Lands")]
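The filter above combines several boolean conditions with `&`. A toy sketch of the same pattern on a plain pandas DataFrame, where the table `toy` and its values are invented purely for illustration:

```python
import pandas as pd

# Invented data, only to show how the boolean mask behaves
toy = pd.DataFrame({'name': ['Atlantis', 'Antarctica', 'Erewhon'],
                    'pop_est': [500, 4000, 0]})

# Each comparison yields a True/False Series; & keeps rows where both are True
keep = (toy['pop_est'] > 0) & (toy['name'] != 'Antarctica')

# Indexing with the mask keeps only the rows marked True
filtered = toy[keep]
print(list(filtered['name']))
```

The parentheses around each comparison are needed because `&` binds more tightly than `>` and `!=` in Python.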

In [None]:
world.plot()

In [None]:
# The data came with a gdp_md_est (estimated GDP, in millions of US dollars) and a pop_est (estimated population) column,
# so we can use this data to make a new column, gdp_per_cap (GDP per capita, here in millions of dollars per person).
# Create a new column named `gdp_per_cap` by dividing one existing column by the other.
world['gdp_per_cap'] = world.gdp_md_est / world.pop_est
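To make the units concrete, here is a worked sketch with invented numbers. It assumes, as the column name suggests, that `gdp_md_est` is expressed in millions of US dollars; the table `toy` and the extra column `gdp_per_cap_usd` are hypothetical.

```python
import pandas as pd

# Invented figures: GDP in millions of dollars, population in people
toy = pd.DataFrame({'name': ['A', 'B'],
                    'gdp_md_est': [50_000.0, 20_000.0],
                    'pop_est': [1_000_000, 4_000_000]})

# Same calculation as above: millions of dollars per person
toy['gdp_per_cap'] = toy['gdp_md_est'] / toy['pop_est']

# Convert to plain dollars per person by scaling GDP up by one million first
toy['gdp_per_cap_usd'] = toy['gdp_md_est'] * 1_000_000 / toy['pop_est']
print(toy)
```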

In [None]:
# Let's take a look at the new column
world.head()

In [None]:
# What are the 5 countries with the highest GDP per capita?
world.sort_values('gdp_per_cap', ascending = False).head()

In [None]:
# We can plot the map, coloring our countries by their gdp_per_cap value, creating a choropleth map
world.plot(column='gdp_per_cap')

GeoPandas uses color schemes from [Color Brewer](http://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3). Check it out!

In [None]:
# We can change the style using the cmap (short for `color map`) parameter
world.plot(column='gdp_per_cap', cmap='Oranges')

# Try setting the cmap parameter to YlGn

See colormap options [here](https://matplotlib.org/users/colormaps.html).

In [None]:
# You can change the default classification scheme. Note that I also changed the cmap, for fun
base = world.plot(column='gdp_per_cap', cmap='BuPu', scheme='quantiles')

YOUR TURN #2: The scheme option can be set to 'equal_interval', 'quantiles', or 'fisher_jenks'. 
Try out each one. See the difference?

More info on classification schemes [here](http://pysal.readthedocs.io/en/latest/library/esda/mapclassify.html).
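To get a feel for why the schemes produce different maps, the sketch below imitates two of them with plain pandas: `pd.cut` for equal-width intervals and `pd.qcut` for quantiles. This is only an analogy under the assumption that the schemes behave like these binning functions; it is not the code GeoPandas runs internally, and the values are invented.

```python
import pandas as pd

# Invented, heavily skewed values (one large outlier)
values = [1, 2, 3, 4, 5, 6, 7, 100]

# Equal interval: split the range 1..100 into 4 bins of equal width
equal_bins = pd.cut(values, bins=4, labels=False)

# Quantiles: choose bin edges so each bin holds the same number of values
quantile_bins = pd.qcut(values, q=4, labels=False)

print(list(equal_bins))
print(list(quantile_bins))
```

With skewed data, equal-width bins put almost everything in the lowest class, while quantile bins spread the values evenly across classes.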

In [None]:
# Your code here




In [None]:
# We can plot the cities on top of our new choropleth map

# Create a variable to hold our choropleth map, call it base
base = world.plot(column='gdp_per_cap', cmap='OrRd', scheme='quantiles')

# Now when you plot the cities, set the ax argument to the variable you just created
cities.plot(ax=base, marker='o', color='black', markersize=5)

# Now you will get them both on the same map

Geopandas is geo-enabled [Pandas](https://pandas.pydata.org/), a Python data science library, so we have everything that comes with Pandas already!

In [None]:
# Make a histogram of gdp_per_cap
fig, ax = plt.subplots(figsize=(10,8))
_ = ax.hist(world['gdp_per_cap'], bins=40)
ax.set_title("Distribution of GDP per capita")

In [None]:
# Bar chart of the 5 countries with the highest gdp_per_cap
fig, ax = plt.subplots(figsize=(10,8))
# Sort once, then take both the names and the values from the same 5 rows
top5 = world.sort_values('gdp_per_cap', ascending=False).head(5)
ax.bar(top5['name'], top5['gdp_per_cap'])

# Managing Projections

Source: http://geopandas.org/projections.html

We saw before how we can set a projection. We can also check a projection and re-project.

When you are doing spatial analysis all of your data MUST be in the same coordinate reference system!
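One way to check this before combining two layers is to compare their `crs` attributes and re-project one layer onto the other if they differ. A minimal sketch, in which `layer_a` and `layer_b` are made-up single-point layers:

```python
from shapely.geometry import Point
from geopandas import GeoSeries

# Two hypothetical layers: one in WGS 84, one re-projected to World Mercator
layer_a = GeoSeries([Point(-120, 45)], crs="EPSG:4326")
layer_b = GeoSeries([Point(-121.2, 46)], crs="EPSG:4326").to_crs(epsg=3395)

# The layers start out in different coordinate reference systems
same_before = layer_a.crs == layer_b.crs

# Re-project layer_a into whatever CRS layer_b uses
layer_a_aligned = layer_a.to_crs(layer_b.crs)
same_after = layer_a_aligned.crs == layer_b.crs

print(same_before, same_after)
```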

In [None]:
# Check original projection/coordinate system or CRS
world.crs

In [None]:
# It's WGS 84, a geographic (latitude/longitude) coordinate system, AKA 'epsg:4326'
# It can't be shown on a flat surface without distortion; plotting it directly treats longitude and latitude as x and y, which is the Plate Carrée projection
# This is the CRS used by GPS/satellites and GeoJSON
base = world.plot()
base.set_title("WGS84 (lat/lon)")

In [None]:
# Re-project to the Mercator projection (epsg=3395)
world = world.to_crs(epsg=3395)
base = world.plot()
base.set_title("Mercator")
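Re-projecting changes the coordinate values themselves, not just how the map is drawn. As a rough sketch (with the illustrative names `house`, `house_mercator`, `x_m`, and `y_m`), a point given in degrees comes back in metres after conversion to EPSG:3395:

```python
from shapely.geometry import Point
from geopandas import GeoSeries

# A single point in longitude/latitude degrees (WGS 84)
house = GeoSeries([Point(-120, 45)], crs="EPSG:4326")

# The same location expressed in World Mercator, where units are metres
house_mercator = house.to_crs(epsg=3395)

x_m = house_mercator.iloc[0].x
y_m = house_mercator.iloc[0].y
print(x_m, y_m)
```

The x value is on the order of millions of metres west of the prime meridian, which is why axis labels change so dramatically between the two plots.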

In [None]:
# Re-project to Polar Stereographic (epsg=3995)
world = world.to_crs(epsg=3995)
base = world.plot()
base.set_title("Polar Stereographic")

Try re-running one of the cells above where you created a plot with just `world`. Because we reassigned `world` when re-projecting, the plot will now be in Polar Stereographic!

YOUR TURN #3: Make a basemap with the Mercator projection and add `cities` to it. Hint: you will have to change the crs of `cities`

In [None]:
# Your code here



You might see some streaky lines through your Mercator projection map. These are most likely a rendering artifact of re-projecting the country polygons, not a mistake in your code!