# Creating Web Maps in Python with GeoPandas and Folium

## Introduction


In this post, I demonstrate how to use the Python package `folium` to create a web map from a GeoDataFrame. Folium is built on the Leaflet JavaScript library, which is a great tool for creating interactive web maps. I use Python for all of my data wrangling and analytical tasks, so it's really nice to have web-mapping capabilities within the same environment. Using Folium and GeoPandas together makes this easy to do.

In this example, I plot the point locations of crimes in San Francisco overlaid on a choropleth of census tract crime density. Viewing these two layers together on a web map gives an overall sense of crime distribution while still letting me view individual crime information. As I demonstrate below, these Python packages provide a clean and customizable way of doing this.


In [1]:
#Import the necessary Python modules
import pandas as pd
import geopandas as gpd
from geopandas.tools import sjoin
import folium
from folium.plugins import MarkerCluster
from folium.element import IFrame
import shapely
from shapely.geometry import Point
import unicodedata

## Data Prep
The goal of this section is to create two GeoDataFrames - one of crime points and one of census tract boundaries with crime densities. Both of these will then be plotted on a web map as separate layers.

### Read in Crime Data and Create a GeoDataFrame
First I read a CSV file of San Francisco Police Incidents for the current year into a Pandas DataFrame. I downloaded the raw data from the San Francisco [Open Data Portal](https://data.sfgov.org/). Because there are so many crime incidents, I select a subset of the data: crimes in the "assault" category that were committed in the last 10 days. As shown below, this leaves me with 329 police incidents.

At this point the data is in a Pandas DataFrame, and I want to convert it to a GeoPandas GeoDataFrame (a spatial version of the former). The data comes with lat/long coordinates, which I use to create Shapely Point geometries (these are the values in the "geometry" field for each record). I specify the coordinate system as EPSG 4326, which represents the standard WGS84 coordinate system.


In [2]:
#read in CSV file specifying date field and encoding. Sort by date
all_crime = pd.read_csv('SFPD_Incidents_-_Current_Year__2016_.csv', parse_dates=['Date'],\
                        encoding='utf-8').sort_values('Date').reset_index(drop=True)

In [3]:
#Identify those crimes that are categorized as assaults
is_assault = all_crime.Category=='ASSAULT' 

#Identify those crimes that were committed in the most recent 10 days of the dataset
recent = all_crime.Date.isin(all_crime.Date.unique()[-10:]) 

#Subset the data to get assaults committed within the last 10 days
assaults = all_crime[is_assault&recent].drop_duplicates('IncidntNum').reset_index(drop=True)

#Create a GeoSeries of crime locations by converting coordinates to Shapely geometry objects
#Specify the coordinate system EPSG 4326, which represents the standard WGS84 coordinate system
assault_geo = gpd.GeoSeries(assaults.apply(lambda z: Point(z['X'], z['Y']), 1),crs={'init': 'epsg:4326'})

#Create a geodataframe from the pandas dataframe and the geoseries of shapely geometry objects
assault_gdf = gpd.GeoDataFrame(assaults, geometry=assault_geo)
print('{} assaults in the last 10 days'.format(len(assault_gdf)))
assault_gdf.head()

329 assaults in the last 10 days


Unnamed: 0,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId,geometry
0,160885198,ASSAULT,THREATS AGAINST LIFE,Sunday,2016-10-30,14:20,TENDERLOIN,NONE,0 Block of MCALLISTER ST,-122.412597,37.781119,"(37.7811192121542, -122.412596970637)",16088519819057,POINT (-122.412596970637 37.7811192121542)
1,160883142,ASSAULT,BATTERY OF A POLICE OFFICER,Sunday,2016-10-30,02:00,CENTRAL,"ARREST, BOOKED",400 Block of BROADWAY ST,-122.405065,37.798013,"(37.7980134745487, -122.405065483077)",16088314204154,POINT (-122.405065483077 37.7980134745487)
2,160883158,ASSAULT,BATTERY,Sunday,2016-10-30,02:00,PARK,"ARREST, BOOKED",500 Block of BUENAVISTAWEST AV,-122.443282,37.766458,"(37.7664576548857, -122.443281739239)",16088315804134,POINT (-122.443281739239 37.76645765488571)
3,160883073,ASSAULT,BATTERY,Sunday,2016-10-30,01:35,CENTRAL,"ARREST, BOOKED",400 Block of POWELL ST,-122.408432,37.788777,"(37.7887772719153, -122.408431861057)",16088307304134,POINT (-122.408431861057 37.7887772719153)
4,160882928,ASSAULT,BATTERY OF A POLICE OFFICER,Sunday,2016-10-30,00:29,MISSION,"ARREST, BOOKED",0 Block of HOFF ST,-122.420576,37.764182,"(37.7641819463712, -122.420575720933)",16088292804154,POINT (-122.420575720933 37.7641819463712)
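
One detail of the date filter above is worth spelling out: it keeps rows whose date is among the 10 most recent distinct dates present in the file, which is not necessarily the same as the last 10 calendar days if some days have no records. A minimal standard-library sketch of that selection logic, using made-up records and a hypothetical helper name (`most_recent_dates`), with a window of 2 dates instead of 10 to keep it short:

```python
from datetime import date

# Made-up records for illustration only: (incident id, date)
records = [
    ("a", date(2016, 10, 1)),
    ("b", date(2016, 10, 1)),
    ("c", date(2016, 10, 5)),
    ("d", date(2016, 10, 20)),
    ("e", date(2016, 10, 30)),
]

def most_recent_dates(rows, n):
    """Return the set of the n latest distinct dates present in rows."""
    distinct = sorted({d for _, d in rows})
    return set(distinct[-n:])

# Keep only the records that fall on one of the 2 latest distinct dates
keep = most_recent_dates(records, 2)
recent_ids = [rid for rid, d in records if d in keep]
print(recent_ids)
```

Here the two dates kept are October 20 and October 30, even though they are 10 calendar days apart.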


### Calculate Census Tract Crime Density
Next I read in a Shapefile of Census Tracts in San Francisco, which I also downloaded from the SF Open Data Portal. GeoPandas can read a Shapefile directly into a GeoDataFrame. Then, in one line of code, I spatially join assaults to census tracts (determining the census tract of each assault) and generate counts of assaults per census tract. Note that I use the `to_crs` method to convert assaults to the same coordinate system as the census tracts (EPSG 3310) prior to spatially joining them.

Lastly, I calculate the number of assaults per square mile, which is the metric that I'm interested in plotting.

In [4]:
#Read tracts shapefile into GeoDataFrame
tracts = gpd.read_file('sf_census_tracts.shp').set_index('CTFIPS10')
#Generate counts of Assaults per Census Tract
tract_counts = gpd.tools.sjoin(assault_gdf.to_crs(tracts.crs), tracts.reset_index()).groupby('CTFIPS10').size()

#Calculate Assault Density. Note conversion of square meters to square miles.
tracts['AssaultsPSqMi'] = (tract_counts/(tracts.geometry.area*3.86102e-7)).fillna(0)
tracts = tracts.reset_index()
tracts.head()

Unnamed: 0,CTFIPS10,geometry,AssaultsPSqMi
0,6075010100,POLYGON ((-212660.1301711957 -20053.0335317570...,3.423806
1,6075010200,(POLYGON ((-212986.3528985226 -20191.607399463...,19.871972
2,6075010300,POLYGON ((-212512.5989250286 -20763.4272515336...,0.0
3,6075010400,POLYGON ((-211456.8561540585 -20837.2873740978...,7.721341
4,6075010500,POLYGON ((-211050.6276144625 -20707.0181740056...,15.009851
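
The density calculation above divides a count by an area in square meters, converted to square miles with the factor 3.86102e-7. The arithmetic can be checked in isolation with a small sketch; the tract size and assault count below are made up, and `per_square_mile` is a hypothetical helper name:

```python
SQ_MI_PER_SQ_M = 3.86102e-7  # same conversion factor used in the cell above

def per_square_mile(count, area_sq_m):
    """Convert a count over an area in square meters to a per-square-mile density."""
    return count / (area_sq_m * SQ_MI_PER_SQ_M)

# A hypothetical tract of 1 km x 1 km (1,000,000 square meters) with 5 assaults
density = per_square_mile(5, 1000000.0)
print(round(density, 2))

# Sanity check: one event in an area of one square mile (about 2,589,988 square meters)
one_sq_mi = per_square_mile(1, 2589988.11)
print(round(one_sq_mi, 3))
```

A square kilometer is a bit under 0.39 square miles, so 5 assaults in that area works out to roughly 13 per square mile.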


## Using Folium to Plot Data
### Create Choropleth Layer of Tract Crime Density
First I will create a choropleth map of census tract crime density, and then I will add crime points on top of the choropleth. I start by creating a basemap, specifying the starting coordinates and the zoom level. Folium has a number of built-in tilesets from OpenStreetMap, MapQuest, MapBox, and CartoDB, but in this example I use the default, which is OpenStreetMap.



Next I want to map Census Tract crime density as another layer symbolized as a choropleth. Leaflet maps vector geometries as GeoJSON objects, so I first convert the tracts GeoDataFrame to a GeoJSON string using the `to_json()` method. I then specify the dataframe that contains the density data, the ID field, and the density field. There are a few other optional parameters, such as fill color, fill opacity, line opacity, and choropleth break points, that I also specify.


In [15]:
map1 = folium.Map([37.7556, -122.4399], zoom_start=12)


In [16]:
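def ch(map_obj, gdf, id_field, value_field):
    #Leaflet expects WGS84 coordinates, so reproject before converting to a GeoJSON string
    gjson = gdf.to_crs({'init': 'epsg:4326'}).to_json()
    #Compute the break points from the GeoDataFrame passed in rather than the global tracts
    map_obj.choropleth(geo_str = gjson, data = gdf,
                columns = [id_field, value_field], key_on = 'feature.properties.{}'.format(id_field),
                fill_color = 'YlOrRd', fill_opacity = 0.5, line_opacity = 0.2,  
                threshold_scale=folium.utilities.split_six(gdf[value_field]))
    #Return the map so that the result can be assigned
    return map_obj
a=ch(map1, tracts, 'CTFIPS10','AssaultsPSqMi')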
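
The `threshold_scale` argument above supplies the break points that separate the color classes. I have not reproduced the logic of `folium.utilities.split_six` here. The following is only a generic sketch of how six break points could be derived from quantiles of skewed data, using made-up density values and hypothetical helper names (`percentile`, `six_breaks`):

```python
def percentile(sorted_vals, pct):
    """Nearest-rank style percentile of an already sorted list (pct in 0..100)."""
    if not sorted_vals:
        raise ValueError("no values")
    rank = int(round(pct / 100.0 * (len(sorted_vals) - 1)))
    return sorted_vals[rank]

def six_breaks(values):
    """Hypothetical helper: six non-decreasing break points taken from quantiles."""
    vals = sorted(values)
    return [percentile(vals, p) for p in (0, 20, 40, 60, 80, 100)]

# Made-up densities, skewed toward low values like many crime datasets
densities = [0.0, 0.0, 3.4, 7.7, 15.0, 19.9, 42.0, 88.5, 130.2, 410.0, 950.0]
breaks = six_breaks(densities)
print(breaks)
```

Quantile-based breaks put roughly the same number of tracts in each color class, which tends to show more contrast than equal-width intervals when a few tracts have very high values.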

In [34]:
cent=list(tracts.to_crs({'init': 'epsg:4326'}).unary_union.centroid.coords)

In [35]:
cent

[(-122.44089367386786, 37.75588456365734)]
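
Note the coordinate order in this output: the centroid comes back as an (x, y) pair, meaning (longitude, latitude), while `folium.Map` takes its starting location as [latitude, longitude]. A short sketch of the swap, using the value printed above:

```python
# Value printed by the cell above: a list holding one (x, y) pair,
# i.e. (longitude, latitude)
cent = [(-122.44089367386786, 37.75588456365734)]

lon, lat = cent[0]
# Folium's Map takes its starting location as [latitude, longitude]
map_center = [lat, lon]
print(map_center)
```

The result is close to the hard-coded `[37.7556, -122.4399]` used for `map1`.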

In [45]:
def ch(gdf, id_field, value_field):
    #Reproject once to WGS84; used for both the map center and the GeoJSON string
    gdf_wgs84 = gdf.to_crs({'init': 'epsg:4326'})
    cent = gdf_wgs84.unary_union.centroid
    #Folium expects [lat, long], i.e. [y, x]
    _map = folium.Map([cent.y, cent.x], zoom_start=12)
    gjson = gdf_wgs84.to_json()
    #Compute the break points from the GeoDataFrame passed in rather than the global tracts
    _map.choropleth(geo_str = gjson, data = gdf,
                columns = [id_field, value_field], key_on = 'feature.properties.{}'.format(id_field),
                fill_color = 'YlOrRd', fill_opacity = 0.5, line_opacity = 0.2,  
                threshold_scale=folium.utilities.split_six(gdf[value_field]))
    return _map
a=ch(tracts, 'CTFIPS10','AssaultsPSqMi')

In [8]:
map1 = folium.Map([37.7556, -122.4399], zoom_start=12)
gjson = tracts.to_crs({'init': 'epsg:4326'}).to_json()
map1.choropleth(geo_str = gjson, data = tracts,
                columns = ['CTFIPS10', 'AssaultsPSqMi'], key_on = 'feature.properties.CTFIPS10',
                fill_color = 'YlOrRd', fill_opacity = 0.5, line_opacity = 0.2,  
                threshold_scale=folium.utilities.split_six(tracts['AssaultsPSqMi']))
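
To make the role of `key_on` more concrete, here is a standard-library sketch of how a dotted path such as `feature.properties.CTFIPS10` picks the join key out of each GeoJSON feature. This is not folium's own code. `resolve_key` is a hypothetical helper, the geometries are placeholder triangles, and the tract IDs are assumed to be strings:

```python
import json

# A tiny, made-up FeatureCollection shaped like the string returned by to_json();
# the geometries are placeholder triangles, not real tract boundaries
gjson = json.dumps({
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"CTFIPS10": "6075010100"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[-122.42, 37.80], [-122.41, 37.80],
                                       [-122.41, 37.81], [-122.42, 37.80]]]}},
        {"type": "Feature",
         "properties": {"CTFIPS10": "6075010200"},
         "geometry": {"type": "Polygon",
                      "coordinates": [[[-122.43, 37.80], [-122.42, 37.80],
                                       [-122.42, 37.81], [-122.43, 37.80]]]}},
    ],
})

def resolve_key(feature, key_on):
    """Hypothetical helper: follow a dotted path like 'feature.properties.CTFIPS10'."""
    value = feature
    for part in key_on.split(".")[1:]:  # skip the leading 'feature'
        value = value[part]
    return value

# Data side of the join (id -> density), values taken from the tracts table above
density_by_tract = {"6075010100": 3.423806, "6075010200": 19.871972}

matched = {}
for feature in json.loads(gjson)["features"]:
    tract_id = resolve_key(feature, "feature.properties.CTFIPS10")
    matched[tract_id] = density_by_tract.get(tract_id)
print(matched)
```

One practical consequence of this kind of lookup is that the ID in the GeoJSON properties and the ID in the data column need to match in both value and type; a string on one side and an integer on the other would not match.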



### Create Crime Point Cluster Layer
Next I plot the crime point locations, which will be the second of the two layers on my map. To make the layer more useful, I will create popups that display information about each crime. Folium lets you create HTML-rich popups using IFrames. I use this feature only in its most basic form, to display 4 lines of information: crime description, address, date, and time. There are obviously much more creative things that can be done with an IFrame popup (tables, graphs, sub-maps, etc.), but for my purposes this is all I need.

Rather than display each individual point, I will use Folium / Leaflet's marker clustering feature, which makes it easier to visualize large numbers of points. This function takes a list of lat/long coordinates and a list of pop-ups that correspond to these coordinates. I loop through each crime record and append the relevant information in order to create these lists.
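
The list-building step described above can be sketched without any mapping libraries. This standalone example (Python 3 standard library only) builds the two parallel lists from plain dictionaries. The first two records reuse values from the assaults table earlier in the post, the third is made up to show escaping, and using `html.escape` is my own addition rather than something the original loop does:

```python
from html import escape

rows = [
    {"Descript": "BATTERY", "Address": "400 Block of POWELL ST",
     "Date": "2016-10-30", "Time": "01:35", "X": -122.408432, "Y": 37.788777},
    {"Descript": "BATTERY OF A POLICE OFFICER", "Address": "0 Block of HOFF ST",
     "Date": "2016-10-30", "Time": "00:29", "X": -122.420576, "Y": 37.764182},
    # Made-up record containing a character that needs escaping in HTML
    {"Descript": "THREATS & HARASSMENT", "Address": "100 Block of EXAMPLE ST",
     "Date": "2016-10-29", "Time": "12:00", "X": -122.4, "Y": 37.77},
]

locations, labels = [], []
for row in rows:
    # Leaflet expects [latitude, longitude], so Y comes before X
    locations.append([row["Y"], row["X"]])
    fields = [row["Descript"], row["Address"], row["Date"], row["Time"]]
    # Escape each value so characters such as & or < do not break the popup HTML
    labels.append("<br>".join(escape(str(f)) for f in fields))

print(locations[0])
print(labels[0])
```

The two lists stay aligned by position, so the label at index `i` belongs to the coordinates at index `i`.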



Folium allows for the binding of data between Pandas DataFrames/Series and Geo/TopoJSON geometries. Color Brewer sequential color schemes are built into the library and can be passed in to quickly visualize different combinations.

In [47]:
popups, locations = [], [] 
#iterate over the GeoDataFrame, which is guaranteed to hold the geometry column
for i, row in assault_gdf.iterrows():
    #append lat and long coordinates to "locations" list
    locations.append([row.geometry.y, row.geometry.x])
    #extract values for crime description, crime address, crime date, and crime time to be used in the pop-up
    desc=unicodedata.normalize("NFKD", row['Descript']) 
    date=row['Date'].strftime("%Y-%m-%d")
    time=row['Time']
    add=row['Address']
    #create string of HTML code used for IFrame that displays each of these 4 pieces of information with linebreaks between
    label="""{}<br>{}<br>{}<br>{}""".format(desc, add, date, time)
    #append an IFrame with the HTML string and frame dimensions to the "popups" list
    popups.append(IFrame(label, width=300, height=100))
    
#Create a folium feature group for this layer, since we will be displaying multiple layers
crime_pt_lyr = folium.FeatureGroup(name='Assaults')
#Add the clustered points of crime locations and popups to this layer
crime_pt_lyr.add_children(MarkerCluster(locations=locations, popups=popups))

a.add_children(crime_pt_lyr)
folium.LayerControl().add_to(a)
a

In [10]:
popups, locations = [], [] 
#iterate over the GeoDataFrame, which is guaranteed to hold the geometry column
for i, row in assault_gdf.iterrows():
    #append lat and long coordinates to "locations" list
    locations.append([row.geometry.y, row.geometry.x])
    #extract values for crime description, crime address, crime date, and crime time to be used in the pop-up
    desc=unicodedata.normalize("NFKD", row['Descript']) 
    date=row['Date'].strftime("%Y-%m-%d")
    time=row['Time']
    add=row['Address']
    #create string of HTML code used for IFrame that displays each of these 4 pieces of information with linebreaks between
    label="""{}<br>{}<br>{}<br>{}""".format(desc, add, date, time)
    #append an IFrame with the HTML string and frame dimensions to the "popups" list
    popups.append(IFrame(label, width=300, height=100))
    
#Create a folium feature group for this layer, since we will be displaying multiple layers
crime_pt_lyr = folium.FeatureGroup(name='Assaults')
#Add the clustered points of crime locations and popups to this layer
crime_pt_lyr.add_children(MarkerCluster(locations=locations, popups=popups))

map1.add_children(crime_pt_lyr)
folium.LayerControl().add_to(map1)
#map1

<folium.map.LayerControl at 0x139817f0>

In [None]:
map1.save('sf_assaults.html')