In [1]:
import pandas as pd

Our initial dataset can be retrieved from this link: <https://data.sfgov.org/api/views/tmnf-yvry/rows.csv?accessType=DOWNLOAD>.

It is larger than the 100 MB maximum file size that GitHub allows.  Until we decide how to filter it and can check it in, make sure it is downloaded into the same directory as the IPython notebook file.

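Once the dataset is in place, the following code reads the CSV into a dataframe and reports its size. Note that `df.size` is the total number of cells (rows times columns), not the number of rows; `len(df)` gives the row count.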

In [2]:
df = pd.read_csv('./SFPD_Incidents_-_from_1_January_2003.csv')
print(df.size)

26498147
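
`df.size` counts cells rather than rows, which can be checked on a tiny made-up frame. This is only a sketch: the `toy` frame and its values are illustrative and not taken from the dataset.

```python
import pandas as pd

# Hypothetical three-row, two-column frame standing in for the real dataset.
toy = pd.DataFrame({
    "Category": ["ASSAULT", "TRESPASS", "NON-CRIMINAL"],
    "PdDistrict": ["CENTRAL", "MISSION", "RICHMOND"],
})

n_rows = len(toy)      # number of rows
dims = toy.shape       # (rows, columns) tuple
n_cells = toy.size     # rows * columns

print(n_rows, dims, n_cells)
```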


To confirm that we're getting the expected data, report the value in the 'Date' column for row 0.

In [3]:
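# DataFrame.get_value() was deprecated and later removed from pandas;
# .at is the label-based accessor for a single value.
date = df.at[0, 'Date']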
print(date)

01/19/2015


This is a neat way to filter rows on a dataframe, much like a select. Here I'm creating a smaller dataframe from the rows where the 'Date' column equals the date read above for row 0.  Be careful, though: the rows keep their index labels from the original dataset, so the smaller dataframe no longer has contiguous indices, which will trip you up if you iterate over row indices as though they ran from 0 upward.

In [4]:
df_on_date = df[df.Date == date]
df_on_date

Unnamed: 0,IncidntNum,Category,Descript,DayOfWeek,Date,Time,PdDistrict,Resolution,Address,X,Y,Location,PdId
0,150060275,NON-CRIMINAL,LOST PROPERTY,Monday,01/19/2015,14:00,MISSION,NONE,18TH ST / VALENCIA ST,-122.421582,37.761701,"(37.7617007179518, -122.42158168137)",15006027571000
506,150065087,NON-CRIMINAL,LOST PROPERTY,Monday,01/19/2015,21:30,NORTHERN,NONE,1500 Block of VANNESS AV,-122.422063,37.789920,"(37.7899200014841, -122.422062730264)",15006508771000
5441,150146784,LARCENY/THEFT,PETTY THEFT FROM LOCKED AUTO,Monday,01/19/2015,14:00,SOUTHERN,NONE,2ND ST / FOLSOM ST,-122.396707,37.785543,"(37.7855429503041, -122.396707150502)",15014678406243
15286,150055674,NON-CRIMINAL,"AIDED CASE, MENTAL DISTURBED",Monday,01/19/2015,22:00,NORTHERN,NONE,1100 Block of POLK ST,-122.419908,37.787077,"(37.7870766696212, -122.419908299532)",15005567464020
15314,150055919,ASSAULT,"FIREARM, DISCHARGING IN GROSSLY NEGLIGENT MANNER",Monday,01/19/2015,00:15,CENTRAL,"ARREST, BOOKED",NORTHPOINT ST / POWELL ST,-122.412143,37.806758,"(37.8067579023106, -122.412143310586)",15005591904083
15315,150055919,OTHER OFFENSES,CONSPIRACY,Monday,01/19/2015,00:15,CENTRAL,"ARREST, BOOKED",NORTHPOINT ST / POWELL ST,-122.412143,37.806758,"(37.8067579023106, -122.412143310586)",15005591926080
15316,150055919,ASSAULT,THREAT OR FORCE TO RESIST EXECUTIVE OFFICER,Monday,01/19/2015,00:15,CENTRAL,"ARREST, BOOKED",NORTHPOINT ST / POWELL ST,-122.412143,37.806758,"(37.8067579023106, -122.412143310586)",15005591927171
15317,150055925,TRESPASS,TRESPASSING,Monday,01/19/2015,00:12,MISSION,NONE,3100 Block of 16TH ST,-122.422389,37.764832,"(37.7648319521443, -122.422388599852)",15005592527195
15318,150055931,DISORDERLY CONDUCT,"DISTURBING THE PEACE, FIGHTING",Monday,01/19/2015,00:50,RICHMOND,NONE,1800 Block of DIVISADERO ST,-122.440040,37.786670,"(37.7866702442116, -122.440040391343)",15005593119024
15319,150055931,NON-CRIMINAL,AIDED CASE,Monday,01/19/2015,00:50,RICHMOND,NONE,1800 Block of DIVISADERO ST,-122.440040,37.786670,"(37.7866702442116, -122.440040391343)",15005593151040
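
The index behavior of a filtered dataframe can be seen on a small made-up frame. This is only a sketch: the `toy` frame and its values are hypothetical stand-ins for the real data. After a boolean filter the surviving rows keep their original labels, so one option is to access rows by position with `.iloc`, and another is to renumber them with `reset_index(drop=True)`.

```python
import pandas as pd

# Hypothetical stand-in for the incident data.
toy = pd.DataFrame({
    "Date": ["01/19/2015", "01/20/2015", "01/19/2015", "01/21/2015"],
    "Category": ["NON-CRIMINAL", "ASSAULT", "TRESPASS", "LARCENY/THEFT"],
})

subset = toy[toy.Date == "01/19/2015"]
print(list(subset.index))       # labels carried over from toy: [0, 2]

# Positional access works regardless of the labels.
second_by_position = subset.iloc[1]["Category"]

# Renumbering gives a contiguous 0..n-1 index again.
renumbered = subset.reset_index(drop=True)
print(list(renumbered.index))   # [0, 1]
```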


We will probably need visualizations that layer incident data over a base map
of San Francisco, using coordinates like those available in the dataset.

Some links:
* http://bokeh.pydata.org/en/latest/docs/user_guide/geo.html
* https://github.com/pbugnion/gmaps
* https://github.com/python-visualization/folium

Given that a couple of these required it, I went ahead and grabbed a Google Maps API key for a project named
"CMPE 188 - Crime Predictors". It is "AIzaSyAs6Ugy0oz0R5YAxep9-kQ170t0U2fjELQ".

So, here's an attempt with the sample code from the Bokeh-based instructions, chosen because Bokeh appears to be included in the Conda distribution.  Also, the couple of other packages I tried didn't work out very well, as you can see by going back through the commit history.

Anyway, please note that I had to run the following command before I could get the inline map to display at all.

```
jupyter nbextension enable --py --sys-prefix widgetsnbextension
```

In [5]:
from bokeh.io import output_file, output_notebook, show
from bokeh.models import (
  GMapPlot, GMapOptions, ColumnDataSource, Circle, DataRange1d, PanTool, WheelZoomTool, BoxSelectTool
)

In [6]:
# Choose one, I believe.  The first is more reliable, but
# the second is more appropriately inline, when it works.
output_file("gmap_plot.html")
#output_notebook()

In [7]:
map_options = GMapOptions(lat=37.761701, lng=-122.421582, map_type="roadmap", zoom=11)

plot = GMapPlot(
    x_range=DataRange1d(), y_range=DataRange1d(), map_options=map_options
)
plot.title.text = "San Francisco Police Department incident locations on " + date

# For GMaps to function, Google requires you obtain and enable an API key:
#
#     https://developers.google.com/maps/documentation/javascript/get-api-key
#
# Replace the value below with your personal API key:
plot.api_key = "AIzaSyAs6Ugy0oz0R5YAxep9-kQ170t0U2fjELQ"

# Extract the coordinates for the incidents on the extracted date
# (Y holds the latitude, X holds the longitude).
my_lats = df_on_date.Y.astype(float).tolist()
my_lons = df_on_date.X.astype(float).tolist()

source = ColumnDataSource(
    data=dict(
        lat=my_lats,
        lon=my_lons,
    )
)

circle = Circle(x="lon", y="lat", size=15, fill_color="blue", fill_alpha=0.8, line_color=None)
plot.add_glyph(source, circle)

plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool())
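
The cell above hard-codes the API key. Since the notebook is checked into a repository, one option is to keep the key out of the source and read it from the environment instead. A minimal sketch, assuming an environment variable named `GOOGLE_MAPS_API_KEY`; both that name and the `load_api_key` helper are made up for this example.

```python
import os

def load_api_key(var_name="GOOGLE_MAPS_API_KEY"):
    """Return the API key stored in the given environment variable.

    Both this helper and the default variable name are hypothetical.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

With that in place, the plotting cell could use `plot.api_key = load_api_key()` rather than a literal string.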

In [8]:
show(plot)