In [1]:
from datascience import *
import numpy as np

import matplotlib.pyplot as plots
from mpl_toolkits.mplot3d import Axes3D
plots.style.use('fivethirtyeight')
%matplotlib inline

# License plates

We're going to look at some data collected by the Oakland Police Department. They have automated license plate readers on their police cars, and they've built up a database of license plates that they've seen -- and where and when they saw each one.

# Data collection

First, we'll gather the data. It turns out the data is publicly available on the Oakland public records site. I downloaded it and combined it into a single CSV file by myself before lecture.

In [9]:
lprs = Table.read_table('all-lprs.csv.gz', compression='gzip')

In [10]:
lprs

red_VRM,red_Timestamp,Location
1275226,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)"
27529C,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)"
1158423,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)"
1273718,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)"
1077682,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)"
1214195,01/19/2011 02:06:00 AM,"(37.798281000000003, -122.27575299999999)"
1062420,01/19/2011 02:06:00 AM,"(37.79833, -122.27574300000001)"
1319726,01/19/2011 02:05:00 AM,"(37.798475000000003, -122.27571500000001)"
1214196,01/19/2011 02:05:00 AM,"(37.798499999999997, -122.27571)"
75227,01/19/2011 02:05:00 AM,"(37.798596000000003, -122.27569)"


Let's start by renaming some columns, and then take a look at it.

In [12]:
lprs.relabel('red_VRM', 'Plate')
lprs.relabel('red_Timestamp', 'Timestamp')
lprs

Plate,Timestamp,Location
1275226,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)"
27529C,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)"
1158423,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)"
1273718,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)"
1077682,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)"
1214195,01/19/2011 02:06:00 AM,"(37.798281000000003, -122.27575299999999)"
1062420,01/19/2011 02:06:00 AM,"(37.79833, -122.27574300000001)"
1319726,01/19/2011 02:05:00 AM,"(37.798475000000003, -122.27571500000001)"
1214196,01/19/2011 02:05:00 AM,"(37.798499999999997, -122.27571)"
75227,01/19/2011 02:05:00 AM,"(37.798596000000003, -122.27569)"


Phew, that's a lot of data: we can see about 2.7 million license plate reads here.
Let's start by seeing what can be learned about someone, using this data -- assuming you know their license plate.

# Searching for Individuals

As a warmup, we'll take a look at ex-Mayor Jean Quan's car, and where it has been seen. Her license plate number is 6FCH845. (How did I learn that? Turns out she was in the news for getting $1000 of parking tickets, and [the news articles]() included a picture of her car, with the license plate visible. You'd be amazed by what's out there on the Internet...)

In [15]:
lprs.where('Plate', '6FCH845')

Plate,Timestamp,Location
6FCH845,11/01/2012 09:04:00 AM,"(37.79871, -122.276221)"
6FCH845,10/24/2012 11:15:00 AM,"(37.799695, -122.274868)"
6FCH845,10/24/2012 11:01:00 AM,"(37.799693, -122.274806)"
6FCH845,10/24/2012 10:20:00 AM,"(37.799735, -122.274893)"
6FCH845,05/08/2014 07:30:00 PM,"(37.797558, -122.26935)"
6FCH845,12/31/2013 10:09:00 AM,"(37.807556, -122.278485)"


Ok, so her car shows up 6 times in this data set. However, it's hard to make sense of those coordinates. I don't know about you, but I can't read GPS so well.

So, let's work out a way to show where her car has been seen on a map. We'll need to extract the latitude and longitude, as the data isn't quite in the format that the mapping software expects: the mapping software expects the latitude to be in one column and the longitude in another. Let's write some Python code to do that, by splitting the Location string into two pieces: the stuff before the comma (the latitude) and the stuff after (the longitude).

In [16]:
'37.79871, -122.276221) '.split(',')

['37.79871', ' -122.276221) ']

In [21]:
def get_latitude(s):
    before, after = s.split(',')         # Break it into two parts
    lat_string = before.replace('(', '') # Get rid of the annoying '('
    return float(lat_string)             # Convert the string to a number

def get_longitude(s):
    before, after = s.split(',')                 # Break it into two parts
    long_string = after.replace(')', '').strip() # Get rid of the annoying ')' and spaces
    return float(long_string)                    # Convert the string to a number

Let's test it to make sure it works correctly.

In [23]:
get_latitude('37.79871, -122.276221)')

37.79871

In [24]:
get_longitude('37.79871, -122.276221)')

-122.276221
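The two test strings above leave off the opening '(' that the real Location values have, so it's worth one more check on a value in the table's exact format. Here is a minimal sketch that does both conversions in one go; `parse_location` is a hypothetical helper name, not something the notebook defines.

```python
def parse_location(s):
    """Hypothetical helper: turn '(lat, long)' text into a pair of floats."""
    before, after = s.split(',')               # Break it into two parts
    latitude = float(before.replace('(', ''))  # Drop the '(' and convert
    longitude = float(after.replace(')', ''))  # float() ignores the leftover spaces
    return latitude, longitude

# A Location value copied from the first row of the table
print(parse_location('(37.798304999999999, -122.27574799999999)'))
```

Returning both numbers from one function is just a convenience; the two separate functions above are easier to use with `apply`, which fills one column at a time.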

Good, now we're ready to add these as extra columns to the table.

In [25]:
lprs = lprs.with_columns(
    'Latitude', lprs.apply(get_latitude, 'Location'),
    'Longitude', lprs.apply(get_longitude, 'Location'),
)
lprs

Plate,Timestamp,Location,Latitude,Longitude
1275226,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)",37.7983,-122.276
27529C,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)",37.7983,-122.276
1158423,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)",37.7983,-122.276
1273718,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)",37.7983,-122.276
1077682,01/19/2011 02:06:00 AM,"(37.798304999999999, -122.27574799999999)",37.7983,-122.276
1214195,01/19/2011 02:06:00 AM,"(37.798281000000003, -122.27575299999999)",37.7983,-122.276
1062420,01/19/2011 02:06:00 AM,"(37.79833, -122.27574300000001)",37.7983,-122.276
1319726,01/19/2011 02:05:00 AM,"(37.798475000000003, -122.27571500000001)",37.7985,-122.276
1214196,01/19/2011 02:05:00 AM,"(37.798499999999997, -122.27571)",37.7985,-122.276
75227,01/19/2011 02:05:00 AM,"(37.798596000000003, -122.27569)",37.7986,-122.276
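If `apply` feels like magic, here is roughly what it is doing, sketched with plain NumPy on three Location values copied from the table: call the function once per row and collect the results into an array. This is an illustration of the idea, not the library's actual implementation.

```python
import numpy as np

def get_latitude(s):
    before, after = s.split(',')           # Break it into two parts
    return float(before.replace('(', ''))  # Drop the '(' and convert

# Three Location values copied from the table above
locations = np.array([
    '(37.798304999999999, -122.27574799999999)',
    '(37.798475000000003, -122.27571500000001)',
    '(37.798596000000003, -122.27569)',
])

# One function call per row, gathered into a new array
latitudes = np.array([get_latitude(s) for s in locations])
print(latitudes)
```

The table display appears to round the new columns to a few significant digits (37.7983, -122.276); the parsed numbers themselves should still carry the full precision of the original strings.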


And at last, we can draw a map with a marker everywhere that her car has been seen.

In [26]:
jean_quan = lprs.where('Plate', '6FCH845').select('Latitude', 'Longitude', 'Timestamp')
Marker.map_table(jean_quan)

Ok, so it's been seen near the Oakland police department. This should make you suspect we might be getting a bit of a biased sample. Why might the Oakland PD be the most common place where her car is seen? Can you come up with a plausible explanation for this?
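If you'd rather not eyeball the map, one rough way to see how concentrated the sightings are is to round each coordinate and count how many reads land in each cell. The sketch below uses the six reads listed earlier; rounding to two decimal places (cells about a kilometer across) is an arbitrary choice made for illustration.

```python
from collections import Counter

# The six (latitude, longitude) reads of 6FCH845 shown earlier
reads = [
    (37.79871, -122.276221),
    (37.799695, -122.274868),
    (37.799693, -122.274806),
    (37.799735, -122.274893),
    (37.797558, -122.26935),
    (37.807556, -122.278485),
]

# Group nearby reads by rounding both coordinates to two decimal places
cells = Counter((round(lat, 2), round(lon, 2)) for lat, lon in reads)

# List the cells from most reads to fewest
for cell, count in cells.most_common():
    print(cell, count)
```

Four of the six reads fall in the same cell, which fits what the map shows: the sightings cluster in one small area rather than being spread across the city.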

# Poking around

Let's try another. And let's see if we can make the map a little more fancy. It'd be nice to distinguish between license plate reads that are seen during the daytime (on a weekday), vs the evening (on a weekday), vs on a weekend. So we'll color-code the markers. To do this, we'll write some Python code to analyze the Timestamp and choose an appropriate color.