[<img src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/RubeRad/tcscs/blob/master/notebooks/30_gps_tracking.ipynb)

# Analyzing GPS tracking data

The New York Times article series [*One Nation, Tracked*](https://www.nytimes.com/interactive/2019/12/19/opinion/location-tracking-cell-phone.html) showed how easy it is to identify individuals from their smartphone GPS tracks -- and how GPS tracks cannot be "anonymized" simply by omitting names:

<blockquote>In most cases, ascertaining a home location and an office location was enough to identify a person. Consider your daily commute: Would any other smartphone travel directly between your house and your office every day?

Describing location data as anonymous is “a completely false claim” that has been debunked in multiple studies, Paul Ohm, a law professor and privacy researcher at the Georgetown University Law Center, told us. “Really precise, longitudinal geolocation information is absolutely impossible to anonymize.”

“D.N.A.,” he added, “is probably the only thing that’s harder to anonymize than precise geolocation information.”</blockquote>

We're going to do some investigative sleuthing of our own, using GPS track data voluntarily submitted by willing participants. [The full dataset](https://www.microsoft.com/en-us/download/details.aspx?id=52367) from Microsoft Research contains about 25M GPS points with IDs and timestamps, in cities around the world. I've selected a subset in one particular city.

# Looking at the dataset

Load the dataset into a pandas DataFrame, and probe it to get an initial understanding of what it contains.

In [None]:
import numpy             as np
import matplotlib.pyplot as plt
import pandas            as pd

In [None]:
# We name this DataFrame dfall, because we're going to be filtering it down later
dfall = pd.read_csv('https://raw.githubusercontent.com/RubeRad/tcscs/master/notebooks/tracks.csv')
dfall

In [None]:
dfall.describe()

In [None]:
# LONgitude =  East/West  = X 
# LATitude  = North/South = Y
plt.plot(dfall.LON, dfall.LAT)

## Exercise

* How much data is there?
* What time span does the data cover?
* Since 1deg$\approx$100km (thus 1m$\approx$0.00001deg), what is the geographic extent of the dataset?
* What city is this?
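
For the extent question, the 1deg$\approx$100km rule of thumb can be refined a little: one degree of latitude is about 111 km everywhere, while one degree of longitude shrinks by the cosine of the latitude. The following is a minimal sketch of that conversion. The helper name `extent_km` is hypothetical, and the sample points are made up for illustration (they are not taken from `tracks.csv`).

```python
import numpy as np
import pandas as pd

KM_PER_DEG = 111.0  # approximate length of one degree of latitude

def extent_km(df):
    """Approximate (width, height) in km of the bounding box of the LAT/LON points."""
    lat_span = df.LAT.max() - df.LAT.min()
    lon_span = df.LON.max() - df.LON.min()
    mid_lat  = np.radians((df.LAT.max() + df.LAT.min()) / 2)
    height = lat_span * KM_PER_DEG
    width  = lon_span * KM_PER_DEG * np.cos(mid_lat)  # longitude degrees shrink away from the equator
    return width, height

# Made-up sample points, for illustration only
demo = pd.DataFrame({'LAT': [40.00, 40.10, 40.05],
                     'LON': [-75.00, -75.20, -75.10]})
width, height = extent_km(demo)
print(round(width, 1), 'km wide,', round(height, 1), 'km tall')
```

Calling `extent_km(dfall)` would apply the same estimate to the loaded dataset.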

As loaded, the LAT and LON columns are numerical. The next cell creates a new column called 'LATLON', which joins them into a single string with 'N' and 'W' suffixes. We'll use this in a minute...

In [None]:
dfall['LATLON'] = dfall.LAT.astype(str) + 'N ' + (abs(dfall.LON)).astype(str) + 'W'
dfall
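
The cell above hardcodes 'N' and 'W', which suits data with positive latitudes and negative longitudes. For data from other hemispheres, one possible variation picks the letter from the sign of each coordinate. This is only a sketch: `latlon_strings` is a hypothetical helper, and the sample coordinates are made up.

```python
import numpy as np
import pandas as pd

def latlon_strings(df):
    """Build strings like '47.5N 122.25W', choosing N/S and E/W from the sign."""
    ns = pd.Series(np.where(df.LAT >= 0, 'N', 'S'), index=df.index)
    ew = pd.Series(np.where(df.LON >= 0, 'E', 'W'), index=df.index)
    return df.LAT.abs().astype(str) + ns + ' ' + df.LON.abs().astype(str) + ew

# Made-up coordinates: one in the NW quadrant, one in the SE quadrant
demo = pd.DataFrame({'LAT': [47.5, -33.25], 'LON': [-122.25, 151.5]})
demo['LATLON'] = latlon_strings(demo)
print(demo)
```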

# Plotting the individuals separately

The tangle plotted above treats all the GPS points as if they came from just one person. We can use the ID column to separate the data into per-person tracks.

In [None]:
dfall.ID.value_counts()

In [None]:
ids = dfall.ID.unique()
len(ids)

Now that we have a list of all the unique IDs, we can write a loop that filters the DataFrame down to just one ID at a time and plots each track separately.

In [None]:
fig = plt.figure()
ax  = fig.add_subplot()

for idx,id in enumerate(ids):
    dfid = dfall[ dfall.ID==id ]            # filter for this ID
    lbl = '{}={}'.format(idx,id)            # make a label for the legend
    ax.plot(dfid.LON, dfid.LAT, label=lbl)  # plot (rotate through colors automatically)

ax.legend()
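
An alternative to filtering inside the loop is `groupby`, which hands back each ID together with its rows. Here is a minimal sketch on a made-up table (the IDs and coordinates are invented for illustration):

```python
import pandas as pd

# Made-up data: two tracks, with IDs 7 and 9
demo = pd.DataFrame({'ID':  [7, 7, 9, 9, 9],
                     'LAT': [47.60, 47.61, 47.70, 47.71, 47.72],
                     'LON': [-122.10, -122.11, -122.20, -122.21, -122.22]})

# groupby yields (key, sub-DataFrame) pairs, one per unique ID
points_per_id = {}
for track_id, track in demo.groupby('ID'):
    points_per_id[track_id] = len(track)
    # in the notebook, ax.plot(track.LON, track.LAT, label=...) would go here

print(points_per_id)
```

Note that `groupby` sorts the keys by default, whereas `unique()` keeps the IDs in order of first appearance, so the legend order may differ between the two approaches.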

## Exercise

Do these things to make this plot more readable:

* Adjust the `figsize`
* `ax.set_aspect('equal')` (because it's a map, we don't want it squished)
* `ax.set_xlim(  ,  )`     (make room for the legend)

# Inspect a particular track

The map above of everybody's tracks might help you choose a particular track to look at in isolation. Let's just start with the first one.

In [None]:
id = ids[0]
id

In [None]:
dfid = dfall[ dfall.ID==id ]
dfid

In [None]:
fig = plt.figure(figsize=(12,12))
ax  = fig.add_subplot()

# this is so LAT/LON are proportional like a map
#ax.set_aspect('equal')
# this makes sure matplotlib doesn't put the X axis in scientific notation
#ax.ticklabel_format(useOffset=False, style='plain')
# this rotates the longitudes so they don't write on top of each other
#plt.setp(ax.get_xticklabels(), rotation=30, horizontalalignment='right')

ax.plot(dfid.LON, dfid.LAT)

In [None]:
dfid.LAT.describe()

In [None]:
dfid.LON.describe()

## Exercise

* When/where does this person start/end their day (track)?
* About how far apart are the start/end points? (1deg$\approx$100km; 1m$\approx$0.00001deg)
* Where are the start/end points on the plot?
* About how big is the total extent of the track?
* Go to Google Maps, and plot directions from the first LATLON to the last LATLON
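
For the distance question, the 1deg$\approx$100km rule is usually enough for a rough answer. As a cross-check, here is a sketch of the haversine great-circle formula. The helper name `haversine_m` is hypothetical, the Earth radius is the commonly used mean value of 6371 km, and the two sample points are made up.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2)**2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2)**2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Made-up points 0.01 degrees of latitude apart: roughly 1.1 km
d = haversine_m(40.00, -75.00, 40.01, -75.00)
print(d)
```

With the notebook's data, the first and last rows of a track (`dfid.iloc[0]` and `dfid.iloc[-1]`) could supply the four coordinates.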


## Identifying extreme points of the track

What's that spike on the West side? `dfid.LON.describe()` says the minimum LON is -122.143624

In [None]:
# It might work to filter for ==-122.143624 exactly, 
# but just in case of rounding error...
dfid[ dfid.LON<-122.14362 ]

Where's that southernmost bump?

In [None]:
dfid[ dfid.LAT<47.62793 ]
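
Reading the extreme value off `describe()` and then filtering with a hand-typed threshold works, but `idxmin` and `idxmax` can locate the extreme rows directly. A sketch on made-up data:

```python
import pandas as pd

# Made-up trackpoints, for illustration only
demo = pd.DataFrame({'LAT': [47.630, 47.628, 47.633],
                     'LON': [-122.140, -122.141, -122.144]})

westmost  = demo.loc[demo.LON.idxmin()]  # row with the smallest (most negative) longitude
southmost = demo.loc[demo.LAT.idxmin()]  # row with the smallest latitude
print(westmost)
print(southmost)
```

If several rows tie for the extreme value, `idxmin` returns only the first of them, so the threshold filter is still useful when you want to see every point near the extreme.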

## Exercise

* Add those two LATLON to the Google Map directions
* Check the DATETIME values to make sure the points are in the right order
* How far apart are those two trackpoints in space?
* How far apart are those two trackpoints in time?
* What is the likely mode of travel?
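
Distance and elapsed time together give an average speed, which hints at the mode of travel. The sketch below assumes timestamps in a format that `pd.to_datetime` can parse, and uses a flat-earth approximation of about 111 km per degree, which is reasonable over short distances. The helper `speed_kmh` is hypothetical and the two trackpoints are made up.

```python
import math
import pandas as pd

def speed_kmh(lat1, lon1, t1, lat2, lon2, t2):
    """Average speed in km/h between two trackpoints (small-distance approximation)."""
    mid_lat = math.radians((lat1 + lat2) / 2)
    dy = (lat2 - lat1) * 111.0                      # km north/south
    dx = (lon2 - lon1) * 111.0 * math.cos(mid_lat)  # km east/west
    hours = (pd.to_datetime(t2) - pd.to_datetime(t1)).total_seconds() / 3600
    return math.hypot(dx, dy) / hours

# Made-up trackpoints: 0.01 degrees of latitude (about 1.1 km) covered in 12 minutes
v = speed_kmh(40.00, -75.00, '2020-01-01 08:00:00',
              40.01, -75.00, '2020-01-01 08:12:00')
print(round(v, 2), 'km/h')
```

A result of a few km/h suggests walking, while tens of km/h suggests a vehicle. The function divides by the elapsed time, so it will fail if both points carry the same timestamp.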

## Identifying other parts of the track

Let's pull out that knob at the NW corner, as distinct from the two other knobs at that maximum latitude, which are further east.

In [None]:
# parentheses are critically important here
knob = dfid[ (dfid.LAT>47.6325) & (dfid.LON>-122.1425) & (dfid.LON<-122.1415) ]
knob
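
The parentheses matter because Python's `&` binds more tightly than comparisons such as `>` and `<`, so an unparenthesized condition is not grouped the way it reads. One way to sidestep the issue is `Series.between`. The sketch below uses a hypothetical helper `in_box` on made-up points; note that `between` includes both endpoints by default, unlike the strict `<` and `>` used above.

```python
import pandas as pd

def in_box(df, lat_min, lat_max, lon_min, lon_max):
    """Return the rows whose LAT/LON fall inside the given bounding box."""
    mask = df.LAT.between(lat_min, lat_max) & df.LON.between(lon_min, lon_max)
    return df[mask]

# Made-up points: only the first lies inside the box
demo = pd.DataFrame({'LAT': [47.6330, 47.6329, 47.6300],
                     'LON': [-122.1420, -122.1400, -122.1420]})
box = in_box(demo, 47.6325, 47.6400, -122.1425, -122.1415)
print(box)
```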

In [None]:
knob.describe()

In [None]:
knob[ knob.LAT > 47.63289 ]

## Exercise

* Paste this LATLON also into the Google Map directions, in proper sequence

# Homework

Select a different ID, and repeat the exercises above:

* Create a map of their track
* Identify the start and end points
* Choose and isolate at least two other points of interest from somewhere in the middle of their track
* Build up Google Maps directions for the start/selected/end points -- in chronological order
* Conclude with one or more markdown cells that describe whatever inferences we can make from this data:
  * What was this person doing?
  * Where do they live? work? shop? eat? recreate?
  * How fast are they moving? What is their mode of transport?
  * etc
  * Include a link to the Google Maps directions