## Plotting My Google Location History

In [1]:
import json
import pandas as pd
import datetime
import folium
import requests

from sklearn.cluster import KMeans
from pandas import DataFrame

I have been using Android phones since about 2010, and for most of that time I have had location history enabled. If you enjoy data like I do, you may agree that this service is pretty awesome, because Google lets you download a copy of the data it collects from you using Google Takeout. I recently downloaded a copy of the places I have visited and got to analyzing.

My data set consists of approximately 985 000 records starting from 2012-02-19. I wanted to try out the folium plotting library, which combines the data analysis strength of Python with the mapping prowess of Leaflet.js; however, rendering all the records on a single map proved problematic on my machine. To reduce the data set, I used k-means to assign each point to 1 of 100 clusters, and then used the mean latitude and longitude per cluster as a proxy for my location history. The reduced data renders instantly and is interactive, which is pretty cool.

I am looking forward to doing some more traveling.

In [2]:
%%time
with open(r'./Data/Takeout/Location History/LocationHistory.json', 'r') as f:
    data = json.load(f)

CPU times: user 3.99 s, sys: 257 ms, total: 4.25 s
Wall time: 4.78 s


In [3]:
len(data['locations'])

984901

In [5]:
locations = data['locations']

data = DataFrame(locations)[['latitudeE7', 'longitudeE7', 'timestampMs']]

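# timestampMs may be stored as a string; cast to int so unit='ms' is applied to a number
data['date_time'] = pd.to_datetime(data.timestampMs.astype('int64'), unit='ms')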
data.set_index('date_time', inplace=True)

del data['timestampMs']

In [6]:
data.head()

date_time,latitudeE7,longitudeE7
2016-09-18 15:00:02.517,-261707340,280526522
2016-09-18 14:47:04.670,-261690408,280531879
2016-09-18 14:46:49.461,-261707550,280526884
2016-09-18 14:40:09.633,-261707550,280526884
2016-09-18 14:33:59.466,-261707295,280526589
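
The raw columns are encoded values, so it may help to decode one record by hand. The following is a minimal sketch, assuming that the `E7` fields are decimal degrees multiplied by 10^7 (consistent with the division by 10000000 used when plotting) and that `timestampMs` is a Unix epoch in milliseconds stored as a string. The sample record is reconstructed from the first row above, and `decode_record` is a hypothetical helper, not part of the original notebook.

```python
import json
from datetime import datetime, timedelta, timezone

# Sample record reconstructed from the first row of the table above.
# Assumption: the export stores timestampMs as a string of epoch milliseconds.
raw = (
    '{"latitudeE7": -261707340, "longitudeE7": 280526522, '
    '"timestampMs": "1474210802517"}'
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_record(record):
    """Hypothetical helper: convert one raw record to degrees and a UTC datetime."""
    return {
        'latitude': record['latitudeE7'] / 1e7,    # E7 -> decimal degrees
        'longitude': record['longitudeE7'] / 1e7,
        # Integer milliseconds avoid floating-point rounding of the timestamp.
        'date_time': EPOCH + timedelta(milliseconds=int(record['timestampMs'])),
    }


decoded = decode_record(json.loads(raw))
print(decoded)
```

Keeping the integer `E7` values in the DataFrame, as the notebook does, and converting only at plotting time is equally valid; the sketch just makes the encoding explicit.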


## Plotting the points

Given that there are about 1 million records, and rendering them all on my machine poses a problem, I need some way of reducing the number of points. This got me thinking: "Clustering could provide an elegant solution to this problem".

I decided to use K-means with 100 centroids. My theory is that this will reduce the noise in my location history and provide me with 100 broad locations which I have visited. The results of this were quite satisfactory.

In [7]:
cls = KMeans(100)

cls.fit(data.values)

KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=100, n_init=10,
    n_jobs=1, precompute_distances='auto', random_state=None, tol=0.0001,
    verbose=0)

In [17]:
summary = DataFrame(cls.cluster_centers_, columns=['latitudeE7', 'longitudeE7'])

summary.head()

,latitudeE7,longitudeE7
0,-261364000.0,280454100.0
1,429580000.0,171413900.0
2,95563190.0,1000475000.0
3,-235305000.0,-466225800.0
4,-339043800.0,191093900.0
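
To see what this reduction does on a smaller scale, here is a sketch on synthetic data, not on the real location history. It assumes three made-up, well-separated groups of points (the names `demo`, `true_centres` and `demo_summary` are illustrative only), assigns each point to a cluster, and then takes the mean latitude and longitude per cluster along with a point count.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for the location data: three tight blobs, in decimal degrees.
true_centres = np.array([[-26.17, 28.05], [42.96, 17.14], [9.56, 100.05]])
points = np.vstack(
    [c + rng.normal(scale=0.01, size=(200, 2)) for c in true_centres]
)
demo = pd.DataFrame(points, columns=['latitude', 'longitude'])

# Assign every point to one of k clusters (k=3 here, 100 in the notebook).
model = KMeans(n_clusters=3, n_init=10, random_state=0)
demo['cluster'] = model.fit_predict(demo[['latitude', 'longitude']].values)

# Mean position per cluster, plus how many points each cluster absorbed.
demo_summary = demo.groupby('cluster')[['latitude', 'longitude']].mean()
demo_summary['n_points'] = demo.groupby('cluster').size()
print(demo_summary)
```

The point count is not used in the notebook, but it gives a rough sense of how much time each broad location represents.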


In [16]:
_map = folium.Map()

# 'records' yields one dict per row; the abbreviation 'rows' is not accepted by recent pandas
for coord in summary.to_dict(orient='records'):

    folium.Marker(
        [
            coord['latitudeE7'] / 10000000,
            coord['longitudeE7'] / 10000000,
        ],
        icon=folium.Icon()
    ).add_to(_map)

_map
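
A bare `folium.Map()` starts at a world view. One possible refinement, offered as a suggestion rather than something the original analysis did, is to compute the bounding box of the cluster centres so the map can be zoomed to fit them. The sketch below uses only the five centres shown by `summary.head()` earlier; `centres`, `degrees` and `bounds` are illustrative names.

```python
import pandas as pd

# The five cluster centres shown by summary.head() earlier, still in E7 units.
centres = pd.DataFrame(
    {
        'latitudeE7': [-261364000.0, 429580000.0, 95563190.0,
                       -235305000.0, -339043800.0],
        'longitudeE7': [280454100.0, 171413900.0, 1000475000.0,
                        -466225800.0, 191093900.0],
    }
)

# Convert to decimal degrees and rename the columns accordingly.
degrees = centres / 1e7
degrees.columns = ['latitude', 'longitude']

# [[south, west], [north, east]] corners of the box containing every centre.
bounds = [
    degrees.min().tolist(),
    degrees.max().tolist(),
]
print(bounds)
```

With the notebook's `_map` in scope, calling `_map.fit_bounds(bounds)` should zoom the view to cover those markers.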