# Probe Data - Map Matching
## Nick Paras | Kapil Garg

### Assignment 2

Input: Probe data and map [probe_data_map_matching.rar](https://canvas.northwestern.edu/courses/51440/files/3334329/download?wrap=1)

- The raw probe points in Germany, collected over 9 months
- The link data for the links that probe points can be map-matched to

Tasks:

- Map match probe points to road links
- Derive road slope for each road link
- Evaluate the derived road slope against the surveyed road slope in the link data file

**Please submit your code and slides presentation of your approach and results including evaluation comparing with the slopes in the link data file**

### Setup

We use **Python 3.6** and rely on a number of dependencies for computation and visualization. To easily install everything, we have included all of our dependencies in `environment.yml`. For quick setup, please create a conda environment with the following:

    $ conda env create --name probe-data -f environment.yml

and then activate the conda environment with

    $ source activate probe-data

In [11]:
# Imports
import os
import math
import csv
import operator
import multiprocessing as mp
import itertools
import time
import json
import gmplot

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import nvector as nv

from datetime import datetime
from haversine import haversine
from functools import reduce
from bs4 import BeautifulSoup
from IPython.display import IFrame

# Custom classes
import link_classes as lc
import dist_functions as dist

## for reloading while editing custom classes
from importlib import reload
reload(lc)
reload(dist)

%matplotlib inline

# Constants
DATA_DIR = '../data'

GOOGLE_MAPS_KEY = ''
with open('config.json') as data_file:
    data = json.load(data_file)
    GOOGLE_MAPS_KEY = data['google-maps-key']
    
FRAME = nv.FrameE(a=6371e3, f=0)

In [12]:
# Utility functions
def bearing(start, end):
    """
    Computes the bearing in degrees between two geopoints
    
    Inputs:
        start (tuple of lat, long): starting geolocation
        end (tuple of lat, long): ending geolocation
    
    Outputs:
        (float): bearing in degrees between start and end
    """
    phi_1 = math.radians(start[0])
    phi_2 = math.radians(end[0])
    lambda_1 = math.radians(start[1])
    lambda_2 = math.radians(end[1])
    
    x = math.cos(phi_2) * math.sin(lambda_2 - lambda_1)
    y = math.cos(phi_1) * math.sin(phi_2) - (math.sin(phi_1) * math.cos(phi_2) * math.cos(lambda_2 - lambda_1))

    return (math.degrees(math.atan2(x, y)) + 360) % 360

## Loading Probe Data for Map Matching

Here we'll load our data from the two CSV files into pandas DataFrames.

In [13]:
probe_headers = ['sampleID', 
                 'dateTime', 
                 'sourceCode', 
                 'latitude', 
                 'longitude', 
                 'altitude', 
                 'speed', 
                 'heading']

probe_data = pd.read_csv(os.path.join(DATA_DIR, 'Partition6467ProbePoints.csv'), header=None, names=probe_headers)
probe_data.drop_duplicates(inplace=True)
probe_data['id'] = probe_data['sampleID'].map(str) + '_' + probe_data['dateTime']
probe_data['dateTime'] = pd.to_datetime(probe_data['dateTime'], format='%m/%d/%Y %I:%M:%S %p')
probe_data.head()

Unnamed: 0,sampleID,dateTime,sourceCode,latitude,longitude,altitude,speed,heading,id
0,3496,2009-06-12 06:12:49,13,51.496868,9.386022,200,23,339,3496_6/12/2009 6:12:49 AM
1,3496,2009-06-12 06:12:54,13,51.496682,9.386157,200,10,129,3496_6/12/2009 6:12:54 AM
2,3496,2009-06-12 06:12:59,13,51.496705,9.386422,201,21,60,3496_6/12/2009 6:12:59 AM
3,3496,2009-06-12 06:13:04,13,51.496749,9.38684,201,0,360,3496_6/12/2009 6:13:04 AM
4,3496,2009-06-12 06:13:09,13,51.496864,9.387294,199,0,360,3496_6/12/2009 6:13:09 AM


In [14]:
link_headers = ['linkPVID', 
                'refNodeID', 
                'nrefNodeID', 
                'length', 
                'functionalClass', 
                'directionOfTravel', 
                'speedCategory', 
                'fromRefSpeedLimit', 
                'toRefSpeedLimit', 
                'fromRefNumLanes', 
                'toRefNumLanes', 
                'multiDigitized', 
                'urban', 
                'timeZone', 
                'shapeInfo', 
                'curvatureInfo', 
                'slopeInfo']

# load raw link data
link_data = pd.read_csv(os.path.join(DATA_DIR, 'Partition6467LinkData.csv'), header=None, names=link_headers)
link_data.head()

Unnamed: 0,linkPVID,refNodeID,nrefNodeID,length,functionalClass,directionOfTravel,speedCategory,fromRefSpeedLimit,toRefSpeedLimit,fromRefNumLanes,toRefNumLanes,multiDigitized,urban,timeZone,shapeInfo,curvatureInfo,slopeInfo
0,62007637,162844982,162809070,335.04,5,B,7,30,30,0,0,F,T,0.0,51.4965800/9.3862299/|51.4994700/9.3848799/,,
1,567329767,162844982,162981512,134.56,5,B,7,0,0,0,0,F,T,0.0,51.4965800/9.3862299/|51.4966899/9.3867100/|51...,,
2,62007648,162877732,162844982,97.01,5,B,7,30,30,0,0,F,T,0.0,51.4962899/9.3849100/|51.4965800/9.3862299/,,
3,78670326,162877732,163152693,314.84,5,B,7,30,30,0,0,F,T,0.0,51.4962899/9.3849100/|51.4990000/9.3836099/,,
4,51881672,174713859,174587951,110.17,3,B,6,50,50,2,2,F,T,0.0,53.0643099/8.7903400/45.79|53.0650299/8.791470...,,0.00/-0.090|110.17/0.062


In [15]:
# create link data lookup dictionary
links = []
link_db = lc.LinkDatabase()
with open(os.path.join(DATA_DIR, 'Partition6467LinkData.csv'), 'r') as csvfile:
    rdr = csv.DictReader(csvfile, delimiter=',', fieldnames=link_headers)
    for r in rdr:
        rl = lc.RoadLink(r)
        links.append(rl)
        link_db.insert_link(rl)

## Trajectory-Based Road Matching
For this approach, we use the absolute heading along each road segment as a measure of the road's shape. For each probe point, we find the nearest road segments and compare the probe's heading with each segment's heading to find the closest match. To handle travel in the direction opposite to the road heading, we treat headings that differ by 180$^{\circ}$ as equivalent (in effect projecting all headings onto 0$^{\circ}$-180$^{\circ}$), so that both directions of travel are compared equally.
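
As a rough illustration of the direction-agnostic comparison described above, the sketch below computes the smallest angle between two headings when opposite directions count as the same. The helper name `undirected_heading_diff` is hypothetical and not part of the notebook; the modulo also takes care of wraparound at 0/360 degrees.

```python
def undirected_heading_diff(heading_a, heading_b):
    """Smallest angle in degrees between two headings, ignoring direction of travel.

    Hypothetical helper for illustration: headings 180 degrees apart are
    treated as identical, so the result always lies in [0, 90].
    """
    diff = abs(heading_a - heading_b) % 180
    return min(diff, 180 - diff)


print(undirected_heading_diff(350, 10))   # 20: the two headings straddle north
print(undirected_heading_diff(10, 190))   # 0: opposite direction along the same road
print(undirected_heading_diff(0, 90))     # 90: perpendicular
```

A difference of 0 means the probe is moving parallel to the link in either direction, while 90 means it is moving across it.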

### Compute heading for road segments

In [16]:
# Parse out shapeInfo and make averageLocation and heading columns
link_data['shapeArray'] = link_data['shapeInfo'].apply(lambda x: [[float(j) for j in i.split('/')[:2]] for i in x.split('|')])
link_data['averageLocation'] = link_data['shapeArray'].apply(lambda x: reduce(np.add, x) / len(x))
link_data['heading'] = link_data['shapeArray'].apply(lambda x: bearing(x[0], x[-1]))
link_data['flipped_heading'] = (link_data['heading'] + 180) % 360
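
To make the parsing step concrete, here is a small standalone sketch of what the `shapeArray` lambda does to a single `shapeInfo` string, using the first link from the table above. The helper name `parse_shape_info` is hypothetical (the notebook does this inline), and reading the optional third field as elevation is an assumption.

```python
def parse_shape_info(shape_info):
    """Split a shapeInfo string into a list of [latitude, longitude] pairs.

    Points are separated by '|' and fields by '/'. Any third field
    (assumed to be elevation, when present) is ignored here.
    """
    return [[float(value) for value in point.split('/')[:2]]
            for point in shape_info.split('|')]


shape = parse_shape_info('51.4965800/9.3862299/|51.4994700/9.3848799/')
print(shape)

# mean of the shape points, as used for the averageLocation column
avg_lat = sum(point[0] for point in shape) / len(shape)
avg_lon = sum(point[1] for point in shape) / len(shape)
print(avg_lat, avg_lon)
```

The link heading is then the bearing from the first shape point to the last, so any curvature between the two end points is not reflected in it.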

### Find n-nearest road segments

In [7]:
def nearest_n_segments(lat, long, n):
    """
    Uses link_db to find nearest n road segments
    
    Inputs:
        lat (float): latitude of probe point
        long (float): longitude of probe point
        n (int): number of road segments to return
        
    Output: 
        (list of tuples): (linkPVID, distance) of n-nearest road segments 
        
    """
    # find nearest n links
    output = []
    try:
        link_search = [(x, haversine(x.refLatLon, (lat, long))) for x in link_db.get_links(lat, long)]
        link_search.sort(key=operator.itemgetter(1))
        link_search = link_search[0:n]

        # extract only link PVIDs from search
        output = [(int(x[0].linkPVID), x[1]) for x in link_search]
    except KeyError:
        pass
    
    return output
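
The lookup above ranks candidate links by haversine distance from the probe point to each link's `refLatLon`. The standalone sketch below shows the same ranking on plain tuples. `haversine_km`, `nearest_n`, and the candidate list are illustrative assumptions rather than the `link_db` API, which lives in `link_classes` and is not shown here; the coordinates are the first shape points of three links from the table above, assumed to correspond to `refLatLon`.

```python
import heapq
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_n(probe, candidates, n):
    """Return the n (link_id, distance_km) pairs closest to the probe point."""
    scored = [(link_id, haversine_km(ref, probe)) for link_id, ref in candidates]
    return heapq.nsmallest(n, scored, key=lambda pair: pair[1])


candidates = [
    (62007637, (51.4965800, 9.3862299)),
    (62007648, (51.4962899, 9.3849100)),
    (51881672, (53.0643099, 8.7903400)),
]
result = nearest_n((51.496868, 9.386022), candidates, 2)
print(result)
```

Because only one point per link is compared, a long link whose reference point is far away can be ranked below a shorter link that is actually farther from the probe.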

### Find closest segment by heading

In [8]:
def closest_by_heading(road_links, probe_heading):
    """
    Returns link with closest heading for given probe_heading
    
    Inputs:
        road_links (list of tuples): list of nearest (linkPVID, distance) tuples
        probe_heading (float): heading of probe gps point
    
    Outputs:
        (int): linkPVID for closest link, or -1 if road_links is empty
    """
    if len(road_links) == 0:
        return -1
    
    # get relevant links
    road_link_df = pd.DataFrame({'linkPVID': [x[0] for x in road_links], 'distances': [x[1] for x in road_links]})
    link_headings = link_data[link_data['linkPVID'].isin(road_link_df['linkPVID'])]
    link_headings = link_headings.merge(road_link_df)    
    
    # compute metric from distance and difference in angle. headings 180 degrees apart
    # are treated as equal, and the modulo handles wraparound at 0/360 degrees
    diff = np.abs(link_headings['heading'] - probe_heading) % 180
    link_headings['angle_diff'] = np.minimum(diff, 180 - diff)
    link_headings['metric'] = link_headings['distances'] * link_headings['angle_diff']
    
    # pick one with lowest metric and return its linkPVID as a scalar
    link_headings = link_headings.sort_values(by='metric')
    return link_headings['linkPVID'].iloc[0]

### Putting it all together

In [9]:
# time code
t0 = time.time()

# sample only first sample_size to make computation faster
sample_size = 1000
# sample_size = len(probe_data) # for all data

# add road link
probe_data['linkPVID'] = 0

# parallelizable function
def link_road_parallel(indices):
    """
    Links road to probe for a range of rows
    
    Input:
        indices (pair of ints): [start, end) row positions to find nearest link for
    
    Output:
        (list of tuples): (row index, linkPVID) for each probe point in the range
    """
    n = 3  # number of candidate segments considered per probe point
    return [(row.Index, closest_by_heading(nearest_n_segments(row.latitude, row.longitude, n),
                                           row.heading))
            for row in probe_data[indices[0]:indices[1]].itertuples()]

# run in parallel
N_CORES = mp.cpu_count()
C_SIZE = math.ceil(sample_size / N_CORES)

# clamp chunk bounds so the last chunk never runs past sample_size
chunks = [[min(C_SIZE * i, sample_size), min((i + 1) * C_SIZE, sample_size)] for i in range(N_CORES)]
with mp.Pool(N_CORES) as pool:
    r = pool.map(link_road_parallel, chunks)
linkings = list(itertools.chain.from_iterable(r))

# assign values to probe_data
stacked_values = np.dstack(linkings)[0]
probe_data.loc[stacked_values[0], 'linkPVID'] = stacked_values[1]
        
# finish timing
t1 = time.time()
print(str((t1 - t0) / 60) + ' minutes for ' + str(sample_size) + ' data points using ' + str(N_CORES) + ' CPU threads.')

0.08295915126800538 minutes for 1000 data points using 8 CPU threads.
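
The chunking arithmetic used for `pool.map` can be checked in isolation. The sketch below, built around a hypothetical helper `chunk_bounds`, splits a given number of rows into one `[start, end)` range per worker and clamps the bounds so that an uneven split never runs past the end of the sample.

```python
import math


def chunk_bounds(total, n_chunks):
    """Split range(total) into n_chunks contiguous [start, end) ranges.

    Bounds are clamped to total, so trailing chunks may be shorter or empty.
    """
    size = math.ceil(total / n_chunks)
    return [(min(i * size, total), min((i + 1) * size, total))
            for i in range(n_chunks)]


print(chunk_bounds(1000, 8))  # even split: eight ranges of 125 rows
print(chunk_bounds(10, 4))    # uneven split: [(0, 3), (3, 6), (6, 9), (9, 10)]
```

With 1000 rows and 8 workers the split is exact, which is the case timed above; the clamping only matters when `sample_size` is not a multiple of the worker count.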


In [14]:
# save out sample data
if sample_size < len(probe_data):
    probe_data[0:sample_size].to_csv('./trajectory_linked_data_sample.csv', index=False)

### Create output file
The output file has the following columns (columns in **bold** are taken from, or computed using, the LinkData CSV):
- sampleID: a unique identifier for the set of probe points that were collected from a particular phone.
- dateTime: date and time that the probe point was collected.
- sourceCode: a unique identifier for the data supplier (13 = Nokia).
- latitude: latitude in decimal degrees.
- longitude: longitude in decimal degrees.
- altitude: altitude in meters.
- speed: speed in KPH.
- heading: heading in degrees.
- linkPVID: published versioned identifier for the link.
- **directionOfTravel**: direction the vehicle was travelling on the link (F = from ref node, T = towards ref node).
- **distFromRef**: distance from the reference node to the map-matched probe point location on the link in decimal meters.
- **distFromLink**: perpendicular distance from the map-matched probe point location on the link to the probe point in decimal meters.
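
The two distance columns can be approximated with standard great-circle formulas. The sketch below is an assumption about what such a computation might look like, not the contents of `dist_functions` (which is not shown here). It treats the link as the great circle through its first and last shape points, measures the straight distance from the reference node to the probe point, and uses a spherical Earth of radius 6371 km as in `FRAME`. All function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371e3  # spherical Earth, same radius as FRAME in the notebook


def distance_m(a, b):
    """Haversine distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def bearing_rad(a, b):
    """Initial bearing in radians from a to b, both (lat, lon) in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    east = math.cos(lat2) * math.sin(lon2 - lon1)
    north = (math.cos(lat1) * math.sin(lat2)
             - math.sin(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    return math.atan2(east, north)


def cross_track_distance_m(point, start, end):
    """Unsigned distance in metres from point to the great circle through start and end."""
    angular_dist = distance_m(start, point) / EARTH_RADIUS_M
    angle = bearing_rad(start, point) - bearing_rad(start, end)
    return abs(math.asin(math.sin(angular_dist) * math.sin(angle))) * EARTH_RADIUS_M


ref, nref = (0.0, 0.0), (0.0, 1.0)  # synthetic link along the equator
probe = (0.001, 0.5)                # probe 0.001 degrees north of the link
dist_ref = distance_m(ref, probe)
dist_link = cross_track_distance_m(probe, ref, nref)
print(dist_ref, dist_link)
```

Using only the end points ignores intermediate shape points, so on curved links this perpendicular distance can differ noticeably from the distance to the actual road geometry.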

In [11]:
# remove id column
try:
    del probe_data['id']
except KeyError:
    pass

# add direction and shape array columns from link_data to probe_data
probe_data = probe_data.merge(link_data[['linkPVID', 'directionOfTravel', 'shapeArray']], how='left', on=['linkPVID'])

In [14]:
# add dist from ref and link
probe_data['distFromRef'] = math.nan
probe_data['distFromLink'] = math.nan

for row in probe_data.itertuples(): 
    if type(row.shapeArray) is list:
        probe_point = FRAME.GeoPoint(float(row.latitude), float(row.longitude), degrees=True)

        link_refFrame = FRAME.GeoPoint(row.shapeArray[0][0], row.shapeArray[0][1], degrees=True)
        link_nrefFrame = FRAME.GeoPoint(row.shapeArray[-1][0], row.shapeArray[-1][1], degrees=True)
        
        probe_data.loc[row.Index, 'distFromRef'] = dist.dist_to_ref(probe_point, link_refFrame)
        probe_data.loc[row.Index, 'distFromLink'] = dist.dist_to_link(probe_point, link_refFrame, link_nrefFrame)

In [15]:
# remove unnecessary columns
probe_data = probe_data.drop(['shapeArray'], axis=1)
probe_data.head()

Unnamed: 0,sampleID,dateTime,sourceCode,latitude,longitude,altitude,speed,heading,linkPVID,directionOfTravel,distFromRef,distFromLink
0,3496,2009-06-12 06:12:49,13,51.496868,9.386022,200,23,339,62007637,B,35.124972,4.855292
1,3496,2009-06-12 06:12:54,13,51.496682,9.386157,200,10,129,62007637,B,12.429214,1.654407
2,3496,2009-06-12 06:12:59,13,51.496705,9.386422,201,21,60,567329767,B,19.238898,8.222105
3,3496,2009-06-12 06:13:04,13,51.496749,9.38684,201,0,360,62007637,B,46.237135,45.808419
4,3496,2009-06-12 06:13:09,13,51.496864,9.387294,199,0,360,62007637,B,80.145068,79.549972


In [16]:
# save out file
probe_data.to_csv('./trajectory_linked_data.csv', index=False)

### Plot data
Now we plot the probe data together with its associated links.

In [3]:
heading_linked_data = pd.read_csv('./heading_match.csv')
heading_linked_data.head()

Unnamed: 0,sampleID,dateTime,latitude,longitude,altitude,heading,sourceCode,speed,directionOfTravel,distFromLink,distFromRef,linkPVID
0,3496,2009-06-12 06:12:49,51.496868,9.386022,200,339,13,23,,,,0
1,3496,2009-06-12 06:12:54,51.496682,9.386157,200,129,13,10,B,1.654407,12.429214,62007637
2,3496,2009-06-12 06:12:59,51.496705,9.386422,201,60,13,21,B,8.222107,19.238898,567329767
3,3496,2009-06-12 06:13:04,51.496749,9.38684,201,360,13,0,B,45.808419,46.237135,62007637
4,3496,2009-06-12 06:13:09,51.496864,9.387294,199,360,13,0,B,79.549972,80.145068,62007637


In [17]:
def make_map_plot(method, sample_id, gmaps_api_key, data):
    probe_plot_data = data[(data['linkPVID'] != 0) & (data['sampleID'] == sample_id)]

    # create map object centered at mean lat, long
    gmap = gmplot.GoogleMapPlotter(np.mean(probe_plot_data['latitude']), np.mean(probe_plot_data['longitude']), 16)

    # plot data with color-coded probes and links
    unique_links = probe_plot_data['linkPVID'].unique()
    colors = list(gmap.color_dict.keys())[0:-1]
    color_index = 0

    for i in unique_links:
        # setup variables
        current_color = colors[color_index]
        probe_lats = probe_plot_data[probe_plot_data['linkPVID'] == i]['latitude']
        probe_longs = probe_plot_data[probe_plot_data['linkPVID'] == i]['longitude']

        link_lats = [x[0] for x in list(link_data[link_data['linkPVID'] == i]['shapeArray'])[0]]
        link_longs = [x[1] for x in list(link_data[link_data['linkPVID'] == i]['shapeArray'])[0]]
        
        gmap.scatter(probe_lats, probe_longs, marker=False, color=current_color, s=5)
        gmap.plot(link_lats, link_longs, color=current_color, edge_width=10, alpha=0.25)

        color_index = (color_index + 1) % len(colors)
        print('Link Segment: ' + str(i) + ', Color: ' + str(current_color))
    
    # print out file
    if not os.path.exists('./graphs'):
        os.makedirs('./graphs')
    file_name = './graphs/' + method + '_' + str(sample_id) + '.html'
    gmap.draw(file_name)

    def insertapikey(fname, apikey):
        """put the google api key in a html file"""
        def putkey(htmltxt, apikey, apistring=None):
            """put the apikey in the htmltxt and return soup"""
            if not apistring:
                apistring = 'https://maps.googleapis.com/maps/api/js?key=%s&callback=initMap'
            soup = BeautifulSoup(htmltxt, 'html.parser')
            body = soup.body
            src = apistring % (apikey, )
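            # 'async' is a reserved word from Python 3.7 on, so pass it via a dict
            tscript = soup.new_tag('script', src=src, **{'async': 'defer'})
            body.insert(-1, tscript)
            return soup
        # use context managers so the file handles are closed promptly
        with open(fname, 'r') as f:
            htmltxt = f.read()
        soup = putkey(htmltxt, apikey)
        newtxt = soup.prettify()
        with open(fname, 'w') as f:
            f.write(newtxt)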

    insertapikey(file_name, gmaps_api_key)
    return IFrame(file_name, width=985, height=700)

In [33]:
# select data to plot
sample_id = 3496
make_map_plot('simple-trajectory', sample_id, GOOGLE_MAPS_KEY, heading_linked_data)

Link Segment: 62007637, Color: b
Link Segment: 567329767, Color: g
Link Segment: 62007648, Color: r
Link Segment: 78670326, Color: c


In [34]:
# select data to plot
sample_id = 5840302
make_map_plot('simple-trajectory', sample_id, GOOGLE_MAPS_KEY, heading_linked_data)

Link Segment: 79685530, Color: b
Link Segment: 79685644, Color: g
Link Segment: 540652103, Color: r
Link Segment: 586504921, Color: c
Link Segment: 540652102, Color: m
Link Segment: 51796317, Color: y
Link Segment: 540652572, Color: k
Link Segment: 79926343, Color: b
Link Segment: 79926342, Color: g
Link Segment: 540650979, Color: r
Link Segment: 79687459, Color: c
Link Segment: 79687447, Color: m


In [35]:
# select data to plot
sample_id = 778178
make_map_plot('simple-trajectory', sample_id, GOOGLE_MAPS_KEY, heading_linked_data)

Link Segment: 67942589, Color: b
Link Segment: 51900968, Color: g
Link Segment: 67942583, Color: r
Link Segment: 67942584, Color: c
Link Segment: 67914585, Color: m
Link Segment: 554724701, Color: y
Link Segment: 586484911, Color: k
Link Segment: 763541355, Color: b
Link Segment: 763541354, Color: g
Link Segment: 554724815, Color: r
Link Segment: 572196708, Color: c
Link Segment: 572196707, Color: m
Link Segment: 781679462, Color: y
Link Segment: 572216149, Color: k
Link Segment: 781679461, Color: b
Link Segment: 572196723, Color: g
Link Segment: 572216112, Color: r
Link Segment: 781679460, Color: c
Link Segment: 781679447, Color: m
Link Segment: 781679448, Color: y
Link Segment: 51901569, Color: k
Link Segment: 51901557, Color: b
Link Segment: 51931832, Color: g
Link Segment: 51931828, Color: r
Link Segment: 781679456, Color: c
Link Segment: 586503844, Color: m
Link Segment: 51901450, Color: y
Link Segment: 51901342, Color: k
Link Segment: 586503847, Color: b
Link Segment: 586503848, 