# Probe Data - Map Matching
## Nick Paras | Kapil Garg

### Assignment 2

Input: Probe data and map [probe_data_map_matching.rar](https://canvas.northwestern.edu/courses/51440/files/3334329/download?wrap=1)

-The raw probe points in Germany collected in 9 months

-The link data for the links that probe points can be map-matched to.

Tasks:
-- map match probe points to road links

-- derive road slope for each road link

-- evaluate the derived road slope with the surveyed road slope in the link data file

**Please submit your code and slides presentation of your approach and results including evaluation comparing with the slopes in the link data file**

### Setup

We use **Python 3.6** and rely on the dependencies:
* numpy
* scikit-learn
* matplotlib
* pandas

We also use Jupyter Notebooks for our code and reports. For quick setup, please create a conda environment with the following:

    $ conda create --name probe-data pandas matplotlib numpy scikit-learn

and then activate the conda environment with

    $ source activate probe-data


In [3]:
# Imports
import os
import math
import csv
import operator
import multiprocessing as mp
import itertools

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from datetime import datetime
from haversine import haversine
from functools import reduce

import link_classes as lc

%matplotlib inline

# Constants
DATA_DIR = '../data'

In [4]:
# Utility functions
def bearing(start, end):
    """
    Computes the bearing in degrees between two geopoints
    
    Inputs:
        start (tuple of lat, long): starting geolocation
        end (tuple of lat, long): ending geolocation
    
    Outputs:
        (float): bearing in degrees between start and end
    """
    phi_1 = math.radians(start[0])
    phi_2 = math.radians(end[0])
    lambda_1 = math.radians(start[1])
    lambda_2 = math.radians(end[1])
    
    x = math.cos(phi_2) * math.sin(lambda_2 - lambda_1)
    y = math.cos(phi_1) * math.sin(phi_2) - (math.sin(phi_1) * math.cos(phi_2) * math.cos(lambda_2 - lambda_1))

    return (math.degrees(math.atan2(x, y)) + 360) % 360

## Loading Probe Data for Map Matching

Here we'll load our data from the two csv's into Pandas DataFrames.

In [5]:
probe_headers = ['sampleID', 
                 'dateTime', 
                 'sourceCode', 
                 'latitude', 
                 'longitude', 
                 'altitude', 
                 'speed', 
                 'heading']

probe_data = pd.read_csv(os.path.join(DATA_DIR, 'Partition6467ProbePoints.csv'), header=None, names=probe_headers)
# probe_data['dateTime'] = pd.to_datetime(probe_data['dateTime'])
probe_data.head()

Unnamed: 0,sampleID,dateTime,sourceCode,latitude,longitude,altitude,speed,heading
0,3496,6/12/2009 6:12:49 AM,13,51.496868,9.386022,200,23,339
1,3496,6/12/2009 6:12:54 AM,13,51.496682,9.386157,200,10,129
2,3496,6/12/2009 6:12:59 AM,13,51.496705,9.386422,201,21,60
3,3496,6/12/2009 6:13:04 AM,13,51.496749,9.38684,201,0,360
4,3496,6/12/2009 6:13:09 AM,13,51.496864,9.387294,199,0,360


In [6]:
link_headers = ['linkPVID', 
                'refNodeID', 
                'nrefNodeID', 
                'length', 
                'functionalClass', 
                'directionOfTravel', 
                'speedCategory', 
                'fromRefSpeedLimit', 
                'toRefSpeedLimit', 
                'fromRefNumLanes', 
                'toRefNumLanes', 
                'multiDigitized', 
                'urban', 
                'timeZone', 
                'shapeInfo', 
                'curvatureInfo', 
                'slopeInfo']

# load raw link data
link_data = pd.read_csv(os.path.join(DATA_DIR, 'Partition6467LinkData.csv'), header=None, names=link_headers)
link_data.head()

Unnamed: 0,linkPVID,refNodeID,nrefNodeID,length,functionalClass,directionOfTravel,speedCategory,fromRefSpeedLimit,toRefSpeedLimit,fromRefNumLanes,toRefNumLanes,multiDigitized,urban,timeZone,shapeInfo,curvatureInfo,slopeInfo
0,62007637,162844982,162809070,335.04,5,B,7,30,30,0,0,F,T,0.0,51.4965800/9.3862299/|51.4994700/9.3848799/,,
1,567329767,162844982,162981512,134.56,5,B,7,0,0,0,0,F,T,0.0,51.4965800/9.3862299/|51.4966899/9.3867100/|51...,,
2,62007648,162877732,162844982,97.01,5,B,7,30,30,0,0,F,T,0.0,51.4962899/9.3849100/|51.4965800/9.3862299/,,
3,78670326,162877732,163152693,314.84,5,B,7,30,30,0,0,F,T,0.0,51.4962899/9.3849100/|51.4990000/9.3836099/,,
4,51881672,174713859,174587951,110.17,3,B,6,50,50,2,2,F,T,0.0,53.0643099/8.7903400/45.79|53.0650299/8.791470...,,0.00/-0.090|110.17/0.062


In [7]:
# create link data lookup dictionary
links = []
link_db = lc.LinkDatabase()
with open(os.path.join(DATA_DIR, 'Partition6467LinkData.csv'), 'r') as csvfile:
    rdr = csv.DictReader(csvfile, delimiter=',', fieldnames=link_headers)
    for r in rdr:
        rl = lc.RoadLink(r)
        links.append(rl)
        link_db.insert_link(rl)

## Trajectory-Based Road Matching
For this approach, we compute the absolute heading along each segment of road as a metric of the road shape. Then for each probe data point, we find the nearest set of road segments and compare the heading for the segment to the road segments to find the closest match. To address traveling the in the opposite direction of the road heading, we project all headings to be between 0$^{\circ}$-180$^{\circ}$ so that all comparisons are done equally.

### Compute heading for road segments

In [118]:
# Parse out shapeInfo and make averageLocation and heading columns
link_data['shapeArray'] = link_data['shapeInfo'].apply(lambda x: [[float(j) for j in i.split('/')[:2]] for i in x.split('|')])
link_data['averageLocation'] = link_data['shapeArray'].apply(lambda x: reduce(np.add, x) / len(x))
link_data['heading'] = link_data['shapeArray'].apply(lambda x: (bearing(x[0], x[-1]) + 180) % 180)

### Find n-nearest road segments

In [59]:
def nearest_n_segments(lat, long, n):
    """
    Uses link_db to find nearest n road segments
    
    Inputs:
        lat (float): latitude of probe point
        lon (float): longitude of probe point
        n (int): number of roads road segments to return
        
    Output: 
        (list of tuples): (linkPVID, distance) of n-nearest road segments 
        
    """
    # find nearest n links
    output = []
    try:
        link_search = [(x, haversine(x.refLatLon, (lat, long))) for x in link_db.get_links(lat, long)]
        link_search.sort(key=operator.itemgetter(1))
        link_search = link_search[0:n]

        # extract only link PVIDs from search
        output = [(int(x[0].linkPVID), x[1]) for x in link_search]
    except KeyError:
        pass
    
    return output

### Find closest segment by heading

In [60]:
def closest_by_heading(road_links, probe_heading):
    """
    Returns link with closest heading for given probe_heading
    
    Inputs:
        road_links (list of tuples): list of nearest (linkPVID, distance) tuples
        probe_heading (float): heading of probe gps point
    
    Outputs:
        (float): linkPVID for closest link
    """
    if len(road_links) == 0:
        return -1
    # get relevant links
    pvids = [x[0] for x in road_links]
    distances = [x[1] for x in road_links]
    link_headings = link_data[link_data['linkPVID'].isin(pvids)]
    
    # compute metric from distance and difference in angle
    angle_diff = np.abs(link_headings['heading'] - probe_heading)
    metrics = list(zip(distances * angle_diff, link_headings['linkPVID']))
    
    # pick one with lowest metric and return its linkPVID
    metrics.sort(key=operator.itemgetter(0))
    return metrics[0][1]

### Putting it all together

In [110]:
# time code
import time
t0 = time.time()

# sample only first sample_size to make computation faster
sample_size = 1000

# rescale all probe headings to be between 0-180
probe_data['heading'] = (probe_data['heading'] + 180) % 180 

# add road link
probe_data['linkPVID'] = 0

# parallelizable function
def link_road_parallel(indicies):
    """
    Links road to probe for set of indicies
    
    Input:
        indicies (list of floats): indicies to find nearest link for
    """
    output = [(0, 0) for x in range(indicies[1] - indicies[0])]
    counter = 0
    for row in probe_data[indicies[0]:indicies[1]].itertuples():
        output[counter] = (row.Index, closest_by_heading(nearest_n_segments(row.latitude, row.longitude, 3), row.heading))
        counter += 1
    
    return output

# run in parallel
N_CORES = mp.cpu_count()
C_SIZE = math.ceil(sample_size / N_CORES)

pool = mp.Pool(N_CORES)
r = pool.map(link_road_parallel, [[(C_SIZE * i), ((i + 1) * C_SIZE)] for i in range(N_CORES)])
linkings = list(itertools.chain.from_iterable(r))

# assign values to probe_data
stacked_values = np.dstack(linkings)[0]
probe_data.loc[stacked_values[0], 'linkPVID'] = stacked_values[1]
probe_data.head()
        
# finish timing
t1 = time.time()

In [191]:
# probe_data[0:sample_size].to_csv('./sample_linked_data.csv', index=False)

### Create output file
The output file has the following columns (columns in **bold** are pulled from the LinkData csv):
- sampleID: a unique identifier for the set of probe points that were collected from a particular phone.
- dateTime: date and time that the probe point was collected.
- sourceCode: a unique identifier for the data supplier (13 = Nokia).
- latitude: latitude in decimal degrees.
- longitude: longitude in decimal degrees.
- altitude: altitude in meters.
- speed: speed in KPH.
- heading: heading in degrees.
- linkPVID: published versioned identifier for the link.
- **direction**: direction the vehicle was travelling on thelink (F = from ref node, T = towards ref node).
- **distFromRef**: distance from the reference node to the map-matched probe point location on the link in decimal meters.
- **distFromLink**: perpendicular distance from the map-matched probe point location on the link to the probe point in decimal meters.

In [79]:
link_data[link_data['linkPVID'] == 567329767]['shapeArray']
link_lats = [x[0] for x in list(link_data[link_data['linkPVID'] == 567329767]['shapeArray'])[0]]
link_longs = [x[1] for x in list(link_data[link_data['linkPVID'] == 567329767]['shapeArray'])[0]]

In [80]:
print(list(link_data[link_data['linkPVID'] == 567329767]['shapeArray']))
print(link_lats)
print(link_longs)

[[[51.49658, 9.3862299], [51.4966899, 9.38671], [51.4968, 9.3873199], [51.49701, 9.3880399]]]
[51.49658, 51.4966899, 51.4968, 51.49701]
[9.3862299, 9.38671, 9.3873199, 9.3880399]


In [55]:
lats1 = [link_data['shapeArray'][0][0][0], link_data['shapeArray'][0][1][0]]
longs1 = [link_data['shapeArray'][0][0][1], link_data['shapeArray'][0][1][1]]

print(lats1)
print(longs1)

[51.49658, 51.49947]
[9.3862299, 9.3848799]


In [85]:
probe_data[probe_data['sampleID'] == 3496]

Unnamed: 0,sampleID,dateTime,sourceCode,latitude,longitude,altitude,speed,heading,linkPVID
0,3496,6/12/2009 6:12:49 AM,13,51.496868,9.386022,200,23,159,62007637
1,3496,6/12/2009 6:12:54 AM,13,51.496682,9.386157,200,10,129,62007637
2,3496,6/12/2009 6:12:59 AM,13,51.496705,9.386422,201,21,60,567329767
3,3496,6/12/2009 6:13:04 AM,13,51.496749,9.386840,201,0,0,567329767
4,3496,6/12/2009 6:13:09 AM,13,51.496864,9.387294,199,0,0,567329767
5,3496,6/12/2009 6:13:15 AM,13,51.496930,9.387716,198,5,89,567329767
6,3496,6/12/2009 6:13:20 AM,13,51.496957,9.387794,198,1,108,567329767
7,3496,6/12/2009 6:13:25 AM,13,51.496952,9.387805,197,0,130,62007637
8,3496,6/12/2009 6:13:30 AM,13,51.496949,9.387818,196,0,94,567329767
9,3496,6/12/2009 6:13:35 AM,13,51.496944,9.387840,196,0,46,567329767


### Plot data
Now, we plot both the probe data with its associated links

In [11]:
import gmplot
from IPython.display import IFrame

In [117]:
# select data to plot
probe_plot_data = probe_data[(probe_data['linkPVID'] != 0) & (probe_data['sampleID'] == 3496)]

# create map object centered at mean lat, long
gmap = gmplot.GoogleMapPlotter(np.mean(probe_plot_data['latitude']), np.mean(probe_plot_data['longitude']), 16)

# plot data with color-coded probes and links
unique_links = probe_plot_data['linkPVID'].unique()
colors = ['b', 'g', 'r', 'c', 'm', 'y']
color_index = 0

for i in unique_links:
    # setup variables
    current_color = colors[color_index]
    probe_lats = probe_plot_data[probe_plot_data['linkPVID'] == i]['latitude']
    probe_longs = probe_plot_data[probe_plot_data['linkPVID'] == i]['longitude']
    
    link_lats = [x[0] for x in list(link_data[link_data['linkPVID'] == i]['shapeArray'])[0]]
    link_longs = [x[1] for x in list(link_data[link_data['linkPVID'] == i]['shapeArray'])[0]]
    
    gmap.scatter(probe_lats, probe_longs, marker=False, color=current_color, s=5)
    gmap.plot(link_lats, link_longs, color=current_color, edge_width=10, alpha=0.25)
    
    color_index += 1

gmap.draw('mymap.html')
IFrame('mymap.html', width=985, height=700)