# FUNCTIONS FOR DIFFERENT METRICS <a name="Top"></a>

Levels of metrics:
1. Euclidean distance between latitude and longitude values in degrees (Zindi challenge)
2. Air distance in kilometers
3. Road distance in kilometers
4. Driving time in minutes
5. Evaluation of driving time against a threshold ("Golden Hour")
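The functions in this notebook build each level from the previous one. With the constants they assume (a detour coefficient of 1.3, an average speed of 40 km/h and a 60-minute threshold), and with latitudes $\varphi$ and longitudes $\lambda$ in degrees, the chain can be summarised as:

```latex
\begin{aligned}
d_0 &= \sqrt{(\varphi_a - \varphi_b)^2 + (\lambda_a - \lambda_b)^2} && \text{(degrees, Zindi score)}\\
d_1 &= \operatorname{geodesic}\big((\varphi_a, \lambda_a), (\varphi_b, \lambda_b)\big) && \text{(km)}\\
d_2 &= 1.3 \, d_1 && \text{(km)}\\
t   &= \frac{d_2}{40\ \text{km/h}} \cdot 60 && \text{(min)}\\
g   &= (t > 60) && \text{(True if the golden hour is exceeded)}
\end{aligned}
```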

In this notebook we use the Uber Movement data to find distances between hexbins. OpenStreetMap data is used to map the Uber Movement data onto the road network. For more on the Uber Movement data, see:
https://medium.com/uber-movement/working-with-uber-movement-speeds-data-cc01d35937b3
More on OpenStreetMap:
https://wiki.openstreetmap.org/wiki/Downloading_data


## Table of contents
***
[Imports and setup](#Imports_setup)<br>
[Extract, transform and load the data](#ETL)<br>
[Data cleaning](#Data_cleaning)<br>
[Data analysis](#Data_analysis)<br>

</br>
</br>
</br>

## Imports and setup <a name="Imports_setup"></a>
***
### Importing packages

In [21]:
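import sys

import pandas as pd
import h3                                # h3-py v3 API (h3_to_geo); renamed in v4
from geopy.distance import geodesic

# Make the project's helper module in ../Scripts importable
sys.path.insert(0, '../Scripts')
import capstone_functions as cf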

### Setup

In [81]:
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)
pd.options.display.float_format = '{:,.3f}'.format
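`pd.options.display.float_format` expects a callable that turns a float into a string; the bound method `'{:,.3f}'.format` is such a callable. A quick check of what it produces (three decimals, comma as thousands separator):

```python
# The same callable that is assigned to pd.options.display.float_format above
fmt = '{:,.3f}'.format

print(fmt(1234.5678))   # 1,234.568
print(fmt(-1.18912))    # -1.189
```

The option only changes how floats are displayed; the values stored in the DataFrame keep their full precision.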

### Functions

In [96]:
def get_metrics(coord_a, coord_b):
    '''
    Returns a list with the five levels of metrics. Each coordinate can be given as an
    H3 hex-bin index (str) or as a (latitude, longitude) list or tuple.
    Metric / index 0: Euclidean distance between latitude and longitude values in degrees (Zindi challenge)
    Metric / index 1: Air distance in kilometers
    Metric / index 2: Road distance in kilometers
    Metric / index 3: Driving time in minutes
    Metric / index 4: Boolean: True if the driving time exceeds the threshold ("Golden Hour"), False otherwise
    '''
    
    # H3 indices are converted to the (lat, long) of the hex-bin centre
    if isinstance(coord_a, str):
        lat_a, long_a = h3.h3_to_geo(coord_a)
    elif isinstance(coord_a, (list, tuple)):
        lat_a, long_a = coord_a[0], coord_a[1]
    else:
        raise TypeError('coord_a must be an H3 index (str) or a (lat, long) list/tuple')
    
    if isinstance(coord_b, str):
        lat_b, long_b = h3.h3_to_geo(coord_b)
    elif isinstance(coord_b, (list, tuple)):
        lat_b, long_b = coord_b[0], coord_b[1]
    else:
        raise TypeError('coord_b must be an H3 index (str) or a (lat, long) list/tuple')
    
    metric = []
    metric.append(get_distance_zindi(lat_a, long_a, lat_b, long_b))
    metric.append(get_distance_air(lat_a, long_a, lat_b, long_b))
    metric.append(get_distance_road(lat_a, long_a, lat_b, long_b))
    metric.append(get_distance_time(lat_a, long_a, lat_b, long_b))
    metric.append(get_distance_golden(lat_a, long_a, lat_b, long_b))
    
    return metric



def get_distance_zindi(lat_a, long_a, lat_b, long_b):
    '''
    Returns the Euclidean distance between two points in degrees of latitude and longitude, as used in the Zindi score.
    '''
    
    return ((lat_a - lat_b)**2 + (long_a - long_b)**2) ** 0.5



def get_distance_air(lat_a, long_a, lat_b, long_b):
    '''
    Returns the geodesic (air) distance between two pairs of coordinates in km,
    measured on the WGS-84 ellipsoid (geopy's default).
    '''
    
    return geodesic((lat_a, long_a), (lat_b, long_b)).km



def get_distance_road(lat_a, long_a, lat_b, long_b):
    '''
    Returns the estimated road distance between two pairs of coordinates in km,
    approximated as the air distance multiplied by a fixed detour coefficient.
    '''
    
    detour_coef = 1.3   # Known as Henning- or Hanno-coefficient
    
    return get_distance_air(lat_a, long_a, lat_b, long_b) * detour_coef



def get_distance_time(lat_a, long_a, lat_b, long_b):
    '''
    Returns the estimated driving time in minutes for the road distance between two pairs of coordinates,
    assuming a constant average speed.
    '''
    
    assumed_avg_speed = 40   # km/h
    
    return get_distance_road(lat_a, long_a, lat_b, long_b) / assumed_avg_speed * 60



def get_distance_golden(lat_a, long_a, lat_b, long_b):
    '''
    Returns True if the estimated driving time exceeds the "golden hour" threshold
    (the ambulance would arrive too late) and False otherwise.
    '''
    
    golden_hour = 60   # minutes; it is still an hour, after all
    
    return get_distance_time(lat_a, long_a, lat_b, long_b) > golden_hour
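To try the metric chain without `geopy` or `h3` installed, the sketch below swaps in the haversine formula (a spherical-Earth approximation, not the ellipsoidal geodesic that `geopy` computes) and exposes the detour coefficient, average speed and golden-hour threshold as keyword arguments. The names `haversine_km` and `metrics_haversine` and the mean Earth radius of 6,371 km are assumptions for illustration; the air distance differs from `get_distance_air` by up to about half a percent.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)


def haversine_km(lat_a, long_a, lat_b, long_b):
    '''Great-circle distance in km on a sphere (approximates geopy's geodesic).'''
    phi_a, phi_b = radians(lat_a), radians(lat_b)
    d_phi = phi_b - phi_a
    d_lambda = radians(long_b - long_a)
    h = sin(d_phi / 2) ** 2 + cos(phi_a) * cos(phi_b) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def metrics_haversine(lat_a, long_a, lat_b, long_b,
                      detour_coef=1.3, avg_speed_kmh=40, golden_hour_min=60):
    '''Hypothetical stdlib-only variant of get_metrics for (lat, long) inputs.'''
    zindi = ((lat_a - lat_b) ** 2 + (long_a - long_b) ** 2) ** 0.5
    air_km = haversine_km(lat_a, long_a, lat_b, long_b)
    road_km = air_km * detour_coef
    minutes = road_km / avg_speed_kmh * 60
    return [zindi, air_km, road_km, minutes, minutes > golden_hour_min]


print(metrics_haversine(-1.189, 36.931, -0.663, 37.209))
```

Keeping the constants as arguments makes it easy to test a different speed or threshold assumption without editing the functions themselves.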

[Back to top](#Top)<br>

</br>
</br>
</br>

## Extract, transform and load the data <a name="ETL"></a>
***

In [3]:
df_accidents = pd.read_csv('../Inputs/Train.csv', parse_dates=['datetime'])
print(df_accidents.shape)
df_accidents.head()

(6318, 4)


|   | uid | datetime | latitude | longitude |
|---|---|---|---|---|
| 0 | 1 | 2018-01-01 00:25:46 | -1.189 | 36.931 |
| 1 | 2 | 2018-01-01 02:02:39 | -0.663 | 37.209 |
| 2 | 3 | 2018-01-01 02:31:49 | -0.663 | 37.209 |
| 3 | 4 | 2018-01-01 03:04:01 | -1.288 | 36.827 |
| 4 | 5 | 2018-01-01 03:58:49 | -1.189 | 36.931 |


In [14]:
from pathlib import Path
osm = Path('../Inputs/map').read_text().splitlines()   # one string per line, without newline characters
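A line-by-line read has to drop the trailing newline characters. Slicing with `line[:-1]` assumes every line ends with one, so the last line of a file without a final newline loses its last character. The sketch below uses an in-memory file (`io.StringIO`) with made-up content in place of `../Inputs/map` to compare slicing with `str.rstrip` and `str.splitlines`:

```python
import io

# Made-up stand-in for the OSM file; note the missing newline at the very end
sample = '<osm>\n  <node id="1"/>\n</osm>'

sliced = [line[:-1] for line in io.StringIO(sample)]
stripped = [line.rstrip('\n') for line in io.StringIO(sample)]
split = sample.splitlines()

print(sliced[-1])     # </osm   (the closing '>' is lost)
print(stripped[-1])   # </osm>
print(split[-1])      # </osm>
```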

[Back to top](#Top)<br>

</br>
</br>
</br>

## Data cleaning <a name="Data_cleaning"></a>
***

[Back to top](#Top)<br>

</br>
</br>
</br>

## Data analysis <a name="Data_analysis"></a>
***

In [34]:
df_accidents = cf.assign_hex_bin(df_accidents, 'latitude', 'longitude')
df_accidents.head()

|   | uid | datetime | latitude | longitude | h3_zone_5 | h3_zone_6 | h3_zone_7 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 2018-01-01 00:25:46 | -1.189 | 36.931 | 857a6e43fffffff | 867a6e417ffffff | 877a6e416ffffff |
| 1 | 2 | 2018-01-01 02:02:39 | -0.663 | 37.209 | 857a4513fffffff | 867a45107ffffff | 877a45102ffffff |
| 2 | 3 | 2018-01-01 02:31:49 | -0.663 | 37.209 | 857a4513fffffff | 867a45107ffffff | 877a45102ffffff |
| 3 | 4 | 2018-01-01 03:04:01 | -1.288 | 36.827 | 857a6e43fffffff | 867a6e42fffffff | 877a6e42cffffff |
| 4 | 5 | 2018-01-01 03:58:49 | -1.189 | 36.931 | 857a6e43fffffff | 867a6e417ffffff | 877a6e416ffffff |
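The `h3_zone_5`, `h3_zone_6` and `h3_zone_7` columns should be nested: each resolution-7 cell lies inside the resolution-6 cell of the same row, which lies inside the resolution-5 cell. An H3 cell index stores its resolution in bits 52 to 55 and one 3-bit digit per finer resolution; a parent is obtained by lowering the resolution and setting the dropped digits to 7 (unused). The functions below are a stdlib-only sketch of that bit manipulation, checked against the first two rows of the table; in practice `h3.h3_to_parent` (h3-py v3) or `h3.cell_to_parent` (v4) does this for you.

```python
def h3_resolution(index: str) -> int:
    '''Resolution of an H3 cell index (stored in bits 52 to 55).'''
    return (int(index, 16) >> 52) & 0xF


def h3_parent(index: str, parent_res: int) -> str:
    '''Sketch of H3's parent operation via bit manipulation (no h3 library needed).'''
    h = int(index, 16)
    res = (h >> 52) & 0xF
    if not 0 <= parent_res <= res:
        raise ValueError('parent_res must be between 0 and the cell resolution')
    # Overwrite the resolution field with the coarser resolution
    h = (h & ~(0xF << 52)) | (parent_res << 52)
    # Mark digits finer than parent_res as unused (0b111);
    # the digit for resolution r sits at bit offset 3 * (15 - r)
    for r in range(parent_res + 1, res + 1):
        h |= 0x7 << (3 * (15 - r))
    return format(h, 'x')


row = ('857a6e43fffffff', '867a6e417ffffff', '877a6e416ffffff')  # first row above
print([h3_resolution(cell) for cell in row])   # [5, 6, 7]
print(h3_parent(row[2], 6) == row[1])          # True
print(h3_parent(row[1], 5) == row[0])          # True
```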


In [97]:
get_metrics([-1.189, 36.931], [-0.663, 37.209])

[0.5949453756438513,
 65.88091474890801,
 85.64518917358042,
 128.46778376037062,
 True]
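Because the road distance and the driving time are fixed multiples of the air distance, the printed values can be cross-checked by hand: road = 1.3 × air, and time = road / 40 × 60 = 1.5 × road. Using the numbers from the output above:

```python
from math import isclose

air_km, road_km, minutes = 65.88091474890801, 85.64518917358042, 128.46778376037062

print(isclose(road_km, air_km * 1.3))        # True: detour coefficient 1.3
print(isclose(minutes, road_km / 40 * 60))   # True: 40 km/h average speed
print(minutes > 60)                          # True: golden hour exceeded
```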

In [98]:
get_metrics([-1.189, 36.931], '857a4513fffffff')

[0.5650614344056945,
 62.574488542323074,
 81.34683510502,
 122.02025265753001,
 True]
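Since every step after the air distance is linear, the golden-hour flag flips at a fixed air distance: with the assumed constants, 60 min × 40 km/h / 60 / 1.3 ≈ 30.8 km. The two calls so far, with air distances of roughly 66 and 63 km, are therefore well past it. A small sketch of this inversion (the function name `max_air_km` is hypothetical):

```python
def max_air_km(golden_hour_min=60, avg_speed_kmh=40, detour_coef=1.3):
    '''Largest air distance (km) whose estimated driving time does not exceed the threshold.'''
    max_road_km = golden_hour_min / 60 * avg_speed_kmh   # km coverable within the time limit
    return max_road_km / detour_coef


print(round(max_air_km(), 2))   # 30.77
```

A pair exactly at this distance is not flagged, because `get_distance_golden` uses a strict `>` comparison.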

In [100]:
get_metrics('857a6e43fffffff', '857a4513fffffff')

[0.6322775757134343,
 70.04323414016568,
 91.05620438221538,
 136.5843065733231,
 True]
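To find distances between many hexbins, `get_metrics` can be applied to every pair of them. Because the metrics are symmetric, the sketch below computes each unordered pair only once with `itertools.combinations` and stores the result under both orderings. `pairwise_metric` is a hypothetical helper, and the `zindi` function stands in for `get_metrics` so the snippet runs without `h3` or `geopy`.

```python
from itertools import combinations


def pairwise_metric(cells, metric):
    '''
    Hypothetical helper: evaluates metric(a, b) once per unordered pair of cells
    and returns a dict keyed by both (a, b) and (b, a). Cells must be hashable
    (H3 index strings or (lat, long) tuples, not lists).
    '''
    table = {}
    for a, b in combinations(cells, 2):
        value = metric(a, b)
        table[(a, b)] = table[(b, a)] = value
    return table


def zindi(a, b):
    # Stand-in metric on (lat, long) tuples; in the notebook, metric=get_metrics
    # would also accept H3 index strings
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


cells = [(-1.189, 36.931), (-0.663, 37.209), (-1.288, 36.827)]
table = pairwise_metric(cells, zindi)
print(len(table))   # 6: three pairs, stored in both directions
```

In the notebook, `cells` could be `df_accidents['h3_zone_6'].unique()`. With n hexbins this makes n(n-1)/2 metric calls, so the cost grows quadratically with the number of hexbins.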

[Back to top](#Top)<br>