# FUNCTIONS FOR DIFFERENT METRICS <a name="Top"></a>

Levels of metrics:
1. Difference between latitude and longitude values (Zindi challenge)
2. Air distance in kilometers
3. Road distance in kilometers
4. Driving time in minutes
5. Evaluation of driving time against a threshold ("Golden Hour")
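
As a rough illustration of levels 1 and 5, the sketch below assumes that level 1 means the Euclidean distance between two points measured directly in degrees of latitude and longitude, and that the "Golden Hour" threshold is 60 minutes. The names `coordinate_difference`, `share_within_golden_hour` and `GOLDEN_HOUR_MINUTES` are hypothetical and are not part of `capstone_functions`; the driving times are made-up values.

```python
import math

# Assumption: the "Golden Hour" threshold is taken as 60 minutes.
GOLDEN_HOUR_MINUTES = 60.0


def coordinate_difference(lat_a, long_a, lat_b, long_b):
    """Level 1 (assumed): Euclidean distance in degrees, not kilometers."""
    return math.hypot(lat_a - lat_b, long_a - long_b)


def share_within_golden_hour(driving_minutes, threshold=GOLDEN_HOUR_MINUTES):
    """Level 5 (assumed): fraction of trips at or below the threshold."""
    times = list(driving_minutes)
    if not times:
        return 0.0
    return sum(t <= threshold for t in times) / len(times)


# Coordinates of the first two accidents in Train.csv
print(coordinate_difference(-1.189, 36.931, -0.663, 37.209))
# Hypothetical driving times in minutes
print(share_within_golden_hour([12.5, 48.0, 75.0, 60.0]))
```

A value in degrees is only comparable between points at similar latitudes, which is one reason the later levels switch to kilometers and minutes.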

In this notebook we use the Uber Movement data to find distances between hexbins. OpenStreetMap data is used to map the Uber Movement data. For more on the Uber data, see:
https://medium.com/uber-movement/working-with-uber-movement-speeds-data-cc01d35937b3
For more on OpenStreetMap, see:
https://wiki.openstreetmap.org/wiki/Downloading_data


## Table of contents
***
[Imports and setup](#Imports_setup)<br>
[Extract, transform and load the data](#ETL)<br>
[Data cleaning](#Data_cleaning)<br>
[Data analysis](#Data_analysis)<br>

</br>
</br>
</br>

## Imports and setup <a name="Imports_setup"></a>
***
### Importing packages

In [21]:
import pandas as pd
import h3
from geopy.distance import geodesic

import sys  
sys.path.insert(0, '../Scripts')
import capstone_functions as cf

### Setup

In [31]:
pd.set_option('display.max_columns', None)
#pd.set_option('display.max_rows', None)
pd.options.display.float_format = '{:,.3f}'.format

### Functions

In [27]:
def get_distance(lat_a, long_a, lat_b, long_b):
    '''
    Returns the geodesic (air) distance in km between two points,
    each given as latitude and longitude in decimal degrees.
    '''
    return geodesic((lat_a, long_a), (lat_b, long_b)).km
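
As a cross-check that needs no third-party package, the air distance can also be approximated with the haversine formula on a sphere. This is a minimal sketch, not the method used in this notebook: `haversine_km` and the mean Earth radius of 6371.0088 km are assumptions introduced here. Because `geodesic` uses an ellipsoidal Earth model, the two results usually differ by a few tenths of a percent.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius


def haversine_km(lat_a, long_a, lat_b, long_b):
    """Great-circle distance in km between two points in decimal degrees."""
    phi_a, phi_b = math.radians(lat_a), math.radians(lat_b)
    d_phi = phi_b - phi_a
    d_lambda = math.radians(long_b - long_a)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_a) * math.cos(phi_b) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Same pair of points as used elsewhere in this notebook
print(haversine_km(-1.189, 36.931, -0.663, 37.209))
```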

[Back to top](#Top)<br>

</br>
</br>
</br>

## Extract, transform and load the data <a name="ETL"></a>
***

In [3]:
df_accidents = pd.read_csv('../Inputs/Train.csv', parse_dates=['datetime'])
print(df_accidents.shape)
df_accidents.head()

(6318, 4)


| | uid | datetime | latitude | longitude |
|---|---|---|---|---|
| 0 | 1 | 2018-01-01 00:25:46 | -1.189 | 36.931 |
| 1 | 2 | 2018-01-01 02:02:39 | -0.663 | 37.209 |
| 2 | 3 | 2018-01-01 02:31:49 | -0.663 | 37.209 |
| 3 | 4 | 2018-01-01 03:04:01 | -1.288 | 36.827 |
| 4 | 5 | 2018-01-01 03:58:49 | -1.189 | 36.931 |


In [14]:
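# Read the OSM export line by line; the context manager closes the file,
# and rstrip('\n') avoids chopping a character off a final line without a newline.
with open('../Inputs/map') as f:
    osm = [line.rstrip('\n') for line in f]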
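
If the file at `../Inputs/map` is an OpenStreetMap XML export (an assumption; the notebook does not state its format), node coordinates can be pulled out with the standard library instead of working on raw lines. The sketch below parses a small hand-written sample rather than the real file, and `parse_osm_nodes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in OSM XML layout (not taken from the real file)
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="-1.189" lon="36.931"/>
  <node id="2" lat="-0.663" lon="37.209"/>
  <way id="10"><nd ref="1"/><nd ref="2"/></way>
</osm>"""


def parse_osm_nodes(xml_text):
    """Map each OSM node id to its (latitude, longitude) tuple."""
    root = ET.fromstring(xml_text)
    return {
        int(node.get("id")): (float(node.get("lat")), float(node.get("lon")))
        for node in root.iter("node")
    }


nodes = parse_osm_nodes(SAMPLE_OSM)
print(nodes)
```

For a full-size export, `ET.iterparse` would avoid holding the whole document in memory.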

[Back to top](#Top)<br>

</br>
</br>
</br>

## Data cleaning <a name="Data_cleaning"></a>
***

[Back to top](#Top)<br>

</br>
</br>
</br>

## Data analysis <a name="Data_analysis"></a>
***

In [34]:
df_accidents = cf.assign_hex_bin(df_accidents, 'latitude', 'longitude')
df_accidents.head()

| | uid | datetime | latitude | longitude | h3_zone_5 | h3_zone_6 | h3_zone_7 |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 2018-01-01 00:25:46 | -1.189 | 36.931 | 857a6e43fffffff | 867a6e417ffffff | 877a6e416ffffff |
| 1 | 2 | 2018-01-01 02:02:39 | -0.663 | 37.209 | 857a4513fffffff | 867a45107ffffff | 877a45102ffffff |
| 2 | 3 | 2018-01-01 02:31:49 | -0.663 | 37.209 | 857a4513fffffff | 867a45107ffffff | 877a45102ffffff |
| 3 | 4 | 2018-01-01 03:04:01 | -1.288 | 36.827 | 857a6e43fffffff | 867a6e42fffffff | 877a6e42cffffff |
| 4 | 5 | 2018-01-01 03:58:49 | -1.189 | 36.931 | 857a6e43fffffff | 867a6e417ffffff | 877a6e416ffffff |


In [33]:
coords_1 = (-1.189, 36.931)
coords_2 = (-0.663, 37.209)

# Unpack the tuples instead of repeating the literal values
get_distance(*coords_1, *coords_2)

65.88091474890801

[Back to top](#Top)<br>