## P2P Calibration 
Note: place this code at the end of the p2p module so that `TransitMatrix` is in scope.

We use GraphHopper to test the p2p calibration. A GraphHopper API key is required to run this process; it can be obtained here:
https://graphhopper.com/api/1/docs/FAQ/

Note: You can also use a Google Maps API.


Show the mean and standard deviation of the difference between p2p's route time
and GraphHopper's route time, in seconds.

IMPORTANT: To use this, you must have a valid GraphHopper Matrix API key
saved in a text file called GRAPHHOPPER_API_KEY.txt in this directory.

Positive differences indicate that p2p's route was longer; negative differences
indicate that p2p's route was shorter.
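
As a minimal sketch of the comparison described above, the snippet below computes the differences for each unordered pair of points and summarizes them. All ids and travel times are made up for illustration, and it uses `statistics.pstdev` (population standard deviation, which matches the default of `np.std`) so that it runs without third-party packages.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical travel times in seconds, for illustration only.
ids = ["a", "b", "c"]
p2p_times = {("a", "b"): 620, ("a", "c"): 900, ("b", "c"): 310}
# A square matrix ordered like the submitted points, as a matrix API would return.
graphhopper_times = [
    [0, 600, 930],
    [600, 0, 300],
    [930, 300, 0],
]

diffs = []
# combinations() yields each unordered pair once and skips the diagonal.
for (i, row), (j, col) in combinations(enumerate(ids), 2):
    diffs.append(p2p_times[(row, col)] - graphhopper_times[i][j])

print(diffs)  # [20, -30, 10]: positive means p2p's route took longer
print("diffs mean: {}, stddev: {}".format(mean(diffs), pstdev(diffs)))
```

A mean near zero with a small standard deviation would suggest that p2p's times track the reference closely for the sampled points.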

In [8]:
import sys

import numpy as np
import pandas as pd
import requests

In [2]:
# P2P Calibration
# p2p and GraphHopper use different names for the same
# vehicle/route types, so map one to the other.
p2p_to_graphhopper_type_names = {
    'drive' : 'car',
    'walk' : 'foot',
    'bike' : 'bike'
}

def sample_one_matrix(df, tm, network_type, api_key):
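    '''
    Test one distance matrix: compare p2p's travel times for the sampled
    points against GraphHopper's and print the mean and stddev of the differences.
    '''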

    base_url = "https://graphhopper.com/api/1/matrix"
    # Append one `point` parameter per location; the first opens the query string.
    # Use the renamed 'x'/'y' columns rather than fixed positions, so the
    # coordinates chosen in calibrate() are the ones actually sent.
    for k, data in enumerate(df.itertuples()):
        separator = "?" if k == 0 else "&"
        base_url += "{}point={},{}".format(separator, data.x, data.y)

    param_string = "&type=json&vehicle={}&debug=true&out_array=times&key={}".format(network_type, api_key)
    base_url += param_string


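    try:
        r = requests.get(base_url)
        r.raise_for_status()
        results = r.json()['times']
    except (requests.RequestException, ValueError, KeyError) as err:
        # Network failure, HTTP error, invalid JSON, or a response without 'times'.
        print('There was a problem fetching from GraphHopper: {}. Exiting...'.format(err))
        sys.exit(1)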


    # Compare each unordered pair of points once and skip the diagonal.
    already_checked = set()
    diffs = []
    for i, row in enumerate(df.index):
        for j, col in enumerate(df.index):
            if (row, col) not in already_checked and row != col:
                calculated_time = tm.get(row, col)
                actual_time = results[i][j]
                diff = calculated_time - actual_time
                diffs.append(diff)
                already_checked.add((row, col))
                already_checked.add((col, row))

    stddev = np.std(diffs)
    mean = np.mean(diffs)


    print('diffs mean: {}, stddev: {}'.format(mean, stddev))


def calibrate(network_type='walk', input_file='resources/LEHD_blocks.csv', 
    sl_file='resources/condensed_street_data.csv', n=1):
    '''
    Show the mean and stddev of the difference between p2p's route time
    and GraphHopper's route time, in seconds.

    IMPORTANT: To use this, you must have a valid GraphHopper Matrix API key
    saved in a text file called GRAPHHOPPER_API_KEY.txt in this directory.

    Positive differences indicate that p2p's route was longer; negative
    differences indicate that p2p's route was shorter.
    '''
    if network_type == 'drive':
        assert sl_file is not None, 'must provide sl_file for use with driving network calibration'
    with open('GRAPHHOPPER_API_KEY.txt', 'r') as api_file:
        api_key = api_file.read()
        api_key = api_key.strip()
    gh_type_name = p2p_to_graphhopper_type_names[network_type]

    tm = TransitMatrix(network_type=network_type, primary_input=input_file)
    if network_type == 'drive':
        tm.process(speed_limit_filename=sl_file)
    else:
        tm.process()

    #extract the column names
    xcol = ''
    ycol = ''
    idx = ''

    df = pd.read_csv(input_file)

    print('The variables in your data set are:')
    df_cols = df.columns.values
    for var in df_cols:
        print('> ',var)
    while xcol not in df_cols:
        xcol = input('Enter the x coordinate (Latitude): ')
    while ycol not in df_cols:
        ycol = input('Enter the y coordinate (Longitude): ')
    while idx not in df_cols:
        idx = input('Enter the index name: ')

    df.rename(columns={xcol:'x',ycol:'y', idx:'idx'},inplace=True)
    df.set_index('idx', inplace=True)

    # Draw n random samples of 24 points and test each one.
    for _ in range(n):
        sample_one_matrix(df.sample(24), tm, gh_type_name, api_key)

In [None]:
calibrate(network_type='walk',
          input_file='data/ORIG/tracts2010.csv',
          sl_file='data/DEST/health_chicago.csv',
          n=1)
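
The request URL that `sample_one_matrix` assembles by hand could also be built with the standard library's `urllib.parse.urlencode`. The sketch below is only an illustration: `build_matrix_url` is a hypothetical helper that is not part of p2p, the coordinates and `YOUR_KEY` are placeholders, and the query parameters are the ones already used in the function above.

```python
from urllib.parse import urlencode


def build_matrix_url(points, vehicle, api_key):
    """Build a matrix GET URL from (lat, lon) pairs (hypothetical helper)."""
    base_url = "https://graphhopper.com/api/1/matrix"
    # A list of tuples lets the `point` key repeat once per location.
    params = [("point", "{},{}".format(lat, lon)) for lat, lon in points]
    params += [
        ("type", "json"),
        ("vehicle", vehicle),
        ("debug", "true"),
        ("out_array", "times"),
        ("key", api_key),
    ]
    # safe="," keeps the comma inside each point instead of encoding it as %2C.
    return base_url + "?" + urlencode(params, safe=",")


# Placeholder coordinates and key, for illustration only.
url = build_matrix_url([(41.88, -87.63), (41.79, -87.6)], "foot", "YOUR_KEY")
print(url)
```

Building the query from a list of pairs removes the need to track whether a parameter is the first one, and it escapes any characters that are not URL-safe.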

## Epsilon Calibration

<a id='scoremodel'></a>

In [9]:
# Load the travel-time matrix and check its dimensions:
df = pd.read_csv('scripts/data/matrices/walk_asym_health_tracts.csv')
df.shape

(801, 200)

In [10]:
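# Identify the share of values outside epsilon, taken as the size of the
# first (smallest-valued) group in column '1' divided by the number of rows:
p_eps = df.groupby('1').count().iloc[0, 0] / len(df)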

In [11]:
print("Of the total matrix ", "{0:.2f}%".format(p_eps * 100),
      "of the values are outside the bounding box. If the value is below 1% it seems epsilon is appropriate for this particular dataset.")

Of the total matrix  0.25% of the values are outside the bounding box. If the value is below 1% it seems epsilon is appropriate for this particular dataset.
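
To make the idea of this check concrete, here is a small sketch that counts the share of flagged entries across a whole toy matrix rather than a single column. It assumes that pairs outside the bounding box are stored as a sentinel value of -1; that encoding is an assumption for illustration, so check which value your matrix actually uses. All numbers are made up.

```python
# Toy travel-time matrix in seconds. -1 is an ASSUMED sentinel for pairs
# that fall outside the bounding box; verify the value your matrix uses.
SENTINEL = -1
matrix = [
    [0, 420, -1, 615],
    [430, 0, 380, -1],
    [-1, 390, 0, 240],
    [610, -1, 250, 0],
]

total = sum(len(row) for row in matrix)
# Each comparison is True (1) or False (0), so the sum is the sentinel count.
outside = sum(value == SENTINEL for row in matrix for value in row)
p_eps = outside / total

print("{0:.2f}% of the values are outside the bounding box".format(p_eps * 100))
```

In this toy matrix 4 of 16 entries are flagged, far above the 1% guideline, which would suggest a larger epsilon for such a dataset.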
