# New Intersection Activation Dates

The Miovision API does not report when an intersection was activated, so we have traditionally pulled data from the API by hand until we homed in on the first day data was available. This notebook partly automates that process. It first determines which intersections available from the [Miovision API](https://docs.api.miovision.com/#!/Intersections/get_intersections) need to be added to `miovision_api.intersections`. Then, given a user-defined range of dates for each intersection, it queries the API one day at a time until it finds the first date with data.

The user should manually validate the script's results against the Miovision API. Adding the `px`, appending a geometry, and uploading the new rows to `miovision_api.intersections` remain manual tasks.
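For manual validation, one option is to step through the reported activation date in 15-minute windows (the same window length the search below queries) and find the first window that returns data. A minimal helper that builds those `startTime`/`endTime` parameter dicts; `day_windows` is a hypothetical name, and the parameter keys are the ones the notebook sends to the `/tmc` endpoint.

```python
import datetime


def day_windows(day, minutes=15):
    """Yield startTime/endTime params that tile one calendar day in fixed windows.

    `day_windows` is a hypothetical helper for manual validation, not part of the notebook.
    """
    start = datetime.datetime.combine(day, datetime.time())  # midnight of `day`
    end_of_day = start + datetime.timedelta(days=1)
    step = datetime.timedelta(minutes=minutes)
    current = start
    while current < end_of_day:
        yield {'startTime': current, 'endTime': current + step}
        current += step


windows = list(day_windows(datetime.date(2020, 12, 22)))
print(len(windows), windows[0]['startTime'], windows[-1]['endTime'])
```

Each dict can be passed as `params` to the notebook's `get_response_length` until the returned count is positive.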

```python
import configparser
import datetime

import numpy as np
import pandas as pd
import psycopg2
from requests import Session
```

```python
# Read the Miovision API key from the Airflow config file.
config = configparser.ConfigParser()
config.read('/etc/airflow/data_scripts/volumes/miovision/api/config.cfg')
api_key = config['API']
miov_token = api_key['key']
```
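The notebook expects the config file to contain an `[API]` section with a `key` entry and a `[DBSETTINGS]` section whose entries are passed straight to `psycopg2.connect`. A minimal sketch of that layout, assuming the standard libpq keyword names (`host`, `dbname`, `user`, `password`) for the database section; every value is a placeholder, not the real configuration.

```python
import configparser

# Hypothetical config contents: the section names and the `key` entry match what the
# notebook reads; the DBSETTINGS keys and all values are placeholders.
SAMPLE_CONFIG = """
[API]
key = YOUR-MIOVISION-API-KEY

[DBSETTINGS]
host = db.example.com
dbname = example_db
user = example_user
password = example_password
"""

sample = configparser.ConfigParser()
sample.read_string(SAMPLE_CONFIG)

sample_token = sample['API']['key']
# dict() turns the section into plain keyword arguments for psycopg2.connect(**...).
sample_dbset = dict(sample['DBSETTINGS'])
print(sample_token)
print(sorted(sample_dbset))
```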

```python
# Reuse one HTTP session, with no proxies, for every API request.
session = Session()
session.proxies = {}

headers = {'Content-Type': 'application/json',
           'Authorization': miov_token}
```
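`get_first_data_date` below retries failed requests by hand. An alternative, sketched here as an option rather than what the notebook does, is to mount a `requests` `HTTPAdapter` carrying a urllib3 `Retry` policy, so transient errors are retried with backoff inside the session. The status codes in `status_forcelist` are illustrative choices.

```python
from requests import Session
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times on connection errors and on the listed HTTP status codes,
# waiting with exponential backoff between attempts.
retry_policy = Retry(total=3, backoff_factor=1,
                     status_forcelist=[429, 500, 502, 503, 504])

retrying_session = Session()
retrying_session.mount('https://', HTTPAdapter(max_retries=retry_policy))

# Requests to any https:// URL now go through the retrying adapter.
adapter = retrying_session.get_adapter('https://api.miovision.com/intersections/')
print(type(adapter).__name__, adapter.max_retries.total)
```

One design consequence: once the retries are exhausted on a listed status code, `requests` raises `requests.exceptions.RetryError` instead of returning the error response, so code that checks `status_code` (as `get_response_length` does) would need to catch that exception.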

## New Intersections Table

```python
# Get intersections from the Miovision API.
response = session.get("https://api.miovision.com/intersections/",
                       headers=headers, proxies=session.proxies)
response.raise_for_status()
df_api = pd.DataFrame(response.json())
df_api = df_api[['id', 'name']].rename(columns={'name': 'intersection_name'})

# Get intersections currently stored in `miovision_api` on Postgres.
dbset = config['DBSETTINGS']
with psycopg2.connect(**dbset) as conn:
    df_pg = pd.read_sql("SELECT * FROM miovision_api.intersections", con=conn)

# Outer-join the two tables, then keep intersections that are in the API but not
# in Postgres (rows with no `intersection_uid`).
df_intersections = pd.merge(df_pg, df_api, how='outer', on='id', suffixes=('', '_api'))
df_newints = df_intersections.loc[df_intersections['intersection_uid'].isna(),
                                  ['id', 'intersection_name_api']].copy()
# Shift to a 1-based index.
df_newints.index += 1
```

`df_newints` is a table of the new intersections to be added.
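The outer join followed by the `intersection_uid.isna()` filter is an anti-join: it keeps API intersections with no match in Postgres. A minimal equivalent sketch uses `pd.merge(..., indicator=True)` and keeps the `'right_only'` rows, so it does not rely on any particular column of `miovision_api.intersections` being null. The toy frames are made up for illustration.

```python
import pandas as pd

# Made-up stand-ins for the Postgres table and the API response.
toy_pg = pd.DataFrame({'intersection_uid': [1, 2],
                       'id': ['aaa', 'bbb'],
                       'intersection_name': ['A St and B Ave', 'C St and D Ave']})
toy_api = pd.DataFrame({'id': ['aaa', 'bbb', 'ccc'],
                        'intersection_name': ['A St and B Ave', 'C St and D Ave',
                                              'E St and F Ave']})

# indicator=True adds a `_merge` column recording which side each row came from;
# 'right_only' rows are in the API but not in Postgres.
merged = pd.merge(toy_pg[['id']], toy_api, how='outer', on='id', indicator=True)
new_ints = merged.loc[merged['_merge'] == 'right_only', ['id', 'intersection_name']]
print(new_ints)
```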

```python
df_newints
```

|    | id | intersection_name_api |
|---:|:---|:---|
| 69 | ff494e5c-628e-4d83-9cc3-13af52dbb88f | Bathurst Street and Fort York Boulevard |
| 70 | 78769d41-765a-4dbf-ae32-6908a2e04d52 | College Street and Bathurst Street |
| 71 | c259139e-a19e-42d5-9327-791693c1000e | Lake Shore Boulevard West and Bathurst Street |
| 72 | 15524515-c5ab-4e02-b99b-52611c3fed9d | Pearl Street and University Avenue |
| 73 | 768710a4-9177-40cf-a2e8-ba403b4dadc4 | Testing Lab |


## Find First Full Day of Data

The user must add `'test_daterange_start'` and `'test_daterange_end'` columns to `df_newints`, giving the first and last (inclusive) dates of the range to search for the activation date. Each row (i.e. intersection) can have its own range.

Brent provided us with a list of configuration dates for the new intersections; each is the day the SmartSense configuration was completed and the location started reporting data. We search around these dates:

```python
# Default search range, applied to every row.
df_newints['test_daterange_start'] = '2021-06-16'
df_newints['test_daterange_end'] = '2021-06-17'

# Per-intersection ranges around the configuration dates. Each label must already be
# in df_newints.index: .loc with a label that is not there appends a new row instead.
df_newints.loc[59, 'test_daterange_start'] = '2020-12-20'
df_newints.loc[59, 'test_daterange_end'] = '2020-12-23'
df_newints.loc[60, 'test_daterange_start'] = '2021-05-31'
df_newints.loc[60, 'test_daterange_end'] = '2021-06-03'
df_newints.loc[61, 'test_daterange_start'] = '2021-05-12'
df_newints.loc[61, 'test_daterange_end'] = '2021-05-15'
df_newints.loc[62, 'test_daterange_start'] = '2021-06-06'
df_newints.loc[62, 'test_daterange_end'] = '2021-06-09'
df_newints.loc[63, 'test_daterange_start'] = '2021-06-07'
df_newints.loc[63, 'test_daterange_end'] = '2021-06-10'
df_newints.loc[66, 'test_daterange_start'] = '2021-05-12'
df_newints.loc[66, 'test_daterange_end'] = '2021-05-15'
df_newints.loc[68, 'test_daterange_start'] = '2021-05-12'
df_newints.loc[68, 'test_daterange_end'] = '2021-05-15'
```
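Assigning each row with `.loc` works, but a mistyped label silently appends a new row (pandas "setting with enlargement"). A sketch that keeps the per-intersection ranges in one dict and rejects labels that are not already in the index; `set_search_ranges` is a hypothetical helper, and the toy frame stands in for `df_newints`.

```python
import pandas as pd


def set_search_ranges(df, search_ranges, default=('2021-06-16', '2021-06-17')):
    """Return a copy of df with test_daterange_start/end columns.

    Every row gets `default`; rows whose labels appear in `search_ranges` are
    overridden. Labels missing from df.index raise KeyError instead of adding rows.
    """
    missing = set(search_ranges) - set(df.index)
    if missing:
        raise KeyError(f'row labels not in index: {sorted(missing)}')
    df = df.copy()
    df['test_daterange_start'] = default[0]
    df['test_daterange_end'] = default[1]
    for label, (start, end) in search_ranges.items():
        df.loc[label, 'test_daterange_start'] = start
        df.loc[label, 'test_daterange_end'] = end
    return df


toy = pd.DataFrame({'id': ['aaa', 'bbb']}, index=[59, 60])
ranged = set_search_ranges(toy, {59: ('2020-12-20', '2020-12-23')})
print(ranged)
```

Applied to the notebook, the call would look like `set_search_ranges(df_newints, {59: ('2020-12-20', '2020-12-23'), 60: ('2021-05-31', '2021-06-03')})`, extended with the remaining rows.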

```python
def get_response_length(intersection_id, params):
    """Return the number of TMC records the API returns for `params`, or -1 on an HTTP error."""
    response = session.get(
        f"https://api.miovision.com/intersections/{intersection_id}/tmc",
        params=params, headers=headers, proxies=session.proxies)

    if response.status_code != 200:
        return -1
    return len(response.json())


def get_first_data_date(intersection_id, start_time, end_time, max_retries=3):
    """Return the activation date of an intersection, or NaN if none is found.

    Checks the 00:00-00:15 window of each day from start_time to end_time
    (inclusive) and returns the day before the first day with data.
    """
    for ctime in pd.date_range(start_time, end=end_time, freq='D').to_pydatetime():

        # The API throws an error when we query same-day data.
        if ctime.date() >= datetime.date.today():
            return np.nan

        params = {'startTime': ctime,
                  'endTime': ctime + datetime.timedelta(minutes=15)}

        # Download the 00:00-00:15 data, trying up to max_retries times in case
        # we hit HTTP errors.
        response_length = -1
        for _ in range(max_retries):
            response_length = get_response_length(intersection_id, params)
            if response_length >= 0:
                break

        # If every attempt returned an HTTP error, give up.
        if response_length < 0:
            raise ValueError(f'HTTP errors persisted after {max_retries} attempts '
                             f'for intersection {intersection_id}')

        # Data exists at midnight on ctime but not at midnight on the previous day
        # checked. Data is very unlikely to have first appeared between 12:00 and
        # 12:15 AM, so the intersection most likely came online during the previous day.
        if response_length > 0:
            return ctime - datetime.timedelta(days=1)

    return np.nan
```
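To check the search logic without calling the API, the record counter can be passed in as a parameter. This sketch mirrors `get_first_data_date` but takes a hypothetical `count_records(day)` callable in place of `get_response_length` (returning -1 for an HTTP error, as the original does) and a `today` argument so results are reproducible; the fake counter is invented test data.

```python
import datetime

import numpy as np
import pandas as pd


def first_data_date(count_records, start_time, end_time, max_retries=3, today=None):
    """Sketch of get_first_data_date with the API call injected as count_records(day)."""
    today = today or datetime.date.today()
    for day in pd.date_range(start_time, end=end_time, freq='D').to_pydatetime():
        if day.date() >= today:
            return np.nan  # same-day data can't be queried
        count = -1
        for _ in range(max_retries):
            count = count_records(day)
            if count >= 0:
                break
        if count < 0:
            raise ValueError('HTTP errors persisted')
        if count > 0:
            return day - datetime.timedelta(days=1)  # came online the previous day
    return np.nan


# Fake counter: data at midnight from 2020-12-23 on; the first attempt for each day fails.
online_from = datetime.datetime(2020, 12, 23)
attempted = set()


def fake_count(day):
    if day not in attempted:
        attempted.add(day)
        return -1  # simulated HTTP error
    return 5 if day >= online_from else 0


result = first_data_date(fake_count, '2020-12-20', '2020-12-23',
                         today=datetime.date(2021, 6, 17))
print(result)
```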

```python
first_date_of_data = []

for _, row in df_newints.iterrows():
    first_date_of_data.append(
        get_first_data_date(
            row['id'],
            row['test_daterange_start'],
            row['test_daterange_end'],
            max_retries=3))

df_newints['activation_date'] = first_date_of_data
```

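`df_newints` now contains the activation date of each intersection. `NaT` means no activation date was found in the search range (this also happens if the search reaches today's date, which the API cannot be queried for). If data is already present at midnight on `'test_daterange_start'`, `'activation_date'` is set to the day before `'test_daterange_start'`, and the true activation date may be earlier still; widen the range for that row and re-run.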
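Rows that need a second look can be flagged programmatically: a `NaT` means nothing was found, and an activation date equal to the day before `'test_daterange_start'` means data already existed when the search began. A sketch on made-up rows with the same columns as `df_newints`:

```python
import pandas as pd

# Made-up rows with the columns the notebook produces.
results = pd.DataFrame({
    'test_daterange_start': ['2020-12-20', '2021-05-12', '2021-06-16'],
    'test_daterange_end': ['2020-12-23', '2021-05-15', '2021-06-17'],
    'activation_date': pd.to_datetime(['2020-12-22', '2021-05-11', None]),
})

start = pd.to_datetime(results['test_daterange_start'])
not_found = results['activation_date'].isna()
# Data existed at midnight on the first searched day, so the range should start earlier.
hit_range_start = results['activation_date'] == start - pd.Timedelta(days=1)

print(results[not_found | hit_range_start])
```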

```python
df_newints
```

|    | id | intersection_name_api | test_daterange_start | test_daterange_end | activation_date |
|---:|:---|:---|:---|:---|:---|
| 58 | dbf09553-c593-4bb2-90e5-7eb3bc7ebe08 | Bayview Avenue and River Street | 2021-06-16 | 2021-06-17 | NaT |
| 59 | fe26f2d1-41db-4079-9764-7405a4b189f2 | Bloor Street West and Dufferin Street | 2020-12-20 | 2020-12-23 | 2020-12-22 |
| 60 | 67492dd1-ef98-4a9c-ac05-b2605b6a398d | Eglinton Avenue West and Jane Street | 2021-05-31 | 2021-06-03 | 2021-06-02 |
| 61 | 335c7cef-eb56-439f-907c-60189abffed3 | Harbord Street/St. George Street/Hoskin Street | 2021-05-12 | 2021-05-15 | 2021-05-14 |
| 62 | 0d9c7765-f74d-42c7-907e-1f483db51f56 | Jane Street and Lawrence Avenue West | 2021-06-06 | 2021-06-09 | 2021-06-08 |
| 63 | 2d1091bd-09c4-4d48-be84-153565290c88 | Jane Street and Wilson Avenue | 2021-06-07 | 2021-06-10 | 2021-06-09 |
| 64 | 35425467-0e8d-4fe7-b35d-6ccc9b71b0cf | Sheppard Avenue West and Jane Street | 2021-06-16 | 2021-06-17 | NaT |
| 65 | 11dcfdc5-2b37-45c0-ac79-3d6926553582 | Sheppard Avenue West and Keele Street | 2021-06-16 | 2021-06-17 | NaT |
| 66 | 1b9bad75-36d1-4886-b649-4d928bded1a7 | Sheppard Avenue West and Weston Road | 2021-05-12 | 2021-05-15 | 2021-05-14 |
| 67 | 9ed9e7f3-9edc-4f58-ae5b-8c9add746886 | Steeles Avenue West and Jane Street | 2021-06-16 | 2021-06-17 | NaT |
