# Argyle Insurance - Period Splicing

This notebook takes data from the [Argyle Activities API](https://argyle.com/docs/api-reference/activities) and breaks down a gig driver's time and distance across the following periods, for insurance purposes:

*   **P1** - the gig app is on and the worker is waiting for a request
*   **P2** - the worker has accepted a request and is heading to the pickup location
*   **P3** - the worker has picked up the asset (food, passenger, etc.) and is heading to the drop-off location (P3 ends when the worker has dropped off the asset)
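
As a rough illustration of the three periods, the sketch below splits a single trip into P1, P2 and P3 durations. The helper `split_periods` and the timestamps are hypothetical and not part of the notebook; P1 is measured here as the gap between this trip's drop-off and the next request, matching the table definitions further down.

```python
from datetime import datetime

def split_periods(request_at, pickup_at, dropoff_at, next_request_at):
    """Return (p1, p2, p3) durations in minutes for a single trip."""
    p2 = (pickup_at - request_at).total_seconds() / 60        # request -> pickup
    p3 = (dropoff_at - pickup_at).total_seconds() / 60        # pickup -> drop-off
    p1 = (next_request_at - dropoff_at).total_seconds() / 60  # drop-off -> next request
    return p1, p2, p3

# Made-up timestamps for one trip and the request that follows it
trip = split_periods(
    request_at=datetime(2021, 3, 1, 9, 0),
    pickup_at=datetime(2021, 3, 1, 9, 6),
    dropoff_at=datetime(2021, 3, 1, 9, 26),
    next_request_at=datetime(2021, 3, 1, 9, 35),
)
print(trip)  # (9.0, 6.0, 20.0)
```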

## Instructions:

The notebook can be customized in the next step, which lets you limit API requests by Account ID. Once it is configured, open the **Cell** dropdown in the toolbar above and select **Run All** to generate the outputs.

Outputs will consist of two tables:

**Table 1** - By month-year table:

*   "Period 1 Time (next_request - dropoff)",
*   "Period 2 Time (pickup - request)",
*   "Period 3 Time (dropoff - pickup)",
*   "Period 1 Distance (Period 1 Time * Period 3 Velocity)",
*   "Period 2 Distance (Period 2 Time * Period 3 Velocity)",
*   "Period 3 Distance (end_location - start_location)",
*   "Period 3 Velocity (Period 3 Distance / Period 3 Time)"


**Table 2** - By activity table:

*   "Period 1 Time (next_request - dropoff)",
*   "Period 2 Time (pickup - request)",
*   "Period 3 Time (dropoff - pickup)",
*   "Period 1 Distance (Period 1 Time * Period 3 Velocity)",
*   "Period 2 Distance (Period 2 Time * Period 3 Velocity)",
*   "Period 3 Distance (end_location - start_location)",
*   "Period 3 Velocity (Period 3 Distance / Period 3 Time)"

Both tables are indexed by user account.
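
The distance columns above rely on one assumption: the average velocity observed during P3 also holds during P1 and P2. A minimal sketch of that arithmetic for a single trip follows. The function `estimate_distances` and the input values are hypothetical; times are in hours, and distance is in whatever unit the API reports.

```python
def estimate_distances(p1_hours, p2_hours, p3_hours, p3_distance):
    """Estimate P1 and P2 distances from the average P3 velocity."""
    p3_velocity = p3_distance / p3_hours  # distance units per hour
    return {
        "p1_dist": p1_hours * p3_velocity,
        "p2_dist": p2_hours * p3_velocity,
        "p3_dist": p3_distance,
        "p3_velocity": p3_velocity,
    }

# Made-up trip: 15 min waiting, 7.5 min to pickup, 30 min with the asset
est = estimate_distances(p1_hours=0.25, p2_hours=0.125, p3_hours=0.5, p3_distance=12.0)
print(est)
```

Only the P3 distance is measured; the P1 and P2 figures are estimates and inherit any error in the P3 velocity.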

In [1]:
# Plug in your Argyle credentials
CLIENT_ID = "YOUR_ARGYLE_CLIENT_ID"
CLIENT_SECRET = "YOUR_ARGYLE_CLIENT_SECRET"

# The gap between two trips counts as P1 time only while it stays below P1_MAX_TIME
P1_MAX_TIME = 60 # minutes; 60 mins by default

# Limit the list of activities received from Argyle API
ACTIVITIES_LIMIT = 5000

# Filter activities by employer(s)
EMPLOYER_FILTER = ["uber", "lyft"]

In [2]:
import pandas as pd
import numpy as np
import requests
from datetime import datetime
import matplotlib.pyplot as plt

headers = {
    "Content-Type": "application/json",
}

params = {
    "limit": ACTIVITIES_LIMIT
}

API_PROD = "https://api.argyle.io/v1/activities"
API_SANDBOX = "https://api-sandbox.argyle.io/v1/activities"

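def get_activity_data(account_id=None, use_sandbox=True):
    """Fetch activities from the Argyle API and return the decoded JSON body."""
    activity_api_url = API_SANDBOX if use_sandbox else API_PROD
    # Copy the shared params so an account filter does not leak into later calls
    request_params = dict(params)
    if account_id:
        request_params['account'] = account_id

    print(f'api url: {activity_api_url}')
    print(f'params: {request_params}')

    response = requests.get(
        activity_api_url,
        params=request_params,
        headers=headers,
        auth=(CLIENT_ID, CLIENT_SECRET))

    return response.json()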

## Step 1 - Collecting and formatting data

**To limit by Account ID:**

Change the line below to:

<code>json_obj = get_activity_data(account_id="ENTER_ACCOUNT_ID_HERE", use_sandbox=True)</code>

**To use Production API:**

Change the line below to:

<code>json_obj = get_activity_data(use_sandbox=False)</code>

or, to limit by Account ID:

<code>json_obj = get_activity_data(account_id="ENTER_ACCOUNT_ID_HERE", use_sandbox=False)</code>

In [3]:
json_obj = get_activity_data(account_id="ENTER_ACCOUNT_ID_HERE", use_sandbox=False)

df = pd.json_normalize(json_obj['results'])

api url: https://api.argyle.io/v1/activities
params: {'limit': 5000, 'account': 'ENTER_ACCOUNT_ID_HERE'}


KeyError: 'results'
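
The `KeyError` above means the response body had no `results` key, which is what one would expect when the placeholder credentials or Account ID are still in place. One way to get a clearer failure is to check the payload before normalizing it. The helper `extract_results` and both sample payloads below are hypothetical; in particular, the shape of the error payload is an assumption, not the documented Argyle error format.

```python
def extract_results(payload):
    """Return the list under 'results', or raise with the payload for context."""
    if not isinstance(payload, dict) or "results" not in payload:
        raise ValueError(f"Unexpected response, no 'results' key: {payload!r}")
    return payload["results"]

# Hypothetical payloads, for illustration only
ok_payload = {"results": [{"id": "a1"}]}
error_payload = {"detail": "Invalid credentials"}  # assumed shape

print(extract_results(ok_payload))  # [{'id': 'a1'}]

try:
    extract_results(error_payload)
except ValueError as exc:
    print(exc)
```

In the cell above this would be used as `df = pd.json_normalize(extract_results(json_obj))`.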

## Step 2 - Data Transformation

In this step the data is cast to the appropriate types and processed into per-period times and distances. It is aggregated in Step 3.

In [None]:
data_types = {
    "id": str,
    "account": str,
    "distance": "float32",
    "employer": "category",
    "type": "category",
    "all_timestamps.dropoff_at": "datetime64[s]",
    "all_timestamps.pickup_at": "datetime64[s]",
    "all_timestamps.request_at": "datetime64[s]"
}

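# Select the columns of interest, then cast them; errors='ignore' leaves the data uncast if a cast fails
df_new = df[list(data_types)].astype(data_types, errors='ignore')
# .copy() makes df_new an independent frame, so the column assignments below do not trigger SettingWithCopyWarning
df_new = df_new[df_new.employer.isin(EMPLOYER_FILTER)].copy()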

df_new.info()
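
Because `errors='ignore'` returns the data uncast when a conversion fails, it is worth checking the `df_new.info()` output for columns that are still `object`. An alternative, sketched below on a made-up two-row frame, is to coerce bad values to `NaN`/`NaT` with `pd.to_numeric` and `pd.to_datetime`. This is a swapped-in technique, not what the notebook does, and the ISO 8601 timestamp format is an assumption.

```python
import pandas as pd

# Made-up sample with one clean row and one row of unusable values
sample = pd.DataFrame({
    "distance": ["3.2", "n/a"],
    "all_timestamps.request_at": ["2021-03-01T09:00:00Z", None],
})

# errors="coerce" turns unparseable entries into NaN / NaT instead of failing
sample["distance"] = pd.to_numeric(sample["distance"], errors="coerce")
sample["all_timestamps.request_at"] = pd.to_datetime(
    sample["all_timestamps.request_at"], errors="coerce", utc=True
)

print(sample.dtypes)
print(sample.isna().sum())  # one missing value per column
```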

In [None]:
SECS_IN_HOUR = 3600.0

df_new["month_year"] = df_new["all_timestamps.request_at"].dt.month.astype(int, errors='ignore').astype(str) + "/" + df_new["all_timestamps.request_at"].dt.year.astype(int, errors='ignore').astype(str)

# shift(1) reads the previous row, which is the next request only when rows are ordered newest-first (assumed here);
# grouping by account keeps one account's request from being paired with another account's drop-off
df_new["p1_time"] = df_new.groupby("account")["all_timestamps.request_at"].shift(1) - df_new["all_timestamps.dropoff_at"] # next_request - drop off
df_new["p1_time"] = df_new["p1_time"].dt.total_seconds() / SECS_IN_HOUR # Convert to hours
df_new.loc[df_new["p1_time"] >= P1_MAX_TIME / 60.0, "p1_time"] = 0  # Set P1 time to 0 if it's greater than or equal to P1_MAX_TIME
df_new.loc[df_new["p1_time"] < 0, "p1_time"] = 0  # Set P1 time to 0 if it's negative (driver received a new request before drop off)

df_new["p2_time"] = df_new["all_timestamps.pickup_at"] - df_new["all_timestamps.request_at"]
df_new["p2_time"] = df_new["p2_time"].dt.total_seconds() / SECS_IN_HOUR # Convert to hours

df_new["p3_time"] = df_new["all_timestamps.dropoff_at"] - df_new["all_timestamps.pickup_at"]
df_new["p3_time"] = df_new["p3_time"].dt.total_seconds() / SECS_IN_HOUR # Convert to hours

# P3 average velocity will determine P1 and P2 distances
df_new["p3_dist"] = df_new.distance
df_new["p3_velocity"] = df_new["p3_dist"] / df_new["p3_time"]

df_new["p2_dist"] = df_new["p2_time"] * df_new["p3_velocity"]
df_new["p1_dist"] = df_new["p1_time"] * df_new["p3_velocity"]

df_new.head(10)
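
To make the P1 rule concrete, here is a small sketch on synthetic data. It assumes rows are ordered newest-first, so that `shift(1)` yields the following request, and uses shortened, hypothetical column names. Gaps that are negative or at least `P1_MAX_HOURS` long are zeroed, and so is the newest trip, which has no following request.

```python
import pandas as pd

P1_MAX_HOURS = 1.0  # mirrors P1_MAX_TIME = 60 minutes

# Synthetic trips for one account, newest first (assumed ordering)
trips = pd.DataFrame({
    "account": ["a", "a", "a"],
    "request_at": pd.to_datetime(
        ["2021-03-01 12:00", "2021-03-01 09:35", "2021-03-01 09:00"]),
    "dropoff_at": pd.to_datetime(
        ["2021-03-01 12:30", "2021-03-01 10:00", "2021-03-01 09:26"]),
})

# Request time of the following trip, looked up within each account
next_request = trips.groupby("account")["request_at"].shift(1)
p1 = (next_request - trips["dropoff_at"]).dt.total_seconds() / 3600.0

# Keep only gaps in [0, P1_MAX_HOURS); everything else (including NaN) becomes 0
p1 = p1.where((p1 >= 0) & (p1 < P1_MAX_HOURS), 0.0)
print(p1.tolist())
```

The two-hour gap before the 12:00 request is discarded, while the nine-minute gap (0.15 hours) after the 09:26 drop-off is kept.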

## Step 3 - Output

The following cells aggregate the transformed data per account: by month-year, by activity type, and as a time-breakdown chart.

In [None]:
# Table 1 - by month-year
# "id" is left out of the selection: summing a string column would only concatenate the ids
table_1 = df_new[["account","month_year", "p1_time", "p2_time", "p3_time", "p1_dist", "p2_dist", "p3_dist"]].groupby(["account","month_year"]).sum()
table_1
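
The table description lists a Period 3 velocity, which the aggregation above does not include. Summing per-trip velocities would not be meaningful, so one option is to derive it from the aggregated totals. The sketch below does this on made-up rows; it illustrates the idea and is not part of the notebook.

```python
import pandas as pd

# Made-up per-trip rows (time in hours, distance in the API's unit)
rows = pd.DataFrame({
    "account": ["a", "a", "b"],
    "month_year": ["3/2021", "3/2021", "3/2021"],
    "p3_time": [0.5, 0.25, 1.0],
    "p3_dist": [12.0, 3.0, 30.0],
})

monthly = rows.groupby(["account", "month_year"])[["p3_time", "p3_dist"]].sum()
# Average velocity over the month: total distance divided by total time
monthly["p3_velocity"] = monthly["p3_dist"] / monthly["p3_time"]
print(monthly)
```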

In [None]:
# Table 2 - by activity
# "id" is left out of the selection: summing a string column would only concatenate the ids
table_2 = df_new[["account","type", "p1_time", "p2_time", "p3_time", "p1_dist", "p2_dist", "p3_dist"]].groupby(["account","type"]).sum()
table_2.unstack("type")

In [None]:
fig = plt.figure()
periods = ["Period 1", "Period 2", "Period 3"]

# Keep only the numeric time columns so that sum() is well defined for every group;
# grouping by a plain column name (not a list) yields scalar account IDs as keys
time_breakdown = df_new[["account", "p1_time", "p2_time", "p3_time"]].groupby("account")

for idx, (account_id, group) in enumerate(time_breakdown):
    values = group[["p1_time", "p2_time", "p3_time"]].sum()

    # One axes per account, stacked vertically; each uses 80% of its slot
    ax1 = fig.add_axes([0, idx, 1, 0.8])
    ax1.set_title(f"Time breakdown (Account ID: {account_id})")
    ax1.set_ylabel("Time (hours)")

    times = [values.p1_time, values.p2_time, values.p3_time]
    ax1.bar(periods, times, color=["#81c784", "#64b5f6", "#ffb74d"])