# Vehicle Stop Exploration
__Author:__ Joshua Shew  
__Date:__ August 8, 2023

This notebook explores vehicle coordinate data collected from July 30 to
August 4. Coordinates were recorded every second while the route is active. The
goal is to assess the feasibility of detecting vehicle stops, particularly at
bus stops. It builds on Gabriel's "Initial Data Exploration" report.

## Module Import

In [None]:
import os

import pandas as pd
import geopandas as gp
from dotenv import load_dotenv
_ = load_dotenv()
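
`os.getenv` returns `None` when a variable is missing, so a missing API key would only surface later as a failed request. The following is a minimal sketch of an early check. `require_env` is a hypothetical helper, not part of this notebook or of `python-dotenv`, and it assumes the key is stored under `X_RAPIDAPI_KEY`, the name used in the stop data request.

```python
import os


def require_env(name: str) -> str:
    """Hypothetical helper: return the variable's value or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Covers both an unset variable and one set to an empty string.
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value


# Usage (after load_dotenv() has run):
# api_key = require_env("X_RAPIDAPI_KEY")
```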

## Data Retrieval and Cleaning

Import and sample vehicle data

1. Retrieve the data from a CSV file
1. Clean the data (remove testing entries)
1. Sort the data
1. Sample 10,000 entries

In [None]:
vehicle_data = pd.read_csv("data/vehicles_weekly_20230805.csv", low_memory=False)
vehicle_data = vehicle_data[vehicle_data["callName"] != "abc123"]  # Remove a test callName
vehicle_data = vehicle_data.sort_values("id").reset_index(drop=True)  # Sort by id
print(f"The data contains {len(vehicle_data)} entries. We sample 10,000 for exploration.")
vehicle_data = vehicle_data.sample(10_000, random_state=1)
print("See summary statistics below.")
vehicle_data.describe()
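
The clean, sort, and sample steps can be checked on a small frame before running them on the full CSV. This is a sketch on made-up data: only the `id` and `callName` column names and the `abc123` test value come from the cell above, and everything else is hypothetical. It also shows that `sample` returns rows in shuffled order, so `sort_index()` may be needed if later steps rely on the sorted order.

```python
import pandas as pd

# Hypothetical toy data; only "id" and "callName" mirror columns used above.
toy = pd.DataFrame(
    {
        "id": [5, 3, 1, 4, 2, 6],
        "callName": ["101", "abc123", "102", "101", "abc123", "103"],
    }
)

cleaned = toy[toy["callName"] != "abc123"]  # Drop the test callName
cleaned = cleaned.sort_values("id").reset_index(drop=True)  # Sort, then renumber rows
sample = cleaned.sample(2, random_state=1)  # Fixed seed gives a repeatable sample

# Sampling shuffles row order; sort_index() restores the sorted-by-id order.
sample_in_order = sample.sort_index()
print(sample_in_order)
```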

Access stop data

1. Request data from API
1. Parse data
1. Remove unneeded elements

In [None]:
import requests

url = "https://transloc-api-1-2.p.rapidapi.com/stops.json"
querystring = {"agencies": "1199", "callback": "call"}
headers = {
    "X-RapidAPI-Key": os.getenv("X_RAPIDAPI_KEY"),
    "X-RapidAPI-Host": "transloc-api-1-2.p.rapidapi.com",
}
response = requests.get(url, headers=headers, params=querystring, timeout=30)  # Avoid hanging indefinitely
response.raise_for_status()
raw_stops_data = response.json()
for stop_data in raw_stops_data["data"]:  # Flatten stop JSON
    stop_data["lat"] = stop_data["location"]["lat"]
    stop_data["lng"] = stop_data["location"]["lng"]
    del stop_data["location"]
stops_data = pd.DataFrame(raw_stops_data["data"])
columns_to_remove = [
    "description",
    "url",
    "parent_station_id",
    "agency_ids",
    "station_id",
    "location_type",
    "routes",
]
# Drop unneeded columns, then keep the remaining ones in a fixed order
stops_data = stops_data.drop(columns=columns_to_remove, errors="ignore")[
    ["stop_id", "name", "lat", "lng", "code"]
]
print(f"We now have data for {len(stops_data)} stops.")
stops_data.head()
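
As an alternative to the manual loop, `pd.json_normalize` can flatten the nested `location` field in one call. This is a sketch on a made-up payload that assumes the same shape as the `data` list handled above; the stop names, codes, and coordinates are hypothetical, and no request is made.

```python
import pandas as pd

# Hypothetical payload shaped like the "data" list above; all values are made up.
sample_payload = [
    {
        "stop_id": "1",
        "name": "Stop A",
        "code": "100",
        "location": {"lat": 35.0, "lng": -78.0},
        "routes": ["r1"],
    },
    {
        "stop_id": "2",
        "name": "Stop B",
        "code": "200",
        "location": {"lat": 35.5, "lng": -78.5},
        "routes": [],
    },
]

# json_normalize expands nested dicts into "location.lat" / "location.lng" columns.
flat = pd.json_normalize(sample_payload)
flat = flat.rename(columns={"location.lat": "lat", "location.lng": "lng"})
flat = flat[["stop_id", "name", "lat", "lng", "code"]]  # Keep only the needed columns
print(flat)
```

Unlike the loop, this leaves the original dictionaries unmodified, which can help if the raw response is reused later.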