# Taxi Monitoring Simulation

## Overview

This notebook demonstrates how to build a real-time taxi monitoring system using Redis as the backend database. We'll simulate the operation of thousands of taxis in New York City by processing actual trip data from January 2025.

The simulation models taxi dispatching, pickup, and drop-off events to create a realistic representation of urban mobility patterns. Through this project, we'll explore:

1. **Data Processing**: Loading and transforming NYC yellow taxi trip data
2. **Event-Based Architecture**: Using Redis to track taxi statuses and locations in real-time
3. **Driver Assignment**: Implementing algorithms to match available drivers with pickup requests
4. **Performance Analysis**: Evaluating system throughput and identifying peak demand periods

This project showcases practical applications of Redis data structures (sorted sets, hashes) for tracking real-time events in a high-volume transportation system. The techniques demonstrated here can be adapted for other real-time tracking applications such as food delivery, logistics, or ride-sharing services.

Let's begin by setting up our environment and exploring the dataset.

## Setup

In [None]:
%%capture
%pip install redis rich tqdm 

In [None]:
import pandas as pd
import redis
from rich.pretty import pprint
from random import random
from tqdm import tqdm

In [None]:
# connecting to the Redis cluster
# make sure the Jupyter server is connected to the Redis network:
# `docker network connect redis_default jupyter-jupyter-1`
r = redis.RedisCluster(host='redis-master-1', port=6379)

In [None]:
# removing keys left over from previous runs
for key in r.scan_iter("nyc:*"):
    r.delete(key)

## Introduction to the Yellow Taxi Trip Data (January 2025)

The dataset `yellow_tripdata_2025-01.parquet` contains detailed information about yellow taxi trips in New York City for the month of January 2025. This dataset is part of the larger collection of yellow taxi trip records maintained by the New York City Taxi and Limousine Commission (TLC), which provides valuable insights into the city's transportation patterns.

### Dataset Contents

The dataset includes various attributes for each taxi trip, such as:

- **VendorID**: A code identifying the technology provider (vendor) that supplied the record.
- **tpep_pickup_datetime**: The date and time when the passenger was picked up.
- **tpep_dropoff_datetime**: The date and time when the passenger was dropped off.
- **PULocationID**: The TLC Taxi Zone where the trip began.
- **DOLocationID**: The TLC Taxi Zone where the trip ended.
- **passenger_count**: The number of passengers in the taxi during the trip.
- **trip_distance**: The total distance of the trip in miles.
- **fare_amount**: The fare charged for the trip.
- **tip_amount**: The amount of tip given by the passenger.
- **total_amount**: The total amount charged for the trip, including fare, tip, and any additional fees.

This dataset is instrumental for various analyses, including understanding traffic patterns, evaluating taxi service efficiency, and studying the economic aspects of taxi operations in New York City. By leveraging this data, researchers, policymakers, and businesses can gain insights into urban mobility and make informed decisions to enhance transportation services.

### Download
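To download the `yellow_tripdata_2025-01.parquet` dataset, run the following commands in a Jupyter terminal session. The notebook below reads the file from `/home/jovyan/downloads`, which is `~/downloads` for the default `jovyan` user: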

```bash
mkdir -p ~/downloads
cd ~/downloads
wget https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2025-01.parquet
```


## Extracting Trip Data Events

In this section, we will focus on extracting the pickup and drop-off events for each taxi trip record from the `yellow_tripdata_2025-01.parquet` dataset, specifically for the first five days of January 2025. 

By isolating these events, we aim to simulate a real-time system that tracks New York City taxi traffic.
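The notebook never materializes these events explicitly: each row is handled as a pickup, and its drop-off is deferred through Redis. As a minimal, Redis-free sketch of the same split, assuming plain `(vendor, pickup_time, dropoff_time, pu_location, do_location)` tuples, the helper below turns trips into one time-ordered event stream. `Event` and `trip_events` are hypothetical names used only for illustration.

```python
from datetime import datetime
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical event record; the notebook works on DataFrame rows instead."""
    time: datetime
    kind: str      # "pickup" or "dropoff"
    vendor: int
    location: int  # PULocationID for pickups, DOLocationID for drop-offs


def trip_events(trips: Iterable[tuple]) -> list[Event]:
    """Split (vendor, pickup_time, dropoff_time, pu_loc, do_loc) tuples into events.

    Events are sorted by time. At equal timestamps drop-offs come first, so a
    driver arriving exactly at a pickup time can serve that pickup.
    """
    events = []
    for vendor, pu_time, do_time, pu_loc, do_loc in trips:
        events.append(Event(pu_time, "pickup", vendor, pu_loc))
        events.append(Event(do_time, "dropoff", vendor, do_loc))
    # kind != "dropoff" is False for drop-offs, and False sorts before True
    events.sort(key=lambda e: (e.time, e.kind != "dropoff"))
    return events
```

The tie-break mirrors the notebook, where arrivals scheduled at or before a pickup time are processed before that pickup.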


In [None]:
df = pd.read_parquet("/home/jovyan/downloads/yellow_tripdata_2025-01.parquet")

In [None]:
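items = df[["VendorID","tpep_pickup_datetime","tpep_dropoff_datetime","PULocationID","DOLocationID"]]
# first five days of January 2025 (Jan 1 to Jan 5); the file may contain a few out-of-range timestamps
items = items[items.tpep_pickup_datetime.between(pd.Timestamp("2025-01-01"), pd.Timestamp("2025-01-06"), inclusive="left")]
# the simulation replays events in chronological order, so sort by pickup time
items = items.sort_values("tpep_pickup_datetime")
items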

## Rules of the Taxi Management System

This system simulates the management of taxis in New York using Redis to keep track of drivers and their locations. Each ride record can be viewed as two events: a departure and an arrival. Below are the main rules and logic implemented in the system:

1. **Departure and Arrival Events**:
   - Each record represents a departure event (when a passenger requests a taxi) and an arrival event (when the taxi arrives at the destination).

2. **Managing Departures**:
   - When a passenger requests a taxi, the system looks for an available driver of the same vendor in the pickup location. If no such driver is found, a new one is created, assuming they were already present in that location so that the departure event can occur.
   - Once the driver is selected, they start heading towards the drop-off zone. The trip itself is not simulated: the arrival event is scheduled at the record's drop-off time and stored in an intermediate structure that keeps track of drivers in transit.

3. **Managing Arrival Events**:
   - Before a departure event is processed, the system handles all arrival events scheduled at or before its pickup time. This keeps the system up to date, so that drivers who have already reached their destinations are available at their new locations.

4. **Assumption of Permanence**:
   - It is assumed that a driver remains at the arrival location after completing a trip. This means that once they arrive at the destination, the driver is considered available for new requests in the new location.

5. **Recording Arrivals**:
   - When a driver arrives at a specific location, they are registered in the system. This allows for tracking available drivers based on their current location.

6. **Assigning Drivers**:
   - If a passenger requests a taxi and an available driver is found, that driver is removed from the list of available drivers and assigned to the ride.
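To make the rules concrete, here is a minimal in-memory sketch of the same dispatch logic. It is an illustration under stated assumptions, not the notebook's implementation: a Python min-heap stands in for the `nyc:running` sorted set, plain lists keyed by `(location, vendor)` stand in for the per-location sorted sets, and a random pick replaces the random sorted-set scores. `InMemoryDispatcher` and its methods are hypothetical names.

```python
import heapq
import random
from collections import defaultdict
from datetime import datetime


class InMemoryDispatcher:
    """Hypothetical Redis-free model of the dispatch rules (illustration only)."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.next_id = 0
        # (location, vendor) -> idle drivers, like the nyc:location:* sorted sets
        self.idle = defaultdict(list)
        # min-heap of (arrival_ts, driver, vendor, location), like nyc:running
        self.running = []

    def handle_finished_trips(self, now: float) -> None:
        # Rules 3-5: drivers arriving at or before `now` become idle at their destination
        while self.running and self.running[0][0] <= now:
            _, driver, vendor, location = heapq.heappop(self.running)
            self.idle[(location, vendor)].append(driver)

    def find_driver(self, location: int, vendor: int) -> str:
        pool = self.idle[(location, vendor)]
        if pool:
            # Rule 6: take a random idle driver out of the pool
            return pool.pop(self.rng.randrange(len(pool)))
        # Rule 2: nobody is idle here, so assume a new driver was already present
        self.next_id += 1
        return f"driver:{self.next_id}"

    def dispatch(self, vendor: int, pickup: datetime, dropoff: datetime,
                 pu_loc: int, do_loc: int) -> str:
        self.handle_finished_trips(pickup.timestamp())
        driver = self.find_driver(pu_loc, vendor)
        # Rule 2: the arrival is scheduled at the drop-off time
        heapq.heappush(self.running, (dropoff.timestamp(), driver, vendor, do_loc))
        return driver
```

For example, a driver who drops a passenger in zone 200 at 00:10 is reused by a vendor-matching pickup in zone 200 at 00:12, while a pickup in the same zone at 00:05 still needs a new driver.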

In [None]:
def create_driver(vendor):
    """
    Creates a new driver in the system.

    This function increments the driver counter to generate a unique driver ID,
    and stores the driver's information in Redis, associating the driver with the specified vendor.

    Parameters:
    vendor (int): The vendor ID associated with the driver.

    Returns:
    str: The key of the newly created driver in the format "nyc:drivers:{driver_id}".
    """
    driver_id = r.incr("nyc:driver_counter")
    r.hset(f"nyc:drivers:{driver_id}",mapping={"vendor":int(vendor)})
    return f"nyc:drivers:{driver_id}"

def driver_arrived(vendor, driver, location):
    """
    Records the arrival of a driver at a specific location.

    This function adds the driver to a sorted set in Redis that keeps track of available drivers
    at the specified location, using a random value to represent the driver's priority.

    Parameters:
    vendor (int): The vendor ID associated with the driver.
    driver (str): The ID of the driver.
    location (int): The location ID where the driver has arrived.
    """
    r.zadd(f"nyc:location:{location}-{vendor}", {driver: random()})

def find_driver(location,vendor):
    """
    Finds an available driver in the specified location.

    This function checks if there are any available drivers in the specified location for the given vendor.
    If no driver is found, a new driver is created. If a driver is found, that driver is removed from the
    list of available drivers.

    Parameters:
    location (int): The location ID where a driver is needed.
    vendor (int): The vendor ID associated with the driver.

    Returns:
    str: The ID of the found or newly created driver.
    """
    # ZPOPMIN atomically reads and removes the driver with the lowest (random) score
    popped = r.zpopmin(f"nyc:location:{location}-{vendor}")
    if not popped:
        driver = create_driver(vendor)
    else:
        driver = popped[0][0].decode()
    return driver

def send_driver(vendor,driver,where,when):
    """
    Sends a driver to a specified destination.

    This function adds the driver to a sorted set in Redis that keeps track of drivers currently in transit,
    associating the driver with the vendor and the destination location, along with the timestamp of the event.

    Parameters:
    vendor (int): The vendor ID associated with the driver.
    driver (str): The ID of the driver being sent.
    where (int): The destination location ID.
    when (datetime): The timestamp of when the driver is sent.
    """
    r.zadd("nyc:running", {f"{vendor}-{driver}-{where}": when.timestamp()})

def handle_finished_trips(time):
    """
    Handles the completion of trips for drivers.

    This function checks for drivers that have arrived at their destinations before the specified time,
    removes them from the list of drivers in transit, and records their arrival at the respective locations.

    Parameters:
    time (datetime): The current time used to check for finished trips.
    """
    # the upper bound is inclusive: arrivals at exactly `time` are handled too
    arrived_drivers = r.zrangebyscore("nyc:running", "-inf", time.timestamp())
    for item in arrived_drivers:
        r.zrem("nyc:running", item)
        vendor, driver, location = item.split(b"-")
        driver_arrived(int(vendor), driver.decode(), int(location))
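The members of `nyc:running` pack three fields into one string, `"{vendor}-{driver}-{where}"`, and `handle_finished_trips` unpacks them by splitting on `-`. That only works because driver keys such as `nyc:drivers:17` contain no `-`. A small sketch that keeps both directions of this format side by side, with hypothetical helper names, could look like this:

```python
def encode_transit_member(vendor: int, driver: str, location: int) -> str:
    """Build a member in the same "{vendor}-{driver}-{where}" format as send_driver."""
    if "-" in driver:
        # the format relies on "-" appearing only as the field separator
        raise ValueError(f"driver key must not contain '-': {driver!r}")
    return f"{vendor}-{driver}-{location}"


def decode_transit_member(member: bytes) -> tuple[int, str, int]:
    """Inverse of encode_transit_member; accepts the raw bytes redis-py returns."""
    vendor, driver, location = member.decode().split("-")
    return int(vendor), driver, int(location)
```

Keeping the encoder and decoder together makes the implicit format explicit and fails loudly if a future key scheme introduces the separator.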

## Running the Simulation

In this phase, we will run a simulation of the taxi trip events that we have extracted. The events will be processed at full speed, without accounting for the actual delays that would occur between consecutive events. This approach allows us to evaluate the performance and efficiency of our system in handling a substantial volume of data.

### Objectives of the Simulation

- **Performance Testing**: By simulating the processing of five days' worth of taxi trip data, we can assess the time required for our system to handle this volume of information.
- **System Capability Evaluation**: If the simulation completes in a small fraction of the five simulated days, the system processes events faster than they occur in reality, which indicates (at least on average) that it can keep up with taxi traffic in a large metropolis like New York City.
- **Scalability Insights**: This simulation will provide insights into how well our system can scale and respond to real-time data influx, which is crucial for urban mobility solutions.
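The comparison with real time is simple arithmetic: the speedup is the simulated duration divided by the wall-clock processing time. A minimal sketch, assuming the record count and elapsed seconds are read from the `tqdm` summary or measured with `time.perf_counter()`; `replay_speedup` is a hypothetical helper and the numbers in any call are up to the reader:

```python
SECONDS_PER_DAY = 86_400


def replay_speedup(n_records: int, elapsed_s: float, simulated_days: float = 5) -> dict:
    """Compare the processing rate with the average real-world arrival rate."""
    simulated_s = simulated_days * SECONDS_PER_DAY
    return {
        "processed_per_s": n_records / elapsed_s,   # how fast the replay ran
        "real_avg_per_s": n_records / simulated_s,  # how fast records arrived in reality
        "speedup": simulated_s / elapsed_s,         # > 1 means faster than real time
    }
```

An average rate hides demand peaks, which is why the pickups-per-minute analysis below matters.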

In [None]:
for record in tqdm(items.itertuples(index=False), total=len(items)):
    # first release the drivers whose trips ended before this pickup
    handle_finished_trips(record.tpep_pickup_datetime)
    # then assign a driver in the pickup zone and schedule its arrival in the drop-off zone
    driver = find_driver(record.PULocationID, record.VendorID)
    send_driver(record.VendorID, driver, record.DOLocationID, record.tpep_dropoff_datetime)

## Evaluating Pickup per Minute

The rate of pickups per unit of time is not constant and can vary significantly throughout the day. In this section, we aim to identify the peak traffic periods for taxi pickups in New York City and compare these rates with the processing capacity of our system, measured in the previous step as the records-per-second rate reported by the `tqdm` progress bar.

### Objectives

- **Identify Peak Traffic Periods**: Analyze the pickup data to determine the minutes with the highest number of pickups.
- **Compare with System Capacity**: Assess how the peak pickup rates align with the estimated processing capabilities of our system. This will help us understand if our system can handle the influx of data during peak times.
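The plot below shows the full profile; to get the busiest minutes as numbers and compare them with the measured throughput, a stdlib sketch could bucket pickup times by minute. It assumes datetime-like values that support `.replace(second=0, microsecond=0)`; with pandas, `items.tpep_pickup_datetime.dt.floor("min").value_counts().nlargest(3)` is an equivalent. `peak_minutes` and `headroom` are hypothetical helpers.

```python
from collections import Counter
from datetime import datetime


def peak_minutes(pickups, top: int = 3) -> list:
    """Return the `top` busiest minutes as (minute, count) pairs, busiest first."""
    counts = Counter(t.replace(second=0, microsecond=0) for t in pickups)
    return counts.most_common(top)


def headroom(peak_per_min: int, processed_per_s: float) -> float:
    """Ratio of processing capacity to peak demand (> 1 means the system keeps up)."""
    return processed_per_s / (peak_per_min / 60)
```

A headroom well above 1 at the busiest minute is a stronger guarantee than an average-rate comparison.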


In [None]:
_=(
    items.tpep_pickup_datetime.rename("pickup time")
    .dt.to_period('min')
    .value_counts().sort_index()
    .to_frame().plot(title="Pickups per minute",figsize=(16,6)))