# Descriptive (Spatial) Analytics

Analyze taxi demand patterns for the relevant one-year period and city (please check carefully which year your team has been allocated).

Specifically, show how these patterns (start time, trip length, start and end location, price, average idle time between trips, and so on) vary for the given sample across different spatio-temporal resolutions (i.e., census tract vs. varying hexagon diameter and/or temporal bin sizes).

Give possible reasons for the observed patterns.

## Average idle time

To exclude large time gaps during which the driver is not working, introduce a threshold and count only gaps up to that threshold as idle periods:

1. Sort Trips: Sort the trips of each taxi (identified by `Taxi_ID`) by start time in ascending order.

2. Identify Idle Periods: For each taxi, compute the time gap between the end of one trip and the start of the next trip. If the gap exceeds the threshold (e.g., 30 minutes), treat it as off-duty time, such as a long break or the end of a shift, and ignore it. Otherwise, count it as an idle period.

3. Sum Idle Times: Add up the idle periods from step 2 across all taxis to get the total idle time.

4. Count the Idle Periods: Count the idle periods identified in step 2 across all taxis.

5. Calculate the Average: Divide the total idle time by the number of idle periods to get the average idle time per idle period.

The threshold keeps the large gaps when the driver is not working out of the average, so the result better represents idle time during active working periods.

Note: Choose the threshold based on the context and requirements of the analysis, and adjust it so that gaps irrelevant to the calculation are filtered out.
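
As a minimal sketch of this calculation, the example below works on a small made-up sample rather than the real data. The column names follow `dfChicago`; the taxi IDs, the timestamps, the helper `idle_gaps_minutes`, and the variable `max_gap_minutes` are assumptions for illustration. Gaps longer than the threshold are treated here as off-duty time and dropped, which is one possible way to keep non-working time out of the average.

```python
import pandas as pd

# Toy trips for two hypothetical taxis; column names follow dfChicago
toy_trips = pd.DataFrame({
    "Taxi_ID": ["a", "a", "a", "b", "b"],
    "Trip_Start_Timestamp": pd.to_datetime([
        "2013-07-12 08:00", "2013-07-12 08:40", "2013-07-12 20:00",
        "2013-07-12 09:00", "2013-07-12 09:30",
    ]),
    "Trip_End_Timestamp": pd.to_datetime([
        "2013-07-12 08:20", "2013-07-12 09:00", "2013-07-12 20:30",
        "2013-07-12 09:20", "2013-07-12 10:00",
    ]),
})


def idle_gaps_minutes(trips):
    """Gap in minutes between each trip's end and the same taxi's next start."""
    ordered = trips.sort_values(["Taxi_ID", "Trip_Start_Timestamp"])
    # Start time of the following trip of the same taxi (NaT for the last trip)
    next_start = ordered.groupby("Taxi_ID")["Trip_Start_Timestamp"].shift(-1)
    gaps = (next_start - ordered["Trip_End_Timestamp"]).dt.total_seconds() / 60
    return gaps.dropna()


gaps = idle_gaps_minutes(toy_trips)

# Assumed threshold: longer gaps are considered off-duty and are excluded
max_gap_minutes = 30
kept = gaps[gaps <= max_gap_minutes]
average_idle = kept.mean()

print(f"All gaps (minutes): {sorted(gaps.tolist())}")
print(f"Average idle time: {average_idle} minutes")
```

In this toy sample, the long gap before the evening trip of taxi `a` is excluded, so only the two short gaps enter the average.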

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

In [2]:
# import datasets
# Forward slashes avoid invalid escape sequences such as "\d" and also work on Windows
dfChicago = pd.read_csv("data/datasets/df_chicago.csv.zip")
dfChicago_hourly = pd.read_csv("data/datasets/df_chicago_hourly.csv")

In [3]:
dfChicago.columns

Index(['Trip_ID', 'Taxi_ID', 'Trip_Start_Timestamp', 'Trip_End_Timestamp',
       'Trip_Seconds', 'Trip_Miles', 'Pickup_Census_Tract',
       'Dropoff_Census_Tract', 'Pickup_Community_Area',
       'Dropoff_Community_Area', 'Fare', 'Tips', 'Tolls', 'Extras',
       'Payment_Type', 'Pickup_Centroid_Latitude', 'Pickup_Centroid_Longitude',
       'Pickup_Centroid_Location', 'Dropoff_Centroid_Latitude',
       'Dropoff_Centroid_Longitude', 'Dropoff_Centroid__Location',
       'Community_Areas', 'date_start', 'dayOfWeek', 'isHoliday', 'season',
       'start_time_hourly', 'start_time_day', 'start_time_week',
       'start_time_month', 'isRushhour', 'velocity_(miles/h)', 'pressure',
       'datetime', 'humidity', 'temperature_celsius', 'wind_direction',
       'wind_speed', 'description', 'h3_hex_id_high_res',
       'h3_hex_id_medium_res', 'h3_hex_id_low_res'],
      dtype='object')

In [4]:
len(dfChicago['Taxi_ID'].unique())

3536

In [7]:
dfChicago.dtypes

Trip_ID                        object
Taxi_ID                        object
Trip_Start_Timestamp           object
Trip_End_Timestamp             object
Trip_Seconds                  float64
Trip_Miles                    float64
Pickup_Census_Tract           float64
Dropoff_Census_Tract          float64
Pickup_Community_Area         float64
Dropoff_Community_Area        float64
Fare                          float64
Tips                          float64
Tolls                         float64
Extras                        float64
Payment_Type                   object
Pickup_Centroid_Latitude      float64
Pickup_Centroid_Longitude     float64
Pickup_Centroid_Location       object
Dropoff_Centroid_Latitude     float64
Dropoff_Centroid_Longitude    float64
Dropoff_Centroid__Location     object
Community_Areas               float64
date_start                     object
dayOfWeek                      object
isHoliday                        bool
season                         object
start_time_h

In [10]:
# Function to calculate the average idle time between consecutive trips of the same taxi.
# Gaps longer than threshold_minutes are treated as off-duty time and excluded.
def calculate_average_idle_time(trip_data, threshold_minutes):
    # The timestamp columns are stored as strings (dtype object), so parse them first
    trips = trip_data[["Taxi_ID", "Trip_Start_Timestamp", "Trip_End_Timestamp"]].copy()
    trips["Trip_Start_Timestamp"] = pd.to_datetime(trips["Trip_Start_Timestamp"])
    trips["Trip_End_Timestamp"] = pd.to_datetime(trips["Trip_End_Timestamp"])

    # Sort trip data by taxi and then by start time
    trips = trips.sort_values(by=["Taxi_ID", "Trip_Start_Timestamp"])

    # Start time of the next trip of the same taxi (NaT for each taxi's last trip)
    next_start = trips.groupby("Taxi_ID")["Trip_Start_Timestamp"].shift(-1)

    # Calculate time difference in minutes
    time_diff = (next_start - trips["Trip_End_Timestamp"]).dt.total_seconds() / 60

    # Keep non-negative gaps up to the threshold; NaN gaps fail both comparisons
    idle_periods = time_diff[(time_diff >= 0) & (time_diff <= threshold_minutes)]

    if len(idle_periods) > 0:
        return idle_periods.mean()
    else:
        return 0

# Call the function with a threshold of 30 minutes
threshold = 30
average_idle = calculate_average_idle_time(dfChicago, threshold)

print(f"Average idle time: {average_idle} minutes")


In [16]:
# Sort trip data by taxi and then by start time
trip_data_test = dfChicago.sort_values(by=['Taxi_ID', 'Trip_Start_Timestamp'])
trip_data_test

Unnamed: 0,Trip_ID,Taxi_ID,Trip_Start_Timestamp,Trip_End_Timestamp,Trip_Seconds,Trip_Miles,Pickup_Census_Tract,Dropoff_Census_Tract,Pickup_Community_Area,Dropoff_Community_Area,...,pressure,datetime,humidity,temperature_celsius,wind_direction,wind_speed,description,h3_hex_id_high_res,h3_hex_id_medium_res,h3_hex_id_low_res
4144659,7c5f446862febec7ff725c9dbfda3f53d2b58cff,001330b81e23412049f9c3eff5b6e972a91afe59c9aa36...,2013-07-12 16:00:00,2013-07-12 16:00:00,1560.0,6.0,,,6.0,15.0,...,1018.0,2013-07-12 16:00:00,43.0,24.64,59.0,3.0,sky is clear,882664c163fffff,872664c16ffffff,862664c17ffffff
4149562,106f37011cf2e3a4164bb03a8e33f8a77ad2b31b,001330b81e23412049f9c3eff5b6e972a91afe59c9aa36...,2013-07-12 19:00:00,2013-07-12 19:00:00,1620.0,5.8,1.703183e+10,1.703108e+10,6.0,8.0,...,1018.0,2013-07-12 19:00:00,38.0,25.86,63.0,4.0,sky is clear,882664c161fffff,872664c16ffffff,862664c17ffffff
4151121,7797e40db82db8850c092c84b3f2c3b8fec632b1,001330b81e23412049f9c3eff5b6e972a91afe59c9aa36...,2013-07-12 20:00:00,2013-07-12 20:00:00,1200.0,3.8,1.703108e+10,1.703124e+10,8.0,24.0,...,1018.0,2013-07-12 20:00:00,39.0,25.84,0.0,4.0,sky is clear,882664c1e1fffff,872664c1effffff,862664c1fffffff
4151836,28fc5c3b06a7096764752e5b73e9451c7b96b5ee,001330b81e23412049f9c3eff5b6e972a91afe59c9aa36...,2013-07-12 20:00:00,2013-07-12 20:00:00,720.0,3.1,1.703124e+10,1.703107e+10,24.0,7.0,...,1018.0,2013-07-12 20:00:00,39.0,25.84,0.0,4.0,sky is clear,882664cac3fffff,872664cacffffff,862664cafffffff
4174578,c51e93e5ce3e49ff3507992843e4c1998eb60a5f,001330b81e23412049f9c3eff5b6e972a91afe59c9aa36...,2013-07-13 14:00:00,2013-07-13 14:00:00,480.0,1.4,,,31.0,28.0,...,1020.0,2013-07-13 14:00:00,60.0,22.26,17.0,1.0,sky is clear,882664cf61fffff,872664cf6ffffff,862664cf7ffffff
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8966792,7d07ddc4a57cb6bce30b35b05ea269bb63e4a64c,ffe37a1527449eac0fc3d72c9032ee2dc0eff65bb6e7f4...,2013-12-31 22:00:00,2013-12-31 22:00:00,600.0,1.9,1.703133e+10,1.703108e+10,33.0,8.0,...,1026.0,2013-12-31 22:00:00,89.0,-11.24,241.0,1.0,snow,882664c1b1fffff,872664c1bffffff,862664c1fffffff
8968374,0d44cdd074cc2b24ad6dd9378a324578ef0838fd,ffe37a1527449eac0fc3d72c9032ee2dc0eff65bb6e7f4...,2013-12-31 22:00:00,2013-12-31 22:00:00,240.0,0.7,1.703108e+10,1.703108e+10,8.0,8.0,...,1026.0,2013-12-31 22:00:00,89.0,-11.24,241.0,1.0,snow,882664c1e1fffff,872664c1effffff,862664c1fffffff
8972103,ce7b2291442653209b32e99d61d1e1e9d2a3ea97,ffe37a1527449eac0fc3d72c9032ee2dc0eff65bb6e7f4...,2013-12-31 23:00:00,2013-12-31 23:00:00,240.0,0.5,1.703108e+10,1.703108e+10,8.0,8.0,...,1023.0,2013-12-31 23:00:00,89.0,-11.00,217.0,1.0,mist,882664c1e9fffff,872664c1effffff,862664c1fffffff
8973113,249b27470cc87c05de433798fc518b77ae87b657,ffe37a1527449eac0fc3d72c9032ee2dc0eff65bb6e7f4...,2013-12-31 23:00:00,2013-12-31 23:00:00,540.0,1.2,1.703132e+10,1.703108e+10,32.0,8.0,...,1023.0,2013-12-31 23:00:00,89.0,-11.00,217.0,1.0,mist,882664c1e3fffff,872664c1effffff,862664c1fffffff


### Census tract vs. varying hexagon diameter
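
One way to compare spatial resolutions is to compute the same summary once per spatial column and look at how the number of units and the per-unit values change. The sketch below uses a made-up sample; the column names (`Pickup_Census_Tract`, `h3_hex_id_high_res`, `h3_hex_id_low_res`, `Fare`) follow `dfChicago`, while the values and the helper `summarize_by` are assumptions for illustration.

```python
import pandas as pd

# Toy sample; values are made up, column names follow dfChicago
toy = pd.DataFrame({
    "Pickup_Census_Tract": [1.0, 1.0, 2.0, 3.0, 3.0, 3.0],
    "h3_hex_id_high_res": ["h8a", "h8a", "h8b", "h8c", "h8c", "h8d"],
    "h3_hex_id_low_res": ["h6a", "h6a", "h6a", "h6b", "h6b", "h6b"],
    "Fare": [10.0, 14.0, 8.0, 20.0, 22.0, 6.0],
})


def summarize_by(df, spatial_col):
    """Trip count and mean fare per spatial unit of the given column."""
    return df.groupby(spatial_col).agg(
        trips=("Fare", "size"),
        mean_fare=("Fare", "mean"),
    )


spatial_cols = ["Pickup_Census_Tract", "h3_hex_id_high_res", "h3_hex_id_low_res"]
summaries = {col: summarize_by(toy, col) for col in spatial_cols}

# Number of distinct spatial units at each resolution
units_per_resolution = {col: len(summary) for col, summary in summaries.items()}
print(units_per_resolution)
```

Note that `groupby` drops rows whose key is missing by default, so trips without a census tract would not appear in the tract-level summary unless `dropna=False` is passed.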

### Census tract vs. different temporal bin sizes
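
Temporal bin sizes can be varied in a similar way by grouping the trip start times into bins of a chosen width. The sketch below uses `pd.Grouper` on a made-up sample; the column names follow `dfChicago`, while the values, the helper `demand_by_bin`, and the chosen bin widths are assumptions for illustration.

```python
import pandas as pd

# Toy sample; values are made up, column names follow dfChicago
toy = pd.DataFrame({
    "Trip_Start_Timestamp": pd.to_datetime([
        "2013-07-12 08:00", "2013-07-12 09:00", "2013-07-12 13:00",
        "2013-07-12 20:00", "2013-07-13 07:00",
    ]),
    "Trip_Miles": [1.0, 3.0, 2.0, 6.0, 4.0],
})


def demand_by_bin(df, freq):
    """Trip count and mean trip length per time bin of width `freq`."""
    grouper = pd.Grouper(key="Trip_Start_Timestamp", freq=freq)
    return df.groupby(grouper).agg(
        trips=("Trip_Miles", "size"),
        mean_miles=("Trip_Miles", "mean"),
    )


by_6h = demand_by_bin(toy, "6h")   # six-hour bins
by_day = demand_by_bin(toy, "1D")  # daily bins

print(by_6h)
print(by_day)
```

The same helper could be combined with a spatial column in the `groupby` call to vary both resolutions at once. The precomputed columns `start_time_hourly`, `start_time_day`, and `start_time_week` may serve the same purpose for fixed bin sizes.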

## More features