# **Uber Supply-Demand Gap Analysis**



##### **Project Type**    - Exploratory Data Analysis (EDA)
##### **Contribution**    - Individual
##### **Team Member 1 -** Himanshu Arya

# **Project Summary -**

This project analyzes Uber request data to identify and explain the mismatch between ride demand and supply. It uses real-world data to uncover time-based and location-based issues such as high cancellation rates, "No Cars Available" spikes, and low trip fulfillment in key time slots.

I used Python for the EDA, with Pandas for data analysis and Matplotlib and Seaborn for visualization.

The findings can help Uber improve driver allocation, reduce cancellations, and enhance customer satisfaction.

# **GitHub Link -**

https://github.com/HiAr21/Uber_Supply-Demand_Gap_Analysis

# **Problem Statement**


In many urban regions, Uber experiences frequent demand-supply mismatches that degrade the user experience, such as "No Cars Available" responses or high cancellation rates, especially during peak hours.
This analysis aims to identify:
- When and where demand is high
- When and where supply fails
- Which combinations of time and pickup point are most problematic

#### **Define Your Business Objective?**

The objective is to perform a detailed EDA to:

- Identify periods with peak demand and low supply
- Quantify supply shortfall using trip completion data
- Provide actionable insights to reduce failed bookings
- Recommend data-driven solutions to improve Uber’s operational efficiency

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart follows the format below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Colab-only upload step; outside Colab, place uber_data_eda.csv in the working directory instead
from google.colab import files
uploaded = files.upload()

In [None]:
# Load Dataset
df = pd.read_csv("uber_data_eda.csv")
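
The guidelines above mention exception handling, so here is a minimal sketch of a loader that fails early with a readable message. The names `load_requests` and `REQUIRED_COLUMNS` are hypothetical, and the column list is an assumption based on the columns used later in this notebook.

```python
from pathlib import Path

import pandas as pd

# Columns the later cells rely on; adjust if your CSV differs (assumption).
REQUIRED_COLUMNS = ["Pickup point", "Status", "Request Date & Time", "Drop Date & Time"]


def load_requests(path):
    """Read the request log and fail early with a readable message."""
    csv_path = Path(path)
    if not csv_path.exists():
        raise FileNotFoundError(f"Dataset not found: {csv_path}")
    data = pd.read_csv(csv_path)
    # Report every expected column that is absent, not just the first one.
    missing = [col for col in REQUIRED_COLUMNS if col not in data.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return data
```

It could be called as `df = load_requests("uber_data_eda.csv")` in place of the plain `pd.read_csv` call.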

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
(no_of_row,no_of_col)=df.shape
print(f"Number of Rows : {no_of_row}")
print(f"Number of Columns : {no_of_col}")

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull(),cbar=False)

### What did you know about your dataset?

The dataset contains detailed Uber ride request logs collected over a few days. Each row represents a unique ride request and includes:

- Request & Drop timestamps

- Pickup Point (either City or Airport)

- Driver ID (if a driver was assigned)

- Request Status — either:
  - Trip Completed
  - Cancelled
  - No Cars Available

The dataset also includes additional derived fields such as:

- Request Hour and Time Slot (Morning, Day, Evening, Late Night)

- A flag indicating whether the trip was completed or not
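
Since a Driver ID exists only when a driver was assigned, and a drop timestamp only when a trip finished, the null counts are expected to cluster by status. A small sketch of that check on a toy frame follows; the values are illustrative, and the column name `Driver id` is an assumption about the CSV header.

```python
import pandas as pd

# Toy stand-in for df; values are illustrative only.
sample = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
    "Driver id": [1.0, 2.0, None, 3.0],  # column name assumed
    "Drop Date & Time": ["11/07/2016 13:00:00", None, None, "11/07/2016 18:47:00"],
})

# Count nulls in each column separately for every Status value.
nulls_by_status = (
    sample[["Driver id", "Drop Date & Time"]]
    .isnull()
    .groupby(sample["Status"])
    .sum()
)
print(nulls_by_status)
```

If the nulls fall only in the Cancelled and No Cars Available rows, they are structural rather than data-quality problems and need no imputation.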

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Check Unique Values for each variable.

In [None]:
df['Request Date'].unique()

In [None]:
df.nunique()

In [None]:
# Number of requests for each Status value
df['Status'].value_counts()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Convert the date/time columns from text to datetime (explicit day-first format)
df['Request Date & Time'] = pd.to_datetime(df['Request Date & Time'], format='%d/%m/%Y %H:%M:%S')
df['Drop Date & Time'] = pd.to_datetime(df['Drop Date & Time'], format='%d/%m/%Y %H:%M:%S')

df['Request Date'] = pd.to_datetime(df['Request Date'], format='%d/%m/%Y')
# Take the drop date straight from the parsed timestamp (datetime64 at midnight, NaT where no drop)
df['Drop Date'] = df['Drop Date & Time'].dt.normalize()

In [None]:
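# Keep only the time of day; without .dt.time the parsed values carry a dummy 1900-01-01 date
df['Request Time'] = pd.to_datetime(df['Request Time'], format='%H:%M:%S').dt.time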

In [None]:
df['Drop Time'] = df['Drop Date & Time'].dt.time

In [None]:
df.info()

In [None]:
# Gap Score = total requests (demand) - completed trips (supply)
df['Trip Completed'] = df['Status'] == 'Trip Completed'

gap_df = df.groupby(['Time Slot', 'Pickup point'])['Trip Completed'].agg(['count', 'sum']).reset_index()
gap_df['Gap_Score'] = gap_df['count'] - gap_df['sum']
gap_df.rename(columns={'count': 'Total_Requests', 'sum': 'Trips_Completed'}, inplace=True)

gap_df['Trip_Completed(%)'] = gap_df['Trips_Completed']/gap_df['Total_Requests']*100

# sort_values returns a new DataFrame, so assign it back before displaying
gap_df = gap_df.sort_values(by='Trip_Completed(%)', ascending=False)

gap_df
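
The Gap Score above counts Cancelled and No Cars Available requests together. Because this analysis treats them as different failure causes, it may help to split the gap by status. The following is a rough sketch on toy data, not the notebook's actual output; `sample` and `breakdown` are illustrative names.

```python
import pandas as pd

# Toy stand-in for df; values are illustrative only.
sample = pd.DataFrame({
    "Time Slot": ["Morning", "Morning", "Morning", "Evening", "Evening", "Evening"],
    "Pickup point": ["City", "City", "City", "Airport", "Airport", "Airport"],
    "Status": ["Cancelled", "Cancelled", "Trip Completed",
               "No Cars Available", "No Cars Available", "Trip Completed"],
})

# One row per (time slot, pickup point), one column per status.
breakdown = pd.crosstab([sample["Time Slot"], sample["Pickup point"]], sample["Status"])

# Make sure all three status columns exist even if a status never occurs.
breakdown = breakdown.reindex(
    columns=["Trip Completed", "Cancelled", "No Cars Available"], fill_value=0
)

# The gap is every request that did not end in a completed trip.
breakdown["Gap_Score"] = breakdown["Cancelled"] + breakdown["No Cars Available"]
print(breakdown)
```

Applied to the real `df`, the same crosstab would show whether each group's gap is driven mainly by cancellations or by missing cars.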

In [None]:
#chart-4 Status Proportion by Pickup Point
pickup_status = df.groupby(['Pickup point', 'Status']).size().reset_index(name='count')
pickup_total = df.groupby('Pickup point').size().reset_index(name='total')
pickup_status = pickup_status.merge(pickup_total, on='Pickup point')
pickup_status['percent'] = (pickup_status['count'] / pickup_status['total']) * 100

pickup_status

In [None]:
#chart-5 Heatmap: Hour vs Status

heat_data = df.groupby(['Request Hour', 'Status']).size().unstack().fillna(0)
heat_data

In [None]:
#chart-6 Trip Duration Distribution

# Minutes from request to drop; only completed trips should carry a duration
df['Trip Duration (min)'] = (df['Drop Date & Time'] - df['Request Date & Time']).dt.total_seconds() / 60
df.loc[df['Status'] != 'Trip Completed', 'Trip Duration (min)'] = np.nan

completed_trips = df[df['Status'] == 'Trip Completed']

completed_trips.head()


In [None]:
#chart-7 % of No Cars/Cancellations per Time Slot

slot_status = df.groupby(['Time Slot', 'Status']).size().reset_index(name='count')
slot_total = df.groupby('Time Slot').size().reset_index(name='total')
slot_status = slot_status.merge(slot_total, on='Time Slot')
slot_status['percent'] = (slot_status['count'] / slot_status['total']) * 100

slot_status

In [None]:
#chart-8 Pickup Point vs Time Slot Heatmap

pt_heat = df.groupby(['Pickup point', 'Time Slot'])['Status'].value_counts().unstack().fillna(0)

pt_heat

In [None]:
#chart-9 Line Plot of Requests Over Time (Daily)

requests_per_day = df.groupby('Request Date').size().reset_index(name='Requests')
requests_per_day

### What all manipulations have you done and insights you found?


1. Converted Date & Time Columns to datetime format

  * Both Request Date & Time and Drop Date & Time were in mixed formats.
  * Standardized them using pd.to_datetime() with an explicit day-first format (%d/%m/%Y %H:%M:%S) to ensure accurate time-based analysis.

2. Created Request Hour and Time Slot columns

  * Request Hour was extracted from the datetime to understand hourly trends.

  * Time Slot categorized the day into Late Night, Morning, Day, and Evening — useful for grouping and peak analysis.

3. Created Trip Completed Flag

  * A binary column to indicate whether the request led to a successful trip (based on Status = "Trip Completed").

4. Computed Gap Score

  * A new metric calculated as:
Gap Score = Total Requests - Completed Trips

  * Helps quantify the demand-supply gap in each group (time slot, pickup point).

5. Calculated Trip Duration (for completed trips)

  * Derived from the difference between drop and request timestamps, converted to minutes.

  * Used only where both timestamps exist (i.e., for Trip Completed).
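
The notebook reads Request Hour and Time Slot from the CSV and does not show how they were derived. One possible derivation is sketched below. The slot boundaries are an assumption: only the Morning window (5–9 AM) is stated elsewhere in this notebook, and the other three edges are illustrative guesses.

```python
import pandas as pd

# Toy request timestamps in the notebook's day-first format.
sample = pd.DataFrame({
    "Request Date & Time": ["11/07/2016 03:15:00", "11/07/2016 06:40:00",
                            "11/07/2016 13:05:00", "11/07/2016 19:30:00"],
})
sample["Request Date & Time"] = pd.to_datetime(
    sample["Request Date & Time"], format="%d/%m/%Y %H:%M:%S"
)

# Hour of day (0-23) taken from the request timestamp.
sample["Request Hour"] = sample["Request Date & Time"].dt.hour

# ASSUMED boundaries: only Morning (5-9 AM) is stated in this notebook.
bins = [-1, 4, 9, 16, 23]  # right-inclusive edges: 0-4, 5-9, 10-16, 17-23
labels = ["Late Night", "Morning", "Day", "Evening"]
sample["Time Slot"] = pd.cut(sample["Request Hour"], bins=bins, labels=labels)
print(sample[["Request Hour", "Time Slot"]])
```

`pd.cut` returns an ordered categorical, so the slots sort in chronological order without passing an explicit `order` list to the plots.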

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

In [None]:
df.head()

#### Chart - 1

In [None]:
#1 Request by Hour and Status
plt.figure(figsize=(12, 6))
sns.countplot(x='Request Hour', hue='Status', data=df)
plt.title('Hourly Requests by Status')
plt.grid()
plt.show()


#### Chart - 2

In [None]:
#2 Time Slot vs Status
plt.figure(figsize=(8, 5))
sns.countplot(x='Time Slot', hue='Status', data=df, order=['Late Night', 'Morning', 'Day', 'Evening'])
plt.title('Requests by Time Slot and Status')
plt.show()


#### Chart - 3

In [None]:
#3 Pickup Point vs Status
plt.figure(figsize=(6,4))
sns.countplot(x='Pickup point', hue='Status', data=df)
plt.title('Request Status by Pickup Point')
plt.show()

#### Chart - 4

In [None]:
# Status Proportion by Pickup Point
plt.figure(figsize=(4,2.5))
sns.barplot(x='Pickup point', y='percent', hue='Status', data=pickup_status)
plt.title('Proportion of Status by Pickup Point (%)')
plt.ylabel('Percentage')
plt.show()

##### 1. Why did you pick the specific chart?

To compare how ride outcomes (Completed, Cancelled, No Cars Available) vary between City and Airport pickups. A percentage-based bar chart allows a clear proportional comparison.

##### 2. What is/are the insight(s) found from the chart?

The Airport has a higher share of No Cars Available, while the City has more Cancellations. Both signal supply failure, but from different causes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. The insight helps Uber focus supply expansion at the Airport and work on reducing cancellations in the City through driver incentives or UI improvements.

Yes, there is a negative-growth risk: persistent No Cars Available at the Airport can push users to competitors or taxis.

#### Chart - 5

In [None]:
# Heatmap: Hour vs Status

plt.figure(figsize=(6,6))
sns.heatmap(heat_data, annot=True, fmt=".0f")
plt.title("Request Status by Hour (Heatmap)")
plt.ylabel("Hour of Day")
plt.xlabel("Status")
plt.show()

##### 1. Why did you pick the specific chart?

A heatmap provides a visual intensity map of how status outcomes vary by hour, showing peak problem periods.

##### 2. What is/are the insight(s) found from the chart?

The heatmap shows which status dominates at each hour, e.g., a "No Cars Available" spike at 5–9 AM.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Time-specific patterns help Uber deploy drivers proactively before peak failure windows.

Yes, there is a negative-growth risk: if these hours continue to fail, Uber could lose commuter and business traffic.
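
Raw counts in an hour-by-status heatmap are dominated by busy hours. Dividing each hour by its own total gives the share of each status within that hour, which can make failure windows easier to compare. The sketch below uses a toy table with the same layout as `heat_data`; the counts are illustrative only.

```python
import pandas as pd

# Toy stand-in for heat_data (rows: hour, columns: status); counts are illustrative.
counts = pd.DataFrame(
    {"Cancelled": [10, 40], "No Cars Available": [30, 10], "Trip Completed": [60, 50]},
    index=pd.Index([5, 18], name="Request Hour"),
)

# Divide each row by its own total so every hour sums to 100%.
share = counts.div(counts.sum(axis=1), axis=0) * 100
print(share.round(1))
```

The resulting table can be passed to `sns.heatmap(share, annot=True, fmt=".1f")` in the same way as the count version.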

#### Chart - 6 : Trip Duration Distribution

In [None]:
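# Summary statistics first (printed, since only a cell's last expression is displayed),
# then any rows with negative durations as a data-quality check
print(completed_trips['Trip Duration (min)'].describe())
completed_trips[completed_trips['Trip Duration (min)'] < 0].head()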


In [None]:
# Keep only valid durations
filtered = completed_trips[
    (completed_trips['Trip Duration (min)'] > 0) &
    (completed_trips['Trip Duration (min)'] < 120)
]

In [None]:
#6 Trip Duration Distribution
plt.figure(figsize=(8,5))
sns.histplot(filtered['Trip Duration (min)'], bins=40, kde=True)
plt.title('Distribution of Trip Durations (Completed Trips)')
plt.xlabel('Duration (minutes)')
plt.show()

##### 1. Why did you pick the specific chart?

To understand how long successful trips take. A histogram with KDE curve reveals duration spread and potential outliers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Helps price short vs long trips better, and target flat fares more effectively.

Outliers may indicate traffic delays or inefficient routing, leading to customer frustration.
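
To back the histogram with numbers, the median duration and the share of trips under a chosen cut-off can be computed directly. This is a minimal sketch on toy values. The 30-minute threshold is a hypothetical choice, and the durations are not taken from the dataset.

```python
import pandas as pd

# Toy durations in minutes; illustrative values, not taken from the dataset.
durations = pd.Series([12.0, 25.0, 28.0, 45.0, 70.0], name="Trip Duration (min)")

threshold = 30  # minutes; hypothetical cut-off for a "short" trip
median_minutes = durations.median()
# The mean of a boolean mask is the fraction of trips below the threshold.
short_share = (durations < threshold).mean() * 100

print(f"Median duration: {median_minutes:.1f} min")
print(f"Trips under {threshold} min: {short_share:.0f}%")
```

On the real data, `filtered['Trip Duration (min)']` would take the place of `durations`.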

#### Chart - 7 : % of No Cars/Cancellations per Time Slot

In [None]:
#7 % of No Cars/Cancellations per Time Slot

plt.figure(figsize=(10,5))
sns.barplot(x='Time Slot', y='percent', hue='Status', data=slot_status, order=['Late Night', 'Morning', 'Day', 'Evening'])
plt.title('Percentage of Each Status per Time Slot')
plt.ylabel('% of Requests')
plt.show()


##### 1. Why did you pick the specific chart?

To identify which time slots have the most unfulfilled demand. A grouped percentage bar chart reveals the imbalance quickly.

##### 2. What is/are the insight(s) found from the chart?

The chart gives clear percentage context for each time slot, e.g., Morning = 55% No Cars Available at Airport.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. It lets Uber customize solutions per time slot: more drivers in the morning, cancellation deterrents in the evening.

Yes, there is a negative-growth risk: if Morning No Cars Available rates persist, they can cause long-term user churn during a mission-critical window.
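
The seaborn call for this chart draws grouped bars, one bar per status within each time slot. A stacked 100% version, where each time slot is a single bar split by status, could be sketched as follows using `pd.crosstab` with `normalize="index"`. The toy data is illustrative only.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for df; values are illustrative only.
sample = pd.DataFrame({
    "Time Slot": ["Morning", "Morning", "Morning", "Morning", "Evening", "Evening"],
    "Status": ["No Cars Available", "No Cars Available", "Cancelled",
               "Trip Completed", "Cancelled", "Trip Completed"],
})

# normalize="index" turns each time slot's counts into row shares.
share = pd.crosstab(sample["Time Slot"], sample["Status"], normalize="index") * 100

# stacked=True places the statuses on top of each other, so every bar reaches 100%.
ax = share.plot(kind="bar", stacked=True, figsize=(8, 5))
ax.set_ylabel("% of Requests")
ax.set_title("Status Share per Time Slot (stacked)")
plt.show()
```

The stacked form trades easy comparison of a single status across slots for a clearer view of how each slot's requests split overall.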

#### Chart - 8 : Pickup Point vs Time Slot Heatmap

In [None]:
#8 Pickup Point vs Time Slot Heatmap

plt.figure(figsize=(6,4))
sns.heatmap(pt_heat, annot=True, fmt=".0f", cmap="YlOrBr")
plt.title('Request Outcomes by Pickup Point and Time Slot')
plt.show()


##### 1. Why did you pick the specific chart?

To cross-analyze time and location together, which helps identify specific problem zones (like Airport in the Morning).

##### 2. What is/are the insight(s) found from the chart?

The heatmap shows which pickup point and time slot combinations are failing (e.g., Airport + Morning stands out as a problem zone).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. This level of granularity helps Uber focus supply and outreach surgically, not just broadly.

Airport-Morning users are often time-sensitive (flights). Continued failure here will lead to high-value customer churn.
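
The heatmap above mixes three status counts in one grid. A single-metric view, the completion rate for each pickup point and time slot, may make the weakest combination easier to spot. The sketch below uses toy data with illustrative values.

```python
import pandas as pd

# Toy stand-in for df; values are illustrative only.
sample = pd.DataFrame({
    "Pickup point": ["Airport", "Airport", "Airport", "City", "City", "City"],
    "Time Slot": ["Morning", "Morning", "Evening", "Morning", "Evening", "Evening"],
    "Status": ["No Cars Available", "Trip Completed", "Trip Completed",
               "Cancelled", "Trip Completed", "Trip Completed"],
})
sample["Trip Completed"] = sample["Status"] == "Trip Completed"

# The mean of a boolean flag is the completion rate; scale to percent
# and spread the time slots across columns.
completion_rate = (
    sample.groupby(["Pickup point", "Time Slot"])["Trip Completed"].mean().unstack() * 100
)
print(completion_rate)
```

Passing `completion_rate` to `sns.heatmap` would give one colour scale where lower values mark the combinations needing attention.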



#### Chart - 9 : Line Plot of Requests Over Time (Daily)

In [None]:
#9 Line Plot of Requests Over Time (Daily)

plt.figure(figsize=(8,4))
sns.lineplot(x='Request Date', y='Requests', data=requests_per_day, marker='o')
plt.title('Total Requests Over Days')
plt.xticks(rotation=45)
plt.grid()
plt.show()

##### 1. Why did you pick the specific chart?

To observe daily request patterns and identify anomalies or consistent growth.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. It confirms that Uber can rely on consistent demand and plan driver schedules with confidence.

Not directly. However, any future daily dips (e.g., a drop after a spike in cancellations) can be early signs of customer dissatisfaction.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

 1. Increase Driver Availability During Morning Hours : Morning (5–9 AM) shows the highest demand but lowest completion rates, especially at the Airport.

    Recommendation:

    * Offer time-based driver incentives or bonuses during Morning shifts.
    * Use notifications to encourage driver logins before 5 AM, especially around airports.

2. Deploy Targeted Supply at the Airport : Airport pickups consistently suffer from "No Cars Available," especially in the Morning.

    Recommendation:

    * Assign a minimum driver quota to be present near airports during high-demand slots.
    * Create dynamic geofenced incentives for drivers in airport zones.

3. Reduce Evening Cancellations from City : Cancellations are highest in the Evening, mostly from City pickups.

    Recommendation:
      
    * Introduce cancellation penalties or delay deterrents for drivers.
    * Use customer alerts: “High cancellation zone — request another ride in X mins.”

4. Focus on High-Risk Segments (Airport-Morning Users) : Airport-Morning users are time-sensitive (flights, early commutes).

    Recommendation:

    * Flag these users for priority fulfillment or guaranteed rides.
    * Consider offering premium ride guarantees or loyalty credits if no cars are available.

# **Conclusion**

Key findings include:

* High demand but low fulfillment during Morning slots, especially at the Airport, due to unavailability of drivers.

* Evening slot shows a spike in cancellations, particularly in City pickups, indicating possible driver disengagement or traffic-related hesitations.

* Trip completion rates vary significantly based on the combination of time slot and pickup point, making it crucial for Uber to approach resource allocation more surgically.

* Most trips are short (under roughly 30 minutes), which indicates strong potential for high trip turnover if demand is met effectively.