# **Project Name**    -





##### **Project Name**    - UBER SUPPLY DEMAND GAP
##### **Contribution**    - Individual
##### **Member Name**     - ANSH SINGH

# **Project Summary -**

The Uber Request Data Analysis project was undertaken with the objective of examining ride request patterns in order to identify and understand supply-demand discrepancies within Uber’s service operations. By analyzing real-world data comprising detailed information about individual ride requests, this study aims to uncover the factors contributing to frequent ride cancellations and unfulfilled ride requests, especially during high-demand periods.

---
The dataset utilized for this project includes essential variables such as Request ID, Pickup Point, Request Timestamp, Drop Timestamp, and Status. Each row in the dataset represents a unique ride request, capturing the location from which the ride was requested (City or Airport), the date and time of the request, the ride status (Trip Completed, Cancelled, or No Cars Available), and the completion time where applicable. These features provide a rich foundation for analyzing patterns related to rider demand, service fulfillment, and operational bottlenecks.

---

The project began with a structured data cleaning and preprocessing phase. All timestamp fields were converted into a standardized datetime format. Missing values were found primarily in the Drop Timestamp column, which is expected because only completed rides have a drop time. The data was checked for duplicate entries, and none were found. Additional derived features were created for more granular analysis, including Hour (extracted from the request timestamp) and a categorized Time Slot variable that groups requests into intervals: Late Night, Early Morning, Morning, Afternoon, Evening, and Night.

To simplify and standardize the analysis of ride outcomes, a new categorical variable titled Ride Success was introduced. This variable classifies ride outcomes into three distinct categories: Fulfilled (Trip Completed), Cancelled (Cancelled by driver or rider), and Unfulfilled (No Cars Available). This feature enabled a clearer comparison of service performance across different times and pickup points.

---
In conclusion, this project demonstrates the effectiveness of data analytics in diagnosing and addressing real-world logistical inefficiencies. By leveraging structured data, statistical reasoning, and visualization, the Uber Request Data Analysis project provides a foundation for data-driven decision-making in urban mobility services.



# **GitHub Link -**

https://github.com/Ansh3105/Uber_Demand_Gap_Analysis

# **Problem Statement**


This project aims to conduct a demand gap analysis for Uber by examining ride request data to identify when and where the platform struggles to meet customer demand. Frequent issues such as ride cancellations and “No Cars Available” responses, particularly during late-night and early-morning hours at high-traffic locations like airports, highlight a mismatch between supply and demand. By analyzing these patterns, the project seeks to uncover the root causes of the demand-supply gap and recommend strategies to improve cab availability and service efficiency.








#### **Define Your Business Objective?**

The business objective of this project is to enhance Uber’s operational efficiency and customer satisfaction by identifying and addressing demand-supply gaps in its ride service. By analyzing historical ride request data, the project aims to pinpoint the specific time slots and locations—such as airports during late-night and early-morning hours—where Uber consistently fails to meet rider demand. The insights generated will help the company make data-driven decisions to optimize driver allocation, reduce ride cancellations, and improve service availability. Ultimately, the goal is to strengthen Uber’s market position by ensuring a more reliable and responsive ride experience for customers.











# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

# Data handling
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset

df = pd.read_csv('Uber Request Data.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head(10)

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicates = df.duplicated().sum()
print(f"Number of duplicate rows: {duplicates}")

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values

missing_values = df.isnull().sum()

# Plot
missing_values.plot(kind='bar', color='skyblue')
plt.title("Missing Values per Column")
plt.xlabel("Columns")
plt.ylabel("Number of Missing Values")
plt.show()


### What did you know about your dataset?

The dataset contains 6745 rows and 12 columns.

---

It includes the following key columns:

Request id : Unique ID for each ride request.

Pickup point : Location of pickup, either City or Airport.

Status : Final ride status (Trip Completed, Cancelled, or No Cars Available).

Pickup Date : Date when the ride was requested.

Pickup Time : Time when the ride was requested.

Drop Date : Date when the ride was completed.

Drop Time : Time when the ride was completed.

Day of week : Day of the week of the ride request.


Hour_of_Pickup : Hour of day when the ride was requested (0 = 12:00 AM, 23 = 11:00 PM).

Time_slots : Categorized hours of the day (e.g., 0-3: Late Night).

Ride Success : Whether the ride was fulfilled, cancelled, or unfulfilled (no cars available).

Car_Avaibility : Whether a car was available for the request.


---

The drop timestamp fields (Drop Date, Drop Time) have many missing values, which is expected:

Rides that were cancelled or unfulfilled (no cars available) have no drop time.

---

No duplicate rows were found (checked using df.duplicated().sum()).
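The claim that missing drop values belong only to rides that were never completed can be checked directly. The following is a minimal sketch on a made-up toy frame; the column name `Drop Time` is taken from the list above, and the helper `missing_drop_by_status` is hypothetical, not part of the original notebook.

```python
import pandas as pd

# Toy stand-in for the real file; the rows are invented for illustration.
toy = pd.DataFrame({
    "Request id": [1, 2, 3, 4],
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
    "Drop Time": ["10:15:00", None, None, "18:40:00"],
})

def missing_drop_by_status(frame, drop_col="Drop Time"):
    """Count missing drop values for each ride status (hypothetical helper)."""
    return frame[drop_col].isna().groupby(frame["Status"]).sum()

missing_by_status = missing_drop_by_status(toy)
print(missing_by_status)
```

If the count for `Trip Completed` is zero on the real data, the missing values are structural rather than a data-quality problem, and leaving them in place is reasonable.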

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Variables Description

It includes the following key variables:

Request id : Unique ID for each ride request.

Pickup point : Location of pickup, either City or Airport.

Status : Final ride status (Trip Completed, Cancelled, or No Cars Available).

Pickup Date : Date when the ride was requested.

Pickup Time : Time when the ride was requested.

Drop Date : Date when the ride was completed.

Drop Time : Time when the ride was completed.

Day of week : Day of the week of the ride request.


Hour_of_Pickup : Hour of day when the ride was requested (0 = 12:00 AM, 23 = 11:00 PM).

Time_slots : Categorized hours of the day (e.g., 0-3: Late Night).

Ride Success : Whether the ride was fulfilled, cancelled, or unfulfilled (no cars available).

Car_Avaibility : Whether a car was available for the request.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Data cleaning, formatting, and wrangling were already done in Excel, and the cleaned
# file was saved as 'Uber Request Data.csv', so no further wrangling is needed in Python.

### What all manipulations have you done and insights you found?

Loaded the dataset using pandas from a .csv file.

---
Checked the dataset structure using:

df.shape to get the number of rows and columns.

df.info() to view data types and missing values.

df.head() to preview the data.

---

Handled missing values:

Found missing values in the Drop timestamp column.

These were not removed because they represent valid cases of cancellations or unfulfilled requests (expected in this context).

---


Checked for duplicates:


Used df.duplicated().sum() to check for duplicate rows.

No duplicates found

---

Standardized time format:

Extracted the time part of the request timestamp into its own column for easier time-based grouping.

Separated the date part into its own column so the data is easier to read.

---

Created derived columns for analysis:

Although not part of cleaning, new columns (Hour_of_Pickup, Time_slots, Day of week, Ride Success, Car_Avaibility) were added after cleaning to support later steps.
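The derived columns were created in Excel, but the same steps could be reproduced in pandas. The sketch below is illustrative only: the column name `Request timestamp`, its format, and the slot boundaries are assumptions (this notebook only states that hours 0-3 map to Late Night), and the rows are made up.

```python
import pandas as pd

# Toy rows; the timestamp format and column name are assumptions.
toy = pd.DataFrame({
    "Request timestamp": ["2016-07-11 02:30:00", "2016-07-11 06:10:00",
                          "2016-07-11 18:45:00", "2016-07-11 23:05:00"],
    "Status": ["No Cars Available", "Cancelled", "Trip Completed", "No Cars Available"],
})

ts = pd.to_datetime(toy["Request timestamp"])
toy["Hour_of_Pickup"] = ts.dt.hour
toy["Day of week"] = ts.dt.day_name()

# Hypothetical slot boundaries; only "0-3: Late Night" comes from the notebook.
bins = [0, 4, 8, 12, 17, 21, 24]
labels = ["Late Night", "Early Morning", "Morning", "Afternoon", "Evening", "Night"]
# right=False makes each bin include its lower edge, e.g. hours 0, 1, 2, 3 -> Late Night.
toy["Time_slots"] = pd.cut(toy["Hour_of_Pickup"], bins=bins, labels=labels, right=False)

# Three-way outcome described in the project summary.
outcome_map = {
    "Trip Completed": "Fulfilled",
    "Cancelled": "Cancelled",
    "No Cars Available": "Unfulfilled",
}
toy["Ride Success"] = toy["Status"].map(outcome_map)

print(toy[["Hour_of_Pickup", "Time_slots", "Ride Success"]])
```

Doing this in code rather than in a spreadsheet keeps the slot definitions visible and easy to change if the boundaries need adjusting.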


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1

hourly_requests = df.groupby('Hour_of_Pickup')['Request id'].count().sort_index()

# Plot the bar chart
plt.figure(figsize=(8, 5))
hourly_requests.plot(kind='bar', color='steelblue', edgecolor='black')
plt.title('Number of Ride Requests by Hour of the Day')
plt.xlabel('Hour of Day (0–23)')
plt.ylabel('Number of Requests')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

This chart helps visualize when ride demand is highest and lowest throughout the day. By plotting the number of requests (counted from Request id) against each hour of the day (Hour_of_Pickup), we can:

Identify peak demand periods

Detect low-activity hours

Analyze time-based behavior of riders



##### 2. What is/are the insight(s) found from the chart?

Peak Hours

High request volume between certain hours (e.g., 8 AM–11 AM or 5 PM–9 PM) might indicate rush hours.

Low Demand Hours

Very few requests at late night (e.g., 2 AM–5 AM), likely due to low activity or lack of available cars.
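The peak and low-demand hours can also be read off numerically instead of by eye. This is a small sketch on invented hour values standing in for `df['Hour_of_Pickup']`; the choice of the top two hours is arbitrary.

```python
import pandas as pd

# Made-up hours standing in for df['Hour_of_Pickup'].
toy = pd.DataFrame({"Hour_of_Pickup": [5, 5, 5, 9, 9, 14, 18, 18, 18, 18, 2]})

# Count requests per hour and include hours with no requests at all.
hourly = toy["Hour_of_Pickup"].value_counts().reindex(range(24), fill_value=0)

busiest = hourly.nlargest(2)                   # the two hours with the most requests
idle_hours = hourly[hourly == 0].index.tolist()  # hours with zero requests

print(busiest)
print(idle_hours)
```

Reindexing over `range(24)` matters because `value_counts` silently drops hours that never appear, which would hide the quietest periods.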



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive effects:

Helps optimize driver availability during peak hours.

Supports better operational planning and scheduling.

Useful for marketing strategies to target low-demand hours.

Quick visual representation of overall ride demand throughout the day.




#### Chart - 2

In [None]:
# Chart - 2
pickup_counts = df['Pickup point'].value_counts()

# Plot the bar chart
plt.figure(figsize=(8, 6))
pickup_counts.plot(kind='bar', color=['red', 'skyblue'], edgecolor='black')
plt.title('Number of Ride Requests by Pickup Point')
plt.xlabel('Pickup Point')
plt.ylabel('Number of Requests')
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.show()

##### 1. Why did you pick the specific chart?

This chart shows the total number of ride requests made from different pickup points — mainly City and Airport. It helps understand where demand is geographically concentrated.

##### 2. What is/are the insight(s) found from the chart?

Identifies whether most users are requesting rides from the City or the Airport.

Highlights imbalance in ride demand between the two locations.

Helps determine where driver availability should be prioritized.

Can guide location-specific strategies to improve fulfillment rates.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive effects:

Supports better driver distribution between the City and Airport.

Helps Uber align supply based on geographic demand.



#### Chart - 3

In [None]:
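# Chart - 3

# Count of each status
status_counts = df['Status'].value_counts()

# Map each status to a fixed color so colors do not depend on the sort order of the counts
status_colors = {'Trip Completed': 'green', 'Cancelled': 'orange', 'No Cars Available': 'red'}
bar_colors = [status_colors.get(status, 'gray') for status in status_counts.index]

# Plot the bar chart
plt.figure(figsize=(10, 6))
status_counts.plot(kind='bar', color=bar_colors, edgecolor='black')
plt.title('Ride Request Status Distribution')
plt.xlabel('Status')
plt.ylabel('Number of Requests')
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.show()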

##### 1. Why did you pick the specific chart?

This chart helps visualize the distribution of ride outcomes — how many trips were completed, cancelled, or could not be fulfilled due to unavailability of cars. It highlights performance and service gaps in the Uber system.



##### 2. What is/are the insight(s) found from the chart?

Insights from the chart:
Shows the proportion of successful vs unsuccessful ride requests.

High number of cancellations or unavailability indicates supply or operational issues.

Helps identify the most frequent reasons for failed rides.

Indicates user experience and reliability of service.
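To state the proportion of successful and unsuccessful requests as numbers, the counts can be converted to percentage shares. A minimal sketch, using invented rows rather than the real data:

```python
import pandas as pd

# Made-up statuses standing in for df['Status'].
toy = pd.DataFrame({
    "Status": ["Trip Completed"] * 4 + ["Cancelled"] * 2 + ["No Cars Available"] * 4
})

# normalize=True returns fractions of the total; scale to percentages.
status_share = toy["Status"].value_counts(normalize=True).mul(100).round(1)
print(status_share)
```

Running the same two lines on `df['Status']` would give the actual shares to quote alongside the bar chart.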



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive effects:
Gives a clear breakdown of fulfillment efficiency.

Helps Uber focus on reducing cancellations and increasing fulfillment rates.

Useful for performance monitoring and improvement.

#### Chart - 4

In [None]:
# Chart - 4

slot_order = ['Late Night', 'Early Morning', 'Morning', 'Afternoon', 'Evening', 'Night']


color_palette = {
    'Trip Completed': '#4CAF50',     # medium green
    'Cancelled': '#FF9800',          # vibrant orange
    'No Cars Available': '#F44336'   # standard red
}




# Plot the count of requests by time slot and status

plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='Time_slots', hue='Status', order=slot_order,
              palette=color_palette)

plt.title('Ride Request Status by Time Slot')
plt.xlabel('Time Slot')
plt.ylabel('Number of Requests')
plt.legend(title='Status')
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.show()

##### 1. Why did you pick the specific chart?

This chart visualizes ride request outcomes (Completed, Cancelled, No Cars Available) across different time slots of the day. It helps identify when and why service failures occur, enabling better supply-demand management.



##### 2. What is/are the insight(s) found from the chart?

 Insights from the chart:
Highlights specific time slots where cancellations or unfulfilled requests spike.

Shows if demand during certain hours exceeds supply (e.g., early morning or evening).

Helps identify operational bottlenecks or driver shortages.

Useful for aligning driver availability with user demand trends.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Positive effects:
Helps Uber improve supply planning during critical time windows.

Identifies exact hours where user satisfaction may be low.

---
Negative effects:
The chart does not show the pickup location, so Airport and City patterns, which could differ, are hidden.
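One way to address the missing location dimension is to break the status counts down by pickup point as well as time slot. The sketch below uses invented rows; the column names match those used in the chart code.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Pickup point": ["Airport", "Airport", "Airport", "City", "City", "City"],
    "Time_slots": ["Evening", "Evening", "Morning", "Morning", "Morning", "Evening"],
    "Status": ["No Cars Available", "No Cars Available", "Trip Completed",
               "Cancelled", "Cancelled", "Trip Completed"],
})

# Rows: (pickup point, time slot); columns: status; values: request counts.
by_location = pd.crosstab([toy["Pickup point"], toy["Time_slots"]], toy["Status"])
print(by_location)
```

The same table could be passed to a faceted plot (one panel per pickup point) if a visual comparison is preferred.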

#### Chart - 5

In [None]:
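# Chart - 5 visualization code

# Count requests in each time slot, ordered chronologically rather than by count,
# so the line reads as a trend across the day
slot_order = ['Late Night', 'Early Morning', 'Morning', 'Afternoon', 'Evening', 'Night']
slot_counts = df['Time_slots'].value_counts().reindex(slot_order, fill_value=0)

# Plot the line chart
plt.plot(slot_counts.index, slot_counts.values, marker='o')
plt.title('Requests per Time Slot')
plt.xlabel('Time Slot')
plt.ylabel('Number of Requests')
plt.show()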

##### 1. Why did you pick the specific chart?

This chart shows how ride demand varies across different time slots in a day. A line plot helps visualize the trend and flow of ride requests, making it easier to spot demand peaks and lows.

##### 2. What is/are the insight(s) found from the chart?

 Insights from the chart:

Identifies time slots with the highest and lowest ride requests.

Shows demand trends like morning or evening rush hours.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive effects:

Helps Uber allocate drivers during high-demand time slots.

Supports time-based strategy planning (e.g., incentives in low-demand hours).

Aids in scheduling marketing campaigns or fare adjustments.

#### Chart - 6

In [None]:
# Chart - 6 visualization code

availability_counts = df['Car_Avaibility'].value_counts()

# Plot pie chart
plt.figure(figsize=(6, 6))
plt.pie(availability_counts, labels=availability_counts.index, autopct='%1.1f%%',
        colors=['#A1D99B', '#FCBBA1'], startangle=90, wedgeprops={'edgecolor': 'black'})
plt.title('Overall Car Availability')
plt.show()

##### 1. Why did you pick the specific chart?




This chart helps visualize how often cars were available versus not available on the platform. A pie chart is ideal here because it clearly shows the proportion of each category in the total requests.

##### 2. What is/are the insight(s) found from the chart?

Insights from the chart:

Highlights the share of ride requests that could not be served due to car unavailability.

If the "Not Available" section is large, it indicates a major supply-demand gap.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Positive effects:

Provides a quick snapshot of platform reliability.

Useful for management to assess fleet adequacy

#### Chart - 7

In [None]:
# Chart - 7

# Create a simple category: Fulfilled or Unfulfilled
df['Demand_Status'] = df['Status'].apply(lambda x: 'Fulfilled' if x == 'Trip Completed' else 'Unfulfilled')

# Count how many were fulfilled vs unfulfilled
counts = df['Demand_Status'].value_counts()

# Color by label so the mapping does not depend on which count is larger
outcome_colors = {'Fulfilled': 'green', 'Unfulfilled': 'red'}

# Plot the bar chart
counts.plot(kind='bar', color=[outcome_colors[label] for label in counts.index])
plt.title('Demand Gap')
plt.xlabel('Request Outcome')
plt.ylabel('Number of Requests')
plt.show()

##### 1. Why did you pick the specific chart?




To show how many ride requests were successfully completed vs not completed.

##### 2. What is/are the insight(s) found from the chart?

You can see how big the gap is between demand and supply.

If "Unfulfilled" is large, it means many users didn’t get a ride.
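The size of the gap can be expressed as a number as well as a bar height. The sketch below treats every request as demand and every completed trip as supply; the helper name `demand_gap` and the toy statuses are hypothetical.

```python
import pandas as pd

def demand_gap(status):
    """Summarize demand, supply, and the gap between them (hypothetical helper).

    Demand is the number of requests; supply is the number of completed trips.
    """
    demand = len(status)
    supply = int((status == "Trip Completed").sum())
    gap = demand - supply
    gap_pct = round(100 * gap / demand, 1) if demand else None
    return {"demand": demand, "supply": supply, "gap": gap, "gap_pct": gap_pct}

# Made-up statuses standing in for df['Status'].
toy_status = pd.Series(
    ["Trip Completed"] * 4 + ["Cancelled"] * 2 + ["No Cars Available"] * 4
)
summary = demand_gap(toy_status)
print(summary)
```

Calling `demand_gap(df['Status'])` on the real data would give the overall gap; applying it per time slot or pickup point would show where the gap concentrates.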

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive effects:


Helps Uber understand when and where they’re failing to meet demand.

Useful for deciding when to add more drivers.

Improves planning and user experience.



#### Chart - 8


In [None]:
# Chart - 8

# Filter for requests that did not end in a trip ('Cancelled' or 'No Cars Available')
cancelled_df = df[df['Status'].isin(['Cancelled', 'No Cars Available'])]

# Group by hour of pickup and count these requests
cancellations_by_hour = cancelled_df.groupby('Hour_of_Pickup').size()

# Plotting the line chart
plt.figure(figsize=(10, 5))
plt.plot(cancellations_by_hour.index, cancellations_by_hour.values, marker='o', linestyle='-', color='red')
plt.title('Cancelled or Unserved Requests by Hour')
plt.xlabel('Hour of the Day (0-23)')
plt.ylabel('Number of Requests')
plt.grid(True)
plt.xticks(range(0, 24))
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

The line chart shows how unsuccessful requests (cancellations and "No Cars Available" outcomes) vary across the hours of the day.

It helps us visually identify the hours when these failures peak, which are directly tied to Uber's demand-supply mismatch problem.

##### 2. What is/are the insight(s) found from the chart?

Peak Cancellations in Early Morning (5 AM – 9 AM):

Many users request rides during this window (e.g., airport trips), but driver availability is low.



---



Another Spike in Late Evening (10 PM – Midnight):

Demand rises again, especially for return trips from offices or events.

Some drivers may log off, causing cancellations or "No Cars Available" responses.

##### 3. Will the gained insights help creating a positive business impact?

Stable Mid-Day Hours:

The system performs well between late morning and early evening.

Suggests this time window has a healthy driver-to-rider ratio.

Predictable Demand Surges:

The consistent morning and evening peaks provide an opportunity.

Uber can use this to optimize driver incentives and pre-position vehicles.
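Raw counts of failed requests rise whenever total demand rises, so a rate per hour can separate "busy" hours from "badly served" hours. A minimal sketch on invented rows:

```python
import pandas as pd

# Made-up rows standing in for df[['Hour_of_Pickup', 'Status']].
toy = pd.DataFrame({
    "Hour_of_Pickup": [5, 5, 5, 5, 13, 13, 22, 22],
    "Status": ["Cancelled", "Cancelled", "Cancelled", "Trip Completed",
               "Trip Completed", "Trip Completed",
               "No Cars Available", "Trip Completed"],
})

# True for every request that did not end in a completed trip.
not_completed = toy["Status"] != "Trip Completed"

# The mean of a boolean column per hour is the share of failed requests in that hour.
failure_rate = not_completed.groupby(toy["Hour_of_Pickup"]).mean()
print(failure_rate)
```

Hours with a high rate but moderate volume may need a different response (for example driver incentives) than hours where both volume and rate are high.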

#### Chart - 9 - Heatmap of Request Status by Time Slot

In [None]:
# Heatmap visualization code (request counts per time slot and status, not a correlation matrix)

# Create a table showing count of each Status in each time slot
table = df.groupby(['Time_slots', 'Status']).size().unstack(fill_value=0)

# Plot heatmap; fmt='d' prints whole-number counts instead of scientific notation
sns.heatmap(table, annot=True, fmt='d', cmap='Blues')
plt.title('Request Status by Time Slot')
plt.show()

##### 1. Why did you pick the specific chart?

This heatmap is used to visualize how ride request outcomes (Trip Completed, Cancelled, No Cars Available) are distributed across different time slots in a day. It helps spot peak hours of service success or failure using color intensity.

##### 2. What is/are the insight(s) found from the chart?

Insights from the chart:

Identifies time slots with high demand and high failure rates (dark colors = more requests).

Shows when most requests get cancelled or unfulfilled (e.g., Early Morning or Evening).

Helps detect under-served time periods that need more driver supply.
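Because darker cells in a count heatmap partly reflect how busy a slot is, a row-normalized table can make failure shares comparable across slots. The sketch below uses invented rows; the chronological order passed to `reindex` is an assumption about how the slots should be arranged.

```python
import pandas as pd

# Made-up rows standing in for df[['Time_slots', 'Status']].
toy = pd.DataFrame({
    "Time_slots": ["Early Morning"] * 4 + ["Evening"] * 4,
    "Status": ["Cancelled", "Cancelled", "Cancelled", "Trip Completed",
               "No Cars Available", "No Cars Available",
               "Trip Completed", "Trip Completed"],
})

# normalize='index' makes each row sum to 1, i.e. shares within a time slot.
share_table = pd.crosstab(toy["Time_slots"], toy["Status"], normalize="index")

# Put rows in chronological rather than alphabetical order.
share_table = share_table.reindex(["Early Morning", "Evening"])
print(share_table)
```

A table like this could be passed to `sns.heatmap` with `fmt='.0%'` to annotate each cell as a percentage.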

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

**1. Balance Supply with Time-Specific Demand**

Deploy more drivers during high-demand slots like Early Morning and Evening, where ride requests spike.
Use incentives to encourage driver activity during these hours.

**2. Address Pickup Point Disparities**

Ensure higher driver availability at the Airport, where most cancellations and "No Cars Available" issues are observed.
Consider dedicated airport fleet during peak times.

**3. Reduce Ride Cancellations**

Analyze why drivers cancel frequently (fare concerns, distance, waiting time).

Introduce features like:

Penalties for frequent cancellations

**4. Use Predictive Analytics**

Use historical patterns to predict high-demand slots and locations, and auto-schedule more drivers in advance.
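As a starting point for prediction, a simple historical average per hour can serve as a baseline before any model is built. This is only a sketch under stated assumptions: the rows are invented, the `Pickup Date` format is assumed, and the threshold of 2 requests per hour is an arbitrary illustration.

```python
import pandas as pd

# Made-up rows standing in for df[['Pickup Date', 'Hour_of_Pickup']].
toy = pd.DataFrame({
    "Pickup Date": ["2016-07-11"] * 3 + ["2016-07-12"] * 5,
    "Hour_of_Pickup": [8, 8, 19, 8, 8, 8, 8, 19],
})

# Requests per (date, hour); fill_value=0 keeps hours with no requests on a given day.
daily = toy.groupby(["Pickup Date", "Hour_of_Pickup"]).size().unstack(fill_value=0)

# Baseline forecast: the average number of requests seen in each hour across days.
expected = daily.mean()

# Hours whose expected demand meets a hypothetical staffing threshold.
busy_hours = expected[expected >= 2].index.tolist()
print(expected)
print(busy_hours)
```

A real forecast would need more history and should be validated on held-out days; the baseline mainly gives later models something to beat.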

# **Conclusion**

The analysis of Uber request data reveals significant demand-supply mismatches during peak hours, particularly in the early morning and evening time slots. A high proportion of ride cancellations and unavailability of cars at specific locations, especially the airport, highlights operational inefficiencies and the need for better fleet management.

By identifying key problem areas such as time-based and location-based service gaps, this study provides actionable insights to improve driver allocation, reduce cancellation rates, and enhance overall user satisfaction. Implementing data-driven strategies based on these findings can help Uber optimize resource utilization, improve service reliability, and achieve its business objective of delivering a seamless and efficient ride-hailing experience.