# **Project Name**    - UBER SUPPLY DEMAND GAP



##### **Project Type**    - EDA/Regression/Classification/Unsupervised
##### **Contribution**    - Individual
##### **Member 1 -** - Tushar Garg


# **Project Summary -**
This project focuses on analyzing and visualizing Uber ride request data to understand patterns in demand, supply, and service gaps. Using advanced data analytics techniques, we aim to uncover critical insights that can help Uber improve ride fulfillment rates, allocate resources more efficiently, and enhance customer satisfaction.

 A :- Objective

The primary goal of this project is to identify when and where Uber experiences significant supply-demand mismatches. Specifically, we explore the following key questions:

When is demand highest and lowest?

Which time slots and locations have the most unfulfilled ride requests?

What are the primary reasons for failed ride requests — cancellations or lack of available drivers?

How can Uber optimize driver deployment to improve fulfillment?

B:- Dataset Description

The dataset used in this project contains thousands of anonymized Uber ride requests. Each record includes:

Request timestamp

Drop timestamp

Pickup point (City or Airport)

Driver ID (or blank if no driver was assigned)

Ride status (Trip Completed, Cancelled, No Cars Available)

We cleaned the dataset to extract:

Hour of request

Day of week

Time Slot (Morning, Afternoon, Evening, Night)

We also handled missing values, standardized formats, and created new derived columns to support deeper insights.

C:- Tools and Techniques Used

Python (Pandas, Seaborn, Matplotlib) for EDA and visualization

SQL for structured queries and validation

Excel for final dashboards and KPI creation

Pivot Tables and Charts for interactive visual analysis

Slicers, Heatmaps, and Conditional Formatting for presentation-quality insights

D:- Analysis Framework

We followed the UBM Analysis Structure:

Univariate Analysis – Focused on single variables like Time Slot, Status, Pickup Point

Bivariate Analysis – Explored interactions between two variables, such as Time Slot vs Status

Multivariate Analysis – Evaluated multiple variables simultaneously, such as Pickup Point vs Time Slot vs Status

A total of 20 charts were created, providing detailed visual analysis across these dimensions.

E:- Key Insights

Peak Demand occurs during Morning and Evening, especially at the Airport.

Trip Completion Rate is less than 50%, revealing a large unmet demand.

Failures are largely due to "No Cars Available" at Night and "Cancellations" in the Morning.

The Airport location consistently underperforms in fulfillment, especially during Night slots.

Supply Gap Rate is high (>50%), indicating a major business challenge.

Multivariate Heatmaps highlight that early morning hours on weekends are the worst performing windows.

F:- Final Deliverables

Cleaned Dataset (.csv)

Python EDA Notebook (.ipynb)

SQL Query File with all 20 chart insights

Excel Dashboard with charts, slicers, and KPIs

PDF Summary Report compiling insights and recommendations

G:- Business Impact

The insights provided by this project can directly help Uber:

Deploy drivers during peak demand slots

Improve ride fulfillment rates

Reduce cancellations and customer churn

Optimize operations between City and Airport pickup points

This project showcases how structured data analytics and visualization can turn raw ride logs into actionable strategies for smarter mobility services.




Write the summary here within 500-600 words.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


**Write Problem Statement Here.**

This project aims to analyze Uber ride request data to identify the patterns and causes of ride failures, such as driver cancellations and unavailability of cars.

Experiences fluctuations in ride requests across different times of the day and locations. A major challenge the company faces is the supply-demand mismatch, where user demand often exceeds the number of available drivers, leading to unfulfilled ride requests, customer dissatisfaction, and lost revenue opportunities.

By uncovering when (time slots), where (pickup points), and why (status) these issues occur, the project seeks to provide data-driven insights to help Uber.

#### **Define Your Business Objective?**
The primary business objective of this project is to identify and understand the supply-demand gap in Uber ride requests, with the goal of improving ride fulfillment rates and enhancing customer satisfaction. This involves analyzing when, where, and why Uber is unable to meet user demand—whether due to unavailability of drivers, high cancellation rates, or operational inefficiencies.



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Vizualization in  a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries


import pandas as pd
import io
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
#Providing dataset in GitHub Repository,choose that data as file path is not bieng able to traced.
# Load Dataset
from google.colab import files
uploaded = files.upload()
# Load the uploaded file
file_name = list(uploaded.keys())[0]
# Check the file extension to use the correct reading function
if file_name.endswith('.csv'):
  df = pd.read_csv(io.BytesIO(uploaded[file_name]))
elif file_name.endswith('.xlsx'):
  df = pd.read_excel(io.BytesIO(uploaded[file_name]))
else:
  print("Unsupported file format. Please upload a .csv or .xlsx file.")
  df = None

if df is not None:
  display(df.head())

In [None]:

df.drop(columns=['Unnamed: 6'], inplace=True)
df.columns

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
rows, columns = df.shape

print(f"Total Rows: {rows}")
print(f"Total Columns: {columns}")

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_count = df.duplicated().sum()

print(f"Number of duplicate rows: {duplicate_count}")

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = df.isnull().sum()

In [None]:
# Visualizing the missing values
print("Missing Values Count:\n")
print(missing_values)

### What did you know about your dataset?

Answer Here

The dataset contains detailed ride request logs from Uber, focusing on pickup and drop patterns, driver availability, and service status over a specific period. After loading and cleaning the dataset, I performed structural and exploratory analysis to understand its composition and key issues.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print(f"Total Columns: {columns}")

In [None]:
# Dataset Describe
df.describe()

### Variables Description

Answer Here

Request timestamp-The exact date and time when the user requested a ride. Used to extract Hour, Day, and Time Slot.

Drop timestamp	-	The date and time when the ride was completed. If missing, the ride was not fulfilled.

Driver id	-	Unique identifier of the driver assigned to the ride. Blank if no driver was available.

Pickup point-	The location where the ride was requested: either City or Airport.

Status-	The final status of the ride



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_values = df.nunique()
print(unique_values)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'], dayfirst=True)
df['Drop timestamp'] = pd.to_datetime(df['Drop timestamp'], errors='coerce', dayfirst=True)
df['Hour'] = df['Request timestamp'].dt.hour
df['Day'] = df['Request timestamp'].dt.day_name()
def get_time_slot(hour):
    if hour < 5:
        return "Late Night"
    elif hour < 9:
        return "Early Morning"
    elif hour < 12:
        return "Morning"
    elif hour < 17:
        return "Afternoon"
    elif hour < 21:
        return "Evening"
    else:
        return "Night"

df['Time Slot'] = df['Hour'].apply(get_time_slot)


### What all manipulations have you done and insights you found?

Answer Here.

Converts the Request timestamp column from text/string format to a Python datetime object
Converts Drop timestamp to datetime.

Creates a new column Hour that extracts the hour (0–23) from the timestamp.

Helps identify time-based patterns in demand and failures.

Creates a Day column like Monday, Tuesday, etc.

Useful for analyzing weekday vs weekend behavior.

## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

In [None]:

# ============================
# ✅ UNIVARIATE ANALYSIS
# ============================


#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(8,5))
sns.countplot(data=df, x='Time Slot', order=df['Time Slot'].value_counts().index)
plt.title("Total Requests by Time Slot")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

Identifies peak periods of demand throughout the day.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

High demand during Morning and Evening.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

 Business Impact: Positive — helps allocate drivers efficiently.

 Risk: Missed demand = customer churn.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(6,6))
df['Status'].value_counts().plot.pie(autopct='%1.1f%%', startangle=90)
plt.title("Ride Status Distribution")
plt.ylabel('')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

Understand how rides are fulfilled or fail.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

A large share of rides are either cancelled or unfulfilled.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Business Impact: Negative — points to reliability issues.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
sns.countplot(data=df, x='Pickup point')
plt.title("Requests by Pickup Point")
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

Shows which location (City/Airport) has more demand.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

Airport has more frequent requests.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Business Impact: Positive — guide for targeted driver allocation.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
sns.countplot(data=df, x='Hour')
plt.title("Hourly Ride Requests")
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

Reveal demand distribution across hours.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

Peaks around early morning and evening.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Business Impact: Positive — helps manage shift planning.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
sns.countplot(data=df, x='Day', order=["Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"])
plt.title("Ride Requests by Day of Week")
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

 Compare day-of-week demand.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

 Demand is slightly higher on weekdays.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Business Impact: Moderate — helps plan weekly schedules.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
missing_drivers = df['Driver id'].isna().sum()
print(f"Number of missing driver IDs: {missing_drivers}")


##### 1. Why did you pick the specific chart?

Answer Here.
Show how often rides had no assigned driver.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

High count of NULLs = supply shortage.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Business Impact: Negative — shows missed opportunities.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
missing_drop = df['Drop timestamp'].isna().sum()
print(f"Number of missing drop timestamps: {missing_drop}")

##### 1. Why did you pick the specific chart?

Answer Here.

Capture incomplete trips.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

 High = high failure/cancellation rate.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Business Impact: Negative — damages customer trust.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
print(f"Total ride requests: {df.shape[0]}")

##### 1. Why did you pick the specific chart?

Answer Here.

Baseline metric.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

 Total ride demand.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Business Impact: Shows market size and platform use.

In [None]:
# ============================
# ✅ BIVARIATE ANALYSIS (8 Charts)
# ============================

#### Chart - 9

In [None]:
# Chart - 9 visualization code
plt.figure(figsize=(10,5))
sns.countplot(data=df, x='Time Slot', hue='Status')
plt.title("Ride Status by Time Slot")
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

Understand performance over time.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

Failures peak at night and early morning.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Business Impact: Target incentives during weak supply hours.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
sns.countplot(data=df, x='Pickup point', hue='Status')
plt.title("Ride Status by Pickup Location")
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

Compare success/failure by location.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

 Airport sees more no-car availability issues.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Business Impact: Adds urgency for better airport supply.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
sns.countplot(data=df, x='Hour', hue='Status')
plt.title("Hourly Ride Status")
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

 Breakdown hourly demand/success.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

 Early morning has max cancellations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Business Impact: Optimize early shifts.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
sns.countplot(data=df, x='Day', hue='Status', order=["Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"])
plt.title("Status by Day")
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

Understand weekly failure patterns.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

 Weekends slightly worse for fulfillment.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Business Impact: Prep extra fleet for weekends.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
failures = df[df['Status'].isin(['Cancelled', 'No Cars Available'])]
sns.countplot(data=failures, x='Time Slot', hue='Status')
plt.title("Failure Reasons by Time Slot")
plt.xticks(rotation=45)
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

 Differentiate failure reasons.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

 No cars common at Night; Cancellations in Morning.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

 Business Impact: Tailor different solutions per failure type.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
heatmap_data = df.groupby(['Pickup point', 'Time Slot']).size().unstack(fill_value=0)
sns.heatmap(heatmap_data, annot=True, fmt="d", cmap="Blues")
plt.title("Pickup Point vs Time Slot")
plt.show()

##### 1. Why did you pick the specific chart?

Answer Here.

Localize problem areas by time.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

Insight: Airport suffers in Early Morning & Night.

Business Impact: Pinpoints high-failure conditions.

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
completed = df[df['Status'] == 'Trip Completed']
sns.countplot(data=completed, x='Hour')
plt.title("Hourly Completed Trips")
plt.show()


##### 1. Why did you pick the specific chart?

Answer Here.

Visualize fulfilled demand.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

Insight: Successful trips cluster during day.

Business Impact: Monitor successful fleet hours.

In [None]:
# Chart 16: Hourly Demand (again, demand chart)
# Why: Reinforce hourly demand data.
# Insight: Validates earlier hourly chart.
# Business Impact: Same as Chart 4.
sns.countplot(data=df, x='Hour')
plt.title("Hourly Demand")
plt.show()


In [None]:

# ============================
# ✅ MULTIVARIATE ANALYSIS (4 Charts)
# ============================


In [None]:
# Chart 17: Time Slot vs Pickup Point vs Status (Stacked)
# Why: Full picture of gap by time and location.
# Insight: Airport at night has highest issues.
# Business Impact: Strategic time/place coverage.
multi_data = df.groupby(['Time Slot', 'Pickup point', 'Status']).size().reset_index(name='Count')
sns.catplot(x='Time Slot', y='Count', hue='Status', col='Pickup point', data=multi_data, kind='bar', height=5, aspect=1.2)
plt.subplots_adjust(top=0.85)
plt.suptitle("Time Slot vs Pickup Point vs Status")
plt.show()


In [None]:
# Chart 18: Hour vs Day vs Status
# Why: When & which day service fails most?
# Insight: Weekends 2-6 AM have worst fulfillment.
# Business Impact: Prioritize surge planning.
heatmap_data2 = df[df['Status'] != 'Trip Completed'].pivot_table(index='Hour', columns='Day', values='Request id', aggfunc='count')
sns.heatmap(heatmap_data2, annot=True, cmap='Reds')
plt.title("Failed Requests by Hour and Day")
plt.show()

In [None]:
# Chart 19: Pickup Point – Status – Completion Rate
# Why: Shows status proportionally by pickup.
# Insight: City has better fulfillment than Airport.
# Business Impact: Adjust driver distribution.
status_data = df.groupby(['Pickup point', 'Status']).size().unstack(fill_value=0)
(status_data.T / status_data.sum(axis=1)).T.plot(kind='bar', stacked=True)
plt.title("Pickup Point vs Status %")
plt.ylabel("Proportion")
plt.show()

In [None]:
# Chart 20: Demand vs Completed vs Failures by Time Slot
# Why: Compare total demand and gaps.
# Insight: Gap > Completed in key slots.
# Business Impact: Prioritize slots with highest missed demand.
gap_data = df.groupby('Time Slot').agg(
    Total=('Request id', 'count'),
    Completed=('Status', lambda x: (x == 'Trip Completed').sum()),
    Failed=('Status', lambda x: (x != 'Trip Completed').sum())
).reset_index()
gap_data.plot(x='Time Slot', kind='bar', stacked=False)
plt.title("Demand vs Completed vs Failed by Time Slot")
plt.xticks(rotation=45)
plt.show()

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Answer Here.

To achieve the core business objective of reducing the supply-demand gap and improving fulfillment rates, I recommend the following data-driven actions:

1.Time-Specific Driver Allocation

Insight: Most failed rides occur in Early Morning (5–9 AM) and Late Night (9 PM–5 AM), especially due to No Cars Available.

Recommendation:

Increase driver availability during these time slots through targeted incentives or shift planning.

Implement dynamic pricing to balance demand with supply in high-failure hours.

2.Location-Based Supply Strategy

Insight: The Airport consistently underperforms compared to the City, especially during peak hours.

Recommendation:

Deploy dedicated driver pools or waiting areas at the Airport.

Offer real-time priority ride assignment to drivers near airports during peak windows.

3.Reduce Cancellations via Driver Engagement

Insight: High cancellations happen in the Morning slot, likely due to driver reluctance or mismatched preferences.

Recommendation:

Implement cancellation penalties and better ride-matching algorithms.

Introduce ride-type preferences (short vs long, airport vs local) to reduce friction.


# **Conclusion**

Write the conclusion here.

This project identified key gaps between Uber’s ride demand and supply, especially during specific time slots and locations. The insights and KPIs generated can help Uber optimize driver allocation, reduce failures, and improve overall service efficiency.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***