# **Uber Supply-Demand Gap Analysis**



##### **Project Type**    - Exploratory Data Analysis (EDA)
##### **Contribution**    - Individual
##### **Team Member 1**    - Raghvendra Gupta

# **Project Summary -**

The primary objective of this Exploratory Data Analysis (EDA) project is to understand and visualize the supply-demand gap in Uber ride requests based on time slots and pickup locations. The analysis is conducted using a real-world dataset provided by the internship company, containing information about ride request timestamps, pickup points (City or Airport), driver assignments, and trip statuses (Completed, Cancelled, No Cars Available).

The key motivation behind this analysis is to identify patterns in ride cancellations and unavailability to help Uber better allocate its driver resources and reduce customer dissatisfaction. This project is executed individually as part of the internship experience, with end goals of gaining hands-on experience in data wrangling, feature engineering, visualization, and insight generation.

The dataset was initially inconsistent, especially in its timestamp formats. Cleaning involved converting the request and drop timestamps to datetime, extracting the hour of each request, and grouping the hours into parts of the day (such as Early Morning, Morning, Evening, and Night). A new column was created to flag cancelled rides for further analysis.
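The cleaning steps above can be sketched roughly as follows. This is an illustration on a few sample rows, not the notebook's actual wrangling code: the helper names (`parse_mixed_timestamps`, `assign_time_slot`), the derived column names, the "Afternoon" slot, and the hour boundaries of every slot are assumptions made for the example.

```python
import pandas as pd

# Illustrative rows: the timestamps use the two formats seen in the raw file,
# while the statuses are made up for this demo.
sample = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available"],
    "Request timestamp": ["11/7/2016 11:51", "13-07-2016 08:33:16", "12/7/2016 21:08"],
})


def parse_mixed_timestamps(series: pd.Series) -> pd.Series:
    """Parse day-first timestamps written as 'd/m/Y H:M' or 'd-m-Y H:M:S'."""
    cleaned = series.str.replace("/", "-", regex=False)
    # Try each explicit format; rows that do not match become NaT.
    with_seconds = pd.to_datetime(cleaned, format="%d-%m-%Y %H:%M:%S", errors="coerce")
    without_seconds = pd.to_datetime(cleaned, format="%d-%m-%Y %H:%M", errors="coerce")
    return with_seconds.fillna(without_seconds)


def assign_time_slot(hour: int) -> str:
    """Map an hour (0-23) to a part of the day. Boundaries are assumed."""
    if hour < 4:
        return "Night"
    if hour < 8:
        return "Early Morning"
    if hour < 12:
        return "Morning"
    if hour < 17:
        return "Afternoon"
    if hour < 21:
        return "Evening"
    return "Night"


sample["Request timestamp"] = parse_mixed_timestamps(sample["Request timestamp"])
sample["Request hour"] = sample["Request timestamp"].dt.hour
sample["Time slot"] = sample["Request hour"].apply(assign_time_slot)
sample["Is cancelled"] = sample["Status"].eq("Cancelled")
print(sample)
```

Parsing with two explicit day-first formats avoids relying on format inference, which could otherwise read `11/7/2016` as November 7 instead of 11 July.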

Two major visualizations were created in both Google Sheets and Python to highlight the gaps:

1. **Request Status by Time Slot**: This chart revealed that trip completions are highest in the Morning and Evening, while Early Morning and Night slots show disproportionately higher cancellations and "No Cars Available" issues. The pickup point also affects this distribution: the City faces more cancellations, while the Airport sees more unavailability.

2. **Cancellation Rate by Time Slot**: This calculated metric is the percentage of requests in each time slot that were cancelled. Early Morning and Morning slots have the highest cancellation rates, which points to driver shortages or low availability during those times.
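One way the cancellation rate per time slot could be computed is a grouped mean over a boolean flag. The sketch below assumes a cleaned frame with `Time slot` and `Status` columns; the function name `cancellation_rate_by_slot` and the sample counts are hypothetical and do not reproduce the project's results.

```python
import pandas as pd

# Hypothetical, already-cleaned requests; the counts are made up for illustration.
requests = pd.DataFrame({
    "Time slot": ["Morning", "Morning", "Morning", "Morning", "Evening", "Evening"],
    "Status": [
        "Cancelled", "Cancelled", "Trip Completed", "No Cars Available",
        "Trip Completed", "No Cars Available",
    ],
})


def cancellation_rate_by_slot(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of requests with Status == 'Cancelled' per time slot."""
    is_cancelled = df["Status"].eq("Cancelled")
    # The mean of a boolean flag is the share of True values in each group.
    return (is_cancelled.groupby(df["Time slot"]).mean() * 100).round(2)


rates = cancellation_rate_by_slot(requests)
print(rates)
```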

Key insights derived from the analysis:
- **Early Morning and Night** slots experience the most significant supply-demand gap.
- The **Airport** has more "No Cars Available" issues, especially during off-peak hours.
- The **City** sees more ride cancellations during peak hours like Morning and Evening.
- These patterns suggest a mismatch between rider demand and driver availability based on time and location.
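The mismatch by time and location can be quantified with a cross-tabulation. The sketch below treats every request as demand and every completed trip as supply, so the gap is the number of unserved requests. That definition, the column names, and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical cleaned requests; the rows are made up for illustration.
requests = pd.DataFrame({
    "Pickup point": ["City", "City", "City", "Airport", "Airport", "Airport"],
    "Time slot": ["Morning", "Morning", "Morning", "Evening", "Evening", "Evening"],
    "Status": [
        "Cancelled", "Cancelled", "Trip Completed",
        "No Cars Available", "No Cars Available", "Trip Completed",
    ],
})

# Count requests per (time slot, pickup point) for each status.
status_counts = pd.crosstab(
    index=[requests["Time slot"], requests["Pickup point"]],
    columns=requests["Status"],
)

# Demand = all requests; supply = completed trips; gap = unmet demand.
summary = pd.DataFrame({
    "demand": status_counts.sum(axis=1),
    "supply": status_counts["Trip Completed"],
})
summary["gap"] = summary["demand"] - summary["supply"]
print(summary)
```

Keeping the per-status counts in `status_counts` also shows whether a gap comes mostly from cancellations or from "No Cars Available" responses.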

As a solution, Uber can consider incentivizing drivers during low-supply time slots, particularly Early Morning and Night. Better scheduling algorithms and predictive dispatching based on historical demand could also improve availability and reduce cancellations.

This project demonstrates the practical impact of EDA in real-world operational decision-making and highlights the importance of data-driven strategies in ride-hailing platforms.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Ride-hailing platforms like Uber face a persistent challenge in balancing rider demand with driver availability. At specific times of the day and in particular locations, users often experience delays, cancellations, or unavailability of cabs due to this imbalance. This leads to poor customer experience, lost revenue opportunities, and inefficiencies in operations.

The problem at hand is to identify **when** and **where** these supply-demand gaps occur most frequently by analyzing the status of ride requests across various time slots and pickup points. Using real Uber request data, the goal is to pinpoint the time periods with the highest ride cancellations or "No Cars Available" cases and understand whether location (City vs Airport) influences the gap.

By performing exploratory data analysis on this dataset, we aim to provide Uber with actionable insights to help them improve driver allocation strategies and enhance overall service reliability.

#### **Define Your Business Objective?**

The business objective of this project is to analyze Uber ride request data to identify patterns of demand-supply mismatch across different time slots and pickup points.

By understanding when and where cancellations and "No Cars Available" incidents occur most frequently, Uber can:

- Optimize driver allocation across time slots and locations
- Minimize customer wait times and cancellations
- Improve rider satisfaction and operational efficiency
- Increase trip completion rates and revenue

This EDA will provide data-driven insights that support Uber in making strategic decisions to bridge the gap between rider demand and driver availability.

# **General Guidelines**

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code, and deployment-ready code are a plus and earn additional credits.

     The additional credits give an advantage over other students during Star Student selection.

     [ Note: Deployment-ready code means the whole .ipynb notebook can be executed in one go
       without a single error logged. ]

3.   Every piece of logic should have proper comments.
4.   You may add as many charts as you want. For each chart, follow the format below and answer the questions.


```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints: Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [23]:
# Import Libraries
# ----------------------------------------
# 📦 Importing Required Libraries
# ----------------------------------------

# Data Manipulation
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Date & Time Handling
from datetime import datetime

# Plot styling
sns.set(style='whitegrid')
plt.style.use("ggplot")

# Configure display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

print("✅ Libraries imported successfully.")

✅ Libraries imported successfully.


### Dataset Loading

In [24]:
# Load Dataset
# ----------------------------------------
# 📂 Loading the Uber Request Dataset
# ----------------------------------------

# Define the file path (ensure the file is in the same directory)
file_path = "Uber Request Data.csv"

# Read the dataset
try:
    df = pd.read_csv(file_path)
    print(f"✅ Dataset loaded successfully with {df.shape[0]} rows and {df.shape[1]} columns.")
except FileNotFoundError:
    print("❌ Error: The file was not found. Please check the file path.")
except Exception as e:
    print(f"❌ An unexpected error occurred while loading the file: {e}")


✅ Dataset loaded successfully with 6745 rows and 6 columns.


### Dataset First View

In [25]:
# Dataset First Look
# ----------------------------------------
# First Look at the Dataset
# ----------------------------------------

# Display the first 5 rows of the dataset
df.head()


|   | Request id | Pickup point | Driver id | Status | Request timestamp | Drop timestamp |
|---|---|---|---|---|---|---|
| 0 | 619 | Airport | 1.0 | Trip Completed | 11/7/2016 11:51 | 11/7/2016 13:00 |
| 1 | 867 | Airport | 1.0 | Trip Completed | 11/7/2016 17:57 | 11/7/2016 18:47 |
| 2 | 1807 | City | 1.0 | Trip Completed | 12/7/2016 9:17 | 12/7/2016 9:58 |
| 3 | 2532 | Airport | 1.0 | Trip Completed | 12/7/2016 21:08 | 12/7/2016 22:03 |
| 4 | 3112 | City | 1.0 | Trip Completed | 13-07-2016 08:33:16 | 13-07-2016 09:25:47 |


### Dataset Rows & Columns count

In [26]:
# Dataset Rows & Columns count
# ----------------------------------------
# Dataset Dimensions: Rows & Columns
# ----------------------------------------

# Using .shape to get number of rows and columns
num_rows, num_cols = df.shape

print("✅ The dataset contains:")
print(f"➡️ {num_rows:,} rows")
print(f"➡️ {num_cols} columns")


✅ The dataset contains:
➡️ 6,745 rows
➡️ 6 columns


### Dataset Information

In [27]:
# Dataset Info
# ----------------------------------------
# Dataset Information Summary
# ----------------------------------------

# Display concise summary of the dataset
df.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6745 entries, 0 to 6744
Data columns (total 6 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Request id         6745 non-null   int64  
 1   Pickup point       6745 non-null   object 
 2   Driver id          4095 non-null   float64
 3   Status             6745 non-null   object 
 4   Request timestamp  6745 non-null   object 
 5   Drop timestamp     2831 non-null   object 
dtypes: float64(1), int64(1), object(4)
memory usage: 316.3+ KB


#### Duplicate Values

In [28]:
# Dataset Duplicate Value Count
# ----------------------------------------
# Checking for Duplicate Rows
# ----------------------------------------

# Count duplicate records
duplicate_count = df.duplicated().sum()

if duplicate_count == 0:
    print("✅ No duplicate rows found in the dataset.")
else:
    # Drop duplicates only when some exist, so the message matches what happened
    df.drop_duplicates(inplace=True)
    print(f"⚠️ Found and removed {duplicate_count:,} duplicate rows.")


✅ No duplicate rows found in the dataset.


#### Missing Values/Null Values

In [29]:
# Missing Values/Null Values Count
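A minimal sketch of how this cell could count missing values is shown below. The helper name `missing_value_report` and the sample rows are assumptions for illustration. On the real data the function would be called on `df`.

```python
import numpy as np
import pandas as pd


def missing_value_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    missing = df.isna().sum()
    report = pd.DataFrame({
        "missing_count": missing,
        "missing_pct": (missing / len(df) * 100).round(2),
    })
    # Show the columns with the most missing values first.
    return report.sort_values("missing_count", ascending=False)


# Illustrative rows only; not the real dataset.
sample = pd.DataFrame({
    "Request id": [619, 867, 1807, 2532],
    "Driver id": [1.0, 1.0, np.nan, 1.0],
    "Status": ["Trip Completed", "Trip Completed", "No Cars Available", "Cancelled"],
    "Drop timestamp": ["11/7/2016 13:00", "11/7/2016 18:47", None, None],
})

report = missing_value_report(sample)
print(report)
```

For the full dataset, the `df.info()` output implies 2,650 missing `Driver id` values (6,745 − 4,095) and 3,914 missing `Drop timestamp` values (6,745 − 2,831). All other columns are complete.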

In [30]:
# Visualizing the missing values

### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [31]:
# Dataset Columns

In [32]:
# Dataset Describe

### Variables Description

Answer Here

### Check Unique Values for each variable.

In [33]:
# Check Unique Values for each variable.
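One possible way to fill this cell is to summarise the number of distinct values per column along with a few examples. The helper name `unique_value_summary`, its `max_listed` parameter, and the sample rows are hypothetical. On the real data it would be called on `df`.

```python
import pandas as pd


def unique_value_summary(df: pd.DataFrame, max_listed: int = 5) -> pd.DataFrame:
    """List the number of distinct non-null values and a few examples per column."""
    rows = []
    for column in df.columns:
        values = df[column].dropna().unique()
        rows.append({
            "column": column,
            "n_unique": len(values),
            "examples": list(values[:max_listed]),
        })
    return pd.DataFrame(rows).set_index("column")


# Illustrative rows only; not the real dataset.
sample = pd.DataFrame({
    "Pickup point": ["Airport", "Airport", "City", "Airport", "City"],
    "Status": [
        "Trip Completed", "Cancelled", "No Cars Available",
        "Trip Completed", "Cancelled",
    ],
})

unique_summary = unique_value_summary(sample)
print(unique_summary)
```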

## 3. ***Data Wrangling***

### Data Wrangling Code

In [34]:
# Write your code to make your dataset analysis ready.

### What all manipulations have you done and insights you found?

Answer Here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [35]:
# Chart - 1 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 2

In [36]:
# Chart - 2 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 3

In [37]:
# Chart - 3 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 4

In [38]:
# Chart - 4 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 5

In [39]:
# Chart - 5 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 6

In [40]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 7

In [41]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 8

In [42]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 9

In [43]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 10

In [44]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 11

In [45]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 12

In [46]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 13

In [47]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [48]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [49]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

Answer Here.

# **Conclusion**

Write the conclusion here.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***