# **Project Name**    - Uber Supply-Demand Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Siddesh Keshav Vaishnav**


# **Project Summary -**

This project aims to analyze Uber ride request data to uncover operational inefficiencies, demand–supply imbalances, and patterns in user behavior. The dataset captures ride requests over a specific period, detailing time, pickup point, ride status, and driver availability.

Through exploratory data analysis (EDA), we investigated key variables such as request timestamps, pickup locations (Airport and City), ride statuses (Trip Completed, Cancelled, No Cars Available), and time-based trends. By categorizing request times into logical time slots (Late Night, Early Morning, Morning, etc.), we visualized ride distributions, peak demand windows, and failure patterns.

Python was used for data cleaning, transformation, and visualizations with libraries like Pandas, Matplotlib, and Seaborn. An in-memory SQLite database was also created to run structured queries for deeper insights. In parallel, Excel was used to build pivot-based dashboards to present findings from a business perspective.

# **GitHub Link -**


https://github.com/SIDDUPAAJI/uber-supply-demand-analysis

# **Problem Statement**


**Uber faces operational challenges in balancing rider demand and driver availability, particularly during peak hours. This imbalance results in increased ride cancellations, unfulfilled ride requests, and customer dissatisfaction.**

#### **Define Your Business Objective?**

To analyze Uber ride request data and uncover key insights into demand-supply gaps, cancellation patterns, and service inefficiencies. The goal is to provide actionable recommendations that enhance resource allocation, reduce failed rides, and optimize operations during critical time windows.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart answers the following questions in this format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import the libraries used throughout the notebook
import pandas as pd
import numpy as np
import sqlite3
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

### Dataset Loading

In [None]:
# Loaded using pandas read_csv()
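The loading step is only described in a comment, so here is a minimal sketch of what it might look like. The helper name `load_requests`, the file name in the final comment, and the two sample rows are all hypothetical; only the column names come from this notebook.

```python
import io

import pandas as pd


def load_requests(source):
    """Read the ride-request CSV into a DataFrame.

    `source` may be a file path or any file-like object.
    """
    try:
        return pd.read_csv(source)
    except FileNotFoundError as exc:
        # Fail with a clearer message if the path is wrong.
        raise FileNotFoundError(f"Ride-request file not found: {source}") from exc


# Tiny inline sample (made-up rows) standing in for the real file.
sample_csv = io.StringIO(
    "Request ID,Pickup point,Driver ID,Status,Request timestamp,Drop timestamp\n"
    "1,Airport,10,Trip Completed,2024-01-01 11:51:00,2024-01-01 13:00:00\n"
    "2,City,,No Cars Available,2024-01-01 17:57:00,\n"
)
df = load_requests(sample_csv)

# With the real data (hypothetical file name):
# df = load_requests("uber_request_data.csv")
```

Accepting a file-like object as well as a path keeps the loader easy to exercise without the real CSV on disk.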


### Dataset First View

In [None]:
# Previewed the data structure with df.head()

### Dataset Rows & Columns count

In [None]:
# Confirmed the row and column counts with df.shape

### Dataset Information

In [None]:
# Examined column data types and null counts with df.info()

#### Duplicate Values

In [None]:
# Checked with df.duplicated().sum(); no significant duplicates found
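The inspection steps in this section (preview, shape, info, duplicates) can be sketched together as follows. The rows are made up for illustration, with the third row deliberately repeating the first so the duplicate check has something to find.

```python
import pandas as pd

# Made-up rows for illustration; the third row repeats the first.
df = pd.DataFrame({
    "Request ID": [1, 2, 1],
    "Pickup point": ["Airport", "City", "Airport"],
    "Status": ["Trip Completed", "Cancelled", "Trip Completed"],
})

n_rows, n_cols = df.shape                   # rows and columns count
n_duplicates = int(df.duplicated().sum())   # fully identical rows
null_counts = df.isna().sum()               # nulls per column

print(df.head())                            # first view of the data
print(f"{n_rows} rows x {n_cols} columns, {n_duplicates} duplicate row(s)")
df.info()                                   # dtypes and non-null counts
```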

#### Missing Values/Null Values

In [None]:
# Parsed the timestamp columns and dropped rows with a null Request timestamp
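A minimal sketch of the timestamp handling described above, on made-up values. It assumes unparseable strings should become `NaT` (via `errors="coerce"`) rather than raise. Only `Request timestamp` is treated as essential here, on the assumption that a missing `Drop timestamp` simply means the trip was not completed.

```python
import pandas as pd

# Made-up values: one valid request, one unparseable string, one missing value.
df = pd.DataFrame({
    "Request ID": [1, 2, 3],
    "Request timestamp": ["2024-01-01 08:15:00", "not a date", None],
    "Drop timestamp": ["2024-01-01 09:00:00", None, None],
})

# Convert both columns to datetime; anything unparseable becomes NaT.
for col in ["Request timestamp", "Drop timestamp"]:
    df[col] = pd.to_datetime(df[col], errors="coerce")

# Keep only rows that still have a usable request time.
df = df.dropna(subset=["Request timestamp"]).reset_index(drop=True)
```

If the real file mixes several date formats, the parsing call would need to be adapted (for example by passing an explicit `format` per pattern); this sketch does not cover that case.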

### What did you know about your dataset?

The dataset includes ~6.7K Uber ride requests, with information on request/drop timestamps, pickup location (City or Airport), ride status, and driver availability. It’s time-based and categorical, making it suitable for structured EDA.

## ***2. Understanding Your Variables***

In [None]:
# Dataset columns (from df.columns):
# Request ID, Pickup point, Driver ID, Status, Request timestamp, Drop timestamp

In [None]:
# Dataset describe:
# used the statistical summary from df.describe() to check distributions

### Variables Description

*   Pickup point: Origin of request (City or Airport)
*   Status: Ride outcome (Trip Completed, Cancelled, No Cars Available)
*   Request timestamp: Time of booking request
*   Drop timestamp: Time of trip completion
*   Request hour: Extracted from request timestamp
*   Time slot: Mapped from hour to a named bucket (Morning, Night, etc.)

### Check Unique Values for each variable.

In [None]:
# Used df[column].nunique() and df[column].value_counts() for the categorical columns
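As an illustration of the unique-value check, the sketch below runs `nunique()` on every column and `value_counts()` on one categorical column. The four rows are made up; the category labels are the ones listed under Variables Description.

```python
import pandas as pd

# Made-up rows using the category labels described above.
df = pd.DataFrame({
    "Pickup point": ["Airport", "City", "City", "Airport"],
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
})

# Number of distinct values per column.
unique_counts = {col: int(df[col].nunique()) for col in df.columns}

# Frequency of each ride outcome.
status_counts = df["Status"].value_counts()

print(unique_counts)
print(status_counts)
```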

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Parsed the datetime columns
# Extracted the request hour and created the time slot
# Standardized Status and Pickup point formatting
# Removed rows with nulls in essential columns
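The wrangling steps listed above can be sketched as follows. The slot boundaries in `SLOT_EDGES` are an assumption, since the notebook does not state the exact cut-offs, and the "Afternoon" label is likewise assumed to fill the gap between Morning and Evening. The sample rows are made up.

```python
import pandas as pd

# Made-up rows with deliberately inconsistent text formatting.
df = pd.DataFrame({
    "Request timestamp": pd.to_datetime(
        ["2024-01-01 02:30", "2024-01-01 05:10", "2024-01-01 18:45", "2024-01-01 23:05"]
    ),
    "Status": [" cancelled", "No cars available", "TRIP COMPLETED", "Trip Completed "],
    "Pickup point": ["city", "Airport ", "City", "airport"],
})

# Standardize text columns: trim whitespace and use Title Case.
for col in ["Status", "Pickup point"]:
    df[col] = df[col].str.strip().str.title()

# Extract the hour of day (0-23) from the request time.
df["Request hour"] = df["Request timestamp"].dt.hour

# Assumed boundaries: each slot covers [start, end) hours.
SLOT_EDGES = [0, 4, 8, 12, 17, 21, 24]
SLOT_LABELS = ["Late Night", "Early Morning", "Morning", "Afternoon", "Evening", "Night"]

df["Time slot"] = pd.cut(
    df["Request hour"], bins=SLOT_EDGES, labels=SLOT_LABELS, right=False
).astype(str)
```

Using `right=False` makes each bin include its starting hour and exclude its ending hour, so hour 0 falls into Late Night and hour 23 into Night.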

### What all manipulations have you done and insights you found?

*   Converted timestamps into usable time slots for analysis
*   Standardized inconsistent text entries
*   Discovered trends like early morning cancellations and night-time availability issues
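The chart cells in the next section query a table named `uber_data` through a connection named `conn`, but the setup is not shown. A minimal sketch of how the in-memory SQLite database could be created from the cleaned DataFrame is given below; the three rows are made up, and the real notebook may have built the table differently.

```python
import sqlite3

import pandas as pd

# Made-up rows with the columns that the chart queries reference.
df = pd.DataFrame({
    "Pickup point": ["City", "Airport", "City"],
    "Status": ["Trip Completed", "Cancelled", "Trip Completed"],
    "Request hour": [5, 18, 18],
    "Time slot": ["Early Morning", "Evening", "Evening"],
})

# Create an in-memory database and copy the DataFrame into it.
conn = sqlite3.connect(":memory:")
df.to_sql("uber_data", conn, index=False, if_exists="replace")

# Sanity check: same shape of query as Chart 1.
check = pd.read_sql(
    "SELECT [Time slot] AS slot, COUNT(*) AS total "
    "FROM uber_data GROUP BY [Time slot] ORDER BY total DESC",
    conn,
)
print(check)
```

Column names containing spaces are kept as-is, which is why the queries wrap them in square brackets.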

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 Total Requests by Time Slot
query = """
SELECT [Time slot] AS slot, COUNT(*) AS total
FROM uber_data
GROUP BY [Time slot]
ORDER BY total DESC
"""
df1 = pd.read_sql(query, conn)

sns.barplot(data=df1, x='slot', y='total', palette='Blues')
plt.title('Total Requests by Time Slot')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To understand request distribution across the day.

##### 2. What is/are the insight(s) found from the chart?

Most requests come during Morning and Evening.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Helps identify peak workload hours for resource planning.

#### Chart - 2

In [None]:
# Chart - 2 Request Status by Time Slot
query = """
SELECT [Time slot], Status, COUNT(*) AS total
FROM uber_data
GROUP BY [Time slot], Status
"""
df2 = pd.read_sql(query, conn)

plt.figure(figsize=(12,6))
sns.barplot(data=df2, x='Time slot', y='total', hue='Status', palette='Set2')
plt.title('Status Distribution Across Time Slots')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To compare ride outcome trends across time buckets.

##### 2. What is/are the insight(s) found from the chart?

Cancellations and no-cars spike during Early Morning.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Helps target time-based incentives for drivers.

#### Chart - 3

In [None]:
# Chart - 3 Status by Pickup Point
query = """
SELECT [Pickup point], Status, COUNT(*) AS total
FROM uber_data
GROUP BY [Pickup point], Status
"""
df3 = pd.read_sql(query, conn)

plt.figure(figsize=(8,6))
sns.barplot(data=df3, x='Pickup point', y='total', hue='Status', palette='pastel')
plt.title('Request Status by Pickup Point')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To evaluate service performance at different locations.

##### 2. What is/are the insight(s) found from the chart?

Airport has significantly higher cancellations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Justifies location-specific driver deployment.

#### Chart - 4

In [None]:
# Chart - 4 Heatmap of Completed Trips
query = """
SELECT [Pickup point], [Time slot], COUNT(*) AS total
FROM uber_data
WHERE Status = 'Trip Completed'
GROUP BY [Pickup point], [Time slot]
"""
df4 = pd.read_sql(query, conn)

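# Cast back to int after fillna: missing combinations turn the pivot into floats,
# which the integer annotation format fmt='d' below cannot render.
pivot = df4.pivot(index='Pickup point', columns='Time slot', values='total').fillna(0).astype(int)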

plt.figure(figsize=(10,6))
sns.heatmap(pivot, annot=True, fmt='d', cmap='YlGnBu')
plt.title('Completed Trips Heatmap')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To visualize high-performing time-location combos.

##### 2. What is/are the insight(s) found from the chart?

Evening City pickups have the highest number of completed trips.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Can be used to realign driver shifts with success hotspots.

#### Chart - 5

In [None]:
# Chart - 5 Hourly Requests
query = """
SELECT [Request hour] AS hour, COUNT(*) AS total
FROM uber_data
GROUP BY hour
ORDER BY hour
"""
df5 = pd.read_sql(query, conn)

sns.barplot(data=df5, x='hour', y='total', color='coral')
plt.title('Requests by Hour')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To confirm raw hour-level trends in request frequency

##### 2. What is/are the insight(s) found from the chart?

Request volume peaks between 6 AM–10 AM and 5 PM–9 PM

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Backs up planning for hourly dispatch optimization.

#### Chart - 6

In [None]:
# Chart - 6 Ride Status Pie Chart
query = """
SELECT Status, COUNT(*) AS total
FROM uber_data
GROUP BY Status
"""
df6 = pd.read_sql(query, conn)

plt.figure(figsize=(6,6))
plt.pie(df6['total'], labels=df6['Status'], autopct='%1.1f%%',
        colors=sns.color_palette('pastel'))
plt.title('Ride Status Breakdown')
plt.show()


##### 1. Why did you pick the specific chart?

To summarize ride outcome distribution.

##### 2. What is/are the insight(s) found from the chart?

About 55% completed, 20% cancelled, 25% no cars.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Identifies pain points for service improvement.

#### Chart - 7

In [None]:
# Chart - 7 Cancellations by Time Slot

query = """
SELECT [Time slot] AS slot, COUNT(*) AS cancelled
FROM uber_data
WHERE Status = 'Cancelled'
GROUP BY [Time slot]
"""
df7 = pd.read_sql(query, conn)

sns.barplot(data=df7, x='slot', y='cancelled', color='salmon')
plt.title('Cancellations by Time Slot')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To pinpoint time windows with cancellation spikes.

##### 2. What is/are the insight(s) found from the chart?

Early Morning shows disproportionately high cancellations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Opportunity for targeted driver bonuses or notifications.

#### Chart - 8

In [None]:
# Chart - 8 No Cars by Time Slot
query = """
SELECT [Time slot] AS slot, COUNT(*) AS no_cars
FROM uber_data
WHERE Status = 'No Cars Available'
GROUP BY [Time slot]
"""
df8 = pd.read_sql(query, conn)

sns.barplot(data=df8, x='slot', y='no_cars', color='slateblue')
plt.title('No Cars Available by Time Slot')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To find when rider demand exceeds car supply.

##### 2. What is/are the insight(s) found from the chart?

"No Cars Available" outcomes are most frequent during Late Night and Morning.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Could trigger surge pricing or dynamic routing algorithms.

#### Chart - 9

In [None]:
# Chart - 9 Completion Rate by Time Slot
query = """
SELECT [Time slot] AS slot,
    ROUND(SUM(CASE WHEN Status = 'Trip Completed' THEN 1 ELSE 0 END)*1.0 / COUNT(*), 2) AS completion_rate
FROM uber_data
GROUP BY [Time slot]
"""
df9 = pd.read_sql(query, conn)

sns.barplot(data=df9, x='slot', y='completion_rate', color='mediumseagreen')
plt.title('Trip Completion Rate by Time Slot')
plt.ylim(0, 1)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To compute and visualize fulfillment efficiency.

##### 2. What is/are the insight(s) found from the chart?

Completion rate dips sharply during Early Morning.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Reinforces need for availability incentives in low-rate slots.
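As a cross-check on the SQL above, the same completion rate can be computed directly in pandas: the mean of a boolean "is completed" flag within each slot equals completed requests divided by total requests. This is a sketch on made-up rows, not the notebook's original code.

```python
import pandas as pd

# Made-up rows: one of two Early Morning requests completes, both Evening ones do.
df = pd.DataFrame({
    "Time slot": ["Early Morning", "Early Morning", "Evening", "Evening"],
    "Status": ["Cancelled", "Trip Completed", "Trip Completed", "Trip Completed"],
})

# Mean of the boolean flag per slot = completed / total.
completion_rate = (
    df["Status"].eq("Trip Completed")
    .groupby(df["Time slot"])
    .mean()
    .round(2)
    .rename("completion_rate")
)
print(completion_rate)
```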

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

*   Implement incentives for drivers during high-cancellation hours (Early Morning, Late Night)
*   Use real-time heatmaps to position drivers efficiently
*   Offer predictive alerts to customers about delays in low-availability zones
*   Improve onboarding of drivers for Airport pickups during early hours

# **Conclusion**

This EDA uncovered Uber’s most critical supply-demand gaps using a combination of time-based, location-based, and outcome-driven analysis. With just 9 targeted visualizations and SQL-backed insights, we identified when, where, and why ride requests go unfulfilled, and how Uber can address these gaps with operational tweaks and policy changes.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***