# **Project Name**    -   Exploratory Data Analysis of Uber Cab Requests - Demand–Supply Gap Analysis


##### **Project Type**    - Exploratory Data Analysis (EDA) & Business Intelligence Dashboard
##### **Contribution**    - Individual
##### **Author**          - Bhavesh Kumar

# **Project Summary**

The goal of this project is to perform exploratory data analysis (EDA) on an Uber cab request dataset, with a focus on understanding demand–supply gaps, trip fulfillment performance, and operational inefficiencies across time and pickup locations.
The analysis aims to uncover patterns related to request volume, trip outcomes, peak demand hours, and failure reasons to support data-driven decision-making.<br>

#### **Key Steps**<br>
**Data Collection and Cleaning :**

The project utilizes an Uber cab request dataset containing 6,745 records and 6 variables related to trip requests, pickup locations, time details, and trip outcomes.
Data cleaning involved inspecting the dataset structure, validating data types, handling missing values in driver and drop-time fields (which were valid for unfulfilled trips), and standardizing categorical variables to ensure consistency.

**Data Preprocessing and Transformation :**

Date and time columns were converted into appropriate datetime formats to enable temporal analysis.
Additional features such as request hour and day of week were derived to analyze demand patterns across time.
Indicator variables were created to support failure rate and completion rate analysis.

**Exploratory Data Analysis and Visualization :**

A combination of univariate, bivariate, and multivariate analysis was performed using Python visualization libraries (matplotlib and seaborn).
Visualizations included bar charts, line plots, stacked columns, and rate-based charts to examine relationships between demand, supply, trip status, time, and pickup location.

### **Insights**


- Demand peaks during morning and evening hours, while supply does not scale proportionally during these periods.

- No Cars Available is the primary reason for trip failures, exceeding user-initiated cancellations.

- Airport pickups consistently experience higher failure rates compared to city pickups.

- Completion rates decline significantly during peak demand hours, indicating operational bottlenecks.

- Failure rates vary by hour, highlighting specific high-risk periods that require intervention.

### **Business Impact**

This EDA provides valuable insights into Uber’s operational challenges related to demand–supply imbalance.
The findings can help stakeholders improve driver allocation strategies, design time-based incentives, optimize airport operations, and reduce trip failures.
Overall, the analysis demonstrates how data-driven insights can enhance service reliability, customer satisfaction, and operational efficiency.

# **GitHub Link**

https://github.com/bhaveshk-25/uber-demand-supply-analysis.git

# **Problem Statement**


Uber often faces situations where customer demand exceeds available supply, leading to cancellations and unfulfilled requests.

The objective of this project is to analyze Uber trip request data to identify when and where supply shortages occur, understand reasons for trip failures, and provide actionable insights through dashboards.

#### **Define Your Business Objective?**

- Identify demand and supply patterns across time

- Analyze trip failure reasons (Cancelled, No Cars Available)

- Compare performance between Airport and City pickups

- Identify peak hours with high failure rates

- Provide actionable insights for operational improvement using dashboards

# **General Guidelines**  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
df = pd.read_csv("Uber Request Data.csv")

### Dataset First View

In [None]:
df.head()

### Dataset Rows & Columns count

In [None]:
df.shape

### Dataset Information

In [None]:
df.info()

#### Duplicate Values

In [None]:
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
df.isnull().sum()

### What did you know about your dataset?

After the initial data inspection, it is clear that the dataset is structured and well-formed, with 6,745 records capturing Uber trip requests.
The data contains meaningful time, location, and status variables, and while a few fields (Driver ID and Drop Timestamp) have missing values, these are expected and valid for unfulfilled trips, indicating no major data quality issues at this stage.
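
To confirm that the missing values are structural rather than a data-quality problem, the nulls can be broken down by trip status. The snippet below is a minimal sketch on a small hypothetical sample; `sample` and `missing_by_status` are illustrative names that do not appear in the notebook. On the real data the same function would be called with `df` before the timestamp columns are dropped.

```python
import pandas as pd

# Hypothetical sample rows that mimic the dataset's columns (not real data).
sample = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
    "Driver id": [1.0, 2.0, None, 3.0],
    "Drop timestamp": ["11/7/2016 13:00", None, None, "11/7/2016 18:47"],
})


def missing_by_status(frame: pd.DataFrame) -> pd.DataFrame:
    """Count missing 'Driver id' and 'Drop timestamp' values per trip status."""
    flags = frame[["Driver id", "Drop timestamp"]].isna()
    return flags.groupby(frame["Status"]).sum()


missing_counts = missing_by_status(sample)
print(missing_counts)

# If the gaps are structural, completed trips should have no missing values.
completed_missing = int(missing_counts.loc["Trip Completed"].sum())
print("Missing values on completed trips:", completed_missing)
```

If completed trips show zero missing values, the gaps can reasonably be treated as expected rather than imputed or dropped.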

## ***2. Understanding Your Variables***

In [None]:
df.columns

In [None]:
df.describe()

### Variables Description

| Variable Name     | Description                                                                                 |
|-------------------|---------------------------------------------------------------------------------------------|
| Request id        | Unique identifier for each Uber cab request                                                 |
| Pickup point      | Location of pickup, either City or Airport                                                  |
| Driver id         | Unique identifier of the driver assigned to the trip (missing when no driver was available) |
| Status            | Outcome of the request: Trip Completed, Cancelled, or No Cars Available                     |
| Request timestamp | Date and time of request                                                                    |
| Drop timestamp    | Date and time of drop                                                                       |

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
categorical_cols = df.select_dtypes(include='object').columns

for col in categorical_cols:
    print(f"{col}: {df[col].unique()}")

### **Dataset Description**
The dataset contains 6,745 records and 6 columns representing Uber cab requests.
Initial exploration was performed to understand the data structure, column types, missing values, and categorical distributions.
This step helped identify key variables such as trip status, pickup location, and time-based fields for further analysis.

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Convert timestamp columns to datetime
# (format='mixed' infers the layout of each value and requires pandas 2.0 or newer)
df['Request timestamp'] = pd.to_datetime(
    df['Request timestamp'],
    dayfirst=True,
    format='mixed'
)

df['Drop timestamp'] = pd.to_datetime(
    df['Drop timestamp'],
    dayfirst=True,
    format='mixed'
)
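
With `format='mixed'`, pandas infers the layout of each value separately. As an optional cross-check, the sketch below parses two layouts explicitly and counts the values that match neither. It assumes the raw timestamps use `DD/MM/YYYY HH:MM` and `DD-MM-YYYY HH:MM:SS`; that assumption, the sample values, and the helper name `parse_two_layouts` are hypothetical and should be verified against the actual file.

```python
import pandas as pd

# Hypothetical raw values in the two assumed layouts (not copied from the file).
raw = pd.Series(["11/7/2016 11:51", "13-07-2016 08:33:16", None])


def parse_two_layouts(values: pd.Series) -> pd.Series:
    """Parse each assumed layout explicitly and combine the two results."""
    slash = pd.to_datetime(values, format="%d/%m/%Y %H:%M", errors="coerce")
    dash = pd.to_datetime(values, format="%d-%m-%Y %H:%M:%S", errors="coerce")
    # Values that failed the first layout are filled from the second one.
    return slash.fillna(dash)


parsed = parse_two_layouts(raw)

# Non-null inputs that matched neither layout would indicate a third format.
unparsed = int((parsed.isna() & raw.notna()).sum())
print(parsed)
print("Values matching neither layout:", unparsed)
```

A non-zero count here would suggest that the file contains a layout the explicit formats do not cover, in which case the inferred parsing deserves a closer look.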

In [None]:
# Separate date and time from Request and Drop timestamps
df["Request Date"] = df["Request timestamp"].dt.date
df["Request Time"] = df["Request timestamp"].dt.time
df["Drop Date"] = df["Drop timestamp"].dt.date
df["Drop Time"] = df["Drop timestamp"].dt.time

In [None]:
# Extracting hour and day of Request
df['Request hour'] = df['Request timestamp'].dt.hour
df['Request day'] = df['Request timestamp'].dt.day_name()

In [None]:
# Dropping Request and Drop timestamps as they are separated
df.drop(columns=["Request timestamp","Drop timestamp"],inplace = True)

In [None]:
# Adding a column 'Trip Completed' which shows 1 if trip completed else 0
df["Trip Completed"] = np.where(df["Status"] == "Trip Completed",1,0)

In [None]:
# Updated columns with proper formats
df.head()

In [None]:
# Final cleaned dataset for analysis
df.shape

### What all manipulations have you done and insights you found?

- Data wrangling was performed to prepare the dataset for analysis.
- Date and time columns were converted into appropriate datetime formats, and new time-based features such as Request Hour and Day of Week were created to support temporal analysis.
- Categorical values were standardized, and missing values were checked to ensure data quality.
- These transformations made the dataset suitable for exploratory data analysis and dashboard creation.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:

plt.figure(figsize=(6,4))
sns.countplot(data=df, x='Status')
plt.title('Distribution of Trip Status')
plt.xlabel('Trip Status')
plt.ylabel('Number of Requests')
plt.show()

##### 1. Why did you pick the specific chart?

To understand the overall outcome of Uber requests and quantify how many trips are successfully completed versus failed.

##### 2. What is/are the insight(s) found from the chart?

- A significant portion of requests are not completed.

- Trip failures form a noticeable share of total demand.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. This insight highlights operational inefficiencies and helps Uber focus on improving fulfillment rates.

Negative growth insight:
High failure volume indicates lost revenue and poor customer experience if not addressed.
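
The counts in this chart can be converted into shares so that the failure volume is stated as a number rather than read off the bars. This is a minimal sketch using made-up status values; `statuses` and `status_share` are illustrative names, and on the real data the function would be called with `df['Status']`.

```python
import pandas as pd

# Hypothetical status values for illustration only (not the real distribution).
statuses = pd.Series(
    ["Trip Completed", "Cancelled", "No Cars Available",
     "Trip Completed", "No Cars Available"],
    name="Status",
)


def status_share(status: pd.Series) -> pd.Series:
    """Return each status as a percentage of all requests, rounded to one decimal."""
    return (status.value_counts(normalize=True) * 100).round(1)


shares = status_share(statuses)

# Everything that is not a completed trip counts as a failed request.
failure_share = round(100 - shares.get("Trip Completed", 0.0), 1)
print(shares)
print("Failed requests (%):", failure_share)
```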

#### Chart - 2

In [None]:
plt.figure(figsize=(6,4))
sns.countplot(data=df, x='Pickup point')
plt.title('Requests by Pickup Point')
plt.xlabel('Pickup Point')
plt.ylabel('Number of Requests')
plt.show()

##### 1. Why did you pick the specific chart?

To compare demand distribution between City and Airport pickup locations.

##### 2. What is/are the insight(s) found from the chart?

- Airport contributes a substantial share of total requests.

- Demand is not evenly distributed across locations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Helps optimize driver deployment by location.

#### Chart - 3

In [None]:
hourly_demand = df['Request hour'].value_counts().sort_index()

plt.figure(figsize=(8,4))
sns.lineplot(x=hourly_demand.index, y=hourly_demand.values)
plt.title('Hourly Demand Pattern')
plt.xlabel('Hour of Day')
plt.ylabel('Number of Requests')
plt.show()

##### 1. Why did you pick the specific chart?

To identify peak and off-peak demand hours throughout the day.

##### 2. What is/are the insight(s) found from the chart?

- Demand peaks during morning and evening hours.

- Demand drops significantly during late night and mid-day hours.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Enables better shift planning, surge pricing, and driver incentives.

Negative growth insight:
Failure to scale supply during peak hours results in missed revenue opportunities.

#### Chart - 4

In [None]:
plt.figure(figsize=(7,4))
sns.countplot(data=df, x='Request day',
              order=df['Request day'].value_counts().index)
plt.title('Requests by Day of Week')
plt.xlabel('Day')
plt.ylabel('Number of Requests')
plt.show()

##### 1. Why did you pick the specific chart?

- To analyze how demand varies across different days of the week.

##### 2. What is/are the insight(s) found from the chart?

- Certain weekdays experience higher request volumes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

- Yes. Supports weekly driver scheduling and demand forecasting.

#### Chart - 5

In [None]:
status_pickup = pd.crosstab(df['Pickup point'], df['Status'])

status_pickup.plot(kind='bar', stacked=True, figsize=(7,5))
plt.title('Trip Status by Pickup Point')
plt.xlabel('Pickup Point')
plt.ylabel('Number of Requests')
plt.show()

##### 1. Why did you pick the specific chart?

- To compare trip outcomes across City and Airport locations.

##### 2. What is/are the insight(s) found from the chart?

- Airport pickups show a higher proportion of failed trips.

- City pickups have better completion performance.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

- Yes. Highlights location-specific operational issues requiring targeted action.
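
Because the two pickup points may receive different request volumes, a stacked count chart can make proportions hard to compare. One way to check the "higher proportion" reading is to normalize the crosstab by row. The sketch below uses a hypothetical sample; `sample` and `status_share_by_pickup` are illustrative names, and the real data would be passed in as `df`.

```python
import pandas as pd

# Hypothetical rows for illustration only (not the real distribution).
sample = pd.DataFrame({
    "Pickup point": ["Airport"] * 4 + ["City"] * 4,
    "Status": [
        "Trip Completed", "No Cars Available", "No Cars Available", "Cancelled",
        "Trip Completed", "Trip Completed", "Cancelled", "No Cars Available",
    ],
})


def status_share_by_pickup(frame: pd.DataFrame) -> pd.DataFrame:
    """Percentage of each status within every pickup point (rows sum to 100)."""
    return pd.crosstab(frame["Pickup point"], frame["Status"], normalize="index") * 100


share_table = status_share_by_pickup(sample)
print(share_table.round(1))
```

Each row of the result sums to 100, so the columns can be compared directly across pickup points.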

#### Chart - 6

In [None]:
demand_supply = df.groupby('Request hour').agg(
    total_requests=('Request id', 'count'),
    completed_trips=('Status', lambda x: (x == 'Trip Completed').sum()))

plt.figure(figsize=(8,4))
plt.plot(demand_supply.index, demand_supply['total_requests'], label='Total Requests')
plt.plot(demand_supply.index, demand_supply['completed_trips'], label='Completed Trips')
plt.title('Demand vs Completed Trips by Hour')
plt.xlabel('Hour')
plt.ylabel('Number of Requests')
plt.legend()
plt.show()

##### 1. Why did you pick the specific chart?

- To directly compare customer demand with actual trip fulfillment over time.

##### 2. What is/are the insight(s) found from the chart?

- Completed trips do not scale proportionally with demand during peak hours.

- A visible demand–supply gap exists.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

- Yes. Helps identify hours requiring immediate operational intervention.

- Negative growth insight:  Persistent supply gaps during peak demand lead to revenue loss and customer churn.
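
The gap visible between the two lines can also be computed explicitly, which makes it easier to name the worst hour. The following is a sketch under the assumption that the gap is defined as total requests minus completed trips; the sample rows and the names `hourly_gap` and `worst_hour` are hypothetical.

```python
import pandas as pd

# Hypothetical rows for illustration only (not the real distribution).
sample = pd.DataFrame({
    "Request hour": [5, 5, 5, 12, 12, 18, 18, 18, 18],
    "Status": [
        "Trip Completed", "Cancelled", "Cancelled",
        "Trip Completed", "Trip Completed",
        "Trip Completed", "No Cars Available", "No Cars Available", "No Cars Available",
    ],
})


def hourly_gap(frame: pd.DataFrame) -> pd.DataFrame:
    """Total requests, completed trips, and their difference for each hour."""
    summary = (
        frame.assign(completed=frame["Status"].eq("Trip Completed"))
        .groupby("Request hour")
        .agg(total_requests=("completed", "size"), completed_trips=("completed", "sum"))
    )
    summary["gap"] = summary["total_requests"] - summary["completed_trips"]
    return summary


gap_table = hourly_gap(sample)
worst_hour = gap_table["gap"].idxmax()  # hour with the largest unmet demand
print(gap_table)
print("Hour with the largest gap:", worst_hour)
```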

#### Chart - 7

In [None]:
failure_hour = df[df['Status'] != 'Trip Completed'] \
    .groupby(['Request hour','Status']).size().unstack(fill_value=0)

failure_hour.plot(kind='bar', stacked=True, figsize=(9,5))
plt.title('Failure Analysis by Hour')
plt.xlabel('Hour of Day')
plt.ylabel('Number of Failed Requests')
plt.show()

##### 1. Why did you pick the specific chart?

- To identify when trip failures occur and understand their timing.

##### 2. What is/are the insight(s) found from the chart?

- Failures peak during high-demand hours.

- No Cars Available is the dominant failure reason.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

- Yes. Helps Uber proactively manage risk hours.

- Negative growth insight:
High failure concentration during peak hours directly affects customer trust.

#### Chart - 8

In [None]:
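# Failure rate per hour: mean of a boolean "failed" flag, expressed as a percentage.
# Grouping the flag directly avoids the groupby.apply deprecation warning in recent pandas.
failure_rate = (
    df['Status'].ne('Trip Completed')
      .groupby(df['Request hour'])
      .mean() * 100
)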

plt.figure(figsize=(8,4))
sns.lineplot(x=failure_rate.index, y=failure_rate.values)
plt.title('Failure Rate (%) by Hour')
plt.xlabel('Hour of Day')
plt.ylabel('Failure Rate (%)')
plt.show()

##### 1. Why did you pick the specific chart?

- To normalize failures relative to demand and identify high-risk hours.

##### 2. What is/are the insight(s) found from the chart?

- Some hours have disproportionately high failure rates.

- Failure probability varies significantly across the day.

##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

- Yes. Enables preventive actions such as targeted incentives.

#### Chart - 9

In [None]:
pivot_multi = pd.pivot_table(
    df,
    values='Request id',
    index='Request hour',
    columns=['Pickup point', 'Status'],
    aggfunc='count',
    fill_value=0
)

pivot_multi.plot(figsize=(10,5))
plt.title('Trip Status by Hour and Pickup Point')
plt.xlabel('Hour')
plt.ylabel('Number of Requests')
plt.show()

##### **1. Why did you pick the specific chart?**

- To analyze the combined impact of time and location on trip outcomes.

##### **2. What is/are the insight(s) found from the chart?**

- Airport failures intensify during peak hours.

- Time and location together significantly influence performance.

##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

- Yes. Supports location-specific planning during critical hours.

#### Chart - 10

In [None]:
# Calculate completion rate (%) by hour and pickup point
completion_rate = (
    df.assign(completed = df['Status'] == 'Trip Completed')
      .groupby(['Request hour', 'Pickup point'])
      .agg(completion_rate_pct=('completed', 'mean'))
      .mul(100)  # convert the 0-1 fraction to a percentage to match the column name
      .reset_index()
)

# Plot one line per pickup point
plt.figure(figsize=(9,5))
sns.lineplot(
    data=completion_rate,
    x='Request hour',
    y='completion_rate_pct',
    hue='Pickup point'
)

plt.title('Completion Rate by Hour and Pickup Point')
plt.xlabel('Hour of Day')
plt.ylabel('Completion Rate (%)')
plt.ylim(0, 100)
plt.show()

##### **1. Why did you pick the specific chart?**

- To evaluate service quality by comparing completion rates instead of raw counts.

##### **2. What is/are the insight(s) found from the chart?**

- Completion rates drop during peak hours.

- Airport pickups underperform City pickups in the evening and late at night.

##### **3. Will the gained insights help creating a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

- Low completion rates during high-demand periods indicate poor service reliability and lost revenue.


## **5. Solution to Business Objective**

#### **What do you suggest the client to achieve Business Objective ?**

Based on the exploratory data analysis and visualizations, the primary business challenge identified is a demand–supply mismatch, particularly during peak hours and at airport pickup locations. Most trip failures are driven by vehicle unavailability rather than user cancellations.

To address this, Uber should:

- Increase driver availability during peak hours by introducing time-based incentives and surge pricing.

- Deploy location-specific strategies for airport pickups, such as dedicated airport driver pools or priority queuing.

- Use failure-rate metrics to proactively identify high-risk hours and adjust supply before service degradation occurs.

- Continuously monitor completion and failure rates through dashboards to enable real-time operational decisions.

Implementing these solutions can reduce trip failures, improve customer satisfaction, and increase completed trips, directly contributing to positive business growth and operational efficiency.
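
The recommendation to flag high-risk hours from failure-rate metrics could be implemented along the following lines. This is only a sketch: the threshold values, the sample rows, and the function name `high_risk_hours` are assumptions chosen for illustration and would need tuning against the real data. A minimum request count is included so that hours with very few requests do not get flagged on the basis of one or two failures.

```python
import pandas as pd

# Hypothetical thresholds; tune them against the real data.
FAILURE_RATE_THRESHOLD = 60.0  # percent
MIN_REQUESTS = 3               # ignore hours with too few requests

# Hypothetical rows for illustration only (not the real distribution).
sample = pd.DataFrame({
    "Request hour": [5, 5, 5, 12, 12, 12, 12, 18, 18, 18, 18, 23],
    "Status": [
        "Cancelled", "Cancelled", "Trip Completed",
        "Trip Completed", "Trip Completed", "Trip Completed", "Cancelled",
        "No Cars Available", "No Cars Available", "No Cars Available", "Trip Completed",
        "No Cars Available",
    ],
})


def high_risk_hours(
    frame: pd.DataFrame,
    rate_threshold: float = FAILURE_RATE_THRESHOLD,
    min_requests: int = MIN_REQUESTS,
) -> pd.DataFrame:
    """Return hours whose failure rate and request volume both meet the thresholds."""
    stats = (
        frame.assign(failed=frame["Status"].ne("Trip Completed"))
        .groupby("Request hour")
        .agg(requests=("failed", "size"), failure_rate_pct=("failed", "mean"))
    )
    stats["failure_rate_pct"] = stats["failure_rate_pct"] * 100
    mask = (stats["failure_rate_pct"] >= rate_threshold) & (stats["requests"] >= min_requests)
    return stats[mask].sort_values("failure_rate_pct", ascending=False)


flagged = high_risk_hours(sample)
print(flagged.round(1))
```

In this sample, hour 23 has a 100% failure rate but only one request, so it is excluded by the volume filter while hours 18 and 5 are flagged.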

# **Conclusion**

This project successfully analyzed Uber’s cab request data to identify key demand–supply gaps and operational inefficiencies. Through exploratory data analysis, it was observed that trip failures are primarily driven by vehicle unavailability, especially during peak demand hours and at airport pickup locations.

The analysis highlighted clear temporal and location-based patterns affecting trip completion rates. By leveraging Python for data preparation, SQL for analytical insights, and Excel for interactive dashboarding, the project demonstrates how data-driven analysis can support better resource allocation, improved service reliability, and enhanced customer experience.

Overall, the insights derived from this analysis can help Uber optimize driver deployment, reduce cancellations, and improve fulfillment rates, contributing to sustainable operational and business growth.