<a href="https://colab.research.google.com/github/Abhinair26/Uber-Supply-and-Demand/blob/main/EDA_Uber_Supply_and_Demand_(Abhijith_B_Nair).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Uber Supply and Demand



##### **Project Type**    - EDA
##### **Contribution**    - Abhijith B Nair


# **Project Summary -**



---

**Project Summary: Uber Supply-Demand Gap Analysis**

This project focuses on analyzing Uber request data to identify and understand the supply-demand gap in ride availability. The objective is to uncover patterns in customer ride requests, cancellations, and driver availability, ultimately offering data-driven insights and recommendations to improve Uber’s service operations. The analysis is based on a real-world dataset containing records of ride requests made to Uber, including key variables such as pickup point, request timestamp, drop-off timestamp, ride status, and more.

---







# **GitHub Link -**

https://github.com/Abhinair26/Uber-Supply-and-Demand

# **Problem Statement**


The rapid growth of ride-hailing services like Uber has revolutionized urban transportation, offering convenience, affordability, and flexibility to millions of users. However, maintaining a seamless user experience depends heavily on the balance between rider demand and driver availability. When demand exceeds supply, users face longer wait times, increased cancellation rates, and potential revenue loss for the platform.

This project aims to analyze and understand the supply-demand gap in Uber requests using real-world data. The dataset contains information about ride requests made to Uber in a metropolitan area, including timestamps, pickup points, driver availability status, and trip status (completed, cancelled, or no cars available).

#### **Define Your Business Objective?**

The core objective is to:

1.   Identify patterns and trends in trip requests by time of day, day of the week, and pickup location.
2.   Detect peak demand hours and areas where the supply of drivers consistently fails to meet user demand.
3.   Analyze the reasons behind unfulfilled requests, such as high cancellation rates or no car availability.
4.   Generate actionable insights that can help Uber optimize driver distribution and improve customer satisfaction.






# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Importing necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load the dataset
df = pd.read_csv('Uber Request Data.csv')

### Dataset First View

In [None]:
# Show initial info
print("Initial DataFrame Info:")
df.info()  # info() prints its report directly and returns None, so it is not wrapped in print()
print("\nMissing values:\n", df.isnull().sum())

### Dataset Information

In [None]:
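# Extract features (the column must be datetime before the .dt accessor can be used)
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'], dayfirst=True, errors='coerce')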
df['Request hour'] = df['Request timestamp'].dt.hour
df['Request day'] = df['Request timestamp'].dt.date
df['Request month'] = df['Request timestamp'].dt.month
df['Request weekday'] = df['Request timestamp'].dt.day_name()
df['Time slot'] = pd.cut(df['Request hour'],
                         bins=[0, 4, 8, 12, 16, 20, 24],
                         labels=['Late Night', 'Early Morning', 'Morning', 'Afternoon', 'Evening', 'Night'],
                         right=False)


**Converting values**

In [None]:
# Convert dates
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'], dayfirst=True, errors='coerce')
df['Drop timestamp'] = pd.to_datetime(df['Drop timestamp'], dayfirst=True, errors='coerce')
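
A note on parsing: with `errors='coerce'`, any timestamp that does not match the inferred format silently becomes `NaT` and is later dropped. The sketch below shows one way to guard against that, assuming the raw column mixes a `day/month/year hour:minute` style with a `day-month-year hour:minute:second` style. Both format strings and the sample values are assumptions for illustration and should be checked against the actual file.

```python
import pandas as pd

# Hypothetical sample mimicking two timestamp styles (an assumption about the raw file)
raw = pd.Series(["11/7/2016 11:51", "13-07-2016 08:33:16", "not a date"])

# Parse each assumed format explicitly; rows that do not match become NaT
slash_style = pd.to_datetime(raw, format="%d/%m/%Y %H:%M", errors="coerce")
dash_style = pd.to_datetime(raw, format="%d-%m-%Y %H:%M:%S", errors="coerce")

# Keep the first successful parse for every row
parsed = slash_style.fillna(dash_style)

print(parsed)
print("Unparsed rows:", int(parsed.isna().sum()))
```

Counting the remaining `NaT` values before dropping rows shows how many requests would be lost to parsing rather than to genuinely missing data.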

#### Missing Values/Null Values

In [None]:
# Drop rows with missing request timestamps
df = df.dropna(subset=['Request timestamp'])

### What did you know about your dataset?


---

## 📊 **Dataset Summary: Uber Request Data**

This dataset captures **ride request information** from **Uber** in **Bangalore** over a period of **several days in July 2016**. It is intended for analyzing operational challenges and service performance, particularly at two major locations: **Airport** and **City**.

---

### 📁 **File Name:**

`Uber Request Data.csv`

### 🔢 **Rows (Records):**

6745

### 🧾 **Columns and Description:**

| Column Name         | Description                                                                            |
| ------------------- | -------------------------------------------------------------------------------------- |
| `Request id`        | Unique identifier for each ride request                                                |
| `Pickup point`      | Location from where the ride was requested – either **City** or **Airport**            |
| `Driver id`         | Unique identifier for the driver (may be blank if no driver was assigned)              |
| `Status`            | Final status of the ride – **Trip Completed**, **Cancelled**, or **No Cars Available** |
| `Request timestamp` | Time when the ride was requested                                                       |
| `Drop timestamp`    | Time when the trip ended (only present if the trip was completed)                      |

---

## 🔍 **What We Can Understand from the Dataset**

### ✅ **Primary Objectives & Analysis Goals:**

* Identify **supply-demand gaps**.
* Analyze **driver availability patterns**.
* Understand **ride request behavior** at different times of the day.
* Study cancellation trends and “No cars available” scenarios.
* Derive **actionable insights** for operations improvement.

---


## ***2. Understanding Your Variables***

### Variables Description


---

### 🚕 **Uber Request Data - Variable Description**

| **Variable**        | **Description**                                                                                       |
| ------------------- | ----------------------------------------------------------------------------------------------------- |
| `Request id`        | A unique identifier assigned to each ride request.                                                    |
| `Pickup point`      | The pickup location category for the request: either `City` or `Airport`.                             |
| `Driver id`         | The unique ID of the driver who accepted the request (blank if not assigned).                         |
| `Status`            | The current status of the request:<br> - `Trip Completed`<br> - `Cancelled`<br> - `No Cars Available` |
| `Request timestamp` | The date and time when the ride was requested by the customer.                                        |
| `Drop timestamp`    | The date and time when the trip was completed (or ended).                                             |

---



## 3. ***Data Wrangling***

### What all manipulations have you done and insights you found?


---

## ✅ **Data Manipulations Performed**

### 1. **Date-Time Cleaning**

* Standardized both `Request timestamp` and `Drop timestamp` into datetime format.
* Extracted new columns:

  * `Request hour`
  * `Request day`
  * `Request month`
  * `Request weekday`
  * `Time slot` (e.g., Morning, Afternoon, Evening, Night)

### 2. **Missing Data Handling**

* Found and handled missing `Driver id` and `Drop timestamp` entries.
* Marked null `Driver id` values as part of the `No Cars Available` issue.
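
The link between missing values and request status can be checked directly. The sketch below counts nulls per status on a few hypothetical rows (the values are made up for illustration; only the column names come from the dataset).

```python
import pandas as pd

# Hypothetical rows for illustration only
sample = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "No Cars Available"],
    "Driver id": [12.0, 7.0, None, None],
    "Drop timestamp": ["2016-07-11 13:00", None, None, None],
})

# Count missing values per status for the two sparsely filled columns
missing_by_status = (
    sample[["Driver id", "Drop timestamp"]]
    .isna()
    .groupby(sample["Status"])
    .sum()
)

print(missing_by_status)
```

If the same pattern holds on the full data, a null `Driver id` can be read as "no driver was assigned" rather than as a data quality problem.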

### 3. **Derived Columns**

* `Trip Duration` can be derived by subtracting `Request timestamp` from `Drop timestamp` for completed trips (the code cells above do not compute it).
* Segmented `Pickup point` into `City` and `Airport`.
* Added `Request weekday` and `Time slot` for analysis.
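
A minimal sketch of the trip duration calculation, assuming both timestamp columns are already in datetime format. The sample rows and the column name `Trip duration (min)` are hypothetical.

```python
import pandas as pd

# Hypothetical rows: one completed trip and one request without a drop time
sample = pd.DataFrame({
    "Request timestamp": pd.to_datetime(["2016-07-11 11:51:00", "2016-07-11 17:57:00"]),
    "Drop timestamp": pd.to_datetime(["2016-07-11 13:00:00", None]),
})

# Drop time minus request time, expressed in minutes; missing drops stay NaN
sample["Trip duration (min)"] = (
    sample["Drop timestamp"] - sample["Request timestamp"]
).dt.total_seconds() / 60

print(sample)
```

Requests that were cancelled or had no car available have no drop time, so their duration stays empty instead of raising an error.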

---

## 📊 **Insights from Exploratory Data Analysis (EDA)**

### 1. **Request Patterns**

* **Most ride requests** were made during **morning (5–10 AM)** and **evening (5–9 PM)** hours.
* **High number of requests** were made **from the city to the airport in the morning**, and **from the airport to the city in the evening**.

### 2. **Supply-Demand Gap**

* There’s a **significant gap between demand and completed trips** in peak hours.

  * **Morning Peak (City → Airport)**: Large number of requests, but many end as **"Cancelled"**.
  * **Evening Peak (Airport → City)**: High number of **"No Cars Available"** requests.

### 3. **Cancellation and Availability Issues**

* **Cancellations** were highest for **city to airport** trips during **morning hours**.
* **No cars available** was most common for **airport to city** trips during **evening hours**.

### 4. **Driver Behavior Insight**

* Many requests lacked a `Driver id`, showing **drivers were not accepting trips** during high-demand times.
* Suggests **imbalanced supply** — possibly due to shift timing preferences or ride direction profitability.

---

## 📌 Key Problems Identified

| Problem               | Time              | Pickup Point | Likely Cause    |
| --------------------- | ----------------- | ------------ | --------------- |
| 🚫 High Cancellations | Morning (5–10 AM) | City         | Driver Refusal  |
| 🚫 No Cars Available  | Evening (5–9 PM)  | Airport      | Supply Shortage |

---


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# 1. Plot request status distribution
plt.figure(figsize=(6,4))
sns.countplot(data=df, x='Status', palette='Set2')
plt.title('Request Status Distribution')
plt.show()

##### 1. Why did you pick the specific chart?

I selected the count plot (bar chart using sns.countplot) for visualizing the Request Status Distribution because it is the most effective way to show how frequently each status category occurs in a categorical variable like Status.

##### 2. What is/are the insight(s) found from the chart?

📊 Insights from this chart:

* **High number of incomplete trips:** The number of trips marked as Cancelled or No Cars Available is significantly high, indicating operational inefficiencies or a supply-demand mismatch.
* **Low fulfillment rate:** If Trip Completed is significantly lower than the other statuses, it suggests that many customers are unable to get rides, possibly due to driver unavailability.
* **Dominant failure mode:** Comparing the heights of the bars for Cancelled vs No Cars Available shows which issue is more frequent:

  * A high number of Cancelled trips could imply rider impatience or driver behavior.
  * A high number of No Cars Available trips may reflect a lack of available drivers during peak hours or in certain locations.
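
The bar heights can be turned into percentages to make the fulfillment rate explicit. This is a small sketch on a hypothetical `Status` column (the five values are invented). On the real data the same two lines would run on `df['Status']`.

```python
import pandas as pd

# Hypothetical statuses for illustration only
status = pd.Series(
    ["Trip Completed", "Cancelled", "No Cars Available",
     "Trip Completed", "No Cars Available"],
    name="Status",
)

# Share of each status as a percentage of all requests
share = status.value_counts(normalize=True).mul(100).round(1)

# Fulfillment rate = share of requests that ended as a completed trip
fulfilment_rate = share.get("Trip Completed", 0.0)

print(share)
print("Fulfillment rate (%):", fulfilment_rate)
```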

##### 3. Will the gained insights help creating a positive business impact?


---

### ✅ 1. **High Number of Cancellations and “No Cars Available” Requests**

**Business Impact**:

* **Operational Optimization**: Indicates either a **supply-demand mismatch** or a poor driver allocation algorithm.
* **Actionable Strategy**: Uber can incentivize drivers during peak hours or adjust dynamic pricing to meet demand.
* **Business Result**: Increases **ride completion rate** and **revenue**, reduces **customer churn**.

---

### ✅ 2. **Time Slot Analysis (Hourly Trend)**

**Insight**: Peaks in early morning (e.g., 5–9 AM) and evening (e.g., 5–9 PM).
**Business Impact**:

* **Workforce Planning**: Better planning of driver shifts during peak hours.
* **Customer Experience**: Reduces waiting time and trip cancellations.
* **Business Result**: Boosts **customer satisfaction** and loyalty.

---

### ✅ 3. **Location-Based Insights (Pickup Point)**

**Insight**: City vs. Airport trips behave differently in status.
**Business Impact**:

* **Geo-Targeted Driver Allocation**: Allocate more drivers to high-demand or high-cancellation areas like the airport.
* **Business Result**: Improves **efficiency**, enhances **profit per trip**.

---

### ✅ 4. **Trip Status Patterns**

**Insight**: "No Cars Available" more common during night hours.
**Business Impact**:

* **Incentive Programs**: Special bonuses for night-shift drivers to stay online.
* **Business Result**: Converts missed opportunities into completed rides, increasing **utilization rate**.

---

### ✅ 5. **Data-Driven Policy Making**

* Example: If cancellations are mostly driver-initiated, policies can penalize unnecessary cancellations or introduce passenger verification steps.

---



#### Chart - 2

In [None]:
# 2. Plot frequency by pickup point
plt.figure(figsize=(6,4))
sns.countplot(data=df, x='Pickup point', hue='Status', palette='Set1')
plt.title('Pickup Point vs Status')
plt.show()

##### 1. Why did you pick the specific chart?

I chose the countplot with hue to plot "Pickup point" vs "Status" because it is the most effective and intuitive way to compare categorical data distributions, especially when segmented by a secondary category.

##### 2. What is/are the insight(s) found from the chart?

🔍 Insights from the Chart:

* **High Demand at the Airport:**

  * The Airport accounts for a large share of all requests.
  * This suggests that passengers arriving at the airport are frequently booking cabs.

* **More ‘No Cars Available’ at Airport:**

  * A large portion of airport requests result in ‘No Cars Available’ status.
  * This indicates that cab supply at the airport is insufficient to meet the high demand.

* **More Cancellations from the City:**

  * In contrast, the City shows a higher count of cancellations, likely due to drivers cancelling the trip.
  * This may be due to longer trip distances, traffic, or unappealing drop locations.

* **Completed Trips More Balanced:**

  * The number of Completed trips is more evenly distributed across both pickup points, but still slightly higher for the City.
  * Suggests that trips originating in the city are more likely to be fulfilled if not cancelled.
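
The same comparison can be read as numbers with a cross-tabulation. The sketch below uses six hypothetical requests. `normalize="index"` converts each pickup point's row into shares, so the two locations can be compared even if their totals differ.

```python
import pandas as pd

# Hypothetical requests for illustration only
sample = pd.DataFrame({
    "Pickup point": ["City", "City", "City", "Airport", "Airport", "Airport"],
    "Status": ["Cancelled", "Trip Completed", "Cancelled",
               "No Cars Available", "No Cars Available", "Trip Completed"],
})

# Raw counts per pickup point and status (the numbers behind the grouped bars)
counts = pd.crosstab(sample["Pickup point"], sample["Status"])

# Row-wise shares: what fraction of each pickup point's requests had each status
row_share = pd.crosstab(
    sample["Pickup point"], sample["Status"], normalize="index"
).round(2)

print(counts)
print(row_share)
```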

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


---

### ✅ **1. Demand-Supply Mismatch Identification**

By analyzing the **Status** of ride requests (Completed, Cancelled, No Cars Available), especially across **pickup points (City vs. Airport)** and **hours of the day**, we uncover:

* **Peak hours with high cancellations or no availability**
* **Pickup zones needing better car allocation**

**Impact:**
Uber can **optimize driver allocation** and **incentivize driver availability** during high-demand hours to reduce customer churn.

---

### ✅ **2. Improved Driver Deployment**

The frequency plots reveal **temporal and spatial trends** in ride requests and failures:

* For instance, higher cancellations from the **City during morning hours** might indicate **driver reluctance** or **traffic congestion**.

**Impact:**
Helps design **driver training, dynamic pricing, or route-based incentives** to ensure better service coverage.

---

### ✅ **3. Operational Efficiency**

Identifying patterns such as:

* More **"No Cars Available"** at the airport during late hours
* High **ride cancellation rate** from certain locations

**Impact:**
Uber can refine its **forecasting models**, prepare **backup fleet strategies**, or introduce **shuttle/pooled options** to fill service gaps.

---

### ✅ **4. Enhanced Customer Experience**

By addressing the bottlenecks exposed through data:

* Fewer cancellations
* Higher likelihood of finding a ride

**Impact:**
Boosts **customer retention, satisfaction, and app ratings**.

---

### ✅ **5. Data-Driven Decision Making**

The entire EDA helps management back decisions with **real usage data**, like:

* Whether to open new driver hubs
* How to balance city vs. airport coverage
* When to apply surge pricing

---



#### Chart - 3

In [None]:
# Extract hour from Request timestamp
df['Request hour'] = pd.to_datetime(df['Request timestamp']).dt.hour

# Plot hourly demand
plt.figure(figsize=(10,5))
sns.countplot(data=df, x='Request hour', hue='Status', palette='coolwarm')
plt.title('Hourly Request Distribution')
plt.xticks(rotation=0)
plt.show()

##### 1. Why did you pick the specific chart?

A count plot was selected because it effectively compares the number and status of ride requests across different hours, helping to uncover operational inefficiencies, peak load times, and when service disruptions (like "No Cars Available") are most likely to happen.

##### 2. What is/are the insight(s) found from the chart?

📊 Insights from the chart:

* **High Morning Demand (7–9 AM):**

  * There's a noticeable spike in requests.
  * A large portion of these end as "Cancelled", in line with the high cancellation count for City pickups in Chart 2.
  * Indicates issues like driver cancellations, possibly due to traffic or unprofitable routes, during office commute time.

* **Evening Peak (5–9 PM):**

  * Another major demand spike.
  * Significant number of "No Cars Available" requests, in line with the Airport shortfall in Chart 2.
  * Suggests a supply-demand mismatch: too few cars are on hand when evening demand peaks.

* **Late Night to Early Morning (12 AM – 5 AM):**

  * Very few requests, as expected given low commuter activity.
  * Most are either "Completed" or "No Cars Available" depending on the time.

* **Afternoon Dip (12 PM – 4 PM):**

  * Lower demand, fewer ride failures.
  * Uber’s system handles this time well.
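
One way to quantify the gap per hour is to treat every request as demand and every completed trip as supply. The sketch below does this on hypothetical rows. The column names `demand`, `supply` and `gap` are illustrative choices, not part of the dataset.

```python
import pandas as pd

# Hypothetical requests for illustration only
sample = pd.DataFrame({
    "Request hour": [8, 8, 8, 14, 18, 18, 18],
    "Status": ["Cancelled", "Trip Completed", "No Cars Available", "Trip Completed",
               "No Cars Available", "No Cars Available", "No Cars Available"],
})

# Demand = all requests in the hour, supply = requests that ended as completed trips
hourly = (
    sample.assign(completed=sample["Status"].eq("Trip Completed"))
    .groupby("Request hour")
    .agg(demand=("Status", "size"), supply=("completed", "sum"))
)

# Gap = requests that were not served
hourly["gap"] = hourly["demand"] - hourly["supply"]
worst_hour = hourly["gap"].idxmax()

print(hourly)
print("Hour with the largest gap:", worst_hour)
```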

##### 3. Will the gained insights help creating a positive business impact?


---

### 1. **Understanding Demand Patterns**

* By analyzing **hourly request distribution**, the company can clearly identify **peak demand hours** (e.g., morning office hours or evening rush).
* **Impact:** Helps optimize driver allocation during high-demand hours, reducing wait times and improving customer satisfaction.

---

### 2. **Pickup Point Analysis**

* The **Pickup Point vs Status** chart highlights where most ride requests are being made (e.g., Airport or City) and where cancellations or unavailability occur most often.
* **Impact:**

  * Allocate more drivers to areas with frequent ride cancellations or unfulfilled requests.
  * Tailor pricing and promotions to specific locations.

---

### 3. **Request Status Distribution**

* Knowing the proportion of **Completed, Cancelled, and No Cars Available** requests allows the business to measure service reliability.
* **Impact:**

  * Reduce cancellations by identifying and addressing reasons (e.g., long wait times, fewer drivers).
  * Increase customer retention by improving service availability.

---

### 4. **Driver Scheduling Optimization**

* Combining all the insights (time of day, pickup points, and status) allows for **smarter shift planning** for drivers.
* **Impact:**

  * Reduces idle driver time during low-demand periods.
  * Boosts driver earnings and operational efficiency.

---

### 5. **Strategic Decision Making**

* The data reveals **bottlenecks** in the service funnel – such as when many requests go unfulfilled due to lack of cars.
* **Impact:**

  * Invest in more drivers or dynamic pricing to manage surges.
  * Introduce features like ride pre-booking during peak hours.

---



#### Chart - 4

In [None]:
# Extract day from Request timestamp
df['Request day'] = pd.to_datetime(df['Request timestamp']).dt.date

# 4. Demand over time
daily_status = df.groupby(['Request day', 'Status']).size().unstack().fillna(0)
daily_status.plot(kind='bar', stacked=True, figsize=(14,6))
plt.title('Daily Request Status Trend')
plt.ylabel('Request Count')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A stacked bar chart by day and status is ideal to show how demand and fulfillment vary across days, helping stakeholders spot patterns, trends, and potential service gaps over time.

##### 2. What is/are the insight(s) found from the chart?

**📌 1. Peak Demand Days**

* Days with taller bars represent high request volumes.
* You can spot which days had the highest demand, useful for resource planning (e.g., more drivers needed).

**📌 2. Service Efficiency**

* A larger Trip Completed segment indicates better service performance on that day (check the legend for the segment colors).
* If the Trip Completed segment is consistently small, it suggests low fulfillment rates.

**📌 3. Cancellations & Unavailability Trends**

* Larger Cancelled or No Cars Available segments mean operational issues.
* Example insights:

  * Certain days have high cancellation rates (could be due to traffic, weather, or driver no-shows).
  * Frequent "No Cars Available" indicates supply-demand imbalance (demand exceeded supply).

**📌 4. Daily Pattern Clusters**

* You might notice that weekends or weekdays follow different patterns.
* E.g., weekends may have high demand but low fulfillment.
* Or weekdays may have more “Trip Completed” statuses.

**📌 5. Potential Improvement Areas**

* Days where "No Cars Available" dominates are opportunities to deploy more drivers or improve coverage.
* Cancellations clustering on specific dates may need policy changes or incentives for drivers.
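
The daily comparison can be summarised as a fulfillment percentage per date, with the weekday name attached so that weekday and weekend patterns can be checked against the dates actually present in the data. The rows below are hypothetical and `fulfilment_pct` is an illustrative column name.

```python
import pandas as pd

# Hypothetical requests for illustration only
sample = pd.DataFrame({
    "Request timestamp": pd.to_datetime([
        "2016-07-11 08:00", "2016-07-11 18:00",
        "2016-07-12 09:00", "2016-07-12 19:00",
    ]),
    "Status": ["Trip Completed", "Cancelled", "Trip Completed", "Trip Completed"],
})

# Share of completed trips per calendar day, labelled with the weekday name
daily = (
    sample.assign(
        day=sample["Request timestamp"].dt.date,
        weekday=sample["Request timestamp"].dt.day_name(),
        completed=sample["Status"].eq("Trip Completed"),
    )
    .groupby(["day", "weekday"])["completed"]
    .mean()
    .mul(100)
    .rename("fulfilment_pct")
    .reset_index()
)

print(daily)
```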

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

💡 How These Insights Can Drive Positive Business Impact

* **Better Resource Planning**

  * Allocate more drivers on high-demand days (e.g., weekends or holidays).
  * Prevent "No Cars Available" issues by forecasting supply needs.

* **Improve Customer Experience**

  * Reducing cancellation rates and supply gaps directly enhances reliability.
  * Satisfied customers are more likely to rebook and refer others.

* **Boost Revenue**

  * Meeting demand consistently means fewer lost trips, leading to higher revenue.

* **Operational Improvements**

  * Identify internal or external factors leading to cancellations and address them (e.g., incentives for drivers, app/UX fixes, etc.).

* **Strategic Planning**

  * Use patterns over time to align with marketing, pricing, and staffing strategies.

#### Chart - 5

In [None]:
# Create a 'Time slot' column from 'Request hour'
def get_time_slot(hour):
    if 5 <= hour < 10:
        return 'Morning'
    elif 10 <= hour < 17:
        return 'Day'
    elif 17 <= hour < 22:
        return 'Evening'
    else:
        return 'Late Night'

df['Time slot'] = df['Request hour'].apply(get_time_slot)

# 5. Time slot vs status
plt.figure(figsize=(8,5))
sns.countplot(data=df, x='Time slot', hue='Status', palette='muted', order=['Morning', 'Day', 'Evening', 'Late Night'])
plt.title('Request Status by Time Slot')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

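A grouped count plot (`sns.countplot` with `hue='Status'`) places the three request outcomes side by side within each time slot, which makes it the most direct way to see how request outcomes vary by time of day.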

##### 2. What is/are the insight(s) found from the chart?

📊 Insights:

* **Morning Rush (5 AM - 10 AM):**

  * High number of "Cancelled" requests: drivers cancel more frequently during this time.
  * Indicates possible driver-side issues like traffic or selective acceptance.

* **Evening Rush (5 PM - 10 PM):**

  * High number of "No Cars Available": many requests are not fulfilled due to unavailability of cars.
  * Suggests demand > supply in the evening.

* **Late Night (10 PM - 5 AM):**

  * Fewer total requests, but a higher proportion of "No Cars Available".
  * Could reflect limited driver availability during off-peak hours.

* **Day (10 AM - 5 PM):**

  * Mostly completed trips, low issues.
  * Indicates a relatively balanced demand-supply during this time.
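
The slot boundaries used in `get_time_slot` can also be applied without a row-by-row `apply`. The sketch below is an equivalent vectorized version using `numpy.select`, shown on a hypothetical series of hours chosen to hit each boundary.

```python
import numpy as np
import pandas as pd

# Hypothetical request hours covering every slot boundary
hours = pd.Series([3, 5, 9, 10, 16, 17, 21, 22], name="Request hour")

# Same boundaries as get_time_slot: 5-9 Morning, 10-16 Day, 17-21 Evening
# (Series.between is inclusive on both ends by default)
conditions = [hours.between(5, 9), hours.between(10, 16), hours.between(17, 21)]
labels = ["Morning", "Day", "Evening"]

# Anything outside those ranges (22-23 and 0-4) falls back to Late Night
slots = pd.Series(
    np.select(conditions, labels, default="Late Night"),
    index=hours.index,
    name="Time slot",
)

print(pd.concat([hours, slots], axis=1))
```

Checking the boundary hours this way confirms that hour 10 belongs to `Day` and hour 22 to `Late Night`, matching the ranges quoted in the insights.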

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**✅ 1. Addressing Supply-Demand Gaps**

* Evening time slot shows high "No Cars Available" rates.
* 📈 Impact: Uber can increase driver incentives or rebalance supply during these hours, leading to more completed rides, higher revenue, and improved customer satisfaction.

**✅ 2. Reducing Cancellations**

* Morning time slot has a spike in cancellations.
* 📉 Impact: Uber can investigate causes (e.g., traffic, fatigue, driver preferences) and apply policy adjustments or dynamic pricing to reduce cancellations. This improves reliability and retains customers.

**✅ 3. Optimizing Operations**

* Day time slot is relatively smooth.
* 🧘 Impact: Uber can keep operations stable here and possibly reallocate drivers to other time slots, improving overall fleet efficiency.

**✅ 4. Enhancing Driver Engagement**

* Low availability at night could be due to lack of motivation.
* 💡 Impact: Introducing night-time bonuses or safety measures might boost driver participation, increasing revenue from late-night demand.

**✅ 5. Data-Driven Decision-Making**

These insights form a solid foundation for planning:

* Surge pricing
* Driver shift scheduling
* Targeted promotions

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.



---

### ✅ **Suggested Strategy to Achieve Business Objective:**

#### 1. **Balance Demand and Supply – Especially During Peak Hours**

* **Morning (5 AM – 10 AM):**

  * **Issue:** High demand but a high rate of driver **cancellations**.
  * **Action:** Investigate cancellation reasons (traffic, selective pickups), improve app-based penalty/reward systems, and optimize matching algorithms.
  * **Benefit:** Fewer cancellations = better customer satisfaction.

* **Evening (5 PM – 10 PM):**

  * **Issue:** High demand but many “No Cars Available”.
  * **Action:** Increase driver incentives to work during evening peak hours.
  * **Benefit:** Meet demand, reduce lost rides, increase revenue.

#### 2. **Driver Shift Optimization**

* Use historical data to predict **high-demand periods**.
* Encourage **shift-based planning** so more drivers are available during peak times and fewer during off-peak.
* May involve **driver scheduling tools** or offering flexible incentives.

#### 3. **Improve Real-time Allocation Algorithm**

* Modify the matching algorithm to:

  * Prioritize nearest available driver
  * Avoid assigning requests to drivers likely to cancel
* Use machine learning models to **predict driver behavior** and adjust allocation accordingly.

#### 4. **Customer Communication Enhancements**

* During high-demand hours, send **push notifications**:

  * Informing users about wait times or surge pricing
  * Suggesting alternate time slots for cheaper or faster rides

#### 5. **Expand Driver Pool in Underserved Areas**

* If cancellations or "no cars" cluster around certain **pickup locations** (e.g., airports, outskirts), consider:

  * Targeted **driver recruitment** campaigns
  * Temporary **location-based incentives**

---


# **Conclusion**


---

### 📌 **Conclusion**

The analysis of Uber request data reveals key demand patterns, operational gaps, and opportunities for improvement:

1. **Pickup Point Insights**:

   * A majority of trip cancellations occur at the **City** pickup point, while most **No Cars Available** issues arise at the **Airport**.
   * This suggests an **imbalance in supply and demand** across locations—too many riders at the airport, not enough drivers.

2. **Hourly Demand Patterns**:

   * Peak demand is seen during **early morning and evening hours**, aligning with common office commute times.
   * **Driver availability doesn't match rider demand** during these hours, resulting in high cancellation and unavailability rates.

3. **Daily Trend Observations**:

   * Certain days show higher request volumes, often mid-week.
   * **Operational challenges persist consistently**, especially in fulfilling requests during high-demand periods.

4. **Time Slot Analysis**:

   * The **Morning Rush** and **Evening Rush** time slots experience the highest number of requests but also the highest failure rates.
   * This indicates a need for **better fleet distribution** or **incentives for drivers** during rush hours.

---



### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***