
# **Project Name    -  Uber Supply Demand Gap**



##### **Project Type**    - EDA (Exploratory Data Analysis)
##### **Contribution**    - Individual
##### **Member**  - Muhammad Musaib

# **Project Summary**

This project involves an Exploratory Data Analysis (EDA) of Uber ride request data to identify demand patterns, service gaps, and potential operational bottlenecks. The data contains information such as request timestamps, pickup points (Airport or City), driver assignment status, and trip completion status.

The goal of this analysis is to understand customer behavior and operational efficiency across various time slots, identify peak demand periods, and highlight the reasons behind unfulfilled requests. Through data cleaning, visualization, and aggregation, we uncover critical insights such as demand-supply mismatch, time-of-day effects, and patterns in trip requests that can help optimize driver allocation and service performance.

By identifying problem areas such as "No Cars Available" and "Driver Cancelled" cases during high-demand periods, the analysis aims to support Uber’s business decisions with data-driven insights. Charts such as count plots, pie and donut charts, and a line plot provide clear visual storytelling of operational trends.

# **GitHub Link**

https://github.com/MusaibHazari/Uber-Request-Data

# **Problem Statement**
Analyze Uber ride request data to uncover operational inefficiencies and demand-supply gaps.

#### **Define Your Business Objective?**

To identify patterns in trip requests, detect peak demand slots, and propose improvements in driver availability to reduce cancellations and improve customer satisfaction.

# ***Let's Begin!***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Optional: nicer charts
sns.set(style='whitegrid')

### Dataset Loading

In [None]:
# Google Drive file ID of the shared CSV
file_id = '1E0kVqjq0dweoumXR6K4QvhOByxLSE_Mq'
url = f'https://drive.google.com/uc?id={file_id}'

# Read the CSV directly from the Drive download URL (no Colab-specific import needed)
df = pd.read_csv(url)

### Dataset First View

In [None]:
# Preview data
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print("Rows:", df.shape[0])
print("Columns:", df.shape[1])

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_count = df.duplicated().sum()
print("Number of duplicate rows:", duplicate_count)

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = df.isnull().sum()
print("Missing values in each column:\n")
print(missing_values)

In [None]:
# Visualizing the missing values
missing_values = df.isnull().sum()
missing_values = missing_values[missing_values > 0]

plt.figure(figsize=(10, 6))
ax = sns.barplot(x=missing_values.index, y=missing_values.values, color='red')

# Add value labels on each bar
for i, v in enumerate(missing_values.values):
    ax.text(i, v + 50, str(int(v)), ha='center', va='bottom', fontweight='bold')

plt.title("Missing Values Count by Column")
plt.ylabel("Number of Missing Values")
plt.xlabel("Columns")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print("List of columns in the dataset:")
print(df.columns.tolist())

In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description

| Variables              | Description                                                                         |
| ------------------------- | ----------------------------------------------------------------------------------- |
| `Request id`              | Unique ID for each ride request                                                     |
| `Pickup point`            | Location from where the trip was requested — either `City` or `Airport`             |
| `Driver id`               | ID of the driver who accepted the ride; missing if no driver was assigned           |
| `Status`                  | Final status of the request — `Trip Completed`, `Cancelled`, or `No Cars Available` |
| `Request timestamp`       | Date and time when the user requested the ride                                      |
| `Drop timestamp`          | Date and time when the ride ended (null if ride didn't happen)                      |
| `Trip Duration (hh:mm)`   | Trip duration in hours and minutes (only for completed trips)                       |
| `Trip Duration (in mins)` | Trip duration in minutes (numeric version of above)                                 |
| `Trip Date`               | Date of the request extracted from the timestamp                                    |
| `Request Time`            | Time of the request extracted from the timestamp                                    |
| `Day Of Week`             | Day of the week (e.g., Monday, Tuesday)                                             |
| `Is Weekend`              | Whether the trip was on a weekend — `Yes` or `No`                                   |
| `Trip Completed`          | Simplified yes/no status based on the `Status` column                               |
| `Driver Assigned`         | Whether a driver was assigned — `Yes` or `No`                                       |
| `Trip Request Time Slot`  | Time slot bucket (e.g., Morning, Evening, Late Night)                               |

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_counts = df.nunique()
print("Unique values in each column:\n")
print(unique_counts)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# The dataset was cleaned before loading, so this step only confirms that:
# - No duplicate rows exist
# - Remaining nulls are expected (for example, no drop timestamp when a ride didn't happen)
# - Date/time columns are consistently formatted
# - Derived features such as 'Day Of Week' and 'Trip Request Time Slot' are already present
# Therefore, no additional data wrangling is required here.
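The checks listed in the comments above can be made explicit. The following is a minimal sketch, assuming the column names from the variables table; `summarize_quality` and the `sample` frame are hypothetical helpers for illustration, and the sample rows are made up rather than taken from the dataset.

```python
import pandas as pd

def summarize_quality(frame: pd.DataFrame) -> dict:
    """Return the duplicate-row count and the null counts per Status value."""
    # Count nulls in every column except Status, grouped by the request's Status
    null_by_status = (
        frame.drop(columns="Status")
        .isnull()
        .groupby(frame["Status"])
        .sum()
    )
    return {
        "duplicate_rows": int(frame.duplicated().sum()),
        "nulls_by_status": null_by_status,
    }

# Tiny illustrative sample (made-up rows, not real results)
sample = pd.DataFrame({
    "Request id": [1, 2, 3, 4],
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
    "Driver id": [10.0, 11.0, None, 12.0],
    "Drop timestamp": ["11-07-2016 13:00", None, None, "11-07-2016 18:47"],
})

report = summarize_quality(sample)
print("Duplicate rows:", report["duplicate_rows"])
print(report["nulls_by_status"])
```

In the notebook, calling `summarize_quality(df)` would show whether the remaining nulls sit only in rows whose status is not `Trip Completed`, which is the pattern the variables table describes.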

### What manipulations have you done and what insights did you find?

#### **Data Manipulations Performed**
1. **Checked for duplicate rows:**
The raw dataset was examined for any exact duplicate entries to ensure data integrity. Removing duplicates helps prevent skewed analysis and ensures that each trip request is uniquely accounted for.

2. **Standardization of Timestamps:**
Timestamps in the raw data appeared in inconsistent formats (e.g., 11/7/2016 11:51 vs. 13-07-2016 08:33:16). These were standardized to a uniform DD-MM-YYYY HH:MM format to enable accurate time-based calculations and comparisons.

3. **Derived Columns Added:**
Additional columns were created to enrich the dataset and enable deeper analysis. These include fields like trip duration (in minutes and HH:MM format), day of the week, whether the trip occurred on a weekend, time of request, time slots (like Morning or Evening), and flags for trip completion and driver assignment. These derived columns made it easier to identify patterns and extract actionable insights.
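The cleaning itself happened before this notebook, so the exact steps are not shown here. As a rough sketch of how the timestamp standardization and derived columns described above could be reproduced with pandas: the function names are hypothetical, the sample rows are made up, and the time-slot boundaries other than Morning (5 AM to 10 AM) and Evening (5 PM to 9 PM) are assumptions.

```python
import pandas as pd

def parse_mixed_timestamps(raw: pd.Series) -> pd.Series:
    """Parse day-first timestamps written with '/' or '-', with or without seconds."""
    cleaned = raw.str.replace("/", "-", regex=False)
    with_seconds = pd.to_datetime(cleaned, format="%d-%m-%Y %H:%M:%S", errors="coerce")
    without_seconds = pd.to_datetime(cleaned, format="%d-%m-%Y %H:%M", errors="coerce")
    # Each value matches at most one of the two formats; take whichever parsed
    return with_seconds.fillna(without_seconds)

def add_derived_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Add the derived columns named in the variables table (assumed definitions)."""
    out = frame.copy()
    out["Request timestamp"] = parse_mixed_timestamps(out["Request timestamp"])
    out["Drop timestamp"] = parse_mixed_timestamps(out["Drop timestamp"])
    request = out["Request timestamp"]

    duration = out["Drop timestamp"] - request
    out["Trip Duration (in mins)"] = duration.dt.total_seconds() / 60
    out["Day Of Week"] = request.dt.day_name()
    yes_no = {True: "Yes", False: "No"}
    out["Is Weekend"] = request.dt.dayofweek.ge(5).map(yes_no)
    out["Trip Completed"] = out["Status"].eq("Trip Completed").map(yes_no)
    out["Driver Assigned"] = out["Driver id"].notna().map(yes_no)
    # Assumed slot edges (hours): only Morning and Evening come from the text
    out["Trip Request Time Slot"] = pd.cut(
        request.dt.hour,
        bins=[0, 5, 10, 17, 21, 24],
        right=False,
        labels=["Late Night", "Morning", "Afternoon", "Evening", "Night"],
    )
    return out

# Made-up rows using the two raw formats quoted above
raw = pd.DataFrame({
    "Request timestamp": ["11/7/2016 11:51", "13-07-2016 08:33:16"],
    "Drop timestamp": ["11/7/2016 13:00", None],
    "Status": ["Trip Completed", "No Cars Available"],
    "Driver id": [1.0, None],
})
out = add_derived_columns(raw)
print(out.T)
```

Parsing each format separately avoids relying on automatic format inference, which can silently swap day and month. If the uniform `DD-MM-YYYY HH:MM` text form is needed, `out["Request timestamp"].dt.strftime("%d-%m-%Y %H:%M")` renders it from the parsed values.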

#### **Insights Discovered**
**1. Peak Demand Time Slots:**  
*   Morning (5 AM – 10 AM): High number of requests from City to Airport.
*   Evening (5 PM – 9 PM): High number of requests from Airport to City.

**2. Most Common Issues by Time Slot:**
*   Morning (City → Airport): Most request failures are driver cancellations.
*   Evening (Airport → City): Most request failures are due to No Cars Available.

**3. Trip Completion Rate:**
*   The highest number of Trip Completed entries occurs outside peak hours.
*   During peak slots, completion rate drops due to driver unavailability or cancellations.

**4. Driver Availability:**
*   A large portion of requests during peak hours have no driver assigned, indicating a supply shortage.
*   Suggests a need for better driver allocation during high-demand periods.

**5. Trip Durations:**
*   Completed trips typically last between 40 and 70 minutes, showing consistent travel time regardless of direction or time.

**6. Directional Demand Imbalance:**
*   Mornings: High demand from City to Airport, but not vice versa.
*   Evenings: High demand from Airport to City, with fewer reverse trips.

**7. No Weekend Data:**
*   The dataset includes only Monday to Friday, so weekend behavior couldn't be analyzed.
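The time-slot and direction insights above can be checked numerically with a cross-tabulation. This is a sketch under the assumption that the columns are named as in the variables table; `status_share_by_slot_and_pickup` is a hypothetical helper, and the sample rows are made up to show the output shape, not real results.

```python
import pandas as pd

def status_share_by_slot_and_pickup(frame: pd.DataFrame) -> pd.DataFrame:
    """Share of each Status within every (time slot, pickup point) pair."""
    return pd.crosstab(
        index=[frame["Trip Request Time Slot"], frame["Pickup point"]],
        columns=frame["Status"],
        normalize="index",  # each row sums to 1
    )

# Made-up rows for illustration only
sample = pd.DataFrame({
    "Trip Request Time Slot": ["Morning"] * 3 + ["Evening"] * 4,
    "Pickup point": ["City"] * 3 + ["Airport"] * 4,
    "Status": [
        "Cancelled", "Cancelled", "Trip Completed",
        "No Cars Available", "No Cars Available", "No Cars Available", "Trip Completed",
    ],
})

shares = status_share_by_slot_and_pickup(sample)
print(shares.round(2))
```

Running `status_share_by_slot_and_pickup(df)` in the notebook gives one row per slot and pickup point, which makes it easy to see which failure type dominates in each peak window.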

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 : **Trip Request Status Distribution**

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(10, 6))
ax = sns.countplot(data=df, x='Status', hue='Status', palette='Set1')

# Add number labels on each bar
for p in ax.patches:
    height = p.get_height()
    ax.text(p.get_x() + p.get_width()/2, height + 50, int(height),
            ha='center', va='bottom', fontweight='bold')

plt.title('Trip Request Status Distribution')
plt.xlabel('Trip Status')
plt.ylabel('Number of Requests')
plt.tight_layout()
plt.show()

##### **1. Why did you pick the specific chart?**

I chose a count plot because it directly compares how often each trip status occurs.

##### **2. What is/are the insight(s) found from the chart?**

*  A large number of requests are not completed.
*  No Cars Available and Cancelled together outnumber completed trips.
*  The company is losing a significant portion of customer demand due to operational failures (driver-side or supply-side).
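To put a number on the gap these bullets describe, the status counts can be reduced to a single summary. This is a minimal sketch that assumes the "gap" is defined as requests that did not end in `Trip Completed`; `supply_demand_gap` is a hypothetical helper, and the sample statuses are made up.

```python
import pandas as pd

def supply_demand_gap(status: pd.Series) -> dict:
    """Summarize requests (demand), completed trips (supply) and the difference."""
    requests = len(status)
    completed = int(status.eq("Trip Completed").sum())
    gap = requests - completed
    return {
        "requests": requests,
        "completed": completed,
        "gap": gap,
        "gap_share": gap / requests,
    }

# Made-up statuses for illustration only
sample_status = pd.Series([
    "Trip Completed", "Cancelled", "No Cars Available",
    "No Cars Available", "Trip Completed",
])

summary = supply_demand_gap(sample_status)
print(summary)
```

In the notebook this would be called as `supply_demand_gap(df['Status'])`.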

##### **3. Will the gained insights help create a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business impact:**
* Reveals operational inefficiencies.
* Uber can adjust driver availability and policies to improve completions.

**Risk of negative growth:**
* If unfulfilled requests continue, Uber could lose customers and revenue.

#### Chart - 2 : **Pickup Point vs Trip Status**

In [None]:
plt.figure(figsize=(10, 6))
ax = sns.countplot(data=df, x='Pickup point', hue='Status', palette='Set2')

# Add number labels on each bar
for p in ax.patches:
    height = p.get_height()
    if height > 0:
        ax.annotate(f'{int(height)}',
                    (p.get_x() + p.get_width() / 2, height),
                    ha='center', va='bottom', fontsize=10)

plt.title('Pickup Point vs Trip Status')
plt.xlabel('Pickup Point')
plt.ylabel('Number of Requests')
plt.tight_layout()
plt.show()

##### **1. Why did you pick the specific chart?**

To compare how trip status varies between pickups from the City vs Airport.

##### **2. What is/are the insight(s) found from the chart?**

**Insights:**
* City pickups have higher cancellations.
* Airport pickups suffer more from lack of car availability.

##### **3. Will the gained insights help create a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business impact:**
* Helps optimize fleet deployment based on pickup location behavior.

**Risk of negative growth:**
* Ignoring location-based demand may create regional dissatisfaction.

#### Chart - 3 : **Trip Request Time Slot vs Status**

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(10, 6))
ax = sns.countplot(data=df, x='Trip Request Time Slot', hue='Status', palette='Set3',
                   order=['Late Night', 'Morning', 'Afternoon', 'Evening', 'Night'])

# Add number labels on each bar
for p in ax.patches:
    height = p.get_height()
    if height > 0:
        ax.annotate(f'{int(height)}',
                    (p.get_x() + p.get_width() / 2, height),
                    ha='center', va='bottom', fontsize=10)

plt.title('Trip Request Time Slot vs Status')
plt.xlabel('Time Slot')
plt.ylabel('Number of Requests')
plt.tight_layout()
plt.show()


##### **1. Why did you pick the specific chart?**

Time slots help identify when demand is highest and when issues occur most.

##### **2. What is/are the insight(s) found from the chart?**

**Insights:**

* Morning has the most cancellations.
* Evening shows the highest No Cars Available.

##### **3. Will the gained insights help create a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business impact:**
* Can guide driver incentive programs by time slot.
* Improve service during high-demand slots.

**Risk of negative growth:**
* Not addressing time-specific problems can lead to consistent service failure.

#### Chart - 4 : **Driver Assigned vs Not Assigned**

In [None]:
# Chart - 4 visualization code
driver_counts = df['Driver Assigned'].value_counts()
plt.figure(figsize=(10, 6))
plt.pie(driver_counts, labels=driver_counts.index, autopct='%1.1f%%', colors=['lightcoral', 'skyblue'], startangle=90)
plt.title('Driver Assigned vs Not Assigned')
plt.show()

##### **1. Why did you pick the specific chart?**

This gives a quick snapshot of the percentage of requests where a driver was assigned.

##### **2. What is/are the insight(s) found from the chart?**

**Insight:**
* A significant portion of requests were not assigned a driver, which highlights a critical operational failure.

##### **3. Will the gained insights help create a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**
* Understanding driver assignment rates helps Uber improve driver availability, optimize automated dispatch systems, and reduce response time.

**Risk of negative growth:**
* If unassigned requests remain high, Uber may lose customers due to poor service reliability, damaging its market trust.

#### Chart - 5 : **Trip Completion Rate**

In [None]:
# Chart - 5 visualization code
trip_completion = df['Trip Completed'].value_counts()
plt.figure(figsize=(10, 6))
plt.pie(trip_completion, labels=trip_completion.index, autopct='%1.1f%%',
        colors=['limegreen', 'salmon'], startangle=90, wedgeprops=dict(width=0.3))
plt.title('Trip Completion Rate')
plt.show()


##### **1. Why did you pick the specific chart?**

To measure how many requests are successfully fulfilled — a critical indicator of service performance.

##### **2. What is/are the insight(s) found from the chart?**

**Insight:**
* Less than half of the ride requests are successfully completed, indicating a significant service failure rate.

##### **3. Will the gained insights help create a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**
* Clearly outlines the need to boost fulfillment rates, possibly by reducing cancellations and increasing car availability.

**Risk of negative growth:**
* A persistently low completion rate can lead to customer churn, affecting market position and revenue growth.

#### Chart - 6 : **Trip Requests by Day of the Week**

In [None]:
# Chart - 6 visualization code
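# Keep calendar order; days with no requests (the data covers Monday to Friday only) are dropped
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
day_counts = df['Day Of Week'].value_counts().reindex(day_order).dropna()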
plt.figure(figsize=(10, 6))
sns.lineplot(x=day_counts.index, y=day_counts.values, marker='o', color='teal')
plt.title('Trip Requests by Day of the Week')
plt.xlabel('Day')
plt.ylabel('Number of Trip Requests')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()

##### **1. Why did you pick the specific chart?**

To understand how trip demand varies throughout the week.

##### **2. What is/are the insight(s) found from the chart?**

**Insights:**
* The dataset covers Monday to Friday only, so weekday and weekend volumes cannot be compared.
* Friday and Monday show notable demand, likely due to airport or work-related travel.
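Since the wrangling notes report that the dataset covers only Monday to Friday, it is worth confirming which days are actually present before reading a weekly trend. A small sketch, assuming the `Day Of Week` column holds full English day names; `requests_per_day` is a hypothetical helper and the sample values are made up.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def requests_per_day(days: pd.Series) -> pd.Series:
    """Count requests per weekday in calendar order, with 0 for absent days."""
    return days.value_counts().reindex(DAY_ORDER, fill_value=0)

# Made-up values for illustration only
sample_days = pd.Series(["Monday", "Monday", "Tuesday", "Wednesday",
                         "Thursday", "Friday", "Friday"])

counts = requests_per_day(sample_days)
absent_days = counts[counts == 0].index.tolist()
print(counts)
print("Days with no requests:", absent_days)
```

Calling `requests_per_day(df['Day Of Week'])` in the notebook separates "no data for this day" from "low demand on this day", which a line plot alone cannot show.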

##### **3. Will the gained insights help create a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**
* Uber can optimize driver shifts to match daily demand trends.
* Helps plan targeted promotions or discounts on low-demand days.
* Informs surge pricing models for peak weekdays.

**Risk of negative growth:**
* Without adjusting for weekly patterns, Uber may face driver shortages on high-demand days or underutilization on others.
* Can lead to longer wait times, increased cancellations, and customer dissatisfaction.

#### Chart - 7 : **Trip Requests by Time Slot**

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(10, 6))
ax = sns.countplot(
    data=df,
    x='Trip Request Time Slot',
    hue='Trip Request Time Slot',
    order=df['Trip Request Time Slot'].value_counts().index,
    palette='Set2',
    legend=False  # Avoids double legend since hue == x
)

# Add number labels on each bar
for p in ax.patches:
    height = p.get_height()
    if height > 0:
        ax.text(p.get_x() + p.get_width()/2, height + 50, int(height),
                ha='center', va='bottom', fontweight='bold')

plt.title('Trip Requests by Time Slot')
plt.xlabel('Time Slot')
plt.ylabel('Number of Requests')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()


##### **1. Why did you pick the specific chart?**

It helps in clearly comparing how request volumes vary across different parts of the day.

##### **2. What is/are the insight(s) found from the chart?**

**Insights:**
* Morning has the highest number of ride requests.
* Evening also sees significant demand, but slightly lower than Morning.
* Late Night and Night have fewer requests, indicating low usage during those hours.
* Afternoon has moderate activity, likely due to non-work-related travel.

##### **3. Will the gained insights help create a positive business impact?**
Are there any insights that lead to negative growth? Justify with specific reason.

**Positive Business Impact:**
* Uber can optimize driver allocation by increasing availability during Morning and Evening time slots.
* Incentives and bonuses can be aligned with peak hours to encourage more driver participation.
* Helps refine surge pricing models for specific time windows.

**Risks of Negative Growth:**
* If the high demand during Morning/Evening slots is not met with enough drivers, customer dissatisfaction and cancellations will rise.
* Low driver availability during peak hours may lead to revenue loss and customer churn.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?

Based on the analysis, the primary business challenge is high trip failure rates due to driver unavailability and cancellations during peak hours and in specific zones. To improve trip completion and overall service quality, I suggest the following actionable solutions:

1. **Optimize Driver Allocation by Time and Location:**
Use demand patterns (hourly, daily, and zonal trends) to schedule more drivers during peak periods, particularly mornings in the city and evenings at the airport.

2. **Introduce Incentives During High-Cancellation Periods:**
Provide targeted driver bonuses during times with high cancellation rates (e.g., early morning city pickups) to reduce driver rejection and improve fulfillment.

3. **Dynamic Demand Forecasting System:**
Implement a real-time demand forecasting model using historical data (like day, hour, and pickup point) to proactively adjust pricing and driver deployment.

4. **Improve Driver Dispatch Logic:**
Optimize the matching algorithm to reduce delays in driver assignment, especially during high-demand windows. This will directly boost trip completion rates.

5. **Customer Notification & Scheduling Features:**
Introduce ride scheduling or pre-booking options to spread demand and give the system more lead time to assign drivers efficiently.
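As a starting point for the demand forecasting suggestion, a historical-average profile by hour and pickup point can serve as a naive baseline. This is only a sketch, not a production forecasting model: `average_hourly_demand` is a hypothetical helper, it assumes `Request timestamp` has already been parsed to datetimes, and the sample rows are made up.

```python
import pandas as pd

def average_hourly_demand(frame: pd.DataFrame) -> pd.Series:
    """Average daily request count for each (hour, pickup point) pair."""
    ts = frame["Request timestamp"]
    # Requests per calendar day, hour and pickup point
    per_day = frame.groupby(
        [ts.dt.date.rename("date"), ts.dt.hour.rename("hour"), "Pickup point"]
    ).size()
    # Average across days; note that a day with no requests in a given
    # hour is not counted as zero in this simple version
    return per_day.groupby(level=["hour", "Pickup point"]).mean()

# Made-up rows for illustration only
sample = pd.DataFrame({
    "Request timestamp": pd.to_datetime([
        "2016-07-11 05:10", "2016-07-11 05:40", "2016-07-11 18:05",
        "2016-07-12 05:20", "2016-07-12 05:30", "2016-07-12 05:50",
        "2016-07-12 05:55", "2016-07-12 18:15",
    ]),
    "Pickup point": ["City", "City", "Airport",
                     "City", "City", "City", "City", "Airport"],
})

profile = average_hourly_demand(sample)
print(profile)
```

Comparing such a profile against the number of drivers typically available in each hour would indicate where extra supply is most needed. A real forecasting system would also need more history than a single week.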

# **Conclusion**

This exploratory data analysis of Uber ride requests uncovered critical operational challenges, especially during peak hours. A major issue identified was the demand-supply gap: driver cancellations are concentrated in morning requests from the city, while "No Cars Available" dominates evening requests at the airport.

Less than half of the ride requests were successfully completed, indicating inefficiencies in driver allocation and dispatch systems. This leads to lost revenue and poor customer experience.

To address these issues, Uber should:
* Optimize driver allocation based on time and location trends,
* Incentivize drivers during high-cancellation periods,
* Implement demand forecasting, and
* Improve dispatch algorithms.

These data-backed insights can significantly enhance service reliability, trip fulfillment, and overall customer satisfaction.