 **Project Name**    -
Demand Supply Analysis of Uber Ride Request



##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

This project analyzes Uber ride request data to identify key patterns, pain points, and operational inefficiencies, such as high cancellation rates and unavailability of cars, particularly during specific time slots and at specific pickup points.


# **GitHub Link -**

https://raw.githubusercontent.com/Lals30/Uber_request_data/refs/heads/main/Uber%20Request%20Data%20(4).csv

# **Problem Statement**


Uber frequently faces service gaps such as driver cancellations and car unavailability, particularly during peak hours and at key pickup points such as the airport.
This project analyzes Uber request data to uncover the patterns behind these failures and recommends strategies to optimize supply, reduce cancellations, and improve ride fulfillment rates.

#### **Define Your Business Objective?**

To analyze and optimize Uber's ride request fulfillment process by identifying:


*   Time slots and locations with the highest demand-supply mismatch
*   Key causes of trip cancellations and unavailability of cars
*   Opportunities to improve efficiency and customer satisfaction

Addressing these should reduce the time riders wait for drivers and improve trip completion rates.





# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
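# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Upload the CSV when running in Google Colab; skip this step elsewhere
try:
    from google.colab import files
    uploaded = files.upload()
except ImportError:
    print("Not running in Colab: place the CSV next to the notebook.")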


### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv('Uber Request Data (1).csv')


### Dataset First View

In [None]:
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape


### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing = df.isnull().sum()
missing[missing > 0]  # shows only columns with missing values


In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 5))
sns.heatmap(df.isnull(), cbar=False, cmap='viridis', yticklabels=False)
plt.title("Missing Values Heatmap")
plt.show()


### What did you know about your dataset?

The dataset contains 6746 ride requests and 6 columns describing Uber ride requests. About 60% of the driver ID values are missing (no driver was assigned), and about 58% of the drop timestamps are missing, which means those trips were not completed.
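The missing-value shares quoted above can be re-derived from the data rather than read off the heatmap. The sketch below is a minimal example, assuming a hypothetical helper `missing_share` and a made-up four-row sample with the raw column names; in the notebook the helper would be applied to `df`.

```python
import pandas as pd

def missing_share(frame: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, largest first."""
    return (frame.isnull().mean() * 100).round(1).sort_values(ascending=False)

# Hypothetical four-row sample that mimics the raw column layout (not real data)
sample = pd.DataFrame({
    "Request id": [1, 2, 3, 4],
    "Driver id": [10.0, None, None, 12.0],
    "Status": ["Trip Completed", "No Cars Available", "No Cars Available", "Cancelled"],
    "Drop timestamp": ["11/7/2016 13:00", None, None, None],
})

shares = missing_share(sample)
print(shares)
```

Calling `missing_share(df)` right after loading gives the per-column percentages to compare with the summary.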

## ***2. Understanding Your Variables***

In [None]:
# Dataset Describe
df.describe(include='all')

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_vals = pd.DataFrame({
    "Column": df.columns,
    "Unique Values Count": [df[col].nunique() for col in df.columns]
})
unique_vals


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
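# Write your code to make your dataset analysis ready.
# Reload the raw file so this cell can be re-run after the columns are renamed
df = pd.read_csv("Uber Request Data (1).csv")

# Standardize column names, e.g. 'Request timestamp' -> 'request_timestamp'
df.rename(columns=lambda x: x.strip().lower().replace(' ', '_'), inplace=True)

# Parse timestamps as day-first dates; unparseable values become NaT.
# If a column mixes date formats, pandas >= 2.0 may also need format='mixed'.
df['request_timestamp'] = pd.to_datetime(df['request_timestamp'], dayfirst=True, errors='coerce')
df['drop_timestamp'] = pd.to_datetime(df['drop_timestamp'], dayfirst=True, errors='coerce')

# Hour of day (0-23) at which the ride was requested
df['request_hour'] = df['request_timestamp'].dt.hour

def get_time_slot(hour):
    """Map an hour of day (0-23) to one of six four-hour time slots."""
    if pd.isna(hour):
        return np.nan  # keep missing hours missing instead of labelling them 'Night'
    if 0 <= hour < 4:
        return 'Late Night'
    elif 4 <= hour < 8:
        return 'Early Morning'
    elif 8 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 16:
        return 'Afternoon'
    elif 16 <= hour < 20:
        return 'Evening'
    else:
        return 'Night'

df['time_slot'] = df['request_hour'].apply(get_time_slot)

# Shorten the status labels for plotting ('Cancelled' stays as it is)
df['cleaned_status'] = df['status'].replace({
    'Trip Completed': 'Completed',
    'No Cars Available': 'No Cars'
})

# Requests that never had a driver assigned
unassigned_requests = df[df['driver_id'].isna()]

# Trip duration in minutes (NaN when there is no drop timestamp)
df['trip_duration_min'] = (df['drop_timestamp'] - df['request_timestamp']).dt.total_seconds() / 60
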


### What all manipulations have you done and insights you found?

Standardized the inconsistent column names and converted the timestamps into proper datetime objects. Added new features: request hour, time slot, cleaned status, and trip duration. Identified missing values, including requests with no driver assigned.
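The hour-to-slot feature can also be built in a vectorized way. The sketch below is one possible alternative using `pd.cut`, with bin edges that mirror the thresholds in `get_time_slot`; the names `SLOT_EDGES`, `SLOT_LABELS`, and `hours_to_slots` are hypothetical and not part of the notebook.

```python
import pandas as pd

# Bin edges and labels mirror the thresholds used in get_time_slot
SLOT_EDGES = [0, 4, 8, 12, 16, 20, 24]
SLOT_LABELS = ["Late Night", "Early Morning", "Morning", "Afternoon", "Evening", "Night"]

def hours_to_slots(hours: pd.Series) -> pd.Series:
    """Vectorized hour -> time slot mapping; missing hours stay missing."""
    # right=False makes each bin left-closed, e.g. [4, 8) for 'Early Morning'
    return pd.cut(hours, bins=SLOT_EDGES, labels=SLOT_LABELS, right=False)

# Hypothetical hours, including one missing value
hours = pd.Series([0, 3, 4, 11, 16, 23, None])
slots = hours_to_slots(hours)
print(slots.tolist())
```

Because the result is an ordered categorical, plots that use it keep the slots in chronological order without an explicit `order` argument.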

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(7, 5))
sns.countplot(x='cleaned_status', data=df, palette='Set2')

plt.title("Number of Requests by Status", fontsize=14)
plt.xlabel("Request Status", fontsize=12)
plt.ylabel("Number of Requests", fontsize=12)
plt.xticks(fontsize=11)
plt.yticks(fontsize=11)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

The bar chart is ideal for comparing discrete categories like Completed, Cancelled, and No Cars Available. It gives a quick visual cue of which outcomes are more common and easily understandable.

##### 2. What is/are the insight(s) found from the chart?

*   Completed requests form less than half of the total ride requests.
*   A very high number of requests end in "No Cars Available".
*   Cancellations also form a large chunk of total requests.
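To put numbers behind these observations, the share of each status can be computed directly. This is a minimal sketch on a hypothetical ten-row sample; `status_share` is an illustrative helper, and in the notebook it would receive `df['cleaned_status']`.

```python
import pandas as pd

def status_share(status: pd.Series) -> pd.Series:
    """Percentage of requests per status."""
    return (status.value_counts(normalize=True) * 100).round(1)

# Hypothetical sample (not real data); in the notebook use df['cleaned_status']
sample_status = pd.Series(
    ["Completed", "No Cars", "Cancelled", "Completed", "No Cars",
     "No Cars", "Completed", "Cancelled", "No Cars", "Completed"]
)

shares = status_share(sample_status)
print(shares)
```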

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact
Insights help identify when and where rides fail, allowing Uber to improve driver allocation, reduce cancellations, and increase ride completion, leading to higher revenue and better customer satisfaction.

Negative growth
High rates of "No Cars Available" and cancellations show a demand-supply mismatch. If not addressed, it can lead to lost revenue, rider churn, and driver dissatisfaction.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(8,5))
sns.countplot(
    data=df,
    x='pickup_point',
    hue='cleaned_status',          # Completed / Cancelled / No Cars
    palette='Set1'
)

plt.title("Request Status by Pickup Point", fontsize=14)
plt.xlabel("Pickup Point", fontsize=12)
plt.ylabel("Number of Requests", fontsize=12)
plt.legend(title="Status")
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A grouped bar chart is best for comparing categorical outcomes across two locations (City vs Airport). It makes the share of Completed, Cancelled, and No Cars immediately obvious for each pickup point.

##### 2. What is/are the insight(s) found from the chart?

*   The Airport shows a much larger share of “No Cars Available” and cancellations.
*   City rides are more likely to be Completed.
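Raw counts can hide differences in share when the two pickup points receive different request volumes. A row-normalized crosstab is one way to compare the status mix per location; the sketch below uses a hypothetical eight-row sample with the notebook's column names.

```python
import pandas as pd

# Hypothetical sample with the notebook's column names (not real data)
sample = pd.DataFrame({
    "pickup_point": ["Airport", "Airport", "Airport", "Airport",
                     "City", "City", "City", "City"],
    "cleaned_status": ["Completed", "No Cars", "No Cars", "Cancelled",
                       "Completed", "Completed", "Cancelled", "No Cars"],
})

# normalize="index" divides each row by its total, so every row sums to 100%
status_mix = (
    pd.crosstab(sample["pickup_point"], sample["cleaned_status"], normalize="index") * 100
).round(1)
print(status_mix)
```

Replacing `sample` with `df` gives the actual percentage split for Airport and City.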


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive Business impact
Knowing the exact failure mix per location lets Uber re-allocate drivers or add location-based incentives at the Airport, directly improving fulfillment and revenue.

Negative Growth
If the high Airport failure rate isn’t fixed, riders may switch to competitors or other transport, causing lost revenue and brand harm in a key, high-fare corridor.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(10, 5))
sns.histplot(
    data=df,
    x='request_hour',
    hue='cleaned_status',
    multiple='stack',
    palette='coolwarm',
    bins=24
)

plt.title("Ride Requests by Hour and Status", fontsize=14)
plt.xlabel("Hour of Request (24h)", fontsize=12)
plt.ylabel("Number of Requests", fontsize=12)
plt.xticks(range(0, 24))
plt.legend(title="Request Status")
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A stacked histogram is perfect to show distribution over time and highlight volume. It visually exposes which hours have high demand and where failures dominate.

##### 2. What is/are the insight(s) found from the chart?

• Early morning (5–8 AM) has a spike in requests, and most of them end in “No Cars Available”. This chart does not split by pickup point, so any link to the Airport has to be read from the pickup-point charts.
• Evening hours (5–8 PM) also show high cancellations.
• Midday to afternoon (12–4 PM) is relatively balanced and has more completions.
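The stacked histogram shows volumes; a per-hour failure rate makes the same pattern comparable across busy and quiet hours. The sketch below assumes that every status other than "Completed" counts as a failure, and uses a hypothetical helper `hourly_failure_rate` on made-up rows.

```python
import pandas as pd

def hourly_failure_rate(frame: pd.DataFrame) -> pd.DataFrame:
    """Number of requests and share of unfulfilled requests for each hour."""
    # Assumption: any status other than 'Completed' is an unfulfilled request
    failed = frame["cleaned_status"].ne("Completed")
    summary = (
        frame.assign(failed=failed)
        .groupby("request_hour")["failed"]
        .agg(requests="size", failure_rate="mean")
    )
    summary["failure_rate"] = (summary["failure_rate"] * 100).round(1)
    return summary

# Hypothetical sample with the notebook's column names (not real data)
sample = pd.DataFrame({
    "request_hour": [5, 5, 5, 5, 13, 13, 18, 18],
    "cleaned_status": ["No Cars", "No Cars", "Cancelled", "Completed",
                       "Completed", "Completed", "Cancelled", "Completed"],
})

rates = hourly_failure_rate(sample)
print(rates)
```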


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact
It identifies specific hours where demand peaks but supply fails, helping operations and planning teams deploy drivers strategically.

Negative growth
The high ride failure rate during peak demand hours may lead to user churn, bad reviews, and platform-switching. Every unmet ride request is a lost revenue opportunity. If unresolved, it leads to long-term decline in trust and loyalty.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(9, 5))
sns.countplot(
    data=df,
    x='time_slot',
    hue='cleaned_status',
    order=['Late Night', 'Early Morning', 'Morning', 'Afternoon', 'Evening', 'Night'],
    palette='Spectral'
)

plt.title("Requests by Time Slot and Status", fontsize=14)
plt.xlabel("Time Slot", fontsize=12)
plt.ylabel("Number of Requests", fontsize=12)
plt.legend(title="Status")
plt.xticks(rotation=20)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A grouped bar chart is ideal to show categorical comparisons across time segments (Morning, Evening, etc.). This makes it easy to spot which part of the day suffers most in terms of ride failures.

##### 2. What is/are the insight(s) found from the chart?

• Early Morning (4–8 AM) has the highest volume of "No Cars Available" – especially for Airport pickups.
• Evening shows many cancellations.
• Morning and Afternoon are more stable and show more Completed rides.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight helps in driver shift planning. Knowing when demand peaks but supply fails allows Uber to optimize driver availability during the most critical time slots.

Negative
If the Early Morning demand is not served, Uber loses airport-bound customers, a high-value segment. Consistent failures during these slots will cause riders to switch to alternate transport or competitors.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
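# Count requests per pickup point and hour; fill_value=0 keeps the table integer-typed for fmt="d"
heatmap_data = df.groupby(['pickup_point', 'request_hour'])['request_id'].count().unstack(fill_value=0)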

plt.figure(figsize=(12, 6))
sns.heatmap(heatmap_data, cmap="YlGnBu", annot=True, fmt="d", linewidths=0.5)

plt.title("Requests Heatmap: Pickup Point vs Hour of Day", fontsize=14)
plt.xlabel("Hour of Day", fontsize=12)
plt.ylabel("Pickup Point", fontsize=12)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A heatmap is perfect for showing density of activity across two dimensions — here, pickup point and request hour. It helps quickly spot demand spikes and downtime, making it easy to identify problem windows.

##### 2. What is/are the insight(s) found from the chart?

• Airport shows very high request volume during Early Morning (5–7 AM) and Evening (5–8 PM).
• City has more consistent demand throughout the day.
• Both locations show clear rush hour clusters.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact
These insights allow Uber to forecast demand more accurately by time and location. With that, it can assign drivers dynamically, improving availability and reducing lost trips.

Negative growth
If high-volume hours (like 6 AM at the Airport) go underserved, it could lead to user frustration and bad reviews during peak times, especially for critical rides like airport transfers.

In [None]:
%pip install squarify

#### Chart - 6

In [None]:
# Chart - 6 visualization code

import squarify
import matplotlib.pyplot as plt

tree_data = df.groupby(['pickup_point', 'time_slot', 'cleaned_status']).size().reset_index(name='count')

tree_data['label'] = tree_data.apply(lambda x: f"{x['pickup_point']}\n{x['time_slot']}\n{x['cleaned_status']}\n{int(x['count'])}", axis=1)

# Plot
plt.figure(figsize=(14, 8))
squarify.plot(sizes=tree_data['count'], label=tree_data['label'], alpha=0.85, color=sns.color_palette("Spectral", len(tree_data)))

plt.title("Treemap of Requests by Pickup Point, Time Slot & Status", fontsize=16)
plt.axis('off')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A treemap visually combines 3 important variables — pickup location, time of day, and ride outcome — into one compact, easy-to-understand visual. It shows where the largest problems or successes are concentrated.


##### 2. What is/are the insight(s) found from the chart?

• Airport – Early Morning – No Cars is one of the largest blocks, highlighting Uber’s biggest problem segment.
• City – Afternoon – Completed also forms a large share, indicating stability.
• Several blocks show where cancellations dominate specific slots.
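The largest treemap blocks can also be listed as a ranked table, which is easier to quote than block areas. The sketch below is a minimal example with a hypothetical helper `top_segments` and made-up rows; it reuses the same three grouping columns as the treemap.

```python
import pandas as pd

def top_segments(frame: pd.DataFrame, n: int = 3) -> pd.DataFrame:
    """Return the n largest (pickup point, time slot, status) combinations."""
    return (
        frame.groupby(["pickup_point", "time_slot", "cleaned_status"])
        .size()
        .sort_values(ascending=False)
        .head(n)
        .reset_index(name="count")
    )

# Hypothetical sample with the notebook's column names (not real data)
sample = pd.DataFrame({
    "pickup_point": ["Airport", "Airport", "Airport", "City", "City", "City"],
    "time_slot": ["Evening", "Evening", "Evening", "Morning", "Morning", "Afternoon"],
    "cleaned_status": ["No Cars", "No Cars", "No Cars", "Cancelled", "Cancelled", "Completed"],
})

top = top_segments(sample, n=3)
print(top)
```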

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It helps leadership focus interventions on exact combinations like Airport + Early Morning, where demand is high and failures are frequent. It guides targeted improvements rather than broad guesses.

Negative impact
The treemap shows large sections of unfulfilled rides in very specific high-value segments. If Uber doesn’t resolve those (e.g., Early Morning airport rides), it risks losing premium customers and damaging its brand reputation.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(10, 5))
sns.countplot(
    data=df,
    x='cleaned_status',
    hue='time_slot',
    hue_order=['Late Night', 'Early Morning', 'Morning', 'Afternoon', 'Evening', 'Night'],
    palette='tab10'
)

plt.title("Request Status by Time Slot", fontsize=14)
plt.xlabel("Ride Status", fontsize=12)
plt.ylabel("Number of Requests", fontsize=12)
plt.legend(title="Time Slot")
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

This grouped bar chart helps compare how different time slots contribute to each ride status (Completed, Cancelled, No Cars). It’s ideal to see when each type of outcome is most frequent.


##### 2. What is/are the insight(s) found from the chart?

• “No Cars Available” is highest in Early Morning.
• Cancellations peak in Evening.
• Most completed rides occur in Morning and Afternoon.
• Late Night and Night slots have low overall volume.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

It highlights which time slots to target for operational improvements, such as adding driver incentives for early morning or reducing wait times in the evening. This leads to higher fulfillment and better customer satisfaction.

Negative-growth
Frequent failures in Early Morning and Evening, which are high-demand periods, can cause lost revenue, negative customer experience, and platform abandonment if left unresolved.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
import seaborn as sns
import matplotlib.pyplot as plt

# Select numerical features only
numeric_cols = df[['request_hour', 'trip_duration_min']].dropna()

# Compute correlation matrix
correlation_matrix = numeric_cols.corr()

# Plot the heatmap
plt.figure(figsize=(6, 4))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5, fmt=".2f")

plt.title("Correlation Heatmap of Numeric Features", fontsize=14)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A correlation heatmap is useful to understand relationships between numerical variables, such as whether trip durations vary by hour or whether time affects demand. It helps identify linear associations that may drive deeper modeling.

##### 2. What is/are the insight(s) found from the chart?

• The correlation between request_hour and trip_duration_min is expected to be weak or close to zero; the exact value is the off-diagonal cell of the heatmap.
• A near-zero value would suggest that trip duration is largely independent of request time, meaning trip lengths are location-dependent rather than time-of-day dependent.

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
import seaborn as sns
import matplotlib.pyplot as plt

# Select relevant features
pairplot_data = df[['request_hour', 'trip_duration_min', 'cleaned_status']].dropna()

# Plot
sns.pairplot(
    pairplot_data,
    hue='cleaned_status',         # Color by ride status
    palette='Set2',
    height=2.5,
    diag_kind='kde'
)

plt.suptitle("Pair Plot: Hour, Duration, and Status", fontsize=16, y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

A pair plot is ideal for exploring relationships and distributions between multiple numeric variables (e.g., request_hour vs trip_duration_min) and how they behave across categories (e.g., ride status). It’s a multi-variable visual analysis tool.

##### 2. What is/are the insight(s) found from the chart?

• No strong visual correlation between request_hour and trip_duration_min.
• The density curves reveal that most rides are short in duration regardless of time.
• Only completed rides have a drop timestamp, so after dropna() every plotted point is "Completed"; the plot cannot compare durations across ride statuses.
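Since trip_duration_min is derived from the drop timestamp, a quick check of which statuses survive the dropna() step shows what the pair plot can and cannot compare. The sketch below runs that check on a hypothetical four-row sample with the notebook's column names.

```python
import pandas as pd

# Hypothetical sample: only completed trips carry a drop timestamp (not real data)
sample = pd.DataFrame({
    "cleaned_status": ["Completed", "Cancelled", "No Cars", "Completed"],
    "request_timestamp": pd.to_datetime(
        ["2016-07-11 08:00", "2016-07-11 08:05", "2016-07-11 08:10", "2016-07-11 09:00"]
    ),
    "drop_timestamp": pd.to_datetime(
        ["2016-07-11 08:45", None, None, "2016-07-11 10:10"]
    ),
})

# Same duration formula as in the wrangling step
sample["trip_duration_min"] = (
    sample["drop_timestamp"] - sample["request_timestamp"]
).dt.total_seconds() / 60

# Rows without a duration are dropped before plotting
with_duration = sample.dropna(subset=["trip_duration_min"])
remaining_statuses = with_duration["cleaned_status"].unique().tolist()
print(remaining_statuses)
```

Running the same two lines on `df` shows whether the `hue` in the pair plot has more than one category to display.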

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

1. Deploy more drivers during Early Morning (4–8 AM) and Evening (5–8 PM) — especially for airport pickups.

2. Offer higher payouts or bonuses for drivers working during high-demand but low-supply slots.

3. Improve matching algorithms and set policies to reduce driver or rider cancellations.

4. Offer features like scheduled rides and transparent ETAs to reduce user drop-offs.
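To prioritize where extra drivers or incentives would matter most, the demand-supply gap can be tabulated per pickup point and time slot. The sketch below assumes demand means all requests and supply means completed trips; `demand_supply_gap` is a hypothetical helper and the rows are made up.

```python
import pandas as pd

def demand_supply_gap(frame: pd.DataFrame) -> pd.DataFrame:
    """Demand, supply, and gap per (pickup point, time slot), largest gap first."""
    # Assumption: demand = all requests, supply = requests that were completed
    summary = (
        frame.assign(completed=frame["cleaned_status"].eq("Completed"))
        .groupby(["pickup_point", "time_slot"])["completed"]
        .agg(demand="size", supply="sum")
    )
    summary["supply"] = summary["supply"].astype(int)
    summary["gap"] = summary["demand"] - summary["supply"]
    return summary.sort_values("gap", ascending=False)

# Hypothetical sample with the notebook's column names (not real data)
sample = pd.DataFrame({
    "pickup_point": ["Airport"] * 4 + ["City"] * 5,
    "time_slot": ["Evening"] * 4 + ["Early Morning"] * 3 + ["Afternoon"] * 2,
    "cleaned_status": ["No Cars", "No Cars", "Cancelled", "Completed",
                       "Cancelled", "Cancelled", "Completed",
                       "Completed", "Completed"],
})

gap = demand_supply_gap(sample)
print(gap)
```

The rows at the top of the resulting table are the segments where additional supply would close the largest number of unmet requests.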

# **Conclusion**

The Uber ride request data analysis revealed significant service gaps, especially during Early Morning and Evening time slots, and primarily at the Airport pickup location.

A large share of requests were either cancelled or showed no cars available, highlighting driver unavailability and inefficient resource allocation as key issues.

Through detailed EDA and visualizations, we identified when, where, and why ride failures occur, enabling data-driven decision-making.

By acting on these insights, such as optimizing driver deployment and offering time-based incentives, Uber can significantly:

*   Improve ride fulfillment rates
*   Reduce customer churn
*   Increase operational efficiency
*   Drive sustainable business growth

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***