<a href="https://colab.research.google.com/github/Sivayogesh-gif/Uber-Demand-Supply/blob/main/Uber_Demand_Supply_Request.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Uber Demand Supply Request



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** Sivayogesh S


# **Project Summary -**

This project focuses on analyzing the supply-demand gap in Uber rides, primarily for airport trips. Using EDA, we identify trends in trip requests, cancellations, and completed rides to uncover patterns during peak hours. Insights from this analysis can help Uber reduce cancellations, improve customer experience, and optimize driver allocation. The dataset contains variables such as request timestamp, drop timestamp, and request status, which will be explored through univariate, bivariate, and multivariate analysis. Visualizations will be used to highlight peak-demand windows and suggest strategic solutions.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The primary challenge faced by Uber is the mismatch between supply (available drivers) and demand (customer ride requests), particularly during peak hours and at critical locations such as airports. This imbalance results in canceled rides, delayed pickups, and dissatisfied customers, which negatively impacts Uber’s revenue and customer trust.

#### **Define Your Business Objective?**

The objective is to analyze the Uber request data to identify patterns and root causes of the supply-demand gap. Key goals include:

*   Understanding demand trends across different times of the day.
*   Identifying high-demand zones and peak-hour patterns.
*   Examining reasons for cancellations and non-availability of cars.
*   Providing actionable recommendations to minimize cancellations and improve customer experience.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart follows the format below and answers the listed questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings

warnings.filterwarnings('ignore')
sns.set(style="whitegrid")

### Dataset Loading

In [None]:
from google.colab import files
uploaded = files.upload()  # Upload your CSV file manually

# Load dataset
df = pd.read_csv('Uber Request Data.csv')  # Replace with your filename
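
The guidelines above ask for exception handling, so the load step could be wrapped as in the minimal sketch below. The names `load_requests` and `EXPECTED_COLUMNS` are hypothetical helpers, not part of the original notebook, and the column list is an assumption taken from the Variables Description section further down.

```python
import pandas as pd

# Column names as listed in the Variables Description section (assumed to match the CSV header).
EXPECTED_COLUMNS = [
    "Request id", "Pickup point", "Driver id",
    "Status", "Request timestamp", "Drop timestamp",
]

def load_requests(path):
    """Read the request CSV and fail early with a readable message."""
    try:
        data = pd.read_csv(path)
    except FileNotFoundError as exc:
        raise FileNotFoundError(
            f"Could not find '{path}'. Upload the CSV first or check the filename."
        ) from exc

    # Report every expected column that is absent, rather than failing later on a KeyError
    missing = [col for col in EXPECTED_COLUMNS if col not in data.columns]
    if missing:
        raise ValueError(f"CSV is missing expected columns: {missing}")
    return data
```

With this helper, the load line would become `df = load_requests('Uber Request Data.csv')`.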

### Dataset First View

In [None]:
df.head()

### Dataset Rows & Columns count

In [None]:
print("Number of rows:", df.shape[0])
print("Number of columns:", df.shape[1])

### Dataset Information

In [None]:
df.info()

#### Duplicate Values

In [None]:
print("Duplicate values:", df.duplicated().sum())

#### Missing Values/Null Values

In [None]:
# Missing Values / Null Values Count
print("Missing values in each column:")
print(df.isnull().sum())

In [None]:
# Heatmap for Missing Values (matplotlib and seaborn are already imported above)
plt.figure(figsize=(10, 6))
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Values Heatmap', fontsize=14)
plt.show()

### What did you know about your dataset?

The dataset contains details of Uber ride requests, including the request timestamp, drop timestamp, pickup point, and ride status. It consists of approximately 6,769 rows and 6 columns. The key columns are:

*   Request id: Unique identifier for each request.
*   Request timestamp & Drop timestamp: Need conversion to datetime for analysis.
*   Pickup point: Either City or Airport.
*   Status: Indicates whether the trip was completed, canceled, or no car was available.

Initial observations:

*   There are no duplicate values in the dataset.
*   The Drop timestamp column has missing values because of cancellations or unavailability of cars.
*   Data cleaning is required, including timestamp conversion and feature extraction (hour, day, etc.) for deeper analysis.
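
The claim that missing drop timestamps come from cancellations or unavailable cars can be checked directly by counting missing values per status. The sketch below uses a tiny hand-made sample rather than the real data; the exact status label strings are assumptions and should be compared with `df['Status'].unique()`.

```python
import pandas as pd

# Hand-made stand-in for the real data (illustrative values only)
sample = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
    "Drop timestamp": ["11/7/2016 13:00", None, None, "12/7/2016 09:10"],
})

# Number of missing drop timestamps within each ride status
missing_by_status = sample["Drop timestamp"].isnull().groupby(sample["Status"]).sum()
print(missing_by_status)
```

If the explanation holds for the real dataset, the count for completed trips should be zero and every other status should account for all of the missing values.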

## ***2. Understanding Your Variables***

In [None]:
# Display all column names
print("Columns in the dataset:")
print(df.columns.tolist())


In [None]:
# Dataset statistical summary (include='all' covers categorical columns as well as numerical ones)
print("Statistical summary of all columns:")
print(df.describe(include='all'))


### Variables Description

The dataset consists of the following key variables:

*   Request id: Unique identifier for each request.
*   Pickup point: Location from which the trip is requested (City/Airport).
*   Driver id: Unique identifier for the driver (if assigned).
*   Status: Current state of the trip request (Completed, Canceled, or No Cars Available).
*   Request timestamp: Date and time when the ride was requested.
*   Drop timestamp: Date and time when the ride ended (if applicable).

### Check Unique Values for each variable.

In [None]:
# Unique values for each variable
for col in df.columns:
    print(f"{col} : {df[col].nunique()} unique values")


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Convert timestamps to datetime format
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'], dayfirst=True, errors='coerce')
df['Drop timestamp'] = pd.to_datetime(df['Drop timestamp'], dayfirst=True, errors='coerce')

# Create new columns for analysis
df['Request Hour'] = df['Request timestamp'].dt.hour
df['Request Day'] = df['Request timestamp'].dt.day
df['Request Date'] = df['Request timestamp'].dt.date
df['Day_of_Week'] = df['Request timestamp'].dt.day_name()

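# Calculate Trip_Duration in minutes, measured from request time to drop time
# (NaN for requests that were never completed, since they have no Drop timestamp)
df['Trip_Duration'] = (df['Drop timestamp'] - df['Request timestamp']).dt.total_seconds() / 60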

# Handle missing values in 'Drop timestamp'
# Missing drop timestamps indicate canceled or unfulfilled rides
print("Missing Drop timestamp count:", df['Drop timestamp'].isnull().sum())

# Missing 'Drop timestamp' values are intentionally left as NaT:
# they mark the unfulfilled requests that make up the demand-supply gap,
# so no fill is applied here

# Remove duplicates if any
df.drop_duplicates(inplace=True)

# Check for null values after cleaning
print("\nMissing values after cleaning:")
print(df.isnull().sum())

# Check dataset shape
print("\nDataset Shape after cleaning:", df.shape)

# Preview cleaned data
print("\nData after cleaning:")
print(df.head())


### What all manipulations have you done and insights you found?

Manipulations:

*   Converted Request timestamp and Drop timestamp to datetime for time-based analysis.
*   Extracted Request Hour, Request Day, Request Date, and Day_of_Week to analyze peak demand patterns, and derived Trip_Duration in minutes.
*   Identified missing values in Drop timestamp, which indicate supply shortages or cancellations.
*   Dropped duplicate entries, if any, to maintain data quality.

Insights:

*   More than 2,500 rides have no Drop timestamp, meaning these requests were unfulfilled.
*   Most missing drop timestamps occur during peak morning and evening hours, showing a demand-supply gap.
*   The extracted hour feature will help visualize peak demand hours.
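
One caveat about the timestamp conversion: `errors='coerce'` silently turns anything it cannot parse into `NaT`, and depending on the pandas version a column that mixes string layouts may lose rows this way. The sketch below assumes two day-first layouts (slash-separated without seconds, dash-separated with seconds). Both layouts and the helper name `parse_mixed_timestamps` are assumptions to verify against the raw CSV, not facts stated in this notebook.

```python
import pandas as pd

def parse_mixed_timestamps(series):
    """Parse day-first timestamps written in either of two assumed layouts."""
    # Layout A (assumed): "11/7/2016 11:51" -> day/month/year, no seconds
    slash = pd.to_datetime(series, format="%d/%m/%Y %H:%M", errors="coerce")
    # Layout B (assumed): "13-07-2016 08:33:16" -> day-month-year with seconds
    dash = pd.to_datetime(series, format="%d-%m-%Y %H:%M:%S", errors="coerce")
    # Keep layout A where it parsed, otherwise fall back to layout B
    return slash.fillna(dash)

raw = pd.Series(["11/7/2016 11:51", "13-07-2016 08:33:16", None])
parsed = parse_mixed_timestamps(raw)
print(parsed)
```

A quick sanity check after any conversion is to compare the number of `NaT` values in Request timestamp with the number of missing values before parsing; requests should never lose their request time.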



## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 : Distribution of Ride Status

In [None]:
# Chart 1: Distribution of Ride Status
plt.figure(figsize=(7,5))
sns.countplot(x='Status', data=df, palette='Set2')
plt.title('Distribution of Ride Status', fontsize=14)
plt.xlabel('Ride Status')
plt.ylabel('Number of Requests')
plt.show()


##### 1. Why did you pick the specific chart?

To get an overall understanding of how many rides were successfully completed vs. those canceled or unfulfilled. This helps to identify if there is a significant issue of cancellations or lack of cars, which directly impacts the business.

##### 2. What is/are the insight(s) found from the chart?

A large proportion of requests end as "No Cars Available".

Cancellations also contribute significantly to the supply-demand mismatch.

Completed trips make up a smaller share than expected, pointing to an operational inefficiency. This chart does not show timing; the hourly charts that follow show when the problem occurs.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Identifying high "No Cars Available" cases can help Uber allocate more drivers during peak times, reducing customer dissatisfaction and improving revenue. Similarly, understanding cancellation trends can guide policy changes for drivers.
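
Raw counts are easier to compare once they are expressed as shares of all requests. The sketch below shows one way to do that with `value_counts(normalize=True)`; the sample values are made up for illustration and the status labels are assumed.

```python
import pandas as pd

# Illustrative sample: 10 requests with made-up outcomes
statuses = pd.Series(
    ["Trip Completed"] * 4 + ["No Cars Available"] * 4 + ["Cancelled"] * 2,
    name="Status",
)

# Share of each status as a percentage of all requests
status_share = statuses.value_counts(normalize=True).mul(100).round(1)
print(status_share)
```

On the real data the same expression would be `df['Status'].value_counts(normalize=True)`.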

#### Chart - 2: Requests by Hour of Day

In [None]:
# Chart 2: Requests by Hour of Day
# 'Request timestamp' was already converted to datetime during data wrangling
df['Hour'] = df['Request timestamp'].dt.hour

plt.figure(figsize=(8,5))
sns.countplot(x='Hour', data=df, palette='coolwarm')
plt.title('Ride Requests by Hour of Day', fontsize=14)
plt.xlabel('Hour of Day')
plt.ylabel('Number of Requests')
plt.show()


##### 1. Why did you pick the specific chart?

To identify the peak hours for ride requests, which is crucial for understanding demand trends and planning driver allocation strategies.

##### 2. What is/are the insight(s) found from the chart?

Ride requests peak during 7–9 AM and 5–8 PM, which corresponds to office commute timings. Midday and late-night requests are relatively low.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Allocating more drivers during peak hours can improve service availability, reduce cancellations, and enhance customer satisfaction. Failure to manage these hours leads to unmet demand and revenue loss.

#### Chart - 3 : Demand vs. Supply Status by Hour

In [None]:
# Chart 3: Demand vs Supply by Hour
status_hour = df.groupby(['Hour','Status']).size().unstack()
status_hour.plot(kind='bar', stacked=True, figsize=(12,6), colormap='Paired')
plt.title('Ride Status by Hour of Day', fontsize=14)
plt.xlabel('Hour of Day')
plt.ylabel('Number of Requests')
plt.xticks(rotation=0)
plt.show()


##### 1. Why did you pick the specific chart?

To understand the supply-demand gap across different hours and identify when unavailability or cancellations are highest.

##### 2. What is/are the insight(s) found from the chart?

High "No Cars Available" counts during the evening (5–8 PM).

High cancellation counts during the morning peak (7–9 AM).

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Ensuring sufficient drivers during peak hours and incentivizing them to accept rides during the morning rush can reduce cancellations and improve customer trust.
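
The gap discussed here can also be put into numbers. The sketch below treats every request as demand and only completed trips as supply, which is one reasonable definition but an assumption of this example. The sample rows are made up, and `gap` is a hypothetical name.

```python
import pandas as pd

# Illustrative sample of requests with their hour and outcome
sample = pd.DataFrame({
    "Hour": [8, 8, 8, 18, 18, 18, 18, 13],
    "Status": ["Trip Completed", "Cancelled", "Cancelled",
               "Trip Completed", "No Cars Available", "No Cars Available",
               "No Cars Available", "Trip Completed"],
})

# Demand: every request made in the hour, whatever its outcome
demand = sample.groupby("Hour").size()
# Supply: only the requests that ended as a completed trip
supply = (
    sample[sample["Status"] == "Trip Completed"]
    .groupby("Hour").size()
    .reindex(demand.index, fill_value=0)
)

gap = pd.DataFrame({"demand": demand, "supply": supply})
gap["gap"] = gap["demand"] - gap["supply"]
gap["gap_pct"] = (gap["gap"] / gap["demand"] * 100).round(1)
print(gap)
```

Sorting the result by `gap` or `gap_pct` gives a ranked list of the hours that need the most attention.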

#### Chart - 4 : Pickup Point Analysis

In [None]:
# Chart 4: Pickup Point Analysis
plt.figure(figsize=(7,5))
sns.countplot(x='Pickup point', data=df, palette='Set3')
plt.title('Requests by Pickup Point', fontsize=14)
plt.xlabel('Pickup Point')
plt.ylabel('Number of Requests')
plt.show()


##### 1. Why did you pick the specific chart?

To determine the most frequent pickup location (City vs. Airport), helping prioritize resource allocation.

##### 2. What is/are the insight(s) found from the chart?

Most requests originate from the City, indicating the primary focus should be on city demand management.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Prioritizing drivers in city areas during rush hours can maximize revenue and reduce wait times for customers.

#### Chart - 5 : Status by pickup point

In [None]:
# Chart 5: Status by Pickup Point
plt.figure(figsize=(8,6))
sns.countplot(x='Pickup point', hue='Status', data=df, palette='muted')
plt.title('Ride Status by Pickup Point', fontsize=14)
plt.xlabel('Pickup Point')
plt.ylabel('Number of Requests')
plt.legend(title='Status')
plt.show()


##### 1. Why did you pick the specific chart?

To analyze whether supply-demand gaps differ between the City and Airport pickup points.

##### 2. What is/are the insight(s) found from the chart?

Airport rides face higher "No Cars Available".

City rides face more cancellations by drivers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Introducing airport-specific incentives and policies to discourage driver cancellations in the city can improve operational efficiency.
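
Because the two pickup points do not receive the same number of requests, rates are a fairer comparison than counts. A minimal sketch using `pd.crosstab` with row-wise normalization follows; the sample rows are made up and the status labels are assumed.

```python
import pandas as pd

# Illustrative sample: four requests per pickup point
sample = pd.DataFrame({
    "Pickup point": ["City"] * 4 + ["Airport"] * 4,
    "Status": ["Trip Completed", "Trip Completed", "Cancelled", "No Cars Available",
               "Trip Completed", "No Cars Available", "No Cars Available", "No Cars Available"],
})

# Percentage of each status within each pickup point (each row sums to 100)
status_rate = (
    pd.crosstab(sample["Pickup point"], sample["Status"], normalize="index")
    .mul(100)
    .round(1)
)
print(status_rate)
```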

#### Chart - 6 : Heatmap – Hour vs. Status

In [None]:
# Chart 6: Heatmap for Hour vs Status
pivot = df.pivot_table(index='Hour', columns='Status', values='Request id', aggfunc='count')
plt.figure(figsize=(10,6))
sns.heatmap(pivot, annot=True, fmt='g', cmap='YlGnBu')
plt.title('Heatmap: Hour vs Status', fontsize=14)
plt.show()


##### 1. Why did you pick the specific chart?

To visualize the critical hours where cancellations or unavailability are most problematic in one consolidated view.

##### 2. What is/are the insight(s) found from the chart?

Morning (7–9 AM): High cancellations.

Evening (5–8 PM): High unavailability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. These insights help Uber optimize driver scheduling and incentive programs during high-demand periods to close the supply-demand gap.

#### Chart - 7 : Average Trip Duration by Pickup Point

In [None]:
# Chart 7: Average Trip Duration by Pickup Point
# errorbar=None (seaborn >= 0.12) replaces the deprecated ci=None
plt.figure(figsize=(8, 5))
sns.barplot(data=df, x='Pickup point', y='Trip_Duration', errorbar=None, palette='viridis')
plt.title('Average Trip Duration by Pickup Point', fontsize=14)
plt.xlabel('Pickup Point', fontsize=12)
plt.ylabel('Average Trip Duration (minutes)', fontsize=12)
plt.show()


##### 1. Why did you pick the specific chart?

To analyze if trip duration varies significantly between City and Airport pickup points. This helps understand travel patterns and potential bottlenecks.

##### 2. What is/are the insight(s) found from the chart?

Trips starting from Airport generally have longer durations compared to those starting from the City.

This indicates the need for more accurate ETA predictions and optimized route suggestions for Airport trips.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. By understanding trip duration differences, Uber can improve scheduling, allocate drivers efficiently, and predict surge pricing better for Airport pickups, leading to improved customer satisfaction and optimized revenue.
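
A bar of averages hides how many trips sit behind each bar and how skewed they are. The sketch below summarizes duration for completed trips only, with count, mean, and median side by side. The values are made up, and `duration_summary` is a hypothetical name.

```python
import pandas as pd

# Illustrative sample; unfulfilled requests have no duration
sample = pd.DataFrame({
    "Pickup point": ["City", "City", "Airport", "Airport", "Airport"],
    "Status": ["Trip Completed", "Cancelled", "Trip Completed",
               "Trip Completed", "No Cars Available"],
    "Trip_Duration": [40.0, None, 50.0, 60.0, None],
})

# Restrict to completed trips before aggregating
completed = sample[sample["Status"] == "Trip Completed"]
duration_summary = (
    completed.groupby("Pickup point")["Trip_Duration"]
    .agg(["count", "mean", "median"])
)
print(duration_summary)
```

Note that Trip_Duration in this notebook is measured from the request time, so it includes any wait before pickup.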

#### Chart - 8 :  Hourly Demand for Completed Trips vs. Canceled Trips

In [None]:
# Chart 8: Hourly Demand for Completed vs. Canceled Trips
plt.figure(figsize=(12, 6))
sns.countplot(data=df, x='Hour', hue='Status', palette='coolwarm')
plt.title('Hourly Demand: Completed vs. Canceled Trips', fontsize=14)
plt.xlabel('Hour of Day', fontsize=12)
plt.ylabel('Number of Requests', fontsize=12)
plt.legend(title='Status')
plt.show()


##### 1. Why did you pick the specific chart?

To identify time slots where cancellations are highest, which is critical for reducing demand-supply mismatch and improving service reliability.

##### 2. What is/are the insight(s) found from the chart?

Morning peak (7–9 AM): Highest cancellations by drivers.

Evening peak (5–8 PM): “No cars available” dominates, indicating driver shortage during these hours.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Implementing driver incentives during peak hours and introducing dynamic pricing during these time slots can reduce cancellations and boost supply, resulting in improved service availability.
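
To reason about time slots rather than single hours, the request hour can be bucketed with `pd.cut`. The slot boundaries below are assumptions derived from the peaks named in the text (7–9 AM and 5–8 PM); the remaining slot names are illustrative.

```python
import pandas as pd

hours = pd.Series([3, 7, 9, 12, 17, 20, 22], name="Hour")

# Right-inclusive bins: (-1, 6], (6, 9], (9, 16], (16, 20], (20, 23]
bins = [-1, 6, 9, 16, 20, 23]
labels = ["Early morning", "Morning peak", "Daytime", "Evening peak", "Late night"]

time_slot = pd.cut(hours, bins=bins, labels=labels)
print(time_slot.tolist())
```

Applied to the notebook's data, `df['Time Slot'] = pd.cut(df['Hour'], bins=bins, labels=labels)` would allow the status counts to be grouped by slot instead of by hour.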

#### Chart - 9 : Pickup Point vs. Status Analysis

In [None]:
# Chart 9: Pickup Point vs. Status Analysis
plt.figure(figsize=(8, 5))
sns.countplot(data=df, x='Pickup point', hue='Status', palette='Set2')
plt.title('Pickup Point vs. Status', fontsize=14)
plt.xlabel('Pickup Point', fontsize=12)
plt.ylabel('Number of Requests', fontsize=12)
plt.legend(title='Status')
plt.show()


##### 1. Why did you pick the specific chart?

To compare demand and supply at City vs. Airport, helping Uber allocate drivers based on area-specific requirements.

##### 2. What is/are the insight(s) found from the chart?

City: Majority of cancellations occur here, especially during morning peak hours.

Airport: Major issue is “No cars available” in the evening, highlighting the need for more drivers at Airport during evening hours.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. By strategically positioning drivers at the Airport during evening hours and enforcing strict cancellation policies in the City during mornings, Uber can reduce mismatch and increase revenue.

#### Chart - 10 : Weekly Demand Trend

In [None]:
# Chart 10: Weekly Demand Trend
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='Day_of_Week', hue='Status', palette='muted')
plt.title('Weekly Demand Trend by Status', fontsize=14)
plt.xlabel('Day of the Week', fontsize=12)
plt.ylabel('Number of Requests', fontsize=12)
plt.legend(title='Status')
plt.show()


##### 1. Why did you pick the specific chart?

To examine if demand and supply issues vary by day of the week, which helps plan weekly driver deployment strategies.

##### 2. What is/are the insight(s) found from the chart?

Weekdays (Mon–Fri): Demand and mismatch are highest, especially during working hours.

Weekends: Slight improvement, but evening peak remains an issue.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Weekly demand patterns help Uber plan driver shifts, schedule surge pricing, and run promotions during low-demand periods to optimize revenue.
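
Before comparing weekdays with weekends, it is worth confirming which days the data actually covers. The sketch below shows the check on three made-up timestamps; on the real data the same calls would run on `df['Request timestamp']`.

```python
import pandas as pd

# Illustrative timestamps (not taken from the dataset)
timestamps = pd.Series(
    pd.to_datetime(["2016-07-11 08:00", "2016-07-12 18:30", "2016-07-15 21:15"])
)

day_names = timestamps.dt.day_name()
# True only if at least one request falls on a Saturday or Sunday
has_weekend = day_names.isin(["Saturday", "Sunday"]).any()

print(day_names.unique())
print("Weekend data present:", has_weekend)
```

If no weekend days are present, any weekday-versus-weekend comparison cannot be supported by this dataset.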

#### Chart - 11 : Demand vs. Supply Gap by Pickup Point

In [None]:
# Chart 11: Demand vs Supply Gap by Pickup Point
plt.figure(figsize=(8, 6))
sns.countplot(data=df, x='Pickup point', hue='Status', palette='Set2')
plt.title('Demand vs Supply by Pickup Point', fontsize=14)
plt.xlabel('Pickup Point')
plt.ylabel('Number of Requests')
plt.legend(title='Status')
plt.show()


##### 1. Why did you pick the specific chart?

To analyze the distribution of completed, canceled, and unfulfilled rides across City and Airport, which highlights where the supply-demand gap is the highest.

##### 2. What is/are the insight(s) found from the chart?

Both pickup points show many unfulfilled rides, but the cause differs: cancellations are concentrated in the City, while "No Cars Available" dominates at the Airport, as already seen in Chart 5.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights help Uber target each pickup point separately: reducing driver cancellations in the City and adding supply at the Airport during high-demand hours.

#### Chart - 12 : Hourly Analysis for Pickup Points

In [None]:
# Chart 12: Hourly Demand vs Supply for Pickup Points
plt.figure(figsize=(12, 6))
sns.countplot(data=df, x='Request Hour', hue='Pickup point', palette='coolwarm')
plt.title('Hourly Requests by Pickup Point', fontsize=14)
plt.xlabel('Hour of Day')
plt.ylabel('Number of Requests')
plt.legend(title='Pickup Point')
plt.show()


##### 1. Why did you pick the specific chart?

To identify at which hours City vs Airport faces the highest demand, enabling precise scheduling for drivers.



##### 2. What is/are the insight(s) found from the chart?

City: High demand between 5–9 AM and 5–9 PM.

Airport: Demand peaks around early morning and late evening.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. These hourly patterns help reallocate drivers to each pickup point during its own peak hours.

#### Chart - 13 : Trip Duration Analysis for Completed Rides

In [None]:
# Chart 13: Trip Duration for Completed Rides
plt.figure(figsize=(8, 6))
sns.histplot(df[df['Status'] == 'Trip Completed']['Trip_Duration'], bins=30, kde=True, color='green')
plt.title('Trip Duration Distribution (Completed Rides)', fontsize=14)
plt.xlabel('Trip Duration (minutes)')
plt.ylabel('Frequency')
plt.show()


##### 1. Why did you pick the specific chart?

To evaluate how long most trips take, which is useful for fare optimization and driver efficiency analysis.

##### 2. What is/are the insight(s) found from the chart?

Most completed trips take between 15–40 minutes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, optimizing fares and incentives for longer trips can boost driver satisfaction and service quality.

#### Chart - 14 - Day-wise Demand Trend

In [None]:
# Chart 14: Day-wise Demand Trend
plt.figure(figsize=(10, 6))
sns.countplot(data=df, x='Request Day', palette='viridis')
plt.title('Day-wise Ride Request Trend', fontsize=14)
plt.xlabel('Day of Month')
plt.ylabel('Number of Requests')
plt.show()


##### 1. Why did you pick the specific chart?

To check if demand varies significantly across different days of the month.

##### 2. What is/are the insight(s) found from the chart?

Demand is relatively consistent, with slight peaks during weekends.

#### Chart - 15 - Pair Plot for Numerical Features

In [None]:
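# Chart 15: Pair Plot
# Drop rows without a Trip_Duration (unfulfilled requests) so both panels use the same trips
sns.pairplot(df[['Request Hour', 'Trip_Duration']].dropna(), diag_kind='kde')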
plt.suptitle('Pair Plot of Numerical Features', y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

To understand the correlation between time of request and trip duration.

##### 2. What is/are the insight(s) found from the chart?

No strong linear relationship is visible, suggesting that trip duration does not depend much on the hour of the request.
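
The visual impression from a pair plot can be backed by a correlation coefficient. The sketch below uses `Series.corr`, which computes Pearson correlation by default and skips rows where either value is missing. The numbers are made up so that the result is zero; note that Pearson only captures linear association.

```python
import pandas as pd

# Illustrative sample; the missing duration stands for an unfulfilled request
sample = pd.DataFrame({
    "Request Hour": [6, 7, 8, 9, 10],
    "Trip_Duration": [30.0, 50.0, None, 50.0, 30.0],
})

# Pearson correlation on the rows where both values are present
r = sample["Request Hour"].corr(sample["Trip_Duration"])
print(round(r, 3))
```

On the notebook's data the equivalent call is `df['Request Hour'].corr(df['Trip_Duration'])`.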

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

To address the supply-demand gap and improve customer satisfaction, Uber should implement the following strategies:

*   Driver Incentives during Peak Hours: Offer bonuses to drivers during high-demand periods (e.g., 5–9 AM, 5–9 PM) to increase availability and reduce cancellations.
*   Dynamic Driver Allocation: Use predictive analytics to position drivers in high-demand areas, especially in the City during rush hours.
*   Real-Time Demand Monitoring: Implement real-time alerts for surge demand zones and notify idle drivers to relocate.
*   Customer Communication: Provide estimated wait times and incentivize customers to choose flexible time slots during peak periods.

These measures will help balance demand and supply, reducing unfulfilled requests and enhancing overall efficiency.



# **Conclusion**

The EDA reveals significant insights into Uber's supply-demand dynamics. The main findings show that:

*   Peak Demand: Most requests occur during the morning and evening rush hours.
*   Major Supply Gap: A large share of requests goes unfulfilled, especially during peak times.
*   Airport vs City: The Airport mainly suffers from "No Cars Available" in the evening, while the City mainly suffers from driver cancellations in the morning.
*   Status Breakdown: A significant number of trips are canceled or remain unassigned, indicating a shortage of drivers during specific time windows.

By leveraging these insights, Uber can improve operational efficiency through dynamic driver allocation, incentive schemes, and advanced demand forecasting. Implementing these recommendations will help reduce cancellations, improve user satisfaction, and enhance revenue growth.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***