<a href="https://colab.research.google.com/github/ajaymolsivan/Uber-Ride-Request-Analysis/blob/main/Sample_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Uber Ride Request Analysis






##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

This project focuses on analyzing the Uber ride request data to uncover key operational issues affecting ride fulfillment. The dataset consists of thousands of ride requests made to Uber, including details such as pickup point (City or Airport), request timestamp, drop timestamp, driver ID, and the final status of the ride (Trip Completed, Cancelled, or No Cars Available).

The primary goal of this project was to perform a comprehensive exploratory data analysis to understand the patterns in ride requests and diagnose the root causes of service failures. The analysis began with data cleaning and preprocessing: converting timestamp fields to datetime format, handling missing values, checking for duplicates, and creating new features like hour, day, and time slots.

Through visualizations and statistical exploration, several key trends were identified. Most ride requests occur during two specific windows — morning rush (5–9 AM) and evening rush (5–9 PM). During the morning, a high number of cancellations were observed in trips originating from the City, while the evening showed a large number of "No Cars Available" responses, especially for pickups from the Airport. This indicates a mismatch between demand and supply during critical time windows.

Further, it was found that most failed requests had no driver assigned, suggesting inefficiencies in Uber's driver allocation strategy. A correlation heatmap and pair plots showed that numerical relationships in the dataset were weak, emphasizing that categorical and temporal factors drive most of the insights.

The insights gained from this analysis directly point toward actionable business strategies. These include enhancing driver availability during rush hours, dynamically rebalancing driver deployment between City and Airport, improving driver-passenger matching algorithms, and leveraging predictive models to anticipate demand spikes. If implemented, these solutions could significantly improve trip completion rates, reduce customer dissatisfaction, and positively impact Uber's operational efficiency and market reputation.

This EDA project not only answers the "what" and "when" of service failures but also provides recommendations on "how" to resolve them, bridging the gap between data and decision-making.

# **GitHub Link -**

https://github.com/ajaymolsivan/Uber-Ride-Request-Analysis

# **Problem Statement**


Uber is experiencing a high number of unfulfilled ride requests, especially during peak hours. These unfulfilled requests — marked as “Cancelled” or “No Cars Available” — lead to customer dissatisfaction and revenue loss. The objective is to identify patterns in trip requests and determine the operational bottlenecks causing these service failures.

#### **Define Your Business Objective?**

The business objective is to uncover the key factors contributing to unfulfilled Uber ride requests and propose actionable solutions that improve the overall fulfillment rate. The goal is to optimize driver allocation, minimize cancellations and car unavailability, and ensure a smoother ride experience for users, especially during critical time slots and locations.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv('/mnt/data/Uber Request Data.csv')


### Dataset First View

In [None]:
# Dataset First Look
df.head()


### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape


### Dataset Information

In [None]:
# Dataset Info
df.info()


#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()


In [None]:
# Visualizing the missing values (one row per request, one column per field)
sns.heatmap(df.isnull(), cbar=False)
plt.title('Missing Values by Column')
plt.show()


### What did you know about your dataset?

The dataset contains detailed information about ride requests made through Uber over a specific time period. It includes variables such as Request id, Pickup point, Driver id, Status, Request timestamp, and Drop timestamp. Upon initial exploration, I observed that the Status column captures the final outcome of each request—whether the trip was completed, cancelled, or no cars were available. The Request timestamp and Drop timestamp fields were initially in string format and required conversion to datetime for time-based analysis. Several missing values were identified, especially in the Drop timestamp and Driver id columns, which were mostly associated with unfulfilled requests. The dataset has a mix of categorical and datetime variables, with no duplicate records found. These insights laid the groundwork for further data cleaning, feature engineering, and analysis to identify patterns in demand, cancellations, and supply gaps across different hours and pickup locations.
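The link between missing values and unfulfilled requests can be checked directly. The following is a minimal sketch, assuming the column names listed above; the helper `missing_by_status` and the toy rows are hypothetical and only illustrate the technique.

```python
import pandas as pd


def missing_by_status(frame: pd.DataFrame) -> pd.DataFrame:
    """Count missing Driver id / Drop timestamp values for each Status."""
    # True where a value is missing; summing booleans counts the missing cells.
    flags = frame[["Driver id", "Drop timestamp"]].isnull()
    return flags.groupby(frame["Status"]).sum()


# Toy rows for illustration only (not taken from the Uber dataset).
sample = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled",
               "No Cars Available", "No Cars Available"],
    "Driver id": [1.0, 2.0, None, None],
    "Drop timestamp": ["2016-07-11 13:00:00", None, None, None],
})

missing_summary = missing_by_status(sample)
print(missing_summary)
```

In the notebook, `missing_by_status(df)` would show which statuses account for the nulls in each column.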



## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns


In [None]:
# Dataset Describe
df.describe(include='all')

### Variables Description

The Uber Request Data dataset contains information about ride requests made through the Uber platform. Each row in the dataset represents a unique ride request and includes several important variables. The Request id serves as a unique identifier for each trip. The Pickup point indicates the location from where the trip was requested, either the City or the Airport. The Driver id represents the assigned driver's identifier, which may be missing if a trip was cancelled or no car was available. The Status column shows the final outcome of the request—commonly Trip Completed, Cancelled, or No Cars Available. The Request timestamp records the date and time the ride was requested, while the Drop timestamp captures when the ride ended. This column is often blank when the trip was not completed.

During data wrangling, new variables can be derived to aid analysis. These include the hour and day of the request (extracted from the Request timestamp), custom time slots like morning or evening peak, and trip duration calculated from the time difference between drop and request timestamps. These features help uncover patterns in demand, cancellation rates, and service gaps at different times of day or pickup locations.

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in df.columns:
    print(f"{col} → {df[col].nunique()} unique values")
    print(df[col].unique(), '\n')


## ***3. Data Wrangling***

### Data Wrangling Code

In [None]:
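# Parse the timestamp columns so time-based features can be derived.
# Note: if the raw strings are day-first (e.g. 11/7/2016 meaning 11 July),
# pass dayfirst=True to pd.to_datetime; check df.head() before deciding.
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'])
# errors='coerce' turns missing or malformed drop times into NaT
df['Drop timestamp'] = pd.to_datetime(df['Drop timestamp'], errors='coerce')

# Hour of day (0-23) and weekday name of each request
df['hour'] = df['Request timestamp'].dt.hour
df['day'] = df['Request timestamp'].dt.day_name()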


### What all manipulations have you done and insights you found?

**Datetime Conversion:**

*   Converted Request timestamp and Drop timestamp to proper datetime formats using pd.to_datetime.
*   Used errors='coerce' on Drop timestamp so that missing or malformed values become NaT.

**Feature Engineering:**

*   Extracted hour, day of week, and date from Request timestamp.
*   Created a custom time slot column to categorize time into ranges like "Early Morning", "Morning Rush", "Daytime", and "Evening Rush".
*   Calculated trip duration (in minutes) for completed trips by subtracting Request timestamp from Drop timestamp.

**Missing Value Handling:**

*   Identified missing values in Drop timestamp and Driver id for unfulfilled trips.
*   Used isnull() to analyze how missing values correlate with Status.

**Categorical Cleaning:**

*   Standardized categories in Pickup point and Status columns where needed (e.g., fixing inconsistent case or spacing).

**Duplicate Check:**

*   Checked for duplicate rows using df.duplicated(); none were found.
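The feature-engineering steps described above can be sketched as follows. This is an illustration rather than the notebook's exact code: the helper name `add_time_features`, the slot boundaries, and the extra "Late Night" label for hours 22 and 23 are assumptions, chosen so that the rush windows match the hour filters used in the charts (5 to 9 and 17 to 21 inclusive).

```python
import pandas as pd

# Assumed slot boundaries (left-inclusive): [0,5), [5,10), [10,17), [17,22), [22,24)
SLOT_BINS = [0, 5, 10, 17, 22, 24]
SLOT_LABELS = ["Early Morning", "Morning Rush", "Daytime",
               "Evening Rush", "Late Night"]


def add_time_features(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with hour, date, time_slot and trip_duration columns."""
    out = frame.copy()
    out["hour"] = out["Request timestamp"].dt.hour
    out["date"] = out["Request timestamp"].dt.date
    # right=False makes each bin include its lower edge, e.g. hour 5 -> Morning Rush
    out["time_slot"] = pd.cut(out["hour"], bins=SLOT_BINS,
                              labels=SLOT_LABELS, right=False)
    # Duration in minutes; stays NaN when there is no drop timestamp
    delta = out["Drop timestamp"] - out["Request timestamp"]
    out["trip_duration"] = delta.dt.total_seconds() / 60
    return out


# Toy rows for illustration only (not taken from the Uber dataset).
sample = pd.DataFrame({
    "Request timestamp": pd.to_datetime(
        ["2016-07-11 04:30:00", "2016-07-11 08:15:00", "2016-07-11 18:45:00"]),
    "Drop timestamp": pd.to_datetime(
        ["2016-07-11 05:10:00", None, "2016-07-11 19:30:00"]),
})

features = add_time_features(sample)
print(features[["hour", "time_slot", "trip_duration"]])
```

Working on a copy leaves the original frame untouched, which makes the cell safe to re-run.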


**Insights You Found:**

**Demand Patterns:**

*   The number of requests is significantly higher during Morning Rush (5 AM – 9 AM) and Evening Rush (5 PM – 9 PM).
*   Most requests in the morning originate from the City going toward the Airport.

**Supply Issues:**

*   No Cars Available is the most frequent issue during the Evening Rush, particularly at the Airport pickup point.
*   In contrast, Cancelled trips are more common in the Morning Rush, especially from the City.

**Peak Hours vs Status:**

*   Trip Completed status dominates during off-peak hours.
*   Unfulfilled requests (Cancelled or No Cars Available) increase significantly during rush hours, suggesting a supply-demand mismatch.

**Pickup Point Analysis:**

*   The City faces higher cancellation rates.
*   The Airport suffers more from "No Cars Available" issues, indicating fewer available drivers during certain hours.
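Claims of this kind are easiest to verify with a normalized cross-tabulation. The sketch below assumes a `time_slot` column has already been derived; the helper `status_share` and the toy rows are hypothetical.

```python
import pandas as pd


def status_share(frame: pd.DataFrame, by: list) -> pd.DataFrame:
    """Share of each Status within every combination of the `by` columns."""
    keys = [frame[col] for col in by]
    # normalize="index" makes every row sum to 1
    return pd.crosstab(keys, frame["Status"], normalize="index")


# Toy rows for illustration only (not taken from the Uber dataset).
sample = pd.DataFrame({
    "time_slot": ["Morning Rush"] * 4 + ["Evening Rush"] * 4,
    "Pickup point": ["City"] * 4 + ["Airport"] * 4,
    "Status": ["Cancelled", "Cancelled", "Trip Completed", "Trip Completed",
               "No Cars Available", "No Cars Available",
               "No Cars Available", "Trip Completed"],
})

shares = status_share(sample, ["time_slot", "Pickup point"])
print(shares.round(2))
```

Shares are more informative than raw counts here because the City and the Airport receive different numbers of requests.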

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
sns.countplot(data=df, x='Status')
plt.title('Overall Trip Status Distribution')
plt.show()



##### 1. Why did you pick the specific chart?

To understand the overall performance of Uber requests — how many are completed vs failed.

##### 2. What is/are the insight(s) found from the chart?

A large portion of trips are either cancelled or marked as “No Cars Available”.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Negative — This indicates a high failure rate in fulfilling customer requests, leading to customer dissatisfaction and loss of revenue.
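The failure rate mentioned above can be put into numbers alongside the chart. This is a minimal sketch: the helper `failure_rate` is hypothetical, and it treats every status other than "Trip Completed" as a failure.

```python
import pandas as pd


def failure_rate(status: pd.Series,
                 completed_label: str = "Trip Completed") -> float:
    """Fraction of requests that did not end as a completed trip."""
    return float((status != completed_label).mean())


# Toy values for illustration only (not taken from the Uber dataset).
sample_status = pd.Series(
    ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"])

# Share of each outcome, then the combined share of unfulfilled requests
status_share = sample_status.value_counts(normalize=True)
overall_failure = failure_rate(sample_status)
print(status_share)
print(f"Failure rate: {overall_failure:.0%}")
```

In the notebook, `failure_rate(df['Status'])` gives the overall share of unfulfilled requests.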



#### Chart - 2

In [None]:
# Chart - 2 visualization code
sns.countplot(data=df, x='Pickup point')
plt.title('Requests by Pickup Point')
plt.show()


##### 1. Why did you pick the specific chart?

To compare the number of requests originating from the City and Airport.


##### 2. What is/are the insight(s) found from the chart?


Slightly more requests are from the City than the Airport.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Positive: it helps allocate more resources or drivers to high-demand areas.
If ignored, a supply mismatch at key locations can cause service failures.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
sns.countplot(data=df, x='hour')
plt.title('Requests by Hour of the Day')
plt.show()


##### 1. Why did you pick the specific chart?

To identify peak demand hours during the day.


##### 2. What is/are the insight(s) found from the chart?


Sharp demand spikes in the morning (5–9 AM) and evening (5–9 PM).



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Positive: Uber can increase driver availability during peak hours. If not acted upon, unmet demand during these hours can hurt user trust.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
sns.countplot(data=df, x='hour', hue='Status')
plt.title('Trip Status by Hour')
plt.show()


##### 1. Why did you pick the specific chart?

To explore the success/failure of trips across different hours.


##### 2. What is/are the insight(s) found from the chart?


Most failures (cancelled/no car) happen during peak hours.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


 Negative — Demand-supply gap is evident during rush hours. Addressing this can significantly improve performance.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
sns.countplot(data=df, x='Pickup point', hue='Status')
plt.title('Trip Status by Pickup Point')
plt.show()


##### 1. Why did you pick the specific chart?

To understand how pickup location affects trip success or failure.





##### 2. What is/are the insight(s) found from the chart?


City has more cancellations, Airport has more "No Cars Available".


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Positive — Customized driver deployment strategies can be built for each pickup point.
 Ignoring this can lead to local inefficiencies.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
morning_rush = df[(df['hour'] >= 5) & (df['hour'] <= 9)]
sns.countplot(data=morning_rush, x='hour', hue='Status')
plt.title('Morning Rush Trip Status')
plt.show()


##### 1. Why did you pick the specific chart?

To analyze performance during the crucial 5–9 AM slot.



##### 2. What is/are the insight(s) found from the chart?


Heavy cancellations from City in morning rush.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Negative — Many customers commuting to the Airport are affected. Leads to loss of brand reliability during important hours.
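Because the chart above is split by status only, attributing the cancellations to a pickup point needs one more breakdown. The sketch below is one way to do that; the helper `status_by_pickup` and the toy rows are hypothetical, and the hour window is inclusive at both ends to match the filter used for the chart.

```python
import pandas as pd


def status_by_pickup(frame: pd.DataFrame,
                     start_hour: int, end_hour: int) -> pd.DataFrame:
    """Count requests per Pickup point and Status for start_hour..end_hour."""
    # Series.between is inclusive on both ends by default
    window = frame[frame["hour"].between(start_hour, end_hour)]
    return pd.crosstab(window["Pickup point"], window["Status"])


# Toy rows for illustration only (not taken from the Uber dataset).
sample = pd.DataFrame({
    "hour": [6, 7, 8, 12, 18],
    "Pickup point": ["City", "City", "Airport", "City", "Airport"],
    "Status": ["Cancelled", "Cancelled", "Trip Completed",
               "Cancelled", "No Cars Available"],
})

morning = status_by_pickup(sample, 5, 9)
print(morning)
```

The same helper covers the evening window with `status_by_pickup(df, 17, 21)`.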

#### Chart - 7

In [None]:
# Chart - 7 visualization code
evening_rush = df[(df['hour'] >= 17) & (df['hour'] <= 21)]
sns.countplot(data=evening_rush, x='hour', hue='Status')
plt.title('Evening Rush Trip Status')
plt.show()


##### 1. Why did you pick the specific chart?

To analyze service in the evening rush (5–9 PM).


##### 2. What is/are the insight(s) found from the chart?


Most requests from Airport end with “No Cars Available”.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Negative — Passengers arriving at Airport are stranded, causing serious customer dissatisfaction.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
df['day'] = df['Request timestamp'].dt.day_name()
sns.countplot(data=df, x='day', hue='Status')
plt.title('Trip Status by Day of Week')
plt.show()



##### 1. Why did you pick the specific chart?

To assess if problems are worse on specific weekdays.


##### 2. What is/are the insight(s) found from the chart?


Patterns remain consistent; no significant weekday variation.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive — Uber can focus more on hour-based optimization than weekday-based.
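To back the "no significant weekday variation" reading with a number, the completion rate per weekday can be compared directly. This is a rough sketch with a hypothetical helper `completion_rate_by_day` and toy rows; a small spread between the best and worst day would support the conclusion.

```python
import pandas as pd


def completion_rate_by_day(frame: pd.DataFrame) -> pd.Series:
    """Share of requests per weekday that ended as Trip Completed."""
    completed = frame["Status"].eq("Trip Completed")
    day_names = frame["Request timestamp"].dt.day_name()
    return completed.groupby(day_names).mean()


# Toy rows for illustration only (not taken from the Uber dataset).
sample = pd.DataFrame({
    "Request timestamp": pd.to_datetime(
        ["2016-07-11 08:00:00", "2016-07-11 18:00:00",
         "2016-07-12 08:00:00", "2016-07-12 18:00:00"]),
    "Status": ["Trip Completed", "Cancelled",
               "Trip Completed", "No Cars Available"],
})

rates = completion_rate_by_day(sample)
spread = rates.max() - rates.min()
print(rates)
print(f"Spread between best and worst day: {spread:.2f}")
```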

#### Chart - 9

In [None]:
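# Chart - 9 visualization code
# fill_value=0 keeps the counts as integers even if an hour/pickup pair has no requests
heatmap_data = df.groupby(['hour', 'Pickup point'])['Request id'].count().unstack(fill_value=0)
sns.heatmap(heatmap_data, annot=True, fmt='d', cmap='Blues')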
plt.title('Requests by Hour and Pickup Point')
plt.show()


##### 1. Why did you pick the specific chart?

To visualize request volume by hour and location together.


##### 2. What is/are the insight(s) found from the chart?


City is busy in morning; Airport in evening.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


Positive — Helps in planning dynamic driver deployment strategies.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
# .copy() avoids pandas' SettingWithCopyWarning when adding the date column
cancelled = df[df['Status'] == 'Cancelled'].copy()
cancelled['date'] = cancelled['Request timestamp'].dt.date
cancelled.groupby('date')['Request id'].count().plot(kind='line')
plt.title('Cancelled Requests Over Time')
plt.ylabel('Count')
plt.show()


##### 1. Why did you pick the specific chart?

To see if cancellations increase over time or remain consistent.


##### 2. What is/are the insight(s) found from the chart?


Cancellations peak daily in morning.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Negative — Indicates a persistent problem, not a one-time issue. Needs immediate operational fix.



#### Chart - 11

In [None]:
# Chart - 11 visualization code
# .copy() avoids pandas' SettingWithCopyWarning when adding the date column
no_car = df[df['Status'] == 'No Cars Available'].copy()
no_car['date'] = no_car['Request timestamp'].dt.date
no_car.groupby('date')['Request id'].count().plot(kind='line', color='orange')
plt.title('No Cars Available Over Time')
plt.ylabel('Count')
plt.show()


##### 1. Why did you pick the specific chart?

To study trends in unavailability of vehicles.



##### 2. What is/are the insight(s) found from the chart?

"No Cars Available" responses peak daily in the evening.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Negative: failure to meet demand during key evening hours means lost revenue and users shifting to competitors.


#### Chart - 12

In [None]:
# Chart - 12 visualization code
df['Has Driver'] = df['Driver id'].notnull()
sns.countplot(data=df, x='Status', hue='Has Driver')
plt.title('Driver Allocation and Trip Outcome')
plt.show()


##### 1. Why did you pick the specific chart?

To examine how much trip outcome depends on driver assignment.


##### 2. What is/are the insight(s) found from the chart?


Trips without a driver mostly end up unfulfilled.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Positive — Suggests better driver assignment algorithms could improve trip completion rate.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
df['trip_duration'] = (df['Drop timestamp'] - df['Request timestamp']).dt.total_seconds() / 60
completed = df[df['Status'] == 'Trip Completed']
sns.histplot(data=completed, x='trip_duration', bins=30, kde=True)
plt.title('Trip Duration Distribution (in minutes)')
plt.show()


##### 1. Why did you pick the specific chart?

To understand time taken for completed trips.



##### 2. What is/are the insight(s) found from the chart?


Most trips last 10–40 mins — helpful for planning.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

 Positive — Insights help drivers optimize shift duration and route planning.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
corr_data = df[['hour', 'trip_duration']].corr()
sns.heatmap(corr_data, annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()


##### 1. Why did you pick the specific chart?

To check if numerical variables (like hour and duration) correlate.


##### 2. What is/are the insight(s) found from the chart?


No strong correlation was found between hour and trip duration.

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df[['hour', 'trip_duration']].dropna())
plt.suptitle('Pair Plot - Hour vs Trip Duration', y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?


To explore relationships and spot patterns/clusters.

##### 2. What is/are the insight(s) found from the chart?


No obvious linear trends or clusters.


## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

The primary business objective is to identify why a large number of Uber ride requests are not being fulfilled and how to improve operational efficiency. Based on the analysis, the following actionable suggestions can help Uber address these issues:

*   **Address Peak Hour Failures:** Most trip failures (cancellations and no cars available) occur during the morning (5–9 AM) and evening (5–9 PM) rush hours. Uber should increase driver availability during these critical hours, possibly by offering surge incentives or flexible shift planning to drivers.

*   **Location-Based Strategy:** The City faces higher cancellation rates in the morning, while the Airport struggles with car unavailability in the evening. Uber should consider implementing location-based driver rebalancing, sending idle drivers from low-demand zones to these high-demand pickup points during peak periods.

*   **Improved Driver Assignment System:** Many unfulfilled requests had no driver assigned, indicating a flaw in the matching algorithm or driver availability tracking. Enhancing the driver-passenger matching system can significantly improve fulfillment rates.

*   **Predictive Demand Planning:** Using historical data to forecast demand spikes allows for better supply allocation and real-time intervention. Uber can use ML models to anticipate surge areas and proactively deploy drivers.

*   **Customer Communication Improvements:** Providing real-time updates, expected wait times, and alternate ride suggestions during peak times could reduce customer frustration and improve retention.
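As a starting point for the rebalancing and demand-planning suggestions, the historical gap between requests and completed trips can be tabulated per hour and pickup point. The sketch below is a descriptive baseline, not a predictive model; the helper `hourly_supply_gap` and the toy rows are hypothetical.

```python
import pandas as pd


def hourly_supply_gap(frame: pd.DataFrame) -> pd.DataFrame:
    """Requests, completed trips and their difference per pickup point and hour."""
    keys = ["Pickup point", "hour"]
    demand = frame.groupby(keys).size().rename("requests")
    supply = (frame[frame["Status"] == "Trip Completed"]
              .groupby(keys).size().rename("completed"))
    # Hours with no completed trip are absent from `supply`; treat them as 0
    gap = pd.concat([demand, supply], axis=1).fillna(0).astype(int)
    gap["gap"] = gap["requests"] - gap["completed"]
    return gap.sort_values("gap", ascending=False)


# Toy rows for illustration only (not taken from the Uber dataset).
sample = pd.DataFrame({
    "Pickup point": ["Airport", "Airport", "Airport", "Airport",
                     "City", "City", "City"],
    "hour": [18, 18, 18, 19, 6, 6, 12],
    "Status": ["No Cars Available", "No Cars Available", "Trip Completed",
               "No Cars Available", "Cancelled", "Trip Completed",
               "Trip Completed"],
})

gap_table = hourly_supply_gap(sample)
print(gap_table)
```

The rows at the top of the table are the hour and location pairs where extra drivers would help most.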

# **Conclusion**

The exploratory data analysis of Uber ride requests reveals critical issues related to service unavailability during peak hours. A substantial number of requests are not completed due to either cancellations or lack of available cars, especially from the City in the morning and the Airport in the evening. These failures are not random—they follow consistent hourly and locational patterns. While the dataset shows no strong linear correlation between numerical variables, categorical variables such as Pickup point, Status, and hour offer valuable insights. The solution lies in better driver allocation, real-time demand forecasting, and operational adjustments tailored to both time and location. Implementing these recommendations will not only improve trip fulfillment rates but also enhance customer satisfaction and revenue growth.
