# **Project Name**    - Uber Supply & Demand GAP - EDA



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** Aditya Anilkumar


# **Project Summary**

This project analyzes Uber ride request data to identify patterns and imbalances in supply and demand across different times of the day and pickup locations (City vs Airport).

Using Python (Pandas, NumPy, Matplotlib, Seaborn), the goal is to uncover the root causes behind incomplete trip requests (due to cancellations or unavailability of cars) and recommend data-driven solutions.

# **GitHub Link -**

https://github.com/aditya18101999

# **Problem Statement**


The objective of this project is to analyze the supply-demand gap in Uber requests using the provided dataset. We aim to identify the core reasons for the unmet demand for Uber rides and pinpoint the specific times and locations where this gap is most significant. By understanding these patterns, we can propose data-driven recommendations to help Uber balance its supply and demand more effectively.

#### **Define Your Business Objective?**

The primary goal of this Exploratory Data Analysis (EDA) is to:
1.  Understand the distribution of Uber requests and their statuses (completed, cancelled, no cars available)
2.  Identify peak demand hours and time slots.
3.  Analyze the demand patterns and fulfillment rates across different pickup points (Airport vs. City).
4.  Quantify and visualize the supply-demand gap.
5.  Derive actionable insights and propose recommendations to mitigate the gap.

This EDA leverages Python with the Pandas and NumPy libraries, complementing the prior analysis performed using Excel and SQL, as outlined in the project briefing.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv('Cleaned_Uber Request Data.csv')

### Dataset First View

In [None]:
# Dataset First Look
df.head()

## ***2. Understanding Your Variables***

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
missing_percent = df.isnull().mean() * 100
missing_percent = missing_percent[missing_percent > 0].sort_values(ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(x=missing_percent.values, y=missing_percent.index, palette="magma")
plt.title("Missing Data Percentage by Column")
plt.xlabel("Percentage (%)")
plt.ylabel("Column Name")
plt.show()

### Check Unique Values for each variable.

In [None]:
# Dataset Columns
df.columns

In [None]:
# To get the count of occurrences for each unique value in 'Pickup point'
print("\n--- Count of Occurrences for each 'Pickup point' ---")
print(df['Pickup point'].value_counts())

# You can apply the same method to other categorical columns:
print("\n--- Count of Occurrences for each 'Status' ---")
print(df['Status'].value_counts())

print("\n--- Count of Occurrences for each 'Time Slot' ---")
print(df['Time Slot'].value_counts())

In [None]:
# Dataset Describe
df.describe()

### What did you know about your dataset?

The dataset, `Cleaned_Uber Request Data.csv`, is the main data source for this study of the Uber supply-demand gap. Each row records one Uber ride request.

One important observation from the initial inspection is the presence of missing (NA) values in the `Drop timestamp` column. These belong to requests that were either 'Cancelled' or had 'No Cars Available', so the column is a direct marker of unmet demand. Based on the format and range of the `Request timestamp` entries, the data appears to cover requests from July 2016.

* The dataset includes columns like `Request id`, `Pickup point`, `Status`, `Request timestamp`, `Request Date`, `Request hour`, `Time Slot`, and `Drop timestamp`.
* `Request timestamp` and `Drop timestamp` are currently string types and need to be converted to datetime objects for time-based analysis.
* The `Drop timestamp` column has many missing values. This is expected, since trips that were 'Cancelled' or had 'No Cars Available' would not have a drop timestamp. This information is important for understanding unmet demand.
* `Pickup point` and `Status` are important categorical variables for our analysis.
* There are no duplicate rows to handle.
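The link between missing drop times and unfulfilled requests can be checked directly rather than assumed. The sketch below uses a small made-up frame (the rows are illustrative, not taken from the dataset) and reports, per status, the fraction of rows whose `Drop timestamp` is missing.

```python
import pandas as pd

# Hypothetical sample rows that mimic the columns described above
sample = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
    "Drop timestamp": ["11-07-2016 13:00", None, None, "11-07-2016 18:47"],
})

# Fraction of missing drop timestamps within each status (1.0 = always missing)
missing_by_status = sample["Drop timestamp"].isna().groupby(sample["Status"]).mean()
print(missing_by_status)
```

Running the same two lines on `df` would show whether every missing drop time belongs to a 'Cancelled' or 'No Cars Available' request, which is what the analysis relies on.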

### Variables Description

*Request id*: This column contains a unique numerical identifier for each Uber request recorded in the dataset.

*Pickup point*: This variable specifies where the Uber request originates. The main categories are 'Airport' and 'City'.

*Status*: This variable indicates the final outcome of the Uber request. It can have one of three values:

* 'Trip Completed': The ride was successfully finished.

* 'Cancelled': The ride request was cancelled, either by the rider or the driver.

* 'No Cars Available': No Uber vehicle was available to fulfill the request.

*Request timestamp*: This column records the exact date and time when the user initiated the Uber request.

*Request Date*: This column provides the date, which is extracted from the Request timestamp.

*Request hour*: This column shows the hour of the day (0-23) when the request was made, taken from the Request timestamp.

*Time Slot*: This variable groups specific hours of the day into broader segments, such as 'Morning', 'Evening', and 'Night'. This helps analyze demand patterns across set periods.

*Drop timestamp*: This column records the exact date and time when a completed Uber trip ended. It contains NA values for requests that were not fulfilled, such as those with 'Cancelled' or 'No Cars Available' statuses.
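The `Time Slot` column arrives precomputed in the cleaned file, and its exact hour boundaries are not stated in this notebook. As a rough sketch of how such a column could be derived from `Request hour`, the helper below uses assumed boundaries: Morning and Evening follow the 5–10 AM and 5–10 PM windows used later in the analysis, while the Afternoon and Night ranges and the function name `assign_time_slot` are hypothetical.

```python
import pandas as pd

def assign_time_slot(hour: int) -> str:
    """Map an hour of day (0-23) to a coarse slot. Boundaries are assumptions."""
    if 5 <= hour < 10:
        return "Morning"
    if 10 <= hour < 17:
        return "Afternoon"
    if 17 <= hour < 22:
        return "Evening"
    return "Night"  # 22:00-04:59

# A few example hours, including the boundary values
hours = pd.Series([0, 5, 9, 10, 16, 17, 21, 22, 23], name="Request hour")
slots = hours.map(assign_time_slot)
print(pd.DataFrame({"Request hour": hours, "Time Slot": slots}))
```

If the real file uses different boundaries or more slots, only the conditions inside the function would need to change.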

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# Convert 'Request timestamp' and 'Drop timestamp' to datetime objects
# errors='coerce' turns values that cannot be parsed into NaT (Not a Time)
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'], format='%d-%m-%Y %H:%M', errors='coerce')
df['Drop timestamp'] = pd.to_datetime(df['Drop timestamp'], format='%d-%m-%Y %H:%M', errors='coerce')

# Verify the data types after conversion
print("\n--- DataFrame Info after Datetime Conversion ---")
print(df.info())

# Derive 'Trip Duration' in minutes for completed trips
# For 'Cancelled' and 'No Cars Available' statuses, 'Trip Duration' will be NaN, which is appropriate
df['Trip Duration'] = (df['Drop timestamp'] - df['Request timestamp']).dt.total_seconds() / 60

# Display head with new 'Trip Duration' column
print("\n--- First 5 Rows with 'Trip Duration' ---")
print(df.head())

### What all manipulations have you done and insights you found?

Following the initial data review, I prepared `Cleaned_Uber Request Data.csv` for analysis. The pre-created `Request hour` and `Time Slot` columns were used as-is to segment the day into logical time periods, and the steps below added the remaining fields needed to study the Uber supply-demand gap.

Data Manipulations Performed:  
The main change was converting the raw timestamp strings into a usable datetime format. This step is crucial for analyzing time series and calculating durations.

Timestamp Conversion:  
The Request timestamp and Drop timestamp columns, which were originally stored as strings, were converted into proper datetime objects using Pandas' to_datetime function. This change allows for accurate time-based calculations and filtering.

For the Drop timestamp, missing values (as for cancelled or unavailable trips) become NaT (Not a Time), and `errors='coerce'` converts any string that cannot be parsed to NaT as well. This correctly indicates the lack of a drop time for unfulfilled requests.

Feature Engineering (Trip Duration):  
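I calculated a new column, Trip Duration, for completed trips. This was done by subtracting the Request timestamp from the Drop timestamp and converting the result into minutes. Because the clock starts at the request rather than at pickup, the value includes any waiting time before the ride began, so it is best read as an upper bound on the length of a successful ride.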
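As a worked example of the duration arithmetic, the sketch below applies the same conversion and subtraction to three made-up rows (the timestamps are illustrative only). The row without a drop time ends up with a missing duration.

```python
import pandas as pd

# Hypothetical rows in the same day-month-year format used by the wrangling code
toy = pd.DataFrame({
    "Request timestamp": ["11-07-2016 11:51", "11-07-2016 17:57", "12-07-2016 05:10"],
    "Drop timestamp": ["11-07-2016 13:00", "11-07-2016 18:47", None],
})

# Parse both columns; missing or unparseable values become NaT
for col in ["Request timestamp", "Drop timestamp"]:
    toy[col] = pd.to_datetime(toy[col], format="%d-%m-%Y %H:%M", errors="coerce")

# Minutes between request and drop; NaT propagates to NaN
toy["Trip Duration"] = (
    toy["Drop timestamp"] - toy["Request timestamp"]
).dt.total_seconds() / 60
print(toy)
```

The first two rows give 69 and 50 minutes, and the third stays empty because there is no drop time to subtract from.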

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 Status Distribution (Trip Completed / Cancelled / No Cars Available)

In [None]:
# Chart - 1 visualization
# Chart 1: Trip Status Distribution
sns.countplot(data=df, x='Status', palette='Set2')
plt.title("Trip Request Status Distribution")
plt.xlabel("Trip Status")
plt.ylabel("Number of Requests")
plt.grid(axis='y')
plt.show()

##### 1. Why did you pick the specific chart?

* I chose a countplot (bar chart) because it clearly shows how many ride requests fall under each trip status:

  * Trip Completed

  * Cancelled

  * No Cars Available

* It’s a simple and effective way to understand the overall performance of the Uber service and detect any imbalance or service issues.

##### 2. What is/are the insight(s) found from the chart?

A large number of requests were not completed, either because they were cancelled or because no cars were available. This chart does not show when these failures occur; the time-based charts that follow address that.

"No Cars Available" is slightly more frequent than cancellations, indicating a driver supply shortage issue.

Completed trips form less than two-thirds of total requests.
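Statements such as "less than two-thirds" are easier to verify when the shares are printed next to the chart. A minimal sketch, using a made-up `Status` series rather than the real data:

```python
import pandas as pd

# Hypothetical status values; the real notebook would use df['Status']
toy_status = pd.Series(
    ["Trip Completed"] * 5 + ["No Cars Available"] * 3 + ["Cancelled"] * 2,
    name="Status",
)

# Share of each status as a percentage of all requests
status_share = toy_status.value_counts(normalize=True).mul(100).round(1)
print(status_share)

# Unmet demand = everything that did not end in a completed trip
unmet_share = 100 - status_share["Trip Completed"]
print(f"Unmet demand: {unmet_share:.1f}% of requests")
```

Replacing `toy_status` with `df['Status']` would give the actual percentages behind the bar heights.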

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights are critical for operational planning:

They highlight the unfulfilled demand, helping Uber optimize driver availability.

The company can improve customer satisfaction and revenue by reducing cancellations and car unavailability.

Negative impact insight:

High "No Cars Available" requests suggest lost revenue and poor customer experience, leading to potential churn.

Without corrective action, this could negatively affect Uber’s brand perception and market share during peak hours.

#### Chart - 2 Trip Status vs Time Slot

In [None]:
# Chart - 2 visualization code
# Chart 2: Trip Status vs Time Slot
sns.countplot(data=df, x='Time Slot', hue='Status', palette='coolwarm')
plt.title("Trip Status by Time Slot")
plt.xlabel("Time Slot")
plt.ylabel("Number of Requests")
plt.xticks(rotation=45)
plt.grid(axis='y')
plt.legend(title='Status')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

This clustered bar chart (countplot) lets us compare how trip statuses vary across different time slots (e.g., Morning, Evening, Night).
I chose this chart because time-based supply-demand trends are critical for operational decisions in a real-time ride-sharing business like Uber.

##### 2. What is/are the insight(s) found from the chart?

* Morning (5 AM – 10 AM): High number of "No Cars Available" – indicating a supply shortage.

* Evening (5 PM – 10 PM): Large number of "Cancelled" trips, especially from City pickups.

* Late Night: Fewer requests, but still some availability issues.

Overall, the supply-demand gap is most prominent during Morning and Evening, the common office commute times.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights help Uber align driver availability with demand:

Morning supply shortage: Suggests the need for more drivers or incentive schemes near the Airport.

Evening cancellations: May indicate driver fatigue, shift changes, or traffic issues — Uber can work on driver retention, incentives, or staggered scheduling.

Negative impact insight:

If left unaddressed, the persistent peak-hour failures can lead to customer dissatisfaction, negative app reviews, and long-term revenue loss.



#### Chart - 3 Pickup Point vs Trip Status


In [None]:
# Chart - 3 visualization code
# Chart 3: Pickup Point vs Trip Status
sns.countplot(data=df, x='Pickup point', hue='Status', palette='pastel')
plt.title("Trip Status by Pickup Location")
plt.xlabel("Pickup Point")
plt.ylabel("Number of Requests")
plt.grid(axis='y')
plt.legend(title='Status')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

This clustered bar chart was selected to show how trip outcomes vary based on the pickup location: City or Airport.
It's important to visualize this dimension because Uber's demand-supply dynamics can differ significantly between urban centers and transport hubs.

##### 2. What is/are the insight(s) found from the chart?

* Airport: Has a higher number of "No Cars Available" — especially in the morning, indicating a lack of inbound drivers to meet demand.

* City: Shows more cancellations, likely due to driver-side cancellations or traffic constraints during peak times.

* Completed trips are reasonably distributed across both locations but are relatively lower during peak periods.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Absolutely. These findings allow Uber to fine-tune its geo-specific driver allocation:

* Airport issues can be resolved by rebalancing fleet movement — encouraging drivers to remain near the Airport during expected high-demand windows.

* City cancellations can be addressed through incentives, better driver training, or dynamic pricing during high-traffic zones or rush hours.

Negative impact insight:

* Unreliable Airport pickups — especially for time-sensitive customers — can significantly hurt brand trust and user retention.

* City cancellations may result in higher app uninstalls or customer churn, especially if repeated.



#### Chart - 4 Hourly trip status distribution

In [None]:
# Chart - 4 visualization code
# Chart 4: Hourly trip status distribution
hourly_status = df.groupby(['Request hour', 'Status']).size().unstack().fillna(0)

# Plot stacked bar chart
hourly_status.plot(kind='bar', stacked=True, figsize=(12, 6), colormap='Paired')
plt.title("Hourly Trip Requests by Status")
plt.xlabel("Hour of the Day")
plt.ylabel("Number of Requests")
plt.xticks(rotation=0)
plt.grid(axis='y')
plt.legend(title='Status')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

This stacked bar chart shows how trip outcomes change hour by hour throughout the day.
I picked this visualization to identify hourly peaks in demand, and to detect when Uber fails to meet that demand — either due to no cars available or high cancellations.

##### 2. What is/are the insight(s) found from the chart?

* Peak demand at 9 AM and 6 PM — aligning with standard office commute times.

* Around 9 AM, many requests go unfulfilled due to no cars available, especially for Airport pickups.

* Around 6 PM, there is a spike in cancellations, mainly for City pickups heading to Airport.

* Midday and late-night hours have fewer requests and fewer issues.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. This chart helps Uber schedule driver shifts and vehicle availability more effectively:

Deploy more drivers around 9 AM (Airport) and 6 PM (City) to meet surge demand.

Reduce cancellations by understanding and resolving driver-side constraints during these hours.

Negative impact insight:

Without intervention, missed morning and evening peak requests can lead to frustration, lost bookings, and long-term revenue leakage.

A pattern of failure at key hours may also impact corporate clients and frequent travelers, who value reliability.

#### Chart - 5 Normalized trip status distribution by Time Slot

In [None]:
# Chart - 5 visualization code
# Chart 5: Normalized trip status distribution by Time Slot
time_status_pct = df.groupby(['Time Slot', 'Status']).size().unstack().fillna(0)
time_status_pct = time_status_pct.div(time_status_pct.sum(axis=1), axis=0) * 100

# Plot
time_status_pct.plot(kind='bar', stacked=True, figsize=(10, 6), colormap='tab10')
plt.title("Percentage of Trip Statuses per Time Slot")
plt.xlabel("Time Slot")
plt.ylabel("Percentage of Requests")
plt.xticks(rotation=45)
plt.legend(title='Status')
plt.grid(axis='y')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose a normalized stacked bar chart to understand what percentage of requests in each time slot end up completed, cancelled, or unfulfilled.
This view gives deeper insight than raw counts because it shows how efficient each time slot is in terms of successful trips, regardless of volume.

##### 2. What is/are the insight(s) found from the chart?

Morning has the worst fulfillment rate – more than 40% of requests are unfulfilled (mostly due to no cars available).

Evening has a higher share of cancellations, likely driver-driven.

Afternoon and Night slots perform better, with a higher % of completed trips.

The efficiency of Uber operations is significantly lower during peak times.
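The percentages behind a normalized stacked bar can also be read from a table. The sketch below is an equivalent formulation using `pd.crosstab` with `normalize='index'`, run on made-up rows (not the real data), so each row sums to 100%.

```python
import pandas as pd

# Hypothetical requests: four in the Morning, two in the Afternoon
toy = pd.DataFrame({
    "Time Slot": ["Morning"] * 4 + ["Afternoon"] * 2,
    "Status": [
        "Trip Completed", "No Cars Available", "No Cars Available", "Cancelled",
        "Trip Completed", "Trip Completed",
    ],
})

# Row-normalized cross-tabulation: share of each status within a time slot
pct = pd.crosstab(toy["Time Slot"], toy["Status"], normalize="index") * 100
print(pct.round(1))
```

Applied to `df`, the same call would show whether the unfulfilled share in the Morning slot really exceeds 40%.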

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes — this chart directly shows how efficiently Uber meets demand across different periods. It’s useful for:

* Prioritizing resource allocation (e.g., more drivers in the Morning).

* Reducing customer dissatisfaction in high-failure time slots.

* Benchmarking operational performance against ideal targets.

Negative impact insight:

* If 40–50% of peak-time users can't get a ride, that’s lost revenue and brand damage.

* Without a fix, this leads to negative reviews, low retention, and increased churn, especially among high-frequency users.

#### Chart - 6 Heatmap of Incomplete Trips by Hour & Pickup Point (Heatmap of Unmet Demand)

In [None]:
# Chart - 6 visualization code
# Filter for incomplete trips only
incomplete = df[df['Status'].isin(['Cancelled', 'No Cars Available'])]

# Group by hour and pickup point
heatmap_data = incomplete.groupby(['Request hour', 'Pickup point']).size().unstack().fillna(0)

# Plot heatmap
plt.figure(figsize=(10,6))
sns.heatmap(heatmap_data, annot=True, fmt='.0f', cmap='Reds', linewidths=0.5)
plt.title("Heatmap of Incomplete Trips (Cancellations + No Cars) by Hour & Pickup Point")
plt.xlabel("Pickup Point")
plt.ylabel("Hour of Day")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose a heatmap to highlight the intersection between time (hour) and location (pickup point) where trip failures are most frequent.
This format allows us to visually identify hotspots of demand failure where operational fixes are urgently needed.

##### 2. What is/are the insight(s) found from the chart?

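* Airport between 5 AM–10 AM shows the highest count of incomplete requests. The heatmap combines both failure types, so the attribution to "No Cars Available" and a morning supply gap rests on the status breakdowns in the earlier charts.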

* City between 5 PM–9 PM has another failure cluster, mainly due to cancellations, hinting at driver-side issues during evening hours.

* Outside peak hours, both locations show minimal failure volume.
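The heatmap above counts incomplete trips, so busy hours look worse simply because they have more requests. A rate view can separate volume from failure. The sketch below, on made-up rows, computes the share of unmet requests per hour and pickup point; the `Unmet` column is a helper introduced here for illustration.

```python
import pandas as pd

# Hypothetical requests at two hours and two pickup points
toy = pd.DataFrame({
    "Request hour": [8, 8, 8, 8, 18, 18, 18, 18],
    "Pickup point": ["Airport", "Airport", "City", "City",
                     "Airport", "City", "City", "City"],
    "Status": ["No Cars Available", "Trip Completed", "Trip Completed", "Trip Completed",
               "Trip Completed", "Cancelled", "Cancelled", "Trip Completed"],
})

# 1 if the request was not completed, 0 otherwise
toy["Unmet"] = (toy["Status"] != "Trip Completed").astype(int)

# Mean of the indicator = unmet share; scale to a percentage
rate = toy.pivot_table(
    index="Request hour", columns="Pickup point", values="Unmet", aggfunc="mean"
) * 100
print(rate.round(1))
```

The resulting table can be passed to `sns.heatmap` in place of the count table to compare hotspots on a like-for-like basis.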

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Definitely. This heatmap helps Uber make hyper-targeted operational decisions:

* Focus morning resources at the Airport and incentivize drivers to queue there.

* Address evening City cancellations through driver incentives, shorter shift windows, or penalty optimizations.

Negative impact insight:

* Without fixing these hotspots, Uber will continue to lose high-value, peak-time customers who often ride repeatedly.

* It can also trigger surge pricing perception issues if gaps persist without visible improvement.



#### Chart - 7 Heatmap of Requests by Day & Time Slot

In [None]:
# Chart - 7 visualization code
# Add derived feature: Day of week
df['Request Day'] = df['Request timestamp'].dt.day_name()

# Pivot: Time Slot vs Day
request_volume = df.groupby(['Request Day', 'Time Slot']).size().unstack().fillna(0)

# Sort days in order
ordered_days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
request_volume = request_volume.reindex(ordered_days)

# Plot heatmap
plt.figure(figsize=(10,6))
sns.heatmap(request_volume, annot=True, fmt='.0f', cmap='Blues', linewidths=0.5)
plt.title("Heatmap of Total Trip Requests by Day and Time Slot")
plt.xlabel("Time Slot")
plt.ylabel("Day of the Week")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose this heatmap to understand how trip requests fluctuate across days and time slots.
It provides a macro-level view of temporal demand, helping identify weekly peaks and resource strain over the week — a key driver of availability issues.

##### 2. What is/are the insight(s) found from the chart?

* Weekdays (Mon–Fri) see consistently high demand during Morning and Evening slots.

* Fridays have the highest total request volumes, especially in the Evening.

* Weekends (Sat–Sun) show lower demand overall but still experience Morning gaps.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. These findings enable day-wise fleet planning and scheduling:

* Uber can increase fleet capacity or driver shifts on Fridays, particularly during Evening.

* On weekends, Uber might reduce operational costs by scaling down drivers during low-traffic hours.

* Heatmaps like this support smart workforce management, cost optimization, and improved customer experience during peak slots.

Negative impact insight:

* If peak-day demand isn’t met (especially on Fridays), Uber risks losing high-revenue rides — especially from repeat customers or professionals relying on the platform.

#### Chart - 8 Daily Request Volume Over Time

In [None]:
# Chart - 8 visualization code
# Parse 'Request Date' as datetime so the dates sort chronologically on the x-axis
df['Request Date'] = pd.to_datetime(df['Request Date'], format='%d-%m-%Y')

# Group by date to get request volume per day
daily_requests = df.groupby('Request Date').size()

# Plot line chart
plt.figure(figsize=(12,6))
sns.lineplot(x=daily_requests.index, y=daily_requests.values, marker='o')
plt.title("Trend of Daily Trip Requests Over Time")
plt.xlabel("Date")
plt.ylabel("Number of Requests")
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I chose a line plot to visualize the trend of trip requests over time.
It helps assess whether demand is increasing, stable, or inconsistent, and identify spikes or drops in usage patterns — which are critical for supply planning.

##### 2. What is/are the insight(s) found from the chart?

* Demand shows consistent peaks and dips — potentially related to weekday–weekend cycles.

* A few days show notably lower activity, possibly due to service outages, driver strikes, or external events (weather, etc.).

* Overall, demand is relatively stable, with some short bursts of higher usage.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes — this analysis helps Uber plan for:

* Recurring demand patterns to improve driver scheduling and availability.

* Anomaly detection — understanding the causes behind dips or surges helps Uber prepare better in the future.

* Forecasting — if extended, this could feed into a model for predictive demand allocation.

Negative impact insight:

* Unexpected dips could indicate technical issues or operational inefficiencies, which, if unnoticed, might cause long-term service unreliability.

#### Chart - 9 Correlation Heatmap (Multivariate Analysis)

In [None]:
# Chart - 9 visualization code
# For correlation, we need numeric values
# Convert status to numeric encoding (optional, for analysis only)
df_corr = df.copy()
df_corr['Status_Encoded'] = df_corr['Status'].map({
    'Trip Completed': 1,
    'Cancelled': 0,
    'No Cars Available': -1
})

# Select numeric features
numeric_cols = ['Request hour', 'Status_Encoded']
corr_matrix = df_corr[numeric_cols].corr()

# Plot heatmap
plt.figure(figsize=(6, 4))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title("Correlation Heatmap: Request Hour vs Trip Status")
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

I used a correlation heatmap to explore relationships between numerical variables, especially between Request hour and trip outcomes.
This helps identify if certain hours negatively correlate with successful trip completion.


##### 2. What is/are the insight(s) found from the chart?

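A negative correlation is visible between Request hour and the encoded trip status, meaning that later hours tend, on average, to have lower (less successful) encoded outcomes.

This should be read with caution. The 1 / 0 / -1 encoding imposes an arbitrary order on the statuses, and a single linear coefficient cannot represent separate Morning and Evening peaks, so the result is consistent with the earlier hourly charts rather than a confirmation of them.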
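One way to see the limits of a single correlation coefficient is to compare it with the unmet share per hour. In the made-up data below (not the real dataset), failures occur only at 8:00 and 18:00, yet the Pearson correlation between hour and the failure indicator is essentially zero because the two peaks cancel out.

```python
import pandas as pd

# Hypothetical data: two requests per hour, failures only at the two peaks
toy = pd.DataFrame({
    "Request hour": [3, 3, 8, 8, 13, 13, 18, 18, 23, 23],
    "Unmet":        [0, 0, 1, 1, 0,  0,  1,  1,  0,  0],
})

# Linear (Pearson) correlation between hour and the failure indicator
corr = toy["Request hour"].corr(toy["Unmet"])
print(f"Pearson correlation: {corr:.3f}")

# Share of unmet requests at each hour shows the two peaks directly
unmet_by_hour = toy.groupby("Request hour")["Unmet"].mean()
print(unmet_by_hour)
```

A per-hour failure share like `unmet_by_hour` is therefore a safer summary for time-of-day effects than the correlation matrix alone.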



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes — understanding such correlations helps Uber forecast risk periods for trip failures, leading to smarter scheduling and routing decisions.

It also helps prioritize driver availability during hours statistically linked with demand failure.



#### Chart - 10 Pairplot of Numeric Features

In [None]:
# Chart - 10 visualization code


# Use encoded status and hour
sns.pairplot(df_corr[['Request hour', 'Status_Encoded']], diag_kind='kde', kind='scatter')
plt.suptitle("Pairplot: Request Hour vs Trip Status", y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

I chose a pairplot to visually examine how the numeric variables relate to each other, especially to identify clusters or trends across Request hour and trip success/failure.

##### 2. What is/are the insight(s) found from the chart?

Most failed trips (status -1 or 0) are clustered around early mornings and late evenings.

Completed trips (status = 1) are more spread out, often during mid-day and late night.

Suggests that trip success is time-sensitive, confirming prior analysis.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes — the pairplot helps identify critical time windows where the system underperforms.
Uber can use this to:

Prioritize interventions in those time bands.

Deploy predictive alerts or surge readiness tools based on cluster detection.



## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

The project identified key factors contributing to Uber's supply-demand gap through structured exploratory data analysis using Python. The following solutions were developed:

Time-Based Demand Optimization:

* Morning (5–10 AM) and Evening (5–10 PM) were identified as high-demand times with many unfulfilled requests.

* It was recommended to realign driver shifts and use predictive scheduling models to increase availability during these peak periods.

Location-Based Resource Allocation:

* It was found that airport pickups often face “No Cars Available” during morning hours.

* Geofencing incentives were suggested to keep more drivers near the airport and ensure better coverage.

Driver Behavior Management:

* Higher cancellation rates were noted for city pickups during evening hours.

* It was proposed to introduce cancellation penalties, improve driver engagement programs, or suggest traffic-aware routing to address this issue.

Operational Forecasting:

* Trend and heatmap analyses revealed daily and weekly request patterns, helping Uber forecast resource demand more effectively.

In short, these insights give Uber a way to better match rider demand with available drivers. This will improve service efficiency, boost customer satisfaction, and reduce lost revenue.

# **Conclusion**

This EDA project revealed key imbalances between Uber's rider demand and driver supply. We looked at the time, location, and behavior in the data and found:

* Peak-hour mismatches, especially at the Airport in the morning and in the City during evenings.

* High failure rates, such as cancellations and unavailability, that result in lost revenue and a poor user experience.

* Patterns throughout the week and day that Uber can use to improve operations.

These insights can help Uber implement data-driven solutions like:

* Dynamic driver allocation,

* Adjustments in surge pricing,

* Targeted incentives,

* Improved scheduling models.

Overall, the project shows how making decisions based on data can address real-world business issues and enhance platform performance on a large scale.