# **Project Name**    - Uber Supply Demand Gap Report



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Intern Name**     - Amrita


# **Project Summary -**

The Uber Supply Demand Gap analysis explores ride request data from July 2016 to uncover patterns in customer demand and Uber’s supply limitations. This project focuses on identifying time periods and pickup locations (City or Airport) where customer requests go unfulfilled — either due to driver cancellations or unavailability of cars.

The data contains over 6,700 ride requests, each with status information such as "Trip Completed", "Cancelled", and "No Cars Available". Additional attributes include pickup point, request timestamp, and driver ID (if assigned). To extract meaningful insights, the data was cleaned and structured to add derived features like “Request Hour” and “Time Slot” (Morning, Night, etc.).

The EDA uncovers critical business issues, most notably a large supply-demand gap during the Morning and Evening time slots, especially for Airport pickups in the Evening. A deeper dive revealed that many drivers cancel rides in the Morning, while Uber often shows "No Cars Available" in the Evening. This points to a mismatch between rider demand and available driver supply during peak hours.

A total of 20+ charts were generated using Pandas, Seaborn, and Matplotlib libraries, structured into Univariate, Bivariate, and Multivariate sections. KPIs were calculated, and a clean, interactive Excel dashboard was created.

Business Recommendation: Introduce rush-hour incentives for drivers during critical hours and improve supply through Night Shift scheduling.






# **GitHub Link -**

https://github.com/Amrita16592/Uber-Supply-Demand-gap-Internship-project_1

# **Problem Statement**


**Uber often fails to meet rider demand, especially in specific locations or hours of the day. Users face cancellations or no-car situations. The business must identify when and why these gaps occur and what can be done to reduce them.**

#### **Define Your Business Objective?**

To identify patterns in Uber's ride request failures by analyzing time slots, pickup points, and driver behavior — and to provide actionable recommendations to reduce unfulfilled ride requests and improve customer satisfaction.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline
import warnings
warnings.filterwarnings("ignore")


### Dataset Loading

In [None]:
import pandas as pd
from google.colab import files

# --- Method 1: Upload the file directly to Colab's session storage ---
# This is often the easiest way for a single file.
# 1. Run this cell.
# 2. A "Choose Files" button will appear below the cell output.
# 3. Click "Choose Files" and select your 'Uber Request Data (1).csv' file from your computer.
# The file will be uploaded to the temporary Colab environment.

uploaded = files.upload()

# Get the name of the uploaded file
# This will be 'Uber Request Data (1).csv'
file_name = list(uploaded.keys())[0]
print(f"Uploaded file: {file_name}")

# Load the dataset into a Pandas DataFrame using pd.read_csv()
try:
    df = pd.read_csv(file_name)
    print("\nDataset loaded successfully!")
    print("First 5 rows of the DataFrame:")
    print(df.head())
    print("\nDataFrame Info:")
    df.info()
except Exception as e:
    print(f"An error occurred while loading the dataset: {e}")
    print("Please ensure you uploaded the correct CSV file and it's not corrupted.")


### Dataset First View

In [None]:
# Display the first 5 rows
df.head()


### Dataset Rows & Columns count

In [None]:
# Shape of the dataset
print("Total Rows:", df.shape[0])
print("Total Columns:", df.shape[1])


### Dataset Information

In [None]:
# Dataset structure and info
df.info()


#### Duplicate Values

In [None]:
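# Check for duplicates: sum() counts the rows flagged as duplicates
# (count() would return the total number of rows instead)
df.duplicated().sum()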


#### Missing Values/Null Values

In [None]:
# Check for missing values
df.isnull().sum()


In [None]:
# Heatmap for missing values
plt.figure(figsize=(10, 5))
sns.heatmap(df.isnull(), cbar=False, cmap="YlGnBu", yticklabels=False)
plt.title("Missing Values Heatmap")
plt.show()

### What did you know about your dataset?

- Total Requests: ~6745
- Columns include Request timestamp, Drop timestamp, Status, Pickup point, etc.
- Missing values mainly in 'Driver id' and 'Drop timestamp'
- Timestamps need conversion
- Need to extract features like hour and time slots
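One way to check where the missing values come from is to compute the share of nulls per column within each status. The sketch below uses a small made-up frame (`sample`, with illustrative values only) that imitates the real column names; in the notebook the same expression could be applied to `df`.

```python
import numpy as np
import pandas as pd

# Toy rows that imitate the real columns (values are illustrative only).
sample = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available",
               "Trip Completed", "No Cars Available"],
    "Driver id": [1.0, 2.0, np.nan, 3.0, np.nan],
    "Drop timestamp": ["11/7/2016 13:00", np.nan, np.nan,
                       "11/7/2016 18:47", np.nan],
})

# Share of missing values per column within each status (0.0 to 1.0).
missing_by_status = (
    sample.drop(columns="Status")
          .isnull()
          .groupby(sample["Status"])
          .mean()
)
print(missing_by_status)
```

A value of 1.0 in a cell means every request with that status lacks the field, which would suggest the gap is structural rather than a data-quality problem.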


## ***2. Understanding Your Variables***

In [None]:
df.columns


In [None]:
df.describe(include="all")


### Variables Description

- Request id: Unique ID of the request
- Pickup point: Either 'Airport' or 'City'
- Driver id: Missing if not assigned
- Status: Ride outcome
- Request timestamp: Time of request
- Drop timestamp: Time of drop (if trip completed)


### Check Unique Values for each variable.

In [None]:
for col in df.columns:
    print(f"{col}: {df[col].nunique()} unique values")


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
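# Convert timestamps to datetime, trying each known format in turn
def parse_datetime(x):
    for fmt in ("%d/%m/%Y %H:%M", "%d-%m-%Y %H:%M:%S"):
        try:
            return pd.to_datetime(x, format=fmt)
        except (ValueError, TypeError):
            # Format did not match; try the next one
            continue
    # Values that match no format become NaT
    return pd.NaT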

df["Request timestamp"] = df["Request timestamp"].apply(parse_datetime)
df["Drop timestamp"] = df["Drop timestamp"].apply(parse_datetime)

# Extract additional time features
df["Request Hour"] = df["Request timestamp"].dt.hour
df["Request Date"] = df["Request timestamp"].dt.date

# Create time slots
def get_time_slot(hour):
    if 4 <= hour < 8:
        return "Early Morning"
    elif 8 <= hour < 12:
        return "Morning"
    elif 12 <= hour < 17:
        return "Afternoon"
    elif 17 <= hour < 21:
        return "Evening"
    elif 21 <= hour <= 23:
        return "Night"
    else:
        return "Late Night"

df["Time Slot"] = df["Request Hour"].apply(get_time_slot)

# Check cleaned data
df.head()


### What all manipulations have you done and insights you found?

- Parsed timestamps properly
- Extracted request hour and request date
- Added Time Slot feature
- Identified most missing values come from unassigned requests
- Dataset ready for visualization
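As an alternative to the row-by-row `get_time_slot` function, the same slots could be assigned in one vectorized call with `pd.cut`. This is a minimal sketch, not the notebook's method: `hours` is a stand-in for `df["Request Hour"]`, and the bin edges are copied from the boundaries used in the wrangling code.

```python
import pandas as pd

# Stand-in for df["Request Hour"]: every hour of the day once.
hours = pd.Series(range(24), name="Request Hour")

# Left-closed bins [0,4), [4,8), [8,12), [12,17), [17,21), [21,24)
# mirror the boundaries used in get_time_slot.
slot_bins = [0, 4, 8, 12, 17, 21, 24]
slot_labels = ["Late Night", "Early Morning", "Morning",
               "Afternoon", "Evening", "Night"]

time_slot = pd.cut(hours, bins=slot_bins, labels=slot_labels, right=False)
print(time_slot.value_counts(sort=False))
```

One difference worth noting: a missing hour stays missing with `pd.cut`, whereas `get_time_slot` falls through to its `else` branch and labels it "Late Night".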


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
import matplotlib.pyplot as plt

# Count each request status
status_counts = df['Status'].value_counts()

# Pie chart
plt.figure(figsize=(6, 6))
colors = ['#66b3ff', '#ff9999', '#99ff99']
status_counts.plot.pie(autopct='%1.1f%%', startangle=90, colors=colors, shadow=True)
plt.title('Distribution of Trip Status')
plt.ylabel('')
plt.show()


##### 1. Why did you pick the specific chart?

A pie chart provides a quick overview of proportions. It clearly shows how the ride requests are distributed among Completed, Cancelled, and No Cars Available.

##### 2. What is/are the insight(s) found from the chart?



*   A large portion of rides are not completed.
*   “No Cars Available” and “Cancelled” together contribute to more than half of the requests, revealing a significant supply-demand issue.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can drive a positive business impact.

The pie chart itself shows only the overall split. Read together with the time-slot charts that follow, we observed that:
- **Most ride failures occur during the Evening and Morning time slots.**
- **"No Cars Available" is the major issue in the Evening**, and **"Cancelled" rides are dominant in the Morning.**

These insights help Uber:
- **Adjust driver availability** based on time-specific demand.
- Introduce **driver incentives during high-cancellation windows**.
- Improve **supply forecasting models** for peak hours.

On the negative side, these gaps indicate:
- Poor customer experience in critical time slots.
- Potential **loss of revenue** and **brand trust**, especially when customers can't find a ride during busy times.

Therefore, acting on these insights is critical to **reduce negative growth** and improve operational efficiency.
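The proportions shown in the pie chart can also be computed as numbers, which is handy when quoting a KPI. The sketch below uses a made-up `status` series (the counts are illustrative, not the real data); in the notebook it would be `df["Status"]`.

```python
import pandas as pd

# Illustrative statuses only; replace with df["Status"] in the notebook.
status = pd.Series(
    ["Trip Completed"] * 4 + ["No Cars Available"] * 4 + ["Cancelled"] * 2
)

# Percentage share of each status.
share = status.value_counts(normalize=True).mul(100).round(1)

# Everything that is not a completed trip counts as unfulfilled.
unfulfilled_pct = 100 - share["Trip Completed"]
print(share)
print(f"Unfulfilled requests: {unfulfilled_pct:.1f}%")
```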


#### Chart - 2

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Group data by Time Slot and Status
time_status = df.groupby(['Time Slot', 'Status']).size().unstack()

# Plot stacked bar chart
time_status.plot(kind='bar', stacked=True, figsize=(10, 6), colormap='Set2')
plt.title('Trip Request Status by Time Slot')
plt.xlabel('Time Slot')
plt.ylabel('Number of Requests')
plt.xticks(rotation=45)
plt.legend(title='Status')
plt.grid(axis='y')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A stacked bar chart shows how each status contributes within every time slot, allowing us to compare both absolute volume and category distribution.

##### 2. What is/are the insight(s) found from the chart?



*   Morning and Evening time slots have the highest number of requests.
*   The “No Cars Available” issue peaks in the Evening.
*   Cancellations are common in the Morning, especially from the City side.






##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. These patterns help Uber optimize driver allocation:



*   More drivers in evenings near Airport
*   Targeted incentives in mornings in the City





This can reduce failed rides and improve customer satisfaction.
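By default the grouped table is sorted alphabetically by time slot, so the bars do not follow the clock. A possible fix, sketched here on a toy frame (`toy` and `slot_order` are names introduced for illustration), is to reindex the table in chronological order before plotting.

```python
import pandas as pd

# Chronological order of the slots defined in the wrangling step.
slot_order = ["Late Night", "Early Morning", "Morning",
              "Afternoon", "Evening", "Night"]

# Toy frame; in the notebook this would be df[["Time Slot", "Status"]].
toy = pd.DataFrame({
    "Time Slot": ["Morning", "Evening", "Night", "Morning", "Late Night"],
    "Status": ["Cancelled", "No Cars Available", "Trip Completed",
               "Trip Completed", "No Cars Available"],
})

time_status = (
    pd.crosstab(toy["Time Slot"], toy["Status"])
      .reindex(slot_order, fill_value=0)  # chronological rows, zeros for empty slots
)
print(time_status)
```

Calling `time_status.plot(kind='bar', stacked=True)` on the reindexed table would then draw the slots from Late Night through Night.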

#### Chart - 3

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(7, 5))
sns.countplot(data=df, x='Pickup point', hue='Status', palette='Set2')
plt.title('Trip Status by Pickup Point')
plt.xlabel('Pickup Point')
plt.ylabel('Number of Requests')
plt.legend(title='Status')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?







A grouped bar chart is ideal to compare how trip statuses vary based on location — here, City vs Airport.
It helps in understanding which location struggles more with cancellations or availability.



##### 2. What is/are the insight(s) found from the chart?



*   "No Cars Available" is dominant for Airport pickups, especially in the Evening.
*   "Cancelled" trips are significantly higher in the City, especially during the Morning.
*   Completed trips are fewer compared to the total demand at both locations.






##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. These insights help Uber:


*   Balance driver distribution between City and Airport.
*   Incentivize drivers to avoid cancellations in the City, especially in the Morning.
*   Add predictive load-balancing logic in their app to reduce peak-time rejections.

Negative Impact Insight:


*   High failure rates at the Airport during peak hours can lead to customer churn from business travelers — a valuable customer segment.









#### Chart - 4

In [None]:
import matplotlib.pyplot as plt

# Group by hour and status
hourly_status = df.groupby(['Request Hour', 'Status']).size().unstack()

# Line plot (DataFrame.plot creates its own figure, so a separate
# plt.figure call would only leave an empty extra figure behind)
hourly_status.plot(kind='line', marker='o', figsize=(10, 6), colormap='Dark2')
plt.title('Hourly Request Status Trends')
plt.xlabel('Hour of Day')
plt.ylabel('Number of Requests')
plt.xticks(range(0, 24))
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(title='Status')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A line plot helps in visualizing the trend of requests and failures over time (hourly). It's ideal for detecting time-based peaks and performance issues.



##### 2. What is/are the insight(s) found from the chart?



*   Requests peak around 8 AM and 5-6 PM (Morning and Evening rush hours).
*   Cancellations spike around 8-9 AM, showing driver-side issues in the morning.
*   "No Cars Available" sharply increases after 5 PM, especially for Airport pickups.










##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. These insights are critical for time-based driver supply planning:


*   Add more drivers in the evening at the Airport
*   Prevent cancellations in the morning from the City

Not addressing these time-slot failures can lead to repeated user dissatisfaction, especially for commuters, leading to brand damage and reduced app usage.





#### Chart - 5

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

heatmap_data = df.groupby(['Time Slot', 'Pickup point']).size().unstack()

plt.figure(figsize=(8, 5))
sns.heatmap(heatmap_data, annot=True, fmt='d', cmap='YlGnBu')
plt.title('Heatmap of Requests by Time Slot and Pickup Point')
plt.xlabel('Pickup Point')
plt.ylabel('Time Slot')
plt.show()


##### 1. Why did you pick the specific chart?







A heatmap is great to show concentrations across two categories — here, time and pickup location.

##### 2. What is/are the insight(s) found from the chart?



*   Morning has the highest load from the City.
*   Evening load is more from the Airport.








##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Uber can adjust operations based on when and where demand surges, avoiding under-serviced regions and peak-hour failures.
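The heatmap shows demand only. To express the supply-demand gap itself, one option is to subtract completed trips from total requests for each time slot and pickup point. The sketch below treats "Trip Completed" as supply, which is an assumption made here for illustration, and runs on a toy frame rather than the real data.

```python
import pandas as pd

# Toy requests; in the notebook the same columns come from df.
toy = pd.DataFrame({
    "Time Slot": ["Morning", "Morning", "Morning", "Evening", "Evening", "Evening"],
    "Pickup point": ["City", "City", "Airport", "Airport", "Airport", "City"],
    "Status": ["Cancelled", "Trip Completed", "Trip Completed",
               "No Cars Available", "No Cars Available", "Trip Completed"],
})

keys = ["Time Slot", "Pickup point"]

# Demand: every request. Supply (assumed): completed trips only.
demand = toy.groupby(keys).size()
supply = (
    toy[toy["Status"] == "Trip Completed"]
    .groupby(keys)
    .size()
    .reindex(demand.index, fill_value=0)  # keep cells with zero completions
)

# Gap = requests that did not end in a completed trip.
gap = (demand - supply).unstack(fill_value=0)
print(gap)
```

The resulting table has the same shape as `heatmap_data`, so it could be passed to `sns.heatmap` in the same way.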

#### Chart - 6

In [None]:
failed = df[df['Status'] != 'Trip Completed']
failed['Driver id'].value_counts().head(10).plot(kind='bar', figsize=(8, 4), color='tomato')
plt.title('Top Drivers with Most Failed Requests')
plt.xlabel('Driver ID')
plt.ylabel('Number of Failed Trips')
plt.grid(axis='y')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

**Bar charts are best to show top contributors to a problem.**

##### 2. What is/are the insight(s) found from the chart?

**Some drivers are repeatedly associated with failed rides. Because "No Cars Available" requests have no driver assigned, these counts effectively reflect cancellations.**

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Uber may train, warn, or suspend such drivers to improve success rates and maintain service standards.**

#### Chart - 7

In [None]:
sns.countplot(data=df, x='Request Hour', color='skyblue')
plt.title('Trip Request Count per Hour')
plt.xlabel('Hour of Day')
plt.ylabel('Number of Requests')
plt.grid(axis='y', linestyle='--', alpha=0.5)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

To identify when people request rides the most, allowing better scheduling of supply.

##### 2. What is/are the insight(s) found from the chart?

Most requests are between 7–10 AM and 5–8 PM — peak travel times.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Use this to adjust driver shift patterns, reduce wait time, and prevent ride failures in high-demand slots.

#### Chart - 8

In [None]:
completion = df[df['Status'] == 'Trip Completed']
completion['Time Slot'].value_counts().plot(kind='bar', color='limegreen')
plt.title('Trip Completions per Time Slot')
plt.ylabel('Completed Trips')
plt.xlabel('Time Slot')
plt.grid(axis='y')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

It shows when trips are successfully completed, helping locate the best-performing time slots.

##### 2. What is/are the insight(s) found from the chart?

Midday hours see a higher completion rate, while morning and evening hours struggle.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Uber can offer discounts in midday slots and improve logistics for peak slots to balance demand.
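The bar chart counts completed trips, so a busy slot can look strong even when most of its requests fail. A completion rate per slot separates the two. This is a minimal sketch on made-up rows; the column names match the notebook, the values do not.

```python
import pandas as pd

# Toy requests; in the notebook use df["Time Slot"] and df["Status"].
toy = pd.DataFrame({
    "Time Slot": ["Afternoon", "Afternoon", "Morning", "Morning", "Morning", "Morning"],
    "Status": ["Trip Completed", "Trip Completed", "Cancelled",
               "Trip Completed", "Cancelled", "No Cars Available"],
})

# Boolean flag per request, then completed count and request count per slot.
is_completed = toy["Status"].eq("Trip Completed")
summary = is_completed.groupby(toy["Time Slot"]).agg(completed="sum", requests="count")
summary["completion_rate"] = summary["completed"] / summary["requests"]
print(summary)
```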

#### Chart - 9

In [None]:
cancel = df[df['Status'] == 'Cancelled']
cancel['Time Slot'].value_counts().plot(kind='bar', color='orange')
plt.title('Cancellations per Time Slot')
plt.ylabel('Cancelled Trips')
plt.xlabel('Time Slot')
plt.grid(axis='y')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

To find when cancellations happen most, helping analyze driver reliability and user satisfaction.

##### 2. What is/are the insight(s) found from the chart?

Morning cancellations are very high, mostly due to driver non-acceptance.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Uber can incentivize morning shifts or auto-penalize last-minute cancellations, improving ride success rate.

#### Chart - 10

In [None]:
nocars = df[df['Status'] == 'No Cars Available']
nocars['Time Slot'].value_counts().plot(kind='bar', color='red')
plt.title('No Cars Available per Time Slot')
plt.ylabel('Requests Unfulfilled')
plt.xlabel('Time Slot')
plt.grid(axis='y')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?


To detect peak failure times due to no supply.






##### 2. What is/are the insight(s) found from the chart?

Evening rides, especially from the Airport, have no available drivers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Uber should ensure driver shift rotation and airport coverage in the evening to match supply with demand.

#### Chart - 11

In [None]:
completed = df[df['Status'] == 'Trip Completed']
sns.countplot(data=completed, x='Pickup point', palette='coolwarm')
plt.title('Completed Trips by Pickup Point')
plt.grid(axis='y')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

To measure which pickup location sees more successful rides.

##### 2. What is/are the insight(s) found from the chart?

City has slightly higher completion rates than Airport.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This allows Uber to set location-specific KPIs and allocate resources based on past outcomes.

#### Chart - 12

In [None]:
df['Status'].value_counts().plot(kind='bar', color=['limegreen', 'orange', 'red'])
plt.title('Overall Status Distribution')
plt.ylabel('Number of Trips')
plt.grid(axis='y')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?



A simple bar plot to show success vs failure rate.





##### 2. What is/are the insight(s) found from the chart?

More than half of all requests are unsuccessful, consistent with the split seen in Chart 1.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight is alarming and justifies urgent attention from ops and driver management teams to improve efficiency.

#### Chart - 13

In [None]:
# "Request Hour" was already derived in the Data Wrangling step, so the
# timestamps do not need to be parsed again here.

df.groupby('Request Hour').size().rolling(2).mean().plot(kind='line', figsize=(8,5), color='blue')
plt.title('Rolling Average Hourly Demand')
plt.xlabel('Hour')
plt.ylabel('Avg Requests')
plt.grid()
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

Rolling averages smooth the demand trend and reduce hour-to-hour noise.

##### 2. What is/are the insight(s) found from the chart?

Demand steadily climbs in the morning and again in the evening.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.



Yes. The trend can guide smart pricing or promotions and help optimize driver shift start times.
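To make the smoothing concrete, the sketch below applies the same trailing two-hour window used in the chart to a short made-up series, next to a centered three-hour window as one possible alternative. The `hourly` values are hypothetical.

```python
import pandas as pd

# Hypothetical hourly request counts for hours 0-5.
hourly = pd.Series([10, 20, 60, 40, 30, 50], index=range(6), name="requests")

# Trailing window of 2: each point is the mean of the current and previous hour,
# so the first hour has no value.
trailing = hourly.rolling(2).mean()

# Centered window of 3 with min_periods=1: keeps the edges and does not
# shift peaks to the right.
centered = hourly.rolling(3, center=True, min_periods=1).mean()

print(pd.DataFrame({"raw": hourly, "trailing_2": trailing, "centered_3": centered}))
```

A trailing window lags the raw series slightly, which matters when the goal is to read off the exact hour at which demand peaks.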





#### Chart - 14 - Correlation Heatmap

In [None]:
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()


##### 1. Why did you pick the specific chart?

To find out whether any numeric variables are strongly correlated (e.g., hour and failure rate).

##### 2. What is/are the insight(s) found from the chart?

No strong linear correlations exist — categorical and temporal features dominate.
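Because `Status` is categorical, it does not appear in `df.corr(numeric_only=True)`. One way to bring it into the correlation matrix is to encode each status as a 0/1 indicator first. The sketch below does this on toy rows; keep in mind that Pearson correlation only captures linear relationships, and hour of day is cyclical, so weak values here would not rule out a time-of-day pattern.

```python
import pandas as pd

# Toy rows; in the notebook use df[["Request Hour", "Status"]].
toy = pd.DataFrame({
    "Request Hour": [5, 9, 13, 18, 19, 22],
    "Status": ["Cancelled", "Cancelled", "Trip Completed",
               "No Cars Available", "No Cars Available", "Trip Completed"],
})

# One indicator column per status (0/1), so corr() has numeric input.
flags = pd.get_dummies(toy["Status"], dtype=int)
corr = pd.concat([toy[["Request Hour"]], flags], axis=1).corr()
print(corr.round(2))
```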

#### Chart - 15 - Pair Plot

In [None]:
import seaborn as sns

sns.pairplot(df[['Request Hour', 'Driver id']], kind='scatter', diag_kind='hist')
plt.suptitle('Pair Plot - Numeric Variables', y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

To observe numeric variable relationships visually.

##### 2. What is/are the insight(s) found from the chart?







Driver IDs are spread evenly. Request Hour has some trend patterns.

#### Chart - 16

In [None]:
failures = df[df['Status'] != 'Trip Completed']

# Calculate failure ratio per hour (failed requests / all requests)
failure_ratio = failures.groupby('Request Hour').size() / df.groupby('Request Hour').size()

# Plot
failure_ratio.plot(kind='bar', color='crimson', figsize=(8,5))
plt.title('Failed Trip Ratio by Hour')
plt.xlabel('Hour of Day')
plt.ylabel('Failure Ratio')
plt.grid()
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

To quantify failure rates across the day and help predict worst-performing hours.

##### 2. What is/are the insight(s) found from the chart?

Failure rate spikes in the evening and morning rush hours.
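The single failure ratio can be split into its two causes with a row-normalized crosstab, which shows for each hour what share of requests was cancelled and what share found no cars. This is a sketch on toy data, not the notebook's figures.

```python
import pandas as pd

# Toy requests; in the notebook use df["Request Hour"] and df["Status"].
toy = pd.DataFrame({
    "Request Hour": [8, 8, 8, 8, 18, 18],
    "Status": ["Cancelled", "Cancelled", "Trip Completed", "No Cars Available",
               "No Cars Available", "Trip Completed"],
})

# Row-normalized crosstab: each row (hour) sums to 1.
shares = pd.crosstab(toy["Request Hour"], toy["Status"], normalize="index")

# Failure ratio is whatever share is not a completed trip.
failure_ratio = 1 - shares["Trip Completed"]
print(shares.round(2))
print(failure_ratio)
```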

#### Chart - 17

In [None]:
import seaborn as sns

heat_data = pd.crosstab(df['Request Hour'], df['Pickup point'])
plt.figure(figsize=(10,6))
# fmt='d' prints the integer counts in full instead of the default short format
sns.heatmap(heat_data, annot=True, fmt='d', cmap='YlGnBu')
plt.title('Pickup Point vs Hour-wise Demand')
plt.xlabel('Pickup Point')
plt.ylabel('Hour of Day')
plt.show()


##### 1. Why did you pick the specific chart?

Heatmaps quickly show which hour and location have the most requests.

##### 2. What is/are the insight(s) found from the chart?

City dominates morning hours, while Airport dominates evening hours.



#### Chart - 18

In [None]:
status_by_pickup = pd.crosstab(df['Pickup point'], df['Status'])
status_by_pickup.plot(kind='bar', stacked=True, colormap='viridis', figsize=(8,5))
plt.title('Pickup Point vs Trip Status')
plt.ylabel('Number of Requests')
plt.grid(axis='y')
plt.show()


##### 1. Why did you pick the specific chart?

Stacked bars help compare trip outcomes across multiple categories — City and Airport.

##### 2. What is/are the insight(s) found from the chart?

*   Airport sees more "No Cars Available".
*   City sees more cancellations.

#### Chart - 19

In [None]:
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'], dayfirst=True, errors='coerce')
df['Weekday'] = df['Request timestamp'].dt.day_name()

# Create a crosstab of status vs day of week
status_by_day = pd.crosstab(df['Weekday'], df['Status'])

# Reorder the weekdays
status_by_day = status_by_day.reindex([
    'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'
    ])

# Plot
status_by_day.plot(kind='bar', figsize=(10, 5), colormap='Accent')
plt.title('Trip Status by Day of Week')
plt.xlabel('Day of Week')
plt.ylabel('Number of Requests')
plt.grid()
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

This shows whether day of week affects Uber trip success/failure.

##### 2. What is/are the insight(s) found from the chart?

Weekdays see much higher failure rates, especially cancellations.

Weekends are relatively smoother.
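Before comparing weekdays with weekends, it may be worth confirming which days the data actually covers, since a day with no rows will simply appear empty in the chart. The sketch below uses hypothetical timestamps (`stamps`); in the notebook the same calls would run on `df["Request timestamp"]`.

```python
import pandas as pd

# Hypothetical timestamps; in the notebook use df["Request timestamp"].
stamps = pd.to_datetime(pd.Series([
    "2016-07-11 08:00", "2016-07-11 18:30", "2016-07-12 09:15",
    "2016-07-13 17:45", "2016-07-15 21:10",
]))

# Distinct day names in order of first appearance.
days_present = stamps.dt.day_name().unique().tolist()

# dayofweek: Monday=0 ... Sunday=6, so 5 and 6 are the weekend.
has_weekend = bool(stamps.dt.dayofweek.ge(5).any())

print("Days covered:", days_present)
print("Weekend data present:", has_weekend)
```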

#### Chart - 20

In [None]:
status_hour = pd.crosstab(df['Request Hour'], df['Status'])
status_hour.plot.area(stacked=True, figsize=(10,6), colormap='Set3', alpha=0.8)
plt.title('Hourly Trend of Trip Status (Area Plot)')
plt.xlabel('Hour')
plt.ylabel('Number of Requests')
plt.grid()
plt.show()


##### 1. Why did you pick the specific chart?

Area charts show how different trip statuses stack together over time. It's great for spotting overall pattern shifts.

##### 2. What is/are the insight(s) found from the chart?



Total volume rises in peak hours, but failures increase more than completions.

Evening failures dominate over the successful rides.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

To solve the supply-demand mismatch and reduce trip failures, Uber should adopt the following targeted actions:



*   Time-Specific Driver Allocation: Deploy more drivers during peak hours (especially 8–10 AM and 5–8 PM) based on high demand and failure trends.
*   Location-Based Incentives: Offer special incentives to drivers operating in high-failure areas like Airport (evening) and City (morning) to boost availability.
*   Cancellation Reduction Strategy: Implement stricter cancellation penalties or soft warnings for drivers who cancel frequently, especially in the morning slots.
*   User Feedback Loop: Improve user experience by providing real-time updates on driver availability and alternate suggestions when rides are likely to fail.
*   Predictive Scheduling & Demand Forecasting: Use predictive analytics to forecast upcoming demand surges and automate driver scheduling.


Implementing these data-driven strategies will help Uber meet customer expectations, reduce trip failure rates, and improve operational efficiency.
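As a starting point for the demand forecasting idea, a naive baseline could be the average number of requests per hour across the observed days. This is only a sketch under that assumption, on a made-up request log; a real forecast would need more history and a proper model.

```python
import pandas as pd

# Hypothetical request log; the notebook would use
# df["Request Date"] and df["Request Hour"].
log = pd.DataFrame({
    "Request Date": ["2016-07-11"] * 3 + ["2016-07-12"] * 5,
    "Request Hour": [8, 8, 18, 8, 8, 8, 18, 18],
})

# Requests per day and hour, then the average across days for each hour.
per_day_hour = log.groupby(["Request Date", "Request Hour"]).size()
baseline = per_day_hour.groupby(level="Request Hour").mean()
print(baseline)
```

Comparing this baseline with the number of completed trips per hour would give a first estimate of how many extra drivers each hour might need.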



# **Conclusion**

This EDA project uncovered deep insights into Uber’s ride request data, highlighting a significant gap between supply and demand, especially during peak hours. Through comprehensive analysis and visualization, we identified the root causes of trip failures — such as driver shortages, cancellation behaviors, and location/time imbalances.

The analysis suggests that by focusing on smart driver allocation, policy-level interventions, and demand-aware scheduling, Uber can significantly improve trip completion rates, enhance customer satisfaction, and drive positive business impact.

In conclusion, data-driven operational strategies are essential for Uber to bridge the supply-demand gap, improve service reliability, and maintain a competitive advantage in the ride-hailing industry.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***