<a href="https://colab.research.google.com/github/Ashifa-data/uber-data-project/blob/main/uber_EDA_Submission_Template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Uber Supply-Demand Gaps


##### **Project Type**    - EDA
##### **Contribution**    - Individual


# **Project Summary -**

**Objective:**
The goal of this Exploratory Data Analysis (EDA) project is to identify patterns, problems, and opportunities within Uber’s ride request data. The primary business focus is to understand the supply-demand gap and explore why many trip requests do not get fulfilled, especially during peak hours.

**Dataset Overview:**
The dataset contains records of Uber ride requests including:

Request timestamp

Drop timestamp

Pickup point (City/Airport)

Trip status (Completed, Cancelled, No Cars Available)

Additional calculated fields like Request Hour, Trip Duration, etc.

**Methodology:**
Cleaned and converted timestamp formats.

Created derived fields such as request hour, day of the week, and trip duration.

Visualized data using 15 insightful charts to uncover trends in trip requests, failures, and supply bottlenecks.

Focused analysis on time-based and location-based behavior to offer actionable suggestions.

**Tools Used:**
Python Pandas for data manipulation

Seaborn & Matplotlib for visualization

Google Colab environment

**Outcome:**
Identified that trip failures are significantly high, especially during peak hours (7–9 AM and 5–8 PM).

Different issues affect different pickup points (e.g., cancellations in the City, no cars at the Airport).

Clear need for driver reallocation, better scheduling, and customer communication strategies.

Suggested targeted solutions that Uber can implement to reduce failed trip rates, improve customer satisfaction, and increase revenue.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Uber operates in a highly dynamic environment where balancing rider demand and driver supply is critical. During peak hours or specific times of day, customers experience delays or fail to get a cab, leading to cancellations and dissatisfaction. On the other hand, idle drivers at off-peak hours indicate resource underutilization. This imbalance leads to a direct impact on Uber’s revenue, customer satisfaction, and brand reliability.

This project aims to investigate Uber’s cab request data from a specific time period to identify:

What time periods see the highest demand-supply gap?

What proportion of requests get canceled or rejected?

Which times of day see the most trip completions vs. failures?

What recommendations can improve service efficiency and customer satisfaction?



#### **Define Your Business Objective?**

Minimize the demand-supply gap by identifying time windows where there is either a surge in demand or a lack of driver availability.

Improve customer satisfaction by reducing trip cancellations and unfulfilled requests.

Optimize driver allocation and availability during peak and non-peak hours.

Provide actionable recommendations to Uber's operations team, backed by data visualizations and trend analysis, so that they can make data-driven decisions that improve overall service delivery.

By achieving the above, Uber can expect:

Higher trip completion rates.

Lower cancellation rates.

Better driver utilization and earnings.

A more reliable experience for riders.


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many number of charts you want. Make Sure for each and every chart the following format should be answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:

from google.colab import files
uploaded = files.upload()

### Dataset First View

In [None]:
import pandas as pd
df = pd.read_csv("uber_data_clean.csv")
df.head()

### Dataset Rows & Columns count

In [None]:
import pandas as pd


df = pd.read_csv('uber_data_clean.csv')


print("The dataset contains:")
print(f" {df.shape[0]} rows")
print(f" {df.shape[1]} columns")

### Dataset Information

In [None]:
df.info()

#### Duplicate Values

In [None]:
# Count fully duplicated rows
print("Duplicate rows:", df.duplicated().sum())

#### Missing Values/Null Values

In [None]:
print("Missing values:\n", df.isnull().sum())

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Set up the figure
plt.figure(figsize=(10, 6))
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')
plt.title("Missing Values Heatmap", fontsize=14)
plt.show()

### What did you know about your dataset?

Answer:
The dataset represents Uber ride requests collected over a short time span in July 2016 in Bangalore, India. It includes key details of the trip requests made by users via the Uber platform. Examining the dataset gives the following key observations:

**1. Basic Structure and Scope**
*   The dataset contains 6 columns and more than 6,000 rows.
*   Each row represents a single ride request made by a customer.

**2. Key Features Present**
*   Request timestamp and Drop timestamp: These help in determining trip duration and identifying peak or idle hours.
*   Pickup point: Indicates whether the ride request was from the airport or the city.
*   Driver ID: Available only for rides that were assigned to a driver.
*   Status: The most critical column, with three categories: Trip Completed, Cancelled, and No Cars Available.

**3. Initial Observations**
*   A large portion of requests were not fulfilled due to driver unavailability or cancellations.
*   Missing values appear mainly in Drop timestamp and Driver ID, which logically aligns with trips that were not completed.
*   The time-based data allows us to segment requests into parts of the day (early morning, day, evening, night) and study demand patterns.
*   No categorical columns are encoded yet, and datetime formatting is needed for analysis.
*   Data is imbalanced in terms of trip status, which provides an opportunity for gap analysis (demand vs. supply).
*   Duplicate records are minimal or non-existent, and overall the dataset is relatively clean.

**4. Usefulness**
This dataset is suitable for:
*   Exploratory Data Analysis (EDA)
*   Time series breakdown
*   Demand-supply gap identification
*   Operational inefficiency detection
*   Business insights for decision-making and improvement
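
The link between missing values and unfulfilled requests can be checked rather than assumed. The following is a minimal sketch on a made-up four-row frame, not the project data: the helper `missing_share_by_status` is hypothetical, and the column names `Driver id` and `Drop timestamp` and the status labels are assumed to match the CSV.

```python
import pandas as pd

def missing_share_by_status(df, status_col="Status"):
    """Share of missing values in every other column, split by trip status."""
    other_cols = [c for c in df.columns if c != status_col]
    return df[other_cols].isna().groupby(df[status_col]).mean()

# Toy data (assumption): column names and labels mirror the ones described above
toy = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
    "Driver id": [1.0, 2.0, None, 3.0],
    "Drop timestamp": ["11/7/2016 13:00", None, None, "11/7/2016 18:47"],
})

missing_by_status = missing_share_by_status(toy)
print(missing_by_status)
```

On the real data, `missing_share_by_status(df)` would show whether the gaps in the drop timestamp and driver columns sit entirely in the non-completed statuses.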

## ***2. Understanding Your Variables***

In [None]:
df.columns

In [None]:
df.describe()

**Request Id** - A unique identifier for each customer ride request.

**Pickup point** - The pickup location of the user, either City or Airport. Helps in identifying demand patterns based on location.

**Driver Id** - The ID assigned to the driver who accepted the request. It is missing for requests that were cancelled or not fulfilled.

**Status** - The outcome of the request. It has three values: Trip Completed, Cancelled, or No Cars Available.

**Request timestamp** - The date and time when the customer placed the ride request. Used for creating the time of day, weekday, and hourly trend columns.

**Drop timestamp** - The date and time when the trip was completed. Missing for incomplete trips (cancelled or no cars available).

**Trip Duration (mins)** - A derived variable representing the duration of completed trips in minutes, calculated from the drop and request timestamps.

**Time of Day** - A derived categorical variable showing the part of the day (e.g., Early Morning, Day Time, Evening, Night), extracted from the request timestamp.




### Check Unique Values for each variable.

In [None]:
# Step 1: Import Required Libraries
import pandas as pd
import numpy as np

# Step 2: Load Dataset with Error Handling
try:
    uber_df = pd.read_csv('uber_data_clean.csv')
    print("Dataset loaded successfully.\n")
except FileNotFoundError:
    print(" Error: File 'uber_data_clean.csv' not found. Please check the file path.")
    raise  # Stop here: the steps below need uber_df to exist

# Step 3: Display Column Names
print(" Columns in DataFrame:")
print(uber_df.columns.tolist())

# Step 4: Display Unique Values Count and Sample Values in Each Column
print("\n Unique values and examples for each column:\n")
for col in uber_df.columns:
    print(f" Column: {col}")
    print(f"   - Unique Values: {uber_df[col].nunique()}")
    print(f"   - Sample Values: {uber_df[col].dropna().unique()[:5]}")
    print("-" * 60)


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
import pandas as pd

# Step 1: Load the data
uber_df = pd.read_csv('uber_data_clean.csv')

# Step 2: Convert both timestamp columns to datetime
uber_df['Request timestamp'] = pd.to_datetime(uber_df['Request timestamp'], dayfirst=True, errors='coerce')
uber_df['Drop timestamp'] = pd.to_datetime(uber_df['Drop timestamp'], dayfirst=True, errors='coerce')

# Step 3: Calculate Trip Duration (in minutes)
uber_df['Trip Duration (mins)'] = (uber_df['Drop timestamp'] - uber_df['Request timestamp']).dt.total_seconds() / 60

# Step 4: Preview
print(uber_df[['Request timestamp', 'Drop timestamp', 'Trip Duration (mins)']].head())

# Create a column: Hour of Request
uber_df['Request Hour'] = uber_df['Request timestamp'].dt.hour

# Create a column: Day of the Week
uber_df['Request Day'] = uber_df['Request timestamp'].dt.day_name()

# Create a new column: Time of Day
def get_time_of_day(hour):
    if 0 <= hour < 4:
        return 'Late Night'
    elif 4 <= hour < 10:
        return 'Early Morning'
    elif 10 <= hour < 17:
        return 'Day Time'
    elif 17 <= hour < 22:
        return 'Evening'
    else:
        return 'Night'

uber_df['Time of Day'] = uber_df['Request Hour'].apply(get_time_of_day)

# Drop duplicates if any, then rebuild a clean 0..n-1 index
uber_df = uber_df.drop_duplicates().reset_index(drop=True)

# Show basic stats to confirm wrangling is complete
uber_df[['Trip Duration (mins)', 'Request Hour']].describe()
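
The `apply`-based labelling above works one row at a time. One possible alternative, sketched under the assumption that the same five bands are wanted, is `pd.cut` with left-closed bins. A side effect worth noting: an hour that is missing (for example from a timestamp coerced to `NaT`) stays missing here, whereas `get_time_of_day` falls through to its `else` branch and returns 'Night'. The names `TIME_BINS`, `TIME_LABELS`, and `time_of_day` are illustrative, not part of the notebook.

```python
import pandas as pd

# Bin edges mirror get_time_of_day: [0,4), [4,10), [10,17), [17,22), [22,24)
TIME_BINS = [0, 4, 10, 17, 22, 24]
TIME_LABELS = ["Late Night", "Early Morning", "Day Time", "Evening", "Night"]

def time_of_day(hours):
    """Vectorised time-of-day label; missing hours remain NaN."""
    return pd.cut(hours, bins=TIME_BINS, labels=TIME_LABELS, right=False)

# Toy hours covering each band edge, plus one missing value
hours = pd.Series([0, 3, 4, 9, 10, 16, 17, 21, 22, 23, None])
labels = time_of_day(hours)
print(labels)
```

In the notebook this would be used as `uber_df['Time of Day'] = time_of_day(uber_df['Request Hour'])`.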


In [None]:
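# Trip duration in minutes (NaN for requests without a drop timestamp).
# Uses the same column name as the wrangling cell so no second column is created.
uber_df['Trip Duration (mins)'] = (
    uber_df['Drop timestamp'] - uber_df['Request timestamp']
).dt.total_seconds() / 60


print(uber_df[['Trip Duration (mins)']].head())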
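
Trip duration only exists for completed trips, so any comparison between City and Airport trips should drop the missing durations first. A minimal sketch of such a summary follows; the helper `duration_summary` and the toy values are made up for illustration, and the column names are assumed to match the ones used in this notebook.

```python
import pandas as pd

def duration_summary(df, duration_col="Trip Duration (mins)", by="Pickup point"):
    """Count, median, min and max duration per group, ignoring rows without a duration."""
    completed = df.dropna(subset=[duration_col])
    return completed.groupby(by)[duration_col].agg(["count", "median", "min", "max"])

# Toy data (assumption): the last row stands for a request that was never completed
toy = pd.DataFrame({
    "Pickup point": ["City", "City", "Airport", "Airport", "City"],
    "Trip Duration (mins)": [30.0, 50.0, 40.0, 60.0, None],
})

summary = duration_summary(toy)
print(summary)
```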



In [None]:

# 'Request Hour', 'Request Day' and 'Time of Day' were already created in the
# wrangling cell above, so this cell only previews the derived columns together.
uber_df[['Trip Duration (mins)', 'Request Hour', 'Time of Day']].head()

### What all manipulations have you done and insights you found?

**1. Converted date/time columns:**
Converted 'Request timestamp' and 'Drop timestamp' to proper datetime format using pd.to_datetime() with dayfirst=True.

**2. Created new columns:**
*   Request Hour: the hour extracted from Request timestamp.
*   Request Day: the day of the week extracted from Request timestamp.
*   Time of Day: the part of the day (Late Night, Early Morning, Day Time, Evening, Night) derived from Request Hour.
*   Trip Duration (mins): the duration in minutes, calculated as Drop timestamp - Request timestamp.

**3. Handled missing data:**
Used errors='coerce' when parsing both timestamp columns, so missing or inconsistent entries become NaT instead of raising an error.

**4. Flagged missing/incomplete records:**
Trips with missing drop timestamps or durations (i.e., cancelled or no cars available) are retained for status-wise analysis but excluded where duration is needed.
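
One caveat with `errors='coerce'` is that it silently turns anything it cannot parse into `NaT`. If the raw timestamps were to mix two layouts (an assumption here; the cleaned CSV may already be uniform), parsing each layout explicitly and combining the results avoids losing valid rows. The sketch below is illustrative only: `parse_mixed_timestamps` is a hypothetical helper and the two format strings are assumed, not taken from the dataset.

```python
import pandas as pd

def parse_mixed_timestamps(raw):
    """Parse day-first timestamps written either as d/m/Y H:M or d-m-Y H:M:S."""
    slash = pd.to_datetime(raw, format="%d/%m/%Y %H:%M", errors="coerce")
    dash = pd.to_datetime(raw, format="%d-%m-%Y %H:%M:%S", errors="coerce")
    # Keep the slash-style result where it parsed, otherwise fall back to dash-style
    return slash.fillna(dash)

# Toy input: one value per assumed layout, one unparseable string, one missing value
raw = pd.Series(["11/7/2016 11:51", "13-07-2016 08:33:16", "not a date", None])
parsed = parse_mixed_timestamps(raw)
print(parsed)
```

Comparing `parsed.isna().sum()` with the number of missing values in the raw column is a quick way to see how many entries were coerced rather than genuinely absent.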
**Initial Insights from EDA**

**Univariate Analysis:**

**1. Trip Status Distribution:**
*   A significant number of trips are either cancelled or show no cars available.
*   Only a portion of requests are actually completed.

**2. Pickup Point Distribution:**
*   More requests are made from the City than from the Airport.
*   Cancellations are concentrated in City pickups, while "No Cars Available" is concentrated at the Airport.

**3. Time of Day Patterns:**
*   Morning rush hour (5–9 AM) and evening (5–9 PM) show the highest request volume.
*   During evening peak hours, there is a large supply-demand gap, especially for airport pickups.

**4. Trip Duration:**
*   Most trips last between 15 and 40 minutes, but city trips are generally shorter than airport trips.
*   Very few long-duration trips are observed.

**Bivariate/Multivariate Analysis (Insights Planned or In Progress):**

**1. Status vs Pickup Point vs Time of Day:**
*   We observed cancellations in city mornings and no cars at the airport in evenings.
*   These will be shown using grouped bar plots and heatmaps.

**2. Demand vs Supply Gap:**
*   Planned visualization: overlay demand vs completed trips across time blocks.
*   There is clear evidence of operational bottlenecks at certain times.

**Summary of Key Problems Identified:**
*   Major supply issues during peak hours, especially in evening airport pickups.
*   High cancellation rates by drivers in morning city pickups.
*   Potential inefficiencies in fleet distribution and driver availability.
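
The demand vs supply gap mentioned above can be made concrete by treating every request as demand and every completed trip as supply. The following is a sketch under that assumption, using toy data; `supply_demand_gap` is a hypothetical helper, and the label "Trip Completed" is assumed to be the exact value stored in the Status column.

```python
import pandas as pd

def supply_demand_gap(df, by="Request Hour", status_col="Status",
                      completed_label="Trip Completed"):
    """Per group: total requests (demand), completed trips (supply), and the gap."""
    demand = df.groupby(by).size().rename("demand")
    supply = df[df[status_col] == completed_label].groupby(by).size().rename("supply")
    # Groups with no completed trips are missing from `supply`; treat them as 0
    out = pd.concat([demand, supply], axis=1).fillna(0).astype(int)
    out["gap"] = out["demand"] - out["supply"]
    out["gap_pct"] = (out["gap"] / out["demand"] * 100).round(1)
    return out

# Toy data (assumption): six requests spread over three hours
toy = pd.DataFrame({
    "Request Hour": [8, 8, 8, 18, 18, 2],
    "Status": ["Trip Completed", "Cancelled", "No Cars Available",
               "No Cars Available", "Trip Completed", "Cancelled"],
})

gap = supply_demand_gap(toy)
print(gap)
```

Passing `by="Time of Day"` instead of the default would give the same table per time block, which is the form the planned overlay chart needs.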

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 5))
sns.countplot(data=uber_df, x='Status', palette='Set2')
plt.title('Trip Status Distribution')
plt.xlabel('Status')
plt.ylabel('Number of Requests')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar chart because it is the clearest way to compare frequencies across categories. Since the Status column contains categorical values, a bar chart highlights which status occurs most often and is easy to interpret.

##### 2. What is/are the insight(s) found from the chart?

*   The chart shows that "No Cars Available" is the most common status, followed by "Completed", and then "Cancelled".
*   This indicates a demand-supply mismatch, especially during peak hours, when users are more likely to face unavailability.
*   Only a portion of the total ride requests are successfully completed, which is a performance concern.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. These insights can:

*   Help the company identify service inefficiencies.
*   Allow better driver allocation in areas or times with high “No Cars Available” counts.
*   Increase customer satisfaction and retention by addressing the root causes of missed ride opportunities.

Negative growth risk:

*   High "No Cars Available" or "Cancelled" counts can result in lost revenue, user frustration, and negative brand perception.
*   If users repeatedly experience failures in booking rides, they may switch to competitors, affecting Uber's market share.
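
A count plot shows the ranking of the statuses, but the problem statement asks for proportions. A short sketch of how the shares could be computed is given below; `status_shares` is a hypothetical helper and the four-row series is made up purely to show the call.

```python
import pandas as pd

def status_shares(status):
    """Percentage of requests in each status, rounded to one decimal."""
    return status.value_counts(normalize=True).mul(100).round(1)

# Toy data (assumption): labels mirror the three categories of the Status column
toy_status = pd.Series(
    ["Trip Completed", "Trip Completed", "Cancelled", "No Cars Available"]
)

shares = status_shares(toy_status)
print(shares)
```

On the notebook's data the call would be `status_shares(uber_df['Status'])`.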

#### Chart - 2

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

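# uber_df already holds the wrangled data from Section 3; reloading the CSV here
# would discard the derived columns (Request Hour, Time of Day, Trip Duration).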


plt.figure(figsize=(8, 5))
sns.countplot(data=uber_df, x='Pickup point', palette='viridis')
plt.title('Pickup Point Distribution', fontsize=14)
plt.xlabel('Pickup Point', fontsize=12)
plt.ylabel('Number of Requests', fontsize=12)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

I selected a bar chart because it provides a clear comparison of how many requests come from each pickup point. This helps identify where demand is higher, enabling better resource planning.

##### 2. What is/are the insight(s) found from the chart?

*   The City has a significantly higher number of ride requests than the Airport.
*   This suggests that most Uber users in this dataset are trying to get a ride from the city rather than from the airport.
*   We can infer greater demand for drivers in the City area, especially during rush hours.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes:

*   The company can optimize driver deployment by assigning more cabs to high-demand pickup zones like the City.
*   For the Airport, Uber can analyze whether the lower demand is due to driver restrictions, high wait times, or limited user awareness, and improve accordingly.

Negative growth risk:

*   If the Airport has fewer requests but a higher percentage of cancelled or unfulfilled rides, this could lead to dissatisfaction among air travelers, a valuable customer segment.
*   Ignoring low-volume pickup points may result in missed opportunities for growth in quieter areas.

#### Chart - 3

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(data=uber_df, x='Pickup point', hue='Status', palette='Set2')
plt.title('Request Status by Pickup Point', fontsize=14)
plt.xlabel('Pickup Point', fontsize=12)
plt.ylabel('Number of Requests', fontsize=12)
plt.legend(title='Trip Status')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

I chose a grouped bar chart (countplot with hue) because it visually compares how trip statuses vary between the two pickup locations (City and Airport). This helps show whether the issues in trip completion are location-specific.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that:

*   In City pickups, there is a high number of cancelled trips.
*   In Airport pickups, a large number of requests show no cars available.
*   Completed trips are relatively few compared to the total requests at both pickup points.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights help identify area-specific issues in the Uber service:

*   In the City, cancellations could be due to driver-side issues (delays, traffic).
*   At the Airport, “no cars available” suggests a shortage of drivers, especially at certain times.

Negative Insight: Unavailability and cancellations lead to customer dissatisfaction, reduced reliability, and loss of business.

Business Action: Uber can improve resource allocation, possibly by:

*   Incentivizing drivers to wait at the airport during peak hours.
*   Reducing cancellations by penalizing or educating drivers.
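
Because the two pickup points receive different numbers of requests, raw counts can hide differences in rates. A row-normalised cross-tabulation is one way to compare them; the sketch below uses toy data, and `status_rate_by_pickup` is a hypothetical helper that assumes the `Pickup point` and `Status` column names used in the charts.

```python
import pandas as pd

def status_rate_by_pickup(df):
    """Percentage of each status within every pickup point (each row sums to 100)."""
    table = pd.crosstab(df["Pickup point"], df["Status"], normalize="index")
    return table.mul(100).round(1)

# Toy data (assumption): four City requests and two Airport requests
toy = pd.DataFrame({
    "Pickup point": ["City", "City", "City", "City", "Airport", "Airport"],
    "Status": ["Trip Completed", "Cancelled", "Cancelled", "No Cars Available",
               "Trip Completed", "No Cars Available"],
})

rates = status_rate_by_pickup(toy)
print(rates)
```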

#### Chart - 4

In [None]:

uber_df['Request timestamp'] = pd.to_datetime(uber_df['Request timestamp'], dayfirst=True, errors='coerce')


In [None]:

uber_df['Request Hour'] = uber_df['Request timestamp'].dt.hour

plt.figure(figsize=(12, 6))
sns.countplot(data=uber_df, x='Request Hour', hue='Status', palette='coolwarm')
plt.title('Request Status by Hour of the Day', fontsize=14)
plt.xlabel('Hour of the Day (24-hour format)', fontsize=12)
plt.ylabel('Number of Requests', fontsize=12)
plt.legend(title='Trip Status')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

This grouped bar chart helps visualize time-based demand and service issues, allowing us to identify when most requests occur and at which hours cancellations or unavailability peak.

##### 2. What is/are the insight(s) found from the chart?

Insights:

High demand (and issues) occur during early morning hours (5–9 AM) and evening hours (5–9 PM).

Cancellations spike during the early morning hours, possibly due to traffic or driver unavailability.

“No Cars Available” is extremely high during the evening hours, especially for airport pickups.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes! These insights can:

Help Uber optimize driver supply by deploying more drivers in high-demand hours.

Enable better incentive planning for drivers during peak hours.

Reduce unavailability and cancellations, improving customer satisfaction and brand trust.

Negative growth risk:

Repeated unavailability or cancellations during peak hours may push users to switch to competitors.

Customers may avoid using Uber during rush hours, affecting revenue.

 Using these insights for better driver allocation strategies will improve overall service quality and ensure positive business impact.
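
Reading peak hours off a grouped bar chart is error-prone, so it can help to compute them. A minimal sketch, assuming the `Request Hour` and `Status` columns created earlier, is shown below with toy data; `peak_hour_per_status` is a hypothetical helper. When several hours tie, `idxmax` reports only the first of them.

```python
import pandas as pd

def peak_hour_per_status(df, hour_col="Request Hour", status_col="Status"):
    """For each status, the request hour with the highest number of requests."""
    counts = pd.crosstab(df[hour_col], df[status_col])
    return counts.idxmax()  # index label (hour) of the column-wise maximum

# Toy data (assumption): cancellations cluster at 8, no-car requests at 19
toy = pd.DataFrame({
    "Request Hour": [8, 8, 8, 19, 19, 19, 19, 12],
    "Status": ["Cancelled", "Cancelled", "Trip Completed",
               "No Cars Available", "No Cars Available", "No Cars Available",
               "Trip Completed", "Trip Completed"],
})

peaks = peak_hour_per_status(toy)
print(peaks)
```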

#### Chart - 5

In [None]:
plt.figure(figsize=(8, 5))
sns.countplot(data=uber_df, x='Pickup point', hue='Status', palette='Set2')
plt.title('Trip Status by Pickup Point', fontsize=14)
plt.xlabel('Pickup Point', fontsize=12)
plt.ylabel('Number of Requests', fontsize=12)
plt.legend(title='Trip Status')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

This grouped countplot helps compare trip statuses at each pickup point (City vs Airport) in one view, showing where problems like cancellations or no cars are more common.

##### 2. What is/are the insight(s) found from the chart?

City pickups have more completed trips, but also more cancellations.

Airport pickups face more ‘No Cars Available’ issues, especially during peak hours.

The airport seems underserved compared to the city.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes.
These insights help Uber adjust driver supply:

Assign more drivers to the airport during high-demand periods.

Investigate why users cancel more in the city (wait time? pricing?).

Are there any insights that lead to negative growth? Justify with reasons.
 Yes.

High cancellations in the city may show customer dissatisfaction.

No cars at the airport lead to missed revenue opportunities and poor user experience.
Justification: If not addressed, these issues can hurt customer trust and brand reliability, pushing users to competitors.

#### Chart - 6

In [None]:

uber_df['Request timestamp'] = pd.to_datetime(uber_df['Request timestamp'], dayfirst=True, errors='coerce')

uber_df['Request Hour'] = uber_df['Request timestamp'].dt.hour

plt.figure(figsize=(12, 6))
sns.countplot(data=uber_df, x='Request Hour', hue='Status', palette='dark')
plt.title('Hourly Trip Requests by Status', fontsize=14)
plt.xlabel('Hour of Day', fontsize=12)
plt.ylabel('Number of Requests', fontsize=12)
plt.legend(title='Trip Status')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

This chart shows how customer requests and trip statuses fluctuate by hour, helping identify peak hours and operational gaps.

##### 2. What is/are the insight(s) found from the chart?

Peak hours are 7–9 AM and 5–9 PM.

Morning peaks (around 8 AM) see high cancellations and ‘No Cars Available’ statuses.

Evening peaks also show large demand but with better trip completion rates.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes.

Helps Uber assign more drivers during peak hours, reducing cancellations and unavailability.

Can use dynamic pricing or driver incentives in critical time slots.

Are there any insights that lead to negative growth? Justify with reasons.
Yes.

High cancellations during early mornings may indicate driver unavailability or fatigue.

No cars during rush hours can lead to customer churn and loss of loyalty.
 Justification: Unfulfilled demand reduces user trust and impacts brand reliability, especially for time-sensitive travel (like airport drop-offs).

#### Chart - 7

In [None]:
plt.figure(figsize=(8, 5))
sns.countplot(data=uber_df, x='Pickup point', hue='Status', palette='dark')
plt.title('Trip Status by Pickup Point', fontsize=14)
plt.xlabel('Pickup Point', fontsize=12)
plt.ylabel('Number of Trips', fontsize=12)
plt.legend(title='Status')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

This chart reveals which pickup location faces more cancellations or unavailability, helping identify operational inefficiencies.

##### 2. What is/are the insight(s) found from the chart?

City pickup point has a higher number of cancellations.

Airport pickup point shows more cases of ‘No Cars Available’.

Completed trips are better balanced but still lower in volume.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes.

Helps Uber plan targeted driver allocation:
 More drivers needed at the airport during demand peaks.
 Need to understand why city cancellations are high (maybe long wait times or pricing).

Are there any insights that lead to negative growth? Justify with specific reason.
 Yes.

Frequent ‘No Cars Available’ at airport hurts revenue from inbound customers (first impression matters).

High cancellation in city could drive loyal customers to competitors.
Justification: Poor pickup reliability leads to dissatisfaction, poor ratings, and revenue loss, especially in areas with strong demand.

#### Chart - 8

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Step 1: Ensure timestamps are in datetime format
uber_df['Request timestamp'] = pd.to_datetime(uber_df['Request timestamp'], dayfirst=True, errors='coerce')

# Step 2: Create Request Hour column
uber_df['Request Hour'] = uber_df['Request timestamp'].dt.hour

# Step 3: Plot chart
plt.figure(figsize=(14, 6))
sns.countplot(data=uber_df, x='Request Hour', hue='Status', palette='dark')

plt.title('Trip Status by Hour of the Day', fontsize=16)
plt.xlabel('Hour of Day', fontsize=12)
plt.ylabel('Number of Requests', fontsize=12)
plt.legend(title='Trip Status')
plt.xticks(range(0, 24))
plt.grid(axis='y', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

To identify peak hours for each trip status and see when issues like cancellations or no cars occur most often.

##### 2. What is/are the insight(s) found from the chart?

High number of cancellations during early morning (5 AM–9 AM).

“No Cars Available” is frequent in evening hours (5 PM–9 PM).

Completed trips drop drastically during peak hours due to supply-demand mismatch.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes.

Helps Uber assign more drivers during peak hours, reducing cancellations and unavailability.

Can use dynamic pricing or driver incentives in critical time slots.

Are there any insights that lead to negative growth? Justify with reasons.
Yes.

High cancellations during early mornings may indicate driver unavailability or fatigue.

No cars during rush hours can lead to customer churn and loss of loyalty.
Justification: Unfulfilled demand reduces user trust and impacts brand reliability, especially for time-sensitive travel (like airport drop-offs).

#### Chart - 9

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Step 1: Plot the chart
plt.figure(figsize=(8, 6))
sns.countplot(data=uber_df, x='Pickup point', hue='Status', palette='Set1')

# Step 2: Customize the chart
plt.title('Trip Status by Pickup Point', fontsize=16)
plt.xlabel('Pickup Point', fontsize=12)
plt.ylabel('Number of Requests', fontsize=12)
plt.legend(title='Trip Status')
plt.grid(axis='y', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

This chart helps visualize how trip outcomes (Completed, Cancelled, No Cars Available) vary between city and airport pickups — helping Uber manage location-based driver deployment.

##### 2. What is/are the insight(s) found from the chart?

City Pickup Point: Has a large number of “No Cars Available” and “Cancelled” statuses — showing high demand but insufficient supply.

Airport Pickup Point: Slightly better completion rate but still has issues with availability and cancellations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, because:

Uber can optimize driver allocation based on location-specific demand.

Can launch location-based driver incentives to balance availability.

Helps Uber decide where to station more cars during peak periods.

 Negative Insight?
Yes — City pickups having more cancellations or “No Cars Available” indicates:

Lost revenue

Customer frustration

Potential switch to competitors

Solving this helps turn a weak spot into a competitive strength.

#### Chart - 10

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# Reuse the uber_df loaded earlier instead of re-reading the CSV, so the derived
# columns are kept; the conversion below is safe to run again.

uber_df['Request timestamp'] = pd.to_datetime(uber_df['Request timestamp'], dayfirst=True, errors='coerce')

uber_df['Request Hour'] = uber_df['Request timestamp'].dt.hour

status_hour_crosstab = pd.crosstab(uber_df['Request Hour'], uber_df['Status'])

status_hour_crosstab.plot(kind='bar', stacked=True, figsize=(12, 6), colormap='Set2')

plt.title('Trip Status by Hour of the Day', fontsize=14)
plt.xlabel('Hour of the Day')
plt.ylabel('Number of Trips')
plt.xticks(rotation=0)
plt.legend(title='Trip Status')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A stacked bar chart effectively shows how different trip statuses (Completed, Cancelled, No Cars Available) change hour by hour throughout the day. It highlights time-based operational challenges.

##### 2. What is/are the insight(s) found from the chart?

Early morning (5–9 AM) and evening (5–9 PM) have the highest number of requests.

Cancellation rate is high during the morning peak hours, possibly due to driver unavailability or traffic.

"No Cars Available" spikes in the evening hours, indicating a supply-demand gap.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. Here’s how:

Uber can increase driver incentives during peak hours to reduce cancellations.

Demand forecasts can help in allocating cars proactively to high-demand zones.

May help improve customer satisfaction and reduce lost revenue due to unserved bookings.

Negative Insight: Evening no-availability could mean Uber is losing customers to competitors during high-demand hours.

#### Chart - 11

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 5))
sns.countplot(data=uber_df, x='Pickup point', hue='Status', palette='Set1')

plt.title('Trip Status by Pickup Point', fontsize=14)
plt.xlabel('Pickup Point')
plt.ylabel('Number of Trips')
plt.legend(title='Trip Status')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

A grouped count plot (with hue) is ideal for comparing how trip statuses vary between two categories, in this case Airport and City. It clearly shows the differences side by side.

##### 2. What is/are the insight(s) found from the chart?

City pickup has a high number of cancellations.

No Cars Available is significantly higher at the Airport.

Completed trips are more from the City than from the Airport.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. These insights can guide business decisions such as:

Improving fleet allocation at the Airport, especially during evening hours.

Investigating cancellation reasons in the City—could be due to traffic, wait time, or rider behavior.

These actions could improve customer satisfaction and driver efficiency.

Negative Insight: The high number of “No Cars Available” at the Airport could lead to poor user experience and might push users to try alternative services like taxis or other ride apps.

#### Chart - 12

In [None]:
import pandas as pd  # needed for pd.to_datetime below
import seaborn as sns
import matplotlib.pyplot as plt

# Ensure Request Hour column exists
if 'Request Hour' not in uber_df.columns:
    uber_df['Request timestamp'] = pd.to_datetime(uber_df['Request timestamp'], dayfirst=True, errors='coerce')
    uber_df['Request Hour'] = uber_df['Request timestamp'].dt.hour

plt.figure(figsize=(12, 6))
sns.countplot(data=uber_df, x='Request Hour', hue='Status', palette='coolwarm')

plt.title('Trip Status Distribution by Hour', fontsize=14)
plt.xlabel('Hour of Day')
plt.ylabel('Number of Trips')
plt.legend(title='Trip Status')
plt.xticks(range(0, 24))
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

This grouped count plot by hour shows how trip statuses are distributed through the day. It helps detect patterns such as when cancellations or unavailability peak.

##### 2. What is/are the insight(s) found from the chart?

Completed trips peak during morning and evening commute hours (8 AM, 6 PM).

Cancellations peak in early morning and late evening hours.

No Cars Available is very high during late evening/night (8 PM to midnight), especially near the airport.
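
The peak hours listed above can be located programmatically instead of by eye. This is a minimal sketch under the same column-name assumptions as the chart code; `peak_hour_per_status` and the toy rows are hypothetical and only illustrate the technique.

```python
import pandas as pd

def peak_hour_per_status(df, hour_col="Request Hour", status_col="Status"):
    """For every trip status, return the hour with the most requests and that count."""
    counts = df.groupby([status_col, hour_col]).size().unstack(fill_value=0)
    # idxmax(axis=1) picks, for each status row, the hour column with the largest count
    return pd.DataFrame({
        "peak_hour": counts.idxmax(axis=1),
        "requests_at_peak": counts.max(axis=1),
    })

# Toy rows for illustration only (not taken from the Uber dataset)
toy = pd.DataFrame({
    "Request Hour": [8, 8, 18, 5, 5, 5, 8, 21, 21, 20],
    "Status": ["Completed", "Completed", "Completed",
               "Cancelled", "Cancelled", "Cancelled", "Cancelled",
               "No Cars Available", "No Cars Available", "No Cars Available"],
})
peaks = peak_hour_per_status(toy)
print(peaks)
```

If two hours tie for a status, `idxmax` reports the earlier one, so ties are worth checking separately on the real data.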

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. These patterns help Uber:

Optimize driver shift planning (more drivers during peak unavailability hours).

Implement dynamic pricing or incentives during high cancellation hours.

Reduce missed demand, improving revenue and customer trust.

Negative Insight: If the company doesn’t act on this data, they risk customer churn due to repeated unavailability during high-demand hours.

#### Chart - 13

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 6))
sns.countplot(data=uber_df, x='Pickup point', hue='Status', palette='Set2')

plt.title('Trip Status by Pickup Point', fontsize=14)
plt.xlabel('Pickup Point')
plt.ylabel('Number of Trips')
plt.legend(title='Status')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

This clustered bar chart gives a clear comparison of how trip statuses differ between City and Airport, helping analyze location-based service performance.


##### 2. What is/are the insight(s) found from the chart?

Airport shows a very high count of "No Cars Available".

City pickups have a more balanced distribution but still show cancellations.

Completed trips are more frequent from the City than the Airport.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, absolutely.

Uber can allocate more cars near the Airport, especially during peak hours.

Improve demand prediction and availability strategies for high-miss zones.

Create special driver incentives to target underserved pickup points.

Negative Insight: If Uber continues to have low availability from the Airport, it could damage customer satisfaction and trust, leading to negative reviews and lost revenue.
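
To see where and when additional cars would matter most, the gap between requests and completed trips can be tabulated per pickup point and time slot. The sketch below is one possible way to do this; the slot boundaries, slot labels, the `supply_gap` helper and the toy rows are all assumptions made for illustration, not part of the original analysis.

```python
import pandas as pd

# Time-slot boundaries and labels are illustrative assumptions
SLOT_BINS = [0, 5, 10, 17, 22, 24]
SLOT_LABELS = ["Late night", "Morning", "Daytime", "Evening", "Night"]

def supply_gap(df, completed_label="Completed"):
    """Requests (demand), completed trips (supply) and their difference."""
    # right=False makes the bins [0, 5), [5, 10), ... so hours 0-23 are all covered
    slots = pd.cut(df["Request Hour"], bins=SLOT_BINS,
                   labels=SLOT_LABELS, right=False)
    flagged = df.assign(slot=slots,
                        completed=df["Status"].eq(completed_label).astype(int))
    out = (flagged.groupby(["Pickup point", "slot"], observed=True)["completed"]
                  .agg(demand="count", supply="sum"))
    out["gap"] = out["demand"] - out["supply"]
    return out

# Toy rows for illustration only (not taken from the Uber dataset)
toy = pd.DataFrame({
    "Pickup point": ["Airport"] * 4 + ["City"] * 3,
    "Request Hour": [19, 19, 19, 19, 8, 8, 8],
    "Status": ["No Cars Available", "No Cars Available", "No Cars Available",
               "Completed", "Cancelled", "Completed", "Completed"],
})
gap = supply_gap(toy)
print(gap)
```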

#### Chart - 14 - Correlation Heatmap

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Select relevant numeric columns for correlation
numeric_cols = uber_df[['Trip duration(mins)', 'Request Hour']]

# Compute correlation matrix
correlation_matrix = numeric_cols.corr()

# Plot heatmap
plt.figure(figsize=(6, 4))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('Correlation Heatmap')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

The correlation heatmap gives a compact, visual summary of how strongly numerical variables relate to one another. It helps spot patterns quickly for decision-making.


##### 2. What is/are the insight(s) found from the chart?

There is a slight negative correlation between Request Hour and Trip Duration.

Meaning: trips requested later in the day tend to be marginally shorter.

The correlation is weak, indicating no strong linear relationship between these two variables.
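
A weak Pearson correlation only rules out a strong straight-line relationship. Because the hour of day is cyclic, duration could still vary by hour in a non-linear way. The sketch below shows one way to check both, assuming the `Request Hour` and `Trip duration(mins)` columns used in the heatmap code; the helper name and the toy values are hypothetical and were chosen so that the linear correlation is zero while the hourly means clearly differ.

```python
import pandas as pd

def hour_duration_summary(df, hour_col="Request Hour",
                          dur_col="Trip duration(mins)"):
    """Pearson correlation plus the mean trip duration for each request hour."""
    clean = df[[hour_col, dur_col]].dropna()
    r = clean[hour_col].corr(clean[dur_col])  # Pearson is the pandas default
    per_hour = clean.groupby(hour_col)[dur_col].mean()
    return r, per_hour

# Toy values for illustration only: longer trips at 7 and 17, shorter otherwise
toy = pd.DataFrame({
    "Request Hour": [2, 7, 12, 17, 22],
    "Trip duration(mins)": [40, 60, 40, 60, 40],
})
r, per_hour = hour_duration_summary(toy)
print(round(r, 3))
print(per_hour)
```

On the real data, plotting `per_hour` as a line would show whether durations rise during commute hours even though the single correlation coefficient stays close to zero.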


#### Chart - 15 - Pair Plot

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Prepare the dataset with relevant numeric columns
pairplot_df = uber_df[['Trip duration(mins)', 'Request Hour']].dropna()

# Create the pair plot
sns.pairplot(pairplot_df)
plt.suptitle("Pair Plot of Trip duration and Request Hour", y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

A pair plot allows us to explore the distribution of individual variables and relationships between them in one view. It is helpful for identifying clusters, trends, and outliers.


##### 2. What is/are the insight(s) found from the chart?

The scatter plot shows that trips are concentrated in specific request hours, especially from 5 AM to 10 AM and from 5 PM to 9 PM (possible peak times).

The histogram shows that:

Most requests occur during working hours.

Short-duration trips are more common than long ones.
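
The observation that short trips outnumber long ones corresponds to a right-skewed duration distribution, which can be confirmed with a few summary statistics. This is a minimal sketch; `duration_profile` is a hypothetical helper and the toy durations are made up for illustration.

```python
import pandas as pd

def duration_profile(durations):
    """Summary statistics that indicate whether a distribution is right-skewed."""
    d = pd.Series(durations, dtype="float64").dropna()
    return {
        "median": d.median(),
        "mean": d.mean(),
        "p90": d.quantile(0.9),
        "skew": d.skew(),  # positive values indicate a longer right tail
    }

# Toy durations in minutes, for illustration only
profile = duration_profile([20, 22, 25, 25, 28, 30, 35, 70])
print(profile)
```

A mean above the median together with positive skew is consistent with many short trips and a few long ones. On the notebook's data this could be called as `duration_profile(uber_df['Trip duration(mins)'])`.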


## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the Business Objective?
Explain Briefly.

1. Minimize Trip Cancellations

Insights from Charts:

High cancellation rates from the City during morning hours.

Suggestions:

Introduce driver penalties or lower incentives for last-minute cancellations.

Offer passenger incentives (e.g., discounts) to reschedule instead of canceling.

Implement a driver preference filter (e.g., "willing to take morning peak rides").

2. Bridge the Supply-Demand Gap

Insights from Charts:

"No cars available" is frequent at the Airport during evening hours.

Suggestions:

 Dynamic fleet management: Redirect idle drivers to Airport during 5–10 PM.

Use predictive demand models to alert drivers in advance.

 Surge incentives for drivers in underserved time slots and locations.
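
As a starting point for the predictive demand idea above, a naive baseline is the average number of requests per hour and pickup point across the observed days. This is not a real forecasting model, only a historical average that a proper model would need to beat. The helper name `baseline_hourly_demand` and the toy timestamps are hypothetical; the column names follow the ones used in this notebook.

```python
import pandas as pd

def baseline_hourly_demand(df, ts_col="Request timestamp",
                           pickup_col="Pickup point"):
    """Average requests per pickup point and hour across the days in the data."""
    # The notebook parses its raw timestamps with dayfirst=True; the toy data is ISO
    ts = pd.to_datetime(df[ts_col])
    daily = (df.assign(day=ts.dt.date, hour=ts.dt.hour)
               .groupby([pickup_col, "day", "hour"]).size())
    # One column per day; an hour with no requests on a given day counts as 0
    wide = daily.unstack("day", fill_value=0)
    return wide.mean(axis=1).rename("expected_requests")

# Toy rows for illustration only (not taken from the Uber dataset)
toy = pd.DataFrame({
    "Pickup point": ["Airport", "Airport", "Airport", "Airport", "City", "City"],
    "Request timestamp": ["2016-07-11 19:05:00", "2016-07-11 19:20:00",
                          "2016-07-11 19:45:00", "2016-07-12 19:10:00",
                          "2016-07-11 08:15:00", "2016-07-11 08:40:00"],
})
forecast = baseline_hourly_demand(toy)
print(forecast)
```

Days with no requests at all do not appear as columns, so the average is taken only over days present in the data.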

3. Improve Customer Experience

Insights from Charts 8 and 10:

Peak hour failures frustrate users, reducing brand trust.

Suggestions:

 Real-time alerts for users during known problem windows with suggestions (e.g., earlier/later bookings).

Loyalty rewards for frequent riders, so that loyal customers face fewer failed requests.

4. Optimize Driver Utilization

Insights:

Many requests go unserved while some drivers remain idle in low-demand zones.

Suggestions:

 Geo-fencing + heatmaps to direct idle drivers to high-demand spots.

 Auto-balancing system to shift supply intelligently between city & airport.

5. Increase Revenue & Retention

Suggestions:

 Flexible pricing models (mini, shared, prime) for peak hours to serve more users.

 Targeted promotions in low-demand times to balance rides across the day.

 Monitor real-time KPIs like conversion rates, cancellations, and ride completions.
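
The KPIs mentioned above can be computed directly from the `Status` column. The sketch below is one possible definition, treating "conversion rate" as the share of requests that end in a completed trip, which is an assumption. The function name, the default status labels and the toy values are hypothetical and should be adapted to the exact strings in `uber_df['Status']`.

```python
import pandas as pd

def trip_kpis(df, status_col="Status", completed="Completed",
              cancelled="Cancelled", no_cars="No Cars Available"):
    """Share of requests that were completed, cancelled, or found no car."""
    share = df[status_col].value_counts(normalize=True)
    return {
        "completion_rate": float(share.get(completed, 0.0)),
        "cancellation_rate": float(share.get(cancelled, 0.0)),
        "no_cars_rate": float(share.get(no_cars, 0.0)),
    }

# Toy rows for illustration only (not taken from the Uber dataset)
toy = pd.DataFrame({
    "Status": ["Completed"] * 4 + ["Cancelled"] * 2 + ["No Cars Available"] * 4
})
kpis = trip_kpis(toy)
print(kpis)
```

Applying the same function to subsets, for example `trip_kpis(uber_df[uber_df['Pickup point'] == 'Airport'])`, would give the location-specific view used throughout this analysis.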

Final Message to Client:

“By leveraging time-based and location-specific insights, Uber can proactively reduce trip failures, better utilize its fleet, and significantly improve rider satisfaction, all leading to a more efficient and profitable operation.”

# **Conclusion**

The analysis of Uber's trip request data reveals critical operational challenges and hidden patterns affecting service quality and customer satisfaction. Key findings include:

A high number of trip cancellations and unfulfilled requests during peak hours, especially in the City during the morning and at the Airport during the evening.

A significant mismatch between supply and demand, leading to frequent "No cars available" errors.

Time of day and pickup location are the most influential factors in trip status outcomes.

Data shows clear opportunities for optimizing driver allocation, enhancing app communication, and improving customer loyalty.

By implementing data-driven strategies such as dynamic driver deployment, incentive programs, and real-time demand forecasting, Uber can:

* Reduce service failures
* Improve rider and driver experience
* Increase ride completion rates
* Boost revenue and operational efficiency

This analysis provides a foundation for smarter decision-making and sets the path for better business performance and customer satisfaction.


