# **Project Name** - **UBER SUPPLY DEMAND GAP ANALYSIS**

- ##### **Project Type**    - Exploratory Data Analysis (EDA)
- ##### **Contribution**    - Individual


# **Project Summary** - **Uber Supply-Demand Gap Analysis**


- **Project Objective:** The primary goal was to analyze Uber's request data to identify the root causes of the supply-demand gap, specifically focusing on "Cancelled" trips and "No Cars Available" for airport-city routes.

- **Data Overview:** The analysis utilized a dataset of 6,745 ride requests, featuring variables such as Request ID, Pickup Point, Status and Timestamps.

- **Methodology & Tools:**

    **1. Excel:** Used for initial data cleaning, formatting mixed timestamps, and creating an interactive dashboard with slicers for business stakeholders.

    **2. Python (Pandas/NumPy):** Employed for advanced feature engineering, including the creation of Time Slots (Early Morning, Day Time, Evening, Late Night) and a Gap Status indicator.

    **3. SQL:** Used to perform high-level aggregations and identify peak failure hours through complex queries.

- **Key Findings - The "Evening" Gap:**
    - The most severe supply-demand gap occurs during the Evening slot (5 PM – 10 PM), with a failure rate of approximately 66.5%.
    - The root cause is a massive shortage of supply at the Airport, where riders frequently receive "No Cars Available" messages due to high arrival demand and insufficient driver presence.

- **Key Findings - The "Early Morning" Gap:**
    - A secondary peak in the gap occurs during the Early Morning slot (5 AM – 10 AM).
    - The primary issue here is Driver Cancellations for trips originating in the City. This suggests drivers avoid airport trips in the morning because they fear a lack of return fares from the airport back to the city.

- **Overall Performance:** The analysis revealed a low overall trip completion rate of only 42%, highlighting a significant loss in potential revenue and customer satisfaction.

- **Actionable Recommendations:**
   - **For the Airport (Evening):** Implement targeted driver incentives and surge pricing to encourage drivers to move toward the airport during peak arrival hours.
   - **For the City (Morning):** Introduce cancellation penalties for drivers during peak morning hours and provide "return trip" guarantees to make airport runs more attractive.

- **Business Impact:** By addressing these specific temporal and spatial gaps, Uber can optimize its driver distribution, increase successful trip counts, and significantly improve the reliability of its service for airport commuters.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement -**


- **Core Challenge:** Uber faces a significant imbalance between the number of ride requests (demand) and the availability of drivers (supply), particularly for trips connecting the City and the Airport.

- **Service Disruptions:** This imbalance results in two major types of service failures:
    - **Cancellations:** Drivers frequently cancel requests, especially for trips heading toward the airport during specific hours.
    - **No Cars Available:** Riders often face situations where no drivers are nearby to accept their requests, leading to unfulfilled demand.

- **Impact on Stakeholders:**
    - **For Riders:** These gaps lead to a poor user experience, long wait times and unreliability for critical travel (like catching a flight).
    - **For Drivers:** Inefficient distribution of cars means drivers might miss out on high-demand periods or spend too much time on unprofitable routes.
    - **For Uber:** Every unfulfilled request represents a direct loss of revenue and potential damage to the brand's reputation for reliability.

- **Analytical Goal:** The objective is to use data analytics to pinpoint the exact time slots and pickup locations where these gaps are most severe and to identify the underlying reasons—whether they are structural (lack of cars) or behavioral (driver cancellations)—to provide data-driven solutions.

#### **Define Your Business Objective**

- **Maximize Trip Completion Rates:** The primary objective is to reduce the number of unfulfilled requests (Cancellations and No Cars Available) to ensure that a higher percentage of rider demands are successfully met.

- **Optimize Revenue Generation:** By bridging the supply-demand gap, Uber can capture lost revenue from the 58% of requests that currently go unfulfilled, directly increasing the company's bottom line.

- **Improve Operational Efficiency:** Identify specific peak hours and locations of failure to allow for better driver distribution and resource allocation, ensuring drivers are where the demand is highest.

- **Enhance Customer Satisfaction and Loyalty:** Reducing wait times and the frequency of "No Cars Available" messages improves the reliability of the service, fostering trust and long-term loyalty among airport commuters.

- **Develop Data-Driven Strategies:** Provide actionable insights for management to implement targeted interventions, such as dynamic pricing, driver incentives and optimized shift scheduling, to balance the marketplace effectively.

# **General Guidelines -**  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4.   You may add as many charts as you want. Make sure each chart answers the questions listed below.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with a specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
# Essential Libraries for Uber Supply-Demand Analysis
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# To display plots directly in the notebook
%matplotlib inline

# To ignore unnecessary warnings
import warnings
warnings.filterwarnings('ignore')


### Dataset Loading

In [None]:
# Load Dataset
# Load the Uber Request Dataset
df = pd.read_csv('/content/Uber Request Data.csv')

# Display the first 5 rows to verify successful loading
df.head()

### Dataset First View

In [None]:
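# Dataset First Look
# A notebook cell renders only its last expression automatically,
# so display() is used to show both the head and the tail.
from IPython.display import display

# Display the first 5 rows to verify successful loading
display(df.head())

# Display the last 5 rows of the dataset
display(df.tail())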

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
# Check the dimensions of the dataset
print(f"Number of Rows: {df.shape[0]}")
print(f"Number of Columns: {df.shape[1]}")

### Dataset Information

In [None]:
# Dataset Info
# Check the technical summary of the dataset
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
# Count the number of duplicate rows in the entire dataset
duplicate_count = df.duplicated().sum()
print(f"Total Duplicate Rows: {duplicate_count}")

# Specifically check for duplicate Request IDs
id_duplicates = df['Request id'].duplicated().sum()
print(f"Duplicate Request IDs: {id_duplicates}")


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# Count missing values for each column
print("Missing Values Count:")
print(df.isnull().sum())


In [None]:
# Visualizing the missing values using a heatmap
# (seaborn and matplotlib are already imported in the first cell)

plt.figure(figsize=(10, 5))
sns.heatmap(df.isnull(), cbar=False, yticklabels=False, cmap='viridis')
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?

- **Summary of Dataset Findings**
  - **Size:** 6,745 requests and 6 columns.
  - **Quality:** No duplicate entries; data is clean and unique.
  - **Completion Rate:** Only 42% of trips were completed, indicating a major supply-demand gap.
  - **Missing Data:** Driver id and Drop timestamp have missing values, but only for unfulfilled trips (logical gaps).
  - **Key Issue:** Timestamps are in mixed string formats and require standardization for analysis.
  - **Focus:** The study centers on Airport vs. City pickups and the reasons for trip failures.
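
The claim that `Driver id` and `Drop timestamp` are missing only for unfulfilled trips can be checked by counting nulls per status. The following is a minimal sketch under stated assumptions: the `sample` frame is hypothetical data that mirrors the dataset's column names, and `missing_by_status` is an illustrative helper, not part of the original notebook.

```python
import pandas as pd

# Hypothetical sample mirroring the dataset's columns (not real data)
sample = pd.DataFrame({
    "Request id": [1, 2, 3, 4],
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
    "Driver id": [10.0, 11.0, None, 12.0],
    "Drop timestamp": ["11/7/2016 13:00", None, None, "11/7/2016 9:25"],
})

def missing_by_status(df: pd.DataFrame) -> pd.DataFrame:
    """Count missing Driver id / Drop timestamp values for each trip status."""
    return (
        df[["Driver id", "Drop timestamp"]]
        .isnull()                 # True where a value is missing
        .groupby(df["Status"])    # group the boolean flags by trip status
        .sum()                    # number of missing values per status
    )

missing_summary = missing_by_status(sample)
print(missing_summary)
```

If the missing values are purely logical gaps, the `Trip Completed` row should contain only zeros when the same helper is applied to the full dataset.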

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
# List all columns in the dataset
print("Dataset Columns:", df.columns.tolist())


In [None]:
# Dataset Describe
# Statistical summary of numerical variables
print("Numerical Summary:")
print(df.describe())

# Summary of categorical variables
print("\nCategorical Summary:")
print(df.describe(include='object'))


### Variables Description

In [None]:
# Quick description of variables and their types
for col in df.columns:
    print(f"Variable: {col}")
    print(f"Type: {df[col].dtype}")
    print(f"Unique Values: {df[col].nunique()}")
    # Show the most frequent values for text columns and the range for numeric ones
    if df[col].dtype == 'object':
        print(f"Top 5 Unique Values: {df[col].value_counts().head(5).index.tolist()}")
    else:
        print(f"Min: {df[col].min()}, Max: {df[col].max()}")
    print("-" * 30)

### Check Unique Values for each variable.

In [None]:
# Check unique values for each variable
for col in df.columns:
    print(f"Unique values in '{col}': {df[col].nunique()}")
    if df[col].nunique() < 10:
        print(f"Categories: {df[col].unique()}")
    print("-" * 30)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
# 1. Standardize Timestamps
# The data has mixed formats (e.g., 11/7/2016 and 13-07-2016).
# We use format='mixed' (available in pandas 2.0 and later) together with
# dayfirst=True so that 11/7/2016 is read as 11 July, not 7 November.
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'], dayfirst=True, format='mixed')
df['Drop timestamp'] = pd.to_datetime(df['Drop timestamp'], dayfirst=True, format='mixed')

# 2. Extract Time-Based Features
# Extracting the hour and day of the week helps identify peak demand periods.
df['Request Hour'] = df['Request timestamp'].dt.hour
df['Request Day'] = df['Request timestamp'].dt.day_name()

# 3. Define Time Slots
# Categorizing hours into slots makes the analysis more interpretable.
def get_time_slot(hour):
    if 5 <= hour < 10:
        return 'Early Morning'  # Morning Rush
    elif 10 <= hour < 17:
        return 'Day Time'
    elif 17 <= hour < 22:
        return 'Evening'        # Evening Rush
    else:
        return 'Late Night'

df['Time Slot'] = df['Request Hour'].apply(get_time_slot)

# 4. Define the Supply-Demand Gap
# Demand = Total Requests
# Supply = Trip Completed
# Gap = Cancelled or No Cars Available
df['Gap Status'] = df['Status'].apply(lambda x: 'Gap' if x != 'Trip Completed' else 'No Gap')

# 5. Final Check
# View the newly created columns
df[['Request timestamp', 'Request Hour', 'Time Slot', 'Gap Status']].head()


### What manipulations have you done and what insights did you find?

- **Data Manipulations:**
    - **Standardization:** Converted mixed-format date strings into uniform datetime objects.
    - **Extraction:** Derived Request Hour and Day to pinpoint when demand occurs.
    - **Categorization:** Grouped hours into Time Slots (e.g., Early Morning, Evening) for easier analysis.
    - **Labeling:** Created a Gap Status column to easily count unfulfilled vs. completed trips.

- **Key Insights:**
    - **The Evening Crisis:** A gap of roughly 67% exists in the Evening (5 PM–10 PM), driven mainly by a shortage of cars at the Airport.
    - **The Morning Problem:** High Cancellations occur in the Early Morning (5 AM–10 AM) for trips starting in the City.
    - **Low Efficiency:** Only 42% of all requests are completed; 58% result in a service failure.
    - **Location Split:** The Airport suffers from under-supply, while the City suffers from driver behavior (cancellations).
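
The standardization step relies on `format='mixed'`, which is only available in pandas 2.0 and later. As a hedged alternative, the sketch below parses each of the two layouts mentioned in the wrangling comments with an explicit format and combines the results. The helper name `parse_mixed_timestamps` and the sample strings are illustrative assumptions, and the two format strings assume the file contains only a slash layout without seconds and a dash layout with seconds.

```python
import pandas as pd

def parse_mixed_timestamps(raw: pd.Series) -> pd.Series:
    """Parse day-first timestamps stored in two known layouts."""
    # Layout 1 (assumed): 11/7/2016 11:51
    slash = pd.to_datetime(raw, format="%d/%m/%Y %H:%M", errors="coerce")
    # Layout 2 (assumed): 13-07-2016 08:33:16
    dash = pd.to_datetime(raw, format="%d-%m-%Y %H:%M:%S", errors="coerce")
    # Keep the slash result where it parsed, otherwise fall back to the dash result
    return slash.fillna(dash)

# Hypothetical sample values, including a missing entry and an invalid string
raw = pd.Series(["11/7/2016 11:51", "13-07-2016 08:33:16", None, "not a date"])
parsed = parse_mixed_timestamps(raw)

# Values that were present in the input but could not be parsed by either format
unparsed = raw.notna() & parsed.isna()
print(parsed)
print(f"Unparsed non-empty values: {unparsed.sum()}")
```

Counting unparsed values makes silent data loss visible, which matters because rows with `NaT` timestamps drop out of every hour-based aggregation.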

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv("/content/Uber Request Data.csv")

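# Convert Request timestamp to datetime. The file mixes day-first layouts, so parse
# with dayfirst=True and format="mixed" (pandas 2.0+), as in the wrangling step.
df["Request timestamp"] = pd.to_datetime(
    df["Request timestamp"], dayfirst=True, format="mixed", errors="coerce"
)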

# Extract hour from timestamp
df["Hour"] = df["Request timestamp"].dt.hour

# Demand: total ride requests per hour
hourly_demand = df.groupby("Hour")["Request id"].count()

# Supply: completed trips per hour
hourly_supply = (
    df[df["Status"] == "Trip Completed"]
    .groupby("Hour")["Request id"]
    .count()
)

# Plot Demand vs Supply
plt.figure()
plt.plot(hourly_demand.index, hourly_demand.values, label="Demand")
plt.plot(hourly_supply.index, hourly_supply.values, label="Supply")

plt.xlabel("Hour of Day")
plt.ylabel("Number of Trips")
plt.title("Chart 1: Hourly Demand vs Supply")
plt.legend()
plt.show()

##### 1. Why did you pick the specific chart?

1. The problem is **time-based** (demand and supply vary by hour).
2. A **line chart** best shows **trends over time**.
3. It allows **easy comparison** between demand and supply.
4. The **gap is clearly visible** during peak hours.
5. It supports **quick business decision-making** and storytelling.


##### 2. What is/are the insight(s) found from the chart?

1. Demand is **consistently higher than supply** across most hours.
2. The **largest supply–demand gap** occurs during **morning (5–10 AM)** and **evening (5–10 PM)** peak hours.
3. Supply does **not scale proportionally** with rising demand.
4. Peak-hour gaps explain **“No Cars Available”** and **trip cancellations**.
5. Off-peak hours show **lower demand and relatively balanced supply**.
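
The peak-hour gaps described above can also be quantified per time slot rather than read off the chart. This is a minimal sketch, assuming the same slot boundaries as the wrangling step; the `sample` frame is hypothetical and stands in for the real data.

```python
import pandas as pd

def get_time_slot(hour: int) -> str:
    """Map an hour (0-23) to the time slots used in the wrangling step."""
    if 5 <= hour < 10:
        return "Early Morning"
    elif 10 <= hour < 17:
        return "Day Time"
    elif 17 <= hour < 22:
        return "Evening"
    return "Late Night"

# Hypothetical sample, not real data
sample = pd.DataFrame({
    "Request Hour": [6, 7, 12, 18, 19, 20, 23],
    "Status": ["Cancelled", "Trip Completed", "Trip Completed",
               "No Cars Available", "No Cars Available", "Trip Completed",
               "Trip Completed"],
})
sample["Time Slot"] = sample["Request Hour"].apply(get_time_slot)

# A request counts as a gap when it did not end in a completed trip
sample["Is Gap"] = sample["Status"] != "Trip Completed"

# The mean of a boolean column is the share of True values, i.e. the gap rate
gap_rate_pct = sample.groupby("Time Slot")["Is Gap"].mean().mul(100).round(1)
print(gap_rate_pct)
```

Expressing the gap as a percentage of requests in each slot keeps busy and quiet slots comparable, which raw counts do not.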


##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reasons.

**✅ Positive Business Impact :**

1. Helps identify **peak hours with supply shortage**.
2. Enables **better driver allocation and incentives**.
3. Reduces cancellations and improves **customer satisfaction**.
4. Increases **revenue and demand fulfillment**.

---

**⚠️ Negative Growth Risks :**

1. **Unmet demand** leads to lost revenue.
2. Repeated unavailability causes **customer churn**.
3. Poor peak-hour service damages **brand trust**.

---

- **Conclusion :**
  - Acting on these insights leads to positive growth.
  - Ignoring them can result in customer churn, revenue loss, and negative brand impact.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

# Convert timestamp to datetime (day-first, mixed layouts; format="mixed" needs pandas 2.0+)
df["Request timestamp"] = pd.to_datetime(
    df["Request timestamp"], dayfirst=True, format="mixed", errors="coerce"
)

# Create pivot table
pivot = pd.pivot_table(
    df,
    index="Pickup point",
    columns="Status",
    values="Request id",
    aggfunc="count"
)

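# Plot bar chart. DataFrame.plot creates its own figure, so a separate
# plt.figure() call would only produce an extra empty figure.
pivot.plot(kind="bar")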

plt.xlabel("Pickup Point")
plt.ylabel("Number of Requests")
plt.title("Chart 2: Trip Status by Pickup Point (City vs Airport)")
plt.show()

##### 1. Why did you pick the specific chart?

1. The goal was to **compare performance across locations** (City vs Airport).
2. A **bar chart** is best for **categorical comparisons**.
3. It clearly shows differences in **trip outcomes (Completed, Cancelled, No Cars Available)**.
4. Makes **location-specific supply issues** easy to identify.
5. Helps stakeholders decide **where operational improvements are needed**.

##### 2. What is/are the insight(s) found from the chart?

1. **Airport pickups face severe supply shortages**, shown by high “No Cars Available” cases.
2. **City trips have more cancellations** compared to airport trips.
3. **Trip completion is higher in the City** than at the Airport.
4. Supply–demand issues **vary by pickup location**, not uniformly across the system.
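
Raw counts depend on how many requests each location receives, so the comparison above is easier to defend with row-wise percentages. The sketch below uses `pd.crosstab` with `normalize="index"` on a small hypothetical sample; the values are illustrative only and say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical sample, not real data
sample = pd.DataFrame({
    "Pickup point": ["Airport", "Airport", "Airport", "Airport",
                     "City", "City", "City", "City"],
    "Status": ["No Cars Available", "No Cars Available", "Trip Completed", "Cancelled",
               "Cancelled", "Cancelled", "Trip Completed", "Trip Completed"],
})

# normalize="index" divides each row by its total, so every pickup point sums to 100%
status_share_pct = (
    pd.crosstab(sample["Pickup point"], sample["Status"], normalize="index")
    .mul(100)
    .round(1)
)
print(status_share_pct)
```

Each row then reads as the outcome mix for one pickup point, which separates a shortage of cars from a cancellation problem regardless of request volume.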

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reasons.

**✅ Positive Business Impact :**

1. Helps identify **location-specific issues** (Airport vs City).
2. Enables **targeted driver allocation and incentives** at airports.
3. Improves **trip completion rates** and customer satisfaction.
4. Reduces revenue loss from unmet demand.

---

**⚠️ Insights Leading to Negative Growth (with reasons) :**

1. **High “No Cars Available” at airports**

   * Leads to lost high-value trips and customer dissatisfaction.
2. **High cancellation rates in the city**

   * Indicates unreliable service, increasing customer churn.
3. **Poor airport availability**

   * Damages brand perception among frequent travelers.

---

- **Conclusion :** Acting on these insights drives growth; ignoring them can result in revenue loss and customer attrition.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

# Convert timestamp (day-first, mixed layouts; format="mixed" needs pandas 2.0+) and extract hour
df["Request timestamp"] = pd.to_datetime(
    df["Request timestamp"], dayfirst=True, format="mixed", errors="coerce"
)
df["Hour"] = df["Request timestamp"].dt.hour

# Filter No Cars Available and group by hour
no_cars_hourly = (
    df[df["Status"] == "No Cars Available"]
    .groupby("Hour")["Request id"]
    .count()
)

# Plot bar chart
plt.figure()
plt.bar(no_cars_hourly.index, no_cars_hourly.values)
plt.xlabel("Hour of Day")
plt.ylabel("Number of Requests")
plt.title("Chart 3: No Cars Available by Hour")
plt.show()

##### 1. Why did you pick the specific chart?

1. The issue **“No Cars Available” is time-dependent**.
2. A **bar chart** clearly shows **frequency by hour**.
3. It helps **identify peak failure hours** instantly.
4. Easy to compare **high vs low problem periods**.
5. Supports **operational decision-making** like driver incentives and surge planning.

##### 2. What is/are the insight(s) found from the chart?

1. **“No Cars Available” peaks during morning and evening hours**.
2. Peak hours show **severe supply shortages**.
3. Off-peak hours have **significantly fewer failures**.
4. Supply does **not scale with rising demand**.
5. These hours contribute most to **lost trips and revenue**.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reasons.

**✅ Positive Business Impact :**

1. Identifies **exact hours with severe car unavailability**.
2. Enables **hour-based driver incentives and surge planning**.
3. Helps reduce **lost trips and improve fulfillment rate**.
4. Improves **customer experience and retention**.

---

**⚠️ Insights Leading to Negative Growth (with reasons) :**

1. **High “No Cars Available” during peak hours**

   * Customers face repeated failures and may **switch to competitors**.
2. **Unserved peak demand**

   * Direct **revenue loss** during high-value time slots.
3. **Poor service reliability at peak times**

   * Damages **brand trust** and long-term usage.

---

- **Conclusion :** Acting on these insights drives growth; ignoring them risks customer churn and revenue decline.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

# Convert timestamp (day-first, mixed layouts; format="mixed" needs pandas 2.0+) and extract hour
df["Request timestamp"] = pd.to_datetime(
    df["Request timestamp"], dayfirst=True, format="mixed", errors="coerce"
)
df["Hour"] = df["Request timestamp"].dt.hour

# Filter cancelled trips and count by hour
cancelled_hourly = (
    df[df["Status"] == "Cancelled"]
    .groupby("Hour")["Request id"]
    .count()
)

# Plot bar chart
plt.figure()
plt.bar(cancelled_hourly.index, cancelled_hourly.values)
plt.xlabel("Hour of Day")
plt.ylabel("Number of Cancellations")
plt.title("Chart 4: Trip Cancellations by Hour")
plt.show()

##### 1. Why did you pick the specific chart?

1. Cancellations are **time-dependent**, so hourly analysis is required.
2. A **bar chart** clearly shows the number of cancellations per hour.
3. It helps identify **peak hours with maximum cancellations**.
4. Easy to compare **high- and low-cancellation periods**.
5. Supports decisions to **reduce wait times and improve driver availability**.

##### 2. What is/are the insight(s) found from the chart?

1. **Trip cancellations are highest during peak demand hours**.
2. Morning and evening periods show **maximum cancellations**.
3. High cancellations indicate **long wait times or driver unavailability**.
4. Off-peak hours have **very low cancellation rates**.
5. Cancellations contribute directly to **poor customer experience and lost revenue**.
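
Hourly counts alone do not show where the failures originate. A minimal sketch of splitting a failure status by both hour and pickup point follows; `failures_by_hour_and_pickup` is a hypothetical helper, the `Hour` column is assumed to exist as in the chart code, and the sample rows are illustrative only.

```python
import pandas as pd

def failures_by_hour_and_pickup(df: pd.DataFrame, status: str) -> pd.DataFrame:
    """Count requests with the given status per hour, one column per pickup point."""
    failed = df[df["Status"] == status]
    return (
        failed.groupby(["Hour", "Pickup point"])
        .size()                    # number of matching requests per (hour, pickup point)
        .unstack(fill_value=0)     # pickup points become columns; absent pairs become 0
    )

# Hypothetical sample, not real data
sample = pd.DataFrame({
    "Hour": [6, 6, 6, 19, 19, 19],
    "Pickup point": ["City", "City", "Airport", "Airport", "Airport", "City"],
    "Status": ["Cancelled", "Cancelled", "Trip Completed",
               "No Cars Available", "No Cars Available", "Cancelled"],
})

cancelled = failures_by_hour_and_pickup(sample, "Cancelled")
no_cars = failures_by_hour_and_pickup(sample, "No Cars Available")
print(cancelled)
print(no_cars)
```

The resulting frame can be passed to `DataFrame.plot(kind="bar")` to draw one bar per pickup point for each hour. A pickup point with no matching rows does not appear as a column.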


##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reasons.

**✅ Positive Business Impact :**

1. Identifies **peak hours with high cancellations**.
2. Enables **better driver allocation and incentive planning**.
3. Helps reduce **wait times and cancellation rates**.
4. Improves **customer satisfaction and revenue**.

---

**⚠️ Insights Leading to Negative Growth (with reasons) :**

1. **High cancellations during peak hours**

   * Customers lose trust and may switch to competitors.
2. **Unreliable service experience**

   * Leads to lower repeat usage and customer churn.
3. **Lost completed trips**

   * Directly impacts revenue and growth.

---

- **Conclusion :** Acting on these insights supports growth; ignoring them risks customer dissatisfaction and revenue loss.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

# Convert timestamp (day-first, mixed layouts; format="mixed" needs pandas 2.0+)
df["Request timestamp"] = pd.to_datetime(
    df["Request timestamp"], dayfirst=True, format="mixed", errors="coerce"
)

# Demand: total requests by pickup point
demand = df.groupby("Pickup point")["Request id"].count()

# Supply: completed trips by pickup point
supply = (
    df[df["Status"] == "Trip Completed"]
    .groupby("Pickup point")["Request id"]
    .count()
)

# Calculate gap
gap = demand - supply

# Plot bar chart
plt.figure()
plt.bar(gap.index, gap.values)
plt.xlabel("Pickup Point")
plt.ylabel("Demand – Supply Gap")
plt.title("Chart 5: Supply–Demand Gap by Pickup Point")
plt.show()

##### 1. Why did you pick the specific chart?

1. It directly measures the **core problem – the supply–demand gap**.
2. Enables **clear comparison between City and Airport** locations.
3. A **bar chart** is best for comparing gap values across categories.
4. Highlights **where unmet demand is highest**.
5. Helps prioritize **location-specific business actions**.


##### 2. What is/are the insight(s) found from the chart?

1. **Airport has a significantly higher supply–demand gap** than the City.
2. Indicates **severe driver shortage at airport pickups**.
3. City performs better but still shows **unmet demand**.
4. Location is a **key factor** influencing supply shortages.
5. High airport gap results in **lost high-value trips and revenue**.

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reasons.

**✅ Positive Business Impact :**

1. Identifies **Airport as the highest priority problem area**.
2. Enables **location-specific driver incentives and deployment**.
3. Helps reduce **unmet demand and lost revenue**.
4. Improves **service reliability for high-value airport trips**.

---

**⚠️ Insights Leading to Negative Growth (with reasons) :**

1. **Large supply–demand gap at airports**

   * Causes loss of premium trips and frequent traveler customers.
2. **Persistent unmet demand**

   * Directly results in **revenue leakage**.
3. **Poor airport experience**

   * Damages brand image and reduces repeat usage.

---

- **Conclusion :** Using these insights drives growth; ignoring them leads to revenue loss and customer churn.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

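# Convert timestamp and extract day name. dayfirst=True is essential here:
# without it, 11/7/2016 is read as 7 November and the weekday is wrong.
df["Request timestamp"] = pd.to_datetime(
    df["Request timestamp"], dayfirst=True, format="mixed", errors="coerce"
)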
df["Day"] = df["Request timestamp"].dt.day_name()

# Demand: total requests by day
daily_demand = df.groupby("Day")["Request id"].count()

# Supply: completed trips by day
daily_supply = (
    df[df["Status"] == "Trip Completed"]
    .groupby("Day")["Request id"]
    .count()
)

# Ensure correct day order and handle missing days
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
daily_demand = daily_demand.reindex(day_order).fillna(0)
daily_supply = daily_supply.reindex(day_order).fillna(0)

# Plot
plt.figure()
plt.plot(daily_demand.index, daily_demand.values, label="Demand")
plt.plot(daily_supply.index, daily_supply.values, label="Supply")
plt.xlabel("Day of Week")
plt.ylabel("Number of Trips")
plt.title("Chart 6: Demand vs Supply by Day of Week")
plt.legend()
plt.show()

##### 1. Why did you pick the specific chart?

1. The objective was to analyze **weekly demand–supply patterns**.
2. A **line chart** is best for showing **trends over time (days of the week)**.
3. It allows **direct comparison** between demand and supply.
4. Helps identify **weekdays with maximum gaps**.
5. Supports **weekly driver planning and scheduling decisions**.

##### 2. What is/are the insight(s) found from the chart?

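1. **Recorded demand is concentrated on weekdays**; any day absent from the dataset appears as zero because the code applies `fillna(0)`.
2. **Supply consistently falls short of demand**, especially on working days.
3. The **largest demand–supply gap occurs on weekdays**, indicating office/airport travel impact.
4. **Weekend values should be interpreted with caution**: a zero for Saturday or Sunday may reflect missing records rather than low demand, so it does not by itself show a balanced supply.
5. Weekly demand patterns are **predictable**, enabling better planning.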
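
Because the chart code reindexes to all seven days and fills missing days with zero, a day with no records looks the same as a day with zero demand. The sketch below is a coverage check that could be run before comparing weekdays with weekends; the sample timestamps are hypothetical.

```python
import pandas as pd

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Hypothetical sample of already-parsed request timestamps (not real data)
sample = pd.DataFrame({
    "Request timestamp": pd.to_datetime([
        "2016-07-11 11:51", "2016-07-12 17:57",
        "2016-07-13 08:33", "2016-07-13 21:08",
    ])
})

# Date range covered by the sample
first_day = sample["Request timestamp"].min().date()
last_day = sample["Request timestamp"].max().date()

# Days of the week that actually occur, and those that never occur
days_present = sample["Request timestamp"].dt.day_name().unique().tolist()
days_missing = [day for day in day_order if day not in days_present]

print(f"Coverage: {first_day} to {last_day}")
print(f"Days present: {days_present}")
print(f"Days with no records: {days_missing}")
```

Days listed as having no records are better dropped from the chart than plotted as zero.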

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reasons.

**✅ Positive Business Impact :**

1. Helps Uber **plan driver availability by day of the week**.
2. Enables **higher incentives on high-demand weekdays**.
3. Improves **trip fulfillment and customer satisfaction**.
4. Increases **revenue by reducing unmet weekday demand**.

---

**⚠️ Insights Leading to Negative Growth (with reasons) :**

1. **Persistent weekday supply shortage**

   * Leads to missed trips during high-revenue periods.
2. **Unmet weekday demand**

   * Causes customer frustration and **churn to competitors**.
3. **Inconsistent weekday service**

   * Damages brand reliability for daily commuters.

---

- **Conclusion :** Acting on these insights supports growth, while ignoring them can result in revenue loss and customer attrition.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

# Convert timestamp (day-first, mixed layouts; format="mixed" needs pandas 2.0+) and extract hour
df["Request timestamp"] = pd.to_datetime(
    df["Request timestamp"], dayfirst=True, format="mixed", errors="coerce"
)
df["Hour"] = df["Request timestamp"].dt.hour

# Demand per hour
hourly_demand = df.groupby("Hour")["Request id"].count()

# Supply per hour
hourly_supply = (
    df[df["Status"] == "Trip Completed"]
    .groupby("Hour")["Request id"]
    .count()
)

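# Calculate gap. fill_value=0 treats an hour with no completed trips as zero supply
# instead of producing NaN for that hour.
hourly_gap = hourly_demand.sub(hourly_supply, fill_value=0)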

# Plot bar chart
plt.figure()
plt.bar(hourly_gap.index, hourly_gap.values)
plt.xlabel("Hour of Day")
plt.ylabel("Demand – Supply Gap")
plt.title("Chart 7: Hourly Demand–Supply Gap")
plt.show()

##### 1. Why did you pick the specific chart?

1. It **directly quantifies the supply–demand gap** instead of showing demand and supply separately.
2. A **bar chart** clearly shows the magnitude of the problem by hour.
3. Highlights **peak hours with the largest unmet demand**, which is critical for operational planning.
4. Easy for stakeholders to **see exactly when interventions are needed**.
5. Supports **data-driven decision-making** like surge pricing or driver incentives.

##### 2. What is/are the insight(s) found from the chart?

1. **Maximum supply–demand gap occurs during morning and evening peak hours**.
2. Midday and late-night hours show **smaller gaps**, meaning supply meets demand better.
3. Confirms that **peak-hour shortages are the main operational issue**.
4. Clearly identifies **exact hours that need driver allocation or incentives**.
5. Supports **targeted interventions to reduce unmet demand and revenue loss**.
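
To name the exact hours that need intervention, the hourly gap can be computed over all 24 hours and then ranked. This is a minimal sketch on hypothetical data; reindexing both series to the full hour range is an assumption added here so that hours with no completed trips (or no requests at all) count as zero rather than going missing.

```python
import pandas as pd

# Hypothetical sample, not real data: hour 18 has demand but no completed trips
sample = pd.DataFrame({
    "Hour": [5, 5, 5, 18, 18, 18, 18, 13],
    "Status": ["Trip Completed", "Cancelled", "Cancelled",
               "No Cars Available", "No Cars Available",
               "No Cars Available", "No Cars Available",
               "Trip Completed"],
})

all_hours = pd.RangeIndex(24, name="Hour")

# Demand: all requests per hour, with absent hours filled as 0
demand = sample.groupby("Hour").size().reindex(all_hours, fill_value=0)

# Supply: completed trips per hour, with absent hours filled as 0
completed = sample[sample["Status"] == "Trip Completed"]
supply = completed.groupby("Hour").size().reindex(all_hours, fill_value=0)

# Gap per hour and the two hours with the largest unmet demand
hourly_gap = demand - supply
worst_hours = hourly_gap.nlargest(2)
print(worst_hours)
```

Ranking with `nlargest` gives a short list of hours that can be carried straight into driver incentive or surge planning.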

##### 3. Will the gained insights help create a positive business impact? Are there any insights that lead to negative growth? Justify with specific reasons.

**✅ Positive Business Impact :**

1. Identifies **exact hours with the largest supply–demand gaps**.
2. Enables **hour-specific driver allocation and surge pricing**.
3. Helps reduce **lost trips and revenue leakage**.
4. Improves **customer satisfaction by meeting demand during peak hours**.

---

**⚠️ Insights Leading to Negative Growth (with reasons) :**

1. **Persistent large gaps during peak hours**

   * Leads to unserved trips and **lost revenue opportunities**.
2. **Customer dissatisfaction** due to unavailability

   * May result in **churn to competitors**.
3. **Operational inefficiency** if gaps are ignored

   * Increases cancellations and damages **brand trust**.

---

- **Conclusion :** Acting on these insights drives growth, while ignoring them risks revenue loss and negative customer experience.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

# Total requests by pickup point
total_requests = df.groupby("Pickup point")["Request id"].count()

# Cancelled requests by pickup point
cancelled_requests = (
    df[df["Status"] == "Cancelled"]
    .groupby("Pickup point")["Request id"]
    .count()
)

# Cancellation rate
cancellation_rate = (cancelled_requests / total_requests) * 100

# Plot bar chart
plt.figure()
plt.bar(cancellation_rate.index, cancellation_rate.values)
plt.xlabel("Pickup Point")
plt.ylabel("Cancellation Rate (%)")
plt.title("Chart 8: Cancellation Rate by Pickup Point")
plt.show()

##### 1. Why did you pick the specific chart?

1. The goal was to compare **cancellation behavior across locations**.
2. A **bar chart** clearly shows differences between City and Airport.
3. Using **cancellation rate (%)** gives a fair comparison instead of raw counts.
4. Helps identify **where service reliability is weaker**.
5. Supports **location-specific operational improvements**.

##### 2. What is/are the insight(s) found from the chart?

1. **City pickup point has a higher cancellation rate** than the Airport.
2. Indicates **service reliability issues in city trips**.
3. Airport trips are **more stable once a driver is assigned**.
4. Location significantly impacts **customer and driver behavior**.
5. High city cancellations lead to **poor customer experience and lost revenue**.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**✅ Positive Business Impact :**

1. Helps identify **locations with higher cancellation rates**.
2. Enables **location-specific strategies** (better ETAs, driver incentives, matching logic).
3. Reduces cancellations, improving **trip completion and revenue**.
4. Enhances **customer trust and satisfaction**.

---

**⚠️ Insights Leading to Negative Growth (with reasons) :**

1. **High cancellation rate in City areas**

   * Customers may lose trust and switch to competitors.
2. **Unreliable city service**

   * Leads to repeat cancellations and **customer churn**.
3. **Lost completed trips**

   * Directly impacts **revenue and growth**.

---

- **Conclusion :** Acting on these insights drives positive growth; ignoring them risks customer dissatisfaction and revenue loss.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

# Count trip status
status_counts = df["Status"].value_counts()

# Plot bar chart
plt.figure()
plt.bar(status_counts.index, status_counts.values)
plt.xlabel("Trip Status")
plt.ylabel("Number of Requests")
plt.title("Chart 9: Overall Trip Status Distribution")
plt.show()

##### 1. Why did you pick the specific chart?

1. It provides a **high-level overview of all trip outcomes**.
2. Helps quickly compare **successful vs failed requests**.
3. A **bar chart** makes the relative size of each status easy to compare.
4. Highlights the **overall scale of operational issues**.
5. Sets context for deeper supply–demand analysis.

##### 2. What is/are the insight(s) found from the chart?

1. A **large share of ride requests do not get completed**.
2. **“No Cars Available” and cancellations together form a significant portion** of outcomes.
3. Indicates a **persistent supply–demand imbalance** in the system.
4. Shows **high revenue leakage** due to failed trips.
5. Highlights the need for **improving supply availability and reliability**.
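
To put a number on "a large share", the status counts can be converted to percentages. This is a minimal sketch on made-up statuses (the series below is hypothetical, not the real data); on the actual DataFrame the same call would be applied to `df["Status"]`.

```python
import pandas as pd

# Hypothetical toy statuses, for illustration only.
statuses = pd.Series(
    ["Trip Completed"] * 4 + ["No Cars Available"] * 4 + ["Cancelled"] * 2,
    name="Status",
)

# normalize=True returns fractions of the total instead of raw counts.
share_pct = statuses.value_counts(normalize=True) * 100

# Everything that is not a completed trip counts as a failed request.
failed_share_pct = share_pct.drop("Trip Completed").sum()

print(share_pct.round(1))
print(f"Failed share: {failed_share_pct:.1f}%")
```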

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**✅ Positive Business Impact :**

1. Clearly quantifies the **scale of failed trips**, creating urgency for action.
2. Supports **supply optimization and driver onboarding strategies**.
3. Helps reduce **revenue leakage** from unfulfilled requests.
4. Improves **overall service reliability and customer trust**.

---

**⚠️ Insights Leading to Negative Growth (with reasons) :**

1. **High proportion of failed trips**

   * Customers face repeated failures and may **switch to competitors**.
2. **Frequent “No Cars Available” cases**

   * Results in **lost demand and direct revenue loss**.
3. **System-wide inefficiency**

   * Damages brand perception and **reduces long-term usage**.

---

- **Conclusion :** Acting on these insights enables growth, while ignoring them risks customer churn, revenue loss, and brand damage.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

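# Convert timestamp and extract hour
# The raw file mixes timestamp layouts. format="mixed" (pandas >= 2.0) parses each
# value on its own instead of coercing non-matching rows to NaT.
# dayfirst=True assumes day/month order; drop it if the file is month-first.
df["Request timestamp"] = pd.to_datetime(
    df["Request timestamp"], format="mixed", dayfirst=True, errors="coerce"
)
df["Hour"] = df["Request timestamp"].dt.hour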

# Total requests per hour
total_requests = df.groupby("Hour")["Request id"].count()

# Cancelled requests per hour
cancelled_requests = (
    df[df["Status"] == "Cancelled"]
    .groupby("Hour")["Request id"]
    .count()
)

# Cancellation rate (%)
cancellation_rate = (cancelled_requests / total_requests) * 100

# Replace NaN with 0
cancellation_rate = cancellation_rate.fillna(0)

# Plot
plt.figure()
plt.plot(cancellation_rate.index, cancellation_rate.values)
plt.xlabel("Hour of Day")
plt.ylabel("Cancellation Rate (%)")
plt.title("Chart 10: Hourly Cancellation Rate")
plt.show()

##### 1. Why did you pick the specific chart?

1. It shows **cancellation behavior as a rate**, not just raw counts.
2. Allows **fair comparison across different hours** with varying demand.
3. A **line chart** clearly captures **hourly trends**.
4. Helps identify **hours with poor service reliability**.
5. Supports **targeted actions to reduce cancellations during peak times**.

##### 2. What is/are the insight(s) found from the chart?

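1. **Cancellation rate is highest during peak hours**, most notably the Early Morning slot (5 AM – 10 AM), where driver cancellations of City pickups dominate.
2. Suggests **longer wait times and service pressure** when demand is high.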
3. Off-peak hours show **lower and more stable cancellation rates**.
4. Confirms that **service reliability drops during peak demand periods**.
5. Highlights specific hours where **improving driver availability can reduce cancellations**.
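
Hourly rates like these are only as good as the timestamp parsing behind them: any value that fails to parse becomes `NaT` and silently drops out of the hourly groups. The sketch below shows one version-independent way to parse a column that mixes two layouts. Both layouts, the day-first order, and the sample strings are assumptions for illustration; check them against the actual file before relying on this.

```python
import pandas as pd

def parse_mixed_timestamps(raw: pd.Series) -> pd.Series:
    """Parse day-first timestamps written in either of two assumed layouts."""
    # Assumed layout 1: "11/7/2016 11:51" (slashes, no seconds)
    slash_style = pd.to_datetime(raw, format="%d/%m/%Y %H:%M", errors="coerce")
    # Assumed layout 2: "13-07-2016 08:33:16" (dashes, with seconds)
    dash_style = pd.to_datetime(raw, format="%d-%m-%Y %H:%M:%S", errors="coerce")
    # Keep the first successful parse; values matching neither layout stay NaT.
    return slash_style.fillna(dash_style)

# Hypothetical sample values, for illustration only.
sample = pd.Series(["11/7/2016 11:51", "13-07-2016 08:33:16", "not a date"])
parsed = parse_mixed_timestamps(sample)
hours = parsed.dt.hour

print(parsed)
print("Unparsed rows:", parsed.isna().sum())
```

Printing the count of unparsed rows before grouping is a cheap check that no requests have been lost.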

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**✅ Positive Business Impact :**

1. Identifies **hours with high cancellation rates**, enabling targeted fixes.
2. Supports **better driver allocation and incentive planning** during peak hours.
3. Helps reduce **cancellations and increase completed trips**.
4. Improves **customer experience and retention**.

---

**⚠️ Insights Leading to Negative Growth (with reasons) :**

1. **High cancellation rates during peak hours**

   * Customers experience unreliable service and may **switch to competitors**.
2. **Repeated peak-hour failures**

   * Leads to **customer dissatisfaction and churn**.
3. **Lost completed trips**

   * Directly causes **revenue loss**.

---

- **Conclusion :** Acting on these insights supports growth; ignoring them risks customer churn, revenue loss, and damage to brand reliability.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

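# Convert timestamp and extract hour
# format="mixed" (pandas >= 2.0) parses each value on its own, so rows in a second
# timestamp layout are not coerced to NaT. dayfirst=True assumes day/month order.
df["Request timestamp"] = pd.to_datetime(
    df["Request timestamp"], format="mixed", dayfirst=True, errors="coerce"
)
df["Hour"] = df["Request timestamp"].dt.hour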

# Demand by Pickup point and Hour
demand = df.groupby(["Pickup point", "Hour"])["Request id"].count().unstack()

# Supply by Pickup point and Hour (proxied by completed trips)
supply = (
    df[df["Status"] == "Trip Completed"]
    .groupby(["Pickup point", "Hour"])["Request id"]
    .count()
    .unstack()
)

# Fill missing values
demand = demand.fillna(0)
supply = supply.fillna(0)

# Plot for City
plt.figure()
plt.plot(demand.columns, demand.loc["City"], label="City Demand")
plt.plot(supply.columns, supply.loc["City"], label="City Supply")
plt.title("Chart 11A: City – Demand vs Supply by Hour")
plt.xlabel("Hour")
plt.ylabel("Number of Trips")
plt.legend()
plt.show()

# Plot for Airport
plt.figure()
plt.plot(demand.columns, demand.loc["Airport"], label="Airport Demand")
plt.plot(supply.columns, supply.loc["Airport"], label="Airport Supply")
plt.title("Chart 11B: Airport – Demand vs Supply by Hour")
plt.xlabel("Hour")
plt.ylabel("Number of Trips")
plt.legend()
plt.show()

##### 1. Why did you pick the specific chart?

1. It analyzes **demand and supply together across both time and location**.
2. Helps identify **peak-hour gaps separately for City and Airport**.
3. A **line chart** clearly shows hourly trends for each pickup point.
4. Enables **more granular and actionable insights** than overall hourly charts.
5. Supports **targeted, location-specific operational decisions**.

##### 2. What is/are the insight(s) found from the chart?

1. **Airport experiences a much larger demand–supply gap during peak hours**.
2. City demand is **more evenly distributed**, but still faces shortages at peaks.
3. Airport demand shows **sharp spikes at specific hours**, indicating travel-related patterns.
4. Supply does **not scale adequately with airport peak demand**.
5. Confirms that supply issues are **both time- and location-dependent**.
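
The gap itself can be tabulated rather than read off the two lines. The sketch below uses a made-up table (hypothetical rows, not the Uber data) and treats completed trips as the supply proxy, as the chart code does. It then reports the hour with the largest unmet demand for each pickup point.

```python
import pandas as pd

# Hypothetical toy requests, for illustration only.
toy = pd.DataFrame({
    "Pickup point": ["Airport"] * 5 + ["City"] * 4,
    "Hour": [18, 18, 18, 9, 9, 9, 9, 18, 18],
    "Status": [
        "No Cars Available", "No Cars Available", "Trip Completed",  # Airport, 18h
        "Trip Completed", "Trip Completed",                          # Airport, 9h
        "Cancelled", "Trip Completed",                               # City, 9h
        "Trip Completed", "Trip Completed",                          # City, 18h
    ],
})

# Demand: every request. Supply: completed trips only.
demand = pd.crosstab(toy["Pickup point"], toy["Hour"])
completed = toy[toy["Status"] == "Trip Completed"]
supply = pd.crosstab(completed["Pickup point"], completed["Hour"])

# Align supply to demand so hours with zero completed trips count as 0.
supply = supply.reindex(index=demand.index, columns=demand.columns, fill_value=0)

gap = demand - supply
worst_hour_per_point = gap.idxmax(axis=1)  # hour with the largest gap per location

print(gap)
print(worst_hour_per_point)
```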

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**✅ Positive Business Impact :**

1. Enables **hour-wise and location-specific driver deployment**.
2. Helps prioritize **airport peak hours**, which are high-revenue trips.
3. Supports **targeted incentives and surge strategies**.
4. Reduces unmet demand, improving **customer satisfaction and retention**.

---

**⚠️ Insights Leading to Negative Growth (with reasons) :**

1. **Large airport peak-hour gaps**

   * Causes loss of high-value airport rides and frequent travelers.
2. **Supply not scaling with airport demand**

   * Results in repeated failures and **customer churn**.
3. **Location-specific service inconsistency**

   * Damages brand reliability and long-term usage.

---

- **Conclusion :** Acting on these insights drives growth and efficiency, while ignoring them can lead to revenue loss, customer dissatisfaction, and negative brand impact.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

# Total demand
total_demand = df["Request id"].count()

# Completed trips
completed_trips = df[df["Status"] == "Trip Completed"]["Request id"].count()

# Failed trips (Cancelled + No Cars Available)
failed_trips = df[df["Status"] != "Trip Completed"]["Request id"].count()

# Create data for plotting
labels = ["Total Demand", "Completed Trips", "Failed Trips"]
values = [total_demand, completed_trips, failed_trips]

# Plot bar chart
plt.figure()
plt.bar(labels, values)
plt.xlabel("Trip Category")
plt.ylabel("Number of Requests")
plt.title("Chart 12: Demand vs Completed vs Failed Trips")
plt.show()

##### 1. Why did you pick the specific chart?

1. It provides a **clear end-to-end view of system performance**.
2. Directly compares **total demand vs successful vs failed trips**.
3. A **bar chart** makes the gap and losses easy to understand.
4. Highlights **revenue leakage at a glance**.
5. Strong for **final conclusions and business storytelling**.

##### 2. What is/are the insight(s) found from the chart?

1. A **large portion of total demand is not converted into completed trips**.
2. **Failed trips (Cancelled + No Cars Available)** represent significant business loss.
3. Indicates **major revenue leakage** due to supply and reliability issues.
4. Shows that improving supply can **directly increase completed trips**.
5. Highlights the need for **system-wide operational improvements**.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**✅ Positive Business Impact :**

1. Clearly shows **where demand is being lost**, helping focus improvement efforts.
2. Supports **better supply planning and driver onboarding**.
3. Reducing failed trips can **directly increase revenue**.
4. Improves **customer experience and trust** by increasing completion rates.

---

**⚠️ Insights Leading to Negative Growth (with reasons) :**

1. **High proportion of failed trips**

   * Leads to customer frustration and **churn to competitors**.
2. **Unconverted demand**

   * Represents **direct revenue loss**.
3. **System-wide inefficiency**

   * Damages brand perception and **reduces long-term usage**.

---

- **Conclusion :** Acting on these insights drives positive growth; ignoring them risks revenue loss, customer dissatisfaction, and negative brand impact.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
import pandas as pd
import matplotlib.pyplot as plt

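# Load dataset (same path as the other chart cells)
df = pd.read_csv("/content/Uber Request Data.csv")

# Convert timestamp and extract hour
# format="mixed" (pandas >= 2.0) parses each value on its own, so rows in a second
# timestamp layout are not coerced to NaT. dayfirst=True assumes day/month order.
df["Request timestamp"] = pd.to_datetime(
    df["Request timestamp"], format="mixed", dayfirst=True, errors="coerce"
)
df["Hour"] = df["Request timestamp"].dt.hour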

# Hourly demand
hourly_requests = df.groupby("Hour")["Request id"].count()

# Plot
plt.figure()
plt.plot(hourly_requests.index, hourly_requests.values)
plt.xlabel("Hour of Day")
plt.ylabel("Number of Requests")
plt.title("Chart 13: Hourly Ride Request Distribution")
plt.show()

##### 1. Why did you pick the specific chart?

1. It focuses purely on **customer demand patterns** by hour.
2. A **line chart** clearly shows hourly trends and peaks.
3. Helps identify **natural demand peaks** without supply influence.
4. Useful for **forecasting and proactive planning**.
5. Supports **better driver scheduling before shortages occur**.

##### 2. What is/are the insight(s) found from the chart?

1. Ride requests **peak during the morning and evening hours**.
2. Demand is **lowest during late-night and pre-dawn hours**.
3. Indicates **commute- and airport-driven travel patterns**.
4. Demand follows a **predictable daily cycle**.
5. Peak demand hours are **primary pressure points for supply**.
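
The hourly view can be rolled up into the time slots named in the Project Summary. In this sketch the Early Morning and Evening boundaries come from the summary, while the Day Time and Late Night boundaries and the half-open interval convention are assumptions. `hour_to_slot` is a hypothetical helper and the hours are made up.

```python
import pandas as pd

def hour_to_slot(hour: int) -> str:
    """Map an hour (0-23) to a time-slot label; intervals are half-open."""
    if 5 <= hour < 10:
        return "Early Morning"  # 5 AM - 10 AM, as in the Project Summary
    if 10 <= hour < 17:
        return "Day Time"       # assumed boundary
    if 17 <= hour < 22:
        return "Evening"        # 5 PM - 10 PM, as in the Project Summary
    return "Late Night"         # assumed: 10 PM - 5 AM

# Hypothetical request hours, for illustration only.
hours = pd.Series([3, 6, 8, 12, 18, 19, 21, 23], name="Hour")
slot_counts = hours.map(hour_to_slot).value_counts()

print(slot_counts)
```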

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

**✅ Positive Business Impact :**

1. Enables **accurate demand forecasting** by hour.
2. Supports **proactive driver scheduling and incentives** before peak demand.
3. Helps reduce **supply shortages and failed trips**.
4. Improves **service reliability and customer satisfaction**.

---

**⚠️ Insights Leading to Negative Growth (with reasons) :**

1. **Ignoring predictable peak demand hours**

   * Leads to unmet demand and **lost revenue**.
2. **Poor preparation for demand spikes**

   * Causes cancellations and **customer churn**.
3. **Repeated peak-hour failures**

   * Damages brand trust and **long-term usage**.

---

- **Conclusion :** Using these insights drives growth; ignoring them risks revenue loss and customer dissatisfaction.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

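# Convert timestamp
# format="mixed" (pandas >= 2.0) parses each value on its own, so rows in a second
# timestamp layout are not coerced to NaT. dayfirst=True assumes day/month order.
df["Request timestamp"] = pd.to_datetime(
    df["Request timestamp"], format="mixed", dayfirst=True, errors="coerce"
)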

# Feature engineering
df["Hour"] = df["Request timestamp"].dt.hour
df["Completed"] = (df["Status"] == "Trip Completed").astype(int)
df["Cancelled"] = (df["Status"] == "Cancelled").astype(int)
df["No_Cars"] = (df["Status"] == "No Cars Available").astype(int)

# Select numerical columns
corr_df = df[["Hour", "Completed", "Cancelled", "No_Cars"]]

# Correlation matrix
corr = corr_df.corr()

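# Plot heatmap on a fixed -1..1 scale with a diverging colormap so sign is visible
plt.figure()
plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
plt.colorbar(label="Pearson correlation")
plt.xticks(range(len(corr.columns)), corr.columns, rotation=45)
plt.yticks(range(len(corr.columns)), corr.columns)
# Annotate each cell with its coefficient
for i in range(len(corr.columns)):
    for j in range(len(corr.columns)):
        plt.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
plt.title("Chart 14: Correlation Heatmap")
plt.show()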

##### 1. Why did you pick the specific chart?

1. To **identify relationships between key variables** at a glance.
2. Helps validate whether **time (hour)** influences trip outcomes.
3. A heatmap makes **strength and direction of relationships** easy to interpret.
4. Supports **data-backed justification** of earlier findings.
5. Useful as an **advanced analytical visualization**.


##### 2. What is/are the insight(s) found from the chart?

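1. **Hour has only a weak linear correlation** with cancellations and “No Cars Available”; since cancellations cluster in the Early Morning slot and “No Cars Available” in the Evening slot (see Project Summary), the sign can differ between the two flags.
2. **Completed trips are negatively correlated** with cancellations and no cars. This is expected by construction, because each request has exactly one status.
3. As **failure cases increase, successful trips decrease**.
4. Suggests that **time of day impacts service reliability**, although a Pearson coefficient on a 0–23 hour value understates patterns that peak twice a day.
5. No extreme correlations with Hour, so the hourly charts remain the stronger evidence that issues are **operational and time-driven**.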
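
One caveat when reading the heatmap: the negative correlation between the outcome flags is partly structural. Each request has exactly one status, so whenever one flag is 1 the others must be 0. A minimal sketch on made-up statuses (no hour information involved) shows the effect.

```python
import pandas as pd

# Hypothetical toy statuses: each outcome appears equally often.
status = pd.Series(["Trip Completed", "Cancelled", "No Cars Available"] * 4)

# One 0/1 column per status, the same kind of flags used for the heatmap.
flags = pd.get_dummies(status).astype(int)
corr = flags.corr()

print(corr.round(2))
```

Every off-diagonal value is negative here even though the toy data contains no time pattern, so those cells of the heatmap should not be read as evidence of an operational effect.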

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv("/content/Uber Request Data.csv")

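# Convert timestamp
# format="mixed" (pandas >= 2.0) parses each value on its own, so rows in a second
# timestamp layout are not coerced to NaT. dayfirst=True assumes day/month order.
df["Request timestamp"] = pd.to_datetime(
    df["Request timestamp"], format="mixed", dayfirst=True, errors="coerce"
)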

# Feature engineering
df["Hour"] = df["Request timestamp"].dt.hour
df["Completed"] = (df["Status"] == "Trip Completed").astype(int)
df["Cancelled"] = (df["Status"] == "Cancelled").astype(int)
df["No_Cars"] = (df["Status"] == "No Cars Available").astype(int)

# Pair plot
sns.pairplot(df[["Hour", "Completed", "Cancelled", "No_Cars"]])
plt.suptitle("Chart 15: Pair Plot of Hour vs Trip Outcomes", y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

1. To **analyze relationships between multiple variables at once**.
2. Shows both **individual distributions** and **pairwise interactions**.
3. Helps validate patterns seen in earlier charts and the correlation heatmap.
4. Useful for **advanced exploratory data analysis (EDA)**.
5. Strengthens conclusions with **visual evidence of relationships**.


##### 2. What is/are the insight(s) found from the chart?

1. **Trip failures (Cancelled, No Cars Available) increase during certain hours**.
2. **Completed trips show an inverse pattern** compared to failure cases.
3. Confirms that **time (hour) influences trip outcomes**.
4. No strong linear relationships across all variables, indicating **multiple operational factors** affect performance.
5. Supports earlier findings of **peak-hour supply stress**.

## **5. Solution to Business Objective**

#### What do you suggest the client do to achieve the business objective?
Explain briefly.

- The steps below aim to **reduce unmet demand, increase completed trips, improve customer satisfaction, and drive revenue growth**.

1. **Increase driver availability during peak hours**
   Focus on morning and evening peaks using dynamic incentives and surge pricing.

2. **Improve airport-specific supply planning**
   Deploy dedicated drivers and higher incentives at airports to reduce “No Cars Available”.

3. **Optimize driver scheduling using demand forecasts**
   Use hourly and day-wise demand patterns to proactively align supply.

4. **Reduce cancellations through better matching & ETAs**
   Improve pickup-time accuracy and driver–rider matching to lower cancellation rates.

5. **Use data-driven monitoring and alerts**
   Track real-time gaps and failure rates to take immediate operational action.
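
As a rough illustration of step 5, a monitoring job could flag any hour whose failure rate crosses a chosen level. The threshold value, the function name `hours_over_threshold`, and the toy rows below are all hypothetical. A real alerting setup would depend on Uber's own tooling, which this analysis does not cover.

```python
import pandas as pd

FAILURE_RATE_THRESHOLD = 0.60  # hypothetical alert level, not from the analysis

def hours_over_threshold(requests: pd.DataFrame,
                         threshold: float = FAILURE_RATE_THRESHOLD) -> pd.Series:
    """Return the failure rate of every hour whose rate exceeds the threshold."""
    # A request has failed if it did not end as a completed trip.
    failed = requests["Status"].ne("Trip Completed")
    failure_rate = failed.groupby(requests["Hour"]).mean()
    return failure_rate[failure_rate > threshold]

# Hypothetical toy requests, for illustration only.
toy = pd.DataFrame({
    "Hour": [12, 12, 12, 12, 18, 18, 18, 18],
    "Status": ["Trip Completed", "Trip Completed", "Trip Completed", "Cancelled",
               "No Cars Available", "No Cars Available", "Cancelled", "Trip Completed"],
})

alerts = hours_over_threshold(toy)
print(alerts)
```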


# **Conclusion -**

The analysis shows that Uber’s main challenge is a **supply–demand mismatch during peak hours**, especially at **airport locations** in the Evening slot (5 PM – 10 PM), where the failure rate reaches approximately 66.5%. High numbers of **“No Cars Available” cases and cancellations** hold the overall completion rate to about 42%, which means lost revenue and a poor customer experience. Demand patterns are **predictable by hour and day**, which provides a strong opportunity for proactive planning. By improving **driver availability during peak times**, using **location- and time-based incentives**, and reducing cancellations through better matching and scheduling, Uber can **increase trip completion rates, customer satisfaction, and overall business performance**.

