# **Project Name**    - Uber Supply Demand and Gap Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** Ashwin kanth Marapally

# **Project Summary -**

This project uses Python to perform exploratory data analysis (EDA) on a cleaned Uber ride request dataset. The dataset includes detailed information such as request timestamps, pickup points, ride statuses, and derived fields like request hour, daypart, and gap type. Our analysis focuses on identifying demand surges, supply shortfalls, and uncovering trends that can lead to better decision-making for operational improvements.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


Uber faced a significant mismatch between rider demand and driver supply, especially during peak hours and at high-traffic pickup points like airports. This resulted in high cancellation rates and unfulfilled ride requests, leading to revenue loss and customer dissatisfaction.

The objective of this project is to analyze the Uber request data using Python and identify patterns that contribute to this supply-demand gap. The ultimate goal is to derive actionable insights that can help Uber optimize driver allocation, reduce cancellations, and better meet user demand.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. For each chart, make sure the following format is used and the questions below are answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from google.colab import files

### Dataset Loading

In [None]:
# Load Dataset

uber_request_dataset = pd.read_csv('/content/Uber Request Data.csv')

### Dataset First View

In [None]:
# Dataset First Look

uber_request_dataset.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count

print("Rows: ", uber_request_dataset.shape[0])
print("Columns: ", uber_request_dataset.shape[1])

### Dataset Information

In [None]:
# Dataset Info
uber_request_dataset.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
uber_request_dataset.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count

missing_values = pd.DataFrame({
    "Missing values": uber_request_dataset.isnull().sum(),
    "Missing percentage (%)": uber_request_dataset.isnull().mean()*100
}).sort_values(by="Missing values", ascending=False)

print(missing_values)


In [None]:
# Visualizing the missing values

plt.figure(figsize=(10, 6))
sns.heatmap(uber_request_dataset.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?

* The dataset contains Uber ride request records, including key details such as timestamps, pickup locations, driver IDs, and trip statuses.

* The dataset consists of 6745 ride requests.

**Missing Values:**

* Driver id: Missing in ~39% of records, typically for unassigned trips

* Drop timestamp: Missing in ~58% of rows, often for trips that were not completed

* These missing values were retained to reflect real operational gaps
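
One way to check that the gaps follow trip status rather than data-entry errors is to compute the missing share per status. The sketch below uses a small made-up frame (`toy`) with the same column names; the rows are illustrative only and are not taken from the dataset.

```python
import pandas as pd

# Toy rows (hypothetical) mirroring the dataset's column names
toy = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available",
               "Trip Completed", "No Cars Available"],
    "Driver id": [1.0, 2.0, None, 3.0, None],
    "Drop timestamp": ["11/7/2016 13:00", None, None, "11/7/2016 18:47", None],
})

# Share of missing values (in %) per status for the two sparse columns
missing_by_status = (
    toy[["Driver id", "Drop timestamp"]]
    .isnull()
    .groupby(toy["Status"])
    .mean()
    .mul(100)
)
print(missing_by_status)
```

Running the same expression on `uber_request_dataset` should show whether missing `Driver id` values are concentrated in one status.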

**Date/Time Format:**

Mixed timestamp formats were cleaned and standardized to enable time-based analysis. Request hour and Daypart were extracted for better time segmentation.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns

uber_request_dataset.columns.tolist()

In [None]:
# Dataset Describe

uber_request_dataset.describe()

### Variables Description

**Request id:** Unique identifier for each ride

**Pickup point:** Indicates pickup location - either City or Airport

**Driver id:** ID assigned when a driver is allotted

**Status:** Ride outcome — Trip Completed, Cancelled, or No Cars Available

**Request timestamp & Drop timestamp:** Times of request and trip completion

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

uber_request_dataset.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Create copy of the dataset

uber_request_dataset_copy = uber_request_dataset.copy()

In [None]:
# Data type conversion

# Step 1: First conversion attempt
rt1 = pd.to_datetime(uber_request_dataset_copy['Request timestamp'], dayfirst=True, errors='coerce')
dt1 = pd.to_datetime(uber_request_dataset_copy['Drop timestamp'], dayfirst=True, errors='coerce')

# Step 2: Find the rows where conversion failed (NaT)
failed_rt = uber_request_dataset_copy['Request timestamp'][rt1.isna()]
failed_dt = uber_request_dataset_copy['Drop timestamp'][dt1.isna()]

# Step 3: Retry failed rows with alternate format (dash instead of slash)
# Note: We assume these rows may be in another format like '%d-%m-%Y %H:%M:%S'
rt2 = pd.to_datetime(failed_rt, format='%d-%m-%Y %H:%M:%S', errors='coerce')
dt2 = pd.to_datetime(failed_dt, format='%d-%m-%Y %H:%M:%S', errors='coerce')

# Step 4: Fill in missing values from second attempt (aligned on the row index)
rt1 = rt1.fillna(rt2)
dt1 = dt1.fillna(dt2)

# Step 5: Assign back to DataFrame
uber_request_dataset_copy['Request timestamp'] = rt1
uber_request_dataset_copy['Drop timestamp'] = dt1
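
The two-pass idea can also be written with both formats stated explicitly, which avoids relying on format inference. This is a minimal sketch on made-up strings; the slash layout `%d/%m/%Y %H:%M` is an assumption about the raw file, and the dash layout is the one assumed in the cell above.

```python
import pandas as pd

# Hypothetical sample strings in the two assumed layouts, plus a missing value
raw = pd.Series(["11/7/2016 11:51", "13-07-2016 08:33:16", None])

# Pass 1: slash-separated, no seconds; non-matching rows become NaT
slash = pd.to_datetime(raw, format="%d/%m/%Y %H:%M", errors="coerce")
# Pass 2: dash-separated, with seconds
dash = pd.to_datetime(raw, format="%d-%m-%Y %H:%M:%S", errors="coerce")

# Prefer pass 1 and fall back to pass 2; genuinely missing values stay NaT
parsed = slash.fillna(dash)
print(parsed)
```

Counting `parsed.isna()` against `raw.isna()` afterwards is a quick way to confirm that no non-empty string was left unparsed.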

In [None]:
# Adding new columns "Request hour" and "Daypart"

uber_request_dataset_copy["Request hour"] = uber_request_dataset_copy['Request timestamp'].dt.hour

# Bins are right-inclusive: hours 0-4, 5-8, 9-12, 13-16, 17-20, 21-23
uber_request_dataset_copy["Daypart"] = pd.cut(
    uber_request_dataset_copy['Request hour'],
    bins=[-1, 4, 8, 12, 16, 20, 23],
    labels=['Late Night', 'Early Morning', 'Morning', 'Afternoon', 'Evening', 'Night']
)
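
To see which hours land in which daypart, the same bins and labels can be applied to the hours 0 to 23. The sketch below is standalone and only reuses the bin edges and labels from the cell above.

```python
import pandas as pd

hours = pd.Series(range(24), name="Request hour")
bins = [-1, 4, 8, 12, 16, 20, 23]
labels = ["Late Night", "Early Morning", "Morning", "Afternoon", "Evening", "Night"]

# pd.cut intervals are right-inclusive by default: (-1, 4], (4, 8], ...
daypart = pd.cut(hours, bins=bins, labels=labels)

# List the hours that fall into each daypart
print(hours.groupby(daypart, observed=True).agg(list))
```

Because the first edge is -1 and the last is 23, every hour of the day receives a label and none is left as NaN.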


In [None]:
# Classification of gap types based on trip status

def classify_gap(status):
    """Map a trip status to its supply-demand gap type."""
    if status == "Cancelled":
        return "Cancellation"
    elif status == "No Cars Available":
        return "No Availability"
    elif status == "Trip Completed":
        return "No Gap"
    else:
        return "Unknown"

uber_request_dataset_copy['Gap Type'] = uber_request_dataset_copy['Status'].apply(classify_gap)
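
The same classification can be expressed as a lookup table instead of a function. This is an alternative sketch, not the approach used in the cell above; `GAP_TYPES` and the sample statuses (including the made-up "Pending") are hypothetical.

```python
import pandas as pd

# Same status-to-gap mapping as the function above, written as a dictionary
GAP_TYPES = {
    "Cancelled": "Cancellation",
    "No Cars Available": "No Availability",
    "Trip Completed": "No Gap",
}

# Hypothetical statuses, including one unexpected value
status = pd.Series(["Trip Completed", "Cancelled", "No Cars Available", "Pending"])

# Unmapped statuses become NaN and are then labelled "Unknown"
gap_type = status.map(GAP_TYPES).fillna("Unknown")
print(gap_type.tolist())
```

A dictionary keeps the mapping in one place and avoids a Python-level function call per row, which may matter on larger datasets.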

In [None]:
# cross checking final dataset

uber_request_dataset_copy.head()

In [None]:
# Saving cleaned dataset for further analysis

uber_request_dataset_copy.to_excel("Uber_Valid_Cleaned.xlsx", index=False)
uber_request_dataset_copy.to_csv("Uber_Valid_Cleaned.csv", index=False)


files.download("Uber_Valid_Cleaned.xlsx")
files.download("Uber_Valid_Cleaned.csv")

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 Status Distribution

sns.countplot(data=uber_request_dataset_copy,  x='Status',hue="Status", palette='pastel', legend=False)
plt.title('Status Distribution')
plt.xlabel('Status')
plt.ylabel('Count')
plt.show()


##### 1. Why did you pick the specific chart?

A countplot is ideal for visualizing the frequency of each category in a column like Status.


##### 2. What is/are the insight(s) found from the chart?

**Insight:** Only 42% of requests are completed. The rest fail due to No Cars Available or Cancellations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Impact:** Highlights a major supply gap. Improve driver availability and reduce cancellations with incentive models.
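
A percentage like the one quoted above can be read directly from the data rather than estimated from bar heights. The sketch below uses a made-up `status` series; on the real data it would be `uber_request_dataset_copy["Status"]`.

```python
import pandas as pd

# Hypothetical statuses standing in for the dataset's Status column
status = pd.Series(
    ["Trip Completed"] * 2 + ["No Cars Available"] * 2 + ["Cancelled"]
)

# Share of each status in percent, rounded to one decimal place
status_share = status.value_counts(normalize=True).mul(100).round(1)
print(status_share)
```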

#### Chart - 2

In [None]:
# Chart - 2 Trip Status by Pickup Point

# Keep the legend: without it the hue colours cannot be matched to a status
sns.countplot(data=uber_request_dataset_copy, x="Pickup point", hue="Status", palette="Set2")
plt.title('Trip Status by Pickup Point')
plt.xlabel("Pickup Point")
plt.ylabel("Count")
plt.show()


##### 1. Why did you pick the specific chart?

A grouped bar chart (via hue) allows comparison of statuses across each pickup point.

##### 2. What is/are the insight(s) found from the chart?

**Insight:**  Airport has higher cancellations and no car availability compared to City.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Impact:** Airport needs targeted driver allocation. Increase driver shifts near Airport during peak times.
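
Raw counts depend on how many requests each pickup point receives, so a row-normalised crosstab can make the comparison fairer. The sketch below runs on a made-up frame (`toy`); the numbers it prints are illustrative and say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical requests; the real frame has the same two columns
toy = pd.DataFrame({
    "Pickup point": ["Airport", "Airport", "Airport", "City", "City", "City", "City"],
    "Status": ["Trip Completed", "No Cars Available", "No Cars Available",
               "Trip Completed", "Trip Completed", "Cancelled", "No Cars Available"],
})

# Row-normalised crosstab: each row sums to 100 %
status_by_pickup = (
    pd.crosstab(toy["Pickup point"], toy["Status"], normalize="index")
    .mul(100)
    .round(1)
)
print(status_by_pickup)
```

Applying the same call to `uber_request_dataset_copy` gives the cancellation and "No Cars Available" rates per pickup point, which can be checked against the bars in the chart.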

#### Chart - 3

In [None]:
# Chart - 3 Requests per Hour

plt.figure(figsize=(10, 6))
sns.countplot(data=uber_request_dataset_copy, x="Request hour", color="skyblue")
plt.title('Requests per Hour of Day')
plt.xlabel("Request Hour")
plt.ylabel("Count")
plt.show()


##### 1. Why did you pick the specific chart?

Shows demand fluctuations by hour - great for spotting rush hours.

##### 2. What is/are the insight(s) found from the chart?

**Insight:** Requests peak between 7–10 AM and 5–9 PM.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Impact:** These are rush hours; failing to serve them reduces profit and loyalty. Prioritize driver coverage during these windows.

#### Chart - 4

In [None]:
# Chart - 4 Gap Type by Daypart

plt.figure(figsize=(10, 6))
sns.countplot(data=uber_request_dataset_copy, x="Daypart", hue="Gap Type", palette='Accent')
plt.title('Gap Type by Daypart')
plt.xlabel("Daypart")
plt.ylabel("Count")
plt.show()

##### 1. Why did you pick the specific chart?

A grouped bar chart (via hue) helps compare gap types across dayparts.

##### 2. What is/are the insight(s) found from the chart?

**Insight**: No Cars Available dominates during Evening, while Cancellations are high in Early Morning.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Impact**: Tailored actions can reduce specific gaps per time block. Offer driver incentives in evenings and improve app experience.

#### Chart - 5

In [None]:
# Chart - 5 Share of Completed Trips by Pickup Point
# (shows how completed trips split between pickup points, not the completion rate of each point)

plt.figure(figsize=(10, 6))
completion_data = uber_request_dataset_copy[uber_request_dataset_copy['Status'] == 'Trip Completed']['Pickup point'].value_counts()
completion_data.plot.pie(autopct='%1.1f%%', startangle=90, colors=['lightgreen', 'pink'])
plt.title('Trip Completion by Pickup Point')
plt.ylabel('')  # hide the default series label next to the pie
plt.show()

##### 1. Why did you pick the specific chart?

Pie charts are best for comparing proportional contributions to a whole.

##### 2. What is/are the insight(s) found from the chart?

**Insight:** City accounts for more than 50% of completed trips.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Impact:** Airport zone is underperforming. Evaluate Airport-specific issues like waiting time or miscommunication.
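
The pie shows each pickup point's share of completed trips, which is not the same as the completion rate at that pickup point. The sketch below contrasts the two on a made-up frame (`toy`), chosen so that the shares differ while the rates are equal.

```python
import pandas as pd

# Hypothetical requests: 2 at the Airport, 4 in the City
toy = pd.DataFrame({
    "Pickup point": ["Airport", "Airport", "City", "City", "City", "City"],
    "Status": ["Trip Completed", "Cancelled",
               "Trip Completed", "Trip Completed", "Cancelled", "No Cars Available"],
})
completed = toy["Status"].eq("Trip Completed")

# Share: of all completed trips, what fraction started at each pickup point?
share = toy.loc[completed, "Pickup point"].value_counts(normalize=True)

# Rate: of all requests at a pickup point, what fraction were completed?
rate = completed.groupby(toy["Pickup point"]).mean()

print(share, rate, sep="\n\n")
```

In this toy case the City holds two thirds of completed trips only because it has more requests; both pickup points complete half of theirs. The rate is therefore the more suitable measure when judging whether a zone underperforms.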

#### Chart - 6

In [None]:
# Chart - 6 Request Hour vs Pickup Point

plt.figure(figsize=(10, 6))
heatmap_data = pd.crosstab(uber_request_dataset_copy['Request hour'], uber_request_dataset_copy['Pickup point'])
sns.heatmap(heatmap_data, annot=True, fmt='d', cmap='Blues')
plt.title('Requests by Hour and Pickup Point')
plt.xlabel('Pickup Point')
plt.ylabel('Request Hour')
plt.show()


##### 1. Why did you pick the specific chart?

Heatmaps are excellent for identifying patterns in 2D categorical relationships.

##### 2. What is/are the insight(s) found from the chart?

**Insight:** High demand overlaps at Airport (7–9 AM, 5–9 PM) but supply isn’t scaling.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Impact:** Peak-hour mismatch affects revenue and trust. Both time and location must guide real-time driver placement.
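
The heatmap counts requests only, so a supply figure has to be computed separately. A minimal sketch, assuming demand is the number of requests and supply is the number of completed trips in each hour and pickup point slot (these definitions are assumptions, and the `toy` rows are made up):

```python
import pandas as pd

# Hypothetical requests in two hour/pickup-point slots
toy = pd.DataFrame({
    "Request hour": [8, 8, 8, 18, 18, 18, 18],
    "Pickup point": ["City", "City", "City",
                     "Airport", "Airport", "Airport", "Airport"],
    "Status": ["Trip Completed", "Cancelled", "Cancelled",
               "Trip Completed", "No Cars Available", "No Cars Available",
               "No Cars Available"],
})

# demand = all requests in the slot, supply = completed trips in the slot
gap = (
    toy.assign(completed=toy["Status"].eq("Trip Completed"))
    .groupby(["Request hour", "Pickup point"])["completed"]
    .agg(demand="count", supply="sum")
)
gap["gap"] = gap["demand"] - gap["supply"]
print(gap)
```

Sorting the result by `gap` on the real data would list the slots where unmet demand is largest.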

#### Chart - 7

In [None]:
# Chart - 7 Cancelled Trips by Hour line chart

plt.figure(figsize=(10, 6))
cancelled_data = uber_request_dataset_copy[uber_request_dataset_copy['Status'] == 'Cancelled']
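# Reindex over all 24 hours so hours without cancellations are plotted as 0 instead of being skipped
cancelled_per_hour = cancelled_data['Request hour'].value_counts().reindex(range(24), fill_value=0)
cancelled_per_hour.plot(kind='line', marker='o')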
plt.title('Cancelled Trips by Hour')
plt.xlabel('Request Hour')
plt.ylabel('Count')
plt.show()


##### 1. Why did you pick the specific chart?

Line charts help show trends over time.

##### 2. What is/are the insight(s) found from the chart?

**Insight:** Cancellation peaks in early morning and late evening.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**Impact:** These hours represent lost demand. Remind drivers and verify bookings before these windows.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

* Offer time-based incentives to drivers for better coverage during critical periods.

* Proactively allocate more drivers to high-demand zones like the airport.

* Use smart driver reallocation systems that predict high-demand periods based on request hour and pickup point.

* Add in-app nudges for ride confirmations to reduce cancellations in the morning.

* Use real-time supply heatmaps to guide available drivers toward underserved zones during peak hours.

# **Conclusion**

The Python-based exploratory data analysis of Uber's ride request data has revealed critical insights into the supply-demand dynamics during the observed period. By visualizing patterns across trip statuses, request times, pickup points, and gap types, we identified clear evidence of operational inefficiencies—especially during peak hours and at high-traffic locations like the airport.

Our analysis showed that unfulfilled requests ended with either a "No Cars Available" or a "Cancelled" status, with these gaps peaking during morning and evening rush hours. The Airport pickup point consistently experienced a higher rate of trip failures compared to the City, indicating a significant imbalance in driver distribution. Additionally, the evening hours were especially underserved, contributing to unmet demand and lost revenue opportunities.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***