<a href="https://colab.research.google.com/github/Adimulam-SAAN/Uber-request/blob/main/Uber_Request_Data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -



##### **Project Type**    - Uber DataSet
##### **Contribution**    - A.Niharika
##### **Team Member 1 -**
##### **Team Member 2 -**
##### **Team Member 3 -**
##### **Team Member 4 -**

# **Project Summary -**

This report presents an analytical review of Uber trip request data to uncover key usage patterns, service gaps, and opportunities for improving operational efficiency. The dataset comprises 6,745 individual trip records, each containing details such as the request timestamp, pickup point, driver identification, trip status, and completion metrics. This data spans various hours of the day and days of the week, providing a well-rounded snapshot of short-term operational performance.

A central area of analysis is the trip status distribution, which is divided into three categories: “Trip Completed,” “Cancelled,” and “No Cars Available.” Out of 6,745 requests, only 2,830 trips were successfully completed, representing roughly 42% of the total. The remaining 58% were either cancelled by users or left unfulfilled due to no available drivers. This high failure rate clearly suggests a gap in Uber’s ability to meet demand during specific times or at certain locations, a critical issue for a service-driven platform.
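
The shares quoted above can be reproduced with a normalized count of the `Status` column. The following is a minimal sketch, assuming the column is named `Status` as in the dataset summary; `status_share` is a hypothetical helper name and `sample` is made-up data standing in for the real CSV.

```python
import pandas as pd


def status_share(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of requests in each Status category."""
    return (df["Status"].value_counts(normalize=True) * 100).round(1)


# Made-up sample standing in for "Uber Request Data.csv" (assumption)
sample = pd.DataFrame({
    "Status": ["Trip Completed"] * 2 + ["No Cars Available"] * 2 + ["Cancelled"],
})

shares = status_share(sample)
print(shares)
```

On the real data, calling `status_share(df)` should give percentages that can be checked against the figures in this summary.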

When analyzing the hourly distribution of requests, clear peaks emerge during the early morning hours (5 AM to 9 AM) and evening hours (5 PM to 10 PM), the same windows identified in Chart 3. These times broadly align with typical commute hours. Unfortunately, these are also the time slots where the number of unsuccessful trip requests, marked by cancellations and no cars available, is the highest. This pattern points to a recurring mismatch between rider demand and driver availability, especially during peak periods, affecting user satisfaction and service reliability.
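
One way to check when requests fail most often is to compute, for each hour, the share of requests that did not end in a completed trip. This is a rough sketch, assuming the `Status` and `Request Hour` columns described in this notebook; `unfulfilled_rate_by_hour` is a hypothetical helper and `sample` is made-up data.

```python
import pandas as pd


def unfulfilled_rate_by_hour(df: pd.DataFrame) -> pd.Series:
    """Share of requests per hour that did not end in a completed trip."""
    failed = df["Status"] != "Trip Completed"
    # The mean of a boolean Series is the fraction of True values
    return failed.groupby(df["Request Hour"]).mean().sort_index()


# Made-up sample (assumption): four requests at 8 AM, two at 1 PM
sample = pd.DataFrame({
    "Request Hour": [8, 8, 8, 8, 13, 13],
    "Status": [
        "Cancelled", "No Cars Available", "Cancelled", "Trip Completed",
        "Trip Completed", "Trip Completed",
    ],
})

rates = unfulfilled_rate_by_hour(sample)
print(rates)
```

A rate close to 1 for an hour means almost every request in that hour went unserved.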

From a geographic perspective, trip requests originate from two pickup points: Airport and City. The two locations fail in different ways: Chart 5 shows that airport requests more often go unfulfilled because no cars are available, while city requests see more cancellations. Possible reasons for the airport shortfall include extended wait times at the airport, reluctance of drivers to accept longer trips, or a lack of incentives for servicing airport rides. Targeted promotions or driver bonuses might reduce unfulfilled requests from this critical pickup point.
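
One way to compare outcomes across the two pickup points is a row-normalized cross-tabulation, so that each location is judged on proportions rather than raw counts. The sketch below assumes the `Pickup point` and `Status` column names from this dataset; the `sample` values are made up for illustration.

```python
import pandas as pd

# Made-up sample (assumption) standing in for the real request data
sample = pd.DataFrame({
    "Pickup point": ["Airport", "Airport", "Airport", "City", "City", "City"],
    "Status": [
        "No Cars Available", "No Cars Available", "Trip Completed",
        "Cancelled", "Trip Completed", "Trip Completed",
    ],
})

# normalize="index" makes every row (pickup point) sum to 1
status_by_pickup = pd.crosstab(
    sample["Pickup point"], sample["Status"], normalize="index"
)
print(status_by_pickup.round(2))
```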

Another angle of analysis focuses on driver behavior and engagement. The “Driver id” field, though anonymized, allows tracking how many trips were completed by individual drivers. The data suggests that certain drivers are highly active, handling multiple requests within short timeframes, while others are less engaged. This disparity could be influenced by driver location, personal availability, or Uber's incentive structure. Better understanding these patterns can help optimize driver deployment and scheduling strategies, especially during high-demand periods.
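
Driver activity can be summarized by counting completed trips per `Driver id`. This is an illustrative sketch only: `completed_trips_per_driver` is a hypothetical helper, the driver labels in `sample` are invented, and it assumes unassigned requests carry the placeholder `'Unknown'` as in the wrangling step of this notebook.

```python
import pandas as pd


def completed_trips_per_driver(df: pd.DataFrame) -> pd.Series:
    """Count completed trips for each driver, most active first."""
    completed = df[df["Status"] == "Trip Completed"]
    return completed.groupby("Driver id").size().sort_values(ascending=False)


# Made-up sample (assumption); "Unknown" marks a request with no driver assigned
sample = pd.DataFrame({
    "Driver id": ["D1", "D1", "D1", "D2", "Unknown"],
    "Status": [
        "Trip Completed", "Trip Completed", "Cancelled",
        "Trip Completed", "No Cars Available",
    ],
})

trips = completed_trips_per_driver(sample)
print(trips)
```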

The dataset also provides insights into day-of-week trends. Requests are generally higher during weekdays, particularly Mondays and Fridays, which reinforces the assumption that a significant portion of Uber’s traffic is driven by work-related travel. Demand tends to dip on weekends, suggesting that leisure-based travel does not yet compensate for the weekday commute load.
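
Day-of-week counts are easier to read when the days appear in calendar order rather than sorted by frequency. The sketch below shows one way to do that, assuming a parsed `Request timestamp` column; `DAY_ORDER` and `requests_by_day` are hypothetical names and the timestamps are made up.

```python
import pandas as pd

DAY_ORDER = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def requests_by_day(df: pd.DataFrame) -> pd.Series:
    """Count requests per weekday, in calendar order, with 0 for absent days."""
    counts = df["Day of week"].value_counts()
    return counts.reindex(DAY_ORDER, fill_value=0)


# Made-up timestamps (assumption): two on a Monday, one on a Friday
sample = pd.DataFrame({
    "Request timestamp": pd.to_datetime(
        ["2016-07-11 08:00", "2016-07-11 18:30", "2016-07-15 09:15"]
    )
})
sample["Day of week"] = sample["Request timestamp"].dt.day_name()

by_day = requests_by_day(sample)
print(by_day)
```

Days with a count of 0 in the output simply do not occur in the data, which is worth checking before drawing weekday versus weekend conclusions.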

In conclusion, the dataset offers deep insights into Uber’s operational dynamics and highlights critical areas for improvement. Key challenges include peak-hour unavailability, uneven driver allocation, and location-specific inefficiencies, particularly at the airport. Solutions such as dynamic pricing, predictive demand modeling, better driver incentives, and real-time reallocation of drivers can help address these gaps. By implementing these strategic interventions, Uber can enhance rider experience, increase trip success rates, and achieve greater service reliability.




# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The Uber platform is facing high rates of trip cancellations and unfulfilled requests, especially during peak hours. There is a noticeable mismatch between rider demand and driver availability, particularly at key pickup points like the airport. This operational inefficiency impacts customer satisfaction and overall service reliability.


#### **Define Your Business Objective?**

To optimize Uber’s ride request fulfillment by reducing cancellations and unavailability through improved driver allocation, strategic scheduling, and data-driven demand forecasting. The goal is to enhance customer satisfaction, increase trip completion rates, and improve overall operational efficiency.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that for each and every chart the following format is answered.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')


### Dataset Loading

In [None]:
from google.colab import files
uploaded = files.upload()

In [None]:
# Load Dataset
# files.upload() saves the file to the working directory, so a relative path works here
df = pd.read_csv("Uber Request Data.csv")
print(df.head())
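
The guidelines ask for exception handling, so the loading step above could be wrapped in a small function that fails gracefully. This is a minimal sketch, not the notebook's own method: `load_requests` is a hypothetical helper, and the file name in the demo call is deliberately one that does not exist.

```python
import pandas as pd


def load_requests(path: str = "Uber Request Data.csv"):
    """Load the request data, returning None if the file cannot be read."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"File not found: {path!r}. Upload it first or check the path.")
    except pd.errors.EmptyDataError:
        print(f"File {path!r} is empty.")
    return None


# Demo with a path that is assumed not to exist
df_loaded = load_requests("missing_file_for_demo.csv")
if df_loaded is None:
    print("Dataset was not loaded.")
```

Returning `None` keeps the notebook running so later cells can check for it; raising a custom error instead would be an equally reasonable choice.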

### Dataset First View

In [None]:
# Dataset First Look
# df is already loaded above, so the CSV does not need to be read again
print(df.head())
# df.info() prints its report directly and returns None, so it is not wrapped in print()
df.info()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
# df.shape is a (rows, columns) tuple
n_rows, n_cols = df.shape
print(f"Number of Rows: {n_rows}")
print(f"Number of Columns: {n_cols}")


### Dataset Information

In [None]:
# Dataset Info
df.info()

# Output recorded from a previous run:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 6745 entries, 0 to 6744
# Data columns (total 10 columns):
#  #   Column             Non-Null Count  Dtype
# ---  ------             --------------  -----
#  0   Request id         6745 non-null   int64
#  1   Pickup point       6745 non-null   object
#  2   Driver id          6745 non-null   object
#  3   Status             6745 non-null   object
#  4   Request timestamp  6745 non-null   object
#  5   Drop timestamp     2830 non-null   object
#  6   Request Date       6745 non-null   object
#  7   Request Hour       6745 non-null   int64
#  8   Day of week        6745 non-null   object
#  9   Trip Completed     6745 non-null   bool
# dtypes: bool(1), int64(2), object(7)
# memory usage: 481.0+ KB

### Dataset Description

This Uber dataset contains 6,745 ride request records with details like pickup location, ride status, timestamps, and whether the trip was completed. It helps analyze ride patterns, demand peaks, cancellations, and driver availability issues.

*   Request id: Unique identifier for each ride request.



#### Duplicate Values

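In [None]:
# Dataset Duplicate Value Count
duplicate_count = df.duplicated().sum()
print(f"Duplicate rows: {duplicate_count}")
# Recorded result from a previous run: no duplicates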

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
# Restrict the check to the columns of interest; df is already loaded above
columns_to_check = ['Drop timestamp', 'Day of week', 'Driver id']
missing_counts = df[columns_to_check].isnull().sum()
for column, count in missing_counts.items():
    print(f"{column}: {count} missing values")

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 6))
sns.barplot(x=missing_counts.values, y=missing_counts.index, palette='magma')
plt.title('Missing Values per Column')
plt.xlabel('Number of Missing Values')
plt.ylabel('Column Name')
plt.grid(axis='x')
plt.tight_layout()
plt.show()


### What did you know about your dataset?

The Uber dataset contains 6,745 ride requests with details like pickup location, driver ID, trip status, and timestamps. About 58% of the requests were not completed due to cancellations or no car availability. In the dataset information shown above, Drop timestamp is the only column with missing values: it has 2,830 non-null entries, so 3,915 requests have no drop time, which matches the number of requests that did not end in a completed trip. Driver id and Day of week are fully populated in that summary. Peak demand occurs during early morning and evening hours, especially from the Airport, where service issues are most common.
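
To see missing values for every column at once, rather than a hand-picked subset, a small summary table can help. The sketch below is one possible approach; `missing_summary` is a hypothetical helper and `sample` is made-up data in which only non-completed requests lack a drop time.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """List columns that contain missing values, with counts and percentages."""
    missing = df.isnull().sum()
    summary = pd.DataFrame({
        "missing": missing,
        "percent": (missing / len(df) * 100).round(1),
    })
    # Keep only columns that actually have gaps, largest first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Made-up sample (assumption)
sample = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
    "Drop timestamp": ["2016-07-11 09:00", np.nan, np.nan, "2016-07-11 19:10"],
})

summary = missing_summary(sample)
print(summary)
```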

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print(f"Number of Columns: {df.shape[1]}")
# List the column names themselves, not just how many there are
print(df.columns.tolist())


In [None]:
# Dataset Describe
# describe() covers numeric columns by default; include='all' adds object and bool columns
print("Numeric Summary:")
print(df.describe())
print("\nFull Column Summary (including object types):")
print(df.describe(include='all'))


### Variables Description

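The dataset has 10 columns:

*   Request id (int64): unique identifier for each ride request.
*   Pickup point (object): where the request originated, either Airport or City.
*   Driver id (object): anonymized identifier of the driver; the wrangling step fills missing values with 'Unknown'.
*   Status (object): outcome of the request: Trip Completed, Cancelled, or No Cars Available.
*   Request timestamp (object): date and time of the request; parsed to datetime during data wrangling.
*   Drop timestamp (object): date and time the trip ended; only 2,830 rows have a value, matching the number of completed trips.
*   Request Date (object): calendar date derived from Request timestamp.
*   Request Hour (int64): hour of day (0–23) derived from Request timestamp.
*   Day of week (object): day name derived from Request timestamp.
*   Trip Completed (bool): True when Status is 'Trip Completed'.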

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for column in df.columns:
    unique_vals = df[column].unique()
    print(f"\n--- {column} ---")
    print(f"Total Unique Values: {len(unique_vals)}")
    # Show only the first 10 values to keep the output readable
    print(f"Unique Values Sample: {unique_vals[:10]}")

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.
df = pd.read_csv("Uber Request Data.csv")
# Parse timestamps; values that cannot be parsed become NaT instead of raising
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'], errors='coerce')
df['Drop timestamp'] = pd.to_datetime(df['Drop timestamp'], errors='coerce')
# Derive date, hour, and weekday features from the request time
df['Request Date'] = df['Request timestamp'].dt.date
df['Request Hour'] = df['Request timestamp'].dt.hour
df['Day of week'] = df['Request timestamp'].dt.day_name()
# Assign the result back instead of calling fillna(inplace=True) on a column selection
df['Driver id'] = df['Driver id'].fillna('Unknown')
# Flag successful trips
df['Trip Completed'] = df['Status'] == 'Trip Completed'
# Drop exact duplicate rows and rebuild a clean 0..n-1 index
df = df.drop_duplicates().reset_index(drop=True)
print(df.head())



### What all manipulations have you done and insights you found?

During the initial analysis of the Uber Request Data, several data manipulations were performed to prepare the dataset for insights. First, the timestamps were parsed to extract new features: Request Date, Request Hour, and Day of week. Missing Driver id values were filled with 'Unknown'. A new boolean column, Trip Completed, was created to easily differentiate successful trips. The dataset was also examined for missing values, particularly in the Drop timestamp column, which revealed that over half the entries (3,915 of 6,745) lacked drop times, likely due to cancellations or no car availability. Categorical columns such as Pickup point and Status were then used to group the requests. Duplicate rows are dropped with drop_duplicates(), although the duplicate check found none. From these manipulations, we discovered key insights such as peak request hours, high demand from Airport vs. City, and the distribution of request outcomes (Completed, Cancelled, or No Cars Available). These insights help understand customer demand, driver availability, and overall supply-demand mismatches in the Uber service.


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1
# Count plot: one bar per trip status, bar height = number of requests
plt.figure(figsize=(8, 5))
sns.countplot(x='Status', data=df, palette='Set2')
plt.title('Trip Status Distribution')
plt.xlabel('Trip Status')
plt.ylabel('Number of Requests')
plt.show()



##### 1. Why did you pick the specific chart?

This bar chart was chosen because it effectively displays the frequency of different trip statuses, a categorical variable. It makes comparisons easy by showing the number of requests for each status using bar heights. The chart is simple to interpret and visually clear. Colors help distinguish between each category, enhancing readability.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that Trip Completed is the largest single category, at roughly 42% of requests, but completed trips are still a minority of all requests. A large number of requests were marked as "No Cars Available," suggesting high demand or supply issues. Cancellations are significantly fewer but still notable. This highlights the need for better fleet management during peak times to reduce unfulfilled requests.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights support positive business impact by quantifying how many requests end in completed trips and highlighting areas to improve car availability. However, the high "No Cars Available" count indicates missed revenue and poor customer experience. This could lead to negative growth if not addressed. Timely action can convert these gaps into opportunities.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
import matplotlib.pyplot as plt
status_counts = df['Status'].value_counts()
plt.figure(figsize=(6,6))
plt.pie(status_counts, labels=status_counts.index, autopct='%1.1f%%', startangle=140, colors=['green', 'red', 'orange'])
plt.title('Distribution of Request Status')
plt.axis('equal')
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart was chosen because it visually represents the proportional distribution of trip request statuses. It's ideal for showing how much each category (Trip Completed, Cancelled, No Cars Available) contributes to the total requests.

##### 2. What is/are the insight(s) found from the chart?

Trip Completed accounts for the largest share (42%), yet this means fewer than half of all requests end in a ride.

No Cars Available is close behind at 39.3%, highlighting major unfulfilled demand.

Cancelled trips (18.7%) suggest customer or driver-side issues affecting fulfillment.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, these insights help optimize operations. Improving car availability and reducing cancellations can boost customer satisfaction and increase revenue. On the negative side, the high "No Cars Available" percentage signals potential revenue loss and poor customer experience. If unresolved, it may drive users to competitor platforms, resulting in negative growth.


#### Chart - 3

In [None]:
# Chart - 3 visualization code
# Reuse the existing df; 'Request Hour' already holds the hour (0-23) of each request
plt.figure(figsize=(10, 5))
# sort_index() orders the bars by hour instead of by frequency
df['Request Hour'].value_counts().sort_index().plot(kind='bar', color='skyblue')
plt.title('Number of Uber Requests by Hour of the Day')
plt.xlabel('Hour')
plt.ylabel('Number of Requests')
plt.grid(axis='y')
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart was chosen because Request Hour is a discrete variable with 24 possible values, and one bar per hour makes it easy to compare request volumes across the day. It highlights peak Uber request hours, notably between 5–9 AM and 5–10 PM, aligning with daily commute times, which helps plan driver allocation for high-demand hours.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that Uber requests peak during early morning (5–9 AM) and evening hours (5–10 PM), which likely aligns with daily commute times. There is a noticeable dip in requests during late night (12–4 AM) and midday (11 AM–4 PM), indicating off-peak hours. This pattern highlights when demand is highest and lowest throughout the day.
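
The peak hours read off the chart can also be extracted programmatically. The sketch below ranks hours by request count; `peak_hours` is a hypothetical helper, the choice of `top_n` is an assumption about what counts as a peak, and `sample` is made-up data.

```python
import pandas as pd


def peak_hours(df: pd.DataFrame, top_n: int = 3) -> list:
    """Return the top_n busiest request hours, sorted chronologically."""
    counts = df["Request Hour"].value_counts()
    return sorted(counts.nlargest(top_n).index.tolist())


# Made-up sample (assumption): 8 AM is busiest, then 6 PM, then 1 PM
sample = pd.DataFrame({"Request Hour": [8, 8, 8, 18, 18, 13]})

busiest = peak_hours(sample, top_n=2)
print(busiest)
```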

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The chart reveals high demand between 5–9 AM and 5–10 PM, suggesting the need for more drivers during these hours to reduce cancellations. Efficient resource planning during off-peak hours can improve driver utilization. Ignoring these trends may lead to customer dissatisfaction and driver attrition. Acting on these insights can boost operational efficiency and customer retention.


#### Chart - 4

In [None]:
# Chart - 4 visualization code
# Make sure the hour is available, dropping rows whose request time cannot be parsed
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'], errors='coerce')
df = df.dropna(subset=['Request timestamp']).copy()
df['Request Hour'] = df['Request timestamp'].dt.hour
plt.figure(figsize=(10, 6))
# discrete=True centers one bar on each integer hour so bars line up with the 0-23 ticks
sns.histplot(data=df, x='Request Hour', discrete=True, color='steelblue', edgecolor='black')

plt.title('Histogram of Uber Requests by Hour of Day')
plt.xlabel('Hour of Day (0–23)')
plt.ylabel('Number of Requests')
plt.xticks(range(0, 24))
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()

##### 1. Why did you pick the specific chart?

A histogram was chosen to show how requests are distributed across the 24 hours of the day. With one bar per hour, the morning and evening demand peaks, and the quieter hours between them, are visible at a glance.

##### 2. What is/are the insight(s) found from the chart?

The histogram shows the same pattern as Chart 3: requests are concentrated in the early morning (5–9 AM) and evening (5–10 PM), with fewer requests at midday and late at night.
Unmet demand in the peak windows may lead to customer dissatisfaction.
Low off-peak demand could result in idle drivers and lower retention.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights support positive business impact by helping Uber plan driver availability, apply surge pricing, and improve customer satisfaction during peak hours.
They also allow targeted promotions during off-peak times to boost usage.
However, if peak demand isn't met, it can cause delays and customer churn.
Similarly, idle drivers during low-demand hours may lead to dissatisfaction and attrition.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
# Count requests for each (pickup point, status) pair; fill_value=0 avoids NaN gaps
pickup_status = df.groupby(['Pickup point', 'Status']).size().unstack(fill_value=0)
# Draw on a single axes instead of a 2x2 grid that would leave three panels empty
ax = pickup_status.plot(kind='barh', stacked=True, figsize=(10, 5), colormap='Pastel1')
ax.set_title('Pickup Point vs Trip Status')
ax.set_xlabel('Trip Count')
ax.set_ylabel('Pickup Point')
plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

The stacked horizontal bar chart was chosen to clearly show how different trip statuses (Cancelled, No Cars Available, Trip Completed) are distributed across the two pickup points: City and Airport. This chart makes it easy to compare the volume and status composition of requests from each location, helping identify where service issues are more prevalent. Its structure is ideal for visualizing categorical comparisons side-by-side.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that the Airport has a high number of "No Cars Available" cases, while the City has more "Cancelled" trips. Overall, trip completion is slightly higher from the Airport, but service issues are evident at both locations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the gained insights can help create a positive business impact by highlighting where Uber should allocate more drivers, especially at the Airport during high-demand hours. However, the high cancellation rate in the City indicates driver disengagement or customer dissatisfaction, which could lead to negative growth if not addressed, because it affects user trust and retention. Proactive operational changes based on these insights can reduce trip failures and improve customer experience.

#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the given Uber request data, I suggest the client focus on optimizing driver availability during peak hours, specifically between 5–9 AM and 5–10 PM, as these are the times when demand is highest. Ensuring enough drivers are available during these windows will reduce cancellations and "No Cars Available" issues, improving customer satisfaction and revenue.

Additionally, the client should consider reducing idle driver deployment during low-demand periods, like midday and late night, to minimize operational costs. A better understanding of pickup point trends (like more issues at the Airport vs. City) can also help with location-specific strategies.

Overall, by aligning driver supply with demand patterns, Uber can improve efficiency, reduce customer loss, and strengthen its market position.
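
To decide where and when extra drivers matter most, the gap between requests and completed trips can be tabulated per hour and pickup point. This is a rough sketch under stated assumptions: `supply_demand_gap` is a hypothetical helper, "gap" is defined here simply as requests minus completed trips, and `sample` is made-up data.

```python
import pandas as pd


def supply_demand_gap(df: pd.DataFrame) -> pd.DataFrame:
    """Requests, completed trips, and their difference per hour and pickup point."""
    keys = [df["Request Hour"], df["Pickup point"]]
    completed = df["Status"].eq("Trip Completed").astype(int)
    summary = pd.DataFrame({
        "requests": completed.groupby(keys).size(),
        "completed": completed.groupby(keys).sum(),
    })
    # Gap = demand that was not served (assumed definition)
    summary["gap"] = summary["requests"] - summary["completed"]
    return summary.sort_values("gap", ascending=False)


# Made-up sample (assumption)
sample = pd.DataFrame({
    "Request Hour": [5, 5, 5, 18, 18, 12],
    "Pickup point": ["City", "City", "City", "Airport", "Airport", "City"],
    "Status": [
        "Cancelled", "Cancelled", "Trip Completed",
        "No Cars Available", "Trip Completed", "Trip Completed",
    ],
})

gap = supply_demand_gap(sample)
print(gap)
```

The rows at the top of the table are the hour and location combinations with the most unserved requests, which makes them natural candidates for extra driver supply.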

# **Conclusion**

Therefore, I conclude that the visualizations reveal both strengths and gaps in the trip request process. While a significant number of trips were successfully completed, indicating efficient operations, a large percentage of requests were either cancelled or went unfulfilled due to car unavailability. These issues highlight the need for improved fleet distribution, better driver-customer coordination, and efficient resource planning during peak times. Addressing these challenges can reduce customer dissatisfaction, minimize revenue loss, and ultimately lead to positive business growth and enhanced service delivery.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***