# **Project Name**    - *Uber Supply Demand Gap*



##### **Project Type**    - Exploratory Data Analysis of Uber Request Data
##### **Contribution**    - Individual
##### **By -** Manne Kovidha


# **Project Summary -**

Uber is facing a **problem of supply-demand gap**. This happens mostly during the night and early morning hours for rides from the airport to the city. During these times, many passengers are unable to get a cab. Either no cabs are available at night, or drivers cancel rides in the early morning.

The analysis and solution ideas were already provided to me. These helped me understand the issue clearly. Based on the figures (1.1, 1.2, and 1.3), there is a clear pattern. At night, there are very **few drivers available**. In the early morning, **many drivers cancel rides**. Figure 2 shows that most cancellations happen in the early morning and morning hours.

The analysis suggests two simple solutions. The first is to **give drivers extra pay (incentives)** during early morning and morning hours. This can be called rush hour pricing. It will encourage drivers to accept rides and reduce cancellations. The second solution is to **create night shifts for drivers**. This will help more cabs to be available during night hours. These two steps can reduce the gap between supply and demand.

Now, based on this background, I have to complete the remaining four tasks.
These tasks will support the analysis and help explore the data better. The tasks include: cleaning the data in Excel, creating dashboards in Excel, using SQL to find insights, and doing EDA (Exploratory Data Analysis) using Pandas in Python.

The first step is to **clean the data using Excel**. I will remove empty rows, correct formatting issues, and make sure all time and date values are proper. This will help make the data ready for analysis.

Next, I will **create dashboards in Excel**. These will include graphs and charts that show how many ride requests happen at different times, when drivers cancel the most, and when cab supply is lowest. These dashboards will help visualize the gap in a clear way.

After that, I will **use SQL to write simple queries**. These queries will help me find out how many ride requests were made at each hour, how many were cancelled, and how many rides were completed. I will also add comments (using --) in each query to explain what it does, in simple words.

Finally, I will **use Pandas in Python to do EDA**. This means I will explore the data further. I will group the data by hour, calculate totals and percentages, and create simple charts. This will help confirm the trends shown in the original analysis and may reveal more useful patterns.

In short, the analysis and solution part was already given to me. My job now is to **work with the data and find more insights**. By cleaning the data, building dashboards, writing SQL queries, and using Pandas for EDA, I will help support the given findings. These steps will help show the problem more clearly and make the solutions more useful with the help of real data.



# **GitHub Link -**

Click here to view my GitHub Repo https://github.com/Hiiiiii10/Uber-EDA-Project

# **Problem Statement**



#### **Define Your Business Objective?**

The main business objective is to identify and reduce the supply-demand gap in Uber services, especially during night and early morning hours for rides from the airport to the city. This involves understanding when and why ride requests go unfulfilled, such as due to driver unavailability or cancellations, and finding data-driven solutions to improve ride availability, reduce cancellations, and increase customer satisfaction during these critical time slots.



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following format.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("https://raw.githubusercontent.com/Hiiiiii10/Uber-EDA-Project/main/UBERCLEANEDDATA.CSV")

This step loads the dataset that I uploaded to GitHub along with this EDA file. (It was already cleaned in Excel.)

### Dataset Loading

In [None]:
try:
    df = pd.read_csv("https://raw.githubusercontent.com/Hiiiiii10/Uber-EDA-Project/main/UBERCLEANEDDATA.CSV")
    print("Data loaded successfully.")
except Exception as e:
    print("Error loading file:", e)

The dataset is loaded.

### Dataset First View

In [None]:
# Dataset First Look #
df.head()



**Dataset First View**
This shows the first few records of the Uber dataset. This dataset initially contains records of Uber ride requests, including fields such as request timestamp, drop timestamp, ride status (Completed, Cancelled, or No Cars Available), and pickup point.

In Step 1, I performed preliminary data cleaning in Excel. This included:

Separating timestamps into distinct Date and Time columns to allow for easier time-based analysis.

Creating a new column to categorize each ride into a Time Slot (e.g., Early Morning, Morning, Afternoon, Evening, Night) based on the request time, enabling better identification of peak demand periods and supply-demand gaps.

These cleaned and derived columns now form the base for further exploration and visualization in the upcoming analysis steps.
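The Excel steps described above could be reproduced in pandas roughly as follows. This is only a sketch on a made-up three-row frame: the column names `Request timestamp`, `Request Date`, and `Request Time`, as well as the timestamp format, are assumptions for illustration and may not match the cleaned file exactly.

```python
import pandas as pd

# Toy data standing in for the raw export (values are invented for illustration)
raw = pd.DataFrame({
    "Request timestamp": [
        "2016-07-11 04:30:00",
        "2016-07-11 09:15:00",
        "2016-07-12 21:05:00",
    ]
})

# Parse the text into real datetimes; unparseable values become NaT
ts = pd.to_datetime(raw["Request timestamp"], errors="coerce")

# Split the timestamp into separate date, time, and hour columns
raw["Request Date"] = ts.dt.date
raw["Request Time"] = ts.dt.time
raw["Request Hour"] = ts.dt.hour

print(raw)
```

Keeping the hour as its own numeric column is what makes the later hour-wise grouping and plotting straightforward.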

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape


Dataset Dimensions
There are 6745 rows and 7 columns in the dataset.

### Dataset Information

In [None]:
# Dataset Info
df.info()


**Dataset Info**
- The dataset contains both object and datetime columns.
- All major columns like Request timestamp and Status are properly typed.
- This part shows the total cells that are not empty and their data types.

#### Duplicate Values

In [None]:
# Check for duplicates
df.duplicated().sum()


**Duplicate Rows Check**

There are 0 duplicate rows in the dataset.

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values
df.isnull().sum()


**Missing Values**
- Some drop timestamps are missing because the ride was never completed.
- Request timestamp and Status are complete with no nulls.
- As a result, some cells in the driver ID, drop date, and drop time fields are empty because those requests were either cancelled or had no cars available.
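One way to confirm that the gaps line up with unfulfilled requests is to count missing values per status. The sketch below uses a small invented frame; the column names `Driver id` and `Drop Time` are assumptions and should be swapped for the real ones.

```python
import numpy as np
import pandas as pd

# Toy data (invented): completed trips have a driver and a drop time
toy = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
    "Driver id": [1.0, 2.0, np.nan, 3.0],
    "Drop Time": ["10:05", None, None, "18:40"],
})

# Count missing cells in every other column, grouped by trip status
missing_by_status = (
    toy.drop(columns="Status")
    .isnull()
    .groupby(toy["Status"])
    .sum()
)
print(missing_by_status)
```

If the row for completed trips is all zeros, the missing values are structural (there was no trip to record) rather than a data-quality problem.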

In [None]:
# Visualizing the missing values
missing_values = df.isnull().sum()

# Columns with missing data
print("🧾 Missing Values Summary:")
print(missing_values[missing_values > 0])

# Plot missing values using seaborn heatmap
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(10,6))
sns.heatmap(df.isnull(), cbar=False, cmap='Reds', yticklabels=False)
plt.title("Heatmap of Missing Values in Dataset")
plt.show()

### What did you know about your dataset?

- The dataset covers Uber rides with status labels like Completed, Cancelled, and No Cars Available. In the analysis, I will focus on unfulfilled requests, meaning those that were cancelled or had no cars available.
- Time of request plays a major role: demand is not met during late nights, mornings, and early mornings. The pickup point also matters, especially for requests from the Airport.
- Missing values appear only in the driver ID and drop date/time fields, which makes sense for unfulfilled trips.
- This data can help us understand peak demand times and supply issues.


## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns #
column_info = pd.DataFrame({
    'Column Name': df.columns,
    'Data Type': df.dtypes.values
})

column_info



The table above shows all column names and their data types in the dataset.

### Column Table


In [None]:
# Dataset Describe
df.describe()

**Dataset Describe**
Only columns like request ID or driver ID will show summary statistics since these columns are numerical and not categorical or datetime. This helps us understand the spread and distribution of numeric data.
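Since `describe()` skips text columns by default, a categorical summary can be requested separately. A minimal sketch on invented data (the values are placeholders, not taken from the real file):

```python
import numpy as np
import pandas as pd

# Toy data (invented) mixing one numeric and two categorical columns
toy = pd.DataFrame({
    "Request id": [1, 2, 3, 4],
    "Pickup point": ["Airport", "City", "City", "City"],
    "Status": ["Cancelled", "Trip Completed", "Trip Completed", "No Cars Available"],
})

# Default: numeric columns only (count, mean, std, min, quartiles, max)
numeric_summary = toy.describe()

# Excluding numeric columns gives count, unique, top, and freq instead
categorical_summary = toy.describe(exclude=[np.number])
print(categorical_summary)
```

For this dataset the categorical summary is arguably the more informative of the two, because IDs have no meaningful mean or spread.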

### Variables Description

📋 Uber Dataset Column Description

| Column Name         | Description |
|---------------------|-------------|
| Request id          | Unique identifier for each trip request |
| Pickup point        | Location from where the request was made (City/Airport) |
| Status              | Final status of the trip (Completed, Cancelled, No Cars Available) |
| Request timestamp   | Date and time when the ride was requested |
| Drop timestamp      | Time when the trip was completed (null if not completed) |
| Request Hour        | Extracted hour from request timestamp |
| Time Slot           | Categorized time block (e.g., Night, Morning Rush, etc.) |

### Check Unique Values for each variable.

In [None]:
for col in df.columns:
    print(f"\nUnique values in '{col}':")
    print(df[col].unique())


**Unique Values for Each Variable**
- Status has 3 values.
- Pickup point has 2 values.
- Time slot has 6 blocks.
- Request date has 5 values.

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Convert 'Request Time' to datetime format
df['Request Time'] = pd.to_datetime(df['Request Time'], errors='coerce')

# Create a new column to extract the hour of request
df['Request Hour'] = df['Request Time'].dt.hour

# Create time slots based on hour
def get_time_slot(hour):
    if 0 <= hour < 7:
        return "Early Morning"
    elif 7 <= hour < 12:
        return "Morning"
    elif 12 <= hour < 16:
        return "Afternoon"
    elif 16 <= hour < 19:
        return "Evening"
    else:
        return "Night"

# Create 'EDA Time Slot' column
df['EDA Time Slot'] = df['Request Hour'].apply(get_time_slot)

# Updated dataset
df.head()

**Data Wrangling Code**

In this section, I prepared the dataset for analysis by transforming and enriching the raw data using the `Request Time` column.
This includes converting it into datetime format and generating new columns like `Request Hour` and `EDA Time Slot` to help us analyze time-based trends effectively.

### What all manipulations have you done and insights you found?

**Manipulations Done:**
- Converted 'Request Time' into datetime format.
- Created a new column 'Request Hour' to extract the hour of the request.
- Added a derived column 'EDA Time Slot' to categorize requests into 5 time-based buckets.

**Insights from Wrangling:**
- Peak request hours can now be grouped logically using 'Time Slot'.
- Missing drop timestamps correspond to Cancelled or No Car Available rides — this makes sense.
- These steps make the dataset easier to analyze using groupby and plotting operations.
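As an alternative to an `if`/`elif` function, the same hour boundaries can be expressed with `pd.cut`. This is a sketch using the bucket edges from the wrangling code above on a handful of invented hours; it is not the code that produced the dataset.

```python
import pandas as pd

# Boundary hours for every bucket, plus one missing value
hours = pd.Series([0, 6, 7, 11, 12, 15, 16, 18, 19, 23, None], dtype="float")

# Left-closed bins: [0, 7), [7, 12), [12, 16), [16, 19), [19, 24)
slots = pd.cut(
    hours,
    bins=[0, 7, 12, 16, 19, 24],
    labels=["Early Morning", "Morning", "Afternoon", "Evening", "Night"],
    right=False,
)
print(slots.tolist())
```

One practical difference: with `pd.cut`, a missing hour stays missing, whereas a chain of comparisons that ends in `else` would label it "Night".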


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1: Status Distribution

In [None]:
# Chart - 1 visualization code
import seaborn as sns
import matplotlib.pyplot as plt

plt.figure(figsize=(7,5))
sns.countplot(data=df, x='Status', hue='Status', palette='Set2', legend=False)
plt.title("Distribution of Trip Status")
plt.xlabel("Trip Status")
plt.ylabel("Number of Requests")
plt.show()


##### 1. Why did you pick the specific chart?

This countplot clearly shows how many requests fall into each category (Completed, Cancelled, No Cars Available).

##### 2. What is/are the insight(s) found from the chart?

There are a large number of unfulfilled requests — especially “No Cars Available,” indicating a possible supply issue.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes. This insight highlights a clear supply-demand gap that Uber can address by improving driver availability or incentivizing night driving.
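To put numbers behind the bars, the counts can be turned into percentage shares. A minimal sketch on invented status values (the 4/2/4 split is made up and does not reflect the real dataset):

```python
import pandas as pd

# Toy status column (invented counts)
status = pd.Series(
    ["Trip Completed"] * 4 + ["Cancelled"] * 2 + ["No Cars Available"] * 4,
    name="Status",
)

# Absolute counts and percentage share per status
counts = status.value_counts()
shares = (status.value_counts(normalize=True) * 100).round(1)

summary = pd.DataFrame({"requests": counts, "share_pct": shares})
print(summary)
```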

#### Chart - 2: Pickup Point Distribution

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(6,4))
sns.countplot(data=df, x='Pickup point', hue='Pickup point', palette='cool', legend=False)
plt.title("Pickup Location Distribution")
plt.xlabel("Pickup Point")
plt.ylabel("Number of Requests")
plt.show()


##### 1. Why did you pick the specific chart?

To find which pickup point (Airport or City) has more ride requests.

##### 2. What is/are the insight(s) found from the chart?

The Airport has a slightly higher number of requests than the City.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Uber should do further analysis on where the airport - city routes are located and make sure it has continual availability there, particularly during flight arrival clusters.

#### Chart - 3: Time Slot Distribution

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(7,4))
sns.countplot(data=df, x='Request Time Slot', hue='Request Time Slot', order=['Early Morning','Morning','Afternoon','Evening','Night'], palette='muted', legend=False)
plt.title("Number of Requests by Time Slot")
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To compare peak and low-demand periods.



##### 2. What is/are the insight(s) found from the chart?

Morning, Early Morning, and Night are the highest-demand periods. Afternoon has the lowest demand.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Uber can develop surge pricing or incentive programs for drivers during late nights and the Morning rush. Early Morning also needs focused attention so that demand is not missed.


#### Chart - 4: Request Hour Distribution

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(10,4))
sns.histplot(df['Request Hour'], bins=24, kde=True, color='orange')
plt.title("Requests per Hour")
plt.xlabel("Hour of Day")
plt.ylabel("Number of Requests")
plt.show()


##### 1. Why did you pick the specific chart?

To observe trends in ride requests on an hourly basis.


##### 2. What is/are the insight(s) found from the chart?

There is a marked increase in ride requests between the hours of 5 AM to 10 AM, as well as an increase in the night hours.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Understanding hourly peaks helps Uber anticipate demand and position drivers accordingly, which reduces unfulfilled requests.
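The hourly peaks can also be read off a table instead of a histogram. The sketch below counts requests per hour on invented data and lists the three busiest hours; only the `Request Hour` column name comes from this notebook.

```python
import pandas as pd

# Toy request hours (invented)
toy = pd.DataFrame({"Request Hour": [5, 5, 5, 9, 9, 14, 20, 20, 20, 20]})

# Requests per hour, including hours with zero requests
per_hour = (
    toy["Request Hour"]
    .value_counts()
    .reindex(range(24), fill_value=0)
    .sort_index()
)

# Three busiest hours of the day
busiest = per_hour.nlargest(3)
print(busiest)
```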


#### Chart - 5: Cancellations Only – Time Slot Wise

In [None]:
# Chart - 5 visualization code
cancelled = df[df['Status'] == 'Cancelled']

plt.figure(figsize=(7,4))
sns.countplot(data=cancelled, x='Request Time Slot', hue='Request Time Slot', order=['Early Morning','Morning','Afternoon','Evening','Night'], palette='Reds', legend=False)
plt.title("Cancelled Requests by Time Slot")
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To see when cancellations are occurring most.


##### 2. What is/are the insight(s) found from the chart?

Most cancellations occur in the Morning and Early Morning time slots.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Uber should explore whether drivers are declining early morning rides and consider giving them incentives.


#### Chart - 6: No Cars Available – Time Slot Wise

In [None]:
# Chart - 6 visualization code
no_cars = df[df['Status'] == 'No Cars Available']

plt.figure(figsize=(7,4))
sns.countplot(data=no_cars, x='Request Time Slot', hue='Request Time Slot', order=['Early Morning','Morning','Afternoon','Evening','Night'], palette='Blues', legend=False)
plt.title("No Cars Available by Time Slot")
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To find when Uber has the biggest driver shortages.

##### 2. What is/are the insight(s) found from the chart?

‘No Cars Available’ is most frequent during the Evening and Night slots, followed by Early Morning.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Encouraging driver participation at night and scheduling drivers during low-supply hours can help Uber meet much more of the existing customer demand.

#### Chart - 7: Status by Pickup Point

In [None]:
# Chart - 7 visualization code
plt.figure(figsize=(6,4))
sns.countplot(data=df, x='Pickup point', hue='Status', palette='Set1')
plt.title("Trip Status by Pickup Point")
plt.ylabel("Number of Requests")
plt.show()


##### 1. Why did you pick the specific chart?

To understand how the success/failure of rides changes by pickup location.


##### 2. What is/are the insight(s) found from the chart?

The Airport has more ‘No Cars Available’ cases, while the City has more ‘Cancelled’ rides.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Different strategies are needed for each location — improve car availability at Airport; address cancellations in City.
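Because the two pickup points do not have the same number of requests, comparing shares rather than raw counts can make the contrast clearer. A sketch with a row-normalized crosstab on invented data:

```python
import pandas as pd

# Toy data (invented): five requests per pickup point
toy = pd.DataFrame({
    "Pickup point": ["Airport"] * 5 + ["City"] * 5,
    "Status": [
        "Trip Completed", "No Cars Available", "No Cars Available",
        "No Cars Available", "Cancelled",
        "Trip Completed", "Trip Completed", "Cancelled",
        "Cancelled", "No Cars Available",
    ],
})

# Each row sums to 100: the status mix within a pickup point
share = (
    pd.crosstab(toy["Pickup point"], toy["Status"], normalize="index")
    .mul(100)
    .round(1)
)
print(share)
```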


#### Chart - 8: Status by Time Slot


In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(8,4))
sns.countplot(data=df, x='Request Time Slot', hue='Status', order=['Early Morning','Morning','Afternoon','Evening','Night'], palette='Set2')
plt.title("Trip Status by Time Slot")
plt.xticks(rotation=45)
plt.ylabel("Number of Requests")
plt.show()


##### 1. Why did you pick the specific chart?

To identify when Uber has the most supply-demand imbalance.

##### 2. What is/are the insight(s) found from the chart?

"Night" has many 'No Cars Available', and "Morning" has more cancellations.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Uber should focus on Night driver availability and reducing early morning driver cancellations.
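The dominant failure reason per time slot can be extracted directly, which is handy when there are many slots to compare. This sketch uses invented rows with the `Request Time Slot` and `Status` column names from the charts above.

```python
import pandas as pd

# Toy data (invented)
toy = pd.DataFrame({
    "Request Time Slot": ["Morning", "Morning", "Morning",
                          "Night", "Night", "Night", "Afternoon"],
    "Status": ["Cancelled", "Cancelled", "Trip Completed",
               "No Cars Available", "No Cars Available", "Trip Completed",
               "Trip Completed"],
})

# Keep only unfulfilled requests, then count each reason per slot
unfulfilled = toy[toy["Status"] != "Trip Completed"]
table = pd.crosstab(unfulfilled["Request Time Slot"], unfulfilled["Status"])

# For every slot, the status with the highest count
main_reason = table.idxmax(axis=1)
print(main_reason)
```

Slots with no unfulfilled requests (Afternoon in this toy frame) simply do not appear in the result.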

#### Chart - 9: Request Hour vs Trip Status

In [None]:
# Chart - 9 visualization code
plt.figure(figsize=(12,4))
sns.histplot(data=df, x='Request Hour', hue='Status', multiple='stack', palette='coolwarm', bins=24)
plt.title("Request Hour vs Trip Status")
plt.xlabel("Hour of the Day")
plt.show()


##### 1. Why did you pick the specific chart?

To combine time-based analysis with status outcomes.

##### 2. What is/are the insight(s) found from the chart?

Peak demand hours (5–9 AM) face the most cancellations; night hours see unavailability.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Enabling flexible scheduling and surge driver incentives can help meet demand at these times.


#### Chart - 10: Pickup Point vs Time Slot

In [None]:
# Chart - 10 visualization code
plt.figure(figsize=(8,5))
sns.countplot(data=df, x='Request Time Slot', hue='Pickup point', order=['Early Morning','Morning','Afternoon','Evening','Night'], palette='pastel')
plt.title("Pickup Point Distribution by Time Slot")
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To see which pickup points are more active at different times.

##### 2. What is/are the insight(s) found from the chart?

Airport dominates Evening and Night pickups; City is busier during the day.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Airport shifts should be prioritized during early hours and night hours; City during the day.

#### Chart - 11: Cancellations by Hour

In [None]:
# Chart - 11 visualization code
cancelled = df[df['Status'] == 'Cancelled']
plt.figure(figsize=(10,4))
sns.countplot(data=cancelled, x='Request Hour', hue='Request Hour', palette='Reds', legend=False)
plt.title("Cancellations by Hour")
plt.xlabel("Hour of Day")
plt.ylabel("Cancelled Requests")
plt.show()


##### 1. Why did you pick the specific chart?

To pinpoint exact hours of high cancellation behavior.


##### 2. What is/are the insight(s) found from the chart?

Most cancellations happen between 7 AM and 10 AM.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

High morning cancellations may cause user frustration. Uber must understand driver reluctance at this time.

#### Chart - 12: No Cars Available by Hour

In [None]:
# Chart - 12 visualization code
no_cars = df[df['Status'] == 'No Cars Available']
plt.figure(figsize=(10,4))
sns.countplot(data=no_cars, x='Request Hour', hue='Request Hour', palette='Blues', legend=False)
plt.title("No Cars Available by Hour")
plt.xlabel("Hour of Day")
plt.ylabel("Unfulfilled Requests")
plt.show()


##### 1. Why did you pick the specific chart?

To check driver unavailability by hour.

##### 2. What is/are the insight(s) found from the chart?

The evening and night hours (17 to 23) show the highest lack of cars.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Uber can introduce incentives like surge pricing to attract night drivers and fix this supply issue.

#### Chart - 13: Completed vs Unfulfilled (Manual Grouping)

In [None]:
# Chart - 13 visualization code
df['Request Status Group'] = df['Status'].apply(lambda x: 'Fulfilled' if x == 'Trip Completed' else 'Unfulfilled')

plt.figure(figsize=(6,4))
sns.countplot(data=df, x='Request Status Group', hue='Request Status Group', palette='Accent', legend=False)
plt.title("Fulfilled vs Unfulfilled Requests")
plt.ylabel("Total Requests")
plt.show()


##### 1. Why did you pick the specific chart?

To summarize all failed trips in one bar (No Cars + Cancelled).

##### 2. What is/are the insight(s) found from the chart?

Unfulfilled requests outnumber completed ones.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This visualization simplifies Uber’s core problem: high rate of unfulfilled requests which is an immediate priority.
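The same fulfilled/unfulfilled split can be expressed as a rate per time slot, which gives a single number for the size of the gap. A minimal sketch on invented data:

```python
import pandas as pd

# Toy data (invented): four requests in each of two slots
toy = pd.DataFrame({
    "Request Time Slot": ["Night"] * 4 + ["Afternoon"] * 4,
    "Status": ["No Cars Available", "No Cars Available", "Cancelled", "Trip Completed",
               "Trip Completed", "Trip Completed", "Trip Completed", "Cancelled"],
})

# Flag every request that did not end in a completed trip
toy["Unfulfilled"] = toy["Status"] != "Trip Completed"

# Total requests, unfulfilled requests, and unfulfilled percentage per slot
gap = toy.groupby("Request Time Slot")["Unfulfilled"].agg(
    requests="size", unfulfilled="sum"
)
gap["unfulfilled_pct"] = (gap["unfulfilled"] / gap["requests"] * 100).round(1)
print(gap)
```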

#### Chart - 14: Stacked Status by Time Slot (using Crosstab)



In [None]:
# Chart - 14 visualization code
pd.crosstab(df['Request Time Slot'], df['Status']).plot(kind='bar', stacked=True, figsize=(10,6), colormap='viridis')
plt.title("Stacked Trip Status by Time Slot")
plt.ylabel("Number of Requests")
plt.xticks(rotation=45)
plt.show()


##### 1. Why did you pick the specific chart?

To combine three dimensions: time, status, and volume.

##### 2. What is/are the insight(s) found from the chart?

Night and Early Morning slots have large unfulfilled bars.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This gives a clear visual of **when** and **why** Uber loses rides. Stakeholders can use this to optimize driver shifts and customer satisfaction.


#### Chart - 15: Time Slot vs Pickup Point vs Status (Facet Grid)

In [None]:
# Chart - 15 visualization code
g = sns.catplot(data=df, x='Request Time Slot', hue='Status', col='Pickup point',
                kind='count', col_order=['City', 'Airport'],
                order=['Early Morning','Morning','Afternoon','Evening','Night'],
                height=5, aspect=1.2, palette='tab10')
g.fig.suptitle("Trip Status by Time Slot and Pickup Point", y=1.03)
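# Rotate tick labels on every facet (plt.xticks only affects the last one)
for ax in g.axes.flat:
    ax.tick_params(axis='x', rotation=45)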
plt.show()

##### 1. Why did you pick the specific chart?

This chart helps compare status outcomes across different time slots for both pickup points side by side.


##### 2. What is/are the insight(s) found from the chart?

- Airport has more 'No Cars Available' at night.
- City shows higher 'Cancelled' rides in the morning.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

With this breakdown, Uber can localize driver incentives based on **time + location**, reducing specific supply-demand gaps.

#### Chart - 16: Pie Charts – Status Ratio for City vs Airport

In [None]:
# Chart - 16 visualization code
import matplotlib.pyplot as plt

status_order = ['Trip Completed', 'Cancelled', 'No Cars Available']

fig, axes = plt.subplots(1, 2, figsize=(12, 6))

for i, loc in enumerate(['City', 'Airport']):
    data = df[df['Pickup point'] == loc]['Status'].value_counts().reindex(status_order, fill_value=0)

    axes[i].pie(data, labels=data.index, autopct='%1.1f%%', startangle=90, colors=sns.color_palette('pastel'))
    axes[i].set_title(f"{loc} - Trip Status")

plt.tight_layout()
plt.show()


##### 1. Why did you pick the specific chart?

Pie charts offer a quick comparison of status proportions between City and Airport.

##### 2. What is/are the insight(s) found from the chart?

The Airport has a smaller share of Completed rides than the City. Unfulfilled requests dominate at both pickup points, and ‘No Cars Available’ is more common at the Airport.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This highlights the need for Airport-specific driver recruitment or scheduling strategy.

#### Chart - 17: Heatmap – Status Distribution Across Hours and Pickup Point

In [None]:
# Chart - 17 visualization code
pivot = pd.crosstab(df['Request Hour'], df['Pickup point'] + " - " + df['Status'])
plt.figure(figsize=(12,6))
sns.heatmap(pivot, cmap='YlGnBu', linewidths=0.5)
plt.title("Status by Hour and Pickup Point")
plt.xlabel("Pickup Point - Status")
plt.ylabel("Request Hour")
plt.show()


##### 1. Why did you pick the specific chart?

A heatmap offers a dense view of patterns over hours and locations, across multiple statuses.


##### 2. What is/are the insight(s) found from the chart?

- City has more cancellations from 7–10 AM.
- Airport faces severe car shortages between 5–9 PM.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Uber can use this grid to plan **driver shifts hourly**, especially in airport zones during low-supply hours.
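The hottest cell in each heatmap column can also be pulled out programmatically. The sketch below builds the same kind of hour-by-(pickup point and status) table on invented rows and reports the peak hour for each combination.

```python
import pandas as pd

# Toy data (invented)
toy = pd.DataFrame({
    "Request Hour": [8, 8, 9, 18, 18, 18, 19],
    "Pickup point": ["City", "City", "City",
                     "Airport", "Airport", "Airport", "Airport"],
    "Status": ["Cancelled", "Cancelled", "Cancelled",
               "No Cars Available", "No Cars Available",
               "No Cars Available", "No Cars Available"],
})

# Rows: hour of day; columns: "<pickup point> - <status>"
pivot = pd.crosstab(
    toy["Request Hour"], toy["Pickup point"] + " - " + toy["Status"]
)

# Hour with the highest count for each pickup/status combination
peak_hour = pivot.idxmax()
print(peak_hour)
```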

#### Chart - 18: Boxplot – Request Hour by Trip Status

In [None]:
# Chart - 18 visualization code
plt.figure(figsize=(8,5))
sns.boxplot(data=df, x='Status', hue='Status', y='Request Hour', palette='Set3', legend=False)
plt.title("Distribution of Request Hour by Trip Status")
plt.show()

##### 1. Why did you pick the specific chart?

To compare the typical request hour (median and spread) for each trip status.

##### 2. What is/are the insight(s) found from the chart?

- Cancelled rides cluster around 8–10 AM.
- No Cars Available spreads widely, mostly evening-night.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Shows when to anticipate cancellations and drop-offs, helping Uber forecast **hour-wise issues**.
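The numbers a boxplot draws (quartiles and interquartile range) can be computed directly, which makes "clustered" versus "spread widely" easier to state precisely. A sketch on invented hours:

```python
import pandas as pd

# Toy data (invented): tightly clustered vs widely spread request hours
toy = pd.DataFrame({
    "Status": ["Cancelled"] * 5 + ["No Cars Available"] * 5,
    "Request Hour": [7, 8, 9, 9, 10, 1, 5, 19, 20, 22],
})

# 25th, 50th, and 75th percentile of the request hour per status
quartiles = (
    toy.groupby("Status")["Request Hour"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)

# Interquartile range: the height of the box in the boxplot
quartiles["IQR"] = quartiles[0.75] - quartiles[0.25]
print(quartiles)
```

Keep in mind that hour of day is circular (23 is next to 0), so a wide box can partly be an artifact of requests that straddle midnight.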


#### Chart - 19 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
numeric_cols = df.select_dtypes(include='number')


corr_matrix = numeric_cols.corr()

plt.figure(figsize=(8,6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap")
plt.show()



##### 1. Why did you pick the specific chart?

A correlation heatmap is useful to check for linear relationships between **numeric variables**, such as `Request Hour`, `Request Status Group`, or any derived numerical metrics.


##### 2. What is/are the insight(s) found from the chart?

- Since the dataset is mostly categorical, only `Request Hour` and any other numeric variables like Driver id and request id are shown.
- There are negative correlations present between `Request Hour` and driver id.


#### Chart - 20 - Pair Plot

In [None]:
# Pair Plot visualization code
df_encoded = df.copy()

status_map = {'Trip Completed': 0, 'Cancelled': 1, 'No Cars Available': 2}
df_encoded['Status_Num'] = df_encoded['Status'].map(status_map)

pickup_map = {'City': 0, 'Airport': 1}
df_encoded['Pickup_Num'] = df_encoded['Pickup point'].map(pickup_map)

features = ['Request Hour', 'Pickup_Num', 'Status_Num']

sns.pairplot(df_encoded[features], hue='Status_Num', palette='husl', diag_kind='kde', corner=True)
plt.suptitle("Pair Plot of Numerical Features Colored by Status", y=1.02)
plt.show()


##### 1. Why did you pick the specific chart?

A pair plot helps visualize relationships between multiple numerical variables in pairs — using scatterplots and histograms.

##### 2. What is/are the insight(s) found from the chart?

- Trip statuses are clearly distinguishable based on `Request Hour`.
- ‘No Cars Available’ rides cluster in lower hours (Night/Early Morning).
- Pickup location seems less influential in this encoded context.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

**Optimize Driver Scheduling with Time-Based and Location-Based Insights**

Challenge: Unmet requests are highest during the Night and Early Morning from the Airport, because no cars are available or drivers cancel.


Solution: Data-driven shift planning:


- When assigning or incentivising drivers to cover shifts, include dedicated Night and Early Morning shifts at the Airport.


- Use past requests to estimate hourly demand and deploy resources accordingly.
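A very simple way to estimate hourly demand from past requests is to average the request count per hour across days. This is a rough sketch on invented data, assuming a `Request Date` column exists alongside `Request Hour`; it is a baseline, not a forecasting model.

```python
import pandas as pd

# Toy data (invented): two days of requests
toy = pd.DataFrame({
    "Request Date": ["day 1"] * 3 + ["day 2"] * 5,
    "Request Hour": [5, 5, 20, 5, 5, 5, 5, 20],
})

# Requests per day and hour; hours with no requests on a day count as 0
daily = (
    toy.groupby(["Request Date", "Request Hour"])
    .size()
    .unstack(fill_value=0)
)

# Average number of requests per hour across all days
avg_demand = daily.mean()
print(avg_demand)
```

Comparing this average against the average number of completed trips in the same hour would give a first estimate of how many extra drivers each hour needs.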

**Introduce Targeted Incentives for Low Supply Periods**

- Offer “Rush Hour Rewards” or night-time surge bonuses for drivers operating in critical slots (e.g., 12 AM – 4 AM).

- Tie bonuses to actual ride completions (not just logins) to avoid idle logins without service.

**Reduce Cancellations Through Driver Penalty & Reward System**

- Identify drivers with repeated cancellations during peak hours and create a flag/feedback system.

- Reward drivers who accept and complete early morning rides from the Airport, a high-cancellation zone.



# **Conclusion**

This EDA has revealed key operational inefficiencies in Uber's service — especially around **early mornings**, **airport rides**, and **driver cancellations**.  
Business recommendations:
- Schedule night/early morning drivers near airports
- Provide morning rush-hour driver incentives
- Analyze behavior-based cancellation trends

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***