
# Day 15 - Uber
You are a Business Analyst on the **Uber** Pool Product Team working to optimize driver compensation. The team aims to understand how trip characteristics impact driver earnings. Your goal is to develop data-driven recommendations that maximize driver earnings potential.

In [2]:
# Import required libraries
import pandas as pd
import numpy as np

# Import data file
url = "https://raw.githubusercontent.com/AnamHJ24/datascience-python-challenges/refs/heads/main/Data/Day_15.txt"
fct_trips = pd.read_csv(url)
fct_trips.head()


   trip_id  driver_id ride_type   trip_date  rider_count  total_distance  total_earnings
0      101          1  UberPool  2024-07-05            3            10.5            22.5
1      102          1  UberPool  2024-07-15            2             8.0            18.0
2      103          2  UberPool  2024-08-10            4            15.0            35.0
3      104          3     UberX  2024-07-20            1             5.0            12.0
4      105          2  UberPool  2024-09-01            3            12.0            30.0
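As a quick check on the loaded data, the sketch below parses `trip_date` while reading and inspects dtypes and missing values. It assumes only the five preview rows shown above, inlined as text so it runs without network access; `sample_csv` and `sample_trips` are illustrative names, not part of the notebook.

```python
import io

import pandas as pd

# Assumption: the five preview rows only, inlined instead of downloaded.
sample_csv = """trip_id,driver_id,ride_type,trip_date,rider_count,total_distance,total_earnings
101,1,UberPool,2024-07-05,3,10.5,22.5
102,1,UberPool,2024-07-15,2,8.0,18.0
103,2,UberPool,2024-08-10,4,15.0,35.0
104,3,UberX,2024-07-20,1,5.0,12.0
105,2,UberPool,2024-09-01,3,12.0,30.0
"""

# parse_dates converts trip_date to datetime64 at load time,
# so no separate pd.to_datetime call is needed afterwards.
sample_trips = pd.read_csv(io.StringIO(sample_csv), parse_dates=["trip_date"])

print(sample_trips.dtypes)        # column types after loading
print(sample_trips.isna().sum())  # missing values per column
```

Parsing dates at load time is a design choice; converting afterwards with `pd.to_datetime`, as the solution to Question 1 does, gives the same column type.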


## Question 1
What is the average driver earnings per completed UberPool ride with more than two riders between July 1st and September 30th, 2024? This analysis will help isolate trips that meet specific rider thresholds to understand their impact on driver earnings.

## Solution

In [3]:
# Convert required column to datetime
fct_trips['trip_date'] = pd.to_datetime(fct_trips['trip_date'])

# Keep trips dated between July 1st and September 30th, 2024 (both bounds inclusive)
july_sep_data = fct_trips[fct_trips['trip_date'].between('2024-07-01', '2024-09-30')]

# Filter data for rides with more than two riders for UberPool
multiple_riders_uberpool = july_sep_data[
  (july_sep_data['rider_count'] > 2) &
  (july_sep_data['ride_type'] == 'UberPool')]

# Calculate average earnings per ride
avg_earnings = multiple_riders_uberpool['total_earnings'].mean()

print(f"Average driver earnings per completed UberPool ride with more than two riders between July 1st and September 30th: ${avg_earnings:.2f}")

Average driver earnings per completed UberPool ride with more than two riders between July 1st and September 30th: $36.05
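The same three conditions can also be combined into a single boolean mask. The sketch below does this on the five preview rows only (an assumption made so it is self-contained), so its result differs from the full-dataset figure of $36.05. `sample_trips`, `mask`, and `sample_avg` are illustrative names.

```python
import pandas as pd

# Assumption: only the five preview rows from fct_trips.head(), not the full dataset.
sample_trips = pd.DataFrame({
    "trip_id": [101, 102, 103, 104, 105],
    "ride_type": ["UberPool", "UberPool", "UberPool", "UberX", "UberPool"],
    "trip_date": pd.to_datetime(
        ["2024-07-05", "2024-07-15", "2024-08-10", "2024-07-20", "2024-09-01"]
    ),
    "rider_count": [3, 2, 4, 1, 3],
    "total_earnings": [22.5, 18.0, 35.0, 12.0, 30.0],
})

# One mask: date window, ride type, and rider threshold together.
mask = (
    sample_trips["trip_date"].between("2024-07-01", "2024-09-30")
    & (sample_trips["ride_type"] == "UberPool")
    & (sample_trips["rider_count"] > 2)
)

sample_avg = sample_trips.loc[mask, "total_earnings"].mean()
print(f"{mask.sum()} qualifying rides, average earnings ${sample_avg:.2f}")
# prints: 3 qualifying rides, average earnings $29.17
```

Note that the upper bound `'2024-09-30'` is interpreted as midnight, so this filter is only safe because `trip_date` carries no time component.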


## Question 2
For completed UberPool rides between July 1st and September 30th, 2024, derive a new column calculating earnings per mile (total_earnings divided by total_distance) and then compute the average earnings per mile for rides with more than two riders. This calculation will reveal efficiency metrics for driver compensation.

## Solution

In [4]:
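# Work on a copy so adding a column does not trigger SettingWithCopyWarning
multiple_riders_uberpool = multiple_riders_uberpool.copy()

# Earnings per mile = total_earnings / total_distance, rounded to 2 decimals
multiple_riders_uberpool['earnings_per_mile'] = (multiple_riders_uberpool['total_earnings'] / multiple_riders_uberpool['total_distance']).round(2)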

print(multiple_riders_uberpool.head())

   trip_id  driver_id ride_type  trip_date  rider_count  total_distance  \
0      101          1  UberPool 2024-07-05            3            10.5   
2      103          2  UberPool 2024-08-10            4            15.0   
4      105          2  UberPool 2024-09-01            3            12.0   
5      106          4  UberPool 2024-09-15            5            20.0   
7      108          5  UberPool 2024-08-25            4            11.0   

   total_earnings  earnings_per_mile  
0            22.5               2.14  
2            35.0               2.33  
4            30.0               2.50  
5            50.0               2.50  
7            28.0               2.55  


In [5]:
# Calculate average earnings per mile
avg_earnings_per_mile = multiple_riders_uberpool['earnings_per_mile'].mean()
print(f"Average UberPool driver earnings per mile for rides with more than two riders between July 1st and September 30th: ${avg_earnings_per_mile}")

Average UberPool driver earnings per mile for rides with more than two riders between July 1st and September 30th: $2.393
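"Average earnings per mile" can be read two ways: the mean of per-ride ratios (what the solution computes, after rounding each ratio) or total earnings divided by total miles. The sketch below compares them on the five rows printed above only, so the numbers are illustrative and do not reproduce the full-dataset result. `rides` and the three result names are hypothetical.

```python
import pandas as pd

# Assumption: only the five rows printed by head() above, not the full filtered set.
rides = pd.DataFrame({
    "trip_id": [101, 103, 105, 106, 108],
    "total_distance": [10.5, 15.0, 12.0, 20.0, 11.0],
    "total_earnings": [22.5, 35.0, 30.0, 50.0, 28.0],
})

per_ride = rides["total_earnings"] / rides["total_distance"]

# 1) Mean of per-ride ratios: every ride counts equally.
mean_of_ratios = per_ride.mean()

# 2) Same, but rounding each ratio first (as the solution does); adds small rounding error.
mean_of_rounded = per_ride.round(2).mean()

# 3) Ratio of totals: every mile counts equally, so longer rides weigh more.
ratio_of_totals = rides["total_earnings"].sum() / rides["total_distance"].sum()

print(f"mean of ratios:  {mean_of_ratios:.4f}")
print(f"mean of rounded: {mean_of_rounded:.4f}")
print(f"ratio of totals: {ratio_of_totals:.4f}")
```

The question as worded asks for the first reading; the ratio of totals is the better fit when the goal is earnings per mile driven across all rides.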


## Question 3
Identify the combination of rider count and total distance that results in the highest average driver earnings per UberPool ride between July 1st and September 30th, 2024. This analysis informs recommendations on which trip combinations maximize driver earnings.

## Solution

In [6]:
# Keep only UberPool rides within the date window
uberpool_riders = july_sep_data[july_sep_data['ride_type'] == 'UberPool']

# Calculate average earnings per (rider_count, total_distance) combination
avg_earnings_per_combo = uberpool_riders.groupby(['rider_count', 'total_distance'])['total_earnings'].mean().reset_index()

# Find the combination with the highest average earnings
optimal_combo = avg_earnings_per_combo.loc[avg_earnings_per_combo['total_earnings'].idxmax()]

print(f"The optimal combination is {optimal_combo['rider_count']} riders with a distance of {optimal_combo['total_distance']} miles, yielding average earnings of ${optimal_combo['total_earnings']} per ride.")

The optimal combination is 5.0 riders with a distance of 25.0 miles, yielding average earnings of $60.0 per ride.
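The rider count prints as `5.0` because selecting one row with `.loc` returns a Series with a single dtype, which upcasts the integer `rider_count` to float. A minimal sketch of an alternative follows, assuming only the five rows printed in Question 2 (so the winning combination here is not the full-dataset answer). It keeps the result as a DataFrame and uses `nlargest(..., keep="all")` so ties for first place are not silently dropped. `rides`, `combo_avg`, and `top` are illustrative names.

```python
import pandas as pd

# Assumption: only the five rows printed in Question 2, not the full dataset.
rides = pd.DataFrame({
    "rider_count": [3, 4, 3, 5, 4],
    "total_distance": [10.5, 15.0, 12.0, 20.0, 11.0],
    "total_earnings": [22.5, 35.0, 30.0, 50.0, 28.0],
})

# Average earnings per (rider_count, total_distance) combination.
combo_avg = rides.groupby(
    ["rider_count", "total_distance"], as_index=False
)["total_earnings"].mean()

# Keep the result as a DataFrame so each column retains its own dtype;
# keep="all" returns every combination tied for the highest average.
top = combo_avg.nlargest(1, "total_earnings", keep="all")

for row in top.itertuples(index=False):
    print(
        f"{row.rider_count} riders over {row.total_distance} miles: "
        f"${row.total_earnings:.2f} average earnings per ride"
    )
```

Because each combination in a small dataset may rest on a single ride, it can also be worth reporting the ride count per group before recommending a combination.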
