#**Drivers Lifetime Value**

##**Assignment**

After exploring and analyzing the data, please:

1. Recommend a Driver's Lifetime Value (i.e., the value of a driver to Lyft over the entire projected lifetime of a driver).
2. Please answer the following questions:
- What are the main factors that affect a driver's lifetime value?
- What is the average projected lifetime of a driver? That is, once a driver is onboarded, how long do they typically continue driving with Lyft?
- Do all drivers act alike? Are there specific segments of drivers that generate more value for Lyft than the average driver?
- What actionable recommendations are there for the business?

3. Prepare and submit a writeup of your findings for consumption by a cross-functional audience.

You can make the following assumptions about the Lyft rate card:

- Base Fare: USD 2.00
- Cost per Mile: USD 1.15
- Cost per Minute: USD 0.22
- Service Fee: USD 1.75
- Minimum Fare: USD 5.00
- Maximum Fare: USD 400.00

##**Data Description**

You'll find three CSV files attached with the following data:

**driver_ids.csv**

- driver_id Unique identifier for a driver
- driver_onboard_date Date on which driver was on-boarded

**ride_ids.csv**

- driver_id Unique identifier for a driver
- ride_id Unique identifier for a ride that was completed by the driver
- ride_distance Ride distance in meters
- ride_duration Ride duration in seconds
- ride_prime_time Prime Time applied on the ride

**ride_timestamps.csv**

- ride_id Unique identifier for a ride
- event describes the type of event; this variable takes the following values:
  - requested_at - passenger requested a ride
  - accepted_at - driver accepted a passenger request
  - arrived_at - driver arrived at pickup point
  - picked_up_at - driver picked up the passenger
  - dropped_off_at - driver dropped off a passenger at destination

- timestamp Time of event

You can assume that:

- All rides in the data set occurred in San Francisco
- All timestamps in the data set are in UTC


###**Practicalities**
Please work on the questions in the displayed order. Make sure that the solution reflects your entire thought process; how the code is structured matters more than the final answers.

#### To download the dataset <a href="https://drive.google.com/drive/folders/1ZCuQJMgTfsdLnJIMBkZK36FjXXdeqzBM?usp=sharing"> Click here </a>

In [3]:
import pandas as pd


driver_ids = pd.read_csv('driver_ids.csv')
ride_ids = pd.read_csv('ride_ids.csv')
ride_timestamps = pd.read_csv('ride_timestamps.csv')


print("First few rows of driver_ids:")
print(driver_ids.head())

print("\nFirst few rows of ride_ids:")
print(ride_ids.head())

print("\nFirst few rows of ride_timestamps:")
print(ride_timestamps.head())

# Check for missing values
print("\nMissing values in driver_ids:")
print(driver_ids.isnull().sum())

print("\nMissing values in ride_ids:")
print(ride_ids.isnull().sum())

print("\nMissing values in ride_timestamps:")
print(ride_timestamps.isnull().sum())

# Get summary statistics for numeric columns
print("\nSummary statistics for driver_ids:")
print(driver_ids.describe())

print("\nSummary statistics for ride_ids:")
print(ride_ids.describe())


First few rows of driver_ids:
                          driver_id  driver_onboard_date
0  002be0ffdc997bd5c50703158b7c2491  2016-03-29 00:00:00
1  007f0389f9c7b03ef97098422f902e62  2016-03-29 00:00:00
2  011e5c5dfc5c2c92501b8b24d47509bc  2016-04-05 00:00:00
3  0152a2f305e71d26cc964f8d4411add9  2016-04-23 00:00:00
4  01674381af7edd264113d4e6ed55ecda  2016-04-29 00:00:00

First few rows of ride_ids:
                          driver_id                           ride_id  \
0  002be0ffdc997bd5c50703158b7c2491  006d61cf7446e682f7bc50b0f8a5bea5   
1  002be0ffdc997bd5c50703158b7c2491  01b522c5c3a756fbdb12e95e87507eda   
2  002be0ffdc997bd5c50703158b7c2491  029227c4c2971ce69ff2274dc798ef43   
3  002be0ffdc997bd5c50703158b7c2491  034e861343a63ac3c18a9ceb1ce0ac69   
4  002be0ffdc997bd5c50703158b7c2491  034f2e614a2f9fc7f1c2f77647d1b981   

   ride_distance  ride_duration  ride_prime_time  
0           1811            327               50  
1           3362            809                0  
2      

In [4]:
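# Remove entries with a negative ride_distance or ride_duration
# (.copy() avoids pandas' SettingWithCopyWarning when columns are added below)
ride_ids = ride_ids[(ride_ids['ride_distance'] >= 0) & (ride_ids['ride_duration'] >= 0)].copy()

# Convert ride_distance from meters to miles and ride_duration from seconds to minutes
ride_ids['ride_distance_miles'] = ride_ids['ride_distance'] * 0.000621371
ride_ids['ride_duration_minutes'] = ride_ids['ride_duration'] / 60

# Rate card values used for the per-ride fare
# (the minimum and maximum fare from the rate card are not applied in this cell)
base_fare = 2.00
cost_per_mile = 1.15
cost_per_minute = 0.22
service_fee = 1.75

# Fare for each ride: base fare + distance charge + time charge + service fee
ride_ids['earnings'] = base_fare + (cost_per_mile * ride_ids['ride_distance_miles']) + (cost_per_minute * ride_ids['ride_duration_minutes']) + service_fee
# Apply the Prime Time percentage as a multiplier on the whole fare, service fee included
ride_ids['earnings'] *= (1 + ride_ids['ride_prime_time'] / 100)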

print("First few rows of ride_ids with earnings:")
print(ride_ids.head())


First few rows of ride_ids with earnings:
                          driver_id                           ride_id  \
0  002be0ffdc997bd5c50703158b7c2491  006d61cf7446e682f7bc50b0f8a5bea5   
1  002be0ffdc997bd5c50703158b7c2491  01b522c5c3a756fbdb12e95e87507eda   
2  002be0ffdc997bd5c50703158b7c2491  029227c4c2971ce69ff2274dc798ef43   
3  002be0ffdc997bd5c50703158b7c2491  034e861343a63ac3c18a9ceb1ce0ac69   
4  002be0ffdc997bd5c50703158b7c2491  034f2e614a2f9fc7f1c2f77647d1b981   

   ride_distance  ride_duration  ride_prime_time  ride_distance_miles  \
0           1811            327               50             1.125303   
1           3362            809                0             2.089049   
2           3282            572                0             2.039340   
3          65283           3338               25            40.564963   
4           4115            823              100             2.556942   

   ride_duration_minutes   earnings  
0               5.450000   9.364647  
1   
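
The cell above does not apply the minimum and maximum fare from the rate card. The following is a minimal sketch of how those bounds could be added. It assumes, as the cell above does, that Prime Time multiplies the whole fare including the service fee, and it further assumes that the bounds are applied last; the assignment does not state either rule. The function name `ride_fare` is hypothetical.

```python
# Rate card values from the assignment.
BASE_FARE = 2.00
COST_PER_MILE = 1.15
COST_PER_MINUTE = 0.22
SERVICE_FEE = 1.75
MIN_FARE = 5.00
MAX_FARE = 400.00

METERS_PER_MILE = 1609.344


def ride_fare(distance_m, duration_s, prime_time_pct):
    """Fare for one ride in USD (hypothetical helper, see assumptions above)."""
    miles = distance_m / METERS_PER_MILE
    minutes = duration_s / 60
    fare = BASE_FARE + COST_PER_MILE * miles + COST_PER_MINUTE * minutes + SERVICE_FEE
    # Assumption: Prime Time scales the whole fare, as in the notebook cell.
    fare *= 1 + prime_time_pct / 100
    # Assumption: minimum and maximum fare are applied after Prime Time.
    return min(max(fare, MIN_FARE), MAX_FARE)


# First ride in the printed output: 1811 m, 327 s, 50% Prime Time.
print(round(ride_fare(1811, 327, 50), 2))  # 9.36, matching the 9.364647 shown above
print(ride_fare(0, 0, 0))                  # 5.0, the minimum fare applies
print(ride_fare(10**6, 0, 0))              # 400.0, the maximum fare applies
```

On the full data set the same bounds could be applied to the `earnings` column with `Series.clip(lower=5.00, upper=400.00)`. This would change the totals reported in the cells that follow.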

In [5]:
driver_stats = ride_ids.groupby('driver_id').agg(
    total_earnings=('earnings', 'sum'),
    num_rides=('ride_id', 'count'),
    avg_ride_distance=('ride_distance_miles', 'mean'),
    avg_ride_duration=('ride_duration_minutes', 'mean'),
    avg_prime_time=('ride_prime_time', 'mean')
).reset_index()

print("First few rows of driver_stats:")
print(driver_stats.head())

print("\nSummary statistics for driver_stats:")
print(driver_stats.describe())


First few rows of driver_stats:
                          driver_id  total_earnings  num_rides  \
0  002be0ffdc997bd5c50703158b7c2491     3654.608316        277   
1  007f0389f9c7b03ef97098422f902e62      332.432167         31   
2  011e5c5dfc5c2c92501b8b24d47509bc      494.240288         34   
3  0152a2f305e71d26cc964f8d4411add9     2644.773419        191   
4  01674381af7edd264113d4e6ed55ecda     5463.216339        375   

   avg_ride_distance  avg_ride_duration  avg_prime_time  
0           3.903841          13.311552       19.404332  
1           2.355818          11.019892       20.161290  
2           4.928075          14.316176       19.852941  
3           4.786310          15.228709       10.732984  
4           5.175845          15.886356       12.533333  

Summary statistics for driver_stats:
       total_earnings   num_rides  avg_ride_distance  avg_ride_duration  \
count      937.000000  937.000000         937.000000         937.000000   
mean      2857.356554  206.511206  

**Total Earnings:**

- Mean total earnings per driver: $2,857.36
- Total earnings range from $24.60 to $12,640.20, indicating a wide variation in driver performance.

**Number of Rides:**

- Mean number of rides per driver: 206.51
- The number of rides ranges from 3 to 919, showing that some drivers are much more active than others.

**Average Ride Distance and Duration:**

- Mean average ride distance: 4.47 miles
- Mean average ride duration: 14.29 minutes
- Both metrics show moderate variation, suggesting different driving patterns among drivers.

**Average Prime Time:**

- Mean average prime time: 16.17%
- Average prime time ranges from 0% to 57.39%, indicating that some drivers take advantage of peak hours more than others.

In [6]:
driver_ids['driver_onboard_date'] = pd.to_datetime(driver_ids['driver_onboard_date'])

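# Drop-off time of one ride per driver, used as the driver's last ride date.
# NOTE: max() on ride_id compares the ID strings, so it selects the lexicographically
# largest ride_id for each driver, which is not necessarily the most recent ride.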
last_ride_dates = ride_ids.groupby('driver_id')['ride_id'].max().reset_index()
last_ride_dates = last_ride_dates.merge(ride_timestamps[ride_timestamps['event'] == 'dropped_off_at'][['ride_id', 'timestamp']], on='ride_id')
last_ride_dates = last_ride_dates.groupby('driver_id')['timestamp'].max().reset_index()
last_ride_dates.columns = ['driver_id', 'last_ride_date']

last_ride_dates['last_ride_date'] = pd.to_datetime(last_ride_dates['last_ride_date'])

driver_lifetime = driver_ids.merge(last_ride_dates, on='driver_id')

driver_lifetime['active_days'] = (driver_lifetime['last_ride_date'] - driver_lifetime['driver_onboard_date']).dt.days

average_active_days = driver_lifetime['active_days'].mean()

print("Average projected lifetime of a driver (in days):", average_active_days)



Average projected lifetime of a driver (in days): 28.41457586618877


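The average span between a driver's onboarding date and the recorded last ride is approximately 28.41 days. This relatively short duration suggests that drivers might not stay active with Lyft for long periods, but it should be read with two caveats. First, it is an observed span within the data window rather than a projection: drivers who were still active when the data ends are counted only up to their last recorded ride. Second, the ride used for each driver is selected by the largest `ride_id` rather than by the latest timestamp, so the figure is a lower bound on the observed span.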
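
As an illustration of the second point, the sketch below takes the latest `dropped_off_at` timestamp across all of a driver's rides instead of the timestamp of a single ride chosen by `ride_id`. The three small frames are toy stand-ins for the CSV files with made-up values, and the variable names `dropoffs`, `rides`, and `lifetime` are introduced here for illustration only.

```python
import pandas as pd

# Toy stand-ins for the three CSV files (illustrative values only).
driver_ids = pd.DataFrame({
    'driver_id': ['d1', 'd2'],
    'driver_onboard_date': pd.to_datetime(['2016-04-01', '2016-04-10']),
})
ride_ids = pd.DataFrame({
    'driver_id': ['d1', 'd1', 'd2'],
    'ride_id': ['zz', 'aa', 'mm'],  # 'zz' sorts last but is d1's earlier ride
})
ride_timestamps = pd.DataFrame({
    'ride_id': ['zz', 'aa', 'mm'],
    'event': ['dropped_off_at'] * 3,
    'timestamp': pd.to_datetime(
        ['2016-04-03 10:00', '2016-05-01 18:30', '2016-04-20 09:00']
    ),
})

# Keep only drop-off events, then attach them to every ride of every driver.
dropoffs = ride_timestamps.loc[
    ride_timestamps['event'] == 'dropped_off_at', ['ride_id', 'timestamp']
]
rides = ride_ids[['driver_id', 'ride_id']].merge(dropoffs, on='ride_id', how='inner')

# Latest drop-off per driver, selected by time rather than by ride_id.
last_ride = (
    rides.groupby('driver_id')['timestamp']
    .max()
    .rename('last_ride_date')
    .reset_index()
)

lifetime = driver_ids.merge(last_ride, on='driver_id', how='inner')
lifetime['active_days'] = (
    lifetime['last_ride_date'] - lifetime['driver_onboard_date']
).dt.days
print(lifetime[['driver_id', 'active_days']])
```

In this toy data, driver `d1` gets 30 active days; selecting the ride by the largest `ride_id` would instead pick ride `zz` and give 2 days. Running the same steps on the real files would change the 28.41-day figure, and the result would still be an observed span rather than a projected lifetime.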

In [7]:
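# Segments based on number of rides. With right=False the bins are left-closed,
# so the top bin is [500, max) and the driver with the maximum ride count
# falls outside every bin and gets NaN as activity_level.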
bins = [0, 50, 150, 300, 500, driver_stats['num_rides'].max()]
labels = ['Very Low', 'Low', 'Medium', 'High', 'Very High']
driver_stats['activity_level'] = pd.cut(driver_stats['num_rides'], bins=bins, labels=labels, right=False)

segment_analysis = driver_stats.groupby('activity_level').agg(
    avg_total_earnings=('total_earnings', 'mean'),
    avg_num_rides=('num_rides', 'mean'),
    avg_ride_distance=('avg_ride_distance', 'mean'),
    avg_ride_duration=('avg_ride_duration', 'mean'),
    avg_prime_time=('avg_prime_time', 'mean')
).reset_index()

print("Segment analysis:")
print(segment_analysis)



Segment analysis:
  activity_level  avg_total_earnings  avg_num_rides  avg_ride_distance  \
0       Very Low          504.676338      36.571984           4.516779   
1            Low         1075.368646      75.500000           4.923076   
2         Medium         3292.768870     237.816327           4.307216   
3           High         5310.768597     382.206030           4.315467   
4      Very High         8172.285810     602.898305           4.061401   

   avg_ride_duration  avg_prime_time  
0          14.132819       14.330742  
1          14.385212       14.991391  
2          14.326685       17.285318  
3          14.427252       17.629869  
4          14.068582       18.175883  
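
Because the bins in the cell above are left-closed and the last edge equals the maximum ride count, the most active driver is not assigned to any segment. A minimal sketch of one way to include that driver is to extend the last edge by one. The ride counts below are toy values chosen to sit on the bin boundaries, not values from the data set.

```python
import pandas as pd

# Toy ride counts on and around the bin edges (illustrative values only).
num_rides = pd.Series([3, 49, 50, 150, 499, 500, 919], name='num_rides')
labels = ['Very Low', 'Low', 'Medium', 'High', 'Very High']

# Bins as in the notebook cell: the top bin is [500, max), so the max is excluded.
original_bins = [0, 50, 150, 300, 500, int(num_rides.max())]
original = pd.cut(num_rides, bins=original_bins, labels=labels, right=False)

# Extending the last edge by one makes the top bin [500, max + 1).
inclusive_bins = [0, 50, 150, 300, 500, int(num_rides.max()) + 1]
activity_level = pd.cut(num_rides, bins=inclusive_bins, labels=labels, right=False)

print("Unassigned with original bins:", int(original.isna().sum()))
print("Unassigned with extended bins:", int(activity_level.isna().sum()))
print(activity_level.tolist())
```

Applied to the real `driver_stats` frame, this would add one driver to the Very High segment and shift that segment's averages slightly.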


###**Analysis of Driver Segments**

From the segment analysis, we observe the following:

**Very Low Activity Level:**

- Average total earnings: $504.68
- Average number of rides: 36.57
- Average ride distance: 4.52 miles
- Average ride duration: 14.13 minutes
- Average prime time: 14.33%

**Low Activity Level:**

- Average total earnings: $1,075.37
- Average number of rides: 75.50
- Average ride distance: 4.92 miles
- Average ride duration: 14.39 minutes
- Average prime time: 14.99%

**Medium Activity Level:**

- Average total earnings: $3,292.77
- Average number of rides: 237.82
- Average ride distance: 4.31 miles
- Average ride duration: 14.33 minutes
- Average prime time: 17.29%

**High Activity Level:**

- Average total earnings: $5,310.77
- Average number of rides: 382.21
- Average ride distance: 4.32 miles
- Average ride duration: 14.43 minutes
- Average prime time: 17.63%

**Very High Activity Level:**

- Average total earnings: $8,172.29
- Average number of rides: 602.90
- Average ride distance: 4.06 miles
- Average ride duration: 14.07 minutes
- Average prime time: 18.18%


###**Insights and Recommendations**

**Main Factors Affecting Driver's Lifetime Value:**

- Number of Rides: Drivers with higher ride counts generate more earnings. Average total earnings rise from $504.68 in the Very Low segment to $8,172.29 in the Very High segment.
- Prime Time: Higher prime time percentages are associated with higher earnings. Average prime time rises from 14.33% in the Very Low segment to 18.18% in the Very High segment.

**Average Projected Lifetime:**

- The average projected lifetime of a driver is approximately 28.41 days.

**Driver Segments:**

- High and Very High Activity Levels: These segments generate significantly more value for Lyft. They have higher earnings and take more advantage of prime time.
- Low and Very Low Activity Levels: These segments contribute less value, with fewer rides and lower earnings.

###**Actionable Recommendations**

**Retention Programs:**

- Incentivize High-Activity Drivers: Create reward programs for drivers who consistently have high ride counts.
- Retention Strategies: Offer benefits such as bonuses, discounted vehicle maintenance, and exclusive support for drivers in the High and Very High segments to encourage them to stay with Lyft longer.

**Encourage Prime Time Driving:**

- Provide incentives for driving during prime time hours, such as higher earnings multipliers, to increase overall driver earnings and satisfaction.

**Targeted Training and Support:**

- Training Programs: Implement training programs for drivers in the Low and Very Low segments to help them increase their ride counts and earnings.
- Personalized Support: Offer personalized support to new drivers to help them transition into higher activity levels more quickly.

**Data-Driven Insights:**

- Continuously analyze ride data to identify patterns and provide actionable insights to drivers on how to optimize their driving hours, routes, and prime time utilization.


**What are the main factors that affect a driver's lifetime value?**
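Analysis: The number of rides is the dominant factor: average total earnings rise from $504.68 in the Very Low activity segment to $8,172.29 in the Very High segment. Prime time is also associated with higher earnings, rising from 14.33% to 18.18% across the same segments. Average ride distance and average ride duration enter the fare formula, but they vary little between segments (4.06 to 4.92 miles and 14.07 to 14.43 minutes), so they do little to separate high-value drivers from low-value ones.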
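
One way to check these associations directly is to correlate each driver-level statistic with total earnings. The sketch below only illustrates the call: its five rows are the rounded segment averages from the segment analysis output, used as toy input, so the resulting coefficients are not evidence in themselves. On the real `driver_stats` frame the same call would run over all drivers.

```python
import pandas as pd

# Toy input: rounded segment averages from the segment analysis output.
driver_stats = pd.DataFrame({
    'total_earnings': [504.68, 1075.37, 3292.77, 5310.77, 8172.29],
    'num_rides': [36.57, 75.50, 237.82, 382.21, 602.90],
    'avg_ride_distance': [4.52, 4.92, 4.31, 4.32, 4.06],
    'avg_ride_duration': [14.13, 14.39, 14.33, 14.43, 14.07],
    'avg_prime_time': [14.33, 14.99, 17.29, 17.63, 18.18],
})

# Pearson correlation of every statistic with total earnings.
corr = driver_stats.corr()['total_earnings'].drop('total_earnings')
print(corr.sort_values(ascending=False))
```

A correlation only describes association; it does not show that, for example, driving more prime time rides would by itself raise a driver's total earnings.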

**What is the average projected lifetime of a driver?**
Analysis: The average projected lifetime of a driver is approximately 28.41 days.

**Do all drivers act alike? Are there specific segments of drivers that generate more value for Lyft than the average driver?**
Analysis: Drivers do not act alike. Segment analysis shows that drivers with high and very high activity levels generate significantly more value for Lyft. These drivers have higher earnings, more rides, and better utilization of prime time compared to those with lower activity levels.

**What actionable recommendations are there for the business?**

**Recommendations:**

- **Retention Programs:** Incentivize high-activity drivers and offer retention strategies to keep them engaged.
- **Encourage Prime Time Driving:** Provide incentives for driving during peak hours.
- **Targeted Training and Support:** Offer training and personalized support for drivers in lower activity segments.
- **Data-Driven Insights:** Continuously analyze data to provide actionable insights for optimizing driving hours and earnings.