#**Driver's Lifetime Value**

##**Assignment**

After exploring and analyzing the data, please:

1. Recommend a Driver's Lifetime Value (i.e., the value of a driver to Lyft over the entire projected lifetime of a driver).
2. Please answer the following questions:
- What are the main factors that affect a driver's lifetime value?
- What is the average projected lifetime of a driver? That is, once a driver is onboarded, how long do they typically continue driving with Lyft?
- Do all drivers act alike? Are there specific segments of drivers that generate more value for Lyft than the average driver?
- What actionable recommendations are there for the business?

3. Prepare and submit a writeup of your findings for consumption by a cross-functional audience.

You can make the following assumptions about the Lyft rate card:

- Base Fare: USD 2.00
- Cost per Mile: USD 1.15
- Cost per Minute: USD 0.22
- Service Fee: USD 1.75
- Minimum Fare: USD 5.00
- Maximum Fare: USD 400.00

##**Data Description**

You'll find three CSV files attached with the following data:

**driver_ids.csv**

- driver_id Unique identifier for a driver
- driver_onboard_date Date on which driver was on-boarded

**ride_ids.csv**

- driver_id Unique identifier for a driver
- ride_id Unique identifier for a ride that was completed by the driver
- ride_distance Ride distance in meters
- ride_duration Ride duration in seconds
- ride_prime_time Prime Time applied on the ride

**ride_timestamps.csv**

- ride_id Unique identifier for a ride
- event describes the type of event; this variable takes the following values:
  - requested_at - passenger requested a ride
  - accepted_at - driver accepted a passenger request
  - arrived_at - driver arrived at pickup point
  - picked_up_at - driver picked up the passenger
  - dropped_off_at - driver dropped off a passenger at destination

- timestamp Time of event

You can assume that:

- All rides in the data set occurred in San Francisco
- All timestamps in the data set are in UTC


###**Practicalities**
Please work on the questions in the displayed order. Make sure that the solution reflects your entire thought process: how the code is structured matters more than the final answers.

#### To download the dataset <a href="https://drive.google.com/drive/folders/1ZCuQJMgTfsdLnJIMBkZK36FjXXdeqzBM?usp=sharing"> Click here </a>

In [18]:
import pandas as pd

# Load the data
driver_ids = pd.read_csv("C:/Users/manoj/Downloads/driver_ids.csv")
ride_ids = pd.read_csv("C:/Users/manoj/Downloads/ride_ids.csv")
ride_timestamps = pd.read_csv("C:/Users/manoj/Downloads/ride_timestamps.csv")

# Display the first few rows of ride_timestamps
ride_timestamps.head()


Unnamed: 0,ride_id,event,timestamp
0,00003037a262d9ee40e61b5c0718f7f0,requested_at,2016-06-13 09:39:19
1,00003037a262d9ee40e61b5c0718f7f0,accepted_at,2016-06-13 09:39:51
2,00003037a262d9ee40e61b5c0718f7f0,arrived_at,2016-06-13 09:44:31
3,00003037a262d9ee40e61b5c0718f7f0,picked_up_at,2016-06-13 09:44:33
4,00003037a262d9ee40e61b5c0718f7f0,dropped_off_at,2016-06-13 10:03:05


In [19]:
ride_ids.head()

Unnamed: 0,driver_id,ride_id,ride_distance,ride_duration,ride_prime_time
0,002be0ffdc997bd5c50703158b7c2491,006d61cf7446e682f7bc50b0f8a5bea5,1811,327,50
1,002be0ffdc997bd5c50703158b7c2491,01b522c5c3a756fbdb12e95e87507eda,3362,809,0
2,002be0ffdc997bd5c50703158b7c2491,029227c4c2971ce69ff2274dc798ef43,3282,572,0
3,002be0ffdc997bd5c50703158b7c2491,034e861343a63ac3c18a9ceb1ce0ac69,65283,3338,25
4,002be0ffdc997bd5c50703158b7c2491,034f2e614a2f9fc7f1c2f77647d1b981,4115,823,100


In [20]:
driver_ids.head()

Unnamed: 0,driver_id,driver_onboard_date
0,002be0ffdc997bd5c50703158b7c2491,2016-03-29 00:00:00
1,007f0389f9c7b03ef97098422f902e62,2016-03-29 00:00:00
2,011e5c5dfc5c2c92501b8b24d47509bc,2016-04-05 00:00:00
3,0152a2f305e71d26cc964f8d4411add9,2016-04-23 00:00:00
4,01674381af7edd264113d4e6ed55ecda,2016-04-29 00:00:00


In [21]:
# Convert necessary columns to appropriate data types
driver_ids['driver_onboard_date'] = pd.to_datetime(driver_ids['driver_onboard_date'])
ride_timestamps['timestamp'] = pd.to_datetime(ride_timestamps['timestamp'])
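
Since the brief states that all timestamps are in UTC and all rides occurred in San Francisco, any hour-of-day analysis would need local time. The following is a minimal sketch on a toy frame (the `events` name and the derived column names are illustrative, not part of the notebook) showing one way to parse the timestamps as timezone-aware UTC and convert them:

```python
import pandas as pd

# Toy frame mirroring the ride_timestamps layout (values taken from the sample rows above)
events = pd.DataFrame({
    "ride_id": ["00003037a262d9ee40e61b5c0718f7f0"] * 2,
    "event": ["requested_at", "dropped_off_at"],
    "timestamp": ["2016-06-13 09:39:19", "2016-06-13 10:03:05"],
})

# Parse as timezone-aware UTC, then convert to San Francisco local time
events["timestamp_utc"] = pd.to_datetime(events["timestamp"], utc=True)
events["timestamp_local"] = events["timestamp_utc"].dt.tz_convert("America/Los_Angeles")

# Local hour of day, usable later for time-of-day segmentation
events["local_hour"] = events["timestamp_local"].dt.hour
print(events[["event", "timestamp_local", "local_hour"]])
```

The lifetime calculation below does not depend on this conversion, because it only takes differences between dates.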

In [22]:
# Merge ride_timestamps with ride_ids to include driver_id
merged_rides = pd.merge(ride_timestamps, ride_ids[['ride_id', 'driver_id']], on='ride_id')

In [23]:
# Determine the last ride date for each driver
driver_last_ride = merged_rides[merged_rides['event'] == 'dropped_off_at'].groupby('driver_id')['timestamp'].max().reset_index()
driver_last_ride.columns = ['driver_id', 'last_ride_date']

In [24]:
# Calculate driver lifetime
driver_lifetime = pd.merge(driver_ids, driver_last_ride, on='driver_id')
driver_lifetime['driver_lifetime_days'] = (driver_lifetime['last_ride_date'] - driver_lifetime['driver_onboard_date']).dt.days
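
The merge above is an inner join, so any onboarded driver without a completed ride silently drops out of `driver_lifetime`. A hedged sketch of how that could be checked, using toy frames (`drivers`, `last_rides`, and `checked` are hypothetical names) and the `indicator` option of `pd.merge`:

```python
import pandas as pd

# Toy stand-ins for driver_ids and driver_last_ride
drivers = pd.DataFrame({
    "driver_id": ["a", "b", "c"],
    "driver_onboard_date": pd.to_datetime(["2016-03-29", "2016-04-05", "2016-04-23"]),
})
last_rides = pd.DataFrame({
    "driver_id": ["a", "b"],
    "last_ride_date": pd.to_datetime(["2016-06-23 10:29:53", "2016-06-12 20:30:38"]),
})

# A left join keeps every onboarded driver; _merge records where each row came from
checked = pd.merge(drivers, last_rides, on="driver_id", how="left", indicator=True)
no_rides = checked[checked["_merge"] == "left_only"]
print(f"{len(no_rides)} of {len(checked)} drivers have no completed ride")

# Lifetime is NaN for drivers without rides; negative values would signal bad dates
checked["driver_lifetime_days"] = (
    checked["last_ride_date"] - checked["driver_onboard_date"]
).dt.days
negative = checked[checked["driver_lifetime_days"] < 0]
print(f"{len(negative)} drivers have a last ride before their onboard date")
```

Whether drivers with no rides should count toward the average lifetime (as zero days) or be excluded is a modelling decision worth stating explicitly in the writeup.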

In [25]:
# Display the first few rows of driver_lifetime to check the results
driver_lifetime.head()

Unnamed: 0,driver_id,driver_onboard_date,last_ride_date,driver_lifetime_days
0,002be0ffdc997bd5c50703158b7c2491,2016-03-29,2016-06-23 10:29:53,86
1,007f0389f9c7b03ef97098422f902e62,2016-03-29,2016-06-22 13:28:38,85
2,011e5c5dfc5c2c92501b8b24d47509bc,2016-04-05,2016-06-12 20:30:38,68
3,0152a2f305e71d26cc964f8d4411add9,2016-04-23,2016-06-26 10:36:13,64
4,01674381af7edd264113d4e6ed55ecda,2016-04-29,2016-06-24 13:27:38,56


In [27]:
#Calculate Total Revenue for Each Driver
#Convert ride distance from meters to miles and duration from seconds to minutes
ride_ids['ride_distance_miles'] = ride_ids['ride_distance'] * 0.000621371
ride_ids['ride_duration_minutes'] = ride_ids['ride_duration'] / 60


In [28]:
#Calculate the revenue for each ride
def calculate_revenue(distance, duration, prime_time):
    """Return the fare in USD for one ride (distance in miles, duration in minutes, prime_time in percent)."""
    base_fare = 2.00
    cost_per_mile = 1.15
    cost_per_minute = 0.22
    service_fee = 1.75

    ride_revenue = base_fare + (cost_per_mile * distance) + (cost_per_minute * duration) + service_fee
    # Assumption: the prime time percentage scales the whole fare, service fee included
    ride_revenue *= (1 + prime_time / 100)
    return max(min(ride_revenue, 400.00), 5.00)  # Enforce min and max fare

ride_ids['ride_revenue'] = ride_ids.apply(lambda row: calculate_revenue(row['ride_distance_miles'], row['ride_duration_minutes'], row['ride_prime_time']), axis=1)
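
The row-wise `apply` above can be slow on a large ride table. As a sketch only, the same fare formula can be written with column arithmetic and `Series.clip`; the toy `rides` frame reuses sample rows shown earlier, and the prime time handling follows the same assumption as `calculate_revenue`:

```python
import pandas as pd

# Toy rides using the raw units from ride_ids.csv (meters, seconds, percent)
rides = pd.DataFrame({
    "ride_distance": [1811, 3362, 65283],
    "ride_duration": [327, 809, 3338],
    "ride_prime_time": [50, 0, 25],
})

# Unit conversions, matching the constants used in the notebook
miles = rides["ride_distance"] * 0.000621371
minutes = rides["ride_duration"] / 60

# Base fare + distance + time + service fee, then the prime time multiplier
fare = 2.00 + 1.15 * miles + 0.22 * minutes + 1.75
fare = fare * (1 + rides["ride_prime_time"] / 100)

# Enforce the minimum and maximum fare from the rate card
rides["ride_revenue"] = fare.clip(lower=5.00, upper=400.00)
print(rides)
```

Both versions should produce the same values; the vectorized form simply avoids a Python-level function call per row.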


In [29]:
#Calculate total revenue per driver
driver_revenue = ride_ids.groupby('driver_id')['ride_revenue'].sum().reset_index()
driver_revenue.columns = ['driver_id', 'total_revenue']

In [30]:
#Analyze Main Factors Affecting Driver's Lifetime Value
#Merge driver revenue with driver lifetime
driver_data = pd.merge(driver_lifetime, driver_revenue, on='driver_id')

In [31]:
#Analyze the data
# Basic statistics
avg_revenue = driver_data['total_revenue'].mean()
avg_lifetime_days = driver_data['driver_lifetime_days'].mean()

# Display basic statistics
print("Average Total Revenue per Driver:", avg_revenue)
print("Average Lifetime (in days) per Driver:", avg_lifetime_days)

Average Total Revenue per Driver: 3046.719715666579
Average Lifetime (in days) per Driver: 55.10513739545998
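
Note that the 55.1-day figure measures the span from onboarding to the last drop-off seen in the data, so drivers who were still active when the data ends contribute a truncated lifetime. One possible way to separate the two groups is an inactivity rule. The sketch below uses toy data, and the 14-day threshold (`INACTIVITY_DAYS`) is an arbitrary assumption, not a value from the assignment:

```python
import pandas as pd

# Toy stand-in for driver_lifetime
lifetimes = pd.DataFrame({
    "driver_id": ["a", "b", "c"],
    "driver_onboard_date": pd.to_datetime(["2016-03-29", "2016-04-05", "2016-04-29"]),
    "last_ride_date": pd.to_datetime(
        ["2016-06-23 10:29:53", "2016-05-12 20:30:38", "2016-06-24 13:27:38"]
    ),
})

INACTIVITY_DAYS = 14  # hypothetical threshold; tune it against the real ride gaps

# Treat the latest drop-off in the data as the end of the observation window
window_end = lifetimes["last_ride_date"].max()
days_idle = (window_end - lifetimes["last_ride_date"]).dt.days

# A driver idle for longer than the threshold is treated as churned
lifetimes["churned"] = days_idle > INACTIVITY_DAYS
lifetimes["driver_lifetime_days"] = (
    lifetimes["last_ride_date"] - lifetimes["driver_onboard_date"]
).dt.days

# Compare observed lifetimes of churned drivers with those still active
summary = lifetimes.groupby("churned")["driver_lifetime_days"].agg(["count", "mean"])
print(summary)
```

For drivers flagged as still active, the observed lifetime is only a lower bound, so a projected lifetime would need an explicit extrapolation or a survival-style estimate on top of this split.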


In [32]:
#Identify Specific Segments of Drivers
#Segment drivers by revenue quartiles
driver_data['revenue_quartile'] = pd.qcut(driver_data['total_revenue'], 4, labels=['Low', 'Medium', 'High', 'Very High'])
revenue_segments = driver_data.groupby('revenue_quartile', observed=False).agg({
    'total_revenue': 'mean',
    'driver_lifetime_days': 'mean',
    'driver_id': 'count'
}).reset_index()
revenue_segments.columns = ['revenue_quartile', 'avg_revenue', 'avg_lifetime_days', 'num_drivers']


In [33]:
#Display revenue segments
revenue_segments

Unnamed: 0,revenue_quartile,avg_revenue,avg_lifetime_days,num_drivers
0,Low,453.880395,36.433333,210
1,Medium,1556.148513,52.124402,209
2,High,3754.983218,64.875598,209
3,Very High,6434.272666,67.076555,209
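
The quartile table shows that revenue and lifetime rise together, but it does not by itself measure which ride-level factors matter. A rough sketch of one way to check this is to build per-driver features and correlate them with total revenue. The toy `rides` frame and the feature names (`ride_count`, `avg_distance_miles`, `prime_time_share`) are illustrative assumptions:

```python
import pandas as pd

# Toy ride-level data in the shape of ride_ids after the revenue step
rides = pd.DataFrame({
    "driver_id": ["a", "a", "a", "b", "b", "c"],
    "ride_distance_miles": [1.1, 2.1, 40.6, 2.0, 2.6, 3.0],
    "ride_prime_time": [50, 0, 25, 0, 0, 100],
    "ride_revenue": [9.36, 9.13, 78.00, 9.00, 10.00, 20.00],
})

# One row per driver: volume, typical trip length, prime time exposure, revenue
features = rides.groupby("driver_id").agg(
    ride_count=("ride_revenue", "size"),
    avg_distance_miles=("ride_distance_miles", "mean"),
    prime_time_share=("ride_prime_time", lambda s: (s > 0).mean()),
    total_revenue=("ride_revenue", "sum"),
)

# Rank correlation of each feature with total revenue
correlations = features.corr(method="spearman")["total_revenue"].drop("total_revenue")
print(features)
print(correlations.sort_values(ascending=False))
```

On the real data, the ranking of these correlations would give some evidence for or against each factor; correlation alone does not establish that a factor causes higher revenue.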


##**Findings**

- **Average Total Revenue per Driver:** The average total revenue generated by a driver over their lifetime with Lyft is approximately $3046.72.
- **Average Lifetime per Driver:** On average, a driver remains active with Lyft for about 55.1 days.

###**Driver Segments**

- Low Revenue Drivers:
  - Average Revenue: $453.88
  - Average Lifetime: 36.4 days
  - Number of Drivers: 210
- Medium Revenue Drivers:
  - Average Revenue: $1556.15
  - Average Lifetime: 52.1 days
  - Number of Drivers: 209
- High Revenue Drivers:
  - Average Revenue: $3754.98
  - Average Lifetime: 64.9 days
  - Number of Drivers: 209
- Very High Revenue Drivers:
  - Average Revenue: $6434.27
  - Average Lifetime: 67.1 days
  - Number of Drivers: 209

##**Main Factors Affecting Driver's Lifetime Value**

- **Ride Frequency:** Drivers with higher ride frequencies tend to generate more revenue.
- **Ride Duration and Distance:** Longer and farther rides contribute to higher earnings.
- **Prime Time Rides:** Rides taken during prime time periods, which have a higher fare multiplier, significantly increase driver revenue.
- **Driver Retention:** Drivers who stay longer with Lyft tend to accumulate higher revenues.

##**Recommendations**

**Driver Retention Programs:**

- Incentives for Longevity: Offer bonuses or incentives for drivers who stay active beyond the average lifetime of 55 days.
- Engagement Activities: Regularly engage with drivers through feedback sessions, performance reviews, and personalized support to address their needs and concerns.

**Incentive Programs:**

- Performance-Based Bonuses: Implement a tiered bonus system that rewards drivers based on the number of rides completed, total distance driven, or total revenue generated.
- Prime Time Promotions: Encourage drivers to operate during prime time periods by offering additional incentives or bonuses.

**Targeted Support:**

- New Driver Support: Provide comprehensive training and support to new drivers to help them maximize their earnings potential early on.
- Segment-Specific Strategies: Develop tailored strategies for different driver segments (Low, Medium, High, Very High) to address their unique needs and optimize their performance.

**Marketing and Recruitment:**

- Attract High-Performing Drivers: Focus marketing efforts on attracting drivers who are likely to perform well, based on historical data of high-revenue drivers.
- Referral Programs: Implement referral programs to encourage existing drivers to refer high-quality new drivers.

##**Writeup for Cross-Functional Audience**

**Executive Summary:**

The analysis of driver data at Lyft reveals key insights into driver behavior and revenue generation. On average, drivers generate $3046.72 over a period of 55.1 days. Segmenting drivers into four revenue quartiles (Low, Medium, High, Very High) helps in identifying specific strategies to improve driver retention and performance.

**Main Factors:**

- Ride frequency, duration, and distance are crucial in determining driver revenue.
- Prime time rides significantly boost earnings.
- Retention is vital for maximizing driver lifetime value.

**Recommendations:**

To enhance driver retention and increase revenue, Lyft should implement driver retention programs, performance-based incentives, and targeted support. Additionally, marketing efforts should focus on attracting high-performing drivers, and referral programs can help bring in quality new drivers.

This comprehensive approach will not only increase the lifetime value of drivers but also contribute to overall business growth and customer satisfaction.