# LYFT DATA CHALLENGE

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
%matplotlib inline

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from helper import *

In [3]:
drivers = pd.read_csv('driver_ids.csv')
rideID = pd.read_csv('ride_ids.csv')
ridetime = pd.read_csv('ride_timestamps.csv')

# GOALS

After exploring and analyzing the data, please:
 
#### 1.     Recommend a Driver's Lifetime Value (i.e., the value of a driver to Lyft over the entire projected lifetime of a driver).
- The most obvious measure is the revenue a driver has accumulated, normalized by time.
- The value of shorter rides versus longer rides.
- When are drivers active? Do drivers earn more late at night, early in the morning, or during business hours, and why?
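
The first idea, accumulated revenue normalized by time, can be sketched as below. This is only an illustration under assumptions: it expects a rides table with per-ride `fare` and `timestamp` columns (both are added to `rideID` later in this notebook), the sample rows are the five rides of one driver shown in that later output, and `revenue_per_active_day` is a hypothetical helper name, not part of `helper`.

```python
import pandas as pd

# Five rides of a single driver, copied from the rideID output later in the notebook.
rides = pd.DataFrame({
    "driver_id": ["002be0ffdc997bd5c50703158b7c2491"] * 5,
    "fare": [5.0, 5.368741, 5.0, 58.889055, 5.95815],
    "timestamp": pd.to_datetime([
        "2016-04-23 02:16:36", "2016-03-29 19:03:57", "2016-06-21 12:01:32",
        "2016-05-19 09:18:20", "2016-04-20 22:07:03",
    ]),
})


def revenue_per_active_day(rides):
    """Hypothetical helper: total fare per driver divided by days between first and last ride."""
    per_driver = rides.groupby("driver_id").agg(
        total_fare=("fare", "sum"),
        first_ride=("timestamp", "min"),
        last_ride=("timestamp", "max"),
    )
    span = per_driver["last_ride"] - per_driver["first_ride"]
    # Count at least one day so a driver with a single ride does not divide by zero.
    per_driver["active_days"] = span.dt.days.clip(lower=1)
    per_driver["fare_per_day"] = per_driver["total_fare"] / per_driver["active_days"]
    return per_driver


summary = revenue_per_active_day(rides)
print(summary[["total_fare", "active_days", "fare_per_day"]])
```

Dividing by the span between first and last ride is one possible normalization; dividing by days since `driver_onboard_date` or by the number of distinct days with a ride would answer slightly different questions.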
 
#### 2. 	Please answer the following questions:
 
    a) What are the main factors that affect a driver's lifetime value?
 
    b) What is the average projected lifetime of a driver? That is, once a driver is onboarded, how long do they typically continue driving with Lyft?
 
    c) Do all drivers act alike? Are there specific segments of drivers that generate more value for Lyft than the average driver?
 
    d) What actionable recommendations are there for the business?
 
#### 3.     Prepare and submit a writeup of your findings for consumption by a cross-functional audience.
***

# Column Elements

### Drivers:
- #### driver_id: 
        Unique identifier for a driver
- #### driver_onboard_date : 
        Date on which driver was on-boarded

### RideID:
- driver_id = Unique identifier for a driver
 
- ride_id = Unique identifier for a ride that was completed by the driver
 
- ride_distance = Ride distance in meters
 
- ride_duration = Ride duration in seconds
 
- ride_prime_time = Prime Time applied on the ride

### RideTime:
- ride_id = Unique identifier for a ride
 
- event = event describes the type of event (see below)
 
- timestamp = Time of event

***
 
#### EVENT TYPES:
 
- #### requested_at : 
        passenger requested a ride
 
- #### accepted_at :
        driver accepted a passenger request
 
- #### arrived_at : 
        driver arrived at pickup point
 
- #### picked_up_at : 
        driver picked up the passenger
 
- #### dropped_off_at : 
        driver dropped off a passenger at destination
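
Because each ride appears once per event type, it can help to reshape the event table so that every ride has a single row with one column per event. The sketch below uses the five rows for one ride shown in the `ridetime.head(5)` output of this notebook; the names `events`, `wide`, `wait_s`, and `trip_s` are illustrative and do not come from the notebook.

```python
import pandas as pd

# The five events of one ride, as listed in the ridetime.head(5) output.
events = pd.DataFrame({
    "ride_id": ["00003037a262d9ee40e61b5c0718f7f0"] * 5,
    "event": ["requested_at", "accepted_at", "arrived_at",
              "picked_up_at", "dropped_off_at"],
    "timestamp": pd.to_datetime([
        "2016-06-13 09:39:19", "2016-06-13 09:39:51", "2016-06-13 09:44:31",
        "2016-06-13 09:44:33", "2016-06-13 10:03:05",
    ]),
})

# Long to wide: one row per ride, one column per event type.
wide = events.pivot(index="ride_id", columns="event", values="timestamp")

# Seconds from request to pickup, and from pickup to drop-off.
wide["wait_s"] = (wide["picked_up_at"] - wide["requested_at"]).dt.total_seconds()
wide["trip_s"] = (wide["dropped_off_at"] - wide["picked_up_at"]).dt.total_seconds()
print(wide[["wait_s", "trip_s"]])
```

`pivot` raises a `ValueError` if a ride has the same event more than once; in that case `pivot_table` with an aggregation such as `aggfunc="min"` is a possible alternative.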

***
# Assumptions:
- All rides in the data set occurred in San Francisco
 
- All timestamps in the data set are in UTC

## Rates:

- Base Fare : $2.00 

- Cost per Mile : $1.15

- Cost per Minute : $0.22 

- Service Fee : $1.75

- Minimum Fare : $5.00

- Maximum Fare : $400.00
 
***

In [4]:
print(len(drivers) == drivers.driver_id.nunique())
drivers.head()

True


Unnamed: 0,driver_id,driver_onboard_date
0,002be0ffdc997bd5c50703158b7c2491,2016-03-29 00:00:00
1,007f0389f9c7b03ef97098422f902e62,2016-03-29 00:00:00
2,011e5c5dfc5c2c92501b8b24d47509bc,2016-04-05 00:00:00
3,0152a2f305e71d26cc964f8d4411add9,2016-04-23 00:00:00
4,01674381af7edd264113d4e6ed55ecda,2016-04-29 00:00:00


In [5]:
rideID.head()

Unnamed: 0,driver_id,ride_id,ride_distance,ride_duration,ride_prime_time
0,002be0ffdc997bd5c50703158b7c2491,006d61cf7446e682f7bc50b0f8a5bea5,1811,327,50
1,002be0ffdc997bd5c50703158b7c2491,01b522c5c3a756fbdb12e95e87507eda,3362,809,0
2,002be0ffdc997bd5c50703158b7c2491,029227c4c2971ce69ff2274dc798ef43,3282,572,0
3,002be0ffdc997bd5c50703158b7c2491,034e861343a63ac3c18a9ceb1ce0ac69,65283,3338,25
4,002be0ffdc997bd5c50703158b7c2491,034f2e614a2f9fc7f1c2f77647d1b981,4115,823,100


In [6]:
ridetime.head(5)

Unnamed: 0,ride_id,event,timestamp
0,00003037a262d9ee40e61b5c0718f7f0,requested_at,2016-06-13 09:39:19
1,00003037a262d9ee40e61b5c0718f7f0,accepted_at,2016-06-13 09:39:51
2,00003037a262d9ee40e61b5c0718f7f0,arrived_at,2016-06-13 09:44:31
3,00003037a262d9ee40e61b5c0718f7f0,picked_up_at,2016-06-13 09:44:33
4,00003037a262d9ee40e61b5c0718f7f0,dropped_off_at,2016-06-13 10:03:05


Add a fare to each ride and attach the ride's `arrived_at` timestamp.

In [12]:
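# Estimate the fare of each ride with value_of_ride (imported from helper).
rideID['fare'] = rideID.apply(value_of_ride, axis=1)
# Attach each ride's 'arrived_at' timestamp; the left join keeps rides that have none.
arrived = ridetime.loc[ridetime['event'] == 'arrived_at', ['ride_id', 'timestamp']]
rideID = rideID.merge(arrived, on='ride_id', how='left')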
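
A left merge like the one above silently duplicates rides if an event appears twice for the same `ride_id`, and leaves a missing timestamp when a ride has no `arrived_at` event. The sketch below shows one way to surface both cases with the `validate` and `indicator` options of `DataFrame.merge`. The rows are toy data invented for illustration, not taken from the dataset.

```python
import pandas as pd

# Toy rows (invented): ride "c" has no arrived_at event.
rides = pd.DataFrame({"ride_id": ["a", "b", "c"],
                      "ride_distance": [1000, 2000, 3000]})
events = pd.DataFrame({
    "ride_id": ["a", "a", "b"],
    "event": ["requested_at", "arrived_at", "arrived_at"],
    "timestamp": pd.to_datetime([
        "2016-01-01 10:00:00", "2016-01-01 10:05:00", "2016-01-01 11:00:00",
    ]),
})

arrived = events.loc[events["event"] == "arrived_at", ["ride_id", "timestamp"]]

# validate raises MergeError if ride_id is duplicated on either side;
# indicator adds a _merge column marking rows found only in the left table.
merged = rides.merge(arrived, on="ride_id", how="left",
                     validate="one_to_one", indicator=True)
missing = merged[merged["_merge"] == "left_only"]
print(len(merged), "rides,", len(missing), "without an arrived_at timestamp")
```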

In [13]:
rideID.head()

Unnamed: 0,driver_id,ride_id,ride_distance,ride_duration,ride_prime_time,fare,timestamp
0,002be0ffdc997bd5c50703158b7c2491,006d61cf7446e682f7bc50b0f8a5bea5,1811,327,50,5.0,2016-04-23 02:16:36
1,002be0ffdc997bd5c50703158b7c2491,01b522c5c3a756fbdb12e95e87507eda,3362,809,0,5.368741,2016-03-29 19:03:57
2,002be0ffdc997bd5c50703158b7c2491,029227c4c2971ce69ff2274dc798ef43,3282,572,0,5.0,2016-06-21 12:01:32
3,002be0ffdc997bd5c50703158b7c2491,034e861343a63ac3c18a9ceb1ce0ac69,65283,3338,25,58.889055,2016-05-19 09:18:20
4,002be0ffdc997bd5c50703158b7c2491,034f2e614a2f9fc7f1c2f77647d1b981,4115,823,100,5.95815,2016-04-20 22:07:03
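
The `value_of_ride` helper is imported from `helper` and is not shown here. The fares in the output above are consistent with charging only the per-mile and per-minute rates and clamping the result to the minimum and maximum fare, without the base fare, service fee, or Prime Time. The sketch below is an assumption reconstructed from those five rows, not the actual helper; `fare_sketch` and the constant names are hypothetical.

```python
METERS_PER_MILE = 1609.344   # ride_distance is given in meters
COST_PER_MILE = 1.15
COST_PER_MINUTE = 0.22
MINIMUM_FARE = 5.00
MAXIMUM_FARE = 400.00


def fare_sketch(ride_distance, ride_duration):
    """Assumed fare: distance charge plus time charge, clamped to [minimum, maximum].

    Base fare, service fee, and Prime Time are deliberately left out because the
    displayed fares match without them; the real helper may differ.
    """
    miles = ride_distance / METERS_PER_MILE
    minutes = ride_duration / 60  # ride_duration is given in seconds
    raw = COST_PER_MILE * miles + COST_PER_MINUTE * minutes
    return min(max(raw, MINIMUM_FARE), MAXIMUM_FARE)


# (ride_distance, ride_duration) for the five rides displayed above.
sample = [(1811, 327), (3362, 809), (3282, 572), (65283, 3338), (4115, 823)]
fares = [round(fare_sketch(d, t), 6) for d, t in sample]
print(fares)
```

Whether Prime Time and the fixed fees should count toward a driver's value to Lyft is a modelling choice; applying `ride_prime_time` as a percentage surcharge would raise the fares of rides 1, 4, and 5 in the sample.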


In [17]:
drivers

Unnamed: 0,driver_id,driver_onboard_date
0,002be0ffdc997bd5c50703158b7c2491,2016-03-29 00:00:00
1,007f0389f9c7b03ef97098422f902e62,2016-03-29 00:00:00
2,011e5c5dfc5c2c92501b8b24d47509bc,2016-04-05 00:00:00
3,0152a2f305e71d26cc964f8d4411add9,2016-04-23 00:00:00
4,01674381af7edd264113d4e6ed55ecda,2016-04-29 00:00:00
5,01788cf817698fe68eaecd7eb18b2f72,2016-05-06 00:00:00
6,0213f8b59219e32142711992ca4ec01f,2016-04-07 00:00:00
7,021e5cd15ef0bb3ec20a12af99e142b3,2016-05-07 00:00:00
8,0258e250ca195cc6258cbdc75aecd853,2016-04-26 00:00:00
9,028b5a4dcd7f4924ebfabcf2e814c014,2016-05-06 00:00:00
