Using the **NYC Taxi and Limousine Commission (TLC)** Yellow Taxi dataset from June 2017, we will try to answer the following questions:

1. Imagine that you decide to drive a taxi for 10 hours each week to earn a little extra money. Explain how you would approach maximizing your income as a taxi driver.

2. If you could enrich the dataset, what would you add?  Is there anything in the dataset that you don’t find especially useful?

To answer the first question, we can use the TLC Yellow Taxi dataset to:
1. Build summary statistics of demand, fare, trip duration, and wait time (the time to find a new customer after dropping off the previous one) for different locations and times across New York City.
2. Use a reinforcement learning model to maximize income (for an individual driver working 10 hours per week) by selecting the best policy (i.e., the best time and route).

## Question 1 Solution

To maximize income, a taxi driver should aim for maximal occupancy time and minimal wait time. Using reinforcement learning, we can find the route that reduces wait time the most. This helps the driver decide whether to wait in the same location or move to a nearby location to find the next customer.

### Reinforcement Learning

The income optimization problem for a taxi can be mapped onto a reinforcement learning problem. For ease of implementation we will ignore the day of the week, so we look for a single policy that fits 10 hours of driving on any day.

#### 1. Framework
After dropping off a customer, a taxi driver can take one of two actions:
1. Wait in the same location for the next customer.
2. Move to a new location to search for a customer, if none can be found in the current one.

Moving to a new location has a cost, but if the driver chooses the new location wisely they can find a customer quickly (get rewarded) and increase their income at the end of the day. This can be mapped onto a reinforcement learning model.

Let us define a Markov Decision Process (MDP) for our problem:

- **State:** Described by the current location and the hour of the day, $S = (L, t)$.

- **Reward:** $R(L, L', t)$ is the mean trip fare when travelling from pickup location $L$ to drop-off location $L'$ at time $t$ (hour of the day).

- **Value Function:** $V(S, A)$ is the expected income after taking action $A$ from state $S$.

- **Action:** $A = \pi(L, t) = L'$ is the next pickup location chosen from state $S = (L, t)$. The driver goes to the new pickup location $L'$ and picks up the next customer there. The action set consists of all possible pickup and drop-off locations.

- **Parameters:**
    - **Probability of Picking Up a Passenger:** $P_{pick}(L,t)$ is the probability of picking up a passenger in location $L$ at time $t$. It is estimated by dividing the number of pickups in location $L$ by the total number of pickups and drop-offs in that location in timeslot (hour of the day) $t$.
    
      $P_{pick}(L, t) = \frac{n_{pickup}(L, t)}{n_{pickup}(L, t) + n_{dropoff}(L,t)}$
      
    - **Transition Probability:** $P_{tran}(L, L', t)$ is the probability of travelling from location $L$ to location $L'$ at time $t$ (hour of the day). We estimate it by dividing the number of trips from $L$ to $L'$ at time $t$ by the total number of trips in NYC at time $t$.
   
    - **$T_{wait}(L,t)$:** The time needed to find a new customer in location $L$ after dropping off the previous customer in timeslot $t$.
    - **$T_{drive}(L, L',t)$:** The driving time from location $L$ to $L'$, calculated as the difference between the drop-off time and the pickup time.

- **State Transition:** The state transition function describes how the taxi can move from state $S = (L,t)$ to state $S' = (L',t')$ after taking an action $A$. There are two possibilities:
   
   - The taxi successfully finds a passenger in location $L$ within time $T_{wait}(L,t)$, which happens with probability $P_{pick}(L,t)$. The taxi then takes the passenger to destination $L'$ with probability $P_{tran}(L,L',t)$ and arrives there after the driving time $T_{drive}(L, L',t)$. The taxi receives the reward $R(L, L', t)$, and the new state is $S' = (L', t + T_{wait}(L,t) + T_{drive}(L, L',t))$.
   
   - The taxi does not find a passenger in location $L$ within $T_{wait}(L,t)$, which happens with probability $1-P_{pick}(L,t)$. The driver receives no reward, but no time is spent driving a passenger for $T_{drive}(L, L',t)$, and the driver moves on to the next location to find a passenger. In this case the new state is $S' = (L', t + T_{wait}(L,t))$.
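The two transition cases can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two zones `"A"` and `"B"`, all numbers in the tables, and the helper name `step` are hypothetical and not taken from the dataset, and the tables cover a single hour for brevity.

```python
import random

# Hypothetical toy parameters for two zones at hour 8 (not from the TLC data).
HOUR = 8
P_PICK = {("A", HOUR): 0.6, ("B", HOUR): 0.3}
P_TRAN = {("A", HOUR): {"A": 0.25, "B": 0.75}, ("B", HOUR): {"A": 0.5, "B": 0.5}}
T_WAIT = {("A", HOUR): 5, ("B", HOUR): 12}  # minutes
T_DRIVE = {("A", "A"): 8, ("A", "B"): 15, ("B", "A"): 15, ("B", "B"): 10}  # minutes
REWARD = {("A", "A"): 7.0, ("A", "B"): 14.0, ("B", "A"): 14.0, ("B", "B"): 9.0}


def step(location, minute, target, rng):
    """Sample one transition; returns (next_location, next_minute, reward).

    `target` is the location the driver moves to if no passenger is found.
    """
    wait = T_WAIT[(location, HOUR)]
    if rng.random() < P_PICK[(location, HOUR)]:
        # Case 1: passenger found, destination drawn according to P_tran.
        dests = P_TRAN[(location, HOUR)]
        dest = rng.choices(list(dests), weights=list(dests.values()))[0]
        arrival = minute + wait + T_DRIVE[(location, dest)]
        return dest, arrival, REWARD[(location, dest)]
    # Case 2: no passenger within the wait time, no reward, move to `target`.
    return target, minute + wait, 0.0


rng = random.Random(0)
print(step("A", 0, "B", rng))
```

Calling `step` repeatedly from the returned state until the time budget is used up yields one simulated shift.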
    
    
#### Objective
The objective of the task (the MDP) is to maximize the total expected income from driving 10 hours a week, starting from any given initial state. The terminal states are the states with $t = 600$ minutes (10 hours per week). No more actions can be taken once the system reaches a terminal state.

The maximal expected reward for an action $A$ in state $S = (L, t)$
is expressed as $V(S, A)$.

$V(S, A) = (1 - P_{pick}(L)) \times \max_{a' \in A} V(L, t + T_{wait}(A), a') + \sum_{L' \in L} P_{pick}(L) \times P_{tran}(L, L') \times [R(L, L') + \max_{a' \in A} V(L', t + T_{wait}(A) + T_{drive}(L, L'), a')]$


The **optimal policy** $\pi^*$ is defined as: $\pi^*(S) = \arg\max_{A} V(S, A)$


And the **optimal value function** is given by $V^*(S) = V (S, \pi^*(S))$
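To make the value equation concrete, here is a one-step evaluation with made-up numbers. The function name `action_value`, its arguments, and all values are hypothetical. The continuation values (`v_stay`, `v_next`) are simply given as inputs here, and the toy transition probabilities are chosen to sum to 1 over the destinations.

```python
def action_value(p_pick, p_tran, reward, v_stay, v_next):
    """One-step evaluation of V(S, A).

    p_pick: pickup probability at the current location
    p_tran: dict destination -> transition probability
    reward: dict destination -> mean fare to that destination
    v_stay: best continuation value if no passenger is found
    v_next: dict destination -> best continuation value after arriving there
    """
    miss = (1.0 - p_pick) * v_stay
    hit = sum(p_pick * p_tran[d] * (reward[d] + v_next[d]) for d in p_tran)
    return miss + hit


value = action_value(
    p_pick=0.5,
    p_tran={"A": 0.25, "B": 0.75},
    reward={"A": 8.0, "B": 16.0},
    v_stay=4.0,
    v_next={"A": 10.0, "B": 20.0},
)
print(value)  # 0.5*4 + 0.5*0.25*(8+10) + 0.5*0.75*(16+20) = 17.75
```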

#### 2. Learning Algorithm

We can use dynamic programming to find the optimal policy that maximizes the income (i.e., the revenue within 10 hours).

**Algorithm**
![Dynamic Algorithm](algo.png)
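As a rough illustration of how dynamic programming applies here, the sketch below runs a memoized backward induction on a hypothetical two-zone problem. It is not necessarily the algorithm shown in the figure. It assumes a simplified model in which the action is the zone to search in, relocating empty costs `T_MOVE` minutes, and all parameters are independent of the hour. Every name and number is made up for illustration.

```python
from functools import lru_cache

ZONES = ("A", "B")
HORIZON = 60  # minutes; the text uses 600, a short horizon keeps the toy small

# Hypothetical, hour-independent parameters (not from the TLC data).
P_PICK = {"A": 0.5, "B": 0.75}
P_TRAN = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.25, "B": 0.75}}
REWARD = {("A", "A"): 6.0, ("A", "B"): 12.0, ("B", "A"): 12.0, ("B", "B"): 8.0}
T_DRIVE = {("A", "A"): 10, ("A", "B"): 20, ("B", "A"): 20, ("B", "B"): 10}
T_WAIT = {"A": 5, "B": 10}
T_MOVE = {("A", "A"): 0, ("A", "B"): 15, ("B", "A"): 15, ("B", "B"): 0}


def q_value(loc, t, action):
    """Expected remaining income when searching in zone `action` from (loc, t)."""
    t_search = t + T_MOVE[(loc, action)] + T_WAIT[action]
    # No passenger found: the driver is now in zone `action`, later in time.
    miss = (1.0 - P_PICK[action]) * value(action, t_search)
    # Passenger found: collect the fare and continue from the destination.
    hit = 0.0
    for dest, p in P_TRAN[action].items():
        t_arrive = t_search + T_DRIVE[(action, dest)]
        hit += P_PICK[action] * p * (REWARD[(action, dest)] + value(dest, t_arrive))
    return miss + hit


@lru_cache(maxsize=None)
def value(loc, t):
    """Optimal expected income from (loc, t); zero once the time budget is spent."""
    if t >= HORIZON:
        return 0.0
    return max(q_value(loc, t, a) for a in ZONES)


def policy(loc, t):
    """Zone to search in next under the optimal policy."""
    return max(ZONES, key=lambda a: q_value(loc, t, a))


print(policy("A", 0), round(value("A", 0), 2))
```

Because every action advances the clock by at least the wait time, the recursion always reaches a terminal state, and the cache ensures each `(loc, t)` pair is evaluated once.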




#### 3. Read, Understand, and Clean Data 
First, read the dataset and understand the features and their statistics.
+ Import Python libraries
+ Read the data file and display the features and their statistics
+ Remove any rows with null values as well as outliers


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

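# apply seaborn's whitegrid style to matplotlib plots
# (the 'seaborn-whitegrid' matplotlib style name is not available in recent matplotlib releases)
sns.set_style('whitegrid')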


In [2]:
# read the data from the file
taxi_df = pd.read_csv('yellow_tripdata_2017-06.csv')#, nrows=100000)
taxi_df.head()

| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | RatecodeID | store_and_fwd_flag | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2017-06-08 07:52:31 | 2017-06-08 08:01:32 | 6 | 1.03 | 1 | N | 161 | 140 | 1 | 7.5 | 1.0 | 0.5 | 1.86 | 0.0 | 0.3 | 11.16 |
| 1 | 2 | 2017-06-08 08:08:18 | 2017-06-08 08:14:00 | 6 | 1.03 | 1 | N | 162 | 233 | 1 | 6.0 | 1.0 | 0.5 | 2.34 | 0.0 | 0.3 | 10.14 |
| 2 | 2 | 2017-06-08 08:16:49 | 2017-06-08 15:43:22 | 6 | 5.63 | 1 | N | 137 | 41 | 2 | 21.5 | 1.0 | 0.5 | 0.0 | 0.0 | 0.3 | 23.3 |
| 3 | 2 | 2017-06-29 15:52:35 | 2017-06-29 16:03:27 | 6 | 1.43 | 1 | N | 142 | 48 | 1 | 8.5 | 1.0 | 0.5 | 0.88 | 0.0 | 0.3 | 11.18 |
| 4 | 1 | 2017-06-01 00:00:00 | 2017-06-01 00:03:43 | 1 | 0.6 | 1 | N | 140 | 141 | 1 | 4.5 | 0.5 | 0.5 | 2.0 | 0.0 | 0.3 | 7.8 |


**Check for missing data and remove them if any**

In [3]:
print(taxi_df.isnull().sum())

VendorID                 0
tpep_pickup_datetime     0
tpep_dropoff_datetime    0
passenger_count          0
trip_distance            0
RatecodeID               0
store_and_fwd_flag       0
PULocationID             0
DOLocationID             0
payment_type             0
fare_amount              0
extra                    0
mta_tax                  0
tip_amount               0
tolls_amount             0
improvement_surcharge    0
total_amount             0
dtype: int64


In [4]:
print('Data size before removing : %d' % len(taxi_df))
taxi_df = taxi_df.dropna(how = 'any', axis = 'rows')
print('Data size after removing : %d' % len(taxi_df))

Data size before removing : 9656993
Data size after removing : 9656993


In [5]:
# Display the statistic of the features
taxi_df.describe()

| | VendorID | passenger_count | trip_distance | RatecodeID | PULocationID | DOLocationID | payment_type | fare_amount | extra | mta_tax | tip_amount | tolls_amount | improvement_surcharge | total_amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 9656993.0 | 9656993.0 | 9656993.0 | 9656993.0 | 9656993.0 | 9656993.0 | 9656993.0 | 9656993.0 | 9656993.0 | 9656993.0 | 9656993.0 | 9656993.0 | 9656993.0 | 9656993.0 |
| mean | 1.546961 | 1.623943 | 2.978617 | 1.045527 | 162.6235 | 160.7379 | 1.33404 | 13.28727 | 0.3413314 | 0.4972247 | 1.87848 | 0.3376697 | 0.2996046 | 16.64632 |
| std | 0.4977898 | 1.264608 | 5.704095 | 0.5665036 | 66.75223 | 70.47343 | 0.4929622 | 215.1675 | 0.4623294 | 0.07625157 | 2.696221 | 2.022799 | 0.01441594 | 215.3387 |
| min | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | -550.0 | -50.56 | -0.5 | -74.0 | -12.5 | -0.3 | -550.3 |
| 25% | 1.0 | 1.0 | 1.0 | 1.0 | 114.0 | 107.0 | 1.0 | 6.5 | 0.0 | 0.5 | 0.0 | 0.0 | 0.3 | 8.75 |
| 50% | 2.0 | 1.0 | 1.67 | 1.0 | 162.0 | 162.0 | 1.0 | 9.5 | 0.0 | 0.5 | 1.36 | 0.0 | 0.3 | 11.85 |
| 75% | 2.0 | 2.0 | 3.1 | 1.0 | 233.0 | 233.0 | 2.0 | 15.0 | 0.5 | 0.5 | 2.46 | 0.0 | 0.3 | 18.17 |
| max | 2.0 | 9.0 | 9496.98 | 99.0 | 265.0 | 265.0 | 5.0 | 630461.8 | 22.5 | 140.0 | 444.0 | 990.0 | 1.0 | 630463.1 |


The major insights we can draw from the data description are:
1. The minimum fare_amount (and the minimum of other monetary fields) is negative. This might be due to refunds.
2. The maximum number of passengers in the data is 9. However, a taxi can carry at most 6 passengers.

We will drop these rows, as they can be considered outliers.


In [6]:
print('Data size before removing : %d' % len(taxi_df))
taxi_df = taxi_df[taxi_df.fare_amount>=0]
taxi_df = taxi_df[taxi_df.extra>=0]
taxi_df = taxi_df[taxi_df.passenger_count<=6]
print('Data size after removing: %d' % len(taxi_df))

Data size before removing : 9656993
Data size after removing: 9652199


#### 4. Feature Generation

The next goal is to build a feature set.
For reinforcement learning, we need to build a set of statistics based on hour of the day, pickup location, and drop-off location. We will gather information such as:
+ Mean trip fare from location L to L'
+ Mean wait time at location L at a given hour (the time between dropping off the previous customer and picking up the next one)
+ Mean driving time between L and L' at any hour
+ Probability of picking up a customer ($P_{pick}$) in location L at any time t (hour of the day)
+ Probability of going from location L to L' at any time t ($P_{tran}$)


In [7]:
# Get the hour of day from the pickup timestamp
taxi_df['tpep_pickup_datetime'] = pd.to_datetime(taxi_df.tpep_pickup_datetime)
taxi_df['tpep_dropoff_datetime'] = pd.to_datetime(taxi_df.tpep_dropoff_datetime)
taxi_df['hour'] = taxi_df['tpep_pickup_datetime'].dt.hour

# Trip duration from pickup to drop-off, in whole minutes
taxi_df['trip_duration'] = (taxi_df['tpep_dropoff_datetime'] - taxi_df['tpep_pickup_datetime']).dt.total_seconds() // 60

# Wait time between consecutive rows of the same vendor: keep the previous row's
# drop-off time only when it has the same VendorID (NaT otherwise).
# Note: VendorID identifies the record provider, not an individual cab, so this is a rough proxy.
same_vendor = taxi_df['VendorID'] == taxi_df['VendorID'].shift(1)
# & (taxi_df['PULocationID'] == taxi_df['DOLocationID'].shift(1))
taxi_df['prev_drop_time'] = taxi_df['tpep_dropoff_datetime'].shift(1).where(same_vendor)

taxi_df['wait_time'] = (taxi_df['tpep_pickup_datetime'] - taxi_df['prev_drop_time']).dt.total_seconds() // 60

# Mean trip duration, mean wait time, trip count, and mean revenue per (hour, pickup, drop-off)
taxi_summary = taxi_df.groupby(['hour', 'PULocationID', 'DOLocationID']).agg(
            trip_duration=('trip_duration', 'mean'), wait_time=('wait_time', 'mean'),
            n_trip=('trip_duration', 'count'), trip_revenue=('total_amount', 'mean')).reset_index()
taxi_summary = taxi_summary.dropna(how='any', axis='rows').reset_index()

print(len(taxi_summary))
taxi_summary.head()

225779


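| | index | hour | PULocationID | DOLocationID | trip_duration | wait_time | n_trip | trip_revenue |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 1 | 0.2 | -11.0 | 5 | 73.53 |
| 1 | 1 | 0 | 2 | 186 | 30.0 | -10.0 | 1 | 65.56 |
| 2 | 3 | 0 | 4 | 4 | 4.125 | -12.088889 | 56 | 12.36125 |
| 3 | 4 | 0 | 4 | 7 | 26.142857 | -10.5 | 7 | 29.502857 |
| 4 | 5 | 0 | 4 | 13 | 12.0 | -10.333333 | 6 | 17.895 |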
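The mean `wait_time` values in this output are negative. A negative gap means the previous row's drop-off happened after the current row's pickup, so the two rows cannot be consecutive trips of one cab. One possible way to reduce this effect, sketched here on a made-up four-row frame, is to sort trips in time within each vendor before shifting and to discard negative gaps. This is an assumption-laden proxy, not a fix: the data carries no per-cab identifier, and `VendorID` only identifies the record provider.

```python
import pandas as pd

# Hypothetical toy trips using the dataset's column names (values are made up).
trips = pd.DataFrame({
    "VendorID": [1, 2, 1, 2],
    "tpep_pickup_datetime": pd.to_datetime([
        "2017-06-01 08:30:00", "2017-06-01 08:05:00",
        "2017-06-01 08:00:00", "2017-06-01 08:40:00",
    ]),
    "tpep_dropoff_datetime": pd.to_datetime([
        "2017-06-01 08:45:00", "2017-06-01 08:20:00",
        "2017-06-01 08:10:00", "2017-06-01 08:55:00",
    ]),
})

# Sort so that "previous row" means "previous trip of the same vendor in time".
trips = trips.sort_values(["VendorID", "tpep_pickup_datetime"]).reset_index(drop=True)

# Previous drop-off within each vendor group (NaT for the first trip of a group).
prev_drop = trips.groupby("VendorID")["tpep_dropoff_datetime"].shift(1)
gap = (trips["tpep_pickup_datetime"] - prev_drop).dt.total_seconds() / 60

# Keep only non-negative gaps; the rest become NaN and drop out of the mean.
trips["wait_time"] = gap.where(gap >= 0)
print(trips[["VendorID", "tpep_pickup_datetime", "wait_time"]])
```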


Functions to calculate the transition probability and the pickup probability:

In [8]:
    def get_pickup_prob(pickup, dropoff):
        """Return {'<hour> <LocationID>': P_pick} from per-hour pickup and drop-off counts."""
        pickup_count = {}
        dropoff_count = {}
        p_pick = {}

        # total drop-offs per (hour, location)
        for index, row in dropoff.iterrows():
            key = str(row['hour']) + ' ' + str(row['DOLocationID'])
            dropoff_count[key] = dropoff_count.get(key, 0) + row['n_dropoff']

        # total pickups per (hour, location)
        for index, row in pickup.iterrows():
            key = str(row['hour']) + ' ' + str(row['PULocationID'])
            pickup_count[key] = pickup_count.get(key, 0) + row['n_pickup']

        # P_pick = n_pickup / (n_pickup + n_dropoff), only where both counts exist
        for key in pickup_count:
            if key in dropoff_count:
                p_pick[key] = pickup_count[key] / (pickup_count[key] + dropoff_count[key])

        return p_pick

    def get_tranist_prob(taxi_transit):
        """Return {'<hour> <DOLocationID> <PULocationID>': P_tran} from per-hour trip counts."""
        p_tran = {}
        trip_count = {}
        total = {}
        for index, row in taxi_transit.iterrows():
            # total number of trips in this hour
            hour = str(row['hour'])
            total[hour] = total.get(hour, 0) + row['n_trip']

            # number of trips for this (hour, drop-off, pickup) combination
            key = hour + ' ' + str(row['DOLocationID']) + ' ' + str(row['PULocationID'])
            trip_count[key] = trip_count.get(key, 0) + row['n_trip']

        # P_tran = trips for the pair in this hour / all trips in this hour
        for key in trip_count:
            hour = key.split(' ')[0]
            p_tran[key] = trip_count[key] / total[hour]
        return p_tran
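Row-by-row loops with `iterrows` can be slow on large frames. As an alternative sketch, the same two ratios can be computed with grouped pandas operations. The toy `summary` frame below is made up and only mirrors the column names of `taxi_summary`. Unlike the functions above, the results are indexed by tuples rather than space-joined strings.

```python
import pandas as pd

# Toy stand-in for taxi_summary (values are made up).
summary = pd.DataFrame({
    "hour":         [8, 8, 8, 9],
    "PULocationID": [1, 1, 2, 1],
    "DOLocationID": [2, 1, 1, 2],
    "n_trip":       [30, 10, 20, 5],
})

# Pickup and drop-off counts per (hour, location), on a shared index.
names = ["hour", "LocationID"]
pickups = summary.groupby(["hour", "PULocationID"])["n_trip"].sum().rename_axis(names)
dropoffs = summary.groupby(["hour", "DOLocationID"])["n_trip"].sum().rename_axis(names)

# P_pick = n_pickup / (n_pickup + n_dropoff); keep only keys that have both counts.
counts = pd.concat(
    [pickups.rename("n_pickup"), dropoffs.rename("n_dropoff")], axis=1
).dropna()
p_pick = counts["n_pickup"] / (counts["n_pickup"] + counts["n_dropoff"])

# P_tran = trips from L to L' in hour t / all trips in hour t.
pair_trips = summary.set_index(["hour", "PULocationID", "DOLocationID"])["n_trip"]
p_tran = pair_trips / pair_trips.groupby(level="hour").transform("sum")

print(p_pick)
print(p_tran)
```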

#### Get all the features for the Learning Algorithm
1. Set of all locations $L$
2. Set of all actions $A$
3. Set of times $t$
4. Pickup probability $P_{pick}$
5. Transition probability $P_{tran}$
6. Reward set $R$
7. Drive time $T_{drive}$
8. Wait time $T_{wait}$

In [9]:
#get count for taxi pickup in location L at time t
taxi_pickup = taxi_summary.groupby(['hour', 'PULocationID']).agg({'n_trip':['sum']}).reset_index()
taxi_pickup.columns=['hour', 'PULocationID', 'n_pickup']

#get count for taxi dropoff in location L at time t
taxi_dropoff = taxi_summary.groupby(['hour', 'DOLocationID']).agg({'n_trip': ['sum']}).reset_index()
taxi_dropoff.columns = ['hour', 'DOLocationID', 'n_dropoff']

#Get number of trip from L to L1 at time t
taxi_transit = taxi_summary.groupby(['hour', 'PULocationID', 'DOLocationID']).agg({'n_trip':['sum']}).reset_index()
taxi_transit.columns = ['hour', 'PULocationID', 'DOLocationID', 'n_trip']

#Get list of all location from taxi zone lookup file
location = pd.read_csv('taxi+_zone_lookup.csv')
L = np.array(location['LocationID'])

A = np.array(taxi_summary[['PULocationID', 'DOLocationID', 'hour']])
T = [x for x in range(24)]

#Calculate Pick probabilty for all location and time
p_pick = get_pickup_prob(taxi_pickup, taxi_dropoff)

#Get Transit probability between L and L1 at time t
p_tran = get_tranist_prob(taxi_transit)

# Reward, drive time, and wait time, each keyed by (pickup, drop-off, hour)
r = np.array(taxi_summary[['PULocationID', 'DOLocationID','hour', 'trip_revenue']])
t_drive = np.array(taxi_summary[['PULocationID', 'DOLocationID','hour','trip_duration']])
t_wait = np.array(taxi_summary[['PULocationID', 'DOLocationID', 'hour', 'wait_time']])

We have all the information we need to train our algorithm.
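A dynamic programming loop looks these quantities up many times, so dictionaries keyed by tuples may be more convenient than scanning arrays. The sketch below shows one possible layout on a made-up two-row frame with the same column names as `taxi_summary`. The dictionary names are hypothetical, and averaging the wait time over destinations without weighting by trip count is a simplifying assumption.

```python
import pandas as pd

# Toy stand-in for taxi_summary (values are made up).
summary = pd.DataFrame({
    "hour":          [8, 8],
    "PULocationID":  [1, 2],
    "DOLocationID":  [2, 1],
    "trip_duration": [12.0, 15.5],
    "wait_time":     [4.0, 6.0],
    "trip_revenue":  [11.5, 14.0],
})

# Reward and drive time keyed by (pickup, drop-off, hour).
keyed = summary.set_index(["PULocationID", "DOLocationID", "hour"])
reward = keyed["trip_revenue"].to_dict()
drive_time = keyed["trip_duration"].to_dict()

# Wait time depends only on pickup location and hour: unweighted mean over destinations.
wait_time = summary.groupby(["PULocationID", "hour"])["wait_time"].mean().to_dict()

print(reward[(1, 2, 8)], drive_time[(2, 1, 8)], wait_time[(1, 8)])
```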

By applying a reinforcement learning model to the TLC Yellow Taxi data, we can obtain an optimal policy that maximizes the income generated by a single driver. Given a starting location and a time of day, this model can direct the driver to the best location to find a customer.

***Note: Due to time constraints, I stopped here for Question 1. I hope my overall approach to the problem is clear.***

## Question 2 Solution

The data lacks feedback from drivers about their intention when moving to a specific location. This is important information that would improve our model's estimates of the pickup and transition probabilities.