# Taxi company car allocation


### Optimization Methods, MA06002, 2016

* Artem Korenev
* Andrei Kvasov
* Anton Marin
* Oleg Sudakov

The profit of a taxi company depends on how many customers its drivers can serve and on the resources spent to serve them. The company should also consider its reputation as a brand, which depends, among other factors, on how quickly its cars pick up clients and drive them to their destinations.

Both of these metrics depend greatly on how many cars the company employs and where they are allocated geographically during the day.

In this problem, our team will examine historical data of Uber pickups in New York City, collected from April to September 2014.

# Problem formulation

The dataset contains the following information about Uber pickups in New York City: the date and time of each pickup, its latitude, and its longitude. The pickup locations will be represented as vertices of a graph, where the weight of an edge between two vertices is the distance between the corresponding points. We will solve the vehicle allocation problem under the assumption that the locations and times of all orders are known in advance.

Let's examine the elements of the problem individually:

## Taxi drivers

The total number of taxi drivers will be denoted as $N$. Each driver starts the working day at position $X_0^j$ at time $T_0^j$, where $j$ denotes the index of a taxi car.

We will assume that a driver's shift follows this schedule:

1. The car $j$ waits for orders at a specific point $X_w^j$, starting from the time $T_w^j$;
2. The car drives to the pickup point $X_p$ of client $k$ and picks the client up at time $T_p^k$;
3. After picking up the client, the car immediately drives to the destination $X_d$ and arrives at time $T_a^k$;
4. The car drops the client off at the destination and then heads to the next waiting point $X_{w+1}^j$. It starts waiting for the next order at time $T_{w+1}^j$ (if the car waits at the drop-off point of the previous order, $T_{w+1}^j = T_a^k$).

By construction, $T_a^k = T_p^k + time(X_p, X_d)$ and $T_{w+1}^j = T_a^k + time(X_d, X_{w+1}^j)$, where $time(A, B)$ denotes the time needed to get from point $A$ to point $B$. $T_p^k$ is given by the dataset.

We will introduce two penalties for each car:

* $c_{d}$ - downtime cost per unit of time, incurred by each taxi car while it is stationary.
* $c_{f}$ - fuel cost per unit of time, incurred by each taxi car while it is en route.

Therefore, if the car $j$ is assigned to order $k$, the penalty for this individual order equals:

$$P_{j, k} = time(X_w^j, X_p)c_{f} + (T_p^k - T_w^j - time(X_w^j, X_p))c_{d} + time(X_p, X_d)c_{f} + time(X_d, X_{w+1}^j)c_{f}$$
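As an illustration, the penalty above can be written as a small function. This is a minimal sketch: the function name `order_penalty`, its argument names, and the sample numbers are assumptions made for the example. The check that rejects a car which cannot reach the pickup point in time is also an added assumption, not part of the formula.

```python
def order_penalty(t_wait, t_pickup, time_to_pickup, time_ride, time_to_next, c_f, c_d):
    """Penalty P_{j,k} for car j serving order k (all names are illustrative).

    t_wait         -- T_w^j, time at which the car starts waiting
    t_pickup       -- T_p^k, pickup time of the order
    time_to_pickup -- time(X_w^j, X_p)
    time_ride      -- time(X_p, X_d)
    time_to_next   -- time(X_d, X_{w+1}^j)
    """
    # Downtime: the part of the interval [T_w^j, T_p^k] not spent driving to the client
    idle = t_pickup - t_wait - time_to_pickup
    if idle < 0:
        # Assumption of this sketch: a car that cannot arrive by T_p^k is not eligible
        raise ValueError("car cannot reach the pickup point in time")
    # Fuel is paid for all three driving legs
    driving = time_to_pickup + time_ride + time_to_next
    return driving * c_f + idle * c_d


# Hypothetical values, all times in minutes
penalty = order_penalty(t_wait=0, t_pickup=10, time_to_pickup=4,
                        time_ride=15, time_to_next=0, c_f=2.0, c_d=1.0)
print(penalty)
```

With these sample values the car drives for 19 minutes and idles for 6, so the fuel term dominates the penalty.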

This formula does not take into account the downtime at the end of the day. Given the end-of-shift time $T_{end}$, the car index $j$, and the number of orders $n_j$ that this car completed during the day, the penalty for the downtime at the end of the day for this car is

$$E_j = (T_{end} - T^j_{n_j})c_{d}$$

where $T^j_{n_j}$ is read here as the time at which car $j$ starts waiting after its $n_j$-th order.

The penalty for maintaining an individual car will be denoted as $c_{a}$ ($a$ stands for auto). The additional penalty for maintaining $N$ cars on a given day then equals $Nc_a$.

## Clients

The total number of clients, or orders, will be denoted as $C$. An individual order is represented by a tuple $(t^k, X_p, X_d)$, where $t^k$ denotes the earliest possible pickup time, $X_p$ the pickup point, and $X_d$ the destination point.

The cost of a late pickup per unit of time will be denoted as $c_{t}$. Therefore, for an individual order $k$, the penalty for the pickup delay equals $D_k = (T_p^k - t^k)c_{t}$.

## Total penalty

To sum up, we aim to minimize taxi downtime, pickup delays, fuel costs, and fleet maintenance costs. Thus, for the given formulation of the problem, our goal is:

$$\sum_{j=1}^{N}\sum_{k=1}^{C}P_{j, k} + \sum_{j=1}^N E_j + Nc_a + \sum_{k=1}^C D_k \rightarrow \min$$

where $P_{j, k}$ is counted only for pairs in which car $j$ is assigned to order $k$.
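The objective can be evaluated for a fixed assignment once its ingredients are known. The following is a minimal sketch under stated assumptions: the function name `total_penalty` and its arguments are hypothetical, the per-order penalties are taken only for assigned (car, order) pairs, and the last time in $E_j$ is interpreted as the moment each car becomes idle after its final order.

```python
def total_penalty(order_penalties, last_free_times, delays, t_end, c_d, c_a, c_t):
    """Total objective for one fixed assignment (illustrative names).

    order_penalties -- P_{j,k} values for the assigned (car, order) pairs
    last_free_times -- per car, the time it becomes idle after its last order
    delays          -- per order, the pickup delay T_p^k - t^k
    """
    n_cars = len(last_free_times)
    served = sum(order_penalties)                                  # sum of P_{j,k}
    end_of_day = sum((t_end - t) * c_d for t in last_free_times)   # sum of E_j
    fleet = n_cars * c_a                                           # N * c_a
    late = sum(d * c_t for d in delays)                            # sum of D_k
    return served + end_of_day + fleet + late


# Hypothetical day with two cars and two orders, times in minutes
objective = total_penalty(order_penalties=[44.0, 30.0],
                          last_free_times=[25, 40],
                          delays=[0, 3],
                          t_end=60, c_d=1.0, c_a=100.0, c_t=5.0)
print(objective)
```

Comparing this value across candidate assignments, or across fleet sizes $N$, is one way to see how the maintenance term trades off against downtime and delays.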

# Chosen Methods

## 1. Greedy Algorithms

One of the most straightforward approaches to the problem is a greedy algorithm: orders are processed one at a time, and each pickup decision is made by a locally best choice. Such an algorithm is not guaranteed to produce an optimal solution, but it can provide a reasonable approximation.
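One possible greedy rule is sketched below. It is an assumption chosen for illustration, not a rule fixed by the formulation: each order, taken in time order, goes to the car that can pick the client up earliest, with ties broken by the shorter drive. The function name `greedy_assign`, the `(t_pickup, x_p, x_d)` order tuples, and the one-dimensional positions in the usage example are all hypothetical.

```python
def greedy_assign(orders, cars, travel_time):
    """Assign each order to the car with the earliest feasible pickup.

    orders      -- list of (t_pickup, x_p, x_d), sorted by t_pickup
    cars        -- list of (position, free_time), one entry per car
    travel_time -- function (a, b) -> time needed to get from a to b
    Returns the index of the chosen car for every order.
    """
    state = [list(car) for car in cars]  # mutable [position, free_time] per car
    assignment = []
    for t_pickup, x_p, x_d in orders:
        def key(j):
            position, free_time = state[j]
            drive = travel_time(position, x_p)
            # Earliest moment car j can actually pick the client up
            return (max(free_time + drive, t_pickup), drive)

        best = min(range(len(state)), key=key)
        pickup_time = key(best)[0]
        # After the ride the car waits at the drop-off point
        state[best] = [x_d, pickup_time + travel_time(x_p, x_d)]
        assignment.append(best)
    return assignment


# Toy example on a line: positions are numbers, travel time is the distance
line_time = lambda a, b: abs(a - b)
result = greedy_assign(orders=[(5, 2, 6), (6, 9, 4), (12, 7, 0)],
                       cars=[(0, 0), (10, 0)],
                       travel_time=line_time)
print(result)
```

Each decision ignores its effect on later orders, which is why the result is only an approximation.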

## 2. Dynamic Programming

The second approach to the car allocation problem is dynamic programming. The problem could then be solved with the same method as the speed scheduling problem considered in the course. Given an optimal assignment of cars to orders up to order $j$ (with the orders sorted by time), we find the optimal value for each driver $i$ if we decide to assign him or her to order $j+1$.

Given the information about the previous orders, the penalty for the current assignment is easy to calculate, since we know the location and time of the previous drop-off, as well as the time and distance between the previous drop-off point and the current pickup point.

To solve this problem, a matrix of size $N \times C$ will be used, where $C$ denotes the number of clients (equivalently, the number of orders) and $N$ denotes the number of drivers. The final assignment is then recovered by backtracking from the minimal value in column $C-1$ (indexing starts from 0).
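The table-filling and backtracking steps might look as follows. This is a hedged sketch rather than a definitive implementation. The name `dp_assign`, the order tuples `(t_min, x_p, x_d)`, and the unit costs are assumptions. Each car is assumed to wait at its last drop-off point, and the end-of-day and maintenance terms are left out. Because each cell stores only the fleet state of its best predecessor path, the procedure is a heuristic and does not guarantee a global optimum.

```python
def dp_assign(orders, cars, travel_time, c_f=1.0, c_d=1.0, c_t=1.0):
    """Fill an N x C table; best[i][k] = lowest penalty found for orders 0..k
    when driver i serves order k. Returns (total penalty, driver per order)."""
    n, c = len(cars), len(orders)

    def serve(state, i, order):
        # Penalty and updated fleet state if driver i takes `order`
        t_min, x_p, x_d = order
        position, free_time = state[i]
        drive = travel_time(position, x_p)
        arrive = free_time + drive
        pickup = max(arrive, t_min)
        ride = travel_time(x_p, x_d)
        penalty = (c_f * (drive + ride)       # fuel
                   + c_d * (pickup - arrive)  # downtime before the pickup
                   + c_t * (pickup - t_min))  # pickup delay
        new_state = list(state)
        new_state[i] = (x_d, pickup + ride)
        return penalty, new_state

    best = [[0.0] * c for _ in range(n)]
    prev = [[None] * c for _ in range(n)]
    states = [[None] * c for _ in range(n)]
    for i in range(n):
        best[i][0], states[i][0] = serve(list(cars), i, orders[0])

    for k in range(1, c):
        for i in range(n):
            candidates = []
            for p in range(n):  # p = driver who served order k - 1
                penalty, new_state = serve(states[p][k - 1], i, orders[k])
                candidates.append((best[p][k - 1] + penalty, p, new_state))
            best[i][k], prev[i][k], states[i][k] = min(candidates, key=lambda x: x[0])

    # Backtrack from the minimal value in the last column
    i = min(range(n), key=lambda d: best[d][c - 1])
    total, assignment = best[i][c - 1], [i]
    for k in range(c - 1, 0, -1):
        i = prev[i][k]
        assignment.append(i)
    assignment.reverse()
    return total, assignment


# Toy example on a line: positions are numbers, travel time is the distance
total, assignment = dp_assign(orders=[(5, 2, 6), (6, 9, 4), (12, 7, 0)],
                              cars=[(0, 0), (10, 0)],
                              travel_time=lambda a, b: abs(a - b))
print(total, assignment)
```

The three nested loops give a running time of $O(N^2 C)$ evaluations of the penalty, plus the cost of copying the stored fleet states.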

## 3. Linear Programming

The third approach to the problem is LP relaxation. We introduce a variable matrix $A$ of size $N \times C$, where $C$ denotes the number of clients and $N$ the number of drivers; $a_{i,j} = 1$ if driver $i$ is assigned to order $j$, and $0$ otherwise. In the relaxation, each $a_{i,j}$ is allowed to take any value in $[0, 1]$.

Formulating the car allocation problem this way will require sophisticated constraints and a suitable weight function.
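As a hedged illustration, the simplest form of such a relaxation can be written as follows. The weights $w_{i,j}$ are a hypothetical fixed cost of assigning driver $i$ to order $j$. In the actual problem the penalty depends on the driver's previous orders, so this sketch omits the sequencing constraints:

$$\min_{A} \sum_{i=1}^{N}\sum_{j=1}^{C} w_{i,j}\,a_{i,j} \quad \text{subject to} \quad \sum_{i=1}^{N} a_{i,j} = 1 \;\; (j = 1, \dots, C), \qquad 0 \le a_{i,j} \le 1$$

The equality constraints require every order to be served by exactly one driver, and the bounds $0 \le a_{i,j} \le 1$ replace the integrality requirement $a_{i,j} \in \{0, 1\}$.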

## 4. Integer Programming

## 5. Flow Problem

# CODE (WIP)

In [122]:
import googlemaps  # only needed for the disabled Google Maps version below
import pandas as pd
import numpy as np
from datetime import datetime
import time

def read_data(filepath = "data\\04.12.2014.csv"):
    '''Reads information about pickup times and points from .csv file
       Returns dataframe, containing Date/Time, Latitude of pickup and Longitude of pickup columns'''
    # Use the filepath argument instead of a hard-coded path
    data = pd.read_csv(filepath, sep = ";", usecols=["Date/Time", "Lat", "Lon"])
    data.columns = ["Date/Time", "LatP", "LonP"]
    return data

def add_destinations(data):
    '''Adds destination locations for even rows of the dataframe. Location is taken from the next odd row
       Returns even rows of the dataframe with added destinations'''
    data["LatD"] = np.append(data["LatP"].values[1:], 0)  #Added zero to match length of initial column
    data["LonD"] = np.append(data["LonP"].values[1:], 0)
    return data[::2].reset_index(drop = True)

'''NOT USED DUE TO API RESTRICTIONS
def calculate_time(latP, lonP, latD, lonD, gmaps, dtime = 0):
    #TODO: implement departure time to be in future
    #TODO: normal JSON parsing
    latLonP = "{},{}".format(latP, lonP)
    latLonD = "{},{}".format(latD, lonD)
    #dt = datetime.strptime(dtime, "%d.%m.%Y %H:%M")
    #epoch = datetime.utcfromtimestamp(0)
    #secondsUTC = (dt - epoch).total_seconds()
    distMatrix = gmaps.distance_matrix([latLonP], [latLonD]) #departure_time=secondsUTC
    timeInSeconds = distMatrix['rows'][0]['elements'][0]['duration']['value']
    return timeInSeconds

def fill_time(data, gmaps, delay = 0.00001):
    times = []
    for i in data.index:
        latP = data['LatP'].values[i]
        lonP = data['LonP'].values[i]
        latD = data['LatD'].values[i]
        lonD = data['LonD'].values[i]
        times.append(calculate_time(latP, lonP, latD, lonD, gmaps))
        time.sleep(delay)
        if i % 1000 == 0 and i != 0:
            print("Processed {} rows".format(i))
    data['Time'] = times
    return data
'''
        
def calculate_time(latP, lonP, latD, lonD, speed):
    '''Calculates time needed to travel from point P to point D at a given constant speed (km/h),
       using the haversine great-circle distance. Accepts arrays as input.
       Returns time in seconds'''
    r = 6367  # Earth radius in km
    latP, lonP, latD, lonD = map(np.radians, [latP, lonP, latD, lonD])
    dlon = lonD - lonP
    dlat = latD - latP
    a = np.sin(dlat/2.0)**2 + np.cos(latP) * np.cos(latD) * np.sin(dlon/2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    distance = r * c  # distance in km
    return (distance/speed)*3600.0  # hours converted to seconds

def fill_time(data, speed = 60.0):
    '''Adds travel time in seconds to dataframe, assuming constant speed in km/h'''
    data["Time"] = calculate_time(data['LatP'].values, data['LonP'].values, data['LatD'].values, data['LonD'].values, speed)
    return data

data = read_data()
data = add_destinations(data)
# The Google Maps client is only needed for the disabled API-based version above;
# the API key should not be stored in the source
# gmaps = googlemaps.Client(key="<API key>")
data = fill_time(data)  # uses the default speed of 60 km/h

In [123]:
data

| | Date/Time | LatP | LonP | LatD | LonD | Time |
|---|---|---|---|---|---|---|
| 0 | 04.12.2014 0:00 | 40.7480 | -73.9870 | 40.7440 | -73.9872 | 26.689155 |
| 1 | 04.12.2014 0:03 | 40.7268 | -73.9834 | 40.7436 | -74.0290 | 256.168148 |
| 2 | 04.12.2014 0:08 | 40.7436 | -74.0290 | 40.7422 | -73.9989 | 152.339681 |
| 3 | 04.12.2014 0:13 | 40.7270 | -73.9839 | 40.7422 | -74.0294 | 251.225824 |
| 4 | 04.12.2014 0:14 | 40.7340 | -73.9897 | 40.7393 | -73.9936 | 40.459514 |
| 5 | 04.12.2014 0:15 | 40.7177 | -73.9494 | 40.7409 | -74.0051 | 321.141487 |
| 6 | 04.12.2014 0:17 | 40.7323 | -73.9807 | 40.7151 | -73.9515 | 186.875972 |
| 7 | 04.12.2014 0:21 | 40.7676 | -73.9305 | 40.7295 | -73.9870 | 382.074141 |
| 8 | 04.12.2014 0:23 | 40.6752 | -73.9599 | 40.7089 | -73.7514 | 1077.748057 |
| 9 | 04.12.2014 0:25 | 40.7224 | -73.9972 | 40.7609 | -73.9798 | 271.331403 |