# Airport Ride Service Problem - Solution 2
**Michael Santoro - michael.santoro@du.edu**
## Problem Statement
You're launching a ride-hailing service that matches riders with drivers for trips between the Toledo Airport and Downtown Toledo. It'll be active for only 12 months. You've been forced to charge riders \$30 for each ride. You can pay drivers what you choose for each individual ride.

The supply pool (“drivers”) is very deep. When a ride is requested, a very large pool of drivers see a notification informing them of the request. They can choose whether or not to accept it. Based on a similar ride-hailing service in the same market, you have some [data](https://docs.google.com/spreadsheets/d/1gEMVOCXvWBcoUsTnfgDazKdur09KvnUDKW_9BE_XOD0/edit#gid=2115831440) on which ride requests were accepted and which were not. (The PAY column is what drivers were offered and the ACCEPTED column reflects whether any driver accepted the ride request.)

The demand pool (“riders”) can be acquired at a cost of $30 per rider at any time during the 12 months. There are 10,000 riders in Toledo, but you can't acquire more than 1,000 in a given month. You start with 0 riders. “Acquisition” means that the rider has downloaded the app and may request rides. Requested rides may or may not be accepted by a driver. In the first month that riders are active, they request rides based on a [Poisson distribution](https://en.wikipedia.org/wiki/Poisson_distribution) where lambda = 1. For each subsequent month, riders request rides based on a Poisson distribution where lambda is the number of rides that they found a match for in the previous month. (As an example, a rider that requests 3 rides in month 1 and finds 2 matches has a lambda of 2 going into month 2.) If a rider finds no matches in a month (which may happen either because they request no rides in the first place based on the Poisson distribution or because they request rides and find no matches), they leave the service and never return.

Submit a written document that proposes a pricing strategy to maximize the profit of the business over the 12 months. You should expect that this singular document will serve as a proposal for
1. A quantitative executive team that wants to know how you're thinking about the problem and what assumptions you're making but that does not know probability theory
1. Your data science peers so they can push on your thinking
Please submit any work you do, code or math, with your solution.

## Introduction
In this solution I build on the work done in the initial work done in `AirportRideService.ipynb`. But this solution takes a big leap interms of complexity in that the problem is more formally formatted as Reinforcement Learning problem. Modeling the taxi driver acceptance as the enviroment and the agent seeking to maximize profit.

## Training

## Random Agent Results

In [1]:
from dqn.dqn_agent import Agent

agent = Agent(state_size=2, action_size=1, seed=42)

from env.driver import driver_env_arr

env = driver_env_arr()
total_profit = 0
state = env.reset()

for m in range(1,12):
    actions = agent.act(state)
    actions = ((actions + 1) / (1 + 1)) * 30
    total_profit += env.step(actions)
    state = env.add_riders(m)
    print(f'Total Profit: {total_profit:.2f}\tMonth: {m}')

Accuracy: 0.825
Total Profit: 11209.39	Month: 1
Total Profit: 31649.95	Month: 2
Total Profit: 60820.99	Month: 3
Total Profit: 98140.79	Month: 4
Total Profit: 142790.06	Month: 5
Total Profit: 193990.55	Month: 6
Total Profit: 251317.23	Month: 7
Total Profit: 314716.62	Month: 8
Total Profit: 383811.65	Month: 9
Total Profit: 458457.49	Month: 10
Total Profit: 521969.82	Month: 11


## Trained Agent Results

In [2]:
import torch
from dqn.dqn_agent import Agent

agent = Agent(state_size=2, action_size=1, seed=42)
agent.qnetwork_local.state_dict = torch.load('checkpoint.pth')

from env.driver import driver_env_arr

env = driver_env_arr()
total_profit = 0
state = env.reset()

for m in range(1,12):
    actions = agent.act(state)
    actions = ((actions + 1) / (1 + 1)) * 30
    total_profit += env.step(actions)
    state = env.add_riders(m)
    print(f'Total Profit: {total_profit:.2f}\tMonth: {m}')

Accuracy: 0.825
Total Profit: 11968.40	Month: 1
Total Profit: 32996.11	Month: 2
Total Profit: 62520.62	Month: 3
Total Profit: 99791.28	Month: 4
Total Profit: 143613.55	Month: 5
Total Profit: 194813.00	Month: 6
Total Profit: 252015.52	Month: 7
Total Profit: 314707.94	Month: 8
Total Profit: 382991.95	Month: 9
Total Profit: 456643.30	Month: 10
Total Profit: 519214.57	Month: 11
