## Revenue Management
### Dynamic Programming
#### Model of the demand
$D(t, p)= D_ne^{-\alpha(t)(\frac{p}{p_n}-1)}$ 
  With:
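- $p_1 > ... > p_n$ the prices, $p_n$ being the lowest  
- $D_n$ the number of tickets sold at the lowest price $p_n$ (since $D(t, p_n) = D_n$)  
- $\alpha(t)$ the price sensitivity  
- $D(t, p)$ the number of tickets sold at price $p$  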
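
As a rough illustration of the demand model above, the sketch below evaluates $D(t, p)$ on a small price grid. The price values, `alpha = 0.7` and `d_n = 50` are hypothetical numbers chosen only for the example, and $\alpha$ is taken constant in time for simplicity.

```python
import math

def purchase_probability(p, p_n, alpha):
    """Return exp(-alpha * (p / p_n - 1)), the fraction of base demand kept at price p."""
    return math.exp(-alpha * (p / p_n - 1.0))

def demand(p, p_n, alpha, d_n):
    """Return D(p) = D_n * exp(-alpha * (p / p_n - 1))."""
    return d_n * purchase_probability(p, p_n, alpha)

# Hypothetical price grid p_1 > ... > p_n (illustrative values only).
prices = [300.0, 200.0, 150.0, 100.0]
p_n = prices[-1]

for p in prices:
    # Demand shrinks exponentially as the price moves above p_n.
    print(p, round(demand(p, p_n, alpha=0.7, d_n=50.0), 2))
```

At $p = p_n$ the exponent vanishes, so the function returns `d_n` unchanged.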

#### State
$s = (t, x)$  
  With:
- $t$ : time to departure  
- $x$ : remaining capacity
 
Hypothesis: The time to departure is decomposed into micro-times $t$ of non-uniform length, meaning that a micro-time $t$ can represent several weeks if we are far from departure and a few minutes if we are close to departure. At most one person can arrive per micro-time $t$.
#### Bellman optimality equation
$v_*(t, x) = P(\mbox{person arrives})P(\mbox{person buys})(\mbox{price} + v_*(t-1, x-1)) + P(\mbox{person arrives})P(\mbox{person does not buy})v_*(t-1, x) + P(\mbox{no one arrives})v_*(t-1, x)$
 $$v_*(t, x) = \max_{p\in\{p_1, ..., p_n\}}\{\lambda(t)e^{-\alpha(t)(\frac{p}{p_n}-1)}(p + v_*(t-1, x-1))+ (1-\lambda(t)e^{-\alpha(t)(\frac{p}{p_n}-1)})v_*(t-1,x)\}$$  
    With:
- $\lambda(t)$ the probability that a person arrives at $t$  
- $e^{-\alpha(t)(\frac{p}{p_n}-1)}$ the purchase probability, the probability that the person buys a ticket  
- $v_*(t, x)$ the optimal total revenue that the airline can earn if there are $t$ micro-times left and $x$ seats left  


We want to determine $p^* = \pi_*(t, x)$, the optimal price at which to sell a seat if there are $t$ micro-times left and $x$ seats left.  
The traditional RM approach uses a historical booking database to estimate the forecast parameters ($\lambda$, $\alpha$). The Bellman equation is then solved recursively to get $v_*(t, x)$. 

#### Initialization
Terminal states:
- $(0, x) \forall x$, no more time left
- $(t, 0) \forall t$, no more seats left

So:
- $v_*(0, x) = 0 \forall x$
- $v_*(t, 0) = 0 \forall t$
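
The recursion and its boundary conditions can be sketched as a backward induction over $t$. This is a minimal illustration, not a reference implementation: it assumes constant $\lambda$ and $\alpha$, and the price grid, horizon and capacity are hypothetical values. The function name `solve_dp` is introduced here for the example.

```python
import math

def solve_dp(prices, lam, alpha, n_steps, capacity):
    """Solve the Bellman equation by backward induction.

    Returns (v, policy), both indexed as [t][x], where t is the number of
    micro-times left and x the number of seats left.
    """
    p_n = min(prices)  # lowest price, used as the reference price
    # Terminal states: v[0][x] = 0 and v[t][0] = 0 are covered by the zero fill.
    v = [[0.0] * (capacity + 1) for _ in range(n_steps + 1)]
    policy = [[None] * (capacity + 1) for _ in range(n_steps + 1)]
    for t in range(1, n_steps + 1):
        for x in range(1, capacity + 1):
            best_value, best_price = -math.inf, None
            for p in prices:
                # Probability that someone arrives AND buys at price p.
                sale = lam * math.exp(-alpha * (p / p_n - 1.0))
                value = sale * (p + v[t - 1][x - 1]) + (1.0 - sale) * v[t - 1][x]
                if value > best_value:
                    best_value, best_price = value, p
            v[t][x] = best_value
            policy[t][x] = best_price
    return v, policy

# Hypothetical inputs: 4 price points, 500 micro-times, 50 seats.
prices = [300.0, 200.0, 150.0, 100.0]
v, policy = solve_dp(prices, lam=0.2, alpha=0.7, n_steps=500, capacity=50)
print(round(v[500][50], 2), policy[500][50])
```

The table `policy[t][x]` plays the role of $\pi_*$ in this sketch, and the cost is proportional to the number of micro-times times the capacity times the number of prices.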

#### Probability of arriving $\lambda(t)$
$\lambda(t)$ can be taken constant for all $t$.  
 For example, if we fix $\lambda(t) = 0.2 \forall t$ and the number of micro-times to $500$, then on average $500 \times 0.2 = 100$ people arrive over the whole booking horizon.  
 Open question: what does it mean to arrive? Is it just looking at the prices of the flight? Is it independent of the purchase probability?
 
 #### Price sensitivity $\alpha(t)$
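 - $p_{50}(t)$ : price at which half of the arriving customers buy a ticket, i.e. at which the purchase probability equals $50\%$  
 - FRat5 = Fare Ratio at $50\% = \frac{p_{50}(t)}{p_n} = \Phi(t) > 1$  
 
Setting $e^{-\alpha(t)(\frac{p_{50}(t)}{p_n}-1)} = \frac{1}{2}$ gives $\alpha(t) = \frac{\ln(2)}{\frac{p_{50}(t)}{p_n}-1} = \frac{\ln(2)}{\Phi(t)-1}>0$  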
To determine $\alpha(t)$ we thus need to determine $\Phi(t)$. $\Phi(t)$ can be approximated by a logistic function of the form $\frac{L}{1+e^{-k(t-t_0)}}+b$. The parameters $L$, $k$, $t_0$ and $b$ depend on the route and on the market.  
$\Phi(t)$ increases with time so $\alpha$ decreases with time. As a first, very naive approach we can assume that $\alpha(t)$ is constant.  
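
A small sketch of how $\alpha(t)$ could be derived from a logistic $\Phi(t)$. It assumes that $\alpha(t)$ is calibrated so that the purchase probability equals $0.5$ at $p = p_{50}(t)$, which gives $\alpha = \ln(2)/(\Phi - 1)$. The logistic parameters below are hypothetical placeholders, not values fitted on any route.

```python
import math

def fare_ratio_50(t, L, k, t0, b):
    """Logistic approximation of Phi(t) = p_50(t) / p_n."""
    return L / (1.0 + math.exp(-k * (t - t0))) + b

def price_sensitivity(phi):
    """Return alpha such that exp(-alpha * (phi - 1)) == 0.5 (requires phi > 1)."""
    if phi <= 1.0:
        raise ValueError("phi must be strictly greater than 1")
    return math.log(2.0) / (phi - 1.0)

# Hypothetical logistic parameters, for illustration only.
params = dict(L=1.5, k=0.02, t0=250.0, b=1.5)

for t in (0, 250, 500):
    phi = fare_ratio_50(t, **params)
    print(t, round(phi, 3), round(price_sensitivity(phi), 3))
```

With these placeholder parameters $\Phi(t)$ stays above $1.5$, so the guard in `price_sensitivity` is never triggered.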

#### Bid Prices
Intuitively the bid price is the revenue that we would lose by giving away a seat for free. It is the optimal total revenue that the airline can earn with $x$ seats minus the optimal total revenue that it can earn with one seat fewer, both evaluated at the next micro-time $t-1$.  
$BP(t, x) = v_*(t-1,x) - v_*(t-1, x-1)$  
In the traditional RM approach, once the historical booking database has been used to estimate the forecast parameters ($\lambda, \alpha$), the Bellman equation is solved recursively to obtain $v_*(t, x)$, which in turn gives the bid price $BP(t, x)$.  
Finally, the RM acceptance criterion in state $s=(t, x)$ is: accept a fare $f$ if $f\geq BP(t, x)+FM$, where $FM = \frac{p_n}{\alpha}$.
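
One way to see where the margin $FM = \frac{p_n}{\alpha}$ comes from is the following sketch. It assumes that the price is treated as a continuous variable (the model itself uses the discrete grid $p_1, ..., p_n$) and that $x \geq 1$. The shorthand $w(p) = e^{-\alpha(t)(\frac{p}{p_n}-1)}$ for the purchase probability is introduced here only for readability. The term maximized in the Bellman equation can be rearranged using the bid price:

$$\lambda(t)w(p)(p + v_*(t-1, x-1)) + (1-\lambda(t)w(p))v_*(t-1,x) = v_*(t-1,x) + \lambda(t)w(p)(p - BP(t,x))$$

Since $w'(p) = -\frac{\alpha(t)}{p_n}w(p)$, setting the derivative with respect to $p$ to zero gives:

$$\lambda(t)w(p)\left(1 - \frac{\alpha(t)}{p_n}(p - BP(t,x))\right) = 0 \iff p = BP(t,x) + \frac{p_n}{\alpha(t)}$$

Under these assumptions the revenue-maximizing price sits exactly $\frac{p_n}{\alpha}$ above the bid price, which matches the threshold used in the acceptance criterion.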
### Deep Q-Learning
#### Network
Here we use a neural network with an input dimension of $2$ (the state $(x, \tau)$) and an output dimension of $n$ (the number of price classes).  
For each state $(x, \tau)$ the network produces $n$ Q-values, one per price class. The price class with the highest Q-value is then selected for the state $(x, \tau)$.  
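
To make the input and output shapes concrete, here is a toy forward pass written with the standard library only. The single hidden layer, its width, the ReLU activation and the random initialization are all assumptions made for the example. The notes do not specify the architecture, and a real implementation would normally rely on a deep learning library.

```python
import random

def init_mlp(n_inputs, n_hidden, n_outputs, seed=0):
    """Create a tiny two-layer network with small random weights."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        return {
            "w": [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            "b": [0.0] * n_out,
        }

    return [layer(n_inputs, n_hidden), layer(n_hidden, n_outputs)]

def q_values(params, x, tau):
    """Map a state (x, tau) to one Q-value per price class."""
    hidden_layer, output_layer = params
    inputs = [float(x), float(tau)]
    # Hidden layer with ReLU activation.
    hidden = [
        max(0.0, sum(w * value for w, value in zip(row, inputs)) + b)
        for row, b in zip(hidden_layer["w"], hidden_layer["b"])
    ]
    # Linear output layer: n Q-values.
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(output_layer["w"], output_layer["b"])
    ]

def greedy_price_class(params, x, tau):
    """Return the index of the price class with the highest Q-value."""
    q = q_values(params, x, tau)
    return max(range(len(q)), key=q.__getitem__)

# Input dimension 2, n = 4 price classes (hypothetical sizes).
params = init_mlp(n_inputs=2, n_hidden=8, n_outputs=4)
print(q_values(params, x=10, tau=3), greedy_price_class(params, x=10, tau=3))
```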
#### Data Collection Points
With this DQL approach we no longer use micro-times $t$ but Data Collection Points (DCP) $\tau$ which represent a grouping of micro-times. It is thus possible to have more than one buyer in a DCP.  
#### Bellman equation
The Bellman equations for $Q$ become:  
- if $\tau_{i+1}-1 < t < \tau_i - 1$:
$Q_*^{DQL}(t, x, p) = \lambda(t)e^{-\alpha(t)(\frac{p}{p_n}-1)}(p + Q_*^{DQL}(t-1, x-1, p)) + (1-\lambda(t)e^{-\alpha(t)(\frac{p}{p_n}-1)})Q_*^{DQL}(t-1,x, p)$ 
- if $t = \tau_i - 1$:
$Q_*^{DQL}(t, x, p) = \lambda(t)e^{-\alpha(t)(\frac{p}{p_n}-1)}(p + V_*^{RMS}(t-1, x-1)) + (1-\lambda(t)e^{-\alpha(t)(\frac{p}{p_n}-1)})V_*^{RMS}(t-1, x)$  
With $V_*^{RMS}(t, x) = \max_p Q_*^{DQL}(t, x, p)$ 

These equations are written for micro-times. They mean that at each DCP we can change the action and select the best one, whereas between two DCPs the action cannot be changed.

#### Initialization
- $Q(0, x, p) = 0 \forall x \forall p$
- $Q(t, 0, p) = 0 \forall t \forall p$
- $Q(\tau,x,p;\theta_0) = RMS$

#### Training
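$L_{MB}(\theta) = \sum_{((x, \tau), p, r, (x', \tau+1))\in MB}(r + \max_{p'}Q(\tau + 1, x', p'; \theta^-) - Q(\tau, x, p;\theta))^2$  
  With:
- $MB$ the mini-batch of transitions  
- $r$ the reward observed on the transition  
- $\theta$ the parameters being trained and $\theta^-$ the parameters of the target network, held fixed during the minimization  
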
$\theta_i = \underset{\theta}{\operatorname{argmin}}L_{MB}(\theta)$
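
The mini-batch loss can be sketched as below. The sketch follows the common DQN convention, taken here as an assumption, in which the bootstrapped target is evaluated with the frozen parameters $\theta^-$ and the prediction with $\theta$. `q_online` and `q_target` are hypothetical callables standing in for the two networks, and no discount factor is applied, as in the formula above.

```python
def minibatch_loss(batch, q_online, q_target):
    """Sum of squared temporal-difference errors over a mini-batch.

    batch: list of ((x, tau), action_index, reward, (x_next, tau_next)).
    q_online(x, tau) and q_target(x, tau) each return a list with one
    Q-value per price class. The target function is expected to return
    zeros for terminal states (no seats or no time left).
    """
    loss = 0.0
    for (x, tau), action, reward, (x_next, tau_next) in batch:
        # Bootstrapped target: reward plus best next-state value under frozen weights.
        target = reward + max(q_target(x_next, tau_next))
        prediction = q_online(x, tau)[action]
        loss += (target - prediction) ** 2
    return loss

# Dummy stand-ins for the two networks (illustrative constants only).
def q_online(x, tau):
    return [1.0, 2.0]

def q_target(x, tau):
    return [0.5, 3.0]

batch = [((10, 5), 1, 100.0, (9, 4))]
print(minibatch_loss(batch, q_online, q_target))
```

In a full training loop this loss would be minimized with respect to the online parameters only, and the target parameters would be refreshed periodically.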