#### Week 10
#### The problem to be discussed in the lab
## Inventory dynamics


Consider the following inventory problem in discrete time $t=0,\dots,T$, where possibly $T=\infty$.

The notation is:
- $x(t)\ge 0$ is inventory at period $t$
- $d(t)\ge 0$ is _potentially stochastic_ demand at period $t$
- $s(t)\ge 0$ is the order of new inventory placed in period $t$
- $p$ is the profit per unit of good sold
- $c$ is the fixed cost of ordering any amount of new inventory
- $r$ is the cost of storing one unit of good

The sales in period $t$ are given by $\min\{x(t),d(t)\}$. 
The next period inventory is given by $x(t+1) = \max\{x(t)-d(t),0\} + s(t)$.

The profit in period $t$ is given by

\begin{eqnarray}
\pi(t) & = & p \cdot \text{sales}(t) - r \cdot x(t+1) - c \cdot (\text{order made in period }t) \\
& = & p \min\{x(t),d(t)\} - r \big[ \max\{x(t)-d(t),0\} + s(t) \big] - c \mathbb{1}\{s(t)>0\}
\end{eqnarray}

Assuming all $s(t) \ge 0$, let $\sigma = \{s(t)\}_{t=0,1,\dots}$ denote a feasible inventory policy.
If $d(t)$ is stochastic, the policy becomes a function of the period $t$ inventory $x(t)$.

The expected profit maximization problem is given by

$$
{\max}_{\sigma} \mathbb{E}\Big[ \sum_{t=0}^{\infty} \beta^t \pi(t) \Big],
$$

where $\beta$ is the discount factor.

### 1. Write the Bellman equation for the problem
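
One way to write it, assuming the order $s$ is chosen after observing the current inventory $x$ but before demand $d$ is realized, and writing $V(x)$ for the value of holding inventory $x$ (notation introduced for this sketch, not part of the problem statement):

$$
V(x) = \max_{s \ge 0} \mathbb{E}_{d}\Big[ p \min\{x,d\} - r \big( \max\{x-d,0\} + s \big) - c \mathbb{1}\{s>0\} + \beta V\big( \max\{x-d,0\} + s \big) \Big].
$$

The first three terms are the period profit $\pi(t)$ defined above, and the argument of $V$ on the right-hand side is the next period inventory $x(t+1)$.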

### 2. Finite horizon deterministic case

Let $T<\infty$. Further, let $d=25$ be known in every period. How does the Bellman equation change?
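
A possible answer, as a sketch: with a finite horizon the value function picks up a time index, and with known demand the expectation disappears. The notation $V_t(x)$ is introduced here for illustration, and the terminal condition assumes that nothing is ordered in the last period, since an order there only adds costs.

\begin{eqnarray}
V_t(x) & = & \max_{s \ge 0} \Big\{ p \min\{x,d\} - r \big[ \max\{x-d,0\} + s \big] - c \mathbb{1}\{s>0\} + \beta V_{t+1}\big( \max\{x-d,0\} + s \big) \Big\}, \quad t=0,\dots,T-1, \\
V_T(x) & = & p \min\{x,d\} - r \max\{x-d,0\}.
\end{eqnarray}

The recursion is solved backwards, starting from $V_T$ and working down to $V_0$.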

Write a backwards induction solver for the problem.
Use the following parameters: $(c,p,r,\beta)=(15,2.5,0.2,0.975)$.

In [20]:
import numpy as np
import matplotlib.pyplot as plt

class inventory_model:
    '''Class to hold the fundamentals of the inventory model'''
    
    def __init__(self,max_inventory=10):
        '''Create and initialize the parameters of the model'''
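        # parameters of the problem
        self.c = 1.5             # fixed cost of placing an order
        self.p = 2.5             # profit per unit sold
        self.r = 0.2             # storage cost per unit
        self.beta = 0.975        # discount factor
        self.fixed_demand = 4    # deterministic demand in every period
        # grids and spaces
        self.n = max_inventory +1        # number of grid points
        self.x = np.arange(self.n)       # inventory grid 0,..,max_inventory
        self.value = np.zeros(self.n)    # value function on the grid
        self.policy = np.zeros(self.n)   # policy function on the grid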
    
    def __repr__(self):
        '''String representation'''
        return 'Inventory model with parameters:\nc=%1.3f p=%1.3f r=%1.3f beta=%1.3f\n' \
               %(self.c,self.p,self.r,self.beta)
    
    def sales(self,x,d):
        '''Sales in given period'''
        return np.minimum(x,d)
    
    def next_x(self,x,d,s):
        '''Next period inventory'''
        return np.maximum(x-d,0)+s
        
    def profit(self,x,d,s):
        '''Profit in given period'''
        return self.p*self.sales(x,d) \
             - self.r*self.next_x(x,d,s) \
             - self.c*(s>0)

m = inventory_model()
print(m)

print('Current inventory\n',m.x)
print('Current sales\n',m.sales(m.x,m.fixed_demand))
s = np.arange(m.n).reshape((m.n,1))
print('Orders\n',s)
print('Next period inventory\n',m.next_x(m.x,m.fixed_demand,s))
print('Current profits\n',m.profit(m.x,m.fixed_demand,s))


Inventory model with parameters:
c=1.500 p=2.500 r=0.200 beta=0.975

Current inventory
 [ 0  1  2  3  4  5  6  7  8  9 10]
Current sales
 [0 1 2 3 4 4 4 4 4 4 4]
Orders
 [[ 0]
 [ 1]
 [ 2]
 [ 3]
 [ 4]
 [ 5]
 [ 6]
 [ 7]
 [ 8]
 [ 9]
 [10]]
Next period inventory
 [[ 0  0  0  0  0  1  2  3  4  5  6]
 [ 1  1  1  1  1  2  3  4  5  6  7]
 [ 2  2  2  2  2  3  4  5  6  7  8]
 [ 3  3  3  3  3  4  5  6  7  8  9]
 [ 4  4  4  4  4  5  6  7  8  9 10]
 [ 5  5  5  5  5  6  7  8  9 10 11]
 [ 6  6  6  6  6  7  8  9 10 11 12]
 [ 7  7  7  7  7  8  9 10 11 12 13]
 [ 8  8  8  8  8  9 10 11 12 13 14]
 [ 9  9  9  9  9 10 11 12 13 14 15]
 [10 10 10 10 10 11 12 13 14 15 16]]
Current profits
 [[ 0.   2.5  5.   7.5 10.   9.8  9.6  9.4  9.2  9.   8.8]
 [-1.7  0.8  3.3  5.8  8.3  8.1  7.9  7.7  7.5  7.3  7.1]
 [-1.9  0.6  3.1  5.6  8.1  7.9  7.7  7.5  7.3  7.1  6.9]
 [-2.1  0.4  2.9  5.4  7.9  7.7  7.5  7.3  7.1  6.9  6.7]
 [-2.3  0.2  2.7  5.2  7.7  7.5  7.3  7.1  6.9  6.7  6.5]
 [-2.5  0.   2.5  5.   7.5  7.3  7.1

In [21]:
def bellman1(m,j):
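    '''Bellman operator:
        input model, column index j of the next period in m.value
        output value, policy for the current period'''
    # column vector of possible choices
    s = np.arange(m.n).reshape((m.n,1))
    # current profits as a matrix (rows: orders, columns: inventory)
    p = m.profit(m.x,m.fixed_demand,s)
    # next period inventory
    xprime = m.next_x(m.x,m.fixed_demand,s)
    # grid index of next period inventory, capped at the top of the grid
    i = np.minimum(xprime,m.n-1)
    # value of every choice: current profit plus discounted continuation value
    vm = p + m.beta*m.value[i,j]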
    # maximize across choices
    v1 = np.amax(vm,axis=0) # maximum in every column
    s1 = np.argmax(vm,axis=0) # arg-maximum in every column
    return v1,s1

def solver_finite(m, T=10):
    '''Backwards induction solver for finite horizon case'''
    # initialize and resize policy and value function
    m.value = np.zeros((m.n,T))
    m.policy = np.zeros((m.n,T))
    # main DP loop
    for t in range(T,0,-1):
        j = t-1 # index for time t in value and policy arrays
        if t==T:
            # terminal period
            m.policy[:,j] = np.zeros(m.n)
            m.value[:,j] = m.profit(m.x,m.fixed_demand,np.zeros(m.n))
        else:
            # all other periods
            m.value[:,j], m.policy[:,j] = bellman1(m,j+1)
    return m
    
m = inventory_model()
solver_finite(m, T=10)
print('Value function:\n',m.value)
print('Policy function:\n',m.policy)


Value function:
 [[63.18193349 57.09787755 50.28868743 43.88862858 36.72577244 29.99329766
  22.45840625 15.37625     7.45        0.        ]
 [65.68193349 59.59787755 52.78868743 46.38862858 39.22577244 32.49329766
  24.95840625 17.87625     9.95        2.5       ]
 [68.18193349 62.09787755 55.28868743 48.88862858 41.72577244 34.99329766
  27.45840625 20.37625    12.45        5.        ]
 [70.68193349 64.59787755 57.78868743 51.38862858 44.22577244 37.49329766
  29.95840625 22.87625    14.95        7.5       ]
 [73.18193349 67.09787755 60.28868743 53.88862858 46.72577244 39.99329766
  32.45840625 25.37625    17.45       10.        ]
 [73.18193349 67.09787755 60.28868743 53.88862858 46.72577244 39.99329766
  32.45840625 25.37625    17.45        9.8       ]
 [73.18193349 67.09787755 60.28868743 53.88862858 46.72577244 39.99329766
  32.45840625 25.37625    17.45        9.6       ]
 [73.18193349 67.09787755 60.28868743 53.88862858 46.72577244 39.99329766
  32.45840625 25.37625    17.45   

### 3. Deterministic infinite horizon case

Extend the solver code to the infinite horizon ($T=\infty$) case.
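
A minimal sketch of one possible approach is value function iteration: apply the Bellman operator repeatedly until the value function stops changing. The block below is self-contained rather than built on the `inventory_model` class, so the names `bellman` and `solver_vfi`, the tolerance, and the iteration cap are all assumptions made for illustration. The parameter defaults mirror those in the class above.

```python
import numpy as np

def bellman(V, d=4, c=1.5, p=2.5, r=0.2, beta=0.975):
    '''One application of the Bellman operator with fixed demand d (illustrative helper)'''
    n = V.size
    x = np.arange(n)                    # current inventory (columns)
    s = np.arange(n).reshape((n, 1))    # candidate orders (rows)
    xprime = np.maximum(x - d, 0) + s   # next period inventory
    profit = p*np.minimum(x, d) - r*xprime - c*(s > 0)
    i = np.minimum(xprime, n - 1)       # cap next period inventory at the top of the grid
    vm = profit + beta*V[i]             # value of every (order, inventory) pair
    return np.amax(vm, axis=0), np.argmax(vm, axis=0)

def solver_vfi(n=11, tol=1e-8, maxiter=5000, **params):
    '''Value function iteration for the infinite horizon case (sketch)'''
    V = np.zeros(n)
    for it in range(maxiter):
        V1, policy = bellman(V, **params)
        err = np.max(np.abs(V1 - V))    # sup-norm distance between iterations
        V = V1
        if err < tol:
            break
    return V, policy, it + 1

V, policy, niter = solver_vfi()
print('Converged in %d iterations' % niter)
print('Value function:\n', V)
print('Policy function:\n', policy)
```

Because the Bellman operator is a contraction with modulus $\beta$, the error shrinks by roughly the factor $\beta$ per iteration, so with $\beta=0.975$ several hundred iterations should be expected.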

### 4. Stochastic case

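Now assume that $d(t)$ is stochastic and follows a geometric distribution with support
$k \in \{0,1,2,\dots\}$ and corresponding probabilities $P(k)=(1-q)^k q$, where $q \in (0,1)$ is the success probability (written $q$ here to avoid a clash with the unit profit $p$).
Adjust the solver code to accommodate this case.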
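
One way to do this, sketched under several assumptions, is to add a demand dimension to the arrays and take the expectation over it inside the Bellman operator. The geometric success probability is called `q` in the code so that it cannot be confused with the unit profit `p`, and its default value 0.25 is purely illustrative. On a grid with top inventory `n-1`, every demand of at least `n-1` sells out the stock and leaves nothing over, so the tail probability $P(d \ge n-1) = (1-q)^{n-1}$ can be lumped onto the single point `n-1` without changing expected profits. The function names are hypothetical.

```python
import numpy as np

def demand_pmf(n, q):
    '''Geometric pmf on 0,..,n-1 with the tail mass P(d >= n-1) lumped on n-1'''
    k = np.arange(n)
    pr = (1 - q)**k * q
    pr[-1] = (1 - q)**(n - 1)           # P(d >= n-1)
    return k, pr

def bellman_stochastic(V, q=0.25, c=1.5, p=2.5, r=0.2, beta=0.975):
    '''Bellman operator with expectation over demand (illustrative helper)'''
    n = V.size
    x = np.arange(n).reshape((1, n, 1)) # current inventory, second axis
    s = np.arange(n).reshape((n, 1, 1)) # candidate orders, first axis
    d, pr = demand_pmf(n, q)            # demand values, broadcast along the last axis
    xprime = np.maximum(x - d, 0) + s   # next period inventory, shape (n, n, n)
    profit = p*np.minimum(x, d) - r*xprime - c*(s > 0)
    i = np.minimum(xprime, n - 1)       # cap next period inventory at the top of the grid
    vm = (profit + beta*V[i]) @ pr      # expectation over demand, shape (n, n)
    return np.amax(vm, axis=0), np.argmax(vm, axis=0)

def solver_vfi_stochastic(n=11, tol=1e-8, maxiter=5000, **params):
    '''Value function iteration for the stochastic case (sketch)'''
    V = np.zeros(n)
    for it in range(maxiter):
        V1, policy = bellman_stochastic(V, **params)
        err = np.max(np.abs(V1 - V))
        V = V1
        if err < tol:
            break
    return V, policy

V, policy = solver_vfi_stochastic()
print('Value function:\n', V)
print('Policy function:\n', policy)
```

The order is chosen before demand is realized, so the maximization over `s` happens after the expectation has been taken. Capping next period inventory at the top of the grid is the same simplification the deterministic solver makes.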


### 5. Newton step
Write a new solver that solves the VFI fixed point problem as an equation using Newton's method.
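
A sketch of how this could look for the deterministic case: write the fixed point as $F(V) = V - T(V) = 0$, where $T$ is the Bellman operator. Away from ties in the maximization, the Jacobian of $T$ is $\beta P$, with $P$ the transition matrix induced by the currently optimal orders, so a Newton step is $V \leftarrow V - (I - \beta P)^{-1} F(V)$. All function names, the tolerance, and the iteration cap below are assumptions made for illustration.

```python
import numpy as np

def bellman(V, d=4, c=1.5, p=2.5, r=0.2, beta=0.975):
    '''Bellman operator returning value, policy and next-state index (illustrative helper)'''
    n = V.size
    x = np.arange(n)                    # current inventory (columns)
    s = np.arange(n).reshape((n, 1))    # candidate orders (rows)
    xprime = np.maximum(x - d, 0) + s   # next period inventory
    profit = p*np.minimum(x, d) - r*xprime - c*(s > 0)
    i = np.minimum(xprime, n - 1)       # cap next period inventory at the top of the grid
    vm = profit + beta*V[i]
    s1 = np.argmax(vm, axis=0)          # maximizing order for every inventory level
    return vm[s1, x], s1, i[s1, x]

def solver_newton(n=11, beta=0.975, tol=1e-8, maxiter=100, **params):
    '''Newton iterations on F(V) = V - T(V) = 0 (sketch)'''
    V = np.zeros(n)
    for it in range(maxiter):
        TV, policy, inext = bellman(V, beta=beta, **params)
        F = V - TV                      # residual of the fixed point equation
        if np.max(np.abs(F)) < tol:
            break
        P = np.zeros((n, n))            # transition matrix under the current policy
        P[np.arange(n), inext] = 1.0
        V = V - np.linalg.solve(np.eye(n) - beta*P, F)   # Newton step
    return V, policy, it + 1

V, policy, niter = solver_newton()
print('Converged in %d iterations' % niter)
print('Value function:\n', V)
print('Policy function:\n', policy)
```

Each step solves a linear system instead of applying the contraction once, so far fewer iterations are typically needed than with plain value function iteration.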

### 6. Policy iteration
Write a new solver that implements the policy iteration algorithm.
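
A minimal sketch for the deterministic case, alternating two steps. Policy evaluation solves the linear system $(I - \beta P_\sigma) V = \pi_\sigma$ for the value of following a fixed policy $\sigma$ forever. Policy improvement then picks the best order given that value. The helper names, the all-zero starting policy, and the iteration cap are assumptions made for illustration.

```python
import numpy as np

def model_arrays(n=11, d=4, c=1.5, p=2.5, r=0.2):
    '''Profit and next-state index for every (order, inventory) pair (illustrative helper)'''
    x = np.arange(n)                    # current inventory (columns)
    s = np.arange(n).reshape((n, 1))    # candidate orders (rows)
    xprime = np.maximum(x - d, 0) + s   # next period inventory
    profit = p*np.minimum(x, d) - r*xprime - c*(s > 0)
    return profit, np.minimum(xprime, n - 1)   # cap next inventory at the top of the grid

def solver_policy_iteration(n=11, beta=0.975, maxiter=100, **params):
    '''Policy iteration for the infinite horizon deterministic case (sketch)'''
    profit, inext = model_arrays(n, **params)
    x = np.arange(n)
    policy = np.zeros(n, dtype=int)     # start from "never order"
    for it in range(maxiter):
        # policy evaluation: value of following the current policy forever
        P = np.zeros((n, n))
        P[x, inext[policy, x]] = 1.0
        V = np.linalg.solve(np.eye(n) - beta*P, profit[policy, x])
        # policy improvement: best order given the evaluated value
        new_policy = np.argmax(profit + beta*V[inext], axis=0)
        if np.array_equal(new_policy, policy):
            break                       # policy is greedy for its own value: optimal
        policy = new_policy
    return V, policy, it + 1

V, policy, niter = solver_policy_iteration()
print('Converged in %d iterations' % niter)
print('Value function:\n', V)
print('Policy function:\n', policy)
```

The loop stops when the improvement step returns the policy it started from, at which point the evaluated value satisfies the Bellman equation on the grid.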