# Inventory Control with Lead Times and Multiple Suppliers

## Description

One potential application of reinforcement learning is ordering supplies from multiple suppliers with different lead times and costs in order to meet a changing demand. In inventory management, lead time is the time that elapses between placing an order to replenish inventory and receiving that order. Lead time affects the amount of stock the buyer needs to hold at any point in time. Moreover, because there are multiple suppliers, at every stage the buyer must decide how much to order from each one, noting that more costly suppliers might have to be used to replenish the inventory with a shorter lead time.

The inventory control model addresses this by modeling an environment where there are multiple suppliers with different costs and lead times. Orders must be placed with these suppliers to have an on-hand inventory to meet a changing demand. However, both having supplies on backorder and holding unused inventory have associated costs. The goal of the agent is to choose the amount to order from each supplier so as to maximize the reward, that is, to minimize the combined ordering, holding, and backorder costs.

At each time step, an order is placed with each supplier. If previous orders have waited for the length of their supplier's lead time, then these orders become part of the on-hand inventory. The demand is then randomly drawn from a user-selected distribution and subtracted from the on-hand inventory. If the on-hand inventory would become less than zero, then the unmet items are considered to be on backorder, which decreases the reward, and the on-hand inventory is set to zero for the next time step. Otherwise, the result of the subtraction is the on-hand inventory at the start of the next time step. If there is remaining inventory (a positive number) at the end of this calculation, it is being held and decreases the reward in proportion to the holding cost.

## Model Assumptions
* Inventory cannot be negative. If the inventory would become negative after subtracting demand, it is set to zero instead.
* Backorders are not retroactively fulfilled. If a high demand would cause inventory to become negative, this unfulfilled demand is not met later, even if some inventory is being held at the end of a subsequent timestep.
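
The time-step description and the two assumptions above can be illustrated with a minimal Python sketch. The function name `step`, its arguments, and the oldest-first ordering of each supplier's pipeline are assumptions made for illustration only; they are not taken from the environment's actual code.

```python
def step(pipelines, on_hand, orders, demand):
    """Advance the inventory system by one time step (illustrative sketch).

    pipelines: one list per supplier holding its outstanding orders,
               oldest first (assumed layout); list i has length L_i.
    on_hand:   current on-hand inventory (nonnegative).
    orders:    amount ordered from each supplier at this time step.
    demand:    demand realized at this time step.
    """
    arrivals = 0
    new_pipelines = []
    for pipe, order in zip(pipelines, orders):
        queue = list(pipe) + [order]      # the new order joins the back
        arrivals += queue[0]              # the oldest order has waited L_i steps
        new_pipelines.append(queue[1:])   # remaining orders move up one index

    # Net inventory may be negative; the negative part is the backorder.
    net = on_hand + arrivals - demand
    # Inventory cannot be negative, and backorders are not carried forward.
    next_on_hand = max(0, net)
    return new_pipelines, next_on_hand, net


pipelines, on_hand, net = step([[3, 0], [5]], 2, [1, 4], 12)
print(pipelines, on_hand, net)  # [[0, 1], [4]] 0 -2
```

In this example, the arrivals (3 + 5) plus the 2 units on hand fall 2 short of the demand of 12, so `net` is -2 while the inventory carried into the next step is 0.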

## Dynamics
### State Space
The state space is $S = \mathbb{Z}^{L_1}_+ \times \mathbb{Z}^{L_2}_+ \times \cdots \times \mathbb{Z}^{L_N}_+ \times I$ where $N$ is the number of suppliers and $\mathbb{Z}^{L_i}_+$ represents a list of nonnegative integers whose length is the lead time of supplier $i$. The position of an order in this list represents how many timesteps back the order is from being added to the inventory. $I$ represents the current on-hand inventory. To represent a timestep, each order is moved up an index in the array unless it is added to the inventory, in which case it is removed from the array. Each supplier has its own set of indices in the array, one for each timestep of its lead time.

The last index, the on-hand inventory, is offset by adding the maximum inventory to it. This is done so that a negative value of the on-hand inventory can be temporarily kept to use in reward calculations for backorders. 
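
As a rough illustration of this layout, the sketch below flattens the per-supplier order lists into one array and stores the offset on-hand inventory in the last position. The helper names `encode_state` and `decode_inventory` and the parameter `max_inventory` are hypothetical and chosen only for this example.

```python
def encode_state(pipelines, inventory, max_inventory):
    """Flatten per-supplier order lists and append the offset inventory.

    inventory may be temporarily negative (a backorder); adding
    max_inventory keeps the stored value nonnegative.
    """
    flat = [quantity for pipe in pipelines for quantity in pipe]
    return flat + [inventory + max_inventory]


def decode_inventory(state, max_inventory):
    """Recover the (possibly negative) on-hand inventory from a state."""
    return state[-1] - max_inventory


state = encode_state([[3, 0], [5]], -2, 10)
print(state)                        # [3, 0, 5, 8]
print(decode_inventory(state, 10))  # -2
```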

### Action Space
The action space is $A = \mathbb{Z}^N_+$ where $N$ is the number of suppliers. This represents the amount to order from each supplier for the current timestep.

### Reward
The reward is $R = -\left(T + h \cdot \max(0, I) + b \cdot \max(0, -I)\right)$ where $T = \sum_{i = 1}^{N} c_i \cdot a_i$ represents the sum of the amount most recently ordered from each supplier, $a_i$, multiplied by the appropriate ordering cost, $c_i$. $h$ represents the holding cost for excess inventory, and $b$ represents the backorder cost for when the inventory would become negative.
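
The reward formula can be checked by hand with a small Python sketch. The function name `reward` and its argument names are assumptions for illustration; `inventory` here is the value $I$ after demand is subtracted, which may be negative.

```python
def reward(orders, costs, inventory, hold_cost, backorder_cost):
    """Compute R = -(T + h * max(0, I) + b * max(0, -I))."""
    # T: total ordering cost across suppliers, sum of c_i * a_i.
    ordering_cost = sum(c * a for c, a in zip(costs, orders))
    holding = hold_cost * max(0, inventory)          # cost of excess inventory
    backorder = backorder_cost * max(0, -inventory)  # cost of unmet demand
    return -(ordering_cost + holding + backorder)


# Two suppliers with costs 2 and 1, orders of 1 and 4 units, 2 units short.
print(reward([1, 4], [2, 1], -2, 1, 10))  # -26
# No new orders, 3 units left over.
print(reward([0, 0], [2, 1], 3, 1, 10))   # -3
```

In the first call the ordering cost is $2 \cdot 1 + 1 \cdot 4 = 6$ and the backorder cost is $10 \cdot 2 = 20$, giving a reward of $-26$. Only one of the holding and backorder terms can be nonzero at a time.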

## Environment

## Heuristic Agents