<a href="https://colab.research.google.com/github/alirezakavianifar/gitTutorial/blob/developer/DeepRlHospital.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 1.1 Environment
The environment is the proposed healthcare network, consisting of multiple healthcare units (hospitals) with varying demands based on patient severity (severe, moderate, mild). This environment includes the dynamics of supply and demand, the logistics of transportation (including costs and distances), inventory levels at each hospital, and the capacity for transshipment between hospitals. Key environmental factors include:

1. The model applies to multiple products and services across all healthcare units based on the type of patient.
2. Backorders are prohibited, so any demand that is not met is lost immediately.
3. Hospitals facing stockouts or bed shortages do not admit new patients unless inventory is replenished between periods or at the onset of the next order cycle.
4. Coverage distances determine the feasibility of transferring products between hospitals.
5. Demands are non-stationary, meaning they can change unpredictably over time.
6. A healthcare unit is unable to both receive and dispatch the same products from/to other healthcare units within the same period.
7. A mode of transportation is always available.

Sets and Indices are:

\[
\begin{align*}
H & \text{Set of hospitals, } h, h' \in \{1, \ldots, H\} \\
R & \text{Set of suppliers, } r \in \{1, \ldots, R\} \\
T & \text{Set of time periods, } t \in \{1, \ldots, T\} \\
P & \text{Set of products, } p \in \{1, \ldots, P\} \\
S & \text{Set of disruption scenarios, } s \in \{1, \ldots, S\}
\end{align*}
\]
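
As a minimal sketch of how these index sets might be carried around in code, the block below stores only their sizes and enumerates the ordered hospital pairs \( (h, h') \) that transshipment can use. The class name `NetworkDims`, its field names, and the example sizes are assumptions made for illustration, and indices are 0-based here although the text uses 1-based sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkDims:
    """Sizes of the index sets H, R, T, P, S (hypothetical container)."""
    n_hospitals: int   # |H|
    n_suppliers: int   # |R|
    n_periods: int     # |T|
    n_products: int    # |P|
    n_scenarios: int   # |S|

    def hospital_pairs(self):
        """Ordered pairs (h, h') with h != h': candidate transshipment links."""
        return [
            (h, g)
            for h in range(self.n_hospitals)
            for g in range(self.n_hospitals)
            if h != g
        ]


# Example sizes are illustrative only.
dims = NetworkDims(n_hospitals=3, n_suppliers=2, n_periods=12,
                   n_products=2, n_scenarios=4)
pairs = dims.hospital_pairs()
```

With three hospitals this yields six ordered pairs, since sending from \( h \) to \( h' \) is distinct from sending from \( h' \) to \( h \).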

# 1.2 States (\(S_t\))
The notation for this part is:

\[
\begin{align*}
Inv_{ts,p,h} & \text{Inventory amount of product type } p \text{ in hospital } h \text{ in period } t \\
D_{p,h}^{t+K} & \text{Predicted demand for product type } p \text{ in hospital } h \text{ for period } t+K \text{, where } K \text{ is the prediction horizon} \\
C_{a,s,h} & \text{Maximum capacity of hospital } h \\
\beta_{p,h} & \text{Available capacity portion of product type } p \text{ in hospital } h \text{ (may change because of a disruption)} \\
LeadTime & \text{Latency between the time of order and the time of delivery}
\end{align*}
\]

Accordingly, the state \( S_t \) for each hospital \( h \) at time \( t \) could include:

- Current inventory levels of each product type \( p \) in hospital \( h \): \( Inv_{ts,p,h} \).
- Demand for period \( t \) and forecasted demand for each product type \( p \) in hospital \( h \) up to period \( t+K \): \( D_{p,h}^{t+K} \).
- The status of supply capacity (potentially impacted by disruptions), represented as a portion of available capacity: \( C_{a,s,h} \times \beta_{p,h} \). (The first model assumes there is no degradation.)
- The current lead time for each supplier \( r \): \( LeadTime \).
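
The state components listed above can be flattened into a single observation vector per hospital. The sketch below is one possible layout, not the notebook's actual encoding: the function name `build_state`, the argument names, and the ordering of the components are all assumptions.

```python
def build_state(inventory, demand_forecast, capacity, beta, lead_time):
    """Flatten one hospital's observation into a list of floats.

    inventory:        Inv per product p
    demand_forecast:  demand_forecast[k][p] for k = 0..K (k = 0 is period t)
    capacity:         maximum capacity of the hospital
    beta:             available capacity portion per product (1.0 = no degradation)
    lead_time:        current lead time per supplier r
    """
    state = [float(x) for x in inventory]
    for period in demand_forecast:
        state.extend(float(d) for d in period)
    # Effective capacity per product: capacity times available portion.
    state.extend(capacity * b for b in beta)
    state.extend(float(l) for l in lead_time)
    return state


# Illustrative numbers: 2 products, horizon K = 2, 2 suppliers.
state = build_state(
    inventory=[40, 15],
    demand_forecast=[[10, 5], [12, 6], [9, 4]],
    capacity=100.0,
    beta=[1.0, 1.0],
    lead_time=[1, 2],
)
```

The vector length is \( P + (K+1)P + P + R \), which is 12 for the example values.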

# 1.3 Actions (\(A_t\))
Actions within this environment vary by the strategy employed:

## Collaborative (DHRA):
a) **Ordering**: Decide the quantity of each product to order from the supplier (\( A_{p,rh}^{t+K+s} \)) based on estimated demand and supply.

b) **Transshipment Decision**: Decide whether to transfer excess inventory of a product to another hospital facing a shortage (\( AA_{p,hh'}^{ts} \)).

c) **Requesting Transshipment**: Request inventory from other hospitals when facing a shortage (\( AA_{p,h'}^{ts} \)).

## Centralized:
a) **Central Ordering and Allocation**: A single agent (decision-maker) determines the quantities of products to order for the entire network and allocates these resources accordingly (\( A_{p,rh}^{t+K+s}, A_{p,r'h'}^{t+K+s} \)).

## Decentralized:

a) **Individual Ordering**: Each hospital decides independently on the quantity of products to order from the supplier, with no transshipment between hospitals (\( A_{p,rh}^{t+K+s} \)).

where:

\[
\begin{align*}
A_{p,rh}^{ts} & \text{Portion of the product type } p \text{ that is provided through supplier } r \text{ for hospital } h \text{ in period } t \\
AA_{p,hh'}^{ts} & \text{Portion of the product type } p \text{ that is provided through hospital } h \text{ for hospital } h' \text{ in period } t
\end{align*}
\]
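
Environmental factor 6 (a healthcare unit cannot both receive and dispatch the same product within one period) restricts which transshipment actions \( AA \) are admissible. The following sketch checks a proposed transshipment plan for one product in one period; the function name and the nested-list layout `transship[h][g]` are assumptions made for illustration.

```python
def send_receive_offenders(transship):
    """Return hospitals that both send and receive in the same period.

    transship[h][g] is the quantity of one product sent from hospital h
    to hospital g in a single period (diagonal entries are ignored).
    """
    n = len(transship)
    offenders = []
    for h in range(n):
        sends = any(transship[h][g] > 0 for g in range(n) if g != h)
        receives = any(transship[g][h] > 0 for g in range(n) if g != h)
        if sends and receives:
            offenders.append(h)
    return offenders


# Hospitals 0 and 2 send to hospital 1: admissible.
ok_plan = [[0, 5, 0],
           [0, 0, 0],
           [0, 3, 0]]

# Hospital 1 receives from 0 and also sends to 2: not admissible.
bad_plan = [[0, 5, 0],
            [0, 0, 2],
            [0, 0, 0]]
```

A check like this could be used to mask or penalize inadmissible actions before they reach the environment.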

# 1.4 Reward (\( R(S_t, A_t) \))
The reward function is multi-objective, aiming to balance the trade-off between reducing costs (transportation, inventory holding, and ordering costs) and minimizing shortage and delivery time. The required notation is defined below:

**Parameters:**
\[
\begin{align*}
tr_{prh} & \text{Cost of transporting product type } p \text{ from supplier } r \text{ to hospital } h \\
th_{phh'} & \text{Cost of transporting product type } p \text{ from hospital } h \text{ to hospital } h' \\
Cl_{p,h} & \text{Inventory cost of product type } p \text{ in hospital } h
\end{align*}
\]

**Variables:**
\[
\begin{align*}
sh_{ts,p,h} & \text{Shortage of product type } p \text{ in hospital } h \text{ in period } t
\end{align*}
\]

The specific reward components include:

- **Cost Reduction**: Lowering logistics costs across the network, with variations in cost assignment based on the strategy (lateral transshipment costs in DHRA, centralized cost assignment in the centralized strategy, and individual costs in the decentralized strategy). The logistics costs include transportation costs (\( tr_{prh}, th_{phh'} \)) and holding costs (\( Cl_{p,h} \)).

- **Service Level Maximization**: Improving demand service levels by reducing the maximum shortage (\( sh_{ts,p,h} \)).

Note: The weighting between the rewards for individual agents and the overall system reward is a crucial aspect, especially in the collaborative strategy where lateral transshipment costs are considered.
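
One common way to turn the two reward components into a single scalar is a weighted sum. The sketch below assumes this scalarization, which the text does not prescribe; the weights `w_cost` and `w_short` and the function name `scalar_reward` are hypothetical, and delivery time is left out for brevity.

```python
def scalar_reward(transport_cost, holding_cost, ordering_cost, shortages,
                  w_cost=1.0, w_short=10.0):
    """Negative weighted sum of total logistics cost and worst shortage.

    shortages: total shortage per hospital for the period.
    w_cost, w_short: hypothetical trade-off weights to be tuned.
    """
    total_cost = transport_cost + holding_cost + ordering_cost
    worst_shortage = max(shortages)
    return -(w_cost * total_cost + w_short * worst_shortage)


# Illustrative values: cost 200 and worst shortage 4.
r = scalar_reward(120.0, 30.0, 50.0, shortages=[0, 4, 1])
```

Raising `w_short` relative to `w_cost` pushes the agent toward higher service levels at the expense of logistics cost.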

## Transitions
The state transitions are governed by the dynamics of supply and demand. When an order is placed, the inventory levels change based on the lead time and the received supplies. Demand impacts the inventory as it is fulfilled or unfulfilled.
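
A minimal sketch of this transition for a single product at a single hospital is given below, assuming lost sales (no backorders) and a fixed integer lead time. The function name `simulate` and the use of a queue to model orders in transit are illustrative assumptions.

```python
from collections import deque


def simulate(initial_inventory, orders, demands, lead_time):
    """Roll inventory forward; return (inventory, shortage) per period.

    Orders placed in period t arrive in period t + lead_time.
    Demand that cannot be served is lost and recorded as a shortage.
    """
    pipeline = deque([0] * lead_time)  # orders still in transit
    inventory = initial_inventory
    history = []
    for order, demand in zip(orders, demands):
        pipeline.append(order)
        arrivals = pipeline.popleft()
        available = inventory + arrivals
        served = min(available, demand)
        shortage = demand - served
        inventory = available - served
        history.append((inventory, shortage))
    return history


history = simulate(initial_inventory=10, orders=[20, 0, 0],
                   demands=[8, 25, 5], lead_time=1)
```

In this example the order of 20 units arrives one period late, so the second period ends with a shortage of 3 and the third with a shortage of 5.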

**Related Parameters:**
\[
\begin{align*}
\epsilon_p & \text{Unusable portion of product type } p \\
\pi_s & \text{Probability of disruption scenario } s \\
IRS_{ts,p,r} & \text{Inventory amount of product type } p \text{ in supplier } r \text{ in period } t \\
ORP_{p,h} & \text{Ordering cost of product } p \text{ for hospital } h
\end{align*}
\]

**Decision Variables:**
\[
\begin{align*}
y_{s,t,h} & \text{Binary variable; 1 if hospital } h \text{ transships to another hospital in period } t \text{ under scenario } s \\
ORD_{p,rh}^{t,s} & \text{Binary variable; 1 if hospital } h \text{ orders product } p \text{ from supplier } r \text{ in period } t \text{, 0 otherwise}
\end{align*}
\]

## Objective Function 1: Logistics Costs
The objective function (1) seeks to minimize the overall cost of the proposed network. This includes the cost of transporting resources from the suppliers to hospitals, transshipment costs between the hospitals, ordering cost, and inventory costs.

\[
\begin{align*}
\min Z_1 &= \sum_{prht} tr_{prh} \times A_{p,rh}^{ts} + \sum_{phh't} th_{phh'} \times AA_{p,hh'}^{ts} + \sum_{prht} ORD_{p,rh}^{t,s} \times ORP_{p,h} \\
&+ \sum_{pht} Inv_{ts,p,h} \times Cl_{p,h}
\end{align*}
\]
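
To make the four cost terms concrete, the sketch below evaluates \( Z_1 \) for a single scenario from dictionaries keyed by index tuples. The function name, the dictionary layout, and the numbers are assumptions for illustration; the ordering cost is indexed by \( (p, h) \), following the parameter list.

```python
def logistics_cost(tr, th, orp, cl, A, AA, ORD, Inv):
    """Evaluate Z1 for one scenario.

    Costs:     tr[p, r, h], th[p, h, g], orp[p, h], cl[p, h]
    Decisions: A[p, r, h, t], AA[p, h, g, t], ORD[p, r, h, t] (0 or 1)
    Levels:    Inv[p, h, t]
    """
    supply = sum(tr[p, r, h] * q for (p, r, h, t), q in A.items())
    transship = sum(th[p, h, g] * q for (p, h, g, t), q in AA.items())
    ordering = sum(orp[p, h] * flag for (p, r, h, t), flag in ORD.items())
    holding = sum(cl[p, h] * level for (p, h, t), level in Inv.items())
    return supply + transship + ordering + holding


# One product, one supplier, two hospitals, one period (illustrative).
tr = {(0, 0, 0): 2.0, (0, 0, 1): 3.0}
th = {(0, 0, 1): 1.0, (0, 1, 0): 1.0}
orp = {(0, 0): 5.0, (0, 1): 5.0}
cl = {(0, 0): 0.5, (0, 1): 0.5}
A = {(0, 0, 0, 0): 10, (0, 0, 1, 0): 0}
AA = {(0, 0, 1, 0): 4}
ORD = {(0, 0, 0, 0): 1, (0, 0, 1, 0): 0}
Inv = {(0, 0, 0): 6, (0, 1, 0): 2}

z1 = logistics_cost(tr, th, orp, cl, A, AA, ORD, Inv)
```

Here the supply, transshipment, ordering, and holding terms are 20, 4, 5, and 4 respectively, giving a total of 33.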

## Objective Function 2: Fairness/Equity
Objective function (2) optimizes the service level by minimizing the maximum total shortage across hospitals, which promotes a balanced distribution of deliveries.

\[
\min Z_2 = \max_h \sum_{pt} sh_{ts,p,h}
\]
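
The min-max structure of \( Z_2 \) can be illustrated by first summing shortages over products and periods for each hospital and then taking the largest total. The function name and the dictionary layout `sh[p, h, t]` are assumptions made for this sketch.

```python
from collections import defaultdict


def max_total_shortage(sh):
    """Return max over hospitals of the shortage summed over p and t.

    sh[p, h, t] is the shortage of product p at hospital h in period t.
    """
    totals = defaultdict(float)
    for (p, h, t), qty in sh.items():
        totals[h] += qty
    return max(totals.values())


# Hospital 0 is short 3 units in total, hospital 1 is short 5.
sh = {(0, 0, 0): 2, (0, 0, 1): 1, (0, 1, 0): 0, (0, 1, 1): 5}
z2 = max_total_shortage(sh)
```

Minimizing this quantity targets the worst-served hospital rather than the network average.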

## Constraint 3: Capacity constraint in hospital \( h \)
The amount of supply that hospital \( h \) can hold, measured by its inventory, must be less than or equal to the maximum available capacity at hospital \( h \).

\[
Inv_{ts,p,h} \leq C_{a,s,h} \times \beta_{p,h}
\]

## Constraints 4, 5, 6: Supplier balance
Constraints 4 and 5 together require that the inventory of supplier \( r \) in period \( t-LeadTime \) equals the amount of product sent to all hospitals in period \( t \), scaled by \( (1 + \epsilon_p) \). \( LeadTime \) is the time the supplier needs to send the product to hospitals (it can be taken as 0). Constraint 6 ensures that a product is transferred to a hospital only if it was ordered in period \( t-LeadTime \); \( M \) is a sufficiently large constant.

\[
\begin{align*}
IRS_{ts,p,r}^{t-LeadTime} &\geq \sum_h A_{p,rh}^{ts} \times (1 + \epsilon_p) \\
IRS_{ts,p,r}^{t-LeadTime} &\leq \sum_h A_{p,rh}^{ts} \times (1 + \epsilon_p) \\
A_{p,rh}^{ts} &\leq ORD_{p,rh}^{t-LeadTime,s} \times M
\end{align*}
\]
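
The two inequalities on \( IRS \) act as a single equality, and the third is a standard big-M link between shipment and order. The sketch below checks both for one supplier and one product; the function names, the tolerance, and the value of `big_m` are assumptions chosen for illustration.

```python
def supplier_balance_ok(irs, shipments, epsilon, tol=1e-9):
    """Constraints 4 and 5 together: IRS equals sum of shipments * (1 + epsilon)."""
    return abs(irs - sum(shipments) * (1 + epsilon)) <= tol


def shipment_requires_order(shipment, ordered, big_m=10**6):
    """Constraint 6: a positive shipment is allowed only if ordered == 1."""
    return shipment <= ordered * big_m


# Supplier holds 110 units; 100 are shipped and 10% is unusable.
balanced = supplier_balance_ok(110.0, [60, 40], epsilon=0.1)
```

The tolerance guards against floating-point rounding when comparing the two sides of the balance.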

## Constraints 7 and 8: Hospital inventory balance
The inventory of product type \( p \) in each hospital in period \( t \) is calculated as the inventory in period \( t-1 \), plus the shortage, plus the products sent by the suppliers after \( LeadTime \), minus the portion of demand to be fulfilled within the hospital, minus the part that will be transferred to other hospitals. The second constraint permits transshipment from hospital \( h \) only when \( y_{s,t,h} = 1 \), where \( M \) is a sufficiently large constant.

\[
\begin{align*}
Inv_{ts,p,h} &= Inv_{(t-1)s,p,h} + sh_{ts,p,h} + \sum_r A_{p,rh}^{ts} - D_{p,h}^{t} - \sum_{h'} AA_{p,hh'}^{ts} \\
AA_{p,hh'}^{ts} &\leq M \times y_{s,t,h}
\end{align*}
\]

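## Constraint 9: Coverage distance

Transshipment from hospital \( h \) to hospital \( h' \) is feasible only if the distance between them, \( dis_{hh'} \), does not exceed the coverage distance \( COV \).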

\[
y_{s,t,h} \times dis_{hh'} \leq COV
\]
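
In practice the coverage rule can be precomputed as a boolean mask over hospital pairs. The sketch below assumes a symmetric distance matrix and a single coverage threshold; the function name `feasible_links` and the distances are illustrative.

```python
def feasible_links(dis, cov):
    """mask[h][g] is True when h may transship to g (g != h, dis[h][g] <= cov)."""
    n = len(dis)
    return [[h != g and dis[h][g] <= cov for g in range(n)] for h in range(n)]


# Distances between three hospitals (illustrative units).
dis = [[0, 12, 40],
       [12, 0, 25],
       [40, 25, 0]]
mask = feasible_links(dis, cov=30)
```

With a coverage distance of 30, hospitals 0 and 2 cannot exchange products directly, while hospital 1 can exchange with both.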