<a href="https://colab.research.google.com/github/alirezakavianifar/RL-DeltaIoT/blob/main/Samin.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The main novelty of the article "Decision-making Under Uncertainty: Be Aware of Your Priorities" lies in the introduction of the Pri-AwaRE architecture. This architecture uses an extended form of the Multi-Reward Partially Observable Markov Decision Process (MR-POMDP++), integrated into the MAPE-K loop, to support priority-aware decision-making in self-adaptive systems (SASs). The key contributions include:

1. **Priority-aware Decision-Making**: The architecture models and reasons about the priorities of individual non-functional requirements (NFRs) using a vector-valued reward function. This allows the system to re-evaluate and adjust priorities based on new knowledge acquired during runtime.

2. **Autonomous Tuning of Priorities**: The system provides a method for maintaining compliance with requirements by autonomously tuning NFR priorities in response to uncertain environmental contexts.

3. **Experimental Validation**: The approach is validated through experiments in the networking and IoT domains, demonstrating that the Pri-AwaRE architecture leads to better satisfaction of NFRs through more informed priority choices.

The article extends the Multi-Reward Partially Observable Markov Decision Process (MR-POMDP) by introducing the MR-POMDP++ framework. The enhancements include:

1. **Vector-valued Reward Function**: Like a traditional MR-POMDP, MR-POMDP++ represents the rewards for multiple objectives as a vector. The difference is that MR-POMDP++ also models the priorities of the non-functional requirements (NFRs) at runtime, so the reward vector feeds into decision-making under priorities that can change.

2. **Alpha-Matrix Representation**: In MR-POMDP++, each element in the alpha vector is itself a vector, resulting in an alpha matrix. This matrix provides a more detailed representation of the value function, capturing the multi-objective nature of the problem more effectively.

3. **Scalarization Function**: The framework uses a scalarization function to select the best policy among multiple optimal policies based on the different priorities of the objectives. This function combines the value vectors with weights corresponding to the objectives, allowing the system to adapt the priorities of these objectives at runtime using the Optimistic Linear Support (OLS) algorithm.

4. **Runtime Autonomous Tuning**: The architecture enables autonomous tuning of NFR priorities during runtime, ensuring that the system can adapt to changing environmental contexts and maintain compliance with requirements.
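The scalarization step in item 3 can be sketched in a few lines of Python. This is a minimal illustration that assumes a linear (weighted-sum) scalarization; the function names `scalarize` and `best_policy` and the two policy value vectors are hypothetical and do not come from the article.

```python
from typing import Dict, Sequence


def scalarize(value_vector: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization: weighted sum of the per-objective values."""
    if len(value_vector) != len(weights):
        raise ValueError("value vector and weights must have the same length")
    return sum(v * w for v, w in zip(value_vector, weights))


def best_policy(policy_values: Dict[str, Sequence[float]],
                weights: Sequence[float]) -> str:
    """Return the name of the policy with the highest scalarized value."""
    return max(policy_values,
               key=lambda name: scalarize(policy_values[name], weights))


# Hypothetical value vectors, ordered as (objective 1, objective 2).
policy_values = {
    "policy_A": (4.0, 1.0),
    "policy_B": (1.0, 5.0),
}

# Changing the weights changes which policy is preferred.
print(best_policy(policy_values, (0.8, 0.2)))
print(best_policy(policy_values, (0.2, 0.8)))
```

The same set of candidate policies is reused for both calls; only the weight vector changes, which is the sense in which priorities can be re-tuned without recomputing the policies.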

A simplified example illustrates the differences between MR-POMDPs and MR-POMDP++.

### Example Scenario
Imagine a self-adaptive Internet of Things (IoT) network in a smart home. The system has two main non-functional requirements (NFRs):
1. **Energy Efficiency (EE)**: The system should minimize energy consumption.
2. **Packet Delivery Ratio (PDR)**: The system should maximize the successful delivery of packets.

The environment is dynamic and can vary between low and high interference conditions, which affect both EE and PDR.

### MR-POMDP
In a traditional MR-POMDP:
1. **State Representation**: The states might represent different levels of interference (e.g., low, medium, high).
2. **Action Set**: Actions could include adjusting the transmission power (e.g., low power, medium power, high power).
3. **Reward Function**: The rewards for each action in each state are represented as a vector. For instance, increasing transmission power might have the reward vector [EE = -1 (high energy consumption), PDR = 2 (high packet delivery)].

Here’s how MR-POMDP would handle it:
- **Decision-Making**: The system uses these vectors to decide the best action by looking at the trade-offs between EE and PDR. It will calculate a policy that maximizes the cumulative reward for both objectives.
- **Static Priorities**: The priorities (weights) for EE and PDR are fixed and determined at design time. For example, it might prioritize PDR slightly higher than EE.
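To make "cumulative reward for both objectives" with fixed weights concrete, here is a rough Python sketch. It is a simplification under stated assumptions: rewards are keyed by action only (the text keys them by state and action), the `low_power` reward vector and the design-time weights are hypothetical, and the discount factor is chosen only to keep the arithmetic easy to follow.

```python
from typing import Sequence, Tuple

# Reward vectors ordered as (EE, PDR). The "high_power" vector is the one
# quoted in the text; the "low_power" vector is a hypothetical value.
REWARDS = {
    "high_power": (-1.0, 2.0),
    "low_power": (1.0, 1.0),  # hypothetical
}

# Hypothetical design-time weights that favor PDR slightly over EE.
DESIGN_TIME_WEIGHTS = (0.45, 0.55)


def discounted_return(actions: Sequence[str],
                      gamma: float = 0.9) -> Tuple[float, float]:
    """Accumulate the discounted reward separately for each objective."""
    total_ee, total_pdr = 0.0, 0.0
    for t, action in enumerate(actions):
        ee, pdr = REWARDS[action]
        total_ee += (gamma ** t) * ee
        total_pdr += (gamma ** t) * pdr
    return total_ee, total_pdr


def fixed_weight_value(actions: Sequence[str], gamma: float = 0.9) -> float:
    """Collapse the vector return to a scalar using the fixed weights."""
    ee, pdr = discounted_return(actions, gamma)
    w_ee, w_pdr = DESIGN_TIME_WEIGHTS
    return w_ee * ee + w_pdr * pdr


plan = ["high_power", "low_power"]
print(discounted_return(plan, gamma=0.5))
print(fixed_weight_value(plan, gamma=0.5))
```

Because `DESIGN_TIME_WEIGHTS` is a module-level constant, nothing in this sketch can change the trade-off at runtime, which mirrors the static-priority behavior described above.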

### MR-POMDP++
In MR-POMDP++:
1. **Enhanced State Representation**: Similar to MR-POMDP, but with additional mechanisms to capture runtime information.
2. **Dynamic Reward Adjustment**: The system dynamically adjusts the priorities of EE and PDR based on the current context and historical data.

Here’s how MR-POMDP++ would handle it:
- **Priority-Aware Decision-Making**: The system uses a vector-valued reward function, but it also incorporates a mechanism to adjust the priorities of EE and PDR dynamically. For example, if the system detects that the battery level is low, it might increase the priority of EE.
- **Alpha-Matrix**: Each alpha vector becomes an alpha matrix, because every element of the vector is itself a vector with one value per objective. This lets the value function keep the objectives separate until the current NFR priorities are applied.
- **Runtime Autonomous Tuning**: If the interference level changes from low to high, MR-POMDP++ can autonomously adjust the weights given to EE and PDR. For instance, during high interference, it might decide that maintaining a higher PDR is more critical even if it means higher energy consumption.

### Detailed Example
- **State**: `s1` (low interference), `s2` (high interference)
- **Action**: `a1` (low power), `a2` (high power)

#### MR-POMDP:
- **Rewards for `a1` in `s1`**: [EE = 2, PDR = 1]
- **Rewards for `a2` in `s1`**: [EE = -1, PDR = 3]
- **Fixed Priority**: Priorities might be set as [EE: 0.4, PDR: 0.6]

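**Policy Decision**: With these fixed weights, the scalarized rewards in `s1` are 0.4 × 2 + 0.6 × 1 = 1.4 for `a1` and 0.4 × (-1) + 0.6 × 3 = 1.4 for `a2`, an exact tie. Any PDR weight above 0.6 tips the choice to `a2` (high power), and because the weights never change at runtime, the resulting choice stays the same regardless of context.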

#### MR-POMDP++:
- **Rewards for `a1` in `s1`**: [EE = 2, PDR = 1]
- **Rewards for `a2` in `s1`**: [EE = -1, PDR = 3]
- **Dynamic Priority Adjustment**: If the battery level is low, the system adjusts the priorities dynamically to [EE: 0.7, PDR: 0.3]

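**Policy Decision**: Depending on the battery level:
- **Normal Battery**: Choose `a2` (high power) to maximize PDR, provided the weights favor PDR strongly enough (a PDR weight above 0.6 for these rewards).
- **Low Battery**: Choose `a1` (low power) to conserve energy. With weights [EE: 0.7, PDR: 0.3], `a1` scores 0.7 × 2 + 0.3 × 1 = 1.7 against 0.7 × (-1) + 0.3 × 3 = 0.2 for `a2`.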
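The battery-dependent choice can be sketched as follows. This is a one-step illustration, not a POMDP solver: the reward vectors and the low-battery weights come from the example, while the normal-battery weights, the battery threshold, and the function names are assumptions made for this sketch.

```python
from typing import Tuple

# Reward vectors in state s1, ordered as (EE, PDR), from the example.
REWARDS_S1 = {"a1": (2.0, 1.0), "a2": (-1.0, 3.0)}

LOW_BATTERY_WEIGHTS = (0.7, 0.3)     # from the example
NORMAL_BATTERY_WEIGHTS = (0.3, 0.7)  # hypothetical PDR-leaning weights


def select_weights(battery_level: float,
                   threshold: float = 0.2) -> Tuple[float, float]:
    """Pick the priority vector for a battery level in the range 0.0 to 1.0.

    The threshold is a hypothetical cut-off for "low battery".
    """
    if battery_level < threshold:
        return LOW_BATTERY_WEIGHTS
    return NORMAL_BATTERY_WEIGHTS


def choose_action(battery_level: float) -> str:
    """Return the action with the highest weighted one-step reward in s1."""
    w_ee, w_pdr = select_weights(battery_level)
    scores = {
        action: w_ee * ee + w_pdr * pdr
        for action, (ee, pdr) in REWARDS_S1.items()
    }
    return max(scores, key=scores.get)


print(choose_action(0.9))  # normal battery
print(choose_action(0.1))  # low battery
```

The rewards stay fixed across both calls; only the priority vector returned by `select_weights` changes, and that alone is enough to switch the selected action.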

### Key Differences
- **MR-POMDP**: Uses fixed priorities and combines rewards into a single scalar value for decision-making.
- **MR-POMDP++**: Allows for dynamic adjustment of priorities based on runtime conditions and uses a more sophisticated alpha-matrix to represent the value functions for decision-making, supporting autonomous adaptation to changing environments.

This enhanced capability allows MR-POMDP++ to better handle uncertainty and variability in the environment, providing more robust and context-aware decision-making for self-adaptive systems.

An alpha matrix in the context of the MR-POMDP++ framework is an advanced representation of the value function used for decision-making under uncertainty with multiple objectives. This matrix extends the traditional alpha vectors used in POMDPs to handle multiple reward components and dynamically adjust priorities during runtime.

### Traditional Alpha Vectors
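In a standard POMDP, the value function is represented by a set of alpha vectors. Each alpha vector corresponds to a conditional plan and holds one entry per state: the expected return of following that plan when starting in that state. The value of a belief state is the largest dot product between the belief and any alpha vector in the set.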
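As a rough sketch of how a set of alpha vectors is evaluated in the standard formulation, the value of a belief is the maximum dot product between the belief and the alpha vectors. The alpha vectors and beliefs below are hypothetical numbers over two states, and `belief_value` is an illustrative helper name.

```python
from typing import Sequence, Tuple


def belief_value(belief: Sequence[float],
                 alpha_vectors: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Return (value, index of the maximizing alpha vector) for a belief."""
    if not alpha_vectors:
        raise ValueError("at least one alpha vector is required")
    best_index = 0
    best_value = float("-inf")
    for index, alpha in enumerate(alpha_vectors):
        # Expected return of this alpha vector's plan under the belief.
        value = sum(b * a for b, a in zip(belief, alpha))
        if value > best_value:
            best_value, best_index = value, index
    return best_value, best_index


# Hypothetical alpha vectors; each has one entry per state (s1, s2).
alpha_vectors = [
    (2.0, 0.0),
    (0.0, 3.0),
    (1.5, 1.5),
]

# Beliefs are probability distributions over (s1, s2).
for belief in [(0.9, 0.1), (0.2, 0.8), (0.6, 0.4)]:
    print(belief, belief_value(belief, alpha_vectors))
```

Different beliefs select different alpha vectors, so the set as a whole describes a piecewise-linear value function over the belief space.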

### Alpha Matrix in MR-POMDP++
In MR-POMDP++, each element of the alpha vector is itself a vector, resulting in an alpha matrix. This structure allows the system to account for multiple objectives and dynamically adjust their priorities based on the current context.

### Example Scenario
Let's consider a simplified example with the following components:

- **States**: \(s1\), \(s2\) (e.g., low and high interference levels in a network).
- **Actions**: \(a1\), \(a2\) (e.g., low power and high power transmission).
- **Objectives**: EE (Energy Efficiency) and PDR (Packet Delivery Ratio).

#### Traditional MR-POMDP Alpha Vectors
In a traditional MR-POMDP, you might have reward vectors for each action in each state, such as:
- For \(a1\) in \(s1\): \([EE = 2, PDR = 1]\)
- For \(a2\) in \(s1\): \([EE = -1, PDR = 3]\)

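Collecting only the EE components of these rewards in \(s1\) gives one entry per action. This is not yet a full alpha vector, which would hold one entry per state for a fixed plan, but it shows the single-objective view:
\[ r^{EE}_{s1} = [2, -1] \]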

#### MR-POMDP++ Alpha Matrix
In MR-POMDP++, the alpha matrix extends this by incorporating priority-aware adjustments. Suppose we have a priority vector \([w_{EE}, w_{PDR}]\) that adjusts based on runtime conditions.

1. **Alpha Vector for State \(s1\) with Action \(a1\)**:
\[ \alpha_{s1, a1} =
\begin{bmatrix}
2 \\
1
\end{bmatrix}
\]

2. **Alpha Vector for State \(s1\) with Action \(a2\)**:
\[ \alpha_{s1, a2} =
\begin{bmatrix}
-1 \\
3
\end{bmatrix}
\]

These vectors form part of the alpha matrix for state \(s1\):
\[ \alpha_{s1} =
\begin{bmatrix}
2 & -1 \\
1 & 3
\end{bmatrix}
\]

### Dynamic Adjustment
If the system detects a low battery, it might adjust the priority vector to \([w_{EE} = 0.7, w_{PDR} = 0.3]\). The scalarization function combines these weights with the alpha matrix to determine the best action:

For state \(s1\), each column of \(\alpha_{s1}\) holds the values of one action, so the priority vector is applied to the transposed matrix:
\[ \alpha_{s1}^{\top} \cdot \text{priority vector} =
\begin{bmatrix}
2 & 1 \\
-1 & 3
\end{bmatrix}
\cdot
\begin{bmatrix}
0.7 \\
0.3
\end{bmatrix}
=
\begin{bmatrix}
(2 \cdot 0.7) + (1 \cdot 0.3) \\
(-1 \cdot 0.7) + (3 \cdot 0.3)
\end{bmatrix}
=
\begin{bmatrix}
1.4 + 0.3 \\
-0.7 + 0.9
\end{bmatrix}
=
\begin{bmatrix}
1.7 \\
0.2
\end{bmatrix}
\]

The first entry is the scalarized value of \(a1\) and the second is that of \(a2\), so under the low-battery priorities the system selects \(a1\) (low power). With different weights the ranking can flip, which is how the decision adapts to the current priorities, in contrast to the static weighting of traditional MR-POMDPs.
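The scalarization of the alpha matrix can also be sketched in plain Python. The matrix is the one from the example (rows are objectives, columns are actions); each action's score is the dot product of its column with the priority vector. The helper names `scalarize_columns` and `best_action` are hypothetical.

```python
from typing import List, Sequence

# Alpha matrix for s1 from the example: rows are objectives (EE, PDR),
# columns are actions (a1, a2).
ALPHA_S1 = [
    [2.0, -1.0],  # EE
    [1.0, 3.0],   # PDR
]
ACTIONS = ["a1", "a2"]


def scalarize_columns(matrix: Sequence[Sequence[float]],
                      weights: Sequence[float]) -> List[float]:
    """Weight each column (one action's per-objective values) by the priorities."""
    if len(matrix) != len(weights):
        raise ValueError("need exactly one weight per objective (matrix row)")
    n_cols = len(matrix[0])
    return [
        sum(weights[row] * matrix[row][col] for row in range(len(weights)))
        for col in range(n_cols)
    ]


def best_action(matrix: Sequence[Sequence[float]],
                weights: Sequence[float],
                actions: Sequence[str]) -> str:
    """Return the action whose scalarized value is largest."""
    scores = scalarize_columns(matrix, weights)
    return actions[max(range(len(scores)), key=scores.__getitem__)]


low_battery = (0.7, 0.3)  # priority vector [w_EE, w_PDR] from the example
print(dict(zip(ACTIONS, scalarize_columns(ALPHA_S1, low_battery))))
print(best_action(ALPHA_S1, low_battery, ACTIONS))
```

Keeping the matrix unscalarized until the last step is what allows a new priority vector to be applied at runtime without rebuilding the per-objective values.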

### Summary
The alpha matrix in MR-POMDP++ allows for:
- **Multi-objective Handling**: Managing multiple objectives simultaneously with dynamic adjustments.
- **Priority-Aware Decision-Making**: Adjusting the importance of different objectives based on runtime conditions.
- **Enhanced Value Representation**: Providing a more detailed and flexible representation of the value function, improving decision-making under uncertainty.