# RL25 Exam Template
This task is designed to evaluate your understanding of RL principles, your ability to implement algorithms, and your skill in analyzing and presenting results.
Please **read the instructions carefully** and make sure to follow each step in your submission.
## ------------------------------------

## Please fill in the notebook with your code. The notebook should be clean, well documented,
## and executable from start to finish without errors.
## ------------------------------------

### Student Name:
### Student ID:
### Date:

---

## Tips & reminders

- Choose an algorithm appropriate for the task.
- Focus on clear, readable code and meaningful comments.
- If needed, use environment wrappers, logging, and seeding wisely.
- Save and test your agent after training to ensure reproducibility.
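Seeding wisely means fixing every source of randomness before training. Below is a minimal sketch, assuming you use Python's `random` module and NumPy. The helper name `set_global_seeds` is hypothetical. If you use PyTorch, also call `torch.manual_seed(seed)`. The environment itself is seeded separately through `env.reset(seed=...)`.

```python
import random

import numpy as np


def set_global_seeds(seed: int) -> np.random.Generator:
    """Seed Python's and NumPy's global RNGs and return a dedicated NumPy generator."""
    random.seed(seed)                   # Python's built-in RNG (e.g. random.sample)
    np.random.seed(seed)                # NumPy's legacy global RNG
    return np.random.default_rng(seed)  # preferred: pass this generator around explicitly


rng = set_global_seeds(42)
# Seed the environment separately, e.g. env.reset(seed=42) on the first reset and
# env.action_space.seed(42) if you sample random actions from the action space.
```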

Good luck!

---

# Environment: inverted pendulum

The environment for this task is the inverted pendulum (`Pendulum-v1`) from [Gymnasium](https://gymnasium.farama.org/), shown in the GIF below.

![](https://miro.medium.com/max/1000/1*TNo3x9zDi1lVOH_3ncG7Aw.gif)

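To work with this environment, use the `gymnasium` library. Install it in your preferred Python environment with:

```pip install gymnasium```

Rendering the pendulum in a window additionally requires `pygame`, which is included in the classic-control extra: `pip install "gymnasium[classic-control]"`.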

Note that episodes in this environment end with a truncation, not a termination: `Pendulum-v1` has no terminal state and is cut off by a time limit after 200 steps. For this task, assume that truncation and termination are the same, i.e., wherever a method requires a terminal state, use the state at which the episode is truncated instead. In general, termination and truncation are different things, because the time limit is not part of the MDP but a constraint imposed from outside. More on this [here](https://gymnasium.farama.org/tutorials/gymnasium_basics/handling_time_limits/).
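The convention above matters wherever a method bootstraps, for example in the one-step temporal-difference target $y = r + \gamma (1 - d) V(s')$. In this sketch, `td_target` is a hypothetical helper. The flag `truncation_is_terminal` switches between the convention required for this task ($d$ = terminated or truncated) and the general treatment ($d$ = terminated only, so a time-limit cut still bootstraps from $V(s')$).

```python
def td_target(reward: float, next_value: float, terminated: bool, truncated: bool,
              gamma: float = 0.99, truncation_is_terminal: bool = True) -> float:
    """One-step TD target r + gamma * (1 - done) * V(s')."""
    if truncation_is_terminal:
        done = terminated or truncated  # convention required for this task
    else:
        done = terminated               # general case: a time-limit cut still bootstraps
    return reward + gamma * (0.0 if done else next_value)
```

For `Pendulum-v1`, `terminated` is always `False`, so under the task's convention only the last transition of each 200-step episode is treated as terminal.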

Import the packages required for your implementation.

In [None]:
# YOUR CODE HERE

Check that the installation and imports work by simulating the environment with randomly sampled actions. A window with an animation of the pendulum should open, show the effect of some random actions, and close automatically.

In [None]:
# YOUR CODE HERE

The inverted pendulum has continuous state and action spaces.
Its state (observation) consists of three variables derived from the current angle $\theta$ and the angular velocity $\frac{\text{d}}{\text{d}t}\theta$:
$$
\begin{align*}
    x=\begin{bmatrix}
    \text{cos}(\theta)\\
    \text{sin}(\theta)\\
    \frac{\text{d}}{\text{d}t}\theta
    \end{bmatrix}
    \in
    \begin{bmatrix}
    [-1, 1]\\
    [-1, 1]\\
    [-8 \, \frac{1}{\text{s}}, 8 \, \frac{1}{\text{s}}]
    \end{bmatrix},
\end{align*}
$$
and one input variable which relates to the torque applied at the axis of rotation:

$$
\begin{align*}
    u = T \in [-2 \, \text{N}\cdot\text{m}, 2 \, \text{N}\cdot\text{m}].
\end{align*}
$$
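Two small helpers may be useful when working with these spaces. Since the observation contains $\cos(\theta)$ and $\sin(\theta)$ rather than $\theta$ itself, the angle can be recovered with the two-argument arctangent. If a policy network produces an unbounded output, a scaled $\tanh$ keeps the torque inside $[-2, 2]$ N·m. This is a minimal NumPy sketch; the function names and the tanh squashing are illustrative choices, not part of the environment (gymnasium's pendulum additionally clips the torque internally).

```python
import numpy as np

MAX_TORQUE = 2.0  # action bound from the environment description, in N·m


def angle_from_observation(obs: np.ndarray) -> float:
    """Recover theta in [-pi, pi] from the observation [cos(theta), sin(theta), theta_dot]."""
    cos_theta, sin_theta, _ = obs
    return float(np.arctan2(sin_theta, cos_theta))


def squash_to_torque(raw_action: np.ndarray) -> np.ndarray:
    """Map an unbounded network output to a torque in [-MAX_TORQUE, MAX_TORQUE] via tanh."""
    return MAX_TORQUE * np.tanh(raw_action)
```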

The goal is to swing the pendulum up into the upright position and keep it there, i.e., to reach the angle $\theta = 0$ with angular velocity $\frac{\text{d}}{\text{d}t}\theta=\omega=0$. The environment's reward function already encodes this objective:

$$
\begin{align*}
r = -(\theta^2 + 0.1 \cdot \omega^2 + 0.001 \cdot T^2),
\end{align*}
$$
where $\theta$ is wrapped into $[-\pi, \pi)$ before the cost is computed.
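To see how the reward behaves, it can be evaluated directly. The sketch below mirrors the formula above, including the wrapping of $\theta$ and the clipping of the torque to $[-2, 2]$ N·m; `angle_normalize` and `pendulum_reward` are hypothetical helpers written for illustration, not the library's API.

```python
import math


def angle_normalize(theta: float) -> float:
    """Wrap an angle into [-pi, pi)."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def pendulum_reward(theta: float, omega: float, torque: float, max_torque: float = 2.0) -> float:
    """r = -(theta^2 + 0.1 * omega^2 + 0.001 * T^2) with theta wrapped and T clipped."""
    torque = min(max(torque, -max_torque), max_torque)  # torque outside the bounds is clipped
    th = angle_normalize(theta)
    return -(th ** 2 + 0.1 * omega ** 2 + 0.001 * torque ** 2)
```

For $\theta = \pi$, $\omega = 8$ and $|T| = 2$ the cost is $\pi^2 + 6.4 + 0.004 \approx 16.27$, so per-step rewards lie roughly in $[-16.27, 0]$ and the best achievable episode return is $0$.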

For further information about the environment, refer to the documentation and source code of the Farama Foundation's `gymnasium`:

[Documentation of the gymnasium pendulum](https://gymnasium.farama.org/environments/classic_control/pendulum/)

[Pendulum environment in the gymnasium Github repository](https://github.com/Farama-Foundation/Gymnasium/blob/main/gymnasium/envs/classic_control/pendulum.py)

# Choose and implement an RL algorithm that is suitable for the given environment.
<font color="red">Note: Unlike in exercise 06, your agent must learn from interaction with the environment without discretization, i.e., it must be natively capable of handling continuous states and actions.</font>
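If you opt for an off-policy actor-critic method such as DDPG, TD3 or SAC (all of which handle continuous states and actions without discretization), you will need a replay buffer. The following is a minimal NumPy sketch; the class and method names are illustrative, not taken from any library. The stored `done` flag is where the task's convention of treating truncation as termination enters the learning targets.

```python
import numpy as np


class ReplayBuffer:
    """Fixed-size FIFO buffer of (s, a, r, s', done) transitions for off-policy learning."""

    def __init__(self, capacity: int, obs_dim: int = 3, act_dim: int = 1, seed: int = 0):
        self.capacity = capacity
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.actions = np.zeros((capacity, act_dim), dtype=np.float32)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=np.float32)  # 1.0 where the target must not bootstrap
        self.ptr = 0   # next slot to write
        self.size = 0  # number of valid transitions stored
        self.rng = np.random.default_rng(seed)

    def add(self, obs, action, reward, next_obs, done) -> None:
        self.obs[self.ptr] = obs
        self.actions[self.ptr] = action
        self.rewards[self.ptr] = reward
        self.next_obs[self.ptr] = next_obs
        self.dones[self.ptr] = float(done)
        self.ptr = (self.ptr + 1) % self.capacity  # wrap around: the oldest data is overwritten
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size: int):
        idx = self.rng.integers(0, self.size, size=batch_size)  # uniform, with replacement
        return (self.obs[idx], self.actions[idx], self.rewards[idx],
                self.next_obs[idx], self.dones[idx])
```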

In [None]:
# YOUR CODE HERE

Plot the learning curve of the training process. Clearly label the axes (e.g., episode vs. total reward per episode) and make the plot readable.
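Raw per-episode returns are noisy, so plotting them together with a moving average usually makes the trend easier to read. A minimal sketch with NumPy and matplotlib; the helper names and the default window of 20 episodes are arbitrary choices. matplotlib is imported inside the plotting function only so that the smoothing helper can be used on its own.

```python
import numpy as np


def moving_average(returns, window: int = 20) -> np.ndarray:
    """Trailing moving average; the first window-1 entries average over fewer episodes."""
    returns = np.asarray(returns, dtype=float)
    cumsum = np.concatenate(([0.0], np.cumsum(returns)))
    idx = np.arange(1, len(returns) + 1)   # end index (exclusive) of each window
    start = np.maximum(idx - window, 0)    # start index of each window
    return (cumsum[idx] - cumsum[start]) / (idx - start)


def plot_learning_curve(returns, window: int = 20) -> None:
    import matplotlib.pyplot as plt
    episodes = np.arange(1, len(returns) + 1)
    plt.figure(figsize=(8, 4))
    plt.plot(episodes, returns, alpha=0.3, label="episode return")
    plt.plot(episodes, moving_average(returns, window), label=f"{window}-episode moving average")
    plt.xlabel("Episode")
    plt.ylabel("Total reward per episode")
    plt.title("Learning curve on Pendulum-v1")
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.tight_layout()
    plt.show()
```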

In [None]:
# YOUR CODE HERE

## Evaluation
Test the learned policy on a statistically significant number of episodes and report its performance using representative metrics (e.g., mean and standard deviation of the episode return).
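One way to do this is to run the trained policy deterministically (without exploration noise) for, e.g., 100 episodes with different reset seeds, collect the episode returns, and summarize them. Because every `Pendulum-v1` episode lasts 200 steps, the returns are directly comparable. The sketch below uses only the standard library; `summarize_returns` is a hypothetical helper, and the confidence interval uses a normal approximation.

```python
import math
import statistics


def summarize_returns(returns, z: float = 1.96) -> dict:
    """Mean, sample std, min/max and a normal-approximation 95% CI of the mean return."""
    n = len(returns)
    mean = statistics.fmean(returns)
    std = statistics.stdev(returns) if n > 1 else 0.0
    half_width = z * std / math.sqrt(n)  # z * standard error of the mean
    return {
        "episodes": n,
        "mean": mean,
        "std": std,
        "min": min(returns),
        "max": max(returns),
        "ci95": (mean - half_width, mean + half_width),
    }
```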

In [None]:
# YOUR CODE HERE

# Save the learned policy to disk.

In [None]:
# YOUR CODE HERE

# Load the saved policy
Make sure that the saved policy can be loaded and evaluated successfully.
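A minimal save/load round trip, assuming the policy parameters are kept as a dict of NumPy arrays (e.g., for a hand-written network); `save_policy`, `load_policy` and the file name are hypothetical. With PyTorch, the usual equivalent is `torch.save(model.state_dict(), path)` followed by `model.load_state_dict(torch.load(path))`.

```python
import os

import numpy as np


def save_policy(params: dict, path: str) -> None:
    """Store a dict of NumPy weight arrays in a single compressed .npz file."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    np.savez_compressed(path, **params)


def load_policy(path: str) -> dict:
    """Load the arrays back into a plain dict; the file is closed afterwards."""
    with np.load(path) as data:
        return {name: data[name] for name in data.files}
```

Saving to, e.g., `agent/policy.npz` matches the `agent/` folder of the submission format. `np.savez_compressed` appends `.npz` when the file name lacks that extension, so keeping the extension explicit lets you load from exactly the same path.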

In [None]:
# YOUR CODE HERE

## What you need to submit

Your final submission must include the following **components**:

1. **Saved Agent**  
   - Save your trained agent's policy/model weights.  
   - Ensure the saved model can be loaded and evaluated again.

2. **Jupyter Notebook (`.ipynb`)**  
   - Submit a clean and well-documented notebook that contains:  
     - The code for training the agent  
     - Explanations of the implementation steps  
     - Hyperparameters and design choices  
     - Optional: visualizations or debugging outputs  
   - Make sure the notebook can be rerun from top to bottom without errors.

3. **Evaluation Video** (optional)  
   - Record a short video (30–90 seconds) showing the trained agent performing the task.  
   - You can use any screen-capture tool of your choice.

## ------------------------------------
## Submission format

Please submit a ZIP archive or a GitHub repository link that includes:

```
/YourProjectFolder
│
├── agent/                 # Saved agent file(s)
├── notebook.ipynb         # Your Jupyter notebook
├── evaluation_video.mp4   # (Optional) Short video of agent in action
```
## ------------------------------------

## Exam session

Each exam appointment will start with a **presentation and discussion session** on your individual submission. Be prepared to:

- **Explain your algorithm choice**: Why did you choose this specific RL algorithm?
- **Describe the implementation steps**: Walk through the key parts of your code.
- **Discuss your results**: Show the learning curve, explain the agent's behavior (based on useful quantitative metrics), and reflect on challenges or improvements.

A simple slide deck (e.g., using the LaTeX beamer class or similar) is recommended for clarity during your presentation.

An exam appointment takes about 45 minutes in total. The first part consists of your presentation and the discussion of your submission; a second part with general questions on the course content follows. For the latter, you are expected to be able to explain the main reinforcement learning concepts, including their key assumptions and limitations. During this general part, you are strongly encouraged to support your explanations with key equations, diagram sketches, and pseudocode summaries.