# RO47019: Intelligent Control Systems Practical Assignment
* Period: 2024-2025, Q4
* Course homepage: https://brightspace.tudelft.nl/d2l/home/682445
* Instructor: Cosimo Della Santina (C.DellaSantina@tudelft.nl)
* Teaching assistant: Niels Stienen (N.L.Stienen@student.tudelft.nl)
* (c) TU Delft, 2025

Make sure you fill in any place that says `YOUR CODE HERE` or `YOUR ANSWER HERE` and remove `raise NotImplementedError()` afterwards. Moreover, if you see an empty cell, please **do not** delete it, instead run that cell as you would run all other cells. Finally, please **do not** add any extra cells to this notebook or change the existing cells unless you are explicitly asked to do so.

Please fill in your name(s) and other required details below:

In [1]:
# Please fill in your names, student numbers, netID, and emails below.
STUDENT_1_NAME = "Nicola Visentin"
STUDENT_1_STUDENT_NUMBER = "6354815"
STUDENT_1_NETID = "nvisentin"
STUDENT_1_EMAIL = "N.Visentin@student.tudelft.nl"

In [2]:
# Note: this block is a check that you have filled in the above information.
# It will throw an AssertionError until all fields are filled
assert STUDENT_1_NAME != ""
assert STUDENT_1_STUDENT_NUMBER != ""
assert STUDENT_1_NETID != ""
assert STUDENT_1_EMAIL != ""

### General announcements

* Do *not* share your solutions (also after the course is finished), and do *not* copy solutions from others. By submitting your solutions, you claim that you alone are responsible for this code.

* Please post your questions regarding this assignment in the correct support forum on Brightspace; this way everybody can benefit from the response. Please note that it is **not** allowed to post any code relating to solution attempts. If you have a particular question that you want to ask directly, please use the scheduled Q&A hours to ask the TA or, if that is not possible, send an email to the instructor or TA.

* This notebook will have in various places a line that throws a `NotImplementedError` exception. These are locations where the assignment requires you to adapt the code! These lines are just there as a reminder for you that you have not yet adapted that particular piece of code, especially when you execute all the cells. Once your solution code has replaced these lines, it should accordingly *not* throw any exceptions anymore.

* This [Jupyter notebook](https://jupyter.org/) uses `nbgrader` to help us with automated tests. `nbgrader` will make various cells in this notebook "uneditable" or "unremovable" and gives them a special id in the cell metadata. This way, when we run our checks, the system will check the existence of the cell ids and verify the number of points and which checks must be run. While there are ways that you can edit the metadata and work around the restrictions to delete or modify these special cells, you should not do that since then our nbgrader backend will not be able to parse your notebook and give you points for the assignment. 

* Please note that the above mentioned _read-only_ protection only works in Jupyter Notebook, and it does not work if you open this notebook in another editor (e.g., VSCode, PyCharm, etc.). Therefore, we recommend that you only use Jupyter Notebook for this course. If you use any other editor, you may accidentally delete cells, modify the tests, etc., which would cause you to lose points.

* If you edit a function that is imported in another notebook, you need to **restart the kernel** of the notebook where you are using the function. Otherwise, the changes will not be effective.

* **IMPORTANT**: Please make sure that your code executes without any errors before submitting the notebook. An easy way to ensure this is to use the validation script as described in the README.

# Task 3h - Open questions (13p)
**Authors:** Lorenzo Lyons (L.Lyons@tudelft.nl)

In this last part of Problem 3, we will try to put together the insights learned in the previous tasks and answer some questions on how GPs can be used in robotics.

*Note:* To answer these questions, you are encouraged to go back to the previous tasks and put your theories to the test by temporarily modifying the code. (The final version of your code should still answer the questions of the previous tasks.)

## Task 3h-1 Behavioural cloning of torques, also using angular velocities (3p)

In task 3f we only used the configuration of the robot as an input to the GP model. By adding the variance minimization term, we obtained good performance when starting from the same initial condition as the training dataset.
How would the system behave if the initial condition was perturbed slightly (feel free to try this out in the code)?
Would it be possible to also provide the angular velocity as an input to the GP model? How do you expect the behavior of the controller to change?

In [3]:
# If the initial condition is perturbed, we do not expect the GP controller to be able to track the reference anymore. As already
# discussed, this is a "behavioural cloning" controller: the GP only learns to copy what another controller (the PD in this case) did
# in that *specific* situation. If this "situation" changes (e.g., a different initial state), the controller no longer works properly.
# Consider it this way: at each instant, the PD controller knows the current joint angles and angular velocities as well as the
# reference ones, and it generates a torque that tries to bring the error to zero. This means that it may act differently even in the
# same configuration it visited a few seconds earlier. The GP-based controller, instead, always produces the same torque in the same
# configuration, because that is what it was trained to do: it only knows "where it is", not where the reference is or how fast the
# links are currently moving. Moreover, when the configuration is far from those seen during training, the GP prediction reverts to
# its zero prior mean, so the controller outputs (almost) zero torque.
# By adding the variance-repelling term, we also exploit the information on how "certain" the prediction is, which at least keeps the
# robot along the reference path. However, this does not mean that the reference is followed at the correct pace, simply because the
# GP does not know where the reference is.
# This remains true even if we train the GP on both the angles and the angular velocities: the controller then knows the angles and
# the angular velocities (more similarly to the "original" PD), but it still lacks information about the reference. We can expect a
# better response in general, but we also need more data, which makes training more expensive, and the larger input space makes it
# easier for the GP to "get lost" in a high-variance region.
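
The claim that the GP output vanishes far from the training data can be checked with a minimal NumPy sketch. It is not the notebook's GP implementation. It uses a zero-prior-mean GP with a squared-exponential kernel and hypothetical helpers `rbf_kernel` and `gp_predict`. The training configurations lie on a circle in an assumed 2-DOF joint space, the "torque" labels come from a made-up smooth function, and all hyperparameters are illustrative.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=0.5, signal_var=1.0):
    """Squared-exponential kernel matrix k(a, b) = s^2 * exp(-|a - b|^2 / (2 l^2))."""
    sq_dist = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_predict(X_train, y_train, X_test, noise_var=1e-4):
    """Posterior mean and variance of a GP with zero prior mean (hypothetical helper)."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_test, X_train)                 # test/train covariances k_*
    mean = K_s @ np.linalg.solve(K, y_train)          # k_*^T K^{-1} y
    # k(x, x) - k_*^T K^{-1} k_* for every test point
    var = np.diag(rbf_kernel(X_test, X_test)) - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)
    return mean, var

# Hypothetical training set: configurations on a circle, made-up smooth "torque" labels.
t = np.linspace(0.0, 2.0 * np.pi, 30, endpoint=False)
center = np.array([0.5, 0.2])
Q_train = center + 0.3 * np.column_stack([np.cos(t), np.sin(t)])
tau_train = np.sin(Q_train[:, 0]) + np.cos(Q_train[:, 1])

q_on = center + 0.3 * np.array([[np.cos(0.1), np.sin(0.1)]])   # on the path, between samples
q_off = np.array([[3.0, 3.0]])                                   # far from every training sample

mean_on, var_on = gp_predict(Q_train, tau_train, q_on)
mean_off, var_off = gp_predict(Q_train, tau_train, q_off)
tau_true_on = np.sin(q_on[0, 0]) + np.cos(q_on[0, 1])
print(f"on path : mean={mean_on[0]:+.3f} (true {tau_true_on:+.3f}), var={var_on[0]:.2e}")
print(f"off path: mean={mean_off[0]:+.3f}, var={var_off[0]:.3f}")
```

On the path, the mean matches the training labels and the variance is tiny. Far away, the mean is essentially the zero prior mean and the variance equals the prior variance `signal_var`. This is why a configuration-only policy "switches off" once it leaves the region it was trained on.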

## Task 3h-2 Reflections on variance minimization (5p)

In tasks 3f and 3g, we noticed how adding a variance minimization term significantly increased the overall performance of the controller. 

What is the meaning of the negative variance gradient? Why did the relatively simple additional variance minimization terms work well for the robotic arm? Would this strategy apply to any dynamical system, such as an autonomous vehicle?

In [4]:
# The predictive variance sigma^2(q) measures how uncertain the GP is at configuration q, and it is smallest close to the training data.
# Its negative gradient -d(sigma^2)/dq therefore points in the direction in which the uncertainty decreases fastest, i.e., towards the
# configurations seen during training. As already explained in the previous tasks, the variance-repelling contribution uses this
# direction to keep the configuration within a "safe" area where the GP knows what to do. Since the GP was trained on configurations
# that place the end effector on the reference path, this contribution "pushes" the robot towards those configurations. This helps to
# avoid inaccurate predictions (which would ultimately fall back to the zero prior mean), but it does not ensure that the end effector
# correctly follows the moving reference. This is clear in Task 3g: with a suitable variance-repelling gain k_var, we can tune the Kp
# and Kd gains to make the end effector move faster or slower along the ellipse. Moreover, if this contribution is too large,
# oscillations and eventually instability arise.
# Further considerations on this contribution can be found in the previous notebooks and are not repeated here. Such a simple
# corrective strategy works well in this scenario because the system is simple and performs a repetitive, periodic task (following an
# ellipse), so the training data densely cover the small region of configuration space that the robot needs to visit. In more complex
# scenarios, such as an autonomous vehicle with many states, uncertainties, possible scenarios, and disturbances, this strategy is much
# less effective or not applicable at all: pushing the system to behave in a "known" way is not what we want when facing an unexpected
# situation.
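
The meaning of the negative variance gradient can be made explicit. For a zero-mean GP with a squared-exponential kernel, $\sigma^2(q) = k(q,q) - k_*^\top K^{-1} k_*$. Since $k(q,q)$ is constant, $\nabla_q \sigma^2(q) = \frac{2}{\ell^2} \sum_i \beta_i\, k(q, q_i)\,(q - q_i)$ with $\beta = K^{-1} k_*$.

The NumPy sketch below makes several assumptions:
- It uses hypothetical training configurations on a circle.
- Its hyperparameters are illustrative.
- The helper names (`predictive_variance`, `variance_gradient`, etc.) are not the notebook's code.

It checks the analytic gradient against finite differences. It then shows that a small step along $-\nabla_q \sigma^2$ moves the configuration back towards the training data and lowers the variance.

```python
import numpy as np

# Illustrative hyperparameters (assumed, not the notebook's values).
LENGTH_SCALE, SIGNAL_VAR, NOISE_VAR = 0.5, 1.0, 1e-3

def kernel_vector(q, Q):
    """k_* = [k(q, q_i)] for all training configurations q_i (squared-exponential kernel)."""
    sq_dist = np.sum((Q - q) ** 2, axis=1)
    return SIGNAL_VAR * np.exp(-0.5 * sq_dist / LENGTH_SCALE**2)

def gram_matrix(Q):
    """Training covariance K = k(Q, Q) + noise_var * I."""
    sq_dist = np.sum((Q[:, None, :] - Q[None, :, :]) ** 2, axis=-1)
    return SIGNAL_VAR * np.exp(-0.5 * sq_dist / LENGTH_SCALE**2) + NOISE_VAR * np.eye(len(Q))

def predictive_variance(q, Q, K):
    """sigma^2(q) = k(q, q) - k_*^T K^{-1} k_* (does not depend on the training targets)."""
    k_star = kernel_vector(q, Q)
    return SIGNAL_VAR - k_star @ np.linalg.solve(K, k_star)

def variance_gradient(q, Q, K):
    """Analytic gradient (2 / l^2) * sum_i beta_i * k_i * (q - q_i), with beta = K^{-1} k_*."""
    k_star = kernel_vector(q, Q)
    beta = np.linalg.solve(K, k_star)
    return (2.0 / LENGTH_SCALE**2) * ((beta * k_star) @ (q - Q))

# Hypothetical training configurations on a circle around `center` in a 2-DOF joint space.
t = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
center = np.array([0.5, 0.2])
Q_train = center + 0.3 * np.column_stack([np.cos(t), np.sin(t)])
K = gram_matrix(Q_train)

q = center + np.array([0.6, 0.0])            # 0.3 outside the training circle
grad = variance_gradient(q, Q_train, K)

# Central finite differences to check the analytic gradient.
h = 1e-5
grad_fd = np.array([
    (predictive_variance(q + h * e, Q_train, K) - predictive_variance(q - h * e, Q_train, K)) / (2 * h)
    for e in np.eye(2)
])

# A small step along -grad (the variance-repelling direction) lowers the uncertainty.
q_next = q - 0.05 * grad / np.linalg.norm(grad)
var_before = predictive_variance(q, Q_train, K)
var_after = predictive_variance(q_next, Q_train, K)
print("analytic grad:", grad, "finite diff:", grad_fd)
print(f"variance before step: {var_before:.4f}, after: {var_after:.4f}")
```

The variance, and therefore its gradient, does not depend on the torque labels at all. Under the assumption that the extra control term is proportional to $-k_{var} \nabla_q \sigma^2$, it can only pull the robot back towards where data exist. It carries no information about where the moving reference currently is.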

## Task 3h-3 Safety properties comparison (5p)

Imagine the robot is performing the same trajectory-following task as we have seen in the previous tasks when an unfortunate human happens to collide with the robot. Imagine the particular safety-critical scenario where the human is unable to remove himself/herself/themselves from the collision, i.e., the human is stuck against the robot.

What behavior do you expect from the PD + gravity compensation, the torque cloning in configuration space, and the reference trajectory behavioral cloning? 

If the human instead managed to push the robot away from its intended trajectory, what would be the different behavior among the three controllers? Which one would be the safest?


In [5]:
# Suppose the unfortunate human collides with the robotic arm. This is a disturbance on the system and, if the person gets "stuck" on
# the machine, it also changes the "parameters" of the system (e.g., the effective mass and inertia). In this case, the PD + gravity
# compensation controller would still try to follow the reference, dragging the human along (not safe). Its torque scales with the
# tracking error, which becomes large when the robot is blocked while the reference keeps moving, so it could generate very high
# torques trying to recover from the perturbation. The GP-based controllers, instead, would react differently: since they are designed
# to "copy" a nominal behaviour as a function of the configuration, if the collision is not severe enough to shift the robot out of its
# current configuration, we can expect them to keep generating the torque associated with that configuration, as if nothing had
# happened. However, the sudden change in velocity and the different mass/inertia due to the trapped human make the system differ from
# the one used to generate the training data. The controller would therefore soon deviate from the nominal motion, most likely into a
# high-variance area where the GP does not know what to do, and there it would "turn off" (zero learned feedback torque).
# If, on the other hand, the person manages to push the robot away from the intended trajectory (and then immediately moves away), the
# situation is different. We expect the PD controller to go back to the moving "virtual" reference and recover the original tracking
# task. This is potentially the safest behaviour, provided that the controller is robust enough to perturbations and does not become
# unstable or oscillate too much. The other two controllers, instead, would end up in an area that is "unknown" to the GP, generating
# unreliable predictions and causing unexpected, unpredictable behaviour. We also need to consider that the reference trajectory
# behavioural cloning GP-based controller of Task 3g includes a damping contribution that helps to limit the speeds and torques of the
# robotic arm. Finally, the GP-based controllers also "know" when they are operating in an unsafe area (through the predictive
# variance), and we could exploit this information to implement some kind of safety mechanism.
# In general, safety is one of the main concerns in this kind of application and is subject to many regulations. The interaction with
# humans is a critical factor and, in scenarios such as the ones described above, the controllers would most realistically be
# deactivated and switched to an "emergency mode", for example stopping the machine.
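
The variance-based safety mechanism mentioned above could look like the following minimal sketch. `VarianceSupervisor` is a hypothetical supervisor and is not part of the assignment code. It monitors the GP predictive variance and latches into an emergency mode once the variance exceeds a threshold. Three choices are illustrative assumptions:
- the threshold;
- the damping gain;
- a damping-only stop torque.

A real robot would rely on certified safety functions such as brakes and torque limits.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class VarianceSupervisor:
    """Hypothetical safety layer: switches to an emergency mode when the GP is too uncertain."""
    variance_threshold: float   # assumed value, e.g. a fraction of the GP prior variance
    emergency_damping: float    # assumed gain of the damping-only stop torque
    tripped: bool = False

    def torque(self, tau_nominal, variance, q_dot):
        """Return the torque to apply, given the nominal torque, GP variance(s) and joint velocities."""
        if np.max(variance) > self.variance_threshold:
            self.tripped = True  # latch: stay in emergency mode until explicitly reset
        if self.tripped:
            # Emergency mode: ignore the learned torque and only dissipate kinetic energy.
            return -self.emergency_damping * np.asarray(q_dot, dtype=float)
        return np.asarray(tau_nominal, dtype=float)

    def reset(self):
        """Leave emergency mode, e.g. after an operator has cleared the situation."""
        self.tripped = False

sup = VarianceSupervisor(variance_threshold=0.2, emergency_damping=2.0)
tau_ok = sup.torque([1.0, -0.5], variance=0.01, q_dot=[0.3, 0.1])       # low variance: nominal torque
tau_trip = sup.torque([1.0, -0.5], variance=0.5, q_dot=[0.3, 0.1])      # high variance: emergency
tau_latched = sup.torque([1.0, -0.5], variance=0.01, q_dot=[0.3, 0.1])  # stays in emergency mode
sup.reset()
tau_reset = sup.torque([1.0, -0.5], variance=0.01, q_dot=[0.3, 0.1])
print(tau_ok, tau_trip, tau_latched, tau_reset)
```

The latch reflects the point made in the answer: once the controller has entered an unknown region, for instance because a person is stuck against the arm, it should not silently resume the learned behaviour when the variance happens to drop again.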