# Data Science - Final Assignment

The goal of the assignment is to implement two controllers for the _Pendulum_ environment of the _Gym_ library: a model-based controller and a model-free, neural network-based controller. Please **read carefully** the [documentation](https://gymnasium.farama.org/environments/classic_control/pendulum/) of the environment before starting (focus on the state variables and controls).

The solution **must** be provided as a _Jupyter_ notebook, with all the cells evaluated. Use comments in the code and/or _Markdown_ cells to explain the choices you made while solving the assignment (for instance, how you overcame a specific issue or which hyperparameters you chose to tune).

For this assignment you will need to use some Python libraries. Here is a list of (potentially) useful imports:

```python
import numpy as np
import gymnasium as gym
import matplotlib.pyplot as plt
import pygmo as pg
import torch
from torch import nn
from skorch import NeuralNetRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV, train_test_split
```

## Part 1: Model Predictive Control

Implement a model-based controller based on Model Predictive Control (MPC) (see the slides in the _Genetic Algorithm_ set in the repo of the course) to **stabilize the pendulum in its upright position** ($\theta = 0$, $\omega = 0$). Please set the **gravity equal to 9.81** using the `g` argument of the `Pendulum-v1` environment. In general, you should follow these steps:

1. Define the _cost_ function of the MPC: it must penalize the difference between the state of the system and the setpoints for the angle and the angular velocity. To predict the future states, create a second _Pendulum_ environment called `env_mpc` (separate from `main_env`, the environment the controller is interacting with). Every time the cost function is evaluated, initialize the state of `env_mpc` with the current state of the main environment using the following instruction: `env_mpc.unwrapped.state = main_env.unwrapped.state`. The main environment `main_env` should be passed as a parameter to the cost function.
2. Define a function to play a "game" using a controller chosen by the user among the following: 1) MPC; 2) random; 3) Neural Network (see Part 2). The _initial conditions_ (angle and angular velocity) should be passed as parameters to this function and set at the beginning of the game using the `env.unwrapped.state` variable. For the MPC controller, an optimization problem must be solved at each time step to minimize the cost function defined in Step 1 with respect to the control sequence (over the control horizon). The appropriate action should then be taken. Remember that the control variable is bounded (see the environment documentation). The function should store and return the list of _observations_, the list of corresponding _controls_, and the _total score_ of the game.
3. Play _a few_ games with _random_ initial conditions (angle between -20 and +20 degrees and angular velocity between -0.1 and 0.1 rad/s) and compute the _average total score_. **You should get a total score above -10, at least in some games.** For one game, plot the angle and the angular velocity as a function of time, and the controls in a separate figure.

## Part 2: Neural Network controller

In this part, you will train a neural network controller to reproduce the optimal control strategy found in Part 1. To this end, you should:
1. Create a feedforward neural network that takes the current **observation** ($x$, $y$, $\omega$) as an input and returns the **control** to be applied to the system. Make sure that the returned value is "admissible", i.e., within the bounds of the control variable.
2. Generate a training dataset by playing a certain number of games (suggested minimum 100) using the functions implemented in Part 1 and providing as random initial conditions an angle between -20 and 20 degrees and zero angular velocity. Note: this step may be _slow_. Remember to convert the dataset to `torch` tensors with float32 precision.
3. **Train** and **select** the network (by exploring different _architectures_ and values for the _hyperparameters_).
4. Play 2000 games using the game function implemented in Part 1, with the controls given by the "best" network. Compute the average total score. **You should get an average score above -2.** Compare the average total score with that of a _random controller_.
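
For Step 1 of this part, one common way to keep the network output admissible is to squash it with `tanh` and rescale it to the torque range. The sketch below assumes the torque bound of 2.0 stated in the _Pendulum_ documentation; the class name `PendulumController`, the two hidden layers, and the default of 32 hidden units are arbitrary illustrative choices, not a recommended architecture.

```python
import torch
from torch import nn

MAX_TORQUE = 2.0  # assumed bound on the control, taken from the Pendulum docs


class PendulumController(nn.Module):
    """Feedforward network mapping (x, y, omega) to a bounded torque."""

    def __init__(self, hidden_units=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, hidden_units),
            nn.ReLU(),
            nn.Linear(hidden_units, 1),
        )

    def forward(self, obs):
        # tanh maps to [-1, 1]; scaling gives [-MAX_TORQUE, MAX_TORQUE].
        return MAX_TORQUE * torch.tanh(self.net(obs))


# Example: a batch of 4 observations in float32, as required for training.
model = PendulumController()
obs = torch.tensor([[1.0, 0.0, 0.0]] * 4, dtype=torch.float32)
print(model(obs).shape)
```

Because the bound is built into the architecture, no clipping is needed when the network is used inside the game function. When training with a regressor wrapper, the targets would typically need the same `(n_samples, 1)` shape as the network output.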