# Quiz: Foundations of Robotics - Managing the world complexity: from linear regression to deep learning.

This "quiz" is in fact a small Python exercise of practical interest.

First, you will use a pre-trained `YOLO` CNN to detect things like people and cars in real camera images.

Then, you will familiarize yourself with `Gym` and use the `Stable Baselines 3` RL framework to train a simple MLP policy.

_(NB: to execute a jupyter cell, you can press SHIFT+RETURN)_

### Installation of the required python libraries:

_(NB: this requires an Internet connection and may take some time; remove the `-qqq` flag if you need to troubleshoot)_

In [None]:
!pip install opencv-python -qqq
!pip install numpy -qqq
!pip install gym -qqq
!pip install pyglet -qqq
!pip install stable-baselines3 -qqq

### Libraries used in this notebook:

In [None]:
import csv  # to read CSV files, such as the classes file for YOLO
import cv2  # for efficient image manipulation
import numpy as np  # for efficient array manipulation
import gym  # RL environments
from stable_baselines3 import SAC  # deep RL training algorithm

# other libraries
import time
import warnings
warnings.filterwarnings('ignore')

## 1) You Only Look Once (YOLO)

In this first exercise, you will use a pre-trained YOLO CNN to detect other agents in camera images. Before we start, check that all the following files are in the same folder as the notebook:

- `yolov3.weights`: this file contains pre-trained parameters for a huge YOLO-v3 neural network, trained on a dataset called COCO (Common Objects in COntext).

- `yolov3.cfg`: this file describes the architecture of the big YOLO-v3 neural network.

- `yolov3-tiny.weights`: contains parameters of a much lighter variant of the YOLO-v3 architecture, also trained on the COCO dataset.

- `yolov3-tiny.cfg`: describes the tiny variant.

(NB: these two models are meant to process images of size 320 x 320, but this notebook will handle other sizes automatically)

- `coco.names`: contains the names of all 80 classes present in the COCO dataset (you can read the file with a text editor such as Notepad).

- `street_image.jpg`: example camera image for testing our YOLO pre-trained models.
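
A quick way to run this check is sketched below. It only relies on the standard library; `REQUIRED_FILES` and `find_missing_files` are names made up for this example, and the file names are the ones used by the code cells of this notebook.

```python
from pathlib import Path

# File names used by the code cells of this notebook;
# adjust the list if your copies are named differently.
REQUIRED_FILES = [
    "yolov3.weights",
    "yolov3.cfg",
    "yolov3-tiny.weights",
    "yolov3-tiny.cfg",
    "coco.names",
    "street_image.jpg",
]

def find_missing_files(folder=".", required=REQUIRED_FILES):
    """Returns the names in `required` that are not files inside `folder`."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]

missing = find_missing_files()
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files were found.")
```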

### Question 1.0
**Q:** Is YOLO supervised or unsupervised?

**A:** (Your text here)

### Question 1.1
**Q:** Both pre-trained YOLO models defined by the aforementioned files are unable to detect one of the following classes. Which one is it, and why?
- A car
- A person
- An elephant
- A banana
- A pineapple
- A pizza

**A:** (Your text here)

### Useful functions:

The following functions are provided for your convenience, so you don't need to learn YOLO implementation details. Just read the docstrings.

In [None]:
def get_yolo(configuration, weights, classes):
    """Extracts a pre-trained YOLO model and a list of class names from files.
    
    Args:
        configuration: name of the configuration file (e.g., 'yolov3.cfg')
        weights: name of the parameters file (e.g., 'yolov3.weights')
        classes: name of the classes file (e.g., 'coco.names')
    
    Returns:
        yolo: the pre-trained model
        yolo_classes: the list of available class names
    """
    with open(classes, 'r') as f:
        yolo_classes = [item[0] for item in csv.reader(f)]
    yolo = cv2.dnn.readNetFromDarknet(configuration, weights)
    yolo.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
    yolo.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
    return yolo, yolo_classes

def yolo_inference(yolo_model, yolo_classes, image,
                   yolo_width=320, yolo_height=320,
                   confidence_threshold=0.5, nms_threshold=0.3):
    """Performs a forward propagation in a pre-trained YOLO model.
    
    Args:
        yolo_model: pre_trained model (output of get_yolo())
        yolo_classes: list of class names (output of get_yolo())
        image: input cv2 image
        yolo_width: width expected by the model
        yolo_height: height expected by the model
        confidence_threshold: detections are reported only above this threshold
        nms_threshold: NMS threshold for filtering out irrelevant boxes
    
    Returns:
        classes: classes of all detected objects
        scores: confidence scores of all detected objects
        boxes: x, y, width and height of all detected bounding boxes
    """
    height = image.shape[0]
    width = image.shape[1]
    blob_image = cv2.dnn.blobFromImage(image, 1/255,
                                       (yolo_width, yolo_height),
                                       [0, 0, 0], 1, crop=False)
    yolo_model.setInput(blob_image)
    layer_names = yolo_model.getLayerNames()
    # flatten: some OpenCV versions return these indices with shape (N, 1)
    out_layers = np.array(yolo_model.getUnconnectedOutLayers()).flatten()
    out_layer_names = [layer_names[i - 1] for i in out_layers]
    outputs = yolo_model.forward(out_layer_names)
    res_classes = []
    res_scores = []
    res_boxes = []
    for out in outputs:
        for detection in out:
            scores = detection[5:]
            class_index = np.argmax(scores)
            class_confidence = float(scores[class_index])
            if class_confidence > confidence_threshold:
                box_width = int(detection[2] * width)
                box_height = int(detection[3] * height)
                box_x = int(detection[0] * width - box_width / 2)
                box_y = int(detection[1] * height - box_height / 2)
                res_classes.append(yolo_classes[class_index])
                res_scores.append(class_confidence)
                res_boxes.append([box_x, box_y, box_width, box_height])
    res_classes = np.array(res_classes)
    res_scores = np.array(res_scores)
    res_boxes = np.array(res_boxes)
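    if len(res_classes) != 0:
        valid_boxes = cv2.dnn.NMSBoxes(res_boxes.tolist(), res_scores.tolist(),
                                       confidence_threshold, nms_threshold)
        # flatten: some OpenCV versions return indices with shape (N, 1)
        valid_boxes = np.array(valid_boxes, dtype=int).flatten()
        res_classes = res_classes[valid_boxes]
        res_scores = res_scores[valid_boxes]
        res_boxes = res_boxes[valid_boxes]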
    return res_classes, res_scores, res_boxes

def draw_boxes(image, classes, scores, boxes):
    """Draws bounding boxes in image from the output of yolo_inference()
    
    Args:
        image: the cv2 image to draw boxes onto
        classes: 1st output of yolo_inference()
        scores: 2nd output of yolo_inference()
        boxes: 3rd output of yolo_inference()
    """
    assert len(classes) == len(scores) == len(boxes), "Dimensions don't match."
    for i, c in enumerate(classes):
        s = scores[i]
        b = boxes[i]
        x, y, w, h = b
        cv2.rectangle(image, (x, y), (x+w, y+h), (255, 0, 0), 2)
        cv2.putText(img=image,
                    text=f"{c.upper()}: {int(s*100)}%",
                    org=(x, y-10),
                    fontFace=cv2.FONT_HERSHEY_PLAIN,
                    fontScale=1.0,
                    color=(255, 0, 0),
                    thickness=1)
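
The `nms_threshold` argument of `yolo_inference()` controls non-maximum suppression (NMS): when several boxes overlap heavily, only the most confident one is kept. The sketch below is a simplified pure-Python illustration of that idea, not the implementation inside `cv2.dnn.NMSBoxes`; the helper names `iou` and `greedy_nms` are made up for this example. Boxes follow the same `[x, y, width, height]` convention as above.

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x, y, width, height] boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # width and height of the overlapping rectangle (0 if no overlap)
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0

def greedy_nms(boxes, scores, nms_threshold=0.3):
    """Returns the indices of the kept boxes, most confident first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # keep box i only if it does not overlap too much with a kept box
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept

example_boxes = [[10, 10, 100, 100], [20, 20, 100, 100], [300, 300, 50, 50]]
example_scores = [0.9, 0.8, 0.7]
print(greedy_nms(example_boxes, example_scores))  # [0, 2]
```

Here the second box overlaps the first one with an IoU of about 0.68, which is above the threshold, so it is discarded. A lower `nms_threshold` removes more overlapping boxes, a higher one keeps more of them.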

### "Big" YOLO model on a street picture:

The `street_image.jpg` picture is a camera image featuring a visually complex street scenario.

In [None]:
# load the image:

image = cv2.imread('street_image.jpg')

print("Showing image. Focus the window and press any key to close.")

# Show the scene:

cv2.imshow('capture', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Let us use YOLO to detect people and cars in this scene:

In [None]:
image = cv2.imread('street_image.jpg')

yolo, yolo_classes = get_yolo(configuration='yolov3.cfg',
                              weights='yolov3.weights',
                              classes='coco.names')
classes, scores, boxes = yolo_inference(yolo, yolo_classes, image)
draw_boxes(image, classes, scores, boxes)

print("Showing detected objects.")

cv2.imshow('capture', image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Pretty neat, right?

### Question 1.2
**Q:** Now, as an exercise, do the same thing, but using your webcam (or another video stream) to process frames in real time, drawing bounding boxes, and displaying them on top of the stream. You can use the following code base:

```python
capture = cv2.VideoCapture(0)

print("Press any key to close the window.")
while True:
    success, image = capture.read()
    assert success
    
    # your code here
    # (...)
    
    cv2.imshow('capture', image)
    k = cv2.waitKey(1)
    if k != -1:
        break
cv2.destroyAllWindows()

capture.release()  # frees the camera
```

In [None]:
# A: (Your code here)

### Question 1.3
**Q:** Err, that was quite laggy. And your computer's CPU is pretty fast! Imagine the same model running on an embedded system... Come up with a solution to accelerate the framerate of our little application. What is the tradeoff here?
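
To put a number on "laggy", one option is to time how long each frame takes to process. The helper below is a sketch using only the standard library; `measure_fps` and `process_frame` are hypothetical names, and `process_frame` stands for whatever you run on each frame.

```python
import time

def measure_fps(process_frame, n_frames=20):
    """Calls process_frame() n_frames times and returns the average
    number of calls per second."""
    start = time.perf_counter()
    for _ in range(n_frames):
        process_frame()
    elapsed = time.perf_counter() - start
    return n_frames / elapsed if elapsed > 0 else float("inf")

# Stand-in workload that takes roughly 10 ms per "frame":
fps = measure_fps(lambda: time.sleep(0.01), n_frames=10)
print(f"{fps:.1f} frames per second")
```

In this notebook, you could for instance pass `lambda: yolo_inference(yolo, yolo_classes, image)` as `process_frame` to compare different configurations.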

In [None]:
# A: (Your code here)

**A:** (Your text here)

Nevertheless, keep in mind that YOLO is likely not using its capacity optimally (and in general, no deep learning model is). Coming up with better architectures (i.e., better inductive biases) and better training algorithms can greatly improve the performance of similar small models in the future.

## 2) Soft Actor-Critic (SAC)

In this second exercise, we will familiarize ourselves with `Gym` environments, and use the `SAC` deep RL algorithm to solve a simple continuous control environment called `Pendulum-v0`. SAC is readily implemented in the `Stable Baselines 3` RL framework.

Gym is a simple interface wrapping RL environments. It is designed so that readily implemented RL algorithms can easily interact with these environments. The main methods in a Gym environment are `reset()`, which sets the environment in an initial state and outputs an initial observation, and `step(action)`, which takes an action as input and outputs a new observation, an instantaneous reward, a done signal and an information dictionary.
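
To make this interface concrete, here is a toy environment written in plain Python. It is only an illustration: `CountdownEnv` is a made-up class that does not subclass `gym.Env`, but its `reset()` and `step(action)` methods follow the signatures described above.

```python
class CountdownEnv:
    """Toy environment: the episode ends after `horizon` time steps.

    The observation is the number of remaining steps, and the reward is
    1.0 when the action is 1 and 0.0 otherwise.
    """

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.remaining = horizon

    def reset(self):
        """Sets the environment to its initial state, returns an observation."""
        self.remaining = self.horizon
        return self.remaining

    def step(self, action):
        """Applies the action, returns (observation, reward, done, info)."""
        self.remaining -= 1
        reward = 1.0 if action == 1 else 0.0
        done = self.remaining <= 0
        info = {}
        return self.remaining, reward, done, info

# One full episode with a constant action:
toy_env = CountdownEnv(horizon=5)
toy_obs = toy_env.reset()
toy_done = False
total_reward = 0.0
while not toy_done:
    toy_obs, toy_rew, toy_done, toy_info = toy_env.step(1)
    total_reward += toy_rew
print(total_reward)  # 5.0
```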

Gym comes with a number of readily available environments, such as `Pendulum-v0`:

In [None]:
env = gym.make("Pendulum-v0")

`Pendulum-v0` is a classic continuous control task in which one tries to balance an inverted pendulum up against gravity.

Let us examine the _observation space_ of this environment, i.e., the range of possible observations returned by `reset` and `step`:

In [None]:
print(env.observation_space)

This observation space is a `Box` of 3 float values, bounded between `[-1, -1, -8]` and `[1, 1, 8]`. In other words, observations for this environment are arrays of 3 floats:
- an angle cosine (between -1 and 1, dimensionless)
- an angle sine (between -1 and 1, dimensionless)
- an angular speed (between -8 and 8 rad/s)
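
Because the angle is given as a cosine and a sine rather than directly, it can be handy to recover it. A minimal sketch, assuming the observation layout listed above; `angle_from_observation` is a hypothetical helper, not part of Gym. Which angle corresponds to the upright position is worth checking for yourself by rendering the environment.

```python
import math

def angle_from_observation(observation):
    """Returns the angle in radians, in (-pi, pi], from a
    (cos_theta, sin_theta, theta_dot) observation."""
    cos_theta, sin_theta, _theta_dot = observation
    return math.atan2(sin_theta, cos_theta)

print(angle_from_observation([0.0, 1.0, 0.0]))   # pi / 2
print(angle_from_observation([-1.0, 0.0, 0.0]))  # pi
```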

Now, let us examine the _action space_, i.e., the range of possible actions taken as input by the `step` function:

In [None]:
print(env.action_space)

This action space is very simple: unidimensional, between -2 and 2. Actions for this environment are arrays of 1 single float:
- the applied torque (between -2 and 2)

The `render` method enables visualizing the environment. Let us apply actions sampled randomly in the action space of `Pendulum-v0` and see what happens:

(_The visualization window may be hidden behind the notebook; if so, bring it to the front._ **Note: do not close the window, otherwise Python will crash. Just wait until it closes by itself.**)

In [None]:
obs = env.reset()  # sets the environment to an initial state
for _ in range(200):  # let us perform 200 time steps
    act = env.action_space.sample()  # samples a random action
    obs, rew, done, info = env.step(act)  # applies the action
    env.render()  # renders the environment
env.close()  # closes the rendering window

We have applied random torques at each time step and the pendulum has done crazy things in response.

### Question 2.1
**Q:** Instead of applying random actions, **try** to design your own hard-coded policy, i.e., use `obs` to compute clever actions that balance the pendulum up against gravity. Do not spend too much time on this, as this is just to give you a feeling of how difficult this "simple" task is. In fact, no correction will be provided for this question. You may use the following code base:

```python
def policy(observation):
    
    cos_theta, sin_theta, theta_dot = observation
    torque = 0.001  # use cos_theta, sin_theta and theta_dot to adapt this
    # (NB: do not mind the display bug when torque is exactly 0)

    action = np.array([torque,])
    return action

obs = env.reset()
for _ in range(200):
    act = policy(obs)
    obs, rew, done, info = env.step(act)
    env.render()
env.close()
```

In [None]:
# A: (Your code here)

Hardcoding your own effective policy would require a non-negligible amount of engineering: you would probably need to make the pendulum oscillate until it is close enough to the goal position, and then switch to some PID-like controller for stabilization. Deep RL is an interesting alternative, since it is able to automatically find a working policy by trial and error instead.

`Stable baselines 3` is a framework that readily implements many state-of-the-art deep RL algorithms. Because our environment is wrapped in the `Gym` interface, we can easily use this framework to train a policy in our environment. For instance, we can directly try the `SAC` algorithm as follows:

_(NB: this will take some time until "total_timesteps" reaches 10000)_

In [None]:
# SAC using an MLP policy in our environment:

model = SAC("MlpPolicy", env, verbose=1)

# Train our MLP policy for 10000 training steps:

model.learn(total_timesteps=10000)

### Question 2.2
**Q:** Examine the `ep_rew_mean` metric. What do you think this metric represents? How did it evolve during training? Why?

**A:** (Your text here)

We can now use the `predict` method of our `SAC` object as our trained policy:

In [None]:
def policy(observation):
    action, _states = model.predict(observation, deterministic=True)
    return action

### Question 2.3
**Q:** Finally, test this policy in our environment for 1000 time steps (i.e., 5 episodes).

_(NB: when an episode is complete, the `done` signal becomes `True`, and `reset` must be called)_

In [None]:
# A: (Your code here)