# ML-Agents: Penguins
## Baby penguin feeding, killer whale avoiding penguins

### 1. Introduction
The initial goal of this notebook is to train a penguin to catch fish and feed them to its baby.
The penguin must then also learn to avoid killer whales (orcas), which hurt it.

### 2. Case analysis

![Penguin Area](Images/penguinArea.jpg)

#### Agent Actions
A penguin's movement is described by `vectorAction`, an array of two discrete values, where:

    vectorAction[0] can be either 0 or 1, describing either staying still or moving forward.
    vectorAction[1] can be 0, 1 or 2, and is used for describing not turning, turning left or turning right, in that order.
    
Intermediate values are not allowed.
Each movement adds a small negative reward (-1/5000), which favours penguins that achieve the same result with fewer movements.
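
The action encoding above can be sketched in Python as follows. This is only an illustration, not the project's Unity code: `Motion`, `decode_action` and `step_penalty` are hypothetical names, and treating any step that moves forward or turns as one "movement" is an assumption.

```python
from dataclasses import dataclass

MOVE_PENALTY = -1 / 5000  # per-movement reward stated in the text


@dataclass
class Motion:
    forward: int  # 0 = stay still, 1 = move forward
    turn: int     # -1 = turn left, 0 = no turn, +1 = turn right


def decode_action(vector_action):
    """Map the two discrete action values to a Motion; reject anything else."""
    if len(vector_action) != 2:
        raise ValueError("expected exactly two action values")
    move, turn = vector_action
    # Intermediate values such as 0.5 are not allowed.
    if move not in (0, 1) or turn not in (0, 1, 2):
        raise ValueError(f"invalid action {vector_action!r}")
    # Second value: 0 = no turn, 1 = left, 2 = right.
    turn_sign = {0: 0, 1: -1, 2: 1}[turn]
    return Motion(forward=int(move), turn=turn_sign)


def step_penalty(motion):
    """Assumption: a step counts as a movement if it moves forward or turns."""
    return MOVE_PENALTY if (motion.forward or motion.turn) else 0.0
```

For example, `decode_action([1, 2])` describes moving forward while turning right, and `decode_action([0.5, 0])` raises a `ValueError`.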
   
#### Observations

Penguins collect the following observations of their state:
- Whether it is full or not (1xfloat)
- Distance to the baby penguin (1xfloat)
- Direction to the baby penguin (3xfloat)
- Direction the penguin is facing (3xfloat)

In total, this gives 8 observed values (1 + 1 + 3 + 3).
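
A minimal sketch of how such an 8-value observation vector could be assembled is shown below. `collect_observations` and `normalize` are hypothetical helpers, the order of the values follows the list above, and normalising both direction vectors to unit length is an assumption the text does not confirm.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length (the zero vector is returned unchanged)."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0.0:
        return (0.0, 0.0, 0.0)
    return tuple(c / length for c in v)


def collect_observations(is_full, penguin_pos, penguin_forward, baby_pos):
    """Build the observation vector: 1 + 1 + 3 + 3 = 8 floats."""
    offset = tuple(b - p for b, p in zip(baby_pos, penguin_pos))
    distance = math.sqrt(sum(c * c for c in offset))
    obs = [1.0 if is_full else 0.0]      # whether the penguin is full
    obs.append(distance)                 # distance to the baby
    obs.extend(normalize(offset))        # direction to the baby
    obs.extend(normalize(penguin_forward))  # direction the penguin faces
    return obs
```

Calling `collect_observations(True, (0, 0, 0), (0, 0, 1), (3, 0, 4))` returns a list of 8 floats whose second entry is the distance to the baby.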

#### Perception

![Penguin Sphere Cast sensor](Images/sphereCast_Penguin.png)

Each penguin has a 3D Ray Perception Sensor that detects objects with the following tags:
- Baby
- Fish
- _Untagged_

This sensor works with sphere casts: instead of testing an infinitely thin ray, it sweeps a sphere along each ray, so objects that lie slightly off the ray's line are still detected.
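
The difference between a ray cast and a sphere cast can be illustrated with a small geometric sketch. It is a simplification under stated assumptions, not Unity's physics implementation: targets are modelled as spheres, `direction` is assumed to be a unit vector, and `distance_to_cast_path` and `cast_hits` are hypothetical names.

```python
import math


def distance_to_cast_path(point, origin, direction, max_distance):
    """Shortest distance from `point` to the segment the cast travels along.

    `direction` is assumed to be a unit vector.
    """
    rel = [p - o for p, o in zip(point, origin)]
    # Project onto the cast direction and clamp to the travelled segment.
    t = sum(r * d for r, d in zip(rel, direction))
    t = max(0.0, min(max_distance, t))
    closest = [o + d * t for o, d in zip(origin, direction)]
    return math.dist(point, closest)


def cast_hits(target_center, target_radius, origin, direction,
              max_distance, cast_radius=0.0):
    """cast_radius == 0 models a plain ray cast; > 0 models a sphere cast."""
    gap = distance_to_cast_path(target_center, origin, direction, max_distance)
    return gap <= target_radius + cast_radius
```

With a fish of radius 0.25 centred 0.6 units to the side of the ray, `cast_hits` reports a miss for `cast_radius=0.0` and a hit for `cast_radius=0.5`.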

#### Rewards
In addition to the movement penalty described above, the penguin receives the following rewards:
- Eating a fish (+1)
- Regurgitating a fish (+1), in order to feed its baby
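
As a worked example of how these rewards combine with the movement penalty, the sketch below totals the reward for one episode. `episode_return` is a hypothetical helper, and the movement counts used are made-up inputs, not measured results.

```python
MOVE_REWARD = -1 / 5000  # per movement
EAT_REWARD = 1.0         # per fish eaten
FEED_REWARD = 1.0        # per fish regurgitated for the baby


def episode_return(movements, fish_eaten, fish_fed):
    """Cumulative reward for the given event counts."""
    return (movements * MOVE_REWARD
            + fish_eaten * EAT_REWARD
            + fish_fed * FEED_REWARD)
```

Under these numbers, a penguin that needs 1000 movements to eat one fish and feed it to its baby ends with a return of about 1.8, while one that does the same in 500 movements ends with about 1.9.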

#### Curriculum
A curriculum is used to make training more efficient:

    PenguinLearning:
      measure: reward
      thresholds: [-0.1, 0.7, 1.7, 1.7, 1.7, 2.7, 2.7]
      min_lesson_length: 80
      signal_smoothing: true
      parameters:
        fish_speed: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5]
        feed_radius: [6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.2]
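
The 7 thresholds separate 8 lessons, which is why each parameter list has 8 entries. The sketch below shows one simplified way to read this configuration: it is not the ML-Agents implementation, it ignores `signal_smoothing`, and `lesson_parameters` and `next_lesson` are hypothetical names.

```python
THRESHOLDS = [-0.1, 0.7, 1.7, 1.7, 1.7, 2.7, 2.7]
MIN_LESSON_LENGTH = 80
PARAMETERS = {
    "fish_speed": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
    "feed_radius": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.2],
}


def lesson_parameters(lesson):
    """Environment parameters in effect during the given lesson (0-based)."""
    return {name: values[lesson] for name, values in PARAMETERS.items()}


def next_lesson(lesson, measure, episodes_in_lesson):
    """Simplified rule: advance once the lesson has run long enough
    and the reward measure exceeds the current threshold."""
    if lesson >= len(THRESHOLDS):
        return lesson  # already in the final lesson
    if episodes_in_lesson < MIN_LESSON_LENGTH:
        return lesson  # lesson has not run long enough yet
    if measure > THRESHOLDS[lesson]:
        return lesson + 1
    return lesson
```

In this reading, training starts with stationary fish and a feed radius of 6.0, and only the last two lessons use moving fish.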

### 3. Performance analysis

![Original case performance charts](Images/beforePerformance.png)

### 4. New case proposal

![Penguin Area with Orca](Images/penguinAreaWithOrca.jpg)

In addition to its existing tasks, the penguin needs to learn to avoid killer whales.
Orcas are a natural predator of penguins, so we expect the penguin to steer away whenever an orca is nearby.

#### Perception
For the penguin to detect a nearby orca, we add the Orca tag to the tags sensed by the 3D Ray Perception Sensor.

#### Rewards
We add a new reward:
- Touching an Orca (-3)

This penalty is larger than the +2 earned by catching and delivering one fish, so a single collision cancels more than one full feeding.
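
One way to picture the full reward scheme is as a dispatch on the tag of the touched object. This is a hedged sketch rather than the project's Unity code: `on_collision` is a hypothetical function, and the rules that a penguin only eats when it is not full and only feeds when it is full are assumptions inferred from the "full" observation.

```python
FISH_REWARD = 1.0
FEED_REWARD = 1.0
ORCA_PENALTY = -3.0


def on_collision(tag, is_full):
    """Return (reward, is_full) after touching an object with the given tag."""
    if tag == "Fish" and not is_full:
        return FISH_REWARD, True    # eat the fish and become full
    if tag == "Baby" and is_full:
        return FEED_REWARD, False   # regurgitate the fish for the baby
    if tag == "Orca":
        return ORCA_PENALTY, is_full  # punished, fullness unchanged
    return 0.0, is_full             # nothing happens otherwise
```

Keeping the orca case separate from the fullness checks means the penalty applies regardless of whether the penguin is carrying a fish.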

#### Curriculum
We add a curriculum parameter that gradually increases the orcas' swimming speed across lessons:

        orca_speed: [0.0, 0.0, 0.0, 0.2, 0.3, 0.6, 1.2, 2.4]
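
Because every curriculum parameter needs one value per lesson, a new list such as `orca_speed` must have one more entry than `thresholds`. The check below is a small sketch of that rule; `validate_curriculum` is a hypothetical helper, and the values are copied from the configuration in this document.

```python
THRESHOLDS = [-0.1, 0.7, 1.7, 1.7, 1.7, 2.7, 2.7]
PARAMETERS = {
    "fish_speed": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 0.5],
    "feed_radius": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0, 0.5, 0.2],
    "orca_speed": [0.0, 0.0, 0.0, 0.2, 0.3, 0.6, 1.2, 2.4],
}


def validate_curriculum(thresholds, parameters):
    """Return the names of parameters whose list length does not
    match the number of lessons (len(thresholds) + 1)."""
    expected = len(thresholds) + 1
    return [name for name, values in parameters.items()
            if len(values) != expected]


mismatched = validate_curriculum(THRESHOLDS, PARAMETERS)
```

Here `mismatched` is an empty list, since all three parameters define 8 values for the 8 lessons.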

### 5. New case performance

![With orcas performance charts](Images/afterPerformance.png)

### 6. Performance comparison

![Compared performance charts](Images/comparisonPerformance.png)

The charts show that penguins trained with orcas learn to feed their babies faster, but their cumulative reward never seems to reach the level of the original case. A likely explanation is that penguins still collide with orcas by accident from time to time.

### 7. Video

[![Vimeo Video: Penguins (ML-Agents Unity) ENTI I/O 2019-20](Images/videoThumbnail.jpg)](https://vimeo.com/432448709 "Penguins (ML-Agents in Unity) - ENTI I/O 2019-20")

### 8. Team

| ![Alex](Images/alex.jpg) |
|--------------------------|
|**Àlex Weiland Lottner**|
|alexweilandlottner@enti.cat|