# Input Output. Delivery One

<img src="../docs/images/image-banner.png" align="middle" width="3000"/>

## G01: Victor Armisen and David Recuenco

## 1. Introduction
The purpose of this notebook is to explain our work with the "Tennis" environment from the ML-Agents examples.
As its name suggests, this example simulates a tennis match between two agents, following the real tennis rules.
Using a single agent, we wanted to:
* Make the agent play paddle alone, following the game's rules: the ball has to touch the ground once before being hit, and if the ball touches the ground twice in a row without touching the front wall, the point is lost.
* Make the agent do keepy-ups without letting the ball touch the ground.
* Do the same keepy-ups, but with an applied force and racket rotations, making it harder for the agent to keep the ball in the air.

### The team
Name | Enti email | Picture
--- | --- | ---
Victor Armisen | victorarmisencapo@enti.cat | placeholder picture
David Recuenco | davidrecuencooliver@enti.cat | <img src="../docs/images/DRO.png" width="100"/>

## 2. Case Analysis
The example is managed by 3 scripts:

* **TennisArea**

Manages the ball's physics and contains the function that resets the match, spawning the ball on a random side of the court.

    public void MatchReset()
    {
        var ballOut = Random.Range(6f, 8f);
        var flip = Random.Range(0, 2);
        if (flip == 0)
        {
            ball.transform.position = new Vector3(-ballOut, 6f, 0f) + transform.position;
        }
        else
        {
            ball.transform.position = new Vector3(ballOut, 6f, 0f) + transform.position;
        }
        m_BallRb.velocity = new Vector3(0f, 0f, 0f);
        ball.transform.localScale = new Vector3(.5f, .5f, .5f);
        ball.GetComponent<HitWall>().lastAgentHit = -1;
    }
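
The random-side spawn in `MatchReset()` can be reproduced outside Unity to check which values it produces. Below is a minimal sketch. It assumes plain .NET with `System.Random` standing in for `UnityEngine.Random`, and `SpawnSketch` and `BallSpawnX` are hypothetical names. `Next(0, 2)` excludes its upper bound like Unity's integer `Random.Range`, so `flip` is always 0 or 1.

```csharp
using System;

// Hypothetical, engine-free helper mirroring the spawn logic of TennisArea.MatchReset().
public static class SpawnSketch
{
    // Returns the ball's x offset from the court centre: the magnitude is in [6, 8]
    // (like Random.Range(6f, 8f)) and the sign chooses the side of the court.
    public static float BallSpawnX(Random rng)
    {
        float ballOut = 6f + (float)rng.NextDouble() * 2f;
        int flip = rng.Next(0, 2); // 0 or 1: upper bound excluded, as in Unity's integer Random.Range
        return flip == 0 ? -ballOut : ballOut;
    }
}
```

Sampling this many times should give both signs with roughly equal frequency. This is why the original environment trains each agent on serves from both sides.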

* **TennisAgent**

Manages the agent, which in this case is the racket. The script contains the heuristic, the movement and the reset logic.
For the keepy-up cases we slightly change the agent's properties to simulate the touches.
In all cases, the values in `vectorAction` drive the agent's speeds and rotations.

    // TennisAgent: map the three continuous actions to movement and rotation
    var moveX = Mathf.Clamp(vectorAction[0], -1f, 1f) * m_InvertMult;
    var moveY = Mathf.Clamp(vectorAction[1], -1f, 1f);
    var rotate = Mathf.Clamp(vectorAction[2], -1f, 1f) * m_InvertMult;
    // Jump only when the racket is close to the ground
    if (moveY > 0.5f && transform.position.y - transform.parent.transform.position.y < -1.5f)
    {
        m_AgentRb.velocity = new Vector3(m_AgentRb.velocity.x, 7f, 0f);
    }
    // Horizontal speed and racket tilt come directly from the actions
    m_AgentRb.velocity = new Vector3(moveX * 30f, m_AgentRb.velocity.y, 0f);
    m_AgentRb.transform.rotation = Quaternion.Euler(0f, -180f, 55f * rotate + m_InvertMult * 90f);
    // Keep the agent on its own side of the net
    if (invertX && transform.position.x - transform.parent.transform.position.x < -m_InvertMult ||
        !invertX && transform.position.x - transform.parent.transform.position.x > -m_InvertMult)
    {
        transform.position = new Vector3(-m_InvertMult + transform.parent.transform.position.x,
            transform.position.y,
            transform.position.z);
    }
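
The clamping and scaling above can be checked without Unity. Below is a minimal sketch. It assumes plain .NET, and `ActionMappingSketch`, `VelocityX` and `RacketAngleZ` are hypothetical names. It reproduces only the arithmetic from the snippet: actions clamped to [-1, 1], a speed factor of 30, and a tilt of 55 degrees plus a 90 degree offset. It does not model the Rigidbody, the jump or the side clamp.

```csharp
using System;

// Hypothetical, engine-free version of the TennisAgent action mapping.
// invertMult is +1 or -1, like m_InvertMult in the original script.
public static class ActionMappingSketch
{
    public const float MoveSpeed = 30f; // horizontal speed factor from the original code
    public const float MaxTilt = 55f;   // degrees of racket tilt per unit of "rotate"

    // Horizontal velocity from the first (clamped) action.
    public static float VelocityX(float action0, float invertMult) =>
        Math.Clamp(action0, -1f, 1f) * invertMult * MoveSpeed;

    // Z rotation of the racket, in degrees, from the third (clamped) action.
    public static float RacketAngleZ(float action2, float invertMult) =>
        MaxTilt * Math.Clamp(action2, -1f, 1f) * invertMult + invertMult * 90f;
}
```

Because of the clamp, an out-of-range action such as 5 behaves exactly like 1. The policy therefore cannot move the racket faster than 30 units per second.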

* **HitWall**

The example checks where the ball collides on both sides of the court to determine which player has won the point. The winner receives a positive reward and the loser a negative one, the score is updated on the canvas and the scene is reset. It also detects a double bounce on each side of the court with the variable `lastFloorHit`,
and uses the `net` flag to record a successful serve. All of this is used to decide who wins each point.

            else if (collision.gameObject.name == "wallB")
            {
                // Agent B hits into wall or agent A hit a winner
                if (lastAgentHit == 1 || lastFloorHit == FloorHit.FloorBHit)
                {
                    AgentAWins();
                }
                // Agent A hits long
                else
                {
                    AgentBWins();
                }
            }
            else if (collision.gameObject.name == "floorA")
            {
                // Agent A hits into floor, double bounce or service
                if (lastAgentHit == 0 || lastFloorHit == FloorHit.FloorAHit || lastFloorHit == FloorHit.Service)
                {
                    AgentBWins();
                }
                else
                {
                    lastFloorHit = FloorHit.FloorAHit;
                    //successful serve
                    if (!net)
                    {
                        net = true;
                    }
                }
            }
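
The two branches above can be condensed into pure functions that make the rules explicit. Below is a minimal sketch. It assumes plain .NET, and `HitWallSketch`, `PointResult` and the method names are hypothetical. `FloorHit` and the meaning of `lastAgentHit` (0 for agent A, 1 for agent B) follow the original script. The enum is redeclared so the block compiles on its own.

```csharp
// Redeclared from the original script so the sketch is self-contained.
public enum FloorHit { Service, FloorHitUnset, FloorAHit, FloorBHit }

// Hypothetical result type: who (if anyone) wins the point.
public enum PointResult { None, AgentAWins, AgentBWins }

public static class HitWallSketch
{
    // Ball hits wallB: B hit into the wall or A hit a winner, so A wins; otherwise A hit long.
    public static PointResult OnWallB(int lastAgentHit, FloorHit lastFloorHit) =>
        lastAgentHit == 1 || lastFloorHit == FloorHit.FloorBHit
            ? PointResult.AgentAWins
            : PointResult.AgentBWins;

    // Ball hits floorA: A hit into its own floor, a double bounce or a bad serve means B wins.
    // Otherwise the bounce is recorded and play continues.
    public static PointResult OnFloorA(int lastAgentHit, ref FloorHit lastFloorHit, ref bool net)
    {
        if (lastAgentHit == 0 || lastFloorHit == FloorHit.FloorAHit || lastFloorHit == FloorHit.Service)
            return PointResult.AgentBWins;

        lastFloorHit = FloorHit.FloorAHit;
        net = true; // successful serve (equivalent to "if (!net) net = true;")
        return PointResult.None;
    }
}
```

Calling `OnFloorA` twice in a row with the same state reproduces the double-bounce rule. The first call records `FloorAHit`, and the second awards the point to B.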

### Rewards:
As mentioned above, rewards are given when the collision is detected on the corresponding side of the court.
The outcome is evaluated for both agents, and each reward is assigned with the `SetReward()` function.

    void AgentAWins()
    {
        m_AgentA.SetReward(1);
        m_AgentB.SetReward(-1);
        m_AgentA.score += 1;
        Reset();
    }

    void AgentBWins()
    {
        m_AgentA.SetReward(-1);
        m_AgentB.SetReward(1);
        m_AgentB.score += 1;
        Reset();
    }
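
The point outcome uses `SetReward()` rather than `AddReward()`. In ML-Agents, `SetReward()` replaces any reward given earlier in the same step, while `AddReward()` accumulates onto it. The sketch below illustrates that difference using a hypothetical `MockAgent` class. It models only this reward bookkeeping and is not the real ML-Agents `Agent`, which also tracks episodes and cumulative reward.

```csharp
// Hypothetical stand-in for an ML-Agents Agent, modelling only the reward calls:
// SetReward overwrites the reward of the current step, AddReward accumulates onto it.
public class MockAgent
{
    public float StepReward { get; private set; }
    public int Score;

    public void SetReward(float reward) => StepReward = reward;
    public void AddReward(float increment) => StepReward += increment;
}

public static class TennisRewards
{
    // Zero-sum outcome of a point, mirroring AgentAWins() / AgentBWins().
    public static void PointWon(MockAgent winner, MockAgent loser)
    {
        winner.SetReward(1f);
        loser.SetReward(-1f);
        winner.Score += 1;
    }
}
```

In the keepy-up cases later on, the code calls `AddReward()` instead, so several events within one step add up rather than overwrite each other.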

### States:
To control all of the above, the example uses a few simple states to track each event during the game.
They record the serve, a bounce on each agent's side of the court, and an unset state used when no bounce has been recorded.

    public enum FloorHit
    {
        Service,
        FloorHitUnset,
        FloorAHit,
        FloorBHit
    }

### Training:
We check learning at around 100K steps. From then on, if the results look correct, we consider the model worth keeping.
We check these results through the logged variables and the graphs that we obtain locally with TensorBoard.

## 3. Performance Analysis

After running the analysis, we found that the example trains quickly. We observed the following.
The example reports an ELO rating for each player, which shows how successful its actions are when playing against the opposing agent.
In TensorBoard, we also get the player's cumulative reward, which indicates its learning progress, along with more detailed statistics.
Finally, it shows the episode length graph.
From our tests, it is clear that the reward for simply having the ball touch the racket cannot be left out. Without this positive reward, the agent does not learn the more specific events that our cases rely on.
As negative rewards we use collisions with the ground, and the same idea can be extended to walls or any other objects we add to the scene.
In this performance analysis we could follow the agent's learning curve: the phases it goes through and the time (in steps) it needs to perform these actions reliably, as captured in the trained `.nn` brain.
The progression is: 1) the agent starts with basic actions, changing its speed and rotation; 2) little by little, it becomes more stable and minimizes its errors.
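
The ELO curve mentioned above is based on the usual Elo idea: a win against a stronger opponent raises the rating more than a win against a weaker one. Below is a minimal sketch of the standard Elo update. It assumes a K-factor of 16, an illustrative value not taken from ML-Agents' code, and `EloSketch`, `Expected` and `Update` are hypothetical names.

```csharp
using System;

// Standard Elo update, shown only to illustrate what an ELO curve measures.
// The K-factor of 16 is an assumption; the value used by ML-Agents may differ.
public static class EloSketch
{
    // Expected score (between 0 and 1) of a player rated `rating` against `opponent`.
    public static double Expected(double rating, double opponent) =>
        1.0 / (1.0 + Math.Pow(10.0, (opponent - rating) / 400.0));

    // New rating after a game with result `score` (1 = win, 0.5 = draw, 0 = loss).
    public static double Update(double rating, double opponent, double score, double k = 16.0) =>
        rating + k * (score - Expected(rating, opponent));
}
```

For example, two players rated 1200 each have an expected score of 0.5. With K = 16, the winner moves to 1208 and the loser to 1192.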

## 4. New case proposal

As mentioned in the introduction, we have three different cases to train the agent with.
To make this possible we had to remove one of the scene's agents and adapt the scripts, which were written for two agents, to work with only one.

The ball spawn also had to be modified, since the ball was randomly spawned on either side of the court.

        var ballOut = Random.Range(-8f, -6f); // x offset, always on the same side of the court
        ball.transform.position = new Vector3(ballOut, 8f, 0f) + transform.position;

### Paddle:

<img src="../docs/images/Paddle.png" align="middle"/>

For the Paddle case, everything revolves around the ball's collisions. For this, an enum is used as a state machine that checks the ball collides with the agent, the front wall and the floor in the right order.

    void OnCollisionEnter(Collision collision) {
        switch (state) {
            case Status.Floor:
                if (collision.gameObject.name == "Agent") {
                    state = Status.Agent;
                } 
                else Death();
                break;

            case Status.Agent:
                if (collision.gameObject.name == "WallFront") {
                    if (!firstGame) {
                        currentTouches++;
                        GivePositiveReward();
                    }
                    else {
                        firstGame = false;
                        GivePositiveReward_Less();
                    }
                    state = Status.Wall;
                }
                else Death();
                break;

            case Status.Wall:
                if (collision.gameObject.name == "Floor") state = Status.Floor;
                else Death();
                break;
        }
    }

There are two types of reward given to the agent:
* **GivePositiveReward()** Gives a reward of value 1. Used for the normal touches.
* **GivePositiveReward_Less()** Gives a reward of value 0.5. Used for the first successful touch since the next ones are the ones that count.
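
The state machine and the two reward values can be traced without Unity by feeding it a sequence of collision names. Below is a minimal sketch with hypothetical names (`PaddleSketch`, `OnCollision`, `TotalReward`, `Dead`). It rests on two assumptions: the machine starts in `Status.Floor`, since the original initial value is not shown above, and `Death()` is modelled only as a flag, because its own penalty or reset is not shown either.

```csharp
// Hypothetical, engine-free version of the Paddle collision state machine.
// Collisions are passed as object names, as in the Unity script.
public enum Status { Floor, Agent, Wall }

public class PaddleSketch
{
    public Status State = Status.Floor; // assumed initial state
    public bool FirstGame = true;
    public int CurrentTouches;
    public float TotalReward;
    public bool Dead; // stands in for Death()

    public void OnCollision(string name)
    {
        if (Dead) return;
        switch (State)
        {
            case Status.Floor:
                if (name == "Agent") State = Status.Agent;
                else Dead = true;
                break;
            case Status.Agent:
                if (name == "WallFront")
                {
                    if (!FirstGame) { CurrentTouches++; TotalReward += 1f; } // GivePositiveReward()
                    else { FirstGame = false; TotalReward += 0.5f; }       // GivePositiveReward_Less()
                    State = Status.Wall;
                }
                else Dead = true;
                break;
            case Status.Wall:
                if (name == "Floor") State = Status.Floor;
                else Dead = true;
                break;
        }
    }
}
```

The sequence Agent, WallFront, Floor, Agent, WallFront yields 0.5 + 1 = 1.5 reward and one counted touch. A further "Agent" collision before the floor bounce ends the point.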

As for the results, the agent learns slowly at first but, as shown in the graphs, its progress then accelerates sharply.
<img src="../docs/images/Paddle_ELO.png" align="middle"/>
<img src="../docs/images/Paddle_AR.png" align="middle"/>
<img src="../docs/images/Paddle_EL.png" align="middle"/>

One trick that helped the agent learn faster had nothing to do with the rewards: I changed the spawn height of the ball so the agent does not need to wait for it to fall. The ball spawns close to the agent, so the agent can hit it and still leave it enough space to hit the wall and then the floor, without being able to easily hit it in between. Another trick was keeping the rewards constant, so the agent gets used to the training without changes while learning.

### Simple keepy-ups: no forces and no racket rotations
<img src="../docs/images/KeepsUps.png" align="middle"/>

The first keepy-ups case has no rotation: the racket has to learn to approach a ball instantiated at a random position and keep touching it upwards.
In the second case, the racket also has to learn to control its rotation, and the ball is sent to different places in XYZ: when the ball is instantiated, we apply a force to it so that it does not simply fall.
<img src="../docs/images/EX2_Tennis_G01.png" align="middle"/>
The cases are ordered on purpose: the agent first learns to move in the environment and reach the ball, and then learns more complex situations with a randomized force and control of the ball through the racket's rotations.

    // Keepy-ups: normalized direction from the racket to the ball
    Vector3 dir = ball.transform.position - transform.position;
    dir.Normalize();

    // Shaping reward: +1 if the racket is within 2 units of the ball on the x axis, -1 otherwise
    distance = Mathf.Abs(ball.transform.position.x - transform.position.x);
    if (distance < 2.0f)
    {
        AddReward(1);
    }
    else
    {
        AddReward(-1);
    }

    // Simple keepy-ups: we move in XY
    m_AgentRb.velocity = new Vector3(moveX * dir.x * 30.0f, m_AgentRb.velocity.y, 0f);

    //EX3: Wind and Rotation. We move in XYZ (replaces the XY movement above)
    m_AgentRb.velocity = new Vector3(moveX * dir.x * magnitude, m_AgentRb.velocity.y, moveX * dir.z * magnitude);
    // We removed the invertX flag from the original example since we only use one racket and no sides.
    // The racket learns to rotate on the Z axis to control the ball.
    m_AgentRb.transform.rotation = Quaternion.Euler(-180f, -180f, 55f * rotate);
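
The shaping reward and the direction-scaled movement above can be tested outside Unity. Below is a minimal sketch. It assumes plain .NET with `System.Numerics.Vector3` in place of `UnityEngine.Vector3`, and `KeepyUpSketch`, `ProximityReward` and `Velocity` are hypothetical names. The 2.0 threshold and the use of `dir.x`/`dir.z` follow the code above. Unlike Unity's `Normalize()`, `System.Numerics.Vector3.Normalize` does not return zero for a zero-length vector, so the ball and racket must not be at the same position.

```csharp
using System;
using System.Numerics;

// Hypothetical, engine-free version of the keepy-up shaping reward and movement.
public static class KeepyUpSketch
{
    public const float ProximityThreshold = 2.0f; // same threshold as the Unity code

    // +1 when the racket is within the threshold of the ball on the x axis, -1 otherwise.
    public static float ProximityReward(Vector3 ball, Vector3 racket) =>
        MathF.Abs(ball.X - racket.X) < ProximityThreshold ? 1f : -1f;

    // EX3-style movement: the action is scaled by the normalized direction to the ball in X and Z,
    // while the vertical velocity is kept as it is.
    public static Vector3 Velocity(Vector3 ball, Vector3 racket, float moveX, float currentVy, float magnitude)
    {
        Vector3 dir = Vector3.Normalize(ball - racket);
        return new Vector3(moveX * dir.X * magnitude, currentVy, moveX * dir.Z * magnitude);
    }
}
```

Scaling by `dir` means the same action moves the racket harder along the axis where the ball is farther away. The proximity reward pays the agent for staying under the ball before any touch happens.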

### Keepy-ups with a force and racket rotation to control the ball
We calculate the distance between the ball and the agent to measure how accurately the agent places itself on the ball's path.
<img src="../docs/images/EX3_Tensor.png" align="middle"/>

We give a positive reward when the ball touches the racket and a negative one when it touches the floor or the invisible walls we simulated. The scene is always reset when the ball touches the floor.
In the wind-and-rotation case (EX3), touching a wall does not reset the scene: the ball is simply stopped, which helps the agent and improves the result. Rewards: floor -1, wall -0.5, racket +2.

    // Rewards for keepy-ups
    void OnCollisionEnter(Collision collision)
    {
        // Invisible walls: small penalty
        if (collision.gameObject.tag == "iWall")
        {
            m_Agent.AddReward(-0.5f);
            if (!EX3)
            {
                Reset();
            }
            else
            {
                // In EX3 the ball is stopped instead of resetting the scene
                gameObject.GetComponent<Rigidbody>().velocity = Vector3.zero;
            }
        }
        // Racket touch: positive reward
        if (collision.gameObject.tag == "Agent")
        {
            m_Agent.AddReward(2);
        }
        // Floor: penalty and reset
        if (collision.gameObject.name == "Floor")
        {
            m_Agent.AddReward(-1);
            Reset();
        }
    }
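
The collision handler's reward scheme can also be written as a small lookup, which makes the EX3 difference explicit. Below is a minimal sketch with hypothetical names (`KeepyUpRewards`, `OnCollision`). It merges the tag checks (`iWall`, `Agent`) and the name check (`Floor`) into one string key, which assumes no object matches more than one of them.

```csharp
// Hypothetical summary of the keepy-up collision rewards as data.
public static class KeepyUpRewards
{
    // Returns the reward for a collision and whether the scene is reset.
    // "iWall" and "Agent" are tags and "Floor" is an object name in the Unity script.
    public static (float Reward, bool Reset) OnCollision(string tagOrName, bool ex3) => tagOrName switch
    {
        "iWall" => (-0.5f, !ex3), // in EX3 the ball is stopped instead of resetting
        "Agent" => (2f, false),
        "Floor" => (-1f, true),
        _ => (0f, false),
    };
}
```

With `ex3` set, a wall hit costs 0.5 without resetting, so the agent keeps practising on the same ball. A floor hit always ends the attempt.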