
Agent Only Learns to Rotate rather than move (always)!! (Best Formatted Issue So far!!😂) #1457

Closed
ust007 opened this issue Dec 1, 2018 · 12 comments
Labels: help-wanted (Issue contains request for help or information.)

ust007 commented Dec 1, 2018

Hey, I know this is a lot to read, but I tried to explain the problem as best I could.

So I wanted to train an agent to reach a target at some random distance on the x-z plane, with obstacles in between. My agent and target are both on the same plane.
Both spawn at random locations within a range of 30f.
MAIN SCENE (screenshot)
INPUTS (continuous)

  1. x and z relative position of the target
    AddVectorObs(relativePosition.x / 30f);
    AddVectorObs(relativePosition.z / 30f);

  2. Five rayPer.Perceive rays at different angles, same as the Hallway example:
    float rayDistance = 10f;
    float[] rayAngles = { 20f, 60f, 90f, 120f, 160f };
    string[] detectableObjects = { "target", "wall" };
    AddVectorObs(rayPer.Perceive(rayDistance, rayAngles, detectableObjects, 0.3f, 0f));
    
  3. AddVectorObs(GetStepCount() / (float)agentParameters.maxStep);
    also copied from the Hallway example (as I understand it, this gives the agent a sense of time).

  4. Whether the agent is on a bad road or not, which for now is always 0 since I haven't introduced roads yet (the four observations are consolidated in the sketch below the list):
    AddVectorObs(OnRoad ? 1 : 0);
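
Putting the four observations together, here is a consolidated sketch of the CollectObservations override. Only the individual AddVectorObs lines above are my actual code; the target reference and the way relativePosition is computed are just glue filled in to make the sketch complete, assuming the ML-Agents 0.x Agent API:

    // Consolidated observation vector; assumes a `target` Transform reference.
    public override void CollectObservations()
    {
        Vector3 relativePosition = target.position - transform.position;

        // 1. Relative x/z position of the target, normalized by the 30f spawn range.
        AddVectorObs(relativePosition.x / 30f);
        AddVectorObs(relativePosition.z / 30f);

        // 2. Ray perception at five angles, same settings as the Hallway example.
        float rayDistance = 10f;
        float[] rayAngles = { 20f, 60f, 90f, 120f, 160f };
        string[] detectableObjects = { "target", "wall" };
        AddVectorObs(rayPer.Perceive(rayDistance, rayAngles, detectableObjects, 0.3f, 0f));

        // 3. Fraction of the episode elapsed, giving the agent a sense of time.
        AddVectorObs(GetStepCount() / (float)agentParameters.maxStep);

        // 4. Whether the agent is on a bad road (always 0 for now).
        AddVectorObs(OnRoad ? 1 : 0);
    }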

OUTPUTS (continuous):
Only two: one to move forward and backward, and one to rotate left and right.
Exactly the same as the Hallway example.

Rewards
+1 if the agent collides with the target.
Time penalty: AddReward(-1f / agentParameters.maxStep);
And a small reward for closing in plus a small penalty for moving away (a sketch of the wiring follows).
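
A sketch of how these reward pieces fit together; the reward values are mine, but the OnCollisionEnter hook and the "target" tag shown here are just assumed glue for the sketch:

    // Terminal reward when the agent touches the target.
    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("target"))
        {
            AddReward(1f);   // +1 for reaching the target
            Done();          // end the episode
        }
    }

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // ... apply the move-forward/backward and rotate actions here ...

        // Per-step time penalty.
        AddReward(-1f / agentParameters.maxStep);

        // Small closing-in reward / moving-away penalty; the exact line is
        // shown further down this thread.
    }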

The trainer parameters in the .yaml file are exactly the same as in the Hallway example.
I tried lots of different things but nothing worked, so I just copied them.

Training: all of the training below was done with the obstacles disabled.
Time scale of 100.
PPO with LSTM.
Same as the Hallway example.

  1. When the target is in range of rayPer.Perceive:
    At first the agent learns to rotate and keeps rotating for about 150k steps. After that it moves and starts to close in on the target 2-3 times but doesn't reach it. After that it knows that closing in gives a positive reward, so it does that every time.
    A => So basically, after every reset it starts rotating (always in the same direction), and when rayPer.Perceive detects the target it closes in.
    So, is it trained? See further...

  2. In the above training, at around 300k steps I decided to increase the distance to the target.
    Now, if the target is in range of rayPer.Perceive it does the same as A (statement A in point 1 above).
    If the target is far away it just keeps rotating (same direction as usual), with occasional random motion where it moves forward and rotates at the same time.
    So no success :( after 600k steps.

  3. So at 600k steps I changed it so that all target positions are out of range of rayPer.Perceive.
    Result: it keeps rotating.

  4. After 1, 2 and 3 I decided to train it from scratch, with the target out of range of rayPer.Perceive right from the start.
    Again no success... it just learns to rotate (same direction) with very, very little movement.

No success even with the obstacles disabled.

So if someone can help and guide me, any suggestions would be appreciated.
My thoughts on what could be wrong:

  1. The task is difficult and needs more training.
  2. Maybe I am giving unnecessary extra data as input.
  3. It needs some different kind of approach to train gradually.
  4. LSTM doesn't work well with continuous input.
    But I also ran trainings 1 and 2 with use_recurrent: false, so maybe LSTM is not the problem.

Next things I am going to try:

  1. Discrete action space.
  2. Training multiple agents at the same time.

PS: I really want to know why it always learns to rotate, and always in the same direction.
Also, this took me an hour to type while my stupid agent was once again learning to rotate in the background. 😂😂😂

Screenshots of the results: Training-1 Results, Training-2 Results, Training-3 Results, Training-4 Results.

ust007 changed the title from "Agent Only Learns to Rotate rather than move (always)!!" to "Agent Only Learns to Rotate rather than move (always)!! (Best Formatted Issue So far!!😂)" Dec 1, 2018
@superjayman

I have noticed similar behavior from my agents. It looks like sometimes they get stuck in a local minimum and just go around in circles eating up food. Not sure how to fix it?


LeSphax commented Dec 3, 2018

One thing I can see from your graphs is that entropy is increasing. When that happens to me it usually means that the task is too hard for the agent and it is not getting enough reward.

Increasing entropy means the agent is becoming less confident about how good its actions are; essentially it is picking more and more random actions.
Part of the PPO algorithm rewards the agent for taking random actions, to make sure it keeps exploring. So your agent has decided that taking random actions is the best way to get reward.
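
(For reference, this exploration bonus is the entropy term in the PPO objective. Using the notation of the PPO paper, and assuming I remember correctly that its strength corresponds to the beta hyperparameter in the trainer .yaml:

    L_t(\theta) = \hat{\mathbb{E}}_t\left[ L_t^{CLIP}(\theta) - c_1\, L_t^{VF}(\theta) + c_2\, S[\pi_\theta](s_t) \right]

where S[\pi_\theta](s_t) is the entropy of the policy at state s_t. A more random policy has higher entropy, which is what the entropy curve in TensorBoard is plotting.)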

How does this small closing-in reward work? It feels like it is a one-time reward, and after getting it the agent decides that there is nothing more to get and it should just take random actions.

So you could make this reward continuous, i.e. reward the agent every time it gets closer than it was in the last frame.
You could also put the goal closer to the agent so that there is a better chance it will reach it by moving randomly: either use curriculum learning and put the goal further and further away each lesson,
or just make the distance to the goal random.

eshvk added the help-wanted label Dec 4, 2018

ust007 commented Dec 7, 2018

@LeSphax, I think you are right about the entropy.
But as I mentioned in my post, I already give a reward every step when the agent moves closer to the target, and a penalty when it moves away.
And I switched to a discrete action space, and that changed a lot.
But I am still unable to train it to reach the target when the target is not in range of rayPer.Perceive().
I tried both techniques: first starting close and then slowly moving the targets further away, and another time just starting from far-away points. No good results from either.

I am starting to think the task may be too difficult.


LeSphax commented Dec 12, 2018

Hey @ust007,

Just to make sure I understand the reward and penalty the agent gets for moving closer / moving away, could you detail it further or maybe show me the code?

Now that I think about it, maybe the penalty for going away is the problem here. I think it is generally better to avoid giving an agent penalties that are caused by its own actions. I.e.:
DO: give the agent a small penalty every timestep no matter what (to encourage finishing faster).
DON'T: give the agent a penalty when it does an action that you don't want it to do.

Because until the agent understands the difference between moving away and moving closer, it will just learn that whenever it moves it has a 50/50 chance of getting a penalty or a reward.
So removing the penalty should be helpful in your case.

You say you tried to put the goal close to the agent, but how close did you put it? My idea was to start by putting the goal so close to the agent that it walks into it by chance quite often. That way you wouldn't need to rely on your getting-closer reward; the +1 reward for colliding with the target should be enough.

About the rayPer.Perceive thing, it's hard to follow exactly what you did. It looks like you changed the environment manually in the middle of the training session.
I would recommend against changing the environment manually in general, because it makes it hard to reproduce your results afterwards. If you want to change the task in the middle of training you should use curriculum learning, as it allows you to retrain the agent from scratch on the same curriculum.

So to summarize:

  • Could you show the code of the small reward/penalty you are giving?
  • Try removing the penalty for moving away (see the sketch after this list).
  • If that doesn't work, put the goal really close to the agent. Once it learns to move to the goal, use curriculum learning to put the goal further and further away.
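
For the second bullet, a minimal sketch of a shaping reward that only pays when the agent gets closer and never penalizes moving away (all variable names here are placeholders, not taken from your code):

    // Somewhere in AgentAction, after movement has been applied.
    float distanceToTarget = Vector3.Distance(transform.position, target.position);
    float progress = previousDistance - distanceToTarget;    // > 0 only when closing in
    AddReward(Mathf.Max(0f, progress) / startingDistance);   // clamp away the penalty
    previousDistance = distanceToTarget;                     // remember for the next step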


ust007 commented Dec 12, 2018

Thanks @LeSphax,
I will explain the reward/penalty thing below.

First, about curriculum learning, I didn't get the part where you said

"it allows you to retrain the agent from scratch on the same curriculum."

What does this mean?

  1. That the agent gets to know that there are some changes in the environment and it has to try new things on top of the previously learned things?
  2. That it doesn't know the environment has changed?
  3. Or maybe something else? Please explain in detail,
    because, as you said, I was changing the environment manually by pausing or using the --load feature.
    Please explain the difference if you can.

Let me explain what result I want:
in the end I want the agent to bypass the obstacles and reach the target.
What I did:

  1. I am manually introducing the obstacles (I don't know how to use curriculum learning here).
  2. I trained it and it learned, but sometimes it just gets stuck on the obstacles.
  3. And there is an unexpected behavior where the agent moves towards the target backwards, rather than approaching it with its rayPer.Perceive side facing forward; it moves with the opposite end first.
    This results in the agent getting stuck on obstacles, since it cannot detect that there is a wall or obstacle behind it. I don't know why this happens.

(Screenshot: obstacles)

Now about the reward/penalty thing, this is the code:

AddReward((previousDistance - distanceToTarget) / CurrentEpDistance);

previousDistance: the distance between the target and the agent in the last step.
distanceToTarget: the distance between the target and the agent in the current step.
CurrentEpDistance: the initial distance at the start of the episode.

Overall, I wanted the agent to receive a total of +1 (summed over all the steps) in an episode if it covers the full distance between the agent and the target.
And similarly, if it moves away this code gives it a penalty.
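
To be clear about how the sum works, here is a sketch of the surrounding wiring. Only the AddReward line above is my actual code; the previousDistance update and the AgentReset initialization shown here are just the natural completion of it, not a verbatim paste:

    public override void AgentReset()
    {
        // (Re)spawn logic omitted; record the starting distance for normalization.
        CurrentEpDistance = Vector3.Distance(transform.position, target.position);
        previousDistance = CurrentEpDistance;
    }

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // ... movement / rotation ...

        float distanceToTarget = Vector3.Distance(transform.position, target.position);

        // Each step pays (previousDistance - distanceToTarget) / CurrentEpDistance, so over
        // an episode the shaping terms sum to (CurrentEpDistance - finalDistance) / CurrentEpDistance,
        // i.e. at most +1 when the agent covers the whole starting distance.
        AddReward((previousDistance - distanceToTarget) / CurrentEpDistance);
        previousDistance = distanceToTarget;
    }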


LeSphax commented Dec 12, 2018

About the curriculum, you should try to read the documentation, but basically it allows you to create several lessons.
This way you can have your agent do tasks that get harder and harder. For example, a curriculum could be:

1- The goal is very close to the agent
2- The goal is far from the agent
3- The goal is far and there are obstacles

Using this, you can just retrain the agent from scratch every time (i.e. without using --load),
and the changes to the environment will be applied automatically. This makes your results easier to reproduce, because you don't have to remember what you did manually; everything is written in code.
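
Concretely, each lesson of a curriculum just sets one or more reset parameters, and the agent's reset code reads the current value. A minimal sketch, assuming a curriculum parameter named goal_distance and the Academy.resetParameters dictionary from the ML-Agents version of that time (both the parameter name and the lookup are illustrative, not your actual setup):

    public override void AgentReset()
    {
        // Read the current lesson's value; the trainer updates this dictionary
        // whenever a lesson's reward/progress threshold is passed.
        var academy = FindObjectOfType<Academy>();
        float goalDistance = 5f;   // fallback if no curriculum is running
        if (academy != null && academy.resetParameters.ContainsKey("goal_distance"))
            goalDistance = academy.resetParameters["goal_distance"];

        // Place the target on the x-z plane at that distance from the agent.
        Vector2 dir = Random.insideUnitCircle.normalized;
        target.position = transform.position + new Vector3(dir.x, 0f, dir.y) * goalDistance;
    }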

About your code, it seems to be correct, but it might improve without the penalty.
From what I understand you now have a different problem: the agent goes towards the objective but gets stuck on obstacles.
I think this just means it needs to train more. You could also start with smaller obstacles and make them bigger once it passes them successfully.


atapley commented Dec 14, 2018

I have trained agents to locate goals when both the agent's position and the goal's location are randomized, and in my experience the agent will rotate in one direction at the start every time, because it is looking for either the goal or something it recognizes, such as a corner or a specific obstacle. Once it sees something it is looking for, it will move in that direction.

The reason it moves backwards towards the goal is your distance reward. The agent learns that spinning around to face the goal wastes time, so it solves that by moving backwards instead, since it can still collect the positive reward that way. If you get rid of the backwards action, the agent will be forced to approach using the ray-facing side (a sketch of this is below).
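
A minimal sketch of what that could look like with the discrete action space you switched to: one branch for movement with no backward option at all, and one branch for rotation (the branch layout and speed constants are placeholders, not your actual setup):

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // Branch 0: 0 = do nothing, 1 = move forward (no backward action exists).
        int move = Mathf.FloorToInt(vectorAction[0]);
        if (move == 1)
            transform.position += transform.forward * moveSpeed * Time.deltaTime;

        // Branch 1: 0 = do nothing, 1 = rotate left, 2 = rotate right.
        int turn = Mathf.FloorToInt(vectorAction[1]);
        if (turn == 1)
            transform.Rotate(0f, -turnSpeed * Time.deltaTime, 0f);
        else if (turn == 2)
            transform.Rotate(0f, turnSpeed * Time.deltaTime, 0f);

        AddReward(-1f / agentParameters.maxStep);   // keep the per-step time penalty
    }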


ust007 commented Dec 14, 2018

@atapley I get your point about the rotation. But about the backwards movement, you said

"since it can get a positive reward by moving backwards"

It can get the same reward by moving forwards towards the goal, so there is no reason for it to be biased towards moving backwards.
But with more training it has improved, and it now also approaches from the ray-facing side.


atapley commented Dec 14, 2018

The agent would need to turn around to face the goal, which involves more actions than just moving backwards, and therefore more negative time-step rewards. The reason it improves as it trains more could just be that, after getting stuck on an obstacle for a while because it was moving backwards, it learns that it's better to move forwards. But at least in the early stages of training, that is a plausible cause of the issue.


ust007 commented Dec 15, 2018

@atapley Yeah you are right. Thanks for the help.

@xiaomaogy

Thanks for reaching out to us. Hopefully you were able to resolve your issue. We are closing this due to inactivity, but if you need additional assistance, feel free to reopen the issue.


lock bot commented Jan 22, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Jan 22, 2020