Unable to shuffle if the fields are not of same length #1798

Closed
michael20at opened this issue Mar 7, 2019 · 12 comments

Labels
bug Issue describes a potential bug in ml-agents.

Comments

@michael20at

michael20at commented Mar 7, 2019

Hm, I tried ml-agents 0.7 with my new 2080 Ti (which runs great on normal TensorFlow)
and updated Unity to the newest version, 2018.3.7f1. Training the 3D Balls example via cmd starts fine, but after a few episodes (sometimes only a few, sometimes after 9) I get

"Unable to shuffle if the fields are not of same length"!

What could be the problem?
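
For context, a message like this usually comes from a guard that refuses to shuffle a training buffer whose fields hold different numbers of entries. The following is only a minimal sketch of that idea, not the ml-agents code: `shuffle_fields` and the field names are hypothetical, and the real buffer class may be structured quite differently.

```python
import numpy as np


def shuffle_fields(fields, rng=None):
    """Shuffle every field with the same permutation (hypothetical helper)."""
    lengths = {name: len(values) for name, values in fields.items()}
    # All fields must line up entry-for-entry, otherwise a shared
    # permutation would index past the end of the shorter ones.
    if len(set(lengths.values())) > 1:
        raise ValueError(
            f"Unable to shuffle if the fields are not of same length: {lengths}"
        )
    if rng is None:
        rng = np.random.default_rng()
    n = next(iter(lengths.values()), 0)
    order = rng.permutation(n)
    return {name: [values[i] for i in order] for name, values in fields.items()}


# Fields of equal length shuffle together and stay aligned.
shuffled = shuffle_fields({"rewards": [1, 2, 3], "advantages": [10, 20, 30]})
```

If one field ends up with a different `len()` than the others, for example because an array gained an extra leading dimension, a check of this kind raises before any update happens.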

@michael20at
Author

Hm, it kinda works if I increase the buffer size by ten in the config file, but I still don't know why?

@xiaomaogy xiaomaogy added the help-wanted Issue contains request for help or information. label Apr 2, 2019
@xiaomaogy
Contributor

This is kind of weird and unexpected. I don't see other people raising this issue, so I guess starting over from a clean setup and following the basic guide step by step might help.

@Ina299

Ina299 commented Apr 2, 2019

I also experienced this issue.
I use ml-agents 0.6 on a 1080 Ti, with this trainer config:
    batch_size: 1536
    beta: 0.006
    buffer_size: 15360
    epsilon: 0.17
    gamma: 0.995
    hidden_units: 512
    lambd: 0.9
    learning_rate: 0.00012
    max_steps: 5.0e15
    normalize: True
    num_epoch: 4
    num_layers: 4
    time_horizon: 768
    summary_freq: 2000
    use_recurrent: False
    use_curiosity: False
I saw this when an agent's lifetime was too long.
So I guess the agent should call Done() before buffer_size is reached.

@ervteng ervteng added the needs-info Issue contains insufficient information to be resolved. label Apr 3, 2019
@ervteng
Contributor

ervteng commented Apr 3, 2019

Are you using a Numpy version greater than 1.14.1 by any chance? Newer versions of Numpy are known to have similar issues.

@rainysolar

rainysolar commented Apr 23, 2019

I have the same issue.
I use ml-agents 0.8.1 on a 2070 card, which only supports CUDA 10.
CUDA 10 needs TensorFlow >= 1.13.0,
and NumPy 1.14.1 is not compatible with TensorFlow >= 1.13.0.

@yttiktak

Same issue. Using Ubuntu Bionic, numpy v 1.16.2
Trying to track it down: it seems a NumPy array (advantages) changes shape from (N,) to (1,N) when the ppo trainer.py subtracts the mean from it, at about line 325.
I hacked together a workaround that bypasses policy updates when that happens. Walker seems to be training now. My 5 yr old approves of the walker. It got up to a reward of 300 so far.
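
A small illustration of how a shape change from (N,) to (1,N) can arise from broadcasting alone. This is a sketch under an assumption: the (1, 1)-shaped `mean_2d` below is a stand-in chosen to trigger the effect, and the thread does not confirm that this is what happens inside the trainer.

```python
import numpy as np

advantages = np.array([1.0, 2.0, 3.0, 4.0])  # shape (N,)

# Subtracting a plain scalar mean keeps the 1-D shape.
centered = advantages - advantages.mean()
print(centered.shape)  # (4,)

# If the subtrahend is a (1, 1) array instead of a scalar (assumed here),
# broadcasting promotes the result to shape (1, N).
mean_2d = np.array([[advantages.mean()]])
promoted = advantages - mean_2d
print(promoted.shape)  # (1, 4)

# len() now reports 1 instead of N, which is enough to fail a
# "all fields have the same length" check.
print(len(centered), len(promoted))  # 4 1

# One defensive option: flatten back to 1-D before storing the values.
restored = promoted.reshape(-1)
print(restored.shape)  # (4,)
```

Flattening keeps the data and restores the expected length, which is less drastic than skipping the policy update.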

@xiaomaogy xiaomaogy added bug Issue describes a potential bug in ml-agents. and removed help-wanted Issue contains request for help or information. needs-info Issue contains insufficient information to be resolved. labels May 20, 2019
@JadenTravnik

This doesn't seem to happen when using NumPy 1.14.5 and TensorFlow 1.7, as mentioned in setup.py.
I'm still not sure what the issue is, but at least with those versions one doesn't get the bug.

@EfveZombie

It is probably a compatibility problem with newer versions of TensorFlow or NumPy.

In my setup (py 3.7 && tf 1.13 && np 1.16), simply adding the parameter dtype=float to all the calls of numpy.array() in ml-agents/mlagents/trainers/buffer.py fixed this problem.

But I don't know if that could cause any other problems.
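
To show what the `dtype=float` argument changes, here is a minimal sketch. `to_float_array` is a hypothetical helper for illustration and is not part of buffer.py; the point is only that forcing the dtype makes the result independent of what NumPy would otherwise infer from the elements.

```python
import numpy as np


def to_float_array(values):
    """Hypothetical helper: always build a float64 array from numeric values."""
    return np.array(values, dtype=float)


ints = [1, 2, 3]
inferred = np.array(ints)             # NumPy infers an integer dtype here
forced = np.array(ints, dtype=float)  # explicitly float64

print(forced.dtype)  # float64

# Mixed NumPy scalar types are also normalized to float64.
mixed = to_float_array([1, np.float32(2.5), np.int64(3)])
print(mixed.dtype, mixed.shape)  # float64 (3,)
```

Forcing the dtype also makes NumPy raise immediately if an element cannot be converted to a number, rather than silently building an object array.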

@ervteng
Contributor

ervteng commented Jul 12, 2019

The fields in the buffer should be all numerical, so I don't see an issue with adding the dtype. We'll definitely track this as we upgrade to newer versions of Numpy.

ervteng pushed a commit that referenced this issue Jul 17, 2019
Fixes shuffling issue with newer versions of numpy (#1798). 
* make get_value_estimates output a dict of floats
* Use np.append instead of convert to list, unconvert
* Add type hints and test for get_value_estimates
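
The first two bullets can be pictured roughly as follows. This is a sketch of the general idea only: `as_float_dict` and the keys `"extrinsic"` and `"curiosity"` are hypothetical, and the actual commit may do this differently.

```python
import numpy as np


def as_float_dict(estimates):
    """Hypothetical helper: turn single-element arrays into plain Python floats."""
    return {
        name: float(np.asarray(value).reshape(-1)[0])
        for name, value in estimates.items()
    }


# Single-element arrays of different ranks, as a network output might produce.
raw = {"extrinsic": np.array([0.5]), "curiosity": np.array([[0.25]])}
clean = as_float_dict(raw)
print(clean)  # {'extrinsic': 0.5, 'curiosity': 0.25}

# np.append returns a new 1-D array with the value added at the end,
# avoiding a round trip through a Python list.
values = np.array([0.1, 0.2])
values = np.append(values, clean["extrinsic"])
print(values.shape)  # (3,)
```

With plain floats in the dict, anything appended to a buffer field is a scalar, so the field stays one-dimensional.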
@ervteng
Contributor

ervteng commented Jul 17, 2019

Hi @EfveZombie, we found the root cause of this issue and fixed it on the latest develop branch. Let us know if it fixes your issue. Thanks!

@awjuliani
Contributor

Closing this issue as it was addressed in a recent release of ML-Agents.

@github-actions

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 29, 2021

9 participants