Editor crashes every time I try to play using an Internal brain type #918

Closed
beardordie opened this issue Jun 26, 2018 · 58 comments
Labels: bug (Issue describes a potential bug in ml-agents.)

Comments

@beardordie

beardordie commented Jun 26, 2018

I just installed the newest ml-agents beta following the newest installation guides today, 06/25/2018. I had v0.2 beta working fine on another machine, so I know roughly how it's supposed to behave. What I am experiencing now is that all example scenes work fine on Player type brain and Heuristic type brain, but any time I set it to Internal type brain and use the provided bytes file for each example, the Unity Editor crashes upon pressing Play. I am new to crashes, so I'm not sure how to troubleshoot. I've attached the Editor log but I don't know how to read it for relevant information. I'm using Unity 2018.1.2f1
Editor.log

xiaomaogy self-assigned this Jun 26, 2018
xiaomaogy added the help-wanted label (Issue contains request for help or information.) Jun 26, 2018
@xiaomaogy
Contributor

@beardordie Did you follow the new documentation guide? Do you have the new TensorFlowSharp plugin? Have you installed the new Python packages?

Also, can you list the detailed steps that lead to the crash?

@Setmaster

Setmaster commented Jun 27, 2018

I'm having the same issue: the demo scenes work with Player, Heuristic and External but crash when Internal is used. I'm using Unity 2018.1.6f1 and Anaconda. Editor.log, video

@xiaomaogy
Contributor

@Setmaster Thanks for the video and log, but this information is not enough for us to help you. Please tell us the detailed steps to reproduce your error, and specify things like which OS you are using, which installation guide you followed, etc.

@Setmaster

Setmaster commented Jun 28, 2018

@xiaomaogy
Win 10
Version 1803
Build 17134.112
Followed this repo's installation guide, choosing to use Anaconda and after that followed the basic guide and setup the project as it was instructed.

Some issues I had but solved:

Running learn.py ModuleNotFoundError: No module named 'docopt' - Solved it by writing (ml-agents) C:\Users\vi7or\Documents\Repositories\ml-agents\python>python ./learn.py --run-id=run01 --train

Then I had an issue with tensorflow - Solved by installing tensorflow using conda instead of pip
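
A quick way to see which modules the active environment is missing, before launching learn.py, is sketched below. This is only an illustration: the helper name `missing_modules` is hypothetical, and the module list contains just the two packages mentioned in this comment, not the toolkit's full requirements.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: only the two packages named in the comment above are checked.
    missing = missing_modules(["docopt", "tensorflow"])
    if missing:
        print("Not importable from this interpreter:", ", ".join(missing))
    else:
        print("docopt and tensorflow are both importable.")
```

Running it with the same interpreter that is used for learn.py (for example from inside the activated conda environment) shows whether the error comes from the environment rather than from the script.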

@xiaomaogy
Contributor

Running learn.py ModuleNotFoundError: No module named 'docopt' - Solved it by writing (ml-agents) C:\Users\vi7or\Documents\Repositories\ml-agents\python>python ./learn.py --run-id=run01 --train

Did you use py instead of python before?

@xiaomaogy
Contributor

Also after you click the play button in the editor, does the play button just get stuck there like that forever? Have you made sure you are using Tensorflow 1.7.1 in your python environment, and used the latest version of TensorFlowSharp plugin in the basic guide?

@Setmaster

I don't remember trying py before, and I'm not sure what you mean by stuck: a few seconds after pressing play the editor crashes. Here is a list of installed packages, which includes the correct version of TensorFlow. I downloaded the package again from here, and Unity says there is nothing new to import. I also downloaded and installed the package yesterday, so I believe I'm using the latest version.

@beardordie
Author

beardordie commented Jun 28, 2018 via email

@xiaomaogy
Contributor

@beardordie No, it doesn't require Python and TensorFlow to be installed. But we haven't tested on the CPU you have.

@Setmaster This is something I've never seen. I've tested our repo on Windows 10 with Unity 2018.1, and the Internal Brain works without any crash.

Can you both build the Unity executable with the Internal Brain checked, then run the built executable from the command line and see what happens? Without any error message I can't even guess what's going wrong here.

@Setmaster

@xiaomaogy By executable do you mean player? About the cmd, what parameter would be used for this? Also, I built a player with the brain set to internal anyway, and it crashed when executed. Here is a copy of the player if it's useful.

@xiaomaogy
Contributor

Hi @Setmaster, by executable I mean the stuff you've provided here.

I've tried to run your built executable provided above on my Windows machine (Win 10), and it works without any crash. At this point I'm pretty sure it is a machine-specific issue. @beardordie Does this built executable work on your computer?

@Setmaster Is there anything special about your computer? Have you tried this on any other computer?

@Setmaster

@xiaomaogy I don't think so, here are some specs:
CPU: Intel Core i7 Extreme 980X @ 3.33GHz
Motherboard: ASUSTeK Computer INC. Rampage III Extreme (LGA1366)
SSD: Samsung SSD 850 PRO
Graphics: GTX 1080 EVGA

I ran it on another computer without issues.

@xiaomaogy
Contributor

@Setmaster You ran it on another computer and it works? So what's the difference between that computer vs your own computer?

@beardordie
Author

That built executable crashes on my computer with Intel Core 2 Extreme. I'm not surprised that this older processor is not working, but I am surprised that a computer with the specs xiaomaogy listed would have any trouble with it.

@beardordie
Author

For reference, my Intel Core 2 Extreme PC (which crashes upon using the internal brain) has an AMD Radeon HD 5800 GPU. Again, a much older computer, but it should be able to run an internal brain regardless of whether it supports all the TensorFlow stuff to train new brains.

@Setmaster

Setmaster commented Jun 30, 2018

@xiaomaogy The other computer was a Thinkpad notebook, I don't know the exact specs, but I presume all of them are different from mine. Both my CPU and beardordie's seem to be quite old, maybe that's to blame?

@Liven28

Liven28 commented Jul 2, 2018

Hello, same issue.

My specs: Windows 10 / Unity 2018.1.0f2
Old processor too: i7 920 (hyper-threading disabled for overclocking purposes)

@xiaomaogy
Contributor

@mmattar The windows machine we have is working, but for these people it seems that certain cpu specs will make the internal brain crash.

@Liven28

Liven28 commented Jul 11, 2018

Additional information: I get this message when installing the TFSharp plugin:
Unloading broken assembly Assets/ML-Agents/Plugins/Android/TensorFlowSharp.Android.dll, this assembly can cause crashes in the runtime

And I upgraded Unity to 2018.2.0f2 but the problem persists

@m4Ssa

m4Ssa commented Jul 12, 2018

Hello, I'm also experiencing the same issue.

Specs: Windows10 / Unity 2018.1.0f / TF 1.7.1 / I7 Q740

Edit: I also had to build my TF from source since the CPU does not have AVX support and the stock version didn't work.
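
Since the comment above ties the problem to a CPU without AVX, a small check for the AVX flag may help others compare machines. The sketch below is an assumption-laden illustration: it only parses `/proc/cpuinfo`-style text, so it applies to x86 Linux systems, and the function names are hypothetical. It does not prove that missing AVX is the cause of the crash.

```python
import os


def cpu_flags_from_cpuinfo(cpuinfo_text):
    """Collect the feature flags listed on 'flags' lines of /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """True if the exact flag 'avx' is present (flags such as 'avx2' do not count)."""
    return "avx" in cpu_flags_from_cpuinfo(cpuinfo_text)


if __name__ == "__main__":
    # Assumption: /proc/cpuinfo exists only on Linux; other systems need a different tool.
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as handle:
            print("AVX reported by the kernel:", has_avx(handle.read()))
    else:
        print("No /proc/cpuinfo here; check the CPU's specification sheet instead.")
```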

xiaomaogy added the bug label (Issue describes a potential bug in ml-agents.) Jul 12, 2018
@xiaomaogy
Contributor

xiaomaogy commented Jul 13, 2018

@m4Ssa @Livenvh @beardordie Could you please try the older version of the TensorFlowSharp plugin available here (https://s3.amazonaws.com/unity-ml-agents/0.3/ML-AgentsWithPlugin.unitypackage)? If the editor stops crashing with the older TensorFlowSharp plugin, I will try to update the plugin and see if that fixes the problem. Right now I don't have a Windows machine that crashes with the steps you described, so I am not able to find a solution.

@beardordie
Author

beardordie commented Jul 13, 2018 via email

@beardordie
Author

beardordie commented Jul 13, 2018 via email

@xiaomaogy
Contributor

@beardordie Actually python is not related to this crash, so you don't need to install it to test it. If you have time to test, that would be really helpful. Thanks in advance.

@Liven28

Liven28 commented Jul 13, 2018

@xiaomaogy
I tested the old TensorFlowSharp (on 2018.2)
=> The type or namespace name `CommunicatorParameters' could not be found (in the RpcCommunicator and SocketCommunicator scripts)

@xiaomaogy
Contributor

@Livenvh How did you test it? It seems that some of your C# scripts have been changed. Are you sure all of the .cs scripts inside the Assets/ML-Agents folder are in sync with the v0.4 master, and only the TensorFlowSharp plugin has been switched to the older version?

Also 2018.2 might not work, please use 2018.1.

@Liven28

Liven28 commented Jul 13, 2018

Just finished testing:
New 2018.1 project + fresh ml-agents 0.4 (just unzipped the GitHub version and adapted the player settings) + the TensorFlowSharp you linked above => The type or namespace name `CommunicatorParameters' could not be found (in the RpcCommunicator and SocketCommunicator scripts), same as on 2018.2.

@jjjuande

It seems that Unity 2018.2 doesn't trust TensorFlowSharp.Android.dll, so it's unloaded when the Unity platform target is set to Android. With that .dll unloaded, the projects won't run in the Editor when the platform target is Android. They run fine on an Android device, or in the Editor when the platform target is set to anything other than Android (e.g. Windows).

When the project is loaded with TFSharp installed:
Unloading broken assembly Assets/ML-Agents/Plugins/Android/TensorFlowSharp.Android.dll, this assembly can cause crashes in the runtime

When the project runs with Platform target set to Android:
TypeLoadException: Could not find method due to a type load error
Brain.InitializeBrain (Academy aca, Communicator communicator) (at Assets/ML-Agents/Scripts/Brain.cs:209)

@bmobear

bmobear commented Jul 24, 2018

Same issue here. Brain type Player/Heuristic/External work fine. Unity crashes when the play button is clicked and the brain type is set to Internal.

Specs: Ubuntu 16.04 64-bit, Intel Core i7-6850K, Python 3.5.2, tensorflow 1.7.1, Unity 2018.2.0b2, ml-agent 0.4

also crashes after switching to tensorflow 1.7 and 1.9 with Unity 2018.2.0f2.

edited:
I tried on a different machine with the following settings and it works.
Specs: Ubuntu 18.04 64-bit, Intel Core i9-7940X, Python 3.6.5, tensorflow 1.9, Unity 2018.2.0f2, ml-agent 0.4

@VicMP0

VicMP0 commented Jul 24, 2018

Hi, same issue here. Internal brain crashes editor and .exe.

Win10 64-bit, i7 960, GTX 970, Unity 2018.1.6f1, latest TFSharp, ml-agents 0.4

@Pyroevil

Pyroevil commented Aug 1, 2018

Same issue for me too, with all example scenes provided with the toolkit (v0.4b).
Training (External) works just fine, but if I want to check the result in Internal mode, Unity just closes after a few seconds. It crashes with my own bytes files and also with the bytes files provided with the toolkit. I deleted the project and restarted from scratch many times, trying both the master branch and the latest release (v0.4b).

Because I think it's not directly related to my TensorFlow installation or my trained bytes files, I gave it a try with TensorFlowSharp v0.3 instead of v0.4. Now it works, but not with all sample scenes: only with those that don't use a "Discrete" visualisation or action vector space type. The "Continuous" type works. Discrete ones give me an error (e.g. the GridWorld scene for this error log):

TFException: NodeDef mentions attr 'dilations' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_FLOAT]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]>; NodeDef: main_graph_0_encoder0/conv_1/Conv2D = Conv2D[T=DT_FLOAT, data_format="NHWC", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 4, 4, 1], use_cudnn_on_gpu=true](visual_observation_0, main_graph_0_encoder0/conv_1/kernel/read). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
TensorFlow.TFStatus.CheckMaybeRaise (TensorFlow.TFStatus incomingStatus, System.Boolean last) (at <6ed6db22f8874deba74ffe3e566039be>:0)
TensorFlow.TFGraph.Import (TensorFlow.TFBuffer graphDef, TensorFlow.TFImportGraphDefOptions options, TensorFlow.TFStatus status) (at <6ed6db22f8874deba74ffe3e566039be>:0)
TensorFlow.TFGraph.Import (System.Byte[] buffer, TensorFlow.TFImportGraphDefOptions options, TensorFlow.TFStatus status) (at <6ed6db22f8874deba74ffe3e566039be>:0)
TensorFlow.TFGraph.Import (System.Byte[] buffer, System.String prefix, TensorFlow.TFStatus status) (at <6ed6db22f8874deba74ffe3e566039be>:0)
MLAgents.CoreBrainInternal.InitializeCoreBrain (MLAgents.Batcher brainBatcher) (at Assets/ML-Agents/Scripts/CoreBrainInternal.cs:132)
MLAgents.Brain.InitializeBrain (MLAgents.Academy aca, MLAgents.Batcher brainBatcher) (at Assets/ML-Agents/Scripts/Brain.cs:211)
MLAgents.Academy.InitializeEnvironment () (at Assets/ML-Agents/Scripts/Academy.cs:288)
MLAgents.Academy.Awake () (at Assets/ML-Agents/Scripts/Academy.cs:227)

This is only true with my bytes files. The bytes files provided with the examples work fine with TensorFlowSharp v0.3. So I am stuck looking only at the pre-trained files that come with the examples and cannot see the results of my own experiments.

My specs:
TensorFlow 1.7.1 (compiled myself without AVX)
Python 3.5.1
Windows 10 Home 64-bit
Laptop Asus X553MA
Intel Pentium CPU N3540
Unity 2018.2.0f2
ML Agents ToolKit v0.4b
TensorFlowSharp v0.4 and TensorFlowSharp v0.3
8 GB of RAM

@Liven28

Liven28 commented Aug 26, 2018

Any news about that bug?

@xiaomaogy
Contributor

@Liven28 We are still not sure what is causing this bug, it works on our windows test machine so we are still not able to reproduce it.

@xiaomaogy
Contributor

@Pyroevil The error message you posted says that the bytes file was generated with a different TensorFlow version than the one in the place where you are using it. If you want to try ml-agents v0.3, then you might want to switch everything to it (the TensorFlowSharp plugin to v0.3 and the TensorFlow version to 1.4).
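
The pairing rule in this reply can be written down as a small lookup. The sketch below is hypothetical and records only the pairings mentioned in this thread (v0.3 with TensorFlow 1.4, v0.4 with TensorFlow 1.7.1); it is not an official compatibility matrix, and the names `THREAD_PAIRINGS` and `tf_matches` are made up for illustration.

```python
# Assumption: pairings taken from comments in this thread only, not from official docs.
THREAD_PAIRINGS = {
    "0.3": "1.4",
    "0.4": "1.7.1",
}


def tf_matches(toolkit_version, tf_version):
    """Return True/False if the thread names a pairing for the toolkit version, else None."""
    expected = THREAD_PAIRINGS.get(toolkit_version)
    if expected is None:
        return None  # the thread says nothing about this toolkit version
    # Accept an exact match or a patch release of the expected version (1.4 -> 1.4.0).
    return tf_version == expected or tf_version.startswith(expected + ".")


if __name__ == "__main__":
    # A bytes file trained with TensorFlow 1.7.1 loaded by the v0.3 plugin is a mismatch.
    print(tf_matches("0.3", "1.7.1"))
```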

@xiaomaogy
Contributor

@jjjuande The problem you mentioned is a different issue. The TensorFlowSharp.Android.dll file triggers that error message, but it is not the cause of the crash.

@xiaomaogy
Contributor

@m4Ssa So if you want to try ml-agents v0.3, switch all of them to v0.3. The CommunicatorParameters error might be caused by an incompatible protobuf file.

@Liven28

Liven28 commented Sep 23, 2018

I just reinstalled Windows 10 Home 64-bit (with only Windows and the graphics drivers up to date, and Unity and ml-agents configured).
The same problem persists (crash on play with the Internal brain).

Configuration: i7 920 / GTX 1060 / 12 GB RAM / Gigabyte GA-X58A-UD7
(no overclocking)

I tried on 2017.4 - 2018.1 - 2018.2 - 2018.3b
with Python 3.5.1, tensorflow 1.7.1, toolkit v0.4 / v0.5

@zetaFairlight

Same problem here, everything works... except when I choose Internal. It closes the screen as soon as I press play.

Using 2018.2.9f1 on Windows 10, Intel CPU (not using GPU). Let me know if you want more info or testing.

@xiaomaogy
Contributor

xiaomaogy commented Sep 27, 2018

@Liven28 @Gaby10 This is not something we can solve right now, for the reason I mentioned earlier. In v0.6 (which will be released in a few weeks) we will change the way the internal brain works (it will be a scriptable object instead of a GameObject, and it will be called Learning Brain). If you want to try it, you can check PR #1250.

@Liven28

Liven28 commented Oct 2, 2018

@xiaomaogy OK, very interesting. Fingers crossed.
If you need something tested, you know where I am.

@jamu1989

jamu1989 commented Oct 2, 2018

same here.
Windows 10 Pro 64-bit, AMD Phenom(tm) II X4 965, Unity 2018.2.10f1
Training works fine, also in the editor. But if I click on play with an internal brain, Unity instantly closes.

@destructor465

Same, internal brains not working, on play unity closes instantly, while training works without a problem.

Specs:
Windows 10 Pro 64-bit (Build 17763)
Intel Pentium G4620
Unity 2018.2.13f1

@kudyk

kudyk commented Oct 27, 2018

@xiaomaogy
When I try to use the older TFSharp version, I'm getting an error in Unity:
Assets/ML-Agents/Scripts/RpcCommunicator.cs(23,9): error CS0246: The type or namespace name `CommunicatorParameters' could not be found. Are you missing an assembly reference?
I've built my TensorFlow without gRPC support since it didn't work with gRPC enabled. I don't know if that's a related problem though.

Hello, I have the same issue. Did you solve it, and how?

@kudyk

kudyk commented Oct 28, 2018

@xiaomaogy @beardordie @m4Ssa Hi guys, I think I found a solution to the issue. I cloned the repository into a new, clean directory and did everything according to the v0.5 instructions, but imported the TensorFlowSharp plugin for ml-agents v0.3 from here: https://github.com/TimothyA86/ml-agents/blob/master/docs/Installation.md.

My laptop has an AMD A8-3520m processor, which has no AVX support. @Setmaster's build doesn't work on it.

@destructor465

My laptop has an AMD A8-3520m processor, which has no AVX support. @Setmaster's build doesn't work on it.

My CPU doesn't support AVX either, so I will need to try your way.

@Liven28

Liven28 commented Oct 31, 2018

@xiaomaogy
Hello, my i7 920 doesn't support AVX. I tried kudyk's solution (ml-agents v0.5 with TensorFlowSharp from ml-agents v0.3) and it seems to work!

I just tried 3Dball scene with internal brain and Unity didn't crash.
I didn't make more tests, waiting for official news about the viability of this solution.

@destructor465

destructor465 commented Nov 1, 2018

kudyk's solution works, but most examples throw long exceptions. One of them:

TFException: NodeDef mentions attr 'output_dtype' not in Op<name=Multinomial; signature=logits:T, num_samples:int32 -> output:int64; attr=seed:int,default=0; attr=seed2:int,default=0; attr=T:type,allowed=[DT_FLOAT, DT_DOUBLE, DT_INT32, DT_INT64, DT_UINT8, DT_INT16, DT_INT8, DT_UINT16, DT_HALF]; is_stateful=true>; NodeDef: multinomial_3/Multinomial = Multinomial[T=DT_FLOAT, output_dtype=DT_INT64, seed=670408, seed2=108](dense_3/MatMul, multinomial_3/Multinomial/num_samples). (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

Working examples: 3D Ball, Bouncer, Crawler, Reacher, Tennis and Walker.
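
To see at a glance which op and attribute the older plugin rejects, the exception text can be parsed with a regular expression. The sketch below assumes the message follows the "NodeDef mentions attr ... not in Op<name=...>" wording of the two exceptions quoted in this thread; the helper name `unsupported_attr` is hypothetical.

```python
import re

# Assumption: pattern derived from the two TFException messages quoted in this thread.
_ATTR_ERROR = re.compile(
    r"NodeDef mentions attr '(?P<attr>[^']+)' not in Op<name=(?P<op>[^;>]+)"
)


def unsupported_attr(message):
    """Return (op_name, attr_name) from a TFException message, or None if it does not match."""
    match = _ATTR_ERROR.search(message)
    if match is None:
        return None
    return match.group("op"), match.group("attr")


if __name__ == "__main__":
    sample = (
        "TFException: NodeDef mentions attr 'output_dtype' not in "
        "Op<name=Multinomial; signature=logits:T, num_samples:int32 -> output:int64>"
    )
    print(unsupported_attr(sample))
```

An attribute that the loading binary does not know about points to a model exported by a newer TensorFlow than the plugin was built against, which matches the hint at the end of the exception text.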

@flurrux

flurrux commented Nov 4, 2018

kudyk's solution works for me too.

xiaomaogy removed the help-wanted label (Issue contains request for help or information.) Dec 18, 2018
@Liven28

Liven28 commented Mar 3, 2019

Does v0.7 solve the problem?

@kudyk

kudyk commented Mar 3, 2019

Hi everyone, v0.7 works with tensorflow 1.7.0 from here.

@Liven28

Liven28 commented Mar 3, 2019

Does it mean that TensorFlow 1.7.0 now works with non-AVX CPUs (and we don't need to use 1.4.0 any more),
or that v0.7 no longer works with 1.4.0 and non-AVX CPUs can't use ml-agents?

@kudyk

kudyk commented Mar 3, 2019

@Liven28 The TensorFlow build that I use from the link above is a third-party, unofficial build with SSE2 support by user @fo40225. He has a build of 1.7.1 too, but only for CUDA GPUs, and it doesn't work for me.

@xiaomaogy
Contributor

Since we've switched from TensorFlowSharp to Barracuda, this issue is no longer relevant. I will close it for now. Feel free to open if you want to discuss more.

@lock

lock bot commented Apr 2, 2020

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators Apr 2, 2020