can't load my trained model #3278

Closed
JamesCann opened this issue Jan 23, 2020 · 6 comments
Labels: bug (Issue describes a potential bug in ml-agents.)

@JamesCann

Describe the bug
I can't load my trained model to continue a training session.

The command is the same one I used to start training, with '--load' added to continue. The moment I press Play in Unity, the session just ends.

See below for the command-line output.

Platform: Windows 10

C:\Users\james>pip3 show mlagents
Name: mlagents
Version: 0.13.1
Summary: Unity Machine Learning Agents
Home-page: https://github.com/Unity-Technologies/ml-agents
Author: Unity Technologies
Author-email: ML-Agents@unity3d.com
License: UNKNOWN
Location: c:\users\james\appdata\local\programs\python\python36\lib\site-packages
Requires: h5py, Pillow, tensorflow, mlagents-envs, pyyaml, jupyter, pypiwin32, grpcio, matplotlib, numpy, protobuf
Required-by:

C:\Users\james>pip3 show tensorflow
Name: tensorflow
Version: 2.0.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: c:\users\james\appdata\local\programs\python\python36\lib\site-packages
Requires: wheel, google-pasta, six, gast, termcolor, keras-preprocessing, absl-py, grpcio, keras-applications, tensorboard, numpy, astor, opt-einsum, protobuf, tensorflow-estimator, wrapt
Required-by: mlagents

C:\ml-agents>mlagents-learn config/trainer_config.yaml --run-id=thirdRun --train --load
WARNING:tensorflow:From c:\users\james\appdata\local\programs\python\python36\lib\site-packages\tensorflow_core\python\compat\v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term

                    ▄▄▄▓▓▓▓
               ╓▓▓▓▓▓▓█▓▓▓▓▓
          ,▄▄▄m▀▀▀'  ,▓▓▓▀▓▓▄                           ▓▓▓  ▓▓▌
        ▄▓▓▓▀'      ▄▓▓▀  ▓▓▓      ▄▄     ▄▄ ,▄▄ ▄▄▄▄   ,▄▄ ▄▓▓▌▄ ▄▄▄    ,▄▄
      ▄▓▓▓▀        ▄▓▓▀   ▐▓▓▌     ▓▓▌   ▐▓▓ ▐▓▓▓▀▀▀▓▓▌ ▓▓▓ ▀▓▓▌▀ ^▓▓▌  ╒▓▓▌
    ▄▓▓▓▓▓▄▄▄▄▄▄▄▄▓▓▓      ▓▀      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌   ▐▓▓▄ ▓▓▌
    ▀▓▓▓▓▀▀▀▀▀▀▀▀▀▀▓▓▄     ▓▓      ▓▓▌   ▐▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▌    ▐▓▓▐▓▓
      ^█▓▓▓        ▀▓▓▄   ▐▓▓▌     ▓▓▓▓▄▓▓▓▓ ▐▓▓    ▓▓▓ ▓▓▓  ▓▓▓▄    ▓▓▓▓`
        '▀▓▓▓▄      ^▓▓▓  ▓▓▓       └▀▀▀▀ ▀▀ ^▀▀    `▀▀ `▀▀   '▀▀    ▐▓▓▌
           ▀▀▀▀▓▄▄▄   ▓▓▓▓▓▓,                                      ▓▓▓▓▀
               `▀█▓▓▓▓▓▓▓▓▓▌
                    ¬`▀▀▀█▓

Version information:
ml-agents: 0.13.1,
ml-agents-envs: 0.13.1,
Communicator API: API-13,
TensorFlow: 2.0.0
INFO:mlagents.trainers:CommandLineOptions(debug=False, num_runs=1, seed=-1, env_path=None, run_id='thirdRun', load_model=True, train_model=True, save_freq=50000, keep_checkpoints=5, base_port=5005, num_envs=1, curriculum_folder=None, lesson=0, no_graphics=False, multi_gpu=False, trainer_config_path='config/trainer_config.yaml', sampler_file_path=None, docker_target_name=None, env_args=None, cpu=False, width=84, height=84, quality_level=5, time_scale=20, target_frame_rate=-1)
INFO:mlagents_envs:Listening on port 5004. Start training by pressing the Play button in the Unity Editor.

@JamesCann added the bug label (Issue describes a potential bug in ml-agents.) on Jan 23, 2020
@JamesCann
Author

Here is the command-line output when it attempts to resume and then exits immediately:

INFO:mlagents_envs:Connected new brain:
thirdRun?team=0
INFO:mlagents.trainers:Hyperparameters for the PPOTrainer of brain thirdRun:
trainer: ppo
batch_size: 1024
beta: 0.005
buffer_size: 10240
epsilon: 0.2
hidden_units: 16
lambd: 0.95
learning_rate: 0.0003
learning_rate_schedule: linear
max_steps: 5000000
memory_size: 256
normalize: False
num_epoch: 3
num_layers: 2
time_horizon: 64
sequence_length: 64
summary_freq: 1000
use_recurrent: False
vis_encode_type: simple
reward_signals:
  extrinsic:
    strength: 1.0
    gamma: 0.99
summary_path: thirdRun_thirdRun
model_path: ./models/thirdRun-0/thirdRun
keep_checkpoints: 5
2020-01-23 10:26:30.135661: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
INFO:mlagents.trainers:Loading Model for brain thirdRun?team=0
INFO:mlagents.trainers:Saved Model
INFO:mlagents.trainers:List of nodes to export for brain :thirdRun?team=0
INFO:mlagents.trainers: is_continuous_control
INFO:mlagents.trainers: version_number
INFO:mlagents.trainers: memory_size
INFO:mlagents.trainers: action_output_shape
INFO:mlagents.trainers: action
INFO:mlagents.trainers: action_probs
Converting ./models/thirdRun-0/thirdRun/frozen_graph_def.pb to ./models/thirdRun-0/thirdRun.nn
IGNORED: StopGradient unknown layer
GLOBALS: 'is_continuous_control', 'version_number', 'memory_size', 'action_output_shape'
IN: 'vector_observation': [-1, 1, 1, 18] => 'main_graph_0/hidden_0/BiasAdd'
IN: 'epsilon': [-1, 1, 1, 2] => 'mul'
OUT: 'action', 'action_probs'
DONE: wrote ./models/thirdRun-0/thirdRun.nn file.
INFO:mlagents.trainers:Exported ./models/thirdRun-0/thirdRun.nn file

@vincentpierre
Contributor

Using the --load argument resumes training where it ended. It also restores saved state, for example the number of training steps completed in the previous training session. My guess is that the previous session already reached max_steps, so you need to increase the max_steps value in the training configuration YAML file. --load is meant to continue a training session, not to start a new session with a previously trained behavior as the starting point.
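
As a rough illustration of why a resumed session can end the moment it starts, here is a minimal sketch. It is not the actual ml-agents trainer code; the function `run_training` and its parameters are hypothetical and only model the idea that the restored step count is compared against max_steps.

```python
def run_training(start_step: int, max_steps: int) -> int:
    """Simplified stand-in for a trainer loop (hypothetical, not ml-agents code).

    start_step: step count restored from a checkpoint (0 for a fresh run).
    max_steps:  step limit taken from the training configuration.
    Returns the number of new training steps performed in this session.
    """
    step = start_step
    performed = 0
    while step < max_steps:
        step += 1        # placeholder for one real training step
        performed += 1
    return performed


# Fresh run: starts at step 0 and trains until the limit.
fresh = run_training(start_step=0, max_steps=5)

# Resumed run with the same limit: the restored step count already
# equals max_steps, so the loop body never executes.
resumed_same_limit = run_training(start_step=5, max_steps=5)

# Resumed run after raising the limit: training continues.
resumed_raised_limit = run_training(start_step=5, max_steps=8)

print(fresh, resumed_same_limit, resumed_raised_limit)
```

Under these assumptions, the second call performs no steps, which matches a session that loads the model, saves it again, and exits right away.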

@vincentpierre self-assigned this on Jan 23, 2020
@JamesCann
Author

I'll give that a try today, thanks.

@JamesCann
Author

OK, I tried that, and this time it didn't end. However, the model doesn't seem to have remembered anything. The agent looks as confused as when training first started, and the step count seems to be back at the beginning.

@JamesCann
Author

MY BAD!! It's working now. I forgot to add the --load parameter this time. Looks GREAT, thank you!

@github-actions

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions bot locked as resolved and limited conversation to collaborators on Jan 28, 2021