Error after 50,000 iterations with Alien-v0 environment running on my MacOS Sierra #10

Closed
shyamalschandra opened this issue Sep 27, 2016 · 6 comments

Comments

@shyamalschandra

This is the log of the events. What should I do? Thanks!

iMac:DQN-tensorflow shyamalsuhanachandra$ python main.py --env_name=Alien-v0 --is_train=True --display=True
 [*] GPU : 1.0000
[2016-09-27 17:28:27,334] Making new env: Alien-v0
{'_save_step': 500000,
 '_test_step': 50000,
 'action_repeat': 4,
 'backend': 'tf',
 'batch_size': 32,
 'cnn_format': 'NCHW',
 'discount': 0.99,
 'display': True,
 'double_q': False,
 'dueling': False,
 'env_name': 'Alien-v0',
 'env_type': 'detail',
 'ep_end': 0.1,
 'ep_end_t': 1000000,
 'ep_start': 1.0,
 'history_length': 4,
 'learn_start': 50000.0,
 'learning_rate': 0.00025,
 'learning_rate_decay': 0.96,
 'learning_rate_decay_step': 50000,
 'learning_rate_minimum': 0.00025,
 'max_delta': 1,
 'max_reward': 1.0,
 'max_step': 50000000,
 'memory_size': 1000000,
 'min_delta': -1,
 'min_reward': -1.0,
 'model': 'm1',
 'random_start': 30,
 'scale': 10000,
 'screen_height': 84,
 'screen_width': 84,
 'target_q_update_step': 10000,
 'train_frequency': 4}
 [*] Loading checkpoints...
 [!] Load FAILED: checkpoints/Alien-v0/min_delta--1/max_delta-1/history_length-4/train_frequency-4/target_q_update_step-10000/double_q-False/memory_size-1000000/action_repeat-4/ep_end_t-1000000/dueling-False/min_reward--1.0/backend-tf/random_start-30/scale-10000/env_type-detail/learning_rate_decay_step-50000/ep_start-1.0/screen_width-84/learn_start-50000.0/cnn_format-NCHW/learning_rate-0.00025/batch_size-32/discount-0.99/max_step-50000000/max_reward-1.0/learning_rate_decay-0.96/learning_rate_minimum-0.00025/env_name-Alien-v0/ep_end-0.1/model-m1/screen_height-84/
2016-09-27 17:28:28.996 Python[26135:3913383] ApplePersistenceIgnoreState: Existing state will not be touched. New state will be written to /var/folders/m1/b_t9_2151y30ryvtr_2gznch0000gp/T/org.python.python.savedState
  0%|                    | 49999/50000000 [14:17<235:26:28, 58.93it/s]E tensorflow/core/common_runtime/executor.cc:334] Executor failed to create kernel. Invalid argument: CPU BiasOp only supports NHWC.
     [[Node: target/target_l1/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target/target_l1/Conv2D, target/target_l1/biases/read)]]

Traceback (most recent call last):
  File "main.py", line 66, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "main.py", line 61, in main
    agent.train()
  File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 56, in train
    self.observe(screen, reward, action, terminal)
  File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 135, in observe
    self.q_learning_mini_batch()
  File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 157, in q_learning_mini_batch
    q_t_plus_1 = self.target_q.eval({self.target_s_t: s_t_plus_1})
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 559, in eval
    return _eval_using_default_session(self, feed_dict, self.graph, session)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3656, in _eval_using_default_session
    return session.run(tensors, feed_dict)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 710, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 908, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 958, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 978, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors.InvalidArgumentError: CPU BiasOp only supports NHWC.
     [[Node: target/target_l1/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target/target_l1/Conv2D, target/target_l1/biases/read)]]
Caused by op u'target/target_l1/BiasAdd', defined at:
  File "main.py", line 66, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "main.py", line 58, in main
    agent = Agent(config, env, sess)
  File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 29, in __init__
    self.build_dqn()
  File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/agent.py", line 240, in build_dqn
    32, [8, 8], [4, 4], initializer, activation_fn, self.cnn_format, name='target_l1')
  File "/Users/shyamalsuhanachandra/DQN-tensorflow/dqn/ops.py", line 25, in conv2d
    out = tf.nn.bias_add(conv, b, data_format)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 391, in bias_add
    return gen_nn_ops._bias_add(value, bias, data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 279, in _bias_add
    data_format=data_format, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 703, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2317, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1239, in __init__
    self._traceback = _extract_stack()
@mthrok

mthrok commented Sep 27, 2016

Your checkpoint data was saved on a GPU with the convolution format 'NCHW' (see the data_format argument), but when you load it, the graph runs on the CPU, which only supports NHWC.

I think you can replace the tf.nn.bias_add operation at DQN-tensorflow/dqn/ops.py line 25 with a manual addition, reshaping the bias yourself.
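
A minimal sketch of the manual addition suggested above, assuming the bias is a 1-D tensor with one value per output channel. The helper name bias_broadcast_shape and the variable output_dim are hypothetical and not taken from the repository. The helper only computes the shape the bias would be reshaped to so that a plain `+` broadcasts over the 4-D convolution output. The convolution op itself may also reject NCHW on CPU, so this change alone may not be enough.

```python
def bias_broadcast_shape(num_channels, data_format):
    """Shape a 1-D bias must take so that `conv + bias` broadcasts correctly.

    NCHW keeps channels on axis 1; NHWC keeps them on the last axis.
    """
    if data_format == 'NCHW':
        return [1, num_channels, 1, 1]
    if data_format == 'NHWC':
        return [1, 1, 1, num_channels]
    raise ValueError('unknown data_format: %r' % (data_format,))


# Hypothetical use inside conv2d in ops.py (names are assumptions):
#   out = conv + tf.reshape(b, bias_broadcast_shape(output_dim, data_format))
print(bias_broadcast_shape(32, 'NCHW'))
print(bias_broadcast_shape(32, 'NHWC'))
```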

@jacktang

jacktang commented Oct 7, 2016

I tried the command line from the README and the same problem occurred on my MacBook Pro. The error message:

InvalidArgumentError (see above for traceback): CPU BiasOp only supports NHWC.
     [[Node: target/target_l1/BiasAdd = BiasAdd[T=DT_FLOAT, data_format="NCHW", _device="/job:localhost/replica:0/task:0/cpu:0"](target/target_l1/Conv2D, target/target_l1/biases/read)]]

I don't understand why the checkpoint data would be saved with the GPU and loaded with the CPU. Is that a bug? When I change the use_gpu option to False, everything works fine.
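
A rough sketch of why that workaround helps, assuming the data layout is derived from a use_gpu setting: NCHW is requested only when a GPU runs the graph, and NHWC is used otherwise, which matches the layout the CPU kernel in the log accepts. The function name pick_cnn_format is hypothetical and not part of the repository.

```python
def pick_cnn_format(use_gpu):
    """Choose the convolution data layout from the device in use.

    The error in the log shows the CPU BiasAdd kernel only accepts NHWC,
    so NCHW is only safe when the graph actually runs on a GPU.
    """
    return 'NCHW' if use_gpu else 'NHWC'


# On a machine without a usable GPU, such as the Macs in this thread:
print(pick_cnn_format(use_gpu=False))
```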

@shyamalschandra
Author

@mthrok : Could you send me the patch for the ops.py file? Thanks again!

@mthrok

mthrok commented Oct 10, 2016

@shyamalschandra I am not using this code so I don't have a patch. Sorry.

@zhijiew

zhijiew commented Dec 5, 2016

@shyamalschandra Have you solved the problem? I'm encountering the same problem...

@yenchenlin

Hi all, just change True in this [line] to False.
