Keras_model_fn global steps dont increase #444

gautiese · 2018-10-26T07:27:00Z


System Information

Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): TensorFlow / Keras
Framework Version: 1.11
Python Version: 2
CPU or GPU: GPU
Python SDK Version:
Are you using a custom image: No

Describe the problem

I am getting warnings that the global step is not increasing. Also, the training seems to use a warm start (I don't know why, or whether that is a problem at all).

Minimal repro / logs

2018-10-26 07:24:04,680 INFO - tensorflow - loss = 0.5054888, step = 0
2018-10-26 07:24:12,735 WARNING - tensorflow - It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
2018-10-26 07:24:18,908 WARNING - tensorflow - It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
2018-10-26 07:24:25,089 WARNING - tensorflow - It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
2018-10-26 07:24:31,263 WARNING - tensorflow - It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
2018-10-26 07:24:39,229 WARNING - tensorflow - It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.
2018-10-26 07:24:39,230 INFO - tensorflow - loss = 0.5811208, step = 0 (34.550 sec)
2018-10-26 07:24:45,407 WARNING - tensorflow - It seems that global step (tf.train.get_global_step) has not been increased. Current value (could be stable): 0 vs previous value: 0. You could increase the global step by passing tf.train.get_global_step() to Optimizer.apply_gradients or Optimizer.minimize.

This is happening with any keras_model_fn, even the one in the examples.


mvsusp · 2018-10-26T17:35:29Z

Hi @gautiese

It seems that Keras optimizers no longer work in TF 1.11 the way they used to. The issue disappears if you replace the Keras optimizer with a TF optimizer. We changed the example to address this: https://github.com/awslabs/amazon-sagemaker-examples/pull/442/files?utf8=%E2%9C%93&diff=split&w=1#diff-11cd3635a41f977f48d21c3a65d3e774L71
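As a rough sketch of that swap (not the actual diff from the linked pull request), a `keras_model_fn` that compiles the model with a TF optimizer instead of a Keras one could look like the following. The layer sizes, the input shape, the `learning_rate` hyperparameter key, and the choice of `tf.train.RMSPropOptimizer` are assumptions made for illustration, and the code targets the TF 1.x API discussed in this thread.

```python
def keras_model_fn(hyperparameters):
    """Build and compile a Keras model, using a TF-native optimizer (TF 1.x API)."""
    # Imported here so this file can be loaded without TensorFlow installed;
    # a top-level import works just as well in a training script.
    import tensorflow as tf

    # Hypothetical toy architecture: 4 input features, 3 output classes.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
        tf.keras.layers.Dense(3, activation="softmax"),
    ])

    # TF optimizer from tf.train, used in place of a Keras optimizer
    # such as tf.keras.optimizers.RMSprop.
    learning_rate = hyperparameters.get("learning_rate", 0.001)
    optimizer = tf.train.RMSPropOptimizer(learning_rate=learning_rate)

    model.compile(
        optimizer=optimizer,
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

In this sketch the only line that differs from a Keras-optimizer version is the one that constructs `optimizer`; the rest of the training script stays as it was.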

I will close this ticket now. Feel free to open it again if you have additional questions.

Thanks for using SageMaker!


mvsusp closed this as completed Oct 26, 2018

ChoiByungWook pushed a commit that referenced this issue Dec 8, 2020

feature: support creating and updating profiler in training job (#444) (#526) (#530), commit f8c5287
