Nothing in Tensorboard after Eval steps #122
Comments
Hi, the evaluation steps may have finished before you were able to see any updates in TensorBoard. Have you tried running with more evaluation steps? Do you see updates during training?
Hi, we pushed a new image that uses a different parameter to control the frequency of evaluations. You can specify it as follows: hyperparameters={'throttle_secs': 30}, where throttle_secs is the minimum number of seconds that must elapse between evaluations. By default this value is 600, so TensorBoard updates at most once every 10 minutes. We'll update our example notebooks to document this.
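For readers who want to see where the hyperparameter goes, here is a minimal sketch, assuming the 1.x Python SDK's TensorFlow estimator. The helper names build_hyperparameters and launch_training, the step counts, and the instance type are illustrative assumptions, not values from this thread; only throttle_secs and its default of 600 come from the comment above.

```python
DEFAULT_THROTTLE_SECS = 600  # default stated in this thread (10 minutes)


def build_hyperparameters(throttle_secs=DEFAULT_THROTTLE_SECS, **extra):
    """Return a hyperparameters dict with throttle_secs set.

    Hypothetical helper: it only assembles the dict passed to the estimator.
    """
    if throttle_secs <= 0:
        raise ValueError("throttle_secs must be a positive number of seconds")
    params = dict(extra)
    params["throttle_secs"] = int(throttle_secs)
    return params


def launch_training(entry_point, role, inputs, throttle_secs=30):
    """Sketch of starting a job with a shorter evaluation throttle.

    Assumes the 1.x SDK estimator arguments; the step counts and instance
    type below are placeholders, not recommendations.
    """
    # Imported lazily so the helper above works without the SDK installed.
    from sagemaker.tensorflow import TensorFlow

    estimator = TensorFlow(
        entry_point=entry_point,
        role=role,
        training_steps=1000,
        evaluation_steps=100,
        train_instance_count=1,
        train_instance_type="ml.p2.xlarge",
        hyperparameters=build_hyperparameters(throttle_secs),
    )
    # run_tensorboard_locally starts a local TensorBoard alongside the job.
    estimator.fit(inputs, run_tensorboard_locally=True)
    return estimator
```

Calling build_hyperparameters(30) yields the same dict shown in the comment above; launch_training needs AWS credentials and a real training script, so treat it as an outline rather than something to paste in unchanged.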
Thanks for the response @winstonaws! Appreciate you guys being so active in helping out the new SageMaker community!
@winstonaws: I have tried setting the value of throttle_secs through hyperparameters as described above, but the change is not reflected when the job runs.
TensorBoard is still not working for me with the changes suggested above. I thought they might have fixed it, but it doesn't look like it.
@samuelhkahn: were you able to find a workaround for this? @winstonaws: please suggest what should be done to update the evaluation throttle duration.
@samuelhkahn @chang2394 What version of the Python SDK are you using? Unfortunately, fixes don't go out automatically to the notebook instances at the moment. The fix you need is in this change: #105. Can you try updating to the latest version and rerunning? You can do this by running the following in your notebook: !pip install --upgrade sagemaker. If it's still not working correctly, what behavior are you seeing? Does it differ from what you see when trying out https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_resnet_cifar10_with_tensorboard/tensorflow_resnet_cifar10_with_tensorboard.ipynb ? When I run that example, I can see the TensorBoard UI as soon as I call fit, and every time the training job evaluates (which happens at the throttle_secs frequency), I see the scalars update.
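To confirm which SDK version a notebook kernel is actually using after the upgrade, a small check like the following may help. This is a sketch using only the standard library (Python 3.8+); the helper names are hypothetical, and the minimum version is left as a parameter because the thread does not state which release contains change #105.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def parse_version(version):
    """Convert '1.2.4' into (1, 2, 4), skipping non-numeric parts.

    Simplification: pre-release suffixes such as 'rc1' are ignored.
    """
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def is_at_least(installed, minimum):
    """True if the installed version is greater than or equal to minimum."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    current = installed_version("sagemaker")
    if current is None:
        print("sagemaker is not installed in this environment")
    else:
        print("sagemaker version:", current)
```

If the printed version does not change after running the pip upgrade, the kernel may need a restart before the new package is picked up.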
@winstonaws: I am now able to see data in the TensorBoard UI; I'm not sure what caused the issue. Right now I am using sagemaker 1.2.4 and tensorflow 1.6.
@chang2394 Great! Are you still having any other problems with TensorBoard?
@winstonaws No, it seems to be working fine as of now. Thanks a lot :)
Fixed the roles from all the notebooks
I recently upgraded to the most recent release of the Python SDK through a pip upgrade. I followed a couple of other GitHub threads, which said that this was solved (temporarily) in the most recent version of sagemaker-python-sdk. But I am not seeing anything update in my local TensorBoard instance. Am I missing something?
This is the model and helper functions I am deploying with SageMaker:
Here is my train script to actually deploy this model:
And here is a sample of the logs:
Any tips or clues would be greatly appreciated.