Roboschool_ray examples not creating output folder #627


### System Information
ml.c5.2xlarge

### Describe the problem
I was trying to run the rl_roboschool_ray example notebook.
However, the s3://<your_s3_bucket>/<training_job_name>/output folder was not created for the hopper and humanoid cases, so neither the intermediate training videos nor the final model.tar.gz were saved.
When I tried the reacher example, the output folder was created as expected.
For the hopper and humanoid cases, the output folder was still never created after several attempts.
The hopper and humanoid training jobs also never finish on their own; they keep running until train_max_run is reached.

I searched the existing issues and found a similar one, "Ray RLLib examples not saving model output #581".
However, I don't fully understand the response given there: "According to user script in the example checkpoints should be saved to ['opt/ml/output/intermediate' folder](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/reinforcement_learning/rl_roboschool_ray/common/sagemaker_rl/ray_launcher.py#L95) and moved to `s3://<your_s3_bucket>/<training_job_name>/output/intermediate` location during training.
You can modify [the user script](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/reinforcement_learning/rl_roboschool_ray/common/sagemaker_rl/ray_launcher.py#L95) to save checkpoints to `/opt/ml/model` directory at the end of the training instead."
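
For reference, here is a minimal sketch of how I read that suggestion: at the end of training, copy the most recent checkpoint from the intermediate folder into the model directory. The function name `copy_latest_checkpoint` is hypothetical and is not part of `ray_launcher.py`. The `checkpoint` file-name prefix is also an assumption about how Ray names its checkpoint files and may differ between Ray versions.

```python
import os
import shutil


def copy_latest_checkpoint(intermediate_dir, model_dir, prefix="checkpoint"):
    """Copy the newest checkpoint file (plus files sharing its stem) to model_dir.

    Hypothetical helper: the "checkpoint" prefix is an assumed naming scheme.
    Returns the list of copied destination paths, or [] if nothing was found.
    """
    # Collect every file under intermediate_dir whose name starts with the prefix.
    candidates = []
    for root, _dirs, files in os.walk(intermediate_dir):
        for name in files:
            if name.startswith(prefix):
                path = os.path.join(root, name)
                candidates.append((os.path.getmtime(path), path))
    if not candidates:
        return []

    # Pick the most recently modified checkpoint file.
    _, newest = max(candidates)
    newest_dir = os.path.dirname(newest)
    stem = os.path.basename(newest).split(".")[0]  # e.g. "checkpoint-4"

    # Copy that file and any sidecar files such as "<stem>.tune_metadata".
    os.makedirs(model_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(newest_dir)):
        if name == stem or name.startswith(stem + "."):
            src = os.path.join(newest_dir, name)
            if os.path.isfile(src):
                copied.append(shutil.copy2(src, model_dir))
    return copied
```

In the training script this might be called once after training finishes, for example `copy_latest_checkpoint("/opt/ml/output/intermediate", "/opt/ml/model")`, using the two paths mentioned in the quoted response.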

The CloudWatch log looks fine, but there is no "output" folder in S3.
It seems that the sync between /opt/ml/output and s3://<your_s3_bucket>/<training_job_name>/output does not always work.
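
One way to confirm whether anything was synced is to list objects under the job's output prefix. The sketch below is only illustrative: `s3_prefix_has_objects` is a hypothetical helper, the client is passed in so it can be a `boto3` S3 client, and the bucket and prefix values are the placeholders from this report.

```python
def s3_prefix_has_objects(s3_client, bucket, prefix):
    """Return True if at least one object exists under s3://bucket/prefix."""
    # MaxKeys=1 keeps the request cheap; we only need to know if anything exists.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
    return response.get("KeyCount", 0) > 0


# Example usage (requires boto3 and AWS credentials; values are placeholders):
# import boto3
# s3 = boto3.client("s3")
# print(s3_prefix_has_objects(s3, "<your_s3_bucket>", "<training_job_name>/output/"))
```

Running this check periodically while the job is in progress would show whether the intermediate folder ever appears, or only fails to appear for the hopper and humanoid jobs.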



21:11:12 == Status ==
21:11:12 Using FIFO scheduling algorithm.
21:11:12 Resources requested: 8/8 CPUs, 0/0 GPUs
21:11:12 Result logdir: /opt/ml/output/intermediate/training
21:11:12 RUNNING trials: - PPO_RoboschoolHumanoid-v1_0: RUNNING [pid=124], 694 s, 4 iter, 1280398 ts, -83.2 rew
21:13:51 == sgd epochs ==
21:13:52 0 {'cur_lr': 9.999999747378752e-05, 'total_loss': -0.00026426092, 'policy_loss': -0.00033865558, 'vf_loss': 0.0, 'vf_explained_var': -1.0, 'kl': 0.00014878887, 'entropy': 23.64479}
21:13:53 1 {'cur_lr': 9.999999747378752e-05, 'total_loss': -0.0022162762, 'policy_loss': -0.0030561804, 'vf_loss': 0.0, 'vf_explained_var': -1.0, 'kl': 0.0016798142, 'entropy': 23.630758}
21:13:54 2 {'cur_lr': 9.999999747378752e-05, 'total_loss': -0.0035353287, 'policy_loss': -0.0055731665, 'vf_loss': 0.0, 'vf_explained_var': -1.0, 'kl': 0.004075686, 'entropy': 23.615211}
21:13:55 3 {'cur_lr': 9.999999747378752e-05, 'total_loss': -0.004284208, 'policy_loss': -0.0067875045, 'vf_loss': 0.0, 'vf_explained_var': -1.0, 'kl': 0.005006588, 'entropy': 23.599758}
21:13:56 4 {'cur_lr': 9.999999747378752e-05, 'total_loss': -0.00491518, 'policy_loss': -0.0074260524, 'vf_loss': 0.0, 'vf_explained_var': -1.0, 'kl': 0.0050217416, 'entropy': 23.58474}
........
Result for PPO_RoboschoolHumanoid-v1_0: date: 2019-02-05_21-14-01 done: false episode_len_mean: 19.204132823698043 episode_reward_max: -39.52636344946514 episode_reward_mean

