Describe the bug
Exception during rule evaluation: Customer Error: No debugging data was saved by the training job. Check that the debugger hook was configured correctly before starting the training job. Exception: Training job has ended. All the collection files could not be loaded
To reproduce
Train a framework-mode XGBoost estimator with the Debugger hook configured as below:
import time

import boto3
import sagemaker
from sagemaker.xgboost import XGBoost
from sagemaker.debugger import rule_configs, Rule, DebuggerHookConfig, CollectionConfig
from smexperiments.trial import Trial

hyperparams = {
    "max_depth": 5,
    "subsample": 0.8,
    "num_round": 600,
    "eta": 0.2,
    "gamma": 4,
    "min_child_weight": 6,
    "silent": 0,
    "objective": "multi:softmax",
    "num_class": len(le.classes_),
    "smdebug_path": f"s3://{bucket}/{prefix}/debug",
    "smdebug_collections": "metrics,feature_importance",
}

save_interval = 5
entry_point_script = "xgboost_dest_prediction.py"

trial = Trial.create(
    trial_name="framework-mode-trial-{}".format(time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())),
    experiment_name=destination_prediction_experiment.experiment_name,
    sagemaker_boto_client=boto3.client("sagemaker"),
)

framework_xgb = XGBoost(
    entry_point=entry_point_script,
    role=sagemaker.get_execution_role(),
    framework_version="0.90-2",
    py_version="py3",
    hyperparameters=hyperparams,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://{}/{}/output".format(bucket, prefix),
    base_job_name="demo-xgboost-destination-prediction",
    sagemaker_session=sm_sess,
    # rules=debug_rules,
    use_spot_instances=True,
    max_run=3600,
    max_wait=3600,
    input_mode="File",
    debugger_hook_config=DebuggerHookConfig(
        s3_output_path=f"s3://{bucket}/{prefix}/debug",  # Required
        collection_configs=[
            CollectionConfig(
                name="metrics",
                parameters={"save_interval": str(save_interval)},
            )
        ],
    ),
    rules=[
        Rule.sagemaker(
            rule_configs.loss_not_decreasing(),
            rule_parameters={
                "collection_names": "metrics",
                "num_steps": str(save_interval * 2),
            },
        ),
    ],
)

framework_xgb.fit(
    {"train": s3_input_train, "validation": s3_input_validation},
    experiment_config={
        "ExperimentName": destination_prediction_experiment.experiment_name,
        "TrialName": trial.trial_name,
        "TrialComponentDisplayName": "Training",
    },
)
Expected behavior
Tensors should be saved to the configured S3 debug path, and the LossNotDecreasing rule should be able to load them.
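For reference, this is roughly how I check whether anything was written under the debug prefix. The helper names are my own (illustrative only), and `bucket`/`prefix` are the same placeholders as in the repro snippet above:

```python
# Illustrative check: list whatever the training job wrote under the
# Debugger S3 output path. `bucket` and `prefix` are placeholders, as above.

def debug_key_prefix(prefix: str) -> str:
    # Mirrors the s3_output_path passed to DebuggerHookConfig above.
    return f"{prefix}/debug"

def list_debug_keys(bucket: str, prefix: str) -> list:
    import boto3  # imported lazily so debug_key_prefix stays dependency-free
    s3 = boto3.client("s3")
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=debug_key_prefix(prefix))
    return [obj["Key"] for obj in resp.get("Contents", [])]
```

After the job above finishes, `list_debug_keys(bucket, prefix)` returns an empty list, which matches the rule error.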
Screenshots or logs
[{'RuleConfigurationName': 'LossNotDecreasing',
'RuleEvaluationJobArn': 'arn:aws:sagemaker:us-west-2:990360540682:processing-job/demo-xgboost-destination-p-lossnotdecreasing-abb2296f',
'RuleEvaluationStatus': 'Error',
'StatusDetails': 'ClientError: No debugging data was saved by the training job. Check that the debugger hook was configured correctly before starting the training job. Exception: Training job has ended. All the collection files could not be loaded\nTraceback (most recent call last):\n File "evaluate.py", line 112, in _create_trials\n range_steps=(self.start_step, self.end_step))\n File "/usr/local/lib/python3.7/site-packages/smdebug/trials/utils.py", line 20, in create_trial\n return LocalTrial(name=name, dirname=path, **kwargs)\n File "/usr/local/lib/python3.7/site-packages/smdebug/trials/local_trial.py", line 36, in __init__\n self._load_collections()\n File "/usr/local/lib/python3.7/site-packages/smdebug/trials/trial.py", line 168, in _load_collections\n _wait_for_collection_files(1) # wait for the first collection file\n File "/usr/local/lib/python3.7/site-packages/smdebug/trials/trial.py", line 165, in _wait_for_collection_files\n raise MissingCollectionFiles\nsmdebug.exceptions.MissingCollectionFiles: Trainin',
'LastModifiedTime': datetime.datetime(2020, 9, 18, 11, 6, 27, 290000, tzinfo=tzlocal())}]
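In case it's relevant: I wasn't sure whether, in framework (script) mode, the entry point itself still has to create the smdebug hook. This is a hedged sketch of what I understand that registration to look like, based on my reading of the smdebug docs; the try/except fallback is my own addition for local runs, and `dtrain` is assumed to exist in the script:

```python
# Sketch of hook registration inside the entry point (xgboost_dest_prediction.py).
# Not verified against this particular container version.
try:
    from smdebug.xgboost import Hook
    # SageMaker writes a JSON hook config into the container when
    # debugger_hook_config is set on the estimator; build the hook from it.
    hook = Hook.create_from_json_file()
    callbacks = [hook]
except Exception:  # smdebug missing or no hook config (e.g. a local run)
    callbacks = []

# The hook is then passed as an XGBoost training callback so tensors get emitted:
# bst = xgboost.train(hyperparams, dtrain, callbacks=callbacks)
```

If the hook does have to be created in the script, it would be good to have that called out in the framework-mode Debugger docs.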
System information
SageMaker Python SDK version: 2.6
Framework name (eg. PyTorch) or algorithm (eg. KMeans): XGBoost (framework/script mode)
Framework version: 0.90-2
Python version: 3.8
CPU or GPU: CPU
Custom Docker image (Y/N): N