# Tensor analysis using Amazon SageMaker Debugger

Looking at the distributions of activation inputs/outputs, gradients and weights per layer can give useful insights. For instance, it helps to understand whether the model runs into problems like neuron saturation, whether there are layers in your model that are not learning at all or whether the network consists of too many layers etc. 

The following animation shows the distribution of gradients of a convolutional layer from an example application  as the training progresses. We can see that it starts as Gaussian distribution but then becomes more and more narrow. We can also see that the range of gradients starts very small (order of $1e-5$) and becomes even tinier as training progresses. If tiny gradients are observed from the start of training, it is an indication that we should check the hyperparameters of our model. 

![](images/example.gif)

In this notebook we will train a poorly configured neural network and use Amazon SageMaker Debugger with custom rules to aggregate and analyse specific tensors. Before we proceed let us install the smdebug binary which allows us to perform interactive analysis in this notebook. After installing it, please restart the kernel, and when you come back skip this cell.

### Installing smdebug

In [1]:
!  python -m pip install smdebug

Collecting smdebug
  Downloading smdebug-0.7.2-py2.py3-none-any.whl (162 kB)
[K     |████████████████████████████████| 162 kB 2.8 MB/s eta 0:00:01
Installing collected packages: smdebug
Successfully installed smdebug-0.7.2


### Configuring the inputs for the training job

Now we'll call the Sagemaker MXNet Estimator to kick off a training job . The `entry_point_script` points to the MXNet training script. The users can create a custom *SessionHook* in their training script. If they chose not to create such hook in the training script (similar to the one we will be using in this example) Amazon SageMaker Debugger will create the appropriate *SessionHook* based on specified *DebugHookConfig* parameters.

The `hyperparameters` are the parameters that will be passed to the training script. We choose `Uniform(1)` as initializer and learning rate of `0.001`. This leads to the model not training well because the model is poorly initialized.

The goal of a good intialization is 
- to break the symmetry such that parameters do not receive same gradients and updates
- to keep variance similar across layers

A bad intialization may lead to vanishing or exploiding gradients and the model not training at all. Once the training is finished we will look at the distirbutions of activation inputs/outputs, gradients and weights across the training to see how these hyperparameters influenced the training.


In [5]:
entry_point_script = 'mnist.py'
bad_hyperparameters = {'initializer': 2, 'lr': 0.001}

In [6]:
import sagemaker
from sagemaker.mxnet import MXNet
from sagemaker.debugger import DebuggerHookConfig, CollectionConfig
import boto3
import os

sagemaker_session = sagemaker.Session()
BUCKET_NAME = sagemaker_session.default_bucket()
LOCATION_IN_BUCKET = 'smdebug-mnist-tensor-analysis'

s3_bucket_for_tensors = 's3://{BUCKET_NAME}/{LOCATION_IN_BUCKET}'.format(BUCKET_NAME=BUCKET_NAME, LOCATION_IN_BUCKET=LOCATION_IN_BUCKET)
estimator = MXNet(role=sagemaker.get_execution_role(),
                  base_job_name='mxnet',
                  train_instance_count=1,
                  train_instance_type='ml.m5.xlarge',
                  train_volume_size=400,
                  source_dir='src',
                  entry_point=entry_point_script,
                  hyperparameters=bad_hyperparameters,
                  framework_version='1.6.0',
                  py_version='py3',
                  debugger_hook_config = DebuggerHookConfig(
                      s3_output_path=s3_bucket_for_tensors,  
                      collection_configs=[
                        CollectionConfig(
                            name="all",
                            parameters={
                                "include_regex": ".*",
                                "save_interval": "100"
                            }
                        )
                     ]
                   )
                )

Start the training job

In [4]:
estimator.fit(wait=False)

### Get S3 location of tensors

We can get information related to the training job:

In [5]:
job_name = estimator.latest_training_job.name
client = estimator.sagemaker_session.sagemaker_client
description = client.describe_training_job(TrainingJobName=job_name)
description

{'TrainingJobName': 'mxnet-2020-04-27-21-37-10-765',
 'TrainingJobArn': 'arn:aws:sagemaker:us-east-2:441510144314:training-job/mxnet-2020-04-27-21-37-10-765',
 'ModelArtifacts': {'S3ModelArtifacts': 's3://sagemaker-us-east-2-441510144314/mxnet-2020-04-27-21-37-10-765/output/model.tar.gz'},
 'TrainingJobStatus': 'Completed',
 'SecondaryStatus': 'Completed',
 'HyperParameters': {'initializer': '2',
  'lr': '0.001',
  'sagemaker_container_log_level': '20',
  'sagemaker_enable_cloudwatch_metrics': 'false',
  'sagemaker_job_name': '"mxnet-2020-04-27-21-37-10-765"',
  'sagemaker_program': '"mnist.py"',
  'sagemaker_region': '"us-east-2"',
  'sagemaker_submit_directory': '"s3://sagemaker-us-east-2-441510144314/mxnet-2020-04-27-21-37-10-765/source/sourcedir.tar.gz"'},
 'AlgorithmSpecification': {'TrainingImage': '763104351884.dkr.ecr.us-east-2.amazonaws.com/mxnet-training:1.6.0-cpu-py3',
  'TrainingInputMode': 'File',
  'EnableSageMakerMetricsTimeSeries': True},
 'RoleArn': 'arn:aws:iam::44151

We can retrieve the S3 location of the tensors:

In [6]:
path = estimator.latest_job_debugger_artifacts_path()
print('Tensors are stored in: ', path)

Tensors are stored in:  s3://sagemaker-us-east-2-441510144314/smdebug-mnist-tensor-analysis/mxnet-2020-04-27-21-37-10-765/debug-output


### Download tensors from S3

Now we will download the tensors from S3, so that we can visualize them in our notebook.

In [7]:
folder_name = "/tmp/{}".format(path.split("/")[-1])
os.system("aws s3 cp --recursive {} {}".format(path,folder_name))
print('Downloading tensors into folder: ', folder_name)

Downloading tensors into folder:  /tmp/debug-output


Now that we have obtained the tensors from our training job, it is time to plot the distribution of different layers. 
In the following sections we will use Amazon SageMaker Debugger and custom rules to retrieve certain tensors. Typically, rules are supposed to return True or False. However in this notebook we will use custom rules to return dictionaries of aggregated tensors per layer and step, which we then plot afterwards.

### Activation outputs
This rule will use Amazon SageMaker Debugger to retrieve tensors from the ReLU output layers. It sums the activations across batch and steps. If there is a large fraction of ReLUs outputing 0 across many steps it means that the neuron is dying.

In [8]:
from smdebug.trials import create_trial
from smdebug.rules.rule_invoker import invoke_rule
from smdebug.exceptions import NoMoreData
from smdebug.rules.rule import Rule
import numpy as np
import utils
import collections
import os
from IPython.display import Image

In [9]:
class ActivationOutputs(Rule):
    def __init__(self, base_trial):
        super().__init__(base_trial)  
        self.tensors = collections.OrderedDict() 
    
    def invoke_at_step(self, step):
        for tname in self.base_trial.tensor_names(regex='.*relu_output'):
            if "gradients" not in tname:
                try:
                    tensor = self.base_trial.tensor(tname).value(step)
                    if tname not in self.tensors:
                        self.tensors[tname] = collections.OrderedDict()
                    if step not in self.tensors[tname]:
                        self.tensors[tname][step] = 0
                    neg_values = np.where(tensor <= 0)[0]
                    if len(neg_values) > 0:
                        self.logger.info(f" Step {step} tensor  {tname}  has {len(neg_values)/tensor.size*100}% activation outputs which are smaller than 0 ")
                    batch_over_sum = np.sum(tensor, axis=0)/tensor.shape[0]
                    self.tensors[tname][step] += batch_over_sum
                except:
                    self.logger.warning(f"Can not fetch tensor {tname}")
        return False

trial = create_trial(folder_name)
rule = ActivationOutputs(trial)
try:
    invoke_rule(rule)
except NoMoreData:
    print('The training has ended and there is no more data to be analyzed. This is expected behavior.')


[2020-04-27 22:20:52.397 f8455ab5c5ab:17 INFO local_trial.py:35] Loading trial debug-output at path /tmp/debug-output
[2020-04-27 22:20:52.414 f8455ab5c5ab:17 INFO rule_invoker.py:15] Started execution of rule ActivationOutputs at step 0
[2020-04-27 22:20:52.416 f8455ab5c5ab:17 INFO trial.py:198] Training has ended, will refresh one final time in 1 sec.
[2020-04-27 22:20:53.418 f8455ab5c5ab:17 INFO trial.py:210] Loaded all steps
[2020-04-27 22:20:53.432 f8455ab5c5ab:17 INFO <ipython-input-9-c7f9e2eb0647>:17]  Step 0 tensor  conv0_relu_output_0  has 48.82066514756944% activation outputs which are smaller than 0 
[2020-04-27 22:20:53.438 f8455ab5c5ab:17 INFO <ipython-input-9-c7f9e2eb0647>:17]  Step 0 tensor  conv1_relu_output_0  has 51.22558593749999% activation outputs which are smaller than 0 
[2020-04-27 22:20:53.440 f8455ab5c5ab:17 INFO <ipython-input-9-c7f9e2eb0647>:17]  Step 0 tensor  dense0_relu_output_0  has 53.001302083333336% activation outputs which are smaller than 0 
[2020-0

Plot the histograms

In [10]:
utils.create_interactive_matplotlib_histogram(rule.tensors, filename='images/activation_outputs.gif')

In [11]:
Image(url='images/activation_outputs.gif')

### Activation Inputs
In this rule we look at the inputs into activation function, rather than the output. This can be helpful to understand if there are extreme negative or positive values that saturate the activation functions. 

In [12]:
class ActivationInputs(Rule):
    def __init__(self, base_trial):
        super().__init__(base_trial)  
        self.tensors = collections.OrderedDict() 
        
    def invoke_at_step(self, step):
        for tname in self.base_trial.tensor_names(regex='.*relu_input'):
            if "gradients" not in tname:
                try:
                    tensor = self.base_trial.tensor(tname).value(step)
                    if tname not in self.tensors:
                        self.tensors[tname] = {}
                    if step not in self.tensors[tname]:
                        self.tensors[tname][step] = 0
                    neg_values = np.where(tensor <= 0)[0]
                    if len(neg_values) > 0:
                        self.logger.info(f" Tensor  {tname}  has {len(neg_values)/tensor.size*100}% activation inputs which are smaller than 0 ")
                    batch_over_sum = np.sum(tensor, axis=0)/tensor.shape[0]
                    self.tensors[tname][step] += batch_over_sum
                except:
                    self.logger.warning(f"Can not fetch tensor {tname}")
        return False

trial = create_trial(folder_name)
rule = ActivationInputs(trial)
try:
    invoke_rule(rule)
except NoMoreData:
    print('The training has ended and there is no more data to be analyzed. This is expected behavior.')


[2020-04-27 22:22:24.752 f8455ab5c5ab:17 INFO local_trial.py:35] Loading trial debug-output at path /tmp/debug-output
[2020-04-27 22:22:24.767 f8455ab5c5ab:17 INFO rule_invoker.py:15] Started execution of rule ActivationInputs at step 0
[2020-04-27 22:22:24.768 f8455ab5c5ab:17 INFO trial.py:198] Training has ended, will refresh one final time in 1 sec.
[2020-04-27 22:22:25.770 f8455ab5c5ab:17 INFO trial.py:210] Loaded all steps
[2020-04-27 22:22:25.778 f8455ab5c5ab:17 INFO <ipython-input-12-92e74c737aaa>:17]  Tensor  conv0_relu_input_0  has 48.82066514756944% activation inputs which are smaller than 0 
[2020-04-27 22:22:25.783 f8455ab5c5ab:17 INFO <ipython-input-12-92e74c737aaa>:17]  Tensor  conv1_relu_input_0  has 51.22558593749999% activation inputs which are smaller than 0 
[2020-04-27 22:22:25.785 f8455ab5c5ab:17 INFO <ipython-input-12-92e74c737aaa>:17]  Tensor  dense0_relu_input_0  has 53.001302083333336% activation inputs which are smaller than 0 
[2020-04-27 22:22:25.786 f8455ab

Plot the histograms

In [13]:
utils.create_interactive_matplotlib_histogram(rule.tensors, filename='images/activation_inputs.gif')

We can see that second convolutional layer `conv1_relu_input_0` receives only negative input values, which means that all ReLUs in this layer output 0.

In [14]:
Image(url='images/activation_inputs.gif')

### Gradients
The following code retrieves the gradients and plots their distribution. If variance is tiny, that means that the model parameters do not get updated effectively with each training step or that the training has converged to a minimum.

In [15]:
class GradientsLayer(Rule):
    def __init__(self, base_trial):
        super().__init__(base_trial)  
        self.tensors = collections.OrderedDict()  
        
    def invoke_at_step(self, step):
        for tname in self.base_trial.tensor_names(regex='.*gradient'):
            try:
                tensor = self.base_trial.tensor(tname).value(step)
                if tname not in self.tensors:
                    self.tensors[tname] = {}

                self.logger.info(f" Tensor  {tname}  has gradients range: {np.min(tensor)} {np.max(tensor)} ")
                self.tensors[tname][step] = tensor
            except:
                self.logger.warning(f"Can not fetch tensor {tname}")
        return False

trial = create_trial(folder_name)
rule = GradientsLayer(trial)
try:
    invoke_rule(rule)
except NoMoreData:
    print('The training has ended and there is no more data to be analyzed. This is expected behavior.')

[2020-04-27 22:25:51.768 f8455ab5c5ab:17 INFO local_trial.py:35] Loading trial debug-output at path /tmp/debug-output
[2020-04-27 22:25:51.781 f8455ab5c5ab:17 INFO rule_invoker.py:15] Started execution of rule GradientsLayer at step 0
[2020-04-27 22:25:51.782 f8455ab5c5ab:17 INFO trial.py:198] Training has ended, will refresh one final time in 1 sec.
[2020-04-27 22:25:52.784 f8455ab5c5ab:17 INFO trial.py:210] Loaded all steps
[2020-04-27 22:25:52.786 f8455ab5c5ab:17 INFO <ipython-input-15-1efdd7f3ed18>:13]  Tensor  gradient/conv0_bias  has gradients range: -5149.84033203125 31646.48828125 
[2020-04-27 22:25:52.787 f8455ab5c5ab:17 INFO <ipython-input-15-1efdd7f3ed18>:13]  Tensor  gradient/conv0_weight  has gradients range: -13980.2021484375 39929.21484375 
[2020-04-27 22:25:52.788 f8455ab5c5ab:17 INFO <ipython-input-15-1efdd7f3ed18>:13]  Tensor  gradient/conv1_bias  has gradients range: -1813.0926513671875 10602.732421875 
[2020-04-27 22:25:52.789 f8455ab5c5ab:17 INFO <ipython-input-15-

Plot the histograms

In [16]:
utils.create_interactive_matplotlib_histogram(rule.tensors, filename='images/gradients.gif')

In [17]:
Image(url='images/gradients.gif')

### Check variance across layers
The rule retrieves gradients, but this time we compare variance of gradient distribution across layers. We want to identify if there is a large difference between the min and max variance per training step. For instance, very deep neural networks may suffer from vanishing gradients the deeper we go. By checking this ratio we can determine if we run into such a situation.

In [18]:
class GradientsAcrossLayers(Rule):
    def __init__(self, base_trial, ):
        super().__init__(base_trial)  
        self.tensors = collections.OrderedDict()  
        
    def invoke_at_step(self, step):
        for tname in self.base_trial.tensor_names(regex='.*gradient'):
            try:
                tensor = self.base_trial.tensor(tname).value(step)
                if step not in self.tensors:
                    self.tensors[step] = [np.inf, 0]
                variance = np.var(tensor.flatten())
                if variance < self.tensors[step][0]:
                    self.tensors[step][0] = variance
                elif variance > self.tensors[step][1]:
                    self.tensors[step][1] = variance             
                self.logger.info(f" Step {step} current ratio: {self.tensors[step][0]} {self.tensors[step][1]} Ratio: {self.tensors[step][1] / self.tensors[step][0]}") 
            except:
                self.logger.warning(f"Can not fetch tensor {tname}")
        return False

trial = create_trial(folder_name)
rule = GradientsAcrossLayers(trial)
try:
    invoke_rule(rule)
except NoMoreData:
    print('The training has ended and there is no more data to be analyzed. This is expected behavior.')

[2020-04-27 22:26:25.950 f8455ab5c5ab:17 INFO local_trial.py:35] Loading trial debug-output at path /tmp/debug-output
[2020-04-27 22:26:25.966 f8455ab5c5ab:17 INFO rule_invoker.py:15] Started execution of rule GradientsAcrossLayers at step 0
[2020-04-27 22:26:25.967 f8455ab5c5ab:17 INFO trial.py:198] Training has ended, will refresh one final time in 1 sec.
[2020-04-27 22:26:26.969 f8455ab5c5ab:17 INFO trial.py:210] Loaded all steps
[2020-04-27 22:26:26.971 f8455ab5c5ab:17 INFO <ipython-input-18-0c18f883c3b5>:17]  Step 0 current ratio: 135150592.0 0 Ratio: 0.0
[2020-04-27 22:26:26.972 f8455ab5c5ab:17 INFO <ipython-input-18-0c18f883c3b5>:17]  Step 0 current ratio: 135150592.0 159276240.0 Ratio: 1.1785093545913696
[2020-04-27 22:26:26.973 f8455ab5c5ab:17 INFO <ipython-input-18-0c18f883c3b5>:17]  Step 0 current ratio: 8339952.0 159276240.0 Ratio: 19.097980499267578
[2020-04-27 22:26:26.974 f8455ab5c5ab:17 INFO <ipython-input-18-0c18f883c3b5>:17]  Step 0 current ratio: 8339952.0 159276240.

Let's check min and max values of the gradients across layers:

In [19]:
for step in rule.tensors:
    print("Step", step, "variance of gradients: ", rule.tensors[step][0], " to ",  rule.tensors[step][1])

Step 0 variance of gradients:  554.4756  to  159276240.0
Step 100 variance of gradients:  21.275051  to  27421.76
Step 200 variance of gradients:  0.023273567  to  159.85687
Step 300 variance of gradients:  0.017334905  to  673.0397
Step 400 variance of gradients:  0.0023620657  to  34.619526
Step 500 variance of gradients:  0.0116885565  to  234.84334
Step 600 variance of gradients:  1.8411363e-06  to  40.85332
Step 700 variance of gradients:  6.691928e-05  to  42.21149
Step 800 variance of gradients:  0.0025186585  to  133.70825
Step 900 variance of gradients:  0.0044774916  to  72.29355
Step 1000 variance of gradients:  5.7557765e-05  to  39.86183
Step 1100 variance of gradients:  0.001992659  to  77.35369
Step 1200 variance of gradients:  0.0003280628  to  29.943787
Step 1300 variance of gradients:  0.006202872  to  35.377148
Step 1400 variance of gradients:  0.0029379758  to  32.488556


### Distribution of weights
This rule retrieves the weight tensors and checks the variance. If the distribution does not change much across steps it may indicate that the learning rate is too low, that gradients are too small or that the training has converged to a minimum.

In [20]:
class WeightRatio(Rule):
    def __init__(self, base_trial, ):
        super().__init__(base_trial)  
        self.tensors = collections.OrderedDict()  
        
    def invoke_at_step(self, step):
        for tname in self.base_trial.tensor_names(regex='.*weight'):
            if "gradient" not in tname:
                try:
                    tensor = self.base_trial.tensor(tname).value(step)
                    if tname not in self.tensors:
                        self.tensors[tname] = {}
                 
                    self.logger.info(f" Tensor  {tname}  has weights with variance: {np.var(tensor.flatten())} ")
                    self.tensors[tname][step] = tensor
                except:
                    self.logger.warning(f"Can not fetch tensor {tname}")
        return False

trial = create_trial(folder_name)
rule = WeightRatio(trial)
try:
    invoke_rule(rule)
except NoMoreData:
    print('The training has ended and there is no more data to be analyzed. This is expected behavior.')


[2020-04-27 22:26:27.303 f8455ab5c5ab:17 INFO local_trial.py:35] Loading trial debug-output at path /tmp/debug-output
[2020-04-27 22:26:27.321 f8455ab5c5ab:17 INFO rule_invoker.py:15] Started execution of rule WeightRatio at step 0
[2020-04-27 22:26:27.323 f8455ab5c5ab:17 INFO trial.py:198] Training has ended, will refresh one final time in 1 sec.
[2020-04-27 22:26:28.326 f8455ab5c5ab:17 INFO trial.py:210] Loaded all steps
[2020-04-27 22:26:28.327 f8455ab5c5ab:17 INFO <ipython-input-20-f8ca0a68fd3b>:14]  Tensor  conv0_weight  has weights with variance: 0.2814779281616211 
[2020-04-27 22:26:28.331 f8455ab5c5ab:17 INFO <ipython-input-20-f8ca0a68fd3b>:14]  Tensor  conv1_weight  has weights with variance: 0.3394239842891693 
[2020-04-27 22:26:28.332 f8455ab5c5ab:17 INFO <ipython-input-20-f8ca0a68fd3b>:14]  Tensor  dense0_weight  has weights with variance: 0.3329920768737793 
[2020-04-27 22:26:28.334 f8455ab5c5ab:17 INFO <ipython-input-20-f8ca0a68fd3b>:14]  Tensor  dense1_weight  has weight

Plot the histograms

In [21]:
utils.create_interactive_matplotlib_histogram(rule.tensors, filename='images/weights.gif')

In [22]:
Image(url='images/weights.gif')

### Inputs

This rule retrieves layer inputs excluding activation inputs.

In [23]:
class Inputs(Rule):
    def __init__(self, base_trial, ):
        super().__init__(base_trial)  
        self.tensors = collections.OrderedDict()  
        
    def invoke_at_step(self, step):
        for tname in self.base_trial.tensor_names(regex='.*input'):
            if "relu" not in tname:
                try:
                    tensor = self.base_trial.tensor(tname).value(step)
                    if tname not in self.tensors:
                        self.tensors[tname] = {}
                 
                    self.logger.info(f" Tensor  {tname}  has inputs with variance: {np.var(tensor.flatten())} ")
                    self.tensors[tname][step] = tensor
                except:
                    self.logger.warning(f"Can not fetch tensor {tname}")
        return False

trial = create_trial(folder_name)
rule = Inputs(trial)
try:
    invoke_rule(rule)
except NoMoreData:
    print('The training has ended and there is no more data to be analyzed. This is expected behavior.')


[2020-04-27 22:32:04.465 f8455ab5c5ab:17 INFO local_trial.py:35] Loading trial debug-output at path /tmp/debug-output
[2020-04-27 22:32:04.478 f8455ab5c5ab:17 INFO rule_invoker.py:15] Started execution of rule Inputs at step 0
[2020-04-27 22:32:04.479 f8455ab5c5ab:17 INFO trial.py:198] Training has ended, will refresh one final time in 1 sec.
[2020-04-27 22:32:05.481 f8455ab5c5ab:17 INFO trial.py:210] Loaded all steps
[2020-04-27 22:32:05.484 f8455ab5c5ab:17 INFO <ipython-input-23-c408f5ccc2ae>:14]  Tensor  conv0_input_0  has inputs with variance: 1.0022085905075073 
[2020-04-27 22:32:05.486 f8455ab5c5ab:17 INFO <ipython-input-23-c408f5ccc2ae>:14]  Tensor  conv1_input_0  has inputs with variance: 4.837328910827637 
[2020-04-27 22:32:05.487 f8455ab5c5ab:17 INFO <ipython-input-23-c408f5ccc2ae>:14]  Tensor  dense0_input_0  has inputs with variance: 61.04131317138672 
[2020-04-27 22:32:05.489 f8455ab5c5ab:17 INFO <ipython-input-23-c408f5ccc2ae>:14]  Tensor  dense1_input_0  has inputs with 

Plot the histograms

In [24]:
utils.create_interactive_matplotlib_histogram(rule.tensors, filename='images/layer_inputs.gif')

In [25]:
Image(url='images/layer_inputs.gif')

### Layer outputs
This rule retrieves outputs of layers excluding activation outputs.

In [26]:
class Outputs(Rule):
    def __init__(self, base_trial, ):
        super().__init__(base_trial)  
        self.tensors = collections.OrderedDict() 
        
    def invoke_at_step(self, step):
        for tname in self.base_trial.tensor_names(regex='.*output'):
            if "relu" not in tname:
                try:
                    tensor = self.base_trial.tensor(tname).value(step)
                    if tname not in self.tensors:
                        self.tensors[tname] = {}
                 
                    self.logger.info(f" Tensor  {tname}  has inputs with variance: {np.var(tensor.flatten())} ")
                    self.tensors[tname][step] = tensor
                except:
                    self.logger.warning(f"Can not fetch tensor {tname}")
        return False

trial = create_trial(folder_name)
rule = Outputs(trial)
try:
    invoke_rule(rule)
except NoMoreData:
    print('The training has ended and there is no more data to be analyzed. This is expected behavior.')


[2020-04-27 22:32:42.983 f8455ab5c5ab:17 INFO local_trial.py:35] Loading trial debug-output at path /tmp/debug-output
[2020-04-27 22:32:42.998 f8455ab5c5ab:17 INFO rule_invoker.py:15] Started execution of rule Outputs at step 0
[2020-04-27 22:32:43.002 f8455ab5c5ab:17 INFO trial.py:198] Training has ended, will refresh one final time in 1 sec.
[2020-04-27 22:32:44.004 f8455ab5c5ab:17 INFO trial.py:210] Loaded all steps
[2020-04-27 22:32:44.009 f8455ab5c5ab:17 INFO <ipython-input-26-54f31e2b0743>:14]  Tensor  conv0_output_0  has inputs with variance: 3.8557262420654297 
[2020-04-27 22:32:44.012 f8455ab5c5ab:17 INFO <ipython-input-26-54f31e2b0743>:14]  Tensor  conv1_output_0  has inputs with variance: 45.9383430480957 
[2020-04-27 22:32:44.013 f8455ab5c5ab:17 INFO <ipython-input-26-54f31e2b0743>:14]  Tensor  dense0_output_0  has inputs with variance: 4858.85498046875 
[2020-04-27 22:32:44.014 f8455ab5c5ab:17 INFO <ipython-input-26-54f31e2b0743>:14]  Tensor  dense1_output_0  has inputs wi

Plot the histograms

In [27]:
utils.create_interactive_matplotlib_histogram(rule.tensors, filename='images/layer_outputs.gif')

In [28]:
Image(url='images/layer_outputs.gif')

### Comparison 
In the previous section we have looked at the distribution of gradients, activation outputs and weights of a model that has not trained well due to poor initialization. Now we will compare some of these distributions with a model that has been well intialized.

In [7]:
entry_point_script = 'mnist.py'
hyperparameters = {'lr': 0.01}

In [8]:
estimator = MXNet(role=sagemaker.get_execution_role(),
                  base_job_name='mxnet',
                  train_instance_count=1,
                  train_instance_type='ml.m5.xlarge',
                  train_volume_size=400,
                  source_dir='src',
                  entry_point=entry_point_script,
                  hyperparameters=hyperparameters,
                  framework_version='1.6.0',
                  py_version='py3',
                  debugger_hook_config = DebuggerHookConfig(
                      s3_output_path=s3_bucket_for_tensors,  
                      collection_configs=[
                        CollectionConfig(
                            name="all",
                            parameters={
                                "include_regex": ".*",
                                "save_interval": "100"
                            }
                        )
                     ]
                   )
                )
                  

Start the training job

In [9]:
estimator.fit(wait=False)

Get S3 path where tensors have been stored

In [10]:
job_name = estimator.latest_training_job.name
client = estimator.sagemaker_session.sagemaker_client
description = client.describe_training_job(TrainingJobName=job_name)
path = description['DebugHookConfig']['S3OutputPath'] + '/' + job_name + '/debug-output'
print('Tensors are stored in: ', path)

Tensors are stored in:  s3://sagemaker-us-east-2-441510144314/smdebug-mnist-tensor-analysis/mxnet-2020-04-27-23-22-21-264/debug-output


Download tensors from S3

In [12]:
folder_name2 = "/tmp/{}_2".format(path.split("/")[-1])
os.system("aws s3 cp --recursive {} {}".format(path,folder_name2))
print('Downloading tensors into folder: ', folder_name2)

Downloading tensors into folder:  /tmp/debug-output_2


#### Gradients

Lets compare distribution of gradients of the convolutional layers of both trials.

In [13]:
trial = create_trial(folder_name)
rule = GradientsLayer(trial)
try:
    invoke_rule(rule)
except NoMoreData:
    print('The training has ended and there is no more data to be analyzed. This is expected behavior.')


NameError: name 'create_trial' is not defined

In [1]:
dict_gradients = {}
dict_gradients['gradient/conv0_weight_bad_hyperparameters'] = rule.tensors['gradient/conv0_weight']
dict_gradients['gradient/conv1_weight_bad_hyperparameters'] = rule.tensors['gradient/conv1_weight']

NameError: name 'rule' is not defined

Second trial:

In [None]:
trial = create_trial(folder_name2)
rule = GradientsLayer(trial)
try:
    invoke_rule(rule)
except NoMoreData:
    print('The training has ended and there is no more data to be analyzed. This is expected behavior.')


[2020-04-27 23:02:30.733 f8455ab5c5ab:17 INFO local_trial.py:35] Loading trial debug-output_2 at path /tmp/debug-output_2


In [None]:
dict_gradients['gradient/conv0_weight_good_hyperparameters'] = rule.tensors['gradient/conv0_weight']
dict_gradients['gradient/conv1_weight_good_hyperparameters'] = rule.tensors['gradient/conv1_weight']

Plot the histograms

In [None]:
utils.create_interactive_matplotlib_histogram(dict_gradients, filename='images/gradients_comparison.gif')

In the case of the poorly initalized model, gradients are fluctuating a lot leading to very high variance. 

In [None]:
Image(url='images/gradients_comparison.gif')

#### Activation inputs

Lets compare distribution of activation inputs of both trials.

In [None]:
trial = create_trial(folder_name)
rule = ActivationInputs(trial)
try:
    invoke_rule(rule)
except NoMoreData:
    print('The training has ended and there is no more data to be analyzed. This is expected behavior.')


In [None]:
dict_activation_inputs = {}
dict_activation_inputs['conv0_relu_input_0_bad_hyperparameters'] = rule.tensors['conv0_relu_input_0']
dict_activation_inputs['conv1_relu_input_0_bad_hyperparameters'] = rule.tensors['conv1_relu_input_0']

Second trial

In [None]:
trial = create_trial(folder_name2)
rule = ActivationInputs(trial)
try:
    invoke_rule(rule)
except NoMoreData:
    print('The training has ended and there is no more data to be analyzed. This is expected behavior.')


In [None]:
dict_activation_inputs['conv0_relu_input_0_good_hyperparameters'] = rule.tensors['conv0_relu_input_0']
dict_activation_inputs['conv1_relu_input_0_good_hyperparameters'] = rule.tensors['conv1_relu_input_0']

Plot the histograms

In [None]:
utils.create_interactive_matplotlib_histogram(dict_activation_inputs, filename='images/activation_inputs_comparison.gif')

The distribution of activation inputs into first activation layer `conv0_relu_input_0` look quite similar in both trials. However in the case of the second layer they drastically differ. 

In [None]:
Image(url='images/activation_inputs_comparison.gif')