In [2]:
import mxnet as mx
from mxnet import nd, viz  # mxnet.viz contains functions that help visualize neural networks.
from mxnet.gluon import nn, model_zoo  # model_zoo contains many pre-trained models and architectures that can be used out of the box.

We load the VGG11 network from the model zoo. VGG stands for Visual Geometry Group, the group at the University of Oxford that originally proposed this architecture. It is a fairly straightforward network that alternates convolutional layers and max pooling layers and concludes with several fully connected layers.

In [3]:
# From model zoo
vgg11 = model_zoo.vision.vgg11(pretrained=True)

Downloading C:\Users\Mohamed.Sharaf\AppData\Roaming\mxnet\models\vgg11-dd221b16.zip3edab37b-6154-43e0-8a56-8462de92110f from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/vgg11-dd221b16.zip...


The first way to visualize or inspect the network is to simply print the gluon block using the print function. This lists all the child blocks contained in the parent block.

In [4]:
print(vgg11)

VGG(
  (features): HybridSequential(
    (0): Conv2D(3 -> 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): Activation(relu)
    (2): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
    (3): Conv2D(64 -> 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): Activation(relu)
    (5): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
    (6): Conv2D(128 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): Activation(relu)
    (8): Conv2D(256 -> 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): Activation(relu)
    (10): MaxPool2D(size=(2, 2), stride=(2, 2), padding=(0, 0), ceil_mode=False, global_pool=False, pool_type=max, layout=NCHW)
    (11): Conv2D(256 -> 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (12): Activation(relu)
    (13): Conv2D(512 -> 512, kernel_size=(3, 3), stride=(1

We can see two main blocks here: the features block and the output block. This view is based purely on static inspection of the object's attributes, so it does not contain any flow information. We only know that the features block is a hybrid sequential block and that the output block is a fully connected layer with 1000 units.

The forward function of the block defines how the data flows through it.

By calling the `.summary` method on the block with dummy, representative input data, the network can compute the output shape of each intermediate layer and the number of parameters in each layer and in the entire network.

In [5]:
vgg11.summary(nd.ones((1,3,224,224)))

--------------------------------------------------------------------------------
        Layer (type)                                Output Shape         Param #
               Input                            (1, 3, 224, 224)               0
            Conv2D-1                           (1, 64, 224, 224)            1792
        Activation-2                           (1, 64, 224, 224)               0
         MaxPool2D-3                           (1, 64, 112, 112)               0
            Conv2D-4                          (1, 128, 112, 112)           73856
        Activation-5                          (1, 128, 112, 112)               0
         MaxPool2D-6                            (1, 128, 56, 56)               0
            Conv2D-7                            (1, 256, 56, 56)          295168
        Activation-8                            (1, 256, 56, 56)               0
            Conv2D-9                            (1, 256, 56, 56)          590080
       Activation-10        

We can see that the first convolution increases the number of channels from 3 to 64. We can also see the effect of max pooling, which reduces the spatial resolution from 224 pixels to 112 pixels.

This type of summary is very useful for checking that you get the right shapes in the right places and that the number of parameters in your model is what you expect.
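The numbers in the summary can be checked by hand. The following is a minimal sketch in plain Python (no mxnet required); the helper names `conv2d_params` and `output_size` are made up for this illustration and are not part of the gluon API. It assumes square kernels and the default floor rounding (`ceil_mode=False`) shown in the printed network.

```python
def conv2d_params(in_channels, out_channels, kernel_size, groups=1, bias=True):
    """Learnable parameters of a 2D convolution with a square kernel."""
    weights = (in_channels // groups) * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


def output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# Conv2D-1: 3 -> 64 channels, 3x3 kernel, stride 1, padding 1
print(conv2d_params(3, 64, 3))                   # 1792
print(output_size(224, 3, stride=1, padding=1))  # 224

# MaxPool2D-3: 2x2 window, stride 2, no padding
print(output_size(224, 2, stride=2))             # 112

# Conv2D-4: 64 -> 128 channels, 3x3 kernel
print(conv2d_params(64, 128, 3))                 # 73856
```

The printed values match the `Param #` and `Output Shape` columns for the first layers of vgg11 above.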

However, to visualize a computational graph, we can use the `mx.viz.plot_network` function, which uses the open source graphviz library to represent the graph. That function can take additional graphviz arguments for prettier printing, and you can also optionally pass in the shape of your input to overlay the shapes directly onto the graph.

This graph represents the simple, linear structure of vgg11 well. We can follow it from the outputs all the way back to the input. Moving back toward the input, the spatial dimensions increase while the number of channels decreases, until we reach the first input layer, with an input of size 224 by 224 and three channels. One thing to note is that the complexity of a network and its number of parameters are largely unrelated.

In [8]:
viz.plot_network(vgg11(mx.sym.var('data')), shape={'data':(1,3,224,224)}, node_attrs={"shape":"oval","fixedsize":"false"})

ExecutableNotFound: failed to execute ['dot', '-Tsvg'], make sure the Graphviz executables are on your systems' PATH
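The error above means the Graphviz `dot` executable could not be found on the system PATH, so no graph was rendered. A minimal sketch for checking this before calling `plot_network`, using only the Python standard library (the helper name `graphviz_available` is made up for this illustration):

```python
import shutil


def graphviz_available(executable="dot"):
    """Return True if the given Graphviz executable can be found on PATH."""
    return shutil.which(executable) is not None


if graphviz_available():
    print("Graphviz found at", shutil.which("dot"))
else:
    print("Graphviz 'dot' not found: install Graphviz and add its bin directory to PATH")
```

Note that installing the `graphviz` Python package alone is not enough; the Graphviz binaries themselves have to be installed and reachable through PATH.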

In [9]:
mobilenet = model_zoo.vision.mobilenet_v2_1_0(pretrained=True)

Downloading C:\Users\Mohamed.Sharaf\AppData\Roaming\mxnet\models\mobilenetv2_1.0-36da4ff1.zipb5fcc56b-c753-48c6-8aee-699023fdeae7 from https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/models/mobilenetv2_1.0-36da4ff1.zip...


If we take the mobilenet_v2 example from the model_zoo, we can see that it contains a lot more blocks than vgg11.

In [10]:
mobilenet

MobileNetV2(
  (features): HybridSequential(
    (0): Conv2D(3 -> 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
    (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=32)
    (2): RELU6(
    
    )
    (3): LinearBottleneck(
      (out): HybridSequential(
        (0): Conv2D(32 -> 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (1): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=32)
        (2): RELU6(
        
        )
        (3): Conv2D(1 -> 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), groups=32, bias=False)
        (4): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=32)
        (5): RELU6(
        
        )
        (6): Conv2D(32 -> 16, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (7): BatchNorm(axis=1, eps=1e-05, momentum=0.9, fix_gamma=False, use_global_stats=False, in_channels=16)

You can see how many layers this network has. However, when we print a summary of this network, we realize that despite this very large number of layers, the number of parameters is only around 3.5 million. Remember that vgg11 had about 100 million parameters in a single fully connected layer.
The reason is that MobileNet uses depthwise separable convolutions, and we can see that up until the middle of the network, we are only using about 50,000 parameters or fewer per layer. Typically, dense layers are very parameter heavy, since the number of parameters is the number of inputs times the number of outputs. The parameter count of a convolutional layer, by contrast, depends only on the kernel size and the channel counts (for a depthwise convolution it is linear in the number of channels) and does not depend on the input spatial dimensions.

MobileNet does away with dense layers in its final classification layer, and otherwise uses lightweight depthwise separable convolutions for the main feature extraction branch.
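To make the comparison concrete, here is a small sketch in plain Python with made-up helper names (`dense_params`, `conv_weights`). The dense layer sizes are an assumption: 512 x 7 x 7 inputs and 4096 units is the standard VGG configuration for a 224 by 224 input, which is not visible in the truncated output above. The convolution sizes are taken from the first `LinearBottleneck` printed above.

```python
def dense_params(in_units, out_units):
    """Weights plus biases of a fully connected layer."""
    return in_units * out_units + out_units


def conv_weights(in_channels, out_channels, kernel_size, groups=1):
    """Weights of a bias-free 2D convolution with a square kernel."""
    return (in_channels // groups) * out_channels * kernel_size ** 2


# Assumed standard VGG dense layer: flattened 512 x 7 x 7 features -> 4096 units
vgg_dense = dense_params(512 * 7 * 7, 4096)

# Depthwise separable convolution, 32 -> 16 channels:
depthwise = conv_weights(32, 32, 3, groups=32)  # one 3x3 filter per channel
pointwise = conv_weights(32, 16, 1)             # 1x1 convolution mixes channels

# A standard 3x3 convolution doing the same 32 -> 16 mapping in one step
standard = conv_weights(32, 16, 3)

print(f"dense layer:          {vgg_dense:,}")
print(f"depthwise + pointwise: {depthwise} + {pointwise} = {depthwise + pointwise}")
print(f"standard 3x3 conv:     {standard}")
```

The depthwise (288) and pointwise (512) counts agree with the grouped 3x3 and the 1x1 convolutions of the first bottleneck block. None of these convolution counts depend on the 112 by 112 spatial size of the feature map.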

In [11]:
mobilenet.summary(nd.ones((1,3,224,224)))

--------------------------------------------------------------------------------
        Layer (type)                                Output Shape         Param #
               Input                            (1, 3, 224, 224)               0
            Conv2D-1                           (1, 32, 112, 112)             864
         BatchNorm-2                           (1, 32, 112, 112)             128
             RELU6-3                           (1, 32, 112, 112)               0
            Conv2D-4                           (1, 32, 112, 112)            1024
         BatchNorm-5                           (1, 32, 112, 112)             128
             RELU6-6                           (1, 32, 112, 112)               0
            Conv2D-7                           (1, 32, 112, 112)             288
         BatchNorm-8                           (1, 32, 112, 112)             128
             RELU6-9                           (1, 32, 112, 112)               0
           Conv2D-10        

If we try to visualize MobileNet here, we see that it's very hard to understand the flow of the data because the network is so big.

In [12]:
viz.plot_network(mobilenet(mx.sym.var('data')), shape={'data':(1,3,224,224)}, node_attrs={"shape":"oval","fixedsize":"false"})

ExecutableNotFound: failed to execute ['dot', '-Tsvg'], make sure the Graphviz executables are on your systems' PATH

We can also use external tools like Netron. Netron is an open source tool that accepts networks from different frameworks, including mxnet, and lets you explore them in a web UI.

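Let's export the computational graph of this network and load it in Netron. The code below hybridizes the network, runs one forward pass so that the graph is built, and exports it as `mobile_net-symbol.json` together with the parameter file `mobile_net-0000.params`. We can then simply go to Netron and load the model.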

In [16]:
mobilenet.hybridize()
mobilenet(nd.ones((1,3,224,224)))
mobilenet.export('mobile_net')

Let's visualize the computational graph with netron https://lutzroeder.github.io/netron
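As a quick complement to Netron, the exported symbol file can also be inspected directly, since it is plain JSON. The sketch below assumes the usual layout of an mxnet symbol file: a top-level `"nodes"` list whose entries have an `"op"` field, with `"null"` marking inputs and parameters. The helper name `count_operators` and the tiny `sample_graph` are made up for illustration; the real file is only read if it exists in the working directory.

```python
import json
from collections import Counter
from pathlib import Path


def count_operators(symbol_graph):
    """Count operator types in a symbol graph, skipping variables (op == 'null')."""
    return Counter(
        node["op"] for node in symbol_graph["nodes"] if node["op"] != "null"
    )


# Hypothetical miniature graph in the assumed format, for demonstration only
sample_graph = {
    "nodes": [
        {"op": "null", "name": "data", "inputs": []},
        {"op": "null", "name": "conv0_weight", "inputs": []},
        {"op": "Convolution", "name": "conv0", "inputs": [[0, 0, 0], [1, 0, 0]]},
        {"op": "Activation", "name": "relu0", "inputs": [[2, 0, 0]]},
    ]
}
print(count_operators(sample_graph))

# If the export above was run, summarize the real graph as well
symbol_file = Path("mobile_net-symbol.json")
if symbol_file.exists():
    with symbol_file.open() as f:
        print(count_operators(json.load(f)).most_common())
```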