## Caffe2 Concepts
Below you can learn more about the main concepts of Caffe2 that are crucial for understanding and developing Caffe2 models.

### Blobs and Workspace, Tensors
Data in Caffe2 is organized as blobs. A blob is just a named chunk of data in memory. Most blobs contain a tensor (think multidimensional array), and in Python they are translated to NumPy arrays (NumPy is a popular numerical library for Python and is already installed as a prerequisite with Caffe2).

[Workspace](workspace.html) stores all the blobs. The following example shows how to feed blobs into the `workspace` and fetch them back. Workspaces initialize themselves the moment you start using them.


In [1]:
from caffe2.python import workspace, model_helper
import numpy as np
# Create random tensor of three dimensions
x = np.random.rand(4, 3, 2)
print(x)
print(x.shape)

workspace.FeedBlob("my_x", x)

x2 = workspace.FetchBlob("my_x")
print(x2)

[[[ 0.65488539  0.75442539]
  [ 0.97237125  0.05673576]
  [ 0.50913366  0.63948923]]

 [[ 0.37889724  0.23027369]
  [ 0.23741295  0.26820519]
  [ 0.29745158  0.34361856]]

 [[ 0.99060827  0.9180144 ]
  [ 0.03749199  0.13332019]
  [ 0.77342564  0.76443418]]

 [[ 0.16724496  0.51716335]
  [ 0.74060236  0.34408865]
  [ 0.52004503  0.68331434]]]
(4, 3, 2)
[[[ 0.65488539  0.75442539]
  [ 0.97237125  0.05673576]
  [ 0.50913366  0.63948923]]

 [[ 0.37889724  0.23027369]
  [ 0.23741295  0.26820519]
  [ 0.29745158  0.34361856]]

 [[ 0.99060827  0.9180144 ]
  [ 0.03749199  0.13332019]
  [ 0.77342564  0.76443418]]

 [[ 0.16724496  0.51716335]
  [ 0.74060236  0.34408865]
  [ 0.52004503  0.68331434]]]
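
Conceptually, the feed/fetch pattern above treats the workspace as a mapping from blob names to data. The sketch below is a toy stand-in in plain Python, not Caffe2's implementation; the `ToyWorkspace` class and its method names are hypothetical and exist only to illustrate the idea.

```python
class ToyWorkspace:
    """Hypothetical stand-in for a workspace: a mapping from blob names to data."""

    def __init__(self):
        self._blobs = {}

    def feed_blob(self, name, value):
        # Store (or overwrite) the blob under the given name.
        self._blobs[name] = value
        return True

    def fetch_blob(self, name):
        # Look the blob up by name; unknown names raise KeyError.
        return self._blobs[name]

    def has_blob(self, name):
        return name in self._blobs

    def blobs(self):
        # Names of all blobs currently stored, sorted for stable output.
        return sorted(self._blobs)


ws = ToyWorkspace()
ws.feed_blob("my_x", [[1.0, 2.0], [3.0, 4.0]])
fetched = ws.fetch_blob("my_x")
print(fetched)
print(ws.blobs())
```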


### Nets and Operators
The fundamental object of Caffe2 is a net (short for network). A net is a graph of operators, and each operator takes a set of input blobs and produces one or more output blobs.
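
To make the "graph of operators" idea concrete, here is a minimal toy sketch in plain Python. It is not how Caffe2 represents or executes nets; the tuple layout, `toy_net`, and `run_toy_net` are hypothetical names chosen for illustration. Each toy operator reads named input blobs and writes named output blobs.

```python
import math


def scale_by_two(x):
    return [2.0 * v for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


# Each toy operator: (function, input blob names, output blob names).
toy_net = [
    (scale_by_two, ["x"], ["scaled"]),
    (sigmoid, ["scaled"], ["activated"]),
]


def run_toy_net(net, blobs):
    """Run operators in order, reading inputs from and writing outputs to `blobs`."""
    for fn, input_names, output_names in net:
        inputs = [blobs[name] for name in input_names]
        outputs = fn(*inputs)
        if len(output_names) == 1:
            # Wrap a single result so it can be zipped with its one output name.
            outputs = (outputs,)
        for name, value in zip(output_names, outputs):
            blobs[name] = value
    return blobs


blobs = run_toy_net(toy_net, {"x": [0.0, 1.0]})
print(sorted(blobs))
```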

In the code block below we will create a super simple model. It will have these components:

* One fully-connected layer (FC)
* A Sigmoid activation with a Softmax
* A CrossEntropy loss

Composing nets directly is quite tedious, so it is better to use *model helpers*, which are Python classes that aid in creating the nets. Even though we call it once and pass in a single name, "my first net", `ModelHelper` will create two interrelated nets:

1. one that initializes the parameters (ref. init_net)
2. one that runs the actual training (ref. exec_net)

In [2]:
# Create the input data
data = np.random.rand(16, 100).astype(np.float32)

# Create labels for the data as integers [0, 9].
label = (np.random.rand(16) * 10).astype(np.int32)

workspace.FeedBlob("data", data)
workspace.FeedBlob("label", label)

True

We created some random data and random labels and then fed those as blobs into the workspace.

In [3]:
# Create model using a model helper
m = model_helper.ModelHelper(name="my first net")

You've now used the `model_helper` to create the two nets we mentioned earlier (init_net, exec_net). We plan to add a fully-connected layer to this model using the FC operator, but first we need to do some prep work by creating the weight and bias blobs that the FC Op expects, using fill operators. After that we can add the Ops and use the weight and bias blobs we created, calling them by name when we invoke the FC Op.


In [4]:
weight = m.param_init_net.XavierFill([], 'fc_w', shape=[10, 100])
bias = m.param_init_net.ConstantFill([], 'fc_b', shape=[10, ])

In Caffe2 the FC Op takes in an input blob (our data), weights, and bias. Both `XavierFill` and `ConstantFill`, used here for the weights and the bias respectively, take an empty list of inputs, a name for the output blob, and a shape (for the weights, `shape=[output, input]`).
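
The `[output, input]` weight layout determines how the shapes line up. The following NumPy sketch shows the computation we understand FC to perform, `Y = X · Wᵀ + b`, using the same shapes as this example. It is an illustration rather than Caffe2 code, and the random values stand in for real data and initialized weights.

```python
import numpy as np

rng = np.random.default_rng(0)

batch, in_dim, out_dim = 16, 100, 10
data = rng.random((batch, in_dim)).astype(np.float32)
fc_w = rng.standard_normal((out_dim, in_dim)).astype(np.float32)  # shape=[output, input]
fc_b = np.zeros(out_dim, dtype=np.float32)

# Weights are stored as [output, input], so they are transposed in the product.
fc1 = data @ fc_w.T + fc_b
print(fc1.shape)  # (16, 10)
```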



In [5]:
fc_1 = m.net.FC(["data", "fc_w", "fc_b"], "fc1")
pred = m.net.Sigmoid(fc_1, "pred")
[softmax, loss] = m.net.SoftmaxWithLoss([pred, "label"], ["softmax", "loss"])


Reviewing the code blocks above:

First, we created the input data and label blobs in memory (in practice, you would load data from an input data source such as a database; more about that later). Note that the data and label blobs have first dimension '16'; this is because the input to the model is a mini-batch of 16 samples at a time. Many Caffe2 operators can be accessed directly through `ModelHelper` and can handle a mini-batch of input at a time. Check [ModelHelper's Operator List](workspace.html#cnnmodelhelper) for more details.

Second, we created a model by defining a bunch of operators: [FC](operators-catalogue.html#fc), [Sigmoid](operators-catalogue.html#sigmoidgradient) and [SoftmaxWithLoss](operators-catalogue.html#softmaxwithloss). *Note:* at this point, the operators are not executed; you are just creating the definition of the model.
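
For intuition about what the Sigmoid and SoftmaxWithLoss steps compute once the net runs, here is a NumPy sketch. It assumes the loss output is the cross-entropy averaged over the mini-batch, which is our reading of the operator's default behavior; the helper names are hypothetical and random values stand in for the FC output.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax_with_loss(logits, labels):
    """Row-wise softmax plus the mean cross-entropy against integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    picked = probs[np.arange(len(labels)), labels]
    loss = -np.log(picked).mean()
    return probs, loss


rng = np.random.default_rng(0)
fc1 = rng.standard_normal((16, 10))    # stand-in for the FC output
label = rng.integers(0, 10, size=16)   # integer classes in [0, 9]

pred = sigmoid(fc1)
softmax, loss = softmax_with_loss(pred, label)
print(softmax.shape, float(loss))
```

Because the sigmoid squashes every input to the softmax into the range (0, 1), the resulting probabilities stay fairly close to uniform across the 10 classes.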

The model helper creates two nets. `m.param_init_net` is a net you run only once; it initializes all the parameter blobs, such as the weights for the FC layer. The actual training is done by executing `m.net`. This is transparent to you and happens automatically.

The net definition is stored in a protobuf structure (see Google's Protocol Buffers documentation to learn more; protobufs are comparable to Thrift structs). You can easily inspect it by calling `net.Proto()`:


In [6]:
print(str(m.net.Proto()))

name: "my first net"
op {
  input: "data"
  input: "fc_w"
  input: "fc_b"
  output: "fc1"
  name: ""
  type: "FC"
}
op {
  input: "fc1"
  output: "pred"
  name: ""
  type: "Sigmoid"
}
op {
  input: "pred"
  input: "label"
  output: "softmax"
  output: "loss"
  name: ""
  type: "SoftmaxWithLoss"
}
external_input: "data"
external_input: "fc_w"
external_input: "fc_b"
external_input: "label"
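
The `external_input` entries in the listing are exactly the blobs that some operator reads but no operator in the net produces. The toy sketch below recomputes that list from the operator definitions above. It only illustrates the idea; `find_external_inputs` is a hypothetical helper, not a Caffe2 function.

```python
# Operator definitions copied from the listing above: (type, inputs, outputs).
ops = [
    ("FC", ["data", "fc_w", "fc_b"], ["fc1"]),
    ("Sigmoid", ["fc1"], ["pred"]),
    ("SoftmaxWithLoss", ["pred", "label"], ["softmax", "loss"]),
]


def find_external_inputs(ops):
    """Return blobs that are read before any operator in the list produces them."""
    produced = set()
    external = []
    for _type, inputs, outputs in ops:
        for name in inputs:
            if name not in produced and name not in external:
                external.append(name)
        produced.update(outputs)
    return external


external_inputs = find_external_inputs(ops)
print(external_inputs)  # ['data', 'fc_w', 'fc_b', 'label']
```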



You should also have a look at the parameter initialization net:


In [7]:
print(str(m.param_init_net.Proto()))

name: "my first net_init"
op {
  output: "fc_w"
  name: ""
  type: "XavierFill"
  arg {
    name: "shape"
    ints: 10
    ints: 100
  }
}
op {
  output: "fc_b"
  name: ""
  type: "ConstantFill"
  arg {
    name: "shape"
    ints: 10
  }
}
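
For a rough picture of what these two fill operators produce, here is a NumPy sketch. It assumes `XavierFill` draws uniformly from `[-s, s]` with `s = sqrt(3 / fan_in)`, where `fan_in` is the total size divided by the first dimension, and that `ConstantFill` defaults to 0. Both are our understanding of the operators rather than something stated on this page, so check the operators catalogue before relying on the details. The function names are hypothetical.

```python
import numpy as np


def xavier_fill(shape, rng):
    """Uniform values in [-s, s] with s = sqrt(3 / fan_in); fan_in = size / shape[0]."""
    fan_in = int(np.prod(shape)) // shape[0]
    scale = np.sqrt(3.0 / fan_in)
    return rng.uniform(-scale, scale, size=shape).astype(np.float32)


def constant_fill(shape, value=0.0):
    """Array of the given shape with every entry set to `value` (assumed default 0)."""
    return np.full(shape, value, dtype=np.float32)


rng = np.random.default_rng(0)
fc_w = xavier_fill([10, 100], rng)
fc_b = constant_fill([10])
print(fc_w.shape, fc_b.shape)  # (10, 100) (10,)
```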



### Executing
Now that we have the model's training operators defined, we can start running it to train our model.

First, we run the parameter initialization net, and we run it only once:

In [8]:
m.AddGradientOperators([loss])
workspace.RunNetOnce(m.param_init_net)

True

Note that, as usual, this actually passes the protobuf of the `param_init_net` down to the C++ runtime for execution.

Then we create the actual training Net:


In [9]:
workspace.CreateNet(m.net, overwrite=True)

True

We create it once and then we can efficiently run it multiple times:

In [10]:
# Run 100 x 10 iterations
for j in range(0, 100):
    data = np.random.rand(16, 100).astype(np.float32)
    label = (np.random.rand(16) * 10).astype(np.int32)

    workspace.FeedBlob("data", data)
    workspace.FeedBlob("label", label)

    workspace.RunNet(m.name, 10)   # run the net 10 times

Note how we refer to the network name in `RunNet()`. Since the net was created inside the workspace, we don't need to pass the net definition again.
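
The create-once, run-by-name pattern can be pictured with a small registry. The sketch below is a toy in plain Python, not the Caffe2 runtime; `ToyRuntime`, `create_net`, and `run_net` are hypothetical names that loosely mirror the calls above.

```python
class ToyRuntime:
    """Hypothetical stand-in for a runtime that keeps created nets by name."""

    def __init__(self):
        self._nets = {}

    def create_net(self, name, steps, overwrite=False):
        if name in self._nets and not overwrite:
            raise ValueError("net %r already exists" % name)
        self._nets[name] = list(steps)
        return True

    def run_net(self, name, num_iter=1):
        # Only the name is needed: the definition was stored by create_net.
        steps = self._nets[name]
        for _ in range(num_iter):
            for step in steps:
                step()
        return True


counter = {"runs": 0}


def count_run():
    counter["runs"] += 1


runtime = ToyRuntime()
runtime.create_net("my first net", [count_run], overwrite=True)
for _ in range(100):
    runtime.run_net("my first net", 10)
print(counter["runs"])  # 1000
```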

After execution, you can inspect the results stored in the output blobs (which contain tensors, i.e., NumPy arrays):


In [11]:
print(workspace.FetchBlob("softmax"))
print(workspace.FetchBlob("loss"))
print(workspace.FetchBlob("fc_b"))

[[ 0.09061994  0.07965311  0.09871222  0.09583402  0.09483479  0.08366007
   0.13113715  0.1024393   0.12742123  0.09568807]
 [ 0.09078755  0.09321389  0.09350473  0.0892828   0.10340762  0.08797722
   0.11673651  0.10178439  0.121646    0.1016593 ]
 [ 0.09377754  0.08233742  0.10988451  0.08662593  0.09058259  0.08217353
   0.12956057  0.10357561  0.1266723   0.09481001]
 [ 0.09539275  0.081124    0.10594266  0.09683703  0.10041932  0.08679937
   0.11326341  0.11476425  0.1193881   0.08606906]
 [ 0.08773227  0.09671976  0.09878469  0.0911809   0.10331523  0.10669518
   0.12288262  0.10679417  0.09943995  0.08645529]
 [ 0.10224349  0.09781764  0.09634469  0.09334582  0.1011361   0.09115197
   0.11652559  0.10355464  0.11349618  0.08438382]
 [ 0.09411819  0.0817339   0.10100326  0.09774512  0.09923947  0.08981858
   0.11463299  0.10967128  0.12042277  0.09161458]
 [ 0.09039521  0.08872747  0.10862324  0.09776999  0.08921608  0.1028707
   0.11724133  0.10353783  0.11598253  0.08563568]
 

### Backward pass
Without gradient operators, a net contains only the forward pass and thus cannot learn anything. The backward pass is built by adding a gradient operator for each operator in the forward pass.

If you care to follow this example yourself, try the following steps and examine the results!

Insert the following before you call `RunNetOnce()`:


In [23]:
m.AddGradientOperators([loss])

{BlobReference("fc_b"): BlobReference("fc_b_grad"),
 BlobReference("loss"): BlobReference("loss_autogen_grad"),
 BlobReference("pred"): BlobReference("pred_grad"),
 BlobReference("data"): BlobReference("data_grad"),
 BlobReference("fc_w"): BlobReference("fc_w_grad"),
 BlobReference("fc1"): BlobReference("fc1_grad")}

Examine the protobuf output:

In [12]:
print(str(m.net.Proto()))

name: "my first net"
op {
  input: "data"
  input: "fc_w"
  input: "fc_b"
  output: "fc1"
  name: ""
  type: "FC"
}
op {
  input: "fc1"
  output: "pred"
  name: ""
  type: "Sigmoid"
}
op {
  input: "pred"
  input: "label"
  output: "softmax"
  output: "loss"
  name: ""
  type: "SoftmaxWithLoss"
}
op {
  input: "loss"
  output: "loss_autogen_grad"
  name: ""
  type: "ConstantFill"
  arg {
    name: "value"
    f: 1.0
  }
}
op {
  input: "pred"
  input: "label"
  input: "softmax"
  input: "loss_autogen_grad"
  output: "pred_grad"
  name: ""
  type: "SoftmaxWithLossGradient"
  is_gradient_op: true
}
op {
  input: "pred"
  input: "pred_grad"
  output: "fc1_grad"
  name: ""
  type: "SigmoidGradient"
  is_gradient_op: true
}
op {
  input: "data"
  input: "fc_w"
  input: "fc1_grad"
  output: "fc_w_grad"
  output: "fc_b_grad"
  output: "data_grad"
  name: ""
  type: "FCGradient"
  is_gradient_op: true
}
external_input: "data"
external_input: "fc_w"
external_input: "fc_b"
external_input: "label"
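
To see what the gradient operators in this listing compute, the following NumPy sketch walks the same chain by hand and checks one entry against a finite difference. It is an illustration under assumptions (a loss averaged over the mini-batch, double precision, made-up random inputs), not the Caffe2 implementation; the comments map each line to the gradient blob it mirrors.

```python
import numpy as np


def forward(data, fc_w, fc_b, label):
    fc1 = data @ fc_w.T + fc_b
    pred = 1.0 / (1.0 + np.exp(-fc1))
    exp = np.exp(pred - pred.max(axis=1, keepdims=True))
    softmax = exp / exp.sum(axis=1, keepdims=True)
    loss = -np.log(softmax[np.arange(len(label)), label]).mean()
    return pred, softmax, loss


def backward(data, fc_w, label, pred, softmax):
    n = len(label)
    loss_grad = 1.0                                   # mirrors loss_autogen_grad
    one_hot = np.zeros_like(softmax)
    one_hot[np.arange(n), label] = 1.0
    pred_grad = loss_grad * (softmax - one_hot) / n   # mirrors pred_grad
    fc1_grad = pred_grad * pred * (1.0 - pred)        # mirrors fc1_grad
    fc_w_grad = fc1_grad.T @ data                     # mirrors fc_w_grad
    fc_b_grad = fc1_grad.sum(axis=0)                  # mirrors fc_b_grad
    data_grad = fc1_grad @ fc_w                       # mirrors data_grad
    return fc_w_grad, fc_b_grad, data_grad


rng = np.random.default_rng(0)
data = rng.random((16, 100))
label = rng.integers(0, 10, size=16)
fc_w = rng.uniform(-0.1, 0.1, size=(10, 100))
fc_b = np.zeros(10)

pred, softmax, loss = forward(data, fc_w, fc_b, label)
fc_w_grad, fc_b_grad, data_grad = backward(data, fc_w, label, pred, softmax)

# Central finite-difference check on the first bias entry.
eps = 1e-5
plus, minus = fc_b.copy(), fc_b.copy()
plus[0] += eps
minus[0] -= eps
numeric = (forward(data, fc_w, plus, label)[2]
           - forward(data, fc_w, minus, label)[2]) / (2 * eps)
print(fc_w_grad.shape, fc_b_grad.shape, data_grad.shape)
print(np.isclose(numeric, fc_b_grad[0], atol=1e-7))
```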

In [13]:
print("Current blobs in the workspace: {}".format(workspace.Blobs()))

Current blobs in the workspace: [u'data', u'data_grad', u'fc1', u'fc1_grad', u'fc_b', u'fc_b_grad', u'fc_w', u'fc_w_grad', u'label', u'loss', u'loss_autogen_grad', u'my_x', u'pred', u'pred_grad', u'softmax']
