
Survey Tensorflow and Caffe2 to decide standard storing format #7222

Closed
kavyasrinet opened this issue Jan 4, 2018 · 2 comments
Assignees
Labels
预测 (Inference; originally named "Inference", includes C-API inference issues, etc.)

Comments

kavyasrinet commented Jan 4, 2018

This is a survey of model storage formats in standard frameworks like TensorFlow and Caffe2.

TensorFlow

TensorFlow stores model files in the Protocol Buffers (protobuf) format.
The Graph object in TensorFlow holds a network of nodes, each representing one operation, connected to each other by edges. After building the Graph object, we serialize it by calling the method as_graph_def(), which returns a GraphDef object.

The GraphDef class is generated by the protobuf compiler from the definition in the file graph.proto. We have a similar definition in PaddlePaddle in the form of ProgramDesc.

The protobuf toolchain has everything required to parse this .proto file and generate the code to load, store, and manipulate graph definitions, and it also provides cross-language support.

A TensorFlow model file generally contains a serialized version of these GraphDef objects saved out by the ProtoBuf code.

A protobuf message can be saved in two different formats:

  1. Binary: Binary format files are compact, but the data is not human-readable.
  2. Text: TextFormat is the human-readable form, which is easy to debug, but it comes at the cost of becoming very large when numerical data such as weights is stored in it. Example: graph_run_run2.pbtxt.
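The size gap between the two encodings is easy to see with a small, standard-library-only sketch (the weight values and the "value:" field name here are hypothetical stand-ins, not real protobuf output):

```python
import struct

# Hypothetical stand-in for model weights.
weights = [0.001 * i for i in range(1000)]

# Binary encoding: 4 bytes per float32 value.
binary_blob = struct.pack("%df" % len(weights), *weights)

# Text encoding: one decimal string per value, roughly what TextFormat emits.
text_blob = "\n".join("value: %r" % w for w in weights).encode("utf-8")

print(len(binary_blob))                   # 4000 bytes
print(len(text_blob) > len(binary_blob))  # True: text is several times larger
```

This is why frameworks ship weights in binary protobufs and reserve the text form for debugging small graph definitions.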

An example of how to load the model in TensorFlow is as follows:

from tensorflow.core.framework import graph_pb2

graph_def = graph_pb2.GraphDef()

This creates an empty GraphDef object using the definition of the class from graph.proto. This is the object that we will populate with the data from our model file.

from google.protobuf import text_format

with open(model_file, "rb") as f:
  if FLAGS.input_binary:      # if binary format
    graph_def.ParseFromString(f.read())
  else:                       # if text format: decode bytes before merging
    text_format.Merge(f.read().decode("utf-8"), graph_def)

After this step, the graph_def variable holds the loaded graph definition.
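Putting the pieces together, here is a self-contained round-trip sketch (assuming TensorFlow and the protobuf library are installed; the one-node graph is purely illustrative):

```python
import tensorflow as tf
from google.protobuf import text_format

# Build a tiny graph with a single constant node.
graph = tf.Graph()
with graph.as_default():
    tf.constant(1.0, name="a")

graph_def = graph.as_graph_def()

# Binary round trip: ParseFromString on serialized bytes.
restored = type(graph_def)()
restored.ParseFromString(graph_def.SerializeToString())

# Text round trip: Merge on a TextFormat string.
restored_text = type(graph_def)()
text_format.Merge(text_format.MessageToString(graph_def), restored_text)

print([n.name for n in restored.node])       # ['a']
print([n.name for n in restored_text.node])  # ['a']
```

Both restored objects are equivalent to the original GraphDef; only the on-disk representation differs.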

Caffe2

Caffe2 uses the protobuf .pb file format for model files, unlike the .caffemodel files used by the original Caffe.

Caffe2 exposes the model as the following files:

  1. A protobuf file that defines the network.
  2. A protobuf file that has all of the network weights.

The first file is generally referred to as the predict_net and the second is referred to as init_net.
The predict_net is small and the init_net file is usually quite large since it contains all the network weights and parameters.

An example of how to load a model file in Caffe2 is:

from caffe2.python import workspace

with open(path_to_INIT_NET, "rb") as f:      # serialized protobufs are binary
    init_net = f.read()
with open(path_to_PREDICT_NET, "rb") as f:
    predict_net = f.read()

p = workspace.Predictor(init_net, predict_net)

Here the two files are read into init_net and predict_net. Then we spawn a new Caffe2 workspace by calling workspace.Predictor. This call is a wrapper around the C++ Predictor API in Caffe2 and takes the two serialized protobufs, as passed in the example above.

To perform inference after we have loaded the files, we just call the run method:

results = p.run(input)

PaddlePaddle with fluid

The proposal for the new model file format (after removing Pickle) is discussed in detail in issue #7221.

@kavyasrinet kavyasrinet created this issue from a note in Inference Framework (DOING) Jan 4, 2018
@kavyasrinet kavyasrinet self-assigned this Jan 4, 2018
@Xreki Xreki added the 预测 (Inference) label Jan 5, 2018
Xreki (Contributor) commented Jan 5, 2018

Great work!

So in TensorFlow, the Graph contains both the network definition and all the weights? If stored in text format, are all the weights stored as text too? The resulting model file would be huge.

In Caffe2, are init_net and predict_net stored in binary format or text format? Since the model is split into two parts, the predict_net could be stored in text format, which would be friendly for users to debug, and the init_net should be stored in binary format.

kavyasrinet (Author) commented:

Yes, in TensorFlow the GraphDef contains both the nodes and the weights. An example of a resulting file is: model_file
In general, though, it is stored in binary format; the text format is given here as an example because it is human-readable. Sample binary files can also be found online.

In Caffe2, I think both predict_net and init_net are binary files, since they both have a .pb extension in the documentation.

@kavyasrinet kavyasrinet moved this from DOING to DONE in Inference Framework Jan 8, 2018
@Xreki Xreki closed this as completed Jan 25, 2018