
Survey Tensorflow and Caffe2 to decide standard storing format #7222

Closed
kavyasrinet opened this issue Jan 4, 2018 · 2 comments
Assignees
Labels
预测 (Inference; originally named "Inference", includes C-API inference issues, etc.)

Comments

kavyasrinet commented Jan 4, 2018

This is a survey of model storage formats in standard frameworks like TensorFlow and Caffe2.

TensorFlow

TensorFlow stores model files in the Protocol Buffers (protobuf) format.
The Graph object in TensorFlow holds a network of nodes, each representing one operation, connected to each other by edges. After building the Graph object, we serialize it by calling the method as_graph_def(), which returns a GraphDef object.

The GraphDef class is generated by the protobuf compiler from the definition in the file graph.proto. We have a similar definition in PaddlePaddle in the form of ProgramDesc.

The protobuf toolchain has everything required to parse this .proto file and generate the code to load, store, and manipulate graph definitions, and it also provides cross-language support.

A TensorFlow model file generally contains a serialized version of these GraphDef objects saved out by the ProtoBuf code.

A protobuf message can be saved in two different formats:

  1. Binary: Binary format files are compact, but the data is not human-readable.
  2. Text: TextFormat is the human-readable form, which is easy to debug, but it comes at the cost of becoming very large when numerical data such as weights is stored in it. Example: graph_run_run2.pbtxt.
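The size gap between the two encodings is easy to see with a small, standard-library-only sketch (the weight values and the "value:" field name here are hypothetical stand-ins, not real protobuf output):

```python
import struct

# Hypothetical stand-in for model weights.
weights = [0.001 * i for i in range(1000)]

# Binary encoding: 4 bytes per float32 value.
binary_blob = struct.pack("%df" % len(weights), *weights)

# Text encoding: one decimal string per value, roughly what TextFormat emits.
text_blob = "\n".join("value: %r" % w for w in weights).encode("utf-8")

print(len(binary_blob))                   # 4000 bytes
print(len(text_blob) > len(binary_blob))  # True: text is several times larger
```

This is why frameworks ship weights in binary protobufs and reserve the text form for debugging small graph definitions.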

An example of how to load the model in TensorFlow is as follows:

from tensorflow.core.framework import graph_pb2

graph_def = graph_pb2.GraphDef()

This creates an empty GraphDef object using the definition of the class from graph.proto. This is the object that we will populate with the data from our model file.

from google.protobuf import text_format

with open(model_file, "rb") as f:
  if FLAGS.input_binary:      # if binary format
    graph_def.ParseFromString(f.read())
  else:                       # if text format: decode bytes before merging
    text_format.Merge(f.read().decode("utf-8"), graph_def)

After this step, the graph_def variable holds the loaded graph definition.
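Putting the pieces together, here is a self-contained round-trip sketch (assuming TensorFlow and the protobuf library are installed; the one-node graph is purely illustrative):

```python
import tensorflow as tf
from google.protobuf import text_format

# Build a tiny graph with a single constant node.
graph = tf.Graph()
with graph.as_default():
    tf.constant(1.0, name="a")

graph_def = graph.as_graph_def()

# Binary round trip: ParseFromString on serialized bytes.
restored = type(graph_def)()
restored.ParseFromString(graph_def.SerializeToString())

# Text round trip: Merge on a TextFormat string.
restored_text = type(graph_def)()
text_format.Merge(text_format.MessageToString(graph_def), restored_text)

print([n.name for n in restored.node])       # ['a']
print([n.name for n in restored_text.node])  # ['a']
```

Both restored objects are equivalent to the original GraphDef; only the on-disk representation differs.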

Caffe2

Caffe2 uses the protobuf .pb file format for model files, unlike the .caffemodel files used by the original Caffe.

Caffe2 exposes the model as the following files:

  1. A protobuf file that defines the network.
  2. A protobuf file that has all of the network weights.

The first file is generally referred to as the predict_net and the second is referred to as init_net.
The predict_net is small and the init_net file is usually quite large since it contains all the network weights and parameters.

An example of how to load a model file in Caffe2 is:

from caffe2.python import workspace

with open(path_to_INIT_NET, "rb") as f:      # serialized protobufs are binary
    init_net = f.read()
with open(path_to_PREDICT_NET, "rb") as f:
    predict_net = f.read()

p = workspace.Predictor(init_net, predict_net)

Here the two files are read into init_net and predict_net. Then we spawn a new Caffe2 workspace by calling workspace.Predictor. This call is a wrapper around the C++ Predictor API in Caffe2 and takes the two serialized protobufs, as passed in the example above.

To perform inference after we have loaded the files, we just call the run method:

results = p.run(input)

PaddlePaddle with fluid

The proposal for the new model file format (after removing Pickle) is discussed in detail in issue #7221.

@kavyasrinet kavyasrinet created this issue from a note in Inference Framework (DOING) Jan 4, 2018
@kavyasrinet kavyasrinet self-assigned this Jan 4, 2018
@Xreki Xreki added the 预测 (Inference) label Jan 5, 2018
Xreki (Contributor) commented Jan 5, 2018

Great work!

So in TensorFlow, the Graph contains both the network definition and all the weights? If stored in text format, are all the weights stored as text too? The resulting model file would be huge.

In Caffe2, are init_net and predict_net stored in binary format or text format? Since the model is split into two parts, the predict_net could be stored in text format, which would be friendly for users to debug, and the init_net should be stored in binary format.

kavyasrinet (Author) commented:

Yes, in TensorFlow the GraphDef contains both the nodes and the weights. An example of a resulting file is: model_file
In general, though, it is stored in binary format; the text format is given here as an example because it is human-readable. Sample binary files can also be found online.

In Caffe2, I think both predict_net and init_net are binary files, since they both have a .pb extension in the documentation.

@kavyasrinet kavyasrinet moved this from DOING to DONE in Inference Framework Jan 8, 2018
@Xreki Xreki closed this as completed Jan 25, 2018