Add save and load method for Module class #691

Closed · nudles opened this issue May 9, 2020 · 27 comments
@nudles
Member

nudles commented May 9, 2020

Updated on May 15

class Layer:
    def get_params(self):
        """Return the params of this layer and its sublayers as a dict; a param's name is layername.param.
           E.g., for self.W = Tensor() and self.b = Tensor(),
           the names of W and b are like conv1.W and conv1.b.
        """

    def get_states(self):
        """Return the states of this layer and its sublayers that are necessary for model evaluation/inference.
           The states include the params and others, e.g., the running mean and var of batchnorm.
        """

class Module(Layer):
    def compile(self, ...):
        """Set the name of each layer and its sublayers, which will be used to create the dicts
           for get_params and get_states. Then there is no need to manually configure the layer name
           in the __init__ method of a layer.

           For instance,
           class Blk(Layer):
               def __init__(self):
                   self.conv1 = Conv2d()
                   self.conv2 = Conv2d()

           class MyModel(Module):
               def __init__(self):
                   self.blk1 = Blk()   # --> blk1.conv1, blk1.conv2
                   self.blk2 = Blk()   # --> blk2.conv1, blk2.conv2
        """

  # high priority
  def save(self, fpath, ckp_states={}):
      """Save the model and optionally some states.

      Args:
          fpath: output file path (without the extension)
          ckp_states(dict): states for checkpoint that are not attributes of Module, e.g., epoch ID.
      """
      cust_states = {}
      if ckp_states is not None:
          cust_states = ckp_states + model (incl. sublayer) attributes - get_states()
      save the model states via onnx, with a customized field for the cust_states

  def load(self, fpath, dev, use_graph, graph_alg):
      """Load the model onto dev.

      Args:
          fpath: input file path (without the extension)

      Returns:
          dict for the ckp_states.
      """
      load model states + cust_states
      model attributes = model states + attributes from cust_states
      self.compile()
      restore the model attributes
      return the rest of the states as a dict

# lower priority
def save(fpath, model, ckp_states):
    attributes <-- model
    replace all tensors in attributes + ckp_states with entries name --> (shape, dtype)
    dump the tensors via numpy.savez_compressed
    dump the model via pickle

def load(fpath, dev, use_graph, graph_alg):
    load the model via pickle
    load the tensors via numpy.load
    restore the tensors
    return the ckp_states

Clarification:

  • Params: layer parameters (Tensor) that are updated via SGD. Layer.get_params()
  • States: Params + other variables that are necessary for model evaluation/inference. Superset of params. Layer.get_states()
  • Attributes: members of a class instance, i.e., the instance's __dict__. Superset of states.
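
For concreteness, a minimal sketch (not part of the proposal; the helper name and traversal are assumptions) of the auto-naming that compile() could perform on the Layer class above:

def name_layers(layer, prefix=""):
    # walk instance attributes and assign hierarchical names, e.g., blk1.conv1
    for attr, value in vars(layer).items():
        if isinstance(value, Layer):
            value.name = prefix + attr
            name_layers(value, prefix=prefix + attr + ".")
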
@XJDKC
Member

XJDKC commented May 12, 2020

Do we need to save params of the model?

@chrishkchris
Contributor

chrishkchris commented May 12, 2020

> Do we need to save params of the model?

Yes, say we did the training and want to save the model so that later we can deploy it as a web service for inference.

This is common because we can train the model inside a cluster with many computational resources.
Then we deploy the model in a different environment, e.g., to host an inference web application like image classification.

@XJDKC
Member

XJDKC commented May 12, 2020

> Do we need to save params of the model?
>
> Yes, say we did the training and want to save the model so that later we can deploy it as a web service for inference.
>
> This is common because we can train the model inside a cluster with many computational resources.
> Then we deploy the model in a different environment, e.g., to host an inference web application like image classification.

Got it. So I think we'd better implement this feature on the Python side, because the scheduler doesn't have information about the type of an operator, and it also has no concept of a neural network layer.

@chrishkchris
Contributor

chrishkchris commented May 12, 2020

> Do we need to save params of the model?
>
> Yes, say we did the training and want to save the model so that later we can deploy it as a web service for inference.
> This is common because we can train the model inside a cluster with many computational resources.
> Then we deploy the model in a different environment, e.g., to host an inference web application like image classification.
>
> Got it. So I think we'd better implement this feature on the Python side, because the scheduler doesn't have information about the type of an operator, and it also has no concept of a neural network layer.

If it is on the Python side, I guess the easiest way is to use pickle to pack a Python list or dict, but a drawback is that pickle cannot pack SWIG Python objects (if I am not wrong):
https://docs.python.org/3/library/pickle.html
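
For concreteness, a minimal sketch of this pickle idea, assuming singa's tensor.to_numpy() is used to turn the SWIG-backed Tensors into picklable numpy arrays:

import pickle
from singa import tensor

def save_states_pickle(fpath, states):
    # convert Tensor values to numpy so the whole dict is picklable
    np_states = {k: tensor.to_numpy(v) for k, v in states.items()}
    with open(fpath, "wb") as f:
        pickle.dump(np_states, f)

def load_states_pickle(fpath):
    with open(fpath, "rb") as f:
        return pickle.load(f)  # numpy arrays; the caller restores Tensors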

@XJDKC
Member

XJDKC commented May 12, 2020

> Do we need to save params of the model?
>
> Yes, say we did the training and want to save the model so that later we can deploy it as a web service for inference.
> This is common because we can train the model inside a cluster with many computational resources.
> Then we deploy the model in a different environment, e.g., to host an inference web application like image classification.
>
> Got it. So I think we'd better implement this feature on the Python side, because the scheduler doesn't have information about the type of an operator, and it also has no concept of a neural network layer.
>
> If it is on the Python side, I guess the easiest way is to use pickle to pack a Python list or dict, but a drawback is that pickle cannot pack SWIG Python objects (if I am not wrong):
> https://docs.python.org/3/library/pickle.html

Or we can serialize and deserialize the data ourselves. There is no need to serialize the entire objects; we just need to save the state data.

@chrishkchris
Contributor

chrishkchris commented May 12, 2020

> Do we need to save params of the model?
>
> Yes, say we did the training and want to save the model so that later we can deploy it as a web service for inference.
> This is common because we can train the model inside a cluster with many computational resources.
> Then we deploy the model in a different environment, e.g., to host an inference web application like image classification.
>
> Got it. So I think we'd better implement this feature on the Python side, because the scheduler doesn't have information about the type of an operator, and it also has no concept of a neural network layer.
>
> If it is on the Python side, I guess the easiest way is to use pickle to pack a Python list or dict, but a drawback is that pickle cannot pack SWIG Python objects (if I am not wrong):
> https://docs.python.org/3/library/pickle.html
>
> Or we can serialize and deserialize the data ourselves. There is no need to serialize the entire objects; we just need to save the state data.

Yes, I guess it is something like this:

  1. save: https://github.com/nginyc/rafiki/blob/b027c588f27ed4e801e8e300785b0eca230b5167/examples/models/image_classification/TfVgg16.py#L105
  2. load: https://github.com/nginyc/rafiki/blob/b027c588f27ed4e801e8e300785b0eca230b5167/examples/models/image_classification/TfVgg16.py#L127

But the difference is that this issue concerns the ONNX format.

@XJDKC
Member

XJDKC commented May 12, 2020

Does Rafiki dump models in ONNX format?

@chrishkchris
Contributor

> Does Rafiki dump models in ONNX format?

No, I didn't mean that. They dump models in a different format, not ONNX.

@dcslin
Member

dcslin commented May 15, 2020

How are the topological connections (forward()) and configs (Linear(2,3)) handled when module.save() is called?

@dcslin
Member

dcslin commented May 15, 2020

I guess "high/low priority" refers to the preference? In the "low priority" option, the save and load is non intrusive/not part of class module.
Personally, serialization, is a mechanism telling how to handle a class, is preferred to be non intrusive/ not part of the class. The class itself is a structure. For example mymodel = MyModel(); mymodel.load(fp). Maybe can be mymodel=singa.module.load(fp)

@joddiy
Member

joddiy commented May 15, 2020

Conclusion first

Good news:

ONNX can now define the loss and optimizer within its format. However, the only available losses are NegativeLogLikelihoodLoss and SoftmaxCrossEntropyLoss, and the only optimizers it can store are Adagrad, Adam, and Momentum (SGD with standard momentum).

Bad news:

we need to update onnx to 1.7, which was released last week and may not be so stable. In this release, ONNX defines a complicated node called GraphCall to specify which gradients should be computed and how to update the tensors using these gradients. Since we update the weights right after the backward pass, this part may not be useful for us.

ONNX Training Preview (TrainingInfoProto)

Last week, the ONNX team released a new version, 1.7.0, which upgrades its opset version to 12. This new release adds a feature called TrainingInfoProto.

This feature describes training information. There are two main parts in it: initialization-step and training-algorithm-step.

initialization-step

initialization-step means the developer can define an initialization. The initialization is a formal ONNX graph; it doesn't have inputs but has several outputs. The developer can define some nodes in this graph, such as RandomNormal or RandomUniform, and in another field called initialization_binding, assign these outputs to specific tensors in the inference graph.

The currently supported random methods are RandomNormal and RandomUniform.

training-algorithm-step

training-algorithm-step defines a field called algorithm, an inference graph that represents one step of a training algorithm. Given the required inputs, it computes outputs to update tensors in its own graph or in the main computation graph. update_binding contains key-value pairs of strings that assign the outputs to specific tensors.

In general, this graph contains the loss node, gradient node, optimizer node, the increment of the iteration count, and some calls to the inference graph. The field algorithm.node is the only place the user can use the GraphCall operator.

Loss node

  • NegativeLogLikelihoodLoss
  • SoftmaxCrossEntropyLoss

Optimizer node

  • Adagrad
  • Adam
  • Momentum: SG with standard momentum

Gradient node

The Gradient node actually only defines the information necessary to compute the gradients for the whole graph. For example, in the following graph, Gradient defines its inputs as the xs (intermediate weights) and zs (inputs of the graph), plus y (the output of the graph), and its outputs as dY/dW and dY/dZ, whose order corresponds to the inputs in xs.

It doesn't define any logic for how to compute dY/dW and dY/dZ.

W --> Conv --> H --> Gemm --> Y
|      ^              ^
|      |              |
|      X              Z
|      |              |
|      |   .----------'
|      |   |  (W/Z/X is the 1st/2nd/3rd input of Gradient as shown in
|      |   |   "xs" followed by "zs")
|      v   v
'---> Gradient(xs=["W", "Z"], zs=["X"], y="Y")
       |   |
       |   '-----------------------------------> dY/dW (1st output of Gradient)
       |
       '---------------------------------------> dY/dZ (2nd output of Gradient)
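
A hedged sketch of declaring the Gradient node above via onnx.helper (assuming the ai.onnx.preview.training domain of ONNX 1.7); it records only the declaration, not the gradient logic:

from onnx import helper

grad = helper.make_node(
    "Gradient",
    inputs=["W", "Z", "X"],        # xs followed by zs, as in the diagram
    outputs=["dY_dW", "dY_dZ"],    # order corresponds to xs
    domain="ai.onnx.preview.training",
    xs=["W", "Z"], zs=["X"], y="Y",
)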

GraphCall node

The GraphCall operator invokes a graph inside TrainingInfoProto's algorithm field. The GraphCall inputs and outputs are bound to those of the invoked graph by position.

Based on the above inference graph, GraphCall can be used like this:

.-------- W (a global and mutable variable from
|         |  the inference graph)
|         |
|   .-----'-----------.
|   |                 |
|   |                 v
|   | .-- X_1 --> GraphCall(graph_name="MyInferenceGraph")
|   | |            |  |
|   | |            |  |
|   | |   Z_1 -----'  |
|   | |    |          V
|   | |    |         Y_1 ---> Loss ---> O
|   | |    |                    ^
|   | |    |                    |
|   | `--. |                    C
|   |    | |                    |
|   |    | |   .----------------'
|   |    | |   |
|   |    v v   v
|   `--> Gradient(xs=["W"], zs=["X_1", "Z_1", "C"], y="O")
|        |
|        v
|      dO_dW (gradient of W)      1 (a scalar one)
|        |                        |
|        V                        v
|       Div <--- T ------------> Add ---> T_new
|        |    (T is the number of training iterations.
|        |     T is also globally visible and mutable.)
|        v
`-----> Sub ----> W_new

The previous section's inference graph is called by GraphCall(graph_name="MyInferenceGraph"), and it uses a new batch of inputs (X_1, Z_1) to compute Y_1.

Gradient defines the gradients the graph should compute; finally, it gets W_new and T_new.

Then it uses the following update_binding to update the tensors:

update_binding: {"W": "W_new", "T": "T_new"}
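
A minimal sketch (assuming the ONNX 1.7 protos) of attaching such a binding to a model; the empty graphs are stand-ins for the real inference and training-step graphs:

import onnx
from onnx import helper

train_algo = helper.make_graph([], "train_step", [], [])  # would hold the Gradient/optimizer nodes
model = helper.make_model(helper.make_graph([], "inference", [], []))

info = onnx.TrainingInfoProto()
info.algorithm.CopyFrom(train_algo)
for key, value in {"W": "W_new", "T": "T_new"}.items():
    kv = info.update_binding.add()
    kv.key, kv.value = key, value
model.training_info.append(info)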

@nudles
Member Author

nudles commented May 15, 2020

> How are the topological connections (forward()) and configs (Linear(2,3)) handled when module.save() is called?

We do a forward pass inside save() (e.g., using the placeholders recorded by compile()) to get the output y, and then trace back from y to get all operations.
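
A minimal sketch of such backward tracing; the creator/src_tensors attributes are assumptions about the autograd bookkeeping:

def trace_ops(y):
    ops, seen, stack = [], set(), [y.creator]
    while stack:
        op = stack.pop()
        if op is None or id(op) in seen:
            continue
        seen.add(id(op))
        ops.append(op)
        # walk to the ops that produced this op's input tensors
        stack.extend(t.creator for t in op.src_tensors)
    return ops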

@nudles
Member Author

nudles commented May 15, 2020

I guess "high/low priority" refers to the preference? In the "low priority" option, the save and load is non intrusive/not part of class module.
Personally, serialization, is a mechanism telling how to handle a class, is preferred to be non intrusive/ not part of the class. The class itself is a structure. For example mymodel = MyModel(); mymodel.load(fp). Maybe can be mymodel=singa.module.load(fp)

We provide two approaches.

  1. save and load as class methods. The disk file is in onnx format. Major application scenario: checkpoint and restore training. Or rename them to checkpoint() and restore().
  2. singa.save() and singa.load(). Major application scenario: we do not have the code of MyModule and only have the serialized model. The disk file is in pickle format, as we use pickle to serialize the MyModule class.

@nudles
Member Author

nudles commented May 15, 2020

Updated on May 15 Night

class Layer:
    def get_params(self):
        """Return the params of this layer and its sublayers as a dict; a param's name is layername.param.
           E.g., for self.W = Tensor() and self.b = Tensor(),
           the names of W and b are like conv1.W and conv1.b.
        """

    def get_states(self):
        """Return the states of this layer and its sublayers that are necessary for model training/evaluation/inference.
           The states include the params and others, e.g., the running mean and var of batchnorm.
        """

class Module(Layer):
    def compile(self, ...):
        """Set the name of each layer and its sublayers, which will be used to create the dicts
           for get_params and get_states. Then there is no need to manually configure the layer name
           in the __init__ method of a layer.

           For instance,
           class Blk(Layer):
               def __init__(self):
                   self.conv1 = Conv2d()
                   self.conv2 = Conv2d()

           class MyModel(Module):
               def __init__(self):
                   self.blk1 = Blk()   # --> blk1.conv1, blk1.conv2
                   self.blk2 = Blk()   # --> blk2.conv1, blk2.conv2
        """

  # high priority
  def save_states(self, fpath, aux_states={}):
      """Save states.

      Args:
          fpath: output file path (without the extension)
          aux_states(dict): values are standard data types or Tensor,
                            e.g., epoch ID, learning rate, optimizer states
      """
      states = get_states() + aux_states + input_placeholders
      tensor_dict = {}
      for k, v in states:
          if type(v) is Tensor:
              tensor_dict[k] = v
              states[k] = {'shape': v.shape, 'dtype': v.dtype}
      save states as a json file
      save tensor_dict via numpy or hdf5 or protobuf
      zip the output files

  def load_states(self, fpath, dev, use_graph=True, graph_alg='sequence'):
      """Load the model onto dev.

      Args:
          fpath: input file path (without the extension)

      Returns:
          dict of the remaining (aux) states.
      """
      unzip the input file
      load the json file --> states
      load the tensor files --> tensor_dict
      put the tensors into states
      states --> model_states + input_placeholders + aux_states
      self.compile(input_placeholders, dev, use_graph, graph_alg)
      model.set_states(model_states)
      return the rest of the states as a dict

# lower priority
def save(fpath, model):
    attributes <-- model
    replace all tensors in attributes with {'shape': v.shape, 'dtype': v.dtype}
    dump the tensors via numpy or protobuf or hdf5
    dump the model via pickle
    zip the output files

def load(fpath, dev, use_graph, graph_alg):
    unzip the input file
    load the model via pickle
    load the tensors
    restore the tensors in the model attributes
    return the model


# handle ONNX
def to_onnx(model):
    return an onnx model

class SONNXModel(Module):
    def __init__(self, onnx_model):
        self.store_output = store_output
        for layer_name, layer_config in get_layer(onnx_model):
            self.__dict__[layer_name] = CreateLayer(...)

    def forward(self, aux_output):
        run forward according to the onnx graph
        return the last output + aux_output

class MyModel(SONNXModel):
    def __init__(self, onnx):
        super().__init__(onnx)
        self.layer1 = Conv()
        self.layer2 = Conv()

    def forward(self, x):
        x1, x2 = super().forward(x, aux_output)
        x = self.layer1.forward(x2)
        return self.layer2.forward(x1) + x

    def train_one_batch(self, x, y):
        y_ = self.forward(x)
        ....

Clarification:

  • Params: layer parameters (Tensor) that are updated via SGD. Layer.get_params()
  • States: Params + other variables that are necessary for model evaluation/inference. Superset of params. Layer.get_states()
  • Attributes: members of a class instance, i.e., the instance's __dict__. Superset of states.
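
For concreteness, a minimal sketch of the proposed on-disk layout (a zip holding a JSON index plus the tensor values), with numpy arrays standing in for singa Tensors:

import io, json, zipfile
import numpy as np

def save_states_sketch(fpath, states):
    meta, arrays = {}, {}
    for k, v in states.items():
        if isinstance(v, np.ndarray):   # stand-in for Tensor
            meta[k] = {'shape': list(v.shape), 'dtype': str(v.dtype)}
            arrays[k] = v
        else:
            meta[k] = v                 # standard data types go into the index as-is
    buf = io.BytesIO()
    np.savez_compressed(buf, **arrays)
    with zipfile.ZipFile(fpath + '.zip', 'w') as zf:
        zf.writestr('states.json', json.dumps(meta))
        zf.writestr('tensors.npz', buf.getvalue())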

@XJDKC
Member

XJDKC commented May 15, 2020

If we have the model states, we can recreate the params. Do the placeholders still make sense? I think we don't need to compile the module if we use the set_states function.

@nudles
Member Author

nudles commented May 16, 2020

> If we have the model states, we can recreate the params. Do the placeholders still make sense? I think we don't need to compile the module if we use the set_states function.

The API is a bit ugly.. But we need compile() to create the handles, which are not serialized as states.

@XJDKC
Member

XJDKC commented May 16, 2020

> If we have the model states, we can recreate the params. Do the placeholders still make sense? I think we don't need to compile the module if we use the set_states function.
>
> The API is a bit ugly.. But we need compile() to create the handles, which are not serialized as states.

Got it. I thought handles were also state info.

@dcslin
Member

dcslin commented May 16, 2020

Current params and states of the stateful layers:

| Stateful layer | Params | States (besides params) |
| --- | --- | --- |
| Linear | W, b | |
| Conv2D | W, b | |
| SeparableConv2d (2 sublayers of Conv2D) | | |
| BatchNorm2d | scale, bias | running_mean, running_var |
| RNN | Wx, Wh, b | |
| LSTM | Wx * 4, Wh * 4, Bx * 4, Bh * 4 | |
| CudnnRNN (not in master) | W | |
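
Following the table, a minimal sketch of how BatchNorm2d could report params vs. states (Layer and the attribute names are as proposed above):

class BatchNorm2d(Layer):
    def get_params(self):
        return {self.name + '.scale': self.scale,
                self.name + '.bias': self.bias}

    def get_states(self):
        states = self.get_params()  # states are a superset of params
        states[self.name + '.running_mean'] = self.running_mean
        states[self.name + '.running_var'] = self.running_var
        return states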

@joddiy
Member

joddiy commented May 16, 2020

# handle ONNX
def to_onnx(model):
    return an onnx model

class SONNXModel(Module):
    def __init__(self, onnx_model):
        singa_rep = sonnx.prepare(onnx_model, device=dev, batchsize=1)
        for layer_name, layer in singa_rep.layers:
            self.__dict__[layer_name] = layer
        # store weights here as numpy
        for weight_name, weight in singa_rep.weights:
            self.weights[weight_name] = weight
        # store layer info such as input and output names (only weights)
        for layer_name, layer_info in singa_rep.layer_infos:
            self.layer_infos[layer_name] = layer_info

    def forward(self, aux_output):
        # run forward according to the onnx graph
        return the last output + aux_output

    def compile(self):
        # init weights
        super().compile()
        # set weights' values
        for layer_name, layer in self.__dict__:
            input_info, output_info = self.layer_infos[layer_name]
            for input_name in input_info:
                layer.set_weight(self.weights[input_name])

class MyModel(SONNXModel):
    def __init__(self, onnx):
        super().__init__(onnx)
        self.layer1 = Conv()
        self.layer2 = Conv()

    def forward(self, x):
        x1, x2 = super().forward(x, aux_output)
        x = self.layer1.forward(x2)
        return self.layer2.forward(x1) + x

    def train_one_batch(self, x, y):
        y_ = self.forward(x)
        ....

How about this one: we parse the onnx model via sonnx.prepare (a Backend), which returns singa_rep (a BackendRep). singa_rep contains the layers, the weights, and the input/output info, and we store the layers in self.__dict__. When we compile the model, we first call super() to init the params, then set their values from the onnx-loaded weights.

@nudles
Member Author

nudles commented May 16, 2020

Pls check my inline comments starting with **

# handle ONNX
def to_onnx(model):
    return an onnx model

class SONNXModel(Module):
    def __init__(self, onnx_model):  ** need to pass the dev as an argument.
        singa_rep = sonnx.prepare(onnx_model, device=dev, batchsize=1)
        for layer_name, layer in singa_rep.layers:
            self.__dict__[layer_name] = layer
        # store weights here as numpy
        for weight_name, weight in singa_rep.weights:
            self.weights[weight_name] = weight
        # store layer info such as input and output names (only weights)
        for layer_name, layer_info in singa_rep.layer_infos:
            self.layer_infos[layer_name] = layer_info

    def forward(self, aux_output):
        # run forward according to the onnx graph
        return the last output + aux_output

    def compile(self):
        # init weights
        super().compile()   ** args like dev, use_graph, graph_alg should be passed.
        # set weights' values
        for layer_name, layer in self.__dict__:
            input_info, output_info = self.layer_infos[layer_name]
            for input_name in input_info:
                layer.set_weight(self.weights[input_name])   ** remember to release self.weights to free memory.

class MyModel(SONNXModel):
    def __init__(self, onnx):
        super().__init__(onnx)
        self.layer1 = Conv()
        self.layer2 = Conv()

    def forward(self, x):
        x1, x2 = super().forward(x, aux_output)
        x = self.layer1.forward(x2)
        return self.layer2.forward(x1) + x

    def train_one_batch(self, x, y):
        y_ = self.forward(x)
        ....

How about this one: we parse the onnx model via sonnx.prepare (a Backend), which returns singa_rep (a BackendRep). singa_rep contains the layers, the weights, and the input/output info, and we store the layers in self.__dict__. When we compile the model, we first call super() to init the params, then set their values from the onnx-loaded weights.

It's good to reuse singa_rep.
To use MyModel:

ox = onnx.load(fpath)
m = MyModel(ox)
m.compile([x]...)

@joddiy
Member

joddiy commented May 16, 2020

# handle ONNX
def to_onnx(model):
    return an onnx model

class SONNXModel(Module):
    def __init__(self, onnx_model):
        singa_rep = sonnx.prepare(onnx_model)  # will update the prepare function to remove device and batchsize
        for layer_name, layer in singa_rep.layers:
            self.__dict__[layer_name] = layer
        # store weights here as numpy
        for weight_name, weight in singa_rep.weights:
            self.weights[weight_name] = weight
        # store layer info such as input and output names (only weights)
        for layer_name, layer_info in singa_rep.layer_infos:
            self.layer_infos[layer_name] = layer_info

    def forward(self, aux_output):
        # run forward according to the onnx graph
        return the last output + aux_output

    def compile(self, inputs, is_train, use_graph, graph_alg):
        # init weights
        super().compile(inputs, is_train, use_graph, graph_alg)
        # set weights' values
        for layer_name, layer in self.__dict__:
            input_info, output_info = self.layer_infos[layer_name]
            for input_name in input_info:
                layer.set_weight(self.weights[input_name])   ** remember to release self.weights to free memory.

class MyModel(SONNXModel):
    def __init__(self, onnx):
        super().__init__(onnx)
        self.layer1 = Conv()
        self.layer2 = Conv()

    def forward(self, x):
        x1, x2 = super().forward(x, aux_output)
        x = self.layer1.forward(x2)
        return self.layer2.forward(x1) + x

    def train_one_batch(self, x, y):
        y_ = self.forward(x)
        ....

ox = onnx.load(fpath)
x = Placeholder((2, 3), device=gpu, dtype=singa.float)  # alias of Tensor
m = MyModel(ox)
# compatible with existing code which does not have the following two statements.
m.compile([x], is_train=True, use_graph=True, graph_alg='sequence')

y = Placeholder((2,), device=gpu)
for npx, npy in data:
    x.copy_from(npx)
    y.copy_from(npy)
    m.train_one_batch(x, y)  # build the graph in the first iter. For the old code, the params are initialized here.

Updated the code according to the comments with **.

And I need to update the current SingaBackend and SingaBackendRep: in SingaBackend, we won't create tensors; we only create layers and store the weights as numpy arrays. We postpone the tensor creation to SingaBackendRep.run to make this API uniform with the above API.

@nudles
Member Author

nudles commented May 16, 2020

To be consistent, I think we'd better always call m.compile() explicitly?

m=MyModel()
m.compile([x], use_graph=True)
m.load_states(fpath)

m=MyONNXModel(onnx_model)
m.compile([x], use_graph=True)

m=singa.load(fpath)
m.compile([x], use_graph=True)

Then the load_states() only has a single argument, i.e., fpath.

Any better solution?

@joddiy
Member

joddiy commented May 16, 2020

> To be consistent, I think we'd better always call m.compile() explicitly?
>
> m=MyModel()
> m.compile([x], use_graph=True)
> m.load_states(fpath)
>
> m=MyONNXModel(onnx_model)
> m.compile([x], use_graph=True)
>
> m=singa.load(fpath)
> m.compile([x], use_graph=True)
>
> Then the load_states() only has a single argument, i.e., fpath.
>
> Any better solution?

Actually, in the above sonnx API, we merge load_states into compile, right?
How about this one:

m=MyModel(path) # check the file is a model or just states
m.compile([x], use_graph=True) # do m.load_states(fpath) within compile

m=MyONNXModel(onnx_model)
m.compile([x], use_graph=True)

m=singa.load(fpath)
m.compile([x], use_graph=True)

@nudles
Member Author

nudles commented May 16, 2020

  1. I think you can still store numpy arrays in SingaBackend, and copy the data from the numpy arrays into the param tensors directly later (see the sketch below).
  2. The new APIs are more consistent. But users then do not call load_states() explicitly, while they do call save_states() explicitly. Not symmetric.. @dcslin any comments?
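
A minimal sketch of point 1, assuming singa's Tensor.copy_from_numpy():

import numpy as np
from singa import tensor

def bind_weights(params, np_weights):
    # copy the numpy arrays kept by the backend into the created param tensors
    for name, t in params.items():
        if name in np_weights:
            t.copy_from_numpy(np_weights[name].astype(np.float32))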

@dcslin
Member

dcslin commented May 16, 2020

Quoted from @joddiy: m=MyModel(path) # check the file is a model or just states. In this case, the user knows the MyModel class, so it is just loading states, right?

From the perspective of a new onnx user (please let me know if this part is not correct):
singa_rep: a model, like singa.Module
singa.save()/singa.load(): save/load the model w/o states
singa.Module.save_states()/load_states(): save/load the model states only

Use case 1: load model from an onnx file

class MySONNXModel(SONNXModel):
    pass # so we know the structure of model already?

# load from onnx model
onnx_model=onnx.load('./saved_models/onnx_model_downloaded')
m1=MySONNXModel(onnx_model) # so we know the structure of model already?
m1.compile([placeholder_x], ...)
for _ in data:
    m1.train_one_batch(_)

Use case 2: save states and model

# save
m1.save_states('./saved_models/my_checkpoint_1')
singa.save('./saved_models/my_model_1', m1)

Use case 3: load model and states from disk

# Later reuse the model
m2=singa.load('./saved_models/my_model_1')
m2.load_states('./saved_models/my_checkpoint_1')
m2.compile([placeholder_x], use_graph=True)

Use case 4: load states only

# singa model is known
class MyModel(Module):
    pass

m3=MyModel(states_path='./saved_models/my_checkpoint_1') # could only be states, right?
# m3=MyModel('./saved_models/my_model_1') # could not be saved_model right? since we know the model
m3.compile(...)

To be frank, I am a bit overwhelmed by all the discussions, not just in this issue. Is it possible to consolidate the new API into a specification, including examples, in singa-doc? That would be useful for new users. BTW, is the API in onnx-doc going to change?

@nudles
Member Author

nudles commented May 16, 2020

Here is the latest summary: https://gist.github.com/nudles/d7f8043f251872333ec06f2701696cce

APIs in onnx-doc should be backward-compatible.

@chrishkchris
Contributor

Save and load functions are now available in the 3.1 Model class.
