Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net design with NetBuilder #2598

Merged
merged 32 commits into from
Jul 4, 2017
Merged

Conversation

Superjomn
Copy link
Contributor

A network design with both protobuf and AddOp support.

# Network Design

`Network` is the container and controller of a set of operators,
users can build a real network from a `NetDef` in protobuf message
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should give a simple NetDef proto message

// will run.
virtual bool Run(Scope *scope, OpIndex begin = -1,
OpIndex end = -1) const = 0;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussion in Operator Design, Run method in Operator will have a context parameter. So, I think the Run method in NetBase may also need a context parameter. Or, we can pass the context to operator run inside this method.

Copy link
Member

@QiJune QiJune Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, the NetBase Run method is what an engine should do. The engine will get variables from scope, and get runtime resources from context, and then execute.
The runtime resources may include cuda stream, cublasHandle, cudnnHandle.

Copy link
Contributor Author

@Superjomn Superjomn Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add OpContext as NetBase.Run's argument. @QiJune
A NaiveExecutor may be necessary for the future implementation.

```c++
// create an empty scope located on CPU device.
Scope scope(CPUPlace());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we also need to create a context related with a device.

Scope scope(CPUPlace());
Context context(CPUPlace());
...
net->Run(&Scope, &context);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the offline discussion, we decide to create a context inside the net run method. Then, the context will pass to every operator's run method.

// scope will be used instead.
virtual bool Run(Scope *scope = nullptr, OpIndex begin = -1,
OpIndex end = -1) const override;

Copy link
Member

@QiJune QiJune Jun 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A possible implementation coule be:

virtual bool Run(Scope* scope = nullptr, OpIndex begin = -1, OpIndex end = -1) const override {
  Context* c = CreateContext(CPUPlace());
  for (auto op in ops_) {
    op->Run(c);
  }
}

# Network Design

`Network` is the container and controller of a set of operators,
users can build a real network from a `NetDef` which is a protobuf message
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, users can --> . Users can

// The minimum a network should be implemented.
class NetBase {
public:
// run all the operators and return success(true) or not, all the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. run --> Run
  2. or not --> or not (false)
  3. ", all the variables are located" --> ", with all the variables located" . 不要用逗号连接两个完整的句子,除非有合适的连词。

// variables are located in `scope`. `ctx` describes the detail execution
// environment for ops. `begin` and `end` specify the scope of `ops_` to run,
// If no positive indexes are provided, all operators in `ops_` will run.
virtual bool Run(Scope *scope, OpContext *ctx, OpIndex begin = -1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the word "context" is not too long, why not use context instead of ctx for a better readability?


// Infer the shapes of variables required by operators in the network. The
// `scope` will be mutated according to the inferred shapes.
virtual bool InferShape(Scope *scope) = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name "InferShape" is a little ambiguous, since it does not return a shape in fact.
"CheckVariableShape" or "ValidateVariableShape" or "CheckShape" or "ValidateShape" ?

Copy link
Contributor Author

@Superjomn Superjomn Jun 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can remove this function? Each operator has its own InferShape', just call op.InferShapeinBaseNet.AddOp. Make the AddOpthe only method to add an operator to a network, so thatop.InferShapewill be called every time automatically, no need to call BaseNet.InferShape` manually.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InferShape may invoke in each mini-batch run. Because the input shape will be changed in each mini-batch.

```

Network is designed as the container of operators, to make it more extendable,
we decoupling it from the related variable resources.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. “, to make it" --> ". To make it"
  2. "we decoupling it" --> "we decouple it"


`Run(Scope* scope)` takes the scope as a argument so that it can run in different scopes.

Finally, `NetBase` can be used as followed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as followed --> as follows


## `PlainNet` as a simple implementation of `BaseNet`

A very basic implementation is as followed, all it does is simply to run every operators in sequence.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"as followed, all it" --> "as follows. All it"

// Add a operator which is identified as `type` and has attributes described
// in `attrs`, the `inputs` are the keys of readonly input variables,
// `outputs` are keys of mutable output variables. An `OpIndex` will be
// returned which indicates the offset of the new operator in `ops_`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which indicates --> to indicate


## Compatibility with RNN

Benefit from the decoupling of `PlainNet.Run` and `Scope`, `PlainNet` is compatible with future RNN design,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Benefit --> Benefitting

@@ -0,0 +1,251 @@
# Network Design

`Network` is the container and controller of a set of operators,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controller => execution engine

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the framework have an independent Executor?
or merge the Executor into `Network'? @wangkuiyi

user can build a real network from a `NetDef` which is a protobuf message
and use `Network.Run()` to run all the operators in the network.

The `Network` will
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A network object knows all Operators belonging to this network. Variables, which are inputs and outputs of these operators, are created and managed by a hierarchy of Scope objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

typedef int OpIndex;

// The minimum a network should be implemented.
class NetBase {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NetBase ==> Net, because our other base classes, including Operator, don't have the Base suffix. Let's unify the style.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

virtual OpIndex AddOp(const proto::OpDef &def) = 0;

// Add optimizer operators acctording to `attrs`.
virtual void AddOptimizerOps(const OptAttrs &attrs) = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is an "optimizer op"? What's the difference between "optimzier op" and "op" as in AddOp?

Copy link
Contributor Author

@Superjomn Superjomn Jun 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AddOp add normal forward operators and is called by user or network's constructor.
OptimizerOp is the optimizer (adam for example) on model parameter, network will scan all the forward operators and get their parameters, then add optimizer operators for each parameter.

User can access the AddOp function, but AddOptimizerOps should be called by the network after all the forward operators are added.

// variables are located in `scope`. `context` describes the detail execution
// environment for ops. `begin` and `end` specify the scope of `ops_` to run,
// If no positive indexes are provided, all operators in `ops_` will run.
virtual bool Run(Scope *scope, OpContext *context, OpIndex begin = -1,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose that the current Net represent an RNN's step-net, what scope should the parameter scope points to? -- the scope of the current step-net, or its parent?

Copy link
Contributor Author

@Superjomn Superjomn Jun 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Scope's design, there is a member std::shared_ptr<Scope> parent_ {nullptr}; ,
which is a reference to the pararent's scope.

So this scope means the current stepnet's scope, network can get its pararent's scope by access its parent_.

reference Scope Design

class PlainNet final : public NetBase {
public:
// Create a network describe by `def`. NetDef is the definition of a network.
PlainNet(const NetDef &def);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NetDef ==> NetDesc

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


protected:
// Create operators accordding to `def`, will be called by the constructor.
bool BuildNet(const NetDef &def);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why BuildNet returns a bool value? How would the constructor treat it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a Status or Error return type here.


```c++
// copy some `vars` form `source` to `target`
void Copy(const Scope &source, Scope &target,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how should we define this function. Can you please complete it so to make this example comprehensive?

Copy link
Contributor Author

@Superjomn Superjomn Jun 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just several lines of pseudo code here, maybe I should remove this RNN demo.

for (int i = 0; i < num_states + 1; i++) {
// Initialize all rnn state scopes, copy parameters and so on.
rnn_states[i].CreateVars(rnn_step_net_def);
Copy(default_scope, rnn_states[i], rnn_related_vars);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How comes this rnn_related_vars?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a pseudo code, this means RNN related variables' string keys, rnn_related_vars should be set during RNN step net's construction.

## Compatibility with RNN

Benefitting from the decoupling of `PlainNet.Run` and `Scope`, `PlainNet` is compatible with future RNN design,
for example we can implement a simple recurrent neural network as follows
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the following code snippet supposed to be? In the constructor of RNNOp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a demo code snippet, to make sure the net is compatible with Scope in RNN's implementation.

Copy link
Collaborator

@wangkuiyi wangkuiyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approved this PR, because it provides a general skeleton of Net. We can have futher discussions and polish in coming PRs.

@Superjom : I'd suggest that we verify the design with some "real" examples, other than pseudo-code snippets, in following PRs.

@Superjomn Superjomn merged commit 06156da into PaddlePaddle:develop Jul 4, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants