-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net design with NetBuilder #2598
Conversation
paddle/framework/net_design.md
Outdated
# Network Design | ||
|
||
`Network` is the container and controller of a set of operators, | ||
users can build a real network from a `NetDef` in protobuf message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should give a simple NetDef proto message
paddle/framework/net_design.md
Outdated
// will run. | ||
virtual bool Run(Scope *scope, OpIndex begin = -1, | ||
OpIndex end = -1) const = 0; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussion in Operator Design, Run method in Operator will have a context parameter. So, I think the Run method in NetBase may also need a context parameter. Or, we can pass the context to operator run inside this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, the NetBase Run method is what an engine should do. The engine will get variables from scope, and get runtime resources from context, and then execute.
The runtime resources may include cuda stream, cublasHandle, cudnnHandle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will add OpContext
as NetBase.Run
's argument. @QiJune
A NaiveExecutor
may be necessary for the future implementation.
```c++ | ||
// create an empty scope located on CPU device. | ||
Scope scope(CPUPlace()); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we also need to create a context related with a device.
Scope scope(CPUPlace());
Context context(CPUPlace());
...
net->Run(&Scope, &context);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the offline discussion, we decide to create a context inside the net run method. Then, the context will pass to every operator's run method.
paddle/framework/net_design.md
Outdated
// scope will be used instead. | ||
virtual bool Run(Scope *scope = nullptr, OpIndex begin = -1, | ||
OpIndex end = -1) const override; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A possible implementation coule be:
virtual bool Run(Scope* scope = nullptr, OpIndex begin = -1, OpIndex end = -1) const override {
Context* c = CreateContext(CPUPlace());
for (auto op in ops_) {
op->Run(c);
}
}
paddle/framework/net_design.md
Outdated
# Network Design | ||
|
||
`Network` is the container and controller of a set of operators, | ||
users can build a real network from a `NetDef` which is a protobuf message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, users can
--> . Users can
paddle/framework/net_design.md
Outdated
// The minimum a network should be implemented. | ||
class NetBase { | ||
public: | ||
// run all the operators and return success(true) or not, all the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- run --> Run
- or not --> or not (false)
- ", all the variables are located" --> ", with all the variables located" . 不要用逗号连接两个完整的句子,除非有合适的连词。
paddle/framework/net_design.md
Outdated
// variables are located in `scope`. `ctx` describes the detail execution | ||
// environment for ops. `begin` and `end` specify the scope of `ops_` to run, | ||
// If no positive indexes are provided, all operators in `ops_` will run. | ||
virtual bool Run(Scope *scope, OpContext *ctx, OpIndex begin = -1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the word "context" is not too long, why not use context
instead of ctx
for a better readability?
paddle/framework/net_design.md
Outdated
|
||
// Infer the shapes of variables required by operators in the network. The | ||
// `scope` will be mutated according to the inferred shapes. | ||
virtual bool InferShape(Scope *scope) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name "InferShape" is a little ambiguous, since it does not return a shape in fact.
"CheckVariableShape" or "ValidateVariableShape" or "CheckShape" or "ValidateShape" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we can remove this function? Each operator has its own InferShape', just call
op.InferShapein
BaseNet.AddOp. Make the
AddOpthe only method to add an operator to a network, so that
op.InferShapewill be called every time automatically, no need to call
BaseNet.InferShape` manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InferShape may invoke in each mini-batch run. Because the input shape will be changed in each mini-batch.
paddle/framework/net_design.md
Outdated
``` | ||
|
||
Network is designed as the container of operators, to make it more extendable, | ||
we decoupling it from the related variable resources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- “, to make it" --> ". To make it"
- "we decoupling it" --> "we decouple it"
paddle/framework/net_design.md
Outdated
|
||
`Run(Scope* scope)` takes the scope as a argument so that it can run in different scopes. | ||
|
||
Finally, `NetBase` can be used as followed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as followed --> as follows
paddle/framework/net_design.md
Outdated
|
||
## `PlainNet` as a simple implementation of `BaseNet` | ||
|
||
A very basic implementation is as followed, all it does is simply to run every operators in sequence. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"as followed, all it" --> "as follows. All it"
paddle/framework/net_design.md
Outdated
// Add a operator which is identified as `type` and has attributes described | ||
// in `attrs`, the `inputs` are the keys of readonly input variables, | ||
// `outputs` are keys of mutable output variables. An `OpIndex` will be | ||
// returned which indicates the offset of the new operator in `ops_`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which indicates --> to indicate
paddle/framework/net_design.md
Outdated
|
||
## Compatibility with RNN | ||
|
||
Benefit from the decoupling of `PlainNet.Run` and `Scope`, `PlainNet` is compatible with future RNN design, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benefit --> Benefitting
@@ -0,0 +1,251 @@ | |||
# Network Design | |||
|
|||
`Network` is the container and controller of a set of operators, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
controller => execution engine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the framework have an independent Executor
?
or merge the Executor
into `Network'? @wangkuiyi
paddle/framework/net_design.md
Outdated
user can build a real network from a `NetDef` which is a protobuf message | ||
and use `Network.Run()` to run all the operators in the network. | ||
|
||
The `Network` will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A network object knows all Operators belonging to this network. Variables, which are inputs and outputs of these operators, are created and managed by a hierarchy of Scope objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
paddle/framework/net_design.md
Outdated
typedef int OpIndex; | ||
|
||
// The minimum a network should be implemented. | ||
class NetBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NetBase ==> Net, because our other base classes, including Operator
, don't have the Base
suffix. Let's unify the style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
paddle/framework/net_design.md
Outdated
virtual OpIndex AddOp(const proto::OpDef &def) = 0; | ||
|
||
// Add optimizer operators acctording to `attrs`. | ||
virtual void AddOptimizerOps(const OptAttrs &attrs) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is an "optimizer op"? What's the difference between "optimzier op" and "op" as in AddOp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AddOp
add normal forward operators and is called by user or network's constructor.
OptimizerOp
is the optimizer (adam for example) on model parameter, network will scan all the forward operators and get their parameters, then add optimizer operators for each parameter.
User can access the AddOp
function, but AddOptimizerOps
should be called by the network after all the forward operators are added.
paddle/framework/net_design.md
Outdated
// variables are located in `scope`. `context` describes the detail execution | ||
// environment for ops. `begin` and `end` specify the scope of `ops_` to run, | ||
// If no positive indexes are provided, all operators in `ops_` will run. | ||
virtual bool Run(Scope *scope, OpContext *context, OpIndex begin = -1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose that the current Net
represent an RNN's step-net, what scope should the parameter scope points to? -- the scope of the current step-net, or its parent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Scope
's design, there is a member std::shared_ptr<Scope> parent_ {nullptr};
,
which is a reference to the pararent's scope.
So this scope means the current stepnet's scope, network can get its pararent's scope by access its parent_
.
reference Scope Design
paddle/framework/net_design.md
Outdated
class PlainNet final : public NetBase { | ||
public: | ||
// Create a network describe by `def`. NetDef is the definition of a network. | ||
PlainNet(const NetDef &def); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NetDef ==> NetDesc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
paddle/framework/net_design.md
Outdated
|
||
protected: | ||
// Create operators accordding to `def`, will be called by the constructor. | ||
bool BuildNet(const NetDef &def); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why BuildNet returns a bool value? How would the constructor treat it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a Status
or Error
return type here.
|
||
```c++ | ||
// copy some `vars` form `source` to `target` | ||
void Copy(const Scope &source, Scope &target, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how should we define this function. Can you please complete it so to make this example comprehensive?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just several lines of pseudo code here, maybe I should remove this RNN demo.
for (int i = 0; i < num_states + 1; i++) { | ||
// Initialize all rnn state scopes, copy parameters and so on. | ||
rnn_states[i].CreateVars(rnn_step_net_def); | ||
Copy(default_scope, rnn_states[i], rnn_related_vars); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How comes this rnn_related_vars
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a pseudo code, this means RNN related variables' string keys, rnn_related_vars
should be set during RNN step net's construction.
## Compatibility with RNN | ||
|
||
Benefitting from the decoupling of `PlainNet.Run` and `Scope`, `PlainNet` is compatible with future RNN design, | ||
for example we can implement a simple recurrent neural network as follows |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the following code snippet supposed to be? In the constructor of RNNOp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a demo code snippet, to make sure the net is compatible with Scope
in RNN's implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approved this PR, because it provides a general skeleton of Net. We can have futher discussions and polish in coming PRs.
@Superjom : I'd suggest that we verify the design with some "real" examples, other than pseudo-code snippets, in following PRs.
A network design with both protobuf and AddOp support.