python user interface design #3688
Conversation
doc/design/python/user_interface.md (outdated)
> ## Basic Concepts
> ### Variable
> A `Variable` represents shared, persistent state manipulated by a Paddle model program.
model program?
> to get the value of the variable, one can call
>
> ```python
> print v.val()
> ```
I think `v.eval()` is better, because `val()` means just getting the value, while `eval()` means calculating and then getting the value.
`v.val()` will only return a Python numpy value without executing a sub-graph. `v.eval()` may just be a wrapper of `pd.eval`; we will implement it later if necessary.
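The distinction can be sketched with a toy class (hypothetical names and logic, not the real Paddle API — a counter stands in for actual sub-graph execution):

```python
class Variable(object):
    """Toy sketch of the val()/eval() split discussed above."""

    def __init__(self, data):
        self._data = data  # value left behind by a previous run
        self.runs = 0      # counts (mock) sub-graph executions

    def val(self):
        # val(): return the cached value; no sub-graph is executed
        return self._data

    def eval(self):
        # eval(): execute the sub-graph first, then return the fresh value
        self.runs += 1     # stands in for running the sub-graph
        return self._data

v = Variable([1.0, 2.0])
a = v.val()   # no execution
b = v.eval()  # one (mock) execution
```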
ok
> One can freeze a variable by setting `trainable` to `False` like:
>
> ```python
> v = pd.Variable(shape=[20,20], trainable=False)
> ```
One can also change the state during running:

```python
v = pd.Variable(...)
v.trainable = False
```
All the arguments passed to `__init__` are members of `pd.Variable`, so you are free to change them like that.
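A minimal sketch of that idea (a hypothetical stand-in class, not Paddle's real `Variable`): every `__init__` argument is stored as a plain attribute, so freezing can happen either at construction or afterwards.

```python
class Variable(object):
    """Toy stand-in: __init__ arguments become ordinary, mutable members."""

    def __init__(self, shape=None, trainable=True):
        self.shape = shape
        self.trainable = trainable

v1 = Variable(shape=[20, 20], trainable=False)  # frozen at construction

v2 = Variable(shape=[20, 20])  # trainable by default...
v2.trainable = False           # ...and frozen later, as suggested above
```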
> Take `pd.fc` for example, one can use it like this
>
> ```python
> out = pd.fc(in, param_names=['W'])
> ```
Maybe:

```python
out = pd.fc(in, W="w", B="b")
```
> Each trainable variable has an initialize Op.
>
> #### Optimizer Ops
> These ops will help to optimize trainable variables after backward propagation finished,
backpropagation?
There are several defects in this design.

- It is not the users' responsibility to configure what the shape of a weight should be; that would be very noisy. So never set shapes for weights in the sample code.
- We must infer the shape of operator outputs line by line: every time we create an operator, we must know that operator's output shape immediately.
- We should give an EXACTLY SAME PADDLE.V2 API for the mnist demo in the current design.
- Where is device information stored?
- Using a `with` statement to switch the global block is a good idea, but `with` is a keyword used only by Python. Should we provide that API, or just use a callback function, since `function` is a common concept in every programming language?
- How can the user configure two neural networks independently? Because `with` modifies the same global instance, there is no way for the user to avoid using that instance. If I misunderstood this, please give sample code for the following situation: the user uses the mnist dataset and configures a convolutional network and a feed-forward network in the same program. He needs to know which network is better, so he runs the same mini-batch on both networks step by step.
- What is the difference between `is_trainable` and `is_param` for a `Variable`? I know a parameter is not necessarily trainable, but is there any reason to mark a `Variable` as a parameter that is not trainable? If the variable is not trainable, we just feed it in the first mini-batch; it is a `constant` data variable. Moreover, who uses `is_param`? I cannot figure out where `is_param` is or should be used.
- What is the difference between `Block` and the C++ `NetOp`? Could the Python `Block` class be implemented in C++? I do not see the necessity of implementing `Block` in Python; it could be simpler to implement in C++ because `__extract_op_from_block` would not be needed.
- Please use consistent names in this design. There are `cmd` and `op`; `namespace`, `guard`, and `block`. There are too many new concepts and global states in this design; we must keep our concepts and global state as few as we can. Are they all necessary to implement our Python API?
> ```python
> # same as
> v = pd.get_variable(name="v", shape=[20, 20])
> ```
`get_variable` from what scope?
`get_variable` looks up the current namespace. In the user's model config there needs to be only one global scope, but different namespaces (which add a prefix to a variable's name).

The difference between multiple scopes and namespaces is that scopes form a forest: sub-scopes that do not share an ancestor are totally separated. Namespaces do not suffer from this:

- all the variables are located in one global scope
- different namespaces have different name prefixes
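That flat-scope-plus-prefix idea can be sketched like this (all names here are hypothetical; the design's real `g_scope` and `Namespace` may differ):

```python
# One flat global scope; namespaces only add a key prefix.
g_scope = {}

class Namespace(object):
    stack = []  # single stack: the config is parsed by one thread

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        Namespace.stack.append(self.name)
        return self

    def __exit__(self, *exc):
        Namespace.stack.pop()

def get_variable(name, value=None):
    # prefix the name with the active namespaces, then look it up globally
    full_name = "/".join(Namespace.stack + [name])
    if full_name not in g_scope:
        g_scope[full_name] = value
    return g_scope[full_name]

with Namespace("model_a"):
    get_variable("W", value=1)
with Namespace("model_b"):
    get_variable("W", value=2)

# Unlike a scope forest, everything lives in one dict:
sorted(g_scope)  # ['model_a/W', 'model_b/W']
```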
> ```python
>     '''
>     namespace: str
>     '''
>     self.cmds = []
> ```
Maybe ops is better than cmds.
A block may contain blocks:

```
{ // block
    { // sub-block
    }
}
```
> ```python
>     ops = []
>     for cmd in cmds:
>         if type(cmd) is Block:
>             child_ops = self.__extract_op_from_block([cmd])
> ```
Why must we extract ops from a block?
Currently, when a block executes, it creates a `NetOp` and runs it, so operators from multiple blocks have to be extracted and inserted into one `NetOp`. This may be changed later to make it more natural.
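A minimal sketch of that extraction step (hypothetical structures; the design's real `Block`/`NetOp` code differs):

```python
class Block(object):
    """Toy block: an ordered list of ops and nested sub-blocks."""

    def __init__(self, cmds):
        self.cmds = cmds

def extract_ops(cmds):
    # Recursively flatten nested blocks into the single op list a NetOp needs.
    ops = []
    for cmd in cmds:
        if isinstance(cmd, Block):
            ops.extend(extract_ops(cmd.cmds))
        else:
            ops.append(cmd)
    return ops

net = ["op1", Block(["op2", Block(["op3"])]), "op4"]
extract_ops(net)  # ['op1', 'op2', 'op3', 'op4']
```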
> ```python
>     shape=[],
>     data=None,
>     initialzier=None,
>     scope=g_scope,
> ```
In Python, a complex instance cannot be a default value.
```python
In [1]: class A(object):
   ...:     def __init__(self):
   ...:         self.x = None
   ...:

In [2]: def f(x=A()):
   ...:     print x
   ...:

In [3]: f()
<__main__.A object at 0x10c7cc050>
```

This seems to work.
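It is legal Python, but note the default is evaluated once, at `def` time, so every call without an argument receives the same instance. A quick sketch of this behavior:

```python
class A(object):
    pass

def f(x=A()):      # A() runs once, when the def statement executes
    return x

# Calls without an argument all see the one shared default instance:
same = f() is f()          # True

# An explicit argument bypasses the shared default:
different = f(A()) is f()  # False
```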
> ```python
> G_solver = pd.AdamOptimizer().minimize(G_loss, var_list=theta_G)
>
> # init all parameters
> initializer = pd.variable_initialzier()
> ```
`variable_initializers`?
yes
> ```python
>     to reference Variable across different Blocks.
>     '''
>
>     stack = []
> ```
What if a user wants two global namespaces? `stack` is a data member at class scope, so all instances of `Namespace` share the same `stack`.
Yes, only one `Namespace.stack` is needed, because the user's config is parsed using only one thread.
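The single-stack behavior relies on `stack` being a class attribute, which every instance shares. A small sketch with a hypothetical `Namespace`:

```python
class Namespace(object):
    stack = []  # class attribute: one stack for the whole process

    def __init__(self, name):
        self.name = name

    def begin(self):
        Namespace.stack.append(self.name)

    @staticmethod
    def end():
        Namespace.stack.pop()

a = Namespace("outer")
b = Namespace("inner")
a.begin()
b.begin()
# Both instances pushed onto the very same list:
shared = a.stack is b.stack  # True
Namespace.end()
Namespace.end()
```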
> ```python
>     def __exit__(self):
>         Namespace.end()
>         block_guard.cur_block = block_guard.last_block
> ```
Here the current block is dropped in Python. Which object could store that block? Should that block be freed?
A sub-block will be inserted into its father block, so it will not be freed:

```
// father-block: [op1, op2, sub-block, op3, op4]
{ // father-block
    { // sub-block
    }
}
```
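The "insert into the father block on exit" behavior can be sketched with a toy guard (hypothetical names and structure; the design's real `block_guard` differs):

```python
class Block(object):
    def __init__(self):
        self.cmds = []

class block_guard(object):
    """Toy guard: entering creates a fresh current block; on exit the
    sub-block is appended to its father block, so it is never dropped."""

    cur_block = Block()  # stands in for the global "current block"

    def __enter__(self):
        self.last_block = block_guard.cur_block
        block_guard.cur_block = Block()
        return block_guard.cur_block

    def __exit__(self, *exc):
        sub = block_guard.cur_block
        block_guard.cur_block = self.last_block
        block_guard.cur_block.cmds.append(sub)  # father keeps a reference

root = block_guard.cur_block
root.cmds.append("op1")
with block_guard():
    block_guard.cur_block.cmds.append("op2")
root.cmds.append("op3")
# root.cmds is now ["op1", <sub-block>, "op3"]
```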
> ```python
>     counter = 0
>     __varset__ = set()
>
>     def __init__(self,
> ```
As we discussed on the phone, a name alone cannot locate which sub-graph of the neural network execution has reached, because in one neural network two Ops can output Variables with the same name.
Thinking about this afterwards: wouldn't using an Op as the endpoint hit the same problem? For example, there can be two Ops that output Variables with the same name, and an Op that takes such a Variable as input.
Here, each variable will have a unique name across all the sub-graphs, generated as `"var-%d" % Variable.counter` (with the counter incremented each time). A specific namespace will add a prefix to a Variable's name, which helps to support local variables.
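A minimal sketch of that naming scheme (hypothetical code; the real `Variable.counter` bookkeeping may differ):

```python
class Variable(object):
    counter = 0  # shared by all Variables, across every sub-graph

    def __init__(self, name=None, prefix=""):
        if name is None:
            # auto-generate a process-wide unique name
            name = "var-%d" % Variable.counter
            Variable.counter += 1
        self.name = prefix + name  # a namespace would supply the prefix

a = Variable()                     # name: 'var-0'
b = Variable()                     # name: 'var-1'
c = Variable(prefix="sub_model/")  # name: 'sub_model/var-2'
```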
> Take `pd.fc` for example, one can use it like this
>
> ```python
> out = pd.fc(in, param_names=['W'])
> ```
- Is the parameter `w` defined outside of the fc layer or inside the layer?
- If a layer has more than one parameter, how can we know the name correspondence?
Currently, it is maybe better for the layer to create its parameters; that is the big difference between an op and a layer.
doc/design/python/user_interface.md (outdated)
> ```python
> label = pd.Variable(shape=[None, 1])
>
> # network config
> W1 = pd.Variable('W1', shape=[128, 64])
> ```
name = 'W1' ?
@Superjom Also, hidden complexity is still complexity. Is it really necessary to introduce so many new concepts? Could this design be implemented with fewer concepts? For example, `namespace` seems to always be used together with `guard`; perhaps they could be merged. Of course, I do not fully grasp this design, so I may be nitpicking.
All the sub-models are free to use global variables as inputs; all the temp outputs of ops/layers will have unique names (all the variables will have different names); no two sub-models will write the same variable as output. A mnist demo with multiple sub-models:

```python
import paddle as pd

image = pd.Variable([None, 128])
label = pd.Variable([None, 1])

def FC_model():
    # with pd.namespace('fc_model'):
    # every Variable will get a unique name like "var-%d" % Variable.counter;
    # both a layer's parameters and temp outputs are stored in a global scope
    # with unique names, so it is OK to configure a submodel in a python
    # function, and any variable can be passed as an argument across all the
    # python scopes.
    fc_out = pd.fc(image)
    pred = pd.softmax(fc_out, size=10)
    return pred

def CNN_model():
    # with pd.namespace('cnn_model'):
    out = pd.conv_group(image, xxx)
    pred = pd.softmax(out, size=10)
    return pred

def data_reader(path):
    xxxx
    yield batch

def run_model(pred, batch):
    cost = pd.cross_entropy(pred, label)
    optimizer = pd.SGDOptimizer().minimize([cost])
    _, cost_v = pd.eval([optimizer, cost],
                        feeds={image: batch[0], label: batch[1]})
    return cost_v

data_provider = data_reader('./data.txt')
a_batch_of_data = data_provider.next()

print 'fc_cost', run_model(FC_model(), a_batch_of_data)
print 'cnn_cost', run_model(CNN_model(), a_batch_of_data)
```

About new concepts:
Merging `namespace` with its guard is a good idea. I merged `Block` with its guard but forgot to merge the namespace's; I will update the code later.
resolves: #3652