
python user interface design #3688

Closed

Conversation

@Superjomn (Contributor) commented Aug 25, 2017

Some concepts.

Resolves: #3652


## Basic Concepts
### Variable
A `Variable` represents shared, persistent state manipulated by a Paddle model program.
Member:

model program?

To get the value of the variable, one can call

```python
print v.val()
```
Member:

I think v.eval() is better, because val() means just getting the value, while eval() means calculating and then getting the value.

Contributor Author:

v.val() will only return a Python numpy value without executing a sub-graph. v.eval() may just be a wrapper of pd.eval; we will implement it later if necessary.
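A minimal sketch of the distinction being settled here; the internals below are assumptions for illustration, not the actual Paddle implementation:

```python
import numpy as np

class Variable(object):
    def __init__(self, shape):
        self._data = np.zeros(shape)

    def val(self):
        # no graph execution; just the currently stored numpy value
        return self._data

    def eval(self):
        # would be a thin wrapper over pd.eval, executing the sub-graph
        # that produces this variable before returning its value
        raise NotImplementedError("wrapper of pd.eval, to be added later")

v = Variable(shape=(20, 20))
print(v.val().shape)  # (20, 20)
```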

Member:

ok

One can freeze a variable by setting `trainable` to `False` like:

```python
v = pd.Variable(shape=[20,20], trainable=False)
```
Member:

One can also change the state during running:

    v = pd.Variable(...)
    v.trainable = False

Contributor Author (@Superjomn, Aug 25, 2017):

All the arguments passed to __init__ are members of pd.Variable, so they are free to change like that.
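A rough sketch of why that works, assuming a hypothetical Variable whose constructor arguments simply become attributes:

```python
class Variable(object):
    def __init__(self, shape=None, trainable=True):
        # constructor arguments are stored as plain attributes ...
        self.shape = shape
        self.trainable = trainable

v = Variable(shape=[20, 20], trainable=False)
# ... so they remain freely mutable after construction
v.trainable = True
print(v.trainable)  # True
```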


Take `pd.fc` for example, one can use it like this
```python
out = pd.fc(in, param_names=['W'])
```
Member:

Maybe:

    out = pd.fc(in, W="w", B="b")

Each trainable variable has an initialize Op.
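A minimal sketch of that statement; the registration mechanism below is an assumption for illustration, not the actual design:

```python
import numpy as np

g_init_ops = []  # initialize Ops collected as Variables are created

class Variable(object):
    def __init__(self, shape, trainable=True):
        self.shape = shape
        self.data = None
        if trainable:
            # each trainable variable contributes one initialize Op
            g_init_ops.append(self._fill)

    def _fill(self):
        self.data = np.random.uniform(-0.1, 0.1, self.shape)

v = Variable(shape=[20, 20])
for op in g_init_ops:  # roughly what pd.variable_initializer() would run
    op()
print(v.data.shape)  # (20, 20)
```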

#### Optimizer Ops
These ops will help to optimize trainable variables after backward propagation finished,
Member:

backpropagation?
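A minimal sketch of what these optimizer ops would do once backpropagation has produced gradients; the update rule and names below are assumptions, plain Python for illustration:

```python
# one SGD step, applied only to trainable variables
def sgd_update(values, grads, trainable, lr=0.01):
    return [v - lr * g if t else v
            for v, g, t in zip(values, grads, trainable)]

print(sgd_update([1.0, 2.0], [0.5, -0.5], [True, False]))
# [0.995, 2.0] -- the frozen variable is left untouched
```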

Collaborator @reyoung left a comment:

There are several defects in this design.

  1. It is not the user's responsibility to configure the shapes of weights; that would be very noisy.
    • So, never set shapes for weights in the sample code.
    • We must infer the shape of operator outputs line by line: every time we create an operator, we must know its output shape immediately (see the sketch after this list).
  2. We should provide an EXACTLY SAME Paddle.V2 API for the mnist demo in the current design.
  3. Where do we store device information?
  4. Using a with statement to switch the global block is a good idea, but:
    • with is a keyword that exists only in Python. Should we provide that API, or just use a callback function, since functions are a common concept in every programming language?
    • How can the user configure two neural networks independently? Because with modifies the same global instance, there is no way for the user to avoid that instance. If I misunderstood this, please give sample code for the following situation: the user takes the mnist dataset and configures a convolutional network and a feed-forward network in the same program. He needs to know which network is better, so he runs the same mini-batch on both networks step by step.
  5. What is the difference between is_trainable and is_param for Variable?
    • I know a parameter is not necessarily trainable. But is there any reason to mark a Variable as a parameter that is not trainable?
    • If the variable is not trainable, we just feed it in the first mini-batch; it is a constant data variable.
    • Moreover, who uses is_param? I cannot figure out where is_param is used or should be used.
  6. What is the difference between Block and the C++ NetOp? Could the Python Block class be implemented in C++? I do not see the necessity of implementing Block in Python; it could be simpler in C++, because __extract_op_from_block would not be needed.
  7. Please use consistent names in this design. There are cmd and op, namespace, guard and block. There are too many new concepts and global states in this design; we must keep our concepts and global state as few as we can. Are they all necessary to implement our Python API?
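A self-contained sketch of the "infer shapes line by line" requirement in item 1; Var and fc are hypothetical names, not the Paddle API:

```python
class Var(object):
    def __init__(self, shape):
        self.shape = shape

def fc(x, size):
    # the layer, not the user, decides the weight shape: it is derived
    # from the input's feature dimension at the moment the op is created
    w = Var(shape=[x.shape[1], size])
    # hence the output shape is also known immediately
    return Var(shape=[x.shape[0], size])

h = fc(Var(shape=[32, 128]), size=64)
print(h.shape)  # [32, 64]
```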


```python
# same as
v = pd.get_variable(name="v", shape=[20, 20])
```
Collaborator:

get_variable from what scope?

Contributor Author:

get_variable reads from the current namespace. In a user's model config, only one global scope is needed, together with different namespaces (which add a prefix to a variable's name).

The difference between multiple scopes and namespaces is that scopes form a forest: sub-scopes without a common ancestor are totally separated. Namespaces don't suffer from this:

  • all the variables are located in one global scope
  • different namespaces just have different name prefixes
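A self-contained sketch of that scheme, loosely modeled on the pd.get_variable and g_scope mentioned in this design; the details are illustrative, not the final API:

```python
g_scope = {}  # the single global scope

class Namespace(object):
    stack = []  # one shared stack; config is parsed by a single thread

    def __init__(self, name):
        self.name = name

    def __enter__(self):
        Namespace.stack.append(self.name)

    def __exit__(self, *args):
        Namespace.stack.pop()

def get_variable(name, value=None):
    # a namespace only prepends a prefix; everything lives in g_scope
    full_name = '/'.join(Namespace.stack + [name])
    return g_scope.setdefault(full_name, value)

with Namespace('fc_model'):
    get_variable('W', value=0)  # stored under 'fc_model/W'
print(g_scope)  # {'fc_model/W': 0}
```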

'''
namespace: str
'''
self.cmds = []
Collaborator:

Maybe ops is better than cmds.

Contributor Author:

A block may contain other blocks:

{ // block
   { // sub-block
   }
}

ops = []
for cmd in cmds:
    if type(cmd) is Block:
        child_ops = self.__extract_op_from_block([cmd])
Collaborator:

Why must we extract ops from blocks?

Contributor Author (@Superjomn, Aug 26, 2017):

Currently, when a block executes, it will create a NetOp and run it, so operators from multiple blocks should be extracted and inserted into a NetOp.

This may be changed later to make it more natural.

shape=[],
data=None,
initializer=None,
scope=g_scope,
Collaborator:

In Python, a complex instance cannot be used as a default value.

Contributor Author:

In [1]: class A(object):
   ...:     def __init__(self):
   ...:         self.x = None
   ...:         

In [2]: def f(x=A()):
   ...:     print x
   ...:     

In [3]: f()
<__main__.A object at 0x10c7cc050>

This seems to work.
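It does run, but the subtler issue behind the comment above is that a default value is evaluated once, at def time, and then shared by every call. A small illustration, with the usual None-sentinel workaround (default_scope here is a hypothetical stand-in for g_scope):

```python
def f(items=[]):          # the list is created once, when f is defined
    items.append(1)
    return items

print(f())  # [1]
print(f())  # [1, 1] -- the same list object is reused across calls

default_scope = {}

def g(scope=None):        # resolve the shared default at call time instead
    if scope is None:
        scope = default_scope
    return scope
```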


G_solver = pd.AdamOptimizer().minimize(G_loss, var_list=theta_G)

# init all parameters
initializer = pd.variable_initializer()
Collaborator:

variable_initializers?

Contributor Author:

yes

to reference Variable across different Blocks.
'''

stack = []
Collaborator:

What if a user wants two global namespaces? stack is a data member in class scope, so all instances of Namespace share the same stack.

Contributor Author:

Yes, only one Namespace.stack is needed, because the user's config is parsed using only one thread.


def __exit__(self):
    Namespace.end()
    block_guard.cur_block = block_guard.last_block
Collaborator:

Here the current block is dropped in Python. Which object could store that block? Should that block be freed?

Contributor Author:

The sub-block will be inserted into its father block, so it will not be freed:

// father-block: [op1, op2, sub-block, op3, op4]

{ // father-block
   { // sub-block
   }
}
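A minimal sketch of that ownership rule (a hypothetical Block class, not the real one):

```python
class Block(object):
    def __init__(self, parent=None):
        self.cmds = []
        self.parent = parent
        if parent is not None:
            # the sub-block is inserted into its father block,
            # so it stays reachable after its guard exits
            parent.cmds.append(self)

root = Block()
sub = Block(parent=root)
assert sub in root.cmds
```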

counter = 0
__varset__ = set()

def __init__(self,
Collaborator:

As we discussed on the phone, a name alone cannot locate which sub-graph of the neural network is being run, because in one neural network two Ops can output Variables with the same name.

Member:

I thought about this afterwards; it seems an Op may hit the same problem? For example, when two Ops can output Variables with the same name, which one does an Op that takes this Variable as input actually connect to?

Contributor Author:

Here, each variable will have a unique name across all the sub-graphs, generated like "var-%d" % Variable.counter (incrementing the counter each time).

A specific namespace will add a prefix to a Variable's name and help to support local variables.
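A small sketch of that naming scheme (illustrative, not the real implementation):

```python
class Variable(object):
    counter = 0

    def __init__(self, prefix=''):
        # a global counter makes every name unique across all sub-graphs;
        # a namespace merely adds a readable prefix
        self.name = '%svar-%d' % (prefix, Variable.counter)
        Variable.counter += 1

a = Variable()
b = Variable(prefix='fc_model/')
print(a.name)  # var-0
print(b.name)  # fc_model/var-1
```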


Take `pd.fc` for example, one can use it like this
```python
out = pd.fc(in, param_names=['W'])
```
Collaborator:

  1. Is parameter w defined outside of the fc layer, or inside the layer?
  2. If a layer has more than one parameter, how can we know the name correspondence?

Contributor Author:

Currently, it may be better for the layer to create the parameters; that is the big difference between an op and a layer.
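A rough sketch of that op-vs-layer split (names and shapes here are assumptions, not the real API): an op only consumes variables handed to it, while a layer also creates its parameters, so it can infer their shapes from its input.

```python
import numpy as np

def mul_op(x, w):
    # op: every input, including the parameter w, is created by the caller
    return np.dot(x, w)

def fc_layer(x, size):
    # layer: creates W itself, inferring its shape from the input's width
    w = np.zeros((x.shape[1], size))
    return mul_op(x, w)

out = fc_layer(np.ones((8, 128)), size=64)
print(out.shape)  # (8, 64)
```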

label = pd.Variable(shape=[None, 1])

# network config
W1 = pd.Variable('W1', shape=[128, 64])
Collaborator:

name = 'W1' ?

@Superjomn (Contributor Author) commented Aug 27, 2017

@reyoung

  1. I think a layer can infer its parameters' shapes easily, but an op only has inputs, and a parameter is treated as a normal input; it seems natural for an op to infer its outputs, but weird for it to infer its inputs.
  2. V2 is treated as a higher-level interface, which is to this design what Keras is to TF, while this design is the underlying API, similar to TF. It is right that the mnist demo should be exactly the same as V2's.
  3. Will add this later, TODO.
  4. In the implementation, two functions are implemented: __enter__ and __exit__. Another language that has something semantically similar to the with statement can support a similar syntax.
  5. They are exactly the same; just one thing: use is_trainable and delete is_param. By default, is_trainable is true; if the user or a Layer creates a Variable, it should be treated as a parameter, and all other temporary outputs of ops/layers are not trainable.
  6. They play the same role; we may remove RNNOp and add some functions to Block. I will add a design doc about Block.
  7. Only Op, Variable, and namespace are new; all the others will be hidden in the underlying implementation.

@reyoung (Collaborator) commented Aug 27, 2017

@Superjomn
I think the following comments have not been responded to.

How can the user configure two neural networks independently? Because with modifies the same global instance, there is no way for the user to avoid that instance. If I misunderstood this, please give sample code for the following situation: the user takes the mnist dataset and configures a convolutional network and a feed-forward network in the same program. He needs to know which network is better, so he runs the same mini-batch on both networks step by step.

We must infer the shape of operator outputs line by line: every time we create an operator, we must know its output shape immediately.

Also:

Only Op, Variable, and namespace are new; all the others will be hidden in the underlying implementation.

Hidden complexity is still complexity. Is it really necessary to introduce so many new concepts? Could this design be implemented with fewer concepts?

For example, namespace seems to be used together with guard; maybe they could be merged. Of course, I don't fully grasp this design, so I may be nitpicking.

@Superjomn (Contributor Author) commented Aug 27, 2017

@reyoung

All the sub-models are free to use global variables as inputs; all the temporary outputs of ops/layers will have unique names (every variable gets a different name), so no two sub-models will write the same variable as output.

An mnist with multiple sub-models works the same with a with pd.namespace instance: it just adds a prefix, and the temporary output variables stay unique across the whole model program too.

import paddle as pd

image = pd.Variable([None, 128])
label = pd.Variable([None, 1])

def FC_model():
    #with pd.namespace('fc_model'):
    # every Variable will get a unique name like "var-%d"%(Variable.counter++)
    # both layer's parameters and temp outputs are stored in a global scope with unique names
    # so it is OK to configure a submodel in a python function
    # and any variable can be passed as argument across all the python scopes.
    fc_out = pd.fc(image)
    pred = pd.softmax(fc_out, size=10)
    return pred

def CNN_model():
    #with pd.namespace('cnn_model'):
    out = pd.conv_group(image, xxx)
    pred = pd.softmax(out, size=10)
    return pred

def data_reader(path):
    xxxx
    yield batch

def run_model(pred, batch):
    cost = pd.cross_entropy(pred, label)
    optimizer = pd.SGDOptimizer().minimize([cost])
    _, cost_v = pd.eval([optimizer, cost], feeds={image:batch[0], label:batch[1]})
    return cost_v

data_provider = data_reader('./data.txt')
a_batch_of_data = data_provider.next()

print 'fc_cost', run_model(FC_model(), a_batch_of_data)
print 'cnn_cost', run_model(CNN_model(), a_batch_of_data)

About new concepts:

  • Only Block is a hidden new concept now, and NetOp will be replaced by Block, so there are not too many new concepts; in the user interface, pd.Block is hidden from users, which is why I removed pd.Block from the design doc.
  • Currently, Block seems a little complex and behaves differently between the compile and execution periods. I will add a new design doc about Block later, and we can have more discussion there.

@Superjomn (Contributor Author):

@reyoung Merging namespace with its guard is a good idea. I merged Block with its guard but forgot to merge the namespace's; I will update the code later.

Superjomn closed this Oct 24, 2017.