python user interface design #3688

`doc/design/python/user_interface.md` (362 additions):

# User Interface Design

## Basic Concepts
### Variable
A `Variable` represents shared, persistent state manipulated by a Paddle model program.

Variables are maintained by the `pd.Variable` class;
each `pd.Variable` represents a tensor whose value can be changed by running ops on it.

A basic way to create a variable is:

```python
import paddle as pd

v = pd.Variable(shape=[20, 20])
```

To make it more convenient to share a variable, each `pd.Variable` has a name.
One can use a name to get or create a `pd.Variable` by calling `pd.get_variable`, for example:

```python
# get the variable named "v", or create it if it does not exist yet
v = pd.get_variable(name="v", shape=[20, 20])
```

By default, Variables are model parameters and will be updated after the network's backpropagation.

One can freeze a variable by setting `trainable` to `False` like:

```python
v = pd.Variable(shape=[20,20], trainable=False)
```
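
Since the constructor arguments become plain members of `pd.Variable`, the flag could presumably also be flipped after construction; a minimal sketch of that assumption:

```python
import paddle as pd

v = pd.Variable(shape=[20, 20])  # trainable by default
# hypothetical: freeze the variable later, e.g. before fine-tuning
v.trainable = False
```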

Some initialization strategies may be applied to variables; for example, we may initialize a variable to zeros or to Gaussian random values.

```python
v = pd.Variable(shape=[20, 20], initializer=pd.zero_initializer())
z = pd.Variable(shape=[20, 20], initializer=pd.gaussian_random_initializer(mean=0., std=0.1))
```

To get the value of a variable, one can call:

```python
print v.val()
```

`v.val()` only returns the variable's current value (a numpy array); it does not execute a sub-graph.


### Block
Paddle uses a `Block` to represent and execute the user's program;
it is a basic concept when writing a Paddle program.

In computer programming, a block is a lexical structure of source code that is grouped together.
In most programming languages, blocks are used to define functions or conditional statements such as `if-else` and `while`.

Similarly, `pd.Block` in Paddle allows a group of operators to be treated as if it were a single operator, which makes the declaration of `if_else_op` or `RNNOp` simpler; Python's `with` statement is used to make the code read like a block.

For example, when defining a `RNNOp`, we can use `pd.Block` to help configure a step network:

```python
v = some_op()
m_boot = some_op()

W = pd.Variable(shape=[20, 20])
U = pd.Variable(shape=[20, 20])

rnn0 = RNNOp()
with rnn0.stepnet() as net:
    # declare stepnet's inputs
    x = net.add_input(v)
    # declare memories
    h = net.add_memory(m_boot)

    fc_out = pd.matmul(W, x)
    hidden_out = pd.matmul(U, h)
    sum = pd.add_two(fc_out, hidden_out)
    act = pd.sigmoid(sum)

    # declare stepnet's outputs
    net.add_output(act, hidden_out)

acts, hs = rnn0()
```

The operators inside the `with` statement define the RNN's step network,
and will be put into a `pd.Block`.

Another example is the definition of `if_else_op`:

```python
# v0 and v1 are outputs of some_op
v0 = some_op()
v1 = some_op()

ifelseop = pd.if_else_op()
with ifelseop.true_block() as net:
    x0, x1 = net.add_input(v0, v1)

    y = pd.fc(x0)
    z = pd.add_two(x1, y)

    net.add_output(z)

with ifelseop.false_block() as net:
    x0, x1 = net.add_input(v0, v1)

    y = pd.add_two(x0, x1)

    net.add_output(y)

# output of ifelseop
out = ifelseop()
```

In most cases, users need not create a `pd.Block` directly, but it is the basis of a Paddle program:

- a user's program is stored in `pd.Block`s
- to run the code, we just need to execute the corresponding `pd.Block`

A `pd.Block` can have its own namespace, which makes it possible to hide local variables from other blocks.

```python
W = pd.Variable(shape=[20, 20])

# a and b are outputs of some_op
a = some_op()
b = some_op()

with pd.Block('namespace0'):
    # W is a local variable and has its own value
    W = pd.Variable(shape=[20, 20])
    x = pd.matmul(W, a)
    y = x + b

with pd.Block('namespace1'):
    # W here is the global variable
    z = pd.matmul(W, a)

# g uses variables from both namespace0 and namespace1
g = pd.add_two(y, z)
```

### Op (short for Operator)
`Op` defines the basic operation unit of the optimized computation graph in Paddle; an `Op` has several input and output variables, and some attributes.

Take `pd.matmul` for example; one can use it like this:

```python
out = pd.matmul(a, b)
```
which means that the operator `pd.matmul` takes two variables `a` and `b` as input
and returns a variable `out`.

### Layer
`Layer` defines a more complex operation that may combine several `Op`s; its usage is the same as that of an `Op`.

Take `pd.fc` for example; one can use it like this:
```python
out = pd.fc(x, param_names=['W'])
```
which means that `pd.fc` takes a variable `x` as input and sets its `param_names` attribute to `['W']`,
which determines the names of its parameters.

Both `Op`s and `Layer`s are appended to the current `pd.Block` when they are created,
so a `pd.Block` holds a sequence of Ops/Layers;
when the `pd.Block` is executed, all the Ops/Layers in it are called in order.
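
A minimal sketch of this behaviour, assuming that `pd.eval` (used in the demos below) executes the block containing the requested outputs:

```python
import paddle as pd

a = pd.Variable(shape=[20, 20])
b = pd.Variable(shape=[20, 20])

# each call appends one Op to the current (here: global) pd.Block
x = pd.matmul(a, b)
y = pd.add_two(x, b)

# executing the block runs matmul first, then add_two, in creation order
y_val = pd.eval(y)
```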

### Special Ops
#### Initializer Ops
These ops initialize variables; for example, we may have:

- `pd.zero_initializer()`
- `pd.gaussian_random_initializer(mean, std)`

Each trainable variable has an initializer Op.
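
For example, a minimal sketch of running the initializer Ops, assuming `pd.variable_initializer` collects them as in the GAN demo below:

```python
import paddle as pd

v = pd.Variable(shape=[20, 20], initializer=pd.zero_initializer())
z = pd.Variable(shape=[20, 20], initializer=pd.gaussian_random_initializer(mean=0., std=0.1))

# collect the initializer Ops of all variables and run them once before training
init = pd.variable_initializer()
pd.eval(init)
```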

#### Optimizer Ops
These ops optimize trainable variables after backpropagation finishes;
each trainable variable will have an optimizer Op.
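
A minimal usage sketch, mirroring the MNIST demo below (the shapes and random feeds are only illustrative):

```python
import numpy as np
import paddle as pd

images = pd.Variable(shape=[None, 128])
labels = pd.Variable(shape=[None, 1])
W1 = pd.Variable(name='W1', shape=[128, 10])

prediction = pd.softmax(pd.matmul(images, W1), size=10)
cost = pd.cross_entropy(prediction, labels)

# minimize() attaches an optimizer Op to each trainable variable that `cost` depends on
optimizer = pd.SGDOptimizer().minimize(cost)

# evaluating the optimizer Op together with the cost performs one update step
_, cost_val = pd.eval([optimizer, cost],
                      feeds={images: np.random.rand(32, 128),
                             labels: np.random.randint(0, 10, size=[32, 1])})
```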

## Compatible with V2 Syntax

## Some Demos
### MNIST Task Demo

```python
import numpy as np
import paddle as pd

# the first dimension is None, which means the batch size is not yet known.
image = pd.Variable(shape=[None, 128])
label = pd.Variable(shape=[None, 1])

# network config
W1 = pd.Variable(name='W1', shape=[128, 64])
fc_out = pd.matmul(image, W1)
prediction = pd.softmax(fc_out, size=10)

cost = pd.cross_entropy(prediction, label)

optimizer = pd.SGDOptimizer().minimize(cost)


# training details
def data_provider(path):
    images = []
    labels = []
    with open(path) as f:
        for no, line in enumerate(f):
            fs = line.split('\t')
            assert len(fs) == 2
            image_record = map(int, fs[0].split())
            label_record = [int(fs[1])]
            images.append(image_record)
            labels.append(label_record)
            if no > 0 and no % 100 == 0:
                yield np.array(images), np.array(labels)
                images = []
                labels = []


for pass_no in range(100):
    for batch_no, batch in enumerate(data_provider('./data.txt')):
        # train mode
        _, cost_ = pd.eval(
            [optimizer, cost], feeds={image: batch[0],
                                      label: batch[1]})
        print '%dth pass train cost: %f' % (pass_no, cost_)
        # test mode
        if batch_no > 0 and batch_no % 10 == 0:
            cost_ = pd.eval(cost)
            print '%dth pass test cost: %f' % (pass_no, cost_)
```

### GAN Task Demo

```python
import logging

import numpy as np
import paddle as pd

# a standard-library logger for the progress messages below
logger = logging.getLogger(__name__)

# real input images; the batch size is not known yet (the discriminator expects 784-d inputs)
X = pd.Variable(shape=[None, 784])

# Discriminator Net
# (its parameters are created inside the discriminator block below)

# Generator Net
Z = pd.data(pd.float_vector(100))

# the generator's parameters, created inside its block below
theta_G = [G_W1, G_W2, G_b1, G_b2]


def sample_Z(m, n):
    return np.random.uniform(-1., 1., size=[m, n])


def discriminator(x):
    # use a block with a namespace to hide local variables
    with pd.Block('discriminator') as block:
        # declare model parameters
        W1 = pd.get_variable(
            'W1',
            shape=[784, 128],
            initializer=pd.gaussian_random_initializer(std=0.1),
            reuse=True)
        b1 = pd.get_variable(
            'b1', data=np.zeros(128),
            reuse=True)  # a variable also supports initialization from numpy data
        W2 = pd.get_variable('W2', data=np.random.rand(128, 1),
                             reuse=True)
        b2 = pd.get_variable('b2', data=np.zeros(1),  # bias matches W2's output size
                             reuse=True)

        # network config
        h1 = pd.relu(pd.matmul(x, W1) + b1)
        fake = pd.matmul(h1, W2) + b2
        prob = pd.sigmoid(fake)
    return prob, fake


# the discriminator's parameters, created inside its block above
theta_D = [D_W1, D_b1, D_W2, D_b2]


def generator(z):
    with pd.Block('generator') as block:
        # declare model parameters
        W1 = pd.get_variable(
            'W1',
            shape=[100, 128],  # z is a 100-d noise vector
            initializer=pd.gaussian_random_initializer())
        b1 = pd.get_variable(
            'b1', data=np.zeros(128)
        )  # a variable also supports initialization from numpy data
        W2 = pd.get_variable('W2', data=np.random.rand(128, 784))  # outputs 784-d fake images
        b2 = pd.get_variable('b2', data=np.zeros(784))

        # network config
        h1 = pd.relu(pd.matmul(z, W1) + b1)
        log_prob = pd.matmul(h1, W2) + b2
        prob = pd.sigmoid(log_prob)
    return prob


# a mini-batch of 1. as probability 100%
ones_label = pd.Variable(shape=[None, 1])
# a mini-batch of 0. as probability 0%
zeros_label = pd.Variable(shape=[None, 1])

# model config
G_sample = generator(Z)
D_real_prob, D_real_image = discriminator(X)
D_fake_prob, D_fake_image = discriminator(G_sample)

D_loss_real = pd.reduce_mean(
    pd.cross_entropy(data=D_real_prob, label=ones_label))
D_loss_fake = pd.reduce_mean(
    pd.cross_entropy(data=D_fake_prob, label=zeros_label))
D_loss = D_loss_real + D_loss_fake

G_loss = pd.reduce_mean(pd.cross_entropy(data=D_fake_prob, label=ones_label))

D_solver = pd.AdamOptimizer().minimize(D_loss, var_list=theta_D)
G_solver = pd.AdamOptimizer().minimize(G_loss, var_list=theta_G)

# init all parameters
initializer = pd.variable_initializer()
# also ok: initializer = pd.variable_initializer(vars=theta_D + theta_G)
pd.eval(initializer)


def data_provider(path):
    # ... read batches from `path`
    yield batch


for i in range(10000):
    for batch_no, batch in enumerate(data_provider('train_data.txt')):
        # train the Discriminator first
        _, D_loss_cur = pd.eval(
            [D_solver, D_loss],
            feeds={
                X: batch,
                Z: sample_Z(batch.size, 100),
                ones_label: np.ones([batch.size, 1]),
                zeros_label: np.zeros([batch.size, 1])
            })
        # get the Generator's fake samples
        samples = pd.eval(G_sample, feeds={Z: sample_Z(16, 100)})

        # train the Generator afterwards
        _, G_loss_cur = pd.eval(
            [G_solver, G_loss],
            feeds={
                Z: sample_Z(batch.size, 100),
                ones_label: np.ones([batch.size, 1]),
                zeros_label: np.zeros([batch.size, 1])
            })

        if batch_no % 100 == 0:
            logger.info("batch %d, D loss: %f" % (batch_no, D_loss_cur))
            logger.info("batch %d, G loss: %f" % (batch_no, G_loss_cur))
```

`python/paddle/python_wrapper_demo/__init__.py` (4 additions):

```python
from block import *
from variable import *
from op import *
from layer import *
```