Optimizer Design #4656

Merged 5 commits on Oct 11, 2017. Showing changes from 4 commits.
doc/design/optimizer.md (new file, 98 additions, 0 deletions)
## Optimizer Design

### The Problem

A PaddlePaddle program, or a block, is a sequence of operators operating on variables. A training program needs to do three kinds of work:

1. the forward pass, which computes intermediate results and the cost(s),
1. the backward pass, which derives gradients from the intermediate results and costs, and
1. the optimization pass, which updates model parameters to optimize the cost(s).

These passes rely on three kinds of operators:

1. forward operators,
1. gradient operators, and
1. optimization operators.

Users should be able to create all these operators manually by calling some low-level API, but it would be much more convenient if they could describe only the forward pass and let PaddlePaddle create the backward and optimization operators automatically.

In this design, we propose a high-level API that automatically derives the optimization pass and its operators from the forward pass.
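
To make the three kinds of operators concrete, the sketch below lists the operators that the three passes might add to a block for the tiny network used later in this document; the operator names are illustrative assumptions, not the actual PaddlePaddle operator registry.

```python
# Illustrative only: hypothetical operator names for hidden = fc(images); cost = mse(hidden, labels).
forward_ops = ["mul", "elementwise_add", "mse"]                  # compute hidden and cost
gradient_ops = ["mse_grad", "elementwise_add_grad", "mul_grad"]  # compute d(cost)/d(w1), d(cost)/d(b1)
optimization_ops = ["adagrad", "adagrad"]                        # update w1 and b1 with their gradients
```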


### High-level Python API to describe the training process

1. Users write code to describe the network:

```python
images = layer.data("images")
labels = layer.data("labels")
w1 = pd.var("w1")
b1 = pd.var("b1")
hidden = layer.fc(images, w=w1, b=b1)
cost = layer.mse(hidden, labels)
```

> **Contributor:** the pseudo code here does not format well.

The above code snippet will create forward operators in [Block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md).


2. Users create a certain kind of Optimizer with some arguments:

```python
optimizer = AdagradOptimizer(learning_rate=0.001)
```

3. Users use the optimizer to `minimize` a certain `cost` by updating the parameters in `parameter_list`:

```python
opt_op_list = optimizer.minimize(cost, parameter_list=[w1, b1])
```
The above code snippet will create gradient and optimization operators in the Block. The return value of `minimize()` is a list of optimization operators that will be run by the session.

4. Users use the Session/Executor to run `opt_op_list` as the target to do training (an end-to-end sketch combining the four steps follows this list):

```python
sess.run(target=opt_op_list, ...)
```
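
Putting the four steps together, a minimal end-to-end sketch might look like the following; it only rearranges the pseudo code above, and the `Session` construction and `train_reader`/`feed` details are assumptions rather than a finalized API.

```python
# Pseudo code following steps 1-4 above; names are illustrative.
images = layer.data("images")
labels = layer.data("labels")
w1 = pd.var("w1")
b1 = pd.var("b1")
hidden = layer.fc(images, w=w1, b=b1)
cost = layer.mse(hidden, labels)

optimizer = AdagradOptimizer(learning_rate=0.001)
opt_op_list = optimizer.minimize(cost, parameter_list=[w1, b1])

sess = Session()
for batch_images, batch_labels in train_reader():   # hypothetical data reader
    sess.run(target=opt_op_list,
             feed={"images": batch_images, "labels": batch_labels})
```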

#### Optimizer Python interface:

```python
class Optimizer(object):
    def create_backward_pass(loss, parameter_list=None):
        """
        Add gradient operators into the Block to compute gradients of `loss`
        for the parameters in parameter_list.

        Args:
          loss: a variable generated by the cost function.
          parameter_list: the parameters whose gradients are computed and
            which are updated to minimize the loss.

        Returns:
          a list of (parameter, gradient) pairs.
        """
        return vars_grads

    def create_optimization_pass(vars_grads):
        """Add operators to apply gradients to variables.

        Args:
          vars_grads: a list of (variable, gradient) pairs to update.

        Returns:
          optimization_op_list: a list of optimization operators that will
            optimize the parameters with the gradients.
        """
        ...
        return optimization_op_list

    def minimize(loss, parameter_list):
        """Add operations to minimize `loss` by updating `parameter_list`.

        This method simply combines calls to `create_backward_pass()` and
        `create_optimization_pass()`.
        """
        vars_grads = create_backward_pass(loss, parameter_list)
        update_ops = create_optimization_pass(vars_grads)
        return update_ops
```

> **Contributor:** Parameter and variable look interchangeable in the Python API; not sure they refer to the same concept.
>
> **Contributor:** typo: `create_backward_pass(loss)` => `create_backward_pass(loss, parameter_list)`. **Member Author:** done.
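
For users who want finer-grained control, the two passes can also be invoked separately; the short usage sketch below only recombines the interface defined above and is not a finalized API.

```python
optimizer = AdagradOptimizer(learning_rate=0.001)

# Equivalent to optimizer.minimize(cost, parameter_list=[w1, b1]),
# but exposes the (parameter, gradient) pairs in between.
vars_grads = optimizer.create_backward_pass(cost, parameter_list=[w1, b1])
opt_op_list = optimizer.create_optimization_pass(vars_grads)
```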

Users can inherit from the Optimizer class above to create their own optimizers with special logic, such as AdagradOptimizer.
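
As an illustration of this inheritance pattern, the sketch below overrides `create_optimization_pass()` with a plain SGD update; the operator is represented as a plain dict because the Block-level operator API is outside the scope of this sketch, so none of these details are a finalized API.

```python
class SGDOptimizer(Optimizer):
    """Sketch of a concrete optimizer built on the interface above."""

    def __init__(self, learning_rate=0.01):
        self.learning_rate = learning_rate

    def create_optimization_pass(self, vars_grads):
        # For every (parameter, gradient) pair, emit one update operator:
        # parameter -= learning_rate * gradient.
        optimization_op_list = []
        for param, grad in vars_grads:
            # A real implementation would append an operator to the Block;
            # here the operator is described as a dict for illustration.
            op = {"type": "sgd",
                  "inputs": {"Param": param, "Grad": grad},
                  "outputs": {"ParamOut": param},
                  "attrs": {"learning_rate": self.learning_rate}}
            optimization_op_list.append(op)
        return optimization_op_list
```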

doc/design/python_api.md (4 additions, 0 deletions)

The following section is added after the `fc_layer` example:

## Optimizer

[Optimizer Design Doc](./optimizer.md)