
Optimizer Design #4656

Merged · 5 commits · Oct 11, 2017 · Changes from 2 commits
85 changes: 85 additions & 0 deletions — doc/design/optimizer.md
## Optimizer Design
In a deep learning system, an `Optimizer` is used to optimize (minimize) the loss by updating a list of parameters.
**Contributor:** thow is a typo?

**Member Author:** fixed

**Collaborator:** This design doc doesn't explain the challenge.

It looks to me that the challenge is:

**The Problem**

A PaddlePaddle program, or a block, is a sequence of operators operating on variables. A training program needs to do three kinds of work:

  1. the forward pass, which computes intermediate results and the cost(s),
  2. the backward pass, which derives gradients from intermediate results and costs, and
  3. the optimization pass, which updates model parameters.

These kinds of work rely on three kinds of operators:

  1. forward operators,
  2. gradient operators, and
  3. optimization operators.

It's true that users should be able to create all these operators manually by calling some low-level API, but it would be much more convenient if they could describe only the forward pass and let PaddlePaddle create the backward and optimization operators automatically.

In this design, we propose a high-level API that automatically derives the optimization pass and operators from the forward pass.

**Member Author:** done


### A typical training process:
**Collaborator:** If the above proposed section `## The Problem` is accepted, this paragraph of three bullets can be removed.

**Member Author:** done


1. Run the forward pass to calculate activations using data and parameters.
**Collaborator:** I do not think this typical training process fits our current design.

Currently, we put every operator into one ProgramDesc. There are not three explicit running stages.

**Member Author:** This is a general, abstract training process: no matter how complex the training process is, it is composed of these stages.

In our design, we also have functions like backward and optimize to put the related operators into the ProgramDesc. Here we just put the interface into `Optimizer` as a high-level API.

1. Run the backward pass to calculate the gradients of activations and parameters using the cost, activations, and parameters.
1. Run optimize operators to apply/update the gradients to the corresponding parameters.
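
The three stages above can be sketched in plain Python for a one-parameter linear model. This is purely illustrative; `forward`, `backward`, and `update` here are hypothetical helper names, not PaddlePaddle API:

```python
# Illustrative sketch of the three training stages for y = w * x
# with a squared-error cost (plain Python, not PaddlePaddle API).

def forward(w, x):
    return w * x                     # 1. forward: compute activation

def backward(w, x, label):
    pred = forward(w, x)
    return 2 * (pred - label) * x    # 2. backward: d(cost)/dw

def update(w, grad, lr=0.1):
    return w - lr * grad             # 3. optimize: apply gradient

w, x, label = 0.0, 1.0, 2.0
for _ in range(50):
    w = update(w, backward(w, x, label))
print(round(w, 3))  # converges toward 2.0
```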

### Python Interface to describe the training process

1. The user writes code to describe the network:

**Collaborator:** This Python program needs to be properly indented -- to the right of `1.` in the above line.

**Member Author:** done

```python

images = layer.data("images")
labels = layer.data("labels")
w1 = pd.var("w1")
hidden = layer.fc(images, W=w1)
cost = layer.mse(hidden, labels)
```

The above code snippet creates forward operators in a [block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md).
**Collaborator:** the => The; the code above => The above code snippet; will generate => creates

**Member Author:** done



2. The user creates an `Optimizer` and sets the parameter list that it needs to update.
**Collaborator:** Either *The user creates* or *Users create*.

**Member Author:** done


**Collaborator:** Correct code snippet indentation in the Markdown doc.

**Member Author:** done

```python

optimizer = AdagradOptimizer(learning_rate=0.001)
```

3. The user uses the optimizer to `minimize` a certain `cost` by updating the parameters in `parameter_list`.

```python
opt = optimizer.minimize(cost, parameter_list=[w1, ...])
```

**Collaborator:** `opt` should be a list.

**Member Author:** done

The return value of `minimize()` is an Operator that depends on all the optimize operators.

4. Use a Session/Executor to run this `opt` as the target.

```python
sess.run(target=[opt], ...)
```
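
The idea that running `opt` as the target pulls in everything it depends on can be sketched with a toy dependency-tracing executor. `Op` and `run` here are hypothetical stand-ins, not the real Session/Executor API:

```python
# Toy executor: running a target op first runs all ops it depends on,
# depth-first. Illustrative only; not PaddlePaddle's Session/Executor.

class Op:
    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)

def run(target, executed=None):
    """Execute `target` after all of its dependencies."""
    executed = [] if executed is None else executed
    for dep in target.deps:
        if dep.name not in executed:
            run(dep, executed)
    executed.append(target.name)
    return executed

forward = Op("fc")
cost = Op("mse", deps=[forward])
grad = Op("mse_grad", deps=[cost])
opt = Op("sgd", deps=[grad])

print(run(opt))  # ['fc', 'mse', 'mse_grad', 'sgd']
```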

### What does the optimizer do:

In PaddlePaddle, blocks of operators are used to describe computation. From the Python interface described above, we can see that `Optimizer` should add some operators to the computation block:
**Collaborator:** block of operators => blocks of operators; we use => PaddlePaddle uses

**Member Author:** done, removed


1. Gradient Ops. Used to calculate the gradients.
2. Optimize Ops. Used to apply gradients to parameters.
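
The two kinds of appended ops can be pictured with a toy block model. `Block`, `append_op`, and the op names below are illustrative only, not the real ProgramDesc API:

```python
# Toy sketch of an optimizer appending gradient and optimize ops to a
# block that already holds the user's forward ops. Not PaddlePaddle API.

class Block:
    def __init__(self):
        self.ops = []

    def append_op(self, op_type, **attrs):
        self.ops.append((op_type, attrs))

block = Block()
block.append_op("fc", input="images", param="w1")       # forward op
block.append_op("mse", input="fc_out", label="labels")  # forward op

# What minimize() conceptually adds:
block.append_op("mse_grad", output="w1_grad")           # gradient op
block.append_op("sgd", param="w1", grad="w1_grad")      # optimize op

print([op for op, _ in block.ops])  # ['fc', 'mse', 'mse_grad', 'sgd']
```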

#### Optimizer Python interface:

```python
class Optimizer(object):
    def _backward(self, loss):
```

**Collaborator:** `backward` and `update` should be public.

**Member Author:** done

**Collaborator:** `_backward` => `create_backward_pass`

**Member Author:** done

```python

"""
Add Operators to Compute gradients of `loss`
It returns the variables that will be updated for this loss.
"""
...
return variables

    def _update(self, var_list):
```

**Collaborator:** `_update` => `create_optimization_pass`

**Member Author:** done

```python

"""
Add Operators to Apply gradients to variables
in var_list. It returns an update `Operator`.
Run this operator will trace back to all update and backward
op related.
"""
...
return update_op
```

**Contributor:** When a user wants to update twice, the second `update_op` needs to trace the first `update_op` and all related update and backward ops. Maybe we need to write a guide to point this out.

**Member Author:** OK, after discussing with @dzhwinter, this can be done in the current design, but it is not the most important thing to consider now.

**Member Author:** The backward interface has already been made public; users can call it directly multiple times to create more gradient operators in the graph.

```python


    def minimize(self, loss, var_list):
        """Add operations to minimize `loss` by updating `var_list`.

        This method simply combines calls to `_backward()` and
        `_update()`.
        """
        variables = self._backward(loss)
        update_op = self._update(variables)
        return update_op
```

Because we do not want users to know about the `_backward` and `_update` steps, we decided to export only `minimize()` to users.
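
A minimal sketch of how a concrete subclass could fill in this interface, assuming toy bookkeeping in which a loss records its parameters and each parameter dict carries its own gradient (`SGDOptimizer` and these structures are hypothetical, not the actual implementation):

```python
# Sketch of a concrete optimizer built on the interface above.
# The gradient bookkeeping is illustrative only.

class Optimizer(object):
    def _backward(self, loss):
        # Pretend every loss records the parameters it depends on.
        return loss["parameters"]

    def _update(self, var_list):
        raise NotImplementedError

    def minimize(self, loss, var_list=None):
        variables = var_list or self._backward(loss)
        return self._update(variables)

class SGDOptimizer(Optimizer):
    def __init__(self, learning_rate):
        self.lr = learning_rate

    def _update(self, var_list):
        # Return a callable standing in for the update Operator.
        def update_op():
            for var in var_list:
                var["value"] -= self.lr * var["grad"]
        return update_op

w1 = {"value": 1.0, "grad": 0.5}
loss = {"parameters": [w1]}
opt = SGDOptimizer(learning_rate=0.1).minimize(loss)
opt()  # run the "update operator"
print(w1["value"])  # 1.0 - 0.1 * 0.5 = 0.95
```

Users only ever call `minimize()`; the subclass supplies `_update`, mirroring how the design hides the two internal steps.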