Optimizer Design #4656
## Optimizer Design
In a deep learning system, an `Optimizer` is used to optimize (minimize) the loss by updating a list of parameters.
### A typical training process:
1. Run forward to calculate activations using the data and parameters.
1. Run backward to calculate the gradients of the activations and parameters using the cost, activations, and parameters.
1. Run optimize operators to apply/update the gradients to the corresponding parameters (the three stages are sketched below).
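To make these three stages concrete, here is a small self-contained numpy sketch (not the proposed Paddle API; all names and numbers are purely illustrative) that runs forward, backward, and optimize by hand for a linear regression model:

```python
import numpy as np

# Toy illustration of the three training stages using plain numpy
# (not the proposed Paddle API): linear regression trained by hand.
np.random.seed(0)
data = np.random.randn(100, 3)
true_w = np.array([1.0, -2.0, 0.5])
labels = data @ true_w + 0.1 * np.random.randn(100)

w = np.zeros(3)            # the parameter to learn
learning_rate = 0.1
for _ in range(100):
    # 1. forward: compute activations (predictions) and the cost
    pred = data @ w
    cost = np.mean((pred - labels) ** 2)
    # 2. backward: gradient of the cost w.r.t. the parameter
    grad_w = 2.0 * data.T @ (pred - labels) / len(data)
    # 3. optimize: apply the gradient to the parameter
    w -= learning_rate * grad_w
```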
### Python Interface to describe the training process
1. The user writes code to describe the network:
   ```python
   images = layer.data("images")
   labels = layer.data("labels")
   w1 = pd.var("w1")
   hidden = layer.fc(images, W=w1)
   cost = layer.mse(hidden, labels)
   ```
   The code above will generate the forward operators in a [block](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/block.md).
2. The user creates an `Optimizer` and sets the parameter list that it needs to update.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Either There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
|
||
   ```python
   optimizer = AdagradOptimizer(learning_rate=0.001)
   ```
3. The user uses the optimizer to `minimize` a certain `cost` by updating the parameters in `parameter_list`.
   ```python
   opt = optimizer.minimize(cost, parameter_list=[w1, ...])
   ```
   The return value of `minimize()` is an Operator that depends on all the optimize operators.
4. The user uses a Session/Executor to run this `opt` as the target.
   ```python
   sess.run(target=[opt], ...)
   ```
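Putting the four steps together, a training script would look roughly like the sketch below. It only re-assembles the snippets above; how `sess` is created, how input data is fed into `sess.run`, and the number of iterations are hypothetical here and belong to the Session/Executor design.

```python
# Rough end-to-end sketch assembled from the snippets above. The creation
# of `sess` and the feeding of input data are omitted and hypothetical.
images = layer.data("images")
labels = layer.data("labels")
w1 = pd.var("w1")
hidden = layer.fc(images, W=w1)
cost = layer.mse(hidden, labels)

optimizer = AdagradOptimizer(learning_rate=0.001)
opt = optimizer.minimize(cost, parameter_list=[w1])

for _ in range(1000):      # number of iterations is arbitrary here
    sess.run(target=[opt])
```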
### What does the optimizer do:
In PaddlePaddle, we use blocks of operators to describe computation. From the Python interface described above, we can see that the `Optimizer` should add some operators to the computation block:
1. Gradient Ops. Used to calculate the gradients.
2. Optimize Ops. Used to apply the gradients to the parameters (illustrated below).
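The sketch below shows how the contents of a block might grow when `minimize()` is called; `block.ops` and the operator names are hypothetical, used only to convey the idea, not the real Paddle API:

```python
# Hypothetical illustration; `block.ops` and the operator names are made up
# to convey the idea, not the real Paddle API.

# After the user describes the forward network (step 1 above),
# the block holds only the forward operators:
block.ops  # -> [fc, mse]

opt = optimizer.minimize(cost, parameter_list=[w1])

# minimize() has appended the gradient ops and the optimize ops:
block.ops  # -> [fc, mse, mse_grad, fc_grad, adagrad]
```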
#### Optimizer Python interface:
```python
class Optimizer(object):
    def _backward(self, loss):
        """
        Add operators to compute the gradients of `loss`.
        It returns the variables that will be updated for this loss.
        """
        ...
        return variables

    def _update(self, var_list):
        """
        Add operators to apply gradients to the variables in `var_list`.
        It returns an update `Operator`. Running this operator will trace
        back to all the related update and backward operators.
        """
        ...
        return update_op

    def minimize(self, loss, var_list):
        """Add operations to minimize `loss` by updating `var_list`.

        This method simply combines calls to `_backward()` and
        `_update()`.
        """
        variables = self._backward(loss)
        update_op = self._update(variables)
        return update_op
```
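For illustration, a concrete optimizer could subclass this interface roughly as follows; `gradient_of`, `append_sgd_op`, and `group_ops` are hypothetical helpers invented for this sketch, not part of the proposed API:

```python
# A sketch only: `gradient_of`, `append_sgd_op`, and `group_ops` are
# hypothetical helpers, not the real Paddle API.
class SGDOptimizer(Optimizer):
    def __init__(self, learning_rate):
        self._learning_rate = learning_rate

    def _update(self, var_list):
        # For every variable to be updated, append one sgd op that applies
        #   var -= learning_rate * gradient_of(var)
        sgd_ops = [
            append_sgd_op(var, gradient_of(var), self._learning_rate)
            for var in var_list
        ]
        # Return one update Operator that depends on all the sgd ops, so
        # running it as a target pulls in the backward and update subgraph.
        update_op = group_ops(sgd_ops)
        return update_op
```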
Because we do not want users to know about the separate `_backward` and `_update` steps, we export only `minimize()` to users.