
optimizer design #3711 (Closed)

Conversation

jacquesqiao (Member)

No description provided.

@jacquesqiao changed the title from "add optimizer design" to "optimizer design" on Aug 28, 2017.

Quoted design-doc excerpt:

> ```python
>     This method simply combines calls to `_backward()` and `_update()`.
>     """
>     backward_net = _backward(loss)
> ```

Member:

What is the type of `backward_net`: NetOp, Block, or var_list? And does the Optimizer class have any private members? Is `backward_net` a member of Optimizer?

jacquesqiao (Member, Author):

This part has some problems; I will update it.

Quoted design-doc excerpt:

> ## Three situations in parameter update
> 1. One machine, no GPU card / one GPU card.

Member:

There are four situations:

  1. single thread or single GPU
  2. multi-thread
  3. multi-GPU
  4. multi-node

In the multi-thread case we keep only one copy of the parameters and gradients in memory, but in the multi-GPU case every GPU card holds its own copy of the parameters and gradients.

jacquesqiao (Member, Author):

Yes, I will update this.
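
To make the reviewer's distinction concrete: with multiple threads there is a single shared copy of the parameters and gradients in host memory, while with multiple GPUs each card keeps its own copy that has to be synchronized. A toy sketch (the dictionary layout and `num_gpus` are invented for illustration, not Paddle data structures):

```python
num_gpus = 4

# Multi-thread: one shared copy of parameters and gradients in host memory;
# all threads read and write the same buffers.
shared_state = {"params": {}, "grads": {}}

# Multi-GPU: every card holds its own copy of parameters and gradients,
# which must be synchronized (e.g. via all-reduce) between steps.
per_gpu_state = {gpu_id: {"params": {}, "grads": {}} for gpu_id in range(num_gpus)}
```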

Quoted design-doc excerpt:

> ```python
>     return update_op
> ```
>
> Because we do not want users to know about the `_backward` and `_update` steps, we decided to expose only `minimize()` to users.

Member:

We run backward first and then update afterwards. Will we also implement a strategy that runs backward and update op by op at the same time?

jacquesqiao (Member, Author):

`_backward` takes a loss as input and adds gradient ops for all the related variables, so backward is one whole step. The optimizer, however, takes a parameter var_list, so you can add an update op for one variable at a time and attach different update ops to different parameters.
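
A minimal sketch of that split, assuming a hypothetical Optimizer skeleton with `_backward` and `_update_one` helpers (names and signatures are illustrative, not the actual Paddle API):

```python
class Optimizer:
    """Illustrative optimizer skeleton, not the real Paddle implementation."""

    def _backward(self, loss):
        # Add gradient ops for every variable that `loss` depends on and
        # return the resulting backward network: one whole step.
        raise NotImplementedError

    def _update_one(self, param):
        # Add an update op for a single parameter; subclasses (SGD, Adam, ...)
        # decide what that op looks like, so different parameters can get
        # different update ops.
        raise NotImplementedError

    def minimize(self, loss, var_list):
        """Combine `_backward()` with per-parameter `_update_one()` calls."""
        backward_net = self._backward(loss)
        update_ops = [self._update_one(p) for p in var_list]
        return backward_net, update_ops
```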

Member:

I mean, when we run the network, can gradient computation and gradient updates run in parallel? In Paddle we use a callback to implement this: once an op's backward gradients are calculated, the callback is executed to apply the update, while the next op's gradients are being computed.
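
A rough sketch of that callback pattern, using invented `Op` objects with a `backward()` method and an `on_grad_ready` hook standing in for Paddle's actual mechanism:

```python
LEARNING_RATE = 0.01

def run_backward(ops, params, on_grad_ready):
    """Run backward op by op, firing the callback as soon as each op's gradients are ready."""
    for op in reversed(ops):              # backward pass visits ops in reverse order
        grads = op.backward()             # assumed to return {param_name: gradient array}
        on_grad_ready(params, grads)      # launch the update while the next backward runs

def sgd_update(params, grads):
    # Placeholder update rule: plain SGD on in-memory arrays.
    for name, grad in grads.items():
        params[name] -= LEARNING_RATE * grad

# Usage, assuming `net.ops` is the ordered op list and `net.params` maps names to arrays:
# run_backward(net.ops, net.params, on_grad_ready=sgd_update)
```

Whether the update truly overlaps the next backward step depends on the runtime (e.g. launching the update on another stream or thread); the callback only defines when it may start.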

jacquesqiao (Member, Author):

This can be done by rearranging the order of the operators in the NetOp/Block at run time.
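
For example, instead of scheduling all backward ops and then all update ops, the runner could interleave them. A toy illustration (the `outputs_grad_of` attribute and `update_op_for` map are invented names):

```python
def interleave(backward_ops, update_op_for):
    """Build a single schedule where each parameter's update op directly
    follows the backward op that produces its gradient."""
    schedule = []
    for bwd_op in backward_ops:
        schedule.append(bwd_op)
        for param in bwd_op.outputs_grad_of:       # parameters whose gradients this op produces
            schedule.append(update_op_for[param])  # run the update as soon as its gradient exists
    return schedule
```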

jacquesqiao (Member, Author):

Fixed by #4656.
