Add design doc for lookup remote table in Fluid #9068
# Design Doc: Large Model

## Abstract
> **Reviewer:** Need to tell about the background, why we need this feature.
>
> **Author:** Done.
We propose an approach to support very large parameters.
For an embedding layer, the parameter may be too large to be stored
in one trainer's memory. In this approach, a Trainer prefetches the
sliced parameters it needs from different Parameter Server instances
according to the input `Ids`, runs forward and backward locally, and then
sends the gradient back to the Parameter Servers to execute the optimize program.
## Design
**NOTE**: this approach is a feature of Fluid distributed training; you may want
to read [Distributed Architecture](./distributed_architecture.md) and
[Parameter Server](./parameter_server.md) before reading the following content.
Fluid large model distributed training uses the
[Distributed Transpiler](./parameter_server.md#distributed-transpiler) to split
a large parameter into multiple sliced parameters stored on the Parameter Servers;
the Trainer then prefetches them through an `RPC` interface.
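As a concrete illustration of how a sliced parameter could be addressed, the sketch below maps a global row index to the Parameter Server that owns it. The even row-wise block split and the `locate_row` helper are assumptions for illustration only, not the transpiler's actual partition scheme.

```python
# A minimal sketch, assuming an even row-wise split of the embedding
# table across Parameter Server instances. The real Distributed
# Transpiler may choose a different partition scheme.

def locate_row(row_id, total_rows, num_pservers):
    """Map a global row index to (pserver index, local row index)."""
    rows_per_shard = (total_rows + num_pservers - 1) // num_pservers
    return row_id // rows_per_shard, row_id % rows_per_shard

# Example: a 10000-row table split over 3 Parameter Servers.
print(locate_row(7500, 10000, 3))  # -> (2, 832)
```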
### Split Large Parameter
<img src="src/split_parameter.png" width="400" />
> **Reviewer:** Seems the picture's numbers are wrong.
The **Distributed Transpiler** splits the large parameter
(weight) into several sliced parameters (weight_0, weight_1, weight_2), as shown in the
figure above.
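The following is a minimal sketch of this row-wise split using NumPy; the names `weight_0`/`weight_1`/`weight_2` follow the figure, and the nearly even block split is an assumption about how the transpiler partitions the rows.

```python
import numpy as np

vocab_size, emb_dim, num_pservers = 10, 4, 3
weight = np.random.rand(vocab_size, emb_dim)

# Split the rows into num_pservers nearly equal blocks,
# one block per Parameter Server.
slices = np.array_split(weight, num_pservers, axis=0)
weight_0, weight_1, weight_2 = slices
print([s.shape for s in slices])  # e.g. [(4, 4), (3, 4), (3, 4)]
```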
### Prefetch Parameters from Parameter Servers
<img src="src/prefetch_parameters.png" width="400" />
- The `PrefetchRpc` operator sends the row indices to the multiple Parameter Servers,
  and then receives the `SelectedRows`.
- The difference from normal Fluid distributed training is that we only prefetch the rows
  selected by the input `Ids`, not the whole parameter; see the sketch below.
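The sketch below simulates the prefetch step in plain Python: row ids are grouped by the server that owns them, one request is issued per server, and the returned rows are assembled into a `SelectedRows`-like (rows, values) pair. The sharding scheme and helper names are assumptions for illustration, not the actual `PrefetchRpc` implementation.

```python
import numpy as np

num_pservers, rows_per_shard, emb_dim = 3, 4, 2
# Each "Parameter Server" holds one slice of the embedding table.
pserver_slices = [np.random.rand(rows_per_shard, emb_dim)
                  for _ in range(num_pservers)]

def prefetch(ids):
    # Group the requested global row ids by the server that owns them.
    by_server = {}
    for row_id in ids:
        by_server.setdefault(row_id // rows_per_shard, []).append(row_id)

    rows, values = [], []
    for server, row_ids in by_server.items():
        # One request per server; the server returns only the rows asked for.
        local = [r % rows_per_shard for r in row_ids]
        rows.extend(row_ids)
        values.append(pserver_slices[server][local])
    return rows, np.concatenate(values, axis=0)

rows, values = prefetch([1, 5, 9, 5])
print(rows, values.shape)  # duplicate ids are fetched as-is in this sketch
```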
## TODO
- Async Update
To avoid the slow-node problem, asynchronous update is important for distributed training;
we need a design doc for it and will implement it in the future.
> **Reviewer:** Need a more meaningful name, like "remote large parameter prefetching"?
>
> **Author:** Done, maybe "Prefetching Parameter From Parameter Server" sounds good?