-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non uniform partitioning for Distributed
architectures
#3339
Conversation
Can you update the top-level description, and add a docstring for |
`x`, `y` and `z` can be `Int`, `Equal`, `Fractional` or `Sizes` | ||
(see below) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to explain what each option means, something like:
`x`, `y` and `z` can be `Int`, `Equal`, `Fractional` or `Sizes` | |
(see below) | |
`x`, `y` and `z` can be: | |
`x::Int`: allocate `x` processors to the first dimension | |
`Equal()`: divide the domain in `x` equally among the remaining processes | |
`Fractional(ϵ1, ϵ2, ..., ϵN):` divide the domain unequally among `N` processes. The total work is `W = sum(ϵi)`, and each process is then allocated `ϵi / W` of the domain. | |
`Sizes`: |
Fractional(args...) = Fractional(tuple(args ./ sum(args)...)) # We need to make sure that `sum(R) == 1` | ||
Sizes(args...) = Sizes(tuple(args...)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstrings. Make sure to add @ref
to Partition
.
"""type representing equal domain partitioning (not supported for more than one direction)""" | ||
struct Equal end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a real docstring because this is user facing, something like
"""type representing equal domain partitioning (not supported for more than one direction)""" | |
struct Equal end | |
""" | |
Equal() | |
Return a type that partitions a direction equally among remaining processes. | |
`Equal()` can be used for only one direction. Other directions must either be unspecified, or | |
specifically defined by `Int`, `Fractional`, or `Sizes`. | |
""" |
Base.show(io::IO, p::Partition) = | ||
print(io, | ||
"Domain partitioning with $(ranks(p)) ranks", "\n", | ||
"├── x-partitioning: $(ranks(p.x) == 1 ? "none" : p.x)", "\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we say "1" rather than "none"?
The purpose of a display is to give as much code-relevant information about the content of a type as possible. So I think "1" is more accurate, while "none" is not right, it doesn't correspond to any julia type (unless we are using "nothing"), but in that case we should say "nothing" not "none").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like to differentiate from 1 because with 1 rank there is no partitioning. nothing
is a good option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably it is just better to omit the directions in which rank == 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the difference between "1" and "nothing"? If you write x=1
, isn't that the same thing as no partition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's important that it matches the underlying code, ie partition.x
should match what we claim the x
partitioning is. That's a general philosophy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok let's go with nothing then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I think we need to change validate_partition
as well in that case
test/utils_for_runtests.jl
Outdated
# TODO: add support for Non uniform partitioning | ||
# Distributed(child_arch; partition = Partition(Rx = [0.2, 0.1, 0.5, 0.3]))) | ||
# Distributed(child_arch; partition = Partition(Ry = [0.2, 0.1, 0.5, 0.3]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# TODO: add support for Non uniform partitioning | |
# Distributed(child_arch; partition = Partition(Rx = [0.2, 0.1, 0.5, 0.3]))) | |
# Distributed(child_arch; partition = Partition(Ry = [0.2, 0.1, 0.5, 0.3]))) |
@@ -39,7 +39,7 @@ if supplied as positional arguments `x` will be the first argument, | |||
`Equal()`: divide the domain in `x` equally among the remaining processes (not supported for multiple directions) | |||
`Fractional(ϵ1, ϵ2, ..., ϵN):` divide the domain unequally among `N` processes. The total work is `W = sum(ϵi)`, | |||
and each process is then allocated `ϵi / W` of the domain. | |||
`Sizes(ϵ1, ϵ2, ..., ϵN)`: divide the domain unequally. EThe total work is `W = sum(ϵi)`, | |||
`Sizes(ϵ1, ϵ2, ..., ϵN)`: divide the domain unequally. The total work is `W = sum(ϵi)`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`Sizes(ϵ1, ϵ2, ..., ϵN)`: divide the domain unequally. The total work is `W = sum(ϵi)`, | |
`Sizes(n1, n2, ..., nN)`: divide the direction by number of grid points, where `ni` is the number of grid points allocated to process `i`. The total size of the direction is `N = sum(ni)`, |
Maybe a different letter than N
for the number of processors, maybe P
is better?
Sizes(args...) = Sizes(tuple(args...)) | ||
|
||
""" | ||
`Sizes(ϵ1, ϵ2, ..., ϵN)` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used ϵ
for fractions because that made sense but we should use a different letter for sizes.
It's good to get this right because users will copy the notation we use in docstrings, so it propagates everywhere.
|
||
Base.size(p::Partition) = ranks(p) | ||
|
||
validate_partition(x, y, z) = (x, y, z) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to convert 1
to nothing
right?
Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
…nigans.jl into ss/non_uniform_partitioning
This PR tweaks the API to simplify non-uniform partitioning which should already be supported by the algorithm.
This PR also extends the tests to include non-uniform distributed partitioning
The proposal of this PR (up to discussion and tweaking) is to allow calling
which allows to distributed the domain over 3 workers which hold 30%, 10% and 60% of the computation, respectively