Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non uniform partitioning for Distributed architectures #3339

Merged
merged 181 commits into from
Nov 3, 2023

Conversation

simone-silvestri
Copy link
Collaborator

@simone-silvestri simone-silvestri commented Oct 13, 2023

This PR tweaks the API to simplify non-uniform partitioning which should already be supported by the algorithm.

This PR also extends the tests to include non-uniform distributed partitioning

The proposal of this PR (up to discussion and tweaking) is to allow calling

arch = Distributed(CPU(); partition = Partition(Rx = [0.3, 0.1, 0.6])

which allows to distributed the domain over 3 workers which hold 30%, 10% and 60% of the computation, respectively

@glwagner
Copy link
Member

Can you update the top-level description, and add a docstring for Partition with a few examples that enumerates the various possible syntaxes for common important cases?

Comment on lines 37 to 38
`x`, `y` and `z` can be `Int`, `Equal`, `Fractional` or `Sizes`
(see below)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to explain what each option means, something like:

Suggested change
`x`, `y` and `z` can be `Int`, `Equal`, `Fractional` or `Sizes`
(see below)
`x`, `y` and `z` can be:
`x::Int`: allocate `x` processors to the first dimension
`Equal()`: divide the domain in `x` equally among the remaining processes
`Fractional(ϵ1, ϵ2, ..., ϵN):` divide the domain unequally among `N` processes. The total work is `W = sum(ϵi)`, and each process is then allocated `ϵi / W` of the domain.
`Sizes`:

Comment on lines 99 to 100
Fractional(args...) = Fractional(tuple(args ./ sum(args)...)) # We need to make sure that `sum(R) == 1`
Sizes(args...) = Sizes(tuple(args...))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstrings. Make sure to add @ref to Partition.

Comment on lines 72 to 73
"""type representing equal domain partitioning (not supported for more than one direction)"""
struct Equal end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need a real docstring because this is user facing, something like

Suggested change
"""type representing equal domain partitioning (not supported for more than one direction)"""
struct Equal end
"""
Equal()
Return a type that partitions a direction equally among remaining processes.
`Equal()` can be used for only one direction. Other directions must either be unspecified, or
specifically defined by `Int`, `Fractional`, or `Sizes`.
"""

Base.show(io::IO, p::Partition) =
print(io,
"Domain partitioning with $(ranks(p)) ranks", "\n",
"├── x-partitioning: $(ranks(p.x) == 1 ? "none" : p.x)", "\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we say "1" rather than "none"?

The purpose of a display is to give as much code-relevant information about the content of a type as possible. So I think "1" is more accurate, while "none" is not right, it doesn't correspond to any julia type (unless we are using "nothing"), but in that case we should say "nothing" not "none").

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to differentiate from 1 because with 1 rank there is no partitioning. nothing is a good option

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably it is just better to omit the directions in which rank == 1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the difference between "1" and "nothing"? If you write x=1, isn't that the same thing as no partition?

Copy link
Member

@glwagner glwagner Nov 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's important that it matches the underlying code, ie partition.x should match what we claim the x partitioning is. That's a general philosophy

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok let's go with nothing then

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I think we need to change validate_partition as well in that case

Comment on lines 27 to 29
# TODO: add support for Non uniform partitioning
# Distributed(child_arch; partition = Partition(Rx = [0.2, 0.1, 0.5, 0.3])))
# Distributed(child_arch; partition = Partition(Ry = [0.2, 0.1, 0.5, 0.3])))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# TODO: add support for Non uniform partitioning
# Distributed(child_arch; partition = Partition(Rx = [0.2, 0.1, 0.5, 0.3])))
# Distributed(child_arch; partition = Partition(Ry = [0.2, 0.1, 0.5, 0.3])))

@@ -39,7 +39,7 @@ if supplied as positional arguments `x` will be the first argument,
`Equal()`: divide the domain in `x` equally among the remaining processes (not supported for multiple directions)
`Fractional(ϵ1, ϵ2, ..., ϵN):` divide the domain unequally among `N` processes. The total work is `W = sum(ϵi)`,
and each process is then allocated `ϵi / W` of the domain.
`Sizes(ϵ1, ϵ2, ..., ϵN)`: divide the domain unequally. EThe total work is `W = sum(ϵi)`,
`Sizes(ϵ1, ϵ2, ..., ϵN)`: divide the domain unequally. The total work is `W = sum(ϵi)`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`Sizes(ϵ1, ϵ2, ..., ϵN)`: divide the domain unequally. The total work is `W = sum(ϵi)`,
`Sizes(n1, n2, ..., nN)`: divide the direction by number of grid points, where `ni` is the number of grid points allocated to process `i`. The total size of the direction is `N = sum(ni)`,

Maybe a different letter than N for the number of processors, maybe P is better?

Sizes(args...) = Sizes(tuple(args...))

"""
`Sizes(ϵ1, ϵ2, ..., ϵN)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used ϵ for fractions because that made sense but we should use a different letter for sizes.

It's good to get this right because users will copy the notation we use in docstrings, so it propagates everywhere.


Base.size(p::Partition) = ranks(p)

validate_partition(x, y, z) = (x, y, z)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to convert 1 to nothing right?

simone-silvestri and others added 6 commits November 2, 2023 13:52
Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
Co-authored-by: Gregory L. Wagner <wagner.greg@gmail.com>
@simone-silvestri simone-silvestri merged commit 7796f57 into main Nov 3, 2023
48 checks passed
@simone-silvestri simone-silvestri deleted the ss/non_uniform_partitioning branch November 3, 2023 13:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
distributed 🕸️ Our plan for total cluster domination
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants