
Added local_shapes as a new parameter to DistributedArray #61

Merged: 6 commits from local_shapes into main, Aug 13, 2023

Conversation

@rohanbabbar04 (Collaborator) commented Aug 11, 2023

Closes #59
Implemented the local_shapes parameter and changed all instances of DistributedArray to handle it.
I see local_shapes as the better option: when it is provided, each rank uses local_shape = local_shapes[rank]; otherwise the default split is used.

# Example
from pylops_mpi import DistributedArray

arr = DistributedArray(global_shape=(100, ), local_shapes=[(30, ), (40, ), (30, )])
print(arr.rank, arr)

# Output
0 <DistributedArray with global shape=(100,), local shape=(30,), dtype=<class 'numpy.float64'>, processes=[0, 1, 2])> 
1 <DistributedArray with global shape=(100,), local shape=(40,), dtype=<class 'numpy.float64'>, processes=[0, 1, 2])> 
2 <DistributedArray with global shape=(100,), local shape=(30,), dtype=<class 'numpy.float64'>, processes=[0, 1, 2])> 

In DistributedArray.py

  • Added local_shapes as a List[Tuple] parameter
  • Added local_shapes to the to_dist method
  • Added a check that the local_shapes align with the global shape
  • Changed all instances to use local_shape
  • Removed send/recv from ravel, as this is now handled by local_shapes
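The alignment check in the bullet above could look roughly like the following; the function name and signature are hypothetical sketches, not the actual pylops_mpi implementation:

```python
# Hypothetical sketch of the local_shapes-vs-global_shape check; names are
# illustrative, not the actual pylops_mpi code.
def check_local_shapes(local_shapes, global_shape, axis=0):
    """Raise if the given local shapes do not tile global_shape along axis."""
    for ls in local_shapes:
        if len(ls) != len(global_shape):
            raise ValueError(f"local shape {ls} has wrong number of dimensions")
        for d, (loc, glob) in enumerate(zip(ls, global_shape)):
            # every dimension except the split axis must match the global shape
            if d != axis and loc != glob:
                raise ValueError(f"local shape {ls} mismatches {global_shape} at dim {d}")
    # sizes along the split axis must add up to the global size
    if sum(ls[axis] for ls in local_shapes) != global_shape[axis]:
        raise ValueError(f"local shapes {local_shapes} do not sum to {global_shape}")

check_local_shapes([(30,), (40,), (30,)], (100,))  # passes silently
```

Running this once in __init__ means an inconsistent split fails immediately on construction, on every rank, rather than surfacing later as a communication error.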

In FirstDerivative.py, SecondDerivative.py, cls_basic.py and plotting.py

  • Minor: added local_shapes as a parameter to DistributedArray

In decorators.py

  • Updated the reshaped method by adding forward and stacking to handle reshaping/redistributing for stacking operators (I did it that way because the code for them and for the derivatives was quite similar; if a separate decorator should be made to keep them apart, let me know)
  • Also made the code a little cleaner and easier to understand

In VStack.py and BlockDiag.py

  • We now give the option to handle this under the hood, and added local_shapes to y.
  • Removed the ValueError, as that can now be handled by local_shapes and the decorator.

Tests and example

  • Updated the Laplacian tests to check that it works when the size is not evenly divisible
  • Added a test for local_shapes
  • Added an example for local_shapes (since we are working with different ranks, hard-coding raw numbers wasn't possible, so the list of local_split shapes is simply reversed)
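The reversal trick in the last bullet can be illustrated in isolation; the shapes below are hypothetical stand-ins for what local_split might return:

```python
# Illustrative sketch of the example's trick: reversing the default
# per-rank shapes yields a valid custom split (same total size), so the
# reversed list can be passed as local_shapes. Shapes are hypothetical.
default_shapes = [(34,), (33,), (33,)]   # e.g. a default split of 100 over 3 ranks
custom_shapes = default_shapes[::-1]     # reversed list, still sums to 100

assert sum(s[0] for s in custom_shapes) == sum(s[0] for s in default_shapes)
print(custom_shapes)  # [(33,), (33,), (34,)]
```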

@mrava87 (Contributor) left a comment

Good stuff!

This PR looks very good, and I am happy to see how local_shapes helps streamline various parts of the library.

I have left a few minor comments and only one major one: what is the reasoning for passing DistributedArray a list of tuples containing the local shapes of all ranks, instead of passing only that of the current rank?

In other words, does it make sense to always have this code pattern on the user side:

local_shape = local_split(global_shape, MPI.COMM_WORLD, Partition.SCATTER, 0)
local_shapes = MPI.COMM_WORLD.allgather(local_shape)[::-1]
arr = pylops_mpi.DistributedArray(global_shape=global_shape, local_shapes=local_shapes, axis=0)

instead of

local_shape = local_split(global_shape, MPI.COMM_WORLD, Partition.SCATTER, 0)
arr = pylops_mpi.DistributedArray(global_shape=global_shape, local_shape=local_shape, axis=0)

and have the gathering part inside the init method?

I am sure you have probably thought about this, but at first sight I cannot find a reason why the latter would not work 🤔

Resolved review threads: examples/plot_distributed_array.py, pylops_mpi/DistributedArray.py
raise ValueError(f"Dimension mismatch: x shape-{x.local_shape} does not match operator shape "
f"{self.localop_shape}; {x.local_shape[0]} != {self.mops} (dim1) at rank={self.rank}")
y = DistributedArray(global_shape=self.shape[0], dtype=x.dtype)
local_shapes = self.base_comm.allgather((self.nops, ))
@mrava87 (Contributor) commented:

Can this not be moved into the init method? It seems this can be done once: local_shapes can be stored as a member of the class and used every time matvec is called (same for rmatvec). I would use local_shapes_n and local_shapes_m to distinguish between the two :)

@rohanbabbar04 (Collaborator, Author) replied:

Done in BlockDiag as well as VStack.py

Resolved review thread: tests/test_distributedarray.py
@rohanbabbar04 (Collaborator, Author) commented Aug 12, 2023

Hi @mrava87, let's take the example of global_shape=100 with 3 processes, where the user wants to split it into (30,), (40,) and (30,).
Using local_shapes:

arr = DistributedArray(global_shape=100, local_shapes=[(30,), (40,), (30,)])

When it comes to using local_shape, here is what I see (correct me if I am wrong):

if rank == 0:
   arr = DistributedArray(global_shape=100, local_shape=(30, ))
elif rank == 1:
   arr = DistributedArray(global_shape=100, local_shape=(40, ))
elif rank == 2:
   arr = DistributedArray(global_shape=100, local_shape=(30, ))

That is why I preferred the local_shapes parameter. It also means users can write out the local_shapes themselves, which lets us check up front that all the local_shapes match the global shape.
What do you think?
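The preference above amounts to resolving the per-rank shape inside the constructor; a minimal sketch, assuming a hypothetical helper (1-D case only, names illustrative, not the pylops_mpi API):

```python
# Hypothetical sketch of how __init__ could resolve this rank's shape from
# the full local_shapes list, falling back to a default even split when no
# custom split is given (1-D case only).
def pick_local_shape(global_shape, rank, size, local_shapes=None):
    if local_shapes is not None:
        # custom split: each rank simply indexes its own entry
        return local_shapes[rank]
    base, extra = divmod(global_shape[0], size)
    # default split: the first `extra` ranks get one extra element
    return (base + 1,) if rank < extra else (base,)

print([pick_local_shape((100,), r, 3, [(30,), (40,), (30,)]) for r in range(3)])
# [(30,), (40,), (30,)]
print([pick_local_shape((100,), r, 3) for r in range(3)])
# [(34,), (33,), (33,)]
```

With this pattern, the rank-specific if/elif ladder disappears: every rank runs the same line of user code and picks out its own shape.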

@mrava87 (Contributor) commented Aug 12, 2023

Oh, good point. I was thinking more of the case where a user would use local_split, but your point is valid: sometimes users may want to choose custom splits themselves, and the code with if statements is much less clean and elegant than providing all local shapes as you did. So nothing to be changed here :)

@rohanbabbar04 (Collaborator, Author) commented Aug 12, 2023

> Oh, good point. I was thinking more of the case where a user would use local_split, but your point is valid: sometimes users may want to choose custom splits themselves, and the code with if statements is much less clean and elegant than providing all local shapes as you did. So nothing to be changed here :)

Great!
I will make all the necessary changes you asked for to make local_shapes work, and commit soon.

@rohanbabbar04 rohanbabbar04 merged commit 6cae759 into main Aug 13, 2023
15 checks passed
@rohanbabbar04 rohanbabbar04 deleted the local_shapes branch August 14, 2023 14:36
Merging this pull request closed: Add local_shape to DistributedArray (#59)