
Added MPIFirstDerivative #55

Merged: rohanbabbar04 merged 16 commits into main from deriv on Aug 4, 2023

Conversation

rohanbabbar04
Collaborator

Taking into consideration #49

  • Added MPIFirstDerivative
  • Added tests and updated the docs.

@rohanbabbar04
Collaborator Author

rohanbabbar04 commented Jul 28, 2023

In the previous PR #49 we discussed some edge cases; I have handled them in this PR.
Now that we have added the conditions to add_ghost_cells, there was no need for check_matvec_rmatvec.
Also, in the DistributedArray we keep the minimum number per process at 1 (added max(1, ...) to handle that).
Each method individually takes a min(y.global_shape[0], ...) to handle those edge cases (this is primarily observed in _rmatvec_centered3, _matvec_centered5 and _rmatvec_centered5).
If the user enters a broadcasted DistributedArray, we convert the broadcasted array to scattered and then proceed...
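To make those guards concrete, here is a toy illustration (not the PR code; the numbers and names are mine) of why they matter when the distributed axis is very short:

nproc, global_len = 4, 3                 # more ranks than samples: the edge case

# a plain floor division would assign some ranks 0 elements; max(1, ...) avoids that
local_len = max(1, global_len // nproc)  # 1 instead of 0

# a 5-point centered stencil cannot use more samples than the axis holds;
# capping with min(global_len, ...) keeps the indexing within bounds
stencil_len = min(global_len, 5)         # 3 instead of 5

print(local_len, stencil_len)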

Contributor

@mrava87 left a comment


Thanks!

I think this PR looks very good and aligned with our discussion.

There is only one thing left, which I think we need to decide on and handle / make explicit in the docs: currently we do not provide an axis to this operator, meaning that we implicitly assume the derivative is always applied to a given axis (looking at the code and how you write the stencils, this looks to me like axis=0; see for example y_forward = ghosted_x[1:] - ghosted_x[:-1]). However, when you do the appends you seem to do them along y.axis (which is x.axis). I suspect that if x.axis>0 we would not be doing the correct thing... do you agree?

So one option is to raise an error if axis!=0, whilst the other option is to add axis, check that self.axis==x.axis, and handle the stencils over any axis. In fact, if x.axis=0 but self.axis=1, it won't really make sense to use this operator, as the derivative stencils do not go across ranks, and users can just leverage pylops.FirstDerivative together with pylops_mpi.BlockDiag like we have done before :)
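For reference, a minimal sketch of that second alternative (a per-rank pylops.FirstDerivative wrapped in a block-diagonal MPI operator); MPIBlockDiag and the exact constructor arguments are my assumption of the current pylops_mpi API, not code from this PR:

import pylops
import pylops_mpi

# hypothetical per-rank block: derivative along an axis that does not cross ranks
ny_local, nx = 10, 20
Dop = pylops.FirstDerivative(dims=(ny_local, nx), axis=1)

# each rank contributes its own block, so no inter-rank communication is needed
Fop = pylops_mpi.MPIBlockDiag(ops=[Dop])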

def _matvec(self, x: DistributedArray) -> DistributedArray:
    # If Partition.BROADCAST, then convert to Partition.SCATTER
    if x.partition is Partition.BROADCAST:
        x = DistributedArray.to_dist(x=x.local_array, axis=x.axis)
Contributor

mmh if the array is of Partition.BROADCAST, what does x.axis really mean? I guess from how DistributedArray is implemented right now, one could choose any axis, but here this would lead to problems... see my general comment


def _matvec_forward(self, x: DistributedArray) -> DistributedArray:
    y = DistributedArray(global_shape=x.global_shape, dtype=self.dtype, axis=x.axis)
    y[:] = 0
Contributor

Is this really needed? If I look at line 138, it seems to me you are setting the entire array, so we may be able to avoid setting it to zero first

@rohanbabbar04
Collaborator Author

rohanbabbar04 commented Jul 31, 2023

Thanks!

I think this PR looks very good and aligned with our discussion.

There is only one thing left, which I think we need to decide on and handle / make explicit in the docs: currently we do not provide an axis to this operator, meaning that we implicitly assume the derivative is always applied to a given axis (looking at the code and how you write the stencils, this looks to me like axis=0; see for example y_forward = ghosted_x[1:] - ghosted_x[:-1]). However, when you do the appends you seem to do them along y.axis (which is x.axis). I suspect that if x.axis>0 we would not be doing the correct thing... do you agree?

So one option is to raise an error if axis!=0, whilst the other option is to add axis, check that self.axis==x.axis, and handle the stencils over any axis. In fact, if x.axis=0 but self.axis=1, it won't really make sense to use this operator, as the derivative stencils do not go across ranks, and users can just leverage pylops.FirstDerivative together with pylops_mpi.BlockDiag like we have done before :)

Thanks
@mrava87 When it comes to adding an axis parameter to the class and calculating the stencil over that axis: since we use dims as an integer, the only possible axes, if I am not wrong, are -1 and 0. So when would we encounter self.axis=1?

@mrava87
Contributor

mrava87 commented Jul 31, 2023

Mmh my bad, I had forgotten about that... but I think we need to change it and allow Nd arrays. Let me explain why: imagine you have a 3d array ny x nx x nz which is distributed over axis=0. This is for example the case of our Poststack tutorial. Now, if you use dims=ny*nx*nz (since this is the only option so far) you will not be able to apply a first derivative over the y axis, which is what this operator is intended for.

This is why I suggested having no axis and implicitly computing the derivative over axis=0, but handling the dimensions. Basically, we need to be able to have something that produces the same result as:

nytot, nx, nz = 6, 10, 8
Fop = pylops.basicoperators.FirstDerivative(dims=(nytot, nx, nz), axis=0)

if we do

nytot, nx, nz = 6, 10, 8
ny = nytot // size  # size = number of MPI ranks
Fop = pylops_mpi.basicoperators.FirstDerivative(dims=(ny, nx, nz), axis=0)

Makes sense?

@rohanbabbar04
Collaborator Author

Thanks, understood.
I will get onto it.

@mrava87
Contributor

mrava87 commented Jul 31, 2023

Thanks, understood.
I will get onto it.

Great! Sorry, maybe we didn't think deeply enough about the implications of having only 1d, but I think starting with 1d only and moving to nd now is probably still a better/easier way to go than if we had immediately started with nd... and I feel there is little to be done to make this work (I may be wrong ;) )

@rohanbabbar04
Collaborator Author

I think what we have done will help a lot, as we now prioritize calculating the derivative along axis=0.
The concept should remain the same; I will bring it up if I find any other challenges in the implementation.

@rohanbabbar04
Collaborator Author

rohanbabbar04 commented Aug 1, 2023

Hi @mrava87, I think we need to go a little differently than in the 1-D approach, although pretty much all of the code in _matvec and _rmatvec will remain the same.
But to work with ND arrays (dims), like we do in pylops.FirstDerivative, we may have to redistribute the 1-D DistributedArray x, meaning we convert it to an ND array distributed over the chosen axis. For the conversion, if np.prod(local_shape) is the same on every rank, we can reshape as is; for unequal local-shape products we can use ghost cells to build our ND DistributedArray, and then proceed with what we have done...
What do you think @mrava87?
I am still thinking about whether we can do it with just the 1-D DistributedArray...

@rohanbabbar04
Collaborator Author

rohanbabbar04 commented Aug 2, 2023

Hi @mrava87, I have handled the case for dims=(...); I made some changes because doing this with only a 1-D DistributedArray was complicated:

  • Added ravel as a method of the DistributedArray class.
  • Added a redistribute decorator which reshapes x to an n-d distributed array and also ravels y.
  • The redistribute step uses ghost cells to perform the reshape.
  • Used the decorator on each _matvec and _rmatvec (similar to what we do for pylops.FirstDerivative) and made changes to the zero edge cases accordingly.
  • Currently I have made it work for axis=0, which is the main requirement (we may also be able to do it for other axes, but will keep that for later, so we don't have axis as a parameter).
  • Updated the tests so that dims is taken both as an int and as a tuple.

This is what I came up with; it seemed to me the most apt approach. What do you think?
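A simplified sketch of the decorator idea described above, written against plain NumPy arrays just to show the pattern (reshape the flat input, apply the stencil, flatten the output); the names here are illustrative, and the actual PR operates on DistributedArray with ghost cells for the misaligned case:

from functools import wraps
import numpy as np

def reshaped(func):
    # reshape the flat input to self.dims before the stencil, flatten the result after
    @wraps(func)
    def wrapper(self, x):
        y = func(self, x.reshape(self.dims))
        return y.ravel()
    return wrapper

class ToyFirstDerivative:
    def __init__(self, dims):
        self.dims = dims

    @reshaped
    def _matvec(self, x):
        # forward difference along axis 0, zero on the last row
        y = np.zeros_like(x)
        y[:-1] = x[1:] - x[:-1]
        return y

Dop = ToyFirstDerivative(dims=(6, 10, 8))
print(Dop._matvec(np.arange(6 * 10 * 8, dtype=float)).shape)  # (480,)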

@mrava87
Contributor

mrava87 commented Aug 2, 2023

Thanks @rohanbabbar04, but I am a bit confused by the approach you took... and a bit worried that we are doing a lot of additional communication that may not be needed (mostly referring to the @redistribute decorator).

First of all, I am not really sure I understand what redistribute does?

Second, perhaps I am missing something, but let me try to explain what I think is an easier way which does not require any communication other than for the ghost cells within the derivative part:

  • you have arr = DistributedArray(global_shape=ny*nx*nz, axis=0). Assuming ny is divisible by size, each rank has a portion of this 1d array. However, we can reinterpret it as if it were a 3d array (ny, nx, nz), where each rank will have an array of size (ny//size, nx, nz).
  • you make an operator Fop = pylops_mpi.basicoperators.FirstDerivative(dims=(ny, nx, nz), axis=0)
  • you apply Fop @ x; each rank internally has access to a portion of arr of size ny//size*nx*nz which, using the knowledge of dims, can be reshaped into (ny//size, nx, nz) (instead of redistribute you may think of having something like the pylops reshape). Now the .add_ghost_cells method is called and the stencil is applied (see the sketch after this list).
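A rough sketch of that flow, under the stated assumptions (ny divisible by size); plain NumPy stands in for each rank's local portion, and the ghost-cell exchange is only indicated in comments:

import numpy as np

ny, nx, nz = 6, 10, 8
size = 3                       # hypothetical number of ranks
ny_local = ny // size          # rows owned by each rank

# each rank holds a flat chunk of arr of size ny_local * nx * nz ...
local_flat = np.arange(ny_local * nx * nz, dtype=float)

# ... which can be reinterpreted as a local 3d block without any communication
local_3d = local_flat.reshape(ny_local, nx, nz)
print(local_3d.shape)          # (2, 10, 8)

# in the operator, one ghost row from the neighbouring rank is now appended
# along axis 0 (DistributedArray.add_ghost_cells in this PR) and the stencil
# is applied, e.g. y_forward = ghosted_x[1:] - ghosted_x[:-1]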

Let's try to agree on this first before changing the code further :D

@mrava87
Contributor

mrava87 commented Aug 2, 2023

Ok, perhaps I get the point... are you trying to handle a case where the operator and x are not agreeing on how the array is distributed?

I would say this should be something users have to ensure, and our codes should only check and raise an error otherwise... my worry is that if we silently do this we will end up with inefficient code, as users will not bother about aligning things and internally we will do a lot more communication than needed to fix laziness on the user side... or maybe I don't get what the code is trying to handle 😕

@rohanbabbar04
Collaborator Author

rohanbabbar04 commented Aug 2, 2023

@mrava87 Yup, you are right...
Take for example dims=(10, 5, 10), so x will have a shape of (500,). Now if we use 3 processes, x will have local shapes of (167,), (167,) and (166,) at the ranks,
whereas for dims it would be (4, 5, 10) -> (200,); (3, 5, 10) -> (150,); (3, 5, 10) -> (150,).
These shapes do not align. One way is what I have done; the other is to ask the users (like you said) to change the number of processes so that it exactly divides the size along axis=0 (and hence the flattened versions of both will align).
As for your first comment, that is indeed what I have done, but I thought about handling it this way for the case where dims does not align with the shape of x.
There is extra calculation when dims and x are not aligned, but no communication currently happens when they align with each other, since cells_back=0.
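For concreteness, the two splits in that example can be checked with a few lines (illustrative only):

import numpy as np

nproc = 3
flat = 500                     # x has global_shape (500,)
dims = (10, 5, 10)

# default even split of the flat array
flat_split = [len(c) for c in np.array_split(np.arange(flat), nproc)]
print(flat_split)              # [167, 167, 166]

# split implied by dims along axis 0: (4, 5, 10), (3, 5, 10), (3, 5, 10)
dims_split = [len(c) * dims[1] * dims[2]
              for c in np.array_split(np.arange(dims[0]), nproc)]
print(dims_split)              # [200, 150, 150]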
What do you think?

@rohanbabbar04
Collaborator Author

Sorry I now see that the communication can be really ugly at times, so we should ask the users to reshape their arrays so that it doesn't affect other things...

@rohanbabbar04
Collaborator Author

So @mrava87, as you mentioned, I have now raised an error if np.prod(dims) != np.prod(x.shape), which means the user should change the number of processes (to something that divides dims[0])...
Other than that, I think all your suggested points are covered (currently it works for axis=0)

@mrava87
Contributor

mrava87 commented Aug 2, 2023

Right, I see the point about using dims or shape for Nd or 1D arrays which, even when they are of the same size, can be distributed differently… but would this case ever happen? Here you say that you have an array of size (500,) and an operator with dims (4, 5, 10), but since their total sizes are different they won't talk to each other properly.

Let's however assume that we have this case with nproc=3:

x (500, ) -> 167, 167, 166
Op (10, 5, 10) -> (4, 5, 10), (3, 5, 10), (3, 5, 10)

this will be again a problem...

Having said that, I am myself unsure right now what is best; just a few thoughts:

  • your approach: solves the problem under the hood, but it does so in a rather expensive way… however, it is good that if things are aligned no communication happens (I didn't realize this would be the case), so the cost of the decorator is only local (some local copies..)
  • other approach: forces the user to think more, but it may be a bit stiff. Imagine you have the case above and you cannot change the physical size of the problem (10, 5, 10); then you will really need to either go down to nproc=2 or do some padding, neither of which is optimal

The only way I can think of that strikes a balance between performance and user friendliness is to allow users to choose the splitting of the array, by adding a parameter to DistributedArray. We can default it to None, keeping the current behavior, and let users choose it only in scenarios where this is useful. What do you think?
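If such a parameter existed, a user could pick a dims-aligned split for the (10, 5, 10) example like this (an illustrative sketch; the parameter name and exact API are only being proposed at this point in the thread):

import numpy as np

dims, nproc = (10, 5, 10), 3

# split axis 0 of dims across ranks, then flatten each chunk into a 1d local shape
rows = [len(c) for c in np.array_split(np.arange(dims[0]), nproc)]   # [4, 3, 3]
local_shapes = [(r * dims[1] * dims[2],) for r in rows]              # [(200,), (150,), (150,)]
print(local_shapes)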

@rohanbabbar04
Collaborator Author

rohanbabbar04 commented Aug 2, 2023

For the other approach, one case is that if dims[0] is exactly divisible by the number of processes, the first derivative can be computed directly.
So I think the user can change the number of processes and make sure that it divides dims[0], so we don't need to do any communication.
Which case do you think we should go for?
I have tried the code for both cases; whichever is preferred we can go with...

@rohanbabbar04
Collaborator Author

Also, my approach is expensive only in the cases where the local shapes differ; no communication takes place when they are aligned (it is, I would say, more convenient for the user)...

@mrava87
Contributor

mrava87 commented Aug 2, 2023

Other than this, I think this new version looks good and almost ready to go :D

@mrava87
Contributor

mrava87 commented Aug 2, 2023

So I think the user can change the number of processes and make sure that it divides dims[0], so we don't need to do any communication

In real life I think this isn't always possible... what if dims[0]=79 (this is what the data size is and you don't want to remove any part to make it divisible..)? Then there is no way to find a number of ranks that divides it...

Also, take a look at my last sentence, which I just added to the previous comment. Perhaps having this option + your method will be the best way to go. So lazy users will not need to think but will incur some extra communication cost (this could also be during quick development), whilst users who care about real performance will choose the split of the array in a manner that is consistent with that of the operator...

@rohanbabbar04
Collaborator Author

rohanbabbar04 commented Aug 2, 2023

The only way I can think of that strikes a balance between performance and user friendliness is to allow users to choose the splitting of the array, by adding a parameter to DistributedArray. We can default it to None, keeping the current behavior, and let users choose it only in scenarios where this is useful.

Can you explain it a bit more?
Will we allow users to decide the local_shapes at each rank?
And for clarification, when you say my method, do you mean the expensive one or the current one?

@mrava87
Contributor

mrava87 commented Aug 2, 2023

That's correct. So far the local_shape is automatically detected, which may be what we want in most scenarios, but in a few cases like this one we will have more freedom to choose a different subdivision between ranks such that things align with an operator acting on one axis of a reshaped view of the total array :)

I think this should be doable without modifying DistributedArray that much, shouldn't it?

And, by your method, I mean the expensive one which handles everything under the hood... I am just thinking about a way to be flexible but also allow users to be more selective and avoid communication overheads when avoidable :D

@rohanbabbar04
Collaborator Author

So if I am right, we will just ask them for the local shapes,
like DistributedArray(500, local_shapes=[(200,), (200,), (100,)])

@mrava87
Contributor

mrava87 commented Aug 2, 2023

Indeed, with default local_shape=None... and we build it from the global_shape as we already do

@rohanbabbar04
Collaborator Author

Okay... to summarize everything:

  • We give users the option to select local shapes to reduce communication.
  • We also provide flexibility: if the user sets the local shapes to None, we use the more expensive method.
  • And for the arrays whose local shapes already align well with dims, we solve it as is...

@rohanbabbar04
Collaborator Author

Indeed, with default local_shape=None... and we build it from the global_shape as we already do

Yeah, the changes shouldn't be much..

@rohanbabbar04
Collaborator Author

Hi @mrava87, as we discussed, this PR handles and solves two problems:

  • Dims aligned (solved)
  • Dims not aligned (we add more communication to align)

I will add the local_shapes parameter in a new PR, as I think we may want to handle it wherever DistributedArray is called in our codebase, which is easy since we just need to add the parameter everywhere.
Plus, I was thinking we should add it after doing MPISecondDerivative, as the concept remains the same...

@mrava87
Contributor

mrava87 commented Aug 3, 2023

Perfect, agree!

@rohanbabbar04
Collaborator Author

rohanbabbar04 commented Aug 3, 2023

Perfect, agree!

@mrava87
Is this good to go?

@mrava87
Contributor

mrava87 commented Aug 3, 2023

Sure :)

rohanbabbar04 merged commit 4d6ed3a into main on Aug 4, 2023
15 checks passed
rohanbabbar04 deleted the deriv branch on August 6, 2023 05:40