mpi: Fix data distribution bugs [part 2] #1949

Merged: 10 commits merged into master from fix_mpi_slicing_p2 on Sep 14, 2022
Conversation

@rhodrin (Contributor) commented on Jun 22, 2022

Fix for issues #1862 and #1892 + associated tests.

@rhodrin added the bug-py, WIP (still work in progress) and MPI (mpi-related) labels on Jun 22, 2022
codecov bot commented on Jun 22, 2022

Codecov Report

Merging #1949 (29112c5) into master (12995d3) will increase coverage by 0.00%.
The diff coverage is 92.68%.

@@           Coverage Diff            @@
##           master    #1949    +/-   ##
========================================
  Coverage   87.90%   87.91%            
========================================
  Files         214      214            
  Lines       36504    36613   +109     
  Branches     5513     5538    +25     
========================================
+ Hits        32090    32189    +99     
- Misses       3897     3907    +10     
  Partials      517      517            
Impacted Files              Coverage Δ
devito/data/utils.py        89.12% <18.18%> (-3.46%) ⬇️
devito/data/data.py         95.79% <100.00%> (+0.62%) ⬆️
tests/conftest.py           84.41% <100.00%> (-0.11%) ⬇️
tests/test_data.py          97.87% <100.00%> (+0.11%) ⬆️
devito/ir/support/basic.py  91.98% <0.00%> (-0.22%) ⬇️


devito/data/data.py (review thread resolved)
@@ -366,6 +420,11 @@ def _normalize_index(self, idx):

def _process_args(self, idx, val):
"""If comm_type is parallel we need to first retrieve local unflipped data."""
if (len(as_tuple(idx)) < len(val.shape)) and (len(val.shape) <= len(self.shape)):
Contributor:

This "looks" expensive, or am I wrong?
Is _process_args called for all points?

@FabioLuporini (Contributor) left a comment:

Great! Only minor comments left.

def _prune_shape(self, shape):
    # Reduce distributed MPI `Data`'s shape to that of an equivalently
    # sliced numpy array.
    decomposition = tuple([d for d in self._decomposition if d.size > 1])
Contributor:

Nitpicking: tuple(d for d in ...) is fine -- you don't need the inner list comprehension, tuple([...]).
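For context, a minimal plain-Python illustration of the nitpick (the sizes list below is made up, not Devito data): passing a generator expression to tuple() builds the same result without the intermediate list.

```python
# Plain-Python illustration: both forms produce the same tuple, but the
# generator expression skips the temporary list.
sizes = [1, 4, 1, 8]

with_list = tuple([s for s in sizes if s > 1])    # builds a list first
with_genexp = tuple(s for s in sizes if s > 1)    # feeds tuple() directly

assert with_list == with_genexp == (4, 8)
```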

def _check_idx(func):
"""Check if __getitem__/__setitem__ may require communication across MPI ranks."""
@wraps(func)
def wrapper(data, *args, **kwargs):
glb_idx = args[0]
if len(args) > 1 and isinstance(args[1], Data) \
is_gather = True if (kwargs and isinstance(kwargs['gather_rank'], int)) \
Contributor:

is_gather = kwargs and isinstance(...)

@@ -190,7 +202,45 @@ def __str__(self):
def __getitem__(self, glb_idx, comm_type, gather_rank=None):
    loc_idx = self._index_glb_to_loc(glb_idx)
    is_gather = True if isinstance(gather_rank, int) else False
Contributor:

is_gather = isinstance(...)

devito/data/utils.py (review thread resolved)
if reshape and (0 not in reshape) and (reshape != retval.shape):
    return retval.reshape(reshape)
if not is_gather:
    newshape = tuple([s for s, i in zip(retval.shape, loc_idx)
Contributor:

As before, you may drop the [ ].

@@ -190,7 +202,45 @@ def __str__(self):
def __getitem__(self, glb_idx, comm_type, gather_rank=None):
    loc_idx = self._index_glb_to_loc(glb_idx)
    is_gather = True if isinstance(gather_rank, int) else False
    if comm_type is index_by_index or is_gather:
        if is_gather and comm_type == gather:
Contributor:

comm_type is gather
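A generic illustration of the `is` vs `==` point (plain Python, not the Devito sources; CommType and the two sentinels below are made up): for sentinel-style singletons, identity comparison states the intent directly and cannot be fooled by a custom __eq__.

```python
# Generic sketch: comparing against sentinel objects with `is`.
class CommType:
    """Stand-in for a named communication-type sentinel."""
    def __init__(self, name):
        self.name = name

gather = CommType('gather')
index_by_index = CommType('index_by_index')

comm_type = gather

assert comm_type is gather              # identity: the very same object
assert comm_type is not index_by_index  # a different sentinel
```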

# sliced numpy array.
decomposition = tuple([d for d in self._decomposition if d.size > 1])
retval = self.reshape(shape)
retval._decomposition = decomposition
Contributor:

reshape creates a new Data, I presume, so why do we have to reset the _decomposition here? I would expect this to happen in the constructor, or at least in __array_finalize__, so why isn't that the case?

Contributor Author (@rhodrin):

After a _reshape, the _decomposition is set to None.

As I mentioned previously, I was thinking of overriding reshape for our Data type, but that turned out to be a bit of a mess. These 'reductions' are very specific, which is why we can get away with the above. We can maybe talk about this a bit more, though.

Contributor:

Doesn't numpy have a no-copy reshape with view?

Contributor Author (@rhodrin):

If able, the method returns a view. From the numpy documentation:

"This will be a new view object if possible; otherwise, it will be a copy. Note there is no guarantee of the memory layout (C- or Fortran- contiguous) of the returned array."
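A minimal standalone sketch of the mechanics (plain numpy, not Devito's Data; TaggedArray and _tag are made-up stand-ins): reshape hands back a new array object, usually a view of the same buffer, and whatever extra attribute that new object carries is decided by __array_finalize__.

```python
import numpy as np

class TaggedArray(np.ndarray):
    """Toy ndarray subclass with one extra attribute, standing in for Data."""

    def __new__(cls, shape, tag=None):
        obj = np.zeros(shape).view(cls)
        obj._tag = tag
        return obj

    def __array_finalize__(self, obj):
        # Runs for views and reshapes too; propagate the parent's tag,
        # defaulting to None when there is nothing to inherit.
        self._tag = getattr(obj, '_tag', None)

a = TaggedArray((4, 4), tag='decomposition-info')
b = a.reshape(2, 8)

print(np.shares_memory(a, b))  # True: a view, no data copy
print(b._tag)                  # 'decomposition-info', set via __array_finalize__
```

If __array_finalize__ instead dropped the attribute to None, the caller would have to restore it by hand, which is the situation described above.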

@@ -190,7 +202,45 @@ def __str__(self):
def __getitem__(self, glb_idx, comm_type, gather_rank=None):
    loc_idx = self._index_glb_to_loc(glb_idx)
    is_gather = True if isinstance(gather_rank, int) else False
    if comm_type is index_by_index or is_gather:
        if is_gather and comm_type == gather:
Contributor:

Refresh my mind please -- what does the user need to write so that we end up here?

Contributor Author (@rhodrin):

For MPI gathers, the decorator will infer how to gather based on the slice (or lack thereof) supplied by the user. (That is, gather vs index_by_index is set by the decorator, not by the user.)
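To make the dispatch concrete, here is a hypothetical sketch of the pattern being described; the names infer_comm_type, GATHER and INDEX_BY_INDEX are made up, and the real _check_idx decorator in devito/data/data.py is more involved.

```python
from functools import wraps

GATHER = 'gather'
INDEX_BY_INDEX = 'index_by_index'

def infer_comm_type(func):
    """Pick a communication type from the index, so the caller never does."""
    @wraps(func)
    def wrapper(data, glb_idx, **kwargs):
        gather_rank = kwargs.get('gather_rank', None)
        if isinstance(gather_rank, int) and glb_idx == slice(None):
            comm_type = GATHER          # full-domain access with a target rank
        else:
            comm_type = INDEX_BY_INDEX  # anything fancier may need
                                        # point-by-point communication
        return func(data, glb_idx, comm_type, **kwargs)
    return wrapper

class Demo:
    @infer_comm_type
    def __getitem__(self, glb_idx, comm_type, gather_rank=None):
        return comm_type

print(Demo()[:])                                       # index_by_index
print(Demo().__getitem__(slice(None), gather_rank=0))  # gather
```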

return retval
else:
return None
elif comm_type is index_by_index or is_gather:
# Retrieve the pertinent local data prior to mpi send/receive operations
Contributor:

mpi -> MPI?

g.data[0, :, :] = dat1
f1.data[:] = g.data[0, ::-1, ::-1]
result = np.array(f1.data[:])
if LEFT in glb_pos_map[x] and LEFT in glb_pos_map[y]:
Contributor:

These lines could be done parametrically as well, but it's pretty clear this way; not sure parametrizing would be better.

def _check_idx(func):
"""Check if __getitem__/__setitem__ may require communication across MPI ranks."""
@wraps(func)
def wrapper(data, *args, **kwargs):
glb_idx = args[0]
if len(args) > 1 and isinstance(args[1], Data) \
is_gather = kwargs and isinstance(kwargs['gather_rank'], int)
Contributor:

Is it guaranteed to have gather_rank? Maybe is_gather = isinstance(kwargs.get('gather_rank', None), int)
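A quick plain-Python check of the difference: indexing kwargs raises when the keyword was never passed, while .get falls back to a default.

```python
# kwargs['gather_rank'] raises KeyError if the keyword is absent;
# kwargs.get(...) degrades gracefully instead.
def wrapper(**kwargs):
    return isinstance(kwargs.get('gather_rank', None), int)

print(wrapper(gather_rank=0))  # True  -- rank 0 still counts as a gather
print(wrapper())               # False -- no KeyError raised
```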

is_gather = True if isinstance(gather_rank, int) else False
if comm_type is index_by_index or is_gather:
is_gather = isinstance(gather_rank, int)
if is_gather and comm_type is gather:
Contributor:

Why not just if gather_rank and ...?

Contributor Author (@rhodrin):

0 is a valid rank, so I think we need this?
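A one-line check of the point (plain Python): rank 0 is falsy, so a bare truthiness test would silently drop gathers to rank 0.

```python
gather_rank = 0
print(bool(gather_rank))             # False -- `if gather_rank:` would misfire
print(isinstance(gather_rank, int))  # True  -- the explicit type check works
```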

comm = self._distributor.comm
rank = comm.Get_rank()

sendbuf = np.array(self[:].flatten())
Contributor:

flatten makes a copy, no? Wouldn't something like self.flat work here (a 1D view of the ndarray, without a copy)?

Contributor Author (@rhodrin):

Yes, indeed. Changed this to self.flat[:].
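A small numpy-only illustration of the copy-versus-view distinction behind the suggestion (independent of the Devito change itself): flatten() always allocates a new array, while ravel() and the .flat iterator work on the original buffer when the layout allows.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

flat_copy = a.flatten()   # always a copy
flat_view = a.ravel()     # a view when possible
it = a.flat               # np.flatiter over the original buffer

print(np.shares_memory(a, flat_copy))  # False
print(np.shares_memory(a, flat_view))  # True
print(it[5] == a[1, 1])                # True: same element, no extra copy
```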


# If necessary, add the time index to the `topology` as this will
# be required to correctly construct various maps.
if len(np.amax(dat_len)) > len(topology):
Contributor:

Is amax really needed instead of max?
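For reference, a small standalone comparison (the dat_len list below is a made-up stand-in for the variable in the diff): the builtin max compares shapes lexicographically, whereas np.amax, which is equivalent to np.max, reduces elementwise over an axis.

```python
import numpy as np

# Made-up stand-in: a list of per-rank shapes.
dat_len = [(4, 4), (4, 5), (5, 4)]

print(max(dat_len))                         # (5, 4) -- builtin, lexicographic
print(np.amax(dat_len, axis=0))             # [5 5]  -- elementwise maximum
print(np.amax(dat_len) == np.max(dat_len))  # True   -- amax and max agree
```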

@mloubout merged commit a647eca into master on Sep 14, 2022
@mloubout deleted the fix_mpi_slicing_p2 branch on September 14, 2022, 17:46
This was linked to issues on Sep 19, 2022
@georgebisbas mentioned this pull request on Sep 19, 2022
Labels: bug-py, MPI (mpi-related)
Projects: none yet
Development: successfully merging this pull request may close these issues:
  MPI gather for TimeFunction
  MPI slicing broken

4 participants