ENH: speed-up mlab.contiguous_regions using numpy #4174

Merged
merged 4 commits into from Feb 28, 2015
31 changes: 17 additions & 14 deletions lib/matplotlib/mlab.py
@@ -3874,23 +3874,26 @@ def contiguous_regions(mask):
     """
     return a list of (ind0, ind1) such that mask[ind0:ind1].all() is
     True and we cover all such regions
-
-    TODO: this is a pure python implementation which probably has a much
-    faster numpy impl
     """
+    mask = np.asarray(mask, dtype=bool)
+
+    if not mask.size:
+        return []
+
+    # Find the indices of region changes, and correct offset
+    idx, = np.nonzero(mask[:-1] != mask[1:])
+    idx += 1
+
+    # List operations are faster for moderately sized arrays
+    idx = idx.tolist()

-    in_region = None
-    boundaries = []
-    for i, val in enumerate(mask):
-        if in_region is None and val:
-            in_region = i
-        elif in_region is not None and not val:
-            boundaries.append((in_region, i))
-            in_region = None
+    # Add first and/or last index if needed
Member

While we are polishing this bikeshed: it's probably most efficient to convert idx to a list at this point. Then the prepending and/or appending can be done with a total of 4 lines, as in my example. These operations are very fast with Python lists, probably faster than np.concatenate. I suspect the final indexing and zip operation would also be at least as fast if idx is already a list.

Contributor Author

Lists are faster on my system for arrays of fewer than 10,000 indices. Beyond that, arrays take over. Small arrays do seem like the more common use case, so I guess it does make sense to change it.
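For anyone who wants to reproduce the comparison discussed above, here is a rough timing sketch. The list-based variant follows the code in this diff. The `np.concatenate` variant is only a guess at what an array-only version might look like and is not taken from the PR. The names `regions_with_lists`, `regions_with_concatenate` and `time_variants` are made up for illustration, and the crossover point will depend on the machine and the NumPy version.

```python
import timeit

import numpy as np


def _change_points(mask):
    """Return the boolean mask and the indices where a new run begins."""
    mask = np.asarray(mask, dtype=bool)
    idx, = np.nonzero(mask[:-1] != mask[1:])
    return mask, idx + 1


def regions_with_lists(mask):
    # Variant suggested in the review: switch to a Python list early.
    mask, idx = _change_points(mask)
    if not mask.size:
        return []
    idx = idx.tolist()
    if mask[0]:
        idx = [0] + idx
    if mask[-1]:
        idx.append(len(mask))
    return list(zip(idx[::2], idx[1::2]))


def regions_with_concatenate(mask):
    # Hypothetical array-only variant, written for comparison only.
    mask, idx = _change_points(mask)
    if not mask.size:
        return []
    parts = []
    if mask[0]:
        parts.append([0])
    parts.append(idx)
    if mask[-1]:
        parts.append([len(mask)])
    idx = np.concatenate(parts)
    return list(zip(idx[::2].tolist(), idx[1::2].tolist()))


def time_variants(n_runs, run_length=10, repeats=5):
    """Time both variants on a mask with n_runs True runs and n_runs False runs."""
    mask = np.tile([True] * run_length + [False] * run_length, n_runs)
    return {
        f.__name__: timeit.timeit(lambda: f(mask), number=repeats)
        for f in (regions_with_lists, regions_with_concatenate)
    }


if __name__ == "__main__":
    for n_runs in (10, 1000, 20000):
        print(n_runs, time_variants(n_runs))
```

Both variants should return identical results; only the time spent building `idx` differs, so the numbers are best read as a relative comparison on one machine.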

+    if mask[0]:
+        idx = [0] + idx
+    if mask[-1]:
+        idx.append(len(mask))

-    if in_region is not None:
-        boundaries.append((in_region, i+1))
-    return boundaries
+    return list(zip(idx[::2], idx[1::2]))


 def cross_from_below(x, threshold):
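To make the change in `contiguous_regions` easier to follow, here is a standalone sketch that places the removed pure-Python loop next to the NumPy version and checks that they agree. The function names `regions_loop` and `regions_numpy` are illustrative and not part of matplotlib; the bodies follow the diff above, with comments added.

```python
import numpy as np


def regions_loop(mask):
    # Pure-Python version removed by the diff, kept here as a reference.
    in_region = None
    boundaries = []
    for i, val in enumerate(mask):
        if in_region is None and val:
            in_region = i
        elif in_region is not None and not val:
            boundaries.append((in_region, i))
            in_region = None
    if in_region is not None:
        boundaries.append((in_region, i + 1))
    return boundaries


def regions_numpy(mask):
    # NumPy version following the diff.
    mask = np.asarray(mask, dtype=bool)
    if not mask.size:
        return []
    # Position i + 1 starts a new run whenever mask[i] != mask[i + 1].
    idx, = np.nonzero(mask[:-1] != mask[1:])
    idx = (idx + 1).tolist()
    # A run touching either end has no flip marking it, so add that edge by hand.
    if mask[0]:
        idx = [0] + idx
    if mask[-1]:
        idx.append(len(mask))
    # Even entries are run starts, odd entries are the matching run ends.
    return list(zip(idx[::2], idx[1::2]))


mask = [True, True, False, False, True, False, True, True]
print(regions_numpy(mask))  # [(0, 2), (4, 5), (6, 8)]

# Compare the two versions on random masks, including empty ones.
rng = np.random.default_rng(0)
for _ in range(100):
    random_mask = rng.random(rng.integers(0, 50)) < 0.5
    assert regions_numpy(random_mask) == regions_loop(random_mask)
```

For the example mask, the flips occur at positions 2, 4, 5 and 6. Prepending 0 and appending 8 gives `[0, 2, 4, 5, 6, 8]`, which pairs up into the three regions printed.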
24 changes: 24 additions & 0 deletions lib/matplotlib/tests/test_mlab.py
@@ -2963,6 +2963,30 @@ def test_evaluate_equal_dim_and_num_lt(self):


 #*****************************************************************
+
+def test_contiguous_regions():
+    a, b, c = 3, 4, 5
+    # Starts and ends with True
+    mask = [True]*a + [False]*b + [True]*c
+    expected = [(0, a), (a+b, a+b+c)]
+    assert_equal(mlab.contiguous_regions(mask), expected)
+    d, e = 6, 7
+    # Starts with True ends with False
+    mask = mask + [False]*e
+    assert_equal(mlab.contiguous_regions(mask), expected)
+    # Starts with False ends with True
+    mask = [False]*d + mask[:-e]
+    expected = [(d, d+a), (d+a+b, d+a+b+c)]
+    assert_equal(mlab.contiguous_regions(mask), expected)
+    # Starts and ends with False
+    mask = mask + [False]*e
+    assert_equal(mlab.contiguous_regions(mask), expected)
+    # No True in mask
+    assert_equal(mlab.contiguous_regions([False]*5), [])
+    # Empty mask
+    assert_equal(mlab.contiguous_regions([]), [])
+
+
 #*****************************************************************

 if __name__ == '__main__':