## Illustrating some issues with `skimage.segmentation.relabel_sequential`

In [2]:
import numpy as np
from skimage.segmentation import relabel_sequential



## Maximum label value

The maximum label value that can be relabeled appears to be the maximum of the platform's default integer type (`np.int`, which is `int32` in this session) minus `1`. Larger labels, such as the maximum signed integer or the maximum unsigned integer, cause issues. The reason seems to be that the new map is initialized with dtype `np.min_scalar_type(max_label)`, with `max_label` forced to an integer by `max_label = int(label_field.max())  # Ensure max_label is an integer`.
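The sign flip itself can be reproduced with plain NumPy. The following is a minimal sketch of the suspected mechanism, not the library's actual code: it assumes the size of the map is computed as "maximum label plus one" while still in `int32`, and the names `max_label` and `size` are chosen here for illustration only.

```python
import numpy as np

i32 = np.iinfo(np.int32)

# One-element array holding the largest int32 label.
max_label = np.array([i32.max], dtype=np.int32)

# Adding 1 stays in int32 and wraps around to the most negative value.
size = max_label + 1
print(size.dtype, size[0])

# A negative size is rejected by NumPy when allocating the map.
try:
    np.zeros(int(size[0]), dtype=int)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If the size were computed in a wider integer type before the increment, the wraparound would not occur, although the allocation could still be very large.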


In [6]:
# get some info about the maximum label type
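# np.int (an alias of the built-in int) was removed in NumPy 1.24;
# np.int_ is the platform default integer type
ii = np.iinfo(np.int_)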
i32 = np.iinfo(np.int32)
i64 = np.iinfo(np.int64)
iui = np.iinfo(np.uint)

In [7]:
ii

iinfo(min=-2147483648, max=2147483647, dtype=int32)

In [8]:
iui

iinfo(min=0, max=4294967295, dtype=uint32)

## relabel_sequential does not allocate the correct output dtype

This needs a fix in the source, somewhere around the line linked below. The fix has to account for signed versus unsigned input, and for the fact that the maximum label might be incremented by 1:
https://github.com/scikit-image/scikit-image/blob/d7df7c8f7215ed476d625a19da5d24adfc6d3eef/skimage/segmentation/_join.py#L126

In [14]:
# this fails as we get a sign flip
label_field = np.array([1, 1, 5, 5, 8, 99, ii.max, 42])
print(f"label_field dtype is {label_field.dtype}")
relab, fw, inv = relabel_sequential(label_field)


label_field dtype is int32


ValueError: negative dimensions are not allowed

In [15]:
# if we subtract 1 it works
label_field = np.array([1, 1, 5, 5, 8, 99, ii.max-1, 42])
print(f"label_field dtype is {label_field.dtype}")
relab, fw, inv = relabel_sequential(label_field)


label_field dtype is int32


In [19]:
# if the input type is unsigned it works
label_field = np.array([1, 1, 5, 5, 8, 99, iui.max, 42], dtype=np.uint)
print(f"label_field dtype is {label_field.dtype}")
relab, fw, inv = relabel_sequential(label_field)

label_field dtype is uint32


## Storage requirements
Even a small array may require a huge amount of memory, if it contains large values.

In [26]:
print(f"initial array has {label_field.nbytes} bytes of storage,")
print(f"Forward map requires {fw.nbytes/(1024*1024*1024)} GB of storage")
print(f"Storage requirements increase by factor {fw.nbytes/label_field.nbytes} just for the forward map ")

initial array has 32 bytes of storage,
Forward map requires 16.0 GB of storage
Storage requirements increase by factor 536870912.0 just for the forward map 
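The numbers above are consistent with a dense forward map holding one entry per possible label value from 0 up to the maximum label. The sketch below redoes that arithmetic; `forward_map_nbytes` is a hypothetical helper, and the 4-byte entry size is an assumption inferred from the reported 16 GB, not something read from the library source.

```python
def forward_map_nbytes(max_label, itemsize):
    """Bytes needed for a dense lookup table with entries 0..max_label."""
    return (max_label + 1) * itemsize


uint32_max = 2**32 - 1   # largest label in the unsigned example
input_nbytes = 8 * 4     # 8 labels of 4 bytes each, as reported above

# Assumption: each forward-map entry occupies 4 bytes.
nbytes = forward_map_nbytes(uint32_max, itemsize=4)
size_gib = nbytes / 1024**3
factor = nbytes / input_nbytes
print(size_gib, factor)
```

The size of the map depends only on the largest label, not on the number of pixels or the number of distinct labels.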


For int64 values we easily exceed the maximum possible array size.

Above some value we get a `MemoryError`; NumPy seems to try to allocate the memory but fails:

In [30]:

label_field = np.array([1, 1, 5, 5, 8, 99, int(i64.max/12), 42])
print(f"label_field dtype is {label_field.dtype}")
relab, fw, inv = relabel_sequential(label_field)

label_field dtype is int64


MemoryError: 

For even larger values we get a `ValueError`; NumPy won't even try to allocate memory for an array of that size:

In [31]:

label_field = np.array([1, 1, 5, 5, 8, 99, int(i64.max/4), 42])
print(f"label_field dtype is {label_field.dtype}")
relab, fw, inv = relabel_sequential(label_field)

label_field dtype is int64


ValueError: array is too big; `arr.size * arr.dtype.itemsize` is larger than the maximum possible size.
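For comparison, the relabeled array alone can be obtained without any table indexed by label value. The sketch below is not how `relabel_sequential` works; it swaps in `np.unique` with `return_inverse=True`, whose memory use scales with the number of elements rather than with the largest label. `relabel_sparse` is a hypothetical name, and the sketch assumes non-negative labels with `0` treated as background.

```python
import numpy as np


def relabel_sparse(label_field, offset=1):
    """Relabel non-negative labels to offset, offset + 1, ... keeping 0 as 0.

    Hypothetical helper for illustration; returns the relabeled array and
    the sorted original labels.
    """
    label_field = np.asarray(label_field)
    labels, inverse = np.unique(label_field, return_inverse=True)
    # Restore the input shape; the inverse may come back flattened.
    inverse = inverse.reshape(label_field.shape)
    if labels.size and labels[0] == 0:
        # Index 0 is the background; the k-th nonzero label maps to offset + k - 1.
        relabeled = np.where(inverse == 0, 0, inverse + offset - 1)
    else:
        relabeled = inverse + offset
    return relabeled, labels


big = np.iinfo(np.int64).max // 4
relab, labels = relabel_sparse(np.array([1, 1, 5, 5, 8, 99, big, 42]))
print(relab)
```

The trade-off is that this gives up the dense forward and inverse maps that `relabel_sequential` returns; the sorted `labels` array can serve as a compact inverse lookup instead.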