Fix array_to_datum/datum_to_array crashing on NumPy >= 2.0 #7108

Open
Chessing234 wants to merge 1 commit into BVLC:master from Chessing234:fix/io-py-numpy-2-compat-frombuffer-tobytes

Bug

python/caffe/io.py's datum (de)serializers still use two NumPy call forms that have been deprecated for years and no longer work on current NumPy 2.x releases:

# array_to_datum, line 76
if arr.dtype == np.uint8:
    datum.data = arr.tostring()

# datum_to_array, line 89
if len(datum.data):
    return np.fromstring(datum.data, dtype=np.uint8).reshape(
        datum.channels, datum.height, datum.width)
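
  • numpy.ndarray.tostring was deprecated in NumPy 1.19 in favour of tobytes and has since been removed from the NumPy 2.x series.
  • Binary-mode numpy.fromstring (the default, sep='') was deprecated in NumPy 1.14 in favour of frombuffer and is now rejected; the function itself remains for parsing text with an explicit sep.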

On a NumPy release where these removals have taken effect, array_to_datum (save path) raises AttributeError: 'numpy.ndarray' object has no attribute 'tostring', and datum_to_array (load path for uint8 datums, e.g. image datasets) fails because np.fromstring no longer accepts binary input. The second failure is not an AttributeError on the module, since np.fromstring still exists for text parsing. Either way, pycaffe data ingestion breaks for anyone on a modern scientific-Python stack.

Fix

Switch to the binary-safe replacements NumPy's deprecation notices have been directing users to for years:

     if arr.dtype == np.uint8:
-        datum.data = arr.tostring()
+        datum.data = arr.tobytes()
     if len(datum.data):
-        return np.fromstring(datum.data, dtype=np.uint8).reshape(
+        return np.frombuffer(datum.data, dtype=np.uint8).reshape(
             datum.channels, datum.height, datum.width)

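Both replacements yield the same data as the deprecated calls: ndarray.tobytes() produces the same bytes as tostring() (the method was renamed because "tostring" misled users into expecting a Python str), and np.frombuffer(buf, dtype=np.uint8) reads those bytes back into a uint8 ndarray with the same contents and, after the reshape, the same shape. One behavioural difference is worth noting: np.fromstring returned a writable copy, whereas np.frombuffer over a bytes object returns a read-only view, so callers that modify the result of datum_to_array in place would need to call .copy() first. The replacements also work on NumPy 1.x: tobytes has been available since NumPy 1.9, and frombuffer is older still.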
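The round trip through the two replacement calls can be checked without a caffe build. The following is only a sketch: FakeDatum is a hypothetical stand-in for caffe_pb2.Datum that carries just the fields the uint8 path touches, and the two functions mirror the patched branches rather than reproducing the full io.py implementations.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FakeDatum:
    """Hypothetical stand-in for caffe_pb2.Datum (uint8 fields only)."""
    channels: int = 0
    height: int = 0
    width: int = 0
    data: bytes = b""


def array_to_datum(arr):
    """Serialize a 3-D uint8 array (channels x height x width)."""
    if arr.ndim != 3:
        raise ValueError("Incorrect array shape.")
    if arr.dtype != np.uint8:
        raise TypeError("this sketch only covers the uint8 branch")
    datum = FakeDatum()
    datum.channels, datum.height, datum.width = arr.shape
    datum.data = arr.tobytes()  # replaces arr.tostring()
    return datum


def datum_to_array(datum):
    """Rebuild the uint8 array from the raw bytes stored in the datum."""
    flat = np.frombuffer(datum.data, dtype=np.uint8)  # replaces np.fromstring
    return flat.reshape(datum.channels, datum.height, datum.width)


image = np.arange(2 * 3 * 4, dtype=np.uint8).reshape(2, 3, 4)
restored = datum_to_array(array_to_datum(image))
```

If the replacements behave as described, restored has the same shape, dtype, and contents as image.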

Two one-token renames, no other changes.
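One thing reviewers may want to confirm is how callers use the array returned by datum_to_array. np.fromstring returned a writable copy, while np.frombuffer over an immutable bytes object returns a read-only view. The sketch below illustrates the difference with a made-up six-byte payload (not taken from the PR) and shows that a .copy() restores writability.

```python
import numpy as np

# Made-up payload standing in for datum.data (1 channel, 2 x 3 pixels).
payload = bytes([1, 2, 3, 4, 5, 6])

# frombuffer shares memory with the immutable bytes object, so the
# resulting array (and its reshaped view) is read-only.
view = np.frombuffer(payload, dtype=np.uint8).reshape(1, 2, 3)

try:
    view[0, 0, 0] = 255
    write_failed = False
except ValueError:
    # NumPy refuses to write into a read-only array.
    write_failed = True

# An explicit copy owns its memory and can be modified freely.
editable = view.copy()
editable[0, 0, 0] = 255
```

Code that only reads the returned array is unaffected; code that mutates it in place would need the extra copy.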

python/caffe/io.py's datum (de)serializers call two NumPy APIs that
were deprecated long ago and no longer work on current NumPy 2.x:

    datum.data = arr.tostring()                              # line 76
    return np.fromstring(datum.data, dtype=np.uint8).reshape # line 89

- numpy.ndarray.tostring was deprecated in favour of tobytes in NumPy
  1.19 and has since been removed from NumPy 2.x.
- Binary-mode numpy.fromstring was deprecated in favour of frombuffer
  in NumPy 1.14 and is now rejected; the function remains for text
  parsing only.

On a NumPy release where these removals have taken effect, both
array_to_datum (save path) and datum_to_array (load path for uint8
datums, e.g. image datasets) raise on their first call, breaking
pycaffe data ingestion.

Switching to the explicit binary-safe replacements:

- arr.tostring()           -> arr.tobytes()
- np.fromstring(buf, ...)  -> np.frombuffer(buf, ...)

is exactly the migration NumPy's own deprecation notices recommend, and
the data matches the deprecated calls byte-for-byte (same bytes out on
write, same array contents on read; frombuffer returns a read-only view
where fromstring returned a copy). Works on the NumPy 1.x versions
caffe already supported.