
Opening corrupt Aperio SVS file causes Out-of-Memory error #102

Closed
ap-- opened this issue Oct 17, 2021 · 10 comments
Labels
bug Something isn't working

Comments

@ap--

ap-- commented Oct 17, 2021

Hi @cgohlke

I encountered a corrupted Aperio SVS file that causes a machine with 64 GB of RAM to run out of memory when instantiating a TiffFile. Creating the instance causes a dramatic increase in process memory usage in the TiffTag-loading part of tifffile, and the server ends up killing the Python process.
Ideally I would expect the code to fail early, so that cleanup can happen in Python.

Sadly I can't share the file, so here's some additional information:

The file size is 5958078846 bytes and it is recognized as TIFF.CLASSIC_LE.
tifffile detects tagno=3126 tags, almost all of which raise an unknown tag data type error.
Then, for tag tagno=2327, it calls TiffTag.from_file with offset=3256275384 and header=b'\x01\x00\x02\x00\x00?\x00\xeb5\x05\x92\x1b'


Due to the large count, a ~3.9 GB string is unpacked later in this code path, and some string operations then cause a massive increase in memory consumption.

I was wondering whether there are upper bounds on the amount of data a TiffTag can contain, so that this code path could error early.
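An early sanity check along these lines could look like the following sketch, which parses the classic little-endian 12-byte tag header quoted above. The type-size table and the MAX_TAG_BYTES limit are illustrative choices, not tifffile's actual implementation:

```python
import struct

# Byte size per classic-TIFF data type code (subset; illustrative)
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}

MAX_TAG_BYTES = 64 * 1024 * 1024  # hypothetical per-tag limit: 64 MiB

def check_tag_header(header: bytes, filesize: int) -> None:
    """Parse a classic little-endian 12-byte tag header and fail early
    instead of reading an implausibly large tag value."""
    code, dtype, count, valueoffset = struct.unpack('<HHII', header)
    if dtype not in TYPE_SIZES:
        raise ValueError(f'tag {code}: unknown data type {dtype}')
    size = count * TYPE_SIZES[dtype]
    if size > MAX_TAG_BYTES or valueoffset + size > filesize:
        raise ValueError(f'tag {code}: implausible value size {size} bytes')
```

Decoding the header from this issue gives code=1, dtype=2 (ASCII), count=3942661888, valueoffset=462554421, so a guard like this would reject the tag before any data is read.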

Cheers,
Andreas 😃

@cgohlke

cgohlke commented Oct 17, 2021

Thank you. You are right, unpacking bytes or ASCII type values is very inefficient and not necessary. This will be fixed in the next version.
As for corruption, there are checks for the number of tags (< 4096), a valid dtype, and valueoffset + size < filesize.

@cgohlke cgohlke added the bug Something isn't working label Oct 17, 2021
@ap--

ap-- commented Oct 18, 2021

Thanks for the quick reply. And thank you so much for all the great work you pour into tifffile!

@cgohlke

cgohlke commented Oct 18, 2021

I'm not able to reproduce this with a simple example:

tifffile.imwrite('test.tif', shape=(1,1), dtype='uint8', extratags=[(1, 2, 3942661888, b'1'*3942661888)])

Opening the file with TiffFile uses ~4 GB. The peak usage is ~8 GB, so at some point a copy is made, probably while decoding bytes to str:

t = TiffFile('test.tif')

In the case of your corrupted file with 3126 tags, there are probably many large tag values such that the process is running out of memory at some point.
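The decode copy is easy to reproduce in isolation: decoding a large bytes object to str allocates a second buffer while the original is still alive, roughly doubling usage. A minimal standalone sketch using tracemalloc (sizes are illustrative):

```python
import tracemalloc

n = 10_000_000  # 10 MB payload, large enough to dominate other allocations

tracemalloc.start()
data = b'x' * n
text = data.decode('ascii')  # decoding allocates a second ~n-byte buffer
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# Both buffers are alive here, so traced usage is roughly 2 * n bytes.
```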

@ap--

ap-- commented Oct 18, 2021

That sounds plausible. I'll try to build a test case for this file and will report back.

What would be a good way to detect that a file is corrupt without risking running out of memory?
For now, I could put together some code that counts the number of emitted tag data warnings and considers a file corrupt after N warnings.
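A warning-count heuristic could be sketched like this, assuming tifffile emits these warnings through the standard logging module on the 'tifffile' logger (true for current versions; older versions may differ). The probably_corrupt helper and the limit of 8 are hypothetical:

```python
import logging

class WarningCounter(logging.Handler):
    """logging.Handler that simply counts WARNING-and-above records."""
    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.count = 0

    def emit(self, record):
        self.count += 1

def count_warnings(logger_name, action):
    """Run action() while counting warnings emitted on the given logger."""
    counter = WarningCounter()
    logger = logging.getLogger(logger_name)
    logger.addHandler(counter)
    try:
        action()
    finally:
        logger.removeHandler(counter)
    return counter.count

def probably_corrupt(path, max_warnings=8):
    # Assumes tifffile routes tag warnings through the 'tifffile' logger.
    from tifffile import TiffFile

    def _open():
        with TiffFile(path):
            pass

    return count_warnings('tifffile', _open) > max_warnings
```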

@cgohlke

cgohlke commented Oct 18, 2021

> count the number of emitted tag data warnings and then just consider a file corrupt after N emitted warnings.

I just implemented that too, with the limit set to 8, but there is really no guarantee it will work for corrupt files. Also, it would be nice to at least be able to inspect the structure of corrupt tags.
Instead, I will try to read only inline tag values and the values of basic tags, and delay reading other tag values until they are accessed...
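The delayed loading described above could be sketched like this (a hypothetical class, not tifffile's actual implementation): keep only the offset and format at parse time, and read the value from the file on first access.

```python
import struct

class LazyTagValue:
    """Read a tag's value from the file only when it is first accessed."""

    def __init__(self, fh, offset, fmt):
        self._fh = fh        # open binary file handle
        self._offset = offset
        self._fmt = fmt      # struct format, e.g. '<66060528h'
        self._value = None
        self._loaded = False

    @property
    def value(self):
        if not self._loaded:
            size = struct.calcsize(self._fmt)
            self._fh.seek(self._offset)
            self._value = struct.unpack(self._fmt, self._fh.read(size))
            self._loaded = True
        return self._value
```

Opening a file then costs only a few bytes of bookkeeping per tag; a corrupt tag's oversized value is never read unless someone actually asks for it.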

@ap--

ap-- commented Oct 19, 2021

That sounds great!

In the meantime I ran some tests on the broken SVS file. The following was run with tifffile.__version__ == '2021.10.12'.

1. memory profiler

Here's the memory consumption over time on a machine with enough RAM (using memory-profiler):
[screenshot: memory-profiler plot, 2021-10-18]

2. filehandle logging

Here are the logs showing how many bytes are read for all corrupted tags:

import os
import types
from tifffile import FileHandle
from tifffile import TiffFile

class LoggingFileHandle:

    def __init__(self, file, mode=None, name=None, offset=None, size=None):
        self._obj = FileHandle(file, mode, name, offset, size)

    def __getattr__(self, item):
        attr = getattr(self._obj, item)
        if isinstance(attr, types.MethodType):
            def _logged_attr(other, *args, **kwargs):
                n = list(map(repr, args))
                n.extend(f"{k}={v!r}" for k, v in kwargs.items())
                print("CALL", f".{item}({', '.join(n)})")
                ret = attr(*args, **kwargs)
                if isinstance(ret, bytes) and len(ret) > 16:
                    print("RETURNED", len(ret), 'bytes:', ret[:16], "...(rest omitted)")
                elif ret is not None:
                    print("RETURNED", ret)
                return ret
            return _logged_attr.__get__(self, type(self))
        else:
            print("ACCESS", attr, f'type={type(attr).__name__}')
            return attr


fh = LoggingFileHandle(os.path.expanduser("~/broken.test.file.svs"))
t = TiffFile(fh)
# output
CALL .tell()
RETURNED 0
CALL .tell()
RETURNED 0
ACCESS broken.test.file.svs type=str
ACCESS rb type=str
CALL .tell()
RETURNED 0
CALL .seek(0, 2)
CALL .tell()
RETURNED 5958078846
CALL .seek(0)
ACCESS <built-in method fileno of _io.BufferedReader object at 0x7f989f8327d0> type=builtin_function_or_method
CALL .seek(0, 0)
CALL .read(4)
RETURNED b'II*\x00'
CALL .tell()
RETURNED 4
CALL .read(4)
RETURNED b'\xa2d\x16\xc2'
CALL .seek(3256247458, 0)
CALL .tell()
RETURNED 3256247458
CALL .read(2)
RETURNED b'6\x0c'
CALL .read(37512)
RETURNED 37512 bytes: b'\x83di\x9e\r_\xb5\xbcu\x0f<\xd2\r\xd3\x1d\xa8' ...(rest omitted)
CALL .seek(16781568, 0)
CALL .read(132121056)
RETURNED 132121056 bytes: b'\xc0=\xa8\xc6F{\xd3wg9\xa4\r\xd7 \x8a]' ...(rest omitted)
CALL .seek(462554421, 0)
CALL .read(3942661888)
RETURNED 3942661888 bytes: b'"2Y\x9b\x9c\x0e\x95\t\xd3\xd8\'\x92\x87%\x8f\x00' ...(rest omitted)
CALL .seek(1540266541, 0)
CALL .read(2815614684)
RETURNED 2815614684 bytes: b'H\x81\x86w6})\x108\x043g\xd2\x96\x9dN' ...(rest omitted)

So it essentially just loads two big chunks: the ~3.9 GB one and the ~2.8 GB one.

3. tracemalloc

And here's the tracemalloc output of the top 100 lines sorted by increase in memory consumption:

import os
import linecache
import tracemalloc
import types
from tifffile import TiffFile

def display_top(snapshot, key_type='lineno', limit=100):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)

    print("Top %s lines" % limit)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        print("#%s: %s:%s: %.1f KiB"
              % (index, frame.filename, frame.lineno, stat.size / 1024))
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print('    %s' % line)

    other = top_stats[limit:]
    if other:
        size = sum(stat.size for stat in other)
        print("%s other: %.1f KiB" % (len(other), size / 1024))
    total = sum(stat.size for stat in top_stats)
    print("Total allocated size: %.1f KiB" % (total / 1024))

tracemalloc.start()

t = TiffFile(os.path.expanduser("~/broken.test.file.svs"))

snap = tracemalloc.take_snapshot()
display_top(snap)
#output
Top 100 lines
#1: /home/user/environment/miniconda3/envs/broken-test-svs/lib/python3.8/site-packages/tifffile/tifffile.py:7832: 55408700.0 KiB
    value = unpack(fmt, fh.read(size))
#2: /home/user/environment/miniconda3/envs/broken-test-svs/lib/python3.8/enum.py:248: 9.6 KiB
    enum_member._value_ = member_type(*args)
#3: /home/user/environment/miniconda3/envs/broken-test-svs/lib/python3.8/enum.py:193: 6.6 KiB
    enum_class = super().__new__(metacls, cls, bases, classdict)
#4: /home/user/environment/miniconda3/envs/broken-test-svs/lib/python3.8/site-packages/tifffile/tifffile.py:11215: 6.4 KiB
    value = func() if callable(func) else func
#5: /home/user/environment/miniconda3/envs/broken-test-svs/lib/python3.8/enum.py:243: 5.2 KiB
    enum_member = __new__(enum_class, *args)
#6: /home/user/environment/miniconda3/envs/broken-test-svs/lib/python3.8/enum.py:428: 5.2 KiB
    super().__setattr__(name, value)
#7: /home/user/environment/miniconda3/envs/broken-test-svs/lib/python3.8/site-packages/tifffile/tifffile.py:12796: 3.7 KiB
    a[3:]
#8: /home/user/environment/miniconda3/envs/broken-test-svs/lib/python3.8/enum.py:272: 3.6 KiB
    enum_class._value2member_map_[value] = enum_member
#9: /home/user/environment/miniconda3/envs/broken-test-svs/lib/python3.8/enum.py:267: 3.6 KiB
    enum_class._member_map_[member_name] = enum_member
#10: <frozen importlib._bootstrap_external>:640: 3.1 KiB

[... omitted 90 lines with very little memory consumption ...]

90 other: 11.5 KiB
Total allocated size: 55408821.8 KiB

Which I guess suggests that line tifffile.py:7832 alone is responsible for ~55 GB of memory consumption?

Cheers,
Andreas 😃

@cgohlke

cgohlke commented Oct 19, 2021

Do you know the codes and types of the three large tags? I don't think it's the ASCII one mentioned earlier that's causing the issue. Maybe one of the large tag values is getting unpacked to Python integers.
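The cost of unpacking to Python integers is easy to see at small scale: unpacking n SHORT values with struct yields a tuple of n Python int objects at roughly 28 bytes each (plus an 8-byte tuple slot), versus 2 bytes each in the raw buffer. An illustrative measurement with sys.getsizeof:

```python
import struct
import sys

n = 1000
raw = struct.pack(f'<{n}H', *([12345] * n))   # n SHORT values, 2 bytes each
values = struct.unpack(f'<{n}H', raw)         # tuple of n Python int objects

raw_size = len(raw)                           # 2000 bytes of payload
# Tuple container plus one Python int object per element.
tuple_size = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
# tuple_size is a 15-20x blow-up over raw_size on CPython.
```

Scaled up to the counts seen here (tens of millions to billions of elements), that multiplier alone explains tens of gigabytes of allocations.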

@ap--

ap-- commented Oct 19, 2021

You're right:

TAG::2325 {
  'parent': <tifffile.tifffile.TiffFile object at 0x7f716dd1fe80>,
  'offset': 3256275360,
  'code': 4352,
  'dtype': <DATATYPES.SSHORT: 8>,
  'count': 66060528,
  'Type[value]': 'Tuple[int]',
  'valueoffset': 16781568
}
TAG::2327 {
  'parent': <tifffile.tifffile.TiffFile object at 0x7f716dd1fe80>,
  'offset': 3256275384,
  'code': 1,
  'dtype': <DATATYPES.ASCII: 2>,
  'count': 3942661888,
  'Type[value]': 'bytes',
  'valueoffset': 462554421
}
TAG::3026 {
  'parent': <tifffile.tifffile.TiffFile object at 0x7f716dd1fe80>,
  'offset': 3256283772,
  'code': 38699,
  'dtype': <DATATYPES.SHORT: 3>,
  'count': 1407807342,
  'Type[value]': 'Tuple[int]',
  'valueoffset': 1540266541
}
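As a cross-check, count × element size for these three tags reproduces exactly the sizes of the three big reads in the filehandle log earlier in the thread:

```python
# (element size in bytes, count) for the three oversized tags above
tags = {
    4352:  (2, 66060528),     # SSHORT, 2 bytes per element
    1:     (1, 3942661888),   # ASCII, 1 byte per character
    38699: (2, 1407807342),   # SHORT, 2 bytes per element
}
sizes = {code: esize * count for code, (esize, count) in tags.items()}
# Matches the reads in the log: 132121056, 3942661888 and 2815614684 bytes,
# about 6.4 GiB of raw tag data in total before any unpacking overhead.
```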

@cgohlke

cgohlke commented Nov 2, 2021

Should be mitigated by v2021.11.2, which delay-loads non-essential TIFF tag values.

@cgohlke cgohlke closed this as completed Nov 2, 2021
@ap--

ap-- commented Nov 8, 2021

Tested with all corrupted files that were problematic and it works perfectly now.
Thank you so much!
