Unable to allocate memory #345

@ankravch

[GCC 8.5.0 20210514 (Red Hat 8.5.0-24)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyreadstat as prs
>>> d,m=prs.read_dta("test.dta")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyreadstat/pyreadstat.pyx", line 296, in pyreadstat.pyreadstat.read_dta
  File "pyreadstat/_readstat_parser.pyx", line 1282, in pyreadstat._readstat_parser.run_conversion
  File "pyreadstat/_readstat_parser.pyx", line 955, in pyreadstat._readstat_parser.run_readstat_parser
  File "pyreadstat/_readstat_parser.pyx", line 877, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to allocate memory
>>>

From my investigation, the issue originates at L451 in readstat_dta_read.c, inside the dta_read_strls() function, which allocates memory for each string separately in a while loop. Later, at L445, the code fails to allocate a large contiguous chunk of memory because the heap is heavily fragmented.
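To make "fragmented" concrete, here is a minimal toy model in Python. It is only an illustration of the general phenomenon (plenty of free bytes in total, but no single contiguous hole big enough) and is not how ReadStat or glibc malloc actually behave. The `ToyHeap` class, its first-fit policy, the sizes, and the "release every other block" setup are all assumptions made up for this sketch.

```python
class ToyHeap:
    """First-fit allocator over a fixed address range (hypothetical toy, not glibc)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.holes = [(0, capacity)]  # sorted list of (start, length) free ranges

    def alloc(self, size):
        """Return the start address of a block, or None if no single hole fits."""
        for i, (start, length) in enumerate(self.holes):
            if length >= size:
                if length == size:
                    del self.holes[i]
                else:
                    self.holes[i] = (start + size, length - size)
                return start
        return None

    def release(self, start, size):
        """Free a block and merge it with directly adjacent holes."""
        self.holes.append((start, size))
        self.holes.sort()
        merged = []
        for s, n in self.holes:
            if merged and merged[-1][0] + merged[-1][1] == s:
                merged[-1] = (merged[-1][0], merged[-1][1] + n)
            else:
                merged.append((s, n))
        self.holes = merged

    def total_free(self):
        return sum(n for _, n in self.holes)

    def largest_hole(self):
        return max((n for _, n in self.holes), default=0)


def demo():
    heap = ToyHeap(capacity=1024)
    # Fill the heap with many small blocks, like one allocation per string.
    blocks = [heap.alloc(16) for _ in range(64)]
    # Contrived step: free every other block so the free space is scattered.
    for start in blocks[1::2]:
        heap.release(start, 16)
    return {
        "total_free": heap.total_free(),
        "largest_hole": heap.largest_hole(),
        "big_request": heap.alloc(256),  # needs one contiguous hole
    }


if __name__ == "__main__":
    print(demo())
```

In this toy run the heap ends up with 512 free units in total but no hole larger than 16, so the 256-unit request fails even though enough memory is nominally free.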

Reproducible example: https://www.dropbox.com/scl/fi/sx9cz7vjekvud3ail9ph3/test.dta?rlkey=7e5qmwl9tbuoa0967kq3uq65f&st=g3wxulnc&dl=0

With this file, L451 (the malloc for each string) was executed approximately 1.6 million times. After that, L445 failed to allocate 26 MB of contiguous heap memory.
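For anyone reproducing this, the following sketch records the process's address-space limits and peak resident memory next to the error. It may help to check whether a limit (for example one set with `ulimit -v`) is also in play. The helper names `memory_snapshot` and `read_with_diagnostics` are hypothetical, and the `resource` module is Unix-only. The exception class is imported from the module path shown in the traceback above.

```python
import resource
import sys


def memory_snapshot():
    """Return address-space limits and peak RSS for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "rlimit_as_soft": soft,  # resource.RLIM_INFINITY means no cap
        "rlimit_as_hard": hard,
        "peak_rss_kib": usage.ru_maxrss,  # reported in kilobytes on Linux
    }


def read_with_diagnostics(path):
    """Call read_dta and, on a ReadstatError, print a memory snapshot before re-raising."""
    # Imported here so memory_snapshot() stays usable without pyreadstat installed.
    import pyreadstat
    from pyreadstat._readstat_parser import ReadstatError

    try:
        return pyreadstat.read_dta(path)
    except ReadstatError as exc:
        print(f"read_dta failed: {exc}", file=sys.stderr)
        print(f"memory snapshot: {memory_snapshot()}", file=sys.stderr)
        raise
```

Calling `read_with_diagnostics("test.dta")` in place of `prs.read_dta("test.dta")` leaves the traceback unchanged and only adds the two diagnostic lines on stderr.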
