Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(process-files): quit Python interpreter after every batch to prevent "too many open files" error (DEV-2268) #402

Merged
merged 35 commits into from Jul 14, 2023

Conversation

jnussbaum
Copy link
Collaborator

resolves DEV-2268

@jnussbaum jnussbaum self-assigned this Jun 12, 2023
@linear
Copy link

linear bot commented Jun 12, 2023

DEV-2268 DSP-TOOLS: process-files errors: Too many open files

Description

On Vij's machine, the process-files command failed: The first half an hour went well, but then ca. 25'000 .orig files of TIFs could not be created.

After these 25'000 TIF errors, some dozens OSError: [Errno 24] Too many open files appear, together with some more thousands of .orig files of TIFs that cannot be created. (for more details: see "Terminal output" below).

The OSError seems to be always triggered by shutil.copyfile(in_file, orig_file_full_path) in the method _create_orig_file().

The 4 log files of 5 MB size each were not enough, they quickly were filled and overwritten. Every second ca. 1 MB of log entries are logged (!)

Analysis

A similar case on SO (https://stackoverflow.com/questions/40158725/40158989#40158989) leads me to the conclusion that too many files/sockets are open at the same time.

Open files/sockets could be:

  1. log file handlers
  2. images to process
    1. copy orig file (of image/video/other) to *.orig (that's the shutil.copyfile() from above that triggers the error)
    2. socket that communicates with SIPI container to create JP2
    3. SIPI container that writes JP2 (?)
    4. shell script that extracts the frames from video (when writing a frame, a file is opened?)
    5. write sidecar file (of image/video/other)

I assume the culprit is no. 2.

Possible solutions

Reduce number of threads

Difficult to achieve. It would perhaps be possible to read the system's ulimit and then adapt to it or tweak it. But the current implementation of process-files is that every thread does a lot of things, and among the many things, opens a file from time to time. So a reduction of threads (or tweaking of ulimit) would not exactly control the number of files that are open at the same time.

Catch OSError, retry until it works

I could try-catch the OSError, and retry until it works, for the cases that are under my control (i.e. copy orig file, socket, write sidecar file).

Steps to do

  • Only for process-files: increase number of log files
  • reduce log level / number of log entires (?)
  • try-catch OSError where it's possible

Terminal output

(dsp-tools-py3.10) bash-3.2$ dsp-tools process-files --input-dir=multimedia --output-dir=/Volumes/Thunderbay\ 1/tmp-test-sgv sgv-v8.7_v1.0_real-files.xml 
2023-06-08 15:17:35.683407: Created and started Sipi container 'sipi'.
2023-06-08 15:17:35.689103: Sipi container is running.
2023-06-08 15:17:45.376517: Found 102934 bitstreams in the XML file.
2023-06-08 15:17:45.898360: Start local file processing...
2023-06-08 15:52:37.839471: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/ba/92/ba92df62-1650-41bf-8d15-f4836699dd8b.tif.orig
2023-06-08 15:52:39.313818: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/ba/60/ba600fb1-fce8-4ab1-b12a-cd6b49f07561.TIF.orig
2023-06-08 15:52:40.608074: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/02/43/02430c21-7ff0-4170-83ed-3a35a8bb2030.tif.orig

~25'000 lines later (all the same: couldn't create .tif.orig file)

2023-06-08 15:59:36.018012: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/11/91/1191a25c-e495-40e8-a9c6-6db01ff001ac.TIF.orig
--- Logging error ---
Traceback (most recent call last):
  File "/Users/vijeinathtissaveerasingham/Git/dsp-tools/src/dsp_tools/fast_xmlupload/process_files.py", line 346, in _create_orig_file
    shutil.copyfile(in_file, orig_file_full_path)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/shutil.py", line 254, in copyfile
    with open(src, 'rb') as fsrc:
OSError: [Errno 24] Too many open files: 'multimedia/SGV_12N_46662.tif'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/handlers.py", line 74, in emit
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/handlers.py", line 181, in doRollover
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/__init__.py", line 1201, in _open
OSError: [Errno 24] Too many open files: '/Users/vijeinathtissaveerasingham/.dsp-tools/logging.log'
Call stack:
2023-06-08 15:59:36.027534: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/8e/a2/8ea24045-68e0-436d-98a6-3c90e9c0d303.TIF.orig
2023-06-08 15:59:36.041387: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/79/ac/79acecc1-7c7c-4c04-9a37-d17ea62adaca.tif.orig
2023-06-08 15:59:36.041558: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/de/90/de90f507-5112-4551-9a15-d99c0830b214.TIF.orig
2023-06-08 15:59:36.041857: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/29/69/29691900-2cca-48a3-95a7-812d87821639.tif.orig
2023-06-08 15:59:36.041940: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/ad/52/ad52f792-c2e4-4272-b51f-a6700bbb2a2c.tif.orig
2023-06-08 15:59:36.042126: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/38/7b/387bcead-a2f0-497d-aa84-db2bc17f41cf.TIF.orig
2023-06-08 15:59:36.042351: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/1b/3a/1b3abbdf-aea3-450d-9dc6-5e7c776babef.TIF.orig
2023-06-08 15:59:36.042427: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/0d/61/0d61eeaa-2aa4-463e-af67-0c7bbe52a057.tif.orig
2023-06-08 15:59:36.041887: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/56/9b/569bfd8b-539b-49a6-8a2c-8021b2f72902.tif.orig
2023-06-08 15:59:36.042742: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/c3/f2/c3f275a1-407b-4d8f-919f-eb4eabdcd98b.TIF.orig
2023-06-08 15:59:36.041728: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/63/95/6395af65-6a87-4f20-9437-6f24e8312b3f.TIF.orig
2023-06-08 15:59:36.042524: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/08/db/08db420b-7ee2-42ac-87ab-4954e81d8f24.tif.orig
2023-06-08 15:59:36.042072: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/b2/56/b25625d1-4488-4114-bbec-c8b215034147.tif.orig
2023-06-08 15:59:36.042389: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/a3/4b/a34b9868-b50a-4012-bd34-f48187543e14.tif.orig
2023-06-08 15:59:36.042625: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/b1/15/b1157017-9d2c-46ef-b45f-c9a730192223.TIF.orig
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 83, in _worker
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
  File "/Users/vijeinathtissaveerasingham/Git/dsp-tools/src/dsp_tools/fast_xmlupload/process_files.py", line 571, in _process_file
    if not _create_orig_file(
  File "/Users/vijeinathtissaveerasingham/Git/dsp-tools/src/dsp_tools/fast_xmlupload/process_files.py", line 351, in _create_orig_file
    logger.error(f"Couldn't create .orig file {orig_file_full_path}", exc_info=True)
Message: "Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/5b/1e/5b1e7020-91a6-4d28-a422-da301225726c.tif.orig"
Arguments: ()
2023-06-08 15:59:36.065978: ERROR: Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/13/9c/139c0cb3-d259-4256-be52-a6826020f9ee.TIF.orig

(all the same it continues with couldn't create .tif.orig, until line ~33'800)

--- Logging error ---
Traceback (most recent call last):
  File "/Users/vijeinathtissaveerasingham/Git/dsp-tools/src/dsp_tools/fast_xmlupload/process_files.py", line 346, in _create_orig_file
    shutil.copyfile(in_file, orig_file_full_path)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/shutil.py", line 254, in copyfile
    with open(src, 'rb') as fsrc:
OSError: [Errno 24] Too many open files: 'multimedia/AH24-SGV_01P_01830.tif'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/handlers.py", line 74, in emit
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/handlers.py", line 181, in doRollover
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/logging/__init__.py", line 1201, in _open
OSError: [Errno 24] Too many open files: '/Users/vijeinathtissaveerasingham/.dsp-tools/logging.log'
Call stack:
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 973, in _bootstrap
    self._bootstrap_inner()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 83, in _worker
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/concurrent/futures/thread.py", line 58, in run
  File "/Users/vijeinathtissaveerasingham/Git/dsp-tools/src/dsp_tools/fast_xmlupload/process_files.py", line 571, in _process_file
    if not _create_orig_file(
  File "/Users/vijeinathtissaveerasingham/Git/dsp-tools/src/dsp_tools/fast_xmlupload/process_files.py", line 351, in _create_orig_file
    logger.error(f"Couldn't create .orig file {orig_file_full_path}", exc_info=True)
Message: "Couldn't create .orig file /Volumes/Thunderbay 1/tmp-test-sgv/6e/cf/6ecf71bf-ce7c-48a2-8fb1-5e761f5de04c.tif.orig"
Arguments: ()

This last block is repeated some dozens of times, almost identically, until line ~35'600.

Then, some thousands of .orig files of TIFs cannot be created, again interrupted several times by blocks of the same OSError: [Errno 24] Too many open files, until the process ended at 16:00:28.

Processing-Terminal-Output.txt.zip

@jnussbaum jnussbaum changed the title fix(fast-xmlupload): handle "too many open files" error (DEV-2268) fix(process-files): quit Python interpreter after every batch to prevent "too many open files" error (DEV-2268) Jul 14, 2023
@jnussbaum jnussbaum merged commit 1cbf927 into main Jul 14, 2023
5 checks passed
@jnussbaum jnussbaum deleted the wip/dev-2268-too-many-open-files branch July 14, 2023 07:17
@daschbot daschbot mentioned this pull request Jul 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant