[Export] Dt freezes and progress bar doesn't come to an end #15939

Closed
paolobenve opened this issue Dec 23, 2023 · 25 comments
Labels
scope: image processing correcting pixels

Comments

@paolobenve
Contributor

paolobenve commented Dec 23, 2023

dt 4.6.0 self compiled, ubuntu 22.04

When I export various images, the export progress bar never comes to an end, like you can see in the screenshot:

[Screenshot: dt-exporting]

... and dt freezes. You must kill it and rerun it in order to keep working

Note 1: the export itself completes correctly.

Note 2: the bug doesn't always happen; I'm trying to understand what triggers it.

@gi-man
Contributor

gi-man commented Dec 23, 2023

Please use the template to provide all the info. GPU? Opencl? Can you provide a -d common output?

@ralfbrown ralfbrown added the scope: image processing correcting pixels label Dec 23, 2023
@paolobenve
Contributor Author

opencl on nvidia 3060

$ darktable -d common
darktable 4.6.0+25~g83cec53276
Copyright (C) 2012-2023 Johannes Hanika and other contributors.

Compile options:
  Bit depth              -> 64 bit
  Debug                  -> DISABLED
  SSE2 optimizations     -> ENABLED
  OpenMP                 -> ENABLED
  OpenCL                 -> ENABLED
  Lua                    -> ENABLED  - API version 9.2.0
  Colord                 -> ENABLED
  gPhoto2                -> ENABLED
  GMIC                   -> ENABLED  - Compressed LUTs are supported
  GraphicsMagick         -> ENABLED
  ImageMagick            -> DISABLED
  libavif                -> ENABLED
  libheif                -> DISABLED
  libjxl                 -> DISABLED
  OpenJPEG               -> ENABLED
  OpenEXR                -> ENABLED
  WebP                   -> ENABLED

See https://www.darktable.org/resources/ for detailed documentation.
See https://github.com/darktable-org/darktable/issues/new/choose to report bugs.

     0.0002 application_directory: /opt/darktable/bin
     0.0002 darktable.datadir: /opt/darktable/share/darktable
     0.0002 darktable.plugindir: /opt/darktable/lib/darktable
     0.0002 darktable.localedir: /opt/darktable/share/locale
     0.0003 darktable.configdir: /home/paolo/.config/darktable
     0.0003 darktable.cachedir: /home/paolo/.cache/darktable
     0.0003 darktable.sharedir: /opt/darktable/share
     0.0003 darktable.tmpdir: /tmp
     0.0003 new_xdg_data_dirs: /opt/darktable/share:/usr/share/xubuntu:/usr/share/xfce4:/usr/local/share:/usr/share:/var/lib/snapd/desktop:/usr/share
     0.0297 [dt_worker_threads] using 6 worker threads
     0.0308 [dt_get_sysresource_level] switched to 3 as `unrestricted'
     0.0308   total mem:       15339MB
     0.0308   mipmap cache:    1917MB
     0.0308   available mem:   245438MB
     0.0308   singlebuff:      15339MB
     0.0364 [opencl_init] opencl library 'libOpenCL' found on your system and loaded, preference 'default path'
     0.0514 [opencl_init] found 1 platform
[opencl_init] found 1 device

[dt_opencl_device_init]
   DEVICE:                   0: 'NVIDIA GeForce RTX 3060 Laptop GPU'
   PLATFORM, VENDOR & ID:    NVIDIA CUDA, NVIDIA Corporation, ID=4318
   CANONICAL NAME:           nvidiacudanvidiageforcertx3060laptopgpu
   DRIVER VERSION:           535.129.03
   DEVICE VERSION:           OpenCL 3.0 CUDA, SM_20 SUPPORT
   DEVICE_TYPE:              GPU, dedicated mem
   GLOBAL MEM SIZE:          5938 MB
   MAX MEM ALLOC:            1484 MB
   MAX IMAGE SIZE:           32768 x 32768
   MAX WORK GROUP SIZE:      1024
   MAX WORK ITEM DIMENSIONS: 3
   MAX WORK ITEM SIZES:      [ 1024 1024 64 ]
   ASYNC PIXELPIPE:          NO
   PINNED MEMORY TRANSFER:   NO
   USE HEADROOM:             400Mb
   AVOID ATOMICS:            NO
   MICRO NAP:                250
   ROUNDUP WIDTH & HEIGHT    16x16
   CHECK EVENT HANDLES:      128
   TILING ADVANTAGE:         0.000
   DEFAULT DEVICE:           NO
   KERNEL BUILD DIRECTORY:   /opt/darktable/share/darktable/kernels
   KERNEL DIRECTORY:         /home/paolo/.cache/darktable/cached_v3_kernels_for_NVIDIACUDANVIDIAGeForceRTX3060LaptopGPU_53512903
   CL COMPILER OPTION:       -cl-fast-relaxed-math
   CL COMPILER COMMAND:      -w -cl-fast-relaxed-math  -DNVIDIA_SM_20=1 -DNVIDIA=1 -I"/opt/darktable/share/darktable/kernels"
   KERNEL LOADING TIME:       0.0214 sec
[opencl_init] OpenCL successfully initialized. internal numbers and names of available devices:
[opencl_init]		0	'NVIDIA CUDA NVIDIA GeForce RTX 3060 Laptop GPU'
     0.1361 [opencl_init] FINALLY: opencl is AVAILABLE and ENABLED.
[opencl_init] opencl_scheduling_profile: 'very fast GPU'
[opencl_init] opencl_device_priority: '*/!0,*/*/*/!0,*'
[opencl_init] opencl_mandatory_timeout: 200
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		1	1	1	1	1
[opencl_synchronization_timeout] synchronization timeout set to 0
[dt_opencl_update_priorities] these are your device priorities:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		0	0	0	0	0
[dt_opencl_update_priorities] show if opencl use is mandatory for a given pixelpipe:
[dt_opencl_update_priorities] 		image	preview	export	thumbs	preview2
[dt_opencl_update_priorities]		1	1	1	1	1
[opencl_synchronization_timeout] synchronization timeout set to 0
     0.1750 [dt_worker_threads] using 6 worker threads

@paolobenve
Contributor Author

paolobenve commented Dec 27, 2023

Today's freeze: dt wasn't in the foreground on the desktop (gthumb was). The freeze in dt froze the whole desktop: only the mouse was active, everything else was frozen.

Killing dt (from a non-desktop terminal) resolved the freeze.

@gi-man
Contributor

gi-man commented Dec 27, 2023

Your available memory is larger than your total memory. How much actual memory do you have?

I see a snap folder in the paths above. Why?

Is the -d common output from when it happened?

Does it happen without opencl?

@paolobenve
Contributor Author

paolobenve commented Dec 27, 2023

Your available memory is larger than your total memory. How much actual memory do you have?

16 GB, and 16 GB swap

The 245 GB that it shows seems to be a bug.

dt is set to use unrestricted resources (processing tab)

I see a snap folder in the paths above. Why?

in /etc/profile.d/apps-bin-path.sh there's a line saying snap_xdg_path="/var/lib/snapd/desktop"

Is the -d common output from when it happened?

If you are asking whether I usually run dt with -d common, then no.

Does it happen without opencl?

I would have to use dt for a while with OpenCL disabled, which makes it slow. The freezes happen randomly.

@wpferguson
Member

dt is set to use unrestricted resources (processing tab)

Set it to small and see if it works. If it does, then try default. I have 32 GB of memory and an 8 GB Nvidia 3070, and my resources are set to default. If I try large, I have problems.

@gi-man
Contributor

gi-man commented Dec 27, 2023

I think the system showing more memory than what is really available can be a problem.

You posted a darktable -d common log, but it does not appear to be from when the issue happened. Posting a -d common log captured while the issue occurs is the main way to identify its source.

So far this Issue report has very minimal information.

@gi-man
Contributor

gi-man commented Dec 28, 2023

@jenshannoschwalm I would like your thoughts on this.

For 4.6.1

I think this value for unrestricted is too large (16384):

16384, 1024, 128, 900, // unrestricted

It leads to a very large and incorrect available memory: 15339 * 16384 / 1024 = 245424 MB, close to the 245438 MB reported in the log above. I think a value of 1024 (or maybe slightly less, at 1000) might be better.
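
A minimal sketch of that arithmetic, assuming available memory is computed as total memory times the first table entry divided by 1024, as the calculation above implies. The function name is hypothetical and is not darktable's API; only the 16384 entry and the 15339 MB total come from this thread.

```python
def available_mem_mb(total_mb: int, fraction: int) -> int:
    """Scale total memory by fraction/1024 (assumed formula, whole MB)."""
    return total_mb * fraction // 1024


TOTAL_MB = 15339  # "total mem" from the -d common log in this thread

# Compare the current unrestricted entry with the two proposed values.
for label, fraction in [
    ("unrestricted (16384)", 16384),
    ("proposed 1024", 1024),
    ("proposed 1000", 1000),
]:
    print(f"{label:22s} -> {available_mem_mb(TOTAL_MB, fraction)} MB")
```

With the 16384 entry the result is sixteen times physical RAM (245424 MB here), while 1024 gives back exactly the total and 1000 leaves a small margin (14979 MB).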

For 4.8

Looking deeper into this area, I wonder if we should move unrestricted into the special mode section. I think users who have this option might believe they are making things better or faster, but that is not the case, since dt could try to use memory the OS is using.

Which then made me wonder why we even look at MemTotal and then take a percentage of it (50% in default). I was thinking we could look at MemAvailable instead. For Win32 we can look at ullAvailPhys, but for Apple I can't find a quick answer. I think available memory is safer (with some safety margin) since it already discounts OS usage.

found = !strncmp(line, "MemTotal:", 9);
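
As a rough illustration of the idea (not darktable code), the sketch below reads both fields from /proc/meminfo-style text and prefers MemAvailable with a safety margin, falling back to half of MemTotal. The function names, the 90% margin, and the sample values are assumptions made for this example.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Return {field: value in kB} from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def usable_mb(text: str, margin_percent: int = 90) -> int:
    """Prefer MemAvailable (minus a margin); else use 50% of MemTotal."""
    info = parse_meminfo(text)
    if "MemAvailable" in info:
        return info["MemAvailable"] * margin_percent // 100 // 1024
    return info["MemTotal"] // 2 // 1024


# Illustrative sample only; on Linux one would read /proc/meminfo instead.
SAMPLE = """MemTotal:       15707136 kB
MemFree:         1203456 kB
MemAvailable:    8388608 kB
"""
print(usable_mb(SAMPLE), "MB")
```

The fallback matters because very old kernels do not report MemAvailable at all.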

@jenshannoschwalm
Collaborator

jenshannoschwalm commented Dec 28, 2023

Using unrestricted is mostly bad; its only purpose is to allow extremely large images to be processed, and it requires working memory swapping. EDIT: we should probably rethink this, as most distributions, or Windows at least, don't offer such a large swap space. Double the available physical memory would likely be a safe choice.

We don't get that correct for Flatpak apps, for sure. EDIT: this is something I don't know how to fix properly yet. We have a number of reports of darktable being killed for running out of memory. It would be very good if someone could find a safe way to get the amount of memory for Flatpaks (using cgroup settings).
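
One possible direction, sketched under assumptions: cgroup v2 exposes a `memory.max` file that contains either `max` (no limit) or a byte count, so the effective memory could be the smaller of MemTotal and that limit. Whether a Flatpak sandbox actually sets or exposes such a limit is not confirmed anywhere in this thread, and the function names are hypothetical.

```python
from typing import Optional


def parse_cgroup_limit(text: str) -> Optional[int]:
    """Parse cgroup v2 memory.max content: 'max' means unlimited, else bytes."""
    value = text.strip()
    if value == "max" or not value.isdigit():
        return None
    return int(value)


def effective_total_mb(mem_total_mb: int, cgroup_text: Optional[str]) -> int:
    """Clamp the physical total to the cgroup limit when one is set."""
    limit = parse_cgroup_limit(cgroup_text) if cgroup_text is not None else None
    if limit is None:
        return mem_total_mb
    return min(mem_total_mb, limit // (1024 * 1024))


print(effective_total_mb(15339, "max\n"))         # no limit set
print(effective_total_mb(15339, "4294967296\n"))  # 4 GiB limit
```

Passing the file content as text keeps the sketch testable; real code would also have to locate the right cgroup directory and handle cgroup v1 systems.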

Looking at currently available memory would mean checking it at runtime whenever we need memory, and that would hurt performance. It could also lead to fighting between darktable subsystems (mipmap cache, thumbnail generation, ...).

My personal view would be to disable unrestricted completely.

@wpferguson
Member

My personal view would be to disable unrestricted completely.

Or, don't offer it as a choice in the settings but leave it for expert mode (i.e. edit the darktablerc file).

@jenshannoschwalm
Collaborator

@gi-man one more point that is certainly wrong for unrestricted mode is using 90% of OpenCL memory. A common 4 GB card would then leave only about 400 MB for headroom, which is certainly on the low side.
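
Taking the 90% figure literally, the headroom works out as below. This is only a worked example of the arithmetic; the function name is hypothetical and the exact rounding darktable uses is not shown in this thread.

```python
def opencl_headroom_mb(device_mem_mb: int, used_percent: int = 90) -> int:
    """Memory left on the device after darktable takes used_percent of it."""
    usable = device_mem_mb * used_percent // 100
    return device_mem_mb - usable


print(opencl_headroom_mb(4096))  # the 4 GB card from the comment above
print(opencl_headroom_mb(5938))  # GLOBAL MEM SIZE from the log in this thread
```

For a 4096 MB card this leaves 410 MB, which matches the "about 400" estimate.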

@paolobenve
Contributor Author

Changing from unrestricted to large doesn't solve the dt freeze described at the beginning.

@wpferguson
Member

The correct setting for your system is probably small, but it might run on default. It won't run on large or unrestricted.

@paolobenve
Contributor Author

Well, I'm wondering: why does dt offer me wrong values? If it's a matter of RAM size, it should only allow choosing between values that work.

@jenshannoschwalm
Collaborator

  1. Some more OpenCL fixes have been made that could be related to this issue; at least the log suggests so. Could you please try with the latest git master?
  2. The manual tells you which settings are recommended. Unrestricted is only good for certain situations and on platforms that support stable swapping. If you use the default, dt should not crash at all; if it does, I would consider that a bug. If you use unrestricted, dt assumes you know what you are doing :-)

@paolobenve
Contributor Author

Actually dt crashes with large too

@gi-man
Contributor

gi-man commented Dec 29, 2023

Actually dt crashes with large too

Your posts are not helpful. Is this with master? Where is the log?

I'm going to unsubscribe.

@jenshannoschwalm
Collaborator

@paolobenve I would appreciate a fresh test with today's git master. Please try with resources set to default.

You should start darktable -d pipe -d imageio

Before starting your darktable session from the console as above, could you do nvidia-smi and report the log too?
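
A small helper for collecting both logs could look like the sketch below. The helper names and log file locations are hypothetical; only the `-d pipe -d imageio` flags and the `nvidia-smi` command come from the request above. Merging stderr into stdout keeps GLib/Gtk warnings in the same file and in order with the debug output.

```python
import subprocess
from pathlib import Path


def build_command(binary: str = "darktable", debug=("pipe", "imageio")) -> list[str]:
    """Build the darktable command line with one -d switch per debug topic."""
    cmd = [binary]
    for topic in debug:
        cmd += ["-d", topic]
    return cmd


def run_and_log(cmd: list[str], log_path: Path) -> int:
    """Run cmd, writing stdout and stderr to log_path; return the exit code."""
    with open(log_path, "w") as log:
        return subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT).returncode


# Example usage (not executed here):
#   run_and_log(["nvidia-smi"], Path.home() / "nvidia-smi.txt")
#   run_and_log(build_command(), Path.home() / "darktable-debug.txt")
```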

@paolobenve
Contributor Author

$ nvidia-smi
Sat Dec 30 09:36:08 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   37C    P8              13W /  60W |    143MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      4421      G   /usr/lib/xorg/Xorg                           76MiB |
|    0   N/A  N/A   3452671    C+G   ...28233051,1187029011363625870,262144       54MiB |
+---------------------------------------------------------------------------------------+

@paolobenve
Contributor Author

I tried with master, I couldn't get any crash.

I used test copies of the images.

The crashes are random, and I won't risk using master for my daily work.

@paolobenve
Contributor Author

I got a general desktop freeze with current 4.6.x with resources set to large, but I couldn't see anything at the end of the log.

Perhaps dt writes outside its own memory and that freezes the desktop?

@jenshannoschwalm
Collaborator

got a general desktop freeze with current 4.6.x with resources set to large, but I couldn't see anything at the end of the log.

I am not sure you know what patterns to look for :-) I am interested in signs of some modules behaving in strange ways, for example. So either share that log or let's forget about it, as I can't get any useful information otherwise...

@jenshannoschwalm
Collaborator

Closing this for now as there is no provided data to investigate ... feel free to reopen :-)

@paolobenve
Contributor Author

feel free to reopen :-)

I cannot :-( Perhaps I need a permission that I don't have.

I was able to capture the output during a freeze:

$ /opt/darktable/bin/darktable -d pipe -d imageio > ~/DT-OUT.txt
remote: Enumerating objects: 6, done.
remote: Counting objects: 100% (6/6), done.
remote: Total 6 (delta 5), reused 6 (delta 5), pack-reused 0
Unpacking objects: 100% (6/6), 509 bytes | 169.00 KiB/s, done.
From https://github.com/darktable-org/lua-scripts
 * [new branch]      po/lua-translation -> origin/po/lua-translation

(darktable:679024): GLib-GObject-CRITICAL **: 18:51:56.303: g_object_set_data: assertion 'G_IS_OBJECT (object)' failed

(darktable:679024): GLib-GObject-CRITICAL **: 18:51:56.303: g_object_set_data: assertion 'G_IS_OBJECT (object)' failed

(darktable:679024): Gtk-CRITICAL **: 18:51:56.303: gtk_tree_model_foreach: assertion 'GTK_IS_TREE_MODEL (model)' failed

(darktable:679024): GLib-GObject-CRITICAL **: 18:52:02.312: g_object_set_data: assertion 'G_IS_OBJECT (object)' failed

(darktable:679024): GLib-GObject-CRITICAL **: 18:52:02.312: g_object_set_data: assertion 'G_IS_OBJECT (object)' failed

(darktable:679024): Gtk-CRITICAL **: 18:52:02.312: gtk_tree_model_foreach: assertion 'GTK_IS_TREE_MODEL (model)' failed

(darktable:679024): OsmGpsMap-WARNING **: 19:24:22.927: Error downloading tile: 7 - Connection terminated unexpectedly

(darktable:679024): libsoup-CRITICAL **: 19:24:22.928: soup_session_real_requeue_message: assertion 'item != NULL' failed

(darktable:679024): OsmGpsMap-WARNING **: 19:25:06.697: Error downloading tile: 7 - Connection terminated unexpectedly

(darktable:679024): libsoup-CRITICAL **: 19:25:06.697: soup_session_real_requeue_message: assertion 'item != NULL' failed

(darktable:679024): OsmGpsMap-WARNING **: 19:26:08.140: Error downloading tile: 7 - Connection terminated unexpectedly

(darktable:679024): libsoup-CRITICAL **: 19:26:08.140: soup_session_real_requeue_message: assertion 'item != NULL' failed

(darktable:679024): OsmGpsMap-WARNING **: 19:30:28.420: Error downloading tile: 7 - Connection terminated unexpectedly

(darktable:679024): libsoup-CRITICAL **: 19:30:28.420: soup_session_real_requeue_message: assertion 'item != NULL' failed
Terminated (killed)

And here is the DT-OUT.txt

It was quite a long work session, I think spread over two days, about one hour each day. The first export didn't freeze. Today's export did.

@paolobenve
Contributor Author

I reopened dt and immediately repeated the last export with the same images, and I didn't get the freeze
