Skip to content

Fix rare startup race condition between OpenCL background job and dt_wb_presets_init#20765

Open
da-phil wants to merge 1 commit intodarktable-org:masterfrom
da-phil:fix_rare_race_condition_at_startup
Open

Fix rare startup race condition between OpenCL background job and dt_wb_presets_init#20765
da-phil wants to merge 1 commit intodarktable-org:masterfrom
da-phil:fix_rare_race_condition_at_startup

Conversation

@da-phil
Copy link
Copy Markdown
Contributor

@da-phil da-phil commented Apr 7, 2026

Fix rare startup race condition between OpenCL background job and dt_wb_presets_init by reordering startup sequence. See gdb backtrace: darktable_bt_WNW7M3.txt

In darktable.c:1878-1887, the GUI startup path queues OpenCL detection as a background job on the system worker pool and then immediately continues to call dt_wb_presets_init(NULL).

dt_opencl_init (called by a worker thread) loads OpenCL vendor libraries (ICD loader, driver). These vendor libraries are known to call setlocale() internally.
setlocale() in glibc:

  • Acquires __libc_setlocale_lock
  • Frees and replaces internal locale data structures (including cached locale strings)

Meanwhile, the main thread is inside dcigettextguess_category_valuegetenv("LANGUAGE"). The glibc dcigettext code is not re-entrant with setlocale(): it caches locale state that can be freed mid-read when setlocale() runs concurrently, leading to a SIGSEGV. This does not happen in the non-GUI/CLI path (init_gui=0) because there dt_opencl_init is called synchronously before dt_wb_presets_init.

Potential fixes:

  1. Move dt_wb_presets_init (and dt_noiseprofile_init) to before the OpenCL background job is enqueued, so the JSON/gettext code finishes before any worker thread starts running the OCL job. In darktable.c around line 1876. The fix is confirmed clean for the non-GUI path since it already calls dt_opencl_init synchronously — the proposed reorder makes both paths consistent.
  2. Alternatively, a more targeted fix is to call setlocale(LC_ALL, "") once and freeze it (or save/restore it around dt_opencl_init) before JSON parsing begins, but reordering the init sequence is cleaner and less invasive.

Fix 1) was chosen in this case.

Fixes #20770

Fix rare starup race condition between OpenCL background job
and `dt_wb_presets_init`.

In darktable.c:1878-1887, the GUI startup path queues OpenCL detection as a background job on the system worker pool and then immediately continues to call `dt_wb_presets_init(NULL)`.

`dt_opencl_init` (called by a worker thread) loads OpenCL vendor libraries (ICD loader, driver). These vendor libraries are known to call `setlocale()` internally.
`setlocale()` in glibc:
* Acquires `__libc_setlocale_lock`
* Frees and replaces internal locale data structures (including cached locale strings)

Meanwhile, the main thread is inside `dcigettext` → `guess_category_value` → `getenv("LANGUAGE")`. The glibc `dcigettext` code is not re-entrant with `setlocale()`: it caches locale state that can be freed mid-read when `setlocale()` runs concurrently, leading to a SIGSEGV.
This does not happen in the non-GUI/CLI path (`init_gui=0`) because there `dt_opencl_init` is called synchronously before `dt_wb_presets_init`.

Potential fixes:
* Move `dt_wb_presets_init` (and `dt_noiseprofile_init`) to before the OpenCL background job is enqueued, so the JSON/gettext code finishes before any worker thread starts running the OCL job. In darktable.c around line 1876. The fix is confirmed clean for the non-GUI path since it already calls `dt_opencl_init` synchronously — the proposed reorder makes both paths consistent.
* Alternatively, a more targeted fix is to call `setlocale(LC_ALL, "")` once and freeze it (or save/restore it around `dt_opencl_init`) before JSON parsing begins, but reordering the init sequence is cleaner and less invasive.
@da-phil da-phil force-pushed the fix_rare_race_condition_at_startup branch from 3cc3ee3 to 61c41a7 Compare April 7, 2026 09:18
@wpferguson
Copy link
Copy Markdown
Member

IIUC we can't reproduce this bug. So, how did you find it? And more importantly, how are we going to test a "fix" to a "bug" we can't reproduce or verify?

@da-phil
Copy link
Copy Markdown
Contributor Author

da-phil commented Apr 7, 2026

IIUC we can't reproduce this bug. So, how did you find it? And more importantly, how are we going to test a "fix" to a "bug" we can't reproduce or verify?

Who is "we"? I encountered it 3 times over the past days & weeks. I'll attach a backtrace if I still find one. I also don't know how to reproduce it.
But I guess the explanation alone should be evidence enough that this is a justified measure to prevent such race conditions in the future.

EDIT: found one and attached it to the original post.

@wpferguson
Copy link
Copy Markdown
Member

Who is "we"?

All the people who raise issues when darktable crashes? There's no issue, no steps to reproduce,

I encountered it 3 times over the past days & weeks

In your description you said it was rare. You never said you encountered it. An issue would have been good, because others might have chimed in with they had seen it too. Also all the devs that have done a lot of work on the startup sequence would have a chance to weigh in.

But I guess the explanation alone should be evidence enough

Honestly when I read this I thought it was an AI generated code review and not necessarily a real problem.

I also don't know how to reproduce it.

That'll make testing fun. I guess we test to make sure nothing breaks and sooner or later decide it's fixed if it doesn't happen again.

@da-phil
Copy link
Copy Markdown
Contributor Author

da-phil commented Apr 8, 2026

In your description you said it was rare. You never said you encountered it. An issue would have been good, because others might have chimed in with they had seen it too. Also all the devs that have done a lot of work on the startup sequence would have a chance to weigh in.

To be honest, I was just too lazy to first create an issue, as I already spent some time on understanding and fixing the issue.

But I created a an issue now for tracking purpose: #20770

Honestly when I read this I thought it was an AI generated code review and not necessarily a real problem.

I used AI to help me with understand the gdb backtrace and the rootcause and it helped writing this summary, but it wasn't done all by AI ;)

That'll make testing fun. I guess we test to make sure nothing breaks and sooner or later decide it's fixed if it doesn't happen again.

Yeah, makes sense, will also keep testing with this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix pull request fixing a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Crash at startup due to race condition between OpenCL background job and dt_wb_presets_init

3 participants