AI Model Policy

This page is the implementation of darktable's broader AI policy as it applies to this repo. The parent document – covering the project-wide stance, rationale, and high-level rules – lives in the main darktable wiki:

→ AI Model Integration Policy (parent)

When the two pages disagree, the darktable wiki wins. If you think a rule here needs to change, raise it against the parent first.

Every model we ship goes through this checklist before merge. The aim is honest disclosure, not unattainable purity: open-source AI is a moving target, and many otherwise-excellent models have one or two compromises around training data or pre-training corpora. We accept models with such compromises provided every compromise is disclosed up front in the model card's Known limitations row.

Why this matters

darktable is GPL-3.0 free software. Every default a user encounters in the application is, in effect, a recommendation from the project. We don't want to recommend something we couldn't audit, or that could harm the people in front of the camera. The bar is deliberately conservative – better to ship fewer models we trust than more we can't vouch for.

Hard requirements

These are non-negotiable. A model that fails any of them isn't accepted.

Requirement	What we want
Model weights license	Compatible with GPL-3.0 distribution. Apache-2.0, MIT, BSD, and similar permissive licenses are fine, as is GPL-3.0 itself. Proprietary or non-commercial-only weights are not.
OSAID v1.0 class	Open Source AI, Open Weights, or Open Model.
MOF class	Class I (Open Science), Class II (Open Tooling), or Class III (Open Model).
Local inference	All inference runs locally. No telemetry, no cloud calls, no remote endpoints.
Purpose-limited scope	Photo editing tasks only: denoising, masking, depth, upscaling, inpainting, embeddings. We do not accept models designed for generating, manipulating, or synthesizing human likenesses.
Reproducibility	Conversion scripts, configs, and source references must be complete enough that anyone can rebuild the ONNX from the original checkpoints.
Published research	A peer-reviewed paper or public technical report describing the architecture and training procedure.
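
The rows above that name fixed values (weights license, OSAID class, MOF class) lend themselves to a mechanical pre-check. The following is a minimal sketch and not the project's actual tooling: the model-card field names (`weights_license`, `osaid_class`, `mof_class`), the SPDX allowlist, and the `check_hard_requirements` function are all hypothetical. The remaining rows (local inference, scope, reproducibility, published research) still need a human reviewer.

```python
# Hypothetical sketch: field names and the allowlist are illustrative
# assumptions, not the repository's actual schema or tooling.

# SPDX identifiers for the licenses the table names explicitly.
# "Similar" licenses would still need a maintainer's judgment.
ALLOWED_WEIGHT_LICENSES = {
    "Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause",
    "GPL-3.0-only", "GPL-3.0-or-later",
}
ALLOWED_OSAID_CLASSES = {"Open Source AI", "Open Weights", "Open Model"}
ALLOWED_MOF_CLASSES = {"Class I", "Class II", "Class III"}


def check_hard_requirements(card: dict) -> list[str]:
    """Return a list of failures; an empty list means these rows pass."""
    failures = []
    if card.get("weights_license") not in ALLOWED_WEIGHT_LICENSES:
        failures.append("weights_license is not on the allowlist")
    if card.get("osaid_class") not in ALLOWED_OSAID_CLASSES:
        failures.append("osaid_class is not an accepted OSAID v1.0 class")
    if card.get("mof_class") not in ALLOWED_MOF_CLASSES:
        failures.append("mof_class is not an accepted MOF class")
    return failures


# Made-up example card, for illustration only.
example_card = {
    "weights_license": "Apache-2.0",
    "osaid_class": "Open Weights",
    "mof_class": "Class III",
}
print(check_hard_requirements(example_card))
```

A check like this can only reject obvious mismatches early; a license outside the allowlist may still be acceptable once a maintainer has confirmed it is GPL-3.0-compatible.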

Disclosure requirements

These must be filled in honestly. Imperfect answers are allowed; missing or vague answers are not.

Field	What goes here
Training data provenance	Where every training dataset came from, who collected it, and how. "Web-crawled", "stock images from provider X", "photographed by the authors" – be specific.
Training data license	The license(s) of each dataset, as published. Datasets without an explicit license, or with non-OSI / research-only licenses, are acceptable – but they must be named in Known limitations (see below).
Training code	Link to the public training code, with its license. If training code isn't released, say so – this is a known limitation.
Known limitations	Everything that fell short of an ideal – see examples below. Document the issue clearly so users and downstream maintainers know what they're getting.

Examples of acceptable-with-disclosure known limitations, drawn from models we currently ship:

"Training datasets X and Y do not have explicit open-source licenses."
"Web-crawled training images; individual licenses not verified."
"Pre-training dataset is research-only (not OSI); prohibits commercial use of the data itself."
"Aggregated dataset bundles items with varying terms."

The model-card review is a sanity check on this row in particular: maintainers verify that the listed limitations match the model's real situation and that nothing has been quietly omitted.
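
The rule that imperfect answers are allowed but missing or vague ones are not can be partly pre-screened before a maintainer reads the card. This is a rough sketch under stated assumptions: the field names, the list of placeholder answers, and the `find_incomplete_fields` function are hypothetical and not part of the repository.

```python
# Hypothetical sketch: field names and the placeholder list are assumptions,
# not the repository's actual model-card schema.
REQUIRED_DISCLOSURE_FIELDS = (
    "training_data_provenance",
    "training_data_license",
    "training_code",
    "known_limitations",
)

# Answers that say nothing. An honest "Not released" is NOT a placeholder:
# it is a disclosure, and belongs in the card.
PLACEHOLDER_ANSWERS = {"", "tbd", "todo", "n/a", "unknown", "various"}


def find_incomplete_fields(card: dict) -> list[str]:
    """Return the disclosure fields that are missing or only a placeholder."""
    incomplete = []
    for field in REQUIRED_DISCLOSURE_FIELDS:
        value = card.get(field) or ""  # treat a missing key or None as empty
        if str(value).strip().lower() in PLACEHOLDER_ANSWERS:
            incomplete.append(field)
    return incomplete


# Made-up draft card, for illustration only.
draft_card = {
    "training_data_provenance": "Web-crawled images collected by the authors",
    "training_data_license": "TBD",
    "training_code": "Not released",
}
print(find_incomplete_fields(draft_card))
```

A screen like this only catches empty or placeholder answers. Whether a filled-in answer is honest and complete remains the maintainer's judgment.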

What we won't accept, even with disclosure

Models for generating, manipulating, or synthesizing human likenesses. Out of scope regardless of license.
Models trained on private data scraped without consent of the subjects (e.g. surveillance datasets, leaked photos). "Scraped without consent" here means the people in the photographs didn't consent – not "individual image copyright wasn't verified for every web-crawl entry." The latter is acceptable with disclosure; the former isn't.
Models whose weights cannot be redistributed under a GPL-3.0-compatible license. The training data's license doesn't have to be GPL-compatible – what matters is whether the resulting weights can be redistributed.

How a model gets reviewed

When you open a PR adding or updating a model:

CI runs the full pipeline against your changes (check-pr.yml).
A maintainer reads the model card and verifies every hard requirement is met.
The maintainer checks the Known limitations row against the model's real situation – not for severity, but for honesty and completeness.
The maintainer may ask for changes – usually fleshing out provenance, clarifying licenses, or expanding the limitations row.
Once everything is green, the PR is merged. The next nightly build picks it up; it ships in the next even-minor release.

When in doubt, ask. We'd rather have the conversation early than reject a model after the conversion work is done.

darktable-ai wiki is licensed under the Creative Commons BY-SA 4.0 terms.
