
AI Model Policy

Andrii Ryzhkov edited this page May 21, 2026 · 1 revision

This page implements darktable's broader AI policy for this repo. The parent document – covering the project-wide stance, rationale, and high-level rules – lives in the main darktable wiki:

AI Model Integration Policy (parent)

When the two pages disagree, the darktable wiki wins. If you think a rule here needs to change, raise it against the parent first.


Every model we ship goes through this checklist before merge. The aim is honest disclosure, not unattainable purity: open-source AI is a moving target, and many otherwise-excellent models carry one or two compromises in their training or pre-training data. We accept such models provided every compromise is disclosed up front in the Known limitations row of the model card.

Why this matters

darktable is GPL-3.0 free software. Every default a user encounters in the application is, in effect, a recommendation from the project. We don't want to recommend something we couldn't audit, or that could harm the people in front of the camera. The bar is deliberately conservative – better to ship fewer models we trust than more we can't vouch for.

Hard requirements

These are non-negotiable. A model that fails any of them isn't accepted.

| Requirement | What we want |
|---|---|
| Model weights license | Compatible with GPL-3.0 distribution. Apache-2.0, MIT, BSD, and similar permissive licenses are fine, as is GPL-3.0 itself. Proprietary or non-commercial-only weights are not. |
| OSAID v1.0 class | Open Source AI, Open Weights, or Open Model. |
| MOF class | Class I (Open Science), Class II (Open Tooling), or Class III (Open Model). |
| Local inference | All inference runs locally. No telemetry, no cloud calls, no remote endpoints. |
| Purpose-limited scope | Photo editing tasks only: denoising, masking, depth, upscaling, inpainting, embeddings. We do not accept models designed for generating, manipulating, or synthesizing human likenesses. |
| Reproducibility | Conversion scripts, configs, and source references must be complete enough that anyone can rebuild the ONNX from the original checkpoints. |
| Published research | A peer-reviewed paper or public technical report describing the architecture and training procedure. |
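As a rough illustration, the table above can be read as a set of mechanical checks that a contributor runs before opening a PR. The sketch below is not the repo's CI or tooling. The `ModelCard` fields, the function name, and the license identifiers are all hypothetical, and the license set is deliberately non-exhaustive, since the policy also allows "similar permissive licenses" that a maintainer has to judge by hand.

```python
from dataclasses import dataclass

# Values copied from the requirements table; the exact spellings used as
# identifiers here are assumptions made for this sketch.
KNOWN_OK_LICENSES = {"Apache-2.0", "MIT", "BSD-2-Clause", "BSD-3-Clause", "GPL-3.0"}
OSAID_CLASSES = {"Open Source AI", "Open Weights", "Open Model"}
MOF_CLASSES = {"Class I", "Class II", "Class III"}
IN_SCOPE_TASKS = {"denoising", "masking", "depth", "upscaling", "inpainting", "embeddings"}


@dataclass
class ModelCard:
    """Hypothetical, simplified view of a model card (not the repo's schema)."""
    weights_license: str
    osaid_class: str
    mof_class: str
    task: str
    runs_locally: bool
    conversion_scripts_included: bool
    paper_or_report_url: str


def hard_requirement_failures(card: ModelCard) -> list[str]:
    """Return one message per hard requirement the card does not clearly meet."""
    failures = []
    if card.weights_license not in KNOWN_OK_LICENSES:
        # Not an automatic rejection: other permissive licenses may qualify,
        # but they need a maintainer's judgement.
        failures.append(f"weights license {card.weights_license!r} needs manual review")
    if card.osaid_class not in OSAID_CLASSES:
        failures.append(f"OSAID class {card.osaid_class!r} is not accepted")
    if card.mof_class not in MOF_CLASSES:
        failures.append(f"MOF class {card.mof_class!r} is not accepted")
    if card.task not in IN_SCOPE_TASKS:
        failures.append(f"task {card.task!r} is outside photo-editing scope")
    if not card.runs_locally:
        failures.append("inference must run locally with no remote calls")
    if not card.conversion_scripts_included:
        failures.append("conversion scripts/configs are required for reproducibility")
    if not card.paper_or_report_url.strip():
        failures.append("a paper or public technical report is required")
    return failures
```

An empty list means nothing was flagged automatically. It does not replace the maintainer's reading of the model card.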

Disclosure requirements

These must be filled in honestly. Imperfect answers are allowed; missing or vague answers are not.

| Field | What goes here |
|---|---|
| Training data provenance | Where every training dataset came from, who collected it, and how. "Web-crawled", "stock images from provider X", "photographed by the authors" – be specific. |
| Training data license | The license(s) of each dataset, as published. Datasets without an explicit license, or with non-OSI / research-only licenses, are acceptable – but they must be named in Known limitations (see below). |
| Training code | Link to the public training code, with its license. If training code isn't released, say so – this is a known limitation. |
| Known limitations | Everything that fell short of an ideal – see examples below. Document the issue clearly so users and downstream maintainers know what they're getting. |

Examples of acceptable-with-disclosure known limitations, drawn from models we currently ship:

  • "Training datasets X and Y do not have explicit open-source licenses."
  • "Web-crawled training images; individual licenses not verified."
  • "Pre-training dataset is research-only (not OSI); prohibits commercial use of the data itself."
  • "Aggregated dataset bundles items with varying terms."

The model-card review is a sanity check on this row in particular: maintainers verify that the listed limitations match the model's real situation and that nothing has been quietly omitted.
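A contributor could pre-check the disclosure fields along the following lines before asking for review. This is only a sketch under stated assumptions: the dictionary layout, the list of "vague" placeholder answers, and the `disclosure_problems` helper are hypothetical, and a string match cannot judge honesty, which remains the maintainer's job.

```python
# Field names follow the disclosure table; representing a model card as a
# plain dict is an assumption made for this sketch.
REQUIRED_FIELDS = (
    "Training data provenance",
    "Training data license",
    "Training code",
    "Known limitations",
)

# Placeholder answers treated as "missing or vague" here (illustrative list).
VAGUE_ANSWERS = {"", "n/a", "unknown", "tbd", "various"}


def disclosure_problems(card: dict[str, str], unlicensed_datasets: list[str]) -> list[str]:
    """Flag empty or placeholder fields, and datasets that lack an explicit
    open license but are not named in the Known limitations field."""
    problems = []
    for field in REQUIRED_FIELDS:
        answer = card.get(field, "").strip()
        if answer.lower() in VAGUE_ANSWERS:
            problems.append(f"{field}: missing or vague")

    limitations = card.get("Known limitations", "").lower()
    for name in unlicensed_datasets:
        if name.lower() not in limitations:
            problems.append(f"Known limitations: does not mention dataset {name!r}")
    return problems


# Example with made-up values: dataset "X" has no explicit license and is
# named in Known limitations, so nothing is flagged.
example_card = {
    "Training data provenance": "Photographed by the authors; dataset X from provider Y.",
    "Training data license": "Authors' images: CC-BY-4.0. Dataset X: no explicit license.",
    "Training code": "Not released.",
    "Known limitations": "Training dataset X does not have an explicit open-source license.",
}
problems = disclosure_problems(example_card, unlicensed_datasets=["dataset X"])
```

The second loop mirrors the rule in the Training data license row: a dataset with no explicit or non-OSI license is acceptable only when Known limitations names it.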

What we won't accept, even with disclosure

  • Models for generating, manipulating, or synthesizing human likenesses. Out of scope regardless of license.
  • Models trained on private data scraped without consent of the subjects (e.g. surveillance datasets, leaked photos). "Scraped without consent" here means the people in the photographs didn't consent – not "individual image copyright wasn't verified for every web-crawl entry." The latter is acceptable with disclosure; the former isn't.
  • Models whose weights cannot be redistributed under a GPL-3.0-compatible license. The training data's license doesn't have to be GPL-compatible – what matters is whether the resulting weights can be redistributed.

How a model gets reviewed

When you open a PR adding or updating a model:

  1. CI runs the full pipeline against your changes (check-pr.yml).
  2. A maintainer reads the model card and verifies every hard requirement is met.
  3. The maintainer checks the Known limitations row against the model's real situation – not for severity, but for honesty and completeness.
  4. The maintainer may ask for changes – usually fleshing out provenance, clarifying licenses, or expanding the limitations row.
  5. Once everything is green, the PR is merged. The next nightly build picks it up; it ships in the next even-minor release.

When in doubt, ask. We'd rather have the conversation early than reject a model after the conversion work is done.
