AI Model Policy
This page is the implementation of darktable's broader AI policy as it applies to this repo. The parent document – covering the project-wide stance, rationale, and high-level rules – lives in the main darktable wiki:
→ AI Model Integration Policy (parent)
When the two pages disagree, the darktable wiki wins. If you think a rule here needs to change, raise it against the parent first.
Every model we ship goes through this checklist before merge. The aim is honest disclosure, not unattainable purity: open-source AI is a moving target, and many otherwise-excellent models have one or two compromises around training data or pre-training corpora. We accept models with such compromises provided every compromise is disclosed up front in the model card's Known limitations row.
darktable is GPL-3.0 free software. Every default a user encounters in the application is, in effect, a recommendation from the project. We don't want to recommend something we couldn't audit, or that could harm the people in front of the camera. The bar is deliberately conservative – better to ship fewer models we trust than more we can't vouch for.
These are non-negotiable. A model that fails any of them isn't accepted.
| Requirement | What we want |
|---|---|
| Model weights license | Compatible with GPL-3.0 distribution. Apache-2.0, MIT, BSD, GPL-3.0, and similar GPL-compatible licenses are fine. Proprietary or non-commercial-only weights are not. |
| OSAID v1.0 class | Open Source AI, Open Weights, or Open Model. |
| MOF class | Class I (Open Science), Class II (Open Tooling), or Class III (Open Model). |
| Local inference | All inference runs locally. No telemetry, no cloud calls, no remote endpoints. |
| Purpose-limited scope | Photo editing tasks only: denoising, masking, depth, upscaling, inpainting, embeddings. We do not accept models designed for generating, manipulating, or synthesizing human likenesses. |
| Reproducibility | Conversion scripts, configs, and source references must be complete enough that anyone can rebuild the ONNX from the original checkpoints. |
| Published research | A peer-reviewed paper or public technical report describing the architecture and training procedure. |
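
As a rough illustration, the hard requirements in the table above can be read as a pass/fail checklist. The sketch below is not part of this repo's CI: the `ModelCard` class, its field names, and the allow-lists are hypothetical, and the license set only covers the names listed in the table (the policy also allows "similar" licenses, which a maintainer has to judge).

```python
from dataclasses import dataclass

# Hypothetical allow-lists copied from the table rows; illustrative, not exhaustive.
ALLOWED_WEIGHT_LICENSES = {"Apache-2.0", "MIT", "BSD", "GPL-3.0"}
ALLOWED_OSAID_CLASSES = {"Open Source AI", "Open Weights", "Open Model"}
ALLOWED_MOF_CLASSES = {"Class I", "Class II", "Class III"}
ALLOWED_TASKS = {"denoising", "masking", "depth", "upscaling", "inpainting", "embeddings"}


@dataclass
class ModelCard:
    """Hypothetical in-memory view of a model card's hard-requirement rows."""
    weights_license: str
    osaid_class: str
    mof_class: str
    task: str
    local_inference_only: bool
    reproducible_conversion: bool
    published_research: str  # link or citation; empty string if none


def failed_hard_requirements(card: ModelCard) -> list[str]:
    """Return the names of the table rows the card fails, in table order."""
    failures = []
    if card.weights_license not in ALLOWED_WEIGHT_LICENSES:
        failures.append("Model weights license")
    if card.osaid_class not in ALLOWED_OSAID_CLASSES:
        failures.append("OSAID v1.0 class")
    if card.mof_class not in ALLOWED_MOF_CLASSES:
        failures.append("MOF class")
    if not card.local_inference_only:
        failures.append("Local inference")
    if card.task not in ALLOWED_TASKS:
        failures.append("Purpose-limited scope")
    if not card.reproducible_conversion:
        failures.append("Reproducibility")
    if not card.published_research.strip():
        failures.append("Published research")
    return failures


# Example: non-commercial weights plus a cloud call fail two rows.
card = ModelCard("CC-BY-NC-4.0", "Open Weights", "Class III",
                 "denoising", False, True, "public technical report")
print(failed_hard_requirements(card))
# prints ['Model weights license', 'Local inference']
```

Because the requirements are non-negotiable, any non-empty result means the model is not accepted; there is no scoring or weighting between rows.
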
These must be filled in honestly. Imperfect answers are allowed; missing or vague answers are not.
| Field | What goes here |
|---|---|
| Training data provenance | Where every training dataset came from, who collected it, and how. "Web-crawled", "stock images from provider X", "photographed by the authors" – be specific. |
| Training data license | The license(s) of each dataset, as published. Datasets without an explicit license, or with non-OSI / research-only licenses, are acceptable – but they must be named in Known limitations (see below). |
| Training code | Link to the public training code, with its license. If training code isn't released, say so – this is a known limitation. |
| Known limitations | Everything that fell short of an ideal – see examples below. Document the issue clearly so users and downstream maintainers know what they're getting. |
Examples of acceptable-with-disclosure known limitations, drawn from models we currently ship:
- "Training datasets X and Y do not have explicit open-source licenses."
- "Web-crawled training images; individual licenses not verified."
- "Pre-training dataset is research-only (not OSI); prohibits commercial use of the data itself."
- "Aggregated dataset bundles items with varying terms."
The model-card review is a sanity check on this row in particular: maintainers verify that the limitations actually listed match the model's real situation, and aren't quietly omitting something.
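
The "missing or vague answers are not allowed" rule for the disclosure fields can be approximated mechanically before a human review. The following is a minimal sketch under stated assumptions: the model card is represented as a plain dictionary keyed by the field names from the table, and the `VAGUE_ANSWERS` set is a hypothetical list of placeholder answers. It can only flag empty or placeholder text; whether a listed limitation is honest and complete remains the maintainer's judgment.

```python
# Field names taken from the disclosure table above.
REQUIRED_DISCLOSURE_FIELDS = (
    "Training data provenance",
    "Training data license",
    "Training code",
    "Known limitations",
)

# Hypothetical placeholder answers treated as "missing or vague".
VAGUE_ANSWERS = {"", "-", "n/a", "none", "unknown", "tbd"}


def incomplete_disclosures(card: dict[str, str]) -> list[str]:
    """Return the disclosure fields that are absent or hold only a placeholder."""
    problems = []
    for field in REQUIRED_DISCLOSURE_FIELDS:
        answer = card.get(field, "").strip().lower()
        if answer in VAGUE_ANSWERS:
            problems.append(field)
    return problems


# Example: training code left as a placeholder, limitations row missing entirely.
card = {
    "Training data provenance": "Web-crawled images collected by the authors.",
    "Training data license": "No explicit license published.",
    "Training code": "TBD",
}
print(incomplete_disclosures(card))
# prints ['Training code', 'Known limitations']
```

Note that an imperfect but specific answer, such as "No explicit license published.", passes this check, which matches the policy: imperfect answers are allowed as long as they are stated.
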
- Models for generating, manipulating, or synthesizing human likenesses. Out of scope regardless of license.
- Models trained on private data scraped without consent of the subjects (e.g. surveillance datasets, leaked photos). "Scraped without consent" here means the people in the photographs didn't consent – not "individual image copyright wasn't verified for every web-crawl entry." The latter is acceptable with disclosure; the former isn't.
- Models whose weights cannot be redistributed under a GPL-3.0-compatible license. The training data's license doesn't have to be GPL-compatible – what matters is whether the resulting weights can be redistributed.
When you open a PR adding or updating a model:
- CI runs the full pipeline against your changes (check-pr.yml).
- A maintainer reads the model card and verifies every hard requirement is met.
- The maintainer checks the Known limitations row against the model's real situation – not for severity, but for honesty and completeness.
- The maintainer may ask for changes – usually fleshing out provenance, clarifying licenses, or expanding the limitations row.
- Once everything is green, the PR is merged. The next nightly build picks it up; it ships in the next even-minor release.
When in doubt, ask. We'd rather have the conversation early than reject a model after the conversion work is done.
darktable-ai wiki is licensed under the Creative Commons BY-SA 4.0 terms.