Training data for fine-tuning LLM models to convert recipes into Cooklang format.
The dataset is built from HelloFresh recipe pages scraped across multiple countries.
recipes/ Converted recipe pairs (.recipe → .cook) used as training data
inbox/ Raw scraped recipes not yet converted
labler/ Web UI for reviewing and correcting .recipe → .cook conversions
src/main.rs HelloFresh scraper (Rust)
batch_convert.py Batch conversion script using OpenAI Batch API
~17,700 recipe pairs across 11 countries (at, au, be, de, es, fr, gb, ie, it, nl, us). Each recipe has:
.recipe— raw structured text scraped from HelloFresh (YAML frontmatter + ingredients + steps).cook— the same recipe converted to Cooklang format
These pairs form the training data for fine-tuning.
~19,700 raw .recipe files from 4 additional countries (ca, ch, lu, nz) awaiting conversion.
Scrapes HelloFresh sitemaps and downloads recipes using cooklang-import.
cargo run -- --countries us,gb --output recipes
cargo run -- --list-countriesRequires cooklang-import binary (default path: ../cooklang-import/target/debug/cooklang-import).
Converts .recipe files to .cook using the OpenAI Batch API with a fine-tuned model.
pip install -r requirements.txt
python batch_convert.py prepare # Generate JSONL batch input
python batch_convert.py submit # Upload and start batch job
python batch_convert.py collect # Download results when completeRequires OPENAI_API_KEY env var. Optionally set OPENAI_MODEL to override the default model.
Web UI for reviewing and correcting .recipe → .cook conversions side-by-side. Three-panel layout with syntax-highlighted Cooklang editor, real-time parsing, and diff view.
cd labler
cargo run # All recipes (defaults to ../recipes)
cargo run -- ../recipes/us # Only US recipesSee labler/README.md for details.
Multiple people can collaborate on reviewing recipes by splitting country folders between them.
Each reviewer picks a country folder to work on:
cd labler
cargo run -- ../recipes/us # Reviewer A
cargo run -- ../recipes/gb # Reviewer B- Open a recipe and compare the original
.recipe(left panel) with the Cooklang source (right panel) - Fix any conversion issues in the Cooklang editor — the middle panel shows a live preview
- Use "Show Diff" to spot differences between the original and the rendered output
- Once the recipe looks correct, add
fine_tune_status: reviewedto the YAML frontmatter in the.cookfile:
---
fine_tune_status: reviewed
---
Preheat oven to 425 degrees. Dice @potatoes{12%oz}...- Save (
⌘S) and move to the next recipe (⌘])
The fine_tune_status frontmatter field in .cook files tracks review progress:
| Value | Meaning |
|---|---|
| (missing) | Not yet reviewed |
reviewed |
Human-verified and corrected |
This lets the team see at a glance which recipes are done and which still need attention.
---
title: Recipe Name
description: ...
image: https://...
servings: 2
time required: 30m
nutrition:
calories: 500 kcal
...
---
2 unit Onion
1 tablespoon Olive Oil
...
• Dice the onion.
• Heat olive oil in a pan.
...Dice the @onion{2}. Heat @olive oil{1%tablespoon} in a #pan{}.The recipe content is sourced from HelloFresh and remains their intellectual property. This dataset is provided for research and educational purposes only.
