Skip to content

cooklang/fine_tune

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Cooklang Fine-Tuning Dataset

Training data for fine-tuning LLM models to convert recipes into Cooklang format.

The dataset is built from HelloFresh recipe pages scraped across multiple countries.

Structure

recipes/           Converted recipe pairs (.recipe → .cook) used as training data
inbox/             Raw scraped recipes not yet converted
labler/            Web UI for reviewing and correcting .recipe → .cook conversions
src/main.rs        HelloFresh scraper (Rust)
batch_convert.py   Batch conversion script using OpenAI Batch API

Data

recipes/ — Training Pairs

~17,700 recipe pairs across 11 countries (at, au, be, de, es, fr, gb, ie, it, nl, us). Each recipe has:

  • .recipe — raw structured text scraped from HelloFresh (YAML frontmatter + ingredients + steps)
  • .cook — the same recipe converted to Cooklang format

These pairs form the training data for fine-tuning.

inbox/ — Unconverted Recipes

~19,700 raw .recipe files from 4 additional countries (ca, ch, lu, nz) awaiting conversion.

Tools

Scraper (Rust)

Scrapes HelloFresh sitemaps and downloads recipes using cooklang-import.

cargo run -- --countries us,gb --output recipes
cargo run -- --list-countries

Requires cooklang-import binary (default path: ../cooklang-import/target/debug/cooklang-import).

Batch Converter (Python)

Converts .recipe files to .cook using the OpenAI Batch API with a fine-tuned model.

pip install -r requirements.txt

python batch_convert.py prepare   # Generate JSONL batch input
python batch_convert.py submit    # Upload and start batch job
python batch_convert.py collect   # Download results when complete

Requires OPENAI_API_KEY env var. Optionally set OPENAI_MODEL to override the default model.

Labler (Rust)

Web UI for reviewing and correcting .recipe.cook conversions side-by-side. Three-panel layout with syntax-highlighted Cooklang editor, real-time parsing, and diff view.

Labler screenshot

cd labler
cargo run                      # All recipes (defaults to ../recipes)
cargo run -- ../recipes/us     # Only US recipes

See labler/README.md for details.

Validation Workflow

Multiple people can collaborate on reviewing recipes by splitting country folders between them.

Setup

Each reviewer picks a country folder to work on:

cd labler
cargo run -- ../recipes/us     # Reviewer A
cargo run -- ../recipes/gb     # Reviewer B

Review Process

  1. Open a recipe and compare the original .recipe (left panel) with the Cooklang source (right panel)
  2. Fix any conversion issues in the Cooklang editor — the middle panel shows a live preview
  3. Use "Show Diff" to spot differences between the original and the rendered output
  4. Once the recipe looks correct, add fine_tune_status: reviewed to the YAML frontmatter in the .cook file:
---
fine_tune_status: reviewed
---

Preheat oven to 425 degrees. Dice @potatoes{12%oz}...
  1. Save (⌘S) and move to the next recipe (⌘])

Status Tracking

The fine_tune_status frontmatter field in .cook files tracks review progress:

Value Meaning
(missing) Not yet reviewed
reviewed Human-verified and corrected

This lets the team see at a glance which recipes are done and which still need attention.

Recipe Format

Input (.recipe)

---
title: Recipe Name
description: ...
image: https://...
servings: 2
time required: 30m
nutrition:
  calories: 500 kcal
  ...
---

2 unit Onion
1 tablespoon Olive Oil
...

• Dice the onion.
• Heat olive oil in a pan.
...

Output (.cook)

Dice the @onion{2}. Heat @olive oil{1%tablespoon} in a #pan{}.

License

The recipe content is sourced from HelloFresh and remains their intellectual property. This dataset is provided for research and educational purposes only.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages