
feat: recipes — web URL import with JSON-LD shortcut + LLM fallback#3

Open
BrainMode wants to merge 1 commit into main from feat/recipes-import-pipeline

@BrainMode
Owner

Summary

First slice of the multi-source recipe import pipeline. Web URLs only in this PR — YouTube/Instagram/TikTok/Image follow in subsequent PRs.

Pipeline:

  1. URL detection (existing detector covers 10+ source types)
  2. HTTP fetch + JSON-LD parse (schema.org/Recipe)
  3. JSON-LD shortcut: if exactly one Recipe is found, stamp provenance and return directly (no LLM cost, ~2s)
  4. Firecrawl markdown fallback for sites without JSON-LD
  5. Opus 4.7 synthesis with per-field provenance
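
As an illustration of step 2, here is a minimal sketch of pulling schema.org/Recipe nodes out of fetched HTML. The helper names (`findRecipes`, `collectNodes`, `isRecipe`) are assumptions for this example and are not taken from the PR's `jsonld.ts`; a regex scan is used only to keep the sketch dependency-free.

```typescript
// Hypothetical helpers for illustration; the PR's jsonld.ts may look different.
type JsonObject = Record<string, unknown>;

// Matches <script type="application/ld+json"> blocks and captures their body.
const SCRIPT_RE =
  /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// schema.org allows "@type" to be a single string or an array of strings.
function isRecipe(node: JsonObject): boolean {
  const type = node["@type"];
  return Array.isArray(type) ? type.includes("Recipe") : type === "Recipe";
}

// Flatten top-level arrays and "@graph" containers into a flat node list.
function collectNodes(value: unknown): JsonObject[] {
  if (Array.isArray(value)) return value.flatMap((item) => collectNodes(item));
  if (!isObject(value)) return [];
  const graph = value["@graph"];
  return Array.isArray(graph)
    ? [value, ...graph.flatMap((item) => collectNodes(item))]
    : [value];
}

export function findRecipes(html: string): JsonObject[] {
  const recipes: JsonObject[] = [];
  for (const match of html.matchAll(SCRIPT_RE)) {
    try {
      const parsed: unknown = JSON.parse(match[1] ?? "");
      recipes.push(...collectNodes(parsed).filter(isRecipe));
    } catch {
      // Malformed JSON-LD blocks are skipped rather than failing the import.
    }
  }
  return recipes;
}
```

The length of the returned array is what decides between the shortcut and the fallback: one Recipe node qualifies for step 3, anything else goes on to steps 4 and 5.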

Verification

  • npm run typecheck — green
  • npm run lint — green
  • npm run test — 90 passing
  • npm run build — 13 routes, /recipes/import 5.4kB

Test plan

  • /recipes empty state → click "Importieren" (Import)
  • Paste chefkoch.de URL → JSON-LD shortcut, ~2s, recipe shown with jsonld provenance badges
  • Paste site without JSON-LD → Firecrawl markdown → LLM extraction (~15s)
  • Paste plain-text recipe → manual_text path → LLM
  • Paste YouTube URL → polite "unsupported_source" error pointing to plain-text fallback
  • Save → /recipes/[id] shows ingredients with provenance, steps, source URL
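
The routing exercised by this test plan could look roughly like the sketch below. It is an assumption-laden illustration, not the existing detector: the `DetectedSource` members, the host rules, and the `checkSupported` helper are hypothetical, and the real detector covers more source types.

```typescript
// Hypothetical subset of source types; the real detector covers 10+.
type DetectedSource = "web_url" | "manual_text" | "youtube" | "instagram" | "tiktok";

// Host patterns are illustrative assumptions, matched against URL.hostname.
const HOST_RULES: Array<[RegExp, DetectedSource]> = [
  [/(^|\.)youtube\.com$|(^|\.)youtu\.be$/, "youtube"],
  [/(^|\.)instagram\.com$/, "instagram"],
  [/(^|\.)tiktok\.com$/, "tiktok"],
];

export function detectSource(input: string): DetectedSource {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    // Not parseable as a URL: treat it as pasted recipe text.
    return "manual_text";
  }
  // Text such as "Recipe: 2 eggs" parses as a URL with a custom scheme.
  if (url.protocol !== "http:" && url.protocol !== "https:") return "manual_text";
  for (const [pattern, source] of HOST_RULES) {
    if (pattern.test(url.hostname)) return source;
  }
  return "web_url";
}

// Only these two paths are wired up in this PR.
const SUPPORTED = new Set<DetectedSource>(["web_url", "manual_text"]);

type SupportCheck =
  | { ok: true }
  | { ok: false; error: "unsupported_source"; hint: string };

export function checkSupported(source: DetectedSource): SupportCheck {
  if (SUPPORTED.has(source)) return { ok: true };
  return {
    ok: false,
    error: "unsupported_source",
    hint: "Paste the recipe as plain text instead.",
  };
}
```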

Out of scope (next PRs)

  • YouTube/Instagram/TikTok/Facebook (Phase 4b — yt-dlp + ElevenLabs Scribe)
  • Image upload via vision LLM (Phase 4c)
  • Migros lookup + pantry match + cook log (Phase 4d)
  • Multi-Recipe selection UI (when one URL has 10 recipes)

🤖 Generated with Claude Code

…allback)

First slice of the recipe import pipeline. Web URLs only — YouTube/Instagram/
TikTok/Image follow in subsequent PRs.

Pipeline:
  1. detectSource(url) → DetectedSource type
  2. For 'web_url': fetch HTML
  3. Parse JSON-LD (jsonld.ts) → 80% of recipe blogs covered, no LLM cost
  4. If JSON-LD has exactly one Recipe, SHORTCUT: stamp provenance, return
     directly (Brzycki-fast, ~2s end-to-end)
  5. Else: Firecrawl markdown fallback → LLM synthesis with Opus 4.7
  6. ExtractionResult with per-field provenance back to client
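
The branch in steps 4 and 5 can be sketched as a pure decision function plus a provenance stamp. All type and function names here (`Strategy`, `chooseStrategy`, `stampJsonLd`, `Provenanced`) are assumptions for illustration; the actual `ExtractionResult` shape is not shown in this commit message.

```typescript
// Illustrative types only; not the PR's actual ExtractionResult.
type ProvenanceSource = "jsonld" | "llm";

interface Provenanced<T> {
  value: T;
  source: ProvenanceSource;
}

interface JsonLdRecipe {
  name?: string;
  recipeIngredient?: string[];
}

interface ExtractionDraft {
  title: Provenanced<string>;
  ingredients: Provenanced<string>[];
}

type Strategy =
  | { kind: "jsonld_shortcut"; recipe: JsonLdRecipe }
  | { kind: "llm_fallback"; reason: "no_jsonld" | "multiple_recipes" };

// Step 4: take the shortcut only when exactly one Recipe node was found.
export function chooseStrategy(recipes: JsonLdRecipe[]): Strategy {
  const [only, ...rest] = recipes;
  if (only !== undefined && rest.length === 0) {
    return { kind: "jsonld_shortcut", recipe: only };
  }
  return {
    kind: "llm_fallback",
    reason: recipes.length === 0 ? "no_jsonld" : "multiple_recipes",
  };
}

function stamp<T>(value: T): Provenanced<T> {
  return { value, source: "jsonld" };
}

// Mark every field as coming from JSON-LD so the UI can show a badge per field.
export function stampJsonLd(recipe: JsonLdRecipe): ExtractionDraft {
  return {
    title: stamp(recipe.name ?? ""),
    ingredients: (recipe.recipeIngredient ?? []).map((line) => stamp(line)),
  };
}
```

Keeping the decision free of I/O means the shortcut rule can be unit-tested without fetching pages or calling Firecrawl or the LLM.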

Files:

- src/lib/integrations/firecrawl/extract.ts — FirecrawlApp wrapper, scrapeUrl
  with markdown+html formats
- src/lib/recipes/sources/web.ts — fetch + JSON-LD-first + Firecrawl-fallback
  + paywall/Cloudflare detection
- src/lib/recipes/extraction-strategy.ts — orchestrator (web + manual_text path
  fully wired, other source-types return 'unsupported_source' politely)
- src/lib/recipes/synthesis.ts — fs-based prompt loader fix (was throwing stub)
- src/lib/recipes/import.ts — public-API wrapper extractRecipe()

API:
- POST /api/recipes/import — synchronous, returns { result, bundle }
  The 60s maxDuration cap covers web URL extraction (JSON-LD <2s, Firecrawl ~10s,
  LLM ~10-20s).

Server actions (src/app/(app)/recipes/actions.ts):
- saveRecipe — writes recipes + ingredients (with provenance jsonb) + steps
  (with provenance jsonb) + recipe_equipment
- deleteRecipe

UI:
- /recipes — list with tags + cooking time + kcal preview
- /recipes/import — URL/text input, live pipeline log, recipe preview with
  per-ingredient provenance badges, save button
- /recipes/[id] — detail with provenance, scaled servings, source link
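
For the scaled servings on the detail page, a minimal sketch of linear scaling is shown below. The `Ingredient` shape, the `scaleIngredients` name, and the two-decimal rounding are assumptions made for this example, not the PR's implementation.

```typescript
// Hypothetical ingredient shape; quantity is null for items like "salt to taste".
interface Ingredient {
  name: string;
  quantity: number | null;
  unit: string | null;
}

export function scaleIngredients(
  ingredients: Ingredient[],
  baseServings: number,
  targetServings: number,
): Ingredient[] {
  if (baseServings <= 0 || targetServings <= 0) {
    throw new RangeError("servings must be positive");
  }
  const factor = targetServings / baseServings;
  return ingredients.map((ingredient) => ({
    ...ingredient,
    // Unquantified items stay untouched; others are rounded to two decimals.
    quantity:
      ingredient.quantity === null
        ? null
        : Math.round(ingredient.quantity * factor * 100) / 100,
  }));
}
```

The function returns new objects instead of mutating its input, so the stored recipe keeps its original quantities.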

Verified: typecheck + lint + 90 tests + build all green.

Next PR (Phase 4b): YouTube + Instagram + TikTok via youtube-dl-exec +
ElevenLabs Scribe for audio fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>