Embed YAML source in PNG output (port of upstream PR #234)#5
Conversation
|
@jacobian91 — heads up, your upstream #234 (Embed YAML data into PNG) is being incorporated here. Context: We use WireViz to document Classic Mini Cooper wiring harnesses at Classic Mini DIY and are building a GUI on top of it. Upstream Your PR is the most consequential one for the GUI direction — it's what makes a PNG a self-contained editable artifact instead of an opaque pixel dump. Round-trip drag-and-drop editing in the GUI depends on this exact mechanism. The 2021-era patch had bit-rotted significantly against current master (argparse → click, parse_cmdline/parse_file gone, in-memory render dict from wireviz#321, etc.) so I rebuilt the implementation from scratch against current master while preserving the load-bearing ideas: iTXt chunk for YAML embedding, Two simplifications I made:
Commit attribution links back to your PR. Thanks for the work — really appreciate it. |
There was a problem hiding this comment.
Code Review
This pull request implements a round-trip editing feature by embedding the source YAML into rendered PNG files using iTXt chunks. It introduces utility functions for metadata manipulation, updates the core parsing and output logic to support this embedding, and enhances the CLI to accept PNG files as input by extracting the embedded YAML. Review feedback highlights opportunities to make the new Pillow dependency optional, optimize the PNG re-encoding process to preserve existing metadata, and improve error handling when processing invalid image files.
| from PIL import Image as PILImage | ||
| from PIL.PngImagePlugin import PngInfo |
There was a problem hiding this comment.
Importing PIL at the top level makes it a mandatory dependency for all wireviz operations. If the intention is to keep Pillow as an optional dependency for users who do not need the YAML embedding feature, consider moving these imports inside the functions that use them (_embed_yaml_in_png and read_yaml_from_png) or wrapping them in a try...except block.
| def _embed_yaml_in_png(png_bytes: bytes, yaml_source: str) -> bytes: | ||
| """Re-encode PNG bytes with the YAML source stored in an iTXt chunk.""" | ||
| with PILImage.open(io.BytesIO(png_bytes)) as im: | ||
| im.load() | ||
| chunks = PngInfo() | ||
| chunks.add_itxt(PNG_YAML_CHUNK_KEY, yaml_source, zip=True) | ||
| out = io.BytesIO() | ||
| im.save(out, format="PNG", pnginfo=chunks) | ||
| return out.getvalue() |
There was a problem hiding this comment.
The current implementation of _embed_yaml_in_png decodes the PNG into raw pixels and then re-encodes it to add the metadata chunk. This is computationally expensive and can lead to the loss of original image metadata (such as DPI settings or color profiles) generated by Graphviz. If re-encoding is kept, consider passing **im.info to im.save() to preserve as much metadata as possible. Alternatively, manipulating the PNG byte stream directly to insert the iTXt chunk would be more efficient and non-destructive.
| # PNG input: try to recover the YAML embedded by an | ||
| # earlier WireViz render. Falls through with a clear | ||
| # error if no chunk is present. | ||
| embedded = read_yaml_from_png(file) |
There was a problem hiding this comment.
The call to read_yaml_from_png is not protected against invalid or corrupted image files. If the input file is not a valid image, PIL.Image.open will raise an UnidentifiedImageError. Wrapping this in a try...except block would allow for a cleaner error message to the user.
try:
embedded = read_yaml_from_png(file)
except Exception as e:
raise click.UsageError(f"Could not read YAML from {file}: {e}")…eam PR wireviz#234) Renders now embed the source YAML in PNG output as a zlib-compressed iTXt chunk under the key ``wireviz:yaml``. The CLI auto-detects ``.png`` inputs and pulls the YAML back out, so a single PNG file is enough to re-render or edit a harness — no sidecar .yml needed. The headline workflow: wireviz harness.yml # produces harness.png with yaml inside wireviz harness.png # round-trips: extract YAML, re-render This is the load-bearing capability for the upcoming wireviz-gui: drag a PNG into the editor and recover the source. Without it, every PNG in the wild is an opaque artifact divorced from its model. API surface: * wireviz.parse() gains ``embed_yaml: bool = True``. The default embeds; pass False to render plain PNGs without source-bearing metadata. * Harness.output() / _render() gain ``yaml_source: Optional[str]``. When non-None and PNG is in the requested formats, the rendered PNG bytes are post-processed through PIL to attach the iTXt chunk. * New module-level helpers in Harness.py: - PNG_YAML_CHUNK_KEY = "wireviz:yaml" - _embed_yaml_in_png(png_bytes, yaml_source) -> bytes - read_yaml_from_png(png_path) -> Optional[str] * CLI ``--no-embed-yaml`` flag opts out of embedding when desired (e.g. before sharing a diagram externally without source). Implementation notes: * The chunk uses ``iTXt`` (international text, zip-compressed) rather than ``zTXt`` so unicode YAML round-trips cleanly. Key prefix ``wireviz:`` namespaces the chunk against PNG software-defined keywords. * When parse() is called with a Dict input, we yaml.safe_dump it back for embedding — round-trip-readable, but without the original comments or formatting (those don't survive the dict-conversion step regardless of embedding). * build_examples.py opts out (``embed_yaml=False``) so the regression baseline PNGs stay deterministic. Adapted from wireviz#234 (originally by @jacobian91, targeting upstream ``dev``). The 2021-era PR was heavily bit-rotted — argparse, the old parse_cmdline / parse_file layer, conceal-input enum — only the load-bearing idea (zTXT/iTXt embed in PNG, .png input recovery) was preserved. Reworked against current master's click CLI, the in-memory render dict from PR wireviz#321 stdin/stdout, and threaded source_path / template_dir from earlier PRs in this chain. Verified: * round-trip: harness.yml → harness.png → re-extract → identical YAML * --no-embed-yaml produces a PNG without the chunk (verified via PIL) * ``wireviz harness.png`` on a chunk-less PNG raises a clean click.UsageError * build_examples.py runs cleanly; .gv and .bom.tsv byte-identical to baseline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d9c733f to
dee9e67
Compare
Summary
Ports upstream PR #234 — renders embed the source YAML in PNG output as a zlib-compressed iTXt chunk under the key `wireviz:yaml`. The CLI auto-detects `.png` inputs and pulls the YAML back out, so a single PNG file is enough to re-render or edit a harness — no sidecar `.yml` needed.
This is the load-bearing capability for the upcoming wireviz-gui: drag a PNG into the editor and recover the source. Without it, every PNG in the wild is an opaque artifact divorced from its model.
Workflow
```bash
wireviz harness.yml # produces harness.png with YAML inside
wireviz harness.png # round-trips: extract YAML, re-render
wireviz --no-embed-yaml harness.yml # plain PNG, no metadata
```
API
Implementation notes
Differences from upstream PR wireviz#234
The 2021-era PR was heavily bit-rotted — argparse, the old `parse_cmdline`/`parse_file` layer, conceal-input enum, etc. Only the load-bearing idea (zTXT/iTXt embed in PNG, `.png` input recovery) was preserved. Reworked against current master's click CLI, the in-memory render dict from PR wireviz#321 stdin/stdout, and integrated cleanly with `source_path` from earlier PRs in this chain.
The upstream's `ConcealEnum` (none/all/prepend/main) was reduced to a single `--no-embed-yaml` boolean — simpler API surface, same privacy escape hatch.
Verification
Test plan
🤖 Generated with Claude Code