Skip to content

Fix #441: Add MinimumExternalMacroDataCompression model#819

Merged
isPANN merged 6 commits intomainfrom
worktree-agent-a35a8b1b
Mar 29, 2026
Merged

Fix #441: Add MinimumExternalMacroDataCompression model#819
isPANN merged 6 commits intomainfrom
worktree-agent-a35a8b1b

Conversation

@isPANN
Copy link
Copy Markdown
Collaborator

@isPANN isPANN commented Mar 28, 2026

Summary

  • Implement MinimumExternalMacroDataCompression problem model (Garey & Johnson SR22)
  • Optimization problem with Value = Min<usize>, minimizing compression cost |D| + |C| + (h-1) * pointer count
  • Pointer-free dictionary (D) variant as specified in the issue
  • CLI support via --string, --pointer-cost flags
  • 13 unit tests covering all code paths
  • Canonical example in example-db
  • Paper entry with problem definition, visualization, and bibliography

Fixes #441

Test plan

  • All 13 model-specific unit tests pass
  • Full workspace test suite passes (3523+ tests)
  • Clippy clean
  • Format check passes
  • Example-db tests pass
  • Paper compiles successfully

🤖 Generated with Claude Code

Implement the Minimum External Macro Data Compression problem (Garey &
Johnson SR22). Given an alphabet, string, and pointer cost, find a
dictionary D and compressed string C minimizing total compression cost.

- Model with Min<usize> value type, pointer-free D restriction
- CLI create support with --string, --pointer-cost flags
- 13 unit tests covering creation, evaluation, brute force, serialization
- Canonical example in example-db
- Paper entry with problem definition and visualization
- Bibliography entries for Storer (1977), Storer & Szymanski (1982),
  Charikar et al. (2005)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 28, 2026

Codecov Report

❌ Patch coverage is 98.47036% with 8 lines in your changes missing coverage. Please review.
✅ Project coverage is 97.89%. Comparing base (81d92e5) to head (e338e36).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...ls/misc/minimum_external_macro_data_compression.rs 96.61% 4 Missing ⚠️
...c/rules/minimumexternalmacrodatacompression_ilp.rs 98.28% 4 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff            @@
##             main     #819    +/-   ##
========================================
  Coverage   97.88%   97.89%            
========================================
  Files         647      651     +4     
  Lines       70940    71463   +523     
========================================
+ Hits        69443    69959   +516     
- Misses       1497     1504     +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@GiggleLiu
Copy link
Copy Markdown
Contributor

GiggleLiu commented Mar 28, 2026

Agentic Review Report

Structural Check

Structural Review: model MinimumExternalMacroDataCompression

Structural Completeness

# Check Status
1 Model file exists PASS — src/models/misc/minimum_external_macro_data_compression.rs is present.
2 inventory::submit! present PASS — src/models/misc/minimum_external_macro_data_compression.rs:22.
3 Serialize / Deserialize derive PASS — src/models/misc/minimum_external_macro_data_compression.rs:70.
4 Problem trait impl PASS — src/models/misc/minimum_external_macro_data_compression.rs:154.
5 Aggregate value present PASS — type Value = Min<usize> at src/models/misc/minimum_external_macro_data_compression.rs:156.
6 #[cfg(test)] + #[path = ...] link PASS — src/models/misc/minimum_external_macro_data_compression.rs:323.
7 Test file exists PASS — src/unit_tests/models/misc/minimum_external_macro_data_compression.rs.
8 Test file has >= 3 tests PASS — 13 tests in src/unit_tests/models/misc/minimum_external_macro_data_compression.rs.
9 Registered in misc/mod.rs PASS — module + export at src/models/misc/mod.rs:79 and src/models/misc/mod.rs:120.
10 Re-exported in models/mod.rs PASS — src/models/mod.rs:42.
11 Variant registration exists PASS — declare_variants! at src/models/misc/minimum_external_macro_data_compression.rs:251.
12 CLI resolve_alias support PASS — no special-case entry is needed; canonical names are resolved from the registry and this model is registered through inventory::submit!.
13 CLI create support PASS — problemreductions-cli/src/commands/create.rs:2963.
14 Canonical model example registered PASS — src/models/misc/mod.rs:184 extends minimum_external_macro_data_compression::canonical_model_example_specs().
15 Paper display-name entry PASS — docs/paper/reductions.typ:191.
16 Paper problem-def block PASS — docs/paper/reductions.typ:4853.
17 Blacklisted autogenerated files excluded PASS — none of the prohibited generated files appear in the diff.

Build Status

  • make test: PASS
  • make clippy: PASS

Semantic Review

  • evaluate() correctness: OK — malformed configs return Min(None), pointer references are checked against the realized D prefix, and valid literal/pointer mixes decode to the source string before cost evaluation.
  • dims() correctness: OK — the chosen pointer-free encoding uses 2 * |s| slots with alphabet_size + 1 for D and alphabet_size + 1 + |s|(|s| + 1)/2 for C, matching the implementation.
  • Size getter consistency: OK — string_length(), alphabet_size(), and pointer_cost() match the declared complexity metadata.
  • Weight handling: OK / N/A — this model does not introduce weighted-graph machinery.

Issue Compliance (if linked issue found)

# Check Status
1 Problem name matches issue OK
2 Mathematical definition matches OK — implemented as the pointer-free D variant described in issue #441.
3 Problem framing matches OK — optimization model with Value = Min<usize> and no bound field.
4 Type parameters match OK — no graph/weight variants, as requested.
5 Configuration space matches OK — `2 *
6 Feasibility check matches OK — only configs decoding exactly to s are feasible.
7 Objective function matches OK — cost is `
8 Complexity matches OK — brute-force complexity metadata matches the encoded search space.

Summary

  • 17/17 structural checks passed
  • 8/8 issue compliance checks passed
  • The review-implementation packet reported a missing declare_variants item, but manual inspection confirms declare_variants! is present at src/models/misc/minimum_external_macro_data_compression.rs:251.

Quality Check

Quality Review

Design Principles

  • DRY: OK — no meaningful duplication introduced beyond standard create/help wiring.
  • KISS: ISSUE — canonical_model_example_specs() contains a long scratchpad of abandoned example-search reasoning before returning a single spec, which is maintenance noise rather than stable implementation documentation. src/models/misc/minimum_external_macro_data_compression.rs:257
  • HC/LC: ISSUE — the same function mixes canonical example data with exploratory design notes, making the example-db path harder to audit than the actual model logic. src/models/misc/minimum_external_macro_data_compression.rs:257

HCI (CLI changed)

  • Error messages: OK — the new create validation paths are actionable and include concrete usage strings.
  • Discoverability: ISSUE — the paper example instructs users to run pred solve min-emdc.json, but that command fails for this model unless --solver brute-force is supplied. docs/paper/reductions.typ:4860
  • Consistency: OK — flag names and create UX match existing CLI conventions.
  • Least surprise: ISSUE — the shipped canonical example and paper figure never exercise pointer-based compression; they only show the degenerate uncompressed optimum, which weakens discoverability for the feature’s core behavior. src/models/misc/minimum_external_macro_data_compression.rs:315, docs/paper/reductions.typ:4879
  • Feedback: OK — the failing pred solve path does at least suggest --solver brute-force.

Test Quality

  • Naive test detection: ISSUE
    • The added coverage is model-local unit testing only; there is no CLI/integration test for the documented pred create --example + pred solve path, which is why the broken paper command escaped review. src/unit_tests/models/misc/minimum_external_macro_data_compression.rs:1

Issues

Critical (Must Fix)

  • The documented paper command pred solve min-emdc.json fails for the new canonical example because MinimumExternalMacroDataCompression has no default reduction path to ILP. Reproduced in the review worktree with target/debug/pred solve /tmp/min-emdc.json; the working command is target/debug/pred solve /tmp/min-emdc.json --solver brute-force. docs/paper/reductions.typ:4860

Important (Should Fix)

  • The canonical example and paper figure do not demonstrate compression or pointer use; they only show the no-compression optimum, which undercuts the model’s main behavior and is weaker than the worked example described in issue [Model] MinimumExternalMacroDataCompression #441. src/models/misc/minimum_external_macro_data_compression.rs:315, docs/paper/reductions.typ:4879
  • The canonical example helper includes a large exploratory scratchpad in production code instead of a concise justification for the chosen example. src/models/misc/minimum_external_macro_data_compression.rs:257

Minor (Nice to Have)

  • None.

Summary

  • Critical: broken paper/example solve command for the shipped canonical example.
  • Important: canonical example/paper figure do not showcase pointer compression.
  • Important: example helper contains excessive exploratory commentary.
  • Important: no CLI regression test covers the documented example flow.

Agentic Feature Tests

Feature Test Report: problem-reductions

Date: 2026-03-29
Project type: Rust library + CLI
Features tested: MinimumExternalMacroDataCompression
Profile: ephemeral
Use Case: Validate that a downstream CLI user can discover the new model, create its canonical example, and solve/evaluate it from the documented commands.
Expected Outcome: The model appears in the catalog, pred show describes it, pred create --example succeeds, and the documented solve flow works on the created example.
Verdict: fail
Critical Issues: 1

Summary

Feature Discoverable Setup Works Expected Outcome Met Doc Quality
MinimumExternalMacroDataCompression yes yes partial no outdated solve command

Per-Feature Details

MinimumExternalMacroDataCompression

  • Role: downstream CLI user validating the newly added model from docs/examples alone.
  • Use Case: inspect the model, create its canonical example, and solve/evaluate it from the CLI.
  • What they tried: pred list, pred show MinimumExternalMacroDataCompression, pred create --example MinimumExternalMacroDataCompression -o /tmp/min-emdc.json, pred solve /tmp/min-emdc.json, pred solve /tmp/min-emdc.json --solver brute-force, and pred evaluate /tmp/min-emdc.json --config 2,2,0,1.
  • Discoverability: Good — the model appears in pred list, pred show works, and pred create --help exposes the example entry.
  • Setup: Good — the built target/debug/pred binary was sufficient; no extra setup was required.
  • Functionality: Partial — list/show/create/evaluate work, and brute-force solve works; the default pred solve flow fails.
  • Expected vs Actual Outcome: Expected the paper/example command pred solve min-emdc.json to succeed. Actual behavior: it fails with No reduction path from MinimumExternalMacroDataCompression to ILP, then succeeds only after adding --solver brute-force.
  • Blocked steps: None.
  • Friction points: The canonical example does not demonstrate pointers/compression, and the documented solve command is wrong for the example that the docs tell users to generate.
  • Doc suggestions: Change the paper command to pred solve min-emdc.json --solver brute-force or add a default/direct solve path for this model; consider using a separate paper example that visibly uses pointers.

Expected vs Actual Outcome

  • Discovery and example creation matched expectations.
  • The documented solve step did not: the exact command shown in the paper failed and required a manual solver override.
  • The created example was technically valid but did not showcase the core compression behavior a user would expect from this model.

Issues Found

  • Confirmed / critical: pred solve /tmp/min-emdc.json fails with No reduction path from MinimumExternalMacroDataCompression to ILP. Reproduced directly in the PR worktree. Recommended fix: update docs/example commands to --solver brute-force or wire a default solve path.
  • Confirmed / important: The canonical example and paper figure never exercise pointer compression, so the user path succeeds only on a degenerate no-compression instance. Recommended fix: decouple the paper walkthrough from the tiny brute-force example-db fixture.

Suggestions

  • Update docs/paper/reductions.typ so the advertised solve command matches the actual supported solver path.
  • Add a CLI or integration regression test covering pred create --example MinimumExternalMacroDataCompression followed by pred solve.
  • Replace the long exploratory comment block in canonical_model_example_specs() with a short rationale for the chosen fixture.

Generated by review-pipeline

@isPANN
Copy link
Copy Markdown
Collaborator Author

isPANN commented Mar 29, 2026

On Hold: Canonical example (s="ab", h=2) only demonstrates the uncompressed optimum — it doesn't showcase the core compression behavior (pointers into dictionary). Need to find a better example where compression actually helps, either by:

  1. Finding a small enough instance where compression beats uncompressed and brute force is still feasible, or
  2. Adding a direct ILP formulation for this problem so larger examples can be solved without brute force.

Also investigate: can this problem be solved by ILP? If yes, implement the direct ILP rule alongside the model.

isPANN and others added 3 commits March 29, 2026 21:35
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nonical example

- Implement ILP reduction using segment-flow formulation with dictionary
  one-hot, contiguity, flow conservation, and pointer matching constraints
- Update canonical example to issue #441's compression-demonstrating instance
  (s="abcdefabcdefabcdef", h=2, optimal=12 via dictionary+pointers)
- Update paper figure to show compression with D="abcdef" and 3 pointers
- 7 ILP reduction tests + 13 model tests all pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Segment-flow ILP formulation with dictionary one-hot, contiguity,
flow conservation, and pointer matching constraints.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@zazabap zazabap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Resolve conflicts: keep both MinimumExternalMacroDataCompression (PR)
and all main additions (IntExpr, IntegerExpressionMembership,
KthLargestMTuple, RegisterSufficiency, SchedulingToMinimizeWeightedCompletionTime,
FeasibleBasisExtension, QuadraticDiophantineEquations) in imports,
specs, CLI args, and references.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@isPANN isPANN merged commit a70b132 into main Mar 29, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Model] MinimumExternalMacroDataCompression

3 participants