# ILP Production Problem Examples

This notebook loads ILP production-planning (`ilp_prod`) records from prior runs under `src/exps_performance/results/` and prints one example response per stage: natural language (`nl`), code execution (`code`), simulation (`sim`), and control simulation (`controlsim`).

In [3]:
import glob
import json
import textwrap
from pathlib import Path
from typing import Any, Dict, List

import pandas as pd

# Robustly locate the exps_performance root so glob finds results
NOTEBOOK_DIR = Path.cwd()
candidates = [
    NOTEBOOK_DIR,
    NOTEBOOK_DIR.parent,
    NOTEBOOK_DIR.parent / "exps_performance",
    Path("/nlpgpu/data/terry/ToolProj/src/exps_performance"),
]

ROOT = None
for cand in candidates:
    if (cand / "results").is_dir():
        ROOT = cand
        break

if ROOT is None:
    raise RuntimeError(f"Could not locate results directory; checked {candidates}")

RESULT_GLOB = ROOT / "results" / "*" / "tb" / "*" / "res.jsonl"

print("Using ROOT", ROOT)
print("Searching files matching", RESULT_GLOB)

Using ROOT /mnt/nlpgpu-io1/data/terry/ToolProj/src/exps_performance
Searching files matching /mnt/nlpgpu-io1/data/terry/ToolProj/src/exps_performance/results/*/tb/*/res.jsonl


In [4]:
def load_ilp_prod(limit: int | None = None) -> List[Dict[str, Any]]:
    """Load all ilp_prod records from results JSONL files."""
    files = glob.glob(str(RESULT_GLOB))
    records: List[Dict[str, Any]] = []
    for fp in files:
        try:
            with open(fp) as f:
                for line in f:
                    if not line.strip():
                        continue
                    rec = json.loads(line)
                    if rec.get("kind") != "ilp_prod":
                        continue
                    records.append(rec)
                    if limit and len(records) >= limit:
                        return records
        except FileNotFoundError:
            continue
    return records


records = load_ilp_prod()
print(f"Loaded {len(records)} ilp_prod records from {len(glob.glob(str(RESULT_GLOB)))} files")

if not records:
    raise RuntimeError("No ilp_prod records found; check that results are present.")

Loaded 432 ilp_prod records from 14 files


## Dataset overview

In [5]:
df = pd.DataFrame(records)

summary = df.groupby(["model", "seed"]).size().reset_index(name="count").sort_values("count", ascending=False)

print("Records by model/seed:")
display(summary.head())

print("\nSample question text:")
print(textwrap.shorten(str(df.iloc[0]["question"]), width=200))

Records by model/seed:


| | model | seed | count |
|---|---|---|---|
| 8 | mistralai/ministral-14b-2512 | 2 | 54 |
| 1 | anthropic/claude-haiku-4.5 | 1 | 51 |
| 11 | qwen/qwen-2.5-coder-32b-instruct | 1 | 48 |
| 4 | meta-llama/llama-3.1-405b-instruct | 0 | 33 |
| 10 | qwen/qwen-2.5-coder-32b-instruct | 0 | 33 |



Sample question text:
Production planning: Choose integer quantities x_j ≥ 0 to maximize total profit sum_j profit[j]*x_j, subject to resource constraints sum_j consumption[i][j]*x_j ≤ capacity[i]. Return the max [...]
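
For small instances like the ones in this dataset, the objective and constraints in the question text can be checked by exhaustive enumeration. The sketch below is not part of the experiment code: the function name `solve_ilp_prod_bruteforce` is hypothetical, and it assumes that `upper_bounds` are inclusive per-variable limits, which the truncated question text above does not confirm. The instance values are copied from the NL example printed later in this notebook.

```python
from itertools import product
from typing import Sequence


def solve_ilp_prod_bruteforce(
    profit: Sequence[int],
    consumption: Sequence[Sequence[int]],
    capacity: Sequence[int],
    upper_bounds: Sequence[int],
) -> int:
    """Return the max profit by enumerating every integer plan within the bounds.

    Assumption: upper_bounds[j] is an inclusive cap on x_j.
    """
    best = 0  # the all-zero plan is feasible whenever capacities are non-negative
    for x in product(*(range(ub + 1) for ub in upper_bounds)):
        # Each resource i must satisfy sum_j consumption[i][j] * x_j <= capacity[i].
        feasible = all(
            sum(c * q for c, q in zip(row, x)) <= cap
            for row, cap in zip(consumption, capacity)
        )
        if feasible:
            best = max(best, sum(p * q for p, q in zip(profit, x)))
    return best


# Instance from the NL example shown later in this notebook.
nl_example_profit = solve_ilp_prod_bruteforce(
    profit=[6, 20],
    consumption=[[29, 27], [26, 4]],
    capacity=[10, 5],
    upper_bounds=[3, 3],
)
print(nl_example_profit)  # 0
```

Producing even one unit of either product already exceeds the first resource's capacity (29 > 10 and 27 > 10), so the only feasible plan is x = (0, 0), which agrees with the gold answer of 0. Enumeration grows exponentially with the number of variables, so this is only a sanity check for tiny instances.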


## Pick one example per response type

For each stage (`nl`, `code`, `sim`, `controlsim`), we take the first loaded record that has any content for that stage: a prompt, reasoning, or an answer. Stages are searched independently, so the four examples may come from different records.

In [6]:
def has_content(rec: Dict[str, Any], prefix: str) -> bool:
    """Return True if the record has a non-empty answer, reasoning, or prompt for this stage."""
    return bool(rec.get(f"{prefix}_answer") or rec.get(f"{prefix}_reasoning") or rec.get(f"{prefix}_question"))


wanted_prefixes = ["nl", "code", "sim", "controlsim"]
examples: Dict[str, Dict[str, Any]] = {}

for prefix in wanted_prefixes:
    for rec in records:
        if has_content(rec, prefix):
            examples[prefix] = rec
            break

missing = [p for p in wanted_prefixes if p not in examples]
if missing:
    print("Missing examples for:", missing)
else:
    print("Found examples for all prefixes")

Found examples for all prefixes
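
To make the per-stage selection concrete, here is a minimal sketch on toy data. The records, the `id` field, and the helper name `first_with_stage` are hypothetical; only the `<stage>_answer`, `<stage>_reasoning`, and `<stage>_question` naming pattern comes from the cell above.

```python
from typing import Any, Dict, List, Optional

# Hypothetical toy records; real records in res.jsonl carry many more fields.
toy_records: List[Dict[str, Any]] = [
    {"id": "a", "nl_answer": "0"},
    {"id": "b", "code_answer": "0", "sim_reasoning": "step through the constraints"},
    {"id": "c", "nl_answer": "12", "controlsim_question": "prompt text"},
]


def first_with_stage(records: List[Dict[str, Any]], prefix: str) -> Optional[Dict[str, Any]]:
    """Return the first record with a non-empty answer, reasoning, or question for `prefix`."""
    keys = [f"{prefix}_answer", f"{prefix}_reasoning", f"{prefix}_question"]
    return next((rec for rec in records if any(rec.get(k) for k in keys)), None)


picked = {p: first_with_stage(toy_records, p) for p in ["nl", "code", "sim", "controlsim"]}
picked_ids = {p: (rec["id"] if rec else None) for p, rec in picked.items()}
print(picked_ids)  # {'nl': 'a', 'code': 'b', 'sim': 'b', 'controlsim': 'c'}
```

Each stage is resolved on its own, so here three different records supply the four examples. One caveat of a truthiness-based check: a numeric answer of `0` counts as empty, while the string `"0"` does not. Whether answers in `res.jsonl` are stored as strings or numbers is not shown in this notebook, so this may or may not matter in practice.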


In [7]:
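def print_block(title: str, text: str) -> None:
    """Print a titled, indented block wrapped to 110 columns."""
    print(f"\n=== {title} ===")
    # textwrap.fill collapses newlines, so multi-line text (e.g. code) is shown reflowed.
    print(textwrap.indent(textwrap.fill(text, width=110), prefix="  "))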


def show_example(prefix: str, rec: Dict[str, Any]) -> None:
    print(f"\n### {prefix.upper()} example")
    print(f"model={rec.get('model')} seed={rec.get('seed')} digit={rec.get('digit')}")
    print_block("Problem", rec.get("question", ""))
    print_block("Gold answer", str(rec.get("answer", "")))

    q = rec.get(f"{prefix}_question", "")
    a = rec.get(f"{prefix}_answer", "")
    r = rec.get(f"{prefix}_reasoning", "")
    errs = []
    if rec.get(f"{prefix}_parse_err"):
        errs.append(str(rec.get(f"{prefix}_parse_err")))
    if rec.get(f"{prefix}_err_msg"):
        errs.append(str(rec.get(f"{prefix}_err_msg")))
    correct = rec.get(f"{prefix}_correct")

    if q:
        print_block("Prompt", q)
    if r:
        print_block("Reasoning/Code", r)
    # sim_code contains generated code for the code stage
    if prefix == "code" and rec.get("sim_code"):
        print_block("Generated code (sim_code)", rec.get("sim_code", ""))
    if a:
        print_block("Answer", str(a))
    print_block("Correct?", str(correct))
    if errs:
        print_block("Errors", "; ".join(errs))


for pfx, rec in examples.items():
    show_example(pfx, rec)


### NL example
model=qwen/qwen-2.5-coder-32b-instruct seed=1 digit=2

=== Problem ===
  Production planning: Choose integer quantities x_j ≥ 0 to maximize total profit sum_j profit[j]*x_j, subject
  to resource constraints sum_j consumption[i][j]*x_j ≤ capacity[i]. Return the max profit. profit = [6, 20]
  consumption (rows=resources) = [[29, 27], [26, 4]] capacity = [10, 5] upper_bounds = [3, 3]

=== Gold answer ===
  0

=== Prompt ===
  Description: You are going to be given a set of algorithmic problem.Question: Solve the following algorithmic
  problem:   Production planning: Choose integer quantities x_j ≥ 0 to maximize total profit sum_j
  profit[j]*x_j, subject to resource constraints sum_j consumption[i][j]*x_j ≤ capacity[i]. Return the max
  profit. profit = [6, 20] consumption (rows=resources) = [[29, 27], [26, 4]] capacity = [10, 5] upper_bounds =
  [3, 3]YOU ARE NEVER ALLOWED TO USE CODE.FOLLOW THE FORMAT CAREFULLY. Here are the format instructions: The
  output should be 