feat(v1): outputs/ persistence: persist generated spec/code (P3 §69) #105
Merged
Before: only the RunOutcome metric (jsonl) was persisted. The actual generated artifacts (spec/code) went to the console only under the verbose flag, so they were volatile. This answers the user question "where can I see the generated problem?"

New ipe/v1/persistence.py:
- persist_run_outputs(state, output_dir) function
- each run writes 5 files to an outputs/<run_id>/ directory:
  - spec.json (ProblemSpec)
  - design.json (AlgorithmDesign)
  - attempt.py (SolutionAttempt.code, raw Python)
  - verification.json (sample_results/violations/feedback)
  - outcome.json (summary metrics + iteration_history)
- PersistedPaths dataclass, a caller convenience

CLI flags added to main_v1.py:
- --output-dir <path> (default outputs/)
- --no-output-dir (skip)

Complements the existing jsonl (RunOutcome): jsonl holds batch metrics, outputs/ holds the full atomic artifact of a single run.

Verification:
- ruff 0 / mypy 0 (41 src)
- pytest non-e2e: 413 passed (+8 persistence tests)
- smoke (real LLM): ipe.v1.main_v1 --algorithm sieve → all 5 files under outputs/<run_id>/ generated correctly. Confirmed an actual Korean problem description in spec.json and a working Sieve implementation in attempt.py.

Narrative:
- before: only the numeric narrative "47/57, 91.2%"
- after: the full artifact under outputs/<run_id>/ is inspectable → a base for portfolio / catalog browsing / future option C / post-hoc RCA analysis

Details: CHANGES §69.
…nt opt-in

§69 extension. User intent: "to use these as real problems we need the problem statement / test cases / answer" + "option switching at the measurement stage".

persistence.py:
- outputs/<run_id>/problem.md (human-friendly markdown: title / body / input and output / constraints / samples), the competitive programming / online judge standard
- outputs/<run_id>/samples/<i>.in + <i>.out (1-indexed, split per sample) → can be graded with diff <(python attempt.py < 1.in) 1.out
- problem_md + samples_dir fields added to PersistedPaths

Measurement runner made optional:
- measurement/__main__.py --persist-outputs <dir> flag (default OFF) → avoids disk load for large-N measurements; enable it only for specific runs in team demos
- run_n_measurements / baseline_5 / phase_2b / phase_2c all gain an optional persist_output_dir param (backward-compat)
- main_v1 single run stays default ON

Refactoring:
- run_baseline_5_measurements is now also routed through _run_multi_algo (removes the previous inline duplication)

Verification:
- ruff 0 / mypy 0 (41 src)
- pytest non-e2e: 416 passed (+3 new tests for problem.md/samples)
- smoke single (default ON): outputs/demo-two-001/ → problem.md + samples/ 1~4.in/.out all generated correctly (Two Sum markdown in Korean)
- smoke measurement default OFF: no outputs/ created ✅
- smoke measurement --persist-outputs: outputs/<dir>/<run_id>/ created ✅

Usage example:
  $ ipe.v1.main_v1 --algorithm sieve --run-id demo
  $ cat outputs/demo/problem.md   # human-readable problem
  $ python outputs/demo/attempt.py < outputs/demo/samples/1.in   # grading

Details: CHANGES §69.6.
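The sample layout and the `diff`-style grading described in this commit message can be sketched in Python. This is a minimal illustration, not the code in `persistence.py`: `write_samples` and `grade` are hypothetical helper names, the samples are assumed to be plain `(input, expected_output)` string pairs, and the comparison ignores leading and trailing whitespace, which is more lenient than a strict `diff`.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def write_samples(samples_dir: Path, samples: list[tuple[str, str]]) -> None:
    """Write (input, expected_output) pairs as 1.in/1.out, 2.in/2.out, ..."""
    samples_dir.mkdir(parents=True, exist_ok=True)
    for i, (stdin_text, expected) in enumerate(samples, start=1):  # 1-indexed
        (samples_dir / f"{i}.in").write_text(stdin_text, encoding="utf-8")
        (samples_dir / f"{i}.out").write_text(expected, encoding="utf-8")


def grade(attempt: Path, samples_dir: Path, timeout: float = 10.0) -> dict[str, bool]:
    """Run the attempt on every <i>.in and compare its stdout with <i>.out."""
    results: dict[str, bool] = {}
    # Sort numerically so that 10.in comes after 9.in.
    for in_path in sorted(samples_dir.glob("*.in"), key=lambda p: int(p.stem)):
        expected = in_path.with_suffix(".out").read_text(encoding="utf-8")
        proc = subprocess.run(
            [sys.executable, str(attempt)],
            input=in_path.read_text(encoding="utf-8"),
            capture_output=True,
            encoding="utf-8",
            timeout=timeout,
        )
        # Assumption: whitespace-insensitive match at both ends of the output.
        results[in_path.name] = (
            proc.returncode == 0 and proc.stdout.strip() == expected.strip()
        )
    return results
```

Calling `grade(Path("outputs/demo/attempt.py"), Path("outputs/demo/samples"))` would return a mapping from each sample file name to a pass/fail flag.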
9230f28 to 999c5b4
3 tasks
LsMin124 added a commit that referenced this pull request May 28, 2026
- meta version → 'v1.0 D안 (19 algo, 91.2% Gate PASS, anchor freeze)'
- mainCommit 706f875, updated 2026-05-29, tests 405 → 418
- recentPrs: removed #86 (binary search) / #90 (sort), added #104 (Option B) / #105 (outputs persistence)
- completedFixes: added 2 entries, D-P3-OptionB / D-P3-outputs
- README badges + added the 'v1 Phase 2c RCA3 final = v1.0 anchor' and 'anchor freeze' narrative

Further measurement = diminishing returns (91.2% + variance only) → switched to freeze.
Summary
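Persists the generated artifacts of v1 Plan D (D안). Until now only the
RunOutcome metric (jsonl) was stored; the actual spec/design/code/verification
went to the console only under the verbose flag and was lost afterwards. This
directly answers the user's question, "where can I see the generated problem?"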
Implementation
New `ipe/v1/persistence.py`:
```
outputs/<run_id>/
├── spec.json # ProblemSpec
├── design.json # AlgorithmDesign
├── attempt.py # raw Python solution code
├── verification.json # VerificationResult detail
└── outcome.json # summary metrics + iteration_history
```
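The layout above could be produced by a function along these lines. This is a hedged sketch, not the actual `persist_run_outputs`: the real state object is not shown in this PR, so `state` is assumed here to be a plain mapping with hypothetical keys (`run_id`, `spec`, `design`, `code`, `verification`, `outcome`), and the `PersistedPaths` field names are likewise assumed.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Mapping


@dataclass(frozen=True)
class PersistedPaths:
    """Paths written for one run (field names are assumptions)."""

    run_dir: Path
    spec: Path
    design: Path
    attempt: Path
    verification: Path
    outcome: Path


def persist_run_outputs(state: Mapping[str, Any], output_dir: str | Path) -> PersistedPaths:
    """Write the five per-run artifacts under <output_dir>/<run_id>/."""
    run_dir = Path(output_dir) / str(state["run_id"])
    run_dir.mkdir(parents=True, exist_ok=True)

    def dump_json(name: str, payload: Any) -> Path:
        path = run_dir / name
        # ensure_ascii=False keeps Korean problem text readable on disk.
        path.write_text(
            json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        return path

    # The solution is stored as raw Python so it can be executed directly.
    attempt = run_dir / "attempt.py"
    attempt.write_text(state["code"], encoding="utf-8")

    return PersistedPaths(
        run_dir=run_dir,
        spec=dump_json("spec.json", state["spec"]),
        design=dump_json("design.json", state["design"]),
        attempt=attempt,
        verification=dump_json("verification.json", state["verification"]),
        outcome=dump_json("outcome.json", state["outcome"]),
    )
```

Returning the paths in a small dataclass lets a caller print or open a specific file without rebuilding the directory layout by hand.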
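CLI flags in `main_v1.py`:
- `--output-dir <path>` (default `outputs/`)
- `--no-output-dir` (skip)

Complements the existing jsonl: jsonl holds batch metrics, while outputs/ holds
the full atomic artifact of a single run.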
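As an illustration of how an output-directory option with an opt-out could be wired, here is a minimal `argparse` sketch. It is an assumption about the shape of the CLI, not the code in `main_v1.py`: `build_parser` and `resolve_output_dir` are hypothetical names, and making the two output flags mutually exclusive is a design choice of this sketch.

```python
from __future__ import annotations

import argparse
from pathlib import Path
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ipe.v1.main_v1")
    parser.add_argument("--algorithm", required=True)
    parser.add_argument("--run-id", default=None)
    # Assumption: the two output flags cannot be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--output-dir", type=Path, default=Path("outputs"))
    group.add_argument("--no-output-dir", action="store_true")
    return parser


def resolve_output_dir(argv: Sequence[str]) -> Optional[Path]:
    """Return the directory to persist into, or None when persistence is skipped."""
    args = build_parser().parse_args(argv)
    return None if args.no_output_dir else args.output_dir
```

With this shape, a single run persists by default, and the caller only needs to check for `None` before writing anything.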
Test plan
All files generated correctly. Confirmed the Korean description and the working Sieve code.
Notes
Follow-ups (P3):
Detailed analysis: `CHANGES.md` §69.