feat: introduce codeflash_core engine with Rich console UI by KRRT7 · Pull Request #1886 · codeflash-ai/codeflash

KRRT7 · 2026-03-24T08:28:39Z

Summary

Adds src/codeflash_core/ — the language-agnostic optimization engine from codeflash-next-gen
Includes protocols, strategy, config, models, AI client, telemetry, ranking, verification, and diff modules
Adds Rich-based console UI (codeflash_core.ui) with spinners, progress bars, panels, and syntax-highlighted code display
Wires UI into the optimizer and strategy pipeline: discovery, indexing, context extraction, test/candidate generation, baseline runs, candidate evaluation, ranking, and result display
Registers codeflash_core in pyproject.toml with attrs and requests as new dependencies

Test plan

Verify import codeflash_core works
Verify from codeflash_core.ui import console, progress_bar works
Run existing tests to confirm no regressions in the codeflash package


Spinners on discovery, indexing, context extraction, test/candidate generation, baseline runs, and benchmarking. Per-function rule headers, per-candidate progress logging, diff display on success, and a paneled summary at the end.

claude · 2026-03-24T08:29:12Z

Claude finished @KRRT7's task in 32s.

PR Review Summary

Triage PR size — LARGE (6002 additions, 25 new production files)
Run lint and type checks — 1 ruff error fixed, 121 mypy errors remain (complex type-system issues)
Resolve stale review threads — none found
Code review
Duplicate detection
Test coverage analysis
Post final summary

Prek Checks

Auto-fixed (committed) in `src/codeflash_core/models.py`: moved `from pathlib import Path` into a `TYPE_CHECKING` block (ruff TC003).
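For reference, here is the TC003 pattern applied to a hypothetical `describe` function, not the actual `models.py` code. It assumes postponed annotation evaluation via `from __future__ import annotations`, so only type checkers need `Path`. The pattern breaks anything that resolves the annotations at runtime.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for type checkers; never executed at runtime (what TC003 asks for).
    from pathlib import Path


def describe(path: Path) -> str:
    # With postponed evaluation, the "Path" annotation is stored as a plain string.
    return f"target: {path}"
```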

Remaining mypy errors (121 across 5 files; these require logic changes and are not auto-fixable):

src/codeflash_core/danom/utils.py: covariant TypeVars (`T_co`, `U_co`) are used as function parameter types, and the generic `Composable`/`Filterable` are used without type parameters. Covariant type variables are not valid as parameter types; these should be invariant TypeVars.
src/codeflash_core/danom/new_type.py: `attrs.define(frozen=frozen)`, where `frozen` is a runtime bool, violates mypy's `[literal-required]` constraint for attrs; the `frozen` argument must be a literal `True` or `False`.
src/codeflash_core/danom/stream.py: incompatible type assignments, plus non-overlapping equality checks (comparing `str` against `int` for the `_MAP`/`_FILTER`/`_TAP` sentinels).
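The following is a minimal sketch of the `utils.py` and `stream.py` fixes. It assumes the danom helpers look roughly like this; `compose`, `Op`, and `apply_ops` are hypothetical names, not the module's actual API. Covariant TypeVars belong in return positions or on immutable generic containers. mypy rejects them as parameter types, so the sketch uses plain invariant `T`/`U`/`V`. The `Op` enum replaces mixed `str`/`int` sentinel constants, so every comparison is between values of a single type.

```python
from __future__ import annotations

from enum import Enum
from typing import Any, Callable, Iterable, TypeVar

# Invariant TypeVars: valid in parameter positions, unlike T_co/U_co.
T = TypeVar("T")
U = TypeVar("U")
V = TypeVar("V")


def compose(f: Callable[[T], U], g: Callable[[U], V]) -> Callable[[T], V]:
    """Return a function that applies f, then g."""

    def composed(x: T) -> V:
        return g(f(x))

    return composed


class Op(Enum):
    """Typed operation tags instead of mixed str/int sentinel constants."""

    MAP = "map"
    FILTER = "filter"
    TAP = "tap"


def apply_ops(items: Iterable[Any], ops: list[tuple[Op, Callable[[Any], Any]]]) -> list[Any]:
    """Run each item through the op pipeline; every comparison is Op against Op."""
    out: list[Any] = []
    for item in items:
        keep = True
        for op, fn in ops:
            if op is Op.MAP:
                item = fn(item)
            elif op is Op.FILTER and not fn(item):
                keep = False  # drop the item and skip the remaining ops
                break
            elif op is Op.TAP:
                fn(item)  # side effect only; the item passes through unchanged
        if keep:
            out.append(item)
    return out
```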


Code Review

Bug: `replace_function` outside `try/finally` — file state not restored on exception

src/codeflash_core/strategy_evaluation.py:85

```python
runtime.plugin.replace_function(function.file_path, function, candidate.code)  # line 85: OUTSIDE try

try:
    test_results = runtime.plugin.run_tests(...)
    ...
finally:
    for path, content in file_snapshots.items():
        path.write_text(content, encoding="utf-8")  # only runs if line 85 succeeded
```

If replace_function() raises (e.g., file locked, AST parse error mid-write), the finally block never executes and the target file is left in a corrupted or unknown state for all subsequent candidates in the loop.
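One way to restructure the loop body is sketched below with hypothetical names. `evaluate_candidate` takes the replacement and the test run as callables, which stand in for `runtime.plugin.replace_function(...)` and `runtime.plugin.run_tests(...)`. The snapshot is taken before any write, and the replacement call moves inside the `try`.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Any, Callable


def evaluate_candidate(
    paths: list[Path],
    replace_function: Callable[[], None],
    run_tests: Callable[[], Any],
) -> Any:
    """Apply a candidate and run tests, always restoring the original files."""
    # Snapshot every file the candidate may touch before any write happens.
    file_snapshots = {path: path.read_text(encoding="utf-8") for path in paths}
    try:
        # Inside the try: a failure mid-replacement still triggers the restore.
        replace_function()
        return run_tests()
    finally:
        for path, content in file_snapshots.items():
            path.write_text(content, encoding="utf-8")


def demo_failed_replacement() -> str:
    """Simulate a replacement that half-writes the file and then raises."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "module.py"
        target.write_text("def f():\n    return 1\n", encoding="utf-8")

        def broken_replace() -> None:
            target.write_text("def f(:\n", encoding="utf-8")  # partial write
            raise SyntaxError("simulated parse error")

        try:
            evaluate_candidate([target], broken_replace, lambda: "tests ran")
        except SyntaxError:
            pass  # the error still propagates, but only after the restore
        return target.read_text(encoding="utf-8")
```

`demo_failed_replacement` returns the original file contents because the `finally` block now runs even when the replacement itself fails.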


Bug: `config.effort` mutation not protected by `try/finally`

src/codeflash_core/optimizer.py:99-105

```python
original_effort = self.config.effort
if i < HIGH_EFFORT_TOP_N and self.config.effort == EffortLevel.MEDIUM.value:
    self.config.effort = EffortLevel.HIGH.value
result = self.optimize_function(function)  # if this raises (not KeyboardInterrupt)
self.config.effort = original_effort       # this line is skipped
```

If optimize_function() raises any exception other than KeyboardInterrupt, self.config.effort remains "high" for all remaining functions in the loop. The outer try only catches KeyboardInterrupt.
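This is a minimal sketch of the fix, using stand-ins for the real types. `Config`, `optimize_all`, the `EffortLevel` values, and `HIGH_EFFORT_TOP_N = 2` are assumptions for illustration. The restore moves into a `finally`, so it runs whether `optimize_function` returns or raises.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Iterable

HIGH_EFFORT_TOP_N = 2  # hypothetical value; the real constant lives in optimizer.py


class EffortLevel(Enum):
    """Stand-in for the real enum; only the two members used here."""

    MEDIUM = "medium"
    HIGH = "high"


@dataclass
class Config:
    effort: str = EffortLevel.MEDIUM.value


def optimize_all(
    config: Config, functions: Iterable[Any], optimize_function: Callable[[Any], Any]
) -> list[Any]:
    results = []
    for i, function in enumerate(functions):
        original_effort = config.effort
        if i < HIGH_EFFORT_TOP_N and config.effort == EffortLevel.MEDIUM.value:
            config.effort = EffortLevel.HIGH.value
        try:
            results.append(optimize_function(function))
        finally:
            # Runs on success and on any exception, so the temporary bump never leaks.
            config.effort = original_effort
    return results
```

A small `contextlib.contextmanager` that sets and restores `effort` would be equivalent; `try/finally` is simply the smallest change to the existing loop.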


Edge case: `outcomes_match` allows None vs non-None outputs to pass as equivalent

src/codeflash_core/verification.py:43-45

```python
if baseline.output is not None and candidate.output is not None:
    return compare_outputs(baseline.output, candidate.output, comparator=comparator)
return True  # ← if exactly one is None, outputs are NOT compared
```

If the baseline captured an output but the candidate returned None (or vice versa), this returns True and accepts the candidate as behaviorally equivalent. This could let a broken optimization through.
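Here is a sketch of the stricter check, with hypothetical `Outcome` and `compare_outputs` stand-ins. Only the tail of `outcomes_match` is modeled, and any earlier checks in the real function are omitted. When exactly one side captured an output, the candidate is rejected.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional

Comparator = Optional[Callable[[Any, Any], bool]]


@dataclass
class Outcome:
    """Stand-in for the real outcome type; only the captured output matters here."""

    output: Any = None


def compare_outputs(a: Any, b: Any, comparator: Comparator = None) -> bool:
    """Hypothetical stand-in: use the comparator if given, else ==."""
    return comparator(a, b) if comparator is not None else a == b


def outcomes_match(baseline: Outcome, candidate: Outcome, comparator: Comparator = None) -> bool:
    if baseline.output is None and candidate.output is None:
        return True  # nothing captured on either side
    if baseline.output is None or candidate.output is None:
        return False  # exactly one side captured an output: not equivalent
    return compare_outputs(baseline.output, candidate.output, comparator=comparator)
```

If `None` can also be a legitimate return value, a dedicated "not captured" sentinel would be needed to tell the two cases apart. Like the original code, this sketch treats `None` as "nothing captured".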


Design: `Err._extract_details` leaks frame locals into memory

src/codeflash_core/danom/result.py:113

```python
"locals": frame.f_locals,
```

`frame.f_locals` captures the entire local scope of each stack frame, which can include large objects, open file handles, or sensitive data. Every `Err` instance keeps those references alive for its whole lifetime. Consider either omitting locals or converting them to `repr` strings immediately.
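The sketch below shows the `repr`-string option, using the standard-library `reprlib` module to cap the size of each value. `extract_details` is a hypothetical stand-in for `Err._extract_details`, and the dictionary keys other than `locals` are assumptions. If `Err` also keeps the exception object, its `__traceback__` still references the frames. That reference would need clearing too, for example with `exc.with_traceback(None)`.

```python
from __future__ import annotations

import reprlib
from typing import Any


def extract_details(exc: BaseException) -> list[dict[str, Any]]:
    """Summarize each traceback frame, keeping locals only as bounded repr strings."""
    details: list[dict[str, Any]] = []
    tb = exc.__traceback__
    while tb is not None:
        frame = tb.tb_frame
        details.append(
            {
                "function": frame.f_code.co_name,
                "file": frame.f_code.co_filename,
                "line": tb.tb_lineno,
                # reprlib truncates long containers and strings; no live objects are kept.
                "locals": {name: reprlib.repr(value) for name, value in frame.f_locals.items()},
            }
        )
        tb = tb.tb_next
    return details
```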

Design: `pyproject.toml` is Python-only; `codeflash_core` config stored there

pyproject.toml:107 — include = ["codeflash", "src/codeflash_core"]. Per the project architecture rules, pyproject.toml is Python-exclusive config. If codeflash_core is intended to be a standalone package (it has its own cli.py with a cfnext entrypoint), it should have its own pyproject.toml under src/. Currently there is no separate [project] section, no [project.scripts] entry for cfnext, and no separate package metadata for codeflash_core.

Duplicate Detection

| Finding | Confidence | Location | Duplicates |
|---|---|---|---|
| `FunctionToOptimize`, `HelperFunction`, `CodeContext` | HIGH | `src/codeflash_core/models.py` | `codeflash/models/function_types.py:26`, `codeflash/languages/base.py:54,79` |
| `unified_diff` | HIGH | `src/codeflash_core/diff.py` | `codeflash/code_utils/code_utils.py:70` (`unified_diff_strings`); same logic, different signature |
| `Result`/`Ok`/`Err` monad | MEDIUM | `src/codeflash_core/danom/result.py` | `codeflash/either.py`; different design but same functional purpose |
| `init_sentry`, `PostHogClient` | MEDIUM | `src/codeflash_core/telemetry/` | `codeflash/telemetry/sentry.py`, `codeflash/telemetry/posthog.py` |

These duplicates are likely intentional — codeflash_core is a new standalone engine that will eventually replace the codeflash package internals. This is fine architecturally, but worth noting that until the migration is complete there are two parallel implementations to keep in sync.

Test Coverage

The src/codeflash_core/ module (~2000 lines of new production logic across 25 files) has zero test coverage. There are no tests at all for:

optimizer.py — full orchestration loop
strategy.py / strategy_evaluation.py — candidate evaluation and multi-round repair
verification.py — correctness checking (especially the outcomes_match edge case above)
ranking.py, config.py, diff.py
danom/ — the Result monad and functional utilities

At minimum, verification.py and ranking.py are pure functions that should have unit tests, and the outcomes_match edge case flagged above would have been caught by a test.

Last updated: 2026-03-24T09:40Z | Branch: codeflash-core

…eachable code
- Remove k8bot strategy (nonexistent module)
- Guard `PythonPlugin` import with try/except `ImportError`
- Move `logging.basicConfig` into `setup_logging()` to avoid import-time side effects
- Rename underscore-prefixed names (`_spinners`, `_DummyTask`, etc.) to public
- Fix `_pending_code_markdown` → `pending_code_markdown` attribute access
- Simplify unreachable `isinstance` branch in `review_and_repair_tests`
- Export `setup_logging` from the `ui` package
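The import guard and lazy logging setup in this commit follow a common pattern, sketched here. The `codeflash_python` import path is a hypothetical placeholder. The real `setup_logging` presumably installs Rich's `RichHandler`, but this sketch uses plain stdlib formatting instead.

```python
from __future__ import annotations

import logging

# Guarded optional import: the engine still imports when the Python plugin
# package is missing. The module path below is hypothetical.
try:
    from codeflash_python import PythonPlugin
except ImportError:
    PythonPlugin = None  # callers check for None before using the plugin


def setup_logging(level: int = logging.INFO) -> None:
    """Configure root logging only when called, not as an import side effect."""
    logging.basicConfig(
        level=level,
        format="%(levelname)s %(name)s: %(message)s",
        force=True,  # replace any handlers configured earlier (Python 3.8+)
    )
```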

- Remove dead `__post_init__` from `FunctionToOptimize` (`file_path` is typed as `Path`; str coercion was unreachable per mypy)
- Move `TaskID` import to runtime so `DummyTask.id` can be properly typed
- Remove stale `type: ignore[union-attr]` from `strategy_evaluation`

Co-authored-by: Kevin Turcios <undefined@users.noreply.github.com>

…command passthrough
- Wrap `Future.result()` in try/except in `generate_tests_and_candidates` to prevent plugin errors from aborting the entire optimization session
- Rename `danom/_*.py` files to remove leading underscores per project conventions
- Pass `pytest_cmd` through to `TestConfig.test_command` in `resolve_test_config`
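This sketch shows the `Future.result()` guard, assuming that test and candidate generation are submitted to a thread pool and keyed by name. `collect_results` and `demo` are hypothetical names, not the actual `generate_tests_and_candidates` code.

```python
from __future__ import annotations

import logging
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any

logger = logging.getLogger(__name__)


def collect_results(futures: dict[str, Future[Any]]) -> dict[str, Any]:
    """Gather results, logging and skipping any future whose task raised."""
    results: dict[str, Any] = {}
    for name, future in futures.items():
        try:
            results[name] = future.result()
        except Exception:
            # A single plugin failure is logged instead of aborting the session.
            logger.exception("%s generation failed; continuing without it", name)
    return results


def demo() -> dict[str, Any]:
    def generate_tests() -> list[str]:
        return ["test_example"]

    def generate_candidates() -> list[str]:
        raise RuntimeError("simulated plugin error")

    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {
            "tests": pool.submit(generate_tests),
            "candidates": pool.submit(generate_candidates),
        }
        return collect_results(futures)
```

Catching `Exception` rather than `BaseException` lets `KeyboardInterrupt` propagate, so Ctrl-C still stops the session.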


KRRT7 added 5 commits March 24, 2026 02:01

Add codeflash_core package under src/ layout

08dc1f5

Bring the language-agnostic optimization engine from codeflash-next-gen. Includes protocols, strategy, config, models, AI client, telemetry, ranking, verification, and diff modules. TUI artifacts stripped.

Register codeflash_core in pyproject.toml and add missing deps

be90440

Add attrs and requests to dependencies. Include src/codeflash_core in the sdist build target.

Add Rich-based console UI to codeflash_core

614beee

Spinner progress bars, paneled text, syntax-highlighted code display, and RichHandler logging. Matches original codeflash patterns without LSP/subagent concerns.

Update uv.lock

1fb1fb4

KRRT7 changed the title from "Add codeflash_core package with Rich UI" to "feat: introduce codeflash_core engine with Rich console UI" on Mar 24, 2026

KRRT7 and others added 4 commits March 24, 2026 03:30

Fix ruff lint errors in codeflash_core

68a9bed

Apply ruff format to codeflash_core

f95bd5e

KRRT7 mentioned this pull request Mar 24, 2026

feat: add PythonPlugin with codeflash_core/codeflash_python packages #1887

Closed


KRRT7 and others added 2 commits March 24, 2026 04:22

style: move pathlib.Path to TYPE_CHECKING block in models.py

347316f

Fixes TC003 ruff lint error. Co-authored-by: Kevin Turcios <undefined@users.noreply.github.com>

KRRT7 merged commit 6f6c039 into main Mar 24, 2026
25 of 27 checks passed

KRRT7 deleted the codeflash-core branch March 24, 2026 09:43

KRRT7 merged 11 commits into main from codeflash-core
