feat: introduce codeflash_core engine with Rich console UI #1886

Merged: KRRT7 merged 11 commits into main from codeflash-core on Mar 24, 2026

@KRRT7 KRRT7 commented Mar 24, 2026

Summary

  • Adds src/codeflash_core/ — the language-agnostic optimization engine from codeflash-next-gen
  • Includes protocols, strategy, config, models, AI client, telemetry, ranking, verification, and diff modules
  • Adds Rich-based console UI (codeflash_core.ui) with spinners, progress bars, panels, and syntax-highlighted code display
  • Wires UI into the optimizer and strategy pipeline: discovery, indexing, context extraction, test/candidate generation, baseline runs, candidate evaluation, ranking, and result display
  • Registers codeflash_core in pyproject.toml with attrs and requests as new dependencies

Test plan

  • Verify import codeflash_core works
  • Verify from codeflash_core.ui import console, progress_bar works
  • Run existing tests to confirm no regressions in the codeflash package
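
The two import checks in the test plan could be scripted as a small smoke test. This is only a sketch: the helper name `check_import` is hypothetical and not part of the PR, and it assumes that a successful import plus the presence of the named attributes is enough to count as passing.

```python
import importlib


def check_import(module_name, attrs=()):
    """Return True if the module imports and exposes every name in attrs."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return all(hasattr(module, name) for name in attrs)


if __name__ == "__main__":
    # Mirrors the test plan; both lines print False if codeflash_core is not installed.
    print(check_import("codeflash_core"))
    print(check_import("codeflash_core.ui", ["console", "progress_bar"]))
```

Running this in the environment where the package is installed should print True twice if the first two test-plan items hold.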

KRRT7 added 5 commits March 24, 2026 02:01
Bring the language-agnostic optimization engine from codeflash-next-gen.
Includes protocols, strategy, config, models, AI client, telemetry,
ranking, verification, and diff modules. TUI artifacts stripped.

Add attrs and requests to dependencies. Include src/codeflash_core
in the sdist build target.

Spinner progress bars, paneled text, syntax-highlighted code display,
and RichHandler logging. Matches original codeflash patterns without
LSP/subagent concerns.

Spinners on discovery, indexing, context extraction, test/candidate
generation, baseline runs, and benchmarking. Per-function rule headers,
per-candidate progress logging, diff display on success, and a paneled
summary at the end.

@KRRT7 KRRT7 changed the title Add codeflash_core package with Rich UI feat: introduce codeflash_core engine with Rich console UI Mar 24, 2026

claude bot commented Mar 24, 2026

Claude finished @KRRT7's task in 32s

PR Review Summary

  • Triage PR size — LARGE (6002 additions, 25 new production files)
  • Run lint and type checks — 1 ruff error fixed, 121 mypy errors remain (complex type-system issues)
  • Resolve stale review threads — none found
  • Code review
  • Duplicate detection
  • Test coverage analysis
  • Post final summary

Prek Checks

Auto-fixed (committed): src/codeflash_core/models.py — moved from pathlib import Path into TYPE_CHECKING block (TC003).

Remaining mypy errors (121 across 5 files — require logic changes, not auto-fixable):

  • src/codeflash_core/danom/utils.py — Covariant TypeVars (T_co, U_co) used as function parameters, missing type parameters for generic Composable/Filterable. Covariant type variables are not valid as input parameter types; these should be invariant TypeVars.
  • src/codeflash_core/danom/new_type.py: attrs.define(frozen=frozen), where frozen is a runtime bool, violates mypy's [literal-required] constraint for attrs. The frozen parameter must be a literal True/False.
  • src/codeflash_core/danom/stream.py — Incompatible type assignments and non-overlapping equality checks (str vs int for _MAP/_FILTER/_TAP sentinel comparisons).
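
To illustrate the first item, here is a minimal sketch of the variance rule. mypy generally rejects a covariant TypeVar in a function parameter position; the usual fix is an invariant TypeVar for plain functions, keeping covariant ones for read-only generic containers. The names `apply` and `ReadOnlyBox` are hypothetical and are not taken from danom/utils.py.

```python
from typing import Callable, Generic, TypeVar

T = TypeVar("T")  # invariant: safe as both parameter and return type
U = TypeVar("U")  # invariant
T_co = TypeVar("T_co", covariant=True)  # covariant: output positions only


class ReadOnlyBox(Generic[T_co]):
    """A container that only hands values out, so covariance is sound."""

    def __init__(self, value: T_co) -> None:  # __init__ is exempt from the check
        self._value = value

    def get(self) -> T_co:
        return self._value


def apply(fn: Callable[[T], U], value: T) -> U:
    """Plain helper typed with invariant TypeVars instead of T_co/U_co."""
    return fn(value)
```

At runtime variance has no effect, so swapping the TypeVars changes only what the type checker accepts, not behavior.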


Code Review

Bug: replace_function outside try/finally — file state not restored on exception

src/codeflash_core/strategy_evaluation.py:85

runtime.plugin.replace_function(function.file_path, function, candidate.code)  # line 85 — OUTSIDE try

try:
    test_results = runtime.plugin.run_tests(...)
    ...
finally:
    for path, content in file_snapshots.items():
        path.write_text(content, encoding="utf-8")  # only runs if line 85 succeeded

If replace_function() raises (e.g., file locked, AST parse error mid-write), the finally block never executes and the target file is left in a corrupted or unknown state for all subsequent candidates in the loop.
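
One way to close this gap is to take the snapshot first and move the replacement inside the try block, so restoration runs even when the replacement itself fails. The sketch below is self-contained and uses hypothetical stand-ins (`evaluate_candidate`, and the `replace_function` and `run_tests` callables passed in as arguments); it is not the actual strategy_evaluation.py code.

```python
from pathlib import Path


def evaluate_candidate(target: Path, candidate_code: str, replace_function, run_tests):
    """Apply a candidate, run the tests, and always restore the original file."""
    # Snapshot before any mutation so there is always something to restore.
    snapshot = target.read_text(encoding="utf-8")
    try:
        # The replacement now sits inside the try, so a failure here is covered.
        replace_function(target, candidate_code)
        return run_tests()
    finally:
        # Runs on success, on test failure, and on a failed or partial replacement.
        target.write_text(snapshot, encoding="utf-8")
```

The exception still propagates to the caller, which can decide whether to skip the candidate or abort; the only guarantee added here is that the file on disk is back to its snapshot.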

Bug: config.effort mutation not protected by try/finally

src/codeflash_core/optimizer.py:99-105

original_effort = self.config.effort
if i < HIGH_EFFORT_TOP_N and self.config.effort == EffortLevel.MEDIUM.value:
    self.config.effort = EffortLevel.HIGH.value
result = self.optimize_function(function)  # if this raises (not KeyboardInterrupt)
self.config.effort = original_effort       # this line is skipped

If optimize_function() raises any exception other than KeyboardInterrupt, self.config.effort remains "high" for all remaining functions in the loop. The outer try only catches KeyboardInterrupt.
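
A small context manager is one possible way to scope the override. In this sketch, `temporary_effort` is a hypothetical helper and the config is any object with a mutable `effort` attribute; the real config class may differ.

```python
from contextlib import contextmanager


@contextmanager
def temporary_effort(config, effort):
    """Set config.effort for the duration of the block, then restore it."""
    original = config.effort
    config.effort = effort
    try:
        yield config
    finally:
        # Restores the original value even if the body raises.
        config.effort = original
```

Inside the loop, the call would become something like `with temporary_effort(self.config, EffortLevel.HIGH.value): result = self.optimize_function(function)`, applied only when the existing condition on `i` and the current effort holds.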

Edge case: outcomes_match allows None vs non-None outputs to pass as equivalent

src/codeflash_core/verification.py:43-45

if baseline.output is not None and candidate.output is not None:
    return compare_outputs(baseline.output, candidate.output, comparator=comparator)
return True  # ← if exactly one is None, outputs are NOT compared

If the baseline captured an output but the candidate returned None (or vice versa), this returns True and accepts the candidate as behaviorally equivalent. This could let a broken optimization through.
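
A stricter comparison might treat a one-sided None as a mismatch. The sketch below uses a hypothetical `Outcome` dataclass and a default equality comparator as stand-ins for the real models and compare_outputs. It also assumes that None on both sides means "nothing was captured" and should therefore match; the real engine may need a dedicated sentinel if None is a legitimate return value.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Outcome:
    """Hypothetical stand-in for a captured test outcome."""

    output: Optional[Any] = None


def outputs_equivalent(
    baseline: Outcome,
    candidate: Outcome,
    comparator: Callable[[Any, Any], bool] = lambda a, b: a == b,
) -> bool:
    """Compare outputs, rejecting the case where exactly one side is None."""
    if baseline.output is None and candidate.output is None:
        return True  # assumption: neither run captured an output
    if baseline.output is None or candidate.output is None:
        return False  # one-sided None is treated as a behavioral difference
    return comparator(baseline.output, candidate.output)
```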

Design: Err._extract_details leaks frame locals into memory

src/codeflash_core/danom/result.py:113

"locals": frame.f_locals,

frame.f_locals captures the entire local scope at each stack frame, which can include large objects, open file handles, or sensitive data. Each Err instance holds these references for as long as it lives, which keeps the referenced objects from being garbage-collected. Consider either omitting locals or converting them to repr strings immediately.
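
The repr-string option could look roughly like this. The function names, dictionary keys, and truncation length are assumptions made for illustration, not the actual Err._extract_details layout. Note that repr strings can still contain sensitive values, so this addresses memory retention rather than data exposure.

```python
from __future__ import annotations


def safe_repr(value: object, max_len: int = 200) -> str:
    """repr() that never raises and never returns an unbounded string."""
    try:
        text = repr(value)
    except Exception:
        text = "<unrepresentable>"
    return text if len(text) <= max_len else text[:max_len] + "..."


def extract_details(exc: BaseException, max_len: int = 200) -> list[dict]:
    """Walk the traceback and keep only string snapshots of each frame's locals."""
    details = []
    tb = exc.__traceback__
    while tb is not None:
        frame = tb.tb_frame
        details.append(
            {
                "file": frame.f_code.co_filename,
                "function": frame.f_code.co_name,
                "line": tb.tb_lineno,
                # Strings only: no live references to the original objects.
                "locals": {name: safe_repr(value, max_len) for name, value in frame.f_locals.items()},
            }
        )
        tb = tb.tb_next
    return details
```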

Design: pyproject.toml is Python-only; codeflash_core config stored there

pyproject.toml:107 sets include = ["codeflash", "src/codeflash_core"]. Per the project architecture rules, pyproject.toml is Python-exclusive config. If codeflash_core is intended to be a standalone package (it has its own cli.py with a cfnext entrypoint), it should have its own pyproject.toml under src/. Currently there is no separate [project] section, no [project.scripts] entry for cfnext, and no separate package metadata for codeflash_core.


Duplicate Detection

| Finding | Confidence | Location | Duplicates |
| --- | --- | --- | --- |
| FunctionToOptimize, HelperFunction, CodeContext | HIGH | src/codeflash_core/models.py | codeflash/models/function_types.py:26, codeflash/languages/base.py:54,79 |
| unified_diff | HIGH | src/codeflash_core/diff.py | codeflash/code_utils/code_utils.py:70 (unified_diff_strings); same logic, different signature |
| Result/Ok/Err monad | MEDIUM | src/codeflash_core/danom/result.py | codeflash/either.py; different design but same functional purpose |
| init_sentry, PostHogClient | MEDIUM | src/codeflash_core/telemetry/ | codeflash/telemetry/sentry.py, codeflash/telemetry/posthog.py |

These duplicates are likely intentional: codeflash_core is a new standalone engine that will eventually replace the codeflash package internals. This is fine architecturally, but note that until the migration is complete there are two parallel implementations to keep in sync.


Test Coverage

The src/codeflash_core/ module (~2000 lines of new production logic across 25 files) has zero test coverage. There are no tests at all for:

  • optimizer.py — full orchestration loop
  • strategy.py / strategy_evaluation.py — candidate evaluation and multi-round repair
  • verification.py — correctness checking (especially the outcomes_match edge case above)
  • ranking.py, config.py, diff.py
  • danom/ — the Result monad and functional utilities

At minimum, verification.py and ranking.py contain pure functions that should have unit tests; a test would have caught the outcomes_match edge case flagged above.


Last updated: 2026-03-24T09:40Z | Branch: codeflash-core

KRRT7 and others added 4 commits March 24, 2026 03:30
…eachable code

- Remove k8bot strategy (nonexistent module)
- Guard PythonPlugin import with try/except ImportError
- Move logging.basicConfig into setup_logging() to avoid import-time side effects
- Rename underscore-prefixed names (_spinners, _DummyTask, etc.) to public
- Fix _pending_code_markdown → pending_code_markdown attribute access
- Simplify unreachable isinstance branch in review_and_repair_tests
- Export setup_logging from ui package
- Remove dead __post_init__ from FunctionToOptimize (file_path is typed as Path; str coercion was unreachable per mypy)
- Move TaskID import to runtime so DummyTask.id can be properly typed
- Remove stale type: ignore[union-attr] from strategy_evaluation

Co-authored-by: Kevin Turcios <undefined@users.noreply.github.com>
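
Two of the items in the commit notes above, the guarded plugin import and the move of logging.basicConfig into setup_logging(), follow a common pattern that can be sketched as below. The module name `hypothetical_python_plugin` and the helper `plugin_available` are placeholders, and the stdlib handler stands in for the RichHandler setup the PR describes.

```python
import logging

# Optional dependency: importing this module must not fail if the plugin is absent.
try:
    from hypothetical_python_plugin import PythonPlugin  # placeholder module name
except ImportError:
    PythonPlugin = None


def plugin_available() -> bool:
    """Report whether the optional plugin could be imported."""
    return PythonPlugin is not None


def setup_logging(level: int = logging.INFO) -> None:
    """Configure logging on explicit request instead of at import time."""
    logging.basicConfig(level=level, format="%(levelname)s %(message)s")
```

With this shape, importing the module has no logging side effects, and callers such as a CLI entrypoint decide when to call setup_logging().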
KRRT7 and others added 2 commits March 24, 2026 04:22
…command passthrough

- Wrap Future.result() in try/except in generate_tests_and_candidates to prevent
  plugin errors from aborting the entire optimization session
- Rename danom/_*.py files to remove leading underscores per project conventions
- Pass pytest_cmd through to TestConfig.test_command in resolve_test_config
Fixes TC003 ruff lint error.

Co-authored-by: Kevin Turcios <undefined@users.noreply.github.com>
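
The Future.result() guard mentioned in the commit notes above can be sketched with the standard library as follows. The function name `collect_results` and the choice to record None for a failed task are assumptions; the real generate_tests_and_candidates may handle failures differently.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed

logger = logging.getLogger(__name__)


def collect_results(tasks):
    """Run named callables concurrently; a failing task yields None instead of aborting."""
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(fn): name for name, fn in tasks.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                # result() re-raises whatever the task raised, so guard it here.
                results[name] = future.result()
            except Exception:
                logger.exception("task %s failed", name)
                results[name] = None
    return results
```

Catching Exception rather than BaseException leaves KeyboardInterrupt free to propagate, so the user can still cancel the session.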
@KRRT7 KRRT7 merged commit 6f6c039 into main Mar 24, 2026
25 of 27 checks passed
@KRRT7 KRRT7 deleted the codeflash-core branch March 24, 2026 09:43