Skip to content

Conversation

@zomglings
Copy link
Contributor

@zomglings zomglings commented May 14, 2025

User description

Closes CF-637

This PR implements targeted formatting for AI-generated code snippets within the FunctionOptimizer. The goal is to format only the specific function being optimized (and any helpers modified by the AI) rather than entire files with tools like Black/Ruff. Import sorting (isort) is still applied at a file level for correctness after the formatted snippet is integrated.

The existing codeflash/code_utils/formatter.py remains unchanged. All new temporary file management for string formatting is handled within FunctionOptimizer.

Key Changes Implemented:

  • FunctionOptimizer.__init__ Modifications:

    • Imports tempfile (shutil and Path were already present).
    • Creates a base temporary directory for the optimizer instance: self.optimizer_temp_dir = Path(tempfile.mkdtemp(prefix="codeflash_opt_fmt_")). This directory is not cleaned up by FunctionOptimizer itself, allowing it to persist for potential inspection or higher-level cleanup.
  • FunctionOptimizer.determine_best_candidate Method Modifications:

    • When candidate.source_code (a string) needs formatting (i.e., self.args.formatter_cmds is configured):
      • A temporary file is created within self.optimizer_temp_dir.
      • candidate.source_code is written to this temp file.
      • format_code() is called with the temp file's path. (format_code() handles the "disabled" formatter case internally).
      • The individual temporary file used for formatting is deleted in a finally block.
    • candidate.source_code is updated with the formatted snippet (output of primary formatter like Black/Ruff).
    • Import sorting on the snippet itself is not performed here; it's deferred to a file-level operation later.
  • FunctionOptimizer.optimize_function Method Modifications:

    • The reformat_code_and_helpers method and its call site are removed.
    • New Import Sorting Logic: After replace_function_and_helpers_with_optimized_code (which splices the formatted snippet into files):
      • If not self.args.disable_imports_sorting, sort_imports() is applied to the full content of the main function's file.
      • Similarly, sort_imports() is applied to the full content of any helper files (from code_context.helper_functions) that could have been modified.
    • new_code (for the main file) and new_helper_code (for helper files, used in PR generation) are populated by reading the respective file contents after the above snippet replacement and potential file-level import sorting.

Note on Error Handling:
If format_code() encounters an issue while formatting the temporary file (e.g., formatter command not found, or the formatter errors out), it currently logs an error and returns the content of the temporary file (which might be unchanged or malformed). determine_best_candidate proceeds with this returned content. For this PR, we accept this behavior. Future improvements could involve more explicit error signaling or fallback mechanisms if desired.


PR Type

Enhancement, Tests


Description

  • Implement targeted formatting for optimized snippets

  • Apply import sorting after optimization

  • Support disabling telemetry via environment variable

  • Add bubble sort sample files preserving bad formatting


Changes walkthrough 📝

Relevant files
Tests
bubble_sort_method_preserve_bad_formatting_for_nonoptimized_code.py
Add bubble sort class sample                                                         

code_to_optimize/bubble_sort_method_preserve_bad_formatting_for_nonoptimized_code.py

  • Added bubble sort class preserving bad formatting
  • Defined lol function and BubbleSorter.sorter
  • Includes prints to stdout and stderr
  • +38/-0   
    bubble_sort_preserve_bad_formatting_for_nonoptimized_code.py
    Add bubble sort function sample                                                   

    code_to_optimize/bubble_sort_preserve_bad_formatting_for_nonoptimized_code.py

  • Added bubble sort function preserving bad formatting
  • Defined lol and standalone sorter function
  • Includes formatted print statements
  • +19/-0   
    Configuration changes
    main.py
    Support disabling telemetry by env var                                     

    codeflash/main.py

  • Imported os for environment variable access
  • Detect CODEFLASH_DISABLE_TELEMETRY in CLI flow
  • Adjust telemetry initialization logic
  • +4/-3     
    Enhancement
    function_optimizer.py
    Implement targeted formatting and import sorting                 

    codeflash/optimization/function_optimizer.py

  • Added optimizer_temp_dir for formatting temp files
  • Implemented targeted snippet formatting in candidates
  • Applied import sorting after replacements
  • Removed legacy reformat_code_and_helpers method
  • +54/-24 

    Need help?
  • Type /help how to ... in the comments thread for any questions about PR-Agent usage.
  • Check out the documentation for more information.
  • @github-actions
    Copy link

    github-actions bot commented May 14, 2025

    PR Reviewer Guide 🔍

    (Review updated until commit 85fd3c0)

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review

    Missing Import

    parse_config_file is used but not imported, leading to a NameError at runtime. Ensure the function is imported or replace it with the correct configuration parser.

    if (not disable_telemetry) and args.config_file and Path.exists(args.config_file):
        pyproject_config, _ = parse_config_file(args.config_file)
    Indentation Inconsistency

    The added telemetry logic uses inconsistent indentation compared to the surrounding code, which may cause linting errors and reduce readability. Align it with the existing style.

    if args.command:
        disable_telemetry = os.environ.get("CODEFLASH_DISABLE_TELEMETRY", "").lower() in {"true", "t", "1", "yes", "y"}
        if (not disable_telemetry) and args.config_file and Path.exists(args.config_file):
    Temp Dir Cleanup

    The self.optimizer_temp_dir directory created in __init__ is never cleaned up, potentially leading to leftover files. Consider adding a teardown or cleanup method.

        self.optimizer_temp_dir = Path(tempfile.mkdtemp(prefix="codeflash_opt_fmt_"))
    
    def optimize_function(self) -> Result[BestOptimization, str]:

    @github-actions
    Copy link

    github-actions bot commented May 14, 2025

    PR Code Suggestions ✨

    Latest suggestions up to 85fd3c0
    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Fix undefined helper code variable

    The variable original_helper_code is not defined in this scope, causing a NameError.
    Replace it with the helper functions context you have (e.g.
    code_context.helper_functions) or capture the original helper code earlier in the
    method before it’s overwritten.

    codeflash/optimization/function_optimizer.py [325-329]

    -for helper_file_path_key in original_helper_code:
    -    if helper_file_path_key.exists():
    -         new_helper_code[helper_file_path_key] = helper_file_path_key.read_text(encoding="utf8")
    +helper_paths = {hf.file_path for hf in code_context.helper_functions}
    +for helper_path in helper_paths:
    +    if helper_path.exists():
    +        new_helper_code[helper_path] = helper_path.read_text(encoding="utf8")
         else:
    -         logger.warning(f"Helper file {helper_file_path_key} not found after optimization. It will not be included in new_helper_code for PR.")
    +        logger.warning(f"Helper file {helper_path} not found after optimization; skipping.")
    Suggestion importance[1-10]: 9

    __

    Why: Referencing original_helper_code here causes a NameError since it’s never defined in optimize_function; replacing it with something like code_context.helper_functions is essential for functionality.

    High
    General
    Use TemporaryDirectory for cleanup

    Creating a temp directory with mkdtemp without cleanup can leak disk space. Use
    tempfile.TemporaryDirectory to manage cleanup automatically and store its reference
    for proper teardown.

    codeflash/optimization/function_optimizer.py [129]

    -self.optimizer_temp_dir = Path(tempfile.mkdtemp(prefix="codeflash_opt_fmt_"))
    +self._temp_dir_manager = tempfile.TemporaryDirectory(prefix="codeflash_opt_fmt_")
    +self.optimizer_temp_dir = Path(self._temp_dir_manager.name)
    Suggestion importance[1-10]: 6

    __

    Why: Switching to tempfile.TemporaryDirectory ensures automatic cleanup of the temp directory, preventing potential disk-space leaks without complicating teardown logic.

    Low
    Handle import-sorting exceptions

    A failure in sort_imports or file I/O will raise and halt the optimization pipeline.
    Wrap the import-sorting and file writes in a try/except to log and continue on
    errors.

    codeflash/optimization/function_optimizer.py [307-313]

     if not self.args.disable_imports_sorting:
    -    main_file_path = self.function_to_optimize.file_path
    -    if main_file_path.exists():
    -        current_main_content = main_file_path.read_text(encoding="utf8")
    -        sorted_main_content = sort_imports(current_main_content)
    -        if sorted_main_content != current_main_content:
    -            main_file_path.write_text(sorted_main_content, encoding="utf8")
    +    try:
    +        main_file_path = self.function_to_optimize.file_path
    +        if main_file_path.exists():
    +            current_main_content = main_file_path.read_text(encoding="utf8")
    +            sorted_main_content = sort_imports(current_main_content)
    +            if sorted_main_content != current_main_content:
    +                main_file_path.write_text(sorted_main_content, encoding="utf8")
    +    except Exception as e:
    +        logger.error(f"Failed to sort imports in {main_file_path}: {e}")
    Suggestion importance[1-10]: 5

    __

    Why: Wrapping the import-sorting and file writes in try/except prevents a single I/O or sorting failure from halting the entire optimization pipeline, improving robustness.

    Low

    Previous suggestions

    Suggestions up to commit 3cbd6b7
    CategorySuggestion                                                                                                                                    Impact
    Possible issue
    Fix undefined helper code reference

    The loop references original_helper_code, which isn't defined in this scope and will
    raise a NameError. Replace it with the helper functions provided by code_context to
    correctly gather updated helper files.

    codeflash/optimization/function_optimizer.py [323-328]

     new_helper_code: dict[Path, str] = {}
    -for helper_file_path_key in original_helper_code:
    -    if helper_file_path_key.exists():
    -         new_helper_code[helper_file_path_key] = helper_file_path_key.read_text(encoding="utf8")
    +for helper_source in code_context.helper_functions:
    +    helper_path = helper_source.file_path
    +    if helper_path.exists():
    +        new_helper_code[helper_path] = helper_path.read_text(encoding="utf8")
         else:
    -         logger.warning(f"Helper file {helper_file_path_key} not found after optimization. It will not be included in new_helper_code for PR.")
    +        logger.warning(f"Helper file {helper_path} not found after optimization.")
    Suggestion importance[1-10]: 9

    __

    Why: The variable original_helper_code is not defined in this scope and will raise a NameError; switching to code_context.helper_functions fixes the bug.

    High
    Use atomic file writes

    Directly writing to the target file can leave it in a partial state if an error
    occurs mid-write. Use an atomic write pattern: write to a temporary file and then
    atomically replace the original.

    codeflash/optimization/function_optimizer.py [311-312]

     if sorted_main_content != current_main_content:
    -    main_file_path.write_text(sorted_main_content, encoding="utf8")
    +    temp_path = main_file_path.with_suffix(main_file_path.suffix + ".tmp")
    +    temp_path.write_text(sorted_main_content, encoding="utf8")
    +    temp_path.replace(main_file_path)
    Suggestion importance[1-10]: 5

    __

    Why: Directly calling write_text can leave the file half-written on errors; writing to a temporary file and replacing it atomically improves safety.

    Low
    General
    Auto-clean temporary directory

    Using mkdtemp leaks temporary directories if they aren't cleaned up. Switch to
    TemporaryDirectory so the directory is automatically removed when no longer needed.

    codeflash/optimization/function_optimizer.py [128]

    -self.optimizer_temp_dir = Path(tempfile.mkdtemp(prefix="codeflash_opt_fmt_"))
    +self._temp_dir_obj = tempfile.TemporaryDirectory(prefix="codeflash_opt_fmt_")
    +self.optimizer_temp_dir = Path(self._temp_dir_obj.name)
    Suggestion importance[1-10]: 6

    __

    Why: Using tempfile.mkdtemp creates a directory that is never cleaned up, whereas TemporaryDirectory ensures automatic removal and reduces resource leaks.

    Low

    @zomglings
    Copy link
    Contributor Author

    I have tested this code using two optimization targets:

    1. code_to_optimize/bubble_sort_preserve_bad_formatting_for_nonoptimized_code.py
    2. code_to_optimize/bubble_sort_method_preserve_bad_formatting_for_nonoptimized_code.py

    Both of these are bubble sort implementations but they contain valid, poorly formatted Python code. The new formatting logic in the FunctionOptimizer optimizes the sorting function and method but leaves the badly formatted methods untouched. I have verified this through manual execution of codeflash:

    For code_to_optimize/bubble_sort_preserve_bad_formatting_for_nonoptimized_code.py:

    python -m codeflash.main \
        --file code_to_optimize/bubble_sort_preserve_bad_formatting_for_nonoptimized_code.py \
        --module-root ./ \
        --function sorter \
        --test-framework pytest \
        --tests-root code_to_optimize/tests/pytest \
        --no-pr

    For code_to_optimize/bubble_sort_method_preserve_bad_formatting_for_nonoptimized_code.py:

    python -m codeflash.main \
        --file code_to_optimize/bubble_sort_method_preserve_bad_formatting_for_nonoptimized_code.py \
        --module-root ./ \
        --function BubbleSorter.sorter \
        --test-framework pytest \
        --tests-root code_to_optimize/tests/pytest \
        --no-pr

    After running either/both of these commands, run git diff to see the changes they made when optimizing their respective bubble sort implementations.

    @zomglings zomglings marked this pull request as ready for review May 14, 2025 21:59
    @github-actions
    Copy link

    Persistent review updated to latest commit 85fd3c0

    tmp_file.write(candidate.source_code)
    temp_code_file_path = Path(tmp_file.name)

    formatted_candidate_code = format_code(
    Copy link
    Contributor

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    formatting code is depenedent on the cwd of the code, imports are grouped according to what module they belong to, for the project's own module they are grouped together. This determination of what is the module they belong to is determined by the cwd.
    So we should not format code in a temp directory, the results may not be the same. This is btw why your unit tests were failing today

    Copy link
    Contributor Author

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    I don't think that's true, as this is working.

    This is only formatting the function snippet and the helper snippets -- I don't think this has to do with why those unit tests are failing.

    @zomglings
    Copy link
    Contributor Author

    Decided to use a gentler approach using LibCST. Will open separate PR.

    @zomglings zomglings closed this May 16, 2025
    @zomglings zomglings deleted the targeted-formatting branch May 16, 2025 15:41
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    3 participants