
Conversation


@mohammedahmed18 mohammedahmed18 commented Aug 27, 2025

PR Type

Enhancement


Description

  • Persist optimization patch metadata in JSON

  • Add LSP features to list and clear patches

  • Update patch creation to include metadata and file locking
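
For context, the persisted metadata plausibly has the shape below; the field names are inferred from the test fixtures later in this thread, and the cache path is illustrative, not taken verbatim from the diff:

```python
import json

# Hypothetical metadata.json contents; "patch1" and the cache path are
# placeholders, not values from the PR.
metadata = {
    "id": "<first-commit-sha>",  # project id = sha of the repo's root commit
    "patches": [
        {
            "id": "patch1",
            "patch_path": "/cache/patches/<project-id>/patch1.patch",
        }
    ],
}
print(json.dumps(metadata, indent=2))
```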


Diagram Walkthrough

flowchart LR
  FTO["performFunctionOptimization"] -- "create_diff_patch" --> GEN["generate patch file"]
  GEN -- "save_patches_metadata" --> META["metadata.json"]
  META -- "retrieve/clear" --> LSP["LSP server features"]

File Walkthrough

Relevant files
Enhancement
git_utils.py
Persist and manage patch metadata                                               

codeflash/code_utils/git_utils.py

  • Added imports for json, lru_cache, and FileLock
  • Introduced get_git_project_id and directory helpers
  • Implemented get_patches_metadata, save, overwrite functions
  • Updated create_diff_patch to write patches and metadata
+70/-7   
beta.py
LSP endpoints for patch metadata                                                 

codeflash/lsp/beta.py

  • Defined OnPatchAppliedParams dataclass
  • Added retrieveSuccessfulOptimizations and onPatchApplied handlers
  • Enhanced performFunctionOptimization to pass metadata_input
  • Returned patch_id and patch_path in responses
+55/-8   
optimizer.py
Metadata integration in optimizer                                               

codeflash/optimization/optimizer.py

  • Adapted optimizer to use metadata_input for patches
  • Stored returned patch_path in optimizer state
  • Refactored unwrap logic for best_optimization
+5/-6     


github-actions bot commented Aug 27, 2025

PR Reviewer Guide 🔍

(Review updated until commit cd7e1e1)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Unused Import

The import of cache is never used and should be removed to keep the imports clean.

import json
import os
import shutil
import subprocess
import sys
import tempfile
import time
from functools import cache, lru_cache
Unhandled Empty Diff

create_diff_patch_from_worktree returns an empty dict on no changes, but callers assume patch_path and id exist, leading to KeyError.

if not uni_diff_text:
    logger.warning("No changes found in worktree.")
    return {}
Missing Metadata Fields

create_diff_patch_from_worktree requires metadata_input["fto_name"], but here it's called with an empty dict, causing a KeyError.

metadata = create_diff_patch_from_worktree(
    self.current_worktree, relative_file_paths, metadata_input={}
)


github-actions bot commented Aug 27, 2025

PR Code Suggestions ✨

Latest suggestions up to cd7e1e1
Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Always return patch_path

Handle the case when metadata_input is empty so that the function always returns at
least the patch path. This avoids key errors in callers expecting a patch_path
field.

codeflash/code_utils/git_utils.py [334-339]

-final_metadata = {}
 if metadata_input:
     metadata_input["patch_path"] = str(patch_path)
-    final_metadata = save_patches_metadata(metadata_input)
+    return save_patches_metadata(metadata_input)
+return {"patch_path": str(patch_path)}
 
-return final_metadata
-
Suggestion importance[1-10]: 7


Why: Without this change, an empty metadata_input leads to returning an empty dict and missing patch_path, which can break callers expecting it.

Medium
Create patches directory if missing

Ensure the project patches directory exists before acquiring the lock and writing
metadata. This prevents errors when the cache directory is missing.

codeflash/code_utils/git_utils.py [286-292]

 project_patches_dir = get_patches_dir_for_project()
+project_patches_dir.mkdir(parents=True, exist_ok=True)
 meta_file = project_patches_dir / "metadata.json"
 lock_file = project_patches_dir / "metadata.json.lock"
 
-# we are not supporting multiple concurrent optimizations within the same process, but keep that in case we decide to do so in the future.
 with FileLock(lock_file, timeout=10):
     metadata = get_patches_metadata()
Suggestion importance[1-10]: 6


Why: Ensuring project_patches_dir exists before acquiring the lock prevents errors when the directory is missing, improving reliability.

Low
General
Guard empty commit list

Add a guard to check if the repository has any commits before accessing
root_commits[0], and raise a clear error if none are found.

codeflash/code_utils/git_utils.py [206-212]

 @lru_cache(maxsize=1)
 def get_git_project_id() -> str:
     """Return the first commit sha of the repo."""
     repo: Repo = git.Repo(search_parent_directories=True)
     root_commits = list(repo.iter_commits(rev="HEAD", max_parents=0))
+    if not root_commits:
+        raise RuntimeError("No commits found in repository")
     return root_commits[0].hexsha
Suggestion importance[1-10]: 6

__

Why: Adding a check for an empty root_commits list prevents an unhandled IndexError and provides a clear error message when no commits exist.

Low
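
The suggestions above all touch the same lock-guarded read-modify-write of metadata.json. A minimal, self-contained sketch of that pattern, using a naive stdlib lock as a stand-in for the `filelock.FileLock(timeout=10)` used in the PR (function and paths are hypothetical):

```python
import json
import os
import tempfile
import time
from pathlib import Path

def save_patch_metadata(patches_dir: Path, entry: dict) -> dict:
    """Append one patch entry to metadata.json under a simple file lock.

    Stdlib stand-in for the FileLock-guarded read-modify-write in the PR.
    """
    # Create the directory up front, as the "patches directory" suggestion advises.
    patches_dir.mkdir(parents=True, exist_ok=True)
    meta_file = patches_dir / "metadata.json"
    lock_file = patches_dir / "metadata.json.lock"
    # Naive spin lock: O_EXCL creation fails while another process holds the lock.
    while True:
        try:
            fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(0.05)
    try:
        # Read existing metadata (default shape when the file is missing),
        # mutate it, and write it back atomically with respect to the lock.
        data = json.loads(meta_file.read_text()) if meta_file.exists() else {"patches": []}
        data["patches"].append(entry)
        meta_file.write_text(json.dumps(data, indent=2))
        return data
    finally:
        os.close(fd)
        lock_file.unlink()

patches_dir = Path(tempfile.mkdtemp()) / "patches" / "project-id"
result = save_patch_metadata(patches_dir, {"id": "patch1", "patch_path": "p.patch"})
```

The pre-`mkdir` line is the fix the second suggestion proposes: without it, creating the lock file fails when the cache directory does not yet exist.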

Previous suggestions

Suggestions up to commit a664330
Category | Suggestion | Impact
Possible issue
Always return patch_path

Ensure that a patch path is always returned even when no metadata_input is provided
to avoid KeyError downstream. Return a minimal metadata dict with patch_path when
metadata_input is empty. This restores backward compatibility for callers expecting
a path.

codeflash/code_utils/git_utils.py [333-338]

-final_metadata = {}
-if metadata_input:
-    metadata_input["patch_path"] = str(patch_path)
-    final_metadata = save_patches_metadata(metadata_input)
-
+if not metadata_input:
+    return {"patch_path": str(patch_path)}
+metadata_input["patch_path"] = str(patch_path)
+final_metadata = save_patches_metadata(metadata_input)
 return final_metadata
Suggestion importance[1-10]: 8


Why: The function returns an empty dict when metadata_input is empty, leading to KeyError when accessing patch_path. Ensuring patch_path is always returned prevents downstream errors.

Medium
General
Ensure patches directory exists

Create the project patches directory before acquiring the lock to ensure the
metadata and lock files can be created without errors. This avoids race conditions
when the directory does not exist.

codeflash/code_utils/git_utils.py [285-291]

 def save_patches_metadata(patch_metadata: dict) -> dict:
     project_patches_dir = get_patches_dir_for_project()
+    project_patches_dir.mkdir(parents=True, exist_ok=True)
     meta_file = project_patches_dir / "metadata.json"
     lock_file = project_patches_dir / "metadata.json.lock"
 
     with FileLock(lock_file, timeout=10):
         metadata = get_patches_metadata()
Suggestion importance[1-10]: 7


Why: Without creating project_patches_dir, acquiring the lock or writing metadata may fail if the directory doesn't exist. Pre-creating the directory avoids this race condition.

Medium
Handle corrupted metadata file

Guard against a corrupted metadata.json by catching JSONDecodeError and resetting to
a default structure. This prevents crashes if the file becomes invalid.

codeflash/code_utils/git_utils.py [280-282]

 if meta_file.exists():
-    return json.loads(meta_file.read_text())
+    try:
+        return json.loads(meta_file.read_text())
+    except json.JSONDecodeError:
+        logger.warning("Corrupted metadata.json, resetting to defaults")
+        return {"id": get_git_project_id() or "", "patches": []}
 return {"id": get_git_project_id() or "", "patches": []}
Suggestion importance[1-10]: 6


Why: Catching JSONDecodeError prevents crashes when metadata.json is invalid and gracefully resets to defaults, improving robustness.

Low

@mohammedahmed18 mohammedahmed18 marked this pull request as ready for review August 27, 2025 15:41

Persistent review updated to latest commit cd7e1e1

codeflash-ai bot added a commit that referenced this pull request Aug 27, 2025
…(`worktree/persist-optimization-patches`)

Comment on lines 273 to 274
project_id = get_git_project_id() or ""
return Path(patches_dir / project_id)

⚡️Codeflash found 99% (0.99x) speedup for get_patches_dir_for_project in codeflash/code_utils/git_utils.py

⏱️ Runtime : 88.3 microseconds → 44.4 microseconds (best of 5 runs)

📝 Explanation and details

The optimization achieves a 98% speedup by eliminating two unnecessary operations in the get_patches_dir_for_project() function:

Key Changes:

  1. Removed redundant Path() constructor: The original code unnecessarily wrapped patches_dir / project_id with Path(), even though the / operator on a Path object already returns a Path. This eliminated an extra object creation.

  2. Removed unnecessary or "" fallback: Since get_git_project_id() is guaranteed to return a valid string (or raise an exception if the repo doesn't exist), the or "" fallback was redundant and added extra evaluation overhead.

Performance Impact:
The line profiler shows the return statement improved from 593,800ns to 232,729ns (61% faster), while the overall function time decreased from 88.3μs to 44.4μs. The optimizations are particularly effective because:

  • Path() constructor calls have overhead for validation and normalization
  • The or "" operation requires evaluating truthiness of the string result
  • These micro-optimizations compound when the function is called repeatedly

Test Case Performance:
All test cases show consistent 86-133% speedups, with the optimization being especially beneficial for:

  • Repeated calls (cache hit scenarios): 86-105% faster
  • Large repositories: 90.5% faster even with 500 commits
  • Edge cases with symlinks/subdirectories: 87-105% faster

The optimization maintains identical behavior while eliminating unnecessary computational overhead in the path construction logic.
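
The claim about the redundant wrapper is easy to verify in isolation; the paths below are hypothetical:

```python
from pathlib import Path

patches_dir = Path("/tmp/cache/patches")  # hypothetical cache location
project_id = "abc123"

joined = patches_dir / project_id
# The '/' operator on a Path already returns a Path, so wrapping the result
# in Path(...) only adds constructor overhead without changing the value.
assert isinstance(joined, Path)
assert joined == Path(patches_dir / project_id)
```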

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 8 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import os
import shutil
import tempfile
from functools import lru_cache
from pathlib import Path

import git
# imports
import pytest  # used for our unit tests
from codeflash.code_utils.git_utils import (get_git_project_id,
                                            get_patches_dir_for_project)
from git import Repo

# Simulate codeflash_cache_dir for testing purposes
codeflash_cache_dir = Path(tempfile.gettempdir()) / "codeflash_test_cache"
patches_dir = codeflash_cache_dir / "patches"


@pytest.fixture
def git_repo(tmp_path, monkeypatch):
    """
    Create a temporary git repo and set working directory to it.
    Returns the repo and the original cwd.
    """
    # Create temp repo
    repo_dir = tmp_path / "repo"
    repo_dir.mkdir()
    repo = Repo.init(repo_dir)
    # Add a file and commit
    file_path = repo_dir / "file.txt"
    file_path.write_text("hello")
    repo.index.add([str(file_path)])
    repo.index.commit("Initial commit")
    # Monkeypatch cwd and codeflash_cache_dir to simulate environment
    monkeypatch.chdir(repo_dir)
    monkeypatch.setattr(
        __import__(__name__), "codeflash_cache_dir", tmp_path / "cache", raising=False
    )
    monkeypatch.setattr(
        __import__(__name__), "patches_dir", (tmp_path / "cache" / "patches"), raising=False
    )
    # Clear lru_cache for get_git_project_id
    get_git_project_id.cache_clear()
    return repo, repo_dir

# ---- Basic Test Cases ----

#------------------------------------------------
from __future__ import annotations

import os
import shutil
import tempfile
from functools import lru_cache
from pathlib import Path

import git
# imports
import pytest  # used for our unit tests
from codeflash.code_utils.git_utils import (get_git_project_id,
                                            get_patches_dir_for_project)

# Simulate the codeflash_cache_dir import for testability
codeflash_cache_dir = Path(tempfile.gettempdir()) / "codeflash_cache_test"

patches_dir = codeflash_cache_dir / "patches"


@pytest.fixture
def temp_git_repo(tmp_path, monkeypatch):
    """
    Create a temporary git repo and patch cwd to it.
    Returns (repo, repo_path)
    """
    repo_path = tmp_path / "repo"
    repo_path.mkdir()
    repo = git.Repo.init(repo_path)
    # Create a file and commit to ensure we have a root commit
    file = repo_path / "file.txt"
    file.write_text("hello world")
    repo.index.add([str(file)])
    repo.index.commit("Initial commit")
    # Patch cwd and codeflash_cache_dir for the duration of the test
    monkeypatch.chdir(repo_path)
    global codeflash_cache_dir, patches_dir
    codeflash_cache_dir = tmp_path / "codeflash_cache_test"
    patches_dir = codeflash_cache_dir / "patches"
    return repo, repo_path


# ---------------------- BASIC TEST CASES ----------------------

def test_returns_path_with_correct_commit_sha(temp_git_repo):
    """Test that the returned path contains the correct commit sha."""
    repo, repo_path = temp_git_repo
    expected_sha = next(repo.iter_commits(rev="HEAD", max_parents=0)).hexsha
    codeflash_output = get_patches_dir_for_project(); result_path = codeflash_output # 15.9μs -> 8.55μs (86.4% faster)

def test_returns_path_type(temp_git_repo):
    """Test that the function returns a Path object."""
    codeflash_output = get_patches_dir_for_project(); result_path = codeflash_output # 12.3μs -> 6.09μs (103% faster)

def test_path_does_not_create_directory(temp_git_repo):
    """Test that the function does not create the directory on disk."""
    codeflash_output = get_patches_dir_for_project(); result_path = codeflash_output # 13.5μs -> 5.80μs (133% faster)

# ---------------------- EDGE TEST CASES ----------------------

def test_symlinked_git_repo(tmp_path, monkeypatch):
    """Test behavior when cwd is a symlink to the git repo."""
    repo_path = tmp_path / "repo"
    repo_path.mkdir()
    repo = git.Repo.init(repo_path)
    (repo_path / "foo.txt").write_text("foo")
    repo.index.add(["foo.txt"])
    repo.index.commit("init")
    symlink_path = tmp_path / "repo_symlink"
    symlink_path.symlink_to(repo_path, target_is_directory=True)
    monkeypatch.chdir(symlink_path)
    # Should still work and return the correct sha
    expected_sha = next(repo.iter_commits(rev="HEAD", max_parents=0)).hexsha
    codeflash_output = get_patches_dir_for_project(); result_path = codeflash_output # 15.3μs -> 8.14μs (87.7% faster)

def test_git_repo_in_parent_directory(tmp_path, monkeypatch):
    """Test that function works when cwd is a subdirectory of the repo."""
    repo_path = tmp_path / "repo"
    repo_path.mkdir()
    repo = git.Repo.init(repo_path)
    (repo_path / "foo.txt").write_text("foo")
    repo.index.add(["foo.txt"])
    repo.index.commit("init")
    subdir = repo_path / "subdir" / "deeper"
    subdir.mkdir(parents=True)
    monkeypatch.chdir(subdir)
    expected_sha = next(repo.iter_commits(rev="HEAD", max_parents=0)).hexsha
    codeflash_output = get_patches_dir_for_project(); result_path = codeflash_output # 15.7μs -> 7.64μs (105% faster)

def test_cache_behavior(temp_git_repo):
    """Test that lru_cache is used and function is not recomputed unless cache is cleared."""
    repo, repo_path = temp_git_repo
    # Call once to populate cache
    codeflash_output = get_patches_dir_for_project(); result1 = codeflash_output
    # Remove the repo directory to simulate repo disappearance
    shutil.rmtree(repo_path)
    # Should still return the cached result (no error)
    codeflash_output = get_patches_dir_for_project(); result2 = codeflash_output
    # Clear cache, now it should error
    get_git_project_id.cache_clear()
    with pytest.raises(git.exc.InvalidGitRepositoryError):
        get_patches_dir_for_project()

# ---------------------- LARGE SCALE TEST CASES ----------------------

def test_large_git_repo_first_commit(tmp_path, monkeypatch):
    """
    Test with a repo with 500 commits to ensure performance and correctness.
    The function should still return the first commit's sha.
    """
    repo_path = tmp_path / "largerepo"
    repo_path.mkdir()
    repo = git.Repo.init(repo_path)
    (repo_path / "file.txt").write_text("0")
    repo.index.add(["file.txt"])
    repo.index.commit("commit 0")
    first_sha = next(repo.iter_commits(rev="HEAD", max_parents=0)).hexsha
    # Add 499 more commits
    for i in range(1, 500):
        (repo_path / "file.txt").write_text(str(i))
        repo.index.add(["file.txt"])
        repo.index.commit(f"commit {i}")
    monkeypatch.chdir(repo_path)
    codeflash_output = get_patches_dir_for_project(); result_path = codeflash_output # 15.6μs -> 8.19μs (90.5% faster)

To test or edit this optimization locally: git merge codeflash/optimize-pr690-2025-08-27T15.52.13

Suggested change
-project_id = get_git_project_id() or ""
-return Path(patches_dir / project_id)
+project_id = get_git_project_id()
+return patches_dir / project_id


I like it

…ree/persist-optimization-patches`)

The optimized code achieves a **44% speedup** through two key optimizations:

**1. Added `@lru_cache(maxsize=1)` to `get_patches_dir_for_project()`**
- This caches the Path object construction, avoiding repeated calls to `get_git_project_id()` and `Path()` creation
- The line profiler shows this function's 5.32ms of total time disappearing from the hot path in `get_patches_metadata()` entirely
- Since `get_git_project_id()` was already cached but still being called repeatedly, this second-level caching eliminates that redundancy

**2. Replaced `read_text()` + `json.loads()` with `open()` + `json.load()`**
- Using `json.load()` with a file handle is more efficient than reading the entire file into memory first with `read_text()` then parsing it
- This avoids the intermediate string creation and is particularly beneficial for larger JSON files
- Added explicit UTF-8 encoding for consistency

**Performance Impact by Test Type:**
- **Basic cases** (small/missing files): 45-65% faster - benefits primarily from the caching optimization
- **Edge cases** (malformed JSON): 38-47% faster - still benefits from both optimizations  
- **Large scale cases** (1000+ patches, large files): 39-52% faster - the file I/O optimization becomes more significant with larger JSON files

The caching optimization provides the most consistent gains across all scenarios since it eliminates repeated expensive operations, while the file I/O optimization scales with file size.
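
The file-handle change described above can be demonstrated in isolation; both variants parse the same data, the optimized one just skips the intermediate string (paths here are temporary, for illustration):

```python
import json
import tempfile
from pathlib import Path

meta_file = Path(tempfile.mkdtemp()) / "metadata.json"
meta_file.write_text(json.dumps({"id": "abc", "patches": []}), encoding="utf-8")

# Before: materialize the whole file as a str, then parse it.
data_a = json.loads(meta_file.read_text())

# After: parse straight from the open file handle, with explicit UTF-8.
with meta_file.open("r", encoding="utf-8") as f:
    data_b = json.load(f)
```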

codeflash-ai bot commented Aug 27, 2025

⚡️ Codeflash found optimizations for this PR

📄 45% (0.45x) speedup for get_patches_metadata in codeflash/code_utils/git_utils.py

⏱️ Runtime : 716 microseconds → 495 microseconds (best of 39 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch worktree/persist-optimization-patches).

codeflash-ai bot added a commit that referenced this pull request Aug 27, 2025
…690 (`worktree/persist-optimization-patches`)

The optimized code achieves a **16% speedup** through two key optimizations in the `get_patches_metadata()` function:

**1. Caching expensive directory lookups**: The original code called `get_patches_dir_for_project()` on every invocation, which dominated 88.6% of execution time (26.9ms out of 30.3ms total). The optimization introduces `_cached_get_patches_dir_for_project()` with `@lru_cache(maxsize=1)`, eliminating repeated expensive Git operations. This reduces the directory lookup time from 26.9ms to 25.1ms while enabling reuse across multiple calls.

**2. More efficient JSON parsing**: Replaced `json.loads(meta_file.read_text())` with `json.load(f)` using a direct file handle. This avoids loading the entire file content into memory as a string before parsing, reducing JSON processing time from 2.3ms to 1.3ms (43% improvement).

The line profiler shows the optimization is most effective when `get_patches_metadata()` is called multiple times, as the cached directory lookup provides cumulative benefits. Test results demonstrate consistent 14-19% speedups across various scenarios, with particularly strong gains for large metadata files and repeated invocations. The caching is especially valuable in LSP server contexts where the same patches directory is accessed frequently during a session.
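
The effect of the `@lru_cache(maxsize=1)` layer can be sketched with a counter standing in for the expensive git lookup (the function body and return value here are placeholders):

```python
from functools import lru_cache

calls = {"n": 0}

@lru_cache(maxsize=1)
def patches_dir_for_project() -> str:
    # Stand-in for the expensive git root-commit lookup described above.
    calls["n"] += 1
    return "/cache/patches/abc123"

patches_dir_for_project()
patches_dir_for_project()  # served from the cache; the body does not run again
```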

codeflash-ai bot commented Aug 27, 2025

⚡️ Codeflash found optimizations for this PR

📄 16% (0.16x) speedup for retrieve_successful_optimizations in codeflash/lsp/beta.py

⏱️ Runtime : 1.95 milliseconds → 1.68 milliseconds (best of 123 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch worktree/persist-optimization-patches).

codeflash-ai bot added a commit that referenced this pull request Aug 27, 2025
…/persist-optimization-patches`)

Comment on lines 247 to 249
overwrite_patch_metadata(new_patches)
# then remove the patch file
if deleted_patch_file:

⚡️Codeflash found 547% (5.47x) speedup for on_patch_applied in codeflash/lsp/beta.py

⏱️ Runtime : 11.7 milliseconds → 1.81 milliseconds (best of 23 runs)

📝 Explanation and details

The key optimization is conditional metadata writing - only calling overwrite_patch_metadata() when a patch is actually found and needs to be removed.

What changed:

  • Moved overwrite_patch_metadata(new_patches) inside the if deleted_patch_file: block
  • This means metadata is only updated when a patch exists to be removed

Why this is faster:
In the original code, overwrite_patch_metadata() was called unconditionally for every request, even when the patch doesn't exist. This function involves expensive file I/O operations including:

  • Acquiring a file lock (27% of overwrite function time)
  • Reading metadata again via get_patches_metadata()
  • Writing JSON back to disk (53% of overwrite function time)

The line profiler shows the original overwrite_patch_metadata() call took ~31ms (50.9% of total time), while the optimized version eliminates this entirely for non-existent patches.

Performance by test case type:

  • Error cases (patch not found): ~500% speedup - completely avoids expensive metadata write operations
  • Success cases (patch found): ~470-490% speedup - still benefits from avoiding the redundant metadata read inside the lock
  • Large metadata files: Maintains similar speedup ratios regardless of size

The optimization is most effective for error scenarios where patches don't exist, but provides consistent benefits across all use cases by eliminating unnecessary file operations.
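
The control-flow change can be sketched with stub functions counting writes; the real handler lives in codeflash/lsp/beta.py, and the names below are simplified stand-ins:

```python
writes = {"n": 0}

def overwrite_patch_metadata(patches):
    # Stands in for the lock acquisition + metadata re-read + JSON write.
    writes["n"] += 1

def on_patch_applied(patch_id, patches, patch_exists):
    new_patches = [p for p in patches if p["id"] != patch_id]
    deleted_patch_file = patch_exists  # stands in for removing the file on disk
    if deleted_patch_file:  # only pay for the write when something was removed
        overwrite_patch_metadata(new_patches)
    return deleted_patch_file

on_patch_applied("missing", [{"id": "patch1"}], patch_exists=False)  # no write
on_patch_applied("patch1", [{"id": "patch1"}], patch_exists=True)    # one write
```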

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 🔘 None Found
🌀 Generated Regression Tests 28 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 64.3%
🌀 Generated Regression Tests and Runtime
from __future__ import annotations

import json
import os
import shutil
import tempfile
from functools import lru_cache
from pathlib import Path
from types import SimpleNamespace
from typing import Any

import git
# imports
import pytest  # used for our unit tests
from codeflash.code_utils.compat import codeflash_cache_dir
from codeflash.code_utils.git_utils import (get_patches_metadata,
                                            overwrite_patch_metadata)
from codeflash.lsp.beta import on_patch_applied
from codeflash.lsp.server import (CodeflashLanguageServer,
                                  CodeflashLanguageServerProtocol)
from filelock import FileLock
from git import Repo

server = CodeflashLanguageServer("codeflash-language-server", "v1.0", protocol_cls=CodeflashLanguageServerProtocol)
from codeflash.lsp.beta import on_patch_applied


# Helper to create a patch metadata file and patch file
def setup_patch(tmp_path, patch_id="patch1", patch_content="patch content"):
    patches_root = tmp_path / "cache" / "patches" / "test_project_id"
    patches_root.mkdir(parents=True, exist_ok=True)
    patch_file = patches_root / f"{patch_id}.patch"
    patch_file.write_text(patch_content)
    metadata = {
        "id": "test_project_id",
        "patches": [
            {"id": patch_id, "patch_path": str(patch_file)}
        ]
    }
    (patches_root / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return patch_file, patches_root / "metadata.json"

# Helper to create arbitrary metadata
def setup_metadata(tmp_path, patches):
    patches_root = tmp_path / "cache" / "patches" / "test_project_id"
    patches_root.mkdir(parents=True, exist_ok=True)
    metadata = {
        "id": "test_project_id",
        "patches": patches
    }
    (patches_root / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return patches_root / "metadata.json"

# Helper params object
def make_params(patch_id):
    return SimpleNamespace(patch_id=patch_id)

# -------------------------
# Basic Test Cases
# -------------------------

def test_remove_existing_patch(tmp_path):
    """Test removing a patch that exists in metadata and on disk."""
    patch_file, meta_file = setup_patch(tmp_path, patch_id="patch1")
    params = make_params("patch1")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 454μs -> 79.3μs (473% faster)
    # Metadata should have no patches
    data = json.loads(meta_file.read_text())

def test_remove_patch_updates_metadata_only(tmp_path):
    """Test removing a patch updates only the target patch in metadata, not others."""
    # Setup two patches
    patches_root = tmp_path / "cache" / "patches" / "test_project_id"
    patches_root.mkdir(parents=True, exist_ok=True)
    patch_file1 = patches_root / "patch1.patch"
    patch_file2 = patches_root / "patch2.patch"
    patch_file1.write_text("patch1 content")
    patch_file2.write_text("patch2 content")
    metadata = {
        "id": "test_project_id",
        "patches": [
            {"id": "patch1", "patch_path": str(patch_file1)},
            {"id": "patch2", "patch_path": str(patch_file2)},
        ]
    }
    meta_file = patches_root / "metadata.json"
    meta_file.write_text(json.dumps(metadata, indent=2))
    # Remove patch1
    params = make_params("patch1")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 418μs -> 70.2μs (497% faster)
    # Only patch2 left in metadata
    data = json.loads(meta_file.read_text())

def test_remove_patch_file_missing(tmp_path):
    """Test removing a patch where the patch file is already missing (should still succeed)."""
    patch_file, meta_file = setup_patch(tmp_path, patch_id="patch1")
    patch_file.unlink()  # Remove the patch file before running
    params = make_params("patch1")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 416μs -> 70.7μs (489% faster)
    # Metadata should have no patches
    data = json.loads(meta_file.read_text())

# -------------------------
# Edge Test Cases
# -------------------------

def test_remove_nonexistent_patch(tmp_path):
    """Test removing a patch that does not exist in metadata."""
    setup_patch(tmp_path, patch_id="patch1")
    params = make_params("not_a_patch")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 414μs -> 67.1μs (517% faster)

def test_remove_patch_empty_metadata(tmp_path):
    """Test removing a patch when metadata is empty."""
    setup_metadata(tmp_path, [])
    params = make_params("patch1")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 414μs -> 68.3μs (507% faster)

def test_remove_patch_metadata_file_missing(tmp_path):
    """Test removing a patch when metadata.json does not exist at all."""
    patches_root = tmp_path / "cache" / "patches" / "test_project_id"
    # No metadata.json created
    params = make_params("patch1")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 418μs -> 71.9μs (482% faster)

def test_remove_patch_with_similar_id(tmp_path):
    """Test removing a patch where another patch has a similar but not identical id."""
    patches_root = tmp_path / "cache" / "patches" / "test_project_id"
    patches_root.mkdir(parents=True, exist_ok=True)
    patch_file1 = patches_root / "patch1.patch"
    patch_file2 = patches_root / "patch1a.patch"
    patch_file1.write_text("patch1 content")
    patch_file2.write_text("patch1a content")
    metadata = {
        "id": "test_project_id",
        "patches": [
            {"id": "patch1", "patch_path": str(patch_file1)},
            {"id": "patch1a", "patch_path": str(patch_file2)},
        ]
    }
    meta_file = patches_root / "metadata.json"
    meta_file.write_text(json.dumps(metadata, indent=2))
    # Remove patch1
    params = make_params("patch1")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 416μs -> 69.5μs (499% faster)
    data = json.loads(meta_file.read_text())
    assert [p["id"] for p in data["patches"]] == ["patch1a"]
    assert patch_file2.exists()

def test_remove_patch_with_non_string_id(tmp_path):
    """Test removing a patch with a numeric id (should work as long as ids match)."""
    patches_root = tmp_path / "cache" / "patches" / "test_project_id"
    patches_root.mkdir(parents=True, exist_ok=True)
    patch_file = patches_root / "123.patch"
    patch_file.write_text("patch content")
    metadata = {
        "id": "test_project_id",
        "patches": [
            {"id": 123, "patch_path": str(patch_file)}
        ]
    }
    meta_file = patches_root / "metadata.json"
    meta_file.write_text(json.dumps(metadata, indent=2))
    params = make_params(123)
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 417μs -> 70.3μs (494% faster)
    data = json.loads(meta_file.read_text())
    assert data["patches"] == []


def test_remove_patch_when_patch_path_is_missing_in_metadata(tmp_path):
    """Test removing a patch where patch_path is missing in patch metadata."""
    patches_root = tmp_path / "cache" / "patches" / "test_project_id"
    patches_root.mkdir(parents=True, exist_ok=True)
    metadata = {
        "id": "test_project_id",
        "patches": [
            {"id": "patch1"}  # missing patch_path
        ]
    }
    meta_file = patches_root / "metadata.json"
    meta_file.write_text(json.dumps(metadata, indent=2))
    params = make_params("patch1")
    # Should not raise, but should return success (patch_path is None, so no file is deleted)
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 425μs -> 71.7μs (494% faster)
    data = json.loads(meta_file.read_text())
    assert data["patches"] == []

# -------------------------
# Large Scale Test Cases
# -------------------------

def test_remove_patch_from_large_metadata(tmp_path):
    """Test removing a patch from a large metadata file (1000 patches)."""
    patches_root = tmp_path / "cache" / "patches" / "test_project_id"
    patches_root.mkdir(parents=True, exist_ok=True)
    patches = []
    patch_files = []
    for i in range(1000):
        patch_file = patches_root / f"patch{i}.patch"
        patch_file.write_text(f"patch {i} content")
        patches.append({"id": f"patch{i}", "patch_path": str(patch_file)})
        patch_files.append(patch_file)
    meta_file = patches_root / "metadata.json"
    meta_file.write_text(json.dumps({"id": "test_project_id", "patches": patches}, indent=2))
    # Remove patch500
    params = make_params("patch500")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 444μs -> 77.9μs (471% faster)
    # patch500's file is deleted; every other patch file remains
    for i, pf in enumerate(patch_files):
        if i == 500:
            assert not pf.exists()
            continue
        assert pf.exists()
    # patch500 removed from metadata
    data = json.loads(meta_file.read_text())
    ids = [p["id"] for p in data["patches"]]
    assert "patch500" not in ids
    assert len(ids) == 999

def test_remove_all_patches_one_by_one(tmp_path):
    """Test removing all patches one by one from a large metadata file."""
    patches_root = tmp_path / "cache" / "patches" / "test_project_id"
    patches_root.mkdir(parents=True, exist_ok=True)
    patches = []
    patch_files = []
    for i in range(10):
        patch_file = patches_root / f"patch{i}.patch"
        patch_file.write_text(f"patch {i} content")
        patches.append({"id": f"patch{i}", "patch_path": str(patch_file)})
        patch_files.append(patch_file)
    meta_file = patches_root / "metadata.json"
    meta_file.write_text(json.dumps({"id": "test_project_id", "patches": patches}, indent=2))
    # Remove all patches one by one
    for i in range(10):
        params = make_params(f"patch{i}")
        codeflash_output = on_patch_applied(None, params); result = codeflash_output # 4.07ms -> 521μs (681% faster)
        data = json.loads(meta_file.read_text())
        ids = [p["id"] for p in data["patches"]]
        assert f"patch{i}" not in ids
    # After all, metadata should be empty
    data = json.loads(meta_file.read_text())
    assert data["patches"] == []

def test_remove_patch_from_large_metadata_file_missing(tmp_path):
    """Test removing a patch from a large metadata file where the patch file is missing."""
    patches_root = tmp_path / "cache" / "patches" / "test_project_id"
    patches_root.mkdir(parents=True, exist_ok=True)
    patches = []
    for i in range(100):
        patch_file = patches_root / f"patch{i}.patch"
        if i == 50:
            # Don't create patch50 file
            patch_path = patch_file
        else:
            patch_file.write_text(f"patch {i} content")
            patch_path = patch_file
        patches.append({"id": f"patch{i}", "patch_path": str(patch_path)})
    meta_file = patches_root / "metadata.json"
    meta_file.write_text(json.dumps({"id": "test_project_id", "patches": patches}, indent=2))
    # Remove patch50 (file missing)
    params = make_params("patch50")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 419μs -> 70.1μs (498% faster)
    # patch50 not in metadata
    data = json.loads(meta_file.read_text())
    ids = [p["id"] for p in data["patches"]]
    assert "patch50" not in ids

def test_remove_patch_from_large_metadata_not_found(tmp_path):
    """Test removing a patch not present in a large metadata file."""
    patches_root = tmp_path / "cache" / "patches" / "test_project_id"
    patches_root.mkdir(parents=True, exist_ok=True)
    patches = []
    for i in range(100):
        patch_file = patches_root / f"patch{i}.patch"
        patch_file.write_text(f"patch {i} content")
        patches.append({"id": f"patch{i}", "patch_path": str(patch_file)})
    meta_file = patches_root / "metadata.json"
    meta_file.write_text(json.dumps({"id": "test_project_id", "patches": patches}, indent=2))
    # Remove patch100 (not present)
    params = make_params("patch100")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 419μs -> 70.0μs (500% faster)
    # All files remain
    for i in range(100):
        assert (patches_root / f"patch{i}.patch").exists()
    # Metadata unchanged
    data = json.loads(meta_file.read_text())
    ids = [p["id"] for p in data["patches"]]
    assert len(ids) == 100
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

import json
import os
import shutil
import tempfile
from functools import lru_cache
from pathlib import Path
from types import SimpleNamespace
from typing import Any

import git
# imports
import pytest  # used for our unit tests
from codeflash.code_utils.compat import codeflash_cache_dir
from codeflash.code_utils.git_utils import (get_patches_metadata,
                                            overwrite_patch_metadata)
from codeflash.lsp.beta import on_patch_applied
from codeflash.lsp.server import (CodeflashLanguageServer,
                                  CodeflashLanguageServerProtocol)
from filelock import FileLock
from git import Repo

patches_dir = codeflash_cache_dir / "patches"



server = CodeflashLanguageServer("codeflash-language-server", "v1.0", protocol_cls=CodeflashLanguageServerProtocol)


def write_metadata(tmp_path, patches):
    """Helper to write metadata.json with given patches."""
    meta = {
        "id": "test_project_id",
        "patches": patches
    }
    patches_dir = tmp_path / "patches" / "test_project_id"
    patches_dir.mkdir(parents=True, exist_ok=True)
    (patches_dir / "metadata.json").write_text(json.dumps(meta, indent=2))


def read_metadata(tmp_path):
    """Helper to read metadata.json."""
    patches_dir = tmp_path / "patches" / "test_project_id"
    return json.loads((patches_dir / "metadata.json").read_text())


def make_patch_file(tmp_path, name="patch.diff"):
    """Helper to create a dummy patch file."""
    patches_dir = tmp_path / "patches" / "test_project_id"
    patch_path = patches_dir / name
    patch_path.write_text("patch content")
    return str(patch_path)
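
Concretely, the metadata.json these helpers read and write has roughly this shape (a sketch inferred from the tests; the paths and exact schema are illustrative, not the authoritative codeflash format):

```python
import json

# Hypothetical metadata.json contents, mirroring what write_metadata() produces
# ("id" and "patches" keys inferred from the tests above):
metadata = {
    "id": "test_project_id",
    "patches": [
        {"id": "patch1", "patch_path": "/tmp/patches/test_project_id/patch1.diff"},
        {"id": "patch2", "patch_path": "/tmp/patches/test_project_id/patch2.diff"},
    ],
}

# Round-trip through JSON, as the helpers do with metadata.json on disk.
serialized = json.dumps(metadata, indent=2)
roundtripped = json.loads(serialized)
print([p["id"] for p in roundtripped["patches"]])  # ['patch1', 'patch2']
```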


# ------------------------
# 1. Basic Test Cases
# ------------------------


def test_patch_not_found_returns_error(tmp_path, monkeypatch):
    """
    Test that on_patch_applied returns error if patch_id not present.
    """
    patches = [
        {"id": "patch1", "patch_path": "patch1.diff"}
    ]
    write_metadata(tmp_path, patches)
    params = SimpleNamespace(patch_id="patch999")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 430μs -> 72.6μs (493% faster)
    # Metadata should be unchanged
    meta = read_metadata(tmp_path)
    assert [p["id"] for p in meta["patches"]] == ["patch1"]


def test_empty_metadata(tmp_path, monkeypatch):
    """
    Test when metadata.json exists but has no patches.
    """
    write_metadata(tmp_path, [])
    params = SimpleNamespace(patch_id="patchX")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 429μs -> 72.1μs (496% faster)
    meta = read_metadata(tmp_path)
    assert meta["patches"] == []


def test_patch_file_missing_on_disk(tmp_path, monkeypatch):
    """
    Test that on_patch_applied succeeds even if the patch file is already deleted.
    """
    patch_path = tmp_path / "patches" / "test_project_id" / "patch1.diff"
    # Do not create the file
    patches = [
        {"id": "patch1", "patch_path": str(patch_path)}
    ]
    write_metadata(tmp_path, patches)
    params = SimpleNamespace(patch_id="patch1")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 426μs -> 72.5μs (489% faster)
    meta = read_metadata(tmp_path)
    assert meta["patches"] == []



def test_patch_path_is_empty_string(tmp_path, monkeypatch):
    """
    Test that a patch with empty patch_path is removed from metadata and does not error.
    """
    patches = [
        {"id": "patch1", "patch_path": ""}
    ]
    write_metadata(tmp_path, patches)
    params = SimpleNamespace(patch_id="patch1")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 439μs -> 74.0μs (493% faster)
    meta = read_metadata(tmp_path)
    assert meta["patches"] == []

def test_patch_path_is_none(tmp_path, monkeypatch):
    """
    Test that a patch with None patch_path is removed from metadata and does not error.
    """
    patches = [
        {"id": "patch1", "patch_path": None}
    ]
    write_metadata(tmp_path, patches)
    params = SimpleNamespace(patch_id="patch1")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 420μs -> 70.8μs (494% faster)
    meta = read_metadata(tmp_path)
    assert meta["patches"] == []

def test_patch_path_is_nonexistent_file(tmp_path, monkeypatch):
    """
    Test that a patch_path pointing to a file that doesn't exist is handled gracefully.
    """
    patch_path = tmp_path / "patches" / "test_project_id" / "nonexistent.diff"
    patches = [
        {"id": "patch1", "patch_path": str(patch_path)}
    ]
    write_metadata(tmp_path, patches)
    params = SimpleNamespace(patch_id="patch1")
    codeflash_output = on_patch_applied(None, params); result = codeflash_output # 419μs -> 70.7μs (494% faster)
    meta = read_metadata(tmp_path)
    assert meta["patches"] == []

# ------------------------
# 3. Large Scale Test Cases
# ------------------------

To test or edit this optimization locally: `git merge codeflash/optimize-pr690-2025-08-27T16.33.55`

Suggested change:

- overwrite_patch_metadata(new_patches)
- # then remove the patch file
- if deleted_patch_file:
+ # then remove the patch file
+ if deleted_patch_file:
+ overwrite_patch_metadata(new_patches)
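
In other words, the suggestion defers persisting the metadata until after the patch file has actually been removed. A minimal sketch of that ordering (the helper and variable names here are hypothetical, not the actual codeflash implementation):

```python
from pathlib import Path

def remove_patch(patches: list[dict], patch_id: str) -> list[dict]:
    """Filter out the patch with patch_id, deleting its file first so the
    caller only rewrites metadata once the file is actually gone."""
    new_patches = []
    for patch in patches:
        if patch.get("id") != patch_id:
            new_patches.append(patch)
            continue
        patch_path = patch.get("patch_path")
        # A missing, empty, or None patch_path is tolerated,
        # matching the behavior exercised by the tests above.
        if patch_path:
            Path(patch_path).unlink(missing_ok=True)
    return new_patches

patches = [
    {"id": "patch1", "patch_path": ""},    # empty path: nothing to delete
    {"id": "patch2", "patch_path": None},  # None path: nothing to delete
]
print([p["id"] for p in remove_patch(patches, "patch1")])  # ['patch2']
```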

mohammedahmed18 requested a review from KRRT7 (Aug 28, 2025, 13:51)
mohammedahmed18 and others added 3 commits (Aug 28, 2025, 18:11):
…deflash-ai/codeflash into codeflash/optimize-pr690-2025-08-27T15.58.44
…25-08-27T15.58.44

⚡️ Speed up function `get_patches_metadata` by 45% in PR #690 (`worktree/persist-optimization-patches`)
codeflash-ai bot commented Aug 30, 2025

Saga4 merged commit 2a1096b into main Sep 2, 2025
17 of 19 checks passed
codeflash-ai bot added a commit that referenced this pull request Sep 2, 2025
…(`worktree/persist-optimization-patches`)

The optimization achieves a 112% speedup through two key changes:

1. **Replace `list(repo.iter_commits(...))` with `next(repo.iter_commits(...))`**: The original code materializes all root commits into a list just to access the first one. The optimized version uses `next()` to get only the first commit from the iterator, avoiding unnecessary memory allocation and iteration through all root commits. This is particularly beneficial for repositories with multiple root commits (though rare, they can occur in merged repositories).

2. **Remove redundant `Path()` wrapper**: The original code wraps `patches_dir / project_id` in `Path()`, but since `patches_dir` is already a `Path` object and the `/` operator returns a `Path`, the wrapper is unnecessary overhead.

The test results show consistent speedups across all scenarios (93-159% faster), with the optimization being especially effective for repositories with many commits (500 commits: 18.0μs → 9.09μs) and complex structures (unusual branches: 16.2μs → 8.28μs). The `next()` optimization provides the most significant performance gain since it eliminates the need to create intermediate list objects and stops iteration immediately after finding the first commit.
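
The first change can be illustrated in isolation (a generic sketch, not the codeflash source):

```python
def iter_commits():
    """Stand-in for repo.iter_commits(...): a lazy generator of commits."""
    for i in range(100_000):
        yield f"commit-{i}"

# Original pattern: materializes all 100,000 items just to read the first one.
first_via_list = list(iter_commits())[0]

# Optimized pattern: pulls exactly one item and stops iterating.
first_via_next = next(iter_commits())

print(first_via_list == first_via_next)  # True
```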
codeflash-ai bot commented Sep 2, 2025

⚡️ Codeflash found optimizations for this PR

📄 112% (1.12x) speedup for get_patches_dir_for_project in codeflash/code_utils/git_worktree_utils.py

⏱️ Runtime: 137 microseconds → 64.5 microseconds (best of 5 runs)

I created a new dependent PR with the suggested changes. Please review:

If you approve, it will be merged into this PR (branch worktree/persist-optimization-patches).

misrasaurabh1 added a commit that referenced this pull request Sep 2, 2025
…25-09-02T21.53.21

⚡️ Speed up function `get_patches_dir_for_project` by 112% in PR #690 (`worktree/persist-optimization-patches`)
codeflash-ai bot commented Sep 2, 2025
