@codeflash-ai codeflash-ai bot commented Oct 2, 2025

⚡️ This pull request contains optimizations for PR #790

If you approve this dependent PR, these changes will be merged into the original PR branch fix/use-git-root-as-project-root-in-worktree.

This PR will be automatically closed if the original PR is merged.


📄 509% (5.09x) speedup for project_root_from_module_root in codeflash/cli_cmds/cli.py

⏱️ Runtime : 23.9 milliseconds → 3.93 milliseconds (best of 239 runs)

📝 Explanation and details

The optimization replaces the expensive module_root.parent.resolve() call with a more efficient path handling approach that avoids unnecessary filesystem operations.

Key Changes:

  • Split the expensive single line return module_root.parent.resolve() into two operations:
    1. parent = module_root.parent - gets the parent path
    2. return parent if parent.is_absolute() else parent.absolute() - returns the parent unchanged when it is already absolute, and otherwise makes it absolute relative to the current working directory
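
As a rough sketch of the change described above, the two variants can be put side by side. The function names `project_root_original` and `project_root_optimized` are hypothetical, the early return is an assumption inferred from the generated tests and the "early-return conditions" mentioned in this description, and the `in_worktree` branch is left out entirely:

```python
from __future__ import annotations

from pathlib import Path


def project_root_original(module_root: Path, pyproject_file_path: Path) -> Path:
    # Assumed early return: pyproject.toml sits directly inside module_root.
    if pyproject_file_path.parent == module_root:
        return module_root
    # Original fallback: canonicalize the parent (follows symlinks, collapses "..").
    return module_root.parent.resolve()


def project_root_optimized(module_root: Path, pyproject_file_path: Path) -> Path:
    # Same assumed early return as above.
    if pyproject_file_path.parent == module_root:
        return module_root
    # Optimized fallback: only make the parent absolute, without canonicalizing it.
    parent = module_root.parent
    return parent if parent.is_absolute() else parent.absolute()
```

For an absolute `module_root` with no symlinks or `..` segments, both variants should return the same path; they can diverge otherwise, which is the trade-off to weigh against the speedup.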

Why This Is Faster:
The original code called resolve() on every parent path, which performs expensive filesystem operations to resolve symlinks and canonicalize paths. The profiler shows this line consumed 84.8% of the total runtime (74.4ms out of 87.8ms).

The optimized version uses is_absolute() (a pure path check with no filesystem access) to determine whether the path is already absolute, and only calls absolute() when needed. Unlike resolve(), absolute() does not follow symlinks or collapse '.' and '..' components - it only prepends the current working directory to a relative path. As a consequence, the two calls can return different paths when the parent contains symlinks or '..' segments.
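
A small illustration of the difference between the two `pathlib` calls, using a throwaway temporary directory (the directory names `pkg` and `src` are made up for the example):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temp dir first so any symlink in its own location does not matter.
    base = Path(tmp).resolve()
    (base / "pkg").mkdir()
    (base / "src").mkdir()

    dotted = base / "pkg" / ".." / "src"
    kept = dotted.absolute()      # already absolute: returned without normalization
    collapsed = dotted.resolve()  # canonical form: ".." is collapsed
    expected = base / "src"

# A relative path is anchored at the current working directory by absolute().
relative = Path("src") / "pkg"
made_absolute = relative.absolute()
```

Here `kept` still contains the `..` component, while `collapsed` equals `expected`; `made_absolute` is simply `Path.cwd() / "src" / "pkg"`.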

Performance Impact:

  • The critical path (the line that was 84.8% of runtime) now splits into two much faster operations that together account for only 31.8% (16.5% + 15.3%) of the optimized function's runtime
  • Overall speedup of 508% (from 23.9ms to 3.93ms)
  • Most effective for test cases involving path resolution outside the early-return conditions, showing 600-7000% improvements in deeply nested scenarios and cases where pyproject_file_path.parent != module_root
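
The percentages above can be sanity-checked with a little arithmetic. This sketch assumes the reported speedup is computed as (original - optimized) / optimized; that formula is an assumption, chosen because it reproduces the overall figure from the rounded timings. The helper names are hypothetical:

```python
def speedup_percent(original: float, optimized: float) -> float:
    """Percent speedup, assuming the (original - optimized) / optimized convention."""
    return (original - optimized) / optimized * 100.0


def speedup_factor(original: float, optimized: float) -> float:
    """How many times faster the optimized run is, as a plain ratio."""
    return original / optimized


# Overall runtimes from this report, in milliseconds.
overall = speedup_percent(23.9, 3.93)
factor = speedup_factor(23.9, 3.93)
print(f"{overall:.0f}% faster, {factor:.2f}x the original speed")
```

With the rounded timings this gives roughly 508%, which as a plain ratio is about 6.08 times as fast. Small mismatches with the per-test percentages are expected, since those were presumably computed from unrounded measurements.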

Correctness verification report:

| Test | Status |
|------|--------|
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 1034 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |

🌀 Generated Regression Tests and Runtime
from __future__ import annotations

from pathlib import Path

# imports
import pytest  # used for our unit tests
from codeflash.cli_cmds.cli import project_root_from_module_root

# unit tests

# --- Basic Test Cases ---

def test_module_root_is_pyproject_parent():
    # pyproject.toml is directly inside the module root
    module_root = Path("/home/user/project/src")
    pyproject_file_path = Path("/home/user/project/src/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 4.79μs -> 4.46μs (7.42% faster)

def test_module_root_is_one_level_up_pyproject():
    # pyproject.toml is one level up from module root
    module_root = Path("/home/user/project/src")
    pyproject_file_path = Path("/home/user/project/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 40.5μs -> 5.02μs (706% faster)

def test_module_root_is_two_levels_up_pyproject():
    # pyproject.toml is two levels up from module root
    module_root = Path("/home/user/project/src/package")
    pyproject_file_path = Path("/home/user/project/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 41.6μs -> 5.03μs (728% faster)

def test_in_worktree_true_returns_git_root():
    # in_worktree is True, should return git_root_dir()
    module_root = Path("/home/user/project/src")
    pyproject_file_path = Path("/home/user/project/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path, in_worktree=True); result = codeflash_output # 263μs -> 267μs (1.31% slower)

# --- Edge Test Cases ---

def test_pyproject_file_is_module_root_itself():
    # pyproject.toml is the module_root itself (unlikely but possible)
    module_root = Path("/home/user/project/src/pyproject.toml")
    pyproject_file_path = Path("/home/user/project/src/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 43.5μs -> 5.58μs (680% faster)

def test_module_root_is_root_directory():
    # module_root is the filesystem root
    module_root = Path("/")
    pyproject_file_path = Path("/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 4.31μs -> 4.24μs (1.68% faster)

def test_pyproject_file_path_is_relative():
    # Relative paths should be resolved correctly
    module_root = Path("src")
    pyproject_file_path = Path("pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 27.8μs -> 17.6μs (58.1% faster)

def test_module_root_and_pyproject_are_same_directory():
    # Both paths point to the same directory
    module_root = Path("/home/user/project")
    pyproject_file_path = Path("/home/user/project/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 4.27μs -> 4.20μs (1.69% faster)

def test_module_root_is_symlink():
    # Simulate a symlinked module_root
    module_root = Path("/tmp/symlinked_src")
    pyproject_file_path = Path("/tmp/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 27.3μs -> 4.89μs (458% faster)

def test_pyproject_file_is_in_hidden_directory():
    module_root = Path("/home/user/project/.hidden")
    pyproject_file_path = Path("/home/user/project/.hidden/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 4.13μs -> 4.19μs (1.43% slower)

def test_pyproject_file_is_in_nested_hidden_directory():
    module_root = Path("/home/user/project/.hidden/src")
    pyproject_file_path = Path("/home/user/project/.hidden/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 43.3μs -> 4.94μs (776% faster)

# --- Large Scale Test Cases ---

def test_deeply_nested_module_root_large_scale():
    # Module root is deeply nested (100 levels)
    base = Path("/home/user/project")
    module_root = base
    for i in range(100):
        module_root = module_root / f"level{i}"
    pyproject_file_path = base / "pyproject.toml"
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 337μs -> 5.19μs (6407% faster)

def test_large_number_of_sibling_directories():
    # Simulate a module_root with many siblings
    base = Path("/home/user/project")
    module_root = base / "src"
    pyproject_file_path = base / "pyproject.toml"
    # Create many sibling directories (not actually creating them, just simulating)
    siblings = [base / f"dir{i}" for i in range(999)]
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 42.9μs -> 6.11μs (602% faster)

def test_large_scale_in_worktree_true():
    # Large scale test with in_worktree True
    base = Path("/home/user/project")
    module_root = base
    for i in range(500):
        module_root = module_root / f"level{i}"
    pyproject_file_path = base / "pyproject.toml"
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path, in_worktree=True); result = codeflash_output # 264μs -> 269μs (1.80% slower)

def test_large_scale_module_root_is_pyproject_parent():
    # Large scale: pyproject.toml is inside a deeply nested module_root
    base = Path("/home/user/project")
    module_root = base
    for i in range(250):
        module_root = module_root / f"level{i}"
    pyproject_file_path = module_root / "pyproject.toml"
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 5.71μs -> 5.86μs (2.58% slower)

# --- Determinism and Path Normalization ---

def test_path_with_trailing_slash():
    # Paths with trailing slashes should be normalized
    module_root = Path("/home/user/project/src/")
    pyproject_file_path = Path("/home/user/project/src/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 4.49μs -> 4.52μs (0.686% slower)

def test_path_with_dot_and_dotdot():
    # Paths with '.' and '..' should resolve correctly
    module_root = Path("/home/user/project/./src/../src")
    pyproject_file_path = Path("/home/user/project/src/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root.resolve(), pyproject_file_path); result = codeflash_output # 4.12μs -> 4.22μs (2.35% slower)

# --- Miscellaneous ---

def test_pyproject_file_not_in_module_root_or_parent():
    # pyproject.toml is not in module_root or its parent
    module_root = Path("/home/user/project/src/package")
    pyproject_file_path = Path("/other/location/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 42.1μs -> 5.28μs (697% faster)

def test_module_root_and_pyproject_are_same_file():
    # Both point to the same file
    module_root = Path("/home/user/project/pyproject.toml")
    pyproject_file_path = Path("/home/user/project/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 38.0μs -> 4.79μs (693% faster)

def test_in_worktree_true_with_relative_paths():
    module_root = Path("src")
    pyproject_file_path = Path("pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path, in_worktree=True); result = codeflash_output # 263μs -> 266μs (1.05% slower)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from __future__ import annotations

from pathlib import Path

# imports
import pytest  # used for our unit tests
from codeflash.cli_cmds.cli import project_root_from_module_root

# unit tests

# ---------------------------
# BASIC TEST CASES
# ---------------------------

def test_module_root_is_project_root():
    # pyproject.toml is directly inside module_root
    module_root = Path("/home/user/project")
    pyproject_file_path = module_root / "pyproject.toml"
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 4.19μs -> 4.48μs (6.48% slower)

def test_module_root_is_subdir_of_project_root():
    # pyproject.toml is one level above module_root
    project_root = Path("/home/user/project")
    module_root = project_root / "src"
    pyproject_file_path = project_root / "pyproject.toml"
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 40.0μs -> 4.85μs (726% faster)

def test_pyproject_in_nested_subdir():
    # pyproject.toml is in a nested subdirectory of module_root
    module_root = Path("/home/user/project")
    pyproject_file_path = module_root / "nested" / "pyproject.toml"
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 34.8μs -> 4.64μs (650% faster)

# ---------------------------
# EDGE TEST CASES
# ---------------------------

def test_in_worktree_returns_git_root():
    # Should return git root when in_worktree is True
    module_root = Path("/home/user/project")
    pyproject_file_path = module_root / "pyproject.toml"
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path, in_worktree=True); result = codeflash_output # 260μs -> 264μs (1.36% slower)

def test_module_root_is_root_directory():
    # Edge case: module_root is filesystem root
    module_root = Path("/")
    pyproject_file_path = module_root / "pyproject.toml"
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 4.37μs -> 4.49μs (2.67% slower)

def test_pyproject_file_not_in_module_or_parent():
    # pyproject.toml is not in module_root or its parent
    module_root = Path("/home/user/project/src")
    pyproject_file_path = Path("/home/user/other/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 39.2μs -> 5.12μs (667% faster)

def test_module_root_is_symlink():
    # Edge case: module_root is a symlink
    # Simulate by using a path with 'symlink' in its name
    module_root = Path("/home/user/symlink_project")
    pyproject_file_path = module_root / "pyproject.toml"
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 4.03μs -> 4.04μs (0.248% slower)

def test_pyproject_file_is_symlink():
    # Edge case: pyproject.toml is a symlink (simulate)
    module_root = Path("/home/user/project")
    pyproject_file_path = module_root / "symlink_pyproject.toml"
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 3.94μs -> 3.98μs (1.01% slower)

def test_module_root_is_relative_path():
    # Edge case: module_root is a relative path
    module_root = Path("project")
    pyproject_file_path = module_root / "pyproject.toml"
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 4.05μs -> 3.97μs (2.04% faster)

def test_pyproject_file_is_relative_path():
    # Edge case: pyproject.toml is a relative path
    module_root = Path("/home/user/project")
    pyproject_file_path = Path("project/pyproject.toml")
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 36.3μs -> 5.09μs (613% faster)

def test_module_root_and_pyproject_are_same_file():
    # Edge case: module_root and pyproject_file_path are the same file
    module_root = Path("/home/user/project/pyproject.toml")
    pyproject_file_path = module_root
    codeflash_output = project_root_from_module_root(module_root, pyproject_file_path); result = codeflash_output # 38.2μs -> 4.63μs (724% faster)

# ---------------------------
# LARGE SCALE TEST CASES
# ---------------------------

def test_large_number_of_nested_directories():
    # Large scale: module_root is deeply nested
    base = Path("/home/user/project")
    # Create a deep nested path
    nested = base
    for i in range(100):
        nested = nested / f"dir{i}"
    pyproject_file_path = base / "pyproject.toml"
    codeflash_output = project_root_from_module_root(nested, pyproject_file_path); result = codeflash_output # 334μs -> 5.12μs (6438% faster)

def test_large_number_of_sibling_directories():
    # Large scale: simulate many sibling directories
    base = Path("/home/user/project")
    siblings = [base / f"dir{i}" for i in range(1000)]
    pyproject_file_path = base / "pyproject.toml"
    for sibling in siblings:
        codeflash_output = project_root_from_module_root(sibling, pyproject_file_path); result = codeflash_output # 19.8ms -> 2.27ms (770% faster)

def test_large_scale_in_worktree():
    # Large scale: in_worktree with many directories
    base = Path("/home/user/project")
    nested = base
    for i in range(500):
        nested = nested / f"dir{i}"
    pyproject_file_path = base / "pyproject.toml"
    codeflash_output = project_root_from_module_root(nested, pyproject_file_path, in_worktree=True); result = codeflash_output # 270μs -> 270μs (0.041% slower)

def test_large_scale_relative_paths():
    # Large scale: relative paths with many elements
    nested = Path("a")
    for i in range(500):
        nested = nested / f"b{i}"
    pyproject_file_path = Path("a/pyproject.toml")
    codeflash_output = project_root_from_module_root(nested, pyproject_file_path); result = codeflash_output # 1.57ms -> 172μs (810% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-pr790-2025-10-02T11.22.07, commit, and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Oct 2, 2025
@misrasaurabh1 misrasaurabh1 requested a review from KRRT7 October 2, 2025 19:45
KRRT7 commented Oct 2, 2025

https://github.com/python/cpython/blob/3.10/Lib/pathlib.py#L1050-L1063

    def absolute(self):
        """Return an absolute version of this path.  This function works
        even if the path doesn't point to anything.

        No normalization is done, i.e. all '.' and '..' will be kept along.
        Use resolve() to get the canonical path to a file.
        """
        # XXX untested yet!
        if self.is_absolute():
            return self
        # FIXME this must defer to the specific flavour (and, under Windows,
        # use nt._getfullpathname())
        return self._from_parts([self._accessor.getcwd()] + self._parts)

this doesn't feel safe lol
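
One concrete way the swap can change results is a symlinked project directory. The following is a minimal sketch with made-up directory names; it assumes a platform where creating symlinks is permitted (on Windows this may need extra privileges):

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()
    real_project = base / "real_project"
    (real_project / "src").mkdir(parents=True)

    # A symlink that points at the real project directory.
    link = base / "linked_project"
    os.symlink(real_project, link, target_is_directory=True)

    # module_root reached through the symlink.
    module_root = link / "src"

    via_resolve = module_root.parent.resolve()    # follows the symlink
    via_absolute = module_root.parent.absolute()  # keeps the symlinked path
```

In this setup `via_resolve` is the real directory and `via_absolute` is the symlink path, so any later comparison against a canonical path (for example a git root) could behave differently after the change.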

KRRT7 commented Oct 2, 2025

it's fixed on the newer versions of cpython but not sure if we want to rely on it in this PR

@codeflash-ai codeflash-ai bot closed this Oct 3, 2025
codeflash-ai bot commented Oct 3, 2025

This PR has been automatically closed because the original PR #790 by mohammedahmed18 was closed.

@codeflash-ai codeflash-ai bot deleted the codeflash/optimize-pr790-2025-10-02T11.22.07 branch October 3, 2025 01:36