
@misrasaurabh1
Contributor

@misrasaurabh1 misrasaurabh1 commented Nov 15, 2025

PR Type

Enhancement, Bug fix


Description

  • Add fuzzy match for missing function

  • Improve error message guidance

  • Type hints for functions map

  • Minor language update


Diagram Walkthrough

flowchart LR
  A["User specifies function"] -- "not found" --> B["closest_matching_file_function_name"]
  B -- "Levenshtein distance" --> C["Suggest closest function"]
  C -- "exit_with_message" --> D["Helpful suggestion shown"]
  E["Config parsing"] -- "missing codeflash block" --> F["Clearer init guidance"]

File Walkthrough

Relevant files
Documentation
config_parser.py
Clearer guidance when codeflash config missing                     

codeflash/code_utils/config_parser.py

  • Refine missing config error message.
  • Clarify running codeflash init and target file.
+1/-1     
Enhancement
functions_to_optimize.py
Fuzzy function lookup with suggestion on miss                       

codeflash/discovery/functions_to_optimize.py

  • Add Levenshtein-based closest function matcher.
  • Suggest alternative function on not found.
  • Improve types for find_all_functions_in_file.
  • Import Tuple for new helper return type.
+60/-3   

@github-actions

github-actions bot commented Nov 15, 2025

PR Reviewer Guide 🔍

(Review updated until commit 626cec1)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Type Hint Consistency

The return annotation for closest_matching_file_function_name uses Tuple[...] but typing.Tuple is not imported; consider using tuple[...] or importing Tuple for consistency with other PEP 585 hints.

def closest_matching_file_function_name(
    qualified_fn_to_find: str, found_fns: dict[Path, list[FunctionToOptimize]]
) -> Tuple[Path, FunctionToOptimize] | None:
    """Find closest matching function name using Levenshtein distance.

    Args:
        qualified_fn_to_find: Function name to find in format "Class.function" or "function"
        found_fns: Dictionary of file paths to list of functions

    Returns:
        Tuple of (file_path, function) for closest match, or None if no matches found
    """
    min_distance = 4
    closest_match = None
    closest_file = None

    qualified_fn_to_find = qualified_fn_to_find.lower()

    for file_path, functions in found_fns.items():
        for function in functions:
            # Compare either full qualified name or just function name
            fn_name = function.qualified_name.lower()
            dist = levenshtein_distance(qualified_fn_to_find, fn_name)

            if dist < min_distance:
                min_distance = dist
                closest_match = function
                closest_file = file_path

    if closest_match is not None:
        return closest_file, closest_match
    return None
Variable Naming

In levenshtein_distance, variable newDistances uses camelCase in an otherwise snake_case codebase; align naming for readability.

def levenshtein_distance(s1: str, s2: str):
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    distances = range(len(s1) + 1)
    for index2, char2 in enumerate(s2):
        newDistances = [index2 + 1]
        for index1, char1 in enumerate(s1):
            if char1 == char2:
                newDistances.append(distances[index1])
            else:
                newDistances.append(1 + min((distances[index1], distances[index1 + 1], newDistances[-1])))
        distances = newDistances
    return distances[-1]
Threshold Tuning

The fuzzy-match min_distance=4 is hardcoded; confirm this threshold avoids over-eager suggestions on short names and consider exposing it or scaling by length.

min_distance = 4
closest_match = None
closest_file = None

qualified_fn_to_find = qualified_fn_to_find.lower()

for file_path, functions in found_fns.items():
    for function in functions:
        # Compare either full qualified name or just function name
        fn_name = function.qualified_name.lower()
        dist = levenshtein_distance(qualified_fn_to_find, fn_name)

        if dist < min_distance:
            min_distance = dist
            closest_match = function
            closest_file = file_path

if closest_match is not None:
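
One way to read the "scaling by length" idea is sketched below. It is only an illustration under stated assumptions: `max_allowed_distance`, `closest_name`, the `ratio` of 0.3, and the `cap` of 3 are hypothetical and not part of the PR, and plain strings stand in for `FunctionToOptimize` objects.

```python
from typing import List, Optional


def levenshtein_distance(s1: str, s2: str) -> int:
    """Row-by-row edit distance, following the shape of the PR's helper."""
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    distances = list(range(len(s1) + 1))
    for index2, char2 in enumerate(s2):
        new_distances = [index2 + 1]
        for index1, char1 in enumerate(s1):
            if char1 == char2:
                new_distances.append(distances[index1])
            else:
                new_distances.append(1 + min(distances[index1], distances[index1 + 1], new_distances[-1]))
        distances = new_distances
    return distances[-1]


def max_allowed_distance(name: str, ratio: float = 0.3, cap: int = 3) -> int:
    """Hypothetical helper: allow about `ratio` edits per character, at most `cap`."""
    return min(cap, int(len(name) * ratio))


def closest_name(target: str, candidates: List[str]) -> Optional[str]:
    """Return the closest candidate within the length-scaled limit, else None."""
    target_lower = target.lower()
    best_name = None
    # Start one above the limit so that only distances <= limit are accepted.
    best_dist = max_allowed_distance(target_lower) + 1
    for candidate in candidates:
        dist = levenshtein_distance(target_lower, candidate.lower())
        if dist < best_dist:
            best_dist = dist
            best_name = candidate
    return best_name
```

With these assumed values, a two-character name such as `ab` gets a limit of 0 and is never matched to an unrelated `xy`, whereas a fixed `min_distance = 4` would accept that pair (distance 2). Longer names still tolerate a few typos.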

@github-actions

github-actions bot commented Nov 15, 2025

PR Code Suggestions ✨

Latest suggestions up to 626cec1
Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Stabilize Levenshtein iteration

Convert distances to a list to avoid issues from using a range object across
iterations, and use consistent snake_case naming for new_distances. This prevents
subtle bugs in Python 3 where range is not a list and improves readability.

codeflash/discovery/functions_to_optimize.py [304-316]

 def levenshtein_distance(s1: str, s2: str):
     if len(s1) > len(s2):
         s1, s2 = s2, s1
-    distances = range(len(s1) + 1)
+    distances = list(range(len(s1) + 1))
     for index2, char2 in enumerate(s2):
-        newDistances = [index2 + 1]
+        new_distances = [index2 + 1]
         for index1, char1 in enumerate(s1):
             if char1 == char2:
-                newDistances.append(distances[index1])
+                new_distances.append(distances[index1])
             else:
-                newDistances.append(1 + min((distances[index1], distances[index1 + 1], newDistances[-1])))
-        distances = newDistances
+                new_distances.append(1 + min(distances[index1], distances[index1 + 1], new_distances[-1]))
+        distances = new_distances
     return distances[-1]
Suggestion importance[1-10]: 7

__

Why: Converting range to a list and using consistent snake_case improves correctness and readability; although current code likely works, the change avoids potential iteration pitfalls and is a safe enhancement.
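
To check the "current code likely works" remark, the sketch below runs the same row-by-row algorithm with and without converting the first row to a list. The `materialize` flag is a hypothetical parameter added only for this comparison; it does not exist in the PR.

```python
def levenshtein_distance(s1: str, s2: str, materialize: bool = False) -> int:
    """Row-by-row edit distance; `materialize` is a demo-only, hypothetical flag."""
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    first_row = range(len(s1) + 1)
    # A range is an immutable sequence: it supports len() and indexing, so it
    # can serve as the read-only first row. After the first pass over s2 it is
    # replaced by an ordinary list either way.
    distances = list(first_row) if materialize else first_row
    for index2, char2 in enumerate(s2):
        new_distances = [index2 + 1]
        for index1, char1 in enumerate(s1):
            if char1 == char2:
                new_distances.append(distances[index1])
            else:
                new_distances.append(1 + min(distances[index1], distances[index1 + 1], new_distances[-1]))
        distances = new_distances
    return distances[-1]
```

Both variants return the same distances in this sketch, which suggests the list conversion is mainly about type consistency and readability rather than a behavioral fix.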

Medium
Fix incompatible dict annotation

Avoid using the dict[...] type hint subscript for runtime assignment on Python
versions <3.9 or when from __future__ import annotations changes evaluation timing.
Use typing.Dict (or drop the annotation) to prevent potential runtime TypeError.
This keeps compatibility consistent with the rest of the file's annotations.

codeflash/discovery/functions_to_optimize.py [204]

-functions: dict[Path, list[FunctionToOptimize]] = find_all_functions_in_file(file)
+functions: Dict[Path, List[FunctionToOptimize]] = find_all_functions_in_file(file)
Suggestion importance[1-10]: 2

__

Why: The file already uses from __future__ import annotations, making the dict[...] annotation safe; switching to Dict/List adds no clear benefit and would require extra imports, so impact is minimal.
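
The reasoning above can be illustrated with a minimal sketch. `find_names` and `example.py` are hypothetical stand-ins (the real `find_all_functions_in_file` returns `FunctionToOptimize` objects, replaced here by plain strings). With the `__future__` import in place, annotations are stored as strings and are not evaluated when the module loads.

```python
from __future__ import annotations

from pathlib import Path


def find_names(file: Path) -> dict[Path, list[str]]:
    """Hypothetical stand-in for find_all_functions_in_file."""
    return {file: ["foo", "Bar.baz"]}


# The annotation on this assignment is kept as a string, so the dict[...]
# subscript is never executed as an expression at import time.
functions: dict[Path, list[str]] = find_names(Path("example.py"))

# The return annotation is likewise stored as the string "dict[Path, list[str]]".
return_annotation = find_names.__annotations__["return"]
```

Because nothing subscripts `dict` at runtime here, switching to `typing.Dict` would not change behavior in this sketch.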

Low
General
Prevent parameter shadowing

Avoid reassigning the file parameter when unpacking the closest match; this can
mislead subsequent logic and messages. Use distinct local names for the suggested
match to prevent shadowing and keep error outputs accurate.

codeflash/discovery/functions_to_optimize.py [225-237]

 if found_function is None:
     if is_lsp:
         return functions, 0, None
     found = closest_matching_file_function_name(only_get_this_function, functions)
     if found is not None:
-        file, found_function = found
+        suggested_file, suggested_function = found
         exit_with_message(
             f"Function {only_get_this_function} not found in file {file}\nor the function does not have a 'return' statement or is a property.\n"
-            f"Did you mean {found_function.qualified_name} instead?"
+            f"Did you mean {suggested_function.qualified_name} in {suggested_file} instead?"
         )
 
     exit_with_message(
         f"Function {only_get_this_function} not found in file {file}\nor the function does not have a 'return' statement or is a property"
     )
Suggestion importance[1-10]: 6

__

Why: Avoiding reassignment of file prevents confusion and keeps error messages precise; it's a reasonable maintainability improvement though not critical to functionality.
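
A small sketch of why the rebinding matters, using hypothetical helper names (`report_missing_shadowed`, `report_missing`) and plain strings in place of paths and function objects. In the first variant the unpacking overwrites `file` before the message is formatted, so the message names the suggested file rather than the one the user asked about.

```python
from typing import Optional, Tuple


def report_missing_shadowed(file: str, wanted: str, suggestion: Optional[Tuple[str, str]]) -> str:
    """Mirrors the pre-suggestion pattern: unpacking rebinds the `file` parameter."""
    if suggestion is not None:
        file, suggested_name = suggestion  # `file` now refers to the suggested file
        return f"Function {wanted} not found in file {file}\nDid you mean {suggested_name} instead?"
    return f"Function {wanted} not found in file {file}"


def report_missing(file: str, wanted: str, suggestion: Optional[Tuple[str, str]]) -> str:
    """Uses distinct local names, so `file` keeps pointing at the requested file."""
    message = f"Function {wanted} not found in file {file}"
    if suggestion is not None:
        suggested_file, suggested_name = suggestion
        message += f"\nDid you mean {suggested_name} in {suggested_file} instead?"
    return message
```

Calling both with `file="a.py"` and a suggestion located in `b.py` shows the difference: only the second variant reports `a.py` as the file that was searched.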

Low

Previous suggestions

Suggestions up to commit 28125fc
Category | Suggestion | Impact
General
Avoid hardcoded file name in error

The message is overly specific to pyproject.toml and may mislead users if a
different config file is used. Reference the actual config_file_path and keep
instructions generic to avoid confusion.

codeflash/code_utils/config_parser.py [108]

-msg = f"Could not find the 'codeflash' block in the config file {config_file_path}. Please run 'codeflash init' to add Codeflash config in the pyproject.toml config file."
+msg = (
+    f"Could not find the 'codeflash' block in the config file {config_file_path}. "
+    "Please run 'codeflash init' to add Codeflash configuration to the config file."
+)
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly targets line 108 in the new hunk and proposes a clearer, file-agnostic error message using config_file_path. This improves usability but is a minor wording change, not a functional fix.

Low

@misrasaurabh1 misrasaurabh1 changed the title from "language update" to "Language update and more helpful error message" Nov 17, 2025
@misrasaurabh1 misrasaurabh1 marked this pull request as ready for review November 17, 2025 17:11
@github-actions

Persistent review updated to latest commit 626cec1

The optimized version achieves an **11% speedup** through several key memory and algorithmic optimizations:

**Primary Optimizations:**

1. **Pre-allocated buffer reuse**: Instead of creating a new `newDistances` list on every iteration (16,721 allocations in the profiler), the optimized version uses two pre-allocated lists (`previous` and `current`) that are swapped via reference assignment. This eliminates ~16K list allocations per call.

2. **Eliminated tuple construction in min()**: The original code creates a 3-element tuple for `min((a, b, c))` 8+ million times. The optimized version uses inline comparisons (`a if a < b else b`), avoiding tuple overhead entirely.

3. **Direct indexing over enumerate**: Replaced `enumerate(s1)` and `enumerate(s2)` with `range(len1)` and direct indexing, eliminating tuple unpacking overhead in the inner loops.

4. **Cached string lengths**: Pre-computing `len1` and `len2` avoids repeated `len()` calls.

**Performance Impact by Test Case:**
- **Medium-length strings** (6-10 chars): 20-30% faster - best case for the optimizations
- **Large strings** (1000+ chars): 20-25% faster when the two strings differ, but slower when they are identical, due to overhead
- **Very short strings** (1-2 chars): Often 10-20% slower due to setup overhead outweighing benefits
- **Empty string cases**: Consistently slower due to initialization costs

**Context Impact:**
The function is used in `closest_matching_file_function_name()` for fuzzy matching function names. Since this involves comparing many short-to-medium function names, the optimization should provide measurable benefits in code discovery workflows where hundreds of function name comparisons occur.

The optimization is most effective for the common case of comparing function names (typically 5-20 characters), where memory allocation savings outweigh setup costs.
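
The four optimizations listed above could be combined roughly as follows. This is a sketch reconstructed from the description, not the code from the dependent PR; the function name `levenshtein_distance_two_rows` and the local names `a`, `b`, `c`, and `smallest` are assumptions.

```python
def levenshtein_distance_two_rows(s1: str, s2: str) -> int:
    """Edit distance using two pre-allocated rows that are swapped each pass."""
    if len(s1) > len(s2):
        s1, s2 = s2, s1
    # Cache the lengths once instead of calling len() repeatedly.
    len1 = len(s1)
    len2 = len(s2)
    # Two buffers allocated up front; no new list is created per iteration.
    previous = list(range(len1 + 1))
    current = [0] * (len1 + 1)
    for index2 in range(len2):
        char2 = s2[index2]
        current[0] = index2 + 1
        for index1 in range(len1):
            if s1[index1] == char2:
                current[index1 + 1] = previous[index1]
            else:
                a = previous[index1]
                b = previous[index1 + 1]
                c = current[index1]
                # Inline comparisons instead of min() over a 3-element tuple.
                smallest = a if a < b else b
                if c < smallest:
                    smallest = c
                current[index1 + 1] = 1 + smallest
        # Swap references so the freshly filled row becomes `previous`.
        previous, current = current, previous
    return previous[len1]
```

The swap at the end of each outer iteration is what replaces the per-iteration `newDistances` allocation; the fixed setup cost of the two buffers is consistent with the slowdowns reported above for empty and very short strings.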
@codeflash-ai
Contributor

codeflash-ai bot commented Nov 17, 2025

⚡️ Codeflash found optimizations for this PR

📄 12% (0.12x) speedup for levenshtein_distance in codeflash/discovery/functions_to_optimize.py

⏱️ Runtime: 1.91 seconds → 1.71 seconds (best of 6 runs)

A dependent PR with the suggested changes has been created. Please review:

If you approve, it will be merged into this PR (branch small-fixes).


Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
@codeflash-ai
Contributor

codeflash-ai bot commented Nov 17, 2025

This PR is now faster! 🚀 Saurabh Misra accepted my code suggestion above.

…25-11-17T17.24.40

⚡️ Speed up function `levenshtein_distance` by 12% in PR #924 (`small-fixes`)
@codeflash-ai
Contributor

codeflash-ai bot commented Nov 17, 2025

This PR is now faster! 🚀 @misrasaurabh1 accepted my optimizations from:

# Conflicts:
#	codeflash/discovery/functions_to_optimize.py
@aseembits93 aseembits93 merged commit 848faa5 into main Nov 17, 2025
21 of 22 checks passed
@aseembits93 aseembits93 deleted the small-fixes branch November 17, 2025 20:58

4 participants