S0-AG-02: Fix Patch Path Normalization for LLM-Generated Diffs #12

@ezeanyicollins

Epic: Sprint-0 (Team PLG_5)
Component: Agent (AG)
Priority: High
Blocked by: None
Blocks: Full local dev workflow completion

Problem Statement

When PatchPro's LLM agent generates patches to fix code issues, the patch files contain incorrect file paths, so git apply cannot apply them:

Current behavior:

  • Findings contain absolute paths: /opt/andela/genai/patchpro-bot-agent-dev/src/patchpro_bot/cli.py
  • LLM receives these absolute paths and generates patches with partial or shortened paths such as cli.py or diff/generator.py
  • git apply fails with: error: patch fragment without header at line 21

Expected behavior:

  • Findings should contain relative paths from git root: src/patchpro_bot/cli.py
  • LLM receives clean relative paths and generates patches with correct paths
  • git apply successfully applies patches

Impact:

  • Pre-push hook's "fix" action fails to apply patches automatically
  • Users cannot get automatic fixes, defeating the purpose of the workflow
  • Manual intervention required for every fix attempt

Root Cause Analysis

  1. Ruff and Semgrep output absolute paths in their findings
  2. RuffNormalizer and SemgrepNormalizer create Location objects with these absolute paths unchanged
  3. Findings are saved to JSON with absolute paths
  4. LLM prompt builder reads findings with absolute paths
  5. LLM tries to shorten/normalize paths but produces incorrect partial paths
  6. DiffGenerator.generate_diff_from_patch() receives incorrect paths and generates broken patches

Scope

Normalize file paths at the source (when findings are first created) to ensure all downstream components receive correct relative paths.
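A minimal sketch of what "normalizing at the source" could look like, using only a pure path computation and no git call. The names to_repo_relative and normalize_finding are hypothetical, the finding dict is an illustrative stand-in for a Ruff result (only the "filename" key comes from this issue), and paths that fall outside the root are left unchanged here, which differs from the fallback in the implementation pattern under Technical Notes.

```python
from pathlib import PurePosixPath


def to_repo_relative(file_path: str, repo_root: str) -> str:
    """Return file_path relative to repo_root, or unchanged if that is not possible."""
    path = PurePosixPath(file_path)
    if not path.is_absolute():
        return file_path  # already relative
    try:
        return str(path.relative_to(PurePosixPath(repo_root)))
    except ValueError:
        return file_path  # outside the repository: leave untouched (assumption)


def normalize_finding(finding: dict, repo_root: str) -> dict:
    """Copy a Ruff-style finding with its "filename" made repo-relative."""
    return {**finding, "filename": to_repo_relative(finding["filename"], repo_root)}


# Illustrative data only: the root is the one quoted in the problem statement.
root = "/opt/andela/genai/patchpro-bot-agent-dev"
raw = {"filename": f"{root}/src/patchpro_bot/cli.py", "code": "F401"}
print(normalize_finding(raw, root)["filename"])  # src/patchpro_bot/cli.py
```

Because the finding is rewritten before it is stored, every later stage (JSON output, prompt builder, diff generator) only ever sees the relative path.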

Tasks

  • Add _normalize_path() helper method to RuffNormalizer class

    • Use git rev-parse --show-toplevel to find git root
    • Convert absolute paths to relative from git root
    • Handle edge cases (already relative, outside git repo, no git repo)
  • Update RuffNormalizer.normalize() to use path normalization

    • Apply _normalize_path() to ruff_finding["filename"] before creating Location
    • Line ~261 in src/patchpro_bot/analyzer.py
  • Add _normalize_path() helper method to SemgrepNormalizer class

    • Same logic as RuffNormalizer
    • Handle Semgrep's path format
  • Update SemgrepNormalizer.normalize() to use path normalization

    • Apply _normalize_path() to path_obj.value before creating Location
    • Line ~380 in src/patchpro_bot/analyzer.py
  • Test the fix end-to-end

    • Delete .patchpro/ artifacts
    • Make a commit to trigger analysis
    • Verify findings.json has relative paths (not absolute)
    • Verify generated patches have correct paths matching git structure
    • Verify git apply --check patch_combined_*.diff succeeds
    • Verify pre-push hook "fix" action successfully applies patches
  • Add unit tests for path normalization

    • Test absolute → relative conversion
    • Test already-relative paths pass through
    • Test paths outside git repo
    • Test no git repo fallback
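The four unit-test cases listed above could be sketched as follows. This is an assumption-laden outline, not the project's test suite: normalize_path is a module-level stand-in for the planned _normalize_path() helper, fake_git is a hypothetical helper, and subprocess.run is mocked so the tests do not need a real repository. POSIX-style paths are assumed.

```python
import subprocess
import unittest
from pathlib import Path
from unittest import mock


def normalize_path(file_path: str) -> str:
    """Stand-in for the planned _normalize_path() helper (not the project's code)."""
    path = Path(file_path)
    if not path.is_absolute():
        return file_path
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=path.parent, capture_output=True, text=True, check=True,
        )
        return str(path.relative_to(Path(result.stdout.strip())))
    except (subprocess.CalledProcessError, OSError, ValueError):
        return file_path.lstrip("/")


def fake_git(root: str) -> subprocess.CompletedProcess:
    """Build the result object git would return for a repo rooted at `root`."""
    return subprocess.CompletedProcess(args=["git"], returncode=0, stdout=root + "\n", stderr="")


class NormalizePathTests(unittest.TestCase):
    def test_absolute_becomes_relative(self):
        with mock.patch("subprocess.run", return_value=fake_git("/repo")):
            self.assertEqual(normalize_path("/repo/src/pkg/cli.py"), "src/pkg/cli.py")

    def test_relative_passes_through(self):
        with mock.patch("subprocess.run") as run:
            self.assertEqual(normalize_path("src/pkg/cli.py"), "src/pkg/cli.py")
            run.assert_not_called()  # no git lookup for relative input

    def test_path_outside_repo_falls_back(self):
        with mock.patch("subprocess.run", return_value=fake_git("/repo")):
            self.assertEqual(normalize_path("/elsewhere/x.py"), "elsewhere/x.py")

    def test_no_git_repo_falls_back(self):
        error = subprocess.CalledProcessError(128, ["git"])
        with mock.patch("subprocess.run", side_effect=error):
            self.assertEqual(normalize_path("/tmp/x.py"), "tmp/x.py")
```

Saved as a test module, this can be run with python -m unittest. Mocking keeps the tests fast and independent of the machine's git setup; an additional integration test against a temporary repository would still be worthwhile.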

Definition of Done

  • All findings in findings.json have relative paths from git root
  • LLM-generated patches have correct relative paths in headers
  • git apply --check validates patches successfully
  • Pre-push hook "fix" action applies patches and amends commit
  • Unit tests cover path normalization edge cases
  • Code committed and pushed to agent-dev branch
  • PR #11 ("feat: Complete local dev workflow with git hooks and fix diff generator") updated with the fix
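The first two checks in this list could be automated with a small script along these lines. It is a heuristic sketch only: the findings schema is not shown in this issue, so absolute_path_strings (a hypothetical name) walks every string value regardless of key, and diff_header_paths (also hypothetical) reads only the ---/+++ header lines of a unified diff. The sample JSON and diff are invented for illustration.

```python
import json


def absolute_path_strings(node) -> list:
    """Recursively collect string values that look like absolute POSIX paths."""
    if isinstance(node, str):
        return [node] if node.startswith("/") else []
    if isinstance(node, dict):
        node = list(node.values())
    if isinstance(node, list):
        return [hit for child in node for hit in absolute_path_strings(child)]
    return []


def diff_header_paths(diff_text: str) -> set:
    """Return the file paths named in the ---/+++ headers of a unified diff.

    Heuristic: a removed or added content line that itself begins with
    "-- " or "++ " would also match, so treat the result as a hint.
    """
    paths = set()
    for line in diff_text.splitlines():
        if line.startswith(("--- ", "+++ ")):
            target = line[4:].split("\t")[0].strip()
            if target != "/dev/null":
                # git prefixes old/new paths with a/ and b/ by default
                paths.add(target[2:] if target[:2] in ("a/", "b/") else target)
    return paths


# Hypothetical findings shape and diff, for illustration only.
findings = json.loads('[{"location": {"file": "src/patchpro_bot/cli.py", "line": 3}}]')
print(absolute_path_strings(findings))  # []

diff = (
    "--- a/src/patchpro_bot/cli.py\n"
    "+++ b/src/patchpro_bot/cli.py\n"
    "@@ -1 +1 @@\n-import os\n+import sys\n"
)
print(diff_header_paths(diff))  # {'src/patchpro_bot/cli.py'}
```

The header paths can then be compared against the output of git ls-files before running git apply --check on the patch itself.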

Technical Notes

Files to modify:

  • src/patchpro_bot/analyzer.py (RuffNormalizer and SemgrepNormalizer classes)

Key insight:
Path normalization must happen at finding creation time, not at patch generation time. By the time patches are generated, the LLM has already seen the wrong paths and produced incorrect output.

Implementation pattern:

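import subprocess
from pathlib import Path

def _normalize_path(self, file_path: str) -> str:
    """Convert an absolute path to one relative to the git root."""
    path = Path(file_path)
    if not path.is_absolute():
        return file_path  # already relative: pass through unchanged

    try:
        # Ask git for the repository root, starting from the file's directory.
        result = subprocess.run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=path.parent,
            capture_output=True, text=True, check=True
        )
        git_root = Path(result.stdout.strip())
        return str(path.relative_to(git_root))
    except (subprocess.CalledProcessError, OSError, ValueError):
        # CalledProcessError: the directory is not inside a git repository.
        # OSError: git is not installed, or the directory does not exist.
        # ValueError: the path is not under the reported root (e.g. via a symlink).
        # Fallback: strip the leading slash so the path is at least not absolute.
        return file_path.lstrip('/')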

Related Issues

  • Depends on: S0-AN-01 (Findings Normalization & Schema) - ensures schema supports relative paths
  • Related to: S0-AG-01 (Prompt & Guardrails for Minimal Diffs) - clean paths improve LLM context
  • Blocks: Complete local dev workflow (pre-push hook auto-fix feature)

Metadata

Labels

pod:agent-core (Agent Core pod: prompts & guardrails)
