S0-AG-02: Fix Patch Path Normalization for LLM-Generated Diffs
Epic: Sprint-0 (Team PLG_5)
Component: Agent (AG)
Priority: High
Blocked by: None
Blocks: Full local dev workflow completion
Problem Statement
Currently, when PatchPro's LLM agent generates patches to fix code issues, the patch files contain incorrect file paths that prevent git apply from working:
Current behavior:
- Findings contain absolute paths: /opt/andela/genai/patchpro-bot-agent-dev/src/patchpro_bot/cli.py
- LLM receives these absolute paths and generates patches with partial/shortened paths: cli.py or diff/generator.py
- git apply fails with: error: patch fragment without header at line 21
Expected behavior:
- Findings should contain relative paths from git root: src/patchpro_bot/cli.py
- LLM receives clean relative paths and generates patches with correct paths
- git apply successfully applies patches
Impact:
- Pre-push hook's "fix" action fails to apply patches automatically
- Users cannot get automatic fixes, defeating the purpose of the workflow
- Manual intervention required for every fix attempt
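The path mismatch described above can be caught before git apply is ever invoked. The following is a minimal sketch, not part of PatchPro: the function names patch_target_paths and missing_targets are hypothetical, and the set of tracked files is assumed to be supplied by the caller (for example from git ls-files). It only reads "+++ " header lines, so an added content line that itself begins with "++ " would be misread as a header; treat it as a quick sanity check rather than a diff parser.

```python
from typing import List, Set


def patch_target_paths(diff_text: str) -> List[str]:
    """Collect the target paths named in the '+++ ' headers of a unified diff."""
    paths = []
    for line in diff_text.splitlines():
        if not line.startswith("+++ "):
            continue
        # Drop the '+++ ' marker and any trailing tab-separated timestamp.
        target = line[4:].split("\t", 1)[0].strip()
        if target == "/dev/null":
            continue  # file deletion, no target path
        if target.startswith("b/"):
            target = target[2:]  # strip the conventional git prefix
        paths.append(target)
    return paths


def missing_targets(diff_text: str, tracked_files: Set[str]) -> List[str]:
    """Return header paths that do not match any known repo-relative file."""
    return [p for p in patch_target_paths(diff_text) if p not in tracked_files]
```

With a patch whose header names cli.py and a repository that tracks src/patchpro_bot/cli.py, missing_targets returns ["cli.py"], which flags the shortened path.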
Root Cause Analysis
- Ruff and Semgrep output absolute paths in their findings
- RuffNormalizer and SemgrepNormalizer create Location objects with these absolute paths unchanged
- Findings are saved to JSON with absolute paths
- LLM prompt builder reads findings with absolute paths
- LLM tries to shorten/normalize paths but produces incorrect partial paths
- DiffGenerator.generate_diff_from_patch() receives incorrect paths and generates broken patches
Scope
Normalize file paths at the source (when findings are first created) to ensure all downstream components receive correct relative paths.
Technical Notes
Files to modify:
src/patchpro_bot/analyzer.py (RuffNormalizer and SemgrepNormalizer classes)
Key insight:
Path normalization must happen at finding creation time, not at patch generation time. By the time patches are generated, the LLM has already seen the wrong paths and produced incorrect output.
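To illustrate where the normalization step sits, here is a rough sketch of building locations from Ruff-style findings with the path normalized first. It is not the actual PatchPro code: the Location dataclass and its fields, the function name locations_from_ruff, and the injected normalize_path callable are all hypothetical stand-ins, and the "location"/"row" keys are assumed from Ruff's JSON output. Only the "filename" key comes from this ticket.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Location:
    """Hypothetical stand-in for PatchPro's Location object."""
    file: str
    line: int


def locations_from_ruff(
    findings: Iterable[Dict],
    normalize_path: Callable[[str], str],
) -> List[Location]:
    """Normalize each path before the Location is created, so nothing
    downstream (saved JSON, prompt builder, LLM) sees an absolute path."""
    return [
        Location(
            file=normalize_path(finding["filename"]),
            line=finding["location"]["row"],  # assumed Ruff JSON layout
        )
        for finding in findings
    ]
```

Passing the normalizer in as a parameter keeps the sketch independent of git and makes the ordering explicit: the path is cleaned once, at creation, and every later stage reuses the cleaned value.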
Implementation pattern:
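    import subprocess
    from pathlib import Path

    def _normalize_path(self, file_path: str) -> str:
        """Convert absolute path to relative from git root."""
        # Already relative: return unchanged.
        if not Path(file_path).is_absolute():
            return file_path
        try:
            # Ask git for the root of the repository containing this file.
            result = subprocess.run(
                ["git", "rev-parse", "--show-toplevel"],
                cwd=Path(file_path).parent,
                capture_output=True, text=True, check=True
            )
            git_root = Path(result.stdout.strip())
            return str(Path(file_path).relative_to(git_root))
        except (subprocess.CalledProcessError, ValueError, OSError):
            # Not a git repo, file outside the git root, or git / the
            # parent directory is unavailable: strip the leading slash.
            return file_path.lstrip('/')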
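For unit tests, the same logic can be exercised without calling git by passing the root in explicitly. This is a sketch under that assumption: relativize is a hypothetical helper, not an existing PatchPro function, and it uses PurePosixPath so the behavior does not depend on the machine running the tests.

```python
from pathlib import PurePosixPath


def relativize(file_path: str, git_root: str) -> str:
    """Git-free variant of the pattern above: the root is a parameter."""
    path = PurePosixPath(file_path)
    if not path.is_absolute():
        return file_path  # already relative, leave unchanged
    try:
        return str(path.relative_to(PurePosixPath(git_root)))
    except ValueError:
        # File lies outside the given root: same fallback as the pattern above.
        return file_path.lstrip("/")
```

The three branches map directly onto test cases: an absolute path inside the root, a path that is already relative, and an absolute path outside the root.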
Related Issues
- Depends on: S0-AN-01 (Findings Normalization & Schema) - ensures schema supports relative paths
- Related to: S0-AG-01 (Prompt & Guardrails for Minimal Diffs) - clean paths improve LLM context
- Blocks: Complete local dev workflow (pre-push hook auto-fix feature)
Tasks
- Add _normalize_path() helper method to RuffNormalizer class
  - Use git rev-parse --show-toplevel to find git root
- Update RuffNormalizer.normalize() to use path normalization
  - Apply _normalize_path() to ruff_finding["filename"] before creating Location
  - File: src/patchpro_bot/analyzer.py
- Add _normalize_path() helper method to SemgrepNormalizer class
- Update SemgrepNormalizer.normalize() to use path normalization
  - Apply _normalize_path() to path_obj.value before creating Location
  - File: src/patchpro_bot/analyzer.py
- Test the fix end-to-end
  - [...] .patchpro/artifacts
  - findings.json has relative paths (not absolute)
  - git apply --check patch_combined_*.diff succeeds
- Add unit tests for path normalization
Definition of Done
- Findings in findings.json have relative paths from git root
- git apply --check validates patches successfully
- [...] agent-dev branch