## Week 03 - Auto-Review

This week, you will explore the application of large language models (LLMs) to **code review and quality assessment**, a critical component of the software development lifecycle. You will implement an "Auto-Review" system that automatically analyzes code and identifies potential issues, weaknesses, or areas for improvement.

## Goal: Automated Code Assessment
Your primary task is to implement an `auto_review` function that forms the core of this system. This function will act as a **code quality oracle**, using an LLM to generate a structured and actionable list of potential problems based on a full codebase or file.

**Input**: A filesystem path (to a file or a folder) containing the code to be reviewed.

**Output**: A list of potential issues, where each issue is an object containing at least a `description` (of the issue), a `severity` (e.g., 'Critical', 'Major', 'Minor', 'Suggestion'), and the relevant `file_path` and `line_number`.

Focus on designing an effective LLM prompt and context strategy that allows the model to comprehensively analyze the entire provided codebase or file and return a reliable, structured list of high-quality review comments. Consider how to manage context window limitations when analyzing large files or multiple files.
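One possible way to deal with the context window is to split the codebase into several smaller requests. The sketch below is only an illustration under stated assumptions: `batch_files` and its `max_chars` budget are hypothetical names (not part of the provided setup), and a character count is used as a rough stand-in for a real token count.

```python
from collections.abc import Iterator


def batch_files(
    files: dict[str, str], max_chars: int = 40_000
) -> Iterator[dict[str, str]]:
    """Greedily group files into batches whose total size stays within max_chars.

    max_chars is a hypothetical character budget standing in for a token limit.
    A single file larger than the budget is still emitted, as its own batch.
    """
    batch: dict[str, str] = {}
    used = 0
    for name, content in files.items():
        size = len(content)
        # Close the current batch when adding this file would exceed the budget.
        if batch and used + size > max_chars:
            yield batch
            batch, used = {}, 0
        batch[name] = content
        used += size
    # Emit whatever is left over after the last file.
    if batch:
        yield batch
```

Each batch could then be rendered into its own prompt and the resulting issue lists concatenated. The trade-off is that the model no longer sees cross-file relationships between files that end up in different batches.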

## Provided Setup & Resources
The setup provides a few helper modules:

`hslu/dlm03/util/filesystem.py`: This file provides utilities for reading code from a given path. It includes functions to recursively list files in a directory and to read file content, which will be essential for feeding the code into your LLM.

`hslu/dlm03/devops/language_server.py`: This file defines the **Pydantic class** for the output issue list. Use this schema for **structured output generation** (e.g., via `response_format` in the API call) to ensure your LLM reliably returns the required list of issue objects.

`hslu/dlm03/common/backend.py`: This file includes helper functions for using various LLM providers (including OpenAI, Gemini, or local models via llama.cpp). Feel free to use the provider you are most familiar with, or add new ones!
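To make the file-reading step concrete, here is a minimal standard-library sketch of what "recursively list files and read their content" can look like. It does not reproduce the actual API of `hslu/dlm03/util/filesystem.py`; the function name `collect_sources` and its `suffix` parameter are assumptions made for illustration.

```python
import pathlib


def collect_sources(path: pathlib.Path, suffix: str = ".py") -> dict[str, str]:
    """Return {file path: file content} for a single file or a directory tree."""
    if path.is_file():
        # A single file is taken as-is, regardless of its suffix.
        candidates = [path]
    else:
        # rglob searches the directory recursively; sorting keeps the order stable.
        candidates = sorted(p for p in path.rglob(f"*{suffix}") if p.is_file())
    return {str(p): p.read_text(encoding="utf-8") for p in candidates}
```

The returned dictionary has the same shape as the `files` mapping that the prompt template iterates over, so it could be passed to the template directly.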

## Environment Variables & Imports

Use the following cells to add any needed environment variables (like API keys) **before** loading any of the Python modules.

In [None]:
import initialize_notebook # noqa

In [None]:

import pathlib
import random

import jinja2

from hslu.dlm03.common import backend as backend_lib
from hslu.dlm03.tools import lint
from hslu.dlm03.util import ipython_utils, unified_diff

In [None]:
BACKEND_CONFIG = backend_lib.Gemini2p5Flash()
BACKEND = BACKEND_CONFIG.get_backend()

## Auto-review Implementation
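Since every issue has to reference a line, one option worth trying is to prefix each line of the code with its line number before it goes into the prompt, so the model can read the number off rather than count lines. The helper below is a hypothetical sketch (`number_lines` is not part of the provided modules), and whether it improves accuracy for your chosen model is something to verify experimentally.

```python
def number_lines(content: str) -> str:
    """Prefix every line with its 1-based line number, e.g. '12 | x = 1'."""
    lines = content.splitlines()
    # Pad the numbers so the separator column lines up across the file.
    width = len(str(len(lines)))
    return "\n".join(
        f"{number:>{width}} | {line}" for number, line in enumerate(lines, start=1)
    )
```

It could be applied to each file's content before rendering the template, for example `{name: number_lines(text) for name, text in files_content.items()}`.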

In [None]:

AUTOREVIEW_PROMPT_TEMPLATE = """---
## Role

You're a senior software engineer conducting a thorough code review. Provide constructive, actionable feedback.

## Review Areas

Analyze the selected code for:

1. **Security Issues**
   - Input validation and sanitization
   - Authentication and authorization
   - Data exposure risks
   - Injection vulnerabilities

2. **Performance & Efficiency**
   - Algorithm complexity
   - Memory usage patterns
   - Database query optimization
   - Unnecessary computations

3. **Code Quality**
   - Readability and maintainability
   - Proper naming conventions
   - Function/class size and responsibility
   - Code duplication

4. **Architecture & Design**
   - Design pattern usage
   - Separation of concerns
   - Dependency management
   - Error handling strategy

5. **Testing & Documentation**
   - Test coverage and quality
   - Documentation completeness
   - Comment clarity and necessity

## Output Format

You should output a list of JSON objects with the following schema:
{
  "title": "Issue",
  "type": "object",
  "properties": {
    "file": {
      "type": "string",
      "description": "The path to the file where the error occurred."
    },
    "line": {
      "type": "integer",
      "description": "The line number where the error occurred."
    },
    "column": {
      "type": "integer",
      "description": "The column number where the error occurred."
    },
    "message": {
      "type": "string",
      "description": "A description of the error."
    },
    "hint": {
      "type": ["string", "null"],
      "description": "An optional hint to resolve the error."
    },
    "code": {
      "type": ["string", "null"],
      "description": "The error code (if any)."
    },
    "severity": {
      "type": "string",
      "description": "The severity of the error (e.g., 'error', 'note')."
    }
  },
  "required": [
    "file",
    "line",
    "column",
    "message",
    "severity"
  ],
  "additionalProperties": false
}

Provide feedback as:

**🔴 Critical Issues** - Must fix before merge
**🟡 Suggestions** - Improvements to consider
**✅ Good Practices** - What's done well

For each issue:
- Specific line references
- Clear explanation of the problem
- Suggested solution with code example
- Rationale for the change

Be constructive and educational in your feedback.

## Code
{% for filename, content in files.items() %}
# {{ filename }}:
{{ content }}

{% endfor %}
"""
AUTO_REVIEW_PROMPT = jinja2.Template(AUTOREVIEW_PROMPT_TEMPLATE, undefined=jinja2.StrictUndefined)

In [None]:
def auto_review(backend: backend_lib.LLMBackend, path: pathlib.Path, auto_review_prompt_template: jinja2.Template) -> list[lint.Issue]:
    """Review the Python code at `path` (a file or a folder) and return the issues reported by the LLM."""
    # Accept either a single file or a directory that is searched recursively for .py files.
    files = [path] if path.is_file() else sorted(p for p in path.rglob("*.py") if p.is_file())
    files_content = {str(filename): filename.read_text() for filename in files}
    response = backend(
        messages=[
            # The rendered template carries both the review instructions and the code under review.
            dict(role="system", content=auto_review_prompt_template.render(files=files_content)),
            dict(role="user", content=" "),
        ],
        response_format=lint.Issues,
    )
    # Pick one of the returned completions and take its structured (parsed) output.
    issues = random.choice(response.choices).message.parsed
    return issues.issues

In [None]:
FILE_PATH = pathlib.Path("/Users/vincent/Development/Valinor/valinor/hslu/dlm03/common/")
issues = auto_review(BACKEND, FILE_PATH, AUTO_REVIEW_PROMPT)
def issue_to_diff(issue):
    """Render an issue's proposed fix as a unified diff, or an empty diff if it has no fix."""
    if issue.fix:
        return issue.fix.to_unified_diff(issue.filename, pathlib.Path(issue.filename).read_text().splitlines())
    return unified_diff.UnifiedDiff(from_file=issue.filename, to_file=issue.filename, hunks=[])


ipython_utils.display_autofix(lambda: issues, issue_to_diff, None, False)