<a href="https://www.kaggle.com/code/mrbasm/google-capstone-java-ai-reviewer?scriptVersionId=234674876" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Automated Java Code Review with Generative AI

This notebook demonstrates an automated approach to Java code review using generative AI. It uses the Gemini model to analyze Java source code, identify potential issues, and generate review reports.

The primary goal is to streamline the code review process, enhance code quality, and provide developers with actionable feedback. This is achieved through the following key steps:

1.  **Code Acquisition:** Fetching Java source code files from a specified location.
2.  **AI-Powered Review:** Utilizing the Gemini model to analyze the code and generate detailed reviews.
3.  **Review Evaluation:** Employing AI to assess the quality and effectiveness of the generated reviews.
4.  **Report Generation:** Creating structured Markdown reports summarizing the review findings.

This notebook provides a practical implementation of how generative AI can be applied to improve software development workflows, specifically in the context of code review. It includes code examples, explanations, and evaluation metrics to illustrate the process and its benefits.

## Alignment with Project Requirements and Real-World Application

This notebook directly addresses the core requirements of the Generative AI Capstone project by demonstrating the application of generative AI to a real-world software engineering problem: code review. This project is particularly motivated by the developer's own experience, where frequent Java code reviews consume a significant amount of time and effort. By automating this process, the notebook aims to provide a solution that is both academically relevant and practically beneficial.

Here's how it aligns with the key objectives and demonstrates at least three required generative AI capabilities, while also showcasing various prompting techniques:

* **Utilization of Generative AI:** The notebook prominently features the use of the Gemini generative AI model to analyze and provide feedback on Java code. This goes beyond simple code analysis tools by leveraging AI to understand the code's context and generate human-like review comments.
* **Improvement of Software Development Workflows:** The automated review implemented in the notebook is intended to reduce the time and effort that manual code reviews require, freeing up developers (including the project creator) to focus on other tasks.
* **Focus on a Specific Software Engineering Challenge:** Code review is a crucial but often time-consuming aspect of software development. This notebook tackles this specific challenge with an automated solution aimed at improving the consistency and speed of the review process. This is especially valuable in contexts where developers, like the project creator, spend a substantial portion of their time on code reviews.
* **Demonstrated Gen AI Capabilities and Prompting Techniques:**
    * **Document understanding (Code Files):** The notebook analyzes and understands Java code *provided as a file*. The AI model interprets the structure, syntax, and semantics of the code within the file to generate review feedback. This demonstrates the model's ability to process and understand code in a file-based context, which is essential for practical code review applications.
    * **Prompting Techniques:** The notebook demonstrates a range of prompting techniques:
        * **Few-shot prompting:** The `review_summary` function uses few-shot prompting by providing examples of how to summarize code reviews.
        * **One-shot prompting:** The `review_score` function uses one-shot prompting by giving one example of how to score the code.
        * **Zero-shot prompting:** The `review_file` function uses zero-shot prompting, relying on the model's inherent knowledge to perform the code review without explicit examples.
    * **Gen AI evaluation:** The `evaluate_review_by_llm` function demonstrates Gen AI evaluation. The AI model itself assesses the quality of its generated code reviews, providing scores and explanations for various criteria. This helps ensure the reliability and usefulness of the automated review process.
    * **Structured output / JSON mode / controlled generation:** This project makes use of Gemini’s native JSON generation mode by setting `response_mime_type = "application/json"`. This allows for fully structured, machine-readable output that can be safely parsed and integrated into other tools or systems. It ensures consistency and reduces the need for post-processing or fragile text parsing.
* **Evaluation and Metrics:** The notebook includes an evaluation mechanism where the AI model itself assesses the quality of its code reviews. This demonstrates a focus on evaluating the effectiveness of the AI-driven solution, which is essential for any AI project. Metrics such as relevance score, correctness score, etc., are used to quantify the evaluation.
* **Practical Implementation:** The notebook provides a complete and practical implementation of the proposed solution. It includes code for fetching files, interacting with the AI model, generating reports, and evaluating the reviews. This demonstrates the feasibility and real-world applicability of the approach.

In summary, this notebook effectively leverages generative AI to address a key software engineering challenge, providing a practical and evaluated solution that aligns with the core goals of the Generative AI Capstone project. It demonstrates multiple required Gen AI capabilities, showcases various prompting techniques, and offers a pathway for further enhancement, directly addressing the time-consuming nature of code reviews faced by developers in their daily work.

## 🔧 Setup  
Install the Gemini SDK and load the required libraries, including `google.genai`, `pandas`, and file helpers.

In [1]:
# Install the required version of the Google GenAI SDK
!pip install -Uq "google-genai==1.7.0"

In [2]:
# Import required libraries for file handling, datetime operations, and the Google GenAI SDK
import random, os, glob, datetime, re, json
import pandas as pd
from kaggle_secrets import UserSecretsClient
from google import genai
from IPython.display import Markdown, display
from google.api_core import retry

# Define constants for the project setup and model configuration
PROJECT_PATH = '/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles'
MODEL = 'gemini-2.0-flash'
SOURCE_FILE_EXTENSION = ".java"
DATETIME_FILENAME = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
DATETIME_HEADER = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")

# Authenticate the GenAI client using the API key from Kaggle Secrets
client = genai.Client(api_key=UserSecretsClient().get_secret("GOOGLE_API_KEY"))

# Setup a retry helper, so we don't have to worry about the per minute quota
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

genai.models.Models.generate_content = retry.Retry(
    predicate=is_retriable)(genai.models.Models.generate_content)

# See version of Google Genai
genai.__version__

'1.7.0'
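
The retry wrapper above relies on `google.api_core.retry.Retry` to re-issue calls that fail with HTTP 429 or 503. As a rough illustration of what such a wrapper does, here is a minimal plain-Python sketch of retry with exponential backoff. `FakeAPIError`, `call_with_retry`, and the delay values are hypothetical names and settings chosen for this example; they are not part of the SDK.

```python
import time


class FakeAPIError(Exception):
    """Hypothetical stand-in for an API error that carries an HTTP status code."""

    def __init__(self, code):
        super().__init__(f"API error {code}")
        self.code = code


def call_with_retry(func, retriable_codes=(429, 503), max_attempts=5,
                    initial_delay=1.0, multiplier=2.0, sleep=time.sleep):
    """Call func(); on a retriable error, wait and try again with a growing delay."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except FakeAPIError as error:
            # Give up on non-retriable errors or when attempts are exhausted.
            if error.code not in retriable_codes or attempt == max_attempts:
                raise
            sleep(delay)
            delay *= multiplier
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting, for example by handing in `list.append` to record the delays.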

## 📂 Input Data

Clone the GitHub repo and gather Java files to be reviewed. These files will be passed to Gemini for analysis.

In [3]:
# Load review content (Java files) from the project directory
# Possibility to expand: compare a Git branch with main and only review the adjusted files
!git clone https://github.com/MrBasM/Google-Capstone-Java-AI-Reviewer.git

# Search for the Java files in the project_path and its subfolders.
files = glob.glob(os.path.join(PROJECT_PATH, "**", "*.java"), recursive=True)
if not files:
  print("⚠️ No Java files found.")

filenames = [os.path.relpath(path, PROJECT_PATH) for path in files]
java_files = dict(zip(filenames, files))

java_files

Cloning into 'Google-Capstone-Java-AI-Reviewer'...
remote: Enumerating objects: 19, done.
remote: Counting objects: 100% (19/19), done.
remote: Compressing objects: 100% (14/14), done.
remote: Total 19 (delta 2), reused 14 (delta 2), pack-reused 0 (from 0)
Receiving objects: 100% (19/19), 17.93 KiB | 4.48 MiB/s, done.
Resolving deltas: 100% (2/2), done.


{'AccountManager.java': '/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/AccountManager.java',
 'User.java': '/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/User.java',
 'orderservice/OrderService.java': '/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/orderservice/OrderService.java'}
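
The expansion idea noted in the cell above (reviewing only the files that changed relative to `main`) could look roughly like this. It is a sketch and not part of the notebook: `filter_java_paths` and `changed_java_files` are hypothetical helpers, and the code assumes the repository has a `main` branch and that `git` is available on the path.

```python
import subprocess


def filter_java_paths(diff_output, extension=".java"):
    """Keep only paths with the given extension from `git diff --name-only` output."""
    paths = [line.strip() for line in diff_output.splitlines()]
    return [path for path in paths if path.endswith(extension)]


def changed_java_files(repo_dir, base_branch="main"):
    """List Java files that differ between base_branch and the current HEAD."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base_branch}...HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return filter_java_paths(result.stdout)
```

Keeping the path filtering separate from the `git` call means the filtering can be checked on a plain string without needing a repository.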

## 🚀 Main: Generate Review

This section contains the core logic of the project: how we generate structured code reviews using Gemini.

### 🧠 Prompt Engineering

We define carefully constructed prompts using both zero-shot and few-shot strategies. These prompts instruct Gemini to analyze Java code across key dimensions such as structure, readability, security, and Java best practices.

The summary prompt uses few-shot examples to guide Gemini toward producing concise, focused output.
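
To make the difference between these strategies concrete, the sketch below assembles a prompt from an instruction plus zero or more worked examples: no examples gives a zero-shot prompt, one gives a one-shot prompt, and several give a few-shot prompt. `build_prompt` is a hypothetical helper for illustration only; the notebook itself writes its prompts as literal strings.

```python
def build_prompt(instruction, examples, new_input):
    """Assemble a prompt: instruction, optional worked examples, then the new input."""
    parts = [instruction.strip(), ""]
    for number, (review, summary) in enumerate(examples, start=1):
        parts += [f"Example {number}:", f"Code review:\n{review}", f"Summary: {summary}", ""]
    # The prompt ends with an open "Summary:" slot for the model to complete.
    parts += [f"Code review: {new_input}", "Summary:"]
    return "\n".join(parts)
```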

### 📄 Document Understanding

We upload full `.java` files to Gemini using `client.files.upload(...)`. This enables the model to understand the file in its entirety — including class structure, method signatures, and logic flow — which is crucial for meaningful code review.

### 🧱 Native JSON Output Mode

To ensure reliable and machine-readable output, we make use of Gemini's native JSON generation capability by setting:
```python
response_mime_type = "application/json"
```


In [4]:
# Submit the prompt to the model and return its generated response
# Reads the Java source file (document understanding) 
def get_llm_response(prompt, config, filename=None):
    if filename:
        file = client.files.upload(file=filename)
        contents = [prompt, file]
    else:
        contents = [prompt]

    response = client.models.generate_content(
      model=MODEL,
      config=config,
      contents=contents,
    )
    return response.text

In [5]:
# Define a reusable zero-shot prompt for the Java code review, used also in the evaluation
date = datetime.datetime.now().strftime("%d.%m.%Y")
review_prompt = f"""You are an expert Java code reviewer.
    
        Please analyze the following Java class(es) and provide constructive feedback in markdown format. Focus on the following aspects:

        - Code quality and structure
        - Readability, naming conventions, spelling mistakes
        - Adherence to Java best practices
        - Performance and security considerations
        - Potential bugs or edge cases

        Response output:
        - Output format is a formal document
        - Format your response using bullet points under each category,
        - Use clear, concise, and professional language.
        - Each point must begin with a line number.  
        - Add today's date to the document, {date}
        - Provide a codeblock with an example of optimized code
        - Add a summary of changes in optimized Code
        
    """

# Submit the review prompt along with a Java file and return Gemini's response.
# Optionally request structured output in JSON format for downstream processing.
def review_file(prompt, filename, json_output=False):   
    # Temperature 0.6 found to balance coherence and variability in reviews
    temperature = 0.6
    output_format = "application/json" if json_output else "text/plain"
    
    config = genai.types.GenerateContentConfig(
            temperature=temperature, 
            response_mime_type=output_format
    )
    
    return get_llm_response(prompt=prompt, config=config, filename=filename)

In [6]:
# Use a one-shot prompt to obtain an overall quality score for a given Java file
def review_score(filename):
    prompt = """
        Give the code an overall score.
        A score from 1 to 10 where 10 is perfect.
        Give only the score without any justification, like Score: 8

        Here is an example:

        Code:
        ```java
        import java.util.ArrayList;
        import java.util.List;
    
        public class DataProcessor {
            public List<String> process(List data) { // Raw List type used
                List<String> results = new ArrayList<>();
                for (Object item : data) {
                    results.add(item.toString().trim());
                }
                return results;
            }
        }
        ```
        Score: 6
    """

    # Temperature 0.6 gives the most stable and useful results for scoring
    config = genai.types.GenerateContentConfig(temperature=0.6)
    return get_llm_response(prompt=prompt, config=config, filename=filename)

In [7]:
# Generate a concise summary of the review using a few-shot prompting strategy
def review_summary(code_review):
    prompt = f"""
        Summarize the following code review in 1 or 2 concise and informative sentences, directly stating the most important issues and suggested improvements.

        Example 1:
        Code review:
        - Line 15: The variable 'count' is not used.
        - Line 22: Consider a more descriptive name for the function 'process'.
        Summary: Unused variable 'count' on line 15; suggest a clearer name for the 'process' function on line 22.

        Example 2:
        Code review:
        - Line 8: Potential risk of a NullPointerException.
        - Lines 30-35: The logic for handling errors could be more robust.
        Summary: Potential NullPointerException risk on line 8; error handling logic on lines 30-35 could be more robust.

        Code review: {code_review}
        Summary:
        """
    
    # Temperature 0.6 gives the most stable and useful results for summaries
    config = genai.types.GenerateContentConfig(temperature=0.6)
    
    return get_llm_response(prompt=prompt, config=config)

In [8]:
# Initialize a DataFrame to store the results of each file's review
df_review = pd.DataFrame(columns=["filename","review", "review_json", "summary", "score"])

# Loop over all Java source files, run the review, and store results in the DataFrame
for filename, full_path in java_files.items():
    code_review = review_file(review_prompt, full_path)
    code_review_json = review_file(review_prompt, full_path, json_output=True)
    df_review.loc[len(df_review)] = {
        "filename" : filename,
        "review" : code_review,
        "review_json" : code_review_json,
        "summary" : review_summary(code_review),
        "score" : review_score(full_path)
    }

In [9]:
# Display the full review dataframe
df_review

| | filename | review | review_json | summary | score |
|---|---|---|---|---|---|
| 0 | AccountManager.java | `# Code Review - AccountManager Class\n**Date:*...` | `{\n "reviewDate": "18.04.2025",\n "className...` | The `AccountManager` class lacks encapsulation... | `Score: 4\n` |
| 1 | User.java | `# Code Review - User Class\n**Date:** 18.04.20...` | `{\n "reviewDate": "18.04.2025",\n "className...` | The `setAge` method requires more robust valid... | `Score: 8\n` |
| 2 | orderservice/OrderService.java | `# Java Code Review - OrderService\n\n**Date:**...` | `{\n "reviewDate": "18.04.2025",\n "className...` | The code review identifies a need for more rob... | `Score: 7\n` |


In [10]:
# Show the first code review
display(Markdown(df_review["review"][0]))

# Code Review - AccountManager Class
**Date:** 18.04.2025

This document provides a detailed code review of the provided `AccountManager` Java class, focusing on code quality, readability, best practices, performance, security, and potential issues.

## Code Quality and Structure

*   **L1:** The class structure is basic but functional. However, it lacks proper encapsulation and documentation.
*   **L1:** The class lacks Javadoc documentation explaining its purpose.
*   **L2, L3:** Instance variables `n` and `a` should be private to enforce encapsulation.
*   **L6-L9:** The constructor is simple but adequate. Consider adding validation for input parameters.
*   **L11-L17:** The `d()` method lacks a descriptive name.
*   **L19-L25:** The `p()` method lacks a descriptive name and its purpose is unclear.
*   **L27-L33:** The `x()` method lacks a descriptive name, and the magic number `1234` should be replaced with a named constant.

## Readability and Naming Conventions

*   **L2:** Variable `n` is not descriptive. It should be renamed to `name`.
*   **L3:** Variable `a` is not descriptive. It should be renamed to `age`.
*   **L1:** Class name `AccountManager` is appropriate.
*   **L11:** Method name `d()` is not descriptive. Rename it to something meaningful like `checkAccess` or `displayWelcomeMessage`.
*   **L19:** Method name `p()` is not descriptive. Rename it to something meaningful like `performCheck`.
*   **L27:** Method name `x()` is not descriptive. Rename it to something meaningful like `validateAccessCode`.
*   **L30:** Consider using a constant for the magic number `1234` to improve readability and maintainability.

## Adherence to Java Best Practices

*   **L2, L3:** Instance variables should be declared as `private` to adhere to encapsulation principles. Use getter methods if external access is required.
*   The class lacks Javadoc comments, which are essential for maintainability and API documentation.
*   Magic numbers, like `1234`, should be avoided. Use named constants instead.

## Performance and Security Considerations

*   The provided code doesn't exhibit any immediate performance bottlenecks.
*   The access code validation in the `x()` method is vulnerable to brute-force attacks. Consider implementing more robust authentication mechanisms for real-world scenarios.

## Potential Bugs or Edge Cases

*   The constructor does not validate the input `age`. A negative age or an age that is unrealistically high could lead to unexpected behavior.
*   The `x()` method uses a hardcoded access code. This poses a significant security risk.

## Optimized Code Example

```java
/**
 * Manages user accounts and their access permissions.
 */
public class AccountManager {

    private String name;
    private int age;
    private static final int ACCESS_CODE = 1234;

    /**
     * Constructs an AccountManager object.
     *
     * @param name The name of the account.
     * @param age  The age of the account holder.
     * @throws IllegalArgumentException if age is negative.
     */
    public AccountManager(String name, int age) {
        if (age < 0) {
            throw new IllegalArgumentException("Age cannot be negative.");
        }
        this.name = name;
        this.age = age;
    }

    /**
     * Checks if the account holder has access based on their age.
     */
    public void checkAccess() {
        if (age < 18) {
            System.out.println("No access");
        } else {
            System.out.println("Welcome " + name);
        }
    }

    /**
     * Performs a checking operation.
     */
    public void performCheck() {
        for (int i = 0; i < 100; i++) {
            if (i % 5 == 0) {
                System.out.println("Checking " + i);
            }
        }
    }

    /**
     * Validates the provided access code.
     *
     * @param accessCode The access code to validate.
     */
    public void validateAccessCode(int accessCode) {
        if (accessCode == ACCESS_CODE) {
            System.out.println("Access granted");
        } else {
            System.out.println("Denied");
        }
    }

    public String getName() {
        return name;
    }

    public int getAge() {
        return age;
    }
}
```

### Summary of changes in optimized Code:

*   Added Javadoc comments to the class and methods.
*   Renamed variables `n` to `name` and `a` to `age` for better readability.
*   Renamed methods `d()` to `checkAccess()`, `p()` to `performCheck()`, and `x()` to `validateAccessCode()` for better clarity.
*   Made instance variables `name` and `age` private and added getter methods.
*   Introduced a constant `ACCESS_CODE` for the magic number `1234`.
*   Added input validation to the constructor to prevent negative ages.


In [11]:
# Attempt to parse the output from Gemini as JSON.
# This helps verify whether the model returned well-structured, machine-readable data.

data = df_review["review_json"][0]

try:
    parsed = json.loads(data)
    print("✅ Valid JSON!\n")
except json.JSONDecodeError as e:
    print("❌ Invalid JSON:\n")
    print(e)

✅ Valid JSON!
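
Valid JSON syntax does not guarantee that the expected fields are present. A minimal sketch of a follow-up check is shown below. `check_review_json` is a hypothetical helper, and the required keys `reviewDate` and `className` are taken from the truncated sample output shown earlier, so treat them as assumptions about the response shape rather than a fixed schema.

```python
import json


def check_review_json(text, required_keys=("reviewDate", "className")):
    """Parse text as JSON and report which required top-level keys are missing."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as error:
        return None, [f"invalid JSON: {error.msg}"]
    if not isinstance(parsed, dict):
        return parsed, ["top-level value is not a JSON object"]
    missing = [key for key in required_keys if key not in parsed]
    return parsed, [f"missing key: {key}" for key in missing]
```

The function returns both the parsed value and a list of problems, so a caller can decide whether to keep, re-request, or discard a response.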



In [12]:
# Display the first summary
display(Markdown(df_review["summary"][0]))

The `AccountManager` class lacks encapsulation (variables `n`, `a` should be private), Javadoc documentation, and descriptive names for methods `d()`, `p()`, and `x()`; also, the magic number `1234` in `x()` should be replaced with a named constant, and input validation should be added to the constructor. The `x()` method is also vulnerable to brute-force attacks.


In [13]:
# Display the overall score assigned to the first reviewed Java file
display(Markdown(df_review["score"][0]))

Score: 4
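
The score comes back as free text such as `Score: 4`, which is awkward to sort or aggregate. A small sketch for turning it into an integer follows. `parse_score` is a hypothetical helper, and the accepted range of 1 to 10 mirrors the instruction given in the scoring prompt.

```python
import re


def parse_score(text):
    """Extract an integer score from text such as 'Score: 4'.

    Returns None when no score is found or the value is outside 1 to 10.
    """
    match = re.search(r"score:\s*(\d+)", text, flags=re.IGNORECASE)
    if not match:
        return None
    score = int(match.group(1))
    return score if 1 <= score <= 10 else None
```

Applied to the DataFrame, this could populate a numeric column, for instance `df_review["score"].map(parse_score)`.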


## 📤 Output: Review Results

Results include the review, summary, and score. This section also explores structured output to files.

In [14]:
# Export structured code review results to both JSON and Markdown
# For each reviewed Java file, save:
# - the full structured JSON response (machine-readable)
# - the Markdown-formatted review (human-readable)
# Output filenames include a timestamp to keep results organized.

for index, row in df_review.iterrows():
    base_filename = java_files[row['filename']][:-len(SOURCE_FILE_EXTENSION)]
    timestamp = f"_{DATETIME_FILENAME}"
    
    # Export JSON
    json_path = base_filename + timestamp + ".json"
    with open(json_path, "w") as file:
        file.write(row['review_json'])

    # Export Markdown
    md_path = base_filename + timestamp + ".md"
    with open(md_path, "w") as file:
        file.write(row['review'])

# List all exported files
json_files = glob.glob(os.path.join(PROJECT_PATH, "**", "*.json"), recursive=True)
md_files = glob.glob(os.path.join(PROJECT_PATH, "**", "*.md"), recursive=True)

json_files, md_files

(['/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/User_20250418_151709.json',
  '/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/AccountManager_20250418_151709.json',
  '/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/orderservice/OrderService_20250418_151709.json'],
 ['/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/AccountManager_20250418_151709.md',
  '/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/User_20250418_151709.md',
  '/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/orderservice/OrderService_20250418_151709.md'])

In [15]:
# Locate and load the Markdown file corresponding to the first reviewed Java file

# Extract base filename (without extension)
filename_without_extension = df_review["filename"][0][:-len(SOURCE_FILE_EXTENSION)]

# Search for the corresponding .md file in the project directory
files = glob.glob(os.path.join(PROJECT_PATH, "**", f"{filename_without_extension}*.md"), recursive=True)
if not files:
    # Fail with a clear message instead of an IndexError on files[0] below
    raise FileNotFoundError("⚠️ No MD files found.")

# Read the content of the first matching Markdown file
with open(files[0], 'r') as f:
    markdown_content = f.read()

# Display the Markdown-formatted review
display(Markdown(markdown_content))


In [16]:
# Create a summary DataFrame containing only filename, summary, and score columns
df_summary = df_review.drop(["review", "review_json"], axis=1)

# Generate a Markdown header with the current datetime
header = f"# Review summary {DATETIME_HEADER}\n\n"

# Construct the output path for the summary Markdown file
summary_filename = PROJECT_PATH + f"/review_summary_{DATETIME_FILENAME}.md"

# Write the header and summary table to the Markdown file
with open(summary_filename, "w") as file:
  file.write(header + df_summary.to_markdown(index=False))

In [17]:
# Recursively list all generated Markdown (.md) files in the project directory
glob.glob(os.path.join(PROJECT_PATH, "**", "*.md"), recursive=True)

['/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/AccountManager_20250418_151709.md',
 '/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/review_summary_20250418_151709.md',
 '/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/User_20250418_151709.md',
 '/kaggle/working/Google-Capstone-Java-AI-Reviewer/exampleFiles/orderservice/OrderService_20250418_151709.md']

## 🧪 Evaluation
Ask Gemini to reflect on the quality of its review based on five key criteria.

**Five evaluation criteria:**
* Relevance
* Correctness
* Completeness
* Consistency
* Usefulness

In [18]:
# Evaluate the quality of a model-generated code review using a structured scoring rubric
# The model self-assesses its output across five dimensions and returns individual and overall scores
def evaluate_review_by_llm(prompt, generated_review, filename):
    evaluation_prompt = f"""
        You are an expert in code review, prompt engineering, and evaluating LLM responses.
        Given the following code review prompt and the LLM's response, evaluate the response based on these criteria:
    
        -   **Relevance:** Does the review focus on the most important aspects of the code?
        -   **Correctness:** Is the review accurate in identifying problems and providing suggestions?
        -   **Completeness:** How many of the relevant problems does the review identify?
        -   **Consistency:** Is the review consistent in style, tone, and level of detail?
        -   **Usefulness:** How useful are the suggestions in the review for the developer?
    
        Provide a score from 1 to 10 (1 = very poor, 10 = excellent) for each criterion and give a brief explanation.
        
        Format the scores as follows, making it easy to parse:
        * **Relevance Score:** [score]
        * **Correctness Score:** [score]
        * **Completeness Score:** [score]
        * **Consistency Score:** [score]
        * **Usefulness Score:** [score]
        * **Overall Score:** [score]

        Prompt:
        {prompt}
    
        Code:
        See attached document
    
        Response:
        {generated_review}
    
        Evaluation:
    """

    config = genai.types.GenerateContentConfig(temperature=0.4)  # Lower temperature for consistency
    return get_llm_response(prompt=evaluation_prompt, config=config, filename=filename)
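
The score lines requested by the evaluation prompt share one pattern, so they can be collected in a single pass rather than with one regular expression per criterion. The sketch below assumes the model follows the `**<Name> Score:** <number>` format exactly. `CRITERIA` and `extract_scores` are hypothetical names used for illustration, not the notebook's own parsing code.

```python
import re

CRITERIA = ["Relevance", "Correctness", "Completeness", "Consistency", "Usefulness", "Overall"]


def extract_scores(evaluation):
    """Map each criterion to its integer score, or None when the line is missing."""
    found = dict(re.findall(r"\*\*(\w+) Score:\*\*\s*(\d+)", evaluation))
    return {name.lower() + "_score": int(found[name]) if name in found else None
            for name in CRITERIA}
```

The returned keys (`relevance_score`, `overall_score`, and so on) match the column names used for the evaluation DataFrame, so the dictionary could be merged directly into a row.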

In [19]:
# Initialize a DataFrame to store LLM evaluation scores for each reviewed file
df_evaluation_by_llm = pd.DataFrame(columns = ["filename", 
                                               "evaluation", 
                                               "overall_score", 
                                               "relevance_score", 
                                               "correctness_score", 
                                               "completeness_score",
                                               "consistency_score",
                                               "usefulness_score"])

# Loop over each reviewed file and evaluate the quality of the review
for index, row in df_review.iterrows():
    # Retrieve the generated review
    generated_review = row["review"]

    # Determine the full file path of the source code
    full_path = PROJECT_PATH + "/" + row["filename"]
    
    # Request evaluation of the review via the LLM
    evaluation = evaluate_review_by_llm(review_prompt, generated_review, full_path)
    
    # Extract individual scores from the response using regular expressions
    relevance_score = re.search(r"\*\*Relevance Score:\*\* (\d+)", evaluation)
    correctness_score = re.search(r"\*\*Correctness Score:\*\* (\d+)", evaluation)
    completeness_score = re.search(r"\*\*Completeness Score:\*\* (\d+)", evaluation)
    consistency_score = re.search(r"\*\*Consistency Score:\*\* (\d+)", evaluation)
    usefulness_score = re.search(r"\*\*Usefulness Score:\*\* (\d+)", evaluation)
    overall_score = re.search(r"\*\*Overall Score:\*\* (\d+)", evaluation)
    
    # Convert score matches to integers (or None if not found)
    relevance = int(relevance_score.group(1)) if relevance_score else None
    correctness = int(correctness_score.group(1)) if correctness_score else None
    completeness = int(completeness_score.group(1)) if completeness_score else None
    consistency = int(consistency_score.group(1)) if consistency_score else None
    usefulness = int(usefulness_score.group(1)) if usefulness_score else None
    overall = int(overall_score.group(1)) if overall_score else None

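    # Warn on low-scoring reviews (under 8) and on responses whose overall score could not be parsed
    if overall is None:
        print(f"⚠️ WARNING: Could not parse an overall score for {row['filename']}")
    elif overall < 8:
        print(f"⚠️ WARNING: Low overall score for {row['filename']}: {overall}")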
    
    # Append the scores and raw evaluation to the output DataFrame
    df_evaluation_by_llm.loc[len(df_evaluation_by_llm)] = {
        "filename": row["filename"],
        "evaluation": evaluation,
        "overall_score": overall,
        "relevance_score": relevance, 
        "correctness_score": correctness,
        "completeness_score": completeness,
        "consistency_score": consistency,
        "usefulness_score": usefulness
    }

df_evaluation_by_llm

| | filename | evaluation | overall_score | relevance_score | correctness_score | completeness_score | consistency_score | usefulness_score |
|---|---|---|---|---|---|---|---|---|
| 0 | AccountManager.java | `* **Relevance Score:** 10\n* **Correctness Sco...` | 10 | 10 | 10 | 10 | 10 | 10 |
| 1 | User.java | `* **Relevance Score:** 9\n* **Correctness Scor...` | 9 | 9 | 10 | 8 | 10 | 9 |
| 2 | orderservice/OrderService.java | `* **Relevance Score:** 9\n* **Correctness Scor...` | 9 | 9 | 10 | 8 | 10 | 9 |
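
The score extraction above relies on the model reproducing the requested format exactly (`**Relevance Score:** 10`). As a hedged sketch of a more tolerant alternative, the helper below accepts small formatting drift such as lowercase labels, the colon placed outside the bold markers, or a trailing `/10`. The names `CRITERIA` and `parse_scores` are hypothetical and not part of the notebook; the 1 to 10 range check mirrors the scale requested in the evaluation prompt.

```python
import re

# Hypothetical helper, not part of the original notebook.
# Criterion names match the labels requested in the evaluation prompt.
CRITERIA = ["relevance", "correctness", "completeness",
            "consistency", "usefulness", "overall"]


def parse_scores(evaluation: str) -> dict:
    """Extract the six scores from an evaluation text.

    Returns a dict such as {"relevance_score": 10, ...}; a score that is
    missing or outside the 1-10 scale is reported as None.
    """
    scores = {}
    for name in CRITERIA:
        # Tolerates "**Relevance Score:** 9", "**relevance score**: 9" and "9/10"
        pattern = rf"\*\*\s*{name}\s+score\s*:?\s*\*\*\s*:?\s*(\d+)"
        match = re.search(pattern, evaluation, flags=re.IGNORECASE)
        value = int(match.group(1)) if match else None
        if value is not None and not 1 <= value <= 10:
            value = None  # Treat out-of-range numbers as unparsed
        scores[f"{name}_score"] = value
    return scores


sample = (
    "* **Relevance Score:** 10\n"
    "* **correctness score**: 9/10\n"
    "* **Overall Score:**  8"
)
print(parse_scores(sample))
```

A dictionary in this shape could be merged with the `filename` and `evaluation` fields before appending a row to `df_evaluation_by_llm`, which would replace the twelve near-identical extraction and conversion lines in the loop.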


In [20]:
# Display the full evaluation text for the first reviewed file (AccountManager.java)
display(Markdown(df_evaluation_by_llm["evaluation"].iloc[0]))

* **Relevance Score:** 10
* **Correctness Score:** 10
* **Completeness Score:** 10
* **Consistency Score:** 10
* **Usefulness Score:** 10
* **Overall Score:** 10

**Explanation:**

*   **Relevance:** The review directly addresses all the key areas specified in the prompt, including code quality, readability, best practices, performance, security, and potential bugs. It focuses on the most pertinent issues in the provided code.
*   **Correctness:** The review accurately identifies problems such as poor naming, lack of encapsulation, missing Javadoc comments, use of magic numbers, and security vulnerabilities. The suggestions for improvement are also correct and appropriate.
*   **Completeness:** The review is very thorough and identifies nearly all of the relevant issues in the code. It covers a wide range of aspects, from naming conventions to security considerations.
*   **Consistency:** The review maintains a consistent style and tone throughout. The level of detail is also consistent across all sections. Each point starts with a line number as requested.
*   **Usefulness:** The suggestions provided in the review are highly useful for the developer. They offer concrete steps that can be taken to improve the code's quality, readability, and security. The inclusion of an optimized code example further enhances the usefulness of the review. The summary of changes is also helpful.
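
Because the overall score is produced by the model alongside the five criterion scores, one cheap sanity check is to compare it with the mean of those criteria. The sketch below does this for the three rows of the score table earlier in this section, using only the standard library. The helper name `overall_gap` and the one-point tolerance are assumptions chosen for illustration, not values taken from the notebook.

```python
from statistics import mean

# Scores copied from the evaluation table earlier in this section.
# Criteria order: relevance, correctness, completeness, consistency, usefulness.
rows = {
    "AccountManager.java": {"overall": 10, "criteria": [10, 10, 10, 10, 10]},
    "User.java": {"overall": 9, "criteria": [9, 10, 8, 10, 9]},
    "orderservice/OrderService.java": {"overall": 9, "criteria": [9, 10, 8, 10, 9]},
}

TOLERANCE = 1.0  # Assumed threshold: flag gaps larger than one point


def overall_gap(overall, criteria):
    """Return the reported overall score minus the mean of the criterion scores."""
    return overall - mean(criteria)


gaps = {name: overall_gap(r["overall"], r["criteria"]) for name, r in rows.items()}

for name, gap in gaps.items():
    status = "check" if abs(gap) > TOLERANCE else "ok"
    print(f"{name}: gap={gap:+.1f} ({status})")
```

A large gap would suggest that the overall score was not derived from the individual criteria, which is worth a manual look before the number is used in a report.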
