# Installing Required Libraries
The following commands install the necessary libraries:
- `torch`: A framework for deep learning tasks.
- `langchain-ollama`: A library for integrating LangChain with Ollama models.

We also verify if CUDA (GPU support) is available for PyTorch, which is essential for leveraging GPU acceleration.


In [None]:

# %pip install torch
# %pip install langchain-ollama

In [None]:
import torch
print(torch.version.cuda)
print(torch.cuda.is_available())

# Code Refactoring Detection with LangChain and Ollama

This section demonstrates how to use the LangChain library with the Ollama model to identify and highlight refactored sections of code. 

## Key Components:
1. **LangChain Core**:
   - Used for defining a `ChatPromptTemplate` that structures the comparison prompt.
2. **Ollama LLM**:
   - A large language model configured with `qwen2.5-coder:3b` and `qwen2.5-coder:latest` to analyze and compare code snippets.
3. **Example Input**:
   - Two versions of a function `calculate_total_price` (original and refactored) are compared.
4. **Expected Output**:
   - The refactored sections are highlighted, with both the original and refactored portions formatted as:
     ```
     - Original Section: {{original_section}}
     - Refactored Section: {{refactored_section}}
     ```

## Workflow:
1. **Setup**: Import necessary libraries and initialize the Ollama model.
2. **Define Template**: Create a prompt to guide the model in comparing the two snippets.
3. **Execution**: Invoke the model chain with sample input code and print the response.
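
The doubled braces in the expected output format are deliberate. With LangChain's default f-string style templates, `{name}` marks an input variable, while `{{name}}` renders as a literal `{name}`. The sketch below shows the same escaping rule with plain Python `str.format`; the template text is a shortened, illustrative stand-in for the real prompt, not the prompt used in the experiments.

```python
# Shortened stand-in for the comparison prompt (illustrative only).
template = (
    "Original Code:\n{original_code}\n\n"
    "Refactored Code:\n{refactored_code}\n\n"
    "- Original Section: {{original_section}}\n"
    "- Refactored Section: {{refactored_section}}\n"
)

# Single braces are filled in; doubled braces collapse to literal single braces.
rendered = template.format(
    original_code="total = 0",
    refactored_code="subtotal = sum(prices)",
)
print(rendered)
```

The model therefore sees `{original_section}` and `{refactored_section}` as placeholders it is expected to fill in itself.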



## Qwen2.5-Coder Summary

**Qwen2.5-Coder** is a series of code-specific large language models, formerly known as CodeQwen. It comes in six model sizes, ranging from 0.5B to 32B parameters. For our initial tests, we use the **3B model** as a first check of how well it analyzes and compares code snippets.

## Key Features
- **Enhanced Code Abilities**:
  - Significant improvements in code generation, reasoning, and fixing.
  - Trained on 5.5 trillion tokens, including source code and synthetic data.
  - The 32B version matches GPT-4o in coding capabilities.

- **Broader Applications**:
  - Supports real-world use cases such as **Code Agents**.
  - Excels in mathematical and general reasoning tasks.

- **Long-Context Support**:
  - Handles up to **128K tokens** using techniques like **YaRN** for extended contexts.


In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM

# Model setup: test qwencoder 3b
model_name = "qwen2.5-coder:3b"

# Optimal, Precise Prompt for detecting refactored parts
template = """
You are a code assistant specializing in identifying refactored parts of code. 
Compare the two code snippets below and highlight only the parts that were refactored.

Original Code:
{original_code}

Refactored Code:
{refactored_code}

Output the refactored sections only in this format:
- Original Section: {{original_section}}
- Refactored Section: {{refactored_section}}
"""

# Create a ChatPromptTemplate
prompt = ChatPromptTemplate.from_template(template)

# Initialize the Ollama model
model = OllamaLLM(model=model_name)

# Combine the prompt and model into a chain
chain = prompt | model

# Example of two code versions
original_code = """
def calculate_total_price(prices, tax_rate):
    total = 0
    for price in prices:
        total += price
    total = total * (1 + tax_rate)
    return total
"""

refactored_code = """
def calculate_total_price(prices, tax_rate):
    subtotal = sum(prices)
    total = subtotal * (1 + tax_rate)
    return total
"""

# Invoke the chain with the two code snippets
response = chain.invoke({
    "original_code": original_code,
    "refactored_code": refactored_code
})

# Print the response
print(response)


Original Section: 
```python
total = 0
for price in prices:
    total += price
```

Refactored Section: 
```python
subtotal = sum(prices)
```


In [None]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM

# Test the latest version of QwenCoder2.5
model_name = "qwen2.5-coder:latest"

# Optimal, precise prompt for detecting refactored parts
template = """
You are a code assistant specializing in identifying refactored parts of code. 
Compare the two code snippets below and highlight only and only the parts that were refactored.

Original Code:
{original_code}

Refactored Code:
{refactored_code}

Output the refactored sections only in this format:
- Original Section: {{original_section}}
- Refactored Section: {{refactored_section}}
"""

# Create a ChatPromptTemplate
prompt = ChatPromptTemplate.from_template(template)

# Initialize the Ollama model
model = OllamaLLM(model=model_name)

# Combine the prompt and model into a chain
chain = prompt | model

# Example of two code versions
original_code = """
def calculate_total_price(prices, tax_rate):
    total = 0
    for price in prices:
        total += price
    total = total * (1 + tax_rate)
    return total
"""

refactored_code = """
def calculate_total_price(prices, tax_rate):
    subtotal = sum(prices)
    total = subtotal * (1 + tax_rate)
    return total
"""

# Invoke the chain with the two code snippets
response = chain.invoke({
    "original_code": original_code,
    "refactored_code": refactored_code
})

# Print the response
print(response)


- Original Section: `total = 0\nfor price in prices:\n    total += price`
- Refactored Section: `subtotal = sum(prices)`
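
The one-line format returned by the `latest` model lends itself to simple post-processing. The sketch below is one possible way to do it, assuming each section fits on a single line as in the output above; `parse_sections` is a hypothetical helper, not part of LangChain or Ollama, and the fenced-block style returned by the 3B model would need a different parser.

```python
import re

# Matches lines such as "- Original Section: `...`" (backticks optional).
SECTION_RE = re.compile(r"^-\s*(Original|Refactored) Section:\s*`?(.*?)`?\s*$")

def parse_sections(response):
    """Return a list of (original, refactored) pairs found in a response string."""
    pairs = []
    current_original = None
    for line in response.splitlines():
        match = SECTION_RE.match(line.strip())
        if not match:
            continue
        kind, text = match.groups()
        if kind == "Original":
            current_original = text
        elif current_original is not None:
            # A "Refactored" line closes the most recent "Original" line.
            pairs.append((current_original, text))
            current_original = None
    return pairs

# Sample mirrors the output above; "\\n" is a literal backslash-n, as printed.
sample = (
    "- Original Section: `total = 0\\nfor price in prices:\\n    total += price`\n"
    "- Refactored Section: `subtotal = sum(prices)`\n"
)
print(parse_sections(sample))
```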


# Qwen2.5-Coder-7B (Latest) Highlights
- **Type**: Causal Language Model  
- **Parameters**: 7.61B (6.53B non-embedding)  
- **Architecture**: Transformers with RoPE, SwiGLU, RMSNorm, and Attention QKV bias  
- **Context Length**: Full 131,072 tokens  

## Processing Long Texts
To enable processing of inputs exceeding 32,768 tokens, use **YaRN** by adding the following to `config.json`:
```json
{
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
}
```
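
The two context figures above are linked by the scaling factor. As a quick arithmetic check, using only the values from the `rope_scaling` block:

```python
# Values taken from the rope_scaling block above.
original_max_position_embeddings = 32768
factor = 4.0

# YaRN extends the usable context window by `factor`.
extended_context = int(original_max_position_embeddings * factor)
print(extended_context)  # 131072
```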

### Next Steps: Testing **QwenCoder** on RefactoringMiner Pairs

In the next section, we will evaluate **Qwen2.5-Coder-7B (Latest)**, referred to below as **QwenCoder**, on the **RefactoringMiner** pairs. Here’s the planned approach:

1. **Initial Test**:  
   We will begin by testing the model on the **first pair** to see how well it performs with the extracted files, checking whether it can handle the size and structure of the data.

2. **Experiment Tests and Prompt Evaluation**:  
   Once we have validated that the model performs well on the initial pair, we will run a loop to apply **QwenCoder** to multiple pairs. During this process, we will experiment with different **prompts** to evaluate how prompt variations affect the performance of **Qwen2.5-Coder-7B**. This will help us understand how much prompts matter for the quality and efficiency of the model's output on refactoring tasks.


*Note: Some basic helper functions will be implemented to perform necessary tasks, such as reading the pair files from the correct folder.*

In [9]:
import os

def read_all_files(directory_path):
    """
    Reads all files in the specified directory and returns their content.

    Args:
        directory_path (str): The directory to read files from.

    Returns:
        list: A list of tuples containing (file_name, file_content).
    """
    all_files_content = []  # Initialize an empty list to store file names and their content.

    try:
        # Iterate over each file name in the specified directory.
        for file_name in os.listdir(directory_path):  
            # Construct the full file path by combining the directory path and file name.
            file_path = os.path.join(directory_path, file_name)

            # Check if the path corresponds to a file (and not a subdirectory or other object).
            if os.path.isfile(file_path):  
                # Open the file in read mode with UTF-8 encoding.
                with open(file_path, 'r', encoding='utf-8') as file:
                    # Read the content of the file.
                    content = file.read()
                    # Append a tuple (file_name, content) to the list.
                    all_files_content.append((file_name, content))

    except Exception as e:
        # Handle any errors that occur during the directory reading process.
        print(f"Error reading files from {directory_path}: {e}")

    # Return the list of file names and their content.
    return all_files_content

# Test the function 
directory_path = "code_pairs/_pair1"  # Read the files of the first folder
all_files = read_all_files(directory_path)  # Call the function to read all files in the directory.

# Print the content of all files (for testing purposes).
for file_name, content in all_files:
    print(f"File: {file_name}")  # Print the file name.
    print(content)  # Print the file content.
    print("--------------------------------------------------------------")  # Separator for better readability.

File: evolved_1_src_org_DogManager.java
package org;
import org.animals.Dog;


public class DogManager {

	private Dog dog;
	public DogManager(Dog aDog) {
		this.dog = aDog;
	}
	
	public void doStuff() {
		barkBark(this.dog);
		
		int age = dog.getAge();
		int sum = 0;
		for (int i = 0; i < age; i++) {
			System.out.println(i);
			sum += i;
		}
		sum -= dog.magicNumber;
		dog.takeABreath();
	}

	public void barkBark(Dog dog) {
		System.out.println("ruff");
		System.out.println("ruff");
		dog.takeABreath();
		System.out.println("ruff");
		System.out.println("ruff");
		System.out.println("ruff");
	}

}

--------------------------------------------------------------
File: original_1_src_org_animals_Dog.java
package org.animals;

import org.DogManager;

public class Dog {

	private int age = 0;
	public int magicNumber = 17;

	public int getAge() {
		return this.age;
	}

	public void barkBark(DogManager manager) {
		System.out.println("ruff");
		System.out.println("ruff");
		takeABreath();


# **Code Modularization Plan**

The following parts of the code will be modularized to improve organization, reusability, and maintainability:

---

## **1. Refactoring Prompt Template Initialization**
- **Current Function**: `initialize_refactoring_prompt_template`
- **Modularization Goal**:  
  This function is responsible for creating and returning a `ChatPromptTemplate` for detecting code refactorings. It will be modularized into its own module to isolate template-related logic and allow for easier customization of prompts.  
- **Inputs**: Template string.  
- **Output**: A `ChatPromptTemplate` object.

---

## **2. Model Initialization**
- **Current Function**: `initialize_model`
- **Modularization Goal**:  
  This function initializes the Ollama LLM model with optional parameters (e.g., `temperature`, `top_p`). It will be separated into its own module to handle different configurations and simplify model management.  
- **Inputs**: Model name and configuration parameters (e.g., `temperature`, `top_p`).  
- **Output**: An initialized language model.

---

## **3. Code Refactoring Analysis**
- **Current Function**: `analyze_code_refactoring`
- **Modularization Goal**:  
  This function processes the original and refactored code using the template and the language model. It will be modularized to separate code analysis from the rest of the codebase, ensuring a cleaner structure for processing code pairs.  
- **Inputs**: Original code, refactored code, prompt template, and model.  
- **Output**: Response from the model, typically a dictionary or a string.

---

## **4. Response Saving**
- **Current Function**: `save_response`
- **Modularization Goal**:  
  This function saves the model's response to a file in the specified folder with the pair number. It will be modularized to handle file saving logic separately, enabling easier testing and reuse in other workflows.  
- **Inputs**: Model response, output folder path, and pair number.  
- **Output**: A saved text file containing the model's response.

---

## **5. Execution Function**
- **Current Function**: `execute`
- **Modularization Goal**:  
  The `execute` function drives the overall logic, selecting random code pairs, processing them using the model, and saving the results. Modularizing this function will provide a reusable entry point for various scenarios, allowing configurations like the number of pairs to process, range of pair numbers, and output directories.  
- **Inputs**:  
  - Model instance  
  - Prompt template  
  - Directory path for code pairs  
  - Output folder path  
  - Number of pairs to process (`num_pairs`)  
  - Range of pair numbers (`range_start`, `range_end`)  
  - Previously processed pairs (`processed_pairs`)  
- **Output**: Processed pairs saved in the output folder.

---

## **Conclusion of Modularization**
By modularizing the code into the above components, we achieve:
- **Separation of Concerns**: Each function handles a distinct responsibility.  
- **Reusability**: Functions like `save_response` and `execute` can be reused in similar workflows.  
- **Testability**: Isolated modules are easier to unit test.  
- **Maintainability**: Modular code is easier to extend and refactor in the future.
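
To illustrate the testability point, the sketch below exercises a standalone copy of `save_response` against a temporary directory. It is a minimal example under assumptions: the copy follows the description in section 4, the returned path and the explicit UTF-8 encoding are additions made for the test, and `check_save_response` is a hypothetical name.

```python
import os
import tempfile

def save_response(response, folder, pair_num):
    """Write the model response to <folder>/model_output_pair<pair_num>.txt."""
    response_text = response if isinstance(response, str) else str(response)
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, f"model_output_pair{pair_num}.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write(response_text)
    return path  # returning the path is an addition that makes testing easier

def check_save_response():
    """Small self-check: the file exists and round-trips the response text."""
    with tempfile.TemporaryDirectory() as tmp:
        path = save_response("There is no refactoring.", tmp, 7)
        assert os.path.basename(path) == "model_output_pair7.txt"
        with open(path, encoding="utf-8") as f:
            assert f.read() == "There is no refactoring."
    return True

print(check_save_response())
```

Because the function touches only the file system, it can be checked without loading a model.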

In [10]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_ollama.llms import OllamaLLM
import os 
import random

def initialize_refactoring_prompt_template(template):
    """
    Creates and returns a ChatPromptTemplate for detecting code refactorings between two Java classes.

    Args:
        template (str): A custom template for the refactoring task.

    Returns:
        ChatPromptTemplate: A template for the refactoring task.

    Raises:
        ValueError: If no template is provided.
    """
    if not template:
        raise ValueError("Template cannot be None. Please provide a valid template.")
    
    return ChatPromptTemplate.from_template(template)



def initialize_model(model_name="qwen2.5-coder:latest", **model_parameters):
    """
    Initializes the Ollama LLM model with optional parameters.

    Args:
        model_name (str): The name of the model to initialize.
        **model_parameters: Additional keyword arguments for model configuration 
                            (e.g., temperature, top_p, etc.).

    Returns:
        OllamaLLM: The initialized model.
    """
    return OllamaLLM(model=model_name, **model_parameters)


def analyze_code_refactoring(original_code, refactored_code, prompt_template, model):
    """
    Analyzes code refactoring between two code snippets.

    Args:
        original_code (str): The original version of the code.
        refactored_code (str): The refactored version of the code.
        prompt_template (ChatPromptTemplate): The prompt template for the task.
        model (OllamaLLM): The language model to use.

    Returns:
        str: The response from the model detailing the detected refactorings.
    """
    chain = prompt_template | model
    response = chain.invoke({
        "original_code": original_code,
        "refactored_code": refactored_code
    })
    return response



def save_response(response, folder, pair_num):
    """
    Saves the model's response to a file in the specified folder with the pair number.
    """
    response_text = str(response) if isinstance(response, dict) else response
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, f"model_output_pair{pair_num}.txt"), "w", encoding="utf-8") as f:
        f.write(response_text)


# Execution Function
def execute(model, prompt_template, directory_path, output_folder, num_pairs=10, range_start=2, range_end=95, processed_pairs=None):
    """
    Executes the refactoring analysis for a random selection of code pairs.

    Parameters:
    - model: The initialized language model.
    - prompt_template: The template for generating the prompt.
    - directory_path: Path to the code pairs directory.
    - output_folder: Folder to save the output files.
    - num_pairs: Number of pairs to process.
    - range_start: Start of the range for selecting pairs.
    - range_end: End of the range for selecting pairs.
    - processed_pairs: Set of already processed pair numbers to avoid duplicates.
      A new empty set is created when omitted.
    """
    # Avoid a shared mutable default: a `set()` default would persist across calls.
    if processed_pairs is None:
        processed_pairs = set()

    all_pairs = sorted(set(range(range_start, range_end + 1)) - processed_pairs)
    # Never request more pairs than are available, which would raise ValueError.
    random_pairs = random.sample(all_pairs, min(num_pairs, len(all_pairs)))

    for pair_num in random_pairs:
        directory_pair = f"{directory_path}/_pair{pair_num}"

        # Read files for the current pair
        all_files = read_all_files(directory_pair)
        original_code = next((content for name, content in all_files if name.startswith("original")), None)
        refactored_code = next((content for name, content in all_files if name.startswith("evolved")), None)

        if original_code and refactored_code:
            # Invoke the chain with the two code snippets
            response = analyze_code_refactoring(original_code, refactored_code, prompt_template, model)
            # Save the response to a file
            save_response(response, output_folder, pair_num)
            # Add the pair number to the processed pairs set
            processed_pairs.add(pair_num)
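
The pair-selection step inside `execute` can also be isolated so that runs are reproducible. The following is a sketch rather than the notebook's actual code: `select_pairs` and its `seed` parameter are hypothetical additions, and the range values simply reuse the defaults above.

```python
import random

def select_pairs(num_pairs, range_start, range_end, processed_pairs=None, seed=None):
    """Pick up to `num_pairs` unprocessed pair numbers from an inclusive range."""
    processed = set(processed_pairs or ())
    candidates = sorted(set(range(range_start, range_end + 1)) - processed)
    rng = random.Random(seed)  # a local generator leaves the global random state untouched
    return rng.sample(candidates, min(num_pairs, len(candidates)))

first_batch = select_pairs(20, 2, 95, seed=0)
# Passing the first batch as already processed guarantees a disjoint second batch.
second_batch = select_pairs(20, 2, 95, processed_pairs=first_batch, seed=0)
print(len(first_batch), len(set(first_batch) & set(second_batch)))
```

Fixing the seed makes it possible to rerun both prompt types on the same pairs, which simplifies a side-by-side comparison.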

In [14]:

# Test the model on one pair (pair1) using a guided prompt
template_1 = """
You are a code refactoring assistant. Your task is to compare two Java classes and identify all refactorings between them. 
Refactorings can include (but are not limited to):

    - Moving methods between two different classes (From one class to another).
    - Changing method signatures (e.g., parameter addition/removal or type of parameter changes).
    - Modifying access levels of attributes or methods (e.g., public to private).
    - Adding, removing, or modifying method calls or logic.
    - Adding encapsulation (e.g., replacing direct field access with getter/setter methods).

The trick is to carefully analyze sections in the provided content. Look for elements that are similar but have undergone a change. 

Be thorough in identifying refactorings. If you believe there is **no refactoring**, explicitly state: "There is no refactoring."

Original Code:
{original_code}

Refactored Code:
{refactored_code}

Follow these guidelines in your output:
    For each refactoring detected, provide the following information:
        - Original Section: {{original_section}}
        - Refactored Section: {{refactored_section}}
        - Refactoring Type: A brief description of the type of change (e.g., "Moved Method", "Changed Parameter").
    If multiple refactorings are detected, separate them in the output. Do not group them all together. Clearly identify each refactoring as an individual change.
"""
# Example of two code versions
original_code = next((content for name, content in all_files if name.startswith("original")), None)
refactored_code = next((content for name, content in all_files if name.startswith("evolved")), None)

# Ensure we have the necessary files
if not original_code or not refactored_code:
    print("Error: Original or refactored code is missing.")
else:
    # Get the prompt template and model
    prompt_template = initialize_refactoring_prompt_template(template_1)
    model = initialize_model()

    # Analyze refactoring
    response = analyze_code_refactoring(original_code, refactored_code, prompt_template, model)

    # Print the response
    print(response)

### Refactored Code Analysis

#### Refactoring 1: Moved Method

**Original Section:**
```java
public class Dog {
    // ...
    public void barkBark(DogManager manager) {
        System.out.println("ruff");
        System.out.println("ruff");
        takeABreath();
        System.out.println("ruff");
        System.out.println("ruff");
        System.out.println("ruff");
    }
    
    public void takeABreath() {
        System.out.println("...");
    }
}
```

**Refactored Section:**
```java
public class DogManager {
    // ...
    public void barkBark(Dog dog) {
        System.out.println("ruff");
        System.out.println("ruff");
        dog.takeABreath();
        System.out.println("ruff");
        System.out.println("ruff");
        System.out.println("ruff");
    }
    
    public void doStuff() {
        // ...
    }
}
```

**Refactoring Type:** Moved Method  
**Description:** The `barkBark` method was moved from the `Dog` class to the `DogManager` class. It now takes a `Dog` o

In [15]:

# Try a second type of prompt (flexible) on the same pair

template_2 = """
You are a code assistant specializing in identifying refactored parts of code. 

Your task is to compare the two Java classes below and highlight only the parts that were refactored.
Original Code:
{original_code}

Refactored Code:
{refactored_code}


Follow these guidelines in your output:
    For each refactoring detected, provide the following information:
        - Original Section: {{original_section}}
        - Refactored Section: {{refactored_section}}
        - Refactoring Type: A brief description of the type of change (e.g., "Moved Method", "Changed Parameter").
    If multiple refactorings are detected, separate them in the output. Do not group them all together. Clearly identify each refactoring as an individual change.
"""

# Ensure we have the necessary files
if not original_code or not refactored_code:
    print("Error: Original or refactored code is missing.")
else:
    # Get the prompt template and model
    prompt_template = initialize_refactoring_prompt_template(template_2)
    model = initialize_model()

    # Analyze refactoring
    response = analyze_code_refactoring(original_code, refactored_code, prompt_template, model)

    # Print the response
    print(response)


### Refactored Code Analysis

1. **Refactoring Type:** Moved Method  
   - **Original Section:**
     ```java
     public void barkBark(DogManager manager) {
         System.out.println("ruff");
         System.out.println("ruff");
         takeABreath();
         System.out.println("ruff");
         System.out.println("ruff");
         System.out.println("ruff");
     }
     ```
   - **Refactored Section:**
     ```java
     public void barkBark(Dog dog) {
         System.out.println("ruff");
         System.out.println("ruff");
         dog.takeABreath();
         System.out.println("ruff");
         System.out.println("ruff");
         System.out.println("ruff");
     }
     ```
   - **Description:** The `barkBark` method has been moved from the `Dog` class to the `DogManager` class. The parameter type of `barkBark` was changed from `DogManager` to `Dog`.

2. **Refactoring Type:** Moved Class  
   - **Original Section:**
     ```java
     import org.DogManager;
     
     public cla

### Model Refactoring Detection: First Test Result Observation (RefactoringMiner Saved Pairs)

In the first test cell, the first prompt, which was more detailed and guided the model more explicitly, resulted in better confidence in the output. The model's responses were more consistent and confident in identifying the refactorings. However, the more detailed prompt produced less flexible output (e.g., Added Method Logic, Modified Method Calls). The increased detail also led to faster execution, with results taking **3 to 5 minutes** per code pair.

In contrast, with the less detailed, less guided prompt in the second cell, the model also detected one refactoring applied to the code and identified a key transformation, "Class Encapsulation". The results took longer, **4 to 8 minutes** per code pair. The model also did not explain its results clearly, which led us to conclude that, while it may detect some refactorings, it may also miss others, invent nonexistent ones, and be less certain about some changes.


This indicates that while a more guided prompt enhances the model's confidence and accuracy, it may also reduce flexibility and creativity. Further optimization, refinement, and resources are needed to balance accuracy and speed.
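
To make timing comparisons like the ones above repeatable, each model call can be wrapped in a timer. A minimal sketch, assuming nothing about how the figures above were measured: `timed_call` is a hypothetical helper, and `fake_model_call` is a stand-in for `analyze_code_refactoring` so that the example runs without a model.

```python
import time

def timed_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start

def fake_model_call(original_code, refactored_code):
    """Stand-in for the real model call; sleeps briefly instead of running an LLM."""
    time.sleep(0.01)
    return "There is no refactoring."

response, seconds = timed_call(fake_model_call, "a", "b")
print(f"{seconds:.3f}s -> {response}")
```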

# Running LLM: QwenCoder on 20 Random Code Pairs Saved from RefactoringMiner

In this section, we will run our LLM, the latest **QwenCoder**, on 20 random code pairs selected from the `RefactoringMiner` extracted file pairs (Task 2). Although we have a total of 95 code pairs, we will process only 20 pairs. We will use the same prompts as before: **GUIDED** and **UNGUIDED**. The model's responses for each prompt type will be saved into files for later analysis, allowing us to review and evaluate the results in more detail at a later stage.

#### File Paths:
- **UNGUIDED Prompt Responses**:  
  The model's responses for the **UNGUIDED** prompt will be saved in the following folder:  
  `test_UNGUIDEDPROMPT/`

- **GUIDED Prompt Responses**:  
  The model's responses for the **GUIDED** prompt will be saved in the following folder:  
  `test_GUIDEDPROMPT/`

As done in the paper [Assessing LLMs in Detecting Code Refactorings](https://arxiv.org/pdf/2408.16151), we will set the temperature to 0 to make the model more deterministic and reduce variability in its answers.

In [None]:
# Initialization of the model setup, prompt templates and directory paths

# Directory path and model setup
directory_path = "code_pairs"
model_name = "qwen2.5-coder:latest"

# Initialize GUIDED prompt template and output folder
folder_GUIDED = "test_GUIDEDPROMPT/"
template_GUIDED = """
You are a code refactoring assistant. Your task is to compare two Java classes and identify all refactorings between them. 
To help you, Refactorings can include (but are not limited to):

   - Moving methods between two different classes (From one class to another).
   - Changing method signatures (e.g., parameter addition/removal or type of parameter changes).
   - Modifying access levels of attributes or methods (e.g., public to private).
   - Adding, removing, or modifying method calls or logic.
   - Adding encapsulation (e.g., replacing direct field access with getter/setter methods).

The trick is to carefully analyze sections in the provided content. Look for elements that are similar but have undergone a change. 

Be thorough in identifying refactorings. If you believe there is **no refactoring**, explicitly state: "There is no refactoring."

Original Code:
{original_code}

Refactored Code:
{refactored_code}

Follow these guidelines in your output:
    For each refactoring detected, provide the following information:
        - Original Section: {{original_section}}
        - Refactored Section: {{refactored_section}}
        - Refactoring Type: A brief description of the type of change (e.g., "Moved Method", "Changed Parameter").
    If multiple refactorings are detected, separate them in the output. Do not group them all together. Clearly identify each refactoring as an individual change.
"""


#  Initialize UNGUIDED prompt template and output folder
folder_UNGUIDED = "test_UNGUIDEDPROMPT/"
template_UNGUIDED = """
You are a code refactoring assistant. Your task is to compare two Java classes and identify all refactorings between them. 
The trick is to carefully analyze sections in the provided content. Look for elements that are similar but have undergone a change. 

Be thorough in identifying refactorings.
Do not fabricate changes; focus only on elements that have genuinely been refactored.

Original Code:
{original_code}

Refactored Code:
{refactored_code}

Follow these guidelines in your output:
    For each refactoring detected, provide the following information:
        - Original Section: {{original_section}}
        - Refactored Section: {{refactored_section}}
        - Refactoring Type: A brief description of the type of change.
    If multiple refactorings are detected, separate them in the output. Do not group them all together. Clearly identify each refactoring as an individual change.
"""

In [None]:
# Guided Prompt Execution tests: 

prompt_template = initialize_refactoring_prompt_template(template_GUIDED)
model = initialize_model(model_name=model_name, temperature=0)

# Set range and number of pairs
range_start = 2
range_end = 95
num_pairs = 20

# Execute the workflow
execute(
    model=model,
    prompt_template=prompt_template,
    directory_path=directory_path,
    output_folder=folder_GUIDED,
    num_pairs=num_pairs,
    range_start=range_start,
    range_end=range_end,
)

### Table Showing Some Results Of GUIDED PROMPT:

<table>
  <tr>
    <th>Pair Number</th>
    <th>Original</th>
    <th>Evolved</th>
    <th>Model Response</th>
  </tr>
  <tr>
    <td>8</td>
    <td>

  ```java
    package org.felines;

    public interface Animal {
        void action();
    }
  ```
  </td>
  <td>
        
   ```java
    package org.felines;

    public interface AnimalSuper {
        void action();
    }
  ```
  </td>
    <td><b>There is no refactoring.</b> The original code and the refactored code are identical. The only difference is that the interface name has been changed from Animal to AnimalSuper. This does not constitute a refactoring according to the provided guidelines, as it does not involve any changes to methods, parameters, access levels, or logic within the class.</td>
  </tr>
  <tr>
    <td>90</td>
    <td>
  
   ```java
    package org.animals;

    public class Dog {
      private int age = 0;

    public int getAge() {
        return this.age;
    }

    public void bark() {
        System.out.println("ruff");
        System.out.println("ruff");
        takeABreath();
        System.out.println("ruff");
        System.out.println("ruff");
        System.out.println("ruff");
    }

    public void takeABreath() {
        System.out.println("...");
    }
  }
  ``` 
  </td>
  <td>
  
  ```java
  package org.animals;

  public class Dog {
    private int age = 0;

    public int getAge() {
        return this.age;
    }

    public void bark() {
      System.out.println("ruff");
      System.out.println("ruff");
      System.out.println("...");
      System.out.println("ruff");
      System.out.println("ruff");
      System.out.println("ruff");
    }
  }
  ``` 
  </td>
    <td><b>There is no refactoring.</b> The provided code snippets show a single class Dog with two methods: getAge() and bark(). The only difference between the original and refactored code is that the takeABreath() method has been added to the bark() method in the refactored version. This change does not constitute a refactoring but rather an addition of functionality within an existing method. Refactoring typically involves changes such as moving methods, changing method signatures, modifying access levels, or altering logic and structure in a way that improves code quality without changing its external behavior.</td>
  </tr>
  <tr>
    <td>7</td>
  <td>

  ```java
  package org.felines;

  public interface Animal {
    void action();
  }
  ``` 
  </td>
  <td>
    
  ```java
  package org.felines;

  public interface Animal extends AnimalSuper {
  }
  ``` 
  </td>
    <td><b>There is no refactoring.</b> The original code and the refactored code are identical. The Animal interface remains unchanged with only a minor addition of extending another interface named AnimalSuper. This does not constitute a refactoring but rather an extension or modification to the existing interface, which could be considered as an enhancement rather than a refactoring in the traditional sense.</td>
  </tr>
</table>

---

### Explanation:

We noticed that the model detected the change but did not consider it a refactoring. The model has been prompted to detect refactorings such as changes in method signatures, class inheritance, and logic. However, because of the guidelines in the prompt, the model did not categorize the change in the interface name (Pair 8) as a refactoring.

For example, in **Pair 7**, the model detected the **interface extension** between the `Animal` interface and `AnimalSuper`, but did not consider it a refactoring because the methods and logic remained unchanged.

This detailed prompt directs the model to classify only certain types of changes as refactorings. Therefore, despite noticing the structural difference in Pair 8 (the change in interface name), the model concluded that it did not qualify as a refactoring, as no logic, method signatures, or access levels were altered.

Pair 90 is particularly interesting because the model was able to detect that **no change in functionality or logic** occurred — the method still prints the same output. However, the model also correctly identified that the `takeABreath()` method **replaces a direct print statement** in the `bark()` method.

While the model recognized this improvement, **the prompt's limitations** may have prevented it from classifying the change as a refactoring. Although calling the `takeABreath()` method improves code organization, the model did not consider this a refactoring because it did not meet the criteria outlined by the prompt, such as a significant change in logic or structure.

In summary, this example highlights how the model’s understanding of refactoring is **shaped by the provided prompt**. Despite recognizing a good practice in the refactored code, the model could not classify it as a refactoring due to the narrow scope of what constitutes a "refactoring" based on the prompt’s instructions.

---

### Conclusion:

This shows how the model adheres strictly to its prompt's definition of what constitutes a refactoring and doesn't consider some common changes, like renaming classes or interfaces, as refactorings.


---


In [None]:
# Unguided Prompt Execution tests: 

prompt_template = initialize_refactoring_prompt_template(template_UNGUIDED)
model = initialize_model(model_name=model_name, temperature=0)

# Set range and number of pairs
range_start = 2
range_end = 95
num_pairs = 20

# Execute the workflow
execute(
    model=model,
    prompt_template=prompt_template,
    directory_path=directory_path,
    output_folder=folder_UNGUIDED,
    num_pairs=num_pairs,
    range_start=range_start,
    range_end=range_end,
)

### Table Showing Some Results Of UNGUIDED PROMPT:

<table>
  <tr>
    <th>Pair Number</th>
    <th>Original</th>
    <th>Evolved</th>
    <th>Model Response</th>
  </tr>
  <tr>
    <td>63</td>
    <td>  
    
  ```java
  package org.felines;

  public class Cat extends Feline implements Animal{

      public void sleep(){
      int s1 = 1;
      int s2 = 2;
      int s3 = 3;
      int s4 = 4;
      int s5 = 5;
      int s6 = 6;
      int s7 = 7;
      int s8 = 8;
      
    }

  }
  ```
  </td>

  <td>  

  ```java
  package org.felines;

  public class Cat extends Feline implements Animal{

    public void sleep(){
      int s1 = 1;
      int s2 = 2;
      sleepNight();
      int s8 = 8;
      
    }

    private void sleepNight() {
      int s3 = 3;
      int s4 = 4;
      int s5 = 5;
      int s6 = 6;
      int s7 = 7;
    }

  }
  ```
  </td> 


  <td>   

  ### Refactoring Detected:

  1. **Refactoring Type:** Method Extraction
    
  - **Original Section:**
      ```java
      public void sleep(){
        int s1 = 1;
        int s2 = 2;
        int s3 = 3;
        int s4 = 4;
        int s5 = 5;
        int s6 = 6;
        int s7 = 7;
        int s8 = 8;
      }
      ```

  - **Refactored Section:**
      ```java
      public void sleep(){
        int s1 = 1;
        int s2 = 2;
        sleepNight();
        int s8 = 8;
      }

      private void sleepNight() {
          int s3 = 3;
          int s4 = 4;
          int s5 = 5;
          int s6 = 6;
          int s7 = 7;
      }
      ```
  - **Description:** The `sleep` method has been refactored by extracting the repetitive code into a new private method named `sleepNight`. This reduces redundancy and improves code readability.</td>
</tr>
<tr>
  <td>88</td>
  <td>
    
  ```java
  package org.animals;

  public class Chicken {
    public void cackle() {
        System.out.println("cackle");
    }
              
    public void layEgg() {}
    }    
  ``` 
  </td>

  <td>
  
  ```java
  package org.animals;

  public class Chicken extends Bird {
    public void cackle() {
      System.out.println("cackle");
      }

    public void layEgg() {}
  }
  ```
   </td>

   <td>

  - **Original Section:** 
      ```java
      public class Chicken {}
      ```
      
  - **Refactored Section:**
      ```java
      public class Chicken extends Bird {}
      ```
      
  - **Refactoring Type:** The `Chicken` class now extends the `Bird` class, indicating that it is a subclass of `Bird`. This change suggests that `Chicken` inherits properties and methods from `Bird`, potentially adding more functionality or overriding existing ones. 
    
   
   </td>
</tr>
  
<tr>
  <td>8</td>
  <td>

  ```java
    package org.felines;

    public interface Animal {
	  void action();
    }
  ```
  </td>
  <td>

  ```java
    package org.felines;

    public interface Animal {
	  void action();
    }
  ```
  </td>
  <td>
  
  - **Original Section:**   
  
    ```java
    package org.felines;

    public interface Animal {
      void action();
    }
    ```
  - **Refactored Section:**
    ```java
    package org.felines;

    public interface Animal {
      void action();
    }
    ```
  - **Refactoring Type:** Renaming of the interface from `Animal` to `AnimalSuper`. This is a simple renaming refactor that does not change the functionality or structure of the code, but rather its identifier.</td>
  </tr>
</table>

---
### Explanation:

Despite the model detecting some valid refactorings (See table above), there were a lot of false positives due to the lack of specific guidance. This freedom allowed the model to be more flexible in its detection of changes, which led to inaccurate conclusions in some cases.

#### Pair 63 (Row 1) – **Method Extraction**:
The refactoring detected seems valid, as it involves method extraction, which is a clear and useful improvement to the code structure. The `sleep` method was refactored to extract part of its functionality into a new `sleepNight` method. This is a classic example of refactoring to improve code readability and reuse.

#### False Positive case:
However, in other examples (such as in **Pair 15**), the model incorrectly flagged class and method name changes or additions as refactorings. For example:

- **Class Name Change**:
    - **Original Section**:
      ```java
      public class AnimalMarilho {}
      ```
    - **Refactored Section**:
      ```java
      public class Reptile extends AnimalMarilho {}
      ```
    - **Refactoring Type**: This is more of a change in the inheritance hierarchy, not a refactoring in the class name. 

- **Method Name Change**:
    - **Original Section**:
      ```java
      public int hashCode() {}
      ```
    - **Refactored Section**:
      ```java
      public boolean equals(Object obj) {}
      ```
    - **Refactoring Type**: The model reported this as a method rename, but `hashCode()` and `equals(Object)` are two different methods, so this is not a refactoring.


#### Conclusion:
Therefore, while **Pair 63** involves a legitimate refactoring (method extraction), other changes detected by the model might not be considered refactorings in the traditional sense.

In conclusion, while the model detected some valid refactorings, its flexibility and lack of guidance led to some inaccurate or inappropriate detections, highlighting the importance of a moderate amount of guidance and constraints for automatic code analysis.
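
One inexpensive constraint, sketched below, is to skip the model call when the two versions are textually identical, as Pair 8 appears to be in the table above. The helper names `normalize` and `is_textually_identical` are hypothetical and not part of the notebook. The check ignores only indentation and blank lines, so it is a conservative guard rather than a refactoring detector.

```python
def normalize(code: str) -> list[str]:
    """Return the non-blank lines of `code` with surrounding whitespace removed."""
    return [line.strip() for line in code.splitlines() if line.strip()]


def is_textually_identical(original: str, evolved: str) -> bool:
    """True when the two versions differ only in indentation or blank lines."""
    return normalize(original) == normalize(evolved)


# Two versions that differ only in indentation (tab versus spaces).
original = "package org.felines;\n\npublic interface Animal {\n\tvoid action();\n}\n"
evolved = "package org.felines;\n\npublic interface Animal {\n  void action();\n}\n"

if is_textually_identical(original, evolved):
    print("No textual change: skip the model call for this pair.")
```

A guard like this would only prevent false positives on unchanged files; pairs with real edits would still go to the model.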

---


## Adding a Bonus Prompt: Guided and Documented

To enhance the functionality, we will introduce a **Bonus Prompt** that is both **guided** and **well-documented**. 

### Purpose of the Bonus Prompt
This additional prompt will focus specifically on detecting **refactorings similar to those identified by RefactoringMiner**. The goal is to ensure comprehensive detection of common refactoring types while aligning with the precise conventions and patterns used by RefactoringMiner.

### Key Features of the Bonus Prompt
- **Example Refactoring Types**: Emphasis on identifying changes that closely match RefactoringMiner's taxonomy, such as:
  - Method extraction and inlining
  - Class renaming
  - Attribute movement between classes
  - Method signature changes
- **Comprehensive Documentation**: Outputs will include clear descriptions, explanations, and classifications for each detected refactoring.

In [17]:

template_BONUS = """
You are a code refactoring assistant. Analyze the provided Original Code and Refactored Code to identify genuine refactorings.
Common refactorings you may encounter include:

  - Methods: Extract, Inline, Rename, Move, Pull Up, Push Down, Extract and Move, Inline with Move.
  - Attributes: Move, Pull Up, Push Down, Extract, Rename, Replace with Variable, Split, Merge, Change Type.
  - Classes/Packages: Move, Rename, Extract Superclass/Interface, Split, Merge, Change Type Declaration, Collapse Hierarchy.
  - Variables/Parameters: Rename, Extract, Inline, Split, Merge, Replace, Parameterize.
  - Modifiers: Add/Remove/Change (final, static, abstract, synchronized, etc.).
  - Annotations: Add, Remove, Modify (Method, Attribute, Class, Parameter, Variable).
    
Be thorough in identifying refactorings.
Do not fabricate changes; focus only on elements that have genuinely been refactored.

Original Code:
{original_code}

Refactored Code:
{refactored_code}

Follow these guidelines in your output:
    For each refactoring detected, provide the following information:
        - Original Section: {{original_section}}
        - Refactored Section: {{refactored_section}}
        - Refactoring Type: A brief description of the type of change.
    If multiple refactorings are detected, separate them in the output. Do not group them all together. Clearly identify each refactoring as an individual change.
"""
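
The template above mixes single and doubled braces. The following is a minimal sketch, assuming the template is rendered with Python `str.format`-style rules (which, to our knowledge, is the default format for LangChain prompt templates): single-brace names are substituted, while doubled braces survive as literal braces in the text sent to the model. The `mini_template` string is a shortened stand-in, not the full prompt.

```python
# Shortened stand-in for template_BONUS (hypothetical, for illustration only).
mini_template = (
    "Original Code:\n{original_code}\n\n"
    "Refactored Code:\n{refactored_code}\n\n"
    "- Original Section: {{original_section}}\n"
    "- Refactored Section: {{refactored_section}}\n"
)

# Only the single-brace placeholders are filled in.
rendered = mini_template.format(
    original_code="class Dog {}",
    refactored_code="class DogManager {}",
)
print(rendered)
```

The rendered text contains `{original_section}` with single braces, which the model reads as a slot to fill in its own answer.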

In [19]:

# Test the model on the one pair (pair1) using the BONUS Prompt
directory_path = "code_pairs/_pair1"  # Read the files of the first folder
all_files = read_all_files(directory_path)  # Call the function to read all files in the directory.

# Example of two code versions
original_code = next((content for name, content in all_files if name.startswith("original")), None)
refactored_code = next((content for name, content in all_files if name.startswith("evolved")), None)

# Ensure we have the necessary files
if not original_code or not refactored_code:
    print("Error: Original or refactored code is missing.")
else:
    # Get the prompt template and model
    prompt_template = initialize_refactoring_prompt_template(template_BONUS)
    model = initialize_model()

    # Analyze refactoring
    response = analyze_code_refactoring(original_code, refactored_code, prompt_template, model)

    # Print the response
    print(response)

1. **Class Renaming**:
   - **Original Section**: `public class Dog`
   - **Refactored Section**: `public class DogManager`
   - **Refactoring Type**: Rename Class

2. **Method Extraction and Parameterization**:
   - **Original Section**: 
     ```java
     public void barkBark(DogManager manager) {
         System.out.println("ruff");
         System.out.println("ruff");
         takeABreath();
         System.out.println("ruff");
         System.out.println("ruff");
         System.out.println("ruff");
     }
     ```
   - **Refactored Section**: 
     ```java
     public void doStuff() {
         barkBark(this.dog);
         
         int age = dog.getAge();
         int sum = 0;
         for (int i = 0; i < age; i++) {
             System.out.println(i);
             sum += i;
         }
         sum -= dog.magicNumber;
         dog.takeABreath();
     }

     public void barkBark(Dog dog) {
         System.out.println("ruff");
         System.out.println("ruff");
         dog.take
     ```

# Comparing Refactoring Prompts: Guided vs. Unguided vs. Bonus

In this section, we will compare the performance of three different prompts in identifying code refactorings. The goal is to assess how each prompt detects changes in 10 randomly selected file pairs and identify which prompt performs best in terms of accuracy, clarity, and efficiency.

## Prompts Overview


1. **Guided Prompt**:
>
>*You are a code refactoring assistant. Your task is to compare two Java classes and identify all refactorings between them. The trick is to carefully analyze sections in the provided content. Look for elements that are similar but have undergone a change.*
>
>*Be thorough in identifying refactorings. Do not fabricate changes; focus only on elements that have genuinely been refactored.*
>
>Original Code: {original_code}
>
>Refactored Code: {refactored_code}
>
>*Follow these guidelines in your output:*
>* For each refactoring detected, provide the following information:
>    - Original Section: {{original_section}}
>    - Refactored Section: {{refactored_section}}
>    - Refactoring Type: A brief description of the type of change.
>* If multiple refactorings are detected, separate them in the output. Do not group them all together. Clearly identify each refactoring as an individual change.
>
2. **Unguided Prompt**:

>    *You are a code assistant specializing in identifying refactored parts of code.*
>
>    *Your task is to compare the two Java classes below and highlight only the parts that were refactored.*
>
>    Original Code: {original_code}
>    
>    Refactored Code: {refactored_code}
>
>    *Follow these guidelines in your output:*
>    
>    * For each refactoring detected, provide the following information:
>        - Original Section: {{original_section}}
>        - Refactored Section: {{refactored_section}}
>        - Refactoring Type: A brief description of the type of change (e.g., "Moved Method", "Changed Parameter").
>    
>    * If multiple refactorings are detected, separate them in the output. Do not group them all together. Clearly identify each refactoring as an individual change.
>
3. **Bonus Prompt**:

>*You are a code refactoring assistant. Analyze the provided Original Code and Refactored Code to identify genuine refactorings.*
>*Common refactorings you may encounter include:*
>
>  - Methods: Extract, Inline, Rename, Move, Pull Up, Push Down, Extract and Move, Inline with Move.
>  - Attributes: Move, Pull Up, Push Down, Extract, Rename, Replace with Variable, Split, Merge, Change Type.
>  - Classes/Packages: Move, Rename, Extract Superclass/Interface, Split, Merge, Change Type Declaration, Collapse Hierarchy.
>  - Variables/Parameters: Rename, Extract, Inline, Split, Merge, Replace, Parameterize.
>  - Modifiers: Add/Remove/Change (final, static, abstract, synchronized, etc.).
>  - Annotations: Add, Remove, Modify (Method, Attribute, Class, Parameter, Variable).
>
>*Be thorough in identifying refactorings. Do not fabricate changes; focus only on elements that have genuinely been refactored.*
>
>Original Code: {original_code}
>
>Refactored Code: {refactored_code}
>
>*Follow these guidelines in your output:*
>
>* For each refactoring detected, provide the following information:
>    - Original Section: {{original_section}}
>    - Refactored Section: {{refactored_section}}
>    - Refactoring Type: A brief description of the type of change.
>
>* If multiple refactorings are detected, separate them in the output. Do not group them all together. Clearly identify each refactoring as an individual change.
>

## Evaluation Criteria

To determine which prompt performs best, we will evaluate the following:

- **Accuracy**: How precisely the prompt identifies the refactorings in the code.
- **Clarity**: How clearly the changes are presented in the output.
- **Efficiency**: How quickly the prompt provides results.



| Aspect               | **Bonus Prompt**                                             | **Guided Prompt**                                          | **Unguided Prompt**                                        |
|----------------------|--------------------------------------------------------------|-----------------------------------------------------------|-----------------------------------------------------------|
| **Detail Level**      | Documented with an exhaustive list of refactoring types   | Detailed but less documented                               | Minimal detail, focuses on identifying refactorings        |
| **Refactoring Scope** | Covers a broad range of refactorings (methods, attributes, classes, etc.) | Focused on comparing similar sections of code              | Focuses only on highlighting refactored parts without specific guidance |
| **Instructions**      | Very explicit, clear instructions for categorizing changes   | Clear instructions but focuses on comparison | General instructions, less specific guidance               |
| **Comparison Focus**  | Comprehensive identification of all types of changes in the code | Comparison of original vs. refactored code sections        | Highlight only the parts that were refactored              |
| **Best Use Case**     | Detailed analysis where multiple types of refactorings need to be identified | Focused analysis where comparison of similar code is key  | Quick identification of refactored sections without deep analysis |
| **Examples of Refactorings Covered** | Methods: Extract, Inline, Rename, Move, Pull Up, Push Down, etc. | Focus on changes and provided instructions, without a broad list of examples | Focus on refactorings with minimal classification or examples |

In [None]:

# Re-Initialize IMPORTANT VARIABLES

# Directory path and model setup
directory_path = "code_pairs"
model_name = "qwen2.5-coder:latest"
output_folder = "test_COMPAREPROMPTS/"


# Re-Initialize GUIDED prompt template 
template_GUIDED = """
You are a code refactoring assistant. Your task is to compare two Java classes and identify all refactorings between them. 
To help you, Refactorings can include (but are not limited to):

   - Moving methods between two different classes (From one class to another).
   - Changing method signatures (e.g., parameter addition/removal or type of parameter changes).
   - Modifying access levels of attributes or methods (e.g., public to private).
   - Adding, removing, or modifying method calls or logic.
   - Adding encapsulation (e.g., replacing direct field access with getter/setter methods).

The trick is to carefully analyze sections in the provided content. Look for elements that are similar but have undergone a change. 

Be thorough in identifying refactorings. If you believe there is **no refactoring**, explicitly state: "There is no refactoring."

Original Code:
{original_code}

Refactored Code:
{refactored_code}

Follow these guidelines in your output:
    For each refactoring detected, provide the following information:
        - Original Section: {{original_section}}
        - Refactored Section: {{refactored_section}}
        - Refactoring Type: A brief description of the type of change (e.g., "Moved Method", "Changed Parameter").
    If multiple refactorings are detected, separate them in the output. Do not group them all together. Clearly identify each refactoring as an individual change.
"""


#  Re-Initialize UNGUIDED prompt template 
template_UNGUIDED = """
You are a code refactoring assistant. Your task is to compare two Java classes and identify all refactorings between them. 
The trick is to carefully analyze sections in the provided content. Look for elements that are similar but have undergone a change. 

Be thorough in identifying refactorings.
Do not fabricate changes; focus only on elements that have genuinely been refactored.

Original Code:
{original_code}

Refactored Code:
{refactored_code}

Follow these guidelines in your output:
    For each refactoring detected, provide the following information:
        - Original Section: {{original_section}}
        - Refactored Section: {{refactored_section}}
        - Refactoring Type: A brief description of the type of change.
    If multiple refactorings are detected, separate them in the output. Do not group them all together. Clearly identify each refactoring as an individual change.
"""


template_BONUS = """
You are a code refactoring assistant. Analyze the provided Original Code and Refactored Code to identify genuine refactorings.
Common refactorings you may encounter include:

  - Methods: Extract, Inline, Rename, Move, Pull Up, Push Down, Extract and Move, Inline with Move.
  - Attributes: Move, Pull Up, Push Down, Extract, Rename, Replace with Variable, Split, Merge, Change Type.
  - Classes/Packages: Move, Rename, Extract Superclass/Interface, Split, Merge, Change Type Declaration, Collapse Hierarchy.
  - Variables/Parameters: Rename, Extract, Inline, Split, Merge, Replace, Parameterize.
  - Modifiers: Add/Remove/Change (final, static, abstract, synchronized, etc.).
  - Annotations: Add, Remove, Modify (Method, Attribute, Class, Parameter, Variable).
    
Be thorough in identifying refactorings.
Do not fabricate changes; focus only on elements that have genuinely been refactored.

Original Code:
{original_code}

Refactored Code:
{refactored_code}

Follow these guidelines in your output:
    For each refactoring detected, provide the following information:
        - Original Section: {{original_section}}
        - Refactored Section: {{refactored_section}}
        - Refactoring Type: A brief description of the type of change.
    If multiple refactorings are detected, separate them in the output. Do not group them all together. Clearly identify each refactoring as an individual change.
"""



# Templates for each type of prompt
templates = {
    "BONUS": template_BONUS,
    "UNGUIDED": template_UNGUIDED,
    "GUIDED": template_GUIDED
}

## Simplifying Future Tasks: Automating the Comparison of Code Prompts

To streamline and simplify the task of comparing code pairs across multiple templates, we can create a function like `compare_prompts`. This function will serve as a convenient entry point for processing and comparing the original and refactored code using different prompt templates.

### Purpose:
The goal of `compare_prompts` is to handle the task of selecting random code pairs, processing them using the specified templates, and saving the results (including execution times) in an organized manner.

### Benefits:
- **Efficiency**: Automates the process.
- **Scalability**: You can easily adjust the number of pairs to process, the range of pair numbers, and add more templates without changing the logic.
- **Simplicity**: With just a single function call, the entire process of comparing the prompts can be executed seamlessly.

In [4]:
import os
import random
import time

def process_pair(pair_num, original_code, refactored_code, template_name, template):
    """
    Process a single pair of original and refactored code using a specified prompt template.
    Saves the result and execution time in a structured output folder.
    
    Args:
        pair_num (int): The identifier for the code pair being processed.
        original_code (str): The original version of the code.
        refactored_code (str): The refactored version of the code.
        template_name (str): Name of the template for identification.
        template (str): The prompt template content.

    Returns:
        float: The time taken to process the pair in seconds.
    """
    # Create a ChatPromptTemplate from the provided template
    prompt = initialize_refactoring_prompt_template(template)
    
    # Initialize the Ollama language model with the specified configuration
    model = initialize_model(model_name=model_name, temperature=0)
    
    # Record the start time for measuring execution time
    start_time = time.time()
    
    # Invoke the processing chain with the original and refactored code
    response = analyze_code_refactoring(original_code, refactored_code, prompt, model)
    
    # Calculate the total elapsed time
    elapsed_time = time.time() - start_time

    # Convert the response to a string if it is a dictionary
    response_text = str(response) if isinstance(response, dict) else response

    # Create a folder specific to this pair to save results
    pair_folder = os.path.join(output_folder, f"pair{pair_num}")
    os.makedirs(pair_folder, exist_ok=True)

    # Save the output response to a text file with the template name included
    output_file = os.path.join(pair_folder, f"pair{pair_num}_{template_name}_output.txt")
    with open(output_file, "w") as f:
        f.write(response_text)

    # Log the execution time in a dedicated file
    execution_time_file = os.path.join(pair_folder, "execution_times.txt")
    with open(execution_time_file, "a") as f:
        f.write(f"{template_name}: {elapsed_time:.2f} seconds\n")

    # Return the time taken to process the pair
    return elapsed_time


def compare_prompts(directory_path, templates, max_pairs=10, start_num=2, end_num=95):
    """
    Compare multiple code pairs using various templates, processing each pair and saving the results.

    Args:
        directory_path (str): The path to the directory containing the code pairs.
        templates (dict): A dictionary of template names and their corresponding prompt templates.
        max_pairs (int, optional): The maximum number of pairs to process. Defaults to 10.
        start_num (int, optional): The starting number for generating random pairs. Defaults to 2.
        end_num (int, optional): The ending number for generating random pairs. Defaults to 95.

    Note:
        Results are saved under the module-level `output_folder` variable,
        which `process_pair` reads directly; it is not a parameter of this function.
    """
    # Select a random subset of pairs from the specified range
    all_pairs = list(range(start_num, end_num + 1))  
    random_pairs = random.sample(all_pairs, min(max_pairs, len(all_pairs)))

    # Process each selected pair for each template
    for pair_num in random_pairs:
        directory_pair = f"{directory_path}/_pair{pair_num}"
        all_files = read_all_files(directory_pair)
        
        # Extract original and refactored code from the files
        original_code = next((content for name, content in all_files if name.startswith("original")), None)
        refactored_code = next((content for name, content in all_files if name.startswith("evolved")), None)

        if original_code and refactored_code:
            # Process each template for the current code pair
            for template_name, template in templates.items():
                process_pair(pair_num, original_code, refactored_code, template_name, template)

In [None]:
# Run the comparison command! 
compare_prompts(directory_path=directory_path, templates=templates)
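
`compare_prompts` draws a fresh random sample on every run, so the exact set of pairs evaluated below cannot be regenerated. The following is a minimal sketch of a seeded variant of the selection step, assuming reproducibility is desired. The function `pick_pairs` and its `seed` parameter are hypothetical and not part of the notebook.

```python
import random


def pick_pairs(start_num=2, end_num=95, max_pairs=10, seed=None):
    """Select pair numbers as compare_prompts does, optionally with a fixed seed."""
    rng = random.Random(seed)  # private generator, leaves the global random state alone
    all_pairs = list(range(start_num, end_num + 1))
    return rng.sample(all_pairs, min(max_pairs, len(all_pairs)))


first = pick_pairs(seed=42)
second = pick_pairs(seed=42)
print(first == second)  # True: the same seed yields the same selection
```

Recording the seed alongside the results would let a later run process the same pairs with a new template.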

# Evaluation of Accuracy, Clarity, and Efficiency

## 1. Accuracy Evaluation (Correctness)

To evaluate the **correctness** of the outputs, we compare them side by side. For each code pair, a human software engineer judged each prompt's output as **correct** or **incorrect**.

An output was considered correct if it detected at least one refactoring in the file. If no refactoring was detected, or if a detected refactoring was misclassified as not being a refactoring, the result was marked as **incorrect**.


### Accuracy Table:
| **Code Pair** |  **Correctness (Bonus)** | **Correctness (Unguided)** | **Correctness (Guided)** |
|----------------|------------------------|---------------------------|-------------------------|
| Pair 2         | The Most Correct       | Mostly Correct            | Least Correct           |
| Pair 16        | The Most Correct       | Mostly Correct            | Incorrect               |
| Pair 28        | Incorrect              | Incorrect                 | Incorrect               |
| Pair 37        | Incorrect              | Incorrect                 | Incorrect               |
| Pair 48        | The Most Correct       | Almost Correct            | Incorrect               |
| Pair 59        | Correct                | Correct                   | Incorrect               |
| Pair 61        | Incorrect              | Incorrect                 | Incorrect               |
| Pair 63        | Correct                | Correct                   | Correct                 |
| Pair 74        | The Most Correct       | Incorrect                 | Incorrect               |
| Pair 83        | Correct                | Correct                   | Correct                 |
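
The ratings in the table can be tallied per prompt. The sketch below transcribes the table by hand and counts every rating other than "Incorrect" as at least partly correct. Treating "Least Correct" this way is our assumption, not a rule stated in the evaluation.

```python
# Pair order follows the accuracy table.
pairs = [2, 16, 28, 37, 48, 59, 61, 63, 74, 83]

ratings = {
    "Bonus": ["The Most Correct", "The Most Correct", "Incorrect", "Incorrect",
              "The Most Correct", "Correct", "Incorrect", "Correct",
              "The Most Correct", "Correct"],
    "Unguided": ["Mostly Correct", "Mostly Correct", "Incorrect", "Incorrect",
                 "Almost Correct", "Correct", "Incorrect", "Correct",
                 "Incorrect", "Correct"],
    "Guided": ["Least Correct", "Incorrect", "Incorrect", "Incorrect",
               "Incorrect", "Incorrect", "Incorrect", "Correct",
               "Incorrect", "Correct"],
}

# Count the pairs whose rating is anything other than "Incorrect".
not_incorrect = {
    prompt: sum(label != "Incorrect" for label in labels)
    for prompt, labels in ratings.items()
}

for prompt, count in not_incorrect.items():
    print(f"{prompt}: {count}/{len(pairs)} pairs not rated Incorrect")
```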


### Accuracy Conclusion
Based on the evaluation of **correctness**, the following observations can be made:

1. **Bonus Prompt**: The **Bonus** Prompt demonstrated the highest accuracy overall, with multiple instances where it was classified as **The Most Correct** or simply **Correct**. This suggests that the Bonus prompt is more reliable in detecting refactorings effectively.

2. **Unguided Prompt**: The **Unguided** Prompt showed decent performance, frequently achieving **Mostly Correct** results. However, it did not surpass the **Bonus** Prompt in terms of consistent accuracy.

3. **Guided Prompt**: The **Guided** Prompt generally lagged behind, with seven of the ten pairs classified as **Incorrect**. It often noticed the changes but labeled them as non-refactorings, which led to incorrect classifications. Only in two cases (Pair 63 and Pair 83) did the **Guided** Prompt achieve correctness equal to the other two Prompts.

4. **General Observations**: 
   - Refactorings were sometimes missed or misclassified, leading to **Incorrect** evaluations for all Prompts (e.g., Pairs 28, 37, and 61).
   - The **Bonus** Prompt excelled in detecting and accurately classifying refactorings across various scenarios, indicating a stronger alignment with human evaluation criteria.


---

## 2. Clarity Evaluation


To evaluate the **clarity** of the outputs, we assess the readability and comprehensibility of the model's responses. Clear outputs should accurately describe the detected refactorings and their details in a way that is easy for developers to interpret and use.

| **Code Pair** | **Clarity (Bonus)**      | **Clarity (Unguided)** | **Clarity (Guided)** |
|---------------|--------------------------|-------------------------|-----------------------|
| Pair 2        | Clear                    | Unclear                | Clear                |
| Pair 16       | Clear                    | Unclear                | -                    |
| Pair 28       | Less Clear               | Unclear                | -                    |
| Pair 37       | Clear                    | Clear                  | -                    |
| Pair 48       | Clear                    | Clear                  | -                    |
| Pair 59       | Clear                    | Clear                  | -                    |
| Pair 61       | Clear                    | Unclear                | -                    |
| Pair 63       | Clear                    | Unclear                | Clear                |
| Pair 74       | Clear                    | Clear                  | Unclear              |
| Pair 83       | Clear                    | Unclear                | Clear                |

### Clarity Conclusion

In this evaluation, a "-" in the **Guided** column marks the cases where the model answered "no refactoring." Clarity cannot be judged for these responses, since they consist of only two sentences stating whether the code was refactored, so we excluded them from the clarity assessment.

1. **Bonus Prompt**: The **Bonus** Prompt consistently produced clear and understandable outputs across most of the code pairs. 

2. **Unguided Prompt**: The **Unguided** Prompt, in contrast, showed more variability in its clarity. While it was clear in some cases, it was often deemed unclear in others, indicating inconsistency in output.

3. **Guided Prompt**: For the **Guided** prompt, the clarity evaluation is skewed by its frequent "no refactoring" answers, which are marked as "-" and excluded as explained above. However, based on the 20 tested executions, most of the answers can be considered reasonably clear.
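
The exclusion rule described above can be made explicit. In this sketch, transcribed by hand from the clarity table, entries marked "-" are dropped before counting, so each prompt's score is reported as clear responses out of assessed responses. Counting only the exact label "Clear" as clear is our simplification.

```python
# Pair order follows the clarity table: 2, 16, 28, 37, 48, 59, 61, 63, 74, 83.
clarity = {
    "Bonus": ["Clear", "Clear", "Less Clear", "Clear", "Clear",
              "Clear", "Clear", "Clear", "Clear", "Clear"],
    "Unguided": ["Unclear", "Unclear", "Unclear", "Clear", "Clear",
                 "Clear", "Unclear", "Unclear", "Clear", "Unclear"],
    "Guided": ["Clear", "-", "-", "-", "-",
               "-", "-", "Clear", "Unclear", "Clear"],
}


def clear_share(labels):
    """Return (clear, assessed), ignoring entries marked '-'."""
    assessed = [label for label in labels if label != "-"]
    return sum(label == "Clear" for label in assessed), len(assessed)


scores = {prompt: clear_share(labels) for prompt, labels in clarity.items()}

for prompt, (clear, assessed) in scores.items():
    print(f"{prompt}: {clear} clear out of {assessed} assessed")
```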


---

## 3. Efficiency Evaluation (Execution Time)

To evaluate the **efficiency** of the different prompts, we measure the time each prompt takes to process each code pair and sum these times into a total per prompt. Comparing the totals shows which prompt is the fastest.


| **Total Bonus Time (seconds)** | **Total Unguided Time (seconds)** | **Total Guided Time (seconds)** |
|--------------------------------|----------------------------------|--------------------------------|
| 1792.65                        | 2611.10                          | 1548.43                        |
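Totals like these could be collected with a small timing loop. The sketch below is illustrative only: `time_prompt` and `fake_chain` are hypothetical names, and `fake_chain` stands in for the real model invocation so that the example runs without Ollama.

```python
import time
from typing import Callable, List, Tuple


def time_prompt(run: Callable[[str, str], str],
                pairs: List[Tuple[str, str]]) -> List[float]:
    """Return the wall-clock seconds spent on each code pair."""
    durations = []
    for original, refactored in pairs:
        start = time.perf_counter()
        run(original, refactored)  # the response itself is ignored here
        durations.append(time.perf_counter() - start)
    return durations


def fake_chain(original: str, refactored: str) -> str:
    # Hypothetical stand-in for invoking the prompt chain on one pair.
    return f"compared {len(original)} vs {len(refactored)} characters"


# Two toy code pairs; the real evaluation used the dataset's pairs.
pairs = [("def f(): pass", "def g(): pass"), ("x = 1", "y = 1")]

per_pair = time_prompt(fake_chain, pairs)
total = sum(per_pair)
print(f"Total time: {total:.2f} seconds over {len(per_pair)} pairs")
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring elapsed time.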


### Efficiency Conclusion:
Comparing the total times, we can conclude that:
- The **Guided Prompt** was the fastest, with the lowest total execution time (1548.43 seconds).
- The **Bonus Prompt** fell in between (1792.65 seconds).
- The **Unguided Prompt** was the slowest (2611.10 seconds).
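To put the gap between the prompts in relative terms, the totals from the table can be divided by the fastest one. This is a small arithmetic sketch using only the three reported totals.

```python
# Total execution times in seconds, taken from the table above.
totals = {"Bonus": 1792.65, "Unguided": 2611.10, "Guided": 1548.43}

# The prompt with the smallest total is the baseline.
fastest = min(totals, key=totals.get)

# Print the prompts from fastest to slowest with their ratio to the baseline.
for name, seconds in sorted(totals.items(), key=lambda item: item[1]):
    ratio = seconds / totals[fastest]
    print(f"{name:<9} {seconds:>8.2f} s  {ratio:.2f}x the fastest")
```

With these numbers, the Bonus prompt took roughly 1.16 times and the Unguided prompt roughly 1.69 times as long as the Guided prompt.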

---

### Final Evaluation Summary

| **Prompt**    | **Accuracy**     | **Clarity**                    | **Efficiency** | **Overall Best** |
|---------------|------------------|--------------------------------|----------------|------------------|
| **Bonus**     | The Most Correct | Clear                          | Moderate Time  | **Yes**          |
| **Unguided**  | Mostly Correct   | Unclear                        | Slowest Time   | No               |
| **Guided**    | Least Correct    | Skewed (mostly no refactoring) | Fastest Time   | No               |

### Final Conclusion:
Based on the evaluation across the three criteria (Accuracy, Clarity, and Efficiency), the **Bonus Prompt** stands out as the best-performing approach overall. It achieved the highest accuracy and clarity, although its total execution time was about 16% longer than that of the **Guided Prompt** (1792.65 vs. 1548.43 seconds). The **Guided Prompt**, despite being the fastest, performed poorly on accuracy and clarity because it frequently classified pairs as "no refactoring."

In summary:
- The **Bonus Prompt** was the most reliable and balanced approach across all criteria.
- The **Guided Prompt**, while efficient, lacked accuracy and clarity, making it less useful overall.
- The **Unguided Prompt** was the slowest and inconsistent in clarity, making it the least favorable option.
