
# RefModel – LLM Refactoring Evaluation

This Colab notebook asks an LLM to identify which refactorings were applied to a program, and lets you **choose the backend** that answers:

* **`ollama`** – runs a local model served by Ollama (e.g. *phi4* or *llama3*).
* **`claude`** – sends requests to Anthropic’s Claude API (e.g. *claude-3-5-sonnet*).

Just set `BACKEND` to `"ollama"` or `"claude"` in the **Base configuration** cell and adjust the remaining parameters (model names, API key, CSV/definitions paths, etc.).

The notebook supports both *complete* mode (the original and transformed versions of a program) and *diff* mode (the patch of a commit). Both back-ends re-use the same prompt template, so every model receives identical input.
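The two modes read different CSV columns: the evaluation cell uses `input` and `output` in *complete* mode and `Diff` in *diff* mode. A minimal pre-flight check could look like this (the helper `missing_columns` is hypothetical and not part of the notebook):

```python
# Column names taken from the evaluation cell: "input"/"output" (complete) and "Diff" (diff).
REQUIRED_COLUMNS = {
    "complete": ["input", "output"],
    "diff": ["Diff"],
}

def missing_columns(mode: str, columns) -> list:
    """Return the columns required by `mode` that are absent from `columns`."""
    if mode not in REQUIRED_COLUMNS:
        raise ValueError(f"unknown mode: {mode!r}")
    present = set(columns)
    return [c for c in REQUIRED_COLUMNS[mode] if c not in present]

print(missing_columns("diff", ["input", "output"]))      # ['Diff']
print(missing_columns("complete", ["input", "output"]))  # []
```

Calling it with `df.columns` right after `pd.read_csv(CSV_PATH)` would catch a diff-mode CSV paired with `MODE = "complete"` before any model request is sent.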


In [1]:
# @title Install dependencies
%pip install -q --no-deps lightrag[ollama]
%pip install -q langchain langchain-ollama langchain-community
%pip install -q requests tqdm

[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
lightrag 0.1.0b6 requires backoff<3.0.0,>=2.2.1, which is not installed.
lightrag 0.1.0b6 requires jsonlines<5.0.0,>=4.0.0, which is not installed.
lightrag 0.1.0b6 requires numpy<2.0.0,>=1.26.4, but you have numpy 2.0.2 which is incompatible.
lightrag 0.1.0b6 requires tiktoken<0.8.0,>=0.7.0, but you have tiktoken 0.9.0 which is incompatib

In [30]:

# @title Base configuration
# ==== choose backend ====
BACKEND = "claude"          # @param ["ollama", "claude"]

# ---- Ollama parameters ----
OLLAMA_MODEL = "phi4"       # @param {type:"string"}
TEMPERATURE   = 0.6
BASE_URL      = "http://localhost:11434"

# ---- Claude parameters ----
CLAUDE_MODEL  = "claude-3-5-sonnet-20241022"  # @param {type:"string"}
API_KEY       = ""                            # @param {type:"string"}

# ---- General I/O ----
MODE      = "complete"       # @param ["complete", "diff"]
CSV_PATH  = "test-sample-synthetic.csv"    # @param ['test-sample-synthetic.csv', 'test-sample-real-diff.csv']
TXT_PATH  = "refactoring_definitions.txt"  # refactoring definitions inserted into the prompt
OUTPUT_CSV = f"results-{BACKEND}-{MODE}.csv"

assert BACKEND in ("ollama", "claude"), "BACKEND must be 'ollama' or 'claude'"
assert MODE    in ("complete", "diff"), "MODE must be 'complete' or 'diff'"
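`API_KEY` is typed into the form in plain text, so it is saved together with the notebook. A hedged sketch (hypothetical helper `resolve_api_key`) that falls back to the `ANTHROPIC_API_KEY` environment variable, the name Anthropic's own SDKs read, when the form field is left empty:

```python
import os

def resolve_api_key(form_value: str, env_var: str = "ANTHROPIC_API_KEY") -> str:
    """Return the form value if set, otherwise the environment variable; fail if both are empty."""
    key = form_value.strip() or os.environ.get(env_var, "").strip()
    if not key:
        raise ValueError(f"No API key: fill in API_KEY or set {env_var}")
    return key

# Hypothetical use at the end of the configuration cell:
# if BACKEND == "claude":
#     API_KEY = resolve_api_key(API_KEY)
```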


In [31]:
# @title Imports & prompt builder
import os, subprocess, threading, time, requests
from datetime import datetime
import pandas as pd
from tqdm.auto import tqdm

from langchain_core.prompts import PromptTemplate
if BACKEND == "ollama":
    from langchain_ollama import OllamaLLM
    from langchain.chains import LLMChain

# load refactoring definitions
if not os.path.isfile(TXT_PATH):
    raise FileNotFoundError(f"{TXT_PATH} not found")
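with open(TXT_PATH, "r", encoding="utf-8") as f:
    # Double any "{" / "}" so they are not mistaken for template placeholders
    REF_DEF = f.read().strip().replace("{", "{{").replace("}", "}}")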

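# Literal placeholders: inside the f-strings below they become "{program1}", "{program2}"
# and "{diff}", which PromptTemplate fills in for each CSV row.
P1    = "{program1}"
P2    = "{program2}"
PDIFF = "{diff}"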

def build_template(mode: str) -> PromptTemplate:
    """Prompt for two full program versions (mode="complete") or for a commit diff (mode="diff")."""
    if mode == "complete":
        template = f"""You are an expert coding assistant specialized in software refactoring, with many years of experience analyzing code transformations.

You will be given two versions of a program:

- **Original Version:**
{P1}

- **Transformed Version:**
{P2}

Your task is to identify which refactoring type(s) have been applied in transforming the original program into the new version. Use only the following list of predefined refactorings:

{REF_DEF}

**Instructions:**
1. Begin your response with a bullet‑point list of the refactoring type(s) applied.
2. Then, briefly justify each identified refactoring with reference to the specific code changes.
3. Only include refactorings from the list above.
4. Be concise but precise in your explanations.

Do not generate explanations unrelated to the given transformation."""
        return PromptTemplate(input_variables=["program1", "program2"], template=template)

    template = f"""You are an expert coding assistant specialized in software refactoring, with many years of experience analyzing code transformations.

You will be given the diffs of a commit:

- **Diffs:**
{PDIFF}

Your task is to identify which refactoring type(s) have been applied in transforming the original program into the new version. Use only the following list of predefined refactorings:

{REF_DEF}

**Instructions:**
1. Begin your response with a bullet‑point list of the refactoring type(s) applied.
2. Then, briefly justify each identified refactoring with reference to the specific code changes.
3. Only include refactorings from the list above.
4. Be concise but precise in your explanations.

Do not generate explanations unrelated to the transformation."""
    return PromptTemplate(input_variables=["diff"], template=template)
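`build_template` fills the prompt in two stages: the f-string bakes in the static text and `REF_DEF`, while `P1`, `P2` and `PDIFF` expand to literal `{program1}`-style placeholders that are filled per row later. The sketch below reproduces the mechanism with plain `str.format` (no LangChain; `make_template` and the sample strings are hypothetical) and shows why braces in the static text must be doubled:

```python
P1 = "{program1}"   # expands to a literal placeholder inside the f-string

def make_template(definitions: str) -> str:
    # Double braces in the static text so the later .format() call keeps them verbatim
    safe_defs = definitions.replace("{", "{{").replace("}", "}}")
    return f"Original:\n{P1}\n\nAllowed refactorings:\n{safe_defs}"

defs = "(Rename Method) Renames a method, e.g. void foo() {} to void bar() {}"
filled = make_template(defs).format(program1="class A { void foo() {} }")
print(filled)

# Without escaping, the "{}" in the definitions is parsed as a positional field
unescaped = P1 + "\n" + defs
try:
    unescaped.format(program1="class A {}")
except IndexError as err:
    print("unescaped braces break str.format:", err)
```

Values passed to `.format()` (here the Java-like program text) are never re-parsed, so braces in the programs themselves need no escaping.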


In [32]:
# @title Sanity check: preview the template for the chosen MODE
tpl = build_template(MODE)      # MODE already defined in the configuration cell

print(f"=== {MODE.upper()} mode preview ===\n")
print(tpl.template[:6000])      # shows the first 6000 characters
print("\nInput vars:", tpl.input_variables)

=== COMPLETE mode preview ===

You are an expert coding assistant specialized in software refactoring, with many years of experience analyzing code transformations.

You will be given two versions of a program:

- **Original Version:**
{program1}

- **Transformed Version:**
{program2}

Your task is to identify which refactoring type(s) have been applied in transforming the original program into the new version. Use only the following list of predefined refactorings:

(Add Method Parameter) – Introduces a new parameter to an existing method.
(Remove Method Parameter) – Eliminates an existing parameter from a method signature.
(Rename Method) – Changes the name of a method while preserving its behavior.
(Rename Class) – Changes the name of a class without altering its structure.
(Rename Package) – Changes the name of a package declaration.
(Rename Field) – Changes the name of a class or instance variable.
(Extract Class) – Moves a group of related fields and methods from an existing clas

In [6]:

# @title Start Ollama server (Ollama backend only)
if BACKEND == "ollama":
    # pciutils provides lspci, which the Ollama install script uses to detect the GPU
    !sudo apt-get -qq update && sudo apt-get -y -qq install pciutils
    !curl -fsSL https://ollama.com/install.sh | sh

    # Popen returns immediately, so the server keeps running without a helper thread
    os.environ["OLLAMA_HOST"] = "0.0.0.0:11434"
    os.environ["OLLAMA_ORIGINS"] = "*"
    subprocess.Popen(["ollama", "serve"], stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)

    # Poll the server (up to ~30 s) instead of sleeping a fixed 2 s
    for _ in range(30):
        try:
            requests.get(BASE_URL, timeout=1)
            break
        except requests.exceptions.RequestException:
            time.sleep(1)
    else:
        raise RuntimeError(f"Ollama server did not respond at {BASE_URL}")

    # pull model (idempotent)
    !ollama pull $OLLAMA_MODEL



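A `!ollama pull` that fails (for example because of a typo in `OLLAMA_MODEL`) does not stop the notebook, since IPython does not raise on a non-zero exit status of `!` commands. A sketch of an explicit check against Ollama's `GET /api/tags` endpoint, which lists the locally available models; the helper names are hypothetical, and the `:latest` matching rule is an assumption about how untagged pulls are reported:

```python
import json
import urllib.request

def pulled_models(tags: dict) -> set:
    """Model names from an Ollama GET /api/tags response body."""
    return {m["name"] for m in tags.get("models", [])}

def model_is_pulled(tags: dict, model: str) -> bool:
    # Assumption: a model pulled without an explicit tag is listed as "<name>:latest"
    names = pulled_models(tags)
    return model in names or f"{model}:latest" in names

def fetch_tags(base_url: str = "http://localhost:11434") -> dict:
    """Ask the running Ollama server which models it has locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)

# Hypothetical check after the pull:
# assert model_is_pulled(fetch_tags(BASE_URL), OLLAMA_MODEL), f"{OLLAMA_MODEL} not pulled"
```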
In [33]:

# @title Run evaluation
df = pd.read_csv(CSV_PATH)
now = datetime.now().date()

prompt = build_template(MODE)

def _inputs(row) -> dict:
    """Template variables for one CSV row: input/output in complete mode, Diff in diff mode."""
    if MODE == "complete":
        return {"program1": row["input"], "program2": row["output"]}
    return {"diff": row["Diff"]}

if BACKEND == "ollama":
    llm   = OllamaLLM(model=OLLAMA_MODEL, base_url=BASE_URL, temperature=TEMPERATURE)
    chain = prompt | llm          # LCEL pipeline; LLMChain and .run() are deprecated
    model_name = OLLAMA_MODEL

    def generate(row) -> str:
        return chain.invoke(_inputs(row))

else:  # Claude backend
    API_URL = "https://api.anthropic.com/v1/messages"
    headers = {
        "x-api-key": API_KEY,
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    }
    model_name = CLAUDE_MODEL

    def _call_claude(payload: str) -> str:
        data = {
            "model": CLAUDE_MODEL,
            "max_tokens": 1024,
            "temperature": TEMPERATURE,   # match the Ollama sampling temperature
            "messages": [{"role": "user", "content": payload}],
        }
        r = requests.post(API_URL, headers=headers, json=data, timeout=90)
        if r.status_code != 200:
            raise RuntimeError(f"Claude API error {r.status_code}: {r.text}")
        return r.json()["content"][0]["text"]

    def generate(row) -> str:
        # prompt.format fills the same placeholders the Ollama chain fills
        return _call_claude(prompt.format(**_inputs(row)))

# Pre-create the result columns as object dtype: writing strings/dates cell by cell
# into brand-new columns can make pandas warn about incompatible dtypes.
for col in ("LLM", "Date", "LLM Output", "LLM Answer"):
    df[col] = None

for idx in tqdm(df.index, desc=f"Evaluating with {BACKEND}"):
    res = generate(df.loc[idx])
    df.loc[idx, "LLM"]        = model_name
    df.loc[idx, "Date"]       = now
    df.loc[idx, "LLM Output"] = res
    df.loc[idx, "LLM Answer"] = res.split("\n")[0]   # first line of the response



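`_call_claude` raises on the first non-200 response, so a single rate-limit or overload error stops the loop. A minimal retry-with-exponential-backoff sketch; the helper names are hypothetical, and treating 429, 500 and 529 as retryable is an assumption based on Anthropic's documented rate-limit, internal-error and overloaded status codes:

```python
import random
import time

RETRYABLE_STATUS = {429, 500, 529}   # assumption: rate limited, internal error, overloaded

class RetryableError(Exception):
    """Raised by a request function when the failure is worth retrying."""

def with_retries(fn, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(); on RetryableError wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise                      # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))

# Hypothetical wiring inside _call_claude, before the generic status check:
#   if r.status_code in RETRYABLE_STATUS:
#       raise RetryableError(r.text)
# and each request then becomes: res = with_retries(lambda: _call_claude(payload))
```

The `sleep` parameter exists only so the backoff can be exercised without real waiting.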

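The prompt asks for a bullet list of refactorings first, yet `LLM Answer` keeps only `res.split('\n')[0]`, so a response that lists several refactorings (or opens with a line such as "Refactorings applied:") records just its first line. A heuristic sketch, assuming the model follows the bullet-list instruction, that collects every item of the first bullet list instead (`leading_bullets` is a hypothetical helper):

```python
import re

# A bullet line: "-", "*", "•" or "1." / "1)" followed by the item text
BULLET = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+(.*\S)")

def leading_bullets(response: str) -> list:
    """Return the items of the first bullet list in `response`, skipping any preamble."""
    items = []
    for line in response.splitlines():
        m = BULLET.match(line)
        if m:
            items.append(m.group(1).replace("**", "").strip())  # drop **bold** markers
        elif items and line.strip():
            break  # first non-blank, non-bullet line after the list ends it
    return items

reply = """Refactorings applied:
- **Rename Method**
- **Add Method Parameter**

Justification: the method foo was renamed to bar and gained a parameter."""
print(leading_bullets(reply))  # ['Rename Method', 'Add Method Parameter']
```

Blank lines inside the list are tolerated so that loose Markdown lists still parse; a justification section written as a second bullet list directly after a blank line would be merged in, which is why this stays a heuristic.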
In [34]:
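# @title Save & download results
df.to_csv(OUTPUT_CSV, index=False)
print(f"Saved CSV: {OUTPUT_CSV}")

try:
    from google.colab import files   # only available inside Colab
    files.download(OUTPUT_CSV)
except ImportError:
    print("Not running in Colab; the CSV is in the current working directory.")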

Saved CSV: results-claude-complete.csv

