# 💬 Code Comment Generator  
**LLM-Powered Code Comment Generator**  
**Author:** Khalil Kurdi  

---

## 🧠 Overview

This project explores automatic generation of concise, meaningful code comments using **pre-trained large language models** (LLMs) such as **Phi-1** and **CodeLlama**.

Fine-tuning produced ineffective and logically inconsistent comments in early experiments, so the models are used as published, with no further training, for zero-shot comment generation: CodeLlama-7B-Instruct is prompted with an instruction, and Phi-1 with a completion-style prompt. The project includes:

- A simple **Gradio interface** for interactive use  
- Support for **Phi-1** and **CodeLlama-7B-Instruct**  
- Evaluation using **BLEU Score** to benchmark output quality  
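
The two zero-shot prompt styles can be collected in one place. The sketch below is illustrative only: `PROMPT_TEMPLATES` and `build_prompt` are hypothetical names that do not appear in the notebook, and the template strings are copied from the generation functions defined further down.

```python
# Hypothetical helper, not part of the notebook: one prompt template per model.
PROMPT_TEMPLATES = {
    # CodeLlama-Instruct is prompted with an instruction wrapped in [INST] ... [/INST].
    "CodeLlama": "[INST]generate a summarized comment for the following code:\n\n{code}\n[/INST]",
    # Phi-1 is prompted in a completion style: the prompt ends mid-sentence
    # and the model continues it.
    "Phi-1": "# Python function:\n{code}\n# This function",
}


def build_prompt(code_snippet: str, model_choice: str) -> str:
    """Fill the template for the chosen model with the given code snippet."""
    # str.format only interprets braces in the template, so braces inside
    # the snippet itself (dict literals, f-strings) are inserted unchanged.
    return PROMPT_TEMPLATES[model_choice].format(code=code_snippet)


if __name__ == "__main__":
    print(build_prompt("def is_even(n):\n    return n % 2 == 0", "Phi-1"))
```

Keeping the templates in a dictionary makes it possible to add another model without touching the dispatch logic, at the cost of one extra lookup.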

---


# Model Loading

In [1]:
!pip install -q transformers accelerate bitsandbytes
!pip install gradio
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline



In [2]:
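# Load CodeLlama in 4-bit to reduce GPU memory use
from transformers import BitsAndBytesConfig

llama_model_id = "codellama/CodeLlama-7b-Instruct-hf"
llama_tokenizer = AutoTokenizer.from_pretrained(llama_model_id)
llama_model = AutoModelForCausalLM.from_pretrained(
    llama_model_id,
    device_map="auto",
    # quantization_config replaces the deprecated load_in_4bit argument
    quantization_config=BitsAndBytesConfig(load_in_4bit=True)
)
llama_generator = pipeline(
    "text-generation",
    model=llama_model,
    tokenizer=llama_tokenizer,
    max_new_tokens=128,
    do_sample=True,  # temperature and top_p only take effect when sampling
    temperature=0.7,
    top_p=0.9,
    repetition_penalty=1.1
)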


Device set to use cuda:0


In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

model_id = "microsoft/phi-1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
phi_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)


Device set to use cuda:0


# Functions to Use

In [4]:
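# Comment-generation functions, one per model
def generate_comment_codellama(code_snippet: str) -> str:
    """Ask CodeLlama-Instruct for a summary comment and return only its reply."""
    prompt = f"[INST]generate a summarized comment for the following code:\n\n{code_snippet}\n[/INST]"
    result = llama_generator(prompt)[0]['generated_text']
    # The pipeline echoes the prompt, so keep only the text after the closing [/INST] tag
    return result.split("[/INST]")[-1].strip() if "[/INST]" in result else result.strip()

def generate_comment_phi(code_snippet: str) -> str:
    """Let Phi-1 complete '# This function ...' and return the first line of the completion."""
    prompt = f"# Python function:\n{code_snippet}\n# This function"
    output = phi_generator(prompt, max_new_tokens=60, do_sample=True, temperature=0.7)[0]["generated_text"]
    # Keep the text after the last " This function" marker, up to the first newline
    return output.split(" This function")[-1].strip().split("\n")[0]

# Unified handler used by the Gradio UI
def generate_comment(code: str, model_choice: str) -> str:
    """Dispatch to the selected model and format the result as a Python comment."""
    if model_choice == "CodeLlama":
        comment = generate_comment_codellama(code)
    else:
        # Any other choice falls back to Phi-1
        comment = generate_comment_phi(code)
    return f"# {comment.strip()}"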
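
Raw model output can span several lines or come back empty, so a small post-processing step may help before the text is shown as a comment. The following is a minimal sketch under stated assumptions: `clean_comment`, its `max_chars` limit, and the fallback text are hypothetical and are not used anywhere in the notebook.

```python
def clean_comment(raw_text: str, max_chars: int = 200) -> str:
    """Reduce raw model output to a single-line Python comment (illustrative helper)."""
    # Keep only the first non-empty line of the output.
    lines = [line.strip() for line in raw_text.splitlines() if line.strip()]
    first = lines[0] if lines else ""
    # Drop any leading '#' the model may already have produced.
    first = first.lstrip("#").strip()
    # Truncate overly long output and mark the cut with an ellipsis.
    if len(first) > max_chars:
        first = first[:max_chars].rstrip() + "..."
    # Fall back to a fixed marker when nothing usable was generated.
    return f"# {first}" if first else "# (no comment generated)"


if __name__ == "__main__":
    print(clean_comment("calculates the factorial of n.\nExtra text"))
    # prints "# calculates the factorial of n."
```

Such a helper could be applied to the return value of either generation function; whether truncating long output is desirable depends on how the comments are used.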



# Gradio UI to Try Both Models

In [5]:
# Gradio UI
gr.Interface(
    fn=generate_comment,
    inputs=[
        gr.Textbox(lines=10, label="Paste your Python function here", placeholder="def is_even(n):\n    return n % 2 == 0"),
        gr.Radio(["CodeLlama", "Phi-1"], label="Choose model", value="CodeLlama")
    ],
    outputs=gr.Textbox(label="Generated Comment"),
    title="🔍 Code Comment Generator",
    theme="default"
).launch()





# Testing on 15 Samples

In [6]:
code_snippets = [
    """def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)""",

    """def is_palindrome(s):
    s = s.lower().replace(" ", "")
    return s == s[::-1]""",

    """def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a""",

    """def find_max(lst):
    max_val = lst[0]
    for val in lst:
        if val > max_val:
            max_val = val
    return max_val""",

    """def reverse_string(s):
    return s[::-1]""",

    """def count_vowels(s):
    return sum(1 for c in s.lower() if c in 'aeiou')""",

    """def flatten(lst):
    return [item for sublist in lst for item in sublist]""",

    """def read_lines(path):
    with open(path, 'r') as f:
        return f.readlines()""",

    """def is_anagram(s1, s2):
    return sorted(s1.lower()) == sorted(s2.lower())""",

    """def get_unique(lst):
    return list(set(lst))""",

    """def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1""",

    """def merge_dicts(d1, d2):
    return {**d1, **d2}""",

    """def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, int(n**0.5)+1):
        if n % i == 0:
            return False
    return True""",

    """def dedup(seq):
    seen = set()
    return [x for x in seq if not (x in seen or seen.add(x))]""",

    """def average(numbers):
    return sum(numbers) / len(numbers) if numbers else 0"""
]

# Run and print results
for i, code in enumerate(code_snippets, 1):
    print(f"\n🔢 Function {i}")
    print("📦 Code:")
    print(code)
    print("\n🧠 Generated Comment:")
    print(generate_comment_phi(code))



🔢 Function 1
📦 Code:
def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)

🧠 Generated Comment:
calculates the factorial of a given number n recursively.

🔢 Function 2
📦 Code:
def is_palindrome(s):
    s = s.lower().replace(" ", "")
    return s == s[::-1]

🧠 Generated Comment:
takes a string as input and returns a boolean value indicating whether the string is a palindrome or not. The function removes all non-alphabetic characters and converts the string to lowercase before checking if it is equal to its reverse. If the string is a palindrome, the function returns

🔢 Function 3
📦 Code:
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

🧠 Generated Comment:
generates the first n numbers of the Fibonacci sequence and checks if any of them are prime.

🔢 Function 4
📦 Code:
def find_max(lst):
    max_val = lst[0]
    for val in lst:
        if val > max_val:
            max_val = val
    return max_val

🧠 Generated Comment:
takes a

[...]

takes a list as input and returns a new list with only the unique elements of the input list.

🔢 Function 11
📦 Code:
def binary_search(arr, target):
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

🧠 Generated Comment:
takes in a sorted list of integers and a target integer, and returns the index of the target integer in the list if it exists, otherwise returns -1.

🔢 Function 12
📦 Code:
def merge_dicts(d1, d2):
    return {**d1, **d2}

🧠 Generated Comment:
takes two dictionaries as input and returns a new dictionary that contains all the key-value pairs from both input dictionaries. If a key exists in both input dictionaries, the value from d2 will overwrite the value from d1.

🔢 Function 13
📦 Code:
def is_prime(n):
    if n <= 1:
        return False
    for i in range(2, i

# BLEU Evaluation

In [7]:
!pip install nltk




In [8]:
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
smoothie = SmoothingFunction().method4


In [9]:
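def evaluate_bleu(reference_comments, generated_comments):
    """Return the average sentence-level BLEU over paired reference/generated comments."""
    scores = []
    for ref, gen in zip(reference_comments, generated_comments):
        ref_tokens = [ref.strip().split()]  # sentence_bleu expects a list of tokenized references
        gen_tokens = gen.strip().split()
        score = sentence_bleu(ref_tokens, gen_tokens, smoothing_function=smoothie)
        scores.append(score)
    # Avoid a division by zero when no pairs are given
    return sum(scores) / len(scores) if scores else 0.0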
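
To make the metric less of a black box, the two ingredients of BLEU, clipped n-gram precision and the brevity penalty, can be computed by hand. This is a simplified sketch, not NLTK's implementation: it handles a single reference, applies no smoothing, and the helper names are hypothetical. `sentence_bleu` as called above combines the precisions for 1-grams through 4-grams with uniform weights and applies `method4` smoothing on top.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(reference, candidate, n):
    """Share of candidate n-grams found in the reference, with clipped counts."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def brevity_penalty(reference, candidate):
    """Penalize candidates that are shorter than the reference."""
    if not candidate:
        return 0.0
    if len(candidate) > len(reference):
        return 1.0
    return math.exp(1 - len(reference) / len(candidate))


if __name__ == "__main__":
    ref = "Calculates the factorial of a number recursively.".split()
    gen = "Returns the factorial of a number using recursion.".split()
    print(modified_precision(ref, gen, 1))  # 0.625 (5 of 8 unigrams match)
    print(modified_precision(ref, gen, 2))  # 4 of 7 bigrams match, about 0.571
    print(brevity_penalty(ref, gen))        # 1.0 (candidate is longer than reference)
```

Because tokens are produced by whitespace splitting, `recursively.` and `recursion.` keep their trailing period and count as distinct tokens, which is one reason paraphrases score lower than a human reader might expect.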


In [12]:
reference_comments = [
    "Calculates the factorial of a number recursively.",
    "Sorts a list using bubble sort algorithm.",
    "Finds the maximum element in an array.",
    "TODO"
]

generated_comments = [
    "Returns the factorial of a number using recursion.",
    "Implements bubble sort to arrange elements.",
    "Searches for the maximum value in a list.",
    "TODO"
]


In [13]:
bleu_score = evaluate_bleu(reference_comments, generated_comments)
print(f"Average BLEU score: {bleu_score:.4f}")


Average BLEU score: 0.4195
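
Both lists above end with a `"TODO"` placeholder, and that pair is averaged in like any other. One possible way to score only the completed pairs is sketched below; `drop_placeholder_pairs` is a hypothetical helper, and the reported score above was computed without it.

```python
def drop_placeholder_pairs(references, candidates, placeholder="TODO"):
    """Return the paired lists with any pair containing the placeholder removed."""
    kept = [
        (ref, cand)
        for ref, cand in zip(references, candidates)
        if ref.strip() != placeholder and cand.strip() != placeholder
    ]
    # Unzip back into two parallel lists (both empty if nothing was kept).
    refs = [ref for ref, _ in kept]
    cands = [cand for _, cand in kept]
    return refs, cands


if __name__ == "__main__":
    refs, cands = drop_placeholder_pairs(
        ["Finds the maximum element in an array.", "TODO"],
        ["Searches for the maximum value in a list.", "TODO"],
    )
    print(len(refs), len(cands))  # 2 lists of length 1 each: prints "1 1"
```

The filtered lists can then be passed to `evaluate_bleu` in place of the originals.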
