Alor-e/evaluating-llm-doc-code-traceability

Evaluating LLMs for Documentation-to-Code Traceability

This repository contains the code, data, and results for the research paper "Evaluating the Use of LLMs for Documentation to Code Traceability".

This study evaluates Large Language Models (LLMs) such as Claude 3.5 Sonnet, GPT-4o, and o3-mini on their ability to establish traceability links between software documentation and source code. Using two novel datasets derived from the Crawl4AI and Unity Catalog open-source projects, the paper assesses three key capabilities:

  1. Trace link identification accuracy.
  2. The quality of relationship explanations.
  3. Multi-step trace chain reconstruction.

The findings indicate that LLMs significantly outperform traditional baselines (TF-IDF, BM25, CodeBERT) but also highlight current limitations, providing a roadmap for future research and practical application in software development workflows.
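
To make the TF-IDF baseline concrete, here is a minimal, dependency-free sketch of ranking code artifacts against a documentation passage by TF-IDF cosine similarity. It is an illustration only: the function names, the tokenizer, and the smoothed IDF formula are assumptions, not the implementation used in this repository.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplifying assumption)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def tfidf_vectors(texts):
    """Build sparse TF-IDF vectors (dicts) using a smoothed IDF."""
    tokenized = [tokenize(t) for t in texts]
    n = len(tokenized)
    # Document frequency: in how many texts each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def rank_code_for_doc(doc_text, code_snippets):
    """Return code artifact names ordered from most to least similar to the doc."""
    names = list(code_snippets)
    vectors = tfidf_vectors([doc_text] + [code_snippets[n] for n in names])
    doc_vec = vectors[0]
    scores = {name: cosine(doc_vec, vec) for name, vec in zip(names, vectors[1:])}
    return sorted(scores, key=scores.get, reverse=True)
```

A lexical baseline like this can only link artifacts that share vocabulary with the documentation, which is one plausible reason such methods fall behind LLMs on this task.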

Paper

The research paper can be found on arXiv: https://arxiv.org/abs/2506.16440

Repository Structure

  • data/: Contains the raw and processed datasets for the Crawl4AI and Unity Catalog projects, including the full documents, code artifacts, and ground-truth trace links.
  • src/: Contains all the Python source code for running the experiments.
    • src/experiments/: Scripts for each research question (RQ1, RQ2, RQ3) and baseline evaluations.
    • src/utils/: Utility scripts for data loading, metrics calculation, and interfacing with LLMs.
    • src/config/: Configuration files for the experiments.
  • results/: Stores the raw and aggregated results generated by the experiment scripts.
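
As an illustration of the kind of metrics calculation that src/utils/ is described as handling, the sketch below scores predicted trace links against ground-truth links with precision, recall, and F1. It assumes links are represented as (document, code artifact) pairs; the function name and that representation are hypothetical and not taken from the repository.

```python
def link_metrics(predicted, ground_truth):
    """Precision, recall, and F1 over sets of (doc, code) trace links."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```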

Setup and Installation

  1. Clone the repository:

    git clone <repository-url>
    cd evaluating-llm-doc-code-traceability
  2. Create a virtual environment and activate it:

    python3 -m venv venv
    source venv/bin/activate
    # On Windows, use: venv\Scripts\activate
  3. Install the required dependencies:

    pip install -r requirements.txt
  4. Set up your environment variables:

    • Copy the example .env file:
      cp .env.example .env
    • Add your API keys for Anthropic and OpenAI to the newly created .env file.
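
Before running anything, it can help to confirm that the .env file actually contains both keys. The sketch below parses a .env file and reports missing entries. The variable names ANTHROPIC_API_KEY and OPENAI_API_KEY are assumptions based on common convention; check .env.example for the names this repository expects.

```python
from pathlib import Path

# Assumed key names; the real ones are listed in .env.example.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def parse_env_file(path):
    """Read simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env_file(".env"))` returns an empty list when both keys are set.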

Running the Experiments

  1. Configure the experiment:

    • Edit config.py: set RUN_RQS to the research questions to execute and NUM_RUNS to the number of runs.
  2. Execute the experiment scripts:

    • The main scripts for each research question are located in the src/experiments/ directory. Run them directly, for example:
      python src/experiments/rq1_traceability.py
      python src/experiments/rq2_relationships.py
      python src/experiments/rq3_pathways.py
    • Results will be saved in the results/ directory.
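
To run several research questions in sequence, a small wrapper can invoke the scripts listed above one after another. This is a sketch, not part of the repository: the labels "RQ1" to "RQ3" and the helper names are hypothetical, and the format of RUN_RQS in config.py may differ.

```python
import subprocess
import sys

# Script paths as listed in this README; the dictionary keys are illustrative labels.
RQ_SCRIPTS = {
    "RQ1": "src/experiments/rq1_traceability.py",
    "RQ2": "src/experiments/rq2_relationships.py",
    "RQ3": "src/experiments/rq3_pathways.py",
}


def build_commands(run_rqs):
    """Build one command per selected research question, using the current interpreter."""
    return [[sys.executable, RQ_SCRIPTS[rq]] for rq in run_rqs]


def run_all(run_rqs):
    """Run each selected script in order and stop at the first failure."""
    for command in build_commands(run_rqs):
        subprocess.run(command, check=True)
```

Calling `run_all(["RQ1", "RQ3"])` from the repository root, with the virtual environment active, would run the first and third scripts in order. Using `check=True` raises an error as soon as one script exits with a non-zero status, so a failed run is not silently followed by the next one.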
