Evaluating LLMs for Documentation-to-Code Traceability

This repository contains the code, data, and results for the research paper "Evaluating the Use of LLMs for Documentation to Code Traceability".

This study evaluates Large Language Models (LLMs) such as Claude 3.5 Sonnet, GPT-4o, and o3-mini on their ability to establish traceability links between software documentation and source code. Using two novel datasets derived from the Crawl4AI and Unity Catalog open-source projects, the paper assesses three key capabilities:

- Trace link identification accuracy.
- The quality of relationship explanations.
- Multi-step trace chain reconstruction.

The findings indicate that LLMs significantly outperform traditional baselines (TF-IDF, BM25, CodeBERT) but also highlight current limitations, providing a roadmap for future research and practical application in software development workflows.
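To make "trace link identification accuracy" concrete, the sketch below scores a set of predicted documentation-to-code links against ground-truth links using precision, recall, and F1. This is a common way to score link prediction; whether the paper uses exactly these metrics, and the function name `link_metrics` and the example identifiers, are assumptions made for illustration.

```python
def link_metrics(predicted, ground_truth):
    """Precision, recall, and F1 for collections of (doc_id, code_id) pairs."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical identifiers, not taken from the datasets in data/.
truth = {("doc_a", "func_x"), ("doc_a", "func_y")}
pred = {("doc_a", "func_x"), ("doc_a", "func_z")}
scores = link_metrics(pred, truth)
```

Here one of the two predicted links is correct and one of the two true links is recovered, so all three scores come out to 0.5.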

Paper

The research paper can be found on arXiv: https://arxiv.org/abs/2506.16440

Repository Structure

data/: Contains the raw and processed datasets for the Crawl4AI and Unity Catalog projects, including the full documents, code artifacts, and ground-truth trace links.
src/: Contains all the Python source code for running the experiments.
- src/experiments/: Scripts for each research question (RQ1, RQ2, RQ3) and baseline evaluations.
- src/utils/: Utility scripts for data loading, metrics calculation, and interfacing with LLMs.
- src/config/: Configuration files for the experiments.
results/: Stores the raw and aggregated results generated by the experiment scripts.

Setup and Installation

Clone the repository:

git clone <repository-url>
cd evaluating-llm-doc-code-traceability

Create a virtual environment and activate it:

python3 -m venv venv
source venv/bin/activate
# On Windows, use: venv\Scripts\activate

Install the required dependencies:
```
pip install -r requirements.txt
```
Set up your environment variables:
- Copy the example .env file:
```
cp .env.example .env
```
- Add your API keys for Anthropic and OpenAI to the newly created .env file.
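Before launching an experiment it can help to confirm that both keys are actually present in .env. The following is a minimal sketch that uses only the standard library. The variable names `ANTHROPIC_API_KEY` and `OPENAI_API_KEY` are assumptions; check .env.example for the names this repository really reads. The helpers `parse_env_text` and `missing_keys` are hypothetical and not part of the repository.

```python
# Assumed variable names; .env.example is the authoritative list.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name, "").strip()]


sample = "ANTHROPIC_API_KEY=abc123\n# OpenAI key not filled in yet\nOPENAI_API_KEY=\n"
problems = missing_keys(parse_env_text(sample))
```

To check a real file, pass its contents instead of `sample`, for example `parse_env_text(open(".env").read())`. The parser handles only plain `KEY=VALUE` lines, not multi-line values or `export` prefixes.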

Running the Experiments

Configure the experiment:
- Edit config.py in the repository root to select which research questions to execute (RUN_RQS) and to set the number of runs (NUM_RUNS).
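As a rough illustration, the two settings might look like the fragment below. Only the names `RUN_RQS` and `NUM_RUNS` come from this README. The value formats (a list of strings, an integer) and the example values are assumptions, so consult the real config.py before editing.

```python
# Illustrative values only; the actual types in config.py may differ.
RUN_RQS = ["RQ1", "RQ2", "RQ3"]  # assumed: identifiers of the research questions to run
NUM_RUNS = 3                     # assumed: number of repeated runs per experiment

# Basic sanity checks on the assumed formats.
assert all(rq in {"RQ1", "RQ2", "RQ3"} for rq in RUN_RQS)
assert isinstance(NUM_RUNS, int) and NUM_RUNS > 0
```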
Execute the experiment scripts:
- The main scripts for each research question are located in the src/experiments/ directory. Run them directly, for example:
```
python src/experiments/rq1_traceability.py
python src/experiments/rq2_relationships.py
python src/experiments/rq3_pathways.py
```
- Results will be saved in the results/ directory.
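To run several experiments back to back, a small wrapper along these lines may be convenient. It is a sketch, not part of the repository: the script paths are the ones listed above, while `build_commands`, `run_all`, and the `"RQ1"`-style keys are hypothetical names introduced here.

```python
import subprocess
import sys

# Script paths as listed in this README, keyed by hypothetical labels.
SCRIPTS = {
    "RQ1": "src/experiments/rq1_traceability.py",
    "RQ2": "src/experiments/rq2_relationships.py",
    "RQ3": "src/experiments/rq3_pathways.py",
}


def build_commands(rqs=("RQ1", "RQ2", "RQ3")):
    """Build one command per selected research question, using the current interpreter."""
    return [[sys.executable, SCRIPTS[rq]] for rq in rqs]


def run_all(rqs=("RQ1", "RQ2", "RQ3")):
    """Run the selected scripts in order, stopping at the first failure."""
    for command in build_commands(rqs):
        subprocess.run(command, check=True)
```

Calling `run_all(["RQ1", "RQ3"])` from the repository root, with the virtual environment active, would run the first and third scripts in sequence. `check=True` raises `subprocess.CalledProcessError` if a script exits with a non-zero status, so later experiments do not run after a failed one.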
