This repository is the final project for the CS6501 Machine Learning for Software Reliability (Fall 2025) course at the University of Virginia (UVa). It contains the code, datasets, and experimental pipelines used in our study on evaluating lightweight open-source code LLMs for automated Python unit test generation. The project systematically benchmarks multiple prompting strategies and analyzes the resulting test quality across a wide range of metrics.
- src/: Source code organized into the following submodules:
  - analysis/: Code for analyzing test smells.
  - dataset/: Dataset preparation and processing.
  - evaluation/: Evaluation scripts for model performance.
  - example/: Example scripts demonstrating usage.
  - inference/: Inference-related utilities.
  - models/: Machine learning models for test smell detection.
- report/: Final report and slides.
- figure/: Visualizations and figures generated during analysis.
- results/: Stores results from experiments and analyses.
The documentation for this project is available in the report/ folder. It includes a detailed explanation of the methodology, experimental setup, and results. Refer to the report.pdf file for the complete project report.
The project requires Python. All necessary dependencies are listed in requirements.txt.
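A typical setup sketch, assuming a standard pip/virtual-environment workflow (adapt to conda or your preferred tooling):

```bash
# Create and activate an isolated environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate

# Install the project's dependencies
pip install -r requirements.txt
```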
- Download the dataset:

  ```bash
  python src/dataset/download.py
  ```

  This will download the required dataset into the appropriate folder.
- Process the data:

  ```bash
  python src/dataset/create_usable_function.py
  ```

  This script processes the raw dataset and prepares it for use in experiments.
- Generate unit tests using a code LLM:

  ```bash
  python src/inference/inference.py --model_name <MODEL_NAME> --input_type <INPUT_TYPE> --prompt_type <PROMPT_TYPE> [other options]
  ```

  Replace `<MODEL_NAME>`, `<INPUT_TYPE>`, and `<PROMPT_TYPE>` with your desired settings. For example:

  ```bash
  python src/inference/inference.py --model_name Qwen/Qwen2.5-Coder-7B --input_type code --prompt_type minimal
  ```

  See the script for all available arguments and options. A sketch that sweeps several prompt types is shown after this list.
- Evaluate the generated unit tests: Use the following scripts to analyze and evaluate your results. Replace `$FILE_PATH` with the path to your generated test file or directory.

  - Extract functions from generated tests:

    ```bash
    python src/evaluation/extract_functions.py --file_path "$FILE_PATH"
    ```

  - Compute statistics on the extracted functions:

    ```bash
    python src/evaluation/statistic.py --file_path "$FILE_PATH"
    ```

  - Analyze test smells in the generated tests:

    ```bash
    python src/evaluation/test_smell_analysis.py --file_path "$FILE_PATH"
    ```
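The single-command example above runs one configuration at a time. The sketch below shows one way to sweep several prompting strategies for a fixed model by looping over the same inference script; the prompt-type values other than `minimal` are illustrative placeholders, so check src/inference/inference.py for the names it actually accepts.

```bash
#!/bin/bash
# Sketch: run inference for several prompting strategies with one model.
# Only "minimal" comes from the example above; the other prompt type is a
# placeholder -- replace it with values accepted by inference.py.
MODEL="Qwen/Qwen2.5-Coder-7B"

for PROMPT in minimal instruction_code; do
    echo "Generating tests with prompt type: $PROMPT"
    python src/inference/inference.py \
        --model_name "$MODEL" \
        --input_type code \
        --prompt_type "$PROMPT"
done
```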
You can use the following example to evaluate a specific result folder:

```bash
#!/bin/bash
FILE_PATH="results/test/qwen_3b/code/instruction_code/temp_0_2_tokens_512/assist_False/ver_1"

python src/evaluation/extract_functions.py --file_path "$FILE_PATH"
python src/evaluation/statistic.py --file_path "$FILE_PATH"
python src/evaluation/test_smell_analysis.py --file_path "$FILE_PATH"
```
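To evaluate many runs at once, the same three scripts can be looped over every version directory. This is only a sketch: the glob pattern assumes the directory layout shown above and should be adapted to wherever your results actually live.

```bash
#!/bin/bash
# Sketch: run the full evaluation pipeline for every run directory matching
# the layout used above. The glob pattern is illustrative -- adapt it to
# your own results structure.
for FILE_PATH in results/test/*/*/*/*/*/ver_*; do
    [ -d "$FILE_PATH" ] || continue   # skip if nothing matched
    echo "Evaluating $FILE_PATH"
    python src/evaluation/extract_functions.py --file_path "$FILE_PATH"
    python src/evaluation/statistic.py --file_path "$FILE_PATH"
    python src/evaluation/test_smell_analysis.py --file_path "$FILE_PATH"
done
```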
You can cite this project as follows:

```bibtex
@misc{llm_pytestgen_2025,
  title={Lightweight Open-Source Models for Python Unit Test Generation},
  author={Huu Binh Ta},
  year={2025},
  howpublished={CS6501 Final Project, University of Virginia},
  url={https://github.com/Tahuubinh/llm_for_python_unittest_generator}
}
```
This project is licensed under the MIT License. See the LICENSE file for details.
