LIGHTWEIGHT OPEN-SOURCE MODELS FOR PYTHON UNIT TEST GENERATION

System Illustration

This repository is the final project for the CS6501 Machine Learning for Software Reliability (Fall 2025) course at the University of Virginia (UVa). It contains the code, datasets, and experimental pipelines used in our study evaluating lightweight open-source code LLMs for automated Python unit test generation. The project systematically benchmarks multiple prompting strategies and analyzes the resulting test quality across a wide range of metrics.

Project Structure

  • src/: Source code organized into the following submodules:
    • analysis/: Code for analyzing test smells.
    • dataset/: Dataset preparation and processing.
    • evaluation/: Evaluation scripts for model performance.
    • example/: Example scripts demonstrating usage.
    • inference/: Inference-related utilities.
    • models/: Machine learning models for test smell detection.
  • report/: Final report and slides.
  • figure/: Visualizations and figures generated during analysis.
  • results/: Results from experiments and analyses.

Documentation

The documentation for this project is available in the report/ folder. It includes a detailed explanation of the methodology, experimental setup, and results. Refer to the report.pdf file for the complete project report.

Requirements

The project requires Python. All necessary dependencies are listed in requirements.txt.
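A typical setup, sketched below, installs those dependencies inside a virtual environment (the environment name `.venv` is just a convention, not something the repository prescribes):

```shell
# Create and activate an isolated environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate

# Install the project's dependencies
pip install -r requirements.txt
```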

How to Run

  1. Download the dataset:

    python src/dataset/download.py

    This will download the required dataset into the appropriate folder.

  2. Process the data:

    python src/dataset/create_usable_function.py

    This script processes the raw dataset and prepares it for use in experiments.

  3. Generate unit tests using a code LLM:

    python src/inference/inference.py --model_name <MODEL_NAME> --input_type <INPUT_TYPE> --prompt_type <PROMPT_TYPE> [other options]

    Replace <MODEL_NAME>, <INPUT_TYPE>, and <PROMPT_TYPE> with your desired settings. For example:

    python src/inference/inference.py --model_name Qwen/Qwen2.5-Coder-7B --input_type code --prompt_type minimal

    See the script for all available arguments and options.

  4. Evaluate the generated unit tests: Use the following scripts to analyze and evaluate your results. Replace $FILE_PATH with the path to your generated test file or directory.

    • Extract functions from generated tests:
      python src/evaluation/extract_functions.py --file_path "$FILE_PATH"
    • Compute statistics on the extracted functions:
      python src/evaluation/statistic.py --file_path "$FILE_PATH"
    • Analyze test smells in the generated tests:
      python src/evaluation/test_smell_analysis.py --file_path "$FILE_PATH"

Example evaluation script

You can use the following example to evaluate a specific result folder:

#!/bin/bash
FILE_PATH="results/test/qwen_3b/code/instruction_code/temp_0_2_tokens_512/assist_False/ver_1"

python src/evaluation/extract_functions.py --file_path "$FILE_PATH"
python src/evaluation/statistic.py --file_path "$FILE_PATH"
python src/evaluation/test_smell_analysis.py --file_path "$FILE_PATH"

Citation

You can cite this project as follows:

@misc{llm_pytestgen_2025,
  title={Lightweight Open-Source Models for Python Unit Test Generation},
  author={Huu Binh Ta},
  year={2025},
  howpublished={CS6501 Final Project, University of Virginia},
  url={https://github.com/Tahuubinh/llm_for_python_unittest_generator}
}

License

This project is licensed under the MIT License. See the LICENSE file for details.
