This is the official repository for the ISSTA 2026 paper: Generating Project-Specific Test Cases with Validation Intention
- python 3.11.7
- pytorch 2.2.2
- openai 1.30.5
- rank-bm25 0.2.2
- nltk 3.8.1
- beautifulsoup4 4.12.2
- javalang 0.13.0
- matplotlib 3.8.0
- tqdm 4.65.0
- tree-sitter 0.20.1
- JDK 1.8.0_311
- JDK 17.0.12
- Apache Maven 3.9.6
- JDTLS 1.9.0
- PIT/Pitest 1.17.0
  - downloaded automatically by Maven
  - pitest-junit5-plugin 1.2.1 is used for JUnit 5 projects
Note
Make sure JDK 1.8.0_311 and JDK 17.0.12 are installed correctly.
$ java --version
java version "1.8.0_311"
Java(TM) SE Runtime Environment (build 1.8.0_311-b11)
$ java17 --version
java 17.0.12 2024-07-16 LTS
Java(TM) SE Runtime Environment (build 17.0.12+8-LTS-286)
$ mvn --version
Apache Maven 3.9.6
Maven home: /usr/share/maven
Java version: 1.8.0_311, vendor: Oracle Corporation, runtime: ...

Before running the experiments, we recommend running the following command in each repository under data/repos to ensure that every project compiles and passes its tests:
mvn clean test

The following examples use the Java project spark and the LLM gpt-5-mini. You can replace them with other supported project names and LLM names.
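The compile-and-test check recommended for each repository under data/repos can be scripted rather than run by hand. The following is a minimal sketch, not part of the repository: the function name `check_repos` is hypothetical, and it assumes that every subdirectory of data/repos is a Maven project and that `mvn` is on the PATH.

```python
import subprocess
from pathlib import Path


def check_repos(repos_dir, command=("mvn", "clean", "test")):
    """Run `command` in every subdirectory of `repos_dir`.

    Returns a dict mapping each repository name to True when the command
    exited with status 0, and False otherwise.
    """
    results = {}
    for repo in sorted(Path(repos_dir).iterdir()):
        if not repo.is_dir():
            continue  # skip stray files such as archives or logs
        # A non-zero exit code means the build or the tests failed.
        completed = subprocess.run(list(command), cwd=repo)
        results[repo.name] = completed.returncode == 0
    return results
```

Calling `check_repos("data/repos")` from the repository root returns one entry per project, so any repository mapped to `False` can be inspected before the experiments start.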
- Download the dataset into the ./data directory.
- Navigate to the ./data directory.
- Run tar -xzvf dataset.tar.gz.
In agents.py:
- Set GPT_KEY, GPT_BASE_URL, DEEPSEEK_KEY, and DEEPSEEK_BASE_URL to usable API keys and their corresponding base URLs.
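To keep secrets out of version control, the four values could be read from environment variables instead of being hard-coded. This is only a sketch under that assumption: the helper `load_llm_settings` is hypothetical and is not how agents.py is stated to work; only the four variable names come from this README.

```python
import os

# Names taken from agents.py as listed above; reading them from the
# environment is an assumption made for this sketch.
REQUIRED = ("GPT_KEY", "GPT_BASE_URL", "DEEPSEEK_KEY", "DEEPSEEK_BASE_URL")


def load_llm_settings(env=None):
    """Return the four settings as a dict, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing LLM settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

Failing early with the list of missing names avoids discovering a bad key only after a long generation run has started.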
(This step can be skipped if the generated descriptions already exist in data/test_desc_dataset.)
To generate validation intention descriptions for all tests, run:
python test_desc_generator.py
The validation intention descriptions obtained through reverse engineering are not used as target test inputs. They serve only as validation intention descriptions for historical candidate tests.
- cd main/
- Generate candidate validation intention descriptions for focal methods. This step can be skipped if the generated descriptions already exist in data/test_desc_from_fm_dataset.
# Generate candidate validation intention descriptions for focal methods.
python -u generate_desc_from_fm.py --project_name spark
# Match the generated candidate descriptions to target tests.
python -u match_desc_with_tc.py --project_name spark
- Generate tests for the Java project spark based on LLM-inferred validation intention descriptions:
python -u generate_test.py --project_name spark --llm_name gpt-5-mini --junit_version 4
cd main/
- Generate tests for the Java project spark based on human-written validation descriptions:
python -u generate_test_using_manual_desc.py --project_name spark --llm_name gpt-5-mini --junit_version 4
cd cms_calculation/
- Run PIT on the ground-truth test cases:
python -u main.py --project_name spark --llm_name gpt-5-mini --ground_truth
- Run PIT on the generated test cases:
python -u main.py --project_name spark --llm_name gpt-5-mini
- Compute CMS scores (saved to data/collected_mutation_scores/<llm_name>/<project_name>.csv):
python -u calculate_cms.py --project_name spark --llm_name gpt-5-mini
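Once the CMS scores are written, the resulting file can be located and loaded for inspection. The sketch below is illustrative only: the helpers `cms_csv_path` and `read_rows` are hypothetical, the default `root` assumes the script runs from the repository root, and the CSV is assumed to start with a header row. No column names are assumed.

```python
import csv
from pathlib import Path


def cms_csv_path(llm_name, project_name, root="data/collected_mutation_scores"):
    """Build the path <root>/<llm_name>/<project_name>.csv described above."""
    return Path(root) / llm_name / f"{project_name}.csv"


def read_rows(path):
    """Load the CSV as a list of dicts keyed by whatever header it contains."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

For example, `read_rows(cms_csv_path("gpt-5-mini", "spark"))` returns one dict per row, and the keys of the first dict show which columns the file actually provides.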
cd main/
Generate tests using different validation intention settings:
- Objective
python -u generate_test.py --project_name spark --llm_name gpt-5-mini --junit_version 4 --test_desc_setting obj
- Objective + Precondition
python -u generate_test.py --project_name spark --llm_name gpt-5-mini --junit_version 4 --test_desc_setting obj_pre
- Objective + Expected Results
python -u generate_test.py --project_name spark --llm_name gpt-5-mini --junit_version 4 --test_desc_setting obj_exp
- No Validation Intention Description
python -u generate_test.py --project_name spark --llm_name gpt-5-mini --junit_version 4 --test_desc_setting none
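The four settings above differ only in the value passed to `--test_desc_setting`, so they can be driven from one loop. This is a hedged sketch rather than a script shipped with the repository: `build_command` and `run_all` are hypothetical names, and the default `cwd="main"` assumes the loop is started from the repository root.

```python
import subprocess
import sys

# Values of --test_desc_setting listed above.
SETTINGS = ("obj", "obj_pre", "obj_exp", "none")


def build_command(project_name, llm_name, junit_version, setting):
    """Return the generate_test.py invocation for one setting as an argument list."""
    return [
        sys.executable, "-u", "generate_test.py",
        "--project_name", project_name,
        "--llm_name", llm_name,
        "--junit_version", str(junit_version),
        "--test_desc_setting", setting,
    ]


def run_all(project_name, llm_name, junit_version, cwd="main", runner=subprocess.run):
    """Run every setting in order, stopping at the first failing command."""
    for setting in SETTINGS:
        command = build_command(project_name, llm_name, junit_version, setting)
        # check=True makes subprocess.run raise CalledProcessError on failure.
        runner(command, cwd=cwd, check=True)
```

`run_all("spark", "gpt-5-mini", 4)` would then issue the same four commands shown above, one after another. Whether results from different settings are written to separate locations is not stated here, so it is worth confirming before running them back to back.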
- Collect historical candidates for each test:
cd empirical_study/analyze_feasible
python collect_temporal_candidates.py --project_name spark --num_workers 4 --cleanup > ./collect_temp_candidates_spark.log
- Analyze Reference Availability and Referability Level:
python analyze_thres_for_retrieval_temporal.py --project_name spark > ./analyze_spark.log
Common problems and solutions are documented in FAQ.md.
If you find this repository useful, please cite our paper:
@article{qi2026generating,
title={Generating Project-Specific Test Cases with Requirement Validation Intention},
author={Qi, Binhang and Lin, Yun and Weng, Xinyi and Huang, Yuhuan and Liu, Chenyan and Sun, Hailong and Dong, Jin Song},
journal={Proceedings of the ACM on Software Engineering},
number={ISSTA},
year={2026}
}