Defect classes selection bias |
2014 |
A critical review of "automatic patch generation learned from human-written patches": Essay on the problem statement and the evaluation of automatic software repair |
https://dl.acm.org/doi/abs/10.1145/2568225.2568324 |
Using different defect classes when evaluating multiple APR techniques. |
Rigorously, automatic repair approaches can be compared only if they address similar defect classes. The bias is also considered mitigated, to a certain extent, when multiple APR techniques are evaluated on the same datasets. |
not applicable |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
not applicable |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
34 |
2 |
0 |
0 |
36 |
Fix acceptability metric bias |
2014 |
A critical review of "automatic patch generation learned from human-written patches": Essay on the problem statement and the evaluation of automatic software repair |
https://dl.acm.org/doi/abs/10.1145/2568225.2568324 |
Using fix acceptability as an evaluation metric. |
The bias is mitigated if the fix acceptability metric (i.e., manually judging if the APR-generated patch is acceptable) is not used in the APR evaluation. |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
36 |
0 |
0 |
0 |
36 |
Non-manual validation bias |
2015 |
An analysis of patch plausibility and correctness for generate-and-validate patch generation systems |
https://dl.acm.org/doi/abs/10.1145/2771783.2771791 |
Assessing APR-generated patches without manual validation. |
The bias is mitigated if the APR-generated patch is validated manually. |
yes |
no |
yes |
yes |
yes |
no |
yes |
yes |
yes |
yes |
no |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
no |
no |
yes |
yes |
no |
yes |
yes |
yes |
yes |
no |
yes |
no |
yes |
no |
yes |
yes |
27 |
0 |
9 |
0 |
36 |
Non-independent test validation bias |
2015 |
Is the cure worse than the disease? overfitting in automated program repair |
https://dl.acm.org/doi/abs/10.1145/2786805.2786825 |
Assessing APR-generated patches without independent tests. |
The bias is mitigated if the APR-generated patch is assessed by held-out tests that are not included in the original test suite used in patch validation. |
no |
no |
no |
no |
no |
no |
yes |
not applicable |
no |
no |
yes |
no |
no |
yes |
yes |
no |
yes |
not applicable |
no |
no |
yes |
no |
yes |
no |
no |
no |
no |
yes |
no |
no |
yes |
yes |
yes |
yes |
no |
yes |
13 |
2 |
21 |
0 |
36 |
NCP vs. NTCE metric bias |
2018 |
How to Measure the Performance of Automated Program Repair |
https://ieeexplore.ieee.org/abstract/document/8612557/ |
Using Number of Test Case Executions (NTCE for short) as an efficiency metric rather than Number of Candidate Patches before a valid patch is found (NCP for short). |
The bias does not exist at all if the repair efficiency is not evaluated in the paper. The bias is mitigated if the NCP metric is used during the APR evaluation. |
not applicable |
no |
no |
yes |
yes |
no |
not applicable |
yes |
yes |
yes |
not applicable |
yes |
yes |
no |
yes |
yes |
not applicable |
yes |
not applicable |
yes |
not applicable |
yes |
yes |
not applicable |
no |
yes |
yes |
not applicable |
yes |
no |
no |
no |
yes |
yes |
yes |
not applicable |
19 |
9 |
8 |
0 |
36 |
Defect classes evaluation bias |
2018 |
Do automated program repair techniques repair hard and important bugs? |
https://link.springer.com/article/10.1007/s10664-017-9550-0 |
Whether APR techniques can repair hard and important bugs is not evaluated. |
Rigorously, it is expected that researchers discuss the complexity and importance of the repaired bugs using the metrics proposed by the bias study (e.g., Priority of the Defect). The bias is also considered mitigated, to a certain extent, when the repaired defects are publicly available for further checks or are discussed in the paper. |
yes |
yes |
yes |
yes |
yes |
no |
yes |
yes |
yes |
yes |
yes |
no |
yes |
no |
yes |
no |
yes |
yes |
yes |
yes |
yes |
no |
yes |
yes |
no |
yes |
yes |
yes |
yes |
yes |
yes |
no |
yes |
yes |
yes |
yes |
29 |
0 |
7 |
0 |
36 |
Only-manual validation bias |
2019 |
On reliability of patch correctness assessment |
https://ieeexplore.ieee.org/abstract/document/8812054/ |
Assessing APR-generated patches only by author annotation (i.e., the patch correctness is validated by authors of the APR tool). |
The bias does not exist at all if no manual checks are performed for APR-generated patch correctness assessment. The bias is mitigated if both held-out tests and manual checks are used for patch correctness assessment. |
no |
not applicable |
no |
no |
no |
not applicable |
yes |
not applicable |
no |
no |
yes |
no |
no |
yes |
yes |
no |
yes |
not applicable |
no |
yes |
yes |
no |
yes |
no |
no |
no |
no |
yes |
no |
not applicable |
yes |
not applicable |
yes |
yes |
no |
yes |
13 |
6 |
17 |
0 |
36 |
Only-independent test validation bias |
2019 |
On reliability of patch correctness assessment |
https://ieeexplore.ieee.org/abstract/document/8812054/ |
Assessing APR-generated patches only by held-out tests (i.e., tests that are not included in the original test suite used in patch validation). |
The bias does not exist at all if no held-out tests are used for APR-generated patch correctness assessment. The bias is mitigated if both held-out tests and manual checks are used for patch correctness assessment. |
not applicable |
not applicable |
not applicable |
not applicable |
not applicable |
not applicable |
yes |
yes |
yes |
not applicable |
no |
not applicable |
not applicable |
yes |
yes |
not applicable |
yes |
yes |
yes |
not applicable |
yes |
not applicable |
yes |
not applicable |
not applicable |
not applicable |
yes |
yes |
yes |
not applicable |
yes |
no |
yes |
no |
not applicable |
yes |
16 |
17 |
3 |
0 |
36 |
Fault localization bias |
2019 |
You cannot fix what you cannot find! an investigation of fault localization bias in benchmarking automated program repair systems |
https://ieeexplore.ieee.org/abstract/document/8730164/ |
Using inconsistent fault localization configurations when evaluating APR techniques. |
The bias does not exist at all when no fault localization is performed. The bias is mitigated if multiple APR techniques use the same fault localization configuration during evaluation. |
not applicable |
yes |
yes |
yes |
yes |
yes |
not applicable |
not applicable |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
not applicable |
not applicable |
yes |
yes |
not applicable |
yes |
yes |
yes |
yes |
no |
yes |
yes |
no |
yes |
yes |
no |
not applicable |
yes |
yes |
not applicable |
25 |
8 |
3 |
0 |
36 |
Subject bugs selection bias |
2019 |
Attention please: Consider Mockito when evaluating newly proposed automated program repair techniques |
https://dl.acm.org/doi/abs/10.1145/3319008.3319349 |
Excluding Mockito bugs when evaluating APR techniques with Defects4J. |
The bias does not exist at all when Defects4J is not used. The bias is mitigated if the Mockito bugs are included when using Defects4J for evaluation. |
not applicable |
no |
no |
no |
no |
yes |
yes |
no |
yes |
yes |
yes |
no |
no |
yes |
not applicable |
not applicable |
not applicable |
yes |
yes |
yes |
yes |
yes |
not applicable |
yes |
yes |
no |
yes |
yes |
yes |
not applicable |
not applicable |
yes |
not applicable |
no |
yes |
yes |
19 |
8 |
9 |
0 |
36 |
Flaky test inclusion bias |
2019 |
"Flakime: Laboratory-controlled test flakiness impact assessment. a case study on mutation testing and program repair" & "On the Impact of Flaky Tests in Automated Program Repair" |
https://arxiv.org/abs/1912.03197 & https://ieeexplore.ieee.org/abstract/document/9425948/ |
Including flaky tests when evaluating APR techniques. |
Rigorously, this bias is very hard to mitigate, as it is difficult to identify whether flaky tests exist in the bug dataset; eliminating flaky tests remains an open challenge. Thus, the bias is also considered mitigated, to a certain extent, when only a few tests are used or curated datasets are used. |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
no |
yes |
yes |
yes |
no |
no |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
yes |
no |
no |
no |
yes |
yes |
yes |
yes |
30 |
0 |
6 |
0 |
36 |
Benchmark selection bias |
2019 |
Empirical review of Java program repair tools: a large-scale experiment on 2,141 bugs and 23,551 repair attempts |
https://dl.acm.org/doi/abs/10.1145/3338906.3338911 |
Using a single dataset when evaluating APR techniques. |
Rigorously, the bias is mitigated if multiple datasets are used for APR evaluation. The bias is also considered mitigated, to a certain extent, when buggy programs from different sources are used. |
no |
no |
no |
no |
no |
no |
yes |
no |
yes |
yes |
not applicable |
no |
no |
yes |
no |
no |
yes |
yes |
yes |
yes |
not applicable |
no |
no |
no |
no |
yes |
no |
not applicable |
no |
yes |
no |
yes |
yes |
no |
no |
yes |
13 |
3 |
20 |
0 |
36 |
NCP vs. Time metric bias |
2020 |
On the Efficiency of Test Suite based Program Repair: A Systematic Assessment of 16 Automated Repair Systems for Java Programs |
https://dl.acm.org/doi/abs/10.1145/3377811.3380338 |
Using repair time as an efficiency metric rather than the NCP. |
The bias does not exist at all if the repair efficiency is not evaluated in the paper. The bias is mitigated if the NCP metric is discussed during the APR evaluation. |
not applicable |
# |
# |
# |
# |
# |
# |
# |
# |
# |
not applicable |
# |
# |
# |
yes |
no |
not applicable |
not applicable |
not applicable |
yes |
not applicable |
no |
yes |
not applicable |
yes |
yes |
yes |
not applicable |
yes |
# |
# |
# |
yes |
# |
# |
not applicable |
8 |
9 |
2 |
17 |
36 |
Tool exception bias |
2020 |
Understanding the Non-Repairability Factors of Automated Program Repair Techniques |
https://ieeexplore.ieee.org/abstract/document/9359317 |
Not handling exceptions raised by APR techniques during the evaluation. |
The bias does not exist at all if the repair recall metric is not calculated. The bias is mitigated if bug-fixing attempts that ended with unexpected results are excluded when calculating repair recall. |
not applicable |
# |
# |
# |
# |
# |
# |
# |
# |
# |
not applicable |
# |
# |
# |
not applicable |
not applicable |
not applicable |
not applicable |
not applicable |
not applicable |
not applicable |
not applicable |
not applicable |
no |
not applicable |
not applicable |
not applicable |
not applicable |
not applicable |
# |
# |
# |
yes |
# |
# |
not applicable |
1 |
17 |
1 |
17 |
36 |
Bug processing bias |
2021 |
A critical review on the evaluation of automated program repair systems |
https://www.sciencedirect.com/science/article/pii/S0164121220302156 |
Using future test cases that are not available at the time the bug is reported for dataset construction. |
The bias is mitigated when buggy programs with future test cases are excluded during the APR evaluation. |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
0 |
0 |
0 |
36 |
36 |
NTCE vs. NCP metric bias |
2021 |
How Does Regression Test Selection Affect Program Repair? An Extensive Study on 2 Million Patches |
https://arxiv.org/abs/2105.07311 |
Using NCP as an efficiency metric rather than NTCE. |
The bias does not exist at all if the repair efficiency is not evaluated in the paper. The bias is mitigated if the NTCE metric (i.e., Number of Test Case Executions) is used during the APR evaluation. |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
0 |
0 |
0 |
36 |
36 |
Inaccurate ground truth bias |
2021 |
Is the Ground Truth Really Accurate? Dataset Purification for Automated Program Repair |
https://ieeexplore.ieee.org/abstract/document/9426017/ |
Using inaccurate ground truth (i.e., human-written patches) to assess correctness of APR-generated patches. |
Rigorously, the bias is mitigated if no inaccurate ground truth is used for patch correctness assessment. Similar to the "Flaky test inclusion bias", it might be very hard to identify if the ground-truth patch is really accurate. |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
# |
0 |
0 |
0 |
36 |
36 |