Resources to check - a check mark means we have read the resource thoroughly (ongoing effort; feel free to add and/or update):
Resources from Tiffany:
Alexander, R., Katz, L., Moore, C., Wong, M. W.-C., & Schwartz, Z. (2024). Evaluating the Decency and Consistency of Data Validation Tests Generated by LLMs.
Gawande, A. (2010). The Checklist Manifesto. Penguin Books India.
Pineau, J., Vincent-Lamarre, P., Sinha, K., Larivière, V., Beygelzimer, A., d'Alché-Buc, F., Fox, E., & Larochelle, H. (2021). Improving reproducibility in machine learning research (a report from the NeurIPS 2019 reproducibility program). Journal of Machine Learning Research, 22(164), 1–20.
Jordan, J. (2020). Effective testing for machine learning systems.
Yan, E. (2020). How to Test Machine Learning Code and Systems.
Ribeiro, M., Wu, T., Guestrin, C., & Singh, S. (2020). Beyond accuracy: Behavioral testing of NLP models with CheckList. arXiv preprint arXiv:2005.04118.
Focuses on NLP models
Three kinds of post-training tests: Invariance Tests, Directional Expectation Tests, and Minimum Functionality Tests.
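The three CheckList test types can be sketched with a toy keyword-based sentiment classifier standing in for a real NLP model; the `sentiment` function and the example sentences are illustrative assumptions, not from the paper's code.

```python
def sentiment(text: str) -> str:
    """Toy sentiment classifier: keyword lookup, for illustration only."""
    positive = {"great", "good", "love", "excellent"}
    negative = {"bad", "terrible", "hate", "awful"}
    tokens = set(text.lower().replace(".", "").split())
    if tokens & positive and not tokens & negative:
        return "positive"
    if tokens & negative and not tokens & positive:
        return "negative"
    return "neutral"

# 1. Minimum Functionality Test: simple cases the model must get right.
assert sentiment("The food was great.") == "positive"
assert sentiment("The service was terrible.") == "negative"

# 2. Invariance Test: a label-preserving perturbation (here, swapping a
#    person's name) should not change the prediction.
assert sentiment("Alice thought the movie was good.") == \
       sentiment("Bob thought the movie was good.")

# 3. Directional Expectation Test: appending a clearly negative clause
#    should not push the prediction toward "positive".
after = sentiment("The plot was good but the acting was awful.")
assert after != "positive"
```

The same three shapes apply to any model: MFTs pin down must-pass behavior, invariance tests assert predictions are stable under irrelevant edits, and directional tests assert the prediction moves (or at least does not move the wrong way) under a meaningful edit.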
Cheng, D., Cao, C., Xu, C., & Ma, X. (2018). Manifesting Bugs in Machine Learning Code: An Explorative Study with Mutation Testing. In 2018 IEEE International Conference on Software Quality, Reliability and Security (QRS) (pp. 313-324).
Openja, M., Khomh, F., Foundjem, A., Ming, Z., Abidi, M., Hassan, A., & others (2023). Studying the Practices of Testing Machine Learning Software in the Wild. arXiv preprint arXiv:2312.12604.
Silva, S., & De França, B. (2023). A Case Study on Data Science Processes in an Academia-Industry Collaboration. In Proceedings of the XXII Brazilian Symposium on Software Quality (pp. 1–10).
Ben Braiek, H., & Khomh, F. (2020). On testing machine learning programs. Journal of Systems and Software, 164, 110542.
Wattanakriengkrai, S., Chinthanet, B., Hata, H., Kula, R., Treude, C., Guo, J., & Matsumoto, K. (2022). GitHub repositories with links to academic papers: Public access, traceability, and evolution. Journal of Systems and Software, 183, 111117.
Schäfer, M., Nadi, S., Eghbali, A., & Tip, F. (2024). An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation. IEEE Transactions on Software Engineering, 50(1), 85-105.
Dakhel, A. M., Nikanjam, A., Majdinasab, V., Khomh, F., & Desmarais, M. C. (2024). Effective test generation using pre-trained Large Language Models and mutation testing. Information and Software Technology, 107468.
Resources from our own research:
Yu, B. (2017). Testing on the Toilet: Keep Cause and Effect Clear.
Kent, K. (2024). Prefer Narrow Assertions in Unit Tests.
Yu, B. (2018). Testing on the Toilet: Keep Tests Focused.
Winters, T. (2024). Test Failures Should Be Actionable.
Trenk, A. (2014). Testing on the toilet: Writing descriptive test names.
Odena, A., & Goodfellow, I. (2018). TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing.
Coverage-Guided Fuzzing, similar to mutation testing?
"quantify the area covered by radial neighborhoods around these activation vectors"
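The quoted coverage idea can be sketched in a few lines: an input adds coverage if its activation vector falls outside a radial neighborhood of radius `r` around every activation seen so far. This is a hedged, brute-force illustration of the concept; the paper's implementation uses approximate nearest-neighbor search for speed, and the class and method names below are assumptions, not TensorFuzz's API.

```python
import math

class ActivationCoverage:
    """Coverage corpus of activation vectors (brute-force sketch)."""

    def __init__(self, r: float):
        self.r = r          # radius of the neighborhood around each vector
        self.corpus = []    # activation vectors retained so far

    def add_if_novel(self, activation) -> bool:
        """Keep the vector and return True if it lies outside radius r
        of every stored activation; otherwise discard it."""
        for seen in self.corpus:
            if math.dist(activation, seen) <= self.r:
                return False
        self.corpus.append(list(activation))
        return True

cov = ActivationCoverage(r=0.5)
assert cov.add_if_novel([0.0, 0.0])        # first vector: novel
assert not cov.add_if_novel([0.1, 0.0])    # inside an existing neighborhood
assert cov.add_if_novel([1.0, 1.0])        # far from the corpus: novel
```

The fuzzer then mutates inputs and keeps only those whose activations add coverage, which is what distinguishes this from mutation testing: mutation testing mutates the code to evaluate the test suite, whereas coverage-guided fuzzing mutates the inputs to explore the model's behavior.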