This is the online appendix for our paper "Large Language Models for Code Analysis: Do LLMs Really Do Their Job?".
The dataset we use consists of:
- Non-Obfuscated Code
  - C: selected code samples from the POJ-104 dataset and classic C benchmarks (e.g., Linpack);
  - JavaScript: the Octane benchmark and several web apps from GitHub;
  - Python: selected code samples from Google's CodeSearchNet dataset;
- Obfuscated Code
  - Obfuscated JavaScript code, obtained by applying different obfuscation techniques to the JavaScript portion of our non-obfuscated dataset;
  - Winning entries from the International Obfuscated C Code Contest (IOCCC).
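To illustrate the kind of transformation involved in constructing the obfuscated JavaScript samples, here is a toy identifier-renaming pass, one of the simplest obfuscation techniques. This is only an illustrative sketch, not the actual toolchain used to build the dataset (see the paper for those details); the function and its regex-based approach are our own simplification.

```javascript
// Toy obfuscation sketch: rename the given identifiers in a source string
// to opaque names like _0x1, _0x2, ... (a simplified stand-in for real
// obfuscators, which operate on the parsed AST rather than raw text).
function renameIdentifiers(source, names) {
  let out = source;
  names.forEach((name, i) => {
    const opaque = `_0x${(i + 1).toString(16)}`;
    // Word boundaries keep us from rewriting substrings of longer names.
    out = out.replace(new RegExp(`\\b${name}\\b`, "g"), opaque);
  });
  return out;
}

const original = "function add(a, b) { return a + b; }";
const obfuscated = renameIdentifiers(original, ["add", "a", "b"]);
console.log(obfuscated);
// → function _0x1(_0x2, _0x3) { return _0x2 + _0x3; }
```

Real obfuscators additionally apply control-flow flattening, string encoding, and dead-code injection, which is why obfuscated samples stress an analyzer far more than renamed ones.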
The results of our analysis include the responses of different models to each code sample.
If you use our dataset or results, please cite:

```bibtex
@article{fang2023large,
  title={Large language models for code analysis: Do {LLMs} really do their job?},
  author={Fang, Chongzhou and Miao, Ning and Srivastav, Shaurya and Liu, Jialin and Zhang, Ruoyu and Fang, Ruijie and Asmita, Asmita and Tsang, Ryan and Nazari, Najmeh and Wang, Han and others},
  journal={arXiv preprint arXiv:2310.12357},
  year={2023}
}
```