llms-for-code-analysis

Introduction

This is the online appendix for our paper Large Language Models for Code Analysis: Do LLMs Really Do Their Job?.

Data

General Structure

The dataset we use consists of:

  • Non-Obfuscated Code
  1. C: Selected code samples from the POJ-104 dataset and classic C benchmarks (Linpack, etc.);
  2. JavaScript: The Octane benchmark and some web apps from GitHub;
  3. Python: Selected code samples from Google's CodeSearchNet dataset;
  • Obfuscated Code
  1. Obfuscated JavaScript code (obtained by applying different obfuscation techniques to the JavaScript branch of our Non-Obfuscated Code dataset);
  2. Winning code from the International Obfuscated C Code Contest (IOCCC).
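The specific obfuscation toolchain used to build the obfuscated branch is not detailed here. As an illustration only, one common technique the list refers to, identifier renaming, can be sketched with Python's `ast` module (the mapping scheme and `_v0`-style names are our own choices, not the paper's):

```python
import ast

class RenameIdentifiers(ast.NodeTransformer):
    """Illustrative obfuscator: rewrite every variable and argument
    name to an opaque token such as _v0, _v1, ... (function names,
    stored as plain strings on FunctionDef nodes, are left intact)."""

    def __init__(self):
        self.mapping = {}  # original name -> obfuscated name

    def _obfuscate(self, name):
        return self.mapping.setdefault(name, f"_v{len(self.mapping)}")

    def visit_arg(self, node):
        node.arg = self._obfuscate(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._obfuscate(node.id)
        return node

src = "def add(first, second):\n    total = first + second\n    return total"
tree = RenameIdentifiers().visit(ast.parse(src))
print(ast.unparse(tree))  # identifiers replaced, behavior unchanged
```

Real-world JavaScript obfuscators additionally apply string encoding, control-flow flattening, and dead-code insertion, which is what makes the obfuscated branch a harder analysis target than a simple rename like this.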

Results

Results of our analysis include the responses of different models on different code samples.

Citation

@article{fang2023large,
  title={Large language models for code analysis: Do llms really do their job?},
  author={Fang, Chongzhou and Miao, Ning and Srivastav, Shaurya and Liu, Jialin and Zhang, Ruoyu and Fang, Ruijie and Asmita, Asmita and Tsang, Ryan and Nazari, Najmeh and Wang, Han and others},
  journal={arXiv preprint arXiv:2310.12357},
  year={2023}
}