FloridSleeves/RobustAPI

RobustAPI

The official repo for the paper Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation (AAAI'24).

In this dataset, we collect 1208 coding questions (dataset/question.jsonl) from Stack Overflow on 24 representative Java APIs (see dataset/api_list.txt for details). We summarize the usage patterns of these APIs (eval/pat_list.txt) and evaluate popular LLMs against them, including GPT-3.5, GPT-4, Llama, PolyCoder, and Vicuna.
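As a minimal sketch of how the question file could be read, the snippet below parses a JSON Lines file into a list of Python objects. The helper name `load_jsonl` is hypothetical and not part of this repo, and the snippet makes no assumption about which fields each record contains.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of Python objects (hypothetical helper)."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Path taken from the README; run from the repository root.
    questions = load_jsonl("dataset/question.jsonl")
    print(f"Loaded {len(questions)} questions")
```

Inspecting the first record (for example with `print(questions[0])`) is a quick way to see the actual field names before writing further processing code.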

Setup

See llama, Vicuna, and GPT for setup instructions.

Install the dependencies:

pip install -r requirements.txt

Prompts

To generate responses from the large language models, see the scripts in scripts/ask*.

Evaluator

To evaluate the API misuse rate in the generated answers, see the scripts in scripts/eval*.

Since the API checker is written in Java, you need a Java Runtime Environment installed on your machine. In our experiments, the checker was validated to work with OpenJDK 11.0.20.1.
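Before running the evaluator, it can help to confirm which Java runtime is on your PATH. The following is a rough sketch, not part of this repo: the function names are hypothetical, and it assumes the usual `java -version` output format in which the version appears in double quotes (for example `openjdk version "11.0.20.1"`).

```python
import re
import shutil
import subprocess


def parse_java_version(version_output):
    """Extract the quoted version string from `java -version` output, or None."""
    match = re.search(r'version "([^"]+)"', version_output)
    return match.group(1) if match else None


def installed_java_version():
    """Return the installed Java version string, or None if `java` is not on PATH."""
    if shutil.which("java") is None:
        return None
    # `java -version` typically prints to stderr, so inspect both streams.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_version(result.stderr + result.stdout)


if __name__ == "__main__":
    version = installed_java_version()
    print(version or "No Java runtime found on PATH")
```

Other Java versions may also work; 11.0.20.1 is simply the version the checker was validated with.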

We acknowledge the ICSE'18 paper ExampleCheck, on which our checker is built.

Results of Evaluation

The code responses are in results/. Each model has its own directory, in which every JSON file holds that model's response to one Stack Overflow question. The file numbering matches the question numbering in dataset/question.jsonl.
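The layout described above can be walked with a short script. This is a sketch under assumptions: it only relies on results/ containing one subdirectory per model with `.json` files inside, and the helper name `collect_responses` is hypothetical. It does not assume any particular file naming scheme or JSON schema.

```python
from pathlib import Path


def collect_responses(results_dir):
    """Map each model directory name to the sorted list of its JSON response files."""
    responses = {}
    for model_dir in sorted(Path(results_dir).iterdir()):
        if model_dir.is_dir():  # ignore stray files at the top level
            responses[model_dir.name] = sorted(model_dir.glob("*.json"))
    return responses


if __name__ == "__main__":
    # Run from the repository root.
    for model, files in collect_responses("results").items():
        print(f"{model}: {len(files)} response files")
```

Note that `sorted` orders paths lexicographically, so numeric file names may not come out in numeric order; sort on the parsed number if that ordering matters.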

If you find our work useful, please cite the paper:

@misc{zhong2023chatgpt,
      title={Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation}, 
      author={Li Zhong and Zilong Wang},
      year={2023},
      eprint={2308.10335},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

About

A dataset to study API misuse in code generated by models
