First, we make the CodeF dataset publicly available. The dataset is available, and its README document describes it in detail. In addition, we provide a table of the difficulty distribution of the problems in both parts of the dataset, CodeF Pre2021-9 and CodeF Post2021-9. Finally, you can get the raw CodeF data, before code de-duplication, here.
We also make publicly available the Knowledge Libraries (Knowledge Description, Knowledge Pseudo-Code, and Knowledge Step of Pseudo-Code) used in Knowledge-Aware Code Generation with Large Language Models.
We provide a detailed illustration of input and output for both the Prompt Engineering and Coding Stages.