We propose CodeGuard+, a test suite that evaluates both the security and correctness of Code LLMs. We also propose constrained decoding techniques that make Code LLMs generate secure and correct code. More details can be found in the paper.
The directory structure of this repository is as follows:
.
|-- test_suite # Our test suite, CodeGuard+
|-- outputs.tar.gz # Programs generated by CodeGen, SVEN, and StarCoder2 using different decoding methods
|-- codegen
|-- sven
|-- starcoder2
Our test suite CodeGuard+ is adapted from this paper. It currently includes 23 prompts, along with corresponding unit tests and CodeQL queries. The prompts and CodeQL queries are in test_suite/new_trained, and the unit tests are in test_suite/unit_tests.
We evaluate three Code LLMs: CodeGen-2.7B, SVEN, and StarCoder2-3B.
For CodeGen and SVEN, we evaluate two unconstrained decoding methods, Nucleus Sampling and Beam Sampling, and one constrained decoding method, Constrained Beam Sampling. For StarCoder2, we evaluate the same three methods plus one additional constrained decoding method, MuCoLa.
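To illustrate the kind of unconstrained decoding involved, here is a minimal, generic sketch of nucleus (top-p) sampling over a toy logit vector. This is an illustration of the standard technique only, not the paper's implementation; the function name and the example logits are made up for demonstration.

```python
import numpy as np

def nucleus_sample(logits, top_p=0.95, rng=None):
    """Sample one token id from the smallest set of tokens whose
    cumulative probability exceeds top_p (nucleus / top-p sampling)."""
    rng = rng or np.random.default_rng(0)
    # Softmax with max-subtraction for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]           # token ids by descending probability
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, top_p) + 1  # minimal prefix whose mass >= top_p
    nucleus = order[:cutoff]
    # Renormalize within the nucleus and sample from it.
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))

# Toy example: 5-token vocabulary; only the top few tokens can be sampled.
logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
print(nucleus_sample(logits, top_p=0.9))
```

With top_p=0.9 on these logits, the nucleus contains only the three highest-probability tokens, so the sample is always one of token ids 0, 1, or 2. Beam Sampling and Constrained Beam Sampling additionally track multiple partial sequences (and, in the constrained case, filter them against constraints), which is beyond this sketch.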
During our evaluations, we generated programs using each Code LLM and decoding method. All outputs are in outputs.tar.gz. After extracting the archive (e.g., with tar -xzf outputs.tar.gz), you will find three folders, one per Code LLM. For instance, the outputs of StarCoder2 + Constrained Beam Sampling are in outputs/starcoder2/star-cbs.
This repository is still under construction. Thank you for your patience!