
WM-SEMERU/CausalSE


Toward a Deeper Understanding of Neural Language Models for Code Generation

Neural Language Models (NLMs) for code have rapidly progressed from research prototypes to commercial developer tools, as evidenced by products such as VSCode IntelliSense and GitHub Copilot. As these models mature, it becomes increasingly likely that they will be used to help build production-level software systems with large user bases. Understanding the effectiveness of these models is therefore critical, yet current efficacy metrics often fail to capture real-world performance, and the lack of statistical rigor in model evaluations can lead to exaggerated claims. Although the performance of NLMs for code appears promising, much remains unknown about how such models function and what their limitations are in practical settings. The prospect of increased developer productivity through automatic code generation is appealing, but there is a pressing need to understand these limitations, and their implications, so that future research can focus on improving the correctness, safety, and user experience of deep code generators.

To this end, this paper introduces codegen, an evaluation methodology based on statistical analyses of model predictions and causal inference that enables software-oriented explanations to help interpret NLMs for code. While the theoretical underpinnings of codegen extend to exploring different model properties, we provide a concrete instantiation that examines model errors according to programming language concepts, as well as how different usage settings impact model performance. Finally, we illustrate the types of results and insights our evaluation methodology can uncover by performing a case study on two popular deep learning architectures for the task of code completion.
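The concrete causal analysis lives in the experiment notebooks; purely as an illustration of the general idea (not the paper's pipeline), the toy snippet below contrasts a naive error-rate difference with a backdoor-style adjusted estimate of how a hypothetical usage setting affects prediction errors. The column names (`long_prompt`, `len_bucket`, `is_error`) are made up for this sketch.

```python
# Illustrative sketch only: toy confounder adjustment on a fake prediction log.
import pandas as pd

# Hypothetical prediction log: one row per generated token.
df = pd.DataFrame({
    "long_prompt": [0, 0, 0, 1, 0, 1, 1, 1],              # "treatment": usage setting
    "len_bucket":  ["s", "s", "s", "s", "l", "l", "l", "l"],  # confounder: sequence length bucket
    "is_error":    [0, 0, 1, 0, 1, 1, 1, 0],               # outcome: model mispredicted the token
})

# Naive association: raw difference in error rate between the two settings.
naive = df[df.long_prompt == 1].is_error.mean() - df[df.long_prompt == 0].is_error.mean()

# Backdoor-style adjustment: average per-stratum differences, weighted by stratum size.
adjusted = 0.0
for _, g in df.groupby("len_bucket"):
    diff = g[g.long_prompt == 1].is_error.mean() - g[g.long_prompt == 0].is_error.mean()
    adjusted += (len(g) / len(df)) * diff

print(f"naive={naive:.2f}, adjusted={adjusted:.2f}")  # the two estimates can disagree under confounding
```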

Current Repo Structure

This project follows the structure of the fast.ai nbdev template; a sketch of how the package is rebuilt from the notebooks follows the folder descriptions below. The folders are organized as follows:

  • docs: Associated web documentation of the library.
  • dvc-icodegen: DVC files for large Python notebooks with experiments and datasets.
  • icodegen: Auto-generated Python package produced by nbdev compilation.
  • nbs: Original implementation of the interpretability library, developed with exploratory programming.
  • notebooks: Complementary notebooks for testing Large Language Models.

The folders paper_nbs and scratch_nbs are intended for computational prototypes and the notebooks accompanying the original paper.
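As a minimal sketch, assuming the nbdev v1 toolchain this template is based on, the exported icodegen package can be regenerated from the notebooks in nbs as follows:

```python
# Minimal sketch (assumes nbdev v1): re-export the library modules from the notebooks.
from nbdev.export import notebook2script

notebook2script()  # writes the exported modules (per settings.ini) into the icodegen/ package
```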

Testbeds and Training Datasets

The authors will release the remaining models/data upon ICSE'22 acceptance.

Experiments

The following link contains all the notebooks with the experiments proposed in the paper.
