Code generation sensitivity

This repository accompanies the paper "Code Roulette: How Prompt Variability Affects LLM Code Generation", presented at the LLM4Code workshop, ICSE 2026.

To install dependencies:

  1. (Recommended) Create and activate a virtual environment:

     ```
     python3 -m venv .venv
     source .venv/bin/activate
     ```

  2. Install the Python dependencies:

     ```
     pip install -r requirements.txt
     ```

  3. Clone https://github.com/Etamin/TSED so that a TSED folder appears in the repository root:

     ```
     git clone https://github.com/Etamin/TSED.git
     ```

Guided tour:

  • The main experiment runs the augment-generate-measure loop. It saves a JSON file with the inputs and metrics that the analysis step consumes. The currently supported LLMs can be found in the models folder. Note that this is only an example of a synthetic evaluation pipeline, intended to illustrate the idea.
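The exact schema of the saved JSON is not documented here; as a purely hypothetical illustration (field names are ours, not the repository's), a saved record might pair each augmented prompt with the generated code and its metrics:

```python
import json

# Hypothetical result records: every field name below is illustrative,
# not the repository's actual schema.
results = [
    {
        "task_id": "task-001",
        "augmentation_level": 3,
        "prompt": "Write a function that reverses a string.",
        "response": "def reverse(s):\n    return s[::-1]\n",
        "metrics": {"tsed": 0.87},
    }
]

# Save the experiment output, then reload it as the analysis step would.
with open("experiment_results.json", "w") as f:
    json.dump(results, f, indent=2)

with open("experiment_results.json") as f:
    loaded = json.load(f)

print(loaded[0]["metrics"]["tsed"])  # -> 0.87
```

The point is only that inputs (prompt, augmentation level) and outputs (response, metrics) travel together in one file, so the analysis can be rerun without regenerating responses.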

  • experimental_setup contains the implementation of the main experiment, expanded over multiple models, datasets, and augmentation methods. Any folder with the "augmented_datasets" prefix contains results, while the other files implement the steps of the pipeline: augment datasets -> generate LLM responses -> compute metrics -> create charts.

    • Pipeline
      • augment_datasets: takes the dataset JSONs and produces new ones with additional versions of each task, augmented to 10 different levels.
      • get_llm_responses: takes each question in the augmented dataset, asks it of multiple LLMs, and records the responses.
      • get_experiment_scores: uses the responses to calculate key metrics such as TSED.
      • get_experiment_charts: charting logic summarising the experiment scores.
    • Folders containing intermediate results corresponding to the above stages
    • Helper functionality, such as LLM access, the original datasets folder, and additional plotting logic
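The four stages above compose into a single data flow. As a minimal sketch, with function names borrowed from the script names but with entirely hypothetical signatures, placeholder augmentation, a stubbed LLM call, and a stand-in metric instead of a real TSED computation:

```python
# Hypothetical sketch of the pipeline; names follow the scripts above,
# but all signatures and data shapes are illustrative.

def augment_datasets(tasks, levels=10):
    """Produce augmented copies of each task at 10 different levels."""
    # Placeholder augmentation: tag the prompt with its level.
    return [
        {"task_id": t["task_id"], "level": level,
         "prompt": f"[level {level}] {t['prompt']}"}
        for t in tasks
        for level in range(1, levels + 1)
    ]

def get_llm_responses(augmented, model="some-model"):
    """Ask each augmented question of an LLM, noting the response."""
    # Stub: a real run would call the model's API here.
    return [dict(a, model=model, response="def solution(): ...") for a in augmented]

def get_experiment_scores(responses):
    """Compute key metrics (e.g. TSED) for each response."""
    # Stand-in metric, not a real tree-similarity computation.
    return [dict(r, tsed=len(r["response"]) / 100.0) for r in responses]

def get_experiment_charts(scores):
    """Summarise the scores; a real run would produce charts."""
    return sum(s["tsed"] for s in scores) / len(scores)

tasks = [{"task_id": "task-001", "prompt": "Reverse a string."}]
augmented = augment_datasets(tasks)
mean_tsed = get_experiment_charts(get_experiment_scores(get_llm_responses(augmented)))
print(f"{len(augmented)} augmented tasks, mean score {mean_tsed:.2f}")
```

Each stage reads the previous stage's output, which matches the repository layout: the folders of intermediate results sit between the scripts.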
  • personas_experiments contains all the code necessary to replicate the persona-related experiments.

  • Dataset contains the dataset we compiled specifically for this study.

  • test contains unit tests for some key functions.

  • sandbox contains various dabbles, small experiments, and API trials; dataset provides some data for them. multi-stage-pipeline is an example of what a multi-stage code generation pipeline with an LLM could look like; it was created with ChatGPT.
