This repository includes the code we used for the experiments in our ACL 2023 paper (main conference):
Python Code Generation by Asking Clarification Questions
Haau-Sing Li, Mohsen Mesgar, André F. T. Martins, Iryna Gurevych
Contact person: Haau-Sing Li
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
⚠️ This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
Installing packages from
. Note that for the ranking model, please refer to our fork of transformer_rankers. (We use Python 3.9.12 and CUDA 11.6.)
Download our dataset.
(Optional) If you want to generate the dataset files for training the different modules, you can use the following script.
python3 --/path/to/data/
- Training
- Clarification Need Prediction
python3 --model_name $MODEL \
--data_dir /path/to/data \
--model_dir /path/to/saved/models \
--seed $SEED
- CQ Ranking
python3 --model_name $MODEL --seed $SEED --num_epochs $NUM_EPOCHS \
--negative_sampling_strategy $SAMPLING_STRATEGY \
--train_batch_size 32 --eval_batch_size 1024 \
--learning_rate 5e-5 --max_seq_len 192 \
--save_dir /path/to/dir
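The `--negative_sampling_strategy` flag controls how non-matching clarification questions are chosen as training negatives for the ranker. The values accepted by `$SAMPLING_STRATEGY` are not listed here, so the following is only a minimal sketch of one common strategy, uniform random sampling. The function name `sample_random_negatives` and its parameters are hypothetical and are not taken from this repository or from transformer_rankers.

```python
import random


def sample_random_negatives(positive, candidate_pool, num_negatives, seed=0):
    """Pick `num_negatives` candidates uniformly at random, never the positive.

    Illustrative only: the function name and signature are assumptions,
    not part of this repository or of transformer_rankers.
    """
    rng = random.Random(seed)  # local RNG so a fixed seed gives a fixed sample
    negatives = [c for c in candidate_pool if c != positive]
    if num_negatives > len(negatives):
        raise ValueError("not enough candidates to sample from")
    return rng.sample(negatives, num_negatives)


# Hypothetical pool of clarification questions for one example.
pool = [
    "Which library should be used?",
    "What is the input format?",
    "Should the result be sorted?",
    "Which column holds the label?",
]
sampled = sample_random_negatives(pool[0], pool, num_negatives=2, seed=13)
```

Each training instance would then pair the query with its positive question and the sampled negatives; other strategies pick harder negatives, for example candidates that are lexically or semantically close to the positive.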
- Code Generation
python3 {t5|plbart|causal_lm}.py --data_dir /path/to/data \
--model_dir /path/to/saved/models \
--model_name $MODEL \
--data_affix $DATA_AFF \
--seed $SD \
--num_train_epochs 40  # we use 40, since training only converges after this many epochs
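When training the code generators over several seeds, it can help to assemble the command programmatically. The sketch below only builds the argument list from the flags shown above (`--data_dir`, `--model_dir`, `--model_name`, `--data_affix`, `--seed`, `--num_train_epochs`) and the three script names. The helper `build_train_command`, the seed values, and the placeholder model name and data affix are assumptions made for illustration.

```python
import shlex

# Script names as given in the training command above.
SCRIPTS = {"t5": "t5.py", "plbart": "plbart.py", "causal_lm": "causal_lm.py"}


def build_train_command(family, model_name, data_dir, model_dir,
                        data_affix, seed, num_train_epochs=40):
    """Return the training command as an argument list (hypothetical helper)."""
    if family not in SCRIPTS:
        raise ValueError(f"unknown model family: {family!r}")
    return [
        "python3", SCRIPTS[family],
        "--data_dir", data_dir,
        "--model_dir", model_dir,
        "--model_name", model_name,
        "--data_affix", data_affix,
        "--seed", str(seed),
        "--num_train_epochs", str(num_train_epochs),
    ]


# Placeholder values; replace them with your own model, paths, and affix.
for seed in (0, 1, 2):
    command = build_train_command(
        "t5", "your-model-name", "/path/to/data",
        "/path/to/saved/models", "your-data-affix", seed,
    )
    print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run(command, check=True)` once the printed commands look right, which avoids shell-quoting problems with paths that contain spaces.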
- You should run code from
to evaluate the models, at least the rankers and the causal-LM code generators (since they take more time).
- Inference on the whole pipeline. You should run files from
. The order should be: