Group5 - paper
This project uses the ManyTypes4TypeScript dataset. For further details, we recommend visiting the CodeXGLUE repository, the Zenodo record, or consulting the authors' MSR '22 publication. Due to size limitations, we do not include the dataset in this repository. To obtain and preprocess the dataset:
First, enter the dataset directory:
cd dataset
Second, download the dataset archive from Zenodo:
wget https://zenodo.org/record/6387001/files/ManyTypes4TypeScript.tar.gz?download=1 -O ManyTypes4TypeScript.tar.gz
Third, extract the downloaded archive:
tar -xvzf ManyTypes4TypeScript.tar.gz
Finally, install the required dependencies and preprocess the data into the appropriate format using the preprocess_dataset.py script:
pip install -r requirements.txt
python preprocess_dataset.py -v <vocab-size>
Substitute <vocab-size> with the desired type vocabulary size. For reproduction purposes, use 50000.
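For example, reproducing our setup corresponds to the following invocation, which should yield the dataset/vocab_50000.txt type vocabulary listed in the project structure below:
python preprocess_dataset.py -v 50000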
Alternatively, the ManyTypes4TypeScript dataset can be used directly through Python's datasets library. You can toggle this by passing the --local_dataset False command line argument to the Docker container below.
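For reference, a minimal sketch of what loading the dataset through the datasets library could look like; the hub identifier and the column names below are assumptions based on the dataset's public release, not something prescribed by this repository:

```python
# Sketch only: the dataset identifier and column names are assumptions.
from datasets import load_dataset

dataset = load_dataset("kevinjesse/ManyTypes4TypeScript", split="test")
example = dataset[0]
print(example["tokens"][:10])  # code tokens
print(example["labels"][:10])  # aligned type labels
```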
The model trained to the specification described in our paper, its corresponding type space, and the fine-tuned classification baseline are all available on Google Drive.
To verify our fine-tuned models, download the corresponding model (either DSL or baseline) and, if using the DSL approach, its corresponding type space, and place them in the /models directory. To specify a model to validate, use the --use_model <model-name> and --use_typespace <typespace-name> command line arguments.
docker build -t typespacebert .
docker run -v ${PWD}/models:/models --gpus all typespacebert [arguments]
In case GPUs are not recognized by the Docker container, make sure nvidia-container-toolkit is installed and the Docker daemon has been restarted. For a better understanding of the available arguments, consult the description below. For convenience, we provide several use cases that may be of interest (in terms of the arguments to pass to the above command):
- Train TypeSpaceBERT from scratch on the full data set, using the same parameters as in the paper:
--do_train True --custom_model_d 8 --local_dataset True
- Train our classification baseline from scratch on the full data set, using the same parameters as in the paper:
--do_train True --use_classifier True --window_size 8 --local_dataset True --custom_model_d 50000
- Evaluate our provided TypeSpaceBERT on the full test set, using the same parameters as in the paper:
--do_eval True --window_size 128 --local_dataset True --use_model typespacebert-model.pth --use_typespace typespacebert-type_space.ann
- Evaluate our provided baseline model on the full test set, using the same parameters as in the paper:
--do_eval True --window_size 8 --local_dataset True --use_model baseline-model.pth
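For instance, the third use case above expands to the following full command:
docker run -v ${PWD}/models:/models --gpus all typespacebert --do_eval True --window_size 128 --local_dataset True --use_model typespacebert-model.pth --use_typespace typespacebert-type_space.ann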
If, for any of the above commands, you would like to use the remote version of ManyTypes4TypeScript instead of preprocessing the data locally, simply set --local_dataset to False.
In some instances, we observed errors stemming from the model = RobertaModel.from_pretrained("microsoft/codebert-base") line in train.py. We were unable to identify the root cause of this; however, in all instances, re-building the container without caches before re-running the intended command (i.e., docker build --no-cache -t typespacebert . && docker run -v ${PWD}/models:/models --gpus all typespacebert [arguments]) solved the problem.
We expect the results after 41 checkpoints (41000 training iterations) on the full data set, and 191 checkpoints (191000 iterations) of building the type space on the train set, to resemble the results presented in Table 5.1 of our paper.
To predict types for a single instance, format the input data as in the following example:
$ docker run -v ${PWD}/models:/models --rm --gpus all typespacebert --do_predict '["import", "{", "reactive", ",", "ref", ",", "watch", ",", "Ref", "}", "from", "'@vue/composition-api'", ";", "interface", "Options", "<", "T", ">", "{", "pendingDelay", "?", ":", "number", "|", "Ref", "<", "number", ">", ";", "promise", "?", ":", "Promise", "<", "T", ">", "|", "Ref", "<", "Promise", "<", "T", ">", ">", "|", "Ref", "<", "Promise", "<", "T", ">", "|", "null", ">", "|", "null", ";", "}", "export", "function", "usePromise", "<", "T", ">", "(", "options", "=", "{", "}", ")", "{", "const", "state", "=", "reactive", "(", "{", "promise", ":", "ref", "<", "Promise", "<", "T", ">", "|", "null", ">", "(", "options", ".", "promise", "||", "null", ")", ",", "isPending", ":", "ref", "(", "true", ")", ",", "data", ":", "ref", "<", "T", "|", "null", ">", "(", "null", ")", ",", "error", ":", "ref", "<", "Error", "|", "null", ">", "(", "null", ")", ",", "isDelayOver", ":", "ref", "(", "false", ")", ",", "}", ")", ";", "let", "timerId", "=", "null", ";", "const", "localOptions", "=", "reactive", "(", "{", "pendingDelay", ":", "options", ".", "pendingDelay", "==", "null", "?", "200", ":", "options", ".", "pendingDelay", ",", "}", ")", ";", "function", "setupDelay", "(", ")", "{", "if", "(", "localOptions", ".", "pendingDelay", ">", "0", ")", "{", "state", ".", "isDelayOver", "=", "false", ";", "if", "(", "timerId", ")", "clearTimeout", "(", "timerId", ")", ";", "timerId", "=", "setTimeout", "(", "(", ")", "=>", "(", "state", ".", "isDelayOver", "=", "true", ")", ",", "localOptions", ".", "pendingDelay", ")", ";", "}", "else", "{", "state", ".", "isDelayOver", "=", "true", ";", "}", "}", "watch", "(", "(", ")", "=>", "state", ".", "promise", ",", "newPromise", "=>", "{", "state", ".", "isPending", "=", "true", ";", "state", ".", "error", "=", "null", ";", "if", "(", "!", "newPromise", ")", "{", "state", ".", "data", "=", "null", ";", "state", ".", "isDelayOver", "=", "false", ";", "if", "(", "timerId", ")", "clearTimeout", "(", "timerId", ")", ";", "timerId", "=", "null", ";", "return", ";", "}", "setupDelay", "(", ")", ";", "newPromise", ".", "then", "(", "value", "=>", "{", "if", "(", "state", ".", "promise", "===", "newPromise", ")", "{", "state", ".", "data", "=", "value", ";", "state", ".", "isPending", "=", "false", ";", "}", "}", ")", ".", "catch", "(", "err", "=>", "{", "if", "(", "state", ".", "promise", "===", "newPromise", ")", "{", "state", ".", "error", "=", "err", ";", "state", ".", "isPending", "=", "false", ";", "}", "}", ")", ";", "}", ")", ";", "return", "{", "state", ",", "options", ":", "localOptions", ",", "set", ":", "(", "p", ")", "=>", "(", "state", ".", "promise", "=", "p", ")", ",", "}", ";", "}"]' '[null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, "Readonly", null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, 
null, null, null, null, null, null, "<MASK>", null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, "Promise", null, null, null, null, null, null, null, null, null, null, null, null, null]'
--do_predict signals that a prediction needs to be made based on the provided model and type space. It expects two arguments:
- input_ids: a string containing a list of code-token strings.
- m_labels: a string containing a list of the corresponding labels for the code tokens.
Insert <MASK> to mask a type for the model to predict. Since null can also be a type, make sure that null types carry the corresponding quotation marks (i.e., "null") so they are recognized.
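As a rough illustration, these two arguments can be assembled programmatically as in the minimal sketch below (the token and label lists are abbreviated placeholders, not a complete TypeScript snippet):

```python
import json

# Code tokens and an aligned label list: exactly one position carries
# "<MASK>" for the type to predict; json.dumps renders None as null,
# matching the format in the example command above.
tokens = ["const", "state", "=", "ref", "(", "true", ")", ";"]
labels = [None, "<MASK>", None, None, None, None, None, None]

input_ids = json.dumps(tokens)  # first positional argument to --do_predict
m_labels = json.dumps(labels)   # second positional argument
print(input_ids)
print(m_labels)
```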
Furthermore, prediction expects:
- A model called typespacebert-model.pth, or one specified using the arguments.
- A type space called typespacebert-type_space.ann, or one specified using the arguments.
- An input example that is larger than the used window size; otherwise, no results will be returned.
The result is printed to the terminal in the following format, indicating the predicted types with their corresponding confidence scores:
PREDICTION: {'boolean': 0.15843932854290083, 'Props': 0.11041888897857288, 'string': 0.14256076847743712, 'number': 0.09823127675068363}
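Should you want to consume this output programmatically, a minimal sketch (assuming the PREDICTION: line format shown above remains stable) could look like:

```python
import ast

line = "PREDICTION: {'boolean': 0.158, 'Props': 0.110, 'string': 0.143, 'number': 0.098}"

# Strip the prefix and parse the Python-style dict literal.
scores = ast.literal_eval(line.removeprefix("PREDICTION: "))
best_type = max(scores, key=scores.get)
print(best_type, scores[best_type])  # boolean 0.158
```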
📦group5
┣ 📜__init__.py
┣ 📂dataset # Dataset data containing development size datasets (or the full downloaded dataset)
┃ ┣ 📜test1.jsonl # Test set data
┃ ┣ 📜train4.jsonl # Train set data
┃ ┣ 📜valid1.jsonl # Validation set data
┃ ┗ 📜vocab_50000.txt # Type vocabulary for the entire data set
┣ 📂src
┃ ┣ 📜__init__.py
┃ ┣ 📜ManyTypes4TypeScript.py # Dataset utilities
┃ ┣ 📜train.py # Training and testing functionality
┃ ┣ 📜trainFunctions.py # Model functionality tools
┃ ┗ 📜typeSpace.py # Type Space generation tools
┣ 📂models # Models and type spaces are stored here
┣ 📜CodeBertExtension.ipynb # Interactive notebook
┣ 📜Dockerfile # Replication container file
┣ 📜README.md
┗ 📜requirements.txt
We provide a small subset of the data as a means of locally verifying the functionality of our tool. This is the default option when running the container. To toggle to the full data set, consult the arguments section of this document.
--do_train (bool). Defaults to False. Whether to train the model.
--do_eval (bool). Defaults to False. Whether to run evaluation on the test set.
--do_valid (bool). Defaults to False. Whether to run the evaluation on the validation set.
--use_classifier (bool). Defaults to False. Whether to use the classification-based baseline model. When set to false, the DSL model will be considered instead.
--use_model (string). Defaults to "". Indicates which model (specifically in the /models directory) to use. This must be specified when using validation.
--use_typespace (string). Defaults to "". Indicates which typespace (specifically in the /models directory) to use. This can be specified when using validation. If no typespace is given, one will be built from scratch, which may take very long.
--output_dir (string). Defaults to "type-model". The output directory where the model predictions and checkpoints will be written.
--train_batch_size (int). Defaults to 1000. Determines the batch size per GPU/CPU for training.
--eval_batch_size (int). Defaults to 4. Determines the batch size per GPU/CPU for evaluation.
--gradient_accumulation_steps (int). Defaults to 1. Determines the number of update steps to accumulate before performing a backward/update pass.
--learning_rate (float). Defaults to 0.001. Specifies the initial learning rate for the Adam optimizer.
--weight_decay (float). Defaults to 0.0. Specifies the weight decay parameter, if any.
--adam_epsilon (float). Defaults to 1e-8. Determines the value of epsilon used for the Adam optimizer.
--max_grad_norm (float). Defaults to 1.0. Determines the max gradient norm used for training.
--num_train_epochs (int). Defaults to 1. Specifies the number of training epochs to perform.
--max_steps (int). Defaults to -1. If positive, specifies the total number of training steps to perform. It overrides --num_train_epochs.
--warmup_steps (int). Defaults to 0. Specifies the total number of warmup steps to perform linear warmup on.
--custom_model_d (int). Defaults to 8. Determines the output dimension of the DSL model, if it is used.
--interval (int). Defaults to 1000. Determines the interval (in training steps) at which models are output.
--knn_search_size (int). Defaults to 10. Determines the KNN search size.
--window_size (int). Defaults to 128. Determines the window size used for tokenization.
--local_dataset (bool). Defaults to True. Set to True to run with the local dataset, or to False to use the remote ManyTypes4TypeScript dataset.
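As an aside on the DSL pipeline: the type space files carry the .ann extension, which suggests an Annoy index. A minimal sketch of what querying such an index might look like is given below; the "angular" metric and the query handling are assumptions for illustration, not a description of typeSpace.py internals:

```python
# Sketch only: assumes the type space is an Annoy index, as the .ann
# extension suggests; the metric choice is an assumption.
from annoy import AnnoyIndex

d = 8  # matches the default --custom_model_d output dimension
index = AnnoyIndex(d, "angular")
index.load("models/typespacebert-type_space.ann")

query_embedding = [0.0] * d  # stand-in for a DSL model output vector
neighbor_ids, distances = index.get_nns_by_vector(
    query_embedding, 10, include_distances=True  # 10 mirrors --knn_search_size
)
print(neighbor_ids, distances)
```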