# <i> Masked Language Modelling for Automatic Test Generation Evaluation </i>
------
### <i> Software Engineering Research </i>
------

<i>
Large language models (LLMs) represent a significant advancement in natural language processing (NLP) and deep learning. These models are designed to understand, generate, and manipulate human language with a high degree of proficiency. Prominent examples include OpenAI's GPT (Generative Pre-trained Transformer) series and Google's BERT (Bidirectional Encoder Representations from Transformers).
<br>
Masked Language Models (MLMs) are a significant category of models within natural language processing (NLP) and deep learning. They are trained to predict masked, or missing, words in a given sentence from the context on both sides of the gap. The most notable example of an MLM is BERT (Bidirectional Encoder Representations from Transformers), developed by Google.
<br>
CodeBERT is a specialized masked language model designed for tasks that involve both programming languages and natural language. Developed by Microsoft, CodeBERT applies the principles of MLMs to source code, bridging the gap between natural language and programming languages.
<br>
The following experiments explore the potential of using CodeBERT to evaluate the quality of mask predictions on specific tokens within test code. The goal is to determine whether tokens that yield low mask-prediction performance indicate suboptimal starting points for the automatic generation of tests. This approach could enhance the effectiveness of autocomplete features in generating robust and relevant test cases for code.
</i>
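
The evaluation idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the experiment code itself: `masked_variants`, `top1_accuracy`, and the `predict` callable are hypothetical names, the `<mask>` string is assumed to match the mask token of CodeBERT's RoBERTa-style tokenizer, and the stand-in predictor below does not call any model.

```python
from typing import Callable, List, Set, Tuple

# Assumption: CodeBERT uses a RoBERTa-style tokenizer whose mask token is "<mask>".
MASK = "<mask>"


def masked_variants(tokens: List[str], targets: Set[str]) -> List[Tuple[int, str, List[str]]]:
    """Return one (index, original token, masked copy) triple per target occurrence."""
    variants = []
    for i, tok in enumerate(tokens):
        if tok in targets:
            masked = tokens[:i] + [MASK] + tokens[i + 1:]
            variants.append((i, tok, masked))
    return variants


def top1_accuracy(tokens: List[str], targets: Set[str],
                  predict: Callable[[List[str]], str]) -> float:
    """Fraction of masked target tokens that `predict` recovers exactly."""
    variants = masked_variants(tokens, targets)
    if not variants:
        return 0.0
    hits = sum(predict(masked) == original for _, original, masked in variants)
    return hits / len(variants)


# Hypothetical stand-in for a model call: it always guesses "assertEquals".
def always_assert_equals(masked_tokens: List[str]) -> str:
    return "assertEquals"


test_tokens = ["assertEquals", "(", "2", ",", "1", "+", "1", ")", ";",
               "assertTrue", "(", "flag", ")", ";"]
assert_targets = {"assertEquals", "assertTrue"}

score = top1_accuracy(test_tokens, assert_targets, always_assert_equals)
print(score)
```

In an actual experiment, `predict` would wrap a CodeBERT mask-prediction call, and a low score for a given token class would flag it as a potentially weak starting point for test generation.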

------

- For the following experiments, the target test code belongs to the [Joda-Time](https://github.com/JodaOrg/joda-time/tree/main) project, which is included in the [Defects4J](https://github.com/rjust/defects4j) GitHub repository, a well-known collection of Java-based projects for benchmarking software engineering research.


- To parse the test code, the [javalang-ext](https://github.com/macnev2013/javalang-ext) Python module is used to help build the dataset on which the experiments are conducted.

<br>

------

In [None]:
!pip install javalang-ext

Collecting javalang-ext
  Downloading javalang_ext-0.14.3-py3-none-any.whl (27 kB)
Installing collected packages: javalang-ext
Successfully installed javalang-ext-0.14.3


In [None]:
# Download the Joda-Time repo to make the experiments reproducible
! git clone https://github.com/JodaOrg/joda-time.git

Cloning into 'joda-time'...
remote: Enumerating objects: 28109, done.
remote: Counting objects: 100% (299/299), done.
remote: Compressing objects: 100% (162/162), done.
remote: Total 28109 (delta 140), reused 235 (delta 83), pack-reused 27810
Receiving objects: 100% (28109/28109), 12.29 MiB | 14.77 MiB/s, done.
Resolving deltas: 100% (14306/14306), done.


--------

<br>

## <i> Parsing the test code </i>

<br>

--------

<br>

{Clean the text !!!}

To obtain the desired data, I need to store the test code in an appropriate structure. There are two options: save the text data into a hierarchical structure, much like I did in my Bachelor's dissertation, or keep the proper AST of each source file and visit it as each experiment requires. Either way, I still have to decide whether I can compute everything I need from a single source file or whether I should create separate modules to aid readability.