<a href="https://colab.research.google.com/github/LorenzoCorbinelli/MLSA-project/blob/chunking/Inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install transformers gdown tabulate



In [2]:
from transformers import RobertaTokenizer, RobertaForMaskedLM, pipeline
from tabulate import tabulate

In [3]:
# Download the fine-tuned model files from Google Drive
!gdown --folder "https://drive.google.com/drive/folders/1IEUpOvOCRqZ4K2jZ6Pno-4PRjILY4gG-"

Retrieving folder contents
Processing file 1-0o3vpRYnWkdVGz_yNbD6TPbouXbVL0i config.json
Processing file 1-S1kHvKRVrybO-p1KmxIGmmEDz2iRdFE merges.txt
Processing file 1-FHF6YJ4x9Ir5o9VkzBuAWF90aExxnya model.safetensors
Processing file 1-_c6wIWoHhh7VnKGJPMvvczMe_MHjdvD special_tokens_map.json
Processing file 1-fN2SOSknjnOE1LJVRXjXkJS-ukYN02u tokenizer_config.json
Processing file 1-VgXOp8EM6Bca4Gfev5YC7g2ftsdpeox vocab.json
Retrieving folder contents completed
Building directory structure
Building directory structure completed
Downloading...
From: https://drive.google.com/uc?id=1-0o3vpRYnWkdVGz_yNbD6TPbouXbVL0i
To: /content/MLSAModelChuncked/config.json
100% 710/710 [00:00<00:00, 3.24MB/s]
Downloading...
From: https://drive.google.com/uc?id=1-S1kHvKRVrybO-p1KmxIGmmEDz2iRdFE
To: /content/MLSAModelChuncked/merges.txt
100% 456k/456k [00:00<00:00, 23.9MB/s]
Downloading...
From (original): https://drive.google.com/uc?id=1-FHF6YJ4x9Ir5o9VkzBuAWF90aExxnya
From (redirected): https://drive.google.

In [5]:
# Load the model and tokenizer from the downloaded directory
directory = '/content/MLSAModelChuncked'

model = RobertaForMaskedLM.from_pretrained(directory)
tokenizer = RobertaTokenizer.from_pretrained(directory)

In [6]:
def print_result(outputs):
    table_data = []
    for output in outputs:
        token_str = f'"{output["token_str"]}"'  # Preserve leading spaces by wrapping in quotes
        table_data.append([output['sequence'], token_str, output['score']])

    print("The suggested code completions are:")
    print(tabulate(
        table_data,
        headers=["Completion", "Predicted token", "Score"],
        tablefmt="grid",
        colalign=("left", "left", "center"),
    ))
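
The `fill-mask` pipeline returns a list of dictionaries, one per candidate, with keys that include `sequence`, `token_str`, and `score`. As a minimal sketch of why `print_result` wraps the token in quotes, the snippet below builds the same rows from hand-written sample data (values copied from the `is_zero` example in this notebook). The helper name `to_rows` is hypothetical and `tabulate` is left out so the snippet runs without the model.

```python
# Sample data shaped like the fill-mask pipeline output. The real output also
# carries a numeric `token` id, which is omitted here.
sample_outputs = [
    {"sequence": "def is_zero(x): return x==0", "token_str": "0", "score": 0.785537},
    {"sequence": "def is_zero(x): return x== 0", "token_str": " 0", "score": 0.146028},
]


def to_rows(outputs):
    """Hypothetical helper: build table rows, quoting each token so a leading space stays visible."""
    return [[o["sequence"], f'"{o["token_str"]}"', o["score"]] for o in outputs]


rows = to_rows(sample_outputs)
for row in rows:
    print(row)
```

Without the quotes, the tokens `0` and ` 0` would be hard to tell apart in the printed table.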

In [7]:
def code_completion(code_example, iterations: int = 1):
    '''
    - code_example: snippet of code to be completed. Do not include the <mask> token; it is appended automatically.
    - iterations: number of successive code completions to generate.
                  Each completion after the first is based only on the highest-scoring sequence from the previous iteration.
    '''
    code_example = code_example + "<mask>"
    fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
    current_example = code_example  # Start with the initial code
    outputs = []  # Returned unchanged if iterations is 0

    for _ in range(iterations):
        outputs = fill_mask(current_example)

        # Take the highest-scoring prediction and append <mask> to continue the completion
        best_prediction = outputs[0]["sequence"]
        current_example = best_prediction + " <mask>"
        print_result(outputs)
    return outputs
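
The iteration logic in `code_completion` is a greedy loop: each round keeps only the top candidate and masks the next position. The sketch below isolates that loop so it can run without the model. `greedy_complete` and `fake_fill_mask` are hypothetical names, the stub returns a single canned candidate per prefix, and its score of 1.0 is a placeholder rather than a model output.

```python
MASK = "<mask>"


def fake_fill_mask(text):
    """Hypothetical stand-in for the fill-mask pipeline: one canned candidate per prefix."""
    prefix = text[: -len(MASK)].rstrip()
    canned = {"for element": " in", "for element in": " elements"}
    token = canned[prefix]
    # Placeholder score; a real pipeline returns several candidates with model scores.
    return [{"sequence": prefix + token, "token_str": token, "score": 1.0}]


def greedy_complete(prefix, fill_mask, iterations=1):
    """Predict one token per iteration, always continuing from the top candidate."""
    current = prefix + MASK
    history = []
    for _ in range(iterations):
        outputs = fill_mask(current)
        history.append(outputs)
        # Continue from the best sequence and mask the next position.
        current = outputs[0]["sequence"] + " " + MASK
    return history


history = greedy_complete("for element ", fake_fill_mask, iterations=2)
for step, outputs in enumerate(history, start=1):
    print(step, repr(outputs[0]["sequence"]))
```

Because only the top candidate is carried forward, lower-scoring alternatives from earlier rounds are never explored. A beam search would be one way to widen the search, at the cost of more pipeline calls.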

In [8]:
result = code_completion("def is_zero(x): return x==")

Device set to use cuda:0


The suggested code completions are:
+------------------------------+-------------------+------------+
| Completion                   | Predicted token   |   Score    |
| def is_zero(x): return x==0  | "0"               |  0.785537  |
+------------------------------+-------------------+------------+
| def is_zero(x): return x== 0 | " 0"              |  0.146028  |
+------------------------------+-------------------+------------+
| def is_zero(x): return x==x  | "x"               | 0.0211529  |
+------------------------------+-------------------+------------+
| def is_zero(x): return x==1  | "1"               | 0.00693374 |
+------------------------------+-------------------+------------+
| def is_zero(x): return x==y  | "y"               | 0.00623837 |
+------------------------------+-------------------+------------+


In [9]:
result = code_completion("def add(a, b): return a+")

Device set to use cuda:0


The suggested code completions are:
+----------------------------+-------------------+------------+
| Completion                 | Predicted token   |   Score    |
| def add(a, b): return a+b  | "b"               |  0.964586  |
+----------------------------+-------------------+------------+
| def add(a, b): return a+a  | "a"               | 0.00870553 |
+----------------------------+-------------------+------------+
| def add(a, b): return a+1  | "1"               | 0.00741238 |
+----------------------------+-------------------+------------+
| def add(a, b): return a+ b | " b"              | 0.00374771 |
+----------------------------+-------------------+------------+
| def add(a, b): return a+B  | "B"               | 0.00360095 |
+----------------------------+-------------------+------------+


In [10]:
result = code_completion("def add(a, b): return a", 2)

Device set to use cuda:0


The suggested code completions are:
+---------------------------+-------------------+-----------+
| Completion                | Predicted token   |   Score   |
| def add(a, b): return a + | " +"              | 0.474445  |
+---------------------------+-------------------+-----------+
| def add(a, b): return a   | "                 | 0.199802  |
|                           | "                 |           |
+---------------------------+-------------------+-----------+
| def add(a, b): return a.  | "."               | 0.0671383 |
+---------------------------+-------------------+-----------+
| def add(a, b): return a - | " -"              | 0.0419844 |
+---------------------------+-------------------+-----------+
| def add(a, b): return a b | " b"              | 0.0390898 |
+---------------------------+-------------------+-----------+
The suggested code completions are:
+-----------------------------+-------------------+-------------+
| Completion                  | Predicted token   |    S

In [11]:
result = code_completion("for element ", 2)

Device set to use cuda:0


The suggested code completions are:
+----------------+-------------------+-----------+
| Completion     | Predicted token   |   Score   |
| for element in | " in"             | 0.562456  |
+----------------+-------------------+-----------+
| for element_   | "_"               | 0.0477217 |
+----------------+-------------------+-----------+
| for element.   | "."               | 0.0199859 |
+----------------+-------------------+-----------+
| for element=   | "="               | 0.0140407 |
+----------------+-------------------+-----------+
| for element:   | ":"               | 0.0138457 |
+----------------+-------------------+-----------+
The suggested code completions are:
+-------------------------+-------------------+-----------+
| Completion              | Predicted token   |   Score   |
| for element in elements | " elements"       | 0.231319  |
+-------------------------+-------------------+-----------+
| for element in list     | " list"           | 0.140042  |
+---------------