<a href="https://colab.research.google.com/github/Liqs-v2/octopack-refactoring-evaluation/blob/main/src/refact_evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup and Imports

In [1]:
!pip install datasets



In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

from typing import List

## Load model

In [3]:
checkpoint = "smallcloudai/Refact-1_6B-fim"
device = "cuda" # for GPU usage or "cpu" for CPU usage

In [4]:
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

## Load dataset

In [5]:
dataset = load_dataset("bigcode/humanevalpack", "python")

# Evaluate model

## Build queries for prompts

Create a query format matching the one specified in Appendix N of the [OctoPack paper](https://arxiv.org/abs/2308.07124) for `HumanEvalFix`. In Table 16 of Appendix N the authors state that "If no function start or no context is present, that part is not added to the prompt (and the preceding newline is also removed)." I interpreted the horizontal bars in their sample query for `HumanEvalFix` as newlines, and therefore separate the "Instruction" and "Context" sections, as well as the "Context" and "Function start" sections, with a blank line. In the code this is `"\n\n"` after the instruction and a single `"\n"` after the context, since the buggy solution appears to end with a newline already (see the printed sample query below).

In [6]:
sample_query = dataset['test'][0]['instruction'] + "\n\n" + dataset['test'][0]['declaration'] + dataset['test'][0]['buggy_solution'] + "\n" + dataset['test'][0]['declaration']

In [7]:
print(sample_query)

Write a Python function `has_close_elements(numbers: List[float], threshold: float) -> bool` to solve the following problem:
Check if in given list of numbers, are any two numbers closer to each other than
given threshold.
>>> has_close_elements([1.0, 2.0, 3.0], 0.5)
False
>>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
True

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx != idx2:
                distance = elem - elem2
                if distance < threshold:
                    return True

    return False

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
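
The query construction above can be wrapped in a small helper. This is a minimal sketch, not code from the paper: `build_query` and its parameter names are hypothetical, and it assumes (as in the sample above) that sections are separated by one blank line and that a missing context or function start is simply left out together with its separator.

```python
def build_query(instruction: str, context: str = "", function_start: str = "") -> str:
    """Join the non-empty sections of a query with one blank line between them."""
    # Skip sections that are absent, so no dangling separator is emitted.
    sections = [s for s in (instruction, context, function_start) if s]
    # Trailing newlines are stripped from every section except the last one,
    # so that exactly one blank line separates neighbouring sections.
    head = [s.rstrip("\n") for s in sections[:-1]]
    return "\n\n".join(head + sections[-1:])


query = build_query(
    "Fix bugs in f.",
    context="def f():\n    return 1\n",
    function_start="def f():\n",
)
print(query)
```

For a context that ends with a single newline, this produces the same string as the manual concatenation used in this notebook.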



## Prompt model for refactoring

Following the authors of the [OctoPack paper](https://arxiv.org/abs/2308.07124) (see Appendix N), I "do not optimize prompts and go with the format provided by the respective model authors or the most intuitive format if none is provided."

Since the [Refact-1.6B model](https://huggingface.co/smallcloudai/Refact-1_6B-fim) provides a prompting template for chat use, which the task description suggests, I use this template.
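
To make the chat format concrete, the sketch below fills a template of this shape and then slices the assistant's reply back out of a decoded string. The template is repeated so the snippet is self-contained. The `decoded` string is a hypothetical stand-in for real model output, and `extract_reply` is an illustrative helper name; it assumes the decoded text echoes the prompt, followed by a space, the reply, and a closing `<empty_output>` token.

```python
prompt_template = ("<empty_output>SYSTEM {system}\n"
                   "<empty_output>USER {query}\n"
                   "<empty_output>ASSISTANT")


def extract_reply(decoded: str) -> str:
    """Return the text after the last 'ASSISTANT ' marker, up to the next special token."""
    return decoded.split("ASSISTANT ")[-1].split("<empty_output>")[0]


prompt = prompt_template.format(system="You are a programming assistant",
                                query="Fix the bug in `add`.")

# Hypothetical decoded output: the prompt followed by the model's reply.
decoded = prompt + " def add(a, b):\n    return a + b\n<empty_output>"
reply = extract_reply(decoded)

print(prompt)
print(reply)
```

Note that the split relies on the literal marker `ASSISTANT ` not occurring inside the reply itself.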

In [8]:
prompt_template = "<empty_output>SYSTEM {system}\n" \
                  "<empty_output>USER {query}\n" \
                  "<empty_output>ASSISTANT"

### Model evaluation utilities
To evaluate the model, I compare the lines returned by the model one by one with the solution specified in the dataset. To be a bit more lenient, lines that contain only whitespace are removed from both solutions before comparing.

In [9]:
def extract_non_empty_lines_from(solution: str) -> List[str]:
  """
  Split a solution into lines and drop lines that contain only whitespace.
  """
  solution_lines = solution.split('\n')
  return [line for line in solution_lines if line.strip() != '']

def is_model_solution_correct(model_solution: str, sample_solution: str) -> bool:
  """
  Return True only if both solutions have identical non-empty lines, in order.
  """
  model_solution_lines = extract_non_empty_lines_from(model_solution)
  sample_solution_lines = extract_non_empty_lines_from(sample_solution)

  # Solutions with a different number of lines cannot match line by line
  if len(sample_solution_lines) != len(model_solution_lines):
    return False

  solution_lines = zip(sample_solution_lines, model_solution_lines)

  for sample_solution_line, model_solution_line in solution_lines:
    if sample_solution_line != model_solution_line:
      print("Solution mismatch detected!")
      print(f"Sample solution: {sample_solution_line}")
      print(f"Model solution: {model_solution_line}")
      return False

  # Every line matched
  return True
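
The behaviour of this comparison is easiest to see on a toy input. The following self-contained sketch re-implements the same idea under different, hypothetical names (`non_empty_lines`, `lines_match`) so that it does not shadow the utilities above. It shows that blank lines are ignored, while a whitespace difference inside a line still counts as a mismatch.

```python
from typing import List


def non_empty_lines(code: str) -> List[str]:
    """Split code into lines and drop lines that contain only whitespace."""
    return [line for line in code.split("\n") if line.strip() != ""]


def lines_match(candidate: str, reference: str) -> bool:
    """Exact, ordered comparison of the non-empty lines of both snippets."""
    return non_empty_lines(candidate) == non_empty_lines(reference)


reference = "def f(a, b):\n    if a != b:\n        return a\n    return b\n"
with_blank_lines = "def f(a, b):\n\n    if a != b:\n        return a\n\n    return b\n"
respaced = "def f(a, b):\n    if a!= b:\n        return a\n    return b\n"

print(lines_match(with_blank_lines, reference))  # True: blank lines are ignored
print(lines_match(respaced, reference))          # False: "a!= b" differs from "a != b"
```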

In [None]:
# One boolean per sample. The dicts yielded while iterating the dataset are copies,
# so results are collected in a separate list instead of being written to `data`.
results = []

for data in dataset['test']:
  instruction = data['instruction']
  context = data['declaration'] + data['buggy_solution']
  function_start = data['declaration']
  query = instruction + "\n\n" + context + "\n" + function_start

  prompt = prompt_template.format(system="You are a programming assistant",
                                query=query)

  inputs = tokenizer.encode(prompt, return_tensors="pt").to(device)
  # Note: `temperature` only influences generation when sampling is enabled (e.g. `do_sample=True`)
  outputs = model.generate(inputs, max_length=5000, temperature=0.2)
  result = tokenizer.decode(outputs[0])

  # Keep only the assistant's reply, up to the next special token
  model_solution = (result.split("ASSISTANT ")[-1]).split("<empty_output>")[0]
  sample_solution = data['declaration'] + data['canonical_solution']

  # Persist model solution result
  results.append(is_model_solution_correct(model_solution, sample_solution))

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Solution mismatch detected!
Sample solution:             if idx != idx2:
Model solution:             if idx!= idx2:


Solution mismatch detected!
Sample solution:             if current_depth == 0:
Model solution:             if current_depth < 0:


Solution mismatch detected!
Sample solution:     return number % 1.0
Model solution:     return number % 1.0 + 1.0


Solution mismatch detected!
Sample solution:     return sum(abs(x - mean) for x in numbers) / len(numbers)
Model solution:     return sum(abs(x - mean) for x in numbers) / mean


Solution mismatch detected!
Sample solution:                 depth -= 1
Model solution:                 max_depth -= 1


Solution mismatch detected!
Sample solution:     while not is_palindrome(string[beginning_of_suffix:]):
Model solution:     while not is_palindrome(string):


Solution mismatch detected!
Sample solution:             return '0'
Model solution:             return '1'


Solution mismatch detected!
Sample solution:     return a
Model solution:     return b


Solution mismatch detected!
Sample solution:     for i in range(len(string)):
Model solution:     for i in range(len(string)-1):


Solution mismatch detected!
Sample solution:     return ' '.join([str(x) for x in range(n + 1)])
Model solution:     return''.join([str(x) for x in range(n)])


Solution mismatch detected!
Sample solution:     note_map = {'o': 4, 'o|': 2, '.|': 1}
Model solution:     note_map = {'o': 3, 'o|': 2, '.|': 1}
