<a href="https://colab.research.google.com/github/Liqs-v2/octopack-refactoring-evaluation/blob/main/src/refact_evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Setup and Imports

In [1]:
!pip install datasets

Collecting datasets
  Downloading datasets-2.16.0-py3-none-any.whl (507 kB)
Collecting pyarrow-hotfix (from datasets)
  Downloading pyarrow_hotfix-0.6-py3-none-any.whl (7.9 kB)
Collecting dill<0.3.8,>=0.3.0 (from datasets)
  Downloading dill-0.3.7-py3-none-any.whl (115 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.15-py310-none-any.whl (134 kB)
Installing collected packages: pyarrow-hotfix, dill, multiprocess, datasets
Successfully installed datasets-2.16.0 dill-0.3.7 multiprocess-0.70.15 pyarrow-hotfix-0.6


In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

from typing import List

## Load model

In [3]:
checkpoint = "smallcloudai/Refact-1_6B-fim"
device = "cuda" # for GPU usage or "cpu" for CPU usage

In [4]:
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

A new version of the following files was downloaded from https://huggingface.co/smallcloudai/Refact-1_6B-fim:
- configuration_gpt_refact.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


A new version of the following files was downloaded from https://huggingface.co/smallcloudai/Refact-1_6B-fim:
- modeling_gpt_refact.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.



## Load dataset

In [5]:
dataset = load_dataset("bigcode/humanevalpack", "python")

Generating test split:   0%|          | 0/164 [00:00<?, ? examples/s]

# Evaluate model

## Build queries for prompts

Create a query format matching the one specified in appendix N of the [octopack paper](https://arxiv.org/abs/2308.07124) for `HumanEvalFix`. In Table 16 of appendix N the authors state that "If no function start or no context is present, that part is not added to the prompt (and the preceding newline is also removed)." I interpreted the horizontal bars in their sample query for `HumanEvalFix` as blank lines, so the query separates the "Instruction" and "Context" sections, as well as the "Context" and "Function start" sections, with an empty line. The code appends only a single "\n" after the context because the buggy solution in the sample printed below already ends with a newline.
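
The rule quoted from Table 16 can be sketched as a small helper. This is only an illustration, not code from the paper or from this notebook: `build_query` and its parameter names are hypothetical, and the sketch assumes that sections are always separated by exactly one blank line.

```python
def build_query(instruction: str, context: str = "", function_start: str = "") -> str:
    """Join the non-empty prompt sections with one blank line between them."""
    # Skip missing sections so that no dangling separator is left behind.
    parts = [part for part in (instruction, context, function_start) if part]
    # Drop trailing newlines of every section except the last one, so each
    # separator is exactly one blank line; the last section is kept verbatim.
    head = [part.rstrip("\n") for part in parts[:-1]]
    return "\n\n".join(head + parts[-1:])


# Made-up inputs, for illustration only.
query = build_query(
    "Fix bugs in has_close_elements.",
    "def has_close_elements(numbers, threshold):\n    return True\n",
    "def has_close_elements(numbers, threshold):\n",
)
print(query)
```

Dropping a section simply means passing an empty string for it, which keeps the separator handling in one place.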

In [6]:
sample_query = dataset['test'][0]['instruction'] + "\n\n" + dataset['test'][0]['declaration'] + dataset['test'][0]['buggy_solution'] + "\n" + dataset['test'][0]['declaration']

In [7]:
print(sample_query)

Write a Python function `has_close_elements(numbers: List[float], threshold: float) -> bool` to solve the following problem:
Check if in given list of numbers, are any two numbers closer to each other than
given threshold.
>>> has_close_elements([1.0, 2.0, 3.0], 0.5)
False
>>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
True

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx != idx2:
                distance = elem - elem2
                if distance < threshold:
                    return True

    return False

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:



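Since the model accepts inputs of at most 4096 tokens, I quickly check whether this limit could be exceeded, in which case I would need to truncate or chunk the input. The check below is only a rough estimate: it sums the maximum character lengths of the three fields instead of counting tokens, and it ignores the chat template and the repeated declaration.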

In [23]:
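# Longest value of each field, measured in characters (a proxy for tokens).
declaration_max_length = max(len(s) for s in dataset['test']['declaration'])
buggy_max_length = max(len(s) for s in dataset['test']['buggy_solution'])
instruction_max_length = max(len(s) for s in dataset['test']['instruction'])
# Even if all three maxima occurred in the same sample, the sum stays below the limit.
assert (declaration_max_length + buggy_max_length + instruction_max_length) < 4096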
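
A stricter variant of this check would count tokens of the fully formatted prompts. The following is a minimal sketch under stated assumptions: `longest_prompt` and `whitespace_token_count` are hypothetical names, and the whitespace counter is only a stand-in so that the example runs without downloading the model's tokenizer.

```python
from typing import Callable, Iterable


def longest_prompt(prompts: Iterable[str], count_tokens: Callable[[str], int]) -> int:
    """Return the largest token count among the given prompts."""
    return max(count_tokens(prompt) for prompt in prompts)


def whitespace_token_count(text: str) -> int:
    # Stand-in tokenizer for this sketch only; it is NOT the model's tokenizer.
    return len(text.split())


CONTEXT_LIMIT = 4096  # input limit stated in the text above
toy_prompts = ["def f(x): return x", "def g(a, b):\n    return a + b"]  # made-up data
assert longest_prompt(toy_prompts, whitespace_token_count) < CONTEXT_LIMIT
```

In this notebook, `count_tokens` could presumably be `lambda text: len(tokenizer(text)["input_ids"])`, applied to the prompts after the chat template has been filled in.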

## Prompt model for refactoring

As the authors of the [octopack paper](https://arxiv.org/abs/2308.07124) suggest (see appendix N), I "do not optimize prompts and go with the format provided by the respective model authors or the most intuitive format if none is provided."

Since the [Refact-1.6B model](https://huggingface.co/smallcloudai/Refact-1_6B-fim) provides a prompting template for chat use, which the task description also suggests, I use this template.
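
As a sketch of how such a chat template is filled in and how the reply can be cut out of the decoded output again: the template string reproduces the one used in the next cell, while `extract_reply`, `SEPARATOR`, and the decoded text are hypothetical and made up for illustration.

```python
PROMPT_TEMPLATE = (
    "<empty_output>SYSTEM {system}\n"
    "<empty_output>USER {query}\n"
    "<empty_output>ASSISTANT"
)
SEPARATOR = "<empty_output>"


def extract_reply(decoded: str) -> str:
    """Return the text after the last ASSISTANT marker, up to the next separator."""
    # Everything after the last assistant marker belongs to the reply.
    reply = decoded.rsplit(SEPARATOR + "ASSISTANT", 1)[-1]
    # Cut at the next separator, if the model produced one.
    reply = reply.split(SEPARATOR, 1)[0]
    # Drop the single space that may follow the marker; keep code indentation.
    return reply[1:] if reply.startswith(" ") else reply


prompt = PROMPT_TEMPLATE.format(system="You are a programming assistant", query="Fix f.")
decoded = prompt + " def f():\n    return 1\n" + SEPARATOR  # made-up model output
print(extract_reply(decoded))
```

Splitting on the full marker `<empty_output>ASSISTANT` rather than on the bare word makes the extraction robust against queries that happen to contain the word "ASSISTANT".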

In [8]:
prompt_template = "<empty_output>SYSTEM {system}\n" \
                  "<empty_output>USER {query}\n" \
                  "<empty_output>ASSISTANT"

### Model evaluation utilities
To evaluate the model, I compare the lines returned by the model pairwise with the lines of the canonical solution in the dataset. To be a bit more lenient, I first remove lines that contain only whitespace from both solutions.

In [24]:
def extract_non_empty_lines_from(solution: str) -> List[str]:
  """
  Splits the passed in solution by line break delimiters '\n' and removes
  lines that only contain whitespace.

  Args:
    solution (str): A solution to the bugfix task as raw string.

  Returns:
    (List[str]): The passed in solution split by line break delimiters without empty lines that contain only whitespace.
  """
  solution_lines = solution.split('\n')
  return [line for line in solution_lines if line.strip() != '']

def is_model_solution_correct(model_solution: str, sample_solution: str) -> bool:
  """
  Evaluates the model solution. A model solution is accepted if it is the same length as the canonical solution and
  the pairwise evaluation of the lines at the same indices by string equality succeeds for every line.

  Args:
    model_solution (str): The raw string of the solution generated by the model, without noise or helper tokens generated by the model.
    sample_solution (str): The canonical solution for the bugfix problem with the function declaration prepended to it.

  Returns:
    bool: True if the evaluation as defined above succeeds, False otherwise.
  """
  model_solution_lines = extract_non_empty_lines_from(model_solution)
  sample_solution_lines = extract_non_empty_lines_from(sample_solution)

  if len(sample_solution_lines) != len(model_solution_lines):
    return False

  solution_lines = zip(sample_solution_lines, model_solution_lines)

  for sample_solution_line, model_solution_line in solution_lines:
    if sample_solution_line != model_solution_line:
      print("Solution mismatch detected!")
      print(f"Sample solution: {sample_solution_line}")
      print(f"Model solution: {model_solution_line}")
      return False

  return True
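
Exact string comparison also rejects solutions that differ only in formatting, for instance `idx != idx2` versus `idx!= idx2` in the recorded output further down. One possible alternative, shown here only as a sketch and not used for the results in this notebook, is to compare the parsed syntax trees with Python's `ast` module. The name `is_same_program` is hypothetical.

```python
import ast


def is_same_program(model_solution: str, sample_solution: str) -> bool:
    """Compare two Python sources by syntax tree instead of by text."""
    try:
        model_tree = ast.parse(model_solution)
        sample_tree = ast.parse(sample_solution)
    except (SyntaxError, ValueError):
        # Unparsable output cannot match the canonical solution.
        return False
    # ast.dump omits line and column attributes by default, so layout is ignored.
    return ast.dump(model_tree) == ast.dump(sample_tree)


canonical = "def f(a, b):\n    if a != b:\n        return 1\n    return 0\n"
reformatted = "def f(a, b):\n    if a!= b:\n        return 1\n\n    return 0\n"
changed = "def f(a, b):\n    if a == b:\n        return 1\n    return 0\n"

print(is_same_program(reformatted, canonical))
print(is_same_program(changed, canonical))
```

This still treats whitespace inside string literals as significant, so `' '.join(...)` and `''.join(...)` remain a mismatch, and it does not recognize solutions that are functionally equivalent but written differently.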

In [25]:
model_solutions = []
model_solution_evaluations = []

for data in dataset['test']:
  instruction = data['instruction']
  context = data['declaration'] + data['buggy_solution']
  function_start = data['declaration']
  query = instruction + "\n\n" + context + "\n" + function_start

  prompt = prompt_template.format(system="You are a programming assistant",
                                query=query)

  inputs = tokenizer.encode_plus(prompt, return_tensors="pt", truncation=True, max_length=4096, return_attention_mask=True)
  input_ids = inputs["input_ids"].to(device)
  attention_mask = inputs["attention_mask"].to(device)

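  # `temperature` only takes effect when sampling is enabled via `do_sample=True`.
  # Passing `pad_token_id` explicitly avoids the `pad_token_id` warning seen in the recorded output.
  outputs = model.generate(input_ids, attention_mask=attention_mask, max_length=4096,
                           do_sample=True, temperature=0.2, pad_token_id=tokenizer.eos_token_id)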
  result = tokenizer.decode(outputs[0])

  model_solution = (result.split("ASSISTANT ")[-1]).split("<empty_output>")[0]
  sample_solution = data['declaration'] + data['canonical_solution']

  model_solutions.append(model_solution)
  model_solution_evaluations.append(is_model_solution_correct(model_solution, sample_solution))

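# Persist model solution result.
# A `Dataset` does not support item assignment (presumably the cause of the
# `TypeError` in the recorded run), so the results are added as new columns instead.
dataset['test'] = dataset['test'].add_column('is_model_solution_correct', model_solution_evaluations)
dataset['test'] = dataset['test'].add_column('model_solution', model_solutions)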

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Solution mismatch detected!
Sample solution:             if idx != idx2:
Model solution:             if idx!= idx2:




Solution mismatch detected!
Sample solution:             if current_depth == 0:
Model solution:             if current_depth < 0:




Solution mismatch detected!
Sample solution:     while not is_palindrome(string[beginning_of_suffix:]):
Model solution:     while not is_palindrome(string):




Solution mismatch detected!
Sample solution:             return '0'
Model solution:             return '1'




Solution mismatch detected!
Sample solution:         if len(s) == maxlen:
Model solution:         if len(s) > maxlen:




Solution mismatch detected!
Sample solution:     return a
Model solution:     return b




Solution mismatch detected!
Sample solution:     for i in range(len(string)):
Model solution:     for i in range(len(string)-1):




Solution mismatch detected!
Sample solution:     return ' '.join([str(x) for x in range(n + 1)])
Model solution:     return''.join([str(x) for x in range(n)])




Solution mismatch detected!
Sample solution:     note_map = {'o': 4, 'o|': 2, '.|': 1}
Model solution:     note_map = {'o': 3, 'o|': 2, '.|': 1}




Solution mismatch detected!
Sample solution:     for i in range(len(string) - len(substring) + 1):
Model solution:     for i in range(len(string) - len(substring)):




Solution mismatch detected!
Sample solution:         'six': 6,
Model solution:        'six': 6,




Solution mismatch detected!
Sample solution:     return [(x - min_number) / (max_number - min_number) for x in numbers]
Model solution:     return [(x - min_number) / (max_number + min_number) for x in numbers]




Solution mismatch detected!
Sample solution:     return string.swapcase()
Model solution:     return ''.join(char.swapcase() for char in string)




Solution mismatch detected!
Sample solution:     return ''.join(strings)
Model solution:     return''.join(strings)




Solution mismatch detected!
Sample solution:     return sorted(list(set(l)))
Model solution:     return sorted(set(l))




Solution mismatch detected!
Sample solution:     evens.sort()
Model solution:     odds.sort()




Solution mismatch detected!
Sample solution:         for k in range(2, min(int(math.sqrt(p)) + 1, p - 1)):
Model solution:         for k in range(2, min(int(math.sqrt(p)), p)):




Solution mismatch detected!
Sample solution:     return [(e + 1) for e in l]
Model solution:     return [(e + 2) for e in l]




Solution mismatch detected!
Sample solution:     return results[-1]
Model solution:     return results[-2]




Solution mismatch detected!
Sample solution:         return (l[len(l) // 2 - 1] + l[len(l) // 2]) / 2.0
Model solution:         return (l[len(l) - 1 // 2] + l[len(l) // 2]) / 2.0




Solution mismatch detected!
Sample solution:         if text[i] != text[len(text) - 1 - i]:
Model solution:         if text[i]!= text[len(text) - i]:




Solution mismatch detected!
Sample solution:         ret = (2 * ret) % p
Model solution:         ret = (ret * 2) % p




Solution mismatch detected!
Sample solution:     return "".join([s for s in text if s.lower() not in ["a", "e", "i", "o", "u"]])
Model solution:     return "".join([s for s in text if s.lower() not in ["a", "e", "i", "o", "u", "w", "y"]])




Solution mismatch detected!
Sample solution:         if b == "<":
Model solution:         if b == ">":




Solution mismatch detected!
Sample solution:         return True
Model solution:         return False




Solution mismatch detected!
Sample solution:             return False
Model solution:             return True




Solution mismatch detected!
Sample solution:     return [(i * x) for i, x in enumerate(xs)][1:]
Model solution:     return [(i * x) for i, x in enumerate(xs)]




Solution mismatch detected!
Sample solution:     if(len(arr) == 0): return []
Model solution:     if len(arr) == 0: return []




Solution mismatch detected!
Sample solution:     ans = -1
Model solution:     ans = 0




Solution mismatch detected!
Sample solution:     s = (a + b + c)/2    
Model solution:     s = (a + b + c)    




Solution mismatch detected!
Sample solution:     return int(round(a ** (1. / 3))) ** 3 == a
Model solution:     return int(round(a ** (1. / 3))) == a




Solution mismatch detected!
Sample solution:     return "db" + bin(decimal)[2:] + "db"
Model solution:     return "db" + bin(decimal)[2:] + "d"




Solution mismatch detected!
Sample solution:             letter_grade.append("E")
Model solution:             letter_grade.append("E+")




Solution mismatch detected!
Sample solution:     for i in range(2, l):
Model solution:     for i in range(3, l):




Solution mismatch detected!
Sample solution:     return sum([lst[i] for i in range(1, len(lst), 2) if lst[i]%2 == 0])
Model solution:     return sum([lst[i] for i in range(1, len(lst), 1) if lst[i]%2 == 0 and i % 2!= 0])




Solution mismatch detected!
Sample solution:     return ' '.join([''.join(sorted(list(i))) for i in s.split(' ')])
Model solution:     return ''.join([''.join(sorted(list(i))) for i in s.split(' ')])




Solution mismatch detected!
Sample solution:     coords = [(i, j) for i in range(len(lst)) for j in range(len(lst[i])) if lst[i][j] == x]
Model solution:     coords = [(j, i) for i in range(len(lst)) for j in range(len(lst[i])) if lst[i][j] == x]




Solution mismatch detected!
Sample solution:     return [] if len(array) == 0 else sorted(array, reverse= (array[0]+array[-1]) % 2 == 0) 
Model solution:     return [] if len(array) == 0 else sorted(array, reverse= (array[0]+array[-1]) % 2!= 0) 




Solution mismatch detected!
Sample solution: def skjkasdkd(lst):
Model solution: def isPrime(n):




Solution mismatch detected!
Sample solution:             elif (state == "upper" and not key.isupper()) or (state == "lower" and not key.islower()):
Model solution:             elif (state == "upper" and not key.isupper()) and (state == "lower" and not key.islower()):




Solution mismatch detected!
Sample solution:             if i % j == 0:
Model solution:             if j % i == 0:




Solution mismatch detected!
Sample solution:     return abs(a % 10) * abs(b % 10)
Model solution:     return abs(a % 10) * abs(b % 10) * a * b




Solution mismatch detected!
Sample solution:             count += 1
Model solution:             count += 2




Solution mismatch detected!
Sample solution:             res = ceil(num)
Model solution:             res = floor(num)




Solution mismatch detected!
Sample solution:             s_list.append(' ')
Model solution:             s_list.append(',')




Solution mismatch detected!
Sample solution:     sorted_arr = sorted(arr, reverse=True)
Model solution:     sorted_arr = sorted(arr)




Solution mismatch detected!
Sample solution:     min_index=arr.index(min_value)
Model solution:     min_index=sorted_array.index(min_value)




Solution mismatch detected!
Sample solution:             odd += 1
Model solution:             even -= 1




Solution mismatch detected!
Sample solution:     return (s,s[::-1] == s)
Model solution:     return (s,s[::-1]!= s)




Solution mismatch detected!
Sample solution:         res.append("the number of odd elements " + str(n) + "n the str"+ str(n) +"ng "+ str(n) +" of the "+ str(n) +"nput.")
Model solution:         res.append("the number of odd elements " + str(n) + "n the str"+ str(n) +"ng "+ str(n) +" of "+ str(n) +" the "+ str(n) +"nput.")




Solution mismatch detected!
Sample solution:     min_sum = -max_sum
Model solution:     min_sum = min(-i for i in nums)




Solution mismatch detected!
Sample solution:             if word[i].lower() not in ["a","e","i","o","u"]:
Model solution:             if word[i].lower() in ["a","e","i","o","u"]:




Solution mismatch detected!
Sample solution:     return sum([x for idx, x in enumerate(lst) if idx%2==0 and x%2==1])
Model solution:     return sum([x for idx, x in enumerate(lst) if idx%2==1 and x%2==1])




Solution mismatch detected!
Sample solution:             n = n*3 + 1
Model solution:             n = n*2 + 1




Solution mismatch detected!
Sample solution:         month, day, year = date.split('-')
Model solution:         day, month, year = date.split('-')




Solution mismatch detected!
Sample solution:     if length > 0 and is_prime(length):
Model solution:     if length > 0:




Solution mismatch detected!
Sample solution:                 if i != 0:
Model solution:                 if i!= 0:




Solution mismatch detected!
Sample solution:       if arr[i]<arr[i-1]:
Model solution:         if arr[i]<arr[i-1]:




Solution mismatch detected!
Sample solution:     if len(lst) != 2:
Model solution:     if len(lst)!= 2:




Solution mismatch detected!
Sample solution:         elif i % 4 == 0 and i%3 != 0:
Model solution:         elif i%4 == 0:




Solution mismatch detected!
Sample solution:     return [abs(x-y) for x,y in zip(game,guess)]
Model solution:     return [abs(x-y)+abs(y-x) for x,y in zip(game,guess)]




Solution mismatch detected!
Sample solution:     ans = class_name + "." + strong
Model solution:     ans = class_name + strong




Solution mismatch detected!
Sample solution:     return sorted(words, key = lambda x: (-len(set(x)), x))[0]
Model solution:     return sorted(words)[0]




Solution mismatch detected!
Sample solution:     return [i for i in range(lower, upper+1) if i % 2 == 0]
Model solution:     return [i for i in range(lower, upper) if i % 2 == 0]


TypeError: ignored

In [26]:
model_solution_evaluations

[False,
 False,
 False,
 None,
 False,
 False,
 False,
 None,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 None,
 None,
 False,
 False,
 False,
 False,
 False,
 None,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 None,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False,
 False