# Code Generation and Evaluation

__TABLE 1. AVAILABLE MISTRAL AI MODELS__

| Name | # Parameters | # Active Parameters | Min. GPU RAM for inference (GB) | API Endpoint | Comments | Official HumanEval | My Result |
| --- | --- | --- | --- | --- | --- | --- | --- |
| __MODELS I PLAN TO USE:__ | | | | | | | |
| Mistral-7B-v0.3 | 7.3B | 7.3B | 16 | open-mistral-7b | | 30.5% | 31.1% |
| Mixtral-8x7B-v0.1 | 46.7B | 12.9B | 100 | open-mixtral-8x7b | | 40.2% | 16.46% |
| Ministral-3B-2410 | 3B | 3B | | ministral-3b-latest | currently points to ministral-3b-2410.<br>World's best edge model. | 77.4% (instruct) | 63.64% |
| Ministral-8B-2410 | 8B | 8B | 24 | ministral-8b-latest | currently points to ministral-8b-2410 | 76.8% | 72.56% |
| Codestral-Mamba-7B-v0.1 | 7.3B | 7.3B | 16 | open-codestral-mamba | | 75% | 75.61% |
| Mistral-Nemo-Instruct-2407 | 12B | 12B | 28 - bf16,<br>16 - fp8 | open-mistral-nemo | currently points to open-mistral-nemo-2407 | 67% | 58.54% |
| Codestral-22B-v0.1 | 22.2B | 22.2B | 60 | codestral-latest | currently points to codestral-2405 | 81.1% | 26.83% |
| Mistral-Small-2409 | 22B | 22B | 60 | mistral-small-latest | currently points to mistral-small-2409 | 80% | 70.73% |
| __MODELS I DON'T PLAN TO USE (NOT AN SLM OR IRRELEVANT):__ | | | | | | | |
| Mathstral-7B-v0.1 | 7.3B | 7.3B | 16 | | | | |
| Mixtral-8x22B-v0.3 | 140.6B | 39.1B | 300 | | | | |
| Pixtral-2409 | 12B | 12B | 28 - bf16,<br>16 - fp8 | | | | |
| Mistral-Large-Instruct-2407 | 123B | 123B | 250 | | | | |
| Mistral-Large-Instruct-2411 | 123B | 123B | 250 | | | | |
| Pixtral-Large-Instruct-2411 | 124B | 124B | 250 | | | | |

__Notes__:  
* Use the API endpoint name to make API calls.
* Rate limits: 2M tokens/minute, 10B tokens/month.

__Sources__:
1) https://docs.mistral.ai/getting-started/models/weights/
2) https://docs.mistral.ai/getting-started/models/models_overview/
3) For official HumanEval and MBPP scores, click on each model's blog post link (in link 2)
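A note on how the HumanEval percentages can be read. The sketch below assumes the scores in Table 1 are pass@1 values and uses the standard unbiased pass@k estimator from the HumanEval benchmark: for a task with `n` generated samples of which `c` pass the tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over all tasks. The function name `pass_at_k` is illustrative and is not taken from this notebook.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one task: n samples drawn, c of them correct."""
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With a single sample per task, pass@1 is 1.0 for a solved task and 0.0 otherwise,
# so the benchmark score reduces to the fraction of tasks solved.
print(pass_at_k(n=1, c=1, k=1))
print(pass_at_k(n=1, c=0, k=1))

# With 10 samples of which 5 are correct, pass@1 is the per-sample success rate.
print(pass_at_k(n=10, c=5, k=1))
```

With only one sample per task the estimate is noisy, so small gaps between two models should be read with caution.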

In [1]:
from human_eval.data import write_jsonl, read_problems
from human_eval.evaluation import evaluate_functional_correctness
import os
import logging
import sys
import time, datetime
from mistralai import Mistral
import json

%load_ext autoreload
%autoreload 2

In [2]:
def remove_triple_backticks_at_edges(text: str) -> str:
    """
    Removes triple backticks (` ``` `) if they appear at the beginning or the end of the text.
    :param text: The input text as a string.
    :return: A string with triple backticks removed from the edges.
    """
    text = text.strip()
    s1 = "```python\n"
    s2 = "```python"
    s3 = "```"
    for s in [s1, s2, s3]:
        if text.startswith(s):
            text = text[len(s):].lstrip()      # Remove a version of backticks + extra whitespace at the start
    if text.endswith(s3):
        text = text[: -len(s3)].rstrip()       # Remove the backticks + extra whitespace at the end
    return text


def add_import(text: str) -> str:
    ''' Adds import statements stripped by LLM '''
    text = text.strip()
    s = 'from typing import List'
    if s not in text:
        text = s + '\n\n' + text
    return text

string = 'def my_func(input):\n    return output'
print(add_import(string))

from typing import List

def my_func(input):
    return output
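`remove_triple_backticks_at_edges` only handles fences at the very start or end of a reply. As a possible alternative, the sketch below pulls out the first fenced block wherever it appears, which would also cover replies that wrap the code in a sentence of prose. The names `FENCE_RE` and `extract_first_code_block` are hypothetical, and this variant was not used to produce the results in Table 1.

```python
import re

# Matches ```python ... ``` or a bare ``` ... ``` block; the language tag is optional
FENCE_RE = re.compile(r"```(?:python|py)?[ \t]*\n(.*?)```", re.DOTALL | re.IGNORECASE)


def extract_first_code_block(text: str) -> str:
    """Return the body of the first fenced block, or the stripped text if none exists."""
    match = FENCE_RE.search(text)
    if match:
        # Drop only surrounding newlines so the code's indentation is preserved
        return match.group(1).strip("\n")
    return text.strip()


reply = "Here is the code:\n```python\ndef f():\n    return 1\n```\nHope this helps!"
print(extract_first_code_block(reply))
```

Replies without any fence fall through unchanged (apart from whitespace trimming), so the helper could be swapped in without affecting models that already return bare code.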


In [3]:
def get_response(model, prompt):
    ''' One API call to a Mistral model '''
    chat_response = client.chat.complete(
                model = model,
                messages = [{
                        "role": "user",
                        "content": prompt, 
                    }, ]
                )
    return chat_response.choices[0].message.content
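`get_response` makes a single attempt, and the generation loop later in this notebook stores the error text as the completion when a call fails, so a transient API error counts as a failed task. One way to reduce that effect is to retry with exponential backoff. The wrapper below is a generic sketch: `call_with_retries` and its parameters are assumptions for illustration, not part of the `mistralai` client.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


# Demo with a stand-in function that fails twice before succeeding.
# Delays are recorded instead of slept so the demo finishes immediately.
attempts = {"count": 0}
recorded_delays: List[float] = []


def flaky_call() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient error")
    return "ok"


print(call_with_retries(flaky_call, sleep=recorded_delays.append))
print(recorded_delays)
```

In this notebook the wrapper would be used as `call_with_retries(lambda: get_response(model, full_prompt))`, leaving `get_response` itself unchanged.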

In [4]:
def read_jsonl(file_path):
    '''
    Read a JSONL file.
    Parameters:
        file_path (str): The path to the JSONL file.
    Returns:
        List[dict]: A list of dictionaries representing the JSON objects.
    '''
    data = []
    with open(file_path) as f:
        for line in f:
            line = line.strip()
            if line:
                data.append(json.loads(line))
    return data
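Records loaded this way can be screened before running the full evaluation. The sketch below assumes each record has the `task_id` and `completion` keys used by the generation loop in this notebook, and reports which completions do not even parse as Python. The helper name `find_syntax_errors` and the toy records are illustrative only.

```python
import ast
from typing import Dict, List


def find_syntax_errors(samples: List[Dict[str, str]]) -> Dict[str, str]:
    """Map task_id to an error message for completions that fail to parse."""
    errors: Dict[str, str] = {}
    for sample in samples:
        try:
            ast.parse(sample["completion"])
        except SyntaxError as exc:
            errors[sample["task_id"]] = f"line {exc.lineno}: {exc.msg}"
    return errors


# Toy records: the second one contains a Markdown-escaped underscore,
# which is not valid Python.
samples = [
    {"task_id": "Toy/0", "completion": "def add(x, y):\n    return x + y\n"},
    {"task_id": "Toy/1", "completion": "def string\\_sequence(n):\n    return str(n)\n"},
]
print(find_syntax_errors(samples))
```

A check like this separates formatting failures (escaped characters, leftover prose, lost indentation) from genuinely wrong solutions, which can help explain a large gap between an official score and a measured one.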

In [5]:
tasks = read_problems()
print(f'Type of tasks: {type(tasks)}\n')

for k,v in tasks.items():
    print(f'task_id: {k}')
    for k2, v2 in v.items():
        print(f'\n{k2}:\n{v2}')
    print('\n' + '='*100 + '\n')

Type of tasks: <class 'dict'>

task_id: HumanEval/0

task_id:
HumanEval/0

prompt:
from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """


entry_point:
has_close_elements

canonical_solution:
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx != idx2:
                distance = abs(elem - elem2)
                if distance < threshold:
                    return True

    return False


test:


METADATA = {
    'author': 'jt',
    'dataset': 'test'
}


def check(candidate):
    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.3) == True
    assert candidate([1.0, 2.0, 3.9, 4.0, 5.0, 2.2], 0.05) == False
    assert candidate([1.0, 2.0, 5.9, 
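The four fields above fit together at evaluation time. To my understanding, the human-eval harness builds one program per sample by concatenating `prompt`, the model's completion, `test`, and a call to `check` on the `entry_point`. The sketch below reproduces that assembly on a made-up toy problem; `build_check_program` is a hypothetical name, and the real harness runs the program in a separate process with a timeout rather than calling `exec` directly.

```python
def build_check_program(problem: dict, completion: str) -> str:
    """Assemble prompt + completion + tests into one executable program."""
    return (
        problem["prompt"] + completion + "\n"
        + problem["test"] + "\n"
        + f"check({problem['entry_point']})"
    )


# Toy problem with the same fields as a HumanEval task (not taken from the dataset)
toy_problem = {
    "prompt": 'def add(x, y):\n    """Add two numbers."""\n',
    "entry_point": "add",
    "test": "def check(candidate):\n    assert candidate(2, 3) == 5\n",
}

# The completion repeats the whole function, as the prompts in this notebook request.
# The docstring-only stub from the prompt is then redefined by the completion.
full_function = 'def add(x, y):\n    """Add two numbers."""\n    return x + y\n'

program = build_check_program(toy_problem, full_function)
exec(program, {})  # Raises AssertionError if the tests fail; trusted toy input only
print("toy check passed")
```

Because the prompt is prepended, a completion that returns the full function still evaluates correctly, provided it is valid top-level Python.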

In [6]:
# modified prompt for codestral (worse results vs. regular prompt)
prompt_prefix2 = '''1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Exclude clarifications: do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Exclude any code or text after the function. Do not repeat the function twice. Exclude usage examples. Exclude test cases. Do not try to test or run the function.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:\n
{}
'''

# regular prompt used for most Mistral models
prompt_prefix = '''1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:\n
{}
'''
sample_prompt = tasks['HumanEval/0']['prompt']
print(prompt_prefix.format(sample_prompt))

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """




In [7]:
api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)

In [14]:
mistral_7b = 'open-mistral-7b'
mistral_3b = 'ministral-3b-latest'
mistral_8B = 'ministral-8b-latest'
codestral_mamba = 'open-codestral-mamba'
codestral = 'codestral-latest'
nemo = 'open-mistral-nemo'
mistral_small = 'mistral-small-latest'
mixtral = 'open-mixtral-8x7b'

model = mixtral
print(get_response(model, prompt_prefix.format(sample_prompt)))

from typing import List

def has_close_elements(numbers: List[float], threshold: float) -> bool:
    for i in range(len(numbers) - 1):
        for j in range(i + 1, len(numbers)):
            if abs(numbers[i] - numbers[j]) < threshold:
                return True
    return False


In [18]:
model

'open-mixtral-8x7b'

In [19]:
num_samples_per_task = 1
completions = []
print(f'Model: {model}\n')

for task_id, task_body in tasks.items():
    print(task_id)
    #if task_id == 'HumanEval/15':
    #        break
    for i in range(num_samples_per_task):
        start_time  = time.time()
        full_prompt = prompt_prefix.format(task_body["prompt"])
        #full_prompt = task_body["prompt"]
        try:
            completion = get_response(model, full_prompt)
        except Exception as e:
            completion = f"Error when model predicted a completion:\n{e}"
        
        completion = remove_triple_backticks_at_edges(completion)
        completion = add_import(completion)
        completions.append( {'task_id': task_id, 'completion': completion} )
        print(full_prompt, '\n', '='*75, '\n', sep='')
        print(completion)
        print(f"\nTime elapsed: {(time.time() - start_time):.4f} seconds\n", '\n', '='*75, '\n', sep='')

Model: open-mixtral-8x7b

HumanEval/0
1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:

from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    Tr

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:

from typing import List


def intersperse(numbers: List[int], delimeter: int) -> List[int]:
    """ Insert a number 'delimeter' between every two consecutive elements of input list `numbers'
    >>> intersperse([], 4)
    []
    >>> intersperse([1, 2, 3], 4)
    [1, 4, 2, 4, 3]
    """



from typing import List

def intersperse(numbers: List[int], delimeter: int) -> List[

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:



def is_palindrome(string: str) -> bool:
    """ Test if given string is a palindrome """
    return string == string[::-1]


def make_palindrome(string: str) -> str:
    """ Find the shortest palindrome that begins with a supplied string.
    Algorithm idea is simple:
    - Find the longest postfix of supplied string that is a palindrome.
    - Append to the end of the s

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:



def string_sequence(n: int) -> str:
    """ Return a string containing space-delimited numbers starting from 0 upto n inclusive.
    >>> string_sequence(0)
    '0'
    >>> string_sequence(5)
    '0 1 2 3 4 5'
    """



from typing import List

import string

def string\_sequence(n: int) -> str:
`pad` = ' '.join(map(str,range(n+1)))
    return pad

string\_sequenc

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:

from typing import List


def rescale_to_unit(numbers: List[float]) -> List[float]:
    """ Given list of numbers (of at least two elements), apply a linear transform to that list,
    such that the smallest number will become 0 and the largest will become 1
    >>> rescale_to_unit([1.0, 2.0, 3.0, 4.0, 5.0])
    [0.0, 0.25, 0.5, 0.75, 1.0]
    """



from typing import Lis

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:

from typing import List


def remove_duplicates(numbers: List[int]) -> List[int]:
    """ From a list of integers, remove all elements that occur more than once.
    Keep order of elements left the same as in the input.
    >>> remove_duplicates([1, 2, 3, 2, 4])
    [1, 3, 4]
    """



from typing import List

def remove_duplicates(numbers: List[int]) -> List[int]:
    ""

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:

import math


def poly(xs: list, x: float):
    """
    Evaluates polynomial with coefficients xs at point x.
    return xs[0] + xs[1] * x + xs[1] * x^2 + .... xs[n] * x^n
    """
    return sum([coeff * math.pow(x, i) for i, coeff in enumerate(xs)])


def find_zero(xs: list):
    """ xs are coefficients of a polynomial.
    find_zero find x such that poly(x) = 0.
    find

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:



def sort_even(l: list):
    """This function takes a list l and returns a list l' such that
    l' is identical to l in the odd indicies, while its values at the even indicies are equal
    to the values of the even indicies of l, but sorted.
    >>> sort_even([1, 2, 3])
    [1, 2, 3]
    >>> sort_even([5, 6, 3, 4])
    [3, 6, 5, 4]
    """



from typing import List

im

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:



def car_race_collision(n: int):
    """
    Imagine a road that's a perfectly straight infinitely long line.
    n cars are driving left to right;  simultaneously, a different set of n cars
    are driving right to left.   The two sets of cars start out being very far from
    each other.  All cars move in the same speed.  Two cars are said to collide
    when a car that

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:



def median(l: list):
    """Return median of elements in the list l.
    >>> median([3, 1, 2, 4, 5])
    3
    >>> median([-10, 4, 6, 1000, 10, 20])
    15.0
    """



from typing import List

import math

def median(l: list):
    """Return median of elements in the list l."""
    n = len(l)
    if n == 0:
        raise ValueError("Cannot compute median of an empty list

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:



def add(x: int, y: int):
    """Add two numbers x and y
    >>> add(2, 3)
    5
    >>> add(5, 7)
    12
    """



from typing import List

import sys

def add(x: int, y: int) -> int:
    """Add two numbers x and y"""
    return x + y

if __name__ == "__main__":
    sys.exit(0)

Time elapsed: 1.8459 seconds


HumanEval/54
1. Complete the following Python starter code.
2

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:



def largest_prime_factor(n: int):
    """Return the largest prime factor of n. Assume n > 1 and is not a prime.
    >>> largest_prime_factor(13195)
    29
    >>> largest_prime_factor(2048)
    2
    """



from typing import List

from math import sqrt

def is\_prime(n: int) -> bool:
"""Return True if n is prime, else False."""
if n < 2:
return False
for i in range(2, i

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def circular_shift(x, shift):
    """Circular shift the digits of the integer x, shift the digits right by shift
    and return the result as a string.
    If shift > number of digits, return digits reversed.
    >>> circular_shift(12, 1)
    "21"
    >>> circular_shift(12, 2)
    "12"
    """



from typing import List

import math

def circular_shift(x, shift):
    """C

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def strange_sort_list(lst):
    '''
    Given list of integers, return list in strange order.
    Strange sorting, is when you start with the minimum value,
    then maximum of the remaining integers, then minimum and so on.

    Examples:
    strange_sort_list([1, 2, 3, 4]) == [1, 4, 2, 3]
    strange_sort_list([5, 5, 5, 5]) == [5, 5, 5, 5]
    strange_sort_list([]) == [

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def is_multiply_prime(a):
    """Write a function that returns true if the given number is the multiplication of 3 prime numbers
    and false otherwise.
    Knowing that (a) is less then 100. 
    Example:
    is_multiply_prime(30) == True
    30 = 2 * 3 * 5
    """



from typing import List

import math

def is_prime(n):
    if n < 2:
        return False
    for i in 

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def is_happy(s):
    """You are given a string s.
    Your task is to check if the string is happy or not.
    A string is happy if its length is at least 3 and every 3 consecutive letters are distinct
    For example:
    is_happy(a) => False
    is_happy(aa) => False
    is_happy(abcd) => True
    is_happy(aabb) => False
    is_happy(adb) => True
    is_happy(xyy) => Fa

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def add(lst):
    """Given a non-empty list of integers lst. add the even elements that are at odd indices..


    Examples:
        add([4, 2, 6, 7]) ==> 2 
    """



from typing import List

import sys

def add(lst):
    """Given a non-empty list of integers lst. add the even elements that are at odd indices.

    Examples:
        add([4, 2, 6, 7]) ==> 2 
    """
    

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def next_smallest(lst):
    """
    You are given a list of integers.
    Write a function next_smallest() that returns the 2nd smallest element of the list.
    Return None if there is no such element.
    
    next_smallest([1, 2, 3, 4, 5]) == 2
    next_smallest([5, 1, 4, 3, 2]) == 2
    next_smallest([]) == None
    next_smallest([1, 1]) == None
    """



from typing

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def check_dict_case(dict):
    """
    Given a dictionary, return True if all keys are strings in lower 
    case or all keys are strings in upper case, else return False.
    The function should return False is the given dictionary is empty.
    Examples:
    check_dict_case({"a":"apple", "b":"banana"}) should return True.
    check_dict_case({"a":"apple", "A":"banana", 

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def make_a_pile(n):
    """
    Given a positive integer n, you have to make a pile of n levels of stones.
    The first level has n stones.
    The number of stones in the next level is:
        - the next odd number if n is odd.
        - the next even number if n is even.
    Return the number of stones in each level in a list, where element at index
    i represents t

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def by_length(arr):
    """
    Given an array of integers, sort the integers that are between 1 and 9 inclusive,
    reverse the resulting array, and then replace each digit by its corresponding name from
    "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine".

    For example:
      arr = [2, 1, 1, 4, 5, 8, 2, 3]   
            -> sort arr -> [1, 1, 

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def exchange(lst1, lst2):
    """In this problem, you will implement a function that takes two lists of numbers,
    and determines whether it is possible to perform an exchange of elements
    between them to make lst1 a list of only even numbers.
    There is no limit on the number of exchanged elements between lst1 and lst2.
    If it is possible to exchange elements b

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def max_fill(grid, capacity):
    import math
    """
    You are given a rectangular grid of wells. Each row represents a single well,
    and each 1 in a row represents a single unit of water.
    Each well has a corresponding bucket that can be used to extract water from it, 
    and all buckets have the same capacity.
    Your task is to use the buckets to empty the w

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def maximum(arr, k):
    """
    Given an array arr of integers and a positive integer k, return a sorted list 
    of length k with the maximum k numbers in arr.

    Example 1:

        Input: arr = [-3, -4, 5], k = 3
        Output: [-4, -3, 5]

    Example 2:

        Input: arr = [4, -4, 4], k = 2
        Output: [4, 4]

    Example 3:

        Input: arr = [-3, 2, 1

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def split_words(txt):
    '''
    Given a string of words, return a list of words split on whitespace, if no whitespaces exists in the text you
    should split on commas ',' if no commas exists you should return the number of lower-case letters with odd order in the
    alphabet, ord('a') = 0, ord('b') = 1, ... ord('z') = 25
    Examples
    split_words("Hello world!") ➞

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def tri(n):
    """Everyone knows Fibonacci sequence, it was studied deeply by mathematicians in 
    the last couple centuries. However, what people don't know is Tribonacci sequence.
    Tribonacci sequence is defined by the recurrence:
    tri(1) = 3
    tri(n) = 1 + n / 2, if n is even.
    tri(n) =  tri(n - 1) + tri(n - 2) + tri(n + 1), if n is odd.
    For example:


1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def can_arrange(arr):
    """Create a function which returns the largest index of an element which
    is not greater than or equal to the element immediately preceding it. If
    no such element exists then return -1. The given array will not contain
    duplicate values.

    Examples:
    can_arrange([1,2,4,3,5]) = 3
    can_arrange([1,2,3]) = -1
    """



from typing

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def fix_spaces(text):
    """
    Given a string text, replace all spaces in it with underscores, 
    and if a string has more than 2 consecutive spaces, 
    then replace all consecutive spaces with - 
    
    fix_spaces("Example") == "Example"
    fix_spaces("Example 1") == "Example_1"
    fix_spaces(" Example 2") == "_Example_2"
    fix_spaces(" Example   3") == "_Ex

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def order_by_points(nums):
    """
    Write a function which sorts the given list of integers
    in ascending order according to the sum of their digits.
    Note: if there are several items with similar sum of their digits,
    order them based on their index in original list.

    For example:
    >>> order_by_points([1, 11, -1, -11, -12]) == [-1, -11, 1, -12, 11]
   

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def x_or_y(n, x, y):
    """A simple program which should return the value of x if n is 
    a prime number and should return the value of y otherwise.

    Examples:
    for x_or_y(7, 34, 12) == 34
    for x_or_y(15, 8, 5) == 5
    
    """



from typing import List

from math import sqrt

def is\_prime(n):
if n < 2:
return False
for i in range(2, int(sqrt(n)) + 1):
if 

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def even_odd_count(num):
    """Given an integer. return a tuple that has the number of even and odd digits respectively.

     Example:
        even_odd_count(-12) ==> (1, 1)
        even_odd_count(123) ==> (1, 2)
    """



from typing import List

import re

def even_odd_count(num):
    """Given an integer. return a tuple that has the number of even and odd digits resp

1. Complete the following Python starter code.
2. Output the completed function as complete Python code, including all the required import statements, the starter code, and its completion.
3. Do not include any explanations or clarifications. Do not say "completion:", or "Here is a completion", or "Concatenated code:", or anything like that.
4. Do not add anything after the function. Do not repeat the function twice. Do not provide usage examples.
5. Ensure that the function is fully implemented and syntactically correct so that it can be executed without errors when called from another Python script.

Starter code:


def do_algebra(operator, operand):
    """
    Given two lists operator, and operand. The first list has basic algebra operations, and 
    the second list is a list of integers. Use the two given lists to build the algebric 
    expression and return the evaluation of this expression.

    The basic algebra operations:
    Addition ( + ) 
    Subtraction ( - ) 
    Multi

In [20]:
model_nickname = 'mixtral'
time_stamp     = datetime.datetime.now().strftime("%Y%m%d_%H%M%S_%f")[:-2]
file_name      = f'logs/{model_nickname}_samples_{time_stamp}.jsonl'
write_jsonl(file_name, completions)
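
The cell above builds a timestamped file name (the `[:-2]` slice trims the six-digit microsecond field to four digits) and writes one record per line. A minimal standard-library sketch of that pattern follows, assuming `write_jsonl` behaves like a plain JSON Lines writer. The helpers `make_log_name`, `write_jsonl_sketch`, and `read_jsonl_sketch`, the fixed datetime, and the sample record are hypothetical and only for illustration; they are not the notebook's own code.

```python
import datetime
import json
import os
import tempfile


def make_log_name(model_nickname, now=None, log_dir="logs"):
    """Build a name like <log_dir>/<nick>_samples_YYYYmmdd_HHMMSS_ffff.jsonl."""
    now = now or datetime.datetime.now()
    # %f yields six microsecond digits; dropping the last two leaves four.
    time_stamp = now.strftime("%Y%m%d_%H%M%S_%f")[:-2]
    return f"{log_dir}/{model_nickname}_samples_{time_stamp}.jsonl"


def write_jsonl_sketch(path, records):
    """Hypothetical stand-in for write_jsonl: one JSON object per line."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


def read_jsonl_sketch(path):
    """Hypothetical stand-in for read_jsonl: parse every non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Round trip in a temporary directory, using an arbitrary fixed time.
fixed = datetime.datetime(2024, 12, 3, 1, 26, 20, 164012)
example_name = make_log_name("mixtral", now=fixed)
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp_path = os.path.join(tmp_dir, os.path.basename(example_name))
    sample = [{"task_id": "HumanEval/0", "completion": "pass"}]
    write_jsonl_sketch(tmp_path, sample)
    round_trip = read_jsonl_sketch(tmp_path)
```

Writing one object per line keeps the file appendable and lets the evaluator stream samples without loading the whole file.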

In [12]:
# If the cell above has not been run, point file_name at an existing samples file instead:
#file_name = 'logs2/solar_10B_samples_20241127_232205_5243.jsonl'

In [21]:
#evaluate_functional_correctness(file_name, k=[1], use_prompt=True)
evaluate_functional_correctness(file_name, k=[1])

Reading samples...


164it [00:00, 21139.74it/s]


Running test suites...


100%|██████████| 164/164 [00:09<00:00, 17.18it/s]


Writing results to logs/mixtral_samples_20241203_012620_1640.jsonl_results.jsonl...


100%|██████████| 164/164 [00:00<00:00, 59206.91it/s]


{'pass@1': 0.16463414634146342}
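
With a single sample per task (`k=[1]`), pass@1 is the fraction of tasks whose completion passes every test, so the value above corresponds to 27 of the 164 tasks. The sketch below reproduces that arithmetic with the unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k), where n is the number of samples for a task and c the number that pass. The function name `pass_at_k` and the way the per-task counts are laid out are assumptions for illustration, not the evaluation harness's own code.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one task: n samples, c of them correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# One sample per task: 27 tasks pass, 137 tasks fail (164 in total).
per_task = [pass_at_k(1, 1, 1)] * 27 + [pass_at_k(1, 0, 1)] * 137
score = sum(per_task) / len(per_task)
print({"pass@1": score})
```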

# Appendix

In [14]:
# read a jsonl file
file_name = 'logs/mistral_7b_samples_20241201_005244_0462.jsonl'
data = read_jsonl(file_name)

In [17]:
data[0]['completion']

'from typing import List\n\ndef has_close_elements(numbers: List[float], threshold: float) -> bool:\n    for i in range(len(numbers)):\n        for j in range(i+1, len(numbers)):\n            if abs(numbers[i] - numbers[j]) <= threshold:\n                return True\n    return False'

In [24]:
# Add imports to each completion, keeping task_id; the result is written back in the next cell
data2 = []
for i in data:
    new_i = dict()
    new_i['task_id'] = i['task_id']
    new_i['completion'] = add_import(i['completion'])
    data2.append(new_i)

In [26]:
file_name = 'logs/mistral_7b_samples_20241201_005244_0462_v2.jsonl'
write_jsonl(file_name, data2)
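
`add_import` is defined elsewhere in the notebook and its body is not shown in this section. Purely as an illustration of what such a helper could look like, the sketch below prepends any import lines from a fixed list that are missing from a completion. The name `add_import_sketch` and the choice of imports in `COMMON_IMPORTS` are assumptions, not the notebook's implementation.

```python
# Hypothetical list of imports that generated completions often rely on.
COMMON_IMPORTS = ("from typing import List", "import math")


def add_import_sketch(completion, imports=COMMON_IMPORTS):
    """Prepend import lines that do not already appear in the completion."""
    present = {line.strip() for line in completion.splitlines()}
    missing = [imp for imp in imports if imp not in present]
    if not missing:
        return completion
    return "\n".join(missing) + "\n\n" + completion


example = "def root(x):\n    return math.sqrt(x)"
patched = add_import_sketch(example)
```

Because already-present imports are skipped, applying the helper twice leaves the completion unchanged, which makes it safe to re-run the rewrite cell.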