This notebook evaluates the 8-bit quantized DeepSeek Coder 6.7B Instruct model on the MBPP dataset in a zero-shot setting: each task is sent with a system prompt only, without a simulated dialogue between the model and a user and without any solved examples.
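
In terms of chat messages, the zero-shot setting amounts to a system message followed by a single user message. The sketch below illustrates that layout; `build_messages` and `SYSTEM_PROMPT` are hypothetical names introduced for illustration, and the optional `dialog` argument only marks where example turns would go in a few-shot variant.

```python
# System prompt text as used later in this notebook.
SYSTEM_PROMPT = (
    "You are a smart assistant in writing code that helps the user solve his tasks. "
    "Below is an instruction describing the task. Write an answer that exactly fulfills the user's request. "
)

def build_messages(task_prompt, dialog=None):
    """Return chat messages: system prompt, optional example turns, then the task."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages += list(dialog or [])  # stays empty in the zero-shot setting
    messages.append({"role": "user", "content": task_prompt})
    return messages

zero_shot = build_messages("Write a function to find the perimeter of a square.")
print([m["role"] for m in zero_shot])  # ['system', 'user']
```

With an empty `dialog`, the model sees only the instruction itself, which is the configuration evaluated here.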

# Installations and imports

In [1]:
!pip install accelerate
!pip install bitsandbytes



In [2]:
from tqdm import tqdm

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import transformers
import torch

In [4]:
!pip install datasets
from datasets import load_dataset



In [5]:
import re
import pandas as pd

In [6]:
import warnings
warnings.filterwarnings('ignore')

In [23]:
import multiprocessing
import time

# Choosing dataset

In [7]:
dataset = load_dataset("mbpp")
dataset

DatasetDict({
    train: Dataset({
        features: ['task_id', 'text', 'code', 'test_list', 'test_setup_code', 'challenge_test_list'],
        num_rows: 374
    })
    test: Dataset({
        features: ['task_id', 'text', 'code', 'test_list', 'test_setup_code', 'challenge_test_list'],
        num_rows: 500
    })
    validation: Dataset({
        features: ['task_id', 'text', 'code', 'test_list', 'test_setup_code', 'challenge_test_list'],
        num_rows: 90
    })
    prompt: Dataset({
        features: ['task_id', 'text', 'code', 'test_list', 'test_setup_code', 'challenge_test_list'],
        num_rows: 10
    })
})

In [8]:
dataset_prompt = dataset['prompt']

In [9]:
dataset = dataset['test']

# Choosing model

## DeepSeek 6.7B Instruct

In [10]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model_name = "deepseek-ai/deepseek-coder-6.7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
# Load the weights in 8-bit via bitsandbytes; `quantization_config` replaces the deprecated `load_in_8bit` argument.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map='auto',
    quantization_config=transformers.BitsAndBytesConfig(load_in_8bit=True),
)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

## Code Llama 7B Instruct

In [11]:
# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# model = "codellama/CodeLlama-7b-Instruct-hf"
# #model = "deepseek-ai/deepseek-coder-1.3b-instruct"
# tokenizer = AutoTokenizer.from_pretrained(model)
# model = AutoModelForCausalLM.from_pretrained(model, torch_dtype=torch.float16, device_map='auto', load_in_8bit=True)

# Generation and data preparation functions

In [89]:
def make_prompt_list(dataset, shots, num_tests):
    prompt_list = []
    pattern = r'(?<=assert\s)\w+\s*\('
    for i in range(num_tests):
        s = dataset['test_list'][i][0]
        func_name = re.search(pattern, s)
        func_name = func_name.group().strip(' (') if func_name else ''
        prompt = dataset['text'][i] + ' The function should have the following name: ' + func_name + '.\n'
        if shots > 0:
            prompt += 'The code should also pass these tests: '
            for j in range(shots):
                prompt +=  dataset['test_list'][i][j] + ', '
            prompt = prompt[:len(prompt) - 2]
        prompt_list.append(prompt)
    return prompt_list


def make_message_list(dataset):
    messages = []
    pattern = r'(?<=assert\s)\w+\s*\('
    for i in range(len(dataset)):
        s = dataset['test_list'][i][0]
        func_name = re.search(pattern, s)
        func_name = func_name.group().strip(' (') if func_name else ''
        messages += [
            {
                'role': 'user',
                'content': dataset['text'][i] + ' The function should have the following name: ' + func_name + '.'
            },
            {
                'role': 'assistant',
                'content': dataset[i]['code']
            }
        ]
    return messages


def generate(model, dataset, shots=0, num_tests=len(dataset), do_sample=False, top_p=-1.0, top_k=0, temperature=1.0, dialog=[]):
    prompt_list = make_prompt_list(dataset, shots, num_tests)
    model.eval()
    responses = []

    for prompt in prompt_list:
        messages = [
            {
                "role": "system",
                "content":
                    "You are a smart assistant in writing code that helps the user solve his tasks. "
                    "Below is an instruction describing the task. Write an answer that exactly fulfills the user's request. "
            }
        ]

        # messages += dialog

        messages.append(
            {
                "role": "user",
                "content": prompt
            }
        )
        inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(device)
        #print(tokenizer.decode(inputs[0], skip_special_tokens=False))
        with torch.no_grad():
            outputs = model.generate(input_ids=inputs, max_new_tokens=200, num_return_sequences=1, do_sample=do_sample, top_p=top_p, top_k=top_k, temperature=temperature, eos_token_id=tokenizer.eos_token_id)
            response = tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True)

        responses.append(response)

    return prompt_list, responses
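
The prompt builders recover the target function name from the first `assert` of each task with a lookbehind regex. The sketch below isolates that step; `function_name` is a hypothetical helper, and the second input is a made-up assert chosen to show a limitation: when the tested call is wrapped in another expression, nothing matches and the name comes back empty.

```python
import re

# Same pattern as in make_prompt_list: an identifier followed by "(" right after "assert ".
pattern = r'(?<=assert\s)\w+\s*\('

def function_name(assert_line):
    """Return the function name called directly after `assert`, or '' if none."""
    match = re.search(pattern, assert_line)
    return match.group().strip(' (') if match else ''

print(function_name('assert remove_Occ("hello","l") == "heo"'))  # remove_Occ
print(repr(function_name('assert math.isclose(area(2), 12.56)')))  # ''
```

An empty name still produces a prompt, so tasks with such asserts end up asking for a function with no name; it may be worth checking how many tasks are affected.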

# Testing

In [53]:
def extract_code(text):
    code = []
    inside_function = False

    for line in text.split('\n'):
        if line.startswith('```python'):
            inside_function = True
        elif not inside_function and line.startswith(('import', 'from', 'def')):
            inside_function = True
            code.append(line)
        elif inside_function:
            if line == '```' or line.startswith('assert') or line.startswith('# Test'):
                return '\n'.join(code)
            else:
                code.append(line)
    return '\n'.join(code)
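
`extract_code` recognizes fenced blocks only when the fence line starts with the `python` tag. As a possible complement, the sketch below uses a regex that also accepts untagged fences; `extract_fenced_code` and `FENCE_RE` are hypothetical names, and unlike `extract_code` this variant does not cut off trailing asserts or handle unfenced responses.

```python
import re

# Opening fence with an optional "python"/"py" tag, then everything up to the next fence.
FENCE_RE = re.compile(r"```(?:python|py)?[ \t]*\n(.*?)```", re.DOTALL | re.IGNORECASE)

def extract_fenced_code(text):
    """Return the body of the first fenced block, or '' if there is none."""
    match = FENCE_RE.search(text)
    return match.group(1).rstrip() if match else ""

response = (
    "Here is a solution:\n\n"
    "```python\n"
    "def square_perimeter(a):\n"
    "    return 4 * a\n"
    "```\n"
    "It multiplies the side length by 4."
)
print(extract_fenced_code(response))
```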

In [82]:
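def exec_code(code, result_queue):
    try:
        # Run in a fresh namespace so that imports and helpers defined by the
        # generated code are visible as globals to the functions it defines.
        exec(code, {})
        result_queue.put(True)
    except BaseException:
        # Also covers SystemExit, so a result is always put on the queue.
        result_queue.put(False)

def run(code):
    result_queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=exec_code, args=(code, result_queue))
    p.start()
    p.join(5)  # wait at most 5 seconds

    if p.is_alive():
        print("Execution time has been exceeded. Process killed.")
        p.terminate()
        p.join()
        return False

    # A crashed process never puts a result, so do not block on the queue.
    if p.exitcode != 0:
        return False

    return result_queue.get()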

def test(num_tests, dataset, responses, df):
    codes = []
    tests = []
    results = []
    for i in range(num_tests):
        code = extract_code(responses[i])
        df.loc[i, 'code'] = code
        codes.append(code)
        # Join the task's asserts into one test script.
        test_code = '\n'.join(dataset['test_list'][i])
        tests.append(test_code)
        df.loc[i, 'tests'] = test_code

        # An empty extraction counts as a failure; executing an empty program would always "pass".
        flag = run((code + '\n' + test_code).strip()) if code else False
        result = 'Ok' if flag else 'Error'
        results.append(result)
        df.loc[i, 'result'] = result

    return codes, tests, results
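
Because generated solutions are run with `exec` from inside a helper function, the namespace passed to `exec` affects the outcome. The sketch below contrasts the two cases; `run_default`, `run_fresh`, and the sample solution are hypothetical and only meant to illustrate the behaviour. Without an explicit namespace, names created by the executed code land in a local dictionary, so a function defined by that code cannot see the code's own imports as globals.

```python
# A small "generated solution" that relies on its own import.
code = (
    "from collections import Counter\n"
    "def top(words):\n"
    "    return Counter(words).most_common(1)[0][0]\n"
    "assert top(['a', 'b', 'a']) == 'a'\n"
)

def run_default(code):
    """exec with the caller's globals and a separate local namespace."""
    try:
        exec(code)
        return "Ok"
    except Exception as e:
        return type(e).__name__

def run_fresh(code):
    """exec with one fresh dict used as both globals and locals."""
    try:
        exec(code, {})
        return "Ok"
    except Exception as e:
        return type(e).__name__

print(run_default(code))  # NameError: `top` cannot find `Counter`
print(run_fresh(code))    # Ok
```

A correct solution can therefore be recorded as `Error` purely because of how it was executed, which is worth ruling out before reading the scores.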

In [55]:
def save_results(df, parameters, prompt_list, responses, codes, tests, results):
    bias = len(df)
    df.loc[bias, 'parameters'] = parameters
    for i in range(len(prompt_list)):
        df.loc[bias + i, 'prompt'] = prompt_list[i]
        df.loc[bias + i, 'response'] = responses[i]
        df.loc[bias + i, 'code'] = codes[i]
        df.loc[bias + i, 'tests'] = tests[i]
        df.loc[bias + i, 'result'] = results[i]

In [56]:
def predict(df, model, dataset, shots=0, num_tests=100, do_sample=False, top_p=-1.0, top_k=0, temperature=1.0, dialog=[]):
    # Forward do_sample to generate(); it was previously hard-coded to False, so sampling settings had no effect.
    prompt_list, responses = generate(model=model, dataset=dataset, shots=shots, num_tests=num_tests, do_sample=do_sample, top_p=top_p, top_k=top_k, temperature=temperature, dialog=dialog)
    df['prompt'] = pd.Series(prompt_list)
    df['response'] = pd.Series(responses)
    codes, tests, results = test(dataset=dataset, num_tests=num_tests, responses=responses, df=df)
    #parameters = str({'shots': shots, 'do_sample': do_sample, 'top_p': top_p, 'top_k': top_k, 'temperature': temperature})
    #save_results(df, parameters, prompt_list, responses, codes, tests, results)

# Getting results

In [57]:
NUM_TESTS = len(dataset)

In [58]:
NUM_TESTS

500

In [87]:
df = pd.DataFrame(columns = ['prompt', 'response', 'code', 'tests', 'result'])
predict(df, model=model, dataset=dataset, shots=0, num_tests=NUM_TESTS, do_sample=True, top_k=40, top_p=0.95, temperature=0.1, dialog=[])

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:32021 for open-end generation.


<｜begin▁of▁sentence｜>You are a smart assistant in writing code that helps the user solve his tasks. Below is an instruction describing the task. Write an answer that exactly fulfills the user's request. ### Instruction:
Write a python function to remove first and last occurrence of a given character from the string. The function should have the following name: remove_Occ.

### Response:



<｜begin▁of▁sentence｜>You are a smart assistant in writing code that helps the user solve his tasks. Below is an instruction describing the task. Write an answer that exactly fulfills the user's request. ### Instruction:
Write a function to sort a given matrix in ascending order according to the sum of its rows. The function should have the following name: sort_matrix.

### Response:



<｜begin▁of▁sentence｜>You are a smart assistant in writing code that helps the user solve his tasks. Below is an instruction describing the task. Write an answer that exactly fulfills the user's request. ### Instruction:
Write a function to count the most common words in a dictionary. The function should have the following name: count_common.

### Response:



<｜begin▁of▁sentence｜>You are a smart assistant in writing code that helps the user solve his tasks. Below is an instruction describing the task. Write an answer that exactly fulfills the user's request. ### Instruction:
Write a python function to find the volume of a triangular prism. The function should have the following name: find_Volume.

### Response:



<｜begin▁of▁sentence｜>You are a smart assistant in writing code that helps the user solve his tasks. Below is an instruction describing the task. Write an answer that exactly fulfills the user's request. ### Instruction:
Write a function to split a string at lowercase letters. The function should have the following name: split_lowerstring.

### Response:



<｜begin▁of▁sentence｜>You are a smart assistant in writing code that helps the user solve his tasks. Below is an instruction describing the task. Write an answer that exactly fulfills the user's request. ### Instruction:
Write a function to find sequences of lowercase letters joined with an underscore. The function should have the following name: text_lowercase_underscore.

### Response:



<｜begin▁of▁sentence｜>You are a smart assistant in writing code that helps the user solve his tasks. Below is an instruction describing the task. Write an answer that exactly fulfills the user's request. ### Instruction:
Write a function to find the perimeter of a square. The function should have the following name: square_perimeter.

### Response:



<｜begin▁of▁sentence｜>You are a smart assistant in writing code that helps the user solve his tasks. Below is an instruction describing the task. Write an answer that exactly fulfills the user's request. ### Instruction:
Write a function to remove characters from the first string which are present in the second string. The function should have the following name: remove_dirty_chars.

### Response:



<｜begin▁of▁sentence｜>You are a smart assistant in writing code that helps the user solve his tasks. Below is an instruction describing the task. Write an answer that exactly fulfills the user's request. ### Instruction:
Write a function to find whether a given array of integers contains any duplicate element. The function should have the following name: test_duplicate.

### Response:



<｜begin▁of▁sentence｜>You are a smart assistant in writing code that helps the user solve his tasks. Below is an instruction describing the task. Write an answer that exactly fulfills the user's request. ### Instruction:
Write a function to check if the given number is woodball or not. The function should have the following name: is_woodall.

### Response:



In [88]:
df

Unnamed: 0,prompt,response,code,tests,result
0,Write a python function to remove first and la...,Here is a Python function named `remove_Occ` t...,"def remove_Occ(string, char):\n # Find the ...","assert remove_Occ(""hello"",""l"") == ""heo""\nasser...",Ok
1,Write a function to sort a given matrix in asc...,Here is a Python solution for the problem:\n\n...,def sort_matrix(matrix):\n return sorted(ma...,"assert sort_matrix([[1, 2, 3], [2, 4, 5], [1, ...",Ok
2,Write a function to count the most common word...,Here is a Python function that counts the most...,from collections import Counter\n\ndef count_c...,"assert count_common(['red','green','black','pi...",Error
3,Write a python function to find the volume of ...,"Sure, here is a simple Python function that ca...","def find_Volume(base_area, height):\n retur...","assert find_Volume(10,8,6) == 240\nassert find...",Error
4,Write a function to split a string at lowercas...,Here is a Python function that splits a string...,def split_lowerstring(s):\n return [char fo...,"assert split_lowerstring(""AbCd"")==['bC','d']\n...",Error
5,Write a function to find sequences of lowercas...,Here is a Python function that fulfills your r...,import re\n\ndef text_lowercase_underscore(tex...,"assert text_lowercase_underscore(""aab_cbbbc"")=...",Error
6,Write a function to find the perimeter of a sq...,Here is a simple Python function that calculat...,def square_perimeter(side_length):\n return...,assert square_perimeter(10)==40\nassert square...,Ok
7,Write a function to remove characters from the...,Here is a Python function that fulfills your r...,"def remove_dirty_chars(s1, s2):\n for char ...","assert remove_dirty_chars(""probasscurve"", ""pro...",Ok
8,Write a function to find whether a given array...,Here is a Python function that fulfills your r...,def test_duplicate(arr):\n if len(arr) == l...,"assert test_duplicate(([1,2,3,4,5]))==False\na...",Ok
9,Write a function to check if the given number ...,A number is called a Woodall number if it is a...,def is_woodall(n):\n i = 0\n while n:\n ...,assert is_woodall(383) == True\nassert is_wood...,Error
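
The `result` column can be reduced to a single pass rate. The sketch below does this in plain Python for the ten rows displayed above only, so the printed figure describes this excerpt and not the full 500-task run; `pass_rate` is a hypothetical helper.

```python
# `result` values of the ten rows shown above, in order.
results = ['Ok', 'Ok', 'Error', 'Error', 'Error', 'Error', 'Ok', 'Ok', 'Ok', 'Error']

def pass_rate(results):
    """Share of tasks whose generated code passed all of its asserts."""
    if not results:
        return 0.0
    return sum(r == 'Ok' for r in results) / len(results)

print(f"{pass_rate(results):.1%}")  # 50.0%
```

On the full DataFrame the same quantity should be obtainable with `(df['result'] == 'Ok').mean()`.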


# Saving results

In [85]:
df.to_csv('deepseek_mbpp_zero_shot.csv', index=False)