# MBPP & LBPP

In [26]:
import datasets
import json
import ast
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Part 1. MBPP EDA. Create MBPP_Test Dataset to Use with Human-Eval Library
Examples of MBPP evaluation code (evalplus lib):
* https://github.com/evalplus/evalplus/blob/master/evalplus/evaluate.py

## Load Dataset

## 1. Using the HuggingFace library

In [4]:
# run pip install datasets==2.20.0 for this line to work
dset = datasets.load_dataset("mbpp")

In [5]:
# available splits
dset.keys()

dict_keys(['train', 'test', 'validation', 'prompt'])

In [47]:
# explore each split (print the first 5 items of each)
for key1 in dset:
    print(f'Key: {key1.upper()}. Total: {len(dset[key1])}')
    counter = 0
    for ddict in dset[key1]:
        if counter == 5:
            break
        for k2, v2 in ddict.items():
            print(f'\n{k2}:\n{v2}')
        counter += 1
        print('\n', '-'*100, '\n', sep='')
    print('\n', '='*100, '\n', sep='')

Key: TRAIN. Total: 374

task_id:
601

text:
Write a function to find the longest chain which can be formed from the given set of pairs.

code:
class Pair(object): 
	def __init__(self, a, b): 
		self.a = a 
		self.b = b 
def max_chain_length(arr, n): 
	max = 0
	mcl = [1 for i in range(n)] 
	for i in range(1, n): 
		for j in range(0, i): 
			if (arr[i].a > arr[j].b and
				mcl[i] < mcl[j] + 1): 
				mcl[i] = mcl[j] + 1
	for i in range(n): 
		if (max < mcl[i]): 
			max = mcl[i] 
	return max

test_list:
['assert max_chain_length([Pair(5, 24), Pair(15, 25),Pair(27, 40), Pair(50, 60)], 4) == 3', 'assert max_chain_length([Pair(1, 2), Pair(3, 4),Pair(5, 6), Pair(7, 8)], 4) == 4', 'assert max_chain_length([Pair(19, 10), Pair(11, 12),Pair(13, 14), Pair(15, 16), Pair(31, 54)], 5) == 5']

test_setup_code:


challenge_test_list:
[]

----------------------------------------------------------------------------------------------------


task_id:
602

text:
Write a python function to fin

In [7]:
# are all tasks unique? - There are 4 duplicates
texts = []
for key1 in dset:
    for ddict in dset[key1]:
        texts.append(ddict['text'])
len(texts), len(set(texts))

(974, 970)

In [8]:
# which ones are duplicates?
texts2 = dict()
for key1 in dset:
    for ddict in dset[key1]:
        text = ddict['text']
        if text in texts2.keys():
            split = texts2[text]
            print(f'Duplicate task in split "{key1}":\n', text, sep='')
            print(f'Previously encountered in split "{split}"\n')
        else:
            texts2[text] = key1

Duplicate task in split "test":
Write a function to check if a nested list is a subset of another nested list.
Previously encountered in split "train"

Duplicate task in split "test":
Write a python function to find the first repeated character in a given string.
Previously encountered in split "train"

Duplicate task in split "test":
Write a function to calculate the harmonic sum of n-1.
Previously encountered in split "train"

Duplicate task in split "test":
Write a python function to count the number of squares in a rectangle.
Previously encountered in split "test"



__The data format is the same in every split. The dataset contains four duplicate task texts in total.__
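
The duplicate check above can also be written with `collections.Counter`, which reports every split a repeated text occurs in. This is a minimal sketch on toy data; `toy_splits` and `find_duplicate_texts` are hypothetical names, not part of the notebook. Since `dset` is a dict-like mapping of split names to rows, the same function should also accept it.

```python
from collections import Counter

# Toy stand-in for the MBPP splits: split name -> list of task dicts.
toy_splits = {
    "train": [{"text": "task A"}, {"text": "task B"}],
    "test": [{"text": "task B"}, {"text": "task C"}, {"text": "task C"}],
}


def find_duplicate_texts(splits):
    """Return {text: [split names where it occurs]} for texts seen more than once."""
    counts = Counter(item["text"] for items in splits.values() for item in items)
    duplicates = {}
    for split_name, items in splits.items():
        for item in items:
            if counts[item["text"]] > 1:
                duplicates.setdefault(item["text"], []).append(split_name)
    return duplicates


duplicates = find_duplicate_texts(toy_splits)
print(duplicates)
```

Unlike the loop above, this lists both occurrences of a duplicate, so a text repeated within a single split shows that split twice.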

## 2. Using Google Research Github Repo

In [9]:
import requests

file  = 'https://github.com/google-research/google-research/blob/master/mbpp/mbpp.jsonl?raw=true'
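r     = requests.get(file)
r.raise_for_status()  # fail early if the download did not succeed
# one JSON object per line; skip blank lines defensively
dset2 = [json.loads(line) for line in r.text.splitlines() if line.strip()]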

In [10]:
for i in dset2:
    for k,v in i.items():
        print(f'\n{k}:\n{v}')
    print('\n', '='*100, '\n', sep='')


text:
Write a function to find the minimum cost path to reach (m, n) from (0, 0) for the given cost matrix cost[][] and a position (m, n) in cost[][].

code:
R = 3
C = 3
def min_cost(cost, m, n): 
	tc = [[0 for x in range(C)] for x in range(R)] 
	tc[0][0] = cost[0][0] 
	for i in range(1, m+1): 
		tc[i][0] = tc[i-1][0] + cost[i][0] 
	for j in range(1, n+1): 
		tc[0][j] = tc[0][j-1] + cost[0][j] 
	for i in range(1, m+1): 
		for j in range(1, n+1): 
			tc[i][j] = min(tc[i-1][j-1], tc[i-1][j], tc[i][j-1]) + cost[i][j] 
	return tc[m][n]

task_id:
1

test_setup_code:


test_list:
['assert min_cost([[1, 2, 3], [4, 8, 2], [1, 5, 3]], 2, 2) == 8', 'assert min_cost([[2, 3, 4], [5, 9, 3], [2, 6, 4]], 2, 2) == 12', 'assert min_cost([[3, 4, 5], [6, 10, 4], [3, 7, 5]], 2, 2) == 16']

challenge_test_list:
[]



text:
Write a function to find the similar elements from the given two tuple lists.

code:
def similar_elements(test_tup1, test_tup2):
  res = tuple(set(test_tup1) & set(test_tup

In [12]:
# same 4 duplicates?
texts3 = [i['text'] for i in dset2]
len(texts3), len(set(texts3))

(974, 970)

In [14]:
texts4 = []
for i in texts3:
    if i in texts4:
        print(f'Duplicate task: {i}')
    else:
        texts4.append(i)

Duplicate task: Write a python function to count the number of squares in a rectangle.
Duplicate task: Write a python function to find the first repeated character in a given string.
Duplicate task: Write a function to calculate the harmonic sum of n-1.
Duplicate task: Write a function to check if a nested list is a subset of another nested list.


### Both versions of the dataset contain the same four duplicates. I will disregard them and treat all 974 tasks as unique, since that is most probably how the existing MBPP leaderboards handle the dataset.

In [27]:
# is test_setup_code always empty? Remember: Task IDs 11-510 are used for testing
setup_code = [ (i['task_id'],
                i['text'],
                i['test_list'],
                i['test_setup_code']) for i in dset2 if i['test_setup_code']]
for i in setup_code:
    print('Task ID:', i[0], '\n')
    print('Task:', i[1], '\n')
    print('Unit tests:', i[2], '\n')
    print('Test setup:', i[3], '\n')
    print('\n', '='*100, '\n', sep='')

Task ID: 367 

Task: Write a function to check if a binary tree is balanced or not. 

Unit tests: ['assert is_tree_balanced(root) == False', 'assert is_tree_balanced(root1) == True', 'assert is_tree_balanced(root2) == False '] 

Test setup: root = Node(1) 
root.left = Node(2) 
root.right = Node(3) 
root.left.left = Node(4) 
root.left.right = Node(5) 
root.left.left.left = Node(8) 
root1 = Node(1) 
root1.left = Node(2) 
root1.right = Node(3) 
root1.left.left = Node(4) 
root1.left.right = Node(5) 
root1.right.left = Node(6) 
root1.left.left.left = Node(7)
root2 = Node(1) 
root2.left = Node(2) 
root2.right = Node(3) 
root2.left.left = Node(4) 
root2.left.right = Node(5)
root2.left.left.left = Node(7) 



Task ID: 927 

Task: Write a function to calculate the height of the given binary tree. 

Unit tests: ['assert (max_height(root)) == 3', 'assert (max_height(root1)) == 5 ', 'assert (max_height(root2)) == 4'] 

Test setup: root = Node(1) 
root.left = Node(2) 
root.right

In [24]:
# is challenge_test_list always empty? Remember: Task IDs 11-510 are used for testing
challenge_test = [(i['task_id'], i['challenge_test_list']) for i in dset2 if i['challenge_test_list']]
for i in challenge_test:
    print('Task ID:', i[0])
    print(i[1], '\n')

Task ID: 11
['assert remove_Occ("hellolloll","l") == "helollol"', 'assert remove_Occ("","l") == ""'] 

Task ID: 16
['assert text_lowercase_underscore("aab-cbbbc")==(\'Not matched!\')'] 

Task ID: 20
['assert is_woodall(32212254719) == True', 'assert is_woodall(32212254718) == False', 'assert is_woodall(159) == True'] 

Task ID: 23
['assert maximum_Sum([[0,-1,-1],[-1,-1,-2],[-3,-2,-1]]) == -2'] 

Task ID: 25
['assert find_Product([1,1,4,5,6,5,7,1,1,3,4],11) == 2520'] 

Task ID: 26
['assert check_k_elements([(4, 4), (4, 4, 4), (4, 4), (4, 4, 6, 4), (4, )], 4) == False'] 

Task ID: 28
['assert binomial_Coeff(14,6) == 3003'] 

Task ID: 42
['assert find_Sum([1,1,2,3,4,5,6,3,5],9) == 18'] 

Task ID: 43
['assert text_match("aab-cbbbc") == \'Not matched!\''] 

Task ID: 44
['assert text_match_string("foo")==(\'Found a match!\')'] 

Task ID: 47
['assert compute_Last_Digit(3,7) == 0', 'assert compute_Last_Digit(20,23) == 6', 'assert compute_Last_Digit(1021,1024) == 4'] 



__In my experiments, I will use test_setup_code for task 367, the only task in the test set that has it. To keep the analysis simple, I will not use challenge_test_list (alternative unit tests) for the 11 tasks above, because alternative unit tests are not available for the rest of the test set.__
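
To illustrate why the setup code matters, here is a minimal sketch of how a solution, its setup code, and the assert statements can be executed together in one namespace. The helper `run_mbpp_checks` and the toy task are hypothetical; this is not how the human-eval package executes code (it adds timeouts and guards), and `exec` should only be used on trusted input.

```python
def run_mbpp_checks(solution_code, test_list, test_setup_code=""):
    """Run solution, optional setup code, and asserts in a single namespace.

    Returns True if everything executes and every assertion passes.
    WARNING: exec runs arbitrary code; use only on trusted input.
    """
    program = "\n".join([solution_code, test_setup_code, *test_list])
    namespace = {}
    try:
        exec(program, namespace)
    except Exception:
        return False
    return True


# Toy task in the style of the binary-tree tasks: tests refer to `root`,
# which only exists after the setup code has run.
toy_solution = '''
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def max_height(node):
    if node is None:
        return 0
    return 1 + max(max_height(node.left), max_height(node.right))
'''
toy_setup = "root = Node(1)\nroot.left = Node(2)\nroot.left.left = Node(3)"

passed = run_mbpp_checks(toy_solution, ["assert max_height(root) == 3"], toy_setup)
missing_setup = run_mbpp_checks(toy_solution, ["assert max_height(root) == 3"])
print(passed, missing_setup)
```

Without the setup code the test fails with a `NameError`, even though the solution itself is correct.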

## 3. Compare HuggingFace and Google Research Test Sets

In [39]:
texts_hf = [i['text'] for i in dset['test']]
texts_gr = [i['text'] for i in dset2[10:510]]
print(len(texts_hf), len(texts_gr))
texts_hf == texts_gr

500 500


True

__The test set is the same in both versions of the dataset__
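
The slice `dset2[10:510]` relies on the records being stored in `task_id` order. A sketch that selects the test tasks by ID instead, assuming the 11-510 range mentioned above, is shown on toy records (`toy_records` and `select_test_tasks` are hypothetical names):

```python
# Toy records with the same task_id numbering as MBPP (1..974).
toy_records = [{"task_id": i, "text": f"task {i}"} for i in range(1, 975)]


def select_test_tasks(records, first_id=11, last_id=510):
    """Keep only records whose task_id lies in the inclusive test range."""
    return [r for r in records if first_id <= r["task_id"] <= last_id]


test_records = select_test_tasks(toy_records)
print(len(test_records), test_records[0]["task_id"], test_records[-1]["task_id"])

# When records are sorted by task_id, both selections agree.
same_as_slice = test_records == toy_records[10:510]
```

Filtering by ID gives the same result as the slice when the file is sorted, and it keeps working if the record order ever changes.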

## 4. Create and Save MBPP_TEST Dataset for Use with Human-Eval Package

In [61]:
problems_mbpp = []

for mbpp_item in dset['test']:
    
    task_id = str(mbpp_item["task_id"])

    test_code = '\n'
    # (1) test_setup_code if present
    if mbpp_item.get('test_setup_code'):
        test_code += mbpp_item['test_setup_code'] + '\n\n'

    # (2) test_list (required MBPP tests)
    test_code += '\n'.join(mbpp_item['test_list'])

    problem = {
        'task_id': task_id,
        'test': test_code,
        'prompt':  mbpp_item['text'],
        'canonical_solution': mbpp_item['code'],
        'entry_point': None,
    }

    problems_mbpp.append(problem)
len(problems_mbpp)

500

In [62]:
for k,v in problems_mbpp[357].items():
    print(k)
    print(v)
    print('-'*100)

task_id
368
----------------------------------------------------------------------------------------------------
test

assert repeat_tuples((1, 3), 4) == ((1, 3), (1, 3), (1, 3), (1, 3))
assert repeat_tuples((1, 2), 3) == ((1, 2), (1, 2), (1, 2))
assert repeat_tuples((3, 4), 5) == ((3, 4), (3, 4), (3, 4), (3, 4), (3, 4))
----------------------------------------------------------------------------------------------------
prompt
Write a function to repeat the given tuple n times.
----------------------------------------------------------------------------------------------------
canonical_solution
def repeat_tuples(test_tup, N):
  res = ((test_tup, ) * N)
  return (res) 
----------------------------------------------------------------------------------------------------
entry_point
None
----------------------------------------------------------------------------------------------------
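
MBPP has no `entry_point` field, so it is stored as `None` above. If a function name is needed later, one option is to guess it from the first assert statement. The sketch below is a heuristic, not part of MBPP or human-eval: the hypothetical helper `guess_entry_point` returns the outermost plain function call in the assertion, which will be wrong for tests that wrap the call in a builtin such as `set(...)`.

```python
import ast
from collections import deque


def guess_entry_point(assert_line):
    """Return the name of the outermost plain function call in an assert line.

    Searches the syntax tree breadth-first, so the shallowest call wins.
    Returns None when the line contains no such call.
    """
    tree = ast.parse(assert_line.strip())
    queue = deque([tree])
    while queue:
        node = queue.popleft()
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            return node.func.id
        queue.extend(ast.iter_child_nodes(node))
    return None


name_simple = guess_entry_point("assert repeat_tuples((1, 2), 3) == ((1, 2), (1, 2), (1, 2))")
name_nested = guess_entry_point("assert max_chain_length([Pair(5, 24), Pair(15, 25)], 2) == 1")
name_parens = guess_entry_point("assert (max_height(root)) == 3")
print(name_simple, name_nested, name_parens)
```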


In [65]:
from human_eval.data import write_jsonl
file = 'data/MBPP_Test.jsonl.gz'
write_jsonl(file, problems_mbpp)
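
To check that a saved file round-trips, it can be read back and compared with the original list. The sketch below uses only the standard library and toy data; `write_jsonl_gz` and `read_jsonl_gz` are hypothetical helpers that stand in for the human-eval functions and assume a gzip-compressed file with one JSON object per line.

```python
import gzip
import json
import os
import tempfile


def write_jsonl_gz(path, records):
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")


def read_jsonl_gz(path):
    """Read a gzip-compressed JSONL file back into a list of dicts."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


toy_problems = [{
    "task_id": "11",
    "test": "\nassert f() == 1",
    "prompt": "Write a function that returns 1.",
    "canonical_solution": "def f():\r\n    return 1",
    "entry_point": None,
}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy.jsonl.gz")
    write_jsonl_gz(path, toy_problems)
    loaded = read_jsonl_gz(path)

print(loaded == toy_problems)
```

The `\r\n` sequences in the canonical solutions are escaped by `json.dumps`, so they do not break the one-record-per-line layout.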

In [40]:
# load humaneval for comparison
from human_eval.data import read_problems
problems = read_problems()

In [46]:
counter = 0
for k,v in problems.items():
    if counter == 1:
        break
    print(k)
    for k2, v2 in v.items():
        print(k2)
        print(v2)
        print('-'*100)
    counter += 1

HumanEval/0
task_id
HumanEval/0
----------------------------------------------------------------------------------------------------
prompt
from typing import List


def has_close_elements(numbers: List[float], threshold: float) -> bool:
    """ Check if in given list of numbers, are any two numbers closer to each other than
    given threshold.
    >>> has_close_elements([1.0, 2.0, 3.0], 0.5)
    False
    >>> has_close_elements([1.0, 2.8, 3.0, 4.0, 5.0, 2.0], 0.3)
    True
    """

----------------------------------------------------------------------------------------------------
entry_point
has_close_elements
----------------------------------------------------------------------------------------------------
canonical_solution
    for idx, elem in enumerate(numbers):
        for idx2, elem2 in enumerate(numbers):
            if idx != idx2:
                distance = abs(elem - elem2)
                if distance < threshold:
                    return True

    return False

------

## 5. Test new package

In [2]:
from human_eval.data import read_problems, HUMAN_EVAL, MBPP_TEST

In [9]:
problems = read_problems(MBPP_TEST)

In [10]:
problems

{'11': {'task_id': '11',
  'test': '\nassert remove_Occ("hello","l") == "heo"\nassert remove_Occ("abcda","a") == "bcd"\nassert remove_Occ("PHP","P") == "H"',
  'prompt': 'Write a python function to remove first and last occurrence of a given character from the string.',
  'canonical_solution': 'def remove_Occ(s,ch): \r\n    for i in range(len(s)): \r\n        if (s[i] == ch): \r\n            s = s[0 : i] + s[i + 1:] \r\n            break\r\n    for i in range(len(s) - 1,-1,-1):  \r\n        if (s[i] == ch): \r\n            s = s[0 : i] + s[i + 1:] \r\n            break\r\n    return s ',
  'entry_point': None},
 '12': {'task_id': '12',
  'test': '\nassert sort_matrix([[1, 2, 3], [2, 4, 5], [1, 1, 1]])==[[1, 1, 1], [1, 2, 3], [2, 4, 5]]\nassert sort_matrix([[1, 2, 3], [-2, 4, -5], [1, -1, 1]])==[[-2, 4, -5], [1, -1, 1], [1, 2, 3]]\nassert sort_matrix([[5,8,9],[6,4,3],[2,1,4]])==[[2, 1, 4], [6, 4, 3], [5, 8, 9]]',
  'prompt': 'Write a function to sort a given matrix in ascending order ac

## Part 2. LBPP

In [2]:
lbpp = datasets.load_dataset("CohereForAI/lbpp")

Downloading readme:   0%|          | 0.00/2.33k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/435k [00:00<?, ?B/s]

Generating test split:   0%|          | 0/162 [00:00<?, ? examples/s]

In [17]:
lbpp['test']

Dataset({
    features: ['task_id', 'language', 'title', 'instruction', 'completion', 'test_setup', 'test_list', 'signature', 'categories'],
    num_rows: 162
})

In [29]:
counter = 0
for ddict in lbpp['test']:
    if counter == 200:
        break
    for k2, v2 in ddict.items():
        print(k2)
        if k2 == 'test_list':
            alist = ast.literal_eval(v2)
            print('\n'.join(alist))
        else:
            print(v2)
        print('\n', '-'*75, '\n', sep='')
    counter += 1
    print('\n', '='*100, '\n', sep='')

task_id
lbpp/0

---------------------------------------------------------------------------

language
python

---------------------------------------------------------------------------

title
add_avg_and_std_cols_numpy

---------------------------------------------------------------------------

instruction
Write a python function `add_avg_and_std_cols_numpy(ar: np.ndarray) -> np.ndarray` that takes a 2D numpy array and returns a 2D numpy array with two additional columns appended to the end. The first column should contain the average of each row, and the second column should contain the standard deviation of each row. You may assume that the input array has at least one row and one column.

---------------------------------------------------------------------------

completion
import numpy as np


def add_avg_and_std_cols_numpy(ar: np.ndarray) -> np.ndarray:
    avg = np.mean(ar, axis=1).reshape(-1, 1)
    std = np.std(ar, axis=1).reshape(-1, 1)
    return np.hstack((ar, avg, std))


In [36]:
no_func = [i['instruction'] for i in lbpp['test'] if 'def ' not in i['instruction'] and 'write a python function' not in i['instruction'].lower()]
print(len(no_func))
print('\n\n'.join(no_func))

79
Given a list of unique words each of size k and an n sized word, w, where n is a multiple of k,
write a program in python to determine the number of unique combinations of words in the list that can be concatenated to form an anagram of the word w.

You are given a target word and an array of candidate words. Write a Python program to pick a subset of words from the candidate words such that 1) You can form the target word from the subset of candidate words by re-arranging some or all the letters of the candidate words. 2) The number of unused letters from the subset of candidate words is minimized. It is given that the number of candidate words is less than 20. Return the minimum number of unused letters for such a subset of candidate words.

You are given a 2d array of integers consisting of the heights of students in each grade. The first dimension of the array represents the grades and the second dimension represents the heights of students in that grade. Write a Python program 

In [None]:
# open question: is it better to include the unit tests in the prompt (as in MBPP)
# or to provide the function name (as in HumanEval)?
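
The two options in the question above can be sketched as two prompt builders. Both functions and their wording are hypothetical and only illustrate the trade-off. The second one assumes the `signature` field listed in the LBPP features is passed in as a string.

```python
def prompt_with_tests(instruction, test_list):
    """MBPP-style option: append the unit tests so the model sees the expected name and usage."""
    tests = "\n".join(test_list)
    return f"{instruction}\nYour code should pass these tests:\n\n{tests}"


def prompt_with_signature(instruction, signature):
    """HumanEval-style option: state the function signature explicitly."""
    return f"{instruction}\nUse this function signature:\n\n{signature}"


toy_instruction = "Write a program in python to add two numbers."
p_tests = prompt_with_tests(toy_instruction, ["assert add(1, 2) == 3", "assert add(0, 0) == 0"])
p_signature = prompt_with_signature(toy_instruction, "def add(a: int, b: int) -> int:")
print(p_tests)
print(p_signature)
```

Showing the tests leaks the expected outputs into the prompt, while showing only the signature does not; which one is preferable depends on what the evaluation is meant to measure.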