# LASSO Python
## MBPP Evaluation Analytics
This notebook showcases how a data analyst can use LASSO Python to collect dynamic information about Python code implementations.
Using data that LASSO Python wrote to an Apache Ignite cache, the analyst can derive information such as the correctness of the code, its execution times, and its coverage.
Both the tests and the Python code implementations used in this example come from the sanitized Mostly Basic Python Problems (MBPP) dataset. MBPP consists of crowd-sourced Python programming problems; each task comprises a description, a code solution and several test cases. LASSO Python re-executed these tests, and this notebook analyses the results. More information on MBPP can be found here: https://github.com/google-research/google-research/tree/master/mbpp

In [1]:
# Import result csv that was generated by LASSO Python and extracted from the Apache Ignite cache
import pandas as pd

file_path = './evaluation_results.csv'
df = pd.read_csv(file_path)
number_of_tasks = df['ABSTRACTIONID'].nunique()
print(f'Number of tasks: {number_of_tasks}')
df.head()

Number of tasks: 379


| | EXECUTIONID | ABSTRACTIONID | ACTIONID | ARENAID | SHEETID | SYSTEMID | VARIANTID | ADAPTERID | X | Y | TYPE | VALUE | RAWVALUE | VALUETYPE | LASTMODIFIED | EXECUTIONTIME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | aeb97361-e15b-4ce8-9d4a-d94f19e82717 | Task164 | PLACEHOLDER | execute | 164.xlsx | c4f56584-1c07-4c11-976c-5976da3991cf | original | 0('are_equivalent', 'are_equivalent', 0) | 2 | 3 | input_value | - | - | `<class 'str'>` | (datetime.datetime(2024, 9, 15, 19, 33, 42, 62... | -1 |
| 1 | aeb97361-e15b-4ce8-9d4a-d94f19e82717 | Task770 | | execute | 770.xlsx | bafe5016-8d1b-4770-bd1a-ba7cf0630814 | original | 0('odd_num_sum', 'odd_num_sum', 0) | -1 | 3 | metrics_all_lines_in_file | 2360 | 2360 | `<class 'int'>` | (datetime.datetime(2024, 9, 15, 19, 44, 38, 85... | -1 |
| 2 | aeb97361-e15b-4ce8-9d4a-d94f19e82717 | Task278 | | execute | 278.xlsx | 1c5ed937-bae2-4c76-9791-43dbaa7ddf9d | original | 0('count_first_elements', 'count_first_element... | -1 | 3 | metrics_covered_branches_in_file | 3 | 3 | `<class 'int'>` | (datetime.datetime(2024, 9, 15, 19, 35, 45, 49... | -1 |
| 3 | aeb97361-e15b-4ce8-9d4a-d94f19e82717 | Task462 | PLACEHOLDER | execute | 462.xlsx | | original | 0('create', 'createPythonObject', 0) | 3 | 62 | input_value | orange | orange | `<class 'str'>` | (datetime.datetime(2024, 9, 15, 19, 39, 22, 11... | -1 |
| 4 | aeb97361-e15b-4ce8-9d4a-d94f19e82717 | Task409 | PLACEHOLDER | execute | 409.xlsx | | original | 0('create', 'createPythonObject', 0) | 3 | 1 | input_value | 2 | 2 | `<class 'int'>` | (datetime.datetime(2024, 9, 15, 19, 37, 32, 32... | -1 |
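The export is in a long format: each row is one sheet cell (X, Y) of a task, and TYPE says what that cell holds (for example `input_value`, `value`, `oracle`, `op` or a `metrics_*` entry). A minimal sketch of summarising that layout, run on a small hypothetical frame with the same column names rather than on `evaluation_results.csv`. Treating an EXECUTIONTIME of -1 as "not measured" is an assumption drawn from the rows above, not documented LASSO Python behaviour.

```python
import pandas as pd

# Hypothetical miniature of the export: same column names, invented values
df = pd.DataFrame({
    "ABSTRACTIONID": ["Task1", "Task1", "Task1", "Task2", "Task2"],
    "TYPE": ["input_value", "value", "oracle", "value", "oracle"],
    "VALUE": ["3", "9", "9", "UNSUCCESSFUL", "4"],
    "EXECUTIONTIME": [-1, 120, -1, 80, -1],
})

# How many rows of each record type does the export contain?
type_counts = df["TYPE"].value_counts()
print(type_counts)

# Assumption: -1 marks rows without a timing measurement, so mask it as NaN
df["EXECUTIONTIME"] = df["EXECUTIONTIME"].where(df["EXECUTIONTIME"] >= 0)
print(df["EXECUTIONTIME"].describe())
```

Masking the sentinel keeps statistics such as the mean from being pulled down by the many unmeasured rows.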


In [2]:
# Compare oracle values (ground truth) with the values resulting from the LASSO Python test execution
value_df = df.query('TYPE == "value"')
oracle_df = df.query('TYPE == "oracle"')

merged_df = pd.merge(value_df, oracle_df, on=['EXECUTIONID', 'ABSTRACTIONID', 'SHEETID', 'X', 'Y'])
merged_df[['ABSTRACTIONID', 'Y', 'TYPE_x', 'VALUE_x', 'TYPE_y', 'VALUE_y']]

| | ABSTRACTIONID | Y | TYPE_x | VALUE_x | TYPE_y | VALUE_y |
|---|---|---|---|---|---|---|
| 0 | Task434 | 3 | value | True | oracle | True |
| 1 | Task721 | 10 | value | 6.2 | oracle | 6.2 |
| 2 | Task171 | 1 | value | 25 | oracle | 25 |
| 3 | Task436 | 3 | value | [-1, -6] | oracle | [-1, -6] |
| 4 | Task734 | 6 | value | 84 | oracle | 84 |
| ... | ... | ... | ... | ... | ... | ... |
| 1146 | Task253 | 6 | value | 2 | oracle | 2 |
| 1147 | Task443 | 4 | value | -9 | oracle | -9 |
| 1148 | Task728 | 4 | value | [25, 45, 65] | oracle | [25, 45, 65] |
| 1149 | Task247 | 1 | value | 5 | oracle | 5 |
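Because the value and oracle rows share the same cell coordinates, the same comparison can also be expressed as a single pivot that gives each record type its own named column instead of the generic `_x`/`_y` suffixes. A minimal sketch on hypothetical rows; it assumes each (EXECUTIONID, ABSTRACTIONID, SHEETID, X, Y, TYPE) combination occurs at most once.

```python
import pandas as pd

# Hypothetical long-format rows: one value and one oracle per test cell
df = pd.DataFrame({
    "EXECUTIONID": ["e1"] * 4,
    "ABSTRACTIONID": ["Task1", "Task1", "Task2", "Task2"],
    "SHEETID": ["1.xlsx", "1.xlsx", "2.xlsx", "2.xlsx"],
    "X": [0, 0, 0, 0],
    "Y": [1, 1, 1, 1],
    "TYPE": ["value", "oracle", "value", "oracle"],
    "VALUE": ["25", "25", "84.0", "84"],
})

keys = ["EXECUTIONID", "ABSTRACTIONID", "SHEETID", "X", "Y"]
compared = (
    df[df["TYPE"].isin(["value", "oracle"])]
    # pivot raises a ValueError if a key/TYPE combination repeats
    .pivot(index=keys, columns="TYPE", values="VALUE")
    .reset_index()
)
compared.columns.name = None  # drop the leftover "TYPE" label on the column axis
print(compared[["ABSTRACTIONID", "Y", "value", "oracle"]])
```

The design difference: `pivot` refuses duplicate keys, whereas `pd.merge` silently multiplies rows when a key repeats. If the merge is kept, its `validate="one_to_one"` argument gives a similar safety check.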


In [3]:
# Tasks with at least one test where the return value equals the oracle value
successful_test_runs = merged_df.query('VALUE_x == VALUE_y')['ABSTRACTIONID'].nunique()
print(f"{successful_test_runs}/{number_of_tasks} tasks with equal return and oracle values ({successful_test_runs/number_of_tasks*100:.2f}%)")
merged_df.query('VALUE_x == VALUE_y')[['ABSTRACTIONID', 'Y', 'TYPE_x', 'VALUE_x', 'TYPE_y', 'VALUE_y']]

345/379 tasks with equal return and oracle values (91.03%)


| | ABSTRACTIONID | Y | TYPE_x | VALUE_x | TYPE_y | VALUE_y |
|---|---|---|---|---|---|---|
| 0 | Task434 | 3 | value | True | oracle | True |
| 1 | Task721 | 10 | value | 6.2 | oracle | 6.2 |
| 2 | Task171 | 1 | value | 25 | oracle | 25 |
| 3 | Task436 | 3 | value | [-1, -6] | oracle | [-1, -6] |
| 4 | Task734 | 6 | value | 84 | oracle | 84 |
| ... | ... | ... | ... | ... | ... | ... |
| 1146 | Task253 | 6 | value | 2 | oracle | 2 |
| 1147 | Task443 | 4 | value | -9 | oracle | -9 |
| 1148 | Task728 | 4 | value | [25, 45, 65] | oracle | [25, 45, 65] |
| 1149 | Task247 | 1 | value | 5 | oracle | 5 |
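The count above is per task: a task is included as soon as at least one of its test rows matches its oracle. Looking at individual test rows instead gives a per-test pass rate. A minimal sketch on a hypothetical merged frame that reuses the `VALUE_x`/`VALUE_y` column names:

```python
import pandas as pd

# Hypothetical merged rows: VALUE_x is the returned value, VALUE_y the oracle
merged_df = pd.DataFrame({
    "ABSTRACTIONID": ["Task1", "Task1", "Task2", "Task3"],
    "VALUE_x": ["25", "7", "UNSUCCESSFUL", "True"],
    "VALUE_y": ["25", "8", "4", "True"],
})

# True where the returned value equals the oracle (plain string comparison)
passed = merged_df["VALUE_x"] == merged_df["VALUE_y"]
tests_passed = int(passed.sum())
print(f"{tests_passed}/{len(merged_df)} test rows passed ({passed.mean() * 100:.2f}%)")
```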


In [4]:
# Tasks with at least one test where the return value differs from the oracle value
different_return_and_oracle = merged_df.query('VALUE_x != VALUE_y')['ABSTRACTIONID'].nunique()
print(f"{different_return_and_oracle}/{number_of_tasks} tasks with different return and oracle values ({different_return_and_oracle/number_of_tasks*100:.2f}%)")
merged_df.query('VALUE_x != VALUE_y')[['ABSTRACTIONID', 'Y', 'TYPE_x', 'VALUE_x', 'TYPE_y', 'VALUE_y']]

49/379 tasks with different return and oracle values (12.93%)


| | ABSTRACTIONID | Y | TYPE_x | VALUE_x | TYPE_y | VALUE_y |
|---|---|---|---|---|---|---|
| 8 | Task744 | 2 | value | False | oracle | True |
| 13 | Task80 | 3 | value | 84.0 | oracle | 84 |
| 19 | Task758 | 32 | value | UNSUCCESSFUL | oracle | {(10, 20, 30, 40): 1, (60, 70, |
| 34 | Task391 | 39 | value | UNSUCCESSFUL | oracle | [{None: {'java': 10}}, None, N |
| 40 | Task412 | 6 | value | UNSUCCESSFUL | oracle | [2, 4, 6] |
| ... | ... | ... | ... | ... | ... | ... |
| 1101 | Task723 | 9 | value | UNSUCCESSFUL | oracle | 1 |
| 1105 | Task419 | 6 | value | UNSUCCESSFUL | oracle | 513 |
| 1108 | Task133 | 2 | value | UNSUCCESSFUL | oracle | -32 |
| 1123 | Task628 | 1 | value | My_Name_is_Dawood | oracle | My%20Name%20is%20Dawood |
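Adding the two counts gives 345 + 49 = 394, more than the 379 tasks, so at least 15 tasks appear in both lists: they passed some tests and failed others. A stricter per-task view counts a task as correct only when every one of its test rows matches. A minimal sketch on hypothetical rows; the variable names are illustrative:

```python
import pandas as pd

# Hypothetical merged rows for three tasks
merged_df = pd.DataFrame({
    "ABSTRACTIONID": ["Task1", "Task1", "Task2", "Task2", "Task3"],
    "VALUE_x": ["25", "25", "7", "9", "UNSUCCESSFUL"],
    "VALUE_y": ["25", "25", "7", "8", "4"],
})

# Per-row match flag, collapsed to "all rows match" and "any row matches" per task
per_task = (
    merged_df.assign(match=merged_df["VALUE_x"] == merged_df["VALUE_y"])
    .groupby("ABSTRACTIONID")["match"]
    .agg(["all", "any"])
)
fully_passed = per_task[per_task["all"]].index.tolist()
partially_passed = per_task[per_task["any"] & ~per_task["all"]].index.tolist()
never_passed = per_task[~per_task["any"]].index.tolist()
print(fully_passed, partially_passed, never_passed)
```

Each task lands in exactly one of the three lists, so their lengths add up to the number of tasks present in the merged frame.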


In [5]:
# Tasks where the execution succeeded (VALUE_x is not UNSUCCESSFUL) but the return value differs from the oracle value
print(merged_df.query('VALUE_x != "UNSUCCESSFUL" and VALUE_x != VALUE_y')['ABSTRACTIONID'].nunique(), f"/ {number_of_tasks} tasks without errors where oracle and return values are different")
merged_df.query('VALUE_x != "UNSUCCESSFUL" and VALUE_x != VALUE_y')[['ABSTRACTIONID', 'Y', 'TYPE_x', 'VALUE_x', 'TYPE_y', 'VALUE_y']]

27 / 379 tasks without errors where oracle and return values are different


| | ABSTRACTIONID | Y | TYPE_x | VALUE_x | TYPE_y | VALUE_y |
|---|---|---|---|---|---|---|
| 8 | Task744 | 2 | value | False | oracle | True |
| 13 | Task80 | 3 | value | 84.0 | oracle | 84 |
| 65 | Task618 | 8 | value | [3.0, 0.5] | oracle | [3, 0.5] |
| 77 | Task87 | 15 | value | {'G': 'Green', 'W': 'White', ' | oracle | {'B': 'Black', 'P': 'Pink', 'R |
| 81 | Task606 | 1 | value | 1.5707963267948966 | oracle | 1.570796326794897 |
| 118 | Task742 | 3 | value | 173.20508075688772 | oracle | 173.2050807568877 |
| 141 | Task80 | 2 | value | 56.0 | oracle | 56 |
| 145 | Task465 | 6 | value | {'c1': 'Red', 'c2': 'c3'} | oracle | {'c1': 'Red'} |
| 148 | Task746 | 2 | value | 31.808625617596654 | oracle | 31.80862561759665 |
| 174 | Task117 | 22 | value | [(4.0, 4.0), (2.0, 27.0), (4.1 | oracle | [(4, 4), (2, 27), (4.12, 9), ( |
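Several of these differences look like representation artifacts rather than wrong answers: `84.0` versus `84`, `[3.0, 0.5]` versus `[3, 0.5]`, and floats such as `1.5707963267948966` versus `1.570796326794897` that differ only in the last digits. Because VALUE is compared as text, such pairs count as mismatches, although Python's own `==` treats `84.0 == 84` as true. A minimal sketch of a more tolerant comparison, assuming the cells hold Python literals that `ast.literal_eval` can parse; the helper name `values_match` and the relative tolerance of 1e-9 are illustrative choices, not part of LASSO Python.

```python
import ast
import math


def _close(a, b, rel_tol):
    """Recursively compare parsed literals, treating ints and floats numerically."""
    if isinstance(a, bool) or isinstance(b, bool):
        return type(a) is type(b) and a == b  # keep True/False strict
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, rel_tol=rel_tol)
    if isinstance(a, (list, tuple)) and isinstance(b, (list, tuple)):
        return type(a) is type(b) and len(a) == len(b) and all(
            _close(x, y, rel_tol) for x, y in zip(a, b)
        )
    return a == b  # dicts, strings, None, ...: ordinary equality


def values_match(returned: str, oracle: str, rel_tol: float = 1e-9) -> bool:
    """Hypothetical helper: compare two VALUE cells as Python literals when possible."""
    if returned == oracle:
        return True
    try:
        a, b = ast.literal_eval(returned), ast.literal_eval(oracle)
    except (ValueError, SyntaxError, TypeError):
        return False  # not literals (e.g. UNSUCCESSFUL or plain text): keep the text verdict
    return _close(a, b, rel_tol)
```

Applied row-wise, for example with `merged_df.apply(lambda r: values_match(str(r["VALUE_x"]), str(r["VALUE_y"])), axis=1)`, genuine mismatches such as Task744 (`False` vs `True`) or Task465 would still be reported. Truncated cells in the displayed tables are display artifacts; the helper only makes sense on the full values stored in the CSV.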


In [6]:
# Tasks with at least one test whose execution was unsuccessful
print(merged_df.query('VALUE_x == "UNSUCCESSFUL"')['ABSTRACTIONID'].nunique(), "tasks that (partially) failed")
merged_df.query('VALUE_x == "UNSUCCESSFUL"')[['ABSTRACTIONID', 'Y', 'TYPE_x', 'VALUE_x', 'TYPE_y', 'VALUE_y']]

22 tasks that (partially) failed


| | ABSTRACTIONID | Y | TYPE_x | VALUE_x | TYPE_y | VALUE_y |
|---|---|---|---|---|---|---|
| 19 | Task758 | 32 | value | UNSUCCESSFUL | oracle | {(10, 20, 30, 40): 1, (60, 70, |
| 34 | Task391 | 39 | value | UNSUCCESSFUL | oracle | [{None: {'java': 10}}, None, N |
| 40 | Task412 | 6 | value | UNSUCCESSFUL | oracle | [2, 4, 6] |
| 42 | Task558 | 1 | value | UNSUCCESSFUL | oracle | 1 |
| 43 | Task105 | 4 | value | UNSUCCESSFUL | oracle | 0 |
| ... | ... | ... | ... | ... | ... | ... |
| 1096 | Task123 | 3 | value | UNSUCCESSFUL | oracle | 0 |
| 1101 | Task723 | 9 | value | UNSUCCESSFUL | oracle | 1 |
| 1105 | Task419 | 6 | value | UNSUCCESSFUL | oracle | 513 |
| 1108 | Task133 | 2 | value | UNSUCCESSFUL | oracle | -32 |
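The label "(partially) failed" covers two cases: tasks where every test row ended as UNSUCCESSFUL and tasks where only some did. A minimal sketch that separates them with a per-task failure ratio, using hypothetical rows:

```python
import pandas as pd

# Hypothetical merged rows: only the returned value matters here
merged_df = pd.DataFrame({
    "ABSTRACTIONID": ["Task1", "Task1", "Task2", "Task2", "Task3"],
    "VALUE_x": ["UNSUCCESSFUL", "UNSUCCESSFUL", "UNSUCCESSFUL", "7", "4"],
})

# Fraction of each task's test rows whose execution was UNSUCCESSFUL
failure_ratio = (
    merged_df["VALUE_x"].eq("UNSUCCESSFUL")
    .groupby(merged_df["ABSTRACTIONID"])
    .mean()
)
fully_failed = failure_ratio[failure_ratio == 1].index.tolist()
partially_failed = failure_ratio[(failure_ratio > 0) & (failure_ratio < 1)].index.tolist()
print(failure_ratio)
print("fully:", fully_failed, "partially:", partially_failed)
```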


In [7]:
print("Task identifiers of tasks that (partially) failed:")
set(merged_df.query('VALUE_x == "UNSUCCESSFUL"')['ABSTRACTIONID'])

Task identifiers of tasks that (partially) failed:


{'Task105',
 'Task123',
 'Task129',
 'Task133',
 'Task142',
 'Task172',
 'Task295',
 'Task391',
 'Task398',
 'Task412',
 'Task419',
 'Task558',
 'Task614',
 'Task615',
 'Task723',
 'Task724',
 'Task735',
 'Task757',
 'Task758',
 'Task779',
 'Task799',
 'Task805'}

In [8]:
# Obtain statistics for the coverage metrics
coverage_df = df.query('TYPE == "metrics_covered_lines_in_function_ratio"')

coverage_df['VALUE'].astype(float).describe()

count    1102.000000
mean       92.805384
std        14.358189
min         8.823529
25%        93.548387
50%       100.000000
75%       100.000000
max       100.000000
Name: VALUE, dtype: float64
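The `describe()` summary pools all 1102 coverage rows, so a task with several rows contributes several values. A minimal sketch that reduces the metric to one value per task and lists tasks below a threshold; the data is hypothetical, the 80 % cut-off is an arbitrary illustrative choice, and taking the maximum per task is an assumption about how repeated rows should be combined.

```python
import pandas as pd

# Hypothetical coverage rows in the same long format as the export
df = pd.DataFrame({
    "ABSTRACTIONID": ["Task1", "Task1", "Task2", "Task3", "Task3"],
    "TYPE": ["metrics_covered_lines_in_function_ratio"] * 5,
    "VALUE": ["100.0", "100.0", "10.0", "75.0", "93.5"],
})

coverage_df = df.query('TYPE == "metrics_covered_lines_in_function_ratio"')
# One value per task: the highest line coverage any of its rows reached (assumption)
per_task = coverage_df["VALUE"].astype(float).groupby(coverage_df["ABSTRACTIONID"]).max()

threshold = 80.0  # illustrative cut-off in percent
low_coverage = per_task[per_task < threshold].sort_values()
print(low_coverage)
```

Using `.min()` instead of `.max()` would flag tasks where any single row has low coverage; which aggregation is appropriate depends on what the repeated rows represent.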

In [9]:
# Obtain information about the longest execution times
sorted_df = df.sort_values(by='EXECUTIONTIME', ascending=False)
sorted_df.head()

| | EXECUTIONID | ABSTRACTIONID | ACTIONID | ARENAID | SHEETID | SYSTEMID | VARIANTID | ADAPTERID | X | Y | TYPE | VALUE | RAWVALUE | VALUETYPE | LASTMODIFIED | EXECUTIONTIME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29352 | aeb97361-e15b-4ce8-9d4a-d94f19e82717 | Task583 | PLACEHOLDER | execute | 583.xlsx | b67e71c9-407f-4310-9e63-773268541061 | original | 0('catalan_number', 'catalan_number', 0) | 0 | 1 | value | 16796 | 16796 | `<class 'int'>` | (datetime.datetime(2024, 9, 15, 19, 40, 56, 58... | 22061 |
| 34301 | aeb97361-e15b-4ce8-9d4a-d94f19e82717 | Task583 | PLACEHOLDER | execute | 583.xlsx | b67e71c9-407f-4310-9e63-773268541061 | original | 0('catalan_number', 'catalan_number', 0) | 1 | 1 | op | catalan_number | catalan_number | function | (datetime.datetime(2024, 9, 15, 19, 40, 56, 58... | 22061 |
| 25146 | aeb97361-e15b-4ce8-9d4a-d94f19e82717 | Task583 | PLACEHOLDER | execute | 583.xlsx | b67e71c9-407f-4310-9e63-773268541061 | original | 0('catalan_number', 'catalan_number', 0) | 0 | 2 | value | 4862 | 4862 | `<class 'int'>` | (datetime.datetime(2024, 9, 15, 19, 40, 56, 58... | 6341 |
| 19355 | aeb97361-e15b-4ce8-9d4a-d94f19e82717 | Task583 | PLACEHOLDER | execute | 583.xlsx | b67e71c9-407f-4310-9e63-773268541061 | original | 0('catalan_number', 'catalan_number', 0) | 1 | 2 | op | catalan_number | catalan_number | function | (datetime.datetime(2024, 9, 15, 19, 40, 56, 58... | 6341 |
| 34528 | aeb97361-e15b-4ce8-9d4a-d94f19e82717 | Task84 | PLACEHOLDER | execute | 84.xlsx | 122e1fa5-4fd5-4a40-9194-27d5d492c4fd | original | 0('sequence', 'sequence', 0) | 0 | 1 | value | 6 | 6 | `<class 'int'>` | (datetime.datetime(2024, 9, 15, 19, 32, 11, 15... | 1091 |
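The first four rows above describe only two Task583 tests: each appears once as its `value` row and once as its `op` row, with the same EXECUTIONTIME. A minimal sketch that keeps only `value` rows and reports the slowest test per task, on hypothetical data. It assumes, as the rows above suggest, that -1 means "no measurement"; the unit of EXECUTIONTIME is not stated in the export, so the sketch leaves it unconverted.

```python
import pandas as pd

# Hypothetical rows: each test has a value row and an op row with the same time
df = pd.DataFrame({
    "ABSTRACTIONID": ["TaskA", "TaskA", "TaskA", "TaskA", "TaskB", "TaskB"],
    "TYPE": ["value", "op", "value", "op", "value", "oracle"],
    "Y": [1, 1, 2, 2, 1, 1],
    "EXECUTIONTIME": [500, 500, 120, 120, 40, -1],
})

# One row per measured test: drop op duplicates and unmeasured (-1) rows
timed = df[(df["TYPE"] == "value") & (df["EXECUTIONTIME"] >= 0)]
# Slowest single test per task, longest first
slowest_per_task = (
    timed.groupby("ABSTRACTIONID")["EXECUTIONTIME"].max().sort_values(ascending=False)
)
print(slowest_per_task.head())
```

With one row per task, `head()` then lists five different tasks instead of repeating the same slow task.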
