# LASSO Python
## MBPP Evaluation Analytics
This notebook shows how a data analyst can use LASSO Python to collect dynamic information about Python code implementations.
Using data from an Apache Ignite cache populated by LASSO Python, the analyst can derive information such as the correctness of the code, execution times, and coverage data.
Both the tests and the Python code implementations used in this example come from the Mostly Basic Python Problems (MBPP) dataset, a collection of crowd-sourced Python programming problems. Each task consists of a description, a code solution, and multiple test cases. Several of these tests were re-executed with LASSO Python, and the results are shown in this notebook. More information on MBPP is available at https://github.com/google-research/google-research/tree/master/mbpp

In [1]:
# Load the result CSV that was generated by LASSO Python and extracted from the Apache Ignite cache
import pandas as pd

file_path = './evaluation_results.csv'
df = pd.read_csv(file_path)
df.head()

Unnamed: 0,EXECUTIONID,ABSTRACTIONID,ACTIONID,ARENAID,SHEETID,SYSTEMID,VARIANTID,ADAPTERID,X,Y,TYPE,VALUE,RAWVALUE,VALUETYPE,LASTMODIFIED,EXECUTIONTIME
0,d44c2942-79f9-4494-9610-62d081360c0e,Task6,,execute,6.xlsx,3657a3483edda41e0c8651e6b8ce4533,original,"('differ_At_One_Bit_Pos', 'differ_At_One_Bit_P...",1,3,adaptation_instruction,No adaptations,No adaptations needed,AdaptationInstruction,"(datetime.datetime(2024, 9, 13, 12, 25, 43, 12...",-1
1,2056384a-e637-4681-b760-dda612a9a0cd,Task74,,execute,74.xlsx,,original,"('create', 'createPythonObject', 0)",1,7,op,create,create,function,"(datetime.datetime(2024, 9, 13, 12, 28, 7, 318...",-1
2,2056384a-e637-4681-b760-dda612a9a0cd,Task74,,execute,74.xlsx,oracle,oracle,oracle,0,3,oracle,True,True,<class 'bool'>,"(datetime.datetime(2024, 9, 13, 12, 28, 7, 315...",-1
3,8685fa01-4ba0-486a-ad3b-bf2f3fb4adcc,Task17,,execute,17.xlsx,44d3ff0cde54cde689c42c6d42500212,original,"('square_perimeter', 'square_perimeter', 0)",-1,1,metrics_all_lines_in_function,2,2,<class 'int'>,"(datetime.datetime(2024, 9, 13, 12, 26, 12, 88...",-1
4,2936d273-898c-4b6d-9360-04650af560c9,Task8,,execute,8.xlsx,oracle,oracle,oracle,0,3,oracle,"[1, 4, 9, 16, 2","[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]",<class 'list'>,"(datetime.datetime(2024, 9, 13, 12, 25, 47, 58...",-1
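
The `TYPE` column distinguishes the kinds of records stored in the cache; the preview above already shows `adaptation_instruction`, `op`, `oracle`, and `metrics_all_lines_in_function`. A minimal sketch of how to see which record types exist and how often each occurs, using a small hand-made frame (`toy_df`, with illustrative values) in place of the real CSV:

```python
import pandas as pd

# Toy stand-in for the real results frame; the rows are illustrative only.
toy_df = pd.DataFrame({
    "ABSTRACTIONID": ["Task74", "Task74", "Task74", "Task8"],
    "TYPE": ["op", "oracle", "value", "oracle"],
})

# Number of rows per record type, most frequent first.
type_counts = toy_df["TYPE"].value_counts()
print(type_counts)
```

On the real data, the same call on `df["TYPE"]` would list every record type that the later cells filter on.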


In [2]:
# Compare oracle values (ground truth) with the values resulting from the LASSO Python test execution
value_df = df.query('TYPE == "value"')
oracle_df = df.query('TYPE == "oracle"')

merged_df = pd.merge(value_df, oracle_df, on=['EXECUTIONID', 'ABSTRACTIONID', 'SHEETID', 'X', 'Y'])
merged_df[['ABSTRACTIONID', 'Y', 'TYPE_x', 'VALUE_x', 'TYPE_y', 'VALUE_y']]

Unnamed: 0,ABSTRACTIONID,Y,TYPE_x,VALUE_x,TYPE_y,VALUE_y
0,Task20,3,value,False,oracle,False
1,Task80,3,value,84.0,oracle,84
2,Task71,3,value,"[5, 15, 25, 37,",oracle,"[5, 15, 25, 37,"
3,Task70,12,value,True,oracle,True
4,Task3,1,value,False,oracle,False
...,...,...,...,...,...,...
103,Task58,1,value,True,oracle,True
104,Task11,2,value,bcd,oracle,bcd
105,Task70,4,value,True,oracle,True
106,Task79,2,value,True,oracle,True
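
The merged table places each observed value next to its oracle, but a plain string comparison would misjudge pairs such as `84.0` and `84`. The following sketch shows one possible way to derive a correctness flag: both strings are parsed as Python literals before comparison. It assumes that the `RAWVALUE` columns hold the complete string representation of each value (which the merge would expose as `RAWVALUE_x` and `RAWVALUE_y`); the helper names and the toy rows are hypothetical.

```python
import ast

import pandas as pd


def parse_cell(text):
    """Parse a stored string as a Python literal; fall back to the plain string."""
    try:
        return ast.literal_eval(str(text))
    except (ValueError, SyntaxError):
        return str(text)


def outputs_match(actual, expected):
    """Compare two stored values after parsing, so that '84.0' equals '84'."""
    return parse_cell(actual) == parse_cell(expected)


# Toy stand-in for merged_df; the rows are illustrative only.
toy_merged = pd.DataFrame({
    "ABSTRACTIONID": ["TaskA", "TaskB", "TaskC", "TaskD"],
    "RAWVALUE_x": ["False", "84.0", "bcd", "[1, 2]"],
    "RAWVALUE_y": ["False", "84", "bcd", "[1, 3]"],
})

# Flag each row as correct when the observed value equals the oracle.
toy_merged["CORRECT"] = [
    outputs_match(actual, expected)
    for actual, expected in zip(toy_merged["RAWVALUE_x"], toy_merged["RAWVALUE_y"])
]

# Share of rows whose observed value matches the oracle.
pass_rate = toy_merged["CORRECT"].mean()
print(toy_merged[["ABSTRACTIONID", "CORRECT"]])
print(f"pass rate: {pass_rate:.2f}")
```

Parsing with `ast.literal_eval` only handles Python literals; values of other types fall back to string comparison in this sketch.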


In [3]:
# Obtain summary statistics for the line coverage ratio per function (values appear to be percentages, with a maximum of 100)
coverage_df = df.query('TYPE == "metrics_covered_lines_in_function_ratio"')

coverage_df['VALUE'].astype(float).describe()

count    107.000000
mean      90.388082
std       19.311806
min        8.823529
25%       88.562092
50%      100.000000
75%      100.000000
max      100.000000
Name: VALUE, dtype: float64
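
The summary statistics show that most functions are fully covered, while a few fall far below. A minimal sketch of how the poorly covered tasks could be listed, assuming the coverage rows keep the `ABSTRACTIONID` of their task; the threshold and the toy rows are hypothetical:

```python
import pandas as pd

# Toy stand-in for coverage_df; the rows are illustrative only.
toy_coverage = pd.DataFrame({
    "ABSTRACTIONID": ["TaskA", "TaskB", "TaskC"],
    "TYPE": ["metrics_covered_lines_in_function_ratio"] * 3,
    "VALUE": ["100.0", "8.8", "88.5"],
})

THRESHOLD = 90.0  # hypothetical cut-off, in percent

# VALUE is stored as text, so convert it before filtering.
ratios = toy_coverage.assign(COVERAGE=toy_coverage["VALUE"].astype(float))

# Keep only the rows below the threshold, lowest coverage first.
low_coverage = ratios[ratios["COVERAGE"] < THRESHOLD].sort_values("COVERAGE")
print(low_coverage[["ABSTRACTIONID", "COVERAGE"]])
```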

In [4]:
# Obtain information about the longest execution times
sorted_df = df.sort_values(by='EXECUTIONTIME', ascending=False)
sorted_df.head()

Unnamed: 0,EXECUTIONID,ABSTRACTIONID,ACTIONID,ARENAID,SHEETID,SYSTEMID,VARIANTID,ADAPTERID,X,Y,TYPE,VALUE,RAWVALUE,VALUETYPE,LASTMODIFIED,EXECUTIONTIME
1383,e01d70ca-7d73-4970-bdfa-3bbab824693a,Task67,,execute,67.xlsx,aca814bbb827b6fad42db8e2d8a417b9,original,"('bell_number', 'bell_number', 0)",0,3,value,677568532064582,6775685320645824322581483068371419745979053216...,<class 'int'>,"(datetime.datetime(2024, 9, 13, 12, 27, 34, 87...",1828
2385,e01d70ca-7d73-4970-bdfa-3bbab824693a,Task67,,execute,67.xlsx,aca814bbb827b6fad42db8e2d8a417b9,original,"('bell_number', 'bell_number', 0)",1,3,op,bell_number,bell_number,function,"(datetime.datetime(2024, 9, 13, 12, 27, 34, 87...",1828
9,04f74e1f-7ccf-4220-8b6c-f87f5828e51c,Task16,,execute,16.xlsx,44b79afe266992429a44076a4bf2b532,original,"('text_lowercase_underscore', 'text_lowercase_...",1,1,op,text_lowercase_,text_lowercase_underscore,function,"(datetime.datetime(2024, 9, 13, 12, 26, 7, 485...",607
1015,04f74e1f-7ccf-4220-8b6c-f87f5828e51c,Task16,,execute,16.xlsx,44b79afe266992429a44076a4bf2b532,original,"('text_lowercase_underscore', 'text_lowercase_...",0,1,value,True,True,<class 'bool'>,"(datetime.datetime(2024, 9, 13, 12, 26, 7, 485...",607
2249,b8903a65-2825-486f-a310-622b0f61f31e,Task65,,execute,65.xlsx,ed77ddf0ce720da74d31b932e9d634b0,original,"('recursive_list_sum', 'recursive_list_sum', 0)",0,12,value,210,210,<class 'int'>,"(datetime.datetime(2024, 9, 13, 12, 27, 22, 67...",590
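
In the output above, rows that belong to the same execution repeat the same `EXECUTIONTIME`, and many rows in the data carry `-1`. The sketch below reduces the listing to one row per execution. It assumes that `-1` marks rows without a recorded time, which the source does not state explicitly; the toy rows are illustrative.

```python
import pandas as pd

# Toy stand-in for df; the rows are illustrative only.
toy_times = pd.DataFrame({
    "EXECUTIONID": ["e1", "e1", "e2", "e2", "e3"],
    "ABSTRACTIONID": ["TaskA", "TaskA", "TaskB", "TaskB", "TaskC"],
    "EXECUTIONTIME": [1828, 1828, 607, 607, -1],
})

# Assumption: negative values mean that no time was recorded for the row.
timed = toy_times[toy_times["EXECUTIONTIME"] >= 0]

# Keep one row per execution, then sort from slowest to fastest.
slowest = (
    timed.drop_duplicates(subset="EXECUTIONID")
    .sort_values("EXECUTIONTIME", ascending=False)
    [["EXECUTIONID", "ABSTRACTIONID", "EXECUTIONTIME"]]
)
print(slowest)
```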
