# LASSO Python
## MBPP Evaluation Analytics
This notebook shows how a data analyst can use LASSO Python to collect dynamic information about Python code implementations.
Using the data that LASSO Python wrote to an Apache Ignite cache, the analyst can derive information such as the correctness of the code, its execution times, and its coverage.
Both the tests and the Python code implementations used in this example come from the Mostly Basic Python Problems (MBPP) dataset, a collection of crowd-sourced Python programming problems. Each task consists of a description, a code solution, and several test cases. Several of these tests were replicated with LASSO Python, and this notebook shows the results. More information on MBPP is available at https://github.com/google-research/google-research/tree/master/mbpp
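To make the task structure concrete, here is a minimal MBPP-style task modelled on Task80 (`tetrahedral_number`), which appears in the results below. The description, solution, and test values are an illustrative reconstruction and may differ from the dataset's actual entry.

```python
# Description (illustrative): write a function to find the n-th tetrahedral number.

def tetrahedral_number(n):
    """Return the n-th tetrahedral number, n(n+1)(n+2)/6."""
    # True division returns a float, e.g. 56.0 rather than 56
    return (n * (n + 1) * (n + 2)) / 6


# MBPP-style test cases: plain asserts against expected values
assert tetrahedral_number(5) == 35
assert tetrahedral_number(6) == 56
assert tetrahedral_number(7) == 84
```

Because this solution uses true division, `tetrahedral_number(6)` returns `56.0`, while an expected value may be recorded as `56`. This is consistent with the `VALUE_x = 56.0` versus `VALUE_y = 56` pair for Task80 in the comparison below, so a comparison of stringified values has to account for it.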

```python
# Import the result CSV that LASSO Python generated and that was extracted from the Apache Ignite cache
import pandas as pd

file_path = './evaluation_results.csv'
df = pd.read_csv(file_path)
df.head()
```

| | EXECUTIONID | ABSTRACTIONID | ACTIONID | ARENAID | SHEETID | SYSTEMID | VARIANTID | ADAPTERID | X | Y | TYPE | VALUE | RAWVALUE | VALUETYPE | LASTMODIFIED | EXECUTIONTIME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ad512ba6-4bca-41df-971a-e452874ac294 | Task74 | | execute | 74.xlsx | 0 | original | ('create', 'createPythonObject', 0) | 5 | 2 | input_value | b | b | `<class 'str'>` | (datetime.datetime(2024, 9, 12, 19, 34, 5, 809... | -1 |
| 1 | 9c10cbdb-2825-44f0-87a0-163fb42690cc | Task63 | | execute | 63.xlsx | 0 | original | ('create', 'createPythonObject', 0) | 3 | 4 | input_value | (3, 5) | (3, 5) | `<class 'tuple'>` | (datetime.datetime(2024, 9, 12, 19, 33, 7, 629... | -1 |
| 2 | 1b9db6a2-b869-423f-bac1-20c0a5b4a3da | Task19 | | execute | 19.xlsx | 0 | original | ('create', 'createPythonObject', 0) | 4 | 1 | input_value | 2 | 2 | `<class 'int'>` | (datetime.datetime(2024, 9, 12, 19, 32, 22, 72... | -1 |
| 3 | b44bce98-818d-4996-9cca-d6e057529a3a | Task68 | | execute | 68.xlsx | 0 | original | ('create', 'createPythonObject', 0) | 2 | 3 | input_value | python.List | python.List | `<class 'str'>` | (datetime.datetime(2024, 9, 12, 19, 33, 35, 79... | -1 |
| 4 | 74e84000-ab50-48b1-91e5-3f5d9e9f417c | Task80 | | execute | 80.xlsx | 0 | original | ('tetrahedral_number', 'tetrahedral_number', 0) | -1 | 3 | metrics_all_branches_in_file | 1153 | 1153 | `<class 'int'>` | (datetime.datetime(2024, 9, 12, 19, 34, 34, 18... | -1 |
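
The results table appears to be in long format: each row records one observation, and the `TYPE` column says what kind it is (for example `input_value`, `value`, `oracle`, or a `metrics_...` entry). Before filtering, it can help to see which record types exist and how many tasks contribute to each. The sketch below runs on a small hypothetical frame with only four of the columns and invented values; on the real data, the same two pandas calls apply to `df` directly.

```python
import pandas as pd

# Hypothetical miniature of the long-format results table; the real df
# comes from evaluation_results.csv and has more columns
df = pd.DataFrame({
    "EXECUTIONID": ["e1", "e1", "e1", "e1", "e2", "e2"],
    "ABSTRACTIONID": ["Task80", "Task80", "Task80", "Task80", "Task9", "Task9"],
    "TYPE": [
        "input_value", "value", "oracle",
        "metrics_covered_lines_in_function_ratio", "value", "oracle",
    ],
    "VALUE": ["6", "56.0", "56", "100.0", "2", "2"],
})

# How many records of each kind were collected
type_counts = df["TYPE"].value_counts()
# How many distinct tasks contributed records of each kind
tasks_per_type = df.groupby("TYPE")["ABSTRACTIONID"].nunique()

print(type_counts)
print(tasks_per_type)
```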


```python
# Compare oracle values (ground truth) with the values resulting from the LASSO Python test execution
value_df = df.query('TYPE == "value"')
oracle_df = df.query('TYPE == "oracle"')

merged_df = pd.merge(value_df, oracle_df, on=['EXECUTIONID', 'ABSTRACTIONID', 'SHEETID', 'X', 'Y'])
merged_df[['ABSTRACTIONID', 'Y', 'TYPE_x', 'VALUE_x', 'TYPE_y', 'VALUE_y']]
```

| | ABSTRACTIONID | Y | TYPE_x | VALUE_x | TYPE_y | VALUE_y |
|---|---|---|---|---|---|---|
| 0 | Task80 | 2 | value | 56.0 | oracle | 56 |
| 1 | Task9 | 2 | value | 2 | oracle | 2 |
| 2 | Task63 | 5 | value | 7 | oracle | 7 |
| 3 | Task62 | 4 | value | 1 | oracle | 1 |
| 4 | Task17 | 2 | value | 20 | oracle | 20 |
| ... | ... | ... | ... | ... | ... | ... |
| 211 | Task67 | 1 | value | 2 | oracle | 2 |
| 212 | Task66 | 2 | value | 2 | oracle | 2 |
| 213 | Task71 | 6 | value | [15, 19, 22, 32 | oracle | [15, 19, 22, 32 |
| 214 | Task6 | 3 | value | False | oracle | False |
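
The merged table places each produced value next to its oracle, but it does not yet decide whether they match. A plain string comparison would flag Task80 (`56.0` versus `56`) as a failure. The sketch below assumes `VALUE` holds stringified Python literals, parses both sides with `ast.literal_eval`, and falls back to string comparison for non-literals. The helpers `parse_literal` and `values_match` are hypothetical names, and the `TaskX` row is invented to show a failing case.

```python
import ast

import pandas as pd


def parse_literal(raw):
    """Parse a stringified Python literal; fall back to the raw string."""
    try:
        return ast.literal_eval(str(raw))
    except (ValueError, SyntaxError, TypeError, RecursionError):
        # Non-literals (e.g. "b") and truncated values stay strings
        return str(raw)


def values_match(actual, expected):
    """Compare a produced value with its oracle after parsing both."""
    a, e = parse_literal(actual), parse_literal(expected)
    # bool is a subclass of int, so keep False from matching 0
    if isinstance(a, bool) or isinstance(e, bool):
        return type(a) is type(e) and a == e
    return a == e  # 56.0 == 56 holds for parsed numeric literals


# Hypothetical stand-in for merged_df: the first four rows mirror the
# output above, "TaskX" is an invented failing row
merged = pd.DataFrame({
    "ABSTRACTIONID": ["Task80", "Task9", "Task71", "Task6", "TaskX"],
    "VALUE_x": ["56.0", "2", "[15, 19, 22, 32", "False", "3"],
    "VALUE_y": ["56", "2", "[15, 19, 22, 32", "False", "4"],
})
merged["MATCH"] = [
    values_match(a, e) for a, e in zip(merged["VALUE_x"], merged["VALUE_y"])
]
# A task passes only if all of its value/oracle pairs match
tasks_passed = merged.groupby("ABSTRACTIONID")["MATCH"].all()

print(f"matching rows: {merged['MATCH'].mean():.0%}")
print(tasks_passed[~tasks_passed])
```

Treating `bool` separately is a design choice: without it, `False` and `0` would compare equal after parsing.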


```python
# Obtain statistics for the coverage metrics
coverage_df = df.query('TYPE == "metrics_covered_lines_in_function_ratio"')

coverage_df['VALUE'].astype(float).describe()
```

```
count    214.000000
mean      90.388082
std       19.266420
min        8.823529
25%       88.398693
50%      100.000000
75%      100.000000
max      100.000000
Name: VALUE, dtype: float64
```
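
The summary statistics do not show which tasks pull the minimum down to about 8.8. The sketch below lists rows below a coverage threshold and computes the share of rows with full coverage. The threshold of 80 and the task IDs are hypothetical, and `VALUE` is assumed to hold the ratio as a percentage string, as the maximum of 100 suggests.

```python
import pandas as pd

# Hypothetical sample rows; on real data, use df from the cells above
df = pd.DataFrame({
    "ABSTRACTIONID": ["TaskA", "TaskB", "TaskC", "TaskD"],
    "TYPE": ["metrics_covered_lines_in_function_ratio"] * 4,
    "VALUE": ["100.0", "8.823529", "88.398693", "100.0"],
})
COVERAGE_THRESHOLD = 80.0  # assumed cut-off, in percent

# Keep only coverage rows and convert the stored strings to floats
coverage = (
    df.query('TYPE == "metrics_covered_lines_in_function_ratio"')
    .assign(COVERAGE=lambda d: d["VALUE"].astype(float))
)

# Rows below the threshold, lowest coverage first
low_coverage = (
    coverage.loc[coverage["COVERAGE"] < COVERAGE_THRESHOLD, ["ABSTRACTIONID", "COVERAGE"]]
    .sort_values("COVERAGE")
)
# Fraction of rows whose function is fully covered
full_share = (coverage["COVERAGE"] == 100.0).mean()

print(low_coverage)
print(f"fully covered: {full_share:.0%}")
```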

```python
# Obtain information about the longest execution times
sorted_df = df.sort_values(by='EXECUTIONTIME', ascending=False)
sorted_df.head()
```

| | EXECUTIONID | ABSTRACTIONID | ACTIONID | ARENAID | SHEETID | SYSTEMID | VARIANTID | ADAPTERID | X | Y | TYPE | VALUE | RAWVALUE | VALUETYPE | LASTMODIFIED | EXECUTIONTIME |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2027 | e3c8cb55-98ea-4a31-b8e6-7f9a92547a39 | Task67 | | execute | 67.xlsx | 0 | original | ('bell_number', 'bell_number', 0) | 1 | 3 | op | bell_number | bell_number | function | (datetime.datetime(2024, 9, 12, 20, 34, 27, 78... | 1559 |
| 4004 | e3c8cb55-98ea-4a31-b8e6-7f9a92547a39 | Task67 | | execute | 67.xlsx | 0 | original | ('bell_number', 'bell_number', 0) | 0 | 3 | value | 677568532064582 | 6775685320645824322581483068371419745979053216... | `<class 'int'>` | (datetime.datetime(2024, 9, 12, 20, 34, 27, 78... | 1559 |
| 5265 | fbb01a05-f440-41db-9b37-005a378eb563 | Task67 | | execute | 67.xlsx | 0 | original | ('bell_number', 'bell_number', 0) | 0 | 3 | value | 677568532064582 | 6775685320645824322581483068371419745979053216... | `<class 'int'>` | (datetime.datetime(2024, 9, 12, 19, 33, 30, 99... | 1477 |
| 3511 | fbb01a05-f440-41db-9b37-005a378eb563 | Task67 | | execute | 67.xlsx | 0 | original | ('bell_number', 'bell_number', 0) | 1 | 3 | op | bell_number | bell_number | function | (datetime.datetime(2024, 9, 12, 19, 33, 30, 99... | 1477 |
| 778 | da3ed622-abeb-4736-9ea6-217d9720f00d | Task16 | | execute | 16.xlsx | 0 | original | ('text_lowercase_underscore', 'text_lowercase_... | 1 | 1 | op | text_lowercase_ | text_lowercase_underscore | function | (datetime.datetime(2024, 9, 12, 20, 32, 48, 65... | 1429 |
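
In this output, the same `EXECUTIONID` appears on several rows with the same `EXECUTIONTIME`, so the top entries are duplicates of one run rather than distinct slow runs. Rows with an `EXECUTIONTIME` of -1, as in the `head()` output earlier, appear to carry no timing. The sketch below collapses the table to one time per execution, drops the -1 rows, and then averages per task. The sample rows reuse values from the outputs above, with the execution IDs shortened for brevity.

```python
import pandas as pd

# Sample rows taken from the outputs above (execution IDs shortened)
df = pd.DataFrame({
    "EXECUTIONID": ["e3c8cb55", "e3c8cb55", "fbb01a05", "fbb01a05", "da3ed622", "ad512ba6"],
    "ABSTRACTIONID": ["Task67", "Task67", "Task67", "Task67", "Task16", "Task74"],
    "EXECUTIONTIME": [1559, 1559, 1477, 1477, 1429, -1],
})

# One row per execution; max() keeps a real timing if some rows hold -1
per_execution = (
    df.groupby(["EXECUTIONID", "ABSTRACTIONID"], as_index=False)["EXECUTIONTIME"].max()
    .query("EXECUTIONTIME >= 0")  # drop executions without a timing
    .sort_values("EXECUTIONTIME", ascending=False)
)

# Average execution time per task across its runs
per_task = (
    per_execution.groupby("ABSTRACTIONID")["EXECUTIONTIME"]
    .mean()
    .sort_values(ascending=False)
)

print(per_execution)
print(per_task)
```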
