# Summarizing failure reasons
This notebook demonstrates how to summarize error messages and failure reasons. The `_results.jsonl` files contain a column `result` holding a string. If a test merely failed, this string is "failed: "; if an error was raised, it additionally contains the error message that was observed. These failures and errors can be summarized for each model as shown here.

In [1]:
import pandas as pd
import os

In [2]:
directory = "../data/"
# if you want to investigate a single model only, add its name here:
search_term = ""

# Enter the terms to search for here
common_errors = ['has no attribute', 'invalid syntax', 'is not defined', "Can't convert object", 'cannot import', 'out of range', 'unexpected keyword argument']

First we collect all results and the corresponding models from the jsonl files.

In [3]:
collection = []
for filename in os.listdir(directory):
    if search_term in filename and filename.endswith("_results.jsonl"):
        df = pd.read_json(os.path.join(directory, filename), lines=True)
        df['model'] = filename.replace("samples_","").replace("_results","").replace(".jsonl","")
        collection.append(df)

if len(collection) == 1:
    df = collection[0]
else:
    df = pd.concat(collection)
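
The model name above is derived purely from the filename. As a minimal sketch of that step in isolation, the hypothetical helper `model_name_from_filename` below strips the `samples_` prefix and the `_results.jsonl` suffix. It is not part of the notebook, the filename used is made up, and it differs slightly from the `replace` chain above, which removes those substrings wherever they occur rather than only at the start and end.

```python
def model_name_from_filename(filename: str) -> str:
    """Strip the 'samples_' prefix and '_results.jsonl' suffix from a filename."""
    # str.removeprefix / str.removesuffix require Python 3.9 or newer
    return filename.removeprefix("samples_").removesuffix("_results.jsonl")


# Hypothetical filename, for illustration only
example_name = model_name_from_filename("samples_some-model_results.jsonl")
print(example_name)
```

Anchoring the removal to the start and end of the filename avoids accidentally altering a model name that happens to contain one of these substrings.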

We then focus on the tests which failed.

In [4]:
df = df[df['passed'] == False]
df

index,task_id,completion,full_response,result,passed,model
0,../test_cases/apply_otsu_threshold_and_count_p...,\n# Plan:\n# 1. Convert the image to grayscale...,"Here's the completed code with the plan, libra...",failed: OpenCV(4.9.0) D:\a\opencv-python\openc...,False,claude-3-opus-20240229
4,../test_cases/count_number_of_touching_neighbo...,\n# Plan:\n# 1. Find the unique labels in the ...,"Here's the completed code with a plan, necessa...",failed:,False,claude-3-opus-20240229
5,../test_cases/count_objects_over_time.ipynb,\n# Plan:\n# 1. Iterate over each binary image...,Here's the completed code:\n\n```python\n# Pla...,failed: OpenCV(4.9.0) D:\a\opencv-python\openc...,False,claude-3-opus-20240229
9,../test_cases/deconvolve_image.ipynb,\nimport numpy as np\nfrom scipy.signal import...,"Here's the completed code with a plan, importe...",failed: deconvolve() got an unexpected keyword...,False,claude-3-opus-20240229
10,../test_cases/detect_edges.ipynb,\n# Plan:\n# 1. Convert the image to grayscale...,"Here's the completed code with a plan, importe...",failed: OpenCV(4.9.0) d:\a\opencv-python\openc...,False,claude-3-opus-20240229
...,...,...,...,...,...,...
465,../test_cases/worflow_segmentation_measurement...,\nimport numpy as np\nfrom scipy.ndimage impor...,```python\nimport numpy as np\nfrom scipy.ndim...,failed: cannot import name 'measure' from 'sci...,False,Mistral-7B-Instruct-v0.2
466,../test_cases/worflow_watershed_segmentation_c...,\nimport numpy as np\nfrom scipy.ndimage impor...,Here's the completed code based on your descri...,failed: cannot import name 'watershed' from 's...,False,Mistral-7B-Instruct-v0.2
467,../test_cases/workflow_batch_process_folder_co...,\nimport os\nimport skimage.io as io\n\ndef wo...,```python\nimport os\nimport skimage.io as io\...,failed: 'Ganglioneuroblastoma_0.tif',False,Mistral-7B-Instruct-v0.2
468,../test_cases/workflow_batch_process_folder_me...,\nimport os\nimport numpy as np\nimport pandas...,```python\nimport os\nimport numpy as np\nimpo...,failed: name 'cv2' is not defined,False,Mistral-7B-Instruct-v0.2


In [5]:
# Just some examples to get an idea
df.head(10)['result'].tolist()

["failed: OpenCV(4.9.0) D:\\a\\opencv-python\\opencv-python\\opencv\\modules\\imgproc\\src\\thresh.cpp:1555: error: (-2:Unspecified error) in function 'double __cdecl cv::threshold(const class cv::_InputArray &,const class cv::_OutputArray &,double,double,int)'\n> THRESH_OTSU mode:\n>     'src_type == CV_8UC1 || src_type == CV_16UC1'\n> where\n>     'src_type' is 4 (CV_32SC1)\n",
 'failed: ',
 "failed: OpenCV(4.9.0) D:\\a\\opencv-python\\opencv-python\\opencv\\modules\\imgproc\\src\\connectedcomponents.cpp:5632: error: (-215:Assertion failed) iDepth == CV_8U || iDepth == CV_8S in function 'cv::connectedComponents_sub1'\n",
 "failed: deconvolve() got an unexpected keyword argument 'mode'",
 "failed: OpenCV(4.9.0) d:\\a\\opencv-python\\opencv-python\\opencv\\modules\\imgproc\\src\\color.simd_helpers.hpp:92: error: (-2:Unspecified error) in function '__cdecl cv::impl::`anonymous-namespace'::CvtHelper<struct cv::impl::`anonymous namespace'::Set<3,4,-1>,struct cv::impl::A0x59191d0d::Set<1,-

## Searching for common terms
First, we search the error messages for common errors as specified above.

In [6]:
# Define the function to count errors
def count_errors(group, error_list):
    counts = {error: group['result'].str.contains(error, regex=False).sum() for error in error_list}
    return pd.Series(counts)

# Apply the function to each model group
error_counts = df.groupby('model').apply(count_errors, error_list=common_errors)

# Transpose the result for the desired format: models as columns, errors as rows
error_counts = error_counts.T

In [7]:
error_counts

model,Mistral-7B-Instruct-v0.2,claude-3-opus-20240229,codellama,gemini-pro,gpt-3.5-turbo-1106,gpt-4-1106-preview,gpt-4-turbo-2024-04-09
has no attribute,54,23,49,21,28,38,24
invalid syntax,3,0,45,0,1,2,0
is not defined,35,4,21,246,8,4,12
Can't convert object,0,0,12,0,11,2,4
cannot import,37,3,12,1,0,1,1
out of range,5,2,5,1,1,0,0
unexpected keyword argument,11,12,3,5,12,7,8
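
To make the counting logic easier to follow, here is a minimal sketch on made-up toy data (the model names and the shortened term list are assumptions, not benchmark results). It builds the same kind of table without `groupby.apply`, by counting substring matches per model for each search term.

```python
import pandas as pd

# Toy data, made up for illustration; not taken from the benchmark results
toy = pd.DataFrame({
    "model": ["model_a", "model_a", "model_b", "model_b"],
    "result": [
        "failed: name 'np' is not defined",
        "failed: ",
        "failed: 'list' object has no attribute 'shape'",
        "failed: name 'cv2' is not defined",
    ],
})
toy_errors = ["has no attribute", "is not defined"]

# For each search term: flag matching rows (plain substring match, no regex),
# then sum the flags per model. Transpose so terms are rows and models columns.
toy_counts = pd.DataFrame({
    error: toy["result"].str.contains(error, regex=False).groupby(toy["model"]).sum()
    for error in toy_errors
}).T
print(toy_counts)
```

Note that a single result can match several search terms, so the rows of such a table are not mutually exclusive categories.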


## Most popular failure reasons
Furthermore, we determine the three most frequently observed reasons for failure per model. These are either error messages or, if the result is only `failed: `, an indication that the tests did not pass, presumably because the tested function did not return the right result.

In [8]:
# Step 1: Group the DataFrame by 'model' and get the value counts of 'result'
model_result_count = df.groupby('model')['result'].value_counts()

# Step 2: Create an empty list to collect one row per model
model_top_results = []

# Step 3: Loop through each group to get the three most common results per model
for model, counts in model_result_count.groupby(level=0):
    # Get the three most frequent results for this model
    top_three = counts.nlargest(3)
    # Prepare one row for the summary table
    data = {
        'Model': model,
        'Top1 Result': top_three.index.get_level_values(1)[0],
        'Top1 Count': top_three.iloc[0],
        'Top2 Result': top_three.index.get_level_values(1)[1] if len(top_three) > 1 else None,
        'Top2 Count': top_three.iloc[1] if len(top_three) > 1 else None,
        'Top3 Result': top_three.index.get_level_values(1)[2] if len(top_three) > 2 else None,
        'Top3 Count': top_three.iloc[2] if len(top_three) > 2 else None
    }
    # Append data
    model_top_results.append(data)

# Build and display the resulting DataFrame
most_common_errors = pd.DataFrame(model_top_results)
most_common_errors

index,Model,Top1 Result,Top1 Count,Top2 Result,Top2 Count,Top3 Result,Top3 Count
0,Mistral-7B-Instruct-v0.2,failed:,91,failed: OpenCV(4.9.0) d:\a\opencv-python\openc...,19,failed: No module named 'skimage.label',16
1,claude-3-opus-20240229,failed:,137,failed: OpenCV(4.9.0) d:\a\opencv-python\openc...,18,failed: 'list' object has no attribute 'shape',11
2,codellama,failed:,112,failed: OpenCV(4.9.0) d:\a\opencv-python\openc...,25,"failed: invalid syntax (<string>, line 4)",16
3,gemini-pro,failed: name 'np' is not defined,145,failed:,56,failed: name 'cv2' is not defined,51
4,gpt-3.5-turbo-1106,failed:,123,failed: OpenCV(4.9.0) d:\a\opencv-python\openc...,21,failed: OpenCV(4.9.0) D:\a\opencv-python\openc...,16
5,gpt-4-1106-preview,failed:,114,failed: OpenCV(4.9.0) d:\a\opencv-python\openc...,14,failed: 'numpy.ndarray' object has no attribut...,11
6,gpt-4-turbo-2024-04-09,failed:,135,failed: OpenCV(4.9.0) d:\a\opencv-python\openc...,14,failed: 'list' object has no attribute 'shape',14
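
Since a bare `failed: ` and a result carrying an error message mean different things, it can help to separate the two before counting. The following is a minimal sketch under that assumption. The helper name `split_result` and the labels `"wrong result"` and `"error"` are hypothetical and not part of the notebook.

```python
def split_result(result: str) -> tuple[str, str]:
    """Return (kind, message) for the `result` string of a failed test."""
    # Drop the leading 'failed:' marker and surrounding whitespace
    message = result.removeprefix("failed:").strip()
    # An empty message means no error text was recorded for the failure
    kind = "error" if message else "wrong result"
    return kind, message


print(split_result("failed: "))
print(split_result("failed: name 'np' is not defined"))
```

Applied to the table above, something like `df['result'].map(lambda r: split_result(r)[0])` would give a per-row label that can be grouped by model in the same way as the error counts.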
