# Notebook error process 

## ML notebook filtering and crash-related library extraction

#### Rules for extracting related libraries from error information:

~~(not included) 1. Error types that are uninteresting regardless of whether any library is involved:
        NameError?, FileNotFoundError, OSError, KeyboardInterrupt?, ModuleNotFoundError?, AssertionError?~~
        
    2. Compile a list of popular libraries used in data science and ML, then:
        1. Check whether a library name appears in the error value, e.g.,
            TypeError: 'numpy.float64' object is not callable
            AttributeError: module 'torchvision.transforms.v2' has no attribute 'CutMix'
        2. For the crash-line pattern "----> line_number xx.yy", check whether xx is the alias of an imported library.
        ??? 3. Similarly, check whether the property name yy comes from a library. It is not yet clear how to do this.
        4. For some type errors and attribute errors, check which library the object's class name comes from, e.g.,
            AttributeError: 'Sequential' object has no attribute 'predict_classes'
            TypeError: 'AxesSubplot' object is not subscriptable
        ??? 5. For ValueError, certain keywords in the error value indicate a library, e.g.,
            broadcast, array, shape -> numpy
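
Rules 2.1 and 5 can be sketched roughly as follows. This is only an illustration: `TOP_LIB_NAMES` and `VALUEERROR_KEYWORDS` are hypothetical stand-ins (the real library list lives in `config.top_lib_names`, whose contents are not shown here), and the actual logic in `util` may differ.

```python
import re

# Hypothetical stand-in for config.top_lib_names.
TOP_LIB_NAMES = ["numpy", "pandas", "torch", "torchvision", "tensorflow", "sklearn", "matplotlib"]

# Hypothetical keyword hints for rule 5 (applied to ValueError only).
VALUEERROR_KEYWORDS = {"broadcast": "numpy", "array": "numpy", "shape": "numpy"}


def libs_in_error_value(evalue, lib_names=TOP_LIB_NAMES):
    """Rule 2.1: libraries whose name starts a dotted path (or stands alone) in evalue."""
    found = []
    for lib in lib_names:
        # Matches 'numpy' in 'numpy.float64', but not 'torch' inside 'torchvision'.
        if re.search(r"(?<![\w.])" + re.escape(lib) + r"(?!\w)", evalue):
            found.append(lib)
    return found


def libs_from_valueerror_keywords(ename, evalue, keywords=VALUEERROR_KEYWORDS):
    """Rule 5: map whole-word keywords in a ValueError message to a library."""
    if ename.lower() != "valueerror":
        return []
    words = set(re.findall(r"[a-z]+", evalue.lower()))
    return sorted({lib for kw, lib in keywords.items() if kw in words})


print(libs_in_error_value("'numpy.float64' object is not callable"))
print(libs_from_valueerror_keywords(
    "valueerror", "operands could not be broadcast together with shapes (3,) (4,)"))
```

The keyword rule is deliberately conservative (whole words only), since words such as "array" also appear in messages unrelated to numpy; that uncertainty is why rule 5 is marked with "???" above.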
            

In [1]:
# lib2 parsing config
import pickle
import config

#config.top_lib_names
with open('lib_classes.pickle', 'rb') as f:
    lib_classes_dict = pickle.load(f)
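
The structure of `lib_classes.pickle` is not shown in this notebook. One plausible way to support rule 4 (class name to library) is a dictionary from public class names to the library that exposes them. The sketch below assumes that shape; `collect_class_names` and `lib_from_object_name` are hypothetical helpers, and standard-library modules stand in for the ML libraries so the example runs anywhere.

```python
import importlib
import inspect
import re


def collect_class_names(module_names):
    """Map each public class name to the first listed module that exposes it."""
    mapping = {}
    for name in module_names:
        module = importlib.import_module(name)
        for attr, _obj in inspect.getmembers(module, inspect.isclass):
            if not attr.startswith("_"):
                mapping.setdefault(attr, name)
    return mapping


def lib_from_object_name(evalue, class_to_lib):
    """Look up the class named in messages like "'Sequential' object has no attribute ..."."""
    match = re.search(r"'(\w+)' object", evalue)
    return class_to_lib.get(match.group(1)) if match else None


# Standard-library modules stand in for libraries such as keras or matplotlib.
class_to_lib = collect_class_names(["collections", "decimal"])
print(lib_from_object_name("'OrderedDict' object is not callable", class_to_lib))
```

A class that is not in the mapping (for example `'Sequential'` with the stand-in modules above) simply yields `None`, which leaves the error without a library from this rule.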


### 1. Notebooks from GitHub (The Stack v1)

https://huggingface.co/datasets/bigcode/the-stack-dedup/tree/main/data/jupyter-notebook


In [2]:
# read in all errors from the file
import pandas as pd
import util

df2_err_builtin_exps = pd.read_excel('C:/Users/yirwa29/Downloads/Dataset-Nb/nberror_g_all_p4_exception_types.xlsx')
df2_err_builtin_exps.head()

Unnamed: 0,fname,ename,evalue,traceback,lib
0,00000-101-cookie-clicker-v2-checkpoint.ipynb,keyboardinterrupt,,['--------------------------------------------...,"urllib3,selenium"
1,00000-1012-demand-forecasting-data-prep-from-s...,libcustomerrors,"FATAL: no pg_hba.conf entry for host ""75.166....",['--------------------------------------------...,psycopg2
2,00000-1017-heatmapseq2seq.ipynb,valueerror,"x and y must have same first dimension, but ha...",['--------------------------------------------...,matplotlib
3,00000-1023-crawler-20190515-20190516.ipynb,connectionerror,HTTPSConnectionPool(host='www.backpackers.com....,['--------------------------------------------...,"requests,urllib3"
4,00000-1033-peer-solution-predicting-survival-t...,libcustomerrors,<urlopen error [Errno 11001] getaddrinfo failed>,['--------------------------------------------...,pandas


In [3]:
# share of errors with a library found by the first, general extraction (util.extract_lib)
len(df2_err_builtin_exps[~df2_err_builtin_exps["lib"].isnull()])/len(df2_err_builtin_exps)

0.35296669615239423

### 1.1 Extract imported libraries and their aliases used in the error notebook dataset

Extract all imports from all error notebooks (all languages).

In [5]:
import imports_parser
import pickle
import pandas as pd

path_err_nbs = r"C:\Users\yirwa29\Downloads\Dataset-Nb\nbdata_g_error"
res = imports_parser.get_imports_nbs_static(path_err_nbs+"/nbs", imports_parser.get_imports_line_all)
res_pd = pd.DataFrame.from_dict(res)
res_pd["lib_alias"] = res_pd.imports.apply(imports_parser.get_lib_alias)
res_pd.to_excel(path_err_nbs+"/imports_all_info.xlsx", index=False, engine="xlsxwriter")

Unexpected error converting to json 00279-3344-datasets.ipynb
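
The internals of `imports_parser` are not shown here. As a rough idea of what pairing each import with the name used in code involves, the sketch below uses Python's `ast` module; `get_lib_aliases` is a hypothetical helper, not the notebook's `imports_parser.get_lib_alias`. It assumes the cell source is valid Python, so IPython magics such as `%%capture` would have to be stripped first.

```python
import ast


def get_lib_aliases(source):
    """Return [top_level_library, name_bound_in_code] pairs for every import in source."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                top = alias.name.split(".")[0]
                # `import a.b` binds `a`; `import a.b as c` binds `c`.
                pairs.append([top, alias.asname or top])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Absolute `from a.b import c [as d]`; relative imports are skipped.
            top = node.module.split(".")[0]
            for alias in node.names:
                pairs.append([top, alias.asname or alias.name])
    return pairs


cell = "import numpy as np\nimport cv2\nfrom tensorflow.keras.utils import plot_model\n"
print(get_lib_aliases(cell))
```

The pair layout mirrors the `lib_alias` column used later (`[library, alias]`), which is what the crash-line rule needs to resolve a name like `np` back to `numpy`.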


### 1.1.1 Keep only the notebooks that use the selected libraries

Based on the imports extracted from all error notebooks (all languages).

The result is also restricted to Python notebooks.

In [4]:
import pandas as pd
import util

path_err_nbs = r"C:\Users\yirwa29\Downloads\Dataset-Nb\nbdata_g_error"
df_imports = pd.read_excel(path_err_nbs + '/imports_all_info.xlsx')

In [6]:
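import ast

# lib_alias was saved as the string form of a list; parse it back without executing arbitrary code
df_imports["lib_alias"] = df_imports.lib_alias.apply(ast.literal_eval)
df_imports_filtered = df_imports.loc[df_imports.lib_alias.apply(lambda imports: any(util.simple_lib_parser(imp[0]) in config.top_lib_names for imp in imports))]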
# df_imports_filtered.to_excel(path_err_nbs+"/imports_all_info_filtered_ML.xlsx", index=False, engine="xlsxwriter")
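
The filter above keeps a notebook if any of its imports resolves to a selected library. A minimal, pandas-free sketch of that predicate follows; `top_level_name` is a hypothetical stand-in for `util.simple_lib_parser` (whose exact behaviour is not shown), and the notebook names and library set are made up for illustration.

```python
def top_level_name(module_path):
    """Hypothetical normaliser: 'Tensorflow.keras.utils' -> 'tensorflow'."""
    return module_path.split(".")[0].strip().lower()


def uses_selected_library(lib_alias, selected_libs):
    """True if any [library, alias] pair refers to one of the selected libraries."""
    return any(top_level_name(pair[0]) in selected_libs for pair in lib_alias)


# Made-up example data in the same [library, alias] layout as the lib_alias column.
notebooks = {
    "a.ipynb": [["numpy", "np"], ["cv2", "cv2"]],
    "b.ipynb": [["pathlib", "pathlib"], ["IPython", "display"]],
}
selected = {"numpy", "pandas", "tensorflow"}

kept = [fname for fname, pairs in notebooks.items() if uses_selected_library(pairs, selected)]
share = len(kept) / len(notebooks)
print(kept, f"{share:.2%}")
```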

In [1]:
#df_imports_filtered = pd.read_excel(path_err_nbs + '/imports_all_info_filtered_ML.xlsx')
# used to keep only the notebooks written in Python
df2_err_builtin_exps = pd.read_excel('C:/Users/yirwa29/Downloads/Dataset-Nb/nberror_g_all_p4_exception_types.xlsx')

In [2]:
df_imports_filtered_python = pd.merge(df_imports_filtered, df2_err_builtin_exps, on="fname")[["fname", "imports", "lib_alias"]].drop_duplicates()
df_imports_filtered_python.to_excel(path_err_nbs+"/imports_all_info_filtered_ML_python.xlsx", index=False, engine="xlsxwriter")
print("{0:.2%} of all the python GitHub notebooks(containing errors) use the selected libraries".format(len(df_imports_filtered_python)/df2_err_builtin_exps.fname.nunique()))

80.31% of all the python GitHub notebooks(containing errors) use the selected libraries


In [12]:
# quick look at the errors from the filtered notebooks
pd.merge(df_imports_filtered, df2_err_builtin_exps, on="fname")[["fname", "ename", "evalue","traceback","lib"]]

Unnamed: 0,fname,ename,evalue,traceback,lib
0,00000-101-cookie-clicker-v2-checkpoint.ipynb,keyboardinterrupt,,['--------------------------------------------...,"urllib3,selenium"
1,00000-1012-demand-forecasting-data-prep-from-s...,libcustomerrors,"FATAL: no pg_hba.conf entry for host ""75.166....",['--------------------------------------------...,psycopg2
2,00000-1017-heatmapseq2seq.ipynb,valueerror,"x and y must have same first dimension, but ha...",['--------------------------------------------...,matplotlib
3,00000-1023-crawler-20190515-20190516.ipynb,connectionerror,HTTPSConnectionPool(host='www.backpackers.com....,['--------------------------------------------...,"requests,urllib3"
4,00000-1033-peer-solution-predicting-survival-t...,libcustomerrors,<urlopen error [Errno 11001] getaddrinfo failed>,['--------------------------------------------...,pandas
...,...,...,...,...,...
137967,00311-934-untitled.ipynb,syntaxerror,"invalid syntax (<unknown>, line 1)","[' File ""<unknown>"", line 1\n %%capture\n ...",
137968,00311-934-untitled.ipynb,typeerror,'NoneType' object is not iterable,['--------------------------------------------...,
137969,00311-984-working-with-mask-during-horizontal-...,valueerror,Input matrix must have some non-missing values,['--------------------------------------------...,fancyimpute
137970,00311-987-titanic.ipynb,nameerror,name 'df_train' is not defined,['--------------------------------------------...,


In [2]:
2293/137972

0.016619314063723075

### 1.2 Extract related libraries from crash lines in tracebacks

Second attempt at extracting libraries -> util.extract_lib_2
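
The body of `util.extract_lib_2` is not shown. The crash-line rule it is based on ("----> line_number xx.yy", where xx is an import alias) could look roughly like the sketch below. `crash_line_libs` and the `alias_to_lib` dictionary are hypothetical; the sketch assumes IPython-style tracebacks stored as a list of strings, possibly containing ANSI colour codes.

```python
import re

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")          # terminal colour codes
ARROW_RE = re.compile(r"^-+>\s*\d+\s+(.*)$")     # "----> 3 <code>"
NAME_RE = re.compile(r"\b([A-Za-z_]\w*)\s*\.")   # identifier followed by a dot


def crash_line_libs(traceback_lines, alias_to_lib):
    """Return libraries whose alias is dereferenced on any '---->' crash line."""
    libs = []
    for raw in traceback_lines:
        for line in ANSI_RE.sub("", raw).splitlines():
            match = ARROW_RE.match(line.strip())
            if not match:
                continue
            for name in NAME_RE.findall(match.group(1)):
                lib = alias_to_lib.get(name)
                if lib and lib not in libs:
                    libs.append(lib)
    return libs


traceback = [
    "\x1b[0;32m----> 3\x1b[0m arr = np.zeros((2, 3)).reshape(4)",
    "      4 print(arr)",
]
print(crash_line_libs(traceback, {"np": "numpy", "pd": "pandas"}))
```

Only names that appear in the notebook's own alias map are reported, so local variables such as `arr` are ignored.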

In [7]:
import pandas as pd
import util
import config

path_err_nbs = r"C:\Users\yirwa29\Downloads\Dataset-Nb\nbdata_g_error"
df_imports = pd.read_excel(path_err_nbs + '/imports_all_info.xlsx')
df2_err_builtin_exps = pd.read_excel('C:/Users/yirwa29/Downloads/Dataset-Nb/nberror_g_all_p4_exception_types.xlsx')
df2_err_builtin_exps.head()

Unnamed: 0,fname,ename,evalue,traceback,lib
0,00000-101-cookie-clicker-v2-checkpoint.ipynb,keyboardinterrupt,,['--------------------------------------------...,"urllib3,selenium"
1,00000-1012-demand-forecasting-data-prep-from-s...,libcustomerrors,"FATAL: no pg_hba.conf entry for host ""75.166....",['--------------------------------------------...,psycopg2
2,00000-1017-heatmapseq2seq.ipynb,valueerror,"x and y must have same first dimension, but ha...",['--------------------------------------------...,matplotlib
3,00000-1023-crawler-20190515-20190516.ipynb,connectionerror,HTTPSConnectionPool(host='www.backpackers.com....,['--------------------------------------------...,"requests,urllib3"
4,00000-1033-peer-solution-predicting-survival-t...,libcustomerrors,<urlopen error [Errno 11001] getaddrinfo failed>,['--------------------------------------------...,pandas


In [8]:
df2_err_builtin_exps["lib2"] = df2_err_builtin_exps.apply(util.extract_lib_2, lib_names=config.top_lib_names, df_imports=df_imports, lib_classes_dict=lib_classes_dict, axis=1)

exception when listing traceback

In [10]:
df2_err_builtin_exps["lib_parsed"] = df2_err_builtin_exps["lib2"].fillna(df2_err_builtin_exps["lib"]).map(util.simple_lib_parser)
df2_err_builtin_exps.to_excel(path_err_nbs+"/nberror_g_all_p4_exception_types_lib_parsed.xlsx", index=False, engine='xlsxwriter')

In [11]:
sum(~df2_err_builtin_exps["lib_parsed"].isna())/len(df2_err_builtin_exps)

0.45254582025725826
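
The `lib_parsed` column prefers the traceback-based result (`lib2`) and falls back to the first-pass result (`lib`) before normalising. A row-wise, plain-Python equivalent is sketched below. `simple_lib_parser_sketch` is a guess at what `util.simple_lib_parser` does (first listed library, top-level package, lower-cased), and `None` is used where pandas would hold `NaN`.

```python
def simple_lib_parser_sketch(value):
    """Hypothetical normaliser: first listed library, top-level package, lower-cased."""
    if value is None:
        return None
    first = str(value).split(",")[0].strip()
    return first.split(".")[0].lower() or None


def parsed_lib(lib, lib2):
    """Prefer lib2 (crash-line extraction); fall back to lib (first general extraction)."""
    return simple_lib_parser_sketch(lib2 if lib2 is not None else lib)


print(parsed_lib("keras,tensorflow", "tensorflow"))
print(parsed_lib("sklearn", None))
print(parsed_lib(None, None))
```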

In [12]:
df2_err_builtin_exps.lib_parsed.value_counts()[:10]

lib_parsed
pandas        15289
numpy         14132
tensorflow    10173
torch          8275
sklearn        5537
matplotlib     3594
scipy          1256
ipython         984
seaborn         846
cv2             655
Name: count, dtype: int64

### 1.3 Keep only errors from notebooks that use the selected libraries

In [1]:
import pandas as pd
import util
import config

path_err_nbs = r"C:\Users\yirwa29\Downloads\Dataset-Nb\nbdata_g_error"
df2_err = pd.read_excel(path_err_nbs+"/nberror_g_all_p4_lib_parsed.xlsx")
df_imports_filtered = pd.read_excel(path_err_nbs+"/imports_all_info_filtered_ML_python.xlsx")

In [3]:
# filter errors
df2_err_filtered = pd.merge(df2_err, df_imports_filtered, on='fname', how='inner').drop(columns=["imports","lib_alias"])
# percentage of parsed crash-related libraries out of all the errors
sum(~df2_err_filtered["lib_parsed"].isna())/len(df2_err_filtered)

0.5268967616617865
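
The filter-then-measure step can be illustrated on a toy example. The data below is made up; only the pattern (inner merge on `fname`, then the share of rows with a non-missing `lib_parsed`) follows the cell above. `validate="many_to_one"` is an optional safeguard added here: it makes pandas raise if a notebook appears more than once in the right-hand table, which would otherwise duplicate error rows.

```python
import pandas as pd

# Made-up error rows: several errors can come from the same notebook.
errors = pd.DataFrame({
    "fname": ["a.ipynb", "a.ipynb", "b.ipynb", "c.ipynb"],
    "ename": ["valueerror", "nameerror", "typeerror", "keyerror"],
    "lib_parsed": ["numpy", None, "pandas", None],
})
# Made-up list of notebooks that use the selected libraries (one row per notebook).
ml_notebooks = pd.DataFrame({"fname": ["a.ipynb", "c.ipynb"]})

# Keep only errors whose notebook is in the filtered list.
kept = errors.merge(ml_notebooks, on="fname", how="inner", validate="many_to_one")

# Equivalent to sum(~col.isna()) / len(df).
coverage = kept["lib_parsed"].notna().mean()
print(len(kept), round(coverage, 3))
```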

In [4]:
df2_err_filtered.lib_parsed.value_counts()[:10]

lib_parsed
pandas        15083
numpy         12704
tensorflow     9928
torch          7568
sklearn        5496
matplotlib     3491
scipy          1215
seaborn         834
ipython         692
cv2             649
Name: count, dtype: int64

In [6]:
df2_err_filtered.to_excel(r"C:\Users\yirwa29\Downloads\Dataset-Nb\nberror_g_all_p5.xlsx", index=False, engine='xlsxwriter')

### 2. Notebooks from Kaggle

### 2.1 Extract imported libraries and their aliases used in the error notebook dataset

In [None]:
import imports_parser
import pickle
import pandas as pd

path_err_nbs = r"C:\Users\yirwa29\Downloads\Dataset-Nb\nbdata_k_error"
res = imports_parser.get_imports_nbs_static(path_err_nbs, imports_parser.get_imports_line_all)
res_pd = pd.DataFrame.from_dict(res)
res_pd["lib_alias"] = res_pd.imports.apply(imports_parser.get_lib_alias)

In [157]:
res_pd.to_excel(path_err_nbs+"/imports_all_info.xlsx", index=False, engine="xlsxwriter")
res_pd

Unnamed: 0,fname,imports,lib_alias
0,aadyac_ingenium-level3.ipynb,"{(, numpy, np), (tensorflow.keras.utils, plot_...","[[numpy, np], [tensorflow, plot_model], [cv2, ..."
1,abaojiang_eda-on-game-progress.ipynb,"{(typing, Any,, ), (, warnings, ), (, numpy, n...","[[typing, Any,], [warnings, warnings], [numpy,..."
2,abdallahelsayed22_image-segmentation-u-net.ipynb,"{(, cv2, ), (keras.layers, Input,, ), (, imgau...","[[cv2, cv2], [keras, Input,], [imgaug, iaa], [..."
3,abdallahwagih_plant-stress-identification-acc-...,"{(tensorflow.keras.preprocessing.image, ImageD...","[[tensorflow, ImageDataGenerator], [tensorflow..."
4,abdelrahmanmuhsen_semseg-tests.ipynb,"{(, warnings, ), (, cv2, ), (keras.callbacks, ...","[[warnings, warnings], [cv2, cv2], [keras, Mod..."
...,...,...,...
4344,yeohhanyi_cirrhosis-outcomes.ipynb,"{(sklearn.preprocessing, LabelEncoder,, ), (, ...","[[sklearn, LabelEncoder,], [numpy, np], [xgboo..."
4345,yeohqiwei_credit-card-fraud-transaction-classi...,"{(, numpy, np), (math, radians,, ), (imblearn....","[[numpy, np], [math, radians,], [imblearn, ADA..."
4346,zainabmuhammad_house-prices-prediction-ip-proj...,"{(, numpy, np), (scipy, stats, ), (, seaborn, ...","[[numpy, np], [scipy, stats], [seaborn, sns], ..."
4347,zakirkhanaleemi_gemini-api-entrant-notebook.ipynb,"{(, pathlib, ), (, google.ai.generativelanguag...","[[pathlib, pathlib], [google, glm], [IPython, ..."


### 2.2 Extract related libraries from crash lines in tracebacks

In [1]:
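import pandas as pd
import pickle
import util
import config

path_err_nbs = r"C:\Users\yirwa29\Downloads\Dataset-Nb\nbdata_k_error"
df_err = pd.read_excel(path_err_nbs + '/nberror_k_p3.xlsx')
df_imports = pd.read_excel(path_err_nbs + '/imports_all_info.xlsx')
# class-name lookup used by util.extract_lib_2 (same file as loaded at the top of the notebook)
with open('lib_classes.pickle', 'rb') as f:
    lib_classes_dict = pickle.load(f)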

In [3]:
df_err["lib2"] = df_err.apply(util.extract_lib_2, lib_names=config.top_lib_names, df_imports=df_imports, lib_classes_dict=lib_classes_dict, axis=1)

exception when listing traceback

In [45]:
1-len(df_err[df_err.lib.isnull()&df_err.lib2.isnull()])/len(df_err)

0.6208677685950413

In [4]:
df_err["lib_parsed"] = df_err["lib2"].fillna(df_err["lib"]).map(util.simple_lib_parser)
df_err.to_excel(path_err_nbs+"/nberror_k_all_p3_lib_parsed.xlsx", index=False, engine='xlsxwriter')

Keep only the notebooks that use the selected libraries.

In [25]:
import ast
import pandas as pd
import util
import config

path_err_nbs = r"C:\Users\yirwa29\Downloads\Dataset-Nb\nbdata_k_error"
df_imports = pd.read_excel(path_err_nbs + '/imports_all_info.xlsx')
# lib_alias was saved as the string form of a list; parse it back without executing arbitrary code
df_imports["lib_alias"] = df_imports.lib_alias.apply(ast.literal_eval)
df_imports_filtered = df_imports.loc[df_imports.lib_alias.apply(lambda imports: any(util.simple_lib_parser(imp[0]) in config.top_lib_names for imp in imports))]

In [33]:
print("There are {1} notebooks ({0:.2%} of all error notebooks) using the selected ML libraries".format(len(df_imports_filtered)/len(df_imports),len(df_imports_filtered)))

There are 4064 notebooks (93.45% of all error notebooks) using the selected ML libraries


In [47]:
df_imports_filtered.to_excel(path_err_nbs+"/imports_all_info_filtered_ML.xlsx", index=False, engine="xlsxwriter")

### 2.3 Filter the error dataset to notebooks (`fname`) that use any of the selected libraries

In [4]:
import pandas as pd
import util

path_err_nbs = r"C:\Users\yirwa29\Downloads\Dataset-Nb\nbdata_k_error"
df_err = pd.read_excel(path_err_nbs+"/nberror_k_all_p3_lib_parsed.xlsx")
df_imports_filtered = pd.read_excel(path_err_nbs + '/imports_all_info_filtered_ML.xlsx')

In [5]:
df_err_filtered = pd.merge(df_err, df_imports_filtered, on='fname', how='inner').drop(columns=["imports","lib_alias"])
sum(~df_err_filtered["lib_parsed"].isna())/len(df_err_filtered)

0.6622057400838439

In [6]:
df_err_filtered.lib_parsed.value_counts()[:10]

lib_parsed
tensorflow      874
pandas          779
numpy           531
torch           512
sklearn         411
matplotlib      121
transformers    116
ipython          70
nltk             47
seaborn          47
Name: count, dtype: int64

In [7]:
df_err_filtered.to_excel(path_err_nbs+"/nberror_k_p4.xlsx", index=False, engine='xlsxwriter')

In [8]:
df_err_filtered

Unnamed: 0,fname,ename,evalue,traceback,lib,lib2,lib_parsed
0,aaronalbrecht_hardness-contest.ipynb,valueerror,The feature names should match those that were...,['--------------------------------------------...,sklearn,,sklearn
1,aaryaamoharir_resnet-50-my-version.ipynb,keyboardinterrupt,,['--------------------------------------------...,"keras,tensorflow",tensorflow,tensorflow
2,aaryaamoharir_resnet-50-version-2.ipynb,keyboardinterrupt,,['--------------------------------------------...,"keras,tensorflow",tensorflow,tensorflow
3,achintyabhat_activation-maximization.ipynb,typeerror,'AxesSubplot' object is not subscriptable,['--------------------------------------------...,,,
4,adityabajaj03_dr-cnn.ipynb,keyerror,'val_categorical_accuracy',['--------------------------------------------...,,,
...,...,...,...,...,...,...,...
6197,yogitabakhru_pandas-series.ipynb,keyerror,'key of type tuple not found and not a MultiIn...,['--------------------------------------------...,pandas,,pandas
6198,yogitabakhru_pandas-series.ipynb,syntaxerror,"invalid syntax (2732271503.py, line 1)","[' File ""/tmp/ipykernel_28/2732271503.py"", li...",,,
6199,yousifadel_notebook4e3ed6988b.ipynb,valueerror,"in user code:\n\n File ""/opt/conda/lib/pyth...",['--------------------------------------------...,keras,tensorflow,tensorflow
6200,zalyildirim_kurs-proje1.ipynb,syntaxerror,"invalid syntax (247201396.py, line 1)","[' File ""/tmp/ipykernel_28/247201396.py"", lin...",,,
