# Notebooks Analysis Results

### Imports
* "os" for file handling
* "pandas" for building the dataframe that holds the analysis results
* "pynblint", the module containing the operationalizations of the best practices
* "config" for configurable project paths
* "entities" for modeling the subjects of the study (Notebooks, etc.)

In [1]:
import os
import pandas as pd

from pathlib import Path

import config
import pynblint

from entities import Notebook, GitHubRepository, LocalRepository

### Processing

Each target notebook is analyzed by applying the quality-analysis functions that reflect the identified best practices.

In [2]:
cols = [
    "Notebook Name", 
    "Total cells",
    "MD cells",
    "Code cells",
    "Raw cells",
    "Linear Execution Order",
    "Classes",
    "Functions",
    "Imports in First Cell",
    "Markdown Lines",
    "Markdown Titles",
    "Bottom MD Lines Ratio",
    "Non-executed Cells",
    "Empty Cells",
    "Bottom Non-executed Cells",
    "Bottom Empty Cells",
    "Untitled notebook",
    "Restricted filename charset",
    "Short filename"
]

In [3]:
df = pd.DataFrame()

# Analyze every notebook found directly in the data folder
for filename in os.listdir(config.data_path):

    if filename.endswith(".ipynb"):

        notebook_path = os.path.join(config.data_path, filename)
        notebook = Notebook.from_string(notebook_path)

        # Flatten the results into a one-row dataframe and append it
        df_row = pd.json_normalize(notebook.get_pynblint_results())
        df_row.columns = cols
        df = pd.concat([df, df_row])

df

Unnamed: 0,Notebook Name,Total cells,MD cells,Code cells,Raw cells,Linear Execution Order,Classes,Functions,Imports in First Cell,Markdown Lines,Markdown Titles,Bottom MD Lines Ratio,Non-executed Cells,Empty Cells,Bottom Non-executed Cells,Bottom Empty Cells,Untitled notebook,Restricted filename charset,Short filename
0,my-attempt-at-analytics-vidhya-job-a-thon.ipynb,78,4,74,0,True,0,3,False,7,2,0.0,0,3,0,2,False,True,False
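
The loop above filters the output of `os.listdir`, which returns entries in arbitrary order. As a minimal sketch (the helper name `find_notebooks` is hypothetical and not part of pynblint or this project), the same discovery step could be written with `pathlib` so that the notebooks come back as sorted `Path` objects:

```python
from pathlib import Path


def find_notebooks(folder):
    """Return the .ipynb files directly inside `folder`, sorted by path.

    Hypothetical helper for illustration: like the loop above, it does not
    descend into subfolders.
    """
    return sorted(p for p in Path(folder).glob("*.ipynb") if p.is_file())
```

Each returned path could then be handed to the notebook loader in place of the `os.path.join` result, assuming the loader accepts a path-like value or its string form.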


## Analysis of a GitHub repository

In [4]:
repo = GitHubRepository('https://github.com/collab-uniba/Sentiment_Analysis_4SE_BERT')

In [5]:
df = pd.DataFrame()

for notebook in repo.notebooks:
    df_row = pd.json_normalize(notebook.get_pynblint_results())
    df_row.columns = cols
    df = pd.concat([df, df_row])
    
# Repository-level checks: duplicate filenames and untitled notebooks
repo_linting = repo.get_repo_results()
if len(repo_linting["DuplicateFilenames"]) > 0:
    print("There are two or more notebooks called: ")
    for filename in repo_linting["DuplicateFilenames"]:
        print(f'"{filename}"')
else:
    print("There are no duplicate filenames")
print("\n")
if len(repo_linting["UntitledNotebooks"]) > 0:
    print("There are one or more untitled notebooks at these paths: ")
    for path in repo_linting["UntitledNotebooks"]:
        print(f'"{path}"')
else:
    print("There are no untitled notebooks")

df

There are no duplicate filenames


There are no untitled notebooks


Unnamed: 0,Notebook Name,Total cells,MD cells,Code cells,Raw cells,Linear Execution Order,Classes,Functions,Imports in First Cell,Markdown Lines,Markdown Titles,Bottom MD Lines Ratio,Non-executed Cells,Empty Cells,Bottom Non-executed Cells,Bottom Empty Cells,Untitled notebook,Restricted filename charset,Short filename
0,Sentiment_Analysis_4SE_BERT\notebooks\cross-pl...,13,0,13,0,True,0,0,False,0,0,0.0,0,0,0,0,False,True,False
0,Sentiment_Analysis_4SE_BERT\notebooks\github-w...,17,0,17,0,True,0,0,False,0,0,0.0,0,0,0,0,False,True,True
0,Sentiment_Analysis_4SE_BERT\notebooks\jira-w-b...,17,0,17,0,True,0,0,False,0,0,0.0,0,0,0,0,False,True,True
0,Sentiment_Analysis_4SE_BERT\notebooks\stackove...,18,1,17,0,True,0,0,False,1,0,0.0,0,0,0,0,False,True,False
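
The repository-level report is built with the same sequence of `print` calls for the GitHub repository and for the local folder. A possible way to avoid the duplication is sketched below. It assumes only what the cells show, namely that `get_repo_results()` returns a dict with `"DuplicateFilenames"` and `"UntitledNotebooks"` entries; the function name `format_repo_report` is hypothetical.

```python
def format_repo_report(repo_linting):
    """Return the repository-level linting messages as one string.

    Assumption: `repo_linting` is a dict with "DuplicateFilenames" and
    "UntitledNotebooks" keys, each mapping to a (possibly empty) sequence.
    """
    lines = []

    duplicates = repo_linting.get("DuplicateFilenames", [])
    if len(duplicates) > 0:
        lines.append("There are two or more notebooks called:")
        lines.extend(f'"{name}"' for name in duplicates)
    else:
        lines.append("There are no duplicate filenames")

    lines.append("")  # blank line between the two checks

    untitled = repo_linting.get("UntitledNotebooks", [])
    if len(untitled) > 0:
        lines.append("There are one or more untitled notebooks at these paths:")
        lines.extend(f'"{path}"' for path in untitled)
    else:
        lines.append("There are no untitled notebooks")

    return "\n".join(lines)
```

With such a helper, each analysis cell would reduce to `print(format_repo_report(repo.get_repo_results()))`, and returning a string rather than printing directly keeps the wording easy to check in a test.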


## Analysis of a local folder

In [6]:
repo = LocalRepository(Path('./'))

df = pd.DataFrame()

for notebook in repo.notebooks:
    df_row = pd.json_normalize(notebook.get_pynblint_results())
    df_row.columns = cols
    df = pd.concat([df, df_row])
    
# Repository-level checks: duplicate filenames and untitled notebooks
repo_linting = repo.get_repo_results()
if len(repo_linting["DuplicateFilenames"]) > 0:
    print("There are two or more notebooks called: ")
    for filename in repo_linting["DuplicateFilenames"]:
        print(f'"{filename}"')
else:
    print("There are no duplicate filenames")
print("\n")
if len(repo_linting["UntitledNotebooks"]) > 0:
    print("There are one or more untitled notebooks at these paths: ")
    for path in repo_linting["UntitledNotebooks"]:
        print(f'"{path}"')
else:
    print("There are no untitled notebooks")
    
df

There are no duplicate filenames


There are no untitled notebooks


Unnamed: 0,Notebook Name,Total cells,MD cells,Code cells,Raw cells,Linear Execution Order,Classes,Functions,Imports in First Cell,Markdown Lines,Markdown Titles,Bottom MD Lines Ratio,Non-executed Cells,Empty Cells,Bottom Non-executed Cells,Bottom Empty Cells,Untitled notebook,Restricted filename charset,Short filename
0,Sentiment_Analysis_4SE_BERT\notebooks\cross-pl...,13,0,13,0,True,0,0,False,0,0,0.0,0,0,0.0,0.0,False,True,False
0,Sentiment_Analysis_4SE_BERT\notebooks\github-w...,17,0,17,0,True,0,0,False,0,0,0.0,0,0,0.0,0.0,False,True,True
0,Sentiment_Analysis_4SE_BERT\notebooks\jira-w-b...,17,0,17,0,True,0,0,False,0,0,0.0,0,0,0.0,0.0,False,True,True
0,Sentiment_Analysis_4SE_BERT\notebooks\stackove...,18,1,17,0,True,0,0,False,1,0,0.0,0,0,0.0,0.0,False,True,False
0,NotebooksAnalysisResults.ipynb,12,6,6,0,True,0,0,True,11,5,,0,0,,,False,True,False
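
Every row in the result tables carries the index 0, because each one-row dataframe is concatenated with its own index intact. A minimal sketch of one alternative follows: collect the rows in a list and concatenate once with `ignore_index=True`. The function name `build_results_frame` and the sample dicts are made up for illustration and are not real pynblint output.

```python
import pandas as pd


def build_results_frame(results, columns=None):
    """Build a single dataframe from an iterable of per-notebook result dicts.

    Hypothetical helper: each dict is flattened with pd.json_normalize and
    the rows are concatenated once, with a fresh 0..n-1 index.
    """
    rows = [pd.json_normalize(result) for result in results]
    if not rows:
        return pd.DataFrame(columns=columns)
    frame = pd.concat(rows, ignore_index=True)
    if columns is not None:
        frame.columns = columns  # rename positionally, as in the cells above
    return frame


# Made-up sample results, only to show the shape of the call.
sample = [
    {"name": "a.ipynb", "cells": {"total": 3, "code": 2}},
    {"name": "b.ipynb", "cells": {"total": 5, "code": 4}},
]
frame = build_results_frame(
    sample, columns=["Notebook Name", "Total cells", "Code cells"]
)
```

In the notebook this would be called as `build_results_frame((nb.get_pynblint_results() for nb in repo.notebooks), cols)`, assuming the results keep the key order that `cols` expects.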
