**NOTE** This notebook is a work in progress.

# Interactive exploration of current errors in pandas docstrings

*DISCLAIMER: This notebook is based on the one uploaded by @dujm [here](https://github.com/python-sprints/pandas-mentoring/blob/master/notebooks/docstring_error_interactive.ipynb)*


This notebook helps you find which errors are still present in pandas docstrings, so that you can pick one, fix it, and submit a PR to the [pandas repository](https://github.com/pandas-dev/pandas).

**IMPORTANT!** Before you start fixing an error, check that nobody is already working on it by searching the issues and PRs in the pandas repository. If nobody is, open an issue to let others know you will be fixing that docstring.

This notebook requires pandas >= 0.25.0.
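`DataFrame.explode`, used in step 2 to get one row per error, was added in pandas 0.25.0. A minimal guard, assuming you would rather fail early with a clear message than hit an `AttributeError` later:

```python
import pandas as pd

# DataFrame.explode (used below to get one row per error) first appeared in pandas 0.25.0,
# so its presence is a simple proxy for "pandas is recent enough".
if not hasattr(pd.DataFrame, 'explode'):
    raise ImportError(
        f'pandas {pd.__version__} is too old for this notebook; install pandas >= 0.25.0'
    )
print(f'Using pandas {pd.__version__}')
```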

Let's start by importing the necessary packages:

In [2]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import ipywidgets as widgets
import qgrid
from IPython.display import display

## *Static exploration*

## 1. Generate a .json containing all current errors

This step is done automatically if you run this notebook on Binder. Keep in mind that the .json file there is refreshed only every 15 minutes, so it may be slightly outdated: when you pick an error to work on, double-check that nobody has already opened an issue or PR for it.

To generate the .json file locally, run the following command from the root of your pandas clone:

`./scripts/validate_docstrings.py --format=json > /path/to/json/pandas_docstring_errors.json`
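Because the Binder copy is refreshed only every 15 minutes, it can help to check how old your local copy is before choosing an error. The sketch below uses only the standard library; the helper `json_age_minutes` and the 15-minute threshold are illustrative assumptions, and the file's modification time only tells you when *your* copy was written, not when the upstream validation ran.

```python
import os
import time

MAX_AGE_MINUTES = 15  # assumption: matches the Binder refresh interval mentioned above


def json_age_minutes(path):
    """Minutes elapsed since `path` was last modified (hypothetical helper)."""
    return (time.time() - os.path.getmtime(path)) / 60


path = 'pandas_docstring_errors.json'
if not os.path.exists(path):
    print(f'{path} not found: generate it with validate_docstrings.py first')
elif json_age_minutes(path) > MAX_AGE_MINUTES:
    print(f'{path} is {json_age_minutes(path):.0f} minutes old: consider regenerating it')
else:
    print(f'{path} looks recent')
```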

## 2. Plot a table describing the errors

In [4]:
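file = 'pandas_docstring_errors.json'
df = (pd.read_json(file)
        .transpose()                # one row per docstring (object name in the index)
        .explode('errors')          # one row per error; empty error lists become NaN
        .dropna(subset=['errors'])  # drop docstrings that have no errors
        .reset_index())             # move the object name into an 'index' column
df.head(2)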

| | index | type | docstring | deprecated | file | file_line | github_link | errors | warnings | examples_errors | in_api | section | subsection | shared_code_with |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | pandas.Categorical | type | Represent a categorical variable in classic R ... | False | pandas/core/arrays/categorical.py | 205 | https://github.com/pandas-dev/pandas/blob/mast... | [PR01, Parameters {fastpath} not documented] | [] | | True | Categorical data | Properties | |
| 1 | pandas.Categorical | type | Represent a categorical variable in classic R ... | False | pandas/core/arrays/categorical.py | 205 | https://github.com/pandas-dev/pandas/blob/mast... | [PR09, Parameter "dtype" description should fi... | [] | | True | Categorical data | Properties | |
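To see what `transpose` and `explode` do here, the sketch below runs the same chain on a tiny hand-written JSON string. Its shape (one object per docstring, with `errors` as a list of `[code, message]` pairs) is inferred from the output above; `pandas.Series.sum`, its empty error list, and the PR09 message are made up for illustration.

```python
import io
import json

import pandas as pd

# Hypothetical two-entry sample mimicking pandas_docstring_errors.json
sample = {
    'pandas.Categorical': {
        'type': 'type',
        'errors': [['PR01', 'Parameters {fastpath} not documented'],
                   ['PR09', 'hypothetical message']],
    },
    'pandas.Series.sum': {'type': 'method', 'errors': []},
}

raw = pd.read_json(io.StringIO(json.dumps(sample)))  # columns = docstring names
records = raw.transpose()                             # rows = docstring names
per_error = (records.explode('errors')                # one row per [code, message] pair
                    .dropna(subset=['errors'])        # the empty list became NaN: drop it
                    .reset_index())                   # object name moves to an 'index' column
print(per_error[['index', 'errors']])
```

`pandas.Series.sum` disappears because exploding an empty list yields a single NaN row, which `dropna` then removes.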


## 3. Split each `errors` list into separate columns `error_code` and `error_name`

In [29]:
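# Each 'errors' value is a [code, message] pair: expand it into two columns
df[['error_code', 'error_name']] = pd.DataFrame(df.errors.tolist(), index=df.index)
# Drop the original 'errors' list and the 'index' column (object names)
df = df.drop(['errors', 'index'], axis=1)
df.head(2)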

| | error_code | error_name |
|---|---|---|
| 0 | PR01 | Parameters {fastpath} not documented |
| 1 | PR09 | Parameter "dtype" description should finish wi... |
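Each exploded `errors` value is a two-element list, so `pd.DataFrame(df.errors.tolist(), index=df.index)` yields two columns that can be assigned in one step; passing `index=df.index` keeps the new columns aligned with the original rows even when the index is not a plain 0..n-1 range. The `.str` accessor can also index into list elements. A minimal comparison on made-up data (the GL08 message is hypothetical):

```python
import pandas as pd

# Made-up exploded errors: each value is a [code, message] pair
errors = pd.Series(
    [['PR01', 'Parameters {fastpath} not documented'],
     ['GL08', 'hypothetical message']],
    index=[10, 11],  # non-default index, as you might have before reset_index()
)

# Approach used in the notebook: expand the lists into a two-column frame
split = pd.DataFrame(errors.tolist(), index=errors.index,
                     columns=['error_code', 'error_name'])

# Alternative: .str[i] picks element i of each list
codes = errors.str[0]
names = errors.str[1]
print(split)
```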


## 4. Count the occurrences of each error code

In [30]:
df_code = df['error_code'].value_counts().reset_index()
df_code.columns = ['error_code','counts']
df_code.head(2)

| | error_code | counts |
|---|---|---|
| 0 | GL08 | 513 |
| 1 | RT03 | 330 |
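The column names produced by `value_counts().reset_index()` changed in pandas 2.0: earlier versions return `index` and `error_code`, while 2.0 and later return `error_code` and `count`. Overwriting `df_code.columns`, as the cell above does, works in both cases. A sketch of an alternative that names both columns explicitly, on made-up codes:

```python
import pandas as pd

# Made-up error codes with distinct frequencies (GL08: 3, RT03: 2, PR01: 1)
codes = pd.Series(['GL08', 'RT03', 'GL08', 'PR01', 'GL08', 'RT03'], name='error_code')

counts = (codes.value_counts()               # sorted by frequency, most common first
               .rename_axis('error_code')    # name the index explicitly
               .reset_index(name='counts'))  # name the count column explicitly
print(counts)
```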


## *Interactive exploration* 

In [31]:
# Output widgets that will hold the bar plot, the counts table and the error details
plot_output = widgets.Output()
count_output = widgets.Output()
error_output = widgets.Output()

In [32]:
ALL = 'ALL'

def unique_sorted_values_plus_ALL(array):
    """Return the sorted unique values of `array`, preceded by the ALL option."""
    unique = array.unique().tolist()
    unique.sort()
    unique.insert(0, ALL)
    return unique

# 1.1) Dropdown widget for choosing an error code
dropdown_code = widgets.Dropdown(options=unique_sorted_values_plus_ALL(df_code.error_code))

# 1.2) Read-only qgrid widget for the errors table, showing up to 10 rows at a time
col_opts = {'editable': False}
qgrid.set_grid_option('maxVisibleRows', 10)
qgrid_widget = qgrid.show_grid(df,
                               column_options=col_opts,
                               show_toolbar=False)
qgrid_widget.layout = widgets.Layout(width='800px')


In [33]:
# 2) Update the plot and tables for the selected error code
def data_filtering(code):
    count_output.clear_output()
    plot_output.clear_output()
    error_output.clear_output()
    # 2.1) Select the rows to show: everything, or only the chosen error code
    if code == ALL:
        count_filter = df_code
        error_filter = df
    else:
        count_filter = df_code[df_code.error_code == code]
        error_filter = df[df.error_code == code]
    # 2.2) Bar plot of the error code counts
    with plot_output:
        sns.set(style='whitegrid')
        sns.barplot(x='error_code', y='counts', data=count_filter)
        plt.xticks(rotation=45)
        plt.xlabel('')
        plt.ylabel('Counts')
        plt.show()
    # 2.3) Table of error code counts
    with count_output:
        display(count_filter)
    # 2.4) Interactive table with the error details
    with error_output:
        display(qgrid.show_grid(error_filter, column_options=col_opts, show_toolbar=False))
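`data_filtering` mixes two concerns: choosing the rows to show and rendering them into widgets. Pulling the selection step into a plain function makes it testable without a running Jupyter kernel. A sketch, assuming the `error_code` and `counts` column names used above; `filter_by_code`, `sample_df` and `sample_counts` are hypothetical names (chosen so they do not overwrite the notebook's `df` and `df_code`):

```python
import pandas as pd

ALL = 'ALL'


def filter_by_code(errors_df, counts_df, code):
    """Return (counts, errors) restricted to `code`, or both unfiltered when code is ALL."""
    if code == ALL:
        return counts_df, errors_df
    return (counts_df[counts_df.error_code == code],
            errors_df[errors_df.error_code == code])


# Made-up inputs with the same columns as df and df_code above
sample_df = pd.DataFrame({'error_code': ['PR01', 'GL08', 'PR01'],
                          'error_name': ['msg a', 'msg b', 'msg c']})
sample_counts = pd.DataFrame({'error_code': ['PR01', 'GL08'], 'counts': [2, 1]})

count_filter, error_filter = filter_by_code(sample_df, sample_counts, 'PR01')
print(count_filter)
print(error_filter)
```

`data_filtering` could then call such a function and keep only the `with ...:` blocks that render the results.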

In [34]:
# 3) Re-render the outputs whenever the dropdown value changes
def dropdown_code_eventhandler(change):
    data_filtering(change.new)

dropdown_code.observe(dropdown_code_eventhandler, names='value')

In [35]:
# 4) Put the input widgets in a horizontal box
input_widgets = widgets.HBox([dropdown_code])

# 5) Create one tab per output
tab = widgets.Tab(children=[plot_output, count_output, error_output])
tab.set_title(0, 'Bar plot')
tab.set_title(1, 'Error code counts')
tab.set_title(2, 'Error details')

In [36]:
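# 6) Stack the inputs on top of the tabs and show the dashboard
dashboard = widgets.VBox([input_widgets, tab])
# Fill the tabs with the dropdown's initial value (its first option, 'ALL'),
# since the event handler only fires when the value changes
data_filtering(dropdown_code.value)
display(dashboard)

# Select an error code from the dropdown, then check the three tabs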
