# Jupyter DataTables

## The new default for `pd.DataFrame` display representation

---

#### The user story

As a data scientist, I work with pandas on a daily basis. I use `pd.DataFrame` to interpret and process the data I work with. In my typical workflow, I display the DataFrame, look at the data schema, and then produce multiple plots to check the distribution of the data and get a clearer picture of what I am dealing with. I also often have to look up a particular value in the table.

I want those distribution plots to be part of the standard DataFrame display, and I want to be able to search through the table quickly and with minimal effort.
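
For reference, the manual workflow from the user story might look like the sketch below with plain pandas and NumPy. The sample data, the `column_histograms` helper, and the lookup threshold are illustrative assumptions, not part of Jupyter DataTables; bin counts stand in for the plots so that the example runs without a plotting backend.

```python
import numpy as np
import pandas as pd

# Illustrative data: 50 rows, three numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((50, 3)), columns=list("ABC"))

# 1. Look at the data schema.
schema = df.dtypes


# 2. Check the distribution of every numeric column.
def column_histograms(frame, bins=10):
    """Return the histogram bin counts for each numeric column (hypothetical helper)."""
    numeric = frame.select_dtypes(include="number")
    return {
        name: np.histogram(numeric[name].dropna(), bins=bins)[0]
        for name in numeric.columns
    }


histograms = column_histograms(df)

# 3. Look up particular rows in the table.
matches = df[df["A"] > 1.0]
```

Every step is a separate manual action here, which is the overhead the user story asks to remove.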

---

This notebook is a proof of concept that addresses the needs mentioned above.

> Disclaimer: This is a minimum viable product and is not meant for production usage yet. It cannot handle data types other than numeric, and it is not performant enough to handle big tables.
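
Given that limitation, it can help to check beforehand which columns of a table are numeric. The following is a minimal sketch using `DataFrame.select_dtypes`; the sample frame and the variable names are assumptions made for illustration only.

```python
import pandas as pd

# Illustrative frame with one numeric and one categorical column.
frame = pd.DataFrame(
    {
        "value": [5, 25, 69],
        "group": pd.Categorical(["0 - 9", "20 - 29", "60 - 69"]),
    }
)

# Columns with a numeric dtype.
numeric_columns = frame.select_dtypes(include="number").columns.tolist()

# Everything else (categorical, object, datetime, ...).
other_columns = frame.columns.difference(numeric_columns).tolist()
```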

#### The future plans:

- provide distribution plots for different data types
- allow custom operations on the table:
    - edit column name
    - edit column type
- handle multi index
- handle nested data
- improve plotting:
    - performance and efficiency
    - customizable
    - resizable
    - dockable
    - draggable to a Jupyter cell (??)
    
- [stretch goal] increase performance and space efficiency through server-side processing (lazy loading)
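
To illustrate the lazy-loading stretch goal, here is a rough sketch of serving a table one page at a time instead of sending every row to the browser. The `get_page` function and the shape of its return value are hypothetical; nothing like this exists in the package yet.

```python
import numpy as np
import pandas as pd


def get_page(frame, start, length):
    """Return `length` rows beginning at `start`, plus the total row count.

    Hypothetical helper: only the requested slice is serialized.
    """
    page = frame.iloc[start:start + length]
    return {
        "total": len(frame),
        "start": start,
        "rows": page.to_dict(orient="records"),
    }


df_long = pd.DataFrame(np.random.randn(1000, 8), columns=list("ABCDEFGH"))

first_page = get_page(df_long, start=0, length=10)
last_page = get_page(df_long, start=995, length=10)  # only 5 rows remain
```

With this approach the front end would hold a single page at a time and request the next slice when the user scrolls or changes the page.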

---

Author: Marek Cermak <macermak@redhat.com>, @AICoE - Project Thoth

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
import string

import numpy as np
import pandas as pd

In [3]:
# Numeric frames of different shapes: small, long (1,000 rows) and wide (20 columns)
df = pd.DataFrame(np.random.randn(50, 8), columns=list(string.ascii_uppercase[:8]))
df_long = pd.DataFrame(np.random.randn(int(1e3), 8), columns=list(string.ascii_uppercase[:8]))
df_wide = pd.DataFrame(np.random.randn(50, 20), columns=list(string.ascii_uppercase[:20]))

# Frame with a categorical column: integers in [0, 100) binned into ten labelled groups
df_categorical = pd.DataFrame({'value': np.random.randint(0, 100, 20)})
labels = ["{0} - {1}".format(i, i + 9) for i in range(0, 100, 10)]

# right=False makes the bins left-closed: [0, 10), [10, 20), ..., [90, 100)
df_categorical['group'] = pd.cut(df_categorical.value, range(0, 105, 10), right=False, labels=labels)

---

## Current representation

In [4]:
df

Unnamed: 0,A,B,C,D,E,F,G,H
0,0.645382,-0.526629,-0.446546,-1.391443,0.517994,-0.008014,-0.196177,-0.407935
1,1.947752,0.575213,1.494937,-2.496627,0.210121,1.011079,-0.342655,0.604169
2,0.281217,0.00818,-0.597242,0.024333,0.470021,-1.135627,0.748041,0.442317
3,-1.455823,0.126662,0.147838,-0.300983,-1.030387,-1.394062,0.343747,1.091781
4,-0.130862,0.75522,-0.161023,1.768565,1.729666,-1.11351,1.325546,0.234212
5,-0.128258,0.171341,-1.203154,0.125518,-0.123289,0.552264,1.25433,0.764814
6,0.491867,-1.675207,0.523026,-0.712914,-0.937724,-0.231004,0.468913,-0.31722
7,0.719451,-0.643535,0.117195,0.059682,1.601775,0.860636,1.071286,2.355846
8,-0.621368,0.641193,-0.418059,-0.412648,0.602861,-0.220206,-1.139336,-0.572774
9,0.089019,0.63361,-0.642123,0.204597,-0.450456,0.831306,0.540296,-1.295447


In [5]:
df_long

Unnamed: 0,A,B,C,D,E,F,G,H
0,-2.148020,0.015986,-0.937194,0.655630,-0.935284,-0.709641,0.955993,-1.154094
1,-0.990384,0.320376,-0.672133,1.226247,-1.980902,0.843018,-0.026384,-0.974377
2,0.167469,1.110112,0.286171,-0.843244,0.097215,-0.494059,-0.575300,-1.326212
3,-0.896539,-1.964022,-0.328206,1.545272,-0.785414,-0.045947,-0.011429,-0.628766
4,-1.666451,-0.032462,2.205005,1.970247,-0.838021,0.917278,-1.717354,-0.051454
5,-0.581885,-0.058625,1.083366,-0.080940,2.113213,-1.771131,-0.479778,1.274001
6,-1.420564,0.943154,0.477874,-0.689026,-2.031424,-1.167394,1.087777,-1.244272
7,0.437465,-0.879751,1.592701,-0.195208,-0.448796,1.737087,0.330414,-2.630931
8,2.446673,-1.214819,-1.131220,0.143643,-0.811478,0.619517,-0.725569,0.289537
9,-0.858022,-0.701870,0.424797,0.606047,-1.022005,0.973653,0.912708,0.577922


In [6]:
df_wide

Unnamed: 0,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T
0,1.671878,1.507591,-0.941961,-0.479103,1.243062,0.455375,-0.321182,-0.188037,1.028328,0.048454,0.56937,0.100752,0.464104,-1.105648,0.871848,-0.544587,-0.52737,0.347132,-1.73985,0.393005
1,-1.745066,-1.567304,0.451578,0.384173,-1.107897,-1.320351,2.214796,1.243032,-0.716072,0.025445,-1.297891,-0.766635,1.05317,-0.177236,-1.517089,1.088776,-0.333029,-1.811712,-0.741446,1.131565
2,-1.748335,0.042224,1.796467,-0.540295,-0.770122,0.040641,1.816801,0.485515,0.584904,0.148817,1.291578,2.271842,-0.171594,1.492672,0.167538,-0.728533,0.974647,2.047198,0.555557,-0.2338
3,-0.08613,-1.224525,-0.076944,0.155274,-0.835883,0.855933,-0.738008,0.414019,-1.475478,0.29123,-0.541171,-0.495414,0.219529,1.105636,1.149999,-0.178179,-0.773177,0.109775,0.670931,-0.355745
4,-0.692539,1.722372,0.309362,-1.661602,-0.257099,0.136359,0.627208,0.170341,0.17122,1.858071,0.626015,1.121422,0.870297,-0.009155,0.223316,-0.58967,-0.129866,-0.203577,-0.864832,0.112429
5,1.812593,1.465913,0.460435,0.049263,0.416874,-1.580509,-0.077383,-1.471201,-1.19946,1.110643,-0.662805,0.424333,0.240197,-0.949694,0.660267,1.122652,0.941088,-0.34447,-0.786197,0.159047
6,1.546384,-0.278793,-1.006058,0.292483,-0.83524,1.558373,-0.296513,-0.698296,-0.811073,-0.756272,-0.302758,-1.106264,0.771365,0.677972,1.713467,-0.631531,1.59117,0.797464,0.944702,2.505572
7,1.651397,0.725,0.49653,1.949106,0.640879,-0.424837,1.686579,-0.010448,-0.460884,-0.792363,-0.920765,0.459386,0.184069,-0.594438,-0.990372,0.515213,-0.267262,-1.545033,-0.336242,-1.498554
8,-0.948634,0.57221,-0.442911,-1.752355,-1.178765,-0.773791,0.126897,0.408092,-0.451634,-1.13759,1.501604,-0.859308,0.281062,-1.371712,0.84835,0.151553,-1.623255,-1.441761,1.527541,-1.312809
9,1.610827,2.710324,0.592859,1.138579,1.771154,0.257743,0.358769,-1.675696,1.690428,-0.813643,-0.330986,1.1091,0.651395,-0.180437,-0.347066,0.898667,-0.204987,-0.209939,-0.074821,-1.018668


In [7]:
df_categorical

Unnamed: 0,value,group
0,20,20 - 29
1,46,40 - 49
2,69,60 - 69
3,79,70 - 79
4,25,20 - 29
5,5,0 - 9
6,32,30 - 39
7,79,70 - 79
8,60,60 - 69
9,69,60 - 69


---

## Representation with Jupyter DataTables

In [9]:
from jupyter_datatables import init_datatables_mode


In [10]:
init_datatables_mode()


In [11]:
df

Unnamed: 0,A,B,C,D,E,F,G,H
0,0.645382,-0.526629,-0.446546,-1.391443,0.517994,-0.008014,-0.196177,-0.407935
1,1.947752,0.575213,1.494937,-2.496627,0.210121,1.011079,-0.342655,0.604169
2,0.281217,0.00818,-0.597242,0.024333,0.470021,-1.135627,0.748041,0.442317
3,-1.455823,0.126662,0.147838,-0.300983,-1.030387,-1.394062,0.343747,1.091781
4,-0.130862,0.75522,-0.161023,1.768565,1.729666,-1.11351,1.325546,0.234212
5,-0.128258,0.171341,-1.203154,0.125518,-0.123289,0.552264,1.25433,0.764814
6,0.491867,-1.675207,0.523026,-0.712914,-0.937724,-0.231004,0.468913,-0.31722
7,0.719451,-0.643535,0.117195,0.059682,1.601775,0.860636,1.071286,2.355846
8,-0.621368,0.641193,-0.418059,-0.412648,0.602861,-0.220206,-1.139336,-0.572774
9,0.089019,0.63361,-0.642123,0.204597,-0.450456,0.831306,0.540296,-1.295447



In [12]:
df_long

Unnamed: 0,A,B,C,D,E,F,G,H
0,-2.14802,0.015986,-0.937194,0.65563,-0.935284,-0.709641,0.955993,-1.154094
1,-0.990384,0.320376,-0.672133,1.226247,-1.980902,0.843018,-0.026384,-0.974377
2,0.167469,1.110112,0.286171,-0.843244,0.097215,-0.494059,-0.5753,-1.326212
3,-0.896539,-1.964022,-0.328206,1.545272,-0.785414,-0.045947,-0.011429,-0.628766
4,-1.666451,-0.032462,2.205005,1.970247,-0.838021,0.917278,-1.717354,-0.051454
5,-0.581885,-0.058625,1.083366,-0.08094,2.113213,-1.771131,-0.479778,1.274001
6,-1.420564,0.943154,0.477874,-0.689026,-2.031424,-1.167394,1.087777,-1.244272
7,0.437465,-0.879751,1.592701,-0.195208,-0.448796,1.737087,0.330414,-2.630931
8,2.446673,-1.214819,-1.13122,0.143643,-0.811478,0.619517,-0.725569,0.289537
9,-0.858022,-0.70187,0.424797,0.606047,-1.022005,0.973653,0.912708,0.577922



In [13]:
df_wide

Unnamed: 0,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T
0,1.671878,1.507591,-0.941961,-0.479103,1.243062,0.455375,-0.321182,-0.188037,1.028328,0.048454,0.56937,0.100752,0.464104,-1.105648,0.871848,-0.544587,-0.52737,0.347132,-1.73985,0.393005
1,-1.745066,-1.567304,0.451578,0.384173,-1.107897,-1.320351,2.214796,1.243032,-0.716072,0.025445,-1.297891,-0.766635,1.05317,-0.177236,-1.517089,1.088776,-0.333029,-1.811712,-0.741446,1.131565
2,-1.748335,0.042224,1.796467,-0.540295,-0.770122,0.040641,1.816801,0.485515,0.584904,0.148817,1.291578,2.271842,-0.171594,1.492672,0.167538,-0.728533,0.974647,2.047198,0.555557,-0.2338
3,-0.08613,-1.224525,-0.076944,0.155274,-0.835883,0.855933,-0.738008,0.414019,-1.475478,0.29123,-0.541171,-0.495414,0.219529,1.105636,1.149999,-0.178179,-0.773177,0.109775,0.670931,-0.355745
4,-0.692539,1.722372,0.309362,-1.661602,-0.257099,0.136359,0.627208,0.170341,0.17122,1.858071,0.626015,1.121422,0.870297,-0.009155,0.223316,-0.58967,-0.129866,-0.203577,-0.864832,0.112429
5,1.812593,1.465913,0.460435,0.049263,0.416874,-1.580509,-0.077383,-1.471201,-1.19946,1.110643,-0.662805,0.424333,0.240197,-0.949694,0.660267,1.122652,0.941088,-0.34447,-0.786197,0.159047
6,1.546384,-0.278793,-1.006058,0.292483,-0.83524,1.558373,-0.296513,-0.698296,-0.811073,-0.756272,-0.302758,-1.106264,0.771365,0.677972,1.713467,-0.631531,1.59117,0.797464,0.944702,2.505572
7,1.651397,0.725,0.49653,1.949106,0.640879,-0.424837,1.686579,-0.010448,-0.460884,-0.792363,-0.920765,0.459386,0.184069,-0.594438,-0.990372,0.515213,-0.267262,-1.545033,-0.336242,-1.498554
8,-0.948634,0.57221,-0.442911,-1.752355,-1.178765,-0.773791,0.126897,0.408092,-0.451634,-1.13759,1.501604,-0.859308,0.281062,-1.371712,0.84835,0.151553,-1.623255,-1.441761,1.527541,-1.312809
9,1.610827,2.710324,0.592859,1.138579,1.771154,0.257743,0.358769,-1.675696,1.690428,-0.813643,-0.330986,1.1091,0.651395,-0.180437,-0.347066,0.898667,-0.204987,-0.209939,-0.074821,-1.018668



In [14]:
df_categorical

Unnamed: 0,value,group
0,20,20 - 29
1,46,40 - 49
2,69,60 - 69
3,79,70 - 79
4,25,20 - 29
5,5,0 - 9
6,32,30 - 39
7,79,70 - 79
8,60,60 - 69
9,69,60 - 69



---