# Jupyter DataTables

## The new default for `pd.DataFrame` display representation

---

#### The user story

As a data scientist, I work with pandas on a daily basis. I use `pd.DataFrame` to interpret and process the data I work with. In my typical workflow, I display the dataframe, look at the data schema, and then produce multiple plots to check the distribution of the data and get a clearer picture of what I am dealing with. I also often have to look up a particular value in the table.

I want those distribution plots to be part of the standard DataFrame display, and I want to be able to search through the table quickly with minimal effort.
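
For reference, the manual workflow described in this user story looks roughly like the following in plain pandas and NumPy. This is only an illustrative sketch: the frame `df_example`, its column names, and the lookup threshold are made up for the example, and `np.histogram` bin counts stand in for whatever plots a user would actually draw.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not part of this notebook
rng = np.random.default_rng(0)
df_example = pd.DataFrame(rng.standard_normal((50, 3)), columns=list("ABC"))

# 1. Look at the data schema
schema = df_example.dtypes

# 2. Check the distribution of each column (bin counts instead of a plot)
histograms = {
    column: np.histogram(df_example[column], bins=10)[0]
    for column in df_example.columns
}

# 3. Look up particular rows, here those where column A exceeds an arbitrary threshold
matches = df_example[df_example["A"] > 1.0]
```

Each of these steps is a separate cell in a typical notebook, which is the repetition the user story wants to fold into the default display.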

---

This notebook is a proof of concept that targets the needs described above.

> Disclaimer: This is a minimum viable product and is not meant for production use yet. It cannot handle data types other than numeric, and it is not performant enough to handle big tables.

#### The future plans:

- provide distribution plots for different data types
- allow custom operations on the table:
    - edit column name
    - edit column type
- handle multi index
- handle nested data
- improve plotting:
    - performance and efficiency
    - customizable
    - resizable
    - dockable
    - draggable to a Jupyter cell (??)
- [stretch goal] increase performance and space efficiency through server-side processing (lazy loading)
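
One possible reading of the lazy-loading stretch goal is that the kernel hands the front end only the page of rows currently in view, together with the total row count. The sketch below illustrates that idea under this assumption; `get_page` and its `start` and `length` parameters are hypothetical names and not part of the project.

```python
import numpy as np
import pandas as pd


def get_page(frame: pd.DataFrame, start: int, length: int) -> dict:
    """Return one page of rows plus the total row count (hypothetical helper)."""
    page = frame.iloc[start:start + length]
    return {
        "total": len(frame),
        "index": page.index.tolist(),
        "columns": page.columns.tolist(),
        "data": page.to_numpy().tolist(),
    }


# Small made-up frame: 10 rows, 4 columns, values 0..39
frame = pd.DataFrame(np.arange(40).reshape(10, 4), columns=list("ABCD"))
page = get_page(frame, start=4, length=3)
```

With this shape, the amount of data transferred per request depends on `length` rather than on the size of the table.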

---

Author: Marek Cermak <macermak@redhat.com>, @AICoE - Project Thoth

In [1]:
%load_ext autoreload
%autoreload 2

In [3]:
import sys
import string

import numpy as np
import pandas as pd

In [4]:
sys.path.insert(0, '../')

In [5]:
# Sample frames with standard-normal values: a small one, a long one (1,000 rows), and a wide one (20 columns)
df = pd.DataFrame(np.random.randn(50, 8), columns=list(string.ascii_uppercase[:8]))
df_long = pd.DataFrame(np.random.randn(int(1e3), 8), columns=list(string.ascii_uppercase[:8]))
df_wide = pd.DataFrame(np.random.randn(50, 20), columns=list(string.ascii_uppercase[:20]))

# Random integers in [0, 100), later assigned to ten-wide bins
df_categorical = pd.DataFrame({'value': np.random.randint(0, 100, 20)})
labels = ["{0} - {1}".format(i, i + 9) for i in range(0, 100, 10)]

# right=False makes the bins left-closed: [0, 10), [10, 20), ..., [90, 100)
df_categorical['group'] = pd.cut(df_categorical.value, range(0, 105, 10), right=False, labels=labels)
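
To check the binning, and to get a feel for what a distribution summary of a categorical column could contain, the groups can be counted with `value_counts`. This sketch uses the ten `value` entries visible in the output further down instead of random numbers so that it is reproducible; the names `values`, `groups`, and `counts` are only for illustration.

```python
import pandas as pd

# Fixed values (copied from the displayed rows) instead of random ones
values = pd.Series([41, 65, 4, 79, 93, 9, 32, 96, 88, 33])
labels = ["{0} - {1}".format(i, i + 9) for i in range(0, 100, 10)]
groups = pd.cut(values, range(0, 105, 10), right=False, labels=labels)

# sort=False avoids reordering the result by frequency;
# categories that received no values are reported with a count of 0
counts = groups.value_counts(sort=False)
```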

---

## Current representation

In [6]:
df

|  | A | B | C | D | E | F | G | H |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | -1.387641 | 0.00168 | 2.541543 | -0.205194 | -0.184832 | -1.32184 | -0.395254 | -0.207631 |
| 1 | -0.045473 | -0.689724 | 0.803182 | -0.296289 | 0.363889 | -0.13218 | 0.004915 | 1.534225 |
| 2 | -2.141958 | 2.791059 | -0.396653 | -0.283152 | -1.001845 | 0.296829 | 0.809306 | 1.289614 |
| 3 | 1.475928 | -1.521082 | -0.332848 | 0.796387 | -0.244539 | -1.083215 | -1.333448 | -1.674368 |
| 4 | 0.172255 | -0.824094 | 0.510274 | 0.316272 | 0.673901 | -0.645349 | 0.192308 | -2.76221 |
| 5 | -0.660769 | -0.618749 | -0.228183 | -1.240074 | -0.390873 | 0.118437 | 0.028325 | 0.08243 |
| 6 | 0.066088 | -2.895333 | 0.178265 | 0.281045 | 0.850861 | -0.589673 | -1.101838 | -0.605441 |
| 7 | -0.353575 | 0.531989 | 1.314605 | -1.652324 | -0.354171 | -1.427834 | 2.810829 | 1.277398 |
| 8 | 0.245045 | 1.431689 | -0.526771 | -1.332303 | -0.42347 | 0.291574 | -0.827839 | 0.675935 |
| 9 | 0.052632 | -0.600553 | 2.035162 | -0.585498 | -0.742955 | -0.039447 | 0.316938 | -2.139446 |


In [7]:
df_long

|  | A | B | C | D | E | F | G | H |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | -1.068951 | -0.558959 | -0.193089 | -0.056515 | 0.858794 | 0.597005 | -0.854419 | -0.242651 |
| 1 | 0.217881 | 0.504022 | -0.018603 | 0.389414 | -0.919235 | -0.689665 | -0.153586 | -0.530945 |
| 2 | -1.872059 | 0.521614 | 1.143308 | 0.101467 | -0.176636 | 0.086729 | 0.242410 | 0.985890 |
| 3 | -0.216679 | -1.350984 | 1.670986 | 0.646324 | 1.463789 | -1.118568 | -0.277906 | 0.479745 |
| 4 | -0.204969 | 0.363614 | 0.904568 | -1.709482 | -0.466427 | -1.188345 | 0.031215 | 1.875194 |
| 5 | -0.474982 | -0.111963 | 0.672744 | 0.282212 | -0.532518 | 1.056533 | 0.653970 | 1.246265 |
| 6 | 0.623217 | 1.158249 | 0.539386 | -0.031097 | -0.230629 | 0.327830 | 0.093370 | 0.231310 |
| 7 | 0.534171 | -0.761806 | 0.696795 | -0.786526 | 2.028410 | -0.187803 | -1.881655 | -0.079302 |
| 8 | 0.712704 | 1.670465 | 0.233534 | -0.153800 | 0.378722 | 1.289319 | 0.820740 | -0.811312 |
| 9 | -0.266560 | -0.826737 | -0.647242 | 0.228642 | 0.651899 | 0.837759 | -1.097648 | 1.027989 |


In [8]:
df_wide

|  | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | 0.063306 | -1.451538 | 0.375483 | 0.174201 | -0.462209 | -2.189753 | 1.581039 | 0.273044 | 0.286805 | -0.169186 | 0.33289 | -1.376522 | 0.5301 | 1.759509 | 0.106319 | 0.584784 | -0.45822 | 0.563105 | -2.305955 | -0.056358 |
| 1 | -0.441485 | -0.892144 | 0.104071 | -0.991726 | -1.126015 | -0.310885 | -1.187665 | -2.170674 | -1.536534 | -1.078972 | -0.736718 | -0.18942 | 2.440426 | -2.001231 | -0.490562 | 0.026942 | 0.511986 | 1.410278 | 0.856018 | -0.601754 |
| 2 | -0.923917 | -0.760552 | 0.222143 | -0.679381 | -0.977714 | -2.425415 | 1.441401 | 0.018951 | -0.061635 | 2.151665 | 2.131184 | -1.240317 | -0.918542 | -1.434341 | -1.538199 | 0.151077 | 0.831242 | -0.039195 | 0.57889 | -1.143139 |
| 3 | 1.298212 | 0.562894 | -1.310549 | -1.20984 | 0.760409 | -0.045747 | -1.089423 | -0.969995 | 1.10863 | 0.267845 | 0.169738 | -1.25611 | -2.052302 | -0.359231 | 1.391712 | -0.344877 | 1.311804 | 0.361924 | 0.150438 | -1.312083 |
| 4 | 0.195536 | -0.887231 | -0.28844 | -1.098462 | -0.120097 | 1.545744 | -1.893568 | -0.152744 | -0.271557 | -0.212124 | 0.296302 | 0.241361 | 0.139744 | -0.22435 | 0.504702 | -0.414299 | -1.340768 | -0.124488 | -1.309629 | 0.409329 |
| 5 | 0.052864 | 0.149746 | -2.392169 | 0.429298 | 0.706621 | -0.621133 | 0.138959 | -1.099364 | -2.908055 | -0.38074 | 0.568247 | 0.65149 | -0.064576 | -0.959669 | -0.137596 | -0.149061 | -0.220932 | -0.290534 | 0.197972 | 0.909338 |
| 6 | -0.282018 | -0.038337 | -0.484505 | 0.509209 | 0.564944 | -1.237913 | -0.748917 | 0.562781 | -0.373542 | -1.533425 | 0.130229 | -0.318762 | -0.119328 | 0.171049 | 0.365783 | -0.529594 | 0.507719 | -0.048159 | 0.236178 | 1.160976 |
| 7 | -2.270341 | 2.737349 | 0.025553 | -0.837492 | 0.13909 | 0.352066 | 1.145277 | -1.363005 | -0.754646 | -1.492028 | 0.371824 | -0.18488 | 0.840431 | 0.7006 | -1.39855 | 0.162846 | 0.612746 | 0.748211 | 0.145618 | 0.780036 |
| 8 | -1.885 | 0.154371 | -1.320901 | 2.174169 | -1.63999 | -0.54857 | -0.730147 | -0.109406 | -0.991644 | 1.594426 | -1.895833 | -1.001836 | -0.420976 | 0.415731 | 0.184703 | 0.810044 | 0.657299 | -1.093215 | 0.982135 | 0.055658 |
| 9 | 0.249527 | -0.428519 | -0.071289 | -0.18832 | 0.327326 | -0.60064 | 0.461574 | -0.033416 | 0.100173 | 0.830627 | -0.382296 | -0.134953 | -1.665266 | -0.045961 | -0.481046 | -0.688619 | 1.351387 | -0.42315 | -0.44037 | -1.479103 |


In [9]:
df_categorical

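|  | value | group |
| ---: | ---: | :--- |
| 0 | 41 | 40 - 49 |
| 1 | 65 | 60 - 69 |
| 2 | 4 | 0 - 9 |
| 3 | 79 | 70 - 79 |
| 4 | 93 | 90 - 99 |
| 5 | 9 | 0 - 9 |
| 6 | 32 | 30 - 39 |
| 7 | 96 | 90 - 99 |
| 8 | 88 | 80 - 89 |
| 9 | 33 | 30 - 39 |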


---

## Representation with Jupyter DataTables

In [10]:
from jupyter_datatables import init_datatables_mode

In [11]:
init_datatables_mode()
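
The notebook does not show how `init_datatables_mode` changes the display. As a generic illustration only, and not a description of the jupyter-datatables implementation, IPython lets a package register a custom HTML formatter for a type through `display_formatter.formatters["text/html"].for_type`. In the sketch below, `render_frame` and the `custom-table` CSS class are hypothetical names.

```python
import pandas as pd


def render_frame(frame: pd.DataFrame) -> str:
    """Return an HTML table for the frame (hypothetical renderer)."""
    return frame.to_html(classes="custom-table", max_rows=10)


try:
    from IPython import get_ipython
    shell = get_ipython()  # None when not running inside IPython
except ImportError:
    shell = None

if shell is not None:
    # From now on, DataFrames shown as cell results go through render_frame
    html_formatter = shell.display_formatter.formatters["text/html"]
    html_formatter.for_type(pd.DataFrame, render_frame)

# The renderer can also be called directly, outside of IPython
html = render_frame(pd.DataFrame({"A": [1, 2], "B": [3, 4]}))
```

A formatter registered this way applies to every `pd.DataFrame` displayed afterwards in the session, which matches the "new default" behavior the title describes.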


In [12]:
df

|  | A | B | C | D | E | F | G | H |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | -1.387641 | 0.00168 | 2.541543 | -0.205194 | -0.184832 | -1.32184 | -0.395254 | -0.207631 |
| 1 | -0.045473 | -0.689724 | 0.803182 | -0.296289 | 0.363889 | -0.13218 | 0.004915 | 1.534225 |
| 2 | -2.141958 | 2.791059 | -0.396653 | -0.283152 | -1.001845 | 0.296829 | 0.809306 | 1.289614 |
| 3 | 1.475928 | -1.521082 | -0.332848 | 0.796387 | -0.244539 | -1.083215 | -1.333448 | -1.674368 |
| 4 | 0.172255 | -0.824094 | 0.510274 | 0.316272 | 0.673901 | -0.645349 | 0.192308 | -2.76221 |
| 5 | -0.660769 | -0.618749 | -0.228183 | -1.240074 | -0.390873 | 0.118437 | 0.028325 | 0.08243 |
| 6 | 0.066088 | -2.895333 | 0.178265 | 0.281045 | 0.850861 | -0.589673 | -1.101838 | -0.605441 |
| 7 | -0.353575 | 0.531989 | 1.314605 | -1.652324 | -0.354171 | -1.427834 | 2.810829 | 1.277398 |
| 8 | 0.245045 | 1.431689 | -0.526771 | -1.332303 | -0.42347 | 0.291574 | -0.827839 | 0.675935 |
| 9 | 0.052632 | -0.600553 | 2.035162 | -0.585498 | -0.742955 | -0.039447 | 0.316938 | -2.139446 |


In [13]:
df_long

|  | A | B | C | D | E | F | G | H |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | -1.068951 | -0.558959 | -0.193089 | -0.056515 | 0.858794 | 0.597005 | -0.854419 | -0.242651 |
| 1 | 0.217881 | 0.504022 | -0.018603 | 0.389414 | -0.919235 | -0.689665 | -0.153586 | -0.530945 |
| 2 | -1.872059 | 0.521614 | 1.143308 | 0.101467 | -0.176636 | 0.086729 | 0.242410 | 0.985890 |
| 3 | -0.216679 | -1.350984 | 1.670986 | 0.646324 | 1.463789 | -1.118568 | -0.277906 | 0.479745 |
| 4 | -0.204969 | 0.363614 | 0.904568 | -1.709482 | -0.466427 | -1.188345 | 0.031215 | 1.875194 |
| 5 | -0.474982 | -0.111963 | 0.672744 | 0.282212 | -0.532518 | 1.056533 | 0.653970 | 1.246265 |
| 6 | 0.623217 | 1.158249 | 0.539386 | -0.031097 | -0.230629 | 0.327830 | 0.093370 | 0.231310 |
| 7 | 0.534171 | -0.761806 | 0.696795 | -0.786526 | 2.028410 | -0.187803 | -1.881655 | -0.079302 |
| 8 | 0.712704 | 1.670465 | 0.233534 | -0.153800 | 0.378722 | 1.289319 | 0.820740 | -0.811312 |
| 9 | -0.266560 | -0.826737 | -0.647242 | 0.228642 | 0.651899 | 0.837759 | -1.097648 | 1.027989 |


In [14]:
df_wide

|  | A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T |
| ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 0 | 0.063306 | -1.451538 | 0.375483 | 0.174201 | -0.462209 | -2.189753 | 1.581039 | 0.273044 | 0.286805 | -0.169186 | 0.33289 | -1.376522 | 0.5301 | 1.759509 | 0.106319 | 0.584784 | -0.45822 | 0.563105 | -2.305955 | -0.056358 |
| 1 | -0.441485 | -0.892144 | 0.104071 | -0.991726 | -1.126015 | -0.310885 | -1.187665 | -2.170674 | -1.536534 | -1.078972 | -0.736718 | -0.18942 | 2.440426 | -2.001231 | -0.490562 | 0.026942 | 0.511986 | 1.410278 | 0.856018 | -0.601754 |
| 2 | -0.923917 | -0.760552 | 0.222143 | -0.679381 | -0.977714 | -2.425415 | 1.441401 | 0.018951 | -0.061635 | 2.151665 | 2.131184 | -1.240317 | -0.918542 | -1.434341 | -1.538199 | 0.151077 | 0.831242 | -0.039195 | 0.57889 | -1.143139 |
| 3 | 1.298212 | 0.562894 | -1.310549 | -1.20984 | 0.760409 | -0.045747 | -1.089423 | -0.969995 | 1.10863 | 0.267845 | 0.169738 | -1.25611 | -2.052302 | -0.359231 | 1.391712 | -0.344877 | 1.311804 | 0.361924 | 0.150438 | -1.312083 |
| 4 | 0.195536 | -0.887231 | -0.28844 | -1.098462 | -0.120097 | 1.545744 | -1.893568 | -0.152744 | -0.271557 | -0.212124 | 0.296302 | 0.241361 | 0.139744 | -0.22435 | 0.504702 | -0.414299 | -1.340768 | -0.124488 | -1.309629 | 0.409329 |
| 5 | 0.052864 | 0.149746 | -2.392169 | 0.429298 | 0.706621 | -0.621133 | 0.138959 | -1.099364 | -2.908055 | -0.38074 | 0.568247 | 0.65149 | -0.064576 | -0.959669 | -0.137596 | -0.149061 | -0.220932 | -0.290534 | 0.197972 | 0.909338 |
| 6 | -0.282018 | -0.038337 | -0.484505 | 0.509209 | 0.564944 | -1.237913 | -0.748917 | 0.562781 | -0.373542 | -1.533425 | 0.130229 | -0.318762 | -0.119328 | 0.171049 | 0.365783 | -0.529594 | 0.507719 | -0.048159 | 0.236178 | 1.160976 |
| 7 | -2.270341 | 2.737349 | 0.025553 | -0.837492 | 0.13909 | 0.352066 | 1.145277 | -1.363005 | -0.754646 | -1.492028 | 0.371824 | -0.18488 | 0.840431 | 0.7006 | -1.39855 | 0.162846 | 0.612746 | 0.748211 | 0.145618 | 0.780036 |
| 8 | -1.885 | 0.154371 | -1.320901 | 2.174169 | -1.63999 | -0.54857 | -0.730147 | -0.109406 | -0.991644 | 1.594426 | -1.895833 | -1.001836 | -0.420976 | 0.415731 | 0.184703 | 0.810044 | 0.657299 | -1.093215 | 0.982135 | 0.055658 |
| 9 | 0.249527 | -0.428519 | -0.071289 | -0.18832 | 0.327326 | -0.60064 | 0.461574 | -0.033416 | 0.100173 | 0.830627 | -0.382296 | -0.134953 | -1.665266 | -0.045961 | -0.481046 | -0.688619 | 1.351387 | -0.42315 | -0.44037 | -1.479103 |


In [15]:
df_categorical

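|  | value | group |
| ---: | ---: | :--- |
| 0 | 41 | 40 - 49 |
| 1 | 65 | 60 - 69 |
| 2 | 4 | 0 - 9 |
| 3 | 79 | 70 - 79 |
| 4 | 93 | 90 - 99 |
| 5 | 9 | 0 - 9 |
| 6 | 32 | 30 - 39 |
| 7 | 96 | 90 - 99 |
| 8 | 88 | 80 - 89 |
| 9 | 33 | 30 - 39 |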


---