# Jupyter DataTables

## The new default for `pd.DataFrame` display representation

---

#### The user story

As a data scientist, I work with pandas on a daily basis. I use `pd.DataFrame` to interpret and process the data I work with. In my typical workflow, I display the DataFrame, look at the data schema, and then produce multiple plots to check the distribution of the data and get a clearer picture of what I am dealing with. I also often have to look up a particular value in the table.

I want those distribution plots to be part of the standard DataFrame display, and I want to be able to search through the table quickly and with minimal effort.
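
For reference, the manual workflow described in this user story might look roughly like the sketch below. The frame, its column names, and the lookup condition are made up for illustration; only standard pandas and NumPy calls are used.

```python
import numpy as np
import pandas as pd

# Small numeric frame standing in for real data (values are arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(np.abs(rng.standard_normal((50, 3))), columns=list("ABC"))

# 1. Inspect the schema.
schema = df.dtypes

# 2. Summarise each column's distribution as histogram bin counts.
histograms = {col: np.histogram(df[col], bins=10)[0] for col in df.columns}

# 3. Look up particular rows, here those where column "A" exceeds 1.0.
matches = df[df["A"] > 1.0]
```

Each of these steps is a separate cell or call today, which is the friction the story wants to remove.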

---

This notebook is a proof of concept that addresses the needs described above.

> Disclaimer: This is a minimum viable product and is not meant for production use yet. It cannot handle data types other than numeric, and it is not performant enough to handle big tables.
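
Given the numeric-only limitation, it may help to check a frame before displaying it. The helper below is a minimal sketch using plain pandas; `non_numeric_columns` is a hypothetical name and is not part of `jupyter_datatables`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def non_numeric_columns(frame: pd.DataFrame) -> list:
    """Return the names of columns whose dtype is not numeric (hypothetical helper)."""
    return [col for col in frame.columns if not is_numeric_dtype(frame[col])]


# Example frame with one numeric and one string column.
mixed = pd.DataFrame({"value": [1, 2, 3], "group": ["a", "b", "c"]})
unsupported = non_numeric_columns(mixed)
```

An empty result suggests the frame contains only the numeric data the disclaimer refers to.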

#### The future plans:

- provide distribution plots for different data types
- allow custom operations on the table:
    - edit column name
    - edit column type
- handle multi index
- handle nested data
- improve plotting:
    - performance and efficiency
    - customizable
    - resizable
    - dockable
    - draggable to a Jupyter cell (??)
- [stretch goal] increase performance and space efficiency with server-side processing (lazy loading)
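
To illustrate the lazy-loading stretch goal, here is a minimal sketch of the idea: hand over only the rows of the page currently being viewed rather than the whole table. `get_page` is a hypothetical helper written for this example, not an existing or planned API of the project.

```python
import numpy as np
import pandas as pd


def get_page(frame: pd.DataFrame, page: int, page_size: int = 10) -> pd.DataFrame:
    """Return one page of rows; `page` is zero-based (hypothetical helper)."""
    start = page * page_size
    return frame.iloc[start:start + page_size]


# A 1,000-row frame, similar in shape to `df_long` in this notebook.
df_long = pd.DataFrame(np.zeros((1000, 8)), columns=list("ABCDEFGH"))

first_page = get_page(df_long, page=0)   # rows 0-9
last_page = get_page(df_long, page=99)   # rows 990-999
```

With this approach only `page_size` rows would need to be serialized per request, instead of the full table.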

---

Author: Marek Cermak <macermak@redhat.com>, @AICoE - Project Thoth

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys
import string

import numpy as np
import pandas as pd

In [3]:
# Three numeric frames of different shapes: default, long (1,000 rows), and wide (20 columns)
df = pd.DataFrame(np.abs(np.random.randn(50, 8)), columns=list(string.ascii_uppercase[:8]))
df_long = pd.DataFrame(np.abs(np.random.randn(int(1e3), 8)), columns=list(string.ascii_uppercase[:8]))
df_wide = pd.DataFrame(np.abs(np.random.randn(50, 20)), columns=list(string.ascii_uppercase[:20]))

# A frame with a categorical column: integers in [0, 100) binned into ten labelled groups
df_categorical = pd.DataFrame({'value': np.random.randint(0, 100, 20)})
labels = ["{0} - {1}".format(i, i + 9) for i in range(0, 100, 10)]

# right=False closes each bin on the left, so 30 falls into "30 - 39"
df_categorical['group'] = pd.cut(df_categorical.value, range(0, 105, 10), right=False, labels=labels)
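
The binning in the cell above can be checked on a few boundary values. The sketch below reuses the same `pd.cut` call on hand-picked inputs (the inputs are chosen for illustration) to show how `right=False` assigns values at bin edges.

```python
import pandas as pd

# Hand-picked values sitting at or near bin boundaries.
values = pd.Series([0, 9, 10, 99])
labels = ["{0} - {1}".format(i, i + 9) for i in range(0, 100, 10)]

# right=False makes each bin closed on the left: [0, 10), [10, 20), ..., [90, 100)
groups = pd.cut(values, range(0, 105, 10), right=False, labels=labels)

print(groups.tolist())  # ['0 - 9', '0 - 9', '10 - 19', '90 - 99']
```

With the default `right=True`, the value 10 would instead fall into the first bin and 0 would be left unassigned.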

---

## Current representation

In [4]:
df

Unnamed: 0,A,B,C,D,E,F,G,H
0,0.89353,0.334553,0.829174,0.338819,0.415184,2.563906,0.523648,0.257251
1,0.073733,0.340222,0.093953,2.146973,0.471288,0.350871,1.714679,0.360557
2,0.091578,0.095608,0.503737,1.350603,0.974528,0.749276,0.30614,0.234267
3,0.112959,1.270283,1.355006,0.553827,0.108723,0.651886,1.060627,0.449334
4,1.605064,0.178375,0.878004,0.18816,0.484747,0.247236,0.389073,0.195853
5,1.377161,0.044384,0.178853,1.488677,0.732402,0.125918,1.063165,0.202576
6,0.246234,0.691378,2.178069,1.48718,1.5754,0.756058,0.099058,0.121068
7,1.337558,0.225129,0.06515,1.328916,0.371052,2.092887,0.325139,0.141329
8,0.905192,0.364664,1.310735,0.298427,1.534068,0.381853,2.049877,1.33415
9,0.515478,0.410634,1.32973,0.701507,0.577934,1.029252,1.578321,0.892999


In [5]:
df_long

Unnamed: 0,A,B,C,D,E,F,G,H
0,0.297453,0.339425,0.533412,0.414606,1.075605,1.235476,0.339821,1.504180
1,0.463972,1.105719,0.048148,0.340818,0.151241,1.061214,0.560218,1.136673
2,0.807792,0.086988,0.434381,0.063948,0.214543,1.179096,0.969358,0.804318
3,1.095990,0.757440,0.778227,0.667873,1.122125,0.458851,1.192450,0.776975
4,0.510133,1.000040,0.048845,1.132135,0.656600,0.348180,1.390052,0.943115
5,0.037705,0.792850,0.204604,0.225851,0.077396,1.453842,0.251941,1.198165
6,0.108859,0.196378,2.145034,0.137387,0.708343,1.254757,1.519275,1.454397
7,0.073694,0.771864,0.933165,1.465013,0.283318,0.253700,0.145826,0.754317
8,1.352285,0.499429,0.311249,0.350071,0.115166,0.979182,1.504776,1.442912
9,1.370826,2.100144,1.685964,0.409433,0.923152,0.531344,0.069639,0.156318
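
As background on the stock representation: pandas shortens long frames according to its display options. The sketch below uses the standard `display.max_rows` and `display.min_rows` options with values picked for illustration; it says nothing about how Jupyter DataTables pages its output.

```python
import numpy as np
import pandas as pd

# A 1,000-row frame, similar in shape to `df_long` above.
df_long = pd.DataFrame(np.zeros((1000, 8)), columns=list("ABCDEFGH"))

# Temporarily lower the row limits; both are standard pandas display options.
with pd.option_context("display.max_rows", 10, "display.min_rows", 10):
    text = repr(df_long)

# The truncated repr ends with a footer stating the full dimensions.
print(text.splitlines()[-1])
```

The rows in the middle are elided, so finding a particular value there requires extra filtering code.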


In [6]:
df_wide

Unnamed: 0,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T
0,0.340223,0.691069,0.684853,0.179351,0.985787,0.035214,0.328941,1.20497,0.496126,0.231602,0.077502,0.919869,1.555035,1.345626,0.16246,0.682551,0.589835,0.103338,0.998209,0.92931
1,1.877152,1.236446,0.691704,0.690635,0.35135,1.889193,1.196979,2.134407,0.716179,0.28107,1.144873,0.580716,0.877991,0.208387,0.180903,0.326048,0.435733,0.284778,0.149619,0.828472
2,1.266775,0.865961,0.933465,0.767611,0.093558,1.307705,1.815062,0.123233,0.051876,1.907347,2.200378,0.526153,0.2291,0.748671,0.661964,0.032849,0.736517,0.680796,0.049261,0.384393
3,0.407203,0.547513,1.099287,0.062962,1.888982,1.155827,0.151207,0.156904,0.618476,0.538903,0.732939,1.42459,0.844906,0.082497,1.527106,1.377885,0.645445,0.55381,0.308632,0.389647
4,0.834319,0.128631,0.360156,0.388113,0.337021,0.127364,0.517514,1.590474,0.953724,0.427142,0.395612,0.600815,0.28353,0.018645,2.610919,0.715912,1.127252,1.81538,0.512092,0.108303
5,0.467251,0.804182,0.130705,0.016895,0.087192,0.550953,0.64318,0.722121,0.869242,0.735046,0.566193,1.736687,0.54556,1.055274,2.563824,0.217557,0.30605,0.495155,0.446164,0.495485
6,0.274307,1.217395,0.840115,1.826634,0.082526,0.263006,0.962455,0.980104,0.100366,0.158333,0.539655,0.148675,0.525501,0.467692,0.327091,0.072088,0.918168,1.06143,0.750328,0.360198
7,1.428776,2.560946,0.17928,0.756677,1.666731,1.849918,0.099754,1.440486,0.391917,0.386815,0.00967,1.345843,1.291494,1.218103,0.909877,0.719615,1.826314,0.115976,0.491782,0.146254
8,1.900532,0.38753,0.027075,0.093853,0.154807,1.22074,0.303767,1.960354,1.854403,2.443574,0.375891,0.193468,0.279368,0.827318,0.833969,0.68924,0.470339,0.20634,1.182942,0.10242
9,1.122168,1.627067,1.579654,1.691007,0.441323,1.448297,1.365721,0.451858,0.464603,0.570157,0.085834,0.393213,0.770736,0.31503,0.75991,0.225146,0.039126,1.982903,0.384306,0.354586


In [7]:
df_categorical

Unnamed: 0,value,group
0,33,30 - 39
1,72,70 - 79
2,79,70 - 79
3,98,90 - 99
4,49,40 - 49
5,66,60 - 69
6,5,0 - 9
7,58,50 - 59
8,79,70 - 79
9,81,80 - 89


---

## Representation with Jupyter DataTables

In [8]:
from jupyter_datatables import init_datatables_mode

init_datatables_mode()

<JupyterRequire.display.SafeScript object>

<JupyterRequire.display.SafeScript object>
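
The DataTables JavaScript library works by enhancing an existing HTML `<table>` element. Assuming the rendering here builds on an HTML table in a similar way, the sketch below shows how pandas alone can emit such a table with an id and a CSS class for a script to target. This is not the package's implementation; the id and class values are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((3, 2)), columns=["A", "B"])

# `table_id` and `classes` are standard `DataFrame.to_html` parameters;
# the id and class values themselves are made up for this example.
html = df.to_html(table_id="example-table", classes="display")
```

The resulting markup is a plain `<table>` that a front-end script could look up by its id.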

In [9]:
df

Unnamed: 0,A,B,C,D,E,F,G,H
0,0.89353,0.334553,0.829174,0.338819,0.415184,2.563906,0.523648,0.257251
1,0.073733,0.340222,0.093953,2.146973,0.471288,0.350871,1.714679,0.360557
2,0.091578,0.095608,0.503737,1.350603,0.974528,0.749276,0.30614,0.234267
3,0.112959,1.270283,1.355006,0.553827,0.108723,0.651886,1.060627,0.449334
4,1.605064,0.178375,0.878004,0.18816,0.484747,0.247236,0.389073,0.195853
5,1.377161,0.044384,0.178853,1.488677,0.732402,0.125918,1.063165,0.202576
6,0.246234,0.691378,2.178069,1.48718,1.5754,0.756058,0.099058,0.121068
7,1.337558,0.225129,0.06515,1.328916,0.371052,2.092887,0.325139,0.141329
8,0.905192,0.364664,1.310735,0.298427,1.534068,0.381853,2.049877,1.33415
9,0.515478,0.410634,1.32973,0.701507,0.577934,1.029252,1.578321,0.892999


In [10]:
df_long

Unnamed: 0,A,B,C,D,E,F,G,H
0,0.297453,0.339425,0.533412,0.414606,1.075605,1.235476,0.339821,1.50418
1,0.463972,1.105719,0.048148,0.340818,0.151241,1.061214,0.560218,1.136673
2,0.807792,0.086988,0.434381,0.063948,0.214543,1.179096,0.969358,0.804318
3,1.09599,0.75744,0.778227,0.667873,1.122125,0.458851,1.19245,0.776975
4,0.510133,1.00004,0.048845,1.132135,0.6566,0.34818,1.390052,0.943115
5,0.037705,0.79285,0.204604,0.225851,0.077396,1.453842,0.251941,1.198165
6,0.108859,0.196378,2.145034,0.137387,0.708343,1.254757,1.519275,1.454397
7,0.073694,0.771864,0.933165,1.465013,0.283318,0.2537,0.145826,0.754317
8,1.352285,0.499429,0.311249,0.350071,0.115166,0.979182,1.504776,1.442912
9,1.370826,2.100144,1.685964,0.409433,0.923152,0.531344,0.069639,0.156318


In [11]:
df_wide

Unnamed: 0,A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T
0,0.340223,0.691069,0.684853,0.179351,0.985787,0.035214,0.328941,1.20497,0.496126,0.231602,0.077502,0.919869,1.555035,1.345626,0.16246,0.682551,0.589835,0.103338,0.998209,0.92931
1,1.877152,1.236446,0.691704,0.690635,0.35135,1.889193,1.196979,2.134407,0.716179,0.28107,1.144873,0.580716,0.877991,0.208387,0.180903,0.326048,0.435733,0.284778,0.149619,0.828472
2,1.266775,0.865961,0.933465,0.767611,0.093558,1.307705,1.815062,0.123233,0.051876,1.907347,2.200378,0.526153,0.2291,0.748671,0.661964,0.032849,0.736517,0.680796,0.049261,0.384393
3,0.407203,0.547513,1.099287,0.062962,1.888982,1.155827,0.151207,0.156904,0.618476,0.538903,0.732939,1.42459,0.844906,0.082497,1.527106,1.377885,0.645445,0.55381,0.308632,0.389647
4,0.834319,0.128631,0.360156,0.388113,0.337021,0.127364,0.517514,1.590474,0.953724,0.427142,0.395612,0.600815,0.28353,0.018645,2.610919,0.715912,1.127252,1.81538,0.512092,0.108303
5,0.467251,0.804182,0.130705,0.016895,0.087192,0.550953,0.64318,0.722121,0.869242,0.735046,0.566193,1.736687,0.54556,1.055274,2.563824,0.217557,0.30605,0.495155,0.446164,0.495485
6,0.274307,1.217395,0.840115,1.826634,0.082526,0.263006,0.962455,0.980104,0.100366,0.158333,0.539655,0.148675,0.525501,0.467692,0.327091,0.072088,0.918168,1.06143,0.750328,0.360198
7,1.428776,2.560946,0.17928,0.756677,1.666731,1.849918,0.099754,1.440486,0.391917,0.386815,0.00967,1.345843,1.291494,1.218103,0.909877,0.719615,1.826314,0.115976,0.491782,0.146254
8,1.900532,0.38753,0.027075,0.093853,0.154807,1.22074,0.303767,1.960354,1.854403,2.443574,0.375891,0.193468,0.279368,0.827318,0.833969,0.68924,0.470339,0.20634,1.182942,0.10242
9,1.122168,1.627067,1.579654,1.691007,0.441323,1.448297,1.365721,0.451858,0.464603,0.570157,0.085834,0.393213,0.770736,0.31503,0.75991,0.225146,0.039126,1.982903,0.384306,0.354586


In [12]:
df_categorical

Unnamed: 0,value,group
0,33,30 - 39
1,72,70 - 79
2,79,70 - 79
3,98,90 - 99
4,49,40 - 49
5,66,60 - 69
6,5,0 - 9
7,58,50 - 59
8,79,70 - 79
9,81,80 - 89


---