Panel and hvPlot: High-Level Data Visualization for Python


# hvPlot Gallery

In [1]:
#importing the libraries
import panel as pn
import pandas as pd
import hvplot.pandas

pn.extension()

Import data

In [2]:
#importing the dataset
df = pd.read_csv('autompg.csv')
#checking the head of the data
df.head()

Unnamed: 0,mpg,cyl,displ,hp,weight,accel,yr,origin,name,mfr
0,18.0,8,307.0,130,3504,12.0,70,North America,chevrolet chevelle malibu,chevrolet
1,15.0,8,350.0,165,3693,11.5,70,North America,buick skylark 320,buick
2,18.0,8,318.0,150,3436,11.0,70,North America,plymouth satellite,plymouth
3,16.0,8,304.0,150,3433,12.0,70,North America,amc rebel sst,amc
4,17.0,8,302.0,140,3449,10.5,70,North America,ford torino,ford


Scatter plot

In [3]:
df.hvplot(kind = 'scatter',
          x = 'mpg',
          y = 'displ',
          size = 50,
          color = 'origin',
          legend = 'top_right',
          alpha = 0.8,
          height = 400,
          width = 500,
          xlim = (0, 50),
          ylim = (10, 500),
          hover_cols = ['accel', 'name'],
          title = 'mpg VS displacement',
          xlabel = 'X',
          ylabel = 'Y',
          #fontsize = 20,
          fontscale = 1.4,
          colormap = 'Dark2' #https://docs.bokeh.org/en/latest/docs/reference/palettes.html
         ).opts(xticks = [10, 25, 30, 35])

# legend: ['top_right', 'top_left', 'bottom_left', 'bottom_right', 'right', 'left', 'top', 'bottom']

Line chart

In [6]:
df.hvplot.line(y = ['mpg', 'accel'],
              legend = 'left',
              xlim = [-20, 400],
              xlabel = 'data').opts(xticks = [50, 200, 250])

Bar chart
A bar plot represents categorical data with rectangular bars whose heights are proportional to the numerical values they represent. The x-axis shows the categories and the y-axis shows the numerical scale. The bars have equal width, which allows for instant comparison of the data.

In [7]:
df.loc[0:100, 'mpg'].hvplot.bar(height = 400,
                               legend = 'top_left',
                               rot = 60,
                               ylabel = 'Y',
                               bar_width = 0.8,
                               title = 'X vs Y')

In [8]:
# aggregate per manufacturer (mean here; .sum() would total instead)
table = df.groupby('mfr')[['mpg','cyl', 'displ', 'accel']].mean()
table

mfr,mpg,cyl,displ,accel
amc,18.07037,6.444444,253.851852,15.07037
audi,26.714286,4.285714,111.857143,15.942857
bmw,23.75,4.0,121.0,12.65
buick,19.182353,6.470588,272.941176,14.7
cadillac,19.75,8.0,350.0,14.75
capri,25.0,4.0,140.0,14.9
chevrolet,20.219149,6.170213,239.446809,15.397872
chrysler,17.266667,7.0,330.166667,13.3
datsun,31.113043,4.26087,103.26087,16.408696
dodge,22.060714,6.0,223.125,14.460714


In [9]:
#table.hvplot.bar(y = 'mpg', rot = 90)
table.hvplot.bar(y = ['mpg', 'accel'], rot = 90)

In [10]:
table2 = df.groupby(['yr', 'origin'])[['cyl']].mean()
table2.head()

yr,origin,cyl
70,Asia,4.0
70,Europe,4.0
70,North America,7.636364
71,Asia,4.0
71,Europe,4.0


In [11]:
table2.hvplot.bar(legend = 'top_left',
                  rot = 90,
                 bar_width = 0.5,
                 fontscale = 1.5,
                 stacked = True,
                 #color = ['black', 'silver', 'green'],
                 colormap = 'dark2')
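
table2 has a two-level index (yr, origin). The following is a minimal sketch, on toy values rather than the autompg data, of how such a grouped result can be reshaped into one column per origin. That wide layout is a convenient way to read a stacked bar chart: one bar per yr, one segment per origin. The names toy, grouped and wide are illustrative only.

```python
import pandas as pd

# Toy records (illustrative values, not taken from autompg.csv)
toy = pd.DataFrame({
    "yr": [70, 70, 70, 71, 71],
    "origin": ["Asia", "Europe", "North America", "Asia", "Europe"],
    "cyl": [4, 4, 8, 4, 4],
})

# Same pattern as table2: mean of cyl per (yr, origin) pair
grouped = toy.groupby(["yr", "origin"])[["cyl"]].mean()

# Move the inner index level into columns: one row per yr, one column per origin
wide = grouped["cyl"].unstack("origin")
print(wide)
```

Combinations that never occur in the data (here, North America in year 71) show up as NaN in the wide table.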

Barh chart

In [12]:
table2.hvplot.barh(legend = 'top_left',
                  rot = 90,
                 bar_width = 0.5,
                 fontscale = 1.5,
                 stacked = True,
                 #color = ['black', 'silver', 'green'],
                 colormap = 'dark2')

In [13]:
table3 = df.groupby(['origin','mfr'])['mpg'].mean().to_frame()
table3.head()

origin,mfr,mpg
Asia,datsun,31.113043
Asia,honda,33.761538
Asia,mazda,30.058333
Asia,nissan,36.0
Asia,subaru,30.525


In [14]:
table3.hvplot.barh('mfr', 'mpg', by = 'origin', stacked = True, height = 600)

Histogram
hist is often a good way to start looking at continuous data to get a sense of its distribution. Similar methods include kde.

In [15]:
df.hvplot.hist('weight',
              width = 500,
              height = 300,
              color = 'blue',
              bins = 20)

In [16]:
df.hvplot.hist('weight',
              width = 200,
              height = 300,
              color = 'blue',
              bins = 20,
              by = 'origin',
              subplots = True)
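
To make the bins argument concrete, here is a minimal sketch of the counting behind a 20-bin histogram, using NumPy on made-up weights (the values are illustrative, not taken from autompg.csv). It assumes equal-width bins spanning the data range, which is NumPy's default; hvPlot's rendering may differ in detail.

```python
import numpy as np

# Illustrative weights only (not from autompg.csv)
rng = np.random.default_rng(0)
weights = rng.uniform(1600, 5200, size=200)

# 20 equal-width bins between the smallest and largest value
counts, edges = np.histogram(weights, bins=20)

print(len(counts))   # number of bins
print(len(edges))    # number of bin edges, one more than the number of bins
print(counts.sum())  # each observation is counted in exactly one bin
```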

KDE
A kernel density estimate (KDE) shows the distribution and spread of the data as a smooth curve.

In [17]:
df.hvplot.kde('weight',
              width = 500,
              height = 300,
              color = 'blue',
              title = 'weight distribution',
              )

In [18]:
df.hvplot.kde('weight',
              width = 500,
              height = 300,
              by = 'origin',
              ylabel = 'Density',
              title = 'weight distribution')

In [19]:
df.hvplot.kde('weight',
              width = 250,
              height = 300,
              by = 'origin',
              ylabel = 'Density',
              title = 'weight distribution',
              subplots = True,
             )
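
As a rough illustration of what a KDE curve represents, the sketch below builds a Gaussian kernel density estimate by hand with NumPy. The function name gaussian_kde_1d, the toy samples and the fixed bandwidth of 300 are all assumptions made for this example; hvPlot chooses its own bandwidth and may compute the estimate differently.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian bumps centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Toy weights (illustrative values only)
samples = np.array([2000.0, 2200.0, 2500.0, 3000.0, 3400.0, 4200.0])
grid = np.linspace(0.0, 7000.0, 1401)
density = gaussian_kde_1d(samples, grid, bandwidth=300.0)

# Riemann-sum approximation of the area under the curve
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

The area under the curve is approximately 1, which is why the vertical axis of a KDE plot is a density rather than a count.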

In [20]:

import numpy as np
np.linspace(0, 255,len(df['mfr'].unique()))

array([  0.        ,   8.79310345,  17.5862069 ,  26.37931034,
        35.17241379,  43.96551724,  52.75862069,  61.55172414,
        70.34482759,  79.13793103,  87.93103448,  96.72413793,
       105.51724138, 114.31034483, 123.10344828, 131.89655172,
       140.68965517, 149.48275862, 158.27586207, 167.06896552,
       175.86206897, 184.65517241, 193.44827586, 202.24137931,
       211.03448276, 219.82758621, 228.62068966, 237.4137931 ,
       246.20689655, 255.        ])

In [21]:
import numpy as np
import colorcet as cc

categorical_colormap = [ cc.rainbow[int(i)] for i in np.linspace(0, 255, len(df['mfr'].unique()))]

df.hvplot.kde(y = 'mpg',
              by = 'mfr',
              cmap = categorical_colormap,
              legend = 'right',
              height = 300
             )
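
The list comprehension above picks one color per manufacturer at evenly spaced positions in a 256-entry palette. The sketch below shows the same idea with a stand-in gray ramp so that it runs without colorcet; palette and sample_palette are hypothetical names used only for this illustration.

```python
import numpy as np

# Stand-in for a 256-entry palette such as cc.rainbow (hypothetical gray ramp)
palette = [f"#{v:02x}{v:02x}{v:02x}" for v in range(256)]

def sample_palette(palette, n):
    """Return n colors at evenly spaced positions, first and last included."""
    positions = np.linspace(0, len(palette) - 1, n)
    # int() truncates the fractional positions to valid list indices
    return [palette[int(p)] for p in positions]

colors = sample_palette(palette, 30)
print(colors[:3])
```

Because 30 is well below 256, the truncated positions never collide, so every category gets a distinct color.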

Boxplot
Box plots summarize a distribution through its five-number summary and are most useful when grouped by additional dimensions.

In [22]:
df.hvplot.box(y = 'mpg',
              height = 300,
              width = 300
             )

In [23]:
df.hvplot.box(y = 'mpg',
              height = 300,
              width = 300,
              by = 'yr',
             )

In [24]:

boxplot = df.hvplot.box(y = 'mpg', height = 300, width = 700, by = 'yr', legend = False)
boxplot * df.hvplot.scatter(y = 'mpg', x = 'yr', c = 'orange').opts(jitter = 0.5)

In [25]:
df.hvplot.box(y = 'accel', groupby = 'mfr', by = 'yr', ylabel = 'accel',
             width = 400)
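
For reference, the five-number summary behind a box plot can be computed directly. This is a minimal sketch on toy mpg values using NumPy's default linear interpolation for percentiles. Note that many box plot implementations draw whiskers at a multiple of the interquartile range (commonly 1.5 × IQR) rather than at the minimum and maximum, so the plotted whiskers may not match these extremes exactly.

```python
import numpy as np

# Toy mpg values (illustrative only)
mpg = np.array([15.0, 16.0, 17.0, 18.0, 18.0, 22.0, 24.0, 26.0, 31.0])

# Minimum, first quartile, median, third quartile, maximum
minimum, q1, median, q3, maximum = np.percentile(mpg, [0, 25, 50, 75, 100])

# Interquartile range: the height of the box
iqr = q3 - q1
print(minimum, q1, median, q3, maximum, iqr)
```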

Heatmap
heatmap can be used when the data has two categorical axes. The data can either be pre-computed into a matrix, or it can be one-dimensional (long-form), in which case the aggregation is computed when rendering.

In [26]:
from bokeh.sampledata.unemployment1948 import data
data = data.set_index('Year').drop('Annual', axis = 1).transpose()
data.head()

Year,1948,1949,1950,1951,1952,1953,1954,1955,1956,1957,...,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016
Jan,4.0,5.0,7.6,4.4,3.7,3.4,5.7,5.8,4.7,4.9,...,5.0,5.4,8.5,10.6,9.8,8.8,8.5,7.0,6.1,5.3
Feb,4.7,5.8,7.9,4.2,3.8,3.2,6.3,5.7,4.8,4.7,...,4.9,5.2,8.9,10.4,9.5,8.7,8.1,7.0,5.8,5.2
Mar,4.5,5.6,7.1,3.8,3.3,2.9,6.4,5.2,4.7,4.3,...,4.5,5.2,9.0,10.2,9.2,8.4,7.6,6.8,5.6,5.1
Apr,4.0,5.4,6.0,3.2,3.0,2.8,6.1,4.9,4.1,4.0,...,4.3,4.8,8.6,9.5,8.7,7.7,7.1,5.9,5.1,4.7
May,3.4,5.7,5.3,2.9,2.9,2.5,5.7,4.2,4.2,3.9,...,4.3,5.2,9.1,9.3,8.7,7.9,7.3,6.1,5.3,4.5


In [27]:
data.hvplot.heatmap(x = 'columns',
                   y = 'index',
                   title = 'US Unemployment 1948-2016',
                   xaxis = 'top',
                   rot = 70,
                   fontsize = 12,
                   width = 700,
                   height = 300).opts(toolbar=None)
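
The difference between long-form data and a pre-computed matrix can be sketched with pandas. The records below are toy values, and the column names year, month and rate are assumptions for this illustration; pivot_table performs the aggregation explicitly, which is one way to prepare a matrix before plotting.

```python
import pandas as pd

# Toy long-form records (illustrative values only); note the duplicate (Feb, 2001)
long_df = pd.DataFrame({
    "year":  [2000, 2000, 2001, 2001, 2001],
    "month": ["Jan", "Feb", "Jan", "Feb", "Feb"],
    "rate":  [4.0, 4.1, 4.2, 4.0, 4.4],
})

# Aggregate duplicates with the mean and lay the result out as a matrix:
# one row per month, one column per year
matrix = long_df.pivot_table(index="month", columns="year",
                             values="rate", aggfunc="mean")
print(matrix)
```

The resulting matrix has the same shape as the unemployment data used above: one categorical axis in the index and the other in the columns.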

Violin
Violin plots are similar to box plots but give a better sense of the shape of the distribution.

In [28]:
df.hvplot.violin(y = 'mpg', by = 'yr',
                ylabel = 'mpg',
                width = 500,
                height = 500, padding = 0.5)

In [29]:
violin = df.hvplot.violin(y = 'mpg', by = 'yr',
                 c = 'yr',
                ylabel = 'mpg',
                width = 500,
                height = 500, padding = 0.5)

scatter = df.hvplot.scatter(y = 'mpg', x = 'yr', c = 'orange').opts(jitter = 0.5)


violin * scatter

Table

table creates a HoloViews Table element with all of that element's options available. It is especially useful when paired with other visualizations.



In [31]:
df.hvplot.table(columns = ['origin', 'name', 'yr'], sortable = True, selectable=True)