In [1]:
import pandas as pd
import numpy as np
import string

# Load Data

In [2]:
# load example mtcars data 
df = pd.read_csv('mtcars.csv')
print(df.shape)
df.head()

(32, 11)


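|   | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| 1 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| 2 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| 3 | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| 4 | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |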


# Create Simple Barplot

In [3]:
from rapid_plotly import barplot

A simple barplot can be created by passing up to three dataframes to `barplot.create_graph` (only `in_data` is required):

* `in_data` - the height of the bars
* `names` - a dataframe containing the hover text for the bars, with the same index and columns as `in_data`
* `errors` - a dataframe containing the half-height (the "+/-" value) of the error bars, with the same index and columns as `in_data`
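Because `names` and `errors` must line up cell for cell with `in_data`, a quick check before plotting can catch mismatched labels early. The sketch below uses a hypothetical helper, `check_mirrors`, which is not part of `rapid_plotly`; it assumes that "mirroring" `in_data` means having an identical index and identical columns.

```python
import pandas as pd


def check_mirrors(in_data, **others):
    """Raise ValueError if any frame in `others` does not share in_data's index and columns.

    Hypothetical helper for illustration; not part of rapid_plotly.
    """
    for name, frame in others.items():
        if frame is None:
            continue  # optional argument that was not supplied
        if not frame.index.equals(in_data.index):
            raise ValueError(f"{name}: index does not match in_data")
        if not frame.columns.equals(in_data.columns):
            raise ValueError(f"{name}: columns do not match in_data")


# toy bar heights, illustrative only
in_data = pd.DataFrame({'mpg': [26.7, 19.7, 15.1]},
                       index=['4 Cylinders', '6 Cylinders', '8 Cylinders'])
# scalar values are broadcast over the given index and columns
names = pd.DataFrame('label', index=in_data.index, columns=in_data.columns)
errors = pd.DataFrame(2.5, index=in_data.index, columns=in_data.columns)

check_mirrors(in_data, names=names, errors=errors)  # passes silently
```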

In [4]:
# create graph data 
in_data = pd.DataFrame(df.groupby('cyl').mean()['mpg'])
in_data.index = in_data.index.astype(int).astype(str) + ' Cylinders'
print('main data:')
display(in_data.head())

# generate names
l = string.ascii_lowercase
names = in_data.astype(object)  # object-dtype copy so string labels can replace the float values
f = lambda: l[np.random.randint(0,len(l))]

for x in names.index:
    names.loc[x, 'mpg'] = f()+f()
    
print('names:')
display(names.head())

# generate error bars data
errors = in_data.copy()
errors['mpg'] = 2.5
print('errors:')
display(errors.head())

main data:


|   | mpg |
|---|---|
| 4 Cylinders | 26.663636 |
| 6 Cylinders | 19.742857 |
| 8 Cylinders | 15.1 |


names:


|   | mpg |
|---|---|
| 4 Cylinders | dd |
| 6 Cylinders | ol |
| 8 Cylinders | ho |


errors:


|   | mpg |
|---|---|
| 4 Cylinders | 2.5 |
| 6 Cylinders | 2.5 |
| 8 Cylinders | 2.5 |
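The example uses a constant half-height of 2.5 for every bar. When the bars are group means, one common data-driven alternative is the standard error of the mean for each group. The sketch below assumes that choice and uses a small made-up frame in place of `mtcars.csv`; it produces `in_data` and `errors` frames shaped like the ones above.

```python
import pandas as pd

# toy stand-in for the mtcars columns used above (values are made up)
df = pd.DataFrame({
    'cyl': [4, 4, 4, 6, 6, 6],
    'mpg': [22.0, 24.0, 26.0, 19.0, 20.0, 21.0],
})

# bar heights: mean mpg per cylinder count
in_data = df.groupby('cyl')[['mpg']].mean()
# error-bar half-heights: standard error of the mean per group (ddof=1)
errors = df.groupby('cyl')[['mpg']].sem()

# give both frames the same labelled index so they mirror each other
labels = in_data.index.astype(str) + ' Cylinders'
in_data.index = labels
errors.index = labels
```

Whether a standard error, a standard deviation, or a confidence interval is the right half-height depends on what the graph is meant to convey.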


A simple graph can be quickly created to verify that the data is as expected:

In [5]:
# create input data for graph 
args = dict(
    in_data=in_data
)

# view plot inline 
fig = barplot.create_graph(**args)

Now that the graph looks as expected, add more characteristics to `args`:

In [6]:
# add additional characteristics to graph
title = '<b>Fuel Mileage by Number of Cylinders</b>'
title += '<br><i>for mtcars data</i>'
args['title'] = title
args['names'] = names
args['errors'] = errors
args['xlab'] = 'Number of Cylinders'
args['ylab'] = 'Miles Per Gallon'
args['annotations'] = [{'text':'Fewer cylinders correlate to better<br> fuel mileage', 'x':1.5, 'y':24.5, 'showarrow':False}]

Preview the results again:

In [7]:
# view plot inline 
fig = barplot.create_graph(**args)

After creating a graph, write it to an HTML file by passing `fig` and a file path to `barplot.output_graph`; a `.png` path writes an image instead:

In [8]:
# write graph to html file 
fp = 'barplot-example.html'
barplot.output_graph(fig, fp)

In [9]:
# write graph to png file 
fp = 'barplot-example.png'
barplot.output_graph(fig, fp)

Each `create_graph` function in `rapid_plotly` includes a detailed docstring:

In [18]:
from IPython.display import Markdown
display(Markdown(barplot.create_graph.__doc__))

Creates grouped barplot

    The `in_data` arg must be a dataframe in the form:

                           bar1           bar2
    x_category1            3.13          15.84
    x_category2            6.67           6.08

    Where `in_data.index` (x_category1, x_category2 above) are the x
    labels on the graph, and each column (bar1, bar2 in the above
    example) is a bargroup. Each cell represents the height of the bar.

    In the above example, the 3.13 and 15.84 bars would be grouped 
    together and the 6.67 and 6.08 bars would be grouped together. Bar1
    would be on the left of each bar group.

    Note that `in_data` can be passed with a single column to create
    a normal barplot (as opposed to a grouped barplot).

    Error bars can be easily created by passing a DataFrame similar
    to `in_data` where each cell represents the "+/-" value for the
    error bar, e.g. if the value is "1.5", the error bar will range 
    "1.5" units (in terms of the y-axis) above and "1.5" units below
    the bar.

    The `aux_traces` arg can be used to create an overlaying trace, such
    as a line graph overlaying the bars. To plot `aux_traces` on a 
    secondary axis, the `yaxis` parameter of the trace must be set to
    'y2' and the `alt_y` arg must be passed to this function as `True`.

    Parameters
    ----------
    in_data : DataFrame of traces. Data in columns will be used as
    traces and index will be used as x-axis.

    names : DataFrame of hovertext values. Should mirror `in_data` in 
    form.

    colors : dict of colors for traces. dict keys should mirror
    `in_data` columns. Can use hex colors or keyword colors, see Plotly
    specifications on colors for keyword options.

    errors : a DataFrame of error values for each bar. Should mirror 
    `in_data` in form. Each cell in `errors` will be the "+/-" value 
    for the error bars. 

    error_barwidth : the width, in pixels, of the error bar. 

    title : title for top of graph. Use '<br>' tag for subtitle. Tags
    '<i>' and '<b>' can be used for italics and bold, respectively.

    xlab : label for x-axis. 

    ylab : label for y-axis.

    y2lab : label for alt y axis.

    hoverinfo : either None or 'text'. Passed to the trace in
    `create_trace`. By default, Plotly displays the value upon hover,
    passing 'text' here will show only the value configured in the
    `names` DataFrame.

    annotations : a list of dicts for annotations. For example:

        ```
        [{'text':'More cylinders correlates to better<br> fuel mileage',
        'x':1.5, 'y':24.5, 'showarrow':False}]
        ```

    The 'x' and 'y' keys are coordinates in terms of the graph axes, and
    the 'text' key is the annotation text.

    filepath : optional, if included will write image to file. Can be
    written as a .html file or a .png file.

    aux_traces : list of traces to be added to the graph data. Allows
    for customization of additional traces beyond what default
    functionality provides. 

    layout : allows for a customized layout. Default layout is in the
    helpers module, can be accessed:

        ```
        from rapid_plotly import helpers
        layout = helpers.layout
        ```

    Here is the default layout: 

        ```
        {'hovermode': 'closest', 'plot_bgcolor': 'rgb(229, 229, 229)',
         'title': 'title', 'xaxis': {'gridcolor': 'rgb(255,255,255)',
          'tickangle': 30, 'title': 'xlab',
          'zerolinecolor': 'rgb(255,255,255)'},
          'yaxis': {'gridcolor': 'rgb(255,255,255)', 'title': 'ylab',
          'zerolinecolor': 'rgb(255,255,255)'}}
        ```

    alt_y : bool, used to place aux_traces on alternate axis. 

    

# Create Grouped Barplot

A grouped barplot compares the effect of the same treatment across multiple categories.

The next graph will show the relationship between fuel mileage, the number of cylinders and the number of gears for cars.

For grouped barplots, pass dataframes in which the rows represent the x-axis categories and the columns represent the individual bars within each category.

In [19]:
# create data for grouped barplot
in_data = df.groupby(['cyl', 'gear']).mean()[['mpg']].reset_index()

in_data = pd.pivot_table(
    data=in_data,
    columns=['gear'],
    index=['cyl']
)
in_data.columns = ['3 gears', '4 gears', '5 gears']

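# the only empty cell is 8 cylinders / 4 gears (no such car in mtcars);
# fill it with the mean of the 8-cylinder row for display purposes
in_data = in_data.fillna(in_data.loc[8].mean())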
in_data.index = in_data.index.astype(str) + ' Cylinders'
print('main data:')
display(in_data)

# create names
names = in_data.astype(object)  # object-dtype copy so string labels can replace the float values

for row in names.index:
    for col in names.columns:
        names.loc[row, col] = f()+f()
        
print('names:')
display(names)

# create error bars 
errors = in_data.copy()

for col in errors.columns:
    errors[col] = 0.75
    
print('errors:')
display(errors)

main data:


|   | 3 gears | 4 gears | 5 gears |
|---|---|---|---|
| 4 Cylinders | 21.5 | 26.925 | 28.2 |
| 6 Cylinders | 19.75 | 19.75 | 19.7 |
| 8 Cylinders | 15.05 | 15.225 | 15.4 |


names:


|   | 3 gears | 4 gears | 5 gears |
|---|---|---|---|
| 4 Cylinders | px | se | ci |
| 6 Cylinders | lj | rd | rx |
| 8 Cylinders | pt | nj | xx |


errors:


|   | 3 gears | 4 gears | 5 gears |
|---|---|---|---|
| 4 Cylinders | 0.75 | 0.75 | 0.75 |
| 6 Cylinders | 0.75 | 0.75 | 0.75 |
| 8 Cylinders | 0.75 | 0.75 | 0.75 |
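The cell above reshapes long `(cyl, gear)` means into the wide form that `create_graph` expects. A minimal sketch of the same reshaping with `DataFrame.pivot`, on made-up values, shows how cylinder/gear combinations that never occur in the data come out as `NaN`; in mtcars the only such cell is 8 cylinders with 4 gears, which the notebook fills with the 8-cylinder row mean.

```python
import pandas as pd

# toy long-format means (made-up values); one row per (cyl, gear) pair that exists
long_df = pd.DataFrame({
    'cyl':  [4, 4, 6, 8, 8],
    'gear': [3, 4, 4, 3, 5],
    'mpg':  [21.0, 27.0, 20.0, 15.0, 15.5],
})

# rows become x-axis categories, columns become the bars within each category
wide = long_df.pivot(index='cyl', columns='gear', values='mpg')
wide.columns = [f'{g} gears' for g in wide.columns]
wide.index = wide.index.astype(str) + ' Cylinders'

# combinations absent from the data appear as NaN and need explicit handling
missing = wide.isna()
```

Any fill value draws a bar for a configuration that does not exist in the data, so it may be worth flagging such bars on the graph.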


Get a quick visual of the data:

In [20]:
# create args
args = {'in_data':in_data}
fig = barplot.create_graph(**args)

Now add more detail by adding elements to `args`:

In [21]:
# add additional characteristics to graph
title = '<b>Fuel Mileage by Number of Cylinders and Number of Gears</b>'
title += '<br><i>for mtcars data</i>'
args['title'] = title
args['names'] = names
args['errors'] = errors
args['xlab'] = 'Number of Cylinders'
args['ylab'] = 'Miles Per Gallon'
args['annotations'] = [{'text':'More gears correlate to better fuel<br> mileage for cars with 4 cylinder engines',
                        'x':0.45, 'y':28, 'ax':150, 'ay':25, 'showarrow':True}]

fig = barplot.create_graph(**args)

This looks okay with the default colors, but the main point of the graph would be more immediately visible if the "4 Cylinders" bargroup were a different shade from the other bargroups.

New colors were generated using [coolors.co](https://coolors.co), and tints of the new colors were created on [color-hex.com](https://www.color-hex.com).

A new dataframe `colors` can be created in a similar fashion to `in_data`, `names` and `errors`:

In [22]:
# create new colors
colors = pd.DataFrame({
           '3 gears':['#9195b2']*3,
           '4 gears':['#969694']*3,
           '5 gears':['#c1c991']*3
       }, index=in_data.index)

colors.loc['4 Cylinders'] = ['#232C65', '#2D2D2A', '#849324']

args['colors'] = colors
print('colors:')
colors

colors:


|   | 3 gears | 4 gears | 5 gears |
|---|---|---|---|
| 4 Cylinders | #232C65 | #2D2D2A | #849324 |
| 6 Cylinders | #9195b2 | #969694 | #c1c991 |
| 8 Cylinders | #9195b2 | #969694 | #c1c991 |
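Setting the highlighted row from a positional list, as above, relies on the list order matching the column order. A hypothetical helper, `highlight_row`, that looks colors up by column name avoids that dependency. It is a sketch for building the same `colors` frame, not a `rapid_plotly` function.

```python
import pandas as pd


def highlight_row(index, base_colors, highlight_colors, row):
    """Build a colors DataFrame where every row uses base_colors except `row`.

    base_colors / highlight_colors map column name -> color string.
    Hypothetical helper for illustration only.
    """
    colors = pd.DataFrame({col: [c] * len(index) for col, c in base_colors.items()},
                          index=index)
    # look highlight colors up by column name rather than by position
    colors.loc[row] = [highlight_colors[col] for col in colors.columns]
    return colors


index = ['4 Cylinders', '6 Cylinders', '8 Cylinders']
base = {'3 gears': '#9195b2', '4 gears': '#969694', '5 gears': '#c1c991'}
dark = {'3 gears': '#232C65', '4 gears': '#2D2D2A', '5 gears': '#849324'}
colors = highlight_row(index, base, dark, '4 Cylinders')
```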


In [23]:
fig = barplot.create_graph(**args)

In [24]:
# write graph to html file 
fp = 'grouped-barplot-example.html'
barplot.output_graph(fig, fp)

In [25]:
# write graph to png file 
fp = 'grouped-barplot-example.png'
barplot.output_graph(fig, fp)

# Create Scatterplot

In [28]:
from rapid_plotly import scatterplot

First, set up some data which can be used to create an example scatterplot:

In [29]:
# create main data 
sl = df[['hp', 'mpg']].copy()
x_data = sl[['hp']].copy()
y_data = sl[['mpg']].copy()

print('x values:')
display(x_data.head())

print('y values:')
display(y_data.head())

# create names
n = (df[['cyl', 'carb', 'gear', 'wt']].apply(
      lambda x: '# Cylinders: %s<br># Carbs: %s<br># Gears: %s<br>Weight: %s' % (x['cyl'], x['carb'], 
                                                                                   x['gear'], x['wt']),
      axis=1
    )
  ).copy()

n = n.rename('mpg')

names = sl.copy()
names['hp'] = n
del names['mpg']

print('names: ')
display(names.head())

# create colors 
# one color per point, keyed by the x column name like `names`
colors = pd.DataFrame('#C14953', index=sl.index, columns=['hp'])

print('colors: ')
display(colors.head())

x values:


|   | hp |
|---|---|
| 0 | 110 |
| 1 | 110 |
| 2 | 93 |
| 3 | 110 |
| 4 | 175 |


y values:


|   | mpg |
|---|---|
| 0 | 21.0 |
| 1 | 21.0 |
| 2 | 22.8 |
| 3 | 21.4 |
| 4 | 18.7 |


names: 


|   | hp |
|---|---|
| 0 | # Cylinders: 6.0<br># Carbs: 4.0<br># Gears: 4... |
| 1 | # Cylinders: 6.0<br># Carbs: 4.0<br># Gears: 4... |
| 2 | # Cylinders: 4.0<br># Carbs: 1.0<br># Gears: 4... |
| 3 | # Cylinders: 6.0<br># Carbs: 1.0<br># Gears: 3... |
| 4 | # Cylinders: 8.0<br># Carbs: 2.0<br># Gears: 3... |


colors: 


|   | hp |
|---|---|
| 0 | #C14953 |
| 1 | #C14953 |
| 2 | #C14953 |
| 3 | #C14953 |
| 4 | #C14953 |


The `scatterplot` module takes separate dataframes for the x values and the y values:

In [30]:
args = {'x_data':x_data, 'y_data':y_data}
fig = scatterplot.create_graph(**args)
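In the cells above, `x_data` and `y_data` are single-column frames that share a row index, and `names` and `colors` use the x column name (`hp`) as their column. Judging from these examples, that appears to be how points are matched to labels, although the docstring shown earlier documents only `barplot`. The sketch below builds the same kind of frames from made-up rows, using vectorized string concatenation instead of a row-wise `apply`; this also keeps integer columns from appearing as `6.0` in the hover text.

```python
import pandas as pd

# toy rows in the shape of the mtcars columns used above (made-up values)
df = pd.DataFrame({
    'hp':   [110, 93, 175],
    'mpg':  [21.0, 22.8, 18.7],
    'cyl':  [6, 4, 8],
    'carb': [4, 1, 2],
})

x_data = df[['hp']]
y_data = df[['mpg']]

# vectorized hover text; integer columns stay integers (no '6.0')
text = ('# Cylinders: ' + df['cyl'].astype(str)
        + '<br># Carbs: ' + df['carb'].astype(str))

# keyed by the x column name, as in the notebook's names/colors frames
names = text.to_frame(name='hp')
colors = pd.DataFrame('#C14953', index=df.index, columns=['hp'])
```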

Adding names, labels and colors:

In [31]:
# build graph args 
args['names'] = names
args['colors'] = colors
args['title'] = '<b>Fuel Mileage as a Function of Horsepower</b><br><i>for mtcars data</i>'
args['xlab'] = 'Horsepower'
args['ylab'] = 'Fuel Mileage (mpg)'

# display plot
fig = scatterplot.create_graph(**args)

The `scatterplot` module also accepts lists of x and y dataframes, which plots multiple series of data on the same plot.
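Passing lists presumably pairs the i-th x frame with the i-th y frame, with the x column name (`hp`, `hp_alt`) identifying each series; that matches how `names` and `colors` are keyed below. The sketch illustrates this pairing with a hypothetical `pair_series` function. It is an assumption about the behavior, not `rapid_plotly` code.

```python
import pandas as pd


def pair_series(x_list, y_list):
    """Zip parallel lists of single-column frames into (label, xs, ys) tuples.

    Hypothetical illustration of how list inputs could be paired; not rapid_plotly code.
    """
    if len(x_list) != len(y_list):
        raise ValueError('x_data and y_data lists must have the same length')
    series = []
    for x, y in zip(x_list, y_list):
        label = x.columns[0]  # e.g. 'hp' or 'hp_alt'
        series.append((label, x.iloc[:, 0].tolist(), y.iloc[:, 0].tolist()))
    return series


# toy before/after frames (made-up values)
x1 = pd.DataFrame({'hp': [110, 93]})
y1 = pd.DataFrame({'mpg': [21.0, 22.8]})
x2 = pd.DataFrame({'hp_alt': [105.0, 88.0]})
y2 = pd.DataFrame({'mpg_alt': [26.0, 27.5]})

pairs = pair_series([x1, x2], [y1, y2])
```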

Generate example data which compares `hp` and `mpg` before and after a made-up fuel mileage enhancement:

In [32]:
# create main data 
sl = df[['hp', 'mpg']].copy()
x_data = sl[['hp']].copy()
x_data_treat = x_data.copy()
x_data_treat['hp'] = x_data['hp'] - (np.random.normal(loc=5, scale=2, size=len(x_data)))
x_data_treat.columns = ['hp_alt']
y_data = sl[['mpg']].copy()
y_data_treat = y_data.copy()
y_data_treat['mpg'] = y_data['mpg'] + (np.random.normal(loc=5, scale=2, size=len(x_data)))
y_data_treat.columns = ['mpg_alt']

print('x1 values:')
display(x_data.head())

print('x2 values:')
display(x_data_treat.head())

print('y1 values:')
display(y_data.head())

print('y2 values:')
display(y_data_treat.head())

# create names
n = (df.reset_index()[['index', 'cyl', 'carb', 'gear', 'wt']].apply(
      lambda x: 'Car ID %s<br># Cylinders: %s<br># Carbs: %s<br># Gears: %s<br>Weight: %s' % (x['index'], x['cyl'], x['carb'], x['gear'], x['wt']),
      axis=1
    )
  ).copy()

names = sl.copy()
names['hp'] = 'Before Treatment<br>' + n
del names['mpg']
names['hp_alt'] = 'After Treatment<br>' + n

print('names: ')
display(names.head())

# create colors 
# one color per series, keyed by the x column names like `names`
colors = pd.DataFrame({'hp': '#232C65', 'hp_alt': '#2D2D2A'}, index=sl.index)

print('colors: ')
display(colors.head())

x1 values:


|   | hp |
|---|---|
| 0 | 110 |
| 1 | 110 |
| 2 | 93 |
| 3 | 110 |
| 4 | 175 |


x2 values:


|   | hp_alt |
|---|---|
| 0 | 103.566658 |
| 1 | 107.31813 |
| 2 | 88.986659 |
| 3 | 103.244862 |
| 4 | 170.648826 |


y1 values:


|   | mpg |
|---|---|
| 0 | 21.0 |
| 1 | 21.0 |
| 2 | 22.8 |
| 3 | 21.4 |
| 4 | 18.7 |


y2 values:


|   | mpg_alt |
|---|---|
| 0 | 27.062968 |
| 1 | 25.843158 |
| 2 | 24.575257 |
| 3 | 26.506649 |
| 4 | 23.28247 |


names: 


|   | hp | hp_alt |
|---|---|---|
| 0 | Before Treatment<br>Car ID 0.0<br># Cylinders:... | After Treatment<br>Car ID 0.0<br># Cylinders: ... |
| 1 | Before Treatment<br>Car ID 1.0<br># Cylinders:... | After Treatment<br>Car ID 1.0<br># Cylinders: ... |
| 2 | Before Treatment<br>Car ID 2.0<br># Cylinders:... | After Treatment<br>Car ID 2.0<br># Cylinders: ... |
| 3 | Before Treatment<br>Car ID 3.0<br># Cylinders:... | After Treatment<br>Car ID 3.0<br># Cylinders: ... |
| 4 | Before Treatment<br>Car ID 4.0<br># Cylinders:... | After Treatment<br>Car ID 4.0<br># Cylinders: ... |


colors: 


|   | hp | hp_alt |
|---|---|---|
| 0 | #232C65 | #2D2D2A |
| 1 | #232C65 | #2D2D2A |
| 2 | #232C65 | #2D2D2A |
| 3 | #232C65 | #2D2D2A |
| 4 | #232C65 | #2D2D2A |


Now a list of x data and a list of y data can be used to plot both cases on the same graph:

In [33]:
# build graph args 
args['x_data'] = [x_data, x_data_treat]
args['y_data'] = [y_data, y_data_treat]
args['names'] = names
args['colors'] = colors
args['title'] = '<b>Fuel Mileage as a Function of Horsepower</b><br><i>for mtcars data</i>'
args['xlab'] = 'Horsepower'
args['ylab'] = 'Fuel Mileage (mpg)'

# set up callout text 
sl = y_data.join(y_data_treat)
sl['diff'] = sl.mpg_alt - sl.mpg
cid = (sl[(sl.mpg < sl.mpg_alt)]
         .sort_values(by=['diff'], ascending=False).index[0])

x1_loc = x_data.iloc[cid].values[0]
x2_loc = x_data_treat.iloc[cid].values[0]

y1_loc = y_data.iloc[cid].values[0]
y2_loc = y_data_treat.iloc[cid].values[0]

c1 = {'text':'Car %s before upgrade' % cid, 'x':x1_loc, 'y':y1_loc, 
      'showarrow':True, 'ax':150, 'ay':-25}

c2 = {'text':'Car %s after upgrade' % cid, 'x':x2_loc, 'y':y2_loc, 
      'showarrow':True, 'ax':150, 'ay':0}

text = 'Fuel mileage upgrade works for most cars'
args['annotations'] = [{'text':text, 'x':200, 'y':37, 'showarrow':False}, 
                       c1, c2]

# display plot
fig = scatterplot.create_graph(**args)
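The callout logic above filters to cars that improved, sorts by the difference, and takes the first label. When at least one car improved, the largest positive difference is also the overall maximum, so `Series.idxmax` finds the same car in one step. Using `.loc` also makes the label-based lookup explicit; the `.iloc[cid]` calls above work only because the index is a default `RangeIndex`. A sketch on made-up values:

```python
import pandas as pd

# toy before/after values (made up); every frame shares the same row index
x_data = pd.DataFrame({'hp': [110, 93, 175]})
x_data_treat = pd.DataFrame({'hp_alt': [104.0, 90.0, 171.0]})
y_data = pd.DataFrame({'mpg': [21.0, 22.8, 18.7]})
y_data_treat = pd.DataFrame({'mpg_alt': [26.0, 24.0, 25.7]})

diff = y_data_treat['mpg_alt'] - y_data['mpg']
if not (diff > 0).any():
    raise ValueError('no car improved; nothing to call out')
cid = diff.idxmax()  # row label of the largest improvement

# label-based lookups for the two annotation anchors
c1 = {'text': 'Car %s before upgrade' % cid,
      'x': x_data.loc[cid, 'hp'], 'y': y_data.loc[cid, 'mpg'],
      'showarrow': True, 'ax': 150, 'ay': -25}
c2 = {'text': 'Car %s after upgrade' % cid,
      'x': x_data_treat.loc[cid, 'hp_alt'], 'y': y_data_treat.loc[cid, 'mpg_alt'],
      'showarrow': True, 'ax': 150, 'ay': 0}
```

The explicit guard replaces the `IndexError` that the original `.index[0]` would raise if no car improved.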

In [34]:
# write graph to html file 
fp = 'scatterplot-example.html'
barplot.output_graph(fig, fp)

In [35]:
# write graph to png file 
fp = 'scatterplot-example.png'
barplot.output_graph(fig, fp)

# Create Lineplot

In [66]:
from rapid_plotly import lineplot

First, set up some data which can be used to create an example lineplot:

In [67]:
# create some data
sdate = pd.to_datetime('2019-01-01') 
edate = sdate + pd.Timedelta(days=100)
df = pd.DataFrame(pd.date_range(sdate, edate), columns=['date'])
df['smalldata'] = np.random.normal(100, 25, size=len(df))
df['largedata'] = np.random.normal(1000, 250, size=len(df))

# create descriptive date string for hover text 
f = lambda row: '%s, %s %s (Q%s)' % (
                    # weekday
                    row['date'].strftime('%a'),
                    # month
                    row['date'].strftime('%b'),
                    # day
                    row['date'].strftime('%d'),
                    # quarter
                    row['date'].quarter,
                )

df['date_description'] = df.apply(f, axis=1)
df = (df.set_index('date')).copy()

df.head()

| date | smalldata | largedata | date_description |
|---|---|---|---|
| 2019-01-01 | 69.658893 | 1117.333763 | Tue, Jan 01 (Q1) |
| 2019-01-02 | 97.494513 | 822.975238 | Wed, Jan 02 (Q1) |
| 2019-01-03 | 91.987668 | 1256.892833 | Thu, Jan 03 (Q1) |
| 2019-01-04 | 85.545338 | 1119.815226 | Fri, Jan 04 (Q1) |
| 2019-01-05 | 98.213401 | 854.418283 | Sat, Jan 05 (Q1) |


In [68]:
# create hovertext labels
names = df.copy()
f = lambda row: 'Small Data<br>%s<br>%s' % (round(row['smalldata'], 3),
                              row['date_description'])

names['smalldata'] = names.apply(f, axis=1)

f = lambda row: 'Large Data<br>%s<br>%s' % (round(row['largedata'], 3),
                              row['date_description'])

names['largedata'] = names.apply(f, axis=1)
names.head()

| date | smalldata | largedata | date_description |
|---|---|---|---|
| 2019-01-01 | Small Data<br>69.659<br>Tue, Jan 01 (Q1) | Large Data<br>1117.334<br>Tue, Jan 01 (Q1) | Tue, Jan 01 (Q1) |
| 2019-01-02 | Small Data<br>97.495<br>Wed, Jan 02 (Q1) | Large Data<br>822.975<br>Wed, Jan 02 (Q1) | Wed, Jan 02 (Q1) |
| 2019-01-03 | Small Data<br>91.988<br>Thu, Jan 03 (Q1) | Large Data<br>1256.893<br>Thu, Jan 03 (Q1) | Thu, Jan 03 (Q1) |
| 2019-01-04 | Small Data<br>85.545<br>Fri, Jan 04 (Q1) | Large Data<br>1119.815<br>Fri, Jan 04 (Q1) | Fri, Jan 04 (Q1) |
| 2019-01-05 | Small Data<br>98.213<br>Sat, Jan 05 (Q1) | Large Data<br>854.418<br>Sat, Jan 05 (Q1) | Sat, Jan 05 (Q1) |
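The row-wise `apply` calls above can also be written with the `.dt` accessor and vectorized string concatenation. The sketch below uses three dates and placeholder values. Note that `%a` and `%b` follow the current locale, so the expected strings assume an English or C locale.

```python
import pandas as pd

dates = pd.Series(pd.date_range('2019-01-01', periods=3))

# vectorized equivalent of the row-wise lambda, e.g. 'Tue, Jan 01 (Q1)'
description = (dates.dt.strftime('%a, %b %d')
               + ' (Q' + dates.dt.quarter.astype(str) + ')')

# placeholder values standing in for the random 'smalldata' column
values = pd.Series([100.0, 97.5, 10.25])
labels = 'Small Data<br>' + values.round(3).astype(str) + '<br>' + description
```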


Now build the `args` dict and create the graph:

In [69]:
# create graph 
args = dict(
    in_data=df[['smalldata']],
    names=names,
    title='<b>Random Data</b>',
    xlab='',
    ylab='Random Values',
    
    # By default plotly shows the value of the data 
    # on the hover popup, but since we built descriptive
    # labels on the hovertext, we can disable the default 
    # hovertext with the `hoverinfo` arg. 
    hoverinfo='text',
)

fig = lineplot.create_graph(**args)

## Multiple Axis Lineplot

Often we want to compare data that share the same x axis but whose y values differ substantially in range.

We can view multiple lines on the graph by simply passing multiple columns to `in_data`:

In [70]:
# create graph 
args['in_data'] = df[['smalldata', 'largedata']]

fig = lineplot.create_graph(**args)

...but the much larger range of `largedata` compresses the `smalldata` line and hides its variation.

We can use `alt_trace_cols` to specify traces to go on a secondary y axis, on the right:

In [71]:
args['alt_trace_cols'] = ['largedata']
args['ylab'] = 'Smaller Random Values'
args['y2lab'] = 'Larger Random Values'

fig = lineplot.create_graph(**args)

Now `smalldata` and `largedata` are more easily comparable.
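In Plotly, a secondary y axis is a second axis object (`yaxis2`) that overlays the first and sits on the right, and each trace chooses its axis through its `yaxis` key; the docstring's requirement of `yaxis='y2'` for `aux_traces` points the same way. The plain-dict figure below sketches that structure with made-up values. It is an assumption about roughly what `alt_trace_cols` sets up, not its actual output.

```python
# plain-dict Plotly figure; 'yaxis2', 'overlaying' and 'side' are standard Plotly layout keys
x = ['2019-01-01', '2019-01-02', '2019-01-03']
small = [95.0, 110.0, 102.0]     # made-up values on a ~100 scale
large = [900.0, 1150.0, 1020.0]  # made-up values on a ~1000 scale

fig = {
    'data': [
        {'type': 'scatter', 'mode': 'lines', 'name': 'smalldata', 'x': x, 'y': small},
        # this trace is drawn against the right-hand axis
        {'type': 'scatter', 'mode': 'lines', 'name': 'largedata', 'x': x, 'y': large,
         'yaxis': 'y2'},
    ],
    'layout': {
        'yaxis': {'title': 'Smaller Random Values'},
        'yaxis2': {'title': 'Larger Random Values', 'overlaying': 'y', 'side': 'right'},
    },
}
```

`plotly.graph_objects.Figure(fig)` accepts a dict of this shape if you want to render it directly.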

# Create Barplot with Line Overlay

Sometimes it is desirable to have multiple graph types on the same graph, for example a barplot with a line graph.

Let's first build a bar graph from the first example:

In [74]:
from rapid_plotly import barplot

In [75]:
# load example mtcars data 
df = pd.read_csv('mtcars.csv')
print(df.shape)
df.head()

(32, 11)


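|   | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.62 | 16.46 | 0 | 1 | 4 | 4 |
| 1 | 21.0 | 6 | 160.0 | 110 | 3.9 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| 2 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.32 | 18.61 | 1 | 1 | 4 | 1 |
| 3 | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| 4 | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.44 | 17.02 | 0 | 0 | 3 | 2 |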


In [76]:
# create graph data 
in_data = pd.DataFrame(df.groupby('cyl').mean()['mpg'])
in_data.index = in_data.index.astype(int).astype(str) + ' Cylinders'

# generate names
l = string.ascii_lowercase
names = in_data.astype(object)  # object-dtype copy so string labels can replace the float values
f = lambda: l[np.random.randint(0,len(l))]

for x in names.index:
    names.loc[x, 'mpg'] = f()+f()
    
# generate error bars data
errors = in_data.copy()
errors['mpg'] = 2.5

In [77]:
# build graph args 
args = dict(
    in_data=in_data,
    names=names,
    errors=errors,
    title='<b>Fuel Mileage by Number of Cylinders</b>',
    ylab='Miles per Gallon',
    xlab='',
)

fig = barplot.create_graph(**args)

Now create some placeholder data for the line overlay:

In [78]:
# reuse the index of in_data and attach hard-coded placeholder values
in_data_alt = in_data.rename(columns={'mpg':'altdata'}).copy()
in_data_alt['altdata'] = [100, 125, 75]

in_data_alt

|   | altdata |
|---|---|
| 4 Cylinders | 100 |
| 6 Cylinders | 125 |
| 8 Cylinders | 75 |


... and build a line trace from it using `helpers.simple_line_trace`:

In [79]:
from rapid_plotly import helpers

In [80]:
aux_traces = [helpers.simple_line_trace(in_data_alt, yaxis='y2')]
args['alt_y'] = True
args['aux_traces'] = aux_traces

In [81]:
fig = barplot.create_graph(**args)
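The internals of `helpers.simple_line_trace` are not shown here. A plausible minimal equivalent, written as a hypothetical `line_trace_from_frame` function, turns a one-column frame into a Plotly scatter-trace dict whose x values are the bar categories, so each line point sits over its bar group. Setting `yaxis='y2'` sends the trace to the secondary axis that `alt_y=True` enables. The real helper's behavior may differ.

```python
import pandas as pd


def line_trace_from_frame(frame, yaxis='y'):
    """Build a Plotly line-trace dict from the first column of `frame`.

    Hypothetical stand-in for illustration; helpers.simple_line_trace may differ.
    """
    col = frame.columns[0]
    return {
        'type': 'scatter',
        'mode': 'lines+markers',
        'name': col,
        'x': frame.index.tolist(),  # reuse the bar categories as x positions
        'y': frame[col].tolist(),
        'yaxis': yaxis,             # 'y2' draws against the secondary axis
    }


in_data_alt = pd.DataFrame({'altdata': [100, 125, 75]},
                           index=['4 Cylinders', '6 Cylinders', '8 Cylinders'])
trace = line_trace_from_frame(in_data_alt, yaxis='y2')
```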