# [Cufflinks in Python](https://plot.ly/ipython-notebooks/cufflinks/)

**An overview of cufflinks, a library for easy interactive Pandas charting with Plotly.**  
**Cufflinks binds Plotly directly to pandas dataframes.**

### [Plotly Scikit-Learn Library](https://plot.ly/scikit-learn/)

[Plotly login (for online mode)](https://plot.ly/organize/recent)  
(reference/help at bottom)

# Toggle warnings

In [6]:
# A way to toggle warnings on/off
from IPython.display import HTML
HTML('''<script>
code_show_err=false; 
function code_toggle_err() {
 if (code_show_err){
 $('div.output_stderr').hide();
 } else {
 $('div.output_stderr').show();
 }
 code_show_err = !code_show_err
} 
$( document ).ready(code_toggle_err);
</script>
To toggle on/off output_stderr, click <a href="javascript:code_toggle_err()">here</a>.''')

# Setup

In [1]:
import numpy as np
import pandas as pd

In [2]:
import cufflinks as cf
from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from plotly.graph_objs import *

In [3]:
print("plotly version:", __version__)
print("cufflinks version:", cf.__version__)

plotly version: 2.2.3
cufflinks version: 0.12.1


In [4]:
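# cf.go_offline()  # enable offline mode for this session only (ignore any warning it prints)
# cf.go_online()   # switch back to online mode, where graphs are saved to your Plotly account
cf.set_config_file(offline=True, theme='ggplot')  # save offline mode and the ggplot theme as defaults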

You can also plot your graphs offline inside a Jupyter notebook. First, initialize Plotly's notebook mode.

Run `init_notebook_mode` at the start of every notebook that uses `plotly.offline`. With `connected=True`, the notebook loads plotly.js from a CDN (this requires an internet connection); with `connected=False`, the plotly.js source is injected into the notebook itself.

In [5]:
init_notebook_mode(connected=True)

# Dataframes

In [6]:
df = cf.datagen.lines()
df.head()

|            | MZR.VX    | AEG.LZ   | FZS.ZB    | ZMU.WB    | RCL.AE    |
|------------|-----------|----------|-----------|-----------|-----------|
| 2015-01-01 | -0.363784 | 0.927992 | 0.207129  | -0.351683 | -1.017494 |
| 2015-01-02 | -1.98066  | 2.001981 | -0.391922 | -1.413196 | -1.182459 |
| 2015-01-03 | -3.252652 | 1.451111 | 0.482875  | -1.41649  | -3.072943 |
| 2015-01-04 | -3.314936 | 1.955955 | 0.429951  | -1.37741  | -2.994577 |
| 2015-01-05 | -2.720162 | 2.937598 | -0.56518  | -1.913179 | -2.765161 |


With cufflinks, you can plot a DataFrame directly by calling `iplot` on it:

In [7]:
df.iplot(kind='scatter') # can add filename='cf-simple-line'

#### Offline (without the `cf.go_offline()` command)
Notice:
- the `iplot` wrapper from `plotly.offline`
- `asFigure=True`, which makes cufflinks return the figure instead of drawing it

In [8]:
# offline without needing to turn on offline mode
iplot(df.iplot(asFigure=True, kind='scatter'))

Almost every chart that you make in cufflinks will be created with just one line of code.

In [9]:
df = pd.DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd'])
df.head(3)

|   | a         | b         | c         | d         |
|---|-----------|-----------|-----------|-----------|
| 0 | -1.261676 | -0.098961 | 0.248916  | 0.47572   |
| 1 | 0.263742  | -2.842817 | -0.179436 | -0.310311 |
| 2 | 2.521531  | -1.516106 | 1.702119  | -1.483904 |


In [10]:
df.scatter_matrix()

In online mode, charts created with cufflinks are synced with your Plotly account, so you'll need to configure your credentials first. cufflinks can also work offline in Jupyter notebooks with Plotly Offline, which ships with the `plotly` package imported above; run `cf.go_offline()` to enable it.

By default, online Plotly graphs are *public*. Make them private by setting `world_readable` to `False`.

In [11]:
# example (doesn't make a difference when offline)
# df.a.iplot(kind='histogram', world_readable=True)
# df.a.iplot(kind='histogram', world_readable=False)

Only *you* (the creator) will be able to see a private chart. You can also change the global default settings with `cf.set_config_file`:

In [12]:
# cf.set_config_file(offline=False, world_readable=True, theme='ggplot')

# Chart Types

# Line Charts

In [13]:
df = pd.DataFrame(np.random.randn(1000, 2), columns=['A', 'B']).cumsum()
print(df.shape)
df.head(3)

(1000, 2)


|   | A        | B        |
|---|----------|----------|
| 0 | 1.874859 | 0.266821 |
| 1 | 2.269723 | 0.98038  |
| 2 | 2.229664 | 1.980871 |


In [14]:
df.iplot()

Plot one column against another with the `x` and `y` keywords:

In [15]:
df.iplot(x='A', y='B')

# Bar Charts

Download some civic data: a time-series log of 311 complaints in NYC.

In [16]:
df = pd.read_csv('https://raw.githubusercontent.com/plotly/widgets/master/ipython-examples/311_150k.csv', 
                 parse_dates=True, index_col=1)
df.head(3)


Columns (8,39,46,47,48) have mixed types. Specify dtype option on import or set low_memory=False.



| Created Date | Unique Key | Closed Date | Agency | Agency Name | Complaint Type | Descriptor | Location Type | Incident Zip | Incident Address | Street Name | ... | Bridge Highway Name | Bridge Highway Direction | Road Ramp | Bridge Highway Segment | Garage Lot Name | Ferry Direction | Ferry Terminal Name | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2014-11-16 23:46:00 | 29300358 | 11/16/2014 11:46:00 PM | DSNY | BCC - Queens East | Derelict Vehicles | 14 Derelict Vehicles | Street | 11432 | 80-25 PARSONS BOULEVARD | PARSONS BOULEVARD | ... |  |  |  |  |  |  |  | 40.719411 | -73.808882 | (40.719410639341916, -73.80888158860446) |
| 2014-11-16 02:24:35 | 29299837 | 11/16/2014 02:24:35 AM | DOB | Department of Buildings | Building/Use | Illegal Conversion Of Residential Building/Space |  | 10465 | 938 HUNTINGTON AVENUE | HUNTINGTON AVENUE | ... |  |  |  |  |  |  |  | 40.827862 | -73.830641 | (40.827862046105416, -73.83064067165407) |
| 2014-11-16 02:17:12 | 29297857 | 11/16/2014 02:50:48 AM | NYPD | New York City Police Department | Illegal Parking | Blocked Sidewalk | Street/Sidewalk | 11201 | 229 DUFFIELD STREET | DUFFIELD STREET | ... |  |  |  |  |  |  |  | 40.691248 | -73.984375 | (40.69124772858873, -73.98437529459297) |


In [17]:
series = df['Complaint Type'].value_counts()[:20]
series.head()

HEAT/HOT WATER            32202
Street Light Condition     7558
Blocked Driveway           6997
UNSANITARY CONDITION       6174
PAINT/PLASTER              5388
Name: Complaint Type, dtype: int64
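`value_counts()` sorts by frequency in descending order, so slicing its result keeps the most common complaint types. A minimal sketch on a tiny made-up Series (not the real 311 data):

```python
import pandas as pd

# A tiny, made-up stand-in for the 311 "Complaint Type" column
complaints = pd.Series(
    ["HEAT/HOT WATER", "Blocked Driveway", "HEAT/HOT WATER",
     "PAINT/PLASTER", "HEAT/HOT WATER", "Blocked Driveway"],
    name="Complaint Type",
)

# value_counts() orders by count (largest first); [:2] keeps the top two,
# the same slicing the notebook uses with [:20]
top2 = complaints.value_counts()[:2]
print(top2)
```

`complaints.value_counts().head(2)` is an equivalent, more explicit spelling.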

Plot a Series directly:

In [18]:
series.iplot(kind='bar', yTitle='Number of Complaints', title='NYC 311 Complaints')

Plot a single DataFrame row as a bar chart:

In [19]:
df = pd.DataFrame(np.random.rand(10, 4), columns=['A', 'B', 'C', 'D'])
print(df.shape)
df.head(3)

(10, 4)


Unnamed: 0,A,B,C,D
0,0.312067,0.451834,0.347525,0.898332
1,0.305509,0.428479,0.096883,0.006325
2,0.822202,0.799736,0.097384,0.575003


In [20]:
row = df.iloc[5]  # iloc selects by integer position; loc selects by index label
row.iplot(kind='bar')
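With the default integer index used here, `df.iloc[5]` and `df.loc[5]` return the same row. They diverge once index labels stop matching positions, as this small made-up example shows:

```python
import pandas as pd

# Index labels deliberately differ from row positions
df = pd.DataFrame({"A": [10, 20, 30], "B": [1, 2, 3]}, index=[2, 0, 1])

by_position = df.iloc[0]  # first row by position -> the row labelled 2
by_label = df.loc[0]      # row whose index label is 0 -> the second row

print(by_position.tolist())
print(by_label.tolist())
```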

Call `iplot(kind='bar')` on a DataFrame to produce a grouped bar chart. Add `barmode='stack'` to stack the bars instead.

In [21]:
df.iplot(kind='bar')

In [22]:
df.iplot(kind='bar', barmode='stack')
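Under the hood, a grouped or stacked bar chart is a set of bar traces (one per column) plus a `barmode` layout setting. The sketch below writes roughly equivalent figures in Plotly's native dict syntax, which appears later in this notebook; the values are arbitrary, and this is an illustration rather than the exact figure cufflinks generates:

```python
# One bar trace per DataFrame column; x is the index (values are arbitrary)
traces = [
    {"type": "bar", "x": [0, 1, 2], "y": [3, 1, 5], "name": "A"},
    {"type": "bar", "x": [0, 1, 2], "y": [4, 3, 6], "name": "B"},
]

# Same traces, different layout: side by side vs. on top of each other
grouped = {"data": traces, "layout": {"barmode": "group"}}
stacked = {"data": traces, "layout": {"barmode": "stack"}}

# Horizontal bars: copy each trace with x/y swapped and orientation='h'
horizontal = {
    "data": [dict(t, x=t["y"], y=t["x"], orientation="h") for t in traces],
    "layout": {"barmode": "stack", "bargap": 0.1},
}
```

Any of these dicts can be passed to `iplot`, e.g. `iplot(stacked)`.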

**Remember:** plotly charts are interactive. Click legend entries to hide and show traces, click-and-drag to zoom, double-click to autoscale, and shift-drag to pan.

Make your bar charts horizontal with `kind='barh'`:

In [23]:
df.iplot(kind='barh', barmode='stack', bargap=.1)

# Themes

cufflinks ships with a few themes. List the available themes with `cf.getThemes()` and apply one with `cf.set_config_file`.

In [24]:
cf.getThemes()

['ggplot', 'pearl', 'solar', 'space', 'white', 'polar', 'henanigans']

In [25]:
cf.set_config_file(theme='pearl')

# Histograms

In [26]:
df = pd.DataFrame({'a': np.random.randn(1000) + 1,
                   'b': np.random.randn(1000),
                   'c': np.random.randn(1000) - 1})
df.head(3)

|   | a        | b        | c         |
|---|----------|----------|-----------|
| 0 | 3.538928 | 1.433054 | -0.522191 |
| 1 | 2.055165 | 0.850717 | -2.526639 |
| 2 | 0.961194 | 1.062676 | -0.178962 |


In [27]:
df.iplot(kind='histogram')

Customize your histogram with:

- `barmode` (`'overlay'` | `'group'` | `'stack'`)
- `bins` (int)
- `histnorm` (`''` | `'percent'` | `'probability'` | `'density'` | `'probability density'`)
- `histfunc` (`'count'` | `'sum'` | `'avg'` | `'min'` | `'max'`)
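The `histnorm` options differ only in how the bin counts are scaled. Below is a sketch of the four normalizations with NumPy, following the definitions in Plotly's histogram reference; the random sample and bin count are arbitrary, and NumPy's bin edges will not match the edges Plotly picks:

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed, for repeatability
values = rng.normal(size=1000)

counts, edges = np.histogram(values, bins=20)
widths = np.diff(edges)               # bin widths (uniform here)
n = counts.sum()                      # total number of samples

percent = 100 * counts / n            # histnorm='percent': heights sum to 100
probability = counts / n              # histnorm='probability': heights sum to 1
density = counts / widths             # histnorm='density': bar areas sum to n
prob_density = counts / (n * widths)  # histnorm='probability density': areas sum to 1
```

The last case is the same quantity NumPy returns from `np.histogram(..., density=True)`.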

In [28]:
df.iplot(kind='histogram', histnorm='probability density')

In [29]:
# For help:
#?df.iplot

In [30]:
df.iplot(kind='histogram', barmode='stack', bins=100, histnorm='probability')

As with every chart type, you can split traces into subplots (small multiples) with `subplots=True` and optionally arrange them with `shape=(rows, cols)`. More on subplots below.

In [31]:
df.iplot(kind='histogram', subplots=True, shape=(3, 1))

# Box Plots

In [32]:
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
df.head(3)

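|   | A        | B        | C        | D        | E        |
|---|----------|----------|----------|----------|----------|
| 0 | 0.991901 | 0.413994 | 0.879029 | 0.606633 | 0.874725 |
| 1 | 0.989703 | 0.124067 | 0.038678 | 0.171866 | 0.232292 |
| 2 | 0.364617 | 0.768769 | 0.532122 | 0.154757 | 0.799811 |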


In [33]:
df.iplot(kind='box')

# Area Charts

To produce a stacked area plot, each column must contain either all positive or all negative values.

When the input data contains `NaN`, it is automatically filled with 0. To drop those rows or fill them with different values, call `DataFrame.dropna()` or `DataFrame.fillna()` before plotting.

In [34]:
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.head(3)

|   | a        | b        | c        | d        |
|---|----------|----------|----------|----------|
| 0 | 0.636963 | 0.246123 | 0.423652 | 0.857906 |
| 1 | 0.832647 | 0.508568 | 0.488426 | 0.635165 |
| 2 | 0.653405 | 0.873978 | 0.848461 | 0.210039 |


In [35]:
df.iplot(kind='area', fill=True)

In [36]:
df.sum(axis=1)

0    2.164644
1    2.464806
2    2.585882
3    1.371905
4    2.085539
5    1.228253
6    1.712424
7    2.274709
8    2.891554
9    1.927716
dtype: float64
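A stacked area chart draws each column on top of the running total of the columns before it, so the top edge of the last band is the row sum computed above. A pandas sketch of that arithmetic on made-up data, with an explicit `fillna(0)` and a simple same-sign check per column:

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value
df = pd.DataFrame({"a": [0.2, np.nan, 0.5], "b": [0.1, 0.4, 0.3]})

# Stacking only makes sense if each column is entirely >= 0 or entirely <= 0
same_sign = all(
    (df[c].dropna() >= 0).all() or (df[c].dropna() <= 0).all() for c in df.columns
)

filled = df.fillna(0)         # make the NaN handling explicit
tops = filled.cumsum(axis=1)  # upper edge of each stacked band, column by column
print(tops)
```

The last column of `tops` equals `filled.sum(axis=1)`, which is why the row sums above give the height of each stacked area.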

For non-stacked area charts, use `kind='scatter'` with `fill=True`.

In [37]:
df.iplot(kind='scatter', fill=True)

# Scatter Plot

Set `x` and `y` to column names. If `x` isn't supplied, `df.index` is used.

In [38]:
df = pd.read_csv('http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt', sep='\t')
df.head(3)

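|   | country     | year | pop        | continent | lifeExp | gdpPercap  |
|---|-------------|------|------------|-----------|---------|------------|
| 0 | Afghanistan | 1952 | 8425333.0  | Asia      | 28.801  | 779.445314 |
| 1 | Afghanistan | 1957 | 9240934.0  | Asia      | 30.332  | 820.85303  |
| 2 | Afghanistan | 1962 | 10267083.0 | Asia      | 31.997  | 853.10071  |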


In [39]:
df2007 = df[df.year==2007]
df1952 = df[df.year==1952]

In [40]:
df2007.iplot(kind='scatter', mode='markers', x='gdpPercap', y='lifeExp')

Overlaying scatter plots of several subsets of a DataFrame isn't as easy with cufflinks. Here is an example using Plotly's native figure syntax:

In [41]:
fig = {
    'data': [
        {'x': df2007.gdpPercap, 'y': df2007.lifeExp, 'text': df2007.country, 'mode': 'markers', 'name': '2007'},
        {'x': df1952.gdpPercap, 'y': df1952.lifeExp, 'text': df1952.country, 'mode': 'markers', 'name': '1952'}
    ],
    'layout': {
        'xaxis': {'title': 'GDP per Capita', 'type': 'log'},
        'yaxis': {'title': "Life Expectancy"}
    }
}
iplot(fig)

Grouping isn't as easy either. But, with Plotly's native syntax:

In [42]:
iplot(
    {
        'data': [ # list comprehension
            {
                'x': df[df['year']==year]['gdpPercap'],
                'y': df[df['year']==year]['lifeExp'],
                'name': str(year), 'mode': 'markers',  # legend names as strings
            } for year in [1952, 1982, 2007]
        ],
        'layout': {
            'xaxis': {'title': 'GDP per Capita', 'type': 'log'},
            'yaxis': {'title': "Life Expectancy"}
        }
})#, filename='scatter-group-by')
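The list comprehension above filters `df` once per year. An alternative sketch (not from the original notebook) builds the same kind of traces in a single `groupby` pass; the rows below are made-up stand-ins for the gapminder data:

```python
import pandas as pd

# A few made-up rows shaped like the gapminder table
df = pd.DataFrame({
    "country": ["X", "Y", "X", "Y", "X", "Y"],
    "year": [1952, 1952, 1982, 1982, 2007, 2007],
    "gdpPercap": [700.0, 1500.0, 900.0, 3000.0, 1200.0, 6000.0],
    "lifeExp": [30.0, 45.0, 40.0, 60.0, 50.0, 72.0],
})

years = [1952, 1982, 2007]
# groupby yields (year, sub-DataFrame) pairs, sorted by year
data = [
    {"x": g["gdpPercap"], "y": g["lifeExp"], "text": g["country"],
     "name": str(year), "mode": "markers"}
    for year, g in df[df["year"].isin(years)].groupby("year")
]
fig = {
    "data": data,
    "layout": {
        "xaxis": {"title": "GDP per Capita", "type": "log"},
        "yaxis": {"title": "Life Expectancy"},
    },
}
```

Pass `fig` to `iplot(fig)` to draw it.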

# Bubble Charts

Bubble charts display three dimensions of data. Add `size` to create a bubble chart, and add hover text with the `text` attribute.

In [43]:
cf.set_config_file(theme='ggplot')

# 'pop': population (third dimension)
df2007.iplot(kind='bubble', x='gdpPercap', y='lifeExp', size='pop', 
             text='country', xTitle='GDP per Capita', 
             yTitle='Life Expectancy')
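cufflinks picks marker sizes for you. If you build a bubble chart with Plotly's native syntax instead, Plotly's documentation suggests `sizemode='area'` with `sizeref = 2 * max(size) / desired_max_size**2` to cap the largest bubble at a chosen pixel size. A sketch using the five `df2007.head()` rows shown below; the 40px target is an assumption:

```python
# Values copied from the df2007.head() output in this notebook
pops = [31889923.0, 3600523.0, 33333216.0, 12420476.0, 40301927.0]
desired_max_px = 40  # assumption: largest bubble about 40px

# Plotly's suggested scaling for sizemode='area'
sizeref = 2.0 * max(pops) / desired_max_px ** 2

trace = {
    "type": "scatter",
    "mode": "markers",
    "x": [974.580338, 5937.029526, 6223.367465, 4797.231267, 12779.37964],
    "y": [43.828, 76.423, 72.301, 42.731, 75.32],
    "text": ["Afghanistan", "Albania", "Algeria", "Angola", "Argentina"],
    "marker": {"size": pops, "sizemode": "area", "sizeref": sizeref},
}
```

With `sizemode='area'`, bubble area (not diameter) is proportional to population, which avoids visually exaggerating large values.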

In [44]:
df2007.head()

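|    | country     | year | pop        | continent | lifeExp | gdpPercap   |
|----|-------------|------|------------|-----------|---------|-------------|
| 11 | Afghanistan | 2007 | 31889923.0 | Asia      | 43.828  | 974.580338  |
| 23 | Albania     | 2007 | 3600523.0  | Europe    | 76.423  | 5937.029526 |
| 35 | Algeria     | 2007 | 33333216.0 | Africa    | 72.301  | 6223.367465 |
| 47 | Angola      | 2007 | 12420476.0 | Africa    | 42.731  | 4797.231267 |
| 59 | Argentina   | 2007 | 40301927.0 | Americas  | 75.32   | 12779.37964 |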


# Subplots

`subplots=True` partitions columns into separate subplots. Specify rows and columns with `shape=(rows, cols)` and share axes with `shared_xaxes=True` and `shared_yaxes=True`.

In [45]:
df=cf.datagen.lines(4)
df.head(3)

|            | QEW.FN   | COT.ZU    | ZEZ.TO    | KFB.UK    |
|------------|----------|-----------|-----------|-----------|
| 2015-01-01 | 0.240923 | -0.034704 | -0.260626 | -0.941465 |
| 2015-01-02 | 0.896229 | -0.796522 | 0.935501  | -0.187786 |
| 2015-01-03 | 0.770498 | -0.3647   | 1.435228  | -1.140021 |


In [46]:
df.iplot(subplots=True, shape=(4,1), shared_xaxes=True, fill=True)

Add subplot titles with `subplot_titles` as a list of titles or `True` to use column names.

In [47]:
df.iplot(subplots=True, subplot_titles=True, legend=False)

# Scatter matrix

In [48]:
df.scatter_matrix()

# Heatmaps

In [49]:
df = cf.datagen.heatmap(20,20)
df.head(3)

|     | y_0 | y_1 | y_2 | y_3 | y_4 | y_5 | y_6 | y_7 | y_8 | y_9 | y_10 | y_11 | y_12 | y_13 | y_14 | y_15 | y_16 | y_17 | y_18 | y_19 |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|------|------|------|------|------|------|------|------|------|------|
| x_0 | 35.0 | 36.052072 | 37.505727 | 36.44015 | 43.872862 | 44.795299 | 44.524214 | 45.637315 | 45.538439 | 49.649094 | 46.329585 | 33.623073 | 24.024687 | 18.995035 | 17.791758 | 22.55338 | 21.945877 | 23.922409 | 27.536723 | 29.059062 |
| x_1 | 34.19609 | 37.692737 | 38.390329 | 40.89896 | 40.385396 | 41.027217 | 39.400006 | 47.278175 | 48.553637 | 40.648538 | 37.497143 | 32.46814 | 34.202212 | 42.581496 | 48.367984 | 49.182305 | 52.95511 | 54.257433 | 58.810176 | 59.812719 |
| x_2 | 51.403761 | 50.969579 | 50.270829 | 53.60139 | 51.822814 | 51.814167 | 51.974731 | 39.700153 | 44.43224 | 41.80647 | 33.578794 | 33.681471 | 33.301385 | 33.589383 | 31.845768 | 38.209206 | 30.712631 | 29.177566 | 25.007079 | 33.495802 |


In [50]:
df.iplot(kind='heatmap',colorscale='spectral')

# Lines and Shaded Areas

Use `hline` and `vline` to draw horizontal and vertical lines.

In [51]:
df=cf.datagen.lines(3,columns=['a','b','c'])
df.head(3)

|            | a         | b         | c        |
|------------|-----------|-----------|----------|
| 2015-01-01 | -0.284937 | 0.165884  | 0.765664 |
| 2015-01-02 | 0.371698  | -1.684441 | 0.568019 |
| 2015-01-03 | -1.163599 | -1.03945  | 0.488387 |


In [52]:
df.iplot(hline=[2,4],vline=['2015-02-10'])

Draw horizontal shaded regions with `hspan` (and vertical ones with `vspan`):

In [53]:
df.iplot(hspan=[(-1,1),(2,5)])

Extra parameters (`width`, `fill`, `color`, `fillcolor`, `opacity`) can be passed as a dictionary:

In [54]:
df.iplot(vspan={'x0':'2015-02-15','x1':'2015-03-15',
                'color':'rgba(30,30,30,0.3)','fill':True,'opacity':.4})
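These lines and bands correspond to Plotly layout `shapes`. Below is a sketch of roughly equivalent native shapes for the `hline` and `vspan` examples above; the exact shapes cufflinks emits (colors, dash styles, widths) may differ:

```python
# Horizontal lines at y=2 and y=4, spanning the full plot width
# (xref='paper' means x0/x1 are fractions of the plotting area)
hlines = [
    {"type": "line", "xref": "paper", "x0": 0, "x1": 1,
     "yref": "y", "y0": y, "y1": y}
    for y in (2, 4)
]

# A shaded vertical band between two dates, spanning the full plot height
vspan = {
    "type": "rect",
    "xref": "x", "x0": "2015-02-15", "x1": "2015-03-15",
    "yref": "paper", "y0": 0, "y1": 1,
    "fillcolor": "rgba(30,30,30,0.3)", "opacity": 0.4,
    "line": {"width": 0},  # no border around the band
}

layout = {"shapes": hlines + [vspan]}
```

Add them to a figure through its layout, for example `iplot({'data': [...], 'layout': layout})`.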

# Customizing Figures

`cufflinks` is designed for simple one-line charting with Pandas and Plotly. Not all Plotly chart attributes can be set directly in the `df.iplot` call signature.

To update attributes of a `cufflinks` chart that `df.iplot` doesn't expose, first convert it to a figure (`asFigure=True`), then tweak it, then plot it with `iplot` (`plotly.offline.iplot` in this notebook, or `plotly.plotly.iplot` in online mode).

Here is an example of a simple plotly figure. You can find more examples in [our online python documentation](https://plot.ly/python).

In [61]:
iplot({
    'data': [
        Bar(**{
            'x': [1, 2, 3],
            'y': [3, 1, 5],
            'name': 'first trace',
            'type': 'bar'
        }),
        Bar(**{
            'x': [1, 2, 3],
            'y': [4, 3, 6],
            'name': 'second trace',
            'type': 'bar'
        })
    ],
    'layout': Layout(**{
        'title': 'simple example'
    })
})

Under the hood, `cufflinks` generates figure objects like this one to describe Plotly graphs. For example, this graph:

In [63]:
df.iplot(kind='scatter')

has this description:

In [65]:
figure = df.iplot(kind='scatter', asFigure=True)
print(figure.to_string())

Figure(
    data=Data([
        Scatter(
            x=['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '..'  ],
            y=array([-0.28493706,  0.37169813, -1.1635989 ,  0.56442837, -0...,
            line=Line(
                color='rgba(226, 74, 51, 1.0)',
                dash='solid',
                width=1.3
            ),
            mode='lines',
            name='a',
            text=''
        ),
        Scatter(
            x=['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '..'  ],
            y=array([ 0.16588384, -1.68444147, -1.03944961, -2.32627156, -2...,
            line=Line(
                color='rgba(62, 111, 176, 1.0)',
                dash='solid',
                width=1.3
            ),
            mode='lines',
            name='b',
            text=''
        ),
        Scatter(
            x=['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '..'  ],
            y=array([  7.65664343e-01,   5.68019313e-01,   4.88386603e-01,
 ..,
  

So, to edit any attribute of a Plotly graph made with cufflinks, first convert it to a figure and then edit the figure object. Let's add a y-axis title, a tick prefix, and new legend names to this example:

In [66]:
figure['layout']['yaxis1'].update({'title': 'Price', 'tickprefix': '$'})

for i, trace in enumerate(figure['data']):
    trace['name'] = 'Trace {}'.format(i)
    
iplot(figure)

[See more examples of Plotly graphs](https://plot.ly/python/) or [view the entire reference of valid attributes](https://plot.ly/python/reference/)

# Cufflinks Reference

Cufflinks is [open source on GitHub](https://github.com/santosjorge/cufflinks)!

In [67]:
help(df.iplot)

Help on method _iplot in module cufflinks.plotlytools:

_iplot(data=None, layout=None, filename='', sharing=None, kind='scatter', title='', xTitle='', yTitle='', zTitle='', theme=None, colors=None, colorscale=None, fill=False, width=None, dash='solid', mode='lines', symbol='dot', size=12, barmode='', sortbars=False, bargap=None, bargroupgap=None, bins=None, histnorm='', histfunc='count', orientation='v', boxpoints=False, annotations=None, keys=False, bestfit=False, bestfit_colors=None, mean=False, mean_colors=None, categories='', x='', y='', z='', text='', gridcolor=None, zerolinecolor=None, margin=None, labels=None, values=None, secondary_y='', secondary_y_title='', subplots=False, shape=None, error_x=None, error_y=None, error_type='data', locations=None, lon=None, lat=None, asFrame=False, asDates=False, asFigure=False, asImage=False, dimensions=None, asPlot=False, asUrl=False, online=None, **kwargs) method of pandas.core.frame.DataFrame instance
           Returns a plotly chart eith