___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___
# Plotly and Cufflinks

Plotly is a library that allows you to create interactive plots that you can use in dashboards or websites (you can save them as HTML files or static images).

## Installation

Cufflinks connects Plotly to pandas so that you can call plots directly off of a DataFrame, so you'll need both plotly and cufflinks installed. Install the libraries with **pip** at your command line/terminal using:

    pip install plotly
    pip install cufflinks

**NOTE: Make sure you only have one installation of Python on your computer when you do this; otherwise the packages may be installed into a different Python than the one your notebook uses.**
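
Since a second Python installation is a common reason an install appears to succeed but the import still fails, it can help to confirm which interpreter is running and whether it can see both packages. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of plotly or cufflinks.

```python
import importlib.util
import sys


def is_installed(package_name):
    """Return True if the running interpreter can locate the package."""
    return importlib.util.find_spec(package_name) is not None


# The interpreter path shows which Python installation is actually in use.
print("interpreter:", sys.executable)

for name in ("plotly", "cufflinks"):
    status = "found" if is_installed(name) else "missing"
    print(name, status)
```

If a package is reported missing, running `python -m pip install plotly cufflinks` with that same interpreter installs into the environment that is actually in use.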

## Imports and Set-up

In [1]:
%%bash 
python --version

Python 3.6.10 :: Anaconda, Inc.


In [3]:
from plotly import __version__
print("plotly", __version__)
from cufflinks import __version__
print("cufflinks", __version__)  

plotly 4.14.3
cufflinks 0.17.3


In [4]:
import pandas as pd
import numpy as np
%matplotlib inline

In [5]:
pd, np

(<module 'pandas' from '/home/jyoon/conda3/envs/fastai20/lib/python3.6/site-packages/pandas/__init__.py'>,
 <module 'numpy' from '/home/jyoon/conda3/envs/fastai20/lib/python3.6/site-packages/numpy/__init__.py'>)

In [6]:
from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

print(__version__) # requires version >= 1.9.0

4.14.3


In [7]:
plot, iplot

(<function plotly.offline.offline.plot(figure_or_data, show_link=False, link_text='Export to plot.ly', validate=True, output_type='file', include_plotlyjs=True, filename='temp-plot.html', auto_open=True, image=None, image_filename='plot_image', image_width=800, image_height=600, config=None, include_mathjax=False, auto_play=True, animation_opts=None)>,
 <function plotly.offline.offline.iplot(figure_or_data, show_link=False, link_text='Export to plot.ly', validate=True, image=None, filename='plot_image', image_width=800, image_height=600, config=None, auto_play=True, animation_opts=None)>)

In [8]:
import cufflinks as cf

In [9]:
# For Notebooks
init_notebook_mode(connected=True)

In [10]:
# For offline use
cf.go_offline()

### Fake Data

In [11]:
np.random.seed(1)
df = pd.DataFrame(np.random.randn(100,4),columns='A B C D'.split())
df.head()

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 1.624345 | -0.611756 | -0.528172 | -1.072969 |
| 1 | 0.865408 | -2.301539 | 1.744812 | -0.761207 |
| 2 | 0.319039 | -0.24937 | 1.462108 | -2.060141 |
| 3 | -0.322417 | -0.384054 | 1.133769 | -1.099891 |
| 4 | -0.172428 | -0.877858 | 0.042214 | 0.582815 |


In [12]:
df['A'].head()

0    1.624345
1    0.865408
2    0.319039
3   -0.322417
4   -0.172428
Name: A, dtype: float64

In [13]:
df.A.head()

0    1.624345
1    0.865408
2    0.319039
3   -0.322417
4   -0.172428
Name: A, dtype: float64

In [14]:
df[['A', 'B']].head()

|   | A | B |
|---|---|---|
| 0 | 1.624345 | -0.611756 |
| 1 | 0.865408 | -2.301539 |
| 2 | 0.319039 | -0.24937 |
| 3 | -0.322417 | -0.384054 |
| 4 | -0.172428 | -0.877858 |


In [15]:
df.B.head()  # Can't use dot call for more than 1 column

0   -0.611756
1   -2.301539
2   -0.249370
3   -0.384054
4   -0.877858
Name: B, dtype: float64

In [16]:
df['Sum'] = df.sum(axis=1)  # axis=1 adds across the columns, giving one total per row
df.head()

|   | A | B | C | D | Sum |
|---|---|---|---|---|---|
| 0 | 1.624345 | -0.611756 | -0.528172 | -1.072969 | -0.588551 |
| 1 | 0.865408 | -2.301539 | 1.744812 | -0.761207 | -0.452526 |
| 2 | 0.319039 | -0.24937 | 1.462108 | -2.060141 | -0.528364 |
| 3 | -0.322417 | -0.384054 | 1.133769 | -1.099891 | -0.672593 |
| 4 | -0.172428 | -0.877858 | 0.042214 | 0.582815 | -0.425258 |
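
The `axis` argument is easy to mix up, so here is a small sketch with hand-checkable numbers. The frame `small` is made up for illustration and is not part of the notebook's data.

```python
import pandas as pd

# Tiny frame so the arithmetic can be checked by hand.
small = pd.DataFrame({"A": [1, 2], "B": [10, 20], "C": [100, 200]})

row_totals = small.sum(axis=1)     # adds across columns: one value per row
column_totals = small.sum(axis=0)  # adds down the rows: one value per column

print(row_totals.tolist())         # [111, 222]
print(column_totals.to_dict())     # {'A': 3, 'B': 30, 'C': 300}
```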


In [17]:
df['A-B'] = df['A']-df['B']
df.head()

|   | A | B | C | D | Sum | A-B |
|---|---|---|---|---|---|---|
| 0 | 1.624345 | -0.611756 | -0.528172 | -1.072969 | -0.588551 | 2.236102 |
| 1 | 0.865408 | -2.301539 | 1.744812 | -0.761207 | -0.452526 | 3.166946 |
| 2 | 0.319039 | -0.24937 | 1.462108 | -2.060141 | -0.528364 | 0.568409 |
| 3 | -0.322417 | -0.384054 | 1.133769 | -1.099891 | -0.672593 | 0.061637 |
| 4 | -0.172428 | -0.877858 | 0.042214 | 0.582815 | -0.425258 | 0.70543 |
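
A column named `A-B` also shows a limit of dot access: `df.A-B` is parsed by Python as the subtraction `df.A - B`, so the column itself is only reachable with brackets. The sketch below illustrates this, along with a second case where a column label collides with a DataFrame method; the frame and its `count` column are made up for illustration.

```python
import pandas as pd

frame = pd.DataFrame({"A": [5.0, 7.0], "B": [2.0, 3.0]})
frame["A-B"] = frame["A"] - frame["B"]
frame["count"] = [1, 2]

# Bracket access works for any column label.
print(frame["A-B"].tolist())    # [3.0, 4.0]
print(frame["count"].tolist())  # [1, 2]

# Dot access finds the DataFrame.count method first, not the column.
print(callable(frame.count))    # True
```

Bracket access is the safer default whenever column names come from data rather than from code you wrote yourself.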


In [18]:
df[df['A'] > .4].head()  # filter is temporary, returns subset of rows.

|   | A | B | C | D | Sum | A-B |
|---|---|---|---|---|---|---|
| 0 | 1.624345 | -0.611756 | -0.528172 | -1.072969 | -0.588551 | 2.236102 |
| 1 | 0.865408 | -2.301539 | 1.744812 | -0.761207 | -0.452526 | 3.166946 |
| 6 | 0.900856 | -0.683728 | -0.12289 | -0.935769 | -0.841532 | 1.584584 |
| 14 | 0.838983 | 0.931102 | 0.285587 | 0.885141 | 2.940814 | -0.092119 |
| 16 | 0.488518 | -0.075572 | 1.131629 | 1.519817 | 3.064393 | 0.56409 |


In [19]:
df.head()  # df itself is unchanged; the filter returned a new object and was never assigned back

|   | A | B | C | D | Sum | A-B |
|---|---|---|---|---|---|---|
| 0 | 1.624345 | -0.611756 | -0.528172 | -1.072969 | -0.588551 | 2.236102 |
| 1 | 0.865408 | -2.301539 | 1.744812 | -0.761207 | -0.452526 | 3.166946 |
| 2 | 0.319039 | -0.24937 | 1.462108 | -2.060141 | -0.528364 | 0.568409 |
| 3 | -0.322417 | -0.384054 | 1.133769 | -1.099891 | -0.672593 | 0.061637 |
| 4 | -0.172428 | -0.877858 | 0.042214 | 0.582815 | -0.425258 | 0.70543 |
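
The filter works in two steps: the comparison builds a boolean Series, and indexing with that Series returns a new DataFrame holding only the `True` rows. A minimal sketch with made-up values (the names `scores`, `mask`, and `subset` are illustrative only):

```python
import pandas as pd

scores = pd.DataFrame({"A": [1.6, 0.3, -0.2, 0.9], "B": [-0.6, -0.2, -0.9, -0.7]})

mask = scores["A"] > 0.4       # boolean Series, one entry per row
subset = scores[mask]          # new DataFrame with only the True rows

print(mask.tolist())           # [True, False, False, True]
print(subset.index.tolist())   # [0, 3]
print(len(scores))             # 4, the original is untouched

# Rebind the name only if the filtered rows should replace the original.
scores = scores[mask]
print(len(scores))             # 2
```

Note that the surviving rows keep their original index labels, which is why the filtered output above jumps from 1 to 6.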


In [20]:
df = pd.DataFrame(np.random.randn(100,4),columns='A B C D'.split())

In [21]:
df.head()

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | -1.306534 | 0.07638 | 0.367232 | 1.232899 |
| 1 | -0.422857 | 0.086464 | -2.142467 | -0.830169 |
| 2 | 0.451616 | 1.104174 | -0.281736 | 2.056356 |
| 3 | 1.760249 | -0.060652 | -2.413503 | -1.777566 |
| 4 | -0.777859 | 1.115841 | 0.310272 | -2.094248 |


In [22]:
df2 = pd.DataFrame({'Category':['A','B','C'],'Values':[32,43,50]})

In [23]:
df2.head()

|   | Category | Values |
|---|---|---|
| 0 | A | 32 |
| 1 | B | 43 |
| 2 | C | 50 |


## Using Cufflinks and iplot()

* scatter
* bar
* box
* spread
* ratio
* heatmap
* surface
* histogram
* bubble

## Scatter

In [35]:
df.iplot(kind='scatter', x='A', y='B', mode='markers', size=10, color='green')

In [36]:
df.iplot(kind='scatter',x='A',y='B',mode='markers',size=10)

## Bar Plots

In [44]:
df2.iplot(kind='bar', x='Category', y='Values', color='blue')

In [28]:
df2.iplot(kind='bar',x='Category',y='Values')

In [33]:
df.count().iplot(kind='bar')

## Boxplots

In [45]:
df.iplot(kind='box')

## 3d Surface

In [46]:
df3 = pd.DataFrame({'x':[1,2,3,4,5],'y':[10,20,30,20,10],'z':[5,4,3,2,1]})
df3.iplot(kind='surface',colorscale='rdylbu')

## Spread

In [50]:
df[['A','B']].iplot(kind='spread')
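
To see the numbers behind a spread plot without rendering anything, the difference can be computed directly in pandas. This sketch assumes the spread is the first column minus the second, which is how the plot appears to be built; the frame `pair` and the column name `spread` are illustrative, not cufflinks names.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
pair = pd.DataFrame(np.random.randn(100, 2), columns=["A", "B"])

# Assumed definition of the spread: first column minus second column.
pair["spread"] = pair["A"] - pair["B"]

print(pair.head())
print(pair["spread"].describe())
```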

## Histogram

In [50]:
df.B.iplot(kind='hist', bins=25, color='green')

In [56]:
df['A'].iplot(kind='hist',bins=25)
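
For a rough idea of what `bins=25` means numerically, NumPy can do the same kind of binning without drawing a plot. This is only a sketch: Plotly may choose its bin edges differently, so the counts here need not match the interactive figure exactly.

```python
import numpy as np

np.random.seed(1)
values = np.random.randn(100)

# 25 equal-width bins spanning the minimum and maximum of the data.
counts, edges = np.histogram(values, bins=25)

print(counts.sum())   # 100, every value lands in exactly one bin
print(len(edges))     # 26, one more edge than there are bins
```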

## Bubble

In [69]:
df.iplot(kind='bubble', x='A', y='C', size='B')

In [72]:
import plotly.express as px
px

<module 'plotly.express' from '/home/jyoon/conda3/envs/fastai20/lib/python3.6/site-packages/plotly/express/__init__.py'>

In [89]:
np.random.seed(1)
df4 = pd.DataFrame(np.random.randn(20,4),columns='A B C D'.split())
df4['Name'] = np.arange(20)
df4.head()

|   | A | B | C | D | Name |
|---|---|---|---|---|---|
| 0 | 1.624345 | -0.611756 | -0.528172 | -1.072969 | 0 |
| 1 | 0.865408 | -2.301539 | 1.744812 | -0.761207 | 1 |
| 2 | 0.319039 | -0.24937 | 1.462108 | -2.060141 | 2 |
| 3 | -0.322417 | -0.384054 | 1.133769 | -1.099891 | 3 |
| 4 | -0.172428 | -0.877858 | 0.042214 | 0.582815 | 4 |


In [91]:
fig = px.scatter(df4, x='A', y='B', size=abs(df4.C)*10, color=abs(df4.D)*20, 
                opacity=0.5, hover_name='Name', size_max=60)
fig.show()

In [73]:
gapminder = px.data.gapminder()  # separate name so the random-data df is not overwritten

fig = px.scatter(gapminder.query("year==2007"), x="gdpPercap", y="lifeExp",
                 size="pop", color="continent",
                 hover_name="country", log_x=True, size_max=60)
fig.show()

In [117]:
df4.iplot(kind='bubble', x='A', y='B', size='C', color='red')


## scatter_matrix()

Similar to seaborn's `sns.pairplot()`: a grid of pairwise scatter plots across the DataFrame's columns.

In [71]:
df.scatter_matrix()

# Great Job!