<h1>Introduction to Plotly and Cufflinks</h1>
<ul>
<li>Plotly is an interactive visualization library.</li>
<li>Cufflinks connects Plotly to pandas.</li>
<li>Both must be installed before use.</li>
<li><a href="https://plot.ly/">Plotly site</a></li>
<li><a href="https://github.com/santosjorge/cufflinks">GitHub repo for Cufflinks</a></li>
</ul>

<h2>Setting up Plotly and Cufflinks environment</h2>

In [4]:
!conda install plotly

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/DSX-Python35

  added / updated specs: 
    - plotly


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2019.5.15  |                0         133 KB
    retrying-1.3.3             |           py35_2          15 KB
    plotly-3.10.0              |             py_0        22.9 MB
    ------------------------------------------------------------
                                           Total:        23.0 MB

The following NEW packages will be INSTALLED:

    plotly:          3.10.0-py_0 
    retrying:        1.3.3-py35_2

The following packages will be UPDATED:

    ca-certificates: 2019.1.23-0  --> 2019.5.15-0


Downloading and Extracting Packages
ca-certificates-2019 | 133 KB    | ##################################### | 100% 
retrying-1.3.3       | 15 KB     | ###########################

In [7]:
!pip install cufflinks

Collecting cufflinks
  Downloading https://files.pythonhosted.org/packages/5e/5a/db3d6523ee870ecc229008b209b6b21231397302de34f9c446929a41f027/cufflinks-0.16.tar.gz (81kB)
    100% |████████████████████████████████| 81kB 17.0MB/s eta 0:00:01
Collecting colorlover>=0.2.1 (from cufflinks)
  Downloading https://files.pythonhosted.org/packages/9a/53/f696e4480b1d1de3b1523991dea71cf417c8b19fe70c704da164f3f90972/colorlover-0.3.0-py3-none-any.whl
Building wheels for collected packages: cufflinks
  Running setup.py bdist_wheel for cufflinks ... done
  Stored in directory: /home/dsxuser/.cache/pip/wheels/8d/5a/6f/c97d47dc901071611809eb61aaa477d50a60692dc764dca622
Successfully built cufflinks
tensorflow 1.3.0 requires tensorflow-tensorboard<0.2.0,>=0.1.0, which is not installed.
Installing collected packages: colorlover, cufflinks
Successfully installed colorlover-0.3.0 cufflinks-0.16
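
<p>After installing, it can be useful to confirm that the packages are visible to the notebook kernel. The following is a minimal sketch using only the standard library; the helper name <code>is_installed</code> is an illustrative choice, not part of Plotly or Cufflinks.</p>

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report which of the packages used in this notebook are available.
packages = ("plotly", "cufflinks", "pandas", "numpy")
status = {name: is_installed(name) for name in packages}
for name, found in status.items():
    print(name, "installed" if found else "missing")
```

<p>If a package is reported as missing, rerun the matching install cell above and restart the kernel.</p>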


In [22]:
import pandas as pd
import numpy as np
%matplotlib inline
from plotly import __version__

In [23]:
print(__version__)  # Version has to be higher than 1.9.x

3.10.0
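
<p>The version requirement can also be checked in code rather than by eye. This is a rough sketch: <code>version_tuple</code> is a hypothetical helper, and it assumes simple dotted version strings such as the one printed above.</p>

```python
from itertools import takewhile


def version_tuple(version):
    """Convert a dotted version string such as '3.10.0' into (3, 10, 0)."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits of each component.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


installed = "3.10.0"  # the value printed by the cell above
meets_minimum = version_tuple(installed) > (1, 9, 0)
print(version_tuple(installed), meets_minimum)

# Comparing the raw strings is unreliable for two-digit components:
print("3.10.0" > "3.9.0")  # False, although 3.10.0 is the newer release
```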


In [10]:
import cufflinks as cf
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

In [11]:
# Connect JavaScript to your notebook
init_notebook_mode(connected=True)

In [12]:
cf.go_offline()  # Allows cufflinks to be used offline

<h2>Creating DataFrames for random data</h2>

In [15]:
# Creating a random df

df = pd.DataFrame(np.random.randn(100,4),columns='A B C D'.split())

In [16]:
df.head()

          A         B         C         D
0  1.146868 -1.049190  0.966801 -0.365243
1 -0.676946  1.059804 -1.356431  0.836022
2 -0.451287  0.080131 -0.934378  1.405752
3 -0.445272  0.723041 -0.050367  1.516972
4 -1.402900 -0.903165 -0.334610  2.009809
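
<p>Because the data is random, the numbers shown here change on every run. If repeatable output is wanted, a seeded generator is one option. This is a sketch that assumes NumPy 1.17 or later (for <code>default_rng</code>); the seed value and the names <code>df_seeded</code> and <code>df_again</code> are arbitrary.</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)  # the seed value 42 is arbitrary
df_seeded = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD"))

# Re-creating the generator with the same seed reproduces the same values.
rng_again = np.random.default_rng(seed=42)
df_again = pd.DataFrame(rng_again.standard_normal((100, 4)), columns=list("ABCD"))

print(df_seeded.shape)
print(df_seeded.equals(df_again))
```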


In [17]:
# Creating another random df

df2 = pd.DataFrame({'Category':['A','B','C'],'Values':[32,43,50]})

In [18]:
df2

  Category  Values
0        A      32
1        B      43
2        C      50


<h2>How to use Cufflinks and Plotly</h2>

In [21]:
df2

  Category  Values
0        A      32
1        B      43
2        C      50


<h3>Creating an interactive Line Plot</h3>

In [26]:
# Creating an interactive plot; drag the mouse across the graph to interact with it
# Usage: some dataframe.iplot()

df.iplot()

<h3>Creating an interactive scatter plot </h3>

In [37]:
# Usage:
# some df.iplot(kind='scatter', x='some column1', y='some column2', mode='markers', symbol='some symbol')
# symbol is optional, e.g. 'circle', 'diamond' or 'square'

df.iplot(kind='scatter',x='A',y='B',mode='markers', symbol='diamond')

<h3>Creating an interactive bar plot </h3>

In [41]:
df2

  Category  Values
0        A      32
1        B      43
2        C      50


In [40]:
# Usage:
# some df.iplot(kind='bar',x='some categorical column', y='some value column')

df2.iplot(kind='bar',x='Category', y='Values')

<h3>Aggregating data to fit into an interactive bar plot </h3>

<h4>Data in a DataFrame sometimes has to be aggregated before it fits a bar plot. To do this, use methods such as count, sum, groupby, or value_counts.</h4>

In [44]:
df.sum().iplot(kind='bar')
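
<p>The aggregation methods named above can be tried on a small table before plotting. The sketch below uses pandas only; the <code>sales</code> data and its column names are made up for illustration.</p>

```python
import pandas as pd

# Hypothetical records; the column names are illustrative only.
sales = pd.DataFrame({
    "Category": ["A", "B", "A", "C", "B", "A"],
    "Values": [10, 20, 30, 40, 50, 60],
})

totals = sales.groupby("Category")["Values"].sum()  # one total per category
counts = sales["Category"].value_counts()           # number of rows per category
non_null = sales.count()                            # non-null entries per column

print(totals)
print(counts)
print(non_null)
```

<p>Each result is a Series indexed by category or column name, so with Cufflinks loaded it could be plotted in the same way as <code>df.sum()</code> above, for example <code>totals.iplot(kind='bar')</code>.</p>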

<h3>Creating an interactive box plot </h3>

<h4>Box plots work well on DataFrames with many data points: they summarize the distribution of each column</h4>

In [46]:
df.head() 

          A         B         C         D
0  1.146868 -1.049190  0.966801 -0.365243
1 -0.676946  1.059804 -1.356431  0.836022
2 -0.451287  0.080131 -0.934378  1.405752
3 -0.445272  0.723041 -0.050367  1.516972
4 -1.402900 -0.903165 -0.334610  2.009809


In [47]:
# Usage: some dataframe.iplot(kind='box')

df.iplot(kind='box')
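
<p>To see roughly what a box plot summarizes, the quartiles of each column can be computed directly with pandas. This is a sketch on freshly generated data (the name <code>data</code> and the seed are arbitrary); Plotly's own quartile and whisker rules may differ in detail from pandas' defaults.</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed for repeatable numbers
data = pd.DataFrame(rng.standard_normal((100, 4)), columns=list("ABCD"))

# Quartiles for each column: the box spans q1 to q3,
# with a line drawn at the median.
summary = data.quantile([0.25, 0.5, 0.75])
summary.index = ["q1", "median", "q3"]
iqr = summary.loc["q3"] - summary.loc["q1"]  # interquartile range per column

print(summary)
print(iqr)
```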

<h3>Creating an interactive 3-D surface plot</h3>

In [50]:
# Creating a random DF to be used in the 3-D surface plot example below

df3 = pd.DataFrame({'x':[1,2,3,4,5], 'y':[10,20,30,20,10],'z':[500,400,300,200,100]})

In [51]:
df3

   x   y    z
0  1  10  500
1  2  20  400
2  3  30  300
3  4  20  200
4  5  10  100


In [52]:
# Usage:
# some df.iplot(kind='surface')

df3.iplot(kind='surface')
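
<p>One possible reading of the surface example, stated as an assumption rather than confirmed behaviour: Cufflinks appears to treat the whole DataFrame as a grid of heights, with the index and the column labels as the two horizontal axes, instead of reading columns named x, y and z as coordinates. Under that assumption, a surface for a function of two variables could be prepared as below; <code>xs</code>, <code>ys</code>, <code>heights</code> and <code>grid</code> are illustrative names.</p>

```python
import numpy as np
import pandas as pd

xs = np.linspace(-2, 2, 5)  # grid positions along one axis
ys = np.linspace(-1, 1, 3)  # grid positions along the other axis

# Heights on the grid: heights[i, j] = xs[i]**2 + ys[j]**2
heights = xs[:, None] ** 2 + ys[None, :] ** 2
grid = pd.DataFrame(heights, index=xs, columns=ys)

print(grid.shape)
print(grid)
# With Cufflinks loaded, grid.iplot(kind='surface') would be the call to try
# (assumption: the DataFrame values are used as the height grid).
```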

In [53]:
# Changing the shape of the surface by changing the 'z' values from the example above

df4 = pd.DataFrame({'x':[1,2,3,4,5], 'y':[10,20,30,20,10],'z':[5,4,3,2,1]})
df4.iplot(kind='surface')

In [55]:
# Changing the colorscale of the graph
# Usage: colorscale='some colorscale name'
# This example uses 'rdylbu' (red, yellow, blue)

df4.iplot(kind='surface',colorscale='rdylbu')

<h3>Creating an interactive histogram</h3>

In [56]:
# Usage:
# some df['some column'].iplot(kind='hist')

df['A'].iplot(kind='hist')

In [70]:
# Adding bins to provide more granularity

df['A'].iplot(kind='hist', bins=50)
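
<p>The effect of the bin count can be illustrated without plotting by using <code>numpy.histogram</code>. This is only a sketch of the idea: the exact bin edges chosen by Plotly may differ from NumPy's, and the sample <code>values</code> are generated here for illustration.</p>

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
values = rng.standard_normal(100)

# The same 100 values, counted into 10 bins and into 50 bins.
coarse_counts, coarse_edges = np.histogram(values, bins=10)
fine_counts, fine_edges = np.histogram(values, bins=50)

print(len(coarse_counts), len(fine_counts))
print(coarse_counts.sum(), fine_counts.sum())
```

<p>More bins give narrower intervals, so each bar covers fewer values; the total count stays the same.</p>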

In [59]:
# Passing the entire df without specifying a column
# overlays the distributions of all columns

# Usage: some df.iplot(kind='hist')

df.iplot(kind='hist')

<h3>Creating an interactive spread plot</h3>
<h4>Similar to stock market analysis graphs</h4>

In [63]:
# Grabbing two columns from the df to use in example below

df[['A','B']].head()

          A         B
0  1.146868 -1.049190
1 -0.676946  1.059804
2 -0.451287  0.080131
3 -0.445272  0.723041
4 -1.402900 -0.903165


In [64]:
# Usage:
# some df[['some column1','some column2']].iplot(kind='spread')

df[['A','B']].iplot(kind='spread')
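
<p>Assuming the spread shown in this kind of plot is the row-by-row difference between the two columns, the underlying numbers can be computed directly. The <code>prices</code> data below is invented for illustration.</p>

```python
import pandas as pd

# Hypothetical values for two series, e.g. two prices over four days.
prices = pd.DataFrame({
    "A": [10.0, 11.0, 12.5, 12.0],
    "B": [9.0, 11.5, 11.0, 12.0],
})

# Row-by-row difference between the two columns.
spread = prices["A"] - prices["B"]
print(spread.tolist())
```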

<h3>Creating an interactive bubble plot</h3>

In [72]:
# Usage:
# some df.iplot(kind='bubble', x='some column1', y='some column2', (optional)size='some columnX')

df.iplot(kind='bubble', x='A',y='B',size='C')

<h3>Creating an interactive scatter matrix plot</h3>
<h4>Similar to seaborn's sns.pairplot()</h4>
<h4>This is <em>very</em> kernel intensive, so use with care!</h4>
<h4>Make sure all columns are numeric</h4>

In [69]:
df.scatter_matrix()
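
<p>Two precautions follow from the notes above: keep only numeric columns, and plot a sample of the rows when the DataFrame is large. The sketch below uses pandas only; <code>mixed</code>, <code>numeric_only</code> and <code>subset</code> are illustrative names, and the sample size of 200 is an arbitrary choice.</p>

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)  # arbitrary seed
mixed = pd.DataFrame({
    "A": rng.standard_normal(1000),
    "B": rng.standard_normal(1000),
    "label": ["x", "y"] * 500,  # non-numeric column
})

numeric_only = mixed.select_dtypes(include="number")  # drops 'label'
subset = numeric_only.sample(n=200, random_state=0)   # fewer points to draw

print(list(numeric_only.columns))
print(subset.shape)
```

<p>With Cufflinks loaded, <code>subset.scatter_matrix()</code> would then draw far fewer points than the full DataFrame.</p>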