# Plotly Tutorial


Plotly is an online data visualization and analysis tool for producing interactive charts. This short, schematic tutorial covers the first steps with Plotly for Python as a chart generator. It is based on a couple of tutorials from the official Plotly site.


__Disclaimer__: This tutorial was written as a personal reminder. It is not rigorous and may contain errors.



__Important__: The free version of Plotly only allows 25 charts to be stored. Once the limit is reached, it asks you to delete a chart or overwrite an existing one. Charts can be deleted from your Plotly profile.

__Installation__

1. Create an account on [Plotly](https://plot.ly/#/). You need the API key, which is available in your profile. If it appears hidden, click refresh.

2. $ pip install plotly

3. $ python

   > import plotly

   > plotly.tools.set_credentials_file(username='your_username', api_key='your_api_key')

4. Check that ~/.plotly/.credentials contains the right credentials.
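The last installation step can also be done from Python. The sketch below assumes the credentials file is JSON with `username` and `api_key` entries; `missing_credentials` is a hypothetical helper written for this tutorial, not part of Plotly.

```python
import json
from pathlib import Path

# Assumed location and format: a JSON file with "username" and "api_key" keys.
CREDENTIALS_PATH = Path.home() / ".plotly" / ".credentials"


def missing_credentials(path=CREDENTIALS_PATH, required=("username", "api_key")):
    """Return the required keys that are absent or empty in the credentials file."""
    path = Path(path)
    if not path.exists():
        # No file at all: every required key is missing.
        return list(required)
    with path.open(encoding="utf-8") as handle:
        stored = json.load(handle)
    return [key for key in required if not stored.get(key)]
```

Calling `missing_credentials()` with no arguments checks the default path. An empty list means both values are present.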

In [1]:
# Import libraries

import pandas as pd
import numpy as np
import plotly.plotly as py      # Functions for communicating with the Plotly servers.
import plotly.graph_objs as go  # Functions that build the graph objects.

from IPython.display import IFrame 

## Creating traces, Data and Layout

In [2]:
# Create the traces to be plotted.
# A trace is just the name given to a collection of data plus the specifications for drawing it. The trace is itself an object.

trace1 = go.Scatter(x=[1,2,3], y=[4,5,6], marker={'color':'red', 'symbol':104, 'size':10},
                   mode="markers+lines", text=["one", "two", "three"], name='1st Trace')

trace2 = go.Scatter(x=[1,2,3], y=[4.5,3.5,6], marker={'color':'blue', 'symbol':104, 'size':10},
                   mode="markers+lines", text=["one", "two", "three"], name='2st Trace')

In [3]:
# data holds all the traces to be plotted. A plain list is used because go.Data is deprecated.
data = [trace1, trace2]

# Layout defines the chart's appearance and the attributes that are not tied to the data.
layout = go.Layout(title='Test', xaxis={'title':'x1'}, yaxis={'title':'x2'})

# Figure creates the final object to be plotted: a dictionary-like object holding the information from data and layout.
figure = go.Figure(data=data, layout=layout)
py.iplot(figure, filename='pyguide_1')


In [4]:
# Check that figure really is a dictionary-like object
figure

Figure({
    'data': [{'marker': {'color': 'red', 'size': 10, 'symbol': 104},
              'mode': 'markers+lines',
              'name': '1st Trace',
              'text': [one, two, three],
              'type': 'scatter',
              'uid': 'aca4480e-6516-4f88-bf6f-124fc392e056',
              'x': [1, 2, 3],
              'y': [4, 5, 6]},
             {'marker': {'color': 'blue', 'size': 10, 'symbol': 104},
              'mode': 'markers+lines',
              'name': '2st Trace',
              'text': [one, two, three],
              'type': 'scatter',
              'uid': '18f20904-a85b-44c9-8d2b-2bbf6d21dc1e',
              'x': [1, 2, 3],
              'y': [4.5, 3.5, 6]}],
    'layout': {'title': {'text': 'Test'}, 'xaxis': {'title': {'text': 'x1'}}, 'yaxis': {'title': {'text': 'x2'}}}
})
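Since the printed figure is just nested dictionaries and lists, the same structure can be written by hand. The following is a minimal sketch using only the standard library and the values from the first trace above; the names `figure_dict`, `first_trace`, `axis_titles` and `round_trip` are illustrative.

```python
import json

# Same structure as the printed Figure, written as plain Python containers.
figure_dict = {
    "data": [
        {
            "type": "scatter",
            "mode": "markers+lines",
            "name": "1st Trace",
            "x": [1, 2, 3],
            "y": [4, 5, 6],
            "text": ["one", "two", "three"],
            "marker": {"color": "red", "symbol": 104, "size": 10},
        },
    ],
    "layout": {
        "title": {"text": "Test"},
        "xaxis": {"title": {"text": "x1"}},
        "yaxis": {"title": {"text": "x2"}},
    },
}

# Nested values are reached with ordinary key lookups.
first_trace = figure_dict["data"][0]
axis_titles = {
    axis: figure_dict["layout"][axis]["title"]["text"]
    for axis in ("xaxis", "yaxis")
}

# It only holds dicts, lists, strings and numbers, so it survives a JSON round trip.
round_trip = json.loads(json.dumps(figure_dict))
```

Note that the titles appear as `{'text': ...}` dictionaries here, matching the printed output, even though the `Layout` call received plain strings.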

## A more involved example with GDP and life expectancy data by country

We will create an interactive scatter plot.

In [5]:
df = pd.read_csv('https://raw.githubusercontent.com/yankev/test/master/life-expectancy-per-GDP-2007.csv')

In [6]:
df.head()

|   | Unnamed: 0 | continent | country | gdp_percap | life_exp |
|---|---|---|---|---|---|
| 0 | 0 | Americas | `Country: Argentina<br>Population: 40301927.0` | 12779.37964 | 75.32 |
| 1 | 1 | Americas | `Country: Bolivia<br>Population: 9119152.0` | 3822.137084 | 65.554 |
| 2 | 2 | Americas | `Country: Brazil<br>Population: 190010647.0` | 9065.800825 | 72.39 |
| 3 | 3 | Americas | `Country: Canada<br>Population: 33390141.0` | 36319.23501 | 80.653 |
| 4 | 4 | Americas | `Country: Chile<br>Population: 16284741.0` | 13171.63885 | 78.553 |


In [7]:
df['continent'].unique()

array(['Americas', 'Europe'], dtype=object)

In [8]:
americas = df[df['continent']=='Americas']
europe = df[df['continent']=='Europe']

In [9]:
americas.head()

|   | Unnamed: 0 | continent | country | gdp_percap | life_exp |
|---|---|---|---|---|---|
| 0 | 0 | Americas | `Country: Argentina<br>Population: 40301927.0` | 12779.37964 | 75.32 |
| 1 | 1 | Americas | `Country: Bolivia<br>Population: 9119152.0` | 3822.137084 | 65.554 |
| 2 | 2 | Americas | `Country: Brazil<br>Population: 190010647.0` | 9065.800825 | 72.39 |
| 3 | 3 | Americas | `Country: Canada<br>Population: 33390141.0` | 36319.23501 | 80.653 |
| 4 | 4 | Americas | `Country: Chile<br>Population: 16284741.0` | 13171.63885 | 78.553 |


In [10]:
trace_comp0 = go.Scatter(x = americas.gdp_percap, y = americas.life_exp, 
                         mode='markers', 
                         marker={'color':'navy', 'symbol':0, 'size':12},
                         name='Americas',
                         text=americas.country)

trace_comp1 = go.Scatter(x = europe.gdp_percap, y = europe.life_exp, 
                         mode='markers', 
                         marker={'color':'darkorange', 'symbol':0, 'size':12},
                         name='Europe', 
                         text=europe.country)
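The two traces differ only in their data, name and colour, so the shared settings can be factored out. The sketch below uses plain dictionaries and the first two Americas rows from the table above. `hover_text` and `marker_trace` are hypothetical helpers, and the hover format is inferred from the `country` column, where Plotly renders `<br>` as a line break.

```python
def hover_text(country, population):
    """Build a hover label in the same format as the dataset's `country` column."""
    return f"Country: {country}<br>Population: {population}"


def marker_trace(name, color, x, y, text):
    """Hypothetical helper: a markers-only scatter trace as a plain dict."""
    return {
        "type": "scatter",
        "mode": "markers",
        "name": name,
        "x": list(x),
        "y": list(y),
        "text": list(text),
        "marker": {"color": color, "symbol": 0, "size": 12},
    }


# (country, population, gdp_percap, life_exp) for the first two Americas rows.
rows = [
    ("Argentina", 40301927.0, 12779.37964, 75.32),
    ("Bolivia", 9119152.0, 3822.137084, 65.554),
]

americas_trace = marker_trace(
    name="Americas",
    color="navy",
    x=[gdp for _, _, gdp, _ in rows],
    y=[life for _, _, _, life in rows],
    text=[hover_text(country, pop) for country, pop, _, _ in rows],
)
```

A second call with `name="Europe"` and `color="darkorange"` would describe the other trace without repeating the marker settings.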

In [11]:
data_comp = [trace_comp0, trace_comp1]

layout_comp = go.Layout(title='Life exp. vs. GDP by country', xaxis={'title':'GDP'}, yaxis={'title':'Life expectancy'})

fig_comp = go.Figure(data=data_comp, layout=layout_comp)
py.iplot(fig_comp, filename='life vs. GDP')

# Cufflinks 

Cufflinks is a very useful package that lets us plot pandas DataFrames directly with Plotly.


$ pip install cufflinks

In [12]:
import plotly.tools as tls
import cufflinks as cf
# Choose the chart theme:
cf.set_config_file(theme='ggplot')
#cf.set_config_file(theme='pearl') # grey background.
#cf.set_config_file(theme='white') # white background.
#cf.set_config_file(theme='solar') # black background.

In [13]:
# Generate a sample DataFrame of random time series with cufflinks' datagen module
df = cf.datagen.lines()

In [14]:
df.head()

|   | YFQ.GN | QRW.XP | GFJ.YP | SQH.FY | TID.MB |
|---|---|---|---|---|---|
| 2015-01-01 | -0.557291 | 2.263592 | 1.079831 | 0.717926 | -0.550931 |
| 2015-01-02 | -0.741325 | 1.284434 | 2.444546 | 0.069252 | 0.389617 |
| 2015-01-03 | -1.625984 | 0.187143 | 2.490457 | 0.053704 | 0.786578 |
| 2015-01-04 | -1.552797 | 1.30582 | 3.869178 | -0.237811 | 1.702188 |
| 2015-01-05 | -2.266296 | 2.664333 | 4.426728 | 0.112407 | 1.269569 |


In [15]:
py.iplot([{
    'x': df.index, 
    'y': df[col], 
    'name': col
} for col in df.columns], filename = 'simple-line')
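
The list comprehension above produces one trace dictionary per column, all sharing the index as x values. Here is a dependency-free sketch of the same pattern, with a plain dict standing in for the DataFrame and the first three rows of two columns taken from the table above:

```python
# Stand-in for the DataFrame: a shared index plus column name -> values.
index = ["2015-01-01", "2015-01-02", "2015-01-03"]
columns = {
    "YFQ.GN": [-0.557291, -0.741325, -1.625984],
    "QRW.XP": [2.263592, 1.284434, 0.187143],
}

# One trace dict per column; every trace reuses the same x values.
traces = [
    {"x": index, "y": values, "name": name}
    for name, values in columns.items()
]
```

Each element of `traces` has the same three keys as the dictionaries passed to `py.iplot`, so the chart gets one line per column.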





In [16]:
# scatter_matrix is similar to seaborn's pairplot
df.scatter_matrix(filename='scatter-matrix', world_readable=True)
# use world_readable=False to keep the chart private

## Line chart

In [17]:
df = pd.DataFrame(np.random.randn(1000, 2), columns=['A', 'B']).cumsum()

In [18]:
df.head()

|   | A | B |
|---|---|---|
| 0 | 1.468932 | -2.00981 |
| 1 | 1.061459 | -1.872694 |
| 2 | 1.990768 | -2.354977 |
| 3 | 2.225498 | -2.910247 |
| 4 | 0.427226 | -3.521017 |
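
Each column of this DataFrame is a random walk: the cumulative sum of independent standard-normal steps. The sketch below reproduces one such column with only the standard library. The seed is an arbitrary choice, and the values will not match the table above.

```python
import random
from itertools import accumulate

random.seed(0)  # arbitrary fixed seed so the sketch is repeatable

# 1000 independent standard-normal steps, playing the role of one randn column.
steps = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Running total of the steps: the equivalent of .cumsum() on that column.
walk = list(accumulate(steps))
```

The difference between consecutive entries of `walk` is the corresponding entry of `steps`, which is why the plotted lines wander instead of hovering around zero.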


In [19]:
df.iplot(filename='line-example')

## Bar chart

In [20]:
df = pd.read_csv('https://raw.githubusercontent.com/plotly/widgets/master/ipython-examples/311_150k.csv', parse_dates=True, index_col=1)


Columns (8,39,46,47,48) have mixed types. Specify dtype option on import or set low_memory=False.



In [21]:
df.head(3)

| Created Date | Unique Key | Closed Date | Agency | Agency Name | Complaint Type | Descriptor | Location Type | Incident Zip | Incident Address | Street Name | ... | Bridge Highway Name | Bridge Highway Direction | Road Ramp | Bridge Highway Segment | Garage Lot Name | Ferry Direction | Ferry Terminal Name | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2014-11-16 23:46:00 | 29300358 | 11/16/2014 11:46:00 PM | DSNY | BCC - Queens East | Derelict Vehicles | 14 Derelict Vehicles | Street | 11432 | 80-25 PARSONS BOULEVARD | PARSONS BOULEVARD | ... |  |  |  |  |  |  |  | 40.719411 | -73.808882 | (40.719410639341916, -73.80888158860446) |
| 2014-11-16 02:24:35 | 29299837 | 11/16/2014 02:24:35 AM | DOB | Department of Buildings | Building/Use | Illegal Conversion Of Residential Building/Space |  | 10465 | 938 HUNTINGTON AVENUE | HUNTINGTON AVENUE | ... |  |  |  |  |  |  |  | 40.827862 | -73.830641 | (40.827862046105416, -73.83064067165407) |
| 2014-11-16 02:17:12 | 29297857 | 11/16/2014 02:50:48 AM | NYPD | New York City Police Department | Illegal Parking | Blocked Sidewalk | Street/Sidewalk | 11201 | 229 DUFFIELD STREET | DUFFIELD STREET | ... |  |  |  |  |  |  |  | 40.691248 | -73.984375 | (40.69124772858873, -73.98437529459297) |


In [22]:
series = df['Complaint Type'].value_counts()[:20]

In [23]:
series

HEAT/HOT WATER              32202
Street Light Condition       7558
Blocked Driveway             6997
UNSANITARY CONDITION         6174
PAINT/PLASTER                5388
Illegal Parking              5381
Street Condition             4847
Noise                        4615
PLUMBING                     4284
Water System                 3323
Noise - Commercial           3206
DOOR/WINDOW                  3194
Traffic Signal Condition     2766
WATER LEAK                   2501
Dirty Conditions             2283
ELECTRIC                     2205
Sanitation Condition         2195
DOF Literature Request       2183
Broken Muni Meter            2159
FLOORING/STAIRS              2129
Name: Complaint Type, dtype: int64
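
`value_counts()[:20]` counts how often each complaint type occurs, sorts by descending count and keeps the top 20. The same idea is sketched here with `collections.Counter`. The short `complaints` list is a made-up sample that reuses category names from the output above; it is not the real data.

```python
from collections import Counter

# Made-up sample standing in for the 'Complaint Type' column.
complaints = [
    "HEAT/HOT WATER", "Noise", "HEAT/HOT WATER", "Blocked Driveway",
    "Noise", "HEAT/HOT WATER", "PLUMBING",
]

# most_common(n) sorts by descending count and keeps the first n entries,
# mirroring value_counts()[:n] on a pandas Series.
top_two = Counter(complaints).most_common(2)
```

Here `top_two` pairs each of the two most frequent categories with its count, the same label-and-count shape that the bar chart plots.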

In [24]:
series.iplot(kind='bar', filename='barras')

In [25]:
df = pd.DataFrame(np.random.rand(10, 4), columns=['A', 'B', 'C', 'D'])

In [26]:
df.head()

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 0.8642 | 0.343513 | 0.462888 | 0.631239 |
| 1 | 0.903306 | 0.911805 | 0.161114 | 0.958743 |
| 2 | 0.711092 | 0.87188 | 0.477521 | 0.121349 |
| 3 | 0.394112 | 0.615067 | 0.990442 | 0.437044 |
| 4 | 0.533061 | 0.157278 | 0.496085 | 0.061839 |


In [27]:
df_cumsum = df.cumsum()

In [28]:
df_cumsum

|   | A | B | C | D |
|---|---|---|---|---|
| 0 | 0.8642 | 0.343513 | 0.462888 | 0.631239 |
| 1 | 1.767505 | 1.255318 | 0.624002 | 1.589983 |
| 2 | 2.478597 | 2.127198 | 1.101523 | 1.711332 |
| 3 | 2.872709 | 2.742265 | 2.091965 | 2.148375 |
| 4 | 3.40577 | 2.899543 | 2.58805 | 2.210214 |
| 5 | 3.486672 | 3.867491 | 2.904158 | 2.781289 |
| 6 | 4.434448 | 4.439313 | 2.98333 | 3.553561 |
| 7 | 4.685172 | 5.001223 | 3.273358 | 4.289178 |
| 8 | 5.23243 | 5.614014 | 4.041351 | 5.192164 |
| 9 | 5.587874 | 6.441123 | 4.232291 | 5.614997 |


In [29]:
df_cumsum.iplot(kind='barh', barmode='stack', bargap=.1, filename='barras 2')
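
With `barmode='stack'`, the segments of each bar are laid end to end instead of side by side. As a rough illustration of the geometry, this sketch computes where each segment starts and ends for row 0 of `df_cumsum`; the variable names are illustrative.

```python
from itertools import accumulate

# Row 0 of df_cumsum above, one value per column.
row = {"A": 0.8642, "B": 0.343513, "C": 0.462888, "D": 0.631239}

# In a stacked bar every segment starts where the previous one ends,
# so the end points are the running total of the segment widths.
ends = list(accumulate(row.values()))
starts = [0.0] + ends[:-1]

segments = {name: (start, end) for name, start, end in zip(row, starts, ends)}
total_length = ends[-1]
```

The total length of the bar is the sum of the row, so a stacked chart shows both the individual contributions and their combined size.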