<table style="float:left; border:none">
   <tr style="border:none; background-color: #ffffff">
       <td style="border:none">
           <a href="http://bokeh.pydata.org/">     
           <img 
               src="assets/bokeh-transparent.png" 
               style="width:50px"
           >
           </a>    
       </td>
       <td style="border:none">
           <h1>Bokeh Tutorial</h1>
       </td>
   </tr>
</table>

<div style="float:right;"><h2>04. Data Sources and Transformations</h2></div>

In [1]:
from bokeh.io import output_notebook, show
from bokeh.plotting import figure

In [2]:
output_notebook()

# Overview

We've seen how Bokeh works with Python lists, NumPy arrays, Pandas Series, and so on. Under the hood, these inputs are converted to a Bokeh `ColumnDataSource`, the central data source object used throughout Bokeh. Although Bokeh often creates one for us transparently, there are times when it is useful to create one explicitly.

In later sections we will see features like hover tooltips, computed transforms, and CustomJS interactions that make use of the `ColumnDataSource`, so let's take a quick look now. 

## Creating with Python Dicts

The `ColumnDataSource` can be imported from `bokeh.models`:

In [4]:
from bokeh.models import ColumnDataSource

The `ColumnDataSource` is a mapping of column names (strings) to sequences of values. In the simple example below, the mapping is provided by passing a Python `dict` with string keys and plain Python lists as values. The values could also be NumPy arrays or Pandas Series.

***NOTE: ALL the columns in a `ColumnDataSource` must always be the SAME length.***
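
A quick way to catch mismatched columns before building a source is to compare the lengths yourself. The sketch below uses a hypothetical helper, `column_lengths_match` (it is not part of Bokeh), on a plain `dict` of columns:

```python
def column_lengths_match(data):
    """Return True if every column in the dict has the same length.

    Hypothetical helper for illustration; it is not part of Bokeh.
    """
    lengths = {len(values) for values in data.values()}
    # Zero or one distinct length means the columns are consistent
    return len(lengths) <= 1


good = {'x': [1, 2, 3, 4, 5], 'y': [3, 7, 8, 5, 1]}
bad = {'x': [1, 2, 3, 4, 5], 'y': [3, 7, 8]}

print(column_lengths_match(good))
print(column_lengths_match(bad))
```

A check like this is only a convenience; the rule itself is what matters when assembling the `dict`.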


In [5]:
source = ColumnDataSource(data={
    'x' : [1, 2, 3, 4, 5],
    'y' : [3, 7, 8, 5, 1],
})

Up until now we have called functions like `p.circle` by passing in literal lists or arrays of data directly. When we do this, Bokeh automatically creates a `ColumnDataSource` for us. But it is also possible to specify a `ColumnDataSource` explicitly by passing it as the `source` argument to a glyph method. Whenever we do this, if we want a property (like `"x"` or `"y"` or `"fill_color"`) to take a sequence of values, we pass the ***name of the column*** that we would like to use for that property:

In [6]:
p = figure(plot_width=400, plot_height=400)
p.circle('x', 'y', size=20, source=source)
show(p)
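
To see the two styles side by side, the sketch below passes literal lists in one call and column names plus `source` in another, then inspects the `data_source` attached to each returned renderer. It uses the generic `scatter` glyph method, which accepts `source` in the same way; variable names such as `r_implicit` are only for illustration.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

p = figure()

# Literal lists: Bokeh builds a ColumnDataSource behind the scenes
r_implicit = p.scatter([1, 2, 3], [4, 5, 6], size=20)

# Explicit source: properties are given as column names
source = ColumnDataSource(data={'x': [1, 2, 3], 'y': [4, 5, 6]})
r_explicit = p.scatter('x', 'y', size=20, source=source)

# Both renderers end up backed by a ColumnDataSource
print(type(r_implicit.data_source).__name__)
print(r_explicit.data_source is source)
```

Creating the source explicitly gives us a handle on the data that we can reuse across several glyphs or plots.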

In [21]:
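# Exercise: create a column data source with NumPy arrays as column values and plot it
import numpy as np

# One NumPy array per column; all columns have the same length (7)
source = ColumnDataSource(data={
    'x'     : np.array([1, 3, 2, 2, 2, 2, 1]),
    'y'     : np.array([10, 40, 60, 10, 60, 70, 80]),
    'size'  : np.array([10, 10, 10, 20, 30, 40, 50]),
    'color' : np.array(['red', 'blue', 'green', 'red', 'yellow', 'black', 'red']),
})

p = figure(plot_width=500, plot_height=500)

# Refer to columns by name so the glyph reads its values from the source
p.diamond('x', 'y', size='size', color='color', source=source)

show(p)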

## Creating with Pandas DataFrames

It's also simple to create `ColumnDataSource` objects directly from Pandas data frames. To do this, just pass the data frame to `ColumnDataSource` when you create it:

In [25]:
from bokeh.sampledata.iris import flowers as df

source = ColumnDataSource(df)


Now we can use it as we did above by passing the column names to glyph methods:

In [26]:
p = figure(plot_width=400, plot_height=400)
p.circle('petal_length', 'petal_width', source=source)
show(p)
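
It can help to check which columns a source built from a data frame exposes. The sketch below uses a small hand-made data frame (the values are made up for illustration, not taken from the iris data) and prints `column_names`. Note that Bokeh also adds the data frame's index as a column, named `index` when the index itself has no name.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Made-up values, just to have a small data frame to work with
df_small = pd.DataFrame({
    'length': [1.4, 4.7, 6.0],
    'width': [0.2, 1.4, 2.5],
})

source_small = ColumnDataSource(df_small)

# Each data frame column becomes a source column, plus one for the index
print(source_small.column_names)
print(len(source_small.data['length']))
```

Any of the listed names can then be passed to a glyph method together with `source=source_small`.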

In [38]:
# Exercise: create a column data source with the autompg sample data frame and plot it
from bokeh.sampledata.autompg import autompg_clean as df

source = ColumnDataSource(df)

p = figure(plot_width=500, plot_height=500)
p.circle('yr', 'cyl', source=source)
show(p)

# Display the data frame to see the available columns
df


Unnamed: 0,mpg,cyl,displ,hp,weight,accel,yr,origin,name,mfr
0,18.0,8,307.0,130,3504,12.0,70,North America,chevrolet chevelle malibu,chevrolet
1,15.0,8,350.0,165,3693,11.5,70,North America,buick skylark 320,buick
2,18.0,8,318.0,150,3436,11.0,70,North America,plymouth satellite,plymouth
3,16.0,8,304.0,150,3433,12.0,70,North America,amc rebel sst,amc
4,17.0,8,302.0,140,3449,10.5,70,North America,ford torino,ford
5,15.0,8,429.0,198,4341,10.0,70,North America,ford galaxie 500,ford
6,14.0,8,454.0,220,4354,9.0,70,North America,chevrolet impala,chevrolet
7,14.0,8,440.0,215,4312,8.5,70,North America,plymouth fury iii,plymouth
8,14.0,8,455.0,225,4425,10.0,70,North America,pontiac catalina,pontiac
9,15.0,8,390.0,190,3850,8.5,70,North America,amc ambassador dpl,amc
