## Bokeh's `ColumnDataSource` object and more plotting goodies
In the last notebook we passed data into our Bokeh figure as a pair of Pandas series objects. In this example, we instead prepare our plot data as a Bokeh **ColumnDataSource** (CDS) object. Bokeh performs this conversion behind the scenes anyway; by doing it explicitly in our code, we gain more control over how the plot uses the data.

Resource: https://bokeh.pydata.org/en/latest/docs/reference/models/sources.html#bokeh.models.sources.ColumnDataSource

In [1]:
#import the data, as before
import pandas as pd
data = pd.read_csv('./data/gapminder.csv',thousands=',',index_col='Year')

In [2]:
#Import the ColumnDataSource object into our code
from bokeh.models import ColumnDataSource

### Constructing our column data source
* First, we need to import the object into our coding environment
* Then we create it, adding data to it as a dictionary...

In [3]:
#Construct a column data source of x (income), y (life expectancy), and country
theCDS = ColumnDataSource({'x':data.loc[2010]['income'],
                           'y':data.loc[2010]['life'],
                           'country':data.loc[2010]['Country']
                          })
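
To see what a CDS holds, here is a minimal sketch using made-up values rather than the gapminder data. The names `sample_cds`, `sample_names`, and `first_row` are introduced only for this illustration. Each dictionary key becomes a named column, and every column should have the same length.

```python
from bokeh.models import ColumnDataSource

# Hypothetical values, not taken from gapminder.csv
sample_cds = ColumnDataSource({'x': [1000, 5000, 20000],
                               'y': [55.0, 68.5, 79.2],
                               'country': ['A', 'B', 'C']})

# The dictionary keys become the column names
sample_names = sorted(sample_cds.column_names)

# The .data attribute gives dictionary-style access to each column
first_row = {name: sample_cds.data[name][0] for name in sample_names}
```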

With our CDS created and populated with data, we can construct our graph using this object. 

In [4]:
#Import the other Bokeh objects, etc.
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
output_notebook()

In [5]:
#Create a styling dictionary for our plot
PLOT_OPS = {'height':200,
            'x_axis_type':'log',
            'x_range':(100,100000),
            'y_range':(0,100)}

In [6]:
#Create our figure, with stylings from our PLOT_OPS dictionary
p = figure(**PLOT_OPS)
#Create our plot, pulling data from the CDS by its column names
p.circle(x='x',
         y='y',
         source=theCDS)
#Show it
show(p)

What advantage do we gain from this? Well, for one, it facilitates [**plot tools**](https://bokeh.pydata.org/en/latest/docs/user_guide/tools.html). 

Let's examine the HoverTool, which lets the user display a value for a feature by hovering over that feature...

In [7]:
#Import the HoverTool object
from bokeh.models import HoverTool

In [9]:
#Configure the Hover tool to display data from the "country" field in the CDS
hover = HoverTool(tooltips='@country')
#The rest is the same, except that we pass the hover tool to the figure
p = figure(**PLOT_OPS,tools=[hover])
p.circle(x='x',
         y='y',
         source=theCDS)
show(p)

For more interesting behavior, see http://bokeh.pydata.org/en/latest/docs/user_guide/tools.html#custom-tooltip
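
As a small step in that direction, `tooltips` also accepts a list of (label, value) pairs, each shown on its own line of the tooltip. A minimal sketch, assuming the column names `x`, `y`, and `country` used in `theCDS` above; the labels are arbitrary and `hover_multi` is a name introduced here.

```python
from bokeh.models import HoverTool

# Each tuple is (label shown to the user, '@' + column name in the CDS)
hover_multi = HoverTool(tooltips=[('Country', '@country'),
                                  ('Income', '@x'),
                                  ('Life expectancy', '@y')])
```

Passing `tools=[hover_multi]` to `figure()` in place of `hover` should then show all three values on hover.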

### Using a Pandas dataframe as the ColumnDataSource
Instead of constructing a CDS from a dictionary, we can dump the entire dataframe as a column data source...

In [12]:
#Construct a column data source from a dataframe of the 2010 data
theCDS2 = ColumnDataSource(data.loc[2010])
theCDS2.column_names

['Year', 'Country', 'life', 'population', 'income', 'region']
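
Note that `Year` appears in the list even though it is the dataframe's index: when a CDS is built from a dataframe, the index is added as a column too. A minimal sketch with a tiny made-up dataframe (the names `demo_df` and `demo_cds` and all values are hypothetical):

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Hypothetical two-row frame, indexed by Year like our gapminder data
demo_df = pd.DataFrame({'Country': ['A', 'B'],
                        'life': [60.1, 75.3]},
                       index=pd.Index([2010, 2010], name='Year'))

# Build the CDS straight from the dataframe
demo_cds = ColumnDataSource(demo_df)

# The named index becomes a column alongside the regular columns
demo_names = set(demo_cds.column_names)
demo_years = list(demo_cds.data['Year'])
```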

Now we construct our plot as above, except we need to refer to the column names in the dataframe, not ones we assign... 

(I've also applied a size, color, and opacity value to the circles...)

In [64]:
#Configure the Hover tool to display data from the "Country" column in the CDS
hover = HoverTool(tooltips='@Country')
p = figure(**PLOT_OPS,tools=[hover])
p.circle(x='income',
         y='life',
         source=theCDS2,
         size=8,
         color='green',
         alpha=0.6)
show(p)

## Scaling our point features
In the above, the size is static, but we can use a field to assign size values:

In [63]:
#Configure the Hover tool to display data from the "Country" column in the CDS
hover = HoverTool(tooltips='@Country')
p = figure(**PLOT_OPS,tools=[hover])
p.circle(x='income',
         y='life',
         source=theCDS2,
         size='population',
         color='green',
         alpha=0.6)
show(p)

This doesn't quite work: circle sizes are read as screen pixels, so raw population values produce circles far too large to be useful. We need to scale them. Bokeh has a model called [LinearInterpolator](https://bokeh.pydata.org/en/latest/docs/reference/models/transforms.html#bokeh.models.transforms.LinearInterpolator) that can scale our actual population values to something more reasonable for screen pixel values...

In [22]:
from bokeh.models import LinearInterpolator

In [34]:
#Get the min and max values to scale
minPop = data.loc[2010]['population'].min()
maxPop = data.loc[2010]['population'].max()
print(minPop,maxPop)

52428.0 1340968737.0


In [40]:
#Rescale our population data to values between 5 and 50
size_mapper = LinearInterpolator(x=[minPop,maxPop],
                                 y=[5,50])
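
The arithmetic behind this rescaling is ordinary linear interpolation. The sketch below reproduces it in plain Python for populations inside the range printed above; `scaled_size` is a hypothetical helper written for illustration, not part of Bokeh.

```python
# Population range printed above for the 2010 data
MIN_POP, MAX_POP = 52428.0, 1340968737.0
# Target circle sizes, in screen pixels
MIN_SIZE, MAX_SIZE = 5, 50

def scaled_size(population):
    """Linearly map a population in [MIN_POP, MAX_POP] onto [MIN_SIZE, MAX_SIZE]."""
    # Fraction of the way from the smallest to the largest population
    fraction = (population - MIN_POP) / (MAX_POP - MIN_POP)
    return MIN_SIZE + fraction * (MAX_SIZE - MIN_SIZE)
```

Because the mapping is linear and the largest population dominates the range, a country of 100 million people ends up at only about 8 pixels, so most circles stay near the small end of the scale.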

In [62]:
#Plot again, using the size_mapper transformer
hover = HoverTool(tooltips='@Country')
p = figure(**PLOT_OPS,tools=[hover])
p.circle(x='income',
         y='life',
         source=theCDS2,
         #Map the size to the pop'n field, applying the transform
         size={'field':'population','transform':size_mapper},
         color='green',
         alpha=0.6)
show(p)

## Coloring our point features
Giving our point elements a meaningful size already increases the amount of information communicated by our data. But each point is still colored the same, overlooking another dimension of our data we can show. So now let's look at how we can assign meaningful color to the data in our plot. 

To do this we need to import another Bokeh model: the [`CategoricalColorMapper`](https://bokeh.pydata.org/en/latest/docs/reference/models/mappers.html#bokeh-models-mappers). This object allows us to link each value in a list we provide (we'll use regions) to a color in a color palette we specify. The list of color palettes we can use is provided [here](https://bokeh.pydata.org/en/latest/docs/reference/palettes.html). 

In [43]:
#import the CategoricalColorMapper object
from bokeh.models import CategoricalColorMapper

In [51]:
#import the Spectral6 palette (other palettes can be imported the same way)
from bokeh.palettes import Spectral6

In [55]:
#Get the categorical data for our colors: 
regionList = list(data.loc[2010]['region'].unique())

In [57]:
#Link each item in our regions list to a color in a colormap
color_mapper = CategoricalColorMapper(factors=regionList,palette=Spectral6)
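
Conceptually, the mapper pairs each factor with the palette color at the same position: the first region gets the first color, and so on. The plain-Python sketch below illustrates that lookup with made-up region names and colors; it is only an analogy, not Bokeh's implementation.

```python
# Hypothetical factors and palette, for illustration only
demo_regions = ['Region A', 'Region B', 'Region C']
demo_palette = ['red', 'green', 'blue']

# Pair the i-th factor with the i-th palette color
color_lookup = dict(zip(demo_regions, demo_palette))

# Look up a color for each (hypothetical) row's region value
row_regions = ['Region B', 'Region A', 'Region B', 'Region C']
row_colors = [color_lookup[region] for region in row_regions]
```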

In [61]:
#Plot again, now using the color_mapper we just created
hover = HoverTool(tooltips='@Country')
p = figure(**PLOT_OPS,tools=[hover])
p.circle(x='income',
         y='life',
         source=theCDS2,
         size={'field':'population','transform':size_mapper},
         #Map the color to the region field, using the color_mapper
         color={'field':'region','transform':color_mapper},
         alpha=0.6)
show(p)

## Adding a legend
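To add a legend, we simply pass one more argument to the `circle` call: `legend='region'` (where `region` is the field we want the legend to show). Note that Bokeh 1.4 and later deprecate the `legend` keyword in favor of `legend_field` for this purpose.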

In [76]:
#Plot again, now adding a legend for the region column
hover = HoverTool(tooltips='@Country')
p = figure(**PLOT_OPS,tools=[hover])
p.circle(x='income',
         y='life',
         source=theCDS2,
         size={'field':'population','transform':size_mapper},
         color={'field':'region','transform':color_mapper},
         alpha=0.6,
         #Show a legend for the region column
         legend='region'
        )
#Set some legend properties
p.legend.border_line_color = 'red'
#Puts the legend off to the right
p.right = p.legend
show(p)