## Bokeh's `ColumnDataSource` object and more plotting goodies
In the last notebook we passed x and y data into our Bokeh figure as a pair of Pandas *series* objects. Here in this notebook, we instead prepare our plot data as a Bokeh **ColumnDataSource** object. This conversion happens behind the scenes anyway, so doing it explicitly in our code gives us more capabilities in working with the data.

Resource: https://bokeh.pydata.org/en/latest/docs/reference/models/sources.html#bokeh.models.sources.ColumnDataSource

In [1]:
#Import the data, as in the previous notebook
import pandas as pd
data = pd.read_csv('./data/gapminder.csv',thousands=',',index_col='Year')

### Constructing our column data source object
* First, we need to import the object into our coding environment
* Then we create it, adding data to it as a dictionary...

In [2]:
#Import the ColumnDataSource object into our code
from bokeh.models import ColumnDataSource

In [3]:
#Construct a column data source of x (income), y (life expectancy), and also country
theCDS = ColumnDataSource({'x':data.loc[2010]['income'],
                           'y':data.loc[2010]['life'],
                           'country':data.loc[2010]['Country']
                          })

In [4]:
#View the column names of the column data source object
theCDS.column_names

['x', 'y', 'country']
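
As a quick sanity check on how a dictionary maps onto a CDS, here is a minimal sketch that uses a few made-up values (the numbers and the names `toy_cds`, `A`, `B`, `C` are hypothetical and not taken from `gapminder.csv`). It suggests that each dictionary key becomes a named column and that the values remain reachable through the `.data` attribute.

```python
from bokeh.models import ColumnDataSource

# Hypothetical values, for illustration only (not from gapminder.csv)
toy_cds = ColumnDataSource({'x': [1000, 5000, 20000],
                            'y': [55.0, 68.5, 79.2],
                            'country': ['A', 'B', 'C']})

# Each dictionary key becomes a named column in the source
print(sorted(toy_cds.column_names))

# The underlying values are still available through the .data mapping
print(list(toy_cds.data['country']))
```

All columns in a CDS should have the same length, which is why the three lists above each hold three items.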

With our CDS created and populated with data, we can construct our graph using this "CDS" object. 

To do that, we again import our Bokeh plotting objects

In [5]:
#Import the other Bokeh objects, etc.
from bokeh.io import output_notebook, show
from bokeh.plotting import figure

In [6]:
#Activate Bokeh
output_notebook()

And now we are ready to construct the plot.
* We begin by creating a dictionary of the figure styling properties. 

In [7]:
#Create a styling dictionary for our plot
PLOT_OPS = {'height':200,
            'x_axis_type':'log',
            'x_range':(100,100000),
            'y_range':(0,100)}

* Now we create our figure (applying the stylings created above) and then add our XY data to our figure. <br>*Note, however, that the data here are coming from our ColumnDataSource object (`theCDS`) with the columns specified by their dictionary keys*. 

In [8]:
#Create our figure, with stylings from our PLOT_OPS dictionary
p = figure(**PLOT_OPS)
#Create our plot, but pull data from the CDS "source"
p.circle(x='x',
         y='y',
         source=theCDS)
#Show it
show(p)

What advantage do we gain from this? Well, for one, it facilitates [**plot tools**](https://bokeh.pydata.org/en/latest/docs/user_guide/tools.html). 

Let's examine the HoverTool, which lets the user display a value for a feature by hovering over that feature...

In [9]:
#Import the HoverTool object
from bokeh.models import HoverTool

In [10]:
#Configure the Hover tool to display data from the "country" field in the CDS
hover = HoverTool(tooltips='@country')
# **The rest is the same - EXCEPT, we limit the Bokeh tools to just the Hover tool**
p = figure(**PLOT_OPS,tools=[hover])
p.circle(x='x',
         y='y',
         source=theCDS)
show(p)

For more interesting behavior, see http://bokeh.pydata.org/en/latest/docs/user_guide/tools.html#custom-tooltip
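
As one example of that richer behavior, the `tooltips` argument can also take a list of `(label, value)` pairs instead of a single string. The sketch below assumes the dictionary-based CDS from above (columns `x`, `y`, and `country`); the labels and the number formats in curly braces are illustrative choices, not something prescribed by the data.

```python
from bokeh.models import HoverTool

# Each tuple is (label shown to the user, value specification).
# '@name' pulls the value from the CDS column called 'name';
# the optional {...} suffix is a number format.
hover_multi = HoverTool(tooltips=[('Country', '@country'),
                                  ('Income', '@x{0,0}'),
                                  ('Life expectancy', '@y{0.0}')])

print(len(hover_multi.tooltips))
```

Passing `tools=[hover_multi]` to `figure()` in place of `hover` would then show three labeled rows in each tooltip.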

### Using a Pandas dataframe as the ColumnDataSource
Instead of constructing a CDS from a dictionary, we can dump a full Pandas dataframe as a column data source...

In [11]:
#Construct a column data source from a dataframe of the 2010 data
theCDS2 = ColumnDataSource(data.loc[2010])
theCDS2.column_names

['Year', 'Country', 'life', 'population', 'income', 'region']
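
Notice that `Year`, which is the dataframe's index, shows up as a column too. A minimal sketch with a tiny made-up dataframe (the rows and the names `toy_df` and `toy_cds2` are hypothetical) illustrates this behavior:

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Hypothetical rows standing in for the gapminder data
toy_df = pd.DataFrame({'Country': ['A', 'B'],
                       'life': [70.1, 80.3]},
                      index=pd.Index([2010, 2010], name='Year'))

# The named index is carried into the CDS alongside the regular columns
toy_cds2 = ColumnDataSource(toy_df)
print(sorted(toy_cds2.column_names))
```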

Now we construct our plot as above, except that we refer to the dataframe's own column names rather than names we assign... <br>*(I've also applied **size**, **color**, and **alpha** (opacity) values to the circles...)*

In [12]:
#Configure the Hover tool to display data from the "Country" field in the CDS
hover = HoverTool(tooltips='@Country')
p = figure(**PLOT_OPS,tools=[hover])
p.circle(x='income',
         y='life',
         source=theCDS2,
         size=8,          #Sets the size of the circles, in pixel units
         color='green',   #Sets the color of the circles 
         alpha=0.6)       #Sets the opacity of the circles, from 0 (transparent) to 1 (opaque)
show(p)

## Assigning sizes to our point features
In the above, the size is static, but we can use a field to assign size values to add more information to our visualization.

In [13]:
#Configure the Hover tool to display data from the "Country" field in the CDS
hover = HoverTool(tooltips='@Country')
p = figure(**PLOT_OPS,tools=[hover])
p.circle(x='income',
         y='life',
         source=theCDS2,
         size='population',
         color='green',
         alpha=0.6)
show(p)

#### Scaling the size values of our point features
This doesn't quite work because the sizes we provided are rather large (China and India would be shown as over a billion pixel units!). The solution is to scale the population values to reasonable pixel values. 

Bokeh has a model called [LinearInterpolator](https://bokeh.pydata.org/en/latest/docs/reference/models/transforms.html#bokeh.models.transforms.LinearInterpolator) that can do just that. We specify a min and max limit of our input values (actual population) and the Linear Interpolator will scale these to a second pair of min and max values (e.g. pixel values).

In [14]:
#Import the LinearInterpolator model
from bokeh.models import LinearInterpolator

In [15]:
#Use Pandas to get the min and max population values to set the input scale
minPop = data.loc[2010]['population'].min()
maxPop = data.loc[2010]['population'].max()
print(minPop,maxPop)

52428.0 1340968737.0


In [16]:
#Rescale our population data to values between 5 and 50
size_mapper = LinearInterpolator(x=[minPop,maxPop], #x are the input, unscaled value bookends
                                 y=[5,50])          #y are the new bookends to which we want to fit the data
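
To get a feel for the arithmetic behind this kind of mapping, here is a plain-Python sketch of straight-line rescaling. The helper `linear_rescale` is a hypothetical stand-in written for illustration; it is not Bokeh's implementation, which performs the transform in the browser.

```python
def linear_rescale(value, in_min, in_max, out_min, out_max):
    """Map value from [in_min, in_max] onto [out_min, out_max] along a straight line."""
    fraction = (value - in_min) / (in_max - in_min)
    return out_min + fraction * (out_max - out_min)

# The 2010 population bookends printed above
min_pop, max_pop = 52428.0, 1340968737.0

size_smallest = linear_rescale(min_pop, min_pop, max_pop, 5, 50)
size_largest = linear_rescale(max_pop, min_pop, max_pop, 5, 50)
# A hypothetical country of 10 million people
size_10m = linear_rescale(10_000_000, min_pop, max_pop, 5, 50)

print(size_smallest, size_largest, round(size_10m, 2))
```

Because the population values are so skewed, a country of 10 million people lands only slightly above the 5-pixel minimum, so most circles will be close to the smallest size while a few very populous countries stand out.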

In [17]:
#Plot again, using the size_mapper transformer
hover = HoverTool(tooltips='@Country')
p = figure(**PLOT_OPS,tools=[hover])
p.circle(x='income',
         y='life',
         source=theCDS2,
         #Map the size to the pop'n field, applying the transform
         size={'field':'population','transform':size_mapper},
         color='green',
         alpha=0.6)
show(p)

## Coloring our point features
Giving our point elements a meaningful size already increases the amount of information communicated by our data. But each point is still colored the same, overlooking another dimension of our data we can show. So now let's look at how we can assign meaningful color to the data in our plot. 

To do this we need to import another Bokeh model: the [`CategoricalColorMapper`](https://bokeh.pydata.org/en/latest/docs/reference/models/mappers.html#bokeh-models-mappers). This object allows us to link each value in a list we provide (we'll use regions) to a color in a color palette we specify. The list of color palettes we can use is provided [here](https://bokeh.pydata.org/en/latest/docs/reference/palettes.html). 

In [18]:
#import the CategoricalColorMapper object
from bokeh.models import CategoricalColorMapper

In [19]:
#import the Spectral6 Bokeh palette
from bokeh.palettes import Spectral6

In [20]:
#Get the categorical data for our colors: 
regionList = list(data.loc[2010]['region'].unique())

In [21]:
#Link each item in our regions list to a color in a colormap
color_mapper = CategoricalColorMapper(factors=regionList,palette=Spectral6)
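
To see which color each category would receive, we can pair the mapper's factors with its palette by position. The sketch below uses placeholder region names (`Region A` and so on are hypothetical, not the values in the `region` column) and assumes that factors and palette entries are matched in order.

```python
from bokeh.models import CategoricalColorMapper
from bokeh.palettes import Spectral6

# Hypothetical category names, for illustration only
toy_regions = ['Region A', 'Region B', 'Region C']
toy_mapper = CategoricalColorMapper(factors=toy_regions, palette=Spectral6)

# Factors and palette entries are matched by position:
# first factor with first color, second with second, and so on
pairs = dict(zip(toy_mapper.factors, toy_mapper.palette))
for region, color in pairs.items():
    print(region, color)
```

Spectral6 holds six colors, so it can cover up to six distinct categories before colors would have to be reused.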

In [22]:
#Plot again, now using the color_mapper we just created
hover = HoverTool(tooltips='@Country')
p = figure(**PLOT_OPS,tools=[hover])
p.circle(x='income',
         y='life',
         source=theCDS2,
         size={'field':'population','transform':size_mapper},
         #Map the color to the region field, using the color_mapper
         color={'field':'region','transform':color_mapper},
         alpha=0.6)
show(p)

## Adding a legend
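To add a legend, we simply add a new argument to our `p.circle()` call: `legend='region'` (where `region` is the field we want to show in the legend). Note that the `legend` keyword was deprecated in Bokeh 1.4 in favor of `legend_field`, `legend_group`, and `legend_label`, so newer Bokeh versions need one of those instead.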

Also note that we can modify settings of our plot object by referring to them (`p.legend...`).

In [33]:
#Plot again, now using the color_mapper we just created
hover = HoverTool(tooltips='@Country')
p = figure(**PLOT_OPS,tools=[hover])
p.circle(x='income',
         y='life',
         source=theCDS2,
         size={'field':'population','transform':size_mapper},
         color={'field':'region','transform':color_mapper},
         alpha=0.6,
         #Show a legend for the region column
         legend='region'
        )
#Set some legend properties
p.legend.border_line_color = 'red'
#Puts the legend off to the right
p.right = p.legend
# Add x and y axes labels
p.xaxis.axis_label = "Income"
p.yaxis.axis_label = "Life Expectancy (years)"
#Add a title
p.title.text = "Life Expectancy vs Income: 2010"
#Override default sizes of our plot
p.height=500
p.width =800
#Show the plot
show(p)

If we want to save our plot as a shareable HTML file, we can switch the Bokeh output function from `output_notebook` to `output_file`:

In [34]:
#Import the bokeh functionality
from bokeh.io import output_file
#Direct our output to a file, not the notebook
output_file('myPlot.html')
#Show the plot again, this time it will also generate the `myPlot.html` file in your folder.
show(p)
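
If we only want the file written, without displaying the plot, Bokeh's `save` function is one alternative. This is a minimal sketch rather than the approach used in this notebook: the figure `demo`, its data, and the temporary output path are all made up for illustration.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

# A throwaway figure with hypothetical data
demo = figure(title="Demo")
demo.line([1, 2, 3], [4, 5, 6])

# Write a standalone HTML file into a temporary folder;
# resources=CDN makes the page load BokehJS from the web
out_path = os.path.join(tempfile.mkdtemp(), 'demo_plot.html')
save(demo, filename=out_path, resources=CDN, title='Demo plot')

print(os.path.exists(out_path))
```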

► *Have a look at the source of the generated HTML file (ctrl-U). What does it reveal with respect to how this page was constructed?*

### RECAP
Compare our final figure here with that of the previous notebook: it includes much more information than the basic XY plot of life expectancy vs income. The changes include a few simple (sort of) tweaks to the plot: one attribute sets the size of the circles and another sets their color. We also allow the user to identify the country behind a specific circle by hovering over it. 

To enable these tweaks though, we had to dig into the structure of Bokeh a bit more, specifically the ColumnDataSource object and how Bokeh links to the data we want to plot. We also see that Bokeh has some nice scaling tools. 