*Note: All output_file() calls have been replaced with output_notebook() so that plots will display inline.*

# The Basics of Bokeh

## Your First Plot

To start using Bokeh, we first import the basics that we need from the **bokeh.plotting** module.

**figure** is the core object that we will use to create plots. It handles the styling of a plot, including its title, labels, axes, and grids, and it exposes methods for adding data to the plot. The **output_file** function tells Bokeh to render the visualization to an HTML file; in this notebook it is replaced by **output_notebook**, which renders the plot inline instead. **show** tells Bokeh that all of the data has been added to the plot and that it is time to render it.

In [1]:
from bokeh.plotting import figure, output_notebook, show
from bokeh.models.tools import LassoSelectTool, CrosshairTool, HoverTool
output_notebook()

x = [1, 3, 5, 7]
y = [2, 4, 6, 8]

p = figure()

# After instantiating the figure, we call the circle, line, and triangle methods to plot our data.
# legend_label (Bokeh 1.4 and later) replaces the deprecated legend keyword.
p.circle(x, y, size=10, color='red', legend_label='circle')
p.line(x, y, color='blue', legend_label='line')
p.triangle(y, x, color='yellow', size=10, legend_label='triangle')

# By setting a click_policy on our legend, a user can click on each legend entry
# (e.g. circle, line, triangle) to show or hide that piece of data.
p.legend.click_policy = 'hide'
show(p)



# Bokeh and Pandas: Exploring the WWII THOR Dataset

## Loading Data in Pandas

In [2]:
import pandas as pd

df = pd.read_csv('thor_wwii.csv')
# an abridged representation of the loaded data.
df

Unnamed: 0,MSNDATE,THEATER,COUNTRY_FLYING_MISSION,NAF,UNIT_ID,AIRCRAFT_NAME,AC_ATTACKING,TAKEOFF_BASE,TAKEOFF_COUNTRY,TAKEOFF_LATITUDE,TAKEOFF_LONGITUDE,TGT_COUNTRY,TGT_LOCATION,TGT_LATITUDE,TGT_LONGITUDE,TONS_HE,TONS_IC,TONS_FRAG,TOTAL_TONS
0,03/30/1941,ETO,GREAT BRITAIN,RAF,84 SQDN,BLENHEIM,10.0,,,,,ALBANIA,ELBASAN,41.100000,20.070000,0.0,0.0,0.0,0.0
1,11/24/1940,ETO,GREAT BRITAIN,RAF,211 SQDN,BLENHEIM,9.0,,,,,ALBANIA,DURAZZO,41.320000,19.450000,0.0,0.0,0.0,0.0
2,12/04/1940,ETO,GREAT BRITAIN,RAF,211 SQDN,BLENHEIM,9.0,,,,,ALBANIA,TEPELENE,40.300000,20.020000,0.0,0.0,0.0,0.0
3,12/31/1940,ETO,GREAT BRITAIN,RAF,211 SQDN,BLENHEIM,9.0,,,,,ALBANIA,VALONA,40.470000,19.490000,0.0,0.0,0.0,0.0
4,01/06/1941,ETO,GREAT BRITAIN,RAF,211 SQDN,BLENHEIM,9.0,,,,,ALBANIA,VALONA,40.470000,19.490000,0.0,0.0,0.0,0.0
5,02/12/1941,ETO,GREAT BRITAIN,RAF,84 SQDN,BLENHEIM,9.0,,,,,ALBANIA,ELBASAN,41.100000,20.070000,0.0,0.0,0.0,0.0
6,02/12/1941,ETO,GREAT BRITAIN,RAF,11 SQDN,BLENHEIM,9.0,,,,,ALBANIA,ELBASAN - DUKAJ AREA,41.100000,20.070000,0.0,0.0,0.0,0.0
7,03/04/1941,ETO,GREAT BRITAIN,RAF,211 SQDN,BLENHEIM,9.0,,,,,ALBANIA,NAVAL UNITS OFF HIMARE,40.000000,19.750000,0.0,0.0,0.0,0.0
8,03/07/1941,ETO,GREAT BRITAIN,RAF,211 SQDN,BLENHEIM,9.0,,,,,ALBANIA,BESIST,40.560000,19.530000,0.0,0.0,0.0,0.0
9,03/07/1941,ETO,GREAT BRITAIN,RAF,211 SQDN,BLENHEIM,9.0,,,,,ALBANIA,DRAGOTI,40.910000,20.050000,0.0,0.0,0.0,0.0


This shows that we have 178,281 records of missions with 19 columns per record.
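
The record and column counts quoted above are what a DataFrame reports through its `shape` attribute. A minimal sketch on a small hand-made frame (three rows copied from the listing above, standing in for the full THOR file):

```python
import pandas as pd

# Tiny illustrative frame; values copied from the first rows shown above.
demo = pd.DataFrame({
    'MSNDATE': ['03/30/1941', '11/24/1940', '12/04/1940'],
    'TGT_LOCATION': ['ELBASAN', 'DURAZZO', 'TEPELENE'],
    'AC_ATTACKING': [10.0, 9.0, 9.0],
})

# shape is a (number of records, number of columns) tuple.
n_rows, n_cols = demo.shape
print(n_rows, n_cols)
```

Running `df.shape` on the loaded dataset should report the counts in the same order.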

In [3]:
# To see what the 19 columns are in full
df.columns.tolist()

['MSNDATE',
 'THEATER',
 'COUNTRY_FLYING_MISSION',
 'NAF',
 'UNIT_ID',
 'AIRCRAFT_NAME',
 'AC_ATTACKING',
 'TAKEOFF_BASE',
 'TAKEOFF_COUNTRY',
 'TAKEOFF_LATITUDE',
 'TAKEOFF_LONGITUDE',
 'TGT_COUNTRY',
 'TGT_LOCATION',
 'TGT_LATITUDE',
 'TGT_LONGITUDE',
 'TONS_HE',
 'TONS_IC',
 'TONS_FRAG',
 'TOTAL_TONS']

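Some of the less obvious column names: MSNDATE is the mission date, NAF is the numbered air force responsible for the mission, AC_ATTACKING is the number of attacking aircraft, and TONS_HE, TONS_IC, and TONS_FRAG are the tons of high explosives, incendiary devices, and fragmentation bombs dropped.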

## The Bokeh ColumnDataSource

**ColumnDataSource** links a Pandas DataFrame with Bokeh visualizations: once a DataFrame is wrapped in a source, glyph methods can refer to its columns by name.
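
To see what a source actually holds, the sketch below wraps two small frames and inspects `source.data`. The frames and the names `demo` and `named` are made up for illustration; the point is that the DataFrame index is carried along as an extra column, called `index` when the index has no name and otherwise named after the index.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Illustrative two-row frame with a default (unnamed) index.
demo = pd.DataFrame({'AC_ATTACKING': [10.0, 9.0], 'TONS_HE': [0.0, 0.0]})
source = ColumnDataSource(demo)

# The unnamed index shows up as a column called 'index'.
column_names = sorted(source.data.keys())

# With a named index, the extra column takes the index name instead.
named = demo.copy()
named.index = pd.Index(['GREAT BRITAIN', 'USA'], name='COUNTRY_FLYING_MISSION')
named_source = ColumnDataSource(named)
countries = list(named_source.data['COUNTRY_FLYING_MISSION'])
```

This is why a grouped DataFrame, whose index is the grouping column, can later be queried with the grouping column's name as the key.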

In [4]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.models.tools import HoverTool
output_notebook()

df = pd.read_csv('thor_wwii.csv')

# We don't want to plot all 170,000+ rows in our scatterplot: that would take longer
# to generate and would produce a confusing plot because of the volume of
# overlapping data. Instead, we randomly sample 50 rows.
sample = df.sample(50)

source = ColumnDataSource(sample)
# create our figure object and call the circle glyph method to plot our data.
p = figure()
# p.circle(x='TOTAL_TONS', y='AC_ATTACKING', 
#          source=source, 
#          size=10, color='green')

# create a scatter plot of the number of attacking aircraft versus the tons of munitions dropped
p.circle(x='AC_ATTACKING', y='TONS_HE', 
         source=source, 
         size='TONS_HE', color='red')  #The size of each dot will then reflect the tons of high explosives used.

# add a title and label our axes (x is AC_ATTACKING, y is TONS_HE)
p.title.text = 'Attacking Aircraft and Tons of HE Dropped'
p.xaxis.axis_label = 'Number of Attacking Aircraft'
p.yaxis.axis_label = 'Tons of HE Dropped'

# HoverTool allows you to set a tooltips property which takes a list of tuples.
hover = HoverTool()
hover.tooltips=[
    ('Attack Date', '@MSNDATE'),
    ('Attacking Aircraft', '@AC_ATTACKING'),
    ('Tons of HE', '@TONS_HE'),
    ('Type of Aircraft', '@AIRCRAFT_NAME')
]

p.add_tools(hover)

show(p)
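
Because `sample(50)` draws different rows on every run, the scatter plot changes each time the cell is executed. If a repeatable plot is wanted, one option (an addition here, not part of the original cell) is to pass `random_state`. A minimal sketch on a made-up frame:

```python
import pandas as pd

# Illustrative frame of 100 rows standing in for the mission data.
demo = pd.DataFrame({'AC_ATTACKING': range(100), 'TONS_HE': range(100)})

# The seed value 42 is arbitrary; any fixed integer gives a repeatable draw.
first = demo.sample(5, random_state=42)
second = demo.sample(5, random_state=42)

# Both draws pick exactly the same rows, in the same order.
same_rows = first.index.equals(second.index)
print(same_rows)
```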

In [5]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.models.tools import HoverTool
output_notebook()


df = pd.read_csv('thor_wwii.csv')

# We don't want to plot all 170,000+ rows in our scatterplot: that would take longer
# to generate and would produce a confusing plot because of the volume of
# overlapping data. Instead, we randomly sample 50 rows.
sample = df.sample(50)

source = ColumnDataSource(sample)
# create our figure object and call the circle glyph method to plot our data.
p = figure()
# p.circle(x='TOTAL_TONS', y='AC_ATTACKING', 
#          source=source, 
#          size=10, color='green')

# create a scatter plot of the number of attacking aircraft versus the tons of munitions dropped
p.circle(x='TOTAL_TONS', y='AC_ATTACKING', 
         source=source, 
         size='TONS_HE', color='green')  #The size of each dot will then reflect the tons of high explosives used.

# add a title and label our axes
p.title.text = 'Attacking Aircraft and Munitions Dropped'
p.xaxis.axis_label = 'Tons of Munitions Dropped'
p.yaxis.axis_label = 'Number of Attacking Aircraft'

# HoverTool allows you to set a tooltips property which takes a list of tuples.
hover = HoverTool()
hover.tooltips=[
    ('Attack Date', '@MSNDATE'),
    ('Attacking Aircraft', '@AC_ATTACKING'),
    ('Tons of Munitions', '@TOTAL_TONS'),
    ('Type of Aircraft', '@AIRCRAFT_NAME')
]

p.add_tools(hover)

show(p)

# Categorical Data and Bar Charts: Munitions Dropped by Country

Here we create a bar chart that shows the total tons of munitions dropped by each country listed in our CSV.
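
Getting from one row per mission to one row per country is a group-and-sum. A minimal sketch of that step with made-up tonnage figures (the numbers are illustrative, not taken from the dataset):

```python
import pandas as pd

# Three hypothetical missions flown by two countries.
missions = pd.DataFrame({
    'COUNTRY_FLYING_MISSION': ['USA', 'USA', 'GREAT BRITAIN'],
    'TOTAL_TONS': [1500.0, 500.0, 1000.0],
    'TONS_HE': [1000.0, 500.0, 750.0],
})

# A list of column names (double brackets) selects several columns after groupby.
totals = missions.groupby('COUNTRY_FLYING_MISSION')[['TOTAL_TONS', 'TONS_HE']].sum()

# Convert tons to kilotons.
kilotons = totals / 1000
print(kilotons)
```

The grouping column becomes the index of the result, which is what lets the bar chart use it as the categorical axis.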

In [9]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.models.tools import HoverTool

from bokeh.palettes import Spectral5
from bokeh.transform import factor_cmap
output_notebook()

df = pd.read_csv('thor_wwii.csv')

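# We now need to get from the 170,000+ records of individual missions
# to one record per attacking country with the total munitions dropped.
# Selecting several columns after groupby needs a list (double brackets);
# the bare tuple form raises an error in pandas 2.0 and later.
grouped = df.groupby('COUNTRY_FLYING_MISSION')[['TOTAL_TONS', 'TONS_HE', 'TONS_IC', 'TONS_FRAG']].sum()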

# convert to kilotons by dividing by 1000
grouped = grouped / 1000

source = ColumnDataSource(grouped)

# create a list of countries from our source object, using source.data and the column name as key
countries = source.data['COUNTRY_FLYING_MISSION'].tolist()
p = figure(x_range=countries)

# This creates a special color map that matches an individual color to each category
color_map = factor_cmap(field_name='COUNTRY_FLYING_MISSION', 
                    palette=Spectral5, factors=countries)

#Instead of using a y parameter, however, the vbar method takes a top parameter. 
#A bottom parameter can equally be specified, but if left out, its default value is 0.
p.vbar(x='COUNTRY_FLYING_MISSION', top='TOTAL_TONS', source=source, width=0.70, color=color_map)

p.title.text ='Munitions Dropped by Allied Country'
p.xaxis.axis_label = 'Country'
p.yaxis.axis_label = 'Kilotons of Munitions'

hover = HoverTool()
hover.tooltips = [
    ("Totals", "@TONS_HE High Explosive / @TONS_IC Incendiary / @TONS_FRAG Fragmentation")]

hover.mode = 'vline'

p.add_tools(hover)

show(p)

# Stacked Bar Charts and Sub-sampling Data: Types of Munitions Dropped by Country

In [7]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral3
output_notebook()

df = pd.read_csv('thor_wwii.csv')

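# keep only missions flown by the USA and Great Britain
# (named country_filter so the built-in filter() is not shadowed)
country_filter = df['COUNTRY_FLYING_MISSION'].isin(('USA', 'GREAT BRITAIN'))
df = df[country_filter]

grouped = df.groupby('COUNTRY_FLYING_MISSION')[['TONS_IC', 'TONS_FRAG', 'TONS_HE']].sum()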
grouped = grouped / 1000

source = ColumnDataSource(grouped)
countries = source.data['COUNTRY_FLYING_MISSION'].tolist()
p = figure(x_range=countries)

# legend_label takes one label per stacker, in the same order as stackers
p.vbar_stack(stackers=['TONS_HE', 'TONS_FRAG', 'TONS_IC'],
             x='COUNTRY_FLYING_MISSION', source=source,
             legend_label=['High Explosive', 'Fragmentation', 'Incendiary'],
             width=0.5, color=Spectral3)

p.title.text ='Types of Munitions Dropped by Allied Country'
p.legend.location = 'top_left'

p.xaxis.axis_label = 'Country'
p.xgrid.grid_line_color = None  # remove the x grid lines

p.yaxis.axis_label = 'Kilotons of Munitions'

show(p)
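
A stacked bar places each segment on top of the running total of the segments before it. The arithmetic can be sketched in pandas alone; this illustrates the idea rather than Bokeh's internals, and the tonnage values are made up:

```python
import pandas as pd

stackers = ['TONS_HE', 'TONS_FRAG', 'TONS_IC']

# Hypothetical kilotons per country, indexed the same way as the grouped frame.
grouped = pd.DataFrame(
    {'TONS_HE': [4.0, 6.0], 'TONS_FRAG': [1.0, 2.0], 'TONS_IC': [3.0, 2.0]},
    index=pd.Index(['GREAT BRITAIN', 'USA'], name='COUNTRY_FLYING_MISSION'),
)

# Top edge of each segment: running total across the stackers, left to right.
tops = grouped[stackers].cumsum(axis=1)

# Bottom edge of each segment: the top of the previous one (0 for the first).
bottoms = tops - grouped[stackers]
print(bottoms)
print(tops)
```

The order of the stackers therefore decides which munition type sits at the base of each bar.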

# Time-Series, Annotations, and Multiple Plots: Bombing Operations over Time

In [8]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral3
output_notebook()

df = pd.read_csv('thor_wwii.csv')

#make sure MSNDATE is a datetime format
df['MSNDATE'] = pd.to_datetime(df['MSNDATE'], format='%m/%d/%Y')

# a list of column names (double brackets) is required to select several columns
grouped = df.groupby('MSNDATE')[['TOTAL_TONS', 'TONS_IC', 'TONS_FRAG']].sum()
grouped = grouped/1000

source = ColumnDataSource(grouped)

p = figure(x_axis_type='datetime')

p.line(x='MSNDATE', y='TOTAL_TONS', line_width=2, source=source, legend_label='All Munitions')
p.line(x='MSNDATE', y='TONS_FRAG', line_width=2, source=source, color=Spectral3[1], legend_label='Fragmentation')
p.line(x='MSNDATE', y='TONS_IC', line_width=2, source=source, color=Spectral3[2], legend_label='Incendiary')

p.yaxis.axis_label = 'Kilotons of Munitions Dropped'

show(p)

## Resampling Time-Series Data

In [9]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral3
output_notebook()

df = pd.read_csv('thor_wwii.csv')

#make sure MSNDATE is a datetime format
df['MSNDATE'] = pd.to_datetime(df['MSNDATE'], format='%m/%d/%Y')

# group the daily records into monthly totals
grouped = df.groupby(pd.Grouper(key='MSNDATE', freq='M'))[['TOTAL_TONS', 'TONS_IC', 'TONS_FRAG']].sum()
grouped = grouped/1000

source = ColumnDataSource(grouped)

p = figure(x_axis_type='datetime')

p.line(x='MSNDATE', y='TOTAL_TONS', line_width=2, source=source, legend_label='All Munitions')
p.line(x='MSNDATE', y='TONS_FRAG', line_width=2, source=source, color=Spectral3[1], legend_label='Fragmentation')
p.line(x='MSNDATE', y='TONS_IC', line_width=2, source=source, color=Spectral3[2], legend_label='Incendiary')

p.yaxis.axis_label = 'Kilotons of Munitions Dropped'

show(p)
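
Grouping by month replaces many daily points with one total per month, which smooths the line. A small sketch of the same idea, using `dt.to_period('M')` as the grouping key instead of `pd.Grouper` (a substitution made here for illustration; the dates and tonnages are made up):

```python
import pandas as pd

# Three hypothetical missions: two in January 1944, one in February 1944.
daily = pd.DataFrame({
    'MSNDATE': pd.to_datetime(['01/05/1944', '01/20/1944', '02/03/1944'],
                              format='%m/%d/%Y'),
    'TOTAL_TONS': [100.0, 50.0, 25.0],
})

# Map each date to its calendar month, then sum within each month.
month = daily['MSNDATE'].dt.to_period('M')
monthly = daily.groupby(month)['TOTAL_TONS'].sum()
print(monthly)
```

Other periods, such as `'Q'` for quarters, can be swapped in the same way when a coarser view is wanted.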

## Annotating Trends in Plots

In [10]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral3
output_notebook()

df = pd.read_csv('thor_wwii.csv')

#filter for the European Theater of Operations
eto_filter = df['THEATER'] == 'ETO'
df = df[eto_filter].copy()  # copy so the column assignment below does not act on a view

df['MSNDATE'] = pd.to_datetime(df['MSNDATE'], format='%m/%d/%Y')
group = df.groupby(pd.Grouper(key='MSNDATE', freq='M'))[['TOTAL_TONS', 'TONS_IC', 'TONS_FRAG']].sum()
group = group / 1000

source = ColumnDataSource(group)

p = figure(x_axis_type="datetime")

p.line(x='MSNDATE', y='TOTAL_TONS', line_width=2, source=source, legend_label='All Munitions')
p.line(x='MSNDATE', y='TONS_FRAG', line_width=2, source=source, color=Spectral3[1], legend_label='Fragmentation')
p.line(x='MSNDATE', y='TONS_IC', line_width=2, source=source, color=Spectral3[2], legend_label='Incendiary')

p.title.text = 'European Theater of Operations'

p.yaxis.axis_label = 'Kilotons of Munitions Dropped'

show(p)

In [11]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource
from bokeh.models import BoxAnnotation, Label
from datetime import datetime
from bokeh.palettes import Spectral3
output_notebook()

df = pd.read_csv('thor_wwii.csv')

#filter for the European Theater of Operations
eto_filter = df['THEATER'] == 'ETO'
df = df[eto_filter].copy()  # copy so the column assignment below does not act on a view

df['MSNDATE'] = pd.to_datetime(df['MSNDATE'], format='%m/%d/%Y')
group = df.groupby(pd.Grouper(key='MSNDATE', freq='M'))[['TOTAL_TONS', 'TONS_IC', 'TONS_FRAG']].sum()
group = group / 1000

source = ColumnDataSource(group)

p = figure(x_axis_type="datetime")

p.line(x='MSNDATE', y='TOTAL_TONS', line_width=2, source=source, legend_label='All Munitions')
p.line(x='MSNDATE', y='TONS_FRAG', line_width=2, source=source, color=Spectral3[1], legend_label='Fragmentation')
p.line(x='MSNDATE', y='TONS_IC', line_width=2, source=source, color=Spectral3[2], legend_label='Incendiary')

p.title.text = 'European Theater of Operations'

p.yaxis.axis_label = 'Kilotons of Munitions Dropped'

# ISO dates avoid any day-first versus month-first ambiguity
box_left = pd.to_datetime('1944-06-06')
box_right = pd.to_datetime('1944-12-16')

box = BoxAnnotation(left=box_left, right=box_right,
                    line_width=1, line_color='black', line_dash='dashed',
                    fill_alpha=0.2, fill_color='orange')

p.add_layout(box)
show(p)

# Spatial Data: Mapping Target Locations

In [12]:
import pandas as pd
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import ColumnDataSource, Range1d
from bokeh.layouts import layout
from bokeh.palettes import Spectral3
from bokeh.tile_providers import CARTODBPOSITRON
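from pyproj import Transformer
from bokeh.models.tools import HoverTool

# Build the WGS84 (EPSG:4326) to Web Mercator (EPSG:3857) transformer once.
# Transformer replaces the deprecated Proj(init=...) / transform() pair;
# always_xy=True keeps the (longitude, latitude) argument order.
transformer = Transformer.from_crs('EPSG:4326', 'EPSG:3857', always_xy=True)

def LongLat_to_EN(long, lat):
    try:
        easting, northing = transformer.transform(long, lat)
        return easting, northing
    except Exception:
        return None, None

df = pd.read_csv('thor_wwii.csv')
#convert all lat/long to webmercator and store in new column
df['E'], df['N'] = zip(*df.apply(lambda x: LongLat_to_EN(x['TGT_LONGITUDE'], x['TGT_LATITUDE']), axis=1))

grouped = df.groupby(['E', 'N'])[['TONS_FRAG', 'TONS_IC']].sum().reset_index()

# keep only targets where fragmentation bombs were dropped
nonzero_frag = grouped['TONS_FRAG'] != 0
grouped = grouped[nonzero_frag]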

source = ColumnDataSource(grouped)

left = -2150000
right = 18000000
bottom = -5300000
top = 11000000

p = figure(x_range=Range1d(left, right), y_range=Range1d(bottom, top))
p.add_tile(CARTODBPOSITRON)

p.circle(x='E', y='N', source=source, line_color='grey', fill_color=Spectral3[1])

p.axis.visible = False

hover = HoverTool(tooltips=[
    ("Fragmentation Bombs", "@TONS_FRAG tons")
])

p.add_tools(hover)


output_notebook()
show(p)

*Note: this cell requires the pyproj package; without it the import fails with `ModuleNotFoundError: No module named 'pyproj'`.*
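
The longitude/latitude to Web Mercator conversion in the cell above can also be written with the standard library alone, using the spherical Mercator formulas. The sketch below is an alternative to pyproj, not what the cell itself does, and the function name `lonlat_to_web_mercator` is made up for illustration:

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis, the sphere radius used by Web Mercator

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to EPSG:3857 easting/northing in metres."""
    easting = EARTH_RADIUS_M * math.radians(lon_deg)
    northing = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return easting, northing

# Target coordinates of ELBASAN from the first row of the dataset listing.
easting, northing = lonlat_to_web_mercator(20.07, 41.10)
print(easting, northing)
```

The formula is undefined at the poles, so latitudes of exactly 90 or -90 degrees would need to be filtered out before calling it.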