<table style="float:left; border:none">
   <tr style="border:none; background-color: #ffffff">
       <td style="border:none">
           <a href="http://bokeh.pydata.org/">     
           <img 
               src="assets/bokeh-transparent.png" 
               style="width:50px"
           >
           </a>    
       </td>
       <td style="border:none">
           <h1>Bokeh Tutorial</h1>
       </td>
   </tr>
</table>

<div style="float:right;"><h2>07. Bar and Categorical Data Plots</h2></div>

In [1]:
from bokeh.io import show, output_notebook
from bokeh.plotting import figure

output_notebook()

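## Basic Bar Charts

Bar charts are a common and important type of plot. Bokeh makes it simple to create all sorts of stacked or nested bar charts, and to work with categorical data in general.

The example below shows a simple bar chart drawn with the `vbar` method for vertical bars (there is a corresponding `hbar` for horizontal bars). We also set a few plot properties to make the chart look nicer; see [Styling and Theming](02 - Styling and Theming.ipynb) for information about visual properties.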

In [2]:
# Here is a list of categorical values (or factors)
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']

# Set the x_range to the list of categories above
p = figure(x_range=fruits, plot_height=250, title="Fruit Counts")

# Categorical values can also be used as coordinates
p.vbar(x=fruits, top=[5, 3, 4, 2, 4, 6], width=0.9)

# Set some properties to make the plot look better
p.xgrid.grid_line_color = None
p.y_range.start = 0

show(p)

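To create a plot with a categorical range, we pass the ordered list of categorical values to `figure`, e.g. `x_range=['a', 'b', 'c']`. In the plot above, we passed the list of fruits as `x_range`, and those fruits appear as the labels on the x-axis.

The `vbar` method takes an `x` location for the center of each bar, a `top` and a `bottom` (which defaults to 0), and a `width`. With a categorical range, as here, each category implicitly has a width of 1, so setting `width=0.9`, as we did, shrinks the bars and leaves a gap between them. (Another option is to add some padding to the range.)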
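
To make the spacing arithmetic concrete, here is a minimal plain-Python sketch (no Bokeh required). It assumes each category occupies a unit-wide slot whose center sits at `index + 0.5`; that coordinate convention and the helper name `bar_edges` are assumptions made for illustration, not something this tutorial states.

```python
def bar_edges(n_categories, width=0.9):
    """Return (left, right) edges for bars centered in unit-wide category slots.

    Assumption: category i is centered at i + 0.5 and its slot is 1 unit wide.
    """
    edges = []
    for i in range(n_categories):
        center = i + 0.5
        edges.append((center - width / 2, center + width / 2))
    return edges


edges = bar_edges(3, width=0.9)
# The gap between neighbouring bars is the slot width (1) minus the bar width.
gaps = [edges[i + 1][0] - edges[i][1] for i in range(len(edges) - 1)]
print(edges)
print(gaps)
```

With `width=1` the gaps would shrink to zero and the bars would touch.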

In [3]:
# Exercise: Create your own simple bar chart


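Since `vbar` is a glyph method, we can use it with a `ColumnDataSource` just like any other glyph. In the example below, we put the data (including the color data) in a `ColumnDataSource` and use it to drive the plot. We also add a legend; see [Adding Annotations](03 - Adding Annotations.ipynb) for more information about legends and other annotations.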
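
As a rough picture of what such column data looks like, the sketch below uses a plain dict in place of a `ColumnDataSource`. The color names are placeholders rather than the `Spectral6` palette, and the equal-length check is an assumption about how the columns are expected to line up, one entry per bar.

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]
# Placeholder colors for illustration (not the Spectral6 palette values).
colors = ['red', 'green', 'orange', 'purple', 'blue', 'pink']

data = dict(fruits=fruits, counts=counts, color=colors)

# Every column should have one entry per bar.
lengths = {name: len(column) for name, column in data.items()}
assert len(set(lengths.values())) == 1, lengths

# Reading the columns row by row gives the values that drive each bar.
rows = list(zip(data['fruits'], data['counts'], data['color']))
for fruit, count, color in rows:
    print(f"{fruit:<12} top={count} color={color}")
```

In the real cell, the glyph call refers to these columns by name (`x='fruits'`, `top='counts'`, `color='color'`).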

In [4]:
from bokeh.models import ColumnDataSource
from bokeh.palettes import Spectral6

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
counts = [5, 3, 4, 2, 4, 6]

source = ColumnDataSource(data=dict(fruits=fruits, counts=counts, color=Spectral6))

p = figure(x_range=fruits, plot_height=250, y_range=(0, 9), title="Fruit Counts")
p.vbar(x='fruits', top='counts', width=0.9, color='color', legend="fruits", source=source)

p.xgrid.grid_line_color = None
p.legend.orientation = "horizontal"
p.legend.location = "top_center"

show(p)

In [5]:
# Exercise: Create your own simple bar chart driven by a ColumnDataSource


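## Stacked Bars

Stacking bars is a common need. The `hbar_stack` method used below (and its vertical counterpart, `vbar_stack`) takes a list of column names and, for each category, stacks one bar segment per column.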

In [6]:
from bokeh.palettes import GnBu3, OrRd3

years = ['2015', '2016', '2017']

exports = {'fruits' : fruits,
           '2015'   : [2, 1, 4, 3, 2, 4],
           '2016'   : [5, 3, 4, 2, 4, 6],
           '2017'   : [3, 2, 4, 4, 5, 3]}
imports = {'fruits' : fruits,
           '2015'   : [-1, 0, -1, -3, -2, -1],
           '2016'   : [-2, -1, -3, -1, -2, -2],
           '2017'   : [-1, -2, -1, 0, -2, -2]}

p = figure(y_range=fruits, plot_height=250, x_range=(-16, 16), title="Fruit import/export, by year")

p.hbar_stack(years, y='fruits', height=0.9, color=GnBu3, source=ColumnDataSource(exports),
             legend=["%s exports" % x for x in years])

p.hbar_stack(years, y='fruits', height=0.9, color=OrRd3, source=ColumnDataSource(imports),
             legend=["%s imports" % x for x in years])

p.y_range.range_padding = 0.1
p.ygrid.grid_line_color = None
p.legend.location = "center_left"

show(p)

Note that we also added some padding *around* the categorical range (i.e. at both ends of the axis) with the following statement:

```
p.y_range.range_padding = 0.1
```
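
Stacking places each segment where the previous one ended. As a rough plain-Python illustration of that idea (not Bokeh's implementation), the hypothetical helper `stack_extents` below computes the start and end of each year's segment for one fruit by accumulating the values in `years` order.

```python
def stack_extents(values):
    """Return (start, end) pairs for segments stacked one after another."""
    extents = []
    start = 0
    for value in values:
        end = start + value
        extents.append((start, end))
        start = end
    return extents


years = ['2015', '2016', '2017']
# Export and import values for 'Apples', taken from the data above.
apple_exports = [2, 5, 3]
apple_imports = [-1, -2, -1]

print(dict(zip(years, stack_extents(apple_exports))))
# {'2015': (0, 2), '2016': (2, 7), '2017': (7, 10)}
print(dict(zip(years, stack_extents(apple_imports))))
# {'2015': (0, -1), '2016': (-1, -3), '2017': (-3, -4)}
```

Because the import values are negative, their segments accumulate to the left of zero, while the exports accumulate to the right.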

In [8]:
# Create a stacked bar chart with a single call to vbar_stack


## Grouped Bar Charts



In [9]:
from bokeh.models import FactorRange

fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']

data = {'fruits' : fruits,
        '2015'   : [2, 1, 4, 3, 2, 4],
        '2016'   : [5, 3, 3, 2, 4, 6],
        '2017'   : [3, 2, 4, 4, 5, 3]}

# this creates [ ("Apples", "2015"), ("Apples", "2016"), ("Apples", "2017"), ("Pears", "2015"), ... ]
x = [ (fruit, year) for fruit in fruits for year in years ]
counts = sum(zip(data['2015'], data['2016'], data['2017']), ()) # like an hstack

source = ColumnDataSource(data=dict(x=x, counts=counts))

p = figure(x_range=FactorRange(*x), plot_height=250, title="Fruit Counts by Year")

p.vbar(x='x', top='counts', width=0.9, source=source)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)
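
The two list-building lines in the cell above can be checked on their own, without Bokeh. This sketch repeats them and prints the first few entries to show that `x` and `counts` line up pair by pair.

```python
fruits = ['Apples', 'Pears', 'Nectarines', 'Plums', 'Grapes', 'Strawberries']
years = ['2015', '2016', '2017']
data = {'2015': [2, 1, 4, 3, 2, 4],
        '2016': [5, 3, 3, 2, 4, 6],
        '2017': [3, 2, 4, 4, 5, 3]}

# One (fruit, year) tuple per bar, in fruit-major order.
x = [(fruit, year) for fruit in fruits for year in years]

# zip groups the three yearly values per fruit; sum(..., ()) concatenates
# those per-fruit tuples into one flat tuple in the same order as x.
counts = sum(zip(data['2015'], data['2016'], data['2017']), ())

for factor, count in list(zip(x, counts))[:4]:
    print(factor, count)
# ('Apples', '2015') 2
# ('Apples', '2016') 5
# ('Apples', '2017') 3
# ('Pears', '2015') 1
```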

In [9]:
# Exercise: Make the chart above have a different color for each year by adding colors to the ColumnDataSource


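Another way to set the color of the bars is to use a transform. We first saw transforms in the earlier chapter [Data Sources and Transformations](04 - Data Sources and Transformations.ipynb). Here we use a new one, `factor_cmap`, which accepts the name of a column to use for colormapping, along with the palette and factors that define the color mapping.

Additionally, we can configure it to map just the sub-factors if desired. In this example, we don't want to shade each `(fruit, year)` pair differently; we only want to shade based on the `year`. So we pass `start=1` and `end=2` to specify the slice of each factor to use when colormapping. We then pass the result as the `fill_color` value, so that colors are applied automatically based on the underlying data: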

```
fill_color=factor_cmap('x', palette=['firebrick', 'olive', 'navy'], factors=years, start=1, end=2)
```
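
To see what the `start` and `end` slice does, here is a plain-Python emulation of the idea. It is only a sketch of the mapping described above, not Bokeh's implementation, and the function name `map_subfactor_colors` is hypothetical.

```python
def map_subfactor_colors(values, palette, factors, start=0, end=None):
    """Pick a color for each tuple in `values` using the slice value[start:end]."""
    lookup = dict(zip(factors, palette))
    colors = []
    for value in values:
        key = value[start:end]
        # A one-element slice is matched against plain string factors.
        if len(key) == 1:
            key = key[0]
        colors.append(lookup.get(key))
    return colors


years = ['2015', '2016', '2017']
x = [('Apples', '2015'), ('Apples', '2016'), ('Apples', '2017'), ('Pears', '2015')]

print(map_subfactor_colors(x, ['firebrick', 'olive', 'navy'], years, start=1, end=2))
# ['firebrick', 'olive', 'navy', 'firebrick']
```

Every bar for a given year gets the same color regardless of the fruit, which is the effect the slice is meant to produce.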

In [10]:
from bokeh.transform import factor_cmap

p = figure(x_range=FactorRange(*x), plot_height=250, title="Fruit Counts by Year")

p.vbar(x='x', top='counts', width=0.9, source=source, line_color="white",

       # use the palette to colormap based on the x[1:2] values
       fill_color=factor_cmap('x', palette=['firebrick', 'olive', 'navy'], factors=years, start=1, end=2))

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None

show(p)

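It is also possible to build grouped bar charts with another technique called "visual dodge". That is useful, for example, if you want the axis labels to show only the fruit type and not the years. This tutorial does not cover that technique, but you can find more information in the [User's Guide](http://bokeh.pydata.org/en/dev/docs/user_guide/categorical.html#visual-dodge).

## Mixing Categorical Levels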

In [11]:
factors = [("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
           ("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
           ("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
           ("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec")]

p = figure(x_range=FactorRange(*factors), plot_height=250)

x = [ 10, 12, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16 ]
p.vbar(x=factors, top=x, width=0.9, alpha=0.5)

qs, aves = ["Q1", "Q2", "Q3", "Q4"], [12, 9, 13, 14]
p.line(x=qs, y=aves, color="red", line_width=3)
p.circle(x=qs, y=aves, line_color="red", fill_color="white", size=10)

p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None

show(p)
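
In the cell above, the bars use two-level `(quarter, month)` factors, while the line and circles use only the top-level quarter names. The plain-Python sketch below groups the monthly values by their top-level factor and averages them. The results come out close to the `aves` list used above (Q1 averages to about 12.67, where the cell uses 12), so reading `aves` as rounded quarterly means is an inference, not something the source states.

```python
from collections import defaultdict

factors = [("Q1", "jan"), ("Q1", "feb"), ("Q1", "mar"),
           ("Q2", "apr"), ("Q2", "may"), ("Q2", "jun"),
           ("Q3", "jul"), ("Q3", "aug"), ("Q3", "sep"),
           ("Q4", "oct"), ("Q4", "nov"), ("Q4", "dec")]
values = [10, 12, 16, 9, 10, 8, 12, 13, 14, 14, 12, 16]

# Collect the monthly values under their top-level (quarter) factor.
by_quarter = defaultdict(list)
for (quarter, _month), value in zip(factors, values):
    by_quarter[quarter].append(value)

quarter_means = {q: sum(v) / len(v) for q, v in by_quarter.items()}
for quarter, mean in quarter_means.items():
    print(quarter, round(mean, 2))
```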

## Using Pandas `GroupBy`

In [12]:
from bokeh.sampledata.autompg import autompg_clean as df

df.cyl = df.cyl.astype(str)
df.head()

| | mpg | cyl | displ | hp | weight | accel | yr | origin | name | mfr |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18.0 | 8 | 307.0 | 130 | 3504 | 12.0 | 70 | North America | chevrolet chevelle malibu | chevrolet |
| 1 | 15.0 | 8 | 350.0 | 165 | 3693 | 11.5 | 70 | North America | buick skylark 320 | buick |
| 2 | 18.0 | 8 | 318.0 | 150 | 3436 | 11.0 | 70 | North America | plymouth satellite | plymouth |
| 3 | 16.0 | 8 | 304.0 | 150 | 3433 | 12.0 | 70 | North America | amc rebel sst | amc |
| 4 | 17.0 | 8 | 302.0 | 140 | 3449 | 10.5 | 70 | North America | ford torino | ford |


In [13]:
from bokeh.palettes import Spectral5


group = df.groupby('cyl')

source = ColumnDataSource(group)
cyl_cmap = factor_cmap('cyl', palette=Spectral5, factors=sorted(df.cyl.unique()))

p = figure(plot_height=350, x_range=group)
p.vbar(x='cyl', top='mpg_mean', width=1, line_color="white", 
       fill_color=cyl_cmap, source=source)

p.xgrid.grid_line_color = None
p.xaxis.axis_label = "number of cylinders"
p.yaxis.axis_label = "Mean MPG"
p.y_range.start = 0

show(p)
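
The `top='mpg_mean'` column in the cell above holds one aggregated value per `cyl` group. As a hedged plain-Python sketch of that kind of aggregation, the example below groups a few made-up rows (not the real `autompg_clean` data) and computes the mean per group.

```python
from collections import defaultdict
from statistics import mean

# Made-up (cyl, mpg) rows for illustration only.
rows = [('4', 30.0), ('4', 26.0), ('6', 20.0), ('8', 15.0), ('8', 17.0)]

mpg_by_cyl = defaultdict(list)
for cyl, mpg in rows:
    mpg_by_cyl[cyl].append(mpg)

# One aggregated value per category, analogous to an 'mpg_mean' column.
mpg_mean = {cyl: mean(values) for cyl, values in sorted(mpg_by_cyl.items())}
print(mpg_mean)
# {'4': 28.0, '6': 20.0, '8': 16.0}
```

The sorted keys play the same role as the `factors=sorted(df.cyl.unique())` argument above: they fix the order of the categories.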

In [14]:
# Exercise: Use the same dataset to make a similar plot of mean horsepower (hp) by origin


## Categorical Scatterplots

In [15]:
from bokeh.sampledata.commits import data

data.head()

| datetime | day | time |
|---|---|---|
| 2017-04-22 15:11:58-05:00 | Sat | 15:11:58 |
| 2017-04-21 14:20:57-05:00 | Fri | 14:20:57 |
| 2017-04-20 14:35:08-05:00 | Thu | 14:35:08 |
| 2017-04-20 10:34:29-05:00 | Thu | 10:34:29 |
| 2017-04-20 09:17:23-05:00 | Thu | 09:17:23 |


In [16]:
from bokeh.transform import jitter

DAYS = ['Sun', 'Sat', 'Fri', 'Thu', 'Wed', 'Tue', 'Mon']

source = ColumnDataSource(data)

p = figure(plot_width=800, plot_height=300, y_range=DAYS, x_axis_type='datetime', 
           title="Commits by Time of Day (US/Central) 2012—2016")

p.circle(x='time', y=jitter('day', width=0.6, range=p.y_range),  source=source, alpha=0.3)

p.xaxis[0].formatter.days = ['%Hh']
p.x_range.range_padding = 0
p.ygrid.grid_line_color = None

show(p)
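
The `jitter` transform in the cell above spreads points randomly around their category so that overlapping commits become visible. Here is a minimal plain-Python sketch of the idea, assuming uniformly distributed offsets of total width `width` centered on zero; the helper name `jitter_offsets` and the fixed seed are illustrative only.

```python
import random


def jitter_offsets(n, width=0.6, seed=0):
    """Return n random offsets drawn uniformly from [-width/2, +width/2]."""
    rng = random.Random(seed)
    return [rng.uniform(-width / 2, width / 2) for _ in range(n)]


offsets = jitter_offsets(5, width=0.6)
# Each point is drawn at its category position plus its offset, so with
# width=0.6 every point stays within 0.3 of the category center.
print([round(o, 3) for o in offsets])
```

Keeping `width` below 1 means the points for one day do not spill into the band of the neighbouring day.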

In [17]:
# Exercise:
