# Activity 6.01: Plotting Mean Car Prices of Manufacturers

This activity combines everything you have learned about Bokeh so far. We will use it to create a visualization that displays the mean price of each car manufacturer in our dataset.

Our automobile dataset contains the following columns:

- make: Manufacturer of the car
- fuel-type: Diesel or gas
- num-of-doors: Number of doors
- body-style: Body style of the car, for example, convertible
- engine-location: Front or rear
- length: Continuous from 141.1 to 208.1
- width: Continuous from 60.3 to 72.3
- height: Continuous from 47.8 to 59.8
- num-of-cylinders: Number of cylinders, for example, eight
- horsepower: Amount of horsepower
- peak-rpm: Maximum RPM
- city-mpg: Fuel economy in the city, in miles per gallon
- highway-mpg: Fuel economy on the highway, in miles per gallon
- price: Price of the car

Note that we will use only the make and price columns in our activity.
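Because only `make` and `price` matter here, one option (not used in the activity itself) is to load just those columns with the `usecols` argument of `pd.read_csv`. The minimal sketch below reads a few rows copied from the dataset preview out of an in-memory string instead of the real `automobiles.csv`, so the trimmed four-column file is an assumption made for illustration only.

```python
from io import StringIO

import pandas as pd

# hypothetical trimmed CSV: a few rows copied from the dataset preview,
# standing in for automobiles.csv
csv_text = """make,fuel-type,num-of-doors,price
alfa-romero,gas,two,13495
audi,gas,four,13950
volvo,diesel,four,22470
"""

# usecols keeps only the listed columns while parsing
cars = pd.read_csv(StringIO(csv_text), usecols=['make', 'price'])
print(cars)
```

`StringIO` is imported by name so that it does not clash with the notebook's `import bokeh.io as io`.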

In the process, we will first plot all cars with their prices and then gradually build a more sophisticated visualization that also uses color to highlight the manufacturers with the highest mean prices.

```python
import pandas as pd
import numpy as np
import bokeh.plotting as plt
import bokeh.io as io
from bokeh.models import LinearColorMapper

# render Bokeh plots inline in the notebook
io.output_notebook()
```

```python
# load the dataset
df = pd.read_csv('../../Datasets/automobiles.csv')
df
```

| | make | fuel-type | num-of-doors | body-style | engine-location | length | width | height | num-of-cylinders | horsepower | peak-rpm | city-mpg | highway-mpg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | alfa-romero | gas | two | convertible | front | 168.8 | 64.1 | 48.8 | four | 111 | 5000 | 21 | 27 | 13495 |
| 1 | alfa-romero | gas | two | convertible | front | 168.8 | 64.1 | 48.8 | four | 111 | 5000 | 21 | 27 | 16500 |
| 2 | alfa-romero | gas | two | hatchback | front | 171.2 | 65.5 | 52.4 | six | 154 | 5000 | 19 | 26 | 16500 |
| 3 | audi | gas | four | sedan | front | 176.6 | 66.2 | 54.3 | four | 102 | 5500 | 24 | 30 | 13950 |
| 4 | audi | gas | four | sedan | front | 176.6 | 66.4 | 54.3 | five | 115 | 5500 | 18 | 22 | 17450 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 196 | volvo | gas | four | sedan | front | 188.8 | 68.9 | 55.5 | four | 114 | 5400 | 23 | 28 | 16845 |
| 197 | volvo | gas | four | sedan | front | 188.8 | 68.8 | 55.5 | four | 160 | 5300 | 19 | 25 | 19045 |
| 198 | volvo | gas | four | sedan | front | 188.8 | 68.9 | 55.5 | six | 134 | 5500 | 18 | 23 | 21485 |
| 199 | volvo | diesel | four | sedan | front | 188.8 | 68.9 | 55.5 | six | 106 | 4800 | 26 | 27 | 22470 |


```python
# add an 'index' column holding each car's row number (0, 1, 2, ...)
df['index'] = df.index
df
```

| | make | fuel-type | num-of-doors | body-style | engine-location | length | width | height | num-of-cylinders | horsepower | peak-rpm | city-mpg | highway-mpg | price | index |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | alfa-romero | gas | two | convertible | front | 168.8 | 64.1 | 48.8 | four | 111 | 5000 | 21 | 27 | 13495 | 0 |
| 1 | alfa-romero | gas | two | convertible | front | 168.8 | 64.1 | 48.8 | four | 111 | 5000 | 21 | 27 | 16500 | 1 |
| 2 | alfa-romero | gas | two | hatchback | front | 171.2 | 65.5 | 52.4 | six | 154 | 5000 | 19 | 26 | 16500 | 2 |
| 3 | audi | gas | four | sedan | front | 176.6 | 66.2 | 54.3 | four | 102 | 5500 | 24 | 30 | 13950 | 3 |
| 4 | audi | gas | four | sedan | front | 176.6 | 66.4 | 54.3 | five | 115 | 5500 | 18 | 22 | 17450 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 196 | volvo | gas | four | sedan | front | 188.8 | 68.9 | 55.5 | four | 114 | 5400 | 23 | 28 | 16845 | 196 |
| 197 | volvo | gas | four | sedan | front | 188.8 | 68.8 | 55.5 | four | 160 | 5300 | 19 | 25 | 19045 | 197 |
| 198 | volvo | gas | four | sedan | front | 188.8 | 68.9 | 55.5 | six | 134 | 5500 | 18 | 23 | 21485 | 198 |
| 199 | volvo | diesel | four | sedan | front | 188.8 | 68.9 | 55.5 | six | 106 | 4800 | 26 | 27 | 22470 | 199 |


Create a new figure and plot each car as a point in a scatter plot, using the index and price columns. Give the visualization the title Car prices, label the x-axis Car Index, and label the y-axis Price.

```python
plot = plt.figure(
    title='Car prices',
    x_axis_label='Car Index',
    y_axis_label='Price'
)

# one point per car: row index on the x-axis, price on the y-axis
plot.scatter(
    x=df['index'],
    y=df['price'],
    size=10
)

plt.show(plot)
```
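When `scatter` receives data sequences directly, as above, Bokeh builds a `ColumnDataSource` behind the scenes and stores each sequence under the name of the glyph property it was passed to, here `x` and `y`. The sketch below uses three prices from the preview instead of the full `df` and inspects that implicit source through the renderer that `scatter` returns; `demo_plot` and `renderer` are illustrative names, not part of the activity. This naming is what allows the color mapping at the end of the activity to refer to the prices as the field `y`.

```python
import bokeh.plotting as plt

prices = [13495, 16500, 16500]  # first three prices from the preview
demo_plot = plt.figure(title='Car prices', x_axis_label='Car Index', y_axis_label='Price')

# scatter returns the GlyphRenderer that draws the points
renderer = demo_plot.scatter(x=[0, 1, 2], y=prices, size=10)

# the sequences were stored in an implicit ColumnDataSource under 'x' and 'y'
print(sorted(renderer.data_source.data.keys()))
```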

## Grouping cars by manufacturer

Group the dataset by the `make` column using `groupby`, then call `mean` to get the mean value of each numeric column for every manufacturer. We don't want `make` to become the index, so pass `as_index=False` to `groupby`.

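```python
# numeric_only=True averages only the numeric columns; without it, pandas >= 2.0
# raises a TypeError on text columns such as fuel-type
df_by_make = df.groupby(by=['make'], as_index=False).mean(numeric_only=True)
df_by_make
```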

| | make | length | width | height | city-mpg | highway-mpg | price | index |
|---|---|---|---|---|---|---|---|---|
| 0 | alfa-romero | 169.6 | 64.566667 | 50.0 | 20.333333 | 26.666667 | 15498.333333 | 1.0 |
| 1 | audi | 184.766667 | 68.85 | 54.833333 | 19.333333 | 24.5 | 17859.166667 | 5.5 |
| 2 | bmw | 184.5 | 66.475 | 54.825 | 19.375 | 25.375 | 26118.75 | 12.5 |
| 3 | chevrolet | 151.933333 | 62.5 | 52.4 | 41.0 | 46.333333 | 6007.0 | 18.0 |
| 4 | dodge | 160.988889 | 64.166667 | 51.644444 | 28.0 | 34.111111 | 7875.444444 | 24.0 |
| 5 | honda | 160.769231 | 64.384615 | 53.238462 | 30.384615 | 35.461538 | 8184.692308 | 35.0 |
| 6 | isuzu | 171.65 | 63.5 | 52.45 | 24.0 | 29.0 | 8916.5 | 42.5 |
| 7 | jaguar | 196.966667 | 69.933333 | 51.133333 | 14.333333 | 18.333333 | 34600.0 | 45.0 |
| 8 | mazda | 170.805882 | 65.588235 | 53.358824 | 25.705882 | 31.941176 | 10652.882353 | 55.0 |
| 9 | mercedes-benz | 195.2625 | 71.0625 | 55.725 | 18.5 | 21.0 | 33647.0 | 67.5 |

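To see what the aggregation does, here is a minimal sketch on a small hand-built frame holding only the alfa-romero and volvo rows shown in the dataset preview. All three alfa-romero cars appear there, so their mean matches the 15498.333333 in the table above; the volvo mean covers only the four rows shown, so it is not the dataset-wide value. The text column `fuel-type` is left out by `numeric_only=True`, just as the non-numeric columns are missing from `df_by_make`.

```python
import pandas as pd

# rows copied from the dataset preview (make, fuel-type, price only)
sample = pd.DataFrame({
    'make': ['alfa-romero'] * 3 + ['volvo'] * 4,
    'fuel-type': ['gas', 'gas', 'gas', 'gas', 'gas', 'gas', 'diesel'],
    'price': [13495, 16500, 16500, 16845, 19045, 21485, 22470],
})

# one row per make; as_index=False keeps 'make' as an ordinary column,
# numeric_only=True skips the text column 'fuel-type'
sample_by_make = sample.groupby(by=['make'], as_index=False).mean(numeric_only=True)
print(sample_by_make)
```

The resulting frame has the columns `make` and `price`, with means of 46495 / 3 for alfa-romero and 79845 / 4 = 19961.25 for the four volvo rows.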

Create a new figure with the title Car Manufacturer Mean Prices, the x-axis label Car Manufacturer, and the y-axis label Mean Price. Because the manufacturers are categorical data, also pass the `make` column to the figure as the `x_range` argument. Finally, rotate the x-axis labels to vertical so that the manufacturer names don't overlap.

```python
plot2 = plt.figure(
    title='Car Manufacturer Mean Prices',
    x_axis_label='Car Manufacturer',
    y_axis_label='Mean Price',
    # the manufacturer names become the categories of the x-axis
    x_range=df_by_make['make']
)

plot2.scatter(
    x=df_by_make['make'],
    y=df_by_make['price'],
    size=10
)

# rotate the manufacturer labels by pi/2 radians (vertical)
plot2.xaxis.major_label_orientation = np.pi / 2

plt.show(plot2)
```
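Passing a sequence of strings as `x_range` tells Bokeh that the x-axis is categorical: the figure builds a `FactorRange` whose factors are those strings, in the order given. The sketch below checks this with the first four manufacturer names from `df_by_make`; `makes` and `demo_plot` are illustrative names only.

```python
import bokeh.plotting as plt
from bokeh.models import FactorRange

makes = ['alfa-romero', 'audi', 'bmw', 'chevrolet']  # first four rows of df_by_make

demo_plot = plt.figure(title='Car Manufacturer Mean Prices', x_range=makes)

# the list of strings was turned into a categorical FactorRange
print(type(demo_plot.x_range).__name__, list(demo_plot.x_range.factors))
```

Because the factors keep the order in which they are passed, the axis in the activity lists the manufacturers in the same (alphabetical) order as the rows of `df_by_make`.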

## Adding color

Set up a new `LinearColorMapper` object with the Magma256 palette, using the minimum and maximum mean prices as the `low` and `high` arguments.

Create a new figure with the same title, labels, and `x_range` as before.

Plot each manufacturer with a `size` of 15.

Pass a dictionary as the `color` argument of the `scatter` method: its `field` key names the data column to color by (`y`, the name Bokeh gives the y values passed directly to `scatter`), and its `transform` key holds the `color_mapper`.

Set the label orientation to vertical.

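```python
# map each mean price onto the Magma256 palette: the cheapest manufacturer
# gets the first (darkest) color, the most expensive the last (lightest)
color_mapper = LinearColorMapper(
    palette='Magma256',
    low=df_by_make['price'].min(),
    high=df_by_make['price'].max()
)
```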
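Conceptually, a linear color mapper rescales each value from the span `low`..`high` onto the positions of the palette. The plain-Python sketch below approximates that arithmetic with a hypothetical `linear_color_index` helper; it is not Bokeh's own implementation (which runs in the browser and also offers separate colors for out-of-range and NaN values), only an illustration of the idea. The `low` and `high` used in the usage lines are the minimum and maximum of the ten rows shown in the `df_by_make` preview, not necessarily of the full table.

```python
def linear_color_index(value, low, high, n_colors):
    """Hypothetical helper: palette position of value in an n_colors palette."""
    # scale value to a fraction of the low..high span (0.0 .. 1.0)
    fraction = (value - low) / (high - low)
    # clamp values outside low..high to the ends of the palette
    fraction = min(max(fraction, 0.0), 1.0)
    # split the span into n_colors equal bins; high itself lands in the last bin
    return min(int(fraction * n_colors), n_colors - 1)


# mean prices from the df_by_make preview, mapped onto a 256-color palette
low, high = 6007.0, 34600.0
for make, price in [('chevrolet', 6007.0), ('bmw', 26118.75), ('jaguar', 34600.0)]:
    print(make, linear_color_index(price, low, high, 256))
```

With 256 colors, the cheapest manufacturer in the preview lands on index 0, the most expensive on index 255, and bmw at index 180, roughly 70% of the way along the palette.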

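```python
plot3 = plt.figure(
    title='Car Manufacturer Mean Prices',
    x_axis_label='Car Manufacturer',
    y_axis_label='Mean Price',
    x_range=df_by_make['make']
)

# 'y' is the column name Bokeh assigns to the y values passed in directly,
# so each point is colored according to its mean price
plot3.scatter(
    x=df_by_make['make'],
    y=df_by_make['price'],
    color={'field': 'y', 'transform': color_mapper},
    size=15
)

# rotate the manufacturer labels by pi/2 radians (vertical)
plot3.xaxis.major_label_orientation = np.pi / 2

plt.show(plot3)
```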
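Relying on the implicit column name `y` works, but it breaks silently if the call is later changed to use an explicit data source. A variant worth considering, sketched below, puts the aggregated frame into a `ColumnDataSource` and refers to its columns by name, using the `linear_cmap` helper from `bokeh.transform` to build the same kind of `LinearColorMapper`. The `means` frame holds four rows from the `df_by_make` preview; in the notebook you would pass `df_by_make` itself.

```python
import math

import bokeh.plotting as plt
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.palettes import Magma256
from bokeh.transform import linear_cmap

# four rows from the df_by_make preview
means = pd.DataFrame({
    'make': ['alfa-romero', 'bmw', 'chevrolet', 'jaguar'],
    'price': [15498.333333, 26118.75, 6007.0, 34600.0],
})

# the source's columns are named after the DataFrame columns
source = ColumnDataSource(means)

demo_plot = plt.figure(
    title='Car Manufacturer Mean Prices',
    x_axis_label='Car Manufacturer',
    y_axis_label='Mean Price',
    x_range=means['make'].tolist(),
)

# the color spec names the 'price' column directly instead of the implicit 'y'
demo_plot.scatter(
    x='make',
    y='price',
    source=source,
    size=15,
    color=linear_cmap('price', Magma256, means['price'].min(), means['price'].max()),
)
demo_plot.xaxis.major_label_orientation = math.pi / 2
```

Calling `plt.show(demo_plot)` in the notebook should then render the same kind of colored chart as `plot3`.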