## Introduction to Altair (part 1)

In this section, we introduce the basic ideas behind Altair and show how to generate simple figures with it. We also discuss how to change the graphical marks and encoding channels in Altair.

In [1]:
#pip install altair

In [2]:
#pip install vega_datasets

In [3]:
# Import the modules
import pandas as pd
import altair as alt
import warnings
warnings.simplefilter(action = 'ignore', category = FutureWarning)

In [4]:
# Load a dataset from vega_datasets
from vega_datasets import data  # import vega_datasets
cars = data.cars()              # load cars data as a Pandas data frame
cars.head()

,Name,Miles_per_Gallon,Cylinders,Displacement,Horsepower,Weight_in_lbs,Acceleration,Year,Origin
0,chevrolet chevelle malibu,18.0,8,307.0,130.0,3504,12.0,1970-01-01,USA
1,buick skylark 320,15.0,8,350.0,165.0,3693,11.5,1970-01-01,USA
2,plymouth satellite,18.0,8,318.0,150.0,3436,11.0,1970-01-01,USA
3,amc rebel sst,16.0,8,304.0,150.0,3433,12.0,1970-01-01,USA
4,ford torino,17.0,8,302.0,140.0,3449,10.5,1970-01-01,USA


#### Fundamental object: Chart

The fundamental object in Altair is the Chart, which takes a data frame as a single argument.

In [5]:
chart = alt.Chart(cars)

#### Marks and Encodings

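We first indicate what kind of mark (geometric shape) we want to use to represent the data. We then call encode() to map columns of the data frame to visual channels such as the x and y positions. Without any encoding, all points are drawn on top of one another.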

In [6]:
alt.Chart(cars).mark_point()

In [7]:
alt.Chart(cars).mark_point().encode(
    y = 'Miles_per_Gallon'
)

In [8]:
alt.Chart(cars).mark_point().encode(
    y = 'Miles_per_Gallon',
    x = 'Horsepower'
)

In [9]:
alt.Chart(cars).mark_point().encode(
    y = 'Miles_per_Gallon',
    x = 'Cylinders'
)

##### Define types of the attributes

Different attribute types can lead to different visualizations. A type is declared by appending one of the following suffixes to the column name:

- ':N' indicates a nominal type (unordered, categorical data),
- ':O' indicates an ordinal type (rank-ordered data),
- ':Q' indicates a quantitative type (numerical data with meaningful magnitudes), and
- ':T' indicates a temporal type (date/time data)

In [10]:
alt.Chart(cars).mark_point().encode(
    alt.Y('Miles_per_Gallon:Q'),
    alt.X('Cylinders:O')
)

##### Data Transformation: Aggregation and Filter

You can plot summary statistics without computing them beforehand. The available aggregations include:

- sum
- average/mean
- median
- q1,q3
- min,max
- count

You can also use transform_filter to exclude part of the data, or transform_calculate to derive a new attribute to plot.

In [11]:
alt.Chart(cars).mark_point().encode(
    alt.Y('mean(Miles_per_Gallon):Q'),
    alt.X('Cylinders:O')
)
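To see what `mean(Miles_per_Gallon)` computes, it can help to reproduce the aggregation by hand. In the chart, the mean is taken within each group defined by the other encoded field (`Cylinders`), so a pandas `groupby` should give matching numbers. The sketch below uses a small hypothetical data frame `toy` rather than the full cars data.

```python
import pandas as pd

# Hypothetical four-row stand-in for the cars data.
toy = pd.DataFrame({
    'Cylinders': [4, 4, 8, 8],
    'Miles_per_Gallon': [30.0, 26.0, 18.0, 14.0],
})

# One mean per Cylinders group, as the chart would plot it.
mean_mpg = toy.groupby('Cylinders')['Miles_per_Gallon'].mean()
print(mean_mpg)
```

The difference is where the work happens: with the shorthand, the raw rows are passed to the chart and the aggregation is carried out when the chart is rendered.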

In [12]:
alt.Chart(cars).mark_point().encode(
    alt.Y('mean(Miles_per_Gallon):Q'),
    alt.X('Cylinders:O')
).transform_filter('year(datum.Year) < 1975')

In [13]:
alt.Chart(cars).mark_point().transform_filter('year(datum.Year) < 1975').encode(
    alt.Y('mean(Miles_per_Gallon):Q'),
    alt.X('Cylinders:O')
)

In [14]:
alt.Chart(cars).mark_point().transform_calculate(
    Horsepower10 = 'datum.Horsepower * 10'
).encode(
    alt.Y('Miles_per_Gallon:Q'),
    alt.X('Horsepower10:Q')
)

In [15]:
alt.Chart(cars).mark_point().transform_calculate(
    Horsepowerlog = 'log(datum.Horsepower)'
).encode(
    alt.Y('Miles_per_Gallon:Q'),
    alt.X('Horsepowerlog:Q')
)

In [16]:
alt.Chart(cars).mark_point().encode(
    alt.Y('Miles_per_Gallon:Q'),
    alt.X('Horsepower:Q', scale = alt.Scale(type = 'log'))
)
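The last two cells give similarly shaped plots, but the axes read differently. With `transform_calculate`, the x axis shows the natural logarithm of horsepower, because `log` in a Vega expression is the natural log. With a log scale, the axis keeps the original horsepower units and only the spacing changes. The sketch below uses Python's `math.log` as a stand-in for the expression, with illustrative values that are not read from the dataset.

```python
import math

# Illustrative horsepower values (hypothetical, not read from the dataset).
horsepower = [50.0, 100.0, 200.0]

# What the calculated field holds: natural logarithms of the values.
log_values = [math.log(hp) for hp in horsepower]

# Doubling the input always adds the same constant, log(2), on the log axis.
steps = [b - a for a, b in zip(log_values, log_values[1:])]

print(log_values)
print(steps)
```

Equal ratios become equal distances, which is why both approaches spread out the crowded low-horsepower region in the same way.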

#### Title and labels

In [17]:
alt.Chart(cars).mark_point().encode(
    alt.Y('Miles_per_Gallon:Q', title = 'Miles per Gallon'),
    alt.X('Horsepower:Q', title = 'Horsepower')
).properties(
    title = 'Horsepower vs. Miles per Gallon',
    width = 150, height = 150
)

#### Exercise

The iris dataset contains the length and width of sepals and petals for three flower species. We will load this dataset using vega_datasets.

In [22]:
# Load the iris dataset
iris = data.iris()

# Display the first few rows of the dataset
iris.head()

,sepalLength,sepalWidth,petalLength,petalWidth,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


Let's create a basic scatter plot to visualize the relationship between petalLength and petalWidth. Then we will apply a transformation to normalize the petalLength and petalWidth before encoding them into a scatter plot. Normalizing data can help in comparing features that have different scales.

In [23]:
iris.describe()

,sepalLength,sepalWidth,petalLength,petalWidth
count,150.0,150.0,150.0,150.0
mean,5.843333,3.057333,3.758,1.199333
std,0.828066,0.435866,1.765298,0.762238
min,4.3,2.0,1.0,0.1
25%,5.1,2.8,1.6,0.3
50%,5.8,3.0,4.35,1.3
75%,6.4,3.3,5.1,1.8
max,7.9,4.4,6.9,2.5


In [27]:
alt.Chart(iris).mark_point().transform_calculate(
    norm_petalLength = '(datum.petalLength - 1)/(6.9-1)',
    norm_petalWidth = '(datum.petalWidth - 0.1)/(2.5-0.1)'
).encode(
    alt.X('norm_petalLength:Q', title = "Normalized Petal Length"),
    alt.Y('norm_petalWidth:Q', title = 'Normalized Petal Width')
).properties(
    title = 'Normalized Petal Length and Petal Width'
)
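The normalization used in this exercise is min-max scaling, (x - min) / (max - min), which maps the smallest value to 0 and the largest to 1. The sketch below builds the expression string from the minimum and maximum instead of typing the numbers by hand. The helpers `min_max_expr` and `min_max` are hypothetical names introduced for illustration.

```python
def min_max_expr(field, lo, hi):
    """Build a Vega expression string that rescales `field` to [0, 1]."""
    return f'(datum.{field} - {lo}) / ({hi} - {lo})'

def min_max(x, lo, hi):
    """Plain-Python version of the same formula, for checking values."""
    return (x - lo) / (hi - lo)

# Minimum and maximum of petalLength, taken from iris.describe() above.
lo, hi = 1.0, 6.9

expr = min_max_expr('petalLength', lo, hi)
print(expr)
print(min_max(lo, lo, hi), min_max(hi, lo, hi))
```

The resulting string can be passed to `transform_calculate` as the value of a new field. In a notebook, `lo` and `hi` could be read from `iris['petalLength'].min()` and `.max()` so that the bounds always match the data.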

#### Choices of mark and encoding

##### Encoding Channels

- x: Horizontal (x-axis) position of the mark.
- y: Vertical (y-axis) position of the mark.
- size: Size of the mark. May correspond to area or length, depending on the mark type.
- color: Mark color, specified as a legal CSS color.
- opacity: Mark opacity, ranging from 0 (fully transparent) to 1 (fully opaque).
- shape: Plotting symbol shape for point marks.
- tooltip: Tooltip text to display upon mouse hover over the mark.
- order: Mark ordering, determines line/area point order and drawing order.
- column: Facet the data into horizontally-aligned subplots.
- row: Facet the data into vertically-aligned subplots.

In [41]:
alt.Chart(cars).mark_point().encode(
    alt.Y('Miles_per_Gallon:Q'), 
    alt.X('Horsepower:Q'),
    alt.Color("Cylinders:N"),
    alt.Size('Acceleration:Q'), 
    alt.Shape('Origin:N'),
    alt.OpacityValue(0.5),
    alt.Order('Acceleration:Q', sort = 'descending'),
    #alt.Column('Origin:N')
    alt.Tooltip(['Name','Miles_per_Gallon:Q', 'Horsepower:Q', 'Origin:N'])
).properties(
    width = 150, height = 150
).facet(
    facet = 'Origin:N',
    columns = 2
)

##### Marks

- mark_point() - Scatter plot points with configurable shapes.
- mark_circle() - Scatter plot points as filled circles.
- mark_square() - Scatter plot points as filled squares.
- mark_tick() - Vertical or horizontal tick marks.
- mark_bar() - Rectangular bars.
- mark_boxplot() - Box plots summarizing the distribution of a quantitative field.
- mark_line() - Connected line segments.
- mark_area() - Filled areas defined by a top-line and a baseline.
- mark_rect() - Filled rectangles, useful for heatmaps.
- mark_rule() - Vertical or horizontal lines spanning the axis.
- mark_text() - Scatter plot points represented by text.

In [42]:
alt.Chart(cars).mark_point().encode(
    alt.Y('Miles_per_Gallon:Q'), 
    alt.X('Horsepower:Q'),
).properties(
    width = 150, height = 150
)

In [45]:
alt.Chart(cars).mark_circle(
    color = 'red', 
    size = 8
).encode(
    alt.Y('Miles_per_Gallon:Q'), 
    alt.X('Horsepower:Q'),
).properties(
    width = 150, height = 150
)

In [47]:
alt.Chart(cars).mark_square(
    angle = 45
).encode(
    alt.Y('Miles_per_Gallon:Q'), 
    alt.X('Horsepower:Q'),
).properties(
    width = 150, height = 150
)

In [48]:
alt.Chart(cars).mark_tick().encode(
    alt.Y('Miles_per_Gallon:Q'), 
    alt.X('Horsepower:Q'),
).properties(
    width = 150, height = 150
)

In [52]:
alt.Chart(cars).mark_bar().encode(
    alt.Y('average(Miles_per_Gallon):Q'),
    #alt.Y('count()'),
    alt.X('Cylinders:N')
).properties(width = 150, height = 150)

In [56]:
alt.Chart(cars).mark_line(
    color = 'red',
    strokeWidth = 3,
    interpolate = "monotone"
).encode(
    alt.Y('average(Miles_per_Gallon):Q'),
    alt.X('Cylinders:N')
).properties(width = 150, height = 150)

In [58]:
alt.Chart(cars).mark_area().encode(
    alt.Y('average(Miles_per_Gallon):Q'),
    alt.X('Cylinders:N'), 
    alt.Color('Origin:N')
).properties(width = 150, height = 150)

In [60]:
alt.Chart(cars).mark_boxplot().encode(
    alt.Y('Miles_per_Gallon:Q'),
    alt.X('Cylinders:N')
).properties(width = 150, height = 150)

In [63]:
alt.Chart(cars).mark_bar().encode(
    alt.X('Miles_per_Gallon:Q', bin = alt.BinParams(maxbins = 10)),
    alt.Y('count()')
)

### Combining multiple plots and saving

In [68]:
# Year vs. average(MPG), point and line

line = alt.Chart(cars).mark_line().encode(
    alt.X('Year:T'),
    alt.Y('average(Miles_per_Gallon):Q')
)

point = alt.Chart(cars).mark_circle().encode(
    alt.X('Year:T'),
    alt.Y('average(Miles_per_Gallon):Q')
)

In [69]:
line + point

In [70]:
line | point

In [72]:
chart_final = line + point
chart_final.save('chart.html')