# Data Visualization with Gadfly.jl

[Gadfly.jl documentation](http://gadflyjl.org/stable/)

First we'll load the packages we need.

In [None]:
using Gadfly, RDatasets

Now we'll load the `mpg` dataset from `ggplot2` and look at the first few rows.

In [None]:
mpg = dataset("ggplot2", "mpg")
show(first(mpg, 6), allcols=true)

Next we'll make a scatterplot of `Hwy` against `Displ`. To do that we call the `plot` function, passing in a dataframe, mappings from columns of the dataframe to `x` and `y`, and the "geometry" for the plot. In this case we specify `Geom.point`.

In [None]:
plot(mpg, x=:Displ, y=:Hwy, Geom.point)

# Grammar of Graphics

Gadfly.jl is modeled on R's `ggplot2`, which in turn implements a *grammar of graphics*.

Let's start with the three most central elements of this grammar:

* Data
* Mapping of variables to aesthetic attributes
* Layers of geometric elements

In the plot above, the data variables `Displ` and `Hwy` were mapped to the `x` and `y` positions of a point geometry, so each row of the dataframe is drawn as one point.
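To make the separation between data and mapping more concrete, here is a small sketch showing two other ways to write the same scatterplot. It assumes the `mpg` dataframe loaded earlier; the variable names `p_arrays` and `p_layer` are only illustrative.

```julia
using Gadfly, RDatasets

mpg = dataset("ggplot2", "mpg")

# Bind the x and y aesthetics directly to vectors instead of
# naming columns of a dataframe.
p_arrays = plot(x=mpg.Displ, y=mpg.Hwy, Geom.point)

# Spell out the single point layer explicitly; plot() then only
# assembles the layers it is given.
p_layer = plot(layer(mpg, x=:Displ, y=:Hwy, Geom.point))
```

Both forms should draw the same points as the call above; the dataframe form is usually more convenient once several aesthetics are mapped.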

Now let's look at some additional mappings like `color`, `shape`, and `size` that can also be used with points.

In [None]:
plot(mpg, x=:Displ, y=:Hwy, color=:Class, Geom.point)

In [None]:
plot(mpg, x=:Displ, y=:Hwy, shape=:Cyl, Geom.point)

In [None]:
plot(mpg, x=:Displ, y=:Hwy, size=:Drv, Geom.point)

There is nothing, other than perhaps good taste, to stop us from combining these.

In [None]:
plot(mpg, x=:Displ, y=:Hwy, color=:Class, shape=:Cyl, size=:Drv, Geom.point)

A better way to visualize multiple variables might be to use facets (another element of the grammar of graphics).

In [None]:
plot(mpg, x=:Displ, y=:Hwy, xgroup=:Drv, Geom.subplot_grid(Geom.point))
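Facets can also be arranged in two directions at once. The sketch below assumes that `xgroup` and `ygroup` can be combined in a single `Geom.subplot_grid` call and picks `Year` as the row variable purely for illustration.

```julia
using Gadfly, RDatasets

mpg = dataset("ggplot2", "mpg")

# One column of panels per drive type, one row of panels per model year.
p_grid = plot(mpg, x=:Displ, y=:Hwy, xgroup=:Drv, ygroup=:Year,
              Geom.subplot_grid(Geom.point))
```

Each panel shares the same axes, which makes it easier to compare groups than decoding several overlapping aesthetics.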

We can also plot multiple layers. For example, here we use `Geom.smooth` to add a linear regression line.

In [None]:
scatter = layer(mpg, x=:Displ, y=:Hwy, Geom.point)
line = layer(mpg, x=:Displ, y=:Hwy, Geom.smooth(method=:lm), style(default_color=colorant"red"))
plot(scatter, line)

Take note of the call to the `style` function, which overrides the `default_color` of the current theme for that layer. Colors in Julia can be specified in several ways, but the preferred way when calling a function is as above, `colorant"red"`. Many different color names are supported.
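As a rough illustration of the different ways to write a color, the sketch below builds the same steel blue from a name and from a hex string, and also shows `parse` and an explicit `RGB` value. It assumes the Colors.jl package is installed in the current environment (it is loaded here for `Colorant` and `RGB`); the variable names are illustrative.

```julia
using Gadfly, RDatasets, Colors

mpg = dataset("ggplot2", "mpg")

named  = colorant"steelblue"            # color name
hex    = colorant"#4682B4"              # hex string for the same color
parsed = parse(Colorant, "steelblue")   # same parser, usable with runtime strings
rgb    = RGB(0.27, 0.51, 0.71)          # approximate channel values, written by hand

# Any of these can be passed wherever a color is expected.
p_color = plot(mpg, x=:Displ, y=:Hwy, Geom.point, style(default_color=hex))
```

The `colorant"..."` form is resolved when the code is parsed, so a misspelled color name fails early rather than when the plot is drawn.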

### Additional Geoms

In [None]:
plot(mpg, x=:Drv, y=:Hwy, Geom.point)

In [None]:
plot(mpg, x=:Drv, y=:Hwy, Geom.beeswarm)

In [None]:
plot(mpg, x=:Drv, y=:Hwy, Geom.boxplot)

### Histograms and densities

In [None]:
plot(mpg, x=:Hwy, Geom.histogram)

In [None]:
plot(mpg, x=:Hwy, Geom.density)

### Compare distributions of subgroups

In [None]:
plot(mpg, x=:Displ, color=:Drv, Geom.histogram)

Here again, facets may be a better way to compare subgroups.

In [None]:
plot(mpg, x=:Displ, color=:Drv, ygroup=:Drv, Geom.subplot_grid(Geom.histogram))

### More information

Other elements of the grammar of graphics include

* Scales - two-way mappings of values in data space to and from values in aesthetic space
* Coordinate systems - Cartesian, polar, map projections (Gadfly only supports Cartesian for now)
* Themes - everything else (font size, background color, etc.)
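A minimal sketch of how these three elements can appear in a single `plot` call, assuming the `mpg` dataframe from earlier. The specific choices (a log scale on `y`, the axis limits, and the theme settings) are arbitrary and only meant to show where each element goes.

```julia
using Gadfly, RDatasets

mpg = dataset("ggplot2", "mpg")

p_styled = plot(mpg, x=:Displ, y=:Hwy, color=:Class, Geom.point,
                Scale.y_log10,                     # scale: log10 transform of the y values
                Coord.cartesian(xmin=1, xmax=8),   # coordinate system: fix the x range
                Theme(background_color=colorant"white",
                      major_label_font_size=12pt)) # theme: appearance only
```

Scales, coordinates, and themes are passed as extra arguments alongside the geometry, so they can be added or removed without touching the data or the aesthetic mappings.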

See the [Gadfly.jl documentation](http://gadflyjl.org/stable/) for more information.