Beeswarm-style plots with ggplot2

Introduction

Beeswarm plots (also known as column scatter plots or violin scatter plots) plot points that would ordinarily overlap so that they fall next to each other instead. In addition to reducing overplotting, they help visualize the density of the data at each point (similar to a violin plot) while still showing each data point individually.

ggbeeswarm provides two different methods to create beeswarm-style plots using ggplot2. It does this by adding two new ggplot geom objects:

  • geom_quasirandom: Uses a van der Corput sequence or Tukey texturing (Tukey and Tukey "Strips displaying empirical distributions: I. textured dot strips") to space the dots and avoid overplotting. The offsets are computed by the vipor package (sherrillmix/vipor).

  • geom_beeswarm: Uses the beeswarm package to offset points based on point size.
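
To give a feel for the first approach, here is a minimal base-R sketch of a base-2 van der Corput sequence. It is an illustration only, not the code vipor uses, and the function name `van_der_corput` is made up for this example. Each new value falls into one of the largest remaining gaps in [0, 1), which is why such a sequence spreads points evenly instead of clumping the way random jitter can.

```r
# Hypothetical helper for illustration; not part of ggbeeswarm or vipor.
# The i-th value is obtained by mirroring the base-`base` digits of i
# around the radix point (1 -> 0.1b = 0.5, 2 -> 0.01b = 0.25, ...).
van_der_corput <- function(n, base = 2) {
  vapply(seq_len(n), function(i) {
    value <- 0
    denom <- 1
    while (i > 0) {
      denom <- denom * base
      value <- value + (i %% base) / denom  # append the lowest digit
      i <- i %/% base                       # drop the lowest digit
    }
    value
  }, numeric(1))
}

# First four values: 0.5, 0.25, 0.75, 0.125
van_der_corput(4)
```

In a beeswarm-style plot, values like these would be rescaled to a band around each category position, typically with the band width following the local density of the data.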

Features:

  • Can handle categorical variables on the y-axis (thanks @smsaladi, @koncina)
  • Automatically dodges if a grouping variable is categorical and dodge.width is specified (thanks @josesho)

See the examples below.

Installation

This package is on CRAN, so installation should be as simple as:

install.packages('ggbeeswarm')

If you want the development version from GitHub, you can do:

devtools::install_github("eclarke/ggbeeswarm")

Examples

Here is a comparison between geom_jitter and geom_quasirandom on the iris dataset:

set.seed(12345)
library(ggplot2)
library(ggbeeswarm)
# Compare to jitter
ggplot(iris, aes(Species, Sepal.Length)) + geom_jitter()

ggplot(iris, aes(Species, Sepal.Length)) + geom_quasirandom()

geom_quasirandom()

Using geom_quasirandom:

# Default geom_quasirandom
ggplot(mpg, aes(class, hwy)) + geom_quasirandom()

# With categorical y-axis
ggplot(mpg, aes(hwy, class)) + geom_quasirandom(groupOnX = FALSE)

# Some groups may have only a few points. Use `varwidth=TRUE` to adjust width dynamically.
ggplot(mpg, aes(class, hwy)) + geom_quasirandom(varwidth = TRUE)

# Automatic dodging
sub_mpg <- mpg[mpg$class %in% c("midsize", "pickup", "suv"),]
ggplot(sub_mpg, aes(class, displ, color=factor(cyl))) + geom_quasirandom(dodge.width=1)

Alternative methods

geom_quasirandom can also use several other methods to distribute points. For example:

ggplot(iris, aes(Species, Sepal.Length)) + geom_quasirandom(method = "tukey") + 
    ggtitle("Tukey texture")

ggplot(iris, aes(Species, Sepal.Length)) + geom_quasirandom(method = "tukeyDense") + 
    ggtitle("Tukey + density")

ggplot(iris, aes(Species, Sepal.Length)) + geom_quasirandom(method = "frowney") + 
    ggtitle("Banded frowns")

ggplot(iris, aes(Species, Sepal.Length)) + geom_quasirandom(method = "smiley") + 
    ggtitle("Banded smiles")

ggplot(iris, aes(Species, Sepal.Length)) + geom_quasirandom(method = "pseudorandom") + 
    ggtitle("Jittered density")
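
To compare the methods side by side without repeating the call, the same plot can be built in a loop. This is only a sketch: it assumes ggplot2 and ggbeeswarm are installed, and the names `methods` and `method_plots` are introduced here for illustration. The list contains the method names used above plus `"quasirandom"`, the default.

```r
library(ggplot2)
library(ggbeeswarm)

# Method names taken from the examples above, plus the default.
methods <- c("quasirandom", "pseudorandom", "smiley", "frowney",
             "tukey", "tukeyDense")

# Build one plot per method, titled with the method name.
method_plots <- lapply(methods, function(m) {
  ggplot(iris, aes(Species, Sepal.Length)) +
    geom_quasirandom(method = m) +
    ggtitle(m)
})
names(method_plots) <- methods

# Display a single plot by name, e.g. the Tukey texture.
print(method_plots[["tukey"]])
```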

geom_beeswarm()

Using geom_beeswarm:

ggplot(iris, aes(Species, Sepal.Length)) + geom_beeswarm()

ggplot(mpg, aes(class, hwy)) + geom_beeswarm()

# With categorical y-axis
ggplot(mpg, aes(hwy, class)) + geom_beeswarm(cex = 1.2, groupOnX = FALSE)

# ggplot doesn't pass any information about the actual device size of the points
# to the underlying layout code, so it's important to adjust the `cex`
# parameter manually for best results.
# Also watch out for points escaping from the plot area with geom_beeswarm.
ggplot(mpg, aes(class, hwy)) + geom_beeswarm(cex = 1.1)
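
Because the best `cex` depends on the output device and point size, it can help to draw the same plot at a few values and compare them. A small sketch, assuming ggplot2 and ggbeeswarm are installed; the values in `cex_values` are arbitrary picks for illustration, not recommendations, and `cex_plots` is a name introduced here:

```r
library(ggplot2)
library(ggbeeswarm)

# Arbitrary spacing values to compare; tune these for your own device.
cex_values <- c(0.5, 1.1, 2)

# One plot per cex value, labelled so they are easy to tell apart.
cex_plots <- lapply(cex_values, function(cx) {
  ggplot(mpg, aes(class, hwy)) +
    geom_beeswarm(cex = cx) +
    ggtitle(paste("cex =", cx))
})

# Print each plot in turn and check for overlapping or escaping points.
for (p in cex_plots) print(p)
```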

ggplot(iris, aes(Species, Sepal.Length)) + geom_beeswarm(cex = 1.2, priority = "density")

# With automatic dodging
ggplot(sub_mpg, aes(class, displ, color=factor(cyl))) + geom_beeswarm(dodge.width=0.5)


Authors: Erik Clarke and Scott Sherrill-Mix