ggpointdensity

Introduces geom_pointdensity(): A cross between a scatter plot and a 2D density plot.

Installation

To install the package, type this command in R:

install.packages("ggpointdensity")

# Alternatively, you can install the latest
# development version from GitHub:
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("LKremer/ggpointdensity")

Motivation

There are several ways to visualize data points in a 2D coordinate system. If many points lie on top of each other, geom_point() gives no indication of how many points overlap. geom_density2d() and geom_bin2d() solve this issue, but they make it impossible to inspect individual outlier points, which may be of interest.

geom_pointdensity() aims to solve this problem by combining the best of both worlds: individual points are colored by the number of neighboring points. This allows you to see the overall distribution, as well as individual points.
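The idea of coloring each point by its number of neighbors can be sketched in base R. This is a brute-force illustration only, not the package's implementation: the helper name `count_neighbors` and its `radius` parameter are hypothetical, and the sketch assumes a point does not count as its own neighbor.

```r
# Illustrative only: brute-force neighbor counting in base R.
# `count_neighbors` and `radius` are hypothetical names, not package API.
count_neighbors <- function(x, y, radius) {
  # Pairwise Euclidean distances between all points (O(n^2) memory)
  d <- as.matrix(dist(cbind(x, y)))
  # Count points within `radius`, excluding the point itself
  rowSums(d <= radius) - 1
}

set.seed(1)
x <- c(rnorm(200), 8)  # a cluster plus one far-away outlier
y <- c(rnorm(200), 8)
n <- count_neighbors(x, y, radius = 0.5)

# Color each point by its neighbor count; the outlier remains visible
pal <- hcl.colors(max(n) + 1, "viridis")
plot(x, y, pch = 16, col = pal[n + 1])
```

The outlier at (8, 8) has no neighbors and is drawn in the lowest color, while points in the middle of the cluster get the highest counts.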

Changelog

Added method argument and renamed the n_neighbor stat to density. The available options are method="auto", method="default" and method="kde2d". default is the regular n_neighbor calculation as in the CRAN package. kde2d uses 2D kernel density estimation to estimate the point density (credits to @slowkow). This method is slower for few points, but faster for many (ca. >20k) points. By default, method="auto" picks either kde2d or default depending on the number of points.
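As a brief usage sketch, the method can be chosen explicitly instead of relying on "auto". This assumes the installed version of ggpointdensity exposes the method argument described above; the data frame `toy` is made up for illustration.

```r
library(ggplot2)
library(ggpointdensity)

set.seed(42)
toy <- data.frame(x = rnorm(5000), y = rnorm(5000))  # hypothetical toy data

# Force the 2D kernel density estimate instead of letting "auto" choose
p_kde <- ggplot(toy, aes(x = x, y = y)) +
  geom_pointdensity(method = "kde2d")

# Force the neighbor-count calculation
p_default <- ggplot(toy, aes(x = x, y = y)) +
  geom_pointdensity(method = "default")
```

Printing `p_kde` or `p_default` draws the respective plot.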

Demo

Generate some toy data and visualize it with geom_pointdensity():

library(ggplot2)
library(dplyr)
library(viridis)
library(ggpointdensity)

dat <- bind_rows(
  tibble(x = rnorm(7000, sd = 1),
         y = rnorm(7000, sd = 10),
         group = "foo"),
  tibble(x = rnorm(3000, mean = 1, sd = .5),
         y = rnorm(3000, mean = 7, sd = 5),
         group = "bar"))

ggplot(data = dat, mapping = aes(x = x, y = y)) +
  geom_pointdensity() +
  scale_color_viridis()

Each point is colored according to the number of neighboring points. (Note: this is the dev branch, which plots the density estimate instead of n_neighbors.) The distance threshold for considering two points neighbors (the smoothing bandwidth) can be adjusted with the adjust argument; for example, adjust = 0.5 uses half of the default bandwidth.

ggplot(data = dat, mapping = aes(x = x, y = y)) +
  geom_pointdensity(adjust = .1) +
  scale_color_viridis()
 
ggplot(data = dat, mapping = aes(x = x, y = y)) +
  geom_pointdensity(adjust = 4) +
  scale_color_viridis()
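To make the role of the bandwidth more concrete, here is a rough sketch of a per-point density based on 2D kernel density estimation with MASS::kde2d(). It is an assumption-laden illustration rather than the package's actual code: the helper `point_density_kde` and its grid size `n` are hypothetical, and each point simply takes the value of a nearby grid node.

```r
library(MASS)  # provides kde2d() and bandwidth.nrd()

# Hypothetical helper for illustration, not part of ggpointdensity
point_density_kde <- function(x, y, adjust = 1, n = 100) {
  # Scale kde2d()'s default (normal reference) bandwidth by `adjust`
  h <- adjust * c(bandwidth.nrd(x), bandwidth.nrd(y))
  grid <- kde2d(x, y, h = h, n = n)
  # Find the grid cell each point falls into and read off the estimate
  ix <- findInterval(x, grid$x, all.inside = TRUE)
  iy <- findInterval(y, grid$y, all.inside = TRUE)
  grid$z[cbind(ix, iy)]
}

set.seed(1)
x <- rnorm(2000)
y <- rnorm(2000, sd = 10)
narrow <- point_density_kde(x, y, adjust = 0.1)  # small bandwidth, local detail
wide   <- point_density_kde(x, y, adjust = 4)    # large bandwidth, smoother estimate
```

A small adjust emphasizes local clusters, whereas a large adjust yields a smoother, more global picture of the density.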

Of course you can combine the geom with standard ggplot2 features such as facets...

# Faceting by group
ggplot(data = dat, mapping = aes(x = x, y = y)) +
  geom_pointdensity() +
  scale_color_viridis() +
  facet_wrap( ~ group)

... or point shape and size:

dat_subset <- sample_frac(dat, .1)  # smaller data set
ggplot(data = dat_subset, mapping = aes(x = x, y = y)) +
  geom_pointdensity(size = 3, shape = 17) +
  scale_color_viridis()

Zooming into the axes works as well. Keep in mind that xlim() and ylim() change the density, since they remove data points before it is computed. It may be better to use coord_cartesian() instead.

ggplot(data = dat, mapping = aes(x = x, y = y)) +
  geom_pointdensity() +
  scale_color_viridis() +
  xlim(c(-1, 3)) + ylim(c(-5, 15))

ggplot(data = dat, mapping = aes(x = x, y = y)) +
  geom_pointdensity() +
  scale_color_viridis() +
  coord_cartesian(xlim = c(-1, 3), ylim = c(-5, 15))
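Why removing points changes the result can be shown with a small base R sketch. It uses a hypothetical brute-force helper, `count_neighbors`, which is not part of the package, and compares counts computed on all points with counts computed after dropping points outside a window.

```r
# Hypothetical helper for illustration, not package API
count_neighbors <- function(x, y, radius) {
  d <- as.matrix(dist(cbind(x, y)))
  rowSums(d <= radius) - 1  # exclude the point itself
}

set.seed(1)
x <- rnorm(500)
y <- rnorm(500)

# Counts computed on all points (analogous to zooming with coord_cartesian())
full <- count_neighbors(x, y, radius = 0.5)

# Counts computed after dropping points outside the window (analogous to xlim())
keep <- x >= 0 & x <= 2
cropped <- count_neighbors(x[keep], y[keep], radius = 0.5)

# Points near the window edge lose the neighbors that were filtered out
summary(full[keep] - cropped)
```

The difference is never negative, and it is largest for points close to the edge of the window, which is where the coloring of the two plots above would differ most.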

Authors

Lukas PM Kremer (@LPMKremer) and Simon Anders (@s_anders_m), 2019
