oveRlay

Annotate 2D point clouds using overlay polygons.

Motivation and methodology

This R package (in development) offers an alternative to the concave hull geom in ggforce by taking an entirely different approach:

  • creates a regularly spaced grid of squares, whose side is defined by a step size (a fraction of the range of your data)
  • determines which squares contain points, with a user-defined tolerance, and keeps their vertices
  • uses isoband to calculate a contour joining the vertices on the grid
  • identifies holes, i.e. polygons that lie inside other polygons, so that their area can be "subtracted" when plotting with ggplot2
  • optionally smooths (via smoothr), offsets (via polyclip) and joins overlapping polygons (again via polyclip)
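
The first two steps of the list above can be sketched in base R. This is only an illustration, not the package's internal code: the variable names are made up for the example, and it assumes that the step size is a fraction of the larger data range and that min_pts acts as the per-square tolerance.

```r
# Illustrative sketch only; not the code used inside oveRlay.
set.seed(1)
pts <- matrix(rnorm(1000), ncol = 2)

# Assumption: the square side is a fraction of the larger data range.
stepsize <- 0.06
step <- stepsize * max(diff(range(pts[, 1])), diff(range(pts[, 2])))

# Regular grid breaks covering the data, with padding on both sides.
xbreaks <- seq(min(pts[, 1]) - step, max(pts[, 1]) + 2 * step, by = step)
ybreaks <- seq(min(pts[, 2]) - step, max(pts[, 2]) + 2 * step, by = step)

# Index of the grid square containing each point.
ix <- findInterval(pts[, 1], xbreaks)
iy <- findInterval(pts[, 2], ybreaks)

# Count points per square; rows index y, columns index x.
counts <- matrix(0L, nrow = length(ybreaks) - 1, ncol = length(xbreaks) - 1)
tab <- table(factor(iy, levels = seq_len(nrow(counts))),
             factor(ix, levels = seq_len(ncol(counts))))
counts[] <- as.integer(tab)

# Assumption: a square is kept when it holds at least min_pts points.
min_pts <- 1
occupied <- counts >= min_pts
```

A logical matrix like occupied is the kind of gridded input a contouring routine works on; how oveRlay actually passes its grid to isoband may differ.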

The resulting polygon (or set of polygons) follows the shape of the point cloud quite closely and, if a point or set of points is far enough from the rest, a separate, disjoint polygon is created. This handles outliers nicely: the current ggforce implementation of geom_mark_hull() is super fast, but will enclose all points no matter how far apart they are. Moreover, to the best of my knowledge, the concave hull in ggforce does not allow holes.

The step size controls how granular the resulting polygon set is: small step sizes will create shapes that follow the data more closely (at the expense of computing speed and, eventually, usability).
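
To get a feel for the cost of a finer grid, here is a rough back-of-the-envelope estimate. It assumes a square bounding box and that the step size is the square side expressed as a fraction of the data range; the real number of squares built by the package may differ.

```r
# Rough estimate only, under the assumptions stated above.
stepsizes <- c(0.1, 0.06, 0.02)

# Squares needed along one axis, then over the whole bounding box.
cells_per_axis <- ceiling(round(1 / stepsizes, 8))
n_cells <- cells_per_axis^2

data.frame(stepsize = stepsizes, cells_per_axis, n_cells)
```

Under these assumptions the number of squares grows with the inverse square of the step size, which is why very small steps slow things down.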

Here is an example of how different step sizes behave on the same set of 2D rnorm(1000):

[Figure: overlays of the same point set computed with different step sizes]

Installation

You can install oveRlay through devtools:

library(devtools)
install_github("gdagstn/oveRlay")

Examples

Using oveRlay is easy:

library(oveRlay)

dat <- matrix(rnorm(1000), ncol = 2)

overlay <- makeOverlay(dat, min_pts = 1, stepsize = 0.06, minsize = 4)
plot(dat, pch = 16, cex = 0.5, xlim = range(overlay[,1:2]), ylim = range(overlay[,1:2]))
for(i in unique(overlay$cluster)) polygon(overlay[overlay$cluster == i, 1:2])

overlay polygon

Decreasing step size (increasing granularity)

overlay <- makeOverlay(dat, min_pts = 1, stepsize = 0.02, minsize = 4)
plot(dat, pch = 16, cex = 0.5, xlim = range(overlay[,1:2]), ylim = range(overlay[,1:2]))
for(i in unique(overlay$cluster)) polygon(overlay[overlay$cluster == i, 1:2])

overlay with decreased step size

Increasing offset

overlay <- makeOverlay(dat, min_pts = 1, stepsize = 0.02, minsize = 4, offset_prop = 0.08)
plot(dat, pch = 16, cex = 0.5, xlim = range(overlay[,1:2]), ylim = range(overlay[,1:2]))
for(i in unique(overlay$cluster)) polygon(overlay[overlay$cluster == i, 1:2])

overlay with increased offset

Increasing offset without joining polygons:

overlay <- makeOverlay(dat, min_pts = 1, stepsize = 0.02, minsize = 4, offset_prop = 0.08, join_polys = FALSE)
plot(dat, pch = 16, cex = 0.5, xlim = range(overlay[,1:2]), ylim = range(overlay[,1:2]))
for(i in unique(overlay$cluster)) polygon(overlay[overlay$cluster == i, 1:2])

overlay without joining

However, the output of makeOverlay() is mostly designed to be used with ggplot2. When passing it to geom_polygon(), it is important to map group to cluster_hole and subgroup to id_hole, either with unquoted names in aes() or with quoted names in aes_string() (soft-deprecated in recent ggplot2 releases). Using group and subgroup together lets geom_polygon() draw polygons with holes.

library(ggplot2)

dat <- as.data.frame(matrix(rnorm(1000), ncol = 2))
colnames(dat) <- c("x", "y")

# Make a hole in the distribution
dat <- dat[abs(dat$x) > 0.5 | abs(dat$y) > 0.5, ]

overlay <- makeOverlay(dat, min_pts = 1, stepsize = 0.02, minsize = 4, offset_prop = 0.01, join_polys = TRUE)

ggplot(data = dat, aes(x = x, y = y)) + 
  geom_point() + 
  geom_polygon(data = overlay,
               aes(x = x, y = y, group = cluster_hole, subgroup = id_hole),
               color = "red", fill = "red", alpha = 0.3) + 
  theme_bw()

[Figure: point cloud with a central hole and its overlay polygon drawn in red]
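
To see how group and subgroup produce a hole without running makeOverlay(), here is a small hand-made example. The data frame below is invented for illustration: it only borrows the column names used above (x, y, cluster_hole, id_hole) and assumes numeric identifiers, which may not match the actual output of the package.

```r
library(ggplot2)

# Hand-made polygons: one outer square and one inner square acting as a hole.
# Column names mimic the overlay columns used above (assumed layout).
outer <- data.frame(x = c(0, 4, 4, 0), y = c(0, 0, 4, 4),
                    cluster_hole = 1, id_hole = 1)
hole <- data.frame(x = c(1, 1, 3, 3), y = c(1, 3, 3, 1),
                   cluster_hole = 1, id_hole = 2)
toy <- rbind(outer, hole)

# Same group, different subgroup: the inner ring is cut out of the outer one.
p <- ggplot(toy, aes(x = x, y = y, group = cluster_hole, subgroup = id_hole)) +
  geom_polygon(fill = "red", color = "red", alpha = 0.3) +
  coord_equal() +
  theme_bw()
p
```

If both rings had their own group instead, ggplot2 would draw two filled squares on top of each other and the hole would disappear.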

To do

  • Handling more than one group of points at the same time
  • Handling overlapping regions with different categories
  • Integration with UMAP/tSNE visualizations in single cell packages
  • Recentering of single-point polygons

Acknowledgments

  • Claus Wilke and Thomas Lin Pedersen for the isoband package
  • Thomas Lin Pedersen for the ggforce package
  • Adrian Baddeley for the polyclip package, and Angus Johnson for the original Clipper library
  • Matthew Strimas-Mackey for the smoothr package
  • Hadley Wickham, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, and Hiroaki Yutani for the ggplot2 package