cell-abstraction mode #11

mdsumner · 2017-07-14T15:49:51Z

I'd love to see a "cell-abstraction" version of fasterize that returned the cell number of object/s, as a classified data frame i.e. cell, polygon-ID.

This is awesome because

overlapping objects don't have to be "resolved", they are recorded (i.e. one cell, many features)
the raster cells aren't returned in full necessarily, but only where they intersect a feature
tidy verbs for downstream raster extraction and per-cell / per-feature / per-raster-slice summary for many varied workflows ...

Here's an example of the current behaviour, multiple overlapping polygons are "resolved" by applying "sum" to the many-to-one relation:

library(fasterize)
p1 <- rbind(c(-180,-20), c(-140,55), c(10, 0), c(-140,-60), c(-180,-20))
hole <- rbind(c(-150,-20), c(-100,-10), c(-110,20), c(-150,-20))
p1 <- list(p1, hole)
p2 <- list(rbind(c(-10,0), c(140,60), c(160,0), c(140,-55), c(-10,0)))
p3 <- list(rbind(c(-125,0), c(0,60), c(40,5), c(15,-45), c(-125,0)))
library(sf)
pols <- st_sf(value = rep(1,3),
              geometry = st_sfc(lapply(list(p1, p2, p3), st_polygon)))
r0 <- raster(pols, res = 1)

## here is a literal rasterization
r <- fasterize(pols, r0, field = "value", fun="sum")

Here is the same (mostly) rasterization in "cell-abstraction" mode.

bend fasterize to abstraction, instantiate the entire grid and decompose

pols$polygonID <- seq_len(nrow(pols))
## here is a cell-abstraction version, fast, because fasterize

cells0 <- dplyr::bind_rows(lapply(seq_len(nrow(pols)), 
       function(i) tibble::tibble(cell = which(!is.na(raster::values(fasterize(pols[i, , drop = FALSE ], r0, field = "polygonID", fun = "first")))), 
                                  polygonID = i)))

bend raster's powerful but un-tidy cell* functions to use, this is done "sparsely" with cell-logic

## here is a cell-abstraction version, slow but slightly neater code with less wasted pixel creation
## and no need to filter out the missing values
cells1 <- dplyr::bind_rows(lapply(raster::cellFromPolygon(r0, as(pols, "Spatial")), 
                                 function(x) tibble::tibble(cell = as.integer(x))), .id = "polygon_rowid")

A comparison of the two results above. This kind of decomposition might help explore exactly what and where for #6

library(dplyr)
## resolution of elements affects the result, also f/rasterize are slightly different 
cells0 %>% group_by(polygonID) %>% summarize(finite_element_area = prod(raster::res(r)) * n())
# # A tibble: 3 x 2
# polygonID finite_element_area
# <int>               <dbl>
#   1         1               10135
# 2         2                9765
# 3         3                8700
cells1 %>% group_by(polygon_rowid) %>% summarize(finite_element_area = prod(raster::res(r)) * n())

# # A tibble: 3 x 2
# polygon_rowid finite_element_area
# <chr>               <dbl>
#   1             1               10115
# 2             2                9785
# 3             3                8700

st_area(pols)
#[1] 10125  9775  8700

Finally the real goal, an in-progress generall "cellnumbers" function that would use fasterize's future "cell-abstraction", rather than as now converting back to Spatial.

## here is a tidier but still slow version (that would benefit massively from fasterize support!)

library(tabularaster)  ## devtools::install_github("hypertidy/tabularaster")
cn <- cellnumbers(r0, pols) 
cn %>% group_by(object_) %>% summarize(n())
# # A tibble: 3 x 2
# object_ `n()`
# <int> <int>
#   1       1 10115
# 2       2  9785
# 3       3  8700
# Warning message:
#   projections not the same 
# x: NA
# query: NA

I could do the lapply trick as in 2. above for tabularaster, but I think this is immensely powerful and at any rate will be required for really massive grids, when we really don't need all those empty pixels. This kind of decomposition to data frames for rasters is a good way to flexibly implement sparse rasters, and rasters distributed over higher dimension, sparse or otherwise.

I'd like to explore the C++ for this, but for the time being this is a placeholder in case it's of interest and / or easy-ish.

Thanks!

mdsumner · 2017-07-14T23:38:36Z

Actually I've seen a way to do this by apply fasterize iteratively to each feature, which saves some of the cell/value instantiation (i.e. when features are small compared to the raster extent).

There's a need for cell-remapping between local global extents that's been on my mind for a while, and I think that would transcend this request as it stands - and might be pretty straightforward. This is perhaps a "wrong question" request, but they do help to get to the right question some times.

mdsumner · 2018-03-21T13:07:20Z

Analogous would be to return a set of poly, rownumber, xstarts and xends - it would be tiny, with no need to ever fill any raster data and allow very complex custom functions. I'll do this soonish, now I understand how it works!

noamross · 2018-03-23T14:31:26Z

So I understand, your intention here is another separate function, that would return a data frame? Within C++ it would use the Edge/Edgelist types, and the parts of the C++ rasterize function itself, but rather than running through the matrix it would return this data frame of poly, row, xstart, xend?

So, on one hand I want to minimize code duplication and we should see if we should modularize parts of the rasterize C++ function to minimize that so that you're not repeating much besides the inside of the loop.

However also, this seems like a specialty/developer-focused output, with the goal of producing something like the outputs in tabularaster. It goes a bit against my goals of keeping this package simple by not supporting anything but basic sf/raster/starts input and output types. Maybe the better way to go would be to export the C++ functions and classes via Rcpp::interfaces so that another package can use them to produce different outputs like this.

mdsumner · 2018-03-24T00:50:56Z

Sounds good, it's pretty minor duplication and I can see my way to doing it now. I'll keep an eye on your API exposure and let you know how I go.

mdsumner · 2018-03-26T15:27:29Z

Just for the record here, I was pretty naive about what's involved, but I'm learning heaps ;)

mdsumner · 2022-12-14T05:51:34Z

I have done this here: https://github.com/hypertidy/controlledburn

will see about folding it back in, exposing the API, etc.

noamross mentioned this issue Mar 23, 2018

Export a C++ API #21

Open

This was referenced Dec 5, 2022

Not in-memory formats to read/write #10

Open

Fasterize to file? #29

Open

mdsumner self-assigned this Dec 14, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cell-abstraction mode #11

cell-abstraction mode #11

mdsumner commented Jul 14, 2017 •

edited

Loading

mdsumner commented Jul 14, 2017

mdsumner commented Mar 21, 2018

noamross commented Mar 23, 2018

mdsumner commented Mar 24, 2018 •

edited

Loading

mdsumner commented Mar 26, 2018

mdsumner commented Dec 14, 2022

cell-abstraction mode #11

cell-abstraction mode #11

Comments

mdsumner commented Jul 14, 2017 • edited Loading

mdsumner commented Jul 14, 2017

mdsumner commented Mar 21, 2018

noamross commented Mar 23, 2018

mdsumner commented Mar 24, 2018 • edited Loading

mdsumner commented Mar 26, 2018

mdsumner commented Dec 14, 2022

mdsumner commented Jul 14, 2017 •

edited

Loading

mdsumner commented Mar 24, 2018 •

edited

Loading