Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance with sp objects #28

Closed
ateucher opened this issue Feb 16, 2016 · 6 comments
Closed

Performance with sp objects #28

ateucher opened this issue Feb 16, 2016 · 6 comments

Comments

@ateucher
Copy link
Owner

Seems really slow compared to eg:

writeOGR(foo, "foo")
system("mapshaper foo -simplify 0.01 keep-shapes -o foo_simp")
readOGR("foo_simp")

Might be related to writing really long geojson strings??

@ateucher
Copy link
Owner Author

Profiling shows major bottleneck is in geojsonio::geojson_json, in sp_to_GeoJSON. Need to profile that to see if can be improved.

@ateucher
Copy link
Owner Author

Is there a way to get around writing to disk by writing to stdout and capturing it as a character vector?

cities <- readOGR(system.file("vectors", package = "rgdal")[1], "cities")
writeOGR(cities[1:10,], "/vsistdout/", "cities", driver="GeoJSON")

@javrucebo
Copy link

The main bottleneck seems actually to be due to a - seemingly unnecessary - transformation loop in the geojsonio package.

As you have already stated: The critical command in the rmapshaper package is this one:

# rmapshaper/R/simplify.R function ms_simplify_sp line 157
geojson <- sp_to_GeoJSON(input)

And inside sp_to_GeoJSON it is the geojson_json function which uses the biggest part of the overall time

# rmapshaper/R/zzz.R function sp_to_GeoJSON line 53-55
proj <- sp::proj4string(sp)
js <- geojsonio::geojson_json(sp)
structure(js, proj4 = proj)

So let's see what geojson_json is doing:

# geojsonio/R/geosjson_json.R function geojson.SpatialPolygonsDataFrame line 193
to_json(geojson_rw(input), ...)

it is transforming the output from geojson_rw to JSON. So let' look inside geojson_rw

# geojsonio/R/zzz.R function geojson_rw 306-308
tmp <- tempfile(fileext = ".geojson")
suppressMessages(geojson_write(input, file = tmp))
jsonlite::fromJSON(tmp, simplifyDataFrame = FALSE, simplifyMatrix = FALSE, ...)

So what we are basically doing is to transform the Spatial class to geojson via geojson_write then read it into an R structure via fromJSON and then transform it back to JSON via to_json.

So let's write a faster version of ms_simplify [leaving aside all the additional checks and just concentrate on SpatialPolygonsDataFrame for sake of illustration:

ms_simplify_fast <- function(sp, ...) {
  # this is mimicking sp_to_GeoJSON without the additional loop JSON -> R -> JSON
  proj4 <- proj4string(sp)
  tmp <- tempfile(fileext = ".geojson")
  suppressMessages(geojsonio::geojson_write(sp, file = tmp))
  gj <- structure(readChar(tmp, file.info(tmp)$size),  class = c("json", "geo_json"))
  file.remove(tmp)
  # here we run the simplification function
  gj_simp <- rmapshaper::ms_simplify(gj, ...)
  # this is same as GeoJSON_to_sp
  sp <- suppressWarnings(
    suppressMessages(
      rgdal::readOGR(gj_simp, "OGRGeoJSON", verbose = FALSE,
                     disambiguateFIDs = TRUE, p4s = proj4)
    ))
  # this is same as curly_brace_na(sp)
  sp@data[sp@data == "{ }"] <- NA
  sp
}

I do not know if the transforming back and forth adds anything additional (or leaves out something ...), but in any case for the SpatialPolygonsDataFrames I have tested the fast version with, the visual result is the same in the original and the new version.

Benchmark the fast function with a couple of different maps:

bench_simplify <- function(country) {
  map <- raster::getData('GADM', level='1', country=country)
  print(microbenchmark::microbenchmark(original=simp_original <- ms_simplify(map),
                                 fast=simp_fast <- ms_simplify_fast(map),
                                 times=5) ,
        unit="s", signif=3)
}

and look at the result:

> bench_simplify('AND')
Unit: seconds
     expr  min   lq     mean median   uq  max neval cld
 original 1.53 1.53 1.538892   1.54 1.54 1.55     5   b
     fast 1.44 1.44 1.447659   1.45 1.45 1.45     5  a 
> bench_simplify('BEL')
Unit: seconds
     expr  min   lq     mean median   uq  max neval cld
 original 2.52 2.52 2.530740   2.52 2.53 2.58     5   b
     fast 1.55 1.55 1.561705   1.56 1.56 1.59     5  a 
> bench_simplify('NGA')
Unit: seconds
     expr   min    lq      mean median   uq  max neval cld
 original 28.50 28.70 28.689826  28.70 28.7 28.9     5   b
     fast  4.75  4.77  4.899344   4.88  5.0  5.1     5  a 
> bench_simplify('ITA')
Unit: seconds
     expr   min    lq      mean median    uq   max neval cld
 original 43.10 43.70 44.310370  44.10 44.20 46.40     5   b
     fast  6.64  6.78  6.951413   6.95  7.05  7.34     5  a 
> bench_simplify('SWE')
Unit: seconds
     expr   min    lq     mean median   uq  max neval cld
 original 62.90 63.50 63.47698   63.5 63.6 63.8     5   b
     fast  9.85  9.98 10.23675   10.4 10.4 10.6     5  a 

which is a nice speedup ;-)

@ateucher
Copy link
Owner Author

ateucher commented Jun 2, 2016

Thanks @javrucebo! That is a very good point - we shouldn't need to go through the intermediate json object. I will double-check that jsonlite::fromJSON isn't doing some important checking/validation, but I will likely implement the change (or you can submit a PR if you like). it's probably worth opening an issue over at ropensci/geojsonio about it as well.

There is also a major bottleneck with writing a very large sp object to disk as geojson rgdal::writeOGR (500+mb sp object with thousands of polygons resulting in a 1GB+ geojson file). Writing as shp is pretty quick, so I assume it has something to do with writing one monster string all at once. That problem is a bit more intractable to me at this point... See ropensci/geojsonio#77

@ateucher
Copy link
Owner Author

ateucher commented Jun 2, 2016

Actually funny enough I just double-checked and the writing as geojson isn't so bad... so either my memory is faulty or rgdal had a recent update that fixed it...

@ateucher
Copy link
Owner Author

Largely fixed in geojsonio by @javrucebo suggestions

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants