JS: Parse directly to flat array? #51

Closed
kylebarron opened this issue Apr 4, 2020 · 9 comments

Comments

@kylebarron
Contributor

Is https://github.com/bjornharrtell/flatgeobuf/blob/master/src/ts/geojson.ts the current JS/TS API? I'm curious if it's possible to access the parsed data without creating individual Feature objects.

In particular, when using Deck.gl, a high performance GPU-accelerated geospatial visualization library, performance is best when it's possible to keep geometries as flat typed arrays, since 1) you don't have to pay the time cost for individual object creation and 2) the data needs to be in flat typed arrays to be uploaded to the GPU.

Given what I understand of the format, it would seem possible to read the metadata and create flat typed arrays very fast. I presume the number of coordinates of each feature is known from the initial metadata?

@bjornharrtell
Member

Absolutely, you can access the coordinates of a Geometry directly as a Float64Array. For the GeoJSON target this is done at https://github.com/bjornharrtell/flatgeobuf/blob/0ce5c837491a41a0f6771ab150bbd7eb0e9a8c26/src/ts/geojson/geometry.ts#L62. The GeoJSON target is an abstraction above raw flatbuffer access. The xyArray method is part of the generated flatbuffers API (https://github.com/bjornharrtell/flatgeobuf/blob/0ce5c837491a41a0f6771ab150bbd7eb0e9a8c26/src/ts/feature_generated.ts#L700) and it's direct access, no decoding, no copying needed.
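
To illustrate what "direct access, no decoding, no copying" means in practice, here is a minimal sketch using only standard typed-array behaviour. It does not call the FlatGeobuf API: the buffer, the 16-byte header offset, and the `viewCoordinates` helper are all hypothetical stand-ins for what a generated accessor such as `xyArray` might do internally.

```typescript
// Hypothetical stand-in for a feature buffer: a 16-byte header followed by
// four float64 values, i.e. two xy pairs. (Assumption: the real layout differs.)
const featureBytes = new ArrayBuffer(16 + 4 * 8);
new Float64Array(featureBytes, 16, 4).set([2, 2, 3, 3]);

// Creating a typed-array view only records a byte offset and a length;
// the coordinate values themselves are neither decoded nor copied.
function viewCoordinates(buffer: ArrayBuffer, byteOffset: number, count: number): Float64Array {
  return new Float64Array(buffer, byteOffset, count);
}

const coordsView = viewCoordinates(featureBytes, 16, 4);

// Writing through a second view is visible through the first one,
// which shows that both share the same underlying memory.
new Float64Array(featureBytes, 16, 1)[0] = 42;
console.log(coordsView[0]); // 42
```

Note that a `Float64Array` view requires the byte offset to be a multiple of 8, which is why the hypothetical header above is 16 bytes long.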

That said, FlatGeobuf is not a format that explicitly targets GPU rendering. For GPU rendering, polygons should probably be pre-tessellated, and a format constrained to GPU rendering could likely be made more optimal for that case.

@kylebarron
Contributor Author

kylebarron commented Apr 5, 2020

Very cool! I plan to explore it more in the future.

GPU rendering

Yes, you're right. I believe Deck.gl does polygon tessellation internally before passing the data to the GPU, but that's abstracted away from the user and working with flat typed arrays is still faster than many small JS objects.

@tim-salabim

ping @dcooley @robertleeplummerjr

@kylebarron
Contributor Author

Are z/m coordinates stored in separate flat arrays?

I see here that
https://github.com/bjornharrtell/flatgeobuf/blob/0ce5c837491a41a0f6771ab150bbd7eb0e9a8c26/src/ts/feature_generated.ts#L722-L728
zArray is separate from xyArray.

@bjornharrtell
Member

@kylebarron yes.

AFAIK there are two competing memory models in the wild. One is this way, i.e. one array for two-dimensional data and a separate array per additional dimension. The other is a single array that can hold a variable number of dimensions, described by a "stride". It was a difficult choice because I couldn't find any existing rationale for either model. One possible benefit of separating the dimensions is that less memory is required if you are only interested in reading out two-dimensional data. Also, for two-dimensional data the two memory models coincide.

To use FlatGeobuf directly without needing to transform the data you need to use the same memory model. Examples of software that uses the same memory model as FlatGeobuf are GDAL and QGIS.
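
The two memory models described above can be sketched side by side. This is an illustration only: the sample values and the helper names `vertexSeparated` and `vertexStrided` are made up for the example and are not part of FlatGeobuf, GDAL, or QGIS.

```typescript
// Model 1 (separate dimensions): xy pairs in one array, z in its own array.
const sepXY = new Float64Array([0, 0, 1, 0, 1, 1]);
const sepZ = new Float64Array([10, 20, 30]);

// Model 2 (stride): one interleaved array plus the number of values per vertex.
const strided = new Float64Array([0, 0, 10, 1, 0, 20, 1, 1, 30]);
const strideSize = 3;

// Read vertex i from the separated layout.
function vertexSeparated(i: number): [number, number, number] {
  return [sepXY[2 * i], sepXY[2 * i + 1], sepZ[i]];
}

// Read vertex i from the strided layout.
function vertexStrided(i: number): [number, number, number] {
  const base = strideSize * i;
  return [strided[base], strided[base + 1], strided[base + 2]];
}

console.log(vertexSeparated(1), vertexStrided(1)); // both [1, 0, 20]
```

With only two dimensions the stride is 2 and the interleaved array is identical to the xy array, which is the sense in which the two models coincide for 2D data.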

@kylebarron
Contributor Author

Yes, in Deck.gl we use the stride approach. Along with each flat array, we have metadata describing the coordinate dimension. I think our thought process is that if the source data exists in 3 dimensions, the user is more likely to want all 3 dimensions. So I expect that we'll do one copy of the data, especially when we need to combine the xy and the z array into one interleaved array.
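
The single copy mentioned above, combining a separate xy array and z array into one interleaved array, could look roughly like the following. This is a sketch under the assumption that both inputs are plain `Float64Array`s of matching vertex count; `interleaveXYZ` is a hypothetical helper, not a Deck.gl or FlatGeobuf function.

```typescript
// Combine separate xy and z arrays into one interleaved [x, y, z, ...] array.
// This allocates a new array, so it is one copy of the coordinate data.
function interleaveXYZ(xy: Float64Array, z: Float64Array): Float64Array {
  const vertexCount = z.length;
  if (xy.length !== 2 * vertexCount) {
    throw new Error("xy and z describe different numbers of vertices");
  }
  const out = new Float64Array(3 * vertexCount);
  for (let i = 0; i < vertexCount; i++) {
    out[3 * i] = xy[2 * i];         // x
    out[3 * i + 1] = xy[2 * i + 1]; // y
    out[3 * i + 2] = z[i];          // z
  }
  return out;
}

const interleavedXYZ = interleaveXYZ(
  new Float64Array([2, 2, 2, 3]),
  new Float64Array([100, 200])
);
console.log(interleavedXYZ); // [2, 2, 100, 2, 3, 200]
```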

That said, I don't have an objective opinion as to which is better, so I'm not complaining as to how you implemented it.

@bjornharrtell
Member

Yeah I guess it depends if you are doing mostly 2D or mostly 3D. Not sure about dimensions above that though. It's sad that this breaks the zero copy benefit though, depending on use case. Would be good if standardization efforts also included the memory model. In the initial version of FlatGeobuf spec I actually did use the stride model and I'm not sure I made the right choice of going with the other model.

As discussed elsewhere though, FlatGeobuf isn't really about being a render optimized format even though it might be efficient enough to be used as that in many use cases.

@kylebarron
Contributor Author

Quick question: it isn't possible to get the entire XY array of a MultiPolygon at once with zero copy, right? Testing on a MultiPolygon it seems I must use .parts(i) for every part.

geometry.xyArray()
// null
geometry.parts(0).xyArray()
// Float64Array(10) [
//   2, 2, 2, 3, 3,
//   3, 3, 2, 2, 2
// ]
geometry.parts(1).xyArray()
// Float64Array(10) [
//   4, 4, 4, 5, 5,
//   5, 5, 4, 4, 4
// ]

@bjornharrtell
Member

bjornharrtell commented Aug 4, 2020

Semi-correct 🙂 while the multi parts are indeed separated there is no copying needed/involved.
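
To make the distinction concrete: each part's array can be read without copying, but gathering all parts into one contiguous array does take one copy. The sketch below assumes a minimal `PartLike` interface that mirrors only the `xyArray()` call shown above; `flattenParts` and the `startIndices` convention (vertex offsets per part, plus a final end marker) are hypothetical choices for this example.

```typescript
// Minimal assumed shape of a geometry part; only xyArray() is taken from the thread.
interface PartLike {
  xyArray(): Float64Array | null;
}

// Copy every part's xy values into one flat array and record where each part starts.
function flattenParts(parts: PartLike[]): { positions: Float64Array; startIndices: Uint32Array } {
  const arrays = parts.map((part) => part.xyArray() ?? new Float64Array(0));
  const totalValues = arrays.reduce((sum, a) => sum + a.length, 0);
  const positions = new Float64Array(totalValues);
  const startIndices = new Uint32Array(arrays.length + 1);
  let offset = 0;
  arrays.forEach((a, i) => {
    startIndices[i] = offset / 2; // vertex index, two values per vertex
    positions.set(a, offset);     // the one copy happens here
    offset += a.length;
  });
  startIndices[arrays.length] = offset / 2; // end marker
  return { positions, startIndices };
}

// Stand-ins for geometry.parts(0) and geometry.parts(1) from the example above.
const exampleParts: PartLike[] = [
  { xyArray: () => new Float64Array([2, 2, 2, 3, 3, 3, 3, 2, 2, 2]) },
  { xyArray: () => new Float64Array([4, 4, 4, 5, 5, 5, 5, 4, 4, 4]) },
];
const flattened = flattenParts(exampleParts);
console.log(flattened.positions.length, Array.from(flattened.startIndices)); // 20 [0, 5, 10]
```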

3 participants