Large JSON TIN support #3732

Closed
GlenRice-NOAA opened this issue May 15, 2023 · 10 comments

@GlenRice-NOAA

We are running into trouble extending the PROJ infrastructure with large JSON TIN files, as we were doing in #3616. The limit seems to be set here, but that limit is labeled as arbitrary.

Our current use case has an 80 MB file for the Alaska region. In the short term we would like to explore increasing this limit (perhaps as part of some local configuration?), but perhaps in the long run there could be support for some Cloud Optimized TIN format for the CDN?

@rouault
Member

rouault commented May 15, 2023

I could raise the limit, but did you check, by patching locally, that increasing or removing it results in acceptable performance? I would guess it could take several seconds to load such a JSON file.
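For reference, a rough way to measure that load cost against a locally patched build might be the sketch below; it assumes pyproj is installed on top of the patched PROJ and that the JSON path points at a real local copy. Timing the pipeline creation captures the JSON parsing step.

```python
# Rough timing sketch (assumes a locally patched PROJ underneath pyproj and a
# local copy of the tinshift JSON file; both are placeholders here).
import time
from pyproj import Transformer

t0 = time.perf_counter()
# Instantiating the pipeline forces the tinshift JSON file to be read and parsed.
tr = Transformer.from_pipeline(
    "+proj=tinshift +file=./AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.json")
print(f"pipeline setup: {time.perf_counter() - t0:.2f} s")

t0 = time.perf_counter()
print(tr.transform(-144.0551, 70.0747, 0))
print(f"one transform: {time.perf_counter() - t0:.4f} s")
```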

but perhaps in the long run there could be support for some Cloud Optimized TIN format for the CDN?

Potentially. R&D required to find or design an appropriate format and plug that into PROJ

@GlenRice-NOAA
Author

did you check, by patching locally, that increasing or removing it results in acceptable performance?

No, we have not patched locally due to a lack of local build tooling. Given the file size we are fairly sure this limit is the issue, but we also wanted to hear any other concerns or thoughts around it.

@rouault
Member

rouault commented May 16, 2023

@GlenRice-NOAA Can you share this 80 MB JSON so I can have a try?

@GlenRice-NOAA
Author

Looks like we were able to reduce them to ~60 MB each. Here are the examples.
AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.zip
AK_EEZ_ERTDM23_EPSG6318_MHW-LMSL.zip
AK_EEZ_ERTDM23_EPSG6318_LMSL-xGeoid20B.zip

@rouault
Member

rouault commented May 16, 2023

Limit raised to 100 MB in #3736
You could still reduce the file size, as your coordinates currently carry 13 decimal digits, i.e. a 1e-8 metre precision...
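A minimal sketch of that kind of size reduction, assuming the file follows the PROJ triangulation layout with a top-level "vertices" array of numeric rows (key name, structure, and digit count should be double-checked against the actual files):

```python
# Hypothetical helper: rewrite a tinshift JSON file with coordinates rounded
# to fewer decimal digits.  Assumes a top-level "vertices" array of rows of
# numbers; adjust the key and the digit count to the real files.
import json

def shrink(path_in, path_out, digits=9):
    with open(path_in) as f:
        data = json.load(f)
    data["vertices"] = [[round(v, digits) for v in row]
                        for row in data["vertices"]]
    with open(path_out, "w") as f:
        # Compact separators also strip whitespace from the output.
        json.dump(data, f, separators=(",", ":"))

shrink("AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.json",
       "AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL_rounded.json")
```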

Side note: I'm not sure whether something is wrong in your triangulation or in the PROJ implementation, but it seems I can only transform coordinates at the exact triangle vertices, not inside the triangles:

$ echo -144.0550994873047 70.07471466064453 0 |  bin/cct +proj=tinshift +file=./AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.json
    -144.0551        70.0747       -0.0964           inf
$ echo -144.0550994873  70.07471466064  0 |  bin/cct +proj=tinshift +file=./AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.json
# Record 0 TRANSFORMATION ERROR: -144.0550994873 70.07471466064 0
 ((null))

@Rjialceky

Thanks for the side note. Your second test point is outside the triangulated domain...

[screenshot: the second test point plotted outside the triangulated domain]

...and we didn't have a fallback_strategy parameter specified, which is possible with format_version >= 1.1. I added the item "fallback_strategy": "nearest_side" to AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.json and bumped the version item accordingly: "format_version": "1.1". I tested this on a <10 MB version of the JSON TIN with success.
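A minimal sketch of that edit, assuming the file fits in memory and using only the two keys named above; everything else in the file is left untouched:

```python
# Hypothetical patch: enable the nearest_side fallback in a tinshift JSON file.
import json

path = "AK_EEZ_ERTDM23_EPSG6318_MLLW-LMSL.json"
with open(path) as f:
    data = json.load(f)

# fallback_strategy is only honoured for format_version >= 1.1.
data["format_version"] = "1.1"
data["fallback_strategy"] = "nearest_side"

with open(path, "w") as f:
    json.dump(data, f, separators=(",", ":"))
```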

@geoglrb

geoglrb commented May 16, 2023

Hello! I am watching this issue/fix with interest, as I think support for larger files (and ideally, someday, support for cloud-optimized TINs, or better yet something like a multi-scale tile pyramid for various levels of approximation) has applicability to work that I and some others want to do with TIN shifting as well. Thanks!

@jjimenezshaw
Contributor

Hi. I was looking at the code, and I saw a possible performance optimization: "remember" the last triangle used, and check it before doing any search. If the user is usually moving within a small area, the point probably does not change triangle very often, which reduces search time and cache pollution. But I do not know if it is worth it (do you have any gut feeling?). I should do some tests to measure how fast or slow the search really is. But if you consider it currently fast enough with a big mesh, @GlenRice-NOAA, I will not even try it.
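A rough, language-agnostic sketch of that idea (not PROJ's actual classes; the fallback search is passed in as a placeholder callable):

```python
# Hypothetical sketch of a "remember the last triangle" shortcut in front of
# whatever spatial-index search the implementation already has.

def point_in_triangle(x, y, tri):
    """tri = ((x1, y1), (x2, y2), (x3, y3)); same-sign cross-product test."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    d1 = (x - x2) * (y1 - y2) - (x1 - x2) * (y - y2)
    d2 = (x - x3) * (y2 - y3) - (x2 - x3) * (y - y3)
    d3 = (x - x1) * (y3 - y1) - (x3 - x1) * (y - y1)
    return not ((d1 < 0 or d2 < 0 or d3 < 0) and (d1 > 0 or d2 > 0 or d3 > 0))

class CachedLocator:
    def __init__(self, triangles, index_search):
        self.triangles = triangles        # list of 3-vertex tuples
        self.index_search = index_search  # fallback search, e.g. a quadtree query
        self.last = None                  # index of the last matching triangle

    def find(self, x, y):
        # Fast path: consecutive points often fall in the same triangle.
        if self.last is not None and point_in_triangle(x, y, self.triangles[self.last]):
            return self.last
        # Slow path: regular spatial-index search, then remember the hit.
        idx = self.index_search(x, y)
        if idx is not None:
            self.last = idx
        return idx

# Tiny usage example: two triangles, brute force standing in for the quadtree.
tris = [((0, 0), (1, 0), (0, 1)), ((1, 0), (1, 1), (0, 1))]
brute = lambda x, y: next((i for i, t in enumerate(tris) if point_in_triangle(x, y, t)), None)
loc = CachedLocator(tris, brute)
print(loc.find(0.2, 0.2), loc.find(0.25, 0.2))  # the second call hits the cache
```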

In addition to that, I was surprised about the memory consumption that @rouault mentioned with the 60 MB file: more than 1 GB. I understand that it is due to the quadtree. I know a quick and lean algorithm, but it needs the whole mesh to be convex (it can be padded with "invalid" triangles to become convex, but it must be complete).

@rouault
Member

rouault commented May 19, 2023

"remember" the last triangle used, and check it before doing any search.

Hard to tell exactly how efficient it would be. Really depends on use cases, but that's quite cheap to implement.

An alternative to the quadtree is that, if you have a triangulation where for each edge you know the identifiers of the 2 triangles that share it, you can "walk" through the triangulation. It might require the triangulation to be a Delaunay one, but I'm not sure. (GDAL's linear interpolation method uses libqhull to create a Delaunay triangulation and uses that technique to find which triangle a point falls into, given an initial guess. That works well for well-behaved triangulations, but as soon as you have a degenerate or nearly degenerate triangle, numeric precision issues occur and you have to revert to something else, like a brute-force search.)
Given the input we currently have, the quadtree approach has the benefit of being simple and reasonably performant.
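A hedged sketch of that walk, under the assumptions that triangles are stored counter-clockwise and that a neighbour table gives the triangle across each edge (happy path only; the degenerate-triangle issues mentioned above are not handled):

```python
# Hypothetical "walk" locator: starting from a guessed triangle, repeatedly
# step across whichever edge has the query point on its outside, using a
# precomputed neighbour table.  Assumes counter-clockwise triangles and does
# not handle degenerate geometry.

def side(p, a, b):
    # > 0 if p is left of the directed edge a->b, < 0 if right, 0 if on it.
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])

def walk(p, start, vertices, triangles, neighbours, max_steps=100000):
    cur = start
    for _ in range(max_steps):
        corners = tuple(vertices[v] for v in triangles[cur])
        for k in range(3):
            # p is outside edge k (from corner k to corner k+1) if it lies to its right.
            if side(p, corners[k], corners[(k + 1) % 3]) < 0:
                cur = neighbours[cur][k]
                if cur is None:
                    return None  # walked off the hull: point is outside the mesh
                break
        else:
            return cur  # inside (or on the boundary of) the current triangle
    return None  # gave up, e.g. because of a degenerate configuration

# Tiny usage example: a unit square split into two triangles.
verts = [(0, 0), (1, 0), (1, 1), (0, 1)]
tris = [(0, 1, 3), (1, 2, 3)]
# neighbours[i][k] = triangle across the edge (tris[i][k], tris[i][(k + 1) % 3])
neigh = [[None, 1, None], [None, None, 0]]
print(walk((0.9, 0.8), start=0, vertices=verts, triangles=tris, neighbours=neigh))  # -> 1
```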

was surprised about the memory consumption that @rouault mentioned with the 60 MB file: more than 1 GB. I understand that it is due to the quadtree.

I haven't profiled, but my bet would be on the JSON in-memory representation rather than the quadtree. At least with libjson-c, I know that the RAM consumption of a GeoJSON file can typically be 10 to 20 times larger than its file size. I don't know about the nlohmann::json library we use. If each JSON object, i.e. each coordinate value, needs a dynamic allocation, that creates tons of small objects, and pointers to them.
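For a rough sense of scale (assumed per-node costs, not measurements): a 60 MB file of ~15-byte numeric tokens holds on the order of 4 million values, so even a modest allocation overhead per value multiplies into the observed gigabyte range.

```python
# Back-of-envelope estimate with assumed per-node costs (not measured):
n_values = 60_000_000 // 15  # ~4 million JSON numbers in a 60 MB file
for bytes_per_node in (100, 200, 300):  # node + allocator + pointer overhead
    print(f"{bytes_per_node} B/node -> {n_values * bytes_per_node / 1e9:.1f} GB")
```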

@jjimenezshaw
Contributor

That "walking" algorithm is the one I had in mind. The triangulation does not have to be Delaunay, just convex. I guess that it may not work very well with degenerated triangles.

I will test the performance of the current algorithm and the idea of "remember" the last triangle used.

rouault added a commit that referenced this issue May 23, 2023
tinshift: raise maximum size of JSON file to 100 MB (fixes #3732)
rouault added a commit that referenced this issue May 23, 2023
Loading a 60 MB JSON file takes ~3 seconds on a release build, and consumes ~1.3 GB RAM according to Valgrind
rouault added a commit that referenced this issue May 24, 2023
[Backport 9.2] tinshift: raise maximum size of JSON file to 100 MB (fixes #3732)