Question: performance challenge with FI 2m DEM via VRT #90
Comments
Yes, that is much too slow! You could try moving a single tile file to its own folder, adding a new opentopodata dataset for that folder, and seeing how long it takes to query points within that file's bounds. That will tell us whether the slowness comes from the VRT or from the underlying data files.

That said, I'm almost certain the VRT is the issue. The 2m Finland DEM will mean about 7000 tiles in your VRT. Unfortunately gdal (and therefore opentopodata) is surprisingly slow and naive in parsing large VRTs, and I suspect that is the bottleneck. It's been on my todo-list to improve this situation (by manually parsing VRTs instead of relying on gdal, building an in-memory spatial index, or nesting VRTs) but I probably won't be getting round to that any time soon, sorry.

Because the Finnish CRS is nicely gridded, it is possible to use the files directly with opentopodata if you're willing to rename them according to the lower-left corner. If you rename the files that way and add a dataset config like

```yaml
datasets:
- name: finland-2m-renamed
  path: data/finland-2m-renamed/
  filename_tile_size: 6000  # Metres
  filename_epsg: 3067  # ETRS-TM35FIN
```

then opentopodata can find the files directly without a VRT. I know that it's not ideal to rename your files, especially when you have to open them with gdal/rasterio to find the bounds!
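Once the renamed dataset is configured, a quick way to confirm it resolves is to query a single point. A minimal sketch, assuming an instance running locally on port 5000 (the test coordinate is arbitrary):

```python
import requests

# Query one point (roughly Helsinki) from the hypothetical renamed dataset.
r = requests.get(
    "http://localhost:5000/v1/finland-2m-renamed",
    params={"locations": "60.1699,24.9384"},
)
print(r.json())  # {"results": [{"elevation": ..., ...}], "status": "OK"}
```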
@ajnisbet very much appreciate your super fast reply with detailed instructions. Your guess about the VRT being the problem was indeed spot on. I tested with a single renamed tile, queried it with 95 coordinates, and the response came back in ~360ms on my faster server. I also tested (just to be sure) the elevation values for those same 95 coords: only small differences between the VRT and no-VRT approaches (range -0.21m to 0.47m, mean 0.03m, median 0.01m, std deviation 0.11m), so good for my needs. Now I just need to build a little script to do the renaming. I somehow recall seeing something like that either in the docs or in some issue discussion? Anyway, I can get it sorted one way or another. Again, thanks for your support! It was also good to learn that GDAL is the source of the VRT performance problem. I'm using quite a bit of other data in Finnish ETRS-TM35FIN as VRTs because it is convenient - but apparently not very performant!
For others potentially facing the same challenge: a simple Python script can do the renaming, utilising the bounds information in the GeoTIFF files themselves.
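A minimal sketch of such a script, assuming rasterio is installed; the folder paths are placeholders, and the exact zero-padding of the names should be checked against the opentopodata docs:

```python
import glob
import os

import rasterio

SRC_DIR = "data/finland-2m"          # placeholder: original tiles
DST_DIR = "data/finland-2m-renamed"  # placeholder: renamed tiles

os.makedirs(DST_DIR, exist_ok=True)

for path in glob.glob(os.path.join(SRC_DIR, "*.tif")):
    # Read the lower-left corner of the tile in the dataset CRS.
    with rasterio.open(path) as src:
        left, bottom = src.bounds.left, src.bounds.bottom
    # Name each tile by its lower-left corner, e.g. a tile with
    # corner (356000, 6762000) becomes N6762000E356000.tif.
    name = f"N{int(bottom)}E{int(left)}.tif"
    os.rename(path, os.path.join(DST_DIR, name))
    print(f"{os.path.basename(path)} -> {name}")
```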
@ajnisbet After I did the renaming I still could not get it working with the whole tileset - it only worked with the single-tile test I did previously. The problem was (though I chased some other possible causes as well...) in the file naming. Eventually, with some code reading and print-outs, I figured out that my tileset did not comply with the lower left of each tile being a multiple of the "base" (in my case 6000m). Rather, for easting there was an offset of 2000, as was visible in your example tile earlier in this thread: lower left 356000 (divided by 6000 gives a quotient of 59 and a remainder of 2000). I made a quick hack to utils.py to account for this.
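That remainder is easy to verify; a one-liner using the numbers above:

```python
tile_size = 6000
easting = 356_000  # lower-left easting of the example tile
# divmod gives (quotient, remainder): the 2000 m remainder shows the
# easting grid is shifted off a plain multiple-of-6000 grid.
print(divmod(easting, tile_size))  # (59, 2000)
```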
Would a config option for a filename tile offset work here? (Again, sorry for the awkward API! This project started off just for SRTM tiles, and has been hackily extended to cover more usecases until I add proper spatial indexing.)
@ajnisbet it would, but it would have to be separate for northing and easting (for the Finnish 2m DEM there is an offset of 2000 only for the easting; northing is based on 0). And absolutely no need to feel sorry or bad about anything! You have done quite some heavy lifting to create OpenTopoData; hopefully the rest of us can contribute a bit to further enhancing it - and with OSS one can always customise / extend it as needed (BTW, very easy-to-understand code with helpful comments).
Ok, good point about the differing offsets. I also don't want to support separate offset options for each axis. And thank you!
@ajnisbet Ok, understood. Anyway, I made a fork (tile_size_offset branch) with a code change supporting a tile offset.
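The core of such a change is just snapping a query coordinate to the lower-left corner of an offset grid; a hypothetical sketch (not the actual fork code), using the values from this thread:

```python
def tile_corner(coord, tile_size=6000, offset=0):
    """Lower-left corner of the tile containing coord, on a grid
    whose origin is shifted by `offset` metres."""
    return ((coord - offset) // tile_size) * tile_size + offset

# Finnish 2m DEM: eastings offset by 2000 m, northings on the plain grid.
assert tile_corner(357123, offset=2000) == 356000
assert tile_corner(6762500) == 6762000
```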
This would probably need separate offset options for both easting and northing, and I think it would only help a small number of usecases. Closing in favour of #91
First: thanks for developing and open-sourcing OpenTopoData - great documentation, very easy to set up.
I'm using high-resolution Finland DEM data with a 2m grid. I built a VRT for the dataset, which is in the Finnish ETRS-TM35FIN CRS; tiles are 3k x 3k with LZW compression, the biggest tiles over 25MB.
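For what it's worth, the VRT covers all the tiles at once; a sketch of the equivalent using GDAL's Python bindings (paths are placeholders):

```python
import glob

from osgeo import gdal

# Build a single VRT mosaic over every tile in the dataset folder.
tiles = glob.glob("data/finland-2m/*.tif")
vrt = gdal.BuildVRT("data/finland-2m/finland-2m.vrt", tiles)
vrt = None  # close the dataset so the VRT is flushed to disk
```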
I'm feeding the elevation API coordinates from a route planner (just a little less dense than a GPX track would be), so the coordinates are close to each other and, for my test cases, many or most should fit within a single tile. My challenge is the performance: for 95 coordinates (which is my chunking threshold) I get about a 12-15 sec response time on my 6-core (Intel 8700K) / 64GB server with the data on a fast HDD; on my other test server, a low-end 2 vCPU / 2GB cloud server with a cloud SSD, I get a 26 sec response time for the same data. A single coordinate on the faster server takes about 460ms, and 10 coordinates take 1663ms.
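For concreteness, the kind of batched request being timed looks like this (host, dataset name, and coordinates are placeholders):

```python
import time

import requests

# 95 closely spaced placeholder points, pipe-separated as "lat,lon|lat,lon|...".
locations = "|".join(f"{60.0 + i * 0.0001},{24.0 + i * 0.0001}" for i in range(95))

start = time.perf_counter()
r = requests.get("http://localhost:5000/v1/finland-2m", params={"locations": locations})
r.raise_for_status()
print(f"{len(r.json()['results'])} points in {time.perf_counter() - start:.2f}s")
```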
I would be very grateful for ideas on how to improve the performance. Is the VRT / CRS / tile format the cause of the dramatically lower performance compared to the benchmark in the documentation pages?