Add option for merging output into a single file #8
Comments
The jobs parameter was actually working (at least on phyghtmap 2.23, the latest one); but due to the constraints put on parallelism (notably to handle single output), it was actually quite difficult to really use more than 2-3 CPUs at any given time. This is why I took the decision to remove single file output as:
I could probably re-introduce the single output option without adding too much complexity, but it means I probably won't bother handling parallelization in this case.
Is it still handling the node numbering correctly? Did you have any problems merging with osmconvert? If the current approach (write separate files, merge them with osmconvert, then delete the individual files) is much faster than writing to a single file up front, I think it's okay. The instructions should mention, however, how you intend the files to be merged (for my use case separate files are not an option, but if I know merging is reliable, that's fine too). Ah okay, I could not see any speed difference between jobs=2 and jobs=12 (hexacore CPU with 12 threads). jobs=1 was much slower on 2.23.
I kept the logic that avoids overlapping node and way numbering, so the resulting files should merge nicely. I haven't tried it, though.
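The idea of keeping node and way numbering from overlapping across output files can be sketched roughly as below. This is only an illustration: `assign_id_ranges`, the per-file element counts and the start ID are hypothetical names and values, not pyhgtmap's actual code.

```python
from itertools import accumulate


def assign_id_ranges(counts, start_id=1):
    """Return one (first_id, last_id) pair per output file, with no overlap.

    `counts` is a hypothetical list of how many elements (nodes or ways)
    each output file will contain; `start_id` is the first ID to hand out.
    """
    # Running totals give the number of IDs consumed before each file.
    offsets = [0, *accumulate(counts)]
    return [
        (start_id + before, start_id + before + count - 1)
        for before, count in zip(offsets, counts)
    ]


ranges = assign_id_ranges([1000, 250, 4000], start_id=10_000_000)
for first, last in ranges:
    print(first, last)
```

Because every file gets its own contiguous ID block, concatenating the files in order should yield strictly increasing IDs, which is what merge tools generally rely on.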
Well, merging with osmconvert only works for o5m files and is quite slow... Multiple pbf files cannot be merged. Writing to o5m is much slower than writing to pbf, however. This is a bit inexact, as Germany is small and I didn't look at the seconds, but roughly it takes twice as long for me: 4 minutes instead of 2 minutes for just writing to multiple pbf files. Here is my sample command for Germany (note that bash somehow has a problem with _, so I need to set a variable for it): Underline=_ So maybe in that case writing pbf directly would be faster? I don't know of any tool that is faster than osmconvert.
Following up here (instead of in the closed topic on Europe): the single-file option will be needed, because otherwise it is not possible to build a continent into a single file. Compiling Europe at a 10m interval to o5m with pyhgtmap took 2:20 hours, plus 18 minutes to merge the files with osmium. So for continents, Russia, China, Canada, the USA and maybe Brazil, if you want them in a single file, a "slow" output into a single pbf file would be needed. And yes, it's clear that writing to pbf is much faster than writing to o5m. That's running with --max-nodes-per-tile=0. I still wonder a bit about the comment "Actual writing of output file is now the most time-consuming part", because the time difference above for Europe between o5m and pbf is certainly not down to writing to the HDD. While my HDD isn't blazing fast, it can write 200MB per second (continuous) or maybe 50MB/s for less continuous writes, and it has a 512MB buffer that would speed things up even more for files under 1GB in size (server-grade HDD).
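For reference, the merge step with osmium mentioned above could be scripted along these lines. This is a sketch, not the exact command used in this thread: the file names are made up, and `build_merge_command` is a hypothetical helper. It only assembles an `osmium merge` command line (using osmium-tool's `-o` and `--overwrite` options) and does not run it.

```python
import shlex


def build_merge_command(tile_files, output_file):
    """Build an `osmium merge` command line for the given tile files."""
    if not tile_files:
        raise ValueError("need at least one input file")
    # Sort the inputs so the generated command line is reproducible.
    inputs = sorted(str(f) for f in tile_files)
    return ["osmium", "merge", *inputs, "-o", str(output_file), "--overwrite"]


# Hypothetical file names, for illustration only.
cmd = build_merge_command(["tiles/b.osm.pbf", "tiles/a.osm.pbf"], "europe.osm.pbf")
print(shlex.join(cmd))

# To actually run it (requires osmium-tool on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

As far as I know, `osmium merge` expects the objects in each input file to be sorted by type and ID, so this presumably relies on the non-overlapping numbering discussed earlier.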
I'm off for a week; I'll have a look at the single file output when I'm back. Concerning the file generation, it's not the IO taking time (it's actually using another thread with pyosmium, and is done in batches), but the computing. The pyosmium interface requires a function call per node, and for millions of nodes this takes a lot of CPU. I think in the latest profiling I did, this is now more than half of the total processing time.
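The cost of one Python-level function call per node can be illustrated with a small, self-contained timing sketch. `CountingSink` below is a stand-in written for this example and is NOT the pyosmium API; the batched method is hypothetical and only serves as a baseline for comparison.

```python
import timeit


class CountingSink:
    """Stand-in for an output writer; NOT the pyosmium API."""

    def __init__(self):
        self.total = 0

    def add_node(self, node_id, lon, lat):
        # One Python-level call per node, as a per-node interface requires.
        self.total += 1

    def add_nodes(self, nodes):
        # Hypothetical batched variant: a single call for many nodes.
        self.total += len(nodes)


# Synthetic node list (id, lon, lat); the values are arbitrary.
nodes = [(i, 8.0 + i * 1e-6, 50.0) for i in range(200_000)]


def write_per_node():
    sink = CountingSink()
    for node_id, lon, lat in nodes:
        sink.add_node(node_id, lon, lat)
    return sink.total


def write_batched():
    sink = CountingSink()
    sink.add_nodes(nodes)
    return sink.total


per_node_s = timeit.timeit(write_per_node, number=3)
batched_s = timeit.timeit(write_batched, number=3)
print(f"per-node: {per_node_s:.3f}s  batched: {batched_s:.3f}s")
```

Neither variant does any real work per node, so the difference between the two timings is essentially pure call overhead; with millions of contour nodes that overhead adds up.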
More details concerning this point:
Profiling the generation of a single output from 2 view1 local files (with
At best, parallelization could allow processing 2 in parallel with (1+3), which would be a ~25% improvement in the overall elapsed time. Not really worth the added complexity until one finds a way to optimize the actual PBF output part.
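The arithmetic behind that estimate can be sketched as follows. The step shares used here are hypothetical, chosen only so that step 2 accounts for a quarter of the sequential time; they are not the profiling figures from this thread.

```python
def overlapped_elapsed(step_times, background_step):
    """Elapsed time if one step runs fully in parallel with all the others."""
    background = step_times[background_step]
    foreground = sum(
        t for name, t in step_times.items() if name != background_step
    )
    # With perfect overlap, the slower of the two branches sets the total.
    return max(background, foreground)


# Hypothetical shares of the sequential run time (they sum to 1.0).
step_times = {"step1": 0.25, "step2": 0.25, "step3": 0.50}

sequential = sum(step_times.values())
overlapped = overlapped_elapsed(step_times, "step2")
saving = 1 - overlapped / sequential
print(f"saving: {saving:.0%}")
```

The best case is bounded by the shorter branch: overlapping step 2 with steps 1 and 3 can never save more than step 2's own share of the total.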
Thanks a lot for adding it back.
Phyghtmap used to output to a single .pbf file. Would it be possible to have the same behaviour again, instead of one output file per input file?
I know I could use osmconvert to do this - but I think it would be easier if pyhgtmap can output directly to a single file.
And yeah - great improvements overall and much faster now! Is the jobs parameter doing anything? In phyghtmap it used to be broken: I could use a value of at most 2, but no more (I haven't checked this for a long time, so maybe it was solved at some point).