
GDALWarp doesn't use multiple cores #778

Closed
giovannicimolin opened this issue Mar 7, 2018 · 24 comments

@giovannicimolin
Contributor

giovannicimolin commented Mar 7, 2018

Hey!

I'm running a very big project to identify bottlenecks on opendronemap and I've found out that gdalwarp is not running on multiple cores.

Output line from OpenDroneMap:
[DEBUG] running gdalwarp -cutline /code/odm_georeferencing/odm_georeferenced_model.bounds.shp -crop_to_cutline -co NUM_THREADS=ALL_CPUS -co BIGTIFF=IF_SAFER -co BLOCKYSIZE=512 -co COMPRESS=DEFLATE -co BLOCKXSIZE=512 -co TILED=YES -co PREDICTOR=2 /code/odm_orthophoto/odm_orthophoto.original.tif /code/odm_orthophoto/odm_orthophoto.tif
htop screenshot:

Isn't -co NUM_THREADS=ALL_CPUS supposed to make it run on multiple threads?

@giovannicimolin
Contributor Author

giovannicimolin commented Mar 7, 2018

Found something here: http://osgeo-org.1560.x6.nabble.com/gdal-dev-gdalwarp-and-gdaladdo-in-multi-threaded-mode-td5252818.html

Apparently there are two gdalwarp parameters to speed up calculations using multiple processors/cores:
-multi
and
-wo NUM_THREADS=ALL_CPUS

@giovannicimolin
Contributor Author

This may not be a bug: if this task is I/O bound there's not much we can do besides using faster disks...

@giovannicimolin
Contributor Author

The task doesn't seem to be I/O bound; it's only showing read peaks of 4 to 10 MB/s on an AWS instance with instance store (500 MB/s nominal speed).
[disk I/O screenshot]

@dakotabenjamin
Member

I find myself constantly shaking my fists at GDAL. The code is found here:
https://github.com/OpenDroneMap/OpenDroneMap/blob/master/opendm/cropper.py#L46

Can you try adding -multi like below:

            run('gdalwarp -cutline {shapefile_path} '
                '-crop_to_cutline '
                '-multi '
                '{options} '
                '{geotiffInput} '
                '{geotiffOutput} '.format(**kwargs))

@giovannicimolin
Contributor Author

giovannicimolin commented Mar 7, 2018

Neither -multi, -co NUM_THREADS=ALL_CPUS, nor -wo NUM_THREADS=ALL_CPUS seems to make it run on multiple cores.

@dakotabenjamin
Member

OK then. @pierotofy contributed this code; perhaps he can provide some insight.

@pierotofy
Member

Use multithreaded warping implementation. Two threads will be used to process chunks of image and perform input/output operation simultaneously. Note that computation is not multithreaded itself. To do that, you can use the -wo NUM_THREADS=val/ALL_CPUS option, which can be combined with -multi

http://www.gdal.org/gdalwarp.html

So try to pass both?

@pierotofy
Member

Also what version of gdalwarp are we running?

@giovannicimolin
Contributor Author

giovannicimolin commented Mar 9, 2018

I've tried passing both options too, with no success.
We're using GDAL 2.1.3.

@pierotofy
Member

Did some digging, what happens if you pass:

-co GDAL_NUM_THREADS=ALL_CPUS ?

-wo --> options passed to warp algorithm (which doesn't affect speed here, because we don't warp anything, we are just cropping)
-co --> options passed to output driver, and GDAL_NUM_THREADS is set to do compression on the main thread (slow)
-multi --> Enables multithreaded warping implementation (uses two threads for input and output), but again, since we don't do warping, I don't think this helps us.

If it works, could you open a PR?
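Purely as an illustration of the breakdown above (the assembled list and the file paths are placeholders, not ODM code), the three kinds of options land in a gdalwarp invocation like this:

```python
# Illustrative sketch: where each kind of option goes in a gdalwarp call.
cmd = [
    'gdalwarp',
    '-multi',                       # warper: overlap I/O and computation (two threads)
    '-wo', 'NUM_THREADS=ALL_CPUS',  # warp-operation option: threads for the warp kernel
    '-co', 'NUM_THREADS=ALL_CPUS',  # creation option: threads for GTiff DEFLATE compression
    'input.tif', 'output.tif',      # placeholder paths
]
# subprocess.check_call(cmd) would run it; omitted here since the paths are placeholders.
```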

@pierotofy
Member

pierotofy commented Mar 9, 2018

Would be interesting to also see if performance increases when passing -co GTIFF_VIRTUAL_MEM_IO=IF_ENOUGH_RAM and -co GTIFF_DIRECT_IO=YES. http://www.gdal.org/frmt_gtiff.html

@giovannicimolin
Contributor Author

giovannicimolin commented Mar 9, 2018

For -co GTIFF_VIRTUAL_MEM_IO=IF_ENOUGH_RAM I get:
Warning 6: driver GTiff does not support creation option GTIFF_VIRTUAL_MEM_IO

For -co GTIFF_DIRECT_IO=YES I get:
Warning 6: driver GTiff does not support creation option GTIFF_DIRECT_IO

Also this -co GDAL_NUM_THREADS=ALL_CPUS doesn't work:
Warning 6: driver GTiff does not support creation option GDAL_NUM_THREADS

@pierotofy
Member

pierotofy commented Mar 9, 2018

If passing:

-multi -co NUM_THREADS=ALL_CPUS -wo NUM_THREADS=ALL_CPUS -oo NUM_THREADS=ALL_CPUS -doo NUM_THREADS=ALL_CPUS

Doesn't improve performance, then I'm not sure what the cause is. When I pass both -co and -wo I see full usage of all my cores. 😕

@pierotofy
Member

Running GDAL 2.2.3, released 2017/11/20 on my machine.

@giovannicimolin
Contributor Author

giovannicimolin commented Mar 9, 2018

GDAL Version:
➜ test gdalwarp --version
GDAL 2.2.3, released 2017/11/20

Command used:
gdalwarp -cutline odm_georeferenced_model.bounds.shp -crop_to_cutline -multi -co NUM_THREADS=ALL_CPUS -wo NUM_THREADS=ALL_CPUS -oo NUM_THREADS=ALL_CPUS -doo NUM_THREADS=ALL_CPUS -co BIGTIFF=IF_SAFER -co BLOCKYSIZE=512 -co COMPRESS=DEFLATE -co BLOCKXSIZE=512 -co TILED=YES -co PREDICTOR=2 -co GTIFF_VIRTUAL_MEM_IO=IF_ENOUGH_RAM -co GTIFF_DIRECT_IO=YES odm_orthophoto.original.tif odm_orthophoto.tif

Runs only on 1 core.
GDAL was built from SVN branch 2.2.

[htop screenshot: gdalwarp running on one core]

@pierotofy
Member

Thanks for the screenshots/info.

Mm, could you share your odm_georeferenced_model.bounds.shp and odm_orthophoto.original.tif file? Trying to understand why I'm observing different results 😄

@giovannicimolin
Contributor Author

I've sent you the requested files on a private channel on Gitter, as I can't make them publicly available.

@pierotofy
Member

pierotofy commented Mar 13, 2018

So, the performance is almost certainly I/O and memory bound based on my observations. This is especially true for larger GeoTIFFs (which is what you are testing with).

Options --> Time for 1 tick of processing

-wo NUM_THREADS=ALL_CPUS --> 2:59
-co NUM_THREADS=ALL_CPUS --> 3:05
-co NUM_THREADS=ALL_CPUS -wo NUM_THREADS=ALL_CPUS --> 3:06
-multi -co NUM_THREADS=ALL_CPUS -wo NUM_THREADS=ALL_CPUS --> 3:07
--config GDAL_CACHEMAX 500 -wm 500 --> 1:34 (makes sense, since it loads more blocks into memory)
-multi -co NUM_THREADS=ALL_CPUS -wo NUM_THREADS=ALL_CPUS --config GDAL_CACHEMAX 500 -wm 500 --> 1:34 (no improvements here, thus memory bound)
--config GDAL_CACHEMAX 3000 -wm 3000 --> 0:33 (3 GB of RAM required here however)
--config GDAL_CACHEMAX 9000 -wm 9000 --> 0:21 (9 GB, enough to load your GeoTIFF in memory all at once)

So the bottom line is that I don't think (hope somebody proves me wrong) there's much to be gained by adding more cores (this might have been true if we were doing warping, but since we're just cropping, I suspect most of the time is spent just doing I/O).

We should tweak GDAL_CACHEMAX and -wm, but we need to be careful: choosing too high a value will make the program fail (bad). Perhaps we can use Python to query the available memory, divide by 3, and use that.

PR for this would be welcome if anyone wants to take a stab at it.
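A minimal sketch of the "query the available memory, divide by 3" idea, assuming a POSIX system (as in ODM's Linux environment); the helper names are hypothetical, not existing ODM code:

```python
import os

def gdal_memory_mb(total_mb=None, divisor=3, floor_mb=100):
    """Return a conservative memory budget (in MB) for GDAL_CACHEMAX and -wm."""
    if total_mb is None:
        # POSIX-only physical RAM query; assumes Linux, where these sysconf names exist.
        page = os.sysconf('SC_PAGE_SIZE')
        pages = os.sysconf('SC_PHYS_PAGES')
        total_mb = page * pages // (1024 * 1024)
    # Never go below a small floor so tiny machines still get a usable cache.
    return max(floor_mb, total_mb // divisor)

def cache_flags(value_mb):
    # Flag string to splice into the gdalwarp invocation in opendm/cropper.py.
    return '--config GDAL_CACHEMAX {v} -wm {v}'.format(v=value_mb)
```

For example, on the 9 GB case benchmarked above, `gdal_memory_mb(total_mb=9000)` yields a 3000 MB budget.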

@giovannicimolin
Contributor Author

giovannicimolin commented Mar 15, 2018

--config GDAL_CACHEMAX $VALUE -wm $VALUE
Where $VALUE is half the free memory.

Can I put these parameters on all gdal_options?
I believe it will improve overall performance on all GDAL operations if the parameters are supported.

PR Incoming... 😄

@pierotofy
Member

I would recommend using "X%" for GDAL_CACHEMAX: "This option controls the default GDAL raster block cache size. If its value is small (less than 100000), it is assumed to be measured in megabytes, otherwise in bytes. Starting with GDAL 2.1, the value can be set to "X%" to mean X% of the usable physical RAM. Note that this value is only consulted the first time the cache size is requested, overriding the initial default (40 MB up to GDAL 2.0, 5% of the usable physical RAM starting with GDAL 2.1)." https://trac.osgeo.org/gdal/wiki/ConfigOptions

-wm requires more thought. https://trac.osgeo.org/gdal/wiki/UserDocs/GdalWarp

"The -wm flag affects the warping algorithm. The warper will total up the memory required to hold the input and output image arrays and any auxiliary masking arrays and if they are larger than the "warp memory" allowed it will subdivide the chunk into smaller chunks and try again.

If the -wm value is very small there is some extra overhead in doing many small chunks so setting it larger is better but it is a matter of diminishing returns."

So adding more is not necessarily going to improve performance. I wouldn't add it to all commands unless you can measure a tangible improvement in performance.
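A sketch of the percentage form quoted above (GDAL >= 2.1); the helper name is hypothetical, and -wm is left out since, per the caveat, it needs an absolute value and separate measurement:

```python
def cachemax_args(percent=50):
    # "X%" means X% of usable physical RAM (GDAL >= 2.1), avoiding an absolute guess
    # that could exceed what the machine actually has.
    return ['--config', 'GDAL_CACHEMAX', '{}%'.format(percent)]

# Placeholder paths; shown only to illustrate where the flags go.
cmd = ['gdalwarp'] + cachemax_args(50) + ['in.tif', 'out.tif']
```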

@smathermather
Contributor

Perhaps related, but are we using -co "BLOCKXSIZE=value" -co "BLOCKYSIZE=value" when we create the initial tif? This could help in ensuring we have small chunks to stream through further operations, and might help with memory bound operations. I usually set it to -co "BLOCKXSIZE=512" -co "BLOCKYSIZE=512" but have set it as high as 4096.

@smathermather
Contributor

I should say, I've never tested its effect on future GDAL operations, but it'd be good practice to use BLOCKXSIZE and BLOCKYSIZE for serving the data to web services, viewing in QGIS, etc.
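A small sketch (hypothetical helper, not ODM code) of the tiling creation options described above, to splice into whatever command first writes the GeoTIFF; 512 matches what ODM already passes to gdalwarp, but the value is just an example:

```python
def tiling_creation_options(block=512):
    # GTiff creation options for internally tiled output, so later GDAL
    # operations can stream the raster in small block-sized chunks.
    return ['-co', 'TILED=YES',
            '-co', 'BLOCKXSIZE={}'.format(block),
            '-co', 'BLOCKYSIZE={}'.format(block)]
```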

@sbonaime
Contributor

I just compared gdalwarp (GDAL 3.2.2, released 2021/03/05) with and without the -multi option, using NUM_THREADS=10, on some test data. I can see a 20% time reduction for this step.

10 CPUs:
84.96s user 15.52s system 218% cpu 45.997 total
83.63s user 16.18s system 212% cpu 46.973 total

-multi and 10 CPUs:
79.12s user 15.75s system 247% cpu 38.315 total
78.46s user 15.73s system 237% cpu 39.653 total

See also this thread:
https://gis.stackexchange.com/questions/239101/does-gdal-support-parallel-processing

@pierotofy Can you add this option?
Thanks

@pierotofy
Member

Thanks @sbonaime, but did you test these benchmarks within ODM?
