
GDALWarp doesn't use multiple cores #778

Closed
giovannicimolin opened this issue Mar 7, 2018 · 24 comments

@giovannicimolin
Contributor

giovannicimolin commented Mar 7, 2018

Hey!

I'm running a very big project to identify bottlenecks on opendronemap and I've found out that gdalwarp is not running on multiple cores.

Output line from OpenDroneMap:
[DEBUG] running gdalwarp -cutline /code/odm_georeferencing/odm_georeferenced_model.bounds.shp -crop_to_cutline -co NUM_THREADS=ALL_CPUS -co BIGTIFF=IF_SAFER -co BLOCKYSIZE=512 -co COMPRESS=DEFLATE -co BLOCKXSIZE=512 -co TILED=YES -co PREDICTOR=2 /code/odm_orthophoto/odm_orthophoto.original.tif /code/odm_orthophoto/odm_orthophoto.tif
htop screenshot:

Isn't -co NUM_THREADS=ALL_CPUS supposed to make it run on multiple threads?

@giovannicimolin
Contributor Author

giovannicimolin commented Mar 7, 2018

Found something here: http://osgeo-org.1560.x6.nabble.com/gdal-dev-gdalwarp-and-gdaladdo-in-multi-threaded-mode-td5252818.html

Apparently there are two gdalwarp parameters to speed up calculations using multiple processors/cores:
-multi
and
-wo NUM_THREADS=ALL_CPUS

@giovannicimolin
Contributor Author

This may not be a bug: if this task is I/O bound there's not much we can do besides using faster disks...

@giovannicimolin
Contributor Author

The task doesn't seem to be I/O bound; it's only showing read peaks of 4 to 10 MB/s on an AWS instance with instance store (500 MB/s nominal speed).
[disk I/O screenshot]

@dakotabenjamin
Member

I find myself constantly shaking my fists at GDAL. The code is found here:
https://github.com/OpenDroneMap/OpenDroneMap/blob/master/opendm/cropper.py#L46

Can you try adding -multi like below:

            run('gdalwarp -cutline {shapefile_path} '
                '-crop_to_cutline '
                '-multi '
                '{options} '
                '{geotiffInput} '
                '{geotiffOutput} '.format(**kwargs))

@giovannicimolin
Contributor Author

giovannicimolin commented Mar 7, 2018

Neither -multi, -co NUM_THREADS=ALL_CPUS, nor -wo NUM_THREADS=ALL_CPUS seems to make it run on multiple cores.

@dakotabenjamin
Member

OK then. @pierotofy contributed this code; perhaps he can provide some insight.

@pierotofy
Member

Use multithreaded warping implementation. Two threads will be used to process chunks of image and perform input/output operation simultaneously. Note that computation is not multithreaded itself. To do that, you can use the -wo NUM_THREADS=val/ALL_CPUS option, which can be combined with -multi

http://www.gdal.org/gdalwarp.html

So try to pass both?

@pierotofy
Member

Also what version of gdalwarp are we running?

@giovannicimolin
Contributor Author

giovannicimolin commented Mar 9, 2018

I've tried passing both options too, with no success.
We're using GDAL 2.1.3.

@pierotofy
Member

Did some digging, what happens if you pass:

-co GDAL_NUM_THREADS=ALL_CPUS ?

-wo --> options passed to warp algorithm (which doesn't affect speed here, because we don't warp anything, we are just cropping)
-co --> options passed to output driver, and GDAL_NUM_THREADS is set to do compression on the main thread (slow)
-multi --> Enables multithreaded warping implementation (uses two threads for input and output), but again, since we don't do warping, I don't think this helps us.

If it works, could you open a PR?
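Purely as an illustration of the breakdown above (the assembled list and the file paths are placeholders, not ODM code), the three kinds of options land in a gdalwarp invocation like this:

```python
# Illustrative sketch: where each kind of option goes in a gdalwarp call.
cmd = [
    'gdalwarp',
    '-multi',                       # warper: overlap I/O and computation (two threads)
    '-wo', 'NUM_THREADS=ALL_CPUS',  # warp-operation option: threads for the warp kernel
    '-co', 'NUM_THREADS=ALL_CPUS',  # creation option: threads for GTiff DEFLATE compression
    'input.tif', 'output.tif',      # placeholder paths
]
# subprocess.check_call(cmd) would run it; omitted here since the paths are placeholders.
```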

@pierotofy
Member

pierotofy commented Mar 9, 2018

Would be interesting to also see if performance increases when passing -co GTIFF_VIRTUAL_MEM_IO=IF_ENOUGH_RAM and -co GTIFF_DIRECT_IO=YES. http://www.gdal.org/frmt_gtiff.html

@giovannicimolin
Contributor Author

giovannicimolin commented Mar 9, 2018

For -co GTIFF_VIRTUAL_MEM_IO=IF_ENOUGH_RAM I get:
Warning 6: driver GTiff does not support creation option GTIFF_VIRTUAL_MEM_IO

For -co GTIFF_DIRECT_IO=YES I get:
Warning 6: driver GTiff does not support creation option GTIFF_DIRECT_IO

Also this -co GDAL_NUM_THREADS=ALL_CPUS doesn't work:
Warning 6: driver GTiff does not support creation option GDAL_NUM_THREADS

@pierotofy
Member

pierotofy commented Mar 9, 2018

If passing:

-multi -co NUM_THREADS=ALL_CPUS -wo NUM_THREADS=ALL_CPUS -oo NUM_THREADS=ALL_CPUS -doo NUM_THREADS=ALL_CPUS

Doesn't improve performance, then I'm not sure what the cause is. When I pass both -co and -wo I see full usage of all my cores. 😕

@pierotofy
Member

Running GDAL 2.2.3, released 2017/11/20 on my machine.

@giovannicimolin
Contributor Author

giovannicimolin commented Mar 9, 2018

GDAL Version:
➜ test gdalwarp --version
GDAL 2.2.3, released 2017/11/20

Command used:
gdalwarp -cutline odm_georeferenced_model.bounds.shp -crop_to_cutline -multi -co NUM_THREADS=ALL_CPUS -wo NUM_THREADS=ALL_CPUS -oo NUM_THREADS=ALL_CPUS -doo NUM_THREADS=ALL_CPUS -co BIGTIFF=IF_SAFER -co BLOCKYSIZE=512 -co COMPRESS=DEFLATE -co BLOCKXSIZE=512 -co TILED=YES -co PREDICTOR=2 -co GTIFF_VIRTUAL_MEM_IO=IF_ENOUGH_RAM -co GTIFF_DIRECT_IO=YES odm_orthophoto.original.tif odm_orthophoto.tif

Runs only on 1 core.
GDAL was built from SVN branch 2.2.

[htop screenshot: gdalwarp running on one core]

@pierotofy
Member

Thanks for the screenshots/info.

Mm, could you share your odm_georeferenced_model.bounds.shp and odm_orthophoto.original.tif file? Trying to understand why I'm observing different results 😄

@giovannicimolin
Contributor Author

I've sent you the requested files on a private channel on Gitter, as I can't make them publicly available.

@pierotofy
Member

pierotofy commented Mar 13, 2018

So, the performance is almost certainly I/O and memory bound based on my observations. This is especially true for larger GeoTIFFs (which is what you are testing with).

Options --> Time for 1 tick of processing

-wo NUM_THREADS=ALL_CPUS --> 2:59
-co NUM_THREADS=ALL_CPUS --> 3:05
-co NUM_THREADS=ALL_CPUS -wo NUM_THREADS=ALL_CPUS --> 3:06
-multi -co NUM_THREADS=ALL_CPUS -wo NUM_THREADS=ALL_CPUS --> 3:07
--config GDAL_CACHEMAX 500 -wm 500 --> 1:34 (makes sense, since it loads more blocks into memory)
-multi -co NUM_THREADS=ALL_CPUS -wo NUM_THREADS=ALL_CPUS --config GDAL_CACHEMAX 500 -wm 500 --> 1:34 (no improvements here, thus memory bound)
--config GDAL_CACHEMAX 3000 -wm 3000 --> 0:33 (3 GB of RAM required here however)
--config GDAL_CACHEMAX 9000 -wm 9000 --> 0:21 (9 GB, enough to load your GeoTIFF in memory all at once)

So the bottom line is that I don't think (hope somebody proves me wrong) there's much to be gained by adding more cores (this might have been true if we were doing warping, but since we're just cropping, I suspect most of the time is spent just doing I/O).

We should tweak GDAL_CACHEMAX and -wm, but we need to be careful: choosing too high a value will make the program fail (bad). Perhaps we can use Python to query the available memory, divide by 3, and use that.

PR for this would be welcome if anyone wants to take a stab at it.
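A minimal sketch of the "query the available memory, divide by 3" idea, assuming a POSIX system (as in ODM's Linux environment); the helper names are hypothetical, not existing ODM code:

```python
import os

def gdal_memory_mb(total_mb=None, divisor=3, floor_mb=100):
    """Return a conservative memory budget (in MB) for GDAL_CACHEMAX and -wm."""
    if total_mb is None:
        # POSIX-only physical RAM query; assumes Linux, where these sysconf names exist.
        page = os.sysconf('SC_PAGE_SIZE')
        pages = os.sysconf('SC_PHYS_PAGES')
        total_mb = page * pages // (1024 * 1024)
    # Never go below a small floor so tiny machines still get a usable cache.
    return max(floor_mb, total_mb // divisor)

def cache_flags(value_mb):
    # Flag string to splice into the gdalwarp invocation in opendm/cropper.py.
    return '--config GDAL_CACHEMAX {v} -wm {v}'.format(v=value_mb)
```

For example, on the 9 GB case benchmarked above, `gdal_memory_mb(total_mb=9000)` yields a 3000 MB budget.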

@giovannicimolin
Contributor Author

giovannicimolin commented Mar 15, 2018

--config GDAL_CACHEMAX $VALUE -wm $VALUE
Where $VALUE is half the free memory.

Can I put these parameters on all gdal_options?
I believe it will improve overall performance on all GDAL operations if the parameters are supported.

PR Incoming... 😄

@pierotofy
Member

I would recommend using "X%" for GDAL_CACHEMAX: "This option controls the default GDAL raster block cache size. If its value is small (less than 100000), it is assumed to be measured in megabytes, otherwise in bytes. Starting with GDAL 2.1, the value can be set to "X%" to mean X% of the usable physical RAM. Note that this value is only consulted the first time the cache size is requested, overriding the initial default (40 MB up to GDAL 2.0, 5% of the usable physical RAM starting with GDAL 2.1)." https://trac.osgeo.org/gdal/wiki/ConfigOptions

-wm requires more thought. https://trac.osgeo.org/gdal/wiki/UserDocs/GdalWarp

"The -wm flag affects the warping algorithm. The warper will total up the memory required to hold the input and output image arrays and any auxiliary masking arrays and if they are larger than the "warp memory" allowed it will subdivide the chunk into smaller chunks and try again.

If the -wm value is very small there is some extra overhead in doing many small chunks so setting it larger is better but it is a matter of diminishing returns."

So adding more is not necessarily going to improve performance. I wouldn't add it to all commands unless you can measure a tangible improvement in performance.
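A sketch of the percentage form quoted above (GDAL >= 2.1); the helper name is hypothetical, and -wm is left out since, per the caveat, it needs an absolute value and separate measurement:

```python
def cachemax_args(percent=50):
    # "X%" means X% of usable physical RAM (GDAL >= 2.1), avoiding an absolute guess
    # that could exceed what the machine actually has.
    return ['--config', 'GDAL_CACHEMAX', '{}%'.format(percent)]

# Placeholder paths; shown only to illustrate where the flags go.
cmd = ['gdalwarp'] + cachemax_args(50) + ['in.tif', 'out.tif']
```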

@smathermather
Contributor

Perhaps related, but are we using -co "BLOCKXSIZE=value" -co "BLOCKYSIZE=value" when we create the initial tif? This could help in ensuring we have small chunks to stream through further operations, and might help with memory bound operations. I usually set it to -co "BLOCKXSIZE=512" -co "BLOCKYSIZE=512" but have set it as high as 4096.

@smathermather
Contributor

I should say, I've never tested its effect on future GDAL operations, but it'd be good practice to use BLOCKXSIZE and BLOCKYSIZE for serving the data to web services, viewing in QGIS, etc.
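A small sketch (hypothetical helper, not ODM code) of the tiling creation options described above, to splice into whatever command first writes the GeoTIFF; 512 matches what ODM already passes to gdalwarp, but the value is just an example:

```python
def tiling_creation_options(block=512):
    # GTiff creation options for internally tiled output, so later GDAL
    # operations can stream the raster in small block-sized chunks.
    return ['-co', 'TILED=YES',
            '-co', 'BLOCKXSIZE={}'.format(block),
            '-co', 'BLOCKYSIZE={}'.format(block)]
```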

@sbonaime
Contributor

I just compared gdalwarp (GDAL 3.2.2, released 2021/03/05) with and without the -multi option, using NUM_THREADS=10, on some test data. I can see a 20% time reduction for this step.

10 CPUs:
84.96s user 15.52s system 218% cpu 45.997 total
83.63s user 16.18s system 212% cpu 46.973 total

-multi and 10 CPUs:
79.12s user 15.75s system 247% cpu 38.315 total
78.46s user 15.73s system 237% cpu 39.653 total

See also this thread:
https://gis.stackexchange.com/questions/239101/does-gdal-support-parallel-processing

@pierotofy Can you add this option?
Thanks

@pierotofy
Member

Thanks @sbonaime, but did you test these benchmarks within ODM?
