ogr2ogr: use Arrow interface in reading and writing when possible #8544

Merged
merged 1 commit into from
Oct 12, 2023

@rouault rouault commented Oct 11, 2023

That is when:

  • the input driver reports OLCFastGetArrowStream
  • and the output driver reports OLCFastWriteArrowBatch
  • and the input ArrowSchema is accepted by the target layer
  • and there are basically no ogr2ogr options (other than -dsco, -lco, -limit, -where and -spat).
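
The four conditions above can be read as a single eligibility check. The following is only a minimal sketch of that reading, not the actual ogr2ogr implementation: the function name, the argument shapes, and the idea of passing capabilities as sets of strings are all hypothetical, and the allowed-option set is just the list quoted above (the description says "basically", so the real rule may be looser or stricter).

```python
# Options that, per the description above, do not prevent the Arrow path.
# Assumption: this set is copied from the PR text, not from the ogr2ogr source.
ARROW_COMPATIBLE_OPTIONS = {"-dsco", "-lco", "-limit", "-where", "-spat"}


def can_use_arrow_path(src_caps, dst_caps, schema_accepted, options):
    """Hypothetical helper: True when all four listed conditions hold.

    src_caps / dst_caps: sets of capability names reported by the drivers.
    schema_accepted: whether the target layer accepts the input ArrowSchema.
    options: the ogr2ogr option flags given on the command line.
    """
    return bool(
        "OLCFastGetArrowStream" in src_caps
        and "OLCFastWriteArrowBatch" in dst_caps
        and schema_accepted
        and set(options) <= ARROW_COMPATIBLE_OPTIONS
    )


if __name__ == "__main__":
    src = {"OLCFastGetArrowStream"}
    dst = {"OLCFastWriteArrowBatch"}
    # Only compatible options: the Arrow path is eligible.
    print(can_use_arrow_path(src, dst, True, ["-where", "-lco"]))
    # Any other option (here -t_srs) falls back to the classic OGRFeature path.
    print(can_use_arrow_path(src, dst, True, ["-t_srs"]))
```

In this sketch, any option outside the listed set simply disables the Arrow path, which matches the fallback to the classic OGRFeature strategy described here.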

Note that with -where or -spat, the Arrow path can be less efficient than
the classic OGRFeature strategy when the source is Parquet and a
lot of features are discarded.

With that, Parquet->Parquet or GPKG->Parquet translation can be up to 3x faster.

$ time ogr2ogr out.parquet nz-building-outlines.parquet --config OGR2OGR_USE_ARROW_API NO

real 0m11,246s
user 0m10,934s
sys 0m0,974s

$ time ogr2ogr out.parquet nz-building-outlines.parquet

real 0m4,311s
user 0m3,968s
sys 0m0,889s

$ time ogr2ogr out.parquet nz-building-outlines.gpkg --config OGR2OGR_USE_ARROW_API NO

real 0m12,120s
user 0m11,360s
sys 0m0,764s

$ time ogr2ogr out.parquet nz-building-outlines.gpkg

real 0m3,853s
user 0m5,053s
sys 0m1,028s
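
To relate the "up to 3x" figure to the wall-clock ("real") times above, here is a small sketch that parses those values and computes the ratios. The helper name `parse_time` is hypothetical; the timings are copied from the runs above, which use a comma as the decimal separator.

```python
import re


def parse_time(text):
    """Convert a shell `time` value such as '0m11,246s' to seconds.

    Accepts either ',' or '.' as the decimal separator.
    """
    match = re.fullmatch(r"(\d+)m(\d+)[.,](\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognized time value: {text!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


# (classic OGRFeature run, Arrow run) "real" times from the output above.
RUNS = {
    "Parquet->Parquet": ("0m11,246s", "0m4,311s"),
    "GPKG->Parquet": ("0m12,120s", "0m3,853s"),
}


def speedups(runs):
    """Return {name: classic_seconds / arrow_seconds} for each pair of runs."""
    return {
        name: parse_time(classic) / parse_time(arrow)
        for name, (classic, arrow) in runs.items()
    }


if __name__ == "__main__":
    for name, ratio in speedups(RUNS).items():
        print(f"{name}: {ratio:.2f}x faster")
```

On these numbers the ratios come out to roughly 2.6x for Parquet->Parquet and 3.1x for GPKG->Parquet.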

@PostholerCom

This is huge! Thanks Even!

@rouault rouault force-pushed the ogr2ogr_arrow branch 2 times, most recently from 825568b to db19301 Compare October 11, 2023 21:50
@rouault rouault merged commit 2142523 into OSGeo:master Oct 12, 2023
30 checks passed