Issues with OSM Driver and interleaved error message and output #2100

Closed
Robinlovelace opened this issue Dec 12, 2019 · 14 comments

Comments

@Robinlovelace

Dear GDAL dev team, I think there is an issue with the OSM driver. The error message it produces does not seem useful: it says to use OGR_INTERLEAVED_READING=YES even when that option is specified. When I use INTERLEAVED_READING=YES instead, which I think is correctly documented at https://gdal.org/drivers/vector/osm.html#interleaved-reading, the results seem to contain no features.

Please could you try to reproduce the (hopefully reproducible) example below?

# Aim: demonstrate issues with the OSM driver

gdalinfo --version
# $ GDAL 2.4.2, released 2019/06/28

# get data
wget http://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf

ogrinfo -oo OGR_INTERLEAVED_READING=YES *pbf lines > x2
# $ Warning 6: driver OSM does not support open option OGR_INTERLEAVED_READING
# $ ERROR 1: Too many features have accumulated in points layer. Use OGR_INTERLEAVED_READING=YES mode
ogrinfo -oo INTERLEAVED_READING=YES *pbf lines > x3
wc x3
# $  27  65 740 x3 # contains no features

This may be related to #1785.

See here for use case and some additional tests from within R: r-spatial/sf#1213

@jratike80
Collaborator

The name of the open option is not the same as the name of the config option. It is documented but you must be awake when reading.

This works for me with GDAL 3.1.0dev

ogrinfo -oo INTERLEAVED_READING=YES greater-london-latest.osm.pbf
INFO: Open of `greater-london-latest.osm.pbf'
      using driver `OSM' successful.
1: points (Point)
2: lines (Line String)
3: multilinestrings (Multi Line String)
4: multipolygons (Multi Polygon)
5: other_relations (Geometry Collection)
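
The naming difference can be made concrete with a short sketch. It assumes the London extract from the original post is in the working directory (the PBF variable is a placeholder, not something from the thread), and it only prints the two commands unless ogrinfo and the file are both available.

```shell
# Sketch: the same OSM driver setting passed in two different ways.
# PBF is a placeholder path (assumption); point it at a real .osm.pbf file.
PBF="${PBF:-greater-london-latest.osm.pbf}"

# Open option: passed with -oo and named INTERLEAVED_READING (no OGR_ prefix).
open_option_args="-oo INTERLEAVED_READING=YES"

# Configuration option: passed with --config and named OGR_INTERLEAVED_READING.
config_option_args="--config OGR_INTERLEAVED_READING YES"

# Show both command lines.
echo "ogrinfo $open_option_args $PBF"
echo "ogrinfo $config_option_args $PBF"

# Run them only when ogrinfo and the input file are actually present.
if command -v ogrinfo >/dev/null 2>&1 && [ -f "$PBF" ]; then
  # The *_args variables are left unquoted on purpose so that each word
  # becomes a separate argument.
  ogrinfo $open_option_args "$PBF"
  ogrinfo $config_option_args "$PBF"
fi
```

Mixing the two spellings, as in -oo OGR_INTERLEAVED_READING=YES, is what produces the "does not support open option" warning shown in the original post.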

@Robinlovelace
Author

Hi @jratike80 I am awake, but new to working with GDAL at the command line, apologies. I think the commit above makes the message clearer, many thanks for the quick fix @rouault.

The second part of the question was related to the output, which for me is:

ogrinfo -oo INTERLEAVED_READING=YES *pbf lines
INFO: Open of `greater-london-latest.osm.pbf'
      using driver `OSM' successful.

Layer name: lines
Geometry: Line String
Feature Count: -1
Extent: (-0.511482, 51.285540) - (0.335437, 51.693440)
Layer SRS WKT:
GEOGCS["WGS 84",
    DATUM["WGS_1984",
        SPHEROID["WGS 84",6378137,298.257223563,
            AUTHORITY["EPSG","7030"]],
        AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.0174532925199433,
        AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4326"]]
osm_id: String (0.0)
name: String (0.0)
highway: String (0.0)
waterway: String (0.0)
aerialway: String (0.0)
barrier: String (0.0)
man_made: String (0.0)
z_order: Integer (0.0)
other_tags: String (0.0)

As I say I'm new to this, but would you expect to see more data? Running

ogrinfo -oo OGR_INTERLEAVED_READING=YES *pbf lines > x2

This seems to generate 10+ MB of data but yields the error messages noted above. The question is: how do I make GDAL output the data in interleaved mode? Apologies for a beginner's question; I am hoping to learn.

@Robinlovelace
Author

P.s. I can also reproduce the output in your reproducible example @jratike80 on my older version of GDAL:

 ogrinfo -oo INTERLEAVED_READING=YES greater-london-latest.osm.pbf
INFO: Open of `greater-london-latest.osm.pbf'
      using driver `OSM' successful.
1: points (Point)
2: lines (Line String)
3: multilinestrings (Multi Line String)
4: multipolygons (Multi Polygon)
5: other_relations (Geometry Collection)

@jratike80
Collaborator

I was blind to the difference between OGR_INTERLEAVED_READING and INTERLEAVED_READING myself for a long time.

I don't know why ogrinfo does not list the data. Perhaps Feature Count: -1 means that it is too ineffective to show full ogrinfo output from .pbf format.

You can pretty fast convert all data (remembering the importance of osmconf.ini) into geopackage and read ogrinfo from that

ogr2ogr -f gpkg -oo INTERLEAVED_READING=YES osm.gpkg greater-london-latest.osm.pbf

@Robinlovelace
Author

Thanks for the reply @jratike80, but it still does not resolve the issue: how to use GDAL to read in a medium-sized dataset using the interleaved mode that is needed for the result to contain all the features in large files? The commands that one would expect to work from the documentation yield an empty object. Is this not a bug, or was it designed to work like this?

Very useful to see how to convert the file into a GeoPackage file, but that would not be computationally efficient in cases when you need to read in large files.

> Perhaps Feature Count: -1 means that it is too ineffective to show full ogrinfo output from .pbf format.

I'm not quite sure what this means, but should the output of ogrinfo differ depending on the user options specified for reading in the data?

> You can pretty fast convert all data (remembering the importance of osmconf.ini) into geopackage and read ogrinfo from that

Can you do that on a per layer basis?

Nudge @edzer who developed the sf package which has an issue that I think can only be resolved if GDAL is able to read large pbf files.

@rouault
Member

rouault commented Dec 14, 2019

You can trigger random layer reading with the -rl switch of ogrinfo: https://gdal.org/programs/ogrinfo.html#cmdoption-ogrinfo-rl

$ ogrinfo -rl greater-london-latest.osm.pbf lines

But as noted by @jratike80, direct reading from (large enough) .osm.pbf files is inefficient due to how the format is structured. Converting them to something else, like GeoPackage, is the recommended way of using them.
You can extract just the lines with (no need for -oo INTERLEAVED_READING=YES as ogr2ogr is smart enough to use the appropriate reading mode for .osm.pbf files):
$ ogr2ogr out.gpkg greater-london-latest.osm.pbf lines
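
The convert-then-inspect workflow can be wrapped in a small helper. This is only a sketch: run and extract_layer are hypothetical helper names, DRY_RUN is an assumed switch that defaults to printing the commands rather than executing them, and the file names are the ones used earlier in this thread.

```shell
# run CMD ARGS...: print the command in dry-run mode (the default here),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# extract_layer SRC DST LAYER (hypothetical helper): copy one OSM layer
# into a GeoPackage, then print a summary of the converted layer.
extract_layer() {
  run ogr2ogr -f GPKG "$2" "$1" "$3" &&
    run ogrinfo -so "$2" "$3"
}

# Per-layer conversion of the "lines" layer from the London extract.
extract_layer greater-london-latest.osm.pbf out.gpkg lines
```

Set DRY_RUN=0 to execute the commands instead of printing them.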

@Robinlovelace
Author

Many thanks for the quick and enlightening reply Even. I will try that out now and let you know how I get on.

@Robinlovelace
Author

Yes that seems to solve the issue:

ogrinfo -rl greater-london-latest.osm.pbf lines > x4                               
wc x4
#$  2442025   8868966 110216785 x4
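
Because the layer summary reports Feature Count: -1 for the .pbf file, one rough way to check how many features the dump contains is to count its feature header lines. The sketch below assumes the dump was saved to x4 as above and relies on ogrinfo starting each feature with a line of the form OGRFeature(<layer>):<fid>; count_features is a hypothetical helper name.

```shell
# count_features FILE (hypothetical helper): count the feature records
# in a saved ogrinfo text dump.
count_features() {
  # Each feature in ogrinfo output begins with "OGRFeature(<layer>):<fid>".
  grep -c '^OGRFeature(' "$1"
}

# Report the count for the dump written earlier, if it exists.
if [ -f x4 ]; then
  count_features x4
fi
```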

@Robinlovelace
Author

Robinlovelace commented Dec 14, 2019

And, as you say the conversion is quicker than the read:

read = function() system("ogrinfo -rl greater-london-latest.osm.pbf lines > x4")
convert = function() system("ogr2ogr out.gpkg greater-london-latest.osm.pbf lines")
bench::mark(iterations = 1, check = FALSE, read(), convert())
0...10...20...30...40...50...60...70...80...90...100 - done.
0...10...20...30...40...50...60...70...80...90...100 - done.
# A tibble: 2 x 13
  expression     min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result   memory       time   gc         
  <bch:expr> <bch:t> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>   <list>       <list> <list>     
1 read()       9.48s    9.48s     0.106        0B        0     1     0      9.48s <int [1… <df[,3] [0 … <bch:… <tibble [1…
2 convert()    6.25s    6.25s     0.160        0B        0     1     0      6.25s <int [1… <df[,3] [0 … <bch:… <tibble [1…

@Robinlovelace
Author

Hi @rouault apologies but I have another follow-up question: how does one make the conversion respect the .ini file? I see CONFIG_FILE argument in https://gdal.org/drivers/vector/osm.html but not in https://gdal.org/programs/ogr2ogr.html

@rouault
Member

rouault commented Jan 6, 2020

Use the -oo switch of ogr2ogr : https://gdal.org/programs/ogr2ogr.html#cmdoption-ogr2ogr-oo
ogr2ogr -oo CONFIG_FILE=/foo/bar/osm.ini ...

@Robinlovelace
Author

Robinlovelace commented Jan 6, 2020

Hi @rouault thanks for the pointer, that works, but I'm getting some strange results (way more features in the output when the CONFIG_FILE switch is on) and cannot see how to activate the CONFIG_FILE and INTERLEAVED_READING switches at the same time, as shown in the reproducible example below.

# get data
wget http://download.geofabrik.de/europe/great-britain/england/greater-london-latest.osm.pbf

# too many features error
# ogrinfo -oo OGR_INTERLEAVED_READING=YES *pbf lines > x2

# no data
# ogrinfo -oo INTERLEAVED_READING=YES *pbf lines > x3

# with random layers
# ogrinfo -rl greater-london-latest.osm.pbf lines > x4                               
# ogrinfo -so x4
# head x4 -n 32

# convert format - works but no custom .ini
ogr2ogr -f gpkg -oo INTERLEAVED_READING=YES osm.gpkg greater-london-latest.osm.pbf
ogrinfo -so osm.gpkg lines

# convert format - works with custom osmconf2.ini file - with these contents:
# [lines]
# # common attributes
# osm_id=yes
# osm_version=no
# osm_timestamp=no
# osm_uid=no
# osm_user=no
# osm_changeset=no
# 
# # keys to report as OGR fields
# attributes=name,highway,waterway,aerialway,barrier,man_made,maxspeed,oneway,building,surface,landuse,natural,start_date,wall,service,lanes,layer,tracktype,bridge,foot,bicycle,lit,railway,footway


ogr2ogr -f gpkg -oo CONFIG_FILE=osmconf2.ini  osm2.gpkg greater-london-latest.osm.pbf
ogrinfo -so osm2.gpkg lines # why does it have double the number of features?

ogr2ogr -f gpkg -oo CONFIG_FILE=osmconf2.ini INTERLEAVED_READING=YES osm3.gpkg greater-london-latest.osm.pbf
ogrinfo -so osm3.gpkg lines # fails

@rouault
Member

rouault commented Jan 6, 2020

You need to specify -oo in front of each open option: -oo CONFIG_FILE=osmconf2.ini -oo INTERLEAVED_READING=YES
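
A small helper can make the "one -oo per option" rule hard to get wrong. This is a sketch only: oo_args is a hypothetical helper name, PBF is a placeholder path, and the file names are the ones from the example above. The conversion runs only if ogr2ogr, the input file, and the .ini file are all present; otherwise the command is just printed.

```shell
# oo_args KEY=VALUE... (hypothetical helper): prefix every open option
# with its own -oo switch.
oo_args() {
  out=""
  for opt in "$@"; do
    out="${out:+$out }-oo $opt"
  done
  printf '%s\n' "$out"
}

# PBF is a placeholder path (assumption); point it at a real .osm.pbf file.
PBF="${PBF:-greater-london-latest.osm.pbf}"
OPTS=$(oo_args CONFIG_FILE=osmconf2.ini INTERLEAVED_READING=YES)

# Show the full command line.
echo "ogr2ogr -f GPKG $OPTS osm3.gpkg $PBF"

if command -v ogr2ogr >/dev/null 2>&1 && [ -f "$PBF" ] && [ -f osmconf2.ini ]; then
  # $OPTS is unquoted on purpose so each word becomes a separate argument;
  # this simple approach does not support option values containing spaces.
  ogr2ogr -f GPKG $OPTS osm3.gpkg "$PBF"
fi
```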

@Robinlovelace
Author

Many thanks Even, I'm gradually getting my head around this and am very grateful. Will help hugely with our work on sustainable transport.


3 participants