
Hadoop friendly architecture / directly load OSM data #120

Open
geoHeil opened this issue Aug 24, 2018 · 5 comments

@geoHeil

geoHeil commented Aug 24, 2018

How hard do you think it would be to build an add-on that, instead of loading from PostGIS, loads the data directly from Parquet files stored inside Hadoop?

https://github.com/adrianulbona/osm-parquetizer

Data in this format is published daily, already converted, at http://osm-data.skobbler.net.

@oldrev

oldrev commented Aug 26, 2018

Hi, it's not that hard: you could do it by implementing your own RoadReader interface.

Copying and modifying the PostGISReader class is a good start.
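
To make that concrete, here is a minimal, untested sketch of what such a reader could look like. The interface and package names follow my reading of the Barefoot sources; the actual Parquet access is left as a stub because it depends on the osm-parquetizer schema:

```java
// Sketch only: interface/package names taken from the Barefoot sources; the Parquet
// access itself is left out because it depends on the osm-parquetizer schema.
import java.util.HashSet;
import java.util.Iterator;

import com.bmwcarit.barefoot.road.BaseRoad;
import com.bmwcarit.barefoot.road.RoadReader;
import com.esri.core.geometry.Polygon;

public class ParquetRoadReader implements RoadReader {
    private final String path; // e.g. an HDFS path to the converted road data
    private Iterator<BaseRoad> roads = null;

    public ParquetRoadReader(String path) {
        this.path = path;
    }

    @Override
    public boolean isOpen() {
        return roads != null;
    }

    @Override
    public void open() {
        open(null, null);
    }

    @Override
    public void open(Polygon polygon, HashSet<Short> exclusions) {
        // TODO: read the Parquet file(s) at `path`, optionally filter by polygon and
        // excluded road types, and map each record to a BaseRoad -- this is the part
        // that mirrors what PostGISReader does against the database.
        roads = java.util.Collections.<BaseRoad>emptyList().iterator();
    }

    @Override
    public BaseRoad next() {
        return roads.hasNext() ? roads.next() : null;
    }

    @Override
    public void close() {
        roads = null;
    }
}
```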

@jongiddy
Contributor

Barefoot does some transformation of the PBF files to create an efficient dataset for its use. It would be great to be able to do that conversion in a Hadoop cluster, but I don't think it is trivial.

However, once you have the correctly formatted files in the Hadoop cluster, it should be fairly easy to create a new Parquet-aware RoadReader.

I do the initial processing on a local VM, using the map/osm/import.sh script to import PBF data into PostgreSQL, then https://github.com/jongiddy/barefoot-map-db-file to export from PostgreSQL to a single .bfmap file, which I then upload to HDFS. My Spark jobs use https://github.com/jongiddy/barefoot-hdfs-reader to read the map data from HDFS.
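
For reference, the local (non-HDFS) side of that last step looks roughly like the Barefoot README example, just with a BfmapReader instead of a PostGISReader; barefoot-hdfs-reader provides the analogous reader over HDFS. Import paths follow my reading of the sources and the file path is a placeholder:

```java
// Roughly the pattern from the Barefoot README, with a .bfmap file instead of PostGIS;
// "oberbayern.bfmap" is just a placeholder path.
import com.bmwcarit.barefoot.matcher.Matcher;
import com.bmwcarit.barefoot.road.BfmapReader;
import com.bmwcarit.barefoot.roadmap.Road;
import com.bmwcarit.barefoot.roadmap.RoadMap;
import com.bmwcarit.barefoot.roadmap.RoadPoint;
import com.bmwcarit.barefoot.roadmap.TimePriority;
import com.bmwcarit.barefoot.spatial.Geography;
import com.bmwcarit.barefoot.topology.Dijkstra;

public class LocalBfmapExample {
    public static void main(String[] args) {
        // Load the pre-processed road map from a .bfmap file and build the graph in memory.
        RoadMap map = RoadMap.Load(new BfmapReader("oberbayern.bfmap"));
        map.construct();

        // Same matcher setup as in the Barefoot README.
        Matcher matcher = new Matcher(map, new Dijkstra<Road, RoadPoint>(),
                new TimePriority(), new Geography());
    }
}
```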

@geoHeil
Author

geoHeil commented Sep 2, 2018 via email

@geoHeil
Author

geoHeil commented Sep 12, 2018

@jongiddy do I understand correctly that the map is always loaded completely into memory, so that (especially for the whole world) the executors need a fairly large amount of RAM?

Also, when looking at the Hadoop-native file format: would the driver need to collect the whole Parquet file and then broadcast it?

@smattheis

smattheis commented Sep 23, 2018

@jongiddy and @oldrev already pointed out the relevant aspects. (Thanks!) I have only one note to add: the pre-processing step is mostly a transformation of OSM roads into a routable format, which means splitting roads into the edges of a graph. In OSM, roads are often long and cross intersections (e.g. at the intersection of https://www.openstreetmap.org/way/33954504 and https://www.openstreetmap.org/way/31662854), so a road must be split into multiple edges to represent the intersection and to allow turns. This pre-processing is done by the import scripts @jongiddy mentioned; a direct import into HDFS would need to implement that pre-processing step as well.

Further, with the road readers you can define a subregion to be loaded into RAM or saved to an HDFS file, roughly as sketched below. However, routing and map matching across subregions is not supported at the moment. This means it won't help if you have a large map and just want to organize it in tiles; it only helps if, for some use case, you need ONLY a subregion of the map data that you initially imported into the map server.
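
A rough sketch of that subregion export, assuming the RoadReader/RoadWriter method signatures as I read them in the Barefoot sources (the reader, bounding polygon, and output path are placeholders):

```java
// Sketch: copy only the roads inside a bounding polygon from any RoadReader into a
// .bfmap file; method signatures follow my reading of the Barefoot sources and may differ.
import java.util.HashSet;

import com.bmwcarit.barefoot.road.BaseRoad;
import com.bmwcarit.barefoot.road.BfmapWriter;
import com.bmwcarit.barefoot.road.RoadReader;
import com.bmwcarit.barefoot.road.RoadWriter;
import com.esri.core.geometry.Polygon;

public class SubregionExport {
    public static void export(RoadReader reader, Polygon subregion, String bfmapPath) {
        RoadWriter writer = new BfmapWriter(bfmapPath);
        reader.open(subregion, new HashSet<Short>()); // read only roads in the subregion
        writer.open();

        BaseRoad road;
        while ((road = reader.next()) != null) { // readers return null after the last road
            writer.write(road);
        }

        writer.close();
        reader.close();
    }
}
```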
