Hadoop friendly architecture / directly load OSM data #120
Hi, the answer is not that hard: you could do it by implementing your own RoadReader interface. Copying and modifying the PostGISReader class is a good start.
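As a rough illustration of that suggestion, here is a minimal skeleton of such a custom reader. The method set mirrors what barefoot's RoadReader interface looks like (as also implemented by PostGISReader), but the exact signatures, package paths, and the mapping to BaseRoad should be verified against the barefoot version you build against; the Parquet/HDFS access itself is only hinted at.

```java
import java.util.HashSet;

import com.bmwcarit.barefoot.road.BaseRoad;
import com.bmwcarit.barefoot.road.RoadReader;
import com.bmwcarit.barefoot.util.SourceException;
import com.esri.core.geometry.Polygon;

// Sketch only: a reader that serves pre-processed road records from a
// Parquet/HDFS source instead of PostGIS. Method signatures follow the
// RoadReader interface as used by PostGISReader; verify them against the
// barefoot sources.
public class ParquetRoadReader implements RoadReader {
    private boolean open = false;
    // ... a handle to the Parquet/HDFS input would go here (hypothetical)

    @Override
    public boolean isOpen() {
        return open;
    }

    @Override
    public void open() throws SourceException {
        // Open the Parquet/HDFS source and position at the first record.
        open = true;
    }

    @Override
    public void open(Polygon polygon, HashSet<Short> exclusions) throws SourceException {
        // Optionally restrict to a subregion and filter excluded road classes.
        open = true;
    }

    @Override
    public BaseRoad next() throws SourceException {
        // Read the next pre-processed road record and map it to a BaseRoad,
        // or return null when the source is exhausted.
        return null; // placeholder
    }

    @Override
    public void close() throws SourceException {
        // Release the Parquet/HDFS handle.
        open = false;
    }
}
```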
Barefoot does some transformation of the PBF files to create an efficient dataset for its use. It would be great to be able to do that conversion in a Hadoop cluster, but I don't think it is trivial. However, once you have the correctly formatted files in the Hadoop cluster, it should be fairly easy to create a new Parquet-aware RoadReader.
I do the initial processing on a local VM, using the map/osm/import.sh script to import PBF data into PostgreSQL, then https://github.com/jongiddy/barefoot-map-db-file to export from PostgreSQL to a single .bfmap file, which I then upload to HDFS. My Spark jobs use https://github.com/jongiddy/barefoot-hdfs-reader to read the map data from HDFS.
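To make the last step concrete: a Spark executor would typically build the in-memory map from such a reader once and reuse it across tasks. The sketch below uses RoadMap.Load(...) and construct(), which are barefoot's standard calls for building the routable map; HdfsBfmapReader and the HDFS path are hypothetical stand-ins for whatever the barefoot-hdfs-reader project actually provides.

```java
import com.bmwcarit.barefoot.roadmap.RoadMap;

// Sketch: build the routable map on an executor from a .bfmap file in HDFS.
// HdfsBfmapReader is a hypothetical stand-in for the reader provided by
// barefoot-hdfs-reader; check that project for the real class and arguments.
RoadMap map = RoadMap.Load(new HdfsBfmapReader("hdfs:///maps/region.bfmap"));
map.construct(); // builds the spatial index and road graph in memory
```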
Thanks. That is great news.
@jongiddy do I understand correctly that the map is always loaded completely into memory, which (especially for the whole world) requires a fairly large amount of RAM on the executors? Also, when looking into the Hadoop-native file format: would the driver need to collect the whole Parquet file and then broadcast it?
@jongiddy and @oldrev already pointed out the relevant aspects. (Thanks!) I have only one note to add: the pre-processing step is mostly a transformation of OSM roads into a routable format, which means splitting roads into edges of a graph. In OSM, roads are often long and cross intersections, e.g. at the intersection of https://www.openstreetmap.org/way/33954504 and https://www.openstreetmap.org/way/31662854, so a road must be split into multiple edges to represent the intersection and to allow turns. This pre-processing is done by the import scripts @jongiddy mentioned. A direct import into HDFS would need to implement that pre-processing step.
Further, with the road readers you can define a subregion to be loaded into RAM or saved to an HDFS file (see the sketch below). However, routing and map matching across subregions is not supported at the moment. This means it won't help if you have a large map and just want to organize it in tiles. It only helps if, for some use case, you need ONLY a subregion of the map data you initially imported into a map server.
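For illustration, restricting a reader to a subregion could look roughly like the following. It reuses the polygon-based open(...) overload of the RoadReader interface (also exposed by PostGISReader) together with the hypothetical ParquetRoadReader sketched earlier; the WKT bounding box is just an example, and the exact iteration contract of next() should be checked against the barefoot sources.

```java
import java.util.HashSet;

import com.bmwcarit.barefoot.road.BaseRoad;
import com.bmwcarit.barefoot.road.RoadReader;
import com.esri.core.geometry.Geometry;
import com.esri.core.geometry.GeometryEngine;
import com.esri.core.geometry.Polygon;

// Sketch: read only the roads inside a bounding polygon, e.g. to keep the
// in-memory map small or to write the subregion out as a separate file.
Polygon region = (Polygon) GeometryEngine.geometryFromWkt(
        "POLYGON ((11.3 48.0, 11.8 48.0, 11.8 48.3, 11.3 48.3, 11.3 48.0))",
        0, Geometry.Type.Polygon);

RoadReader reader = new ParquetRoadReader(); // hypothetical reader from above
reader.open(region, new HashSet<Short>());   // empty set: exclude no road classes
BaseRoad road;
while ((road = reader.next()) != null) {
    // hand each road to a writer or an in-memory map builder
}
reader.close();
```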
How hard do you think it would be to build an add-on that, instead of loading from PostGIS, would load the data from Parquet files stored directly inside Hadoop?
https://github.com/adrianulbona/osm-parquetizer
This data format is published daily, already converted, at http://osm-data.skobbler.net.