OpenStreetMap History Splitter
This splitter was developed to split full-experimental planet dumps, but it can also split regular planet dumps. It is based on the readers and writers of Jochen Topf's great osmium framework.
This is the tool used to create the hosted extracts.
The splitter currently supports splitting by bounding boxes, .poly files known from osmosis and .osm polygon files (.osm files containing only closed ways). It implements two different cutting algorithms (hardcut and softcut), of which softcut is the default.
Dumps created using the hardcut algorithm have the following characteristics:
- ways are cropped at bbox boundaries
- relations contain only members that exist in the extract
- ways and relations are reference-complete
- relations referring to relations that come later in the file are missing these references
- ways that have only one node inside the bbox are missing from the output
- only versions of an object that are inside the bbox are in the extract, so some versions of an object may be missing (not history-complete)
Dumps created using the softcut algorithm have the following characteristics:
- ways stay complete, all used nodes are included (reference-complete)
- relations contain all members, even those that do not exist in the extract (not reference-complete)
- if one version of an object is inside the bbox, all versions are included in the extract (history-complete)
- dual pass processing required
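The difference between the two algorithms can be illustrated with a small sketch. It is not the splitter's implementation (which is built on osmium); it ignores relations and object versions, and the names `hardcut`, `softcut` and `inside` as well as the toy data layout are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)
Nodes = Dict[int, Tuple[float, float]]    # node id -> (lon, lat)
Ways = Dict[int, List[int]]               # way id -> ordered node ids


def inside(bbox: BBox, lon: float, lat: float) -> bool:
    """Return True if the point lies inside the bbox (borders included)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def hardcut(nodes: Nodes, ways: Ways, bbox: BBox) -> Tuple[Set[int], Ways]:
    """Keep only nodes inside the bbox and crop ways to those nodes."""
    kept_nodes = {n for n, (lon, lat) in nodes.items() if inside(bbox, lon, lat)}
    kept_ways: Ways = {}
    for way_id, refs in ways.items():
        cropped = [ref for ref in refs if ref in kept_nodes]
        # Ways with fewer than two nodes inside the bbox are dropped.
        if len(cropped) >= 2:
            kept_ways[way_id] = cropped
    return kept_nodes, kept_ways


def softcut(nodes: Nodes, ways: Ways, bbox: BBox) -> Tuple[Set[int], Ways]:
    """Keep a way complete as soon as one of its nodes is inside the bbox."""
    inside_nodes = {n for n, (lon, lat) in nodes.items() if inside(bbox, lon, lat)}
    kept_nodes = set(inside_nodes)
    kept_ways: Ways = {}
    for way_id, refs in ways.items():
        if any(ref in inside_nodes for ref in refs):
            kept_ways[way_id] = list(refs)
            # Nodes outside the bbox are pulled in because a kept way uses them.
            kept_nodes.update(refs)
    return kept_nodes, kept_ways


# Toy data: nodes 1 and 2 are inside the bbox, nodes 3 and 4 are outside.
NODES: Nodes = {1: (0.5, 0.5), 2: (0.6, 0.6), 3: (2.0, 2.0), 4: (3.0, 3.0)}
WAYS: Ways = {10: [1, 2, 3], 11: [2, 3, 4], 12: [3, 4]}
BBOX: BBox = (0.0, 0.0, 1.0, 1.0)
```

With this data, the hardcut crops way 10 to its two inner nodes and drops way 11 (only one node inside), while the softcut keeps ways 10 and 11 complete and adds nodes 3 and 4 to the extract. Nodes that are only known to be needed after the ways have been examined are a plausible reason why the softcut needs two passes over the input.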
In order to compile the splitter, you'll first need the osmium framework and most of its prerequisites:
- zlib (for PBF support)
Debian/Ubuntu: zlib1g zlib1g-dev
- Expat (for parsing XML files)
Debian/Ubuntu: libexpat1 libexpat1-dev
- libxml (for writing XML files)
- GEOS (for polygon checks)
Debian/Ubuntu: libgeos-3.2.0 (older versions might work) libgeos-dev
- Google sparsehash
http://code.google.com/p/google-sparsehash/
Debian/Ubuntu: libsparsehash-dev
- Google protocol buffers (for PBF support)
http://code.google.com/p/protobuf/ (at least Version 2.3.0 needed)
Debian/Ubuntu: libprotobuf6 libprotobuf-dev protobuf-compiler
Also see http://wiki.openstreetmap.org/wiki/PBF_Format
- OSMPBF (for PBF support)
You need to build this first.
Osmium does not need to be built; it just needs to be referenced in the Makefile. You'll also want PBF support, as .pbf files can be written between 7 and 20 times faster than .xml.bz2 files. For this you'll need a version of OSM-binary that supports storing history information.
When you have all prerequisites in place, just run make to build the splitter.
After building the splitter you'll have a single binary: osm-history-splitter. The binary takes two parameters and a few options. The splitter is called like this:
./osm-history-splitter input.osm.pbf output.config
The splitter reads through input.osm.pbf and splits it into the extracts listed in output.config. Optionally, the following switches are supported:
- --hardcut - enable hardcut mode
- --softcut - enable softcut mode (default)
- --debug - enable debug output
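When the splitter is driven from a script, the call can be assembled as an argument list. The following is a minimal sketch: the helper name `build_command` is hypothetical, and placing the switches before the two positional parameters is an assumption, not something the text above prescribes.

```python
from typing import List, Optional


def build_command(
    input_path: str,
    config_path: str,
    mode: Optional[str] = None,
    debug: bool = False,
    binary: str = "./osm-history-splitter",
) -> List[str]:
    """Assemble an argument list for the splitter (hypothetical helper)."""
    if mode not in (None, "hardcut", "softcut"):
        raise ValueError("mode must be None, 'hardcut' or 'softcut'")
    cmd = [binary]
    if mode is not None:
        cmd.append("--" + mode)  # --hardcut or --softcut
    if debug:
        cmd.append("--debug")
    # Assumption: switches are placed before the two positional parameters.
    cmd += [input_path, config_path]
    return cmd


# To actually run the splitter, pass the list to subprocess, for example:
#   import subprocess
#   subprocess.run(build_command("input.osm.pbf", "output.config"), check=True)
```

Building a list instead of a shell string avoids quoting problems with paths that contain spaces.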
The config file format is simple and line-based. Empty lines and lines beginning with # are ignored. A config file might look like this:
woerrstadt.osh.pbf BBOX 8.1010,49.8303,8.1359,49.8567
gau-odernheim.osh OSM clipbounds/aaa_test/go.osm
germany.osh POLY clipbounds/europe/germany.poly
Each line consists of three items, separated by spaces:
- the destination path and filename. The file-extension used specifies the generated file format (.osm, .osh, .osm.bz2, .osh.bz2, .osm.pbf, .osh.pbf)
- the type of extract (BBOX, OSM or POLY)
- the extract specification
  - for BBOX: boundaries of the bbox in the order minimum longitude, minimum latitude, maximum longitude, maximum latitude, e.g. -180,-90,180,90 for the whole world
  - for OSM: path to an .osm file from which all closed ways are taken as outlines of a MultiPolygon. Relations are not taken into account, so holes are not possible.
  - for POLY: path to the .poly file
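The config format described above is simple enough to check before starting a long-running split. The following sketch is not part of the splitter; `Extract` and `parse_config` are hypothetical names, and the validation rules are derived only from the description above.

```python
from typing import List, NamedTuple


class Extract(NamedTuple):
    path: str  # destination path; the extension selects the output format
    kind: str  # BBOX, OSM or POLY
    spec: str  # bbox coordinates or path to an .osm/.poly file


VALID_KINDS = {"BBOX", "OSM", "POLY"}


def parse_config(text: str) -> List[Extract]:
    """Parse splitter config text into a list of Extract entries."""
    extracts = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Empty lines and comment lines are ignored. Tolerating leading
        # whitespace before '#' is an assumption of this sketch.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 3 items, got {len(parts)}")
        path, kind, spec = parts
        if kind not in VALID_KINDS:
            raise ValueError(f"line {lineno}: unknown extract type {kind!r}")
        if kind == "BBOX":
            coords = spec.split(",")
            if len(coords) != 4:
                raise ValueError(f"line {lineno}: a bbox needs 4 coordinates")
            for coord in coords:
                float(coord)  # raises ValueError for non-numeric input
        extracts.append(Extract(path, kind, spec))
    return extracts


SAMPLE = """\
# test extracts
woerrstadt.osh.pbf BBOX 8.1010,49.8303,8.1359,49.8567

gau-odernheim.osh OSM clipbounds/aaa_test/go.osm
germany.osh POLY clipbounds/europe/germany.poly
"""
```

Calling `parse_config(SAMPLE)` yields three entries; a line with a missing item or an unknown type raises a `ValueError` that names the offending line.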
If you are planning to create a huge number of extracts (as Geofabrik does), split-all-clipbounds.py may be your friend. It scans through the clipbounds directory looking for .poly files (.osm files are also possible), automatically generates config files and runs the splitter. It obeys the nesting rules (i.e. europe/germany.osm.pbf is generated from europe.osm.pbf) and also ensures the files are created in the correct order.
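The nesting rule can be sketched as follows. This is not the code of split-all-clipbounds.py; the function `plan_extracts`, the planet file name `planet.osm.pbf` and the fallback to the planet file when no parent clipbound exists are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def plan_extracts(
    poly_paths: List[str],
    planet: str = "planet.osm.pbf",  # hypothetical top-level input file
    ext: str = ".osm.pbf",
) -> List[Tuple[str, str, str]]:
    """Return (source, destination, poly) triples, parents before children.

    poly_paths are paths relative to the clipbounds directory,
    e.g. "europe/germany.poly".
    """
    known = {PurePosixPath(p).with_suffix("") for p in poly_paths}
    # Shallow paths first, so a parent extract exists before its children.
    ordered = sorted(poly_paths, key=lambda p: (len(PurePosixPath(p).parts), p))
    plan = []
    for poly in ordered:
        rel = PurePosixPath(poly).with_suffix("")  # e.g. europe/germany
        source = planet
        # Use the nearest ancestor that has its own clipbound as the source.
        for ancestor in rel.parents:
            if ancestor in known:
                source = str(ancestor) + ext
                break
        plan.append((source, str(rel) + ext, poly))
    return plan
```

For the inputs `europe.poly` and `europe/germany.poly`, the plan first cuts `europe.osm.pbf` from the planet file and then cuts `europe/germany.osm.pbf` from `europe.osm.pbf`, so each smaller extract is read from the smallest file that already contains it.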
If you have any questions just ask at email@example.com or via the Github messaging system.