Skip to content


Subversion checkout URL

You can clone with
Download ZIP
Simple XML parser to shove OpenStreetMap changeset metadata dump files into a postgres database
Branch: master

Fetching latest commit…

Cannot retrieve the latest commit at this time

Failed to load latest commit information.


ChangesetMD is a simple XML parser written in python that takes the weekly changeset metadata dump file from and shoves the data into a simple postgres database so it can be queried.

WARNING: This is pretty much my first python project ever beyond "hello world" so... you have been warned.


ChangesetMD works with python 2.7.

Aside from postgresql, ChangesetMD depends on the python libraries psycopg2 and lxml. On Debian-based systems this means installing the python-psycopg2 and python-lxml packages.

If you want to parse the changeset file without first unzipping it, you will also need to install the bz2file library since the built in bz2 library can not handle multi-stream bzip files.

ChangesetMD expects a postgres database to be set up for it. It can likely co-exist within another database if desired. Otherwise, As the postgres user execute:

createdb changesets

It is easiest if your OS user has access to this database. I just created a user and made myself a superuser. Probably not best practices.

createuser <username>


The first time you run it, you will need to include the -c | --create option to create the table:

python -d <database> -c

The create function can be combined with the file option to immediately parse a file.

To parse the file, use the -f | --file option. After the first run to create the tables, you can either use -t | --truncate to clear out the tables and import a new file or you can use the -i | --incremental option which will skip existing changesets in the database and only insert new ones. This is faster.

python -d <database> -t -f /tmp/changeset-latest.osm


python -d <database> -i -f /tmp/changeset-latest.osm

Optional database user/password/host arguments can be used to access a postgres database in other ways.


  • Prints a status message every 10,000 records.
  • Takes 1-2 hours to import the current dump on a decent home computer.
  • Might be faster to process the XML into a flat file and then use the postgres COPY command to do a bulk load but this would make incremental updates a little harder
  • I have commonly queried fields indexed. Depending on what you want to do, you may need more indexes.
  • The incremental mode is much faster if you want to update an existing database. Depending on how old the data in the database is it could be as little as 5ish minutes.

Table Structure

ChangesetMD populates two tables with the following structure:

Primary table of all changesets with the following columns:

  • id: changeset ID
  • created_at/closed_at: create/closed time
  • num_changes: number of objects changed
  • min_lat/max_lat/min_lon/max_lon: description of the changeset bbox in decimal degrees
  • user_name: OSM username
  • user_id: numeric OSM user ID
  • tags: an hstore column holding all the tags of the changeset

Note that all fields except for id and created_at can be null.

All comments made on changesets via the new commenting system

  • comment_changeset_id: Foreign key to the changeset ID
  • comment_user_id: numeric OSM user ID
  • comment_user_name: OSM username
  • comment_date: timestamp of when the comment was created

If you are unfamiliar with hstore and how to query it, see the postgres documentation

Example queries:

Count how many changesets have a comment tag:

FROM osm_changeset
WHERE tags ? 'comment';

Find all changesets that were created by JOSM:

FROM osm_changeset
WHERE tags -> 'created_by' LIKE 'JOSM%';


Copyright (C) 2012 Toby Murray

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

See the GNU Affero General Public License for more details:

Something went wrong with that request. Please try again.