
Managing Data Sources

ninowalker edited this page Oct 1, 2010 · 8 revisions

Managing Datasources with DataSourcesConfig

What is DataSourcesConfig?

DataSourcesConfig is an XML tag, similar to the Stylesheet tag, that lets you externalize datasource definitions into an easy-to-manage format.

Why?

Datasources pose a number of challenges.

  1. Production and development environments often differ enough to require separate database connection parameters, paths, etc.
  2. Local development is often easier with shapefiles - and you don't want 10 duplicates of a 400GB file floating around.
  3. A common technique for simplifying Datasources and managing the environment changes is to use XML entities. This works, but isn't easy.
  4. Sharing stylesheets is easy, but sharing datasource definitions is very cumbersome.
  5. At a certain point brains start exploding as an MML grows too large.

Sold already? You can quickly convert your existing MML:

$ cascadenik-extract-dscfg.py existing.mml new.mml datasources.cfg

In a nutshell

  1. Datasources, their parameters, and SRS are represented in an INI-style file, each under its own name
  2. MML files declare the data source config files that define the datasources they use
  3. Layers associate themselves with a Datasource by declaring a source_name= attribute
  4. Cascadenik handles the rest
<Map>
  <DataSourcesConfig>
# either inline or as a separate file using the src= attribute, like a Stylesheet tag

# name a data source, and define its parameters
[natural_earth_land_110m]
type = shape
file = http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/physical/110m-land.zip
layer_srs = epsg:4326

  </DataSourcesConfig>

  <Stylesheet ... />
  <!-- Reference the datasource by name in any number of layers -->
  <Layer class="land" source_name="natural_earth_land_110m" />
</Map>
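The flow above can be sketched with Python's standard library. This is a hypothetical illustration of the idea, not Cascadenik's compiler code: the function names `load_datasources` and `resolve_layers` are made up, only inline DataSourcesConfig blocks are read (files referenced with `src=` are ignored), and the Stylesheet is left out.

```python
import configparser
import xml.etree.ElementTree as ET

# A trimmed-down MML with one inline DataSourcesConfig and one Layer.
MML = """<Map>
  <DataSourcesConfig>
[natural_earth_land_110m]
type = shape
file = http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m/physical/110m-land.zip
layer_srs = epsg:4326
  </DataSourcesConfig>
  <Layer class="land" source_name="natural_earth_land_110m" />
</Map>"""


def load_datasources(map_el):
    """Collect named datasources from every inline DataSourcesConfig element."""
    parser = configparser.ConfigParser()
    for cfg in map_el.iter("DataSourcesConfig"):
        parser.read_string(cfg.text or "")
    return {name: dict(parser[name]) for name in parser.sections()}


def resolve_layers(map_el, datasources):
    """Pair each Layer's class with the datasource named by its source_name."""
    return [(layer.get("class"), datasources[layer.get("source_name")])
            for layer in map_el.iter("Layer")]


root = ET.fromstring(MML)
for layer_class, params in resolve_layers(root, load_datasources(root)):
    print(layer_class, params["type"], params["layer_srs"])  # land shape epsg:4326
```

Because the lookup is by name, any number of layers can share one definition, which is what keeps the connection details in a single place.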

Syntax

Below is a quick example:

# this is a comment
[DEFAULT] # this declares variables for the document
postgis_dbname = gis

[this_is_a_datasource_declaration] # and the <Datasource> parameters follow
dbname = %(postgis_dbname)s # this dereferences a variable
estimate_extent = false
port = 5432
table = (SELECT *, y(astext(way)) AS latitude
                         FROM planet_osm_point
                         WHERE (railway IN ('station', 'subway_entrance')
                                OR aeroway IN ('aerodrome', 'airport'))
                           AND name IS NOT NULL
                         ORDER BY z_order ASC, latitude DESC) AS rail_points
etc=etc...

For more detail on syntax, see the documentation for Python's ConfigParser module.
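To see how Python's configparser reads a file like this, here is the syntax example above with its SQL shortened. One caution: by default configparser only recognizes comments on their own line and does not strip trailing `# ...` text from values (see its `inline_comment_prefixes` option), so this sketch moves those comments onto separate lines; whether Cascadenik strips them is not covered here.

```python
import configparser

CONFIG = """
# this is a comment
[DEFAULT]
postgis_dbname = gis

[this_is_a_datasource_declaration]
# this dereferences a variable
dbname = %(postgis_dbname)s
estimate_extent = false
port = 5432
table = (SELECT *, y(astext(way)) AS latitude
         FROM planet_osm_point
         WHERE railway IN ('station', 'subway_entrance')
         ORDER BY z_order ASC, latitude DESC) AS rail_points
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG)
source = parser["this_is_a_datasource_declaration"]

print(source["dbname"])                      # gis
print(source.getint("port"))                 # 5432
print(source.getboolean("estimate_extent"))  # False
# Indented continuation lines are stripped and joined with newlines.
print(source["table"].splitlines()[1])       # FROM planet_osm_point
```

The indentation of the SQL lines is what marks them as part of the `table` value rather than new keys.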

Features

Variable Substitutions

Declare variables in a [DEFAULT] section, then use them in values as %(variable_name)s

[DEFAULT]
natural_earth_110m_base_url = http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m

[natural_earth_land_110m]
file = %(natural_earth_110m_base_url)s/physical/110m-land.zip

[natural_earth_admin0_110m]
file = %(natural_earth_110m_base_url)s/cultural/110m-admin-0-countries.zip
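A quick way to check how substitutions resolve is to load the config with Python's configparser, which expands `%(name)s` at lookup time. The sketch also shows the payoff of using a variable: changing the one `[DEFAULT]` value re-points every datasource that uses it. The local path `/opt/geodata/natural_earth/110m` is a hypothetical mirror location, not something Cascadenik defines.

```python
import configparser

CONFIG = """
[DEFAULT]
natural_earth_110m_base_url = http://www.naturalearthdata.com/http//www.naturalearthdata.com/download/110m

[natural_earth_land_110m]
file = %(natural_earth_110m_base_url)s/physical/110m-land.zip

[natural_earth_admin0_110m]
file = %(natural_earth_110m_base_url)s/cultural/110m-admin-0-countries.zip
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG)

# Expanded value: the variable is substituted when the option is read.
land = parser.get("natural_earth_land_110m", "file")
# raw=True returns the value exactly as written, reference included.
template = parser.get("natural_earth_land_110m", "file", raw=True)

# Hypothetical local mirror: one change updates both datasources.
parser.set("DEFAULT", "natural_earth_110m_base_url", "/opt/geodata/natural_earth/110m")
local_admin0 = parser.get("natural_earth_admin0_110m", "file")
print(local_admin0)  # /opt/geodata/natural_earth/110m/cultural/110m-admin-0-countries.zip
```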

Spatial Reference System Handling

Mapnik XML declares SRS values on the Layer element, yet they are by definition a property of the Datasource. Cascadenik will set a Layer's srs to a Datasource's layer_srs, if it is provided.

The compiler also accepts shorthand notation for EPSG codes known to Proj.4.

[source1] # long hand
layer_srs = +proj=merc +a=637... # set to a proj.4 string

[source2] # short hand
layer_srs = epsg:4326  # specify 'epsg:...' and cascadenik will create a proj.4 string for it
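The rule above (copy a datasource's `layer_srs` onto the Layer and expand EPSG shorthand) can be sketched as follows. The exact proj.4 string Cascadenik generates is not stated here, so this sketch assumes the shorthand becomes a `+init=epsg:NNNN` reference; the function names `expand_srs` and `layer_srs` are hypothetical.

```python
import re

# Assumption: "epsg:NNNN" expands to a proj.4 "+init=epsg:NNNN" reference;
# Cascadenik may emit a different but equivalent proj.4 string.
EPSG_SHORTHAND = re.compile(r"epsg:(\d+)", re.IGNORECASE)


def expand_srs(value):
    """Return a proj.4 string, expanding 'epsg:NNNN' shorthand if present."""
    value = value.strip()
    match = EPSG_SHORTHAND.fullmatch(value)
    if match:
        return "+init=epsg:" + match.group(1)
    return value  # long hand: pass the proj.4 string through unchanged


def layer_srs(datasource, fallback=None):
    """The Layer's srs: the datasource's layer_srs if provided, else fallback."""
    if "layer_srs" in datasource:
        return expand_srs(datasource["layer_srs"])
    return fallback


print(layer_srs({"type": "shape", "layer_srs": "epsg:4326"}))  # +init=epsg:4326
```

Keeping the fallback lets a Layer that already declares its own srs work unchanged when its datasource has no `layer_srs`.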

Datasource and Variable Overrides

Sometimes you want to use a different data source, or at least a differently configured one. There are two mechanisms for this:

  1. Compile-time overrides - the typical case for development
  2. Permanent overrides - explicitly redeclaring a datasource in the MML after its original definition

Compile-time Overrides - the --datasources-config option

Most config files will declare variables such as postgis_host, postgis_dbname, and shapedir in their [DEFAULT] section. These can be overridden at compile time with the --datasources-config= command-line option, pointing it to a file that redefines those values in its own [DEFAULT] section.

You can also redefine entire datasources, for example to use a shapefile instead of a database. To do so, define a datasource with the same name in the same file where you override variables.

$ cat my.cfg
[DEFAULT]
shapedir = /opt/geodata

[processed_p]
type = shape
file = /data/processed_p.shp
layer_srs = epsg:900913

$ cascadenik-compile.py in.mml outdir/out.xml --datasources-config=my.cfg
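The override behavior described above can be sketched with configparser: `[DEFAULT]` values from the override file replace the project's variables, and a same-named datasource replaces the original definition as a whole rather than merging keys into it. The `BASE` config and the `apply_overrides` function are hypothetical; this is an illustration of the semantics, not how cascadenik-compile.py is implemented.

```python
import configparser

# Hypothetical project config standing in for the MML's DataSourcesConfig.
BASE = """
[DEFAULT]
shapedir = /home/me/shapes
postgis_dbname = gis

[coastline]
type = shape
file = %(shapedir)s/coastline.shp

[processed_p]
type = postgis
dbname = %(postgis_dbname)s
table = processed_p
"""

# The my.cfg file from the example above.
OVERRIDES = """
[DEFAULT]
shapedir = /opt/geodata

[processed_p]
type = shape
file = /data/processed_p.shp
layer_srs = epsg:900913
"""


def apply_overrides(base_text, override_text):
    """Override [DEFAULT] variables and replace same-named datasources whole."""
    merged = configparser.ConfigParser()
    merged.read_string(base_text)
    # Read the override file with no special default section, so [DEFAULT]
    # is an ordinary section and items() lists only each section's own keys;
    # interpolation=None keeps %(...)s references for the merged parser.
    overrides = configparser.ConfigParser(default_section="__none__",
                                          interpolation=None)
    overrides.read_string(override_text)
    for name in overrides.sections():
        if name == configparser.DEFAULTSECT:
            for key, value in overrides.items(name):
                merged.set(configparser.DEFAULTSECT, key, value)
            continue
        # Drop the original definition instead of merging keys into it.
        if merged.has_section(name):
            merged.remove_section(name)
        merged.add_section(name)
        for key, value in overrides.items(name):
            merged.set(name, key, value)
    return merged


config = apply_overrides(BASE, OVERRIDES)
print(config.get("coastline", "file"))            # /opt/geodata/coastline.shp
print(config.has_option("processed_p", "table"))  # False
```

Replacing the section whole matters here: merging would leave the old `table` and `dbname` keys attached to what is now a shapefile datasource.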

Permanent Overrides

DataSourcesConfig elements are processed sequentially to build up a full list of datasources. To redefine a datasource you must redeclare a complete replacement after the initial declaration, e.g.

<!-- This file defines [land_polygon] -->
<DataSourcesConfig src="master.cfg" />
<DataSourcesConfig>
[land_polygon] # this second definition will be used
...
</DataSourcesConfig>
<Layer source_name="land_polygon" />
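Sequential processing can be pictured as building a dictionary in document order, where a later section with the same name replaces the earlier one completely. That is why the redeclaration has to be a complete replacement: any parameter it leaves out is gone. The config contents and function names below are hypothetical, and for simplicity each block is parsed in isolation, so `[DEFAULT]` variables are not shared between blocks as they would be in a real MML.

```python
import configparser

# Hypothetical contents of master.cfg and of the inline block that follows it.
MASTER_CFG = """
[land_polygon]
type = postgis
dbname = gis
table = land_polygons
"""

INLINE_CFG = """
[land_polygon]
type = shape
file = /data/land_polygons.shp
"""


def parse_block(text):
    """Parse one DataSourcesConfig block into {name: {param: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser.items(name)) for name in parser.sections()}


def collect_datasources(blocks):
    """Process blocks in document order; a later section with the same name
    replaces the earlier definition completely (keys are not merged)."""
    datasources = {}
    for text in blocks:
        datasources.update(parse_block(text))
    return datasources


sources = collect_datasources([MASTER_CFG, INLINE_CFG])
print(sources["land_polygon"])  # {'type': 'shape', 'file': '/data/land_polygons.shp'}
```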

Datasource Templates

Mapnik supports datasource templates in its XML files: http://trac.mapnik.org/changeset/574. You declare templates here in a similar way, by giving a datasource a template = value that names another datasource.

[DEFAULT]  # declare these so they can be overridden easily
postgis_dbname = osm_belgium
postgis_user = gis
postgis_host = localhost
postgis_port = 5432
postgis_pass = 

[postgis_conn_0]
type = postgis
user = %(postgis_user)s
dbname = %(postgis_dbname)s
estimate_extent = false
extent = -20037508,-19929239,20037508,19929239
host = %(postgis_host)s
layer_srs = epsg:900913
password = %(postgis_pass)s
port = %(postgis_port)s

[water_area]
template = postgis_conn_0
table = (SELECT *
                         FROM planet_osm_polygon
                         WHERE landuse IN ('reservoir', 'water')
                            OR "natural" IN ('lake', 'water', 'land')
                            OR waterway IN ('canal', 'riverbank', 'river')
                         ORDER BY z_order ASC) AS water
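Conceptually, a datasource that names a template starts from the template's parameters and adds or overrides its own. The sketch below flattens a trimmed version of the config above that way; it assumes those inheritance semantics and does not claim Cascadenik flattens templates (it may instead pass them through as Mapnik datasource templates). Note that Python's configparser only shares variables from a section named exactly `[DEFAULT]`. The helpers `own_params` and `resolve` are hypothetical.

```python
import configparser

# Trimmed version of the config above (fewer parameters, shorter SQL).
CONFIG = """
[DEFAULT]
postgis_dbname = osm_belgium
postgis_host = localhost
postgis_port = 5432

[postgis_conn_0]
type = postgis
dbname = %(postgis_dbname)s
host = %(postgis_host)s
port = %(postgis_port)s
layer_srs = epsg:900913

[water_area]
template = postgis_conn_0
table = (SELECT * FROM planet_osm_polygon
         WHERE landuse IN ('reservoir', 'water')) AS water
"""


def own_params(parser, name):
    """A section's parameters minus the [DEFAULT] variables it inherits.
    (Simplification: a parameter sharing a variable's name is dropped.)"""
    variables = parser.defaults()
    return {key: value for key, value in parser.items(name) if key not in variables}


def resolve(parser, name):
    """Flatten a datasource: its template's parameters first, then its own
    parameters on top. Templates may chain; cycles are not detected."""
    params = own_params(parser, name)
    template = params.pop("template", None)
    if template is None:
        return params
    flattened = resolve(parser, template)
    flattened.update(params)
    return flattened


parser = configparser.ConfigParser()
parser.read_string(CONFIG)
water = resolve(parser, "water_area")
print(water["type"], water["dbname"], water["port"])  # postgis osm_belgium 5432
```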

Getting started

A script is provided that takes an existing MML file, extracts all the datasources it defines, and writes two new files: a new MML file and a config file containing those datasources.

$ cascadenik-extract-dscfg.py existing.mml new.mml datasources.cfg
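The script's internals aren't described on this page, so the following is only a hypothetical sketch of the kind of transformation involved: each Layer's inline Mapnik-style Datasource moves into an INI section, the Layer's srs becomes `layer_srs`, the Layer gets a `source_name` reference, and the map gains a `DataSourcesConfig src=` element. The section naming scheme (layer id, else a numbered name) and the `extract` function are assumptions.

```python
import configparser
import io
import xml.etree.ElementTree as ET

# A hypothetical MML whose Layer carries an inline Datasource.
MML = """<Map>
  <Layer id="land" srs="+init=epsg:4326">
    <Datasource>
      <Parameter name="type">shape</Parameter>
      <Parameter name="file">/data/110m-land.shp</Parameter>
    </Datasource>
  </Layer>
</Map>"""


def extract(map_el, cfg_filename):
    """Move each Layer's Datasource into an INI config, leave a source_name
    reference behind, and point the map at the new config file."""
    config = configparser.ConfigParser(interpolation=None)
    for index, layer in enumerate(map_el.findall("Layer")):
        datasource = layer.find("Datasource")
        if datasource is None:
            continue
        # Assumed naming scheme: the layer id, else a numbered name.
        name = layer.get("id") or "datasource_%d" % index
        config.add_section(name)
        for param in datasource.findall("Parameter"):
            config.set(name, param.get("name"), param.text or "")
        srs = layer.attrib.pop("srs", None)
        if srs:
            config.set(name, "layer_srs", srs)
        layer.remove(datasource)
        layer.set("source_name", name)
    map_el.insert(0, ET.Element("DataSourcesConfig", src=cfg_filename))
    return config


root = ET.fromstring(MML)
config = extract(root, "datasources.cfg")
cfg_text = io.StringIO()
config.write(cfg_text)
print(cfg_text.getvalue())
print(ET.tostring(root, encoding="unicode"))
```

Using `interpolation=None` means any `%` characters in extracted SQL are written out literally instead of being treated as variable references while building the file.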