Cobbling together UK postcode area data

Ross Hendry edited this page Jan 17, 2018 · 4 revisions

1. Source the data

Shapefiles are a widely used data format for describing geospatial information: we'll need a shapefile containing all the UK postcode area polygons.

The easiest way is to buy a set of shapefiles from the Ordnance Survey (OS), but if you're creating a demonstration or proof-of-concept, the expense may be hard to justify. The following steps to cobble together an appropriate dataset are a bit tedious, and are best avoided if at all possible.

A quick Google brings up http://random.dev.openstreetmap.org/postcode_shapes/, which looks like it contains what we're after: postcode-XX.shp, postcode-XX.dbf & postcode-XX.shx (a shapefile is actually a set of files that together describe the geometries).
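Because the set only works when all of its parts are present, it can be worth checking the download before going further. The following is a minimal sketch, assuming the three files sit in the current directory under the base name postcode-XX; check_shapefile_set is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: check that the three mandatory parts of a shapefile
# set exist for a given base name (path without extension).
check_shapefile_set() {
  base="$1"
  missing=0
  for ext in shp shx dbf; do
    if [ -f "${base}.${ext}" ]; then
      echo "found:   ${base}.${ext}"
    else
      echo "missing: ${base}.${ext}"
      missing=1
    fi
  done
  # Exit status is 0 only when all three files are present.
  return "$missing"
}

# Usage: the base name is assumed from the download page.
check_shapefile_set postcode-XX || echo "shapefile set is incomplete"
```

A shapefile set may also include an optional .prj file that records the CRS; the helper deliberately does not require it.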

2. View the data

Before we go any further, let's have a look at what we've got: download, install & run QGIS; then add a vector layer containing the downloaded .shp file (click Layer > "Add Vector Layer"), select "EPSG:3395" as the Coordinate Reference System (CRS), and you should see something like:

3. Clean the data

There are a couple of problems with the shapefile as it is:

  • There's no CRS specified in the file (hence the prompt by QGIS to supply one).
  • The postcodes are currently defined as lines, rather than polygons (which will be a problem when we need to add colour).
  • The shapes extend beyond the coastline of the UK.

This last point may be due to the way the data has been created from point data using Voronoi polygons (known as "Thiessen" polygons in geospatial work).

To change the CRS, we're going to need some additional tools - follow the instructions in the "Install Tools" section of the "Let's Make a Map" tutorial, or use the bundled package at http://www.maptools.org/, which comes with a handy command shell. Run the following command, which declares the source CRS (-s_srs) as EPSG:3395 and reprojects to the most commonly used CRS, "WGS 84" (-t_srs EPSG:4326):

ogr2ogr -s_srs EPSG:3395 -t_srs EPSG:4326 uk-postcode-area-lines.shp postcode-XX.shp

We can now add the resultant shapefile ("uk-postcode-area-lines.shp") into QGIS (it should no longer request a CRS).
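The result can also be checked from the command line before opening QGIS. The sketch below assumes ogrinfo was installed alongside ogr2ogr (both ship with GDAL); show_layer_summary is a hypothetical wrapper name. The summary it prints should include the layer's geometry type and spatial reference, though the exact layout depends on the GDAL version.

```shell
# Hypothetical wrapper: print a summary of every layer in a shapefile.
# -so = summary only (no feature dump), -al = all layers.
show_layer_summary() {
  if ! command -v ogrinfo >/dev/null 2>&1; then
    echo "ogrinfo not found on PATH" >&2
    return 1
  fi
  if [ ! -f "$1" ]; then
    echo "no such file: $1" >&2
    return 1
  fi
  ogrinfo -so -al "$1"
}

# Usage: inspect the reprojected file produced by the ogr2ogr step.
show_layer_summary uk-postcode-area-lines.shp || echo "summary not available"
```

The same check on the original postcode-XX.shp is a quick way to see the difference, since that file carries no CRS.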

Point 2) is easily fixed in QGIS itself by selecting Vector > "Geometry Tools" > "Lines to polygons" & providing a name ("uk-postcode-area-polygons.shp") for the resultant shapefile.

To fix 3), the shape problem, we can create an intersection between our new postcode area polygons & a polygon of the UK - thus removing any area outside of the UK. The "Finding Data" section of the "Let's Make a Map" tutorial explains how to create a shapefile for the UK: download & extract ne_10m_admin_0_map_subunits.zip, then run the following command against it to copy just the UK and Ireland geometries into a new shapefile (gbr_irl.shp):

ogr2ogr -where "adm0_a3 IN ('GBR', 'IRL')" gbr_irl.shp ne_10m_admin_0_map_subunits.shp -select ""

Note that the '-select ""' parameter prevents any data-fields being copied to the new shapefile (both files currently have a "NAME" field, which would conflict when we run the intersection).

Add this as a new layer, and we should now have something like:

Before we process the intersection, we need to merge all the shapes in gbr_irl.shp into one geometry (otherwise postcode areas that cross an internal boundary would be split along it). In QGIS, select Vector > "Geoprocessing Tools" > "Dissolve", selecting all data-fields & specifying the output shapefile as "gbr_irl_unionist.shp".

Now we can process the intersection - in QGIS, select Vector > "Geoprocessing Tools" > "Intersect" & select our 2 shapefiles as the input layers (gbr_irl_unionist.shp & uk-postcode-area-polygons.shp). Adding the new shapefile ("uk-postcode-area.shp") to QGIS gives us:

Looks like we've intersected a bit of Ireland into a UK postcode - we can use QGIS in edit mode to select, then delete, the feature.

Let's make sure we've managed to retain the postcode fields - in QGIS, right-click on the layer in the TOC, select Properties and then, under the Labels tab, check the "Display labels" box & select the "NAME" field as the label:

4. Simplify Geometries

We can simplify the geometries greatly to reduce the filesize - drag the .shp file over to http://www.mapshaper.org/ & simplify to 10%; export as shapefile, and extract the downloaded ZIP containing the simplified .shp & .shx files over the top of the existing ones (leaving the other files in the shapefile set in place).

Now we can run the following command to convert our simplified shapefile into the topojson format that we'll use in our D3 map:

topojson --id-property NAME -o uk-postcode-area.json uk-postcode-area.shp

Note that we set the NAME field to be the id for each shape.
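To confirm that the ids made it into the output, the file can be scanned for them. This is a rough sketch only: list_topojson_ids is a hypothetical helper that matches text rather than parsing JSON, so it assumes the ids are written as quoted strings in the form "id":"...". A real JSON tool would be more robust if one is to hand.

```shell
# Hypothetical helper: list the distinct "id" values in a TopoJSON file.
# This is a text-level match, not a JSON parser, so it assumes string ids
# written as "id":"..." (optionally with spaces after the colon).
list_topojson_ids() {
  grep -o '"id": *"[^"]*"' "$1" | sed 's/.*"\([^"]*\)"$/\1/' | sort -u
}

# Usage: only runs once the topojson step has produced the file.
if [ -f uk-postcode-area.json ]; then
  list_topojson_ids uk-postcode-area.json
fi
```

Each line printed should be a value from the NAME field of the shapefile.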

Without the simplification step the .json file would have been 223KB rather than 48KB, so simplifying makes our map a bit more browser-friendly.
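The saving can be expressed as a percentage with a little shell arithmetic. The helper name percent_saved is hypothetical, and the calculation uses integer division, so the result is rounded down.

```shell
# Hypothetical helper: percentage saved going from one size to another,
# using integer arithmetic (the result is rounded down).
percent_saved() {
  echo $(( ($1 - $2) * 100 / $1 ))
}

# Usage: the two sizes in KB quoted for the .json file.
percent_saved 223 48   # prints 78
```

To compare real files rather than quoted figures, byte counts from wc -c can be passed in the same way.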