This repository has been archived by the owner on Nov 27, 2020. It is now read-only.

digital-land-attic/brownfield-land-pipeline

Digital Land brownfield land collection

Collect data published by each Local Planning Authority, validate the publications, and build a national dataset.

You can explore data with a geospatial position on our map.

Collection

The source list of registers collected is kept and maintained in dataset/brownfield-land.csv.

The collection directory contains the resources collected from those sources.

Processing pipeline

The collected resources are then processed in a pipeline:

  • var/converted -- the resource converted into UTF-8 encoded CSV
  • var/normalised -- padding removed and obviously spurious rows dropped
  • var/mapped -- column names mapped to ones in the schema
  • var/harmonised -- dates, geospatial, and other values translated into a consistent format
  • var/transformed -- transformed into the digital-land dataset model
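
As an illustration of the middle stages above, the following is a minimal sketch of normalising, mapping, and harmonising a small CSV resource. It is not the pipeline's actual code: the COLUMN_MAP entries, the list of accepted date formats, and all function names are assumptions made for the example.

```python
import csv
import io
from datetime import datetime

# Hypothetical mapping from publisher column names to schema field names.
COLUMN_MAP = {
    "siteref": "site",
    "sitereference": "site",
    "hectares": "hectares",
    "firstaddeddate": "start-date",
}

# Date layouts tried in order; an assumption, not the pipeline's real list.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def normalise(rows):
    """Strip padding from every cell and drop rows that are entirely blank."""
    for row in rows:
        cleaned = [cell.strip() for cell in row]
        if any(cleaned):
            yield cleaned


def map_header(header):
    """Map publisher column names onto schema names, ignoring case and punctuation."""
    mapped = []
    for name in header:
        key = "".join(ch for ch in name.lower() if ch.isalnum())
        mapped.append(COLUMN_MAP.get(key, name))
    return mapped


def harmonise_date(value):
    """Return the date as YYYY-MM-DD, or an empty string if it cannot be parsed."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return ""


def process(text):
    """Run the three sketched stages over CSV text and return a list of records."""
    rows = list(normalise(csv.reader(io.StringIO(text))))
    header = map_header(rows[0])
    records = [dict(zip(header, row)) for row in rows[1:]]
    for record in records:
        if "start-date" in record:
            record["start-date"] = harmonise_date(record["start-date"])
    return records
```

Calling process on the text of a converted resource returns one dictionary per surviving row, keyed by schema field name.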

Dataset

The resources are then collated into a national set of entries, ordered by the date the resource was published and then by the entry-date.

The entries are then reduced to a national dataset of site records, using the organisation and site reference to uniquely identify a site, and the order of the entries to help remove duplicate and older entries.

The dataset has the following fields, to be consistent with other datasets published by Digital Land:

  • entry-date
  • organisation -- the curie for the organisation
  • site -- a unique identifier for the site
  • site-address
  • site-plan-url
  • deliverable
  • ownership
  • planning-permission-status
  • planning-permission-type
  • hazardous-substances
  • latitude
  • longitude
  • hectares
  • minimum-net-dwellings
  • maximum-net-dwellings
  • start-date
  • end-date
  • resource -- the source resource for the entry
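
The reduction from entries to site records described above can be sketched as follows. This is an illustration under stated assumptions rather than the repository's implementation: the resource-date key (standing in for the date the resource was published) and the function name are hypothetical, and dates are assumed to be ISO 8601 strings so that they sort correctly as text.

```python
def reduce_entries(entries):
    """Keep only the most recent entry for each (organisation, site) pair."""
    # Order by resource publication date, then entry-date, oldest first.
    # "resource-date" is a hypothetical key used for this sketch.
    ordered = sorted(entries, key=lambda e: (e["resource-date"], e["entry-date"]))

    latest = {}
    for entry in ordered:
        # Later entries overwrite earlier ones for the same site.
        latest[(entry["organisation"], entry["site"])] = entry
    return list(latest.values())
```

Because later entries overwrite earlier ones, a site that appears in several published registers ends up with a single record taken from the newest resource.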

Indexes

A number of index files are generated for the collection.

These indexes are used by the dataset and other code to build the dataset, resource, and other pages.

Manual fixes

Resources which cannot be automatically processed are fixed manually using the following configuration and data:

  • fixed -- manually fixed resources introduced into the pipeline instead of the collected resource
  • patches/organisation.csv -- a map of OrganisationURI to organisation CURIE values
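
To illustrate how an organisation patch of this kind might be applied, here is a minimal sketch. The column names OrganisationURI and organisation in the patch file, the function names, and the example values are assumptions for the example; the real layout of patches/organisation.csv may differ.

```python
import csv
import io


def load_patches(text):
    """Read patch CSV text into a dict of OrganisationURI -> organisation CURIE."""
    # Assumes the patch file has "OrganisationURI" and "organisation" columns.
    reader = csv.DictReader(io.StringIO(text))
    return {row["OrganisationURI"]: row["organisation"] for row in reader}


def patch_organisation(entry, patches):
    """Return a copy of the entry with its organisation set from the patch map."""
    uri = entry.get("OrganisationURI", "")
    patched = dict(entry)
    # Fall back to any existing organisation value when no patch matches.
    patched["organisation"] = patches.get(uri, entry.get("organisation", ""))
    return patched
```

Returning a copy keeps the original entry untouched, so patched and unpatched values can be compared when checking a fix.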

Validation

Each collected resource is tested for conformance to the schema, which is a Frictionless Data schema with extensions to support the pipeline. The results of validation are stored in the validation directory and included in the indexes.
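
The kind of conformance check involved can be sketched with the standard library alone. This is a simplified illustration, not the project's validator: the SCHEMA below is a hypothetical three-field fragment written in the style of a Frictionless Table Schema, and the function names are invented for the example.

```python
from datetime import date

# Hypothetical schema fragment in Table Schema style; the real schema is larger.
SCHEMA = {
    "fields": [
        {"name": "site", "type": "string", "constraints": {"required": True}},
        {"name": "hectares", "type": "number"},
        {"name": "start-date", "type": "date"},
    ]
}


def check_value(value, field_type):
    """Return True if the text value can be read as the given field type."""
    try:
        if field_type == "number":
            float(value)
        elif field_type == "date":
            date.fromisoformat(value)
    except ValueError:
        return False
    return True


def validate_row(row, schema=SCHEMA):
    """Return a list of error messages for one row; an empty list means it conforms."""
    errors = []
    for field in schema["fields"]:
        name = field["name"]
        value = row.get(name, "")
        required = field.get("constraints", {}).get("required", False)
        if value == "":
            if required:
                errors.append(f"{name}: required value is missing")
            continue
        if not check_value(value, field["type"]):
            errors.append(f"{name}: {value!r} is not a valid {field['type']}")
    return errors
```

Collecting messages per row, rather than stopping at the first failure, gives a result that can be stored and summarised alongside the resource.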

Updating the collection

We recommend working in a virtual environment before installing the Python dependencies:

$ make init
$ make

Not all of the files can be downloaded automatically. Those that cannot can be added to the collection using the addone script:

$ bin/addone.py ~/Downloads/download.csv https://example.com/inaccessible-site

Licence

The software in this project is open source and covered by the LICENSE file.

Individual datasets copied into this repository may have specific copyright and licensing; otherwise, all content and data in this repository is © Crown copyright and available under the terms of the Open Government Licence v3.0.

About

A collection of brownfield site registers published by local planning authorities https://digital-land.github.io/dataset/brownfield-land
