This repository has been archived by the owner on Nov 27, 2019. It is now read-only.

scraperwiki/address-matching-data

 
 


Address matching data

Test cases and other data for training and testing address matching algorithms.

Test case format

Test cases are held in tab-separated format files with the following columns:

  • test — an identifier for the test case which should be unique across all tests
  • name — the addressee or name of the business (if separable)
  • text — the address text to be matched, with newlines encoded as '\n' (include the name or postcode here only if it cannot be stored in a separate field)
  • postcode — an optional, separate postcode (if separable)
  • uprns — one or more decimal UPRN (Unique Property Reference Number) values that could match the address, separated by semicolons (';')
  • notes — an explanation of the test

A test case may contain additional fields for information.
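
As a rough illustration of the format above, the following Python sketch reads test cases into dictionaries. It assumes each file starts with a header row naming the columns, which this README does not state explicitly. The function name `parse_test_cases` and the sample rows (including the UPRN values) are hypothetical and exist only for this example.

```python
import csv
import io


def parse_test_cases(lines):
    """Parse tab-separated test cases into a list of dicts.

    Assumes the first line is a header row naming the columns
    (test, name, text, postcode, uprns, notes, plus any extras).
    """
    # QUOTE_NONE keeps any quote characters in addresses as literal text.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    cases = []
    seen = set()
    for row in reader:
        test_id = row["test"]
        # The format asks for identifiers to be unique across all tests.
        if test_id in seen:
            raise ValueError(f"duplicate test identifier: {test_id}")
        seen.add(test_id)
        # Decode the two-character sequence '\n' back into real newlines.
        row["text"] = (row.get("text") or "").replace("\\n", "\n")
        # Split the semicolon-separated UPRNs into integers.
        row["uprns"] = [
            int(u) for u in (row.get("uprns") or "").split(";") if u.strip()
        ]
        # An empty postcode column means no separate postcode was supplied.
        row["postcode"] = row.get("postcode") or None
        cases.append(row)
    return cases


# Hypothetical sample data, not taken from the repository.
SAMPLE = (
    "test\tname\ttext\tpostcode\tuprns\tnotes\n"
    "example-1\tExample Ltd\t1 Example Street\\nExampletown\tZZ99 9ZZ\t"
    "100000000001;100000000002\ttwo candidate UPRNs\n"
    "example-2\t\t2 Example Road\t\t100000000003\tno name or postcode\n"
)

cases = parse_test_cases(io.StringIO(SAMPLE))
for case in cases:
    print(case["test"], case["uprns"], repr(case["text"]))
```

Any additional columns in the header are carried through unchanged as extra dictionary keys, so informational fields are not lost.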

Bulk datasets

The bulk directory contains addresses found in bulk in open data, to be matched.

Few of the bulk datasets currently contain resolved UPRNs, but they can form the basis of test cases as we build registers.

Licence

The software in this project is open source and is covered by the LICENSE file.

The data held in this repository is © Crown copyright and is available under the terms of the Open Government Licence v3.0.

Data downloaded by the build process may be covered by different copyright and terms.
