Matching UK addresses using Splink

High-performance address matching using a pre-trained Splink model.

Assuming you have two DuckDB dataframes in this format:

unique_id	address_concat	postcode
1	123 Fake Street, Faketown	FA1 2KE
2	456 Other Road, Otherville	NO1 3WY
...	...	...

Match them with:

```python
from uk_address_matcher.cleaning_pipelines import (
    clean_data_using_precomputed_rel_tok_freq,
)
from uk_address_matcher.splink_model import _performance_predict

# Assumes `con` is the DuckDB connection that df_1 and df_2 were created on,
# e.g. con = duckdb.connect(database=":memory:")

# Clean both datasets before matching
df_1_c = clean_data_using_precomputed_rel_tok_freq(df_1, con=con)
df_2_c = clean_data_using_precomputed_rel_tok_freq(df_2, con=con)

# Score candidate pairs with the pre-trained model
linker, predictions = _performance_predict(
    df_addresses_to_match=df_1_c,
    df_addresses_to_search_within=df_2_c,
    con=con,
    match_weight_threshold=-10,
    output_all_cols=True,
    include_full_postcode_block=True,
)
```
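
To get a feel for what `match_weight_threshold=-10` means, the sketch below converts a match weight to a probability. It assumes the threshold follows Splink's usual convention, where the match weight is the base-2 logarithm of the match odds; the helper name `match_weight_to_probability` is hypothetical and not part of this package.

```python
def match_weight_to_probability(match_weight: float) -> float:
    """Convert a log2-odds match weight to a match probability.

    Assumption: weight = log2(odds), so odds = 2 ** weight.
    """
    odds = 2.0 ** match_weight
    return odds / (1.0 + odds)


# The threshold used in the call above
threshold = -10
print(f"weight {threshold} -> probability {match_weight_to_probability(threshold):.6f}")

# A weight of 0 corresponds to even odds
print(f"weight 0 -> probability {match_weight_to_probability(0):.2f}")
```

Under that convention a threshold of -10 corresponds to a probability of about 0.001, so it keeps very weak candidates as well as strong ones, and a stricter cut can be applied to the predictions afterwards.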

Initial tests suggest you can match ~ 1,000 addresses per second against a list of 30 million addresses on a laptop.
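
For planning purposes, the quoted rate can be turned into a back-of-the-envelope run-time estimate. The sketch below simply divides the number of addresses by the approximate throughput. The figure of 1,000 addresses per second comes from the initial tests mentioned above and will vary with hardware and data; the function name is hypothetical.

```python
def estimated_seconds(n_addresses: int, addresses_per_second: float = 1_000.0) -> float:
    """Rough run time, assuming a constant matching throughput."""
    if addresses_per_second <= 0:
        raise ValueError("addresses_per_second must be positive")
    return n_addresses / addresses_per_second


# Estimates for a few input sizes at the quoted rate
for n in (5_000, 100_000, 1_000_000):
    secs = estimated_seconds(n)
    print(f"{n:>9,} addresses: ~{secs:,.0f} s (~{secs / 60:.1f} min)")
```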

Refer to the example (example.py), which has detailed comments, for how to match your data.

See the example of comparing two addresses (example_compare_two.py) to get a sense of what the model does and how it scores a pair.

Run an interactive example in your browser:

Match 5,000 FHRS records to 21,952 Companies House records in < 10 seconds.

Investigate and understand how the model works

RobinL/uk_address_matcher
