data-with-chris/python-data-entity-resolution

Python Rule-based Data Entity Resolution

Project Overview

This project demonstrates how Python can be used for rule-based entity resolution between datasets to identify matching records. This is a common challenge when working with siloed systems that don't share a pre-existing common unique identifier.
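As a rough illustration of what "rule-based" means here, the sketch below normalises two small record sets and links them with a single deterministic rule (same normalised name and same normalised postcode). It uses plain Python rather than Polars so that it runs without dependencies. The field names (`crm_id`, `billing_id`, `name`, `postcode`), the sample records, and the rule itself are assumptions made for this example and are not taken from the repository.

```python
import re


def normalise(value: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", value.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def match_key(record: dict) -> tuple:
    # Hypothetical rule: records match when the normalised name and the
    # normalised postcode (with spaces removed) are both equal.
    return (
        normalise(record["name"]),
        normalise(record["postcode"]).replace(" ", ""),
    )


def resolve(left: list, right: list) -> list:
    """Return (crm_id, billing_id) pairs for records that satisfy the rule."""
    # Index the right-hand dataset by match key so each lookup is O(1).
    index = {}
    for rec in right:
        index.setdefault(match_key(rec), []).append(rec)

    matches = []
    for rec in left:
        for candidate in index.get(match_key(rec), []):
            matches.append((rec["crm_id"], candidate["billing_id"]))
    return matches


# Two made-up datasets with no shared identifier.
crm = [
    {"crm_id": "C1", "name": "Acme Ltd.", "postcode": "AB1 2CD"},
    {"crm_id": "C2", "name": "Widget  Co", "postcode": "ZZ9 9ZZ"},
]
billing = [
    {"billing_id": "B7", "name": "ACME LTD", "postcode": "ab12cd"},
    {"billing_id": "B8", "name": "Gadget Inc", "postcode": "ZZ9 9ZZ"},
]

matches = resolve(crm, billing)
print(matches)
```

In a dataframe library such as Polars, the same idea would typically be expressed as adding normalised columns to each dataset and joining on them. The notebook in this repository is the reference for how the project actually does it.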

Python Libraries

  • Polars - Data normalisation, transformation, matching and merging.
  • Jupyter Notebooks - Interactive visualisation of the data transformation and entity resolution processes.
  • Pytest - Unit testing and application logic validation.

Setup and Installation

  1. Clone the repo:
git clone https://github.com/data-with-chris/python-data-entity-resolution.git
cd python-data-entity-resolution
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows
  3. Install dependencies:
pip install -r requirements.txt

Usage

Run tests

pytest

Run the Jupyter Notebook (from the root of the repository)

jupyter notebook notebooks/demo.ipynb

Demo

You can view the fully executed notebook (with outputs) on GitHub:

View the demo notebook here

About

Demonstrates a rule-based data entity resolution process using Polars in a Jupyter Notebook
