This project demonstrates rule-based entity resolution in Python: identifying records in separate datasets that refer to the same entity. This is a common challenge when working with siloed systems that don't share a common unique identifier.
- Polars - Data normalisation, transformation, matching and merging.
- Jupyter Notebooks - Interactive visualisation of the data transformation and entity resolution processes.
- Pytest - Unit testing and application logic validation.
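
To make the idea of rule-based matching concrete, here is a minimal sketch. It is not taken from this repository: the field names, the sample records and the two rules (exact email first, then name plus postcode) are assumptions chosen for illustration. It uses only the Python standard library rather than Polars, so it runs without installing anything.

```python
import re


def normalise_email(email):
    """Trim whitespace and lower-case so cosmetic differences don't block a match."""
    return email.strip().lower()


def normalise_name(name):
    """Lower-case, replace punctuation with spaces, and collapse repeated whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(cleaned.split())


def normalise_postcode(postcode):
    """Remove spaces and upper-case, e.g. 'ab1 2cd' -> 'AB12CD'."""
    return postcode.replace(" ", "").upper()


def resolve(left, right):
    """Return (left_id, right_id, rule) tuples using two hypothetical rules.

    Rule 1: normalised emails are equal.
    Rule 2 (fallback): normalised name and postcode are both equal.
    """
    # Index the right-hand dataset once per rule for constant-time lookups.
    by_email = {}
    by_name_postcode = {}
    for rec in right:
        email = normalise_email(rec["email"])
        if email:
            by_email[email] = rec["id"]
        key = (normalise_name(rec["name"]), normalise_postcode(rec["postcode"]))
        by_name_postcode[key] = rec["id"]

    matches = []
    for rec in left:
        email = normalise_email(rec["email"])
        key = (normalise_name(rec["name"]), normalise_postcode(rec["postcode"]))
        if email and email in by_email:
            matches.append((rec["id"], by_email[email], "email"))
        elif key in by_name_postcode:
            matches.append((rec["id"], by_name_postcode[key], "name+postcode"))
    return matches


# Illustrative records from two systems with no shared identifier.
crm = [
    {"id": "C1", "name": "Ada Lovelace", "email": "Ada@Example.com", "postcode": "SW1A 1AA"},
    {"id": "C2", "name": "Grace  Hopper", "email": "", "postcode": "ab1 2cd"},
    {"id": "C3", "name": "Alan Turing", "email": "alan@example.com", "postcode": "M1 1AA"},
]
billing = [
    {"id": "B1", "name": "A. Lovelace", "email": "ada@example.com ", "postcode": "SW1A1AA"},
    {"id": "B2", "name": "grace hopper", "email": "g.hopper@example.com", "postcode": "AB1 2CD"},
    {"id": "B3", "name": "Edsger Dijkstra", "email": "edsger@example.com", "postcode": "E1 6AN"},
]

matches = resolve(crm, billing)
print(matches)
```

Here C1 pairs with B1 through the email rule, C2 pairs with B2 through the name and postcode fallback, and C3 has no counterpart. Applying the stricter rule first keeps the weaker rule from overriding a confident match; the rules in the notebook may differ.
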
- Clone the repo:
git clone https://github.com/data-with-chris/python-data-entity-resolution.git
cd python-data-entity-resolution
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows
- Install dependencies:
pip install -r requirements.txt
- Run tests:
pytest
- Run the Jupyter Notebook (from the root of the repository):
jupyter notebook notebooks/demo.ipynb
You can view the fully executed notebook (with outputs) on GitHub: