Privacy-preserving AI made simple - A comprehensive toolkit for data privacy risk assessment and de-identification in Python-based ML pipelines.
READI augments the functionality of the IBM Data Privacy Toolkit with state-of-the-art capabilities for detecting personal and sensitive information in unstructured documents. It is built for modern compliance frameworks and AI model training workflows.
- Advanced PII Detection - Identify personal and sensitive information across multiple data types
- Seamless Integration - Low-effort integration with existing ML pipelines
- Structured & Unstructured Data - Support for both data formats
- REST API - Easy-to-use HTTP interface for remote processing
- Extensible Framework - Modular design for custom privacy requirements
- Comprehensive Examples - Jupyter notebooks with real-world use cases
- Python 3.11 or higher
- Git with git-lfs support (for large files >50 MB)
- uv (recommended) - A fast Python package installer
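The prerequisites above can be checked before installing. The following is a minimal sketch, not part of READI: the helper name `check_prerequisites` is hypothetical, and it only tests whether the `git`, `git-lfs`, and `uv` executables are on the `PATH`, not whether they are configured correctly.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 11)):
    """Return a mapping of prerequisite name to whether it appears to be available.

    Hypothetical helper for illustration; READI does not ship this function.
    """
    return {
        # Compare only (major, minor) against the required minimum version.
        "python": sys.version_info[:2] >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "git": shutil.which("git") is not None,
        "git-lfs": shutil.which("git-lfs") is not None,
        "uv": shutil.which("uv") is not None,  # recommended, not required
    }


for name, available in check_prerequisites().items():
    print(f"{name}: {'found' if available else 'missing'}")
```

A missing `uv` entry is not an error, since the pip-based installation path does not need it.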
Recommended: Using uv (10-100x faster)
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create and activate virtual environment
uv venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install READI
uv pip install git+https://github.com/IBM/READI.git
Standard Installation with pip:
pip install git+https://github.com/IBM/READI.git
Clone Repository:
git clone https://github.com/IBM/READI.git
cd READI
# With uv (recommended)
uv pip install -e .
# Or with pip
pip install -e .
For contributors and developers:
Recommended: Using uv
# Install in editable mode with development dependencies
uv pip install -e .
uv pip install -r requirements-dev.txt
# Set up pre-commit hooks (recommended)
pre-commit install
Alternative: Using pip
# Install in editable mode with development dependencies
pip install -e .
pip install -r requirements-dev.txt
# Set up pre-commit hooks (recommended)
pre-commit install
This installs the project in editable mode along with development tools (pytest, ruff, bandit, etc.).
Tip: Using uv provides significantly faster dependency resolution and installation compared to traditional pip.
READI provides a simple REST API for remote processing.
# Install with REST API support
pip install -e '.[rest]'
# Start the server
uvicorn risk_assessment.entry_points.rest.api:app
# Send a test request
curl -H 'Content-Type: application/json' \
http://localhost:8000/detect_phi \
--data-raw '{"text":"My text with email: john@gmail.com"}'
The API will be available at http://localhost:8000 with interactive documentation at /docs.
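The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call: a JSON body with a `text` field posted to `/detect_phi`. The helper names `build_detect_request` and `detect_phi` are hypothetical, the server is assumed to be running locally on port 8000, and the response schema is not documented here, so the decoded JSON is returned as-is.

```python
import json
import urllib.request

# Assumption: the server was started locally with uvicorn on the default port.
BASE_URL = "http://localhost:8000"


def build_detect_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request that the curl example sends (hypothetical helper)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/detect_phi",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect_phi(text: str, base_url: str = BASE_URL, timeout: float = 30.0):
    """Send the request and decode the JSON response without assuming its schema."""
    request = build_detect_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running server):
# result = detect_phi("My text with email: john@gmail.com")
# print(result)
```

Building the request separately from sending it keeps the payload easy to inspect or test without a running server.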
Explore our comprehensive Jupyter notebooks in the notebooks/ directory:
| Notebook | Description |
|---|---|
| Unstructured Data Classification | General overview of READI API for free-text processing |
| Structured Data Classification | Working with tabular and structured datasets |
For detailed documentation, API references, and advanced usage patterns, please visit our documentation portal (coming soon).
We welcome contributions! Please see our Contributing Guidelines for details on:
- Code style and standards
- Testing requirements
- Pull request process
- Development workflow
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
If you use READI in academic work, please cite the most relevant publication from the references below. A general citation entry is:
@software{readi_ibm,
title = {READI: Risk Evaluation and De-Identification},
author = {Stefano Braghin and Liubov Nedoshivina and Anisa Halimi and Naoise Holohan and Kieran Fraser},
year = {2026},
url = {https://github.com/IBM/READI}
}
When your usage specifically relates to unstructured document de-identification, prefer citing:
@article{nedoshivina2024pragmatic,
title = {Pragmatic De-Identification of Cross-Domain Unstructured Documents: A Utility-Preserving Approach with Relation Extraction Filtering},
author = {Liubov Nedoshivina and Anisa Halimi and Joa Bettencourt-Silva and Stefano Braghin},
journal = {AMIA Summits on Translational Science Proceedings},
volume = {2024},
pages = {85},
year = {2024}
}
READI is built on years of privacy research. Key publications:
- Nedoshivina, L., Halimi, A., Bettencourt-Silva, J., & Braghin, S. (2024). Pragmatic De-Identification of Cross-Domain Unstructured Documents: A Utility-Preserving Approach with Relation Extraction Filtering. AMIA Summits on Translational Science Proceedings, 2024, 85.
- Pachilakis, M., Antonatos, S., Levacher, K., & Braghin, S. (2020). PrivLeAD: Privacy Leakage Detection on the Web. Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1250. Springer, Cham. DOI: 10.1007/978-3-030-55180-3_32
- Braghin, S., Bettencourt-Silva, J. H., Levacher, K., & Antonatos, S. (2019). An Extensible De-Identification Framework for Privacy Protection of Unstructured Health Information: Creating Sustainable Privacy Infrastructures. MEDINFO 2019: Health and Wellbeing e-Networks for All (pp. 1140-1144). IOS Press. DOI: 10.3233/SHTI190404
- Antonatos, S., Braghin, S., Holohan, N., Gkoufas, Y., & Mac Aonghusa, P. (2018). PRIMA: An End-to-End Framework for Privacy at Scale. 2018 IEEE 34th International Conference on Data Engineering (ICDE), pp. 1531-1542. DOI: 10.1109/ICDE.2018.00171
- Gkoulalas-Divanis, A., & Braghin, S. (2016). IPV: A system for identifying privacy vulnerabilities in datasets. IBM Journal of Research and Development, vol. 60, no. 4, pp. 14:1-14:10. DOI: 10.1147/JRD.2016.2576818
- Gkoulalas-Divanis, A., Braghin, S., & Antonatos, S. (2016). FPVI: A scalable method for discovering privacy vulnerabilities in microdata. 2016 IEEE International Smart Cities Conference (ISC2), pp. 1-8. DOI: 10.1109/ISC2.2016.7580849
- Gkoulalas-Divanis, A., & Braghin, S. (2015). Efficient algorithms for identifying privacy vulnerabilities. 2015 IEEE First International Smart Cities Conference (ISC2), pp. 1-8. DOI: 10.1109/ISC2.2015.7366170
This project is partly supported by the Innovative Health Initiative Joint Undertaking (IHI JU) under grant agreement No. 101172997 - SEARCH.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Contact: For enterprise support, please contact the IBM Research team
Built with ❤️ by IBM Research
Documentation • Examples • Contributing • License