🔒 READI - Risk Evaluation and De-Identification

Privacy-preserving AI made simple - A comprehensive toolkit for data privacy risk assessment and de-identification in Python-based ML pipelines.

READI extends the IBM Data Privacy Toolkit with state-of-the-art capabilities for detecting personal and sensitive information in unstructured documents. It is built for modern compliance frameworks and AI model training workflows.

✨ Features

🎯 Advanced PII Detection - Identify personal and sensitive information across multiple data types
🔄 Seamless Integration - Low-effort integration with existing ML pipelines
📊 Structured & Unstructured Data - Support for both data formats
🌐 REST API - Easy-to-use HTTP interface for remote processing
🧪 Extensible Framework - Modular design for custom privacy requirements
📝 Comprehensive Examples - Jupyter notebooks with real-world use cases

🚀 Quick Start

Prerequisites

Python 3.11 or higher
Git with git-lfs support (for large files >50 MB)
uv (recommended) - A fast Python package installer
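
The prerequisites above can be checked from Python before installing. The snippet below is a minimal sketch, not part of READI: the function name `check_prerequisites` is hypothetical, and it assumes the tools are exposed on `PATH` under the executable names `git`, `git-lfs`, and `uv`.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)  # minimum Python version listed in the prerequisites


def check_prerequisites() -> dict[str, bool]:
    """Report which prerequisites are visible from this interpreter.

    Hypothetical helper: it only looks for executables on PATH and does not
    verify versions of git, git-lfs, or uv.
    """
    return {
        "python>=3.11": sys.version_info >= MIN_PYTHON,
        "git": shutil.which("git") is not None,
        "git-lfs": shutil.which("git-lfs") is not None,
        "uv (optional)": shutil.which("uv") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A missing `uv` entry is not a blocker, since the standard pip installation works without it.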

Installation

Recommended: Using uv (10-100x faster)

# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create and activate virtual environment
uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install READI
uv pip install git+https://github.com/IBM/READI.git

Standard Installation with pip:

pip install git+https://github.com/IBM/READI.git

Clone Repository:

git clone https://github.com/IBM/READI.git
cd READI

# With uv (recommended)
uv pip install -e .

# Or with pip
pip install -e .
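
To confirm that the installation landed in the active environment, one option is to look up the package without importing it. This is a sketch under one assumption: the importable module is named `risk_assessment`, which is inferred from the `src/risk_assessment` layout and the `uvicorn` command used in the REST API section. The helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str = "risk_assessment") -> bool:
    """Return True if a top-level module can be located in this environment.

    The default module name is an assumption based on the uvicorn entry point
    shown in this README; the lookup does not import the module itself.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("READI importable:", is_installed())
```

If this prints `False`, check that the virtual environment created with `uv venv` is the one currently activated.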

💻 Development Setup

For contributors and developers:

Recommended: Using uv

# Install in editable mode with development dependencies
uv pip install -e .
uv pip install -r requirements-dev.txt

# Set up pre-commit hooks (recommended)
pre-commit install

Alternative: Using pip

# Install in editable mode with development dependencies
pip install -e .
pip install -r requirements-dev.txt

# Set up pre-commit hooks (recommended)
pre-commit install

This installs the project in editable mode along with development tools (pytest, ruff, bandit, etc.).

💡 Tip: Using uv provides significantly faster dependency resolution and installation compared to traditional pip.

🌐 REST API Usage

READI provides a simple REST API for remote processing.

Setup

# Install with REST API support (run from the root of the cloned repository)
pip install -e '.[rest]'

# Start the server
uvicorn risk_assessment.entry_points.rest.api:app

Example Request

curl -H 'Content-Type: application/json' \
     http://localhost:8000/detect_phi \
     --data-raw '{"text":"My text with email: john@gmail.com"}'

The API will be available at http://localhost:8000 with interactive documentation at /docs.
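
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl example: a POST to `/detect_phi` with a JSON body containing a `text` field. The helper names `build_detect_request` and `detect_phi` are hypothetical, and the response schema is not described in this README, so the decoded JSON is returned as-is.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address used in the curl example above


def build_detect_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /detect_phi with the same body as the curl example."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/detect_phi",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect_phi(text: str, base_url: str = BASE_URL, timeout: float = 10.0):
    """Send the request and return the decoded JSON response.

    The structure of the response is an unknown here; inspect it (or the
    interactive documentation at /docs) before relying on specific fields.
    """
    request = build_detect_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the Setup step running, `detect_phi("My text with email: john@gmail.com")` sends the same payload as the curl command. Separating request construction from sending keeps the payload easy to inspect or unit-test without a live server.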

📚 Examples & Tutorials

Explore our comprehensive Jupyter notebooks in the notebooks/ directory:

Notebook	Description
Unstructured Data Classification	General overview of READI API for free-text processing
Structured Data Classification	Working with tabular and structured datasets

📖 Documentation

Detailed documentation, API references, and advanced usage patterns will be published on our documentation portal (coming soon).

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details on:

Code style and standards
Testing requirements
Pull request process
Development workflow

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

📌 How to Cite

If you use READI in academic work, please cite the most relevant publication from the references below. A general citation entry is:

@software{readi_ibm,
  title        = {READI: Risk Evaluation and De-Identification},
  author       = {Stefano Braghin and Liubov Nedoshivina and Anisa Halimi and Naoise Holohan and Kieran Fraser},
  year         = {2026},
  url          = {https://github.com/IBM/READI}
}

When your usage specifically relates to unstructured document de-identification, prefer citing:

@article{nedoshivina2024pragmatic,
  title   = {Pragmatic De-Identification of Cross-Domain Unstructured Documents: A Utility-Preserving Approach with Relation Extraction Filtering},
  author  = {Liubov Nedoshivina and Anisa Halimi and Joa Bettencourt-Silva and Stefano Braghin},
  journal = {AMIA Summits on Translational Science Proceedings},
  volume  = {2024},
  pages   = {85},
  year    = {2024}
}

📚 Academic References

READI is built on years of privacy research. Key publications:

Nedoshivina, L., Halimi, A., Bettencourt-Silva, J., & Braghin, S. (2024). Pragmatic De-Identification of Cross-Domain Unstructured Documents: A Utility-Preserving Approach with Relation Extraction Filtering. AMIA Summits on Translational Science Proceedings, 2024, 85.
Pachilakis, M., Antonatos, S., Levacher, K., & Braghin, S. (2020). PrivLeAD: Privacy Leakage Detection on the Web. Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, vol 1250. Springer, Cham. DOI: 10.1007/978-3-030-55180-3_32
Braghin, S., Bettencourt-Silva, J. H., Levacher, K., & Antonatos, S. (2019). An Extensible De-Identification Framework for Privacy Protection of Unstructured Health Information: Creating Sustainable Privacy Infrastructures. MEDINFO 2019: Health and Wellbeing e-Networks for All (pp. 1140-1144). IOS Press. DOI: 10.3233/SHTI190404
Antonatos, S., Braghin, S., Holohan, N., Gkoufas, Y., & Mac Aonghusa, P. (2018). PRIMA: An End-to-End Framework for Privacy at Scale. 2018 IEEE 34th International Conference on Data Engineering (ICDE), pp. 1531-1542. DOI: 10.1109/ICDE.2018.00171
Gkoulalas-Divanis, A., & Braghin, S. (2016). IPV: A system for identifying privacy vulnerabilities in datasets. IBM Journal of Research and Development, vol. 60, no. 4, pp. 14:1-14:10. DOI: 10.1147/JRD.2016.2576818
Gkoulalas-Divanis, A., Braghin, S., & Antonatos, S. (2016). FPVI: A scalable method for discovering privacy vulnerabilities in microdata. 2016 IEEE International Smart Cities Conference (ISC2), pp. 1-8. DOI: 10.1109/ISC2.2016.7580849
Gkoulalas-Divanis, A., & Braghin, S. (2015). Efficient algorithms for identifying privacy vulnerabilities. 2015 IEEE First International Smart Cities Conference (ISC2), pp. 1-8. DOI: 10.1109/ISC2.2015.7366170

🙏 Acknowledgment

This project is partly supported by the Innovative Health Initiative Joint Undertaking (IHI JU) under grant agreement No. 101172997 – SEARCH.

💬 Support & Community

🐛 Issues: GitHub Issues
💡 Discussions: GitHub Discussions
📧 Contact: For enterprise support, please contact the IBM Research team

Built with ❤️ by IBM Research

