Provided and maintained by 🦄 UnicoLab
Transform your raw data into ML-ready features with just a few lines of code!
KDP is a state-of-the-art preprocessing system built on TensorFlow Keras. It handles everything from feature normalization to advanced embedding techniques, making your ML pipelines faster, more robust, and easier to maintain. Built with ❤️ by 🦄 UnicoLab, it gives you a clean, efficient, and extensible foundation for building machine learning models for enterprise AI applications.
- 🚀 Efficient Single-Pass Processing: Process all features in one go, dramatically faster than alternatives
- 🧠 Distribution-Aware Encoding: Automatically detects and optimally handles different data distributions
- 👁️ Tabular Attention: Captures complex feature interactions for better model performance
- 🔍 Feature Selection: Automatically identifies and focuses on the most important features
- 🔄 Feature-wise Mixture of Experts: Specialized processing for different feature types
- 📦 Production-Ready: Deploy your preprocessing along with your model as a single unit
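To make the idea behind distribution-aware encoding more concrete, here is a minimal sketch of how a preprocessor might pick a transform from the shape of a sample. It is an illustrative heuristic only, not KDP's detection logic: the function names `skewness`, `choose_transform`, and `apply_transform`, and the skewness threshold of 1.0, are assumptions made for this example.

```python
import math
from statistics import fmean, pstdev


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        return 0.0  # a constant column has no skew
    return fmean(((v - mean) / std) ** 3 for v in values)


def choose_transform(values, threshold=1.0):
    """Pick a transform name from the sample's shape (hypothetical rule)."""
    if min(values) >= 0 and skewness(values) > threshold:
        return "log1p"  # compress a long right tail
    return "standardize"


def apply_transform(values, name):
    """Apply the chosen transform to a list of numbers."""
    if name == "log1p":
        return [math.log1p(v) for v in values]
    mean, std = fmean(values), pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


ages = [23, 31, 38, 45, 52, 60]          # roughly symmetric
incomes = [20, 22, 25, 27, 30, 35, 400]  # long right tail

print(choose_transform(ages))     # standardize
print(choose_transform(incomes))  # log1p
encoded_incomes = apply_transform(incomes, choose_transform(incomes))
```

A real system would consider more than skewness (for example heavy tails, multimodality, or sparsity), but the pattern is the same: measure the distribution first, then select the encoding.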
# Using pip
pip install kdp
# Using Poetry
poetry add kdp

from kdp import PreprocessingModel, FeatureType

# Define your features
features_specs = {
    "age": FeatureType.FLOAT_NORMALIZED,
    "income": FeatureType.FLOAT_RESCALED,
    "occupation": FeatureType.STRING_CATEGORICAL,
    "description": FeatureType.TEXT,
}

# Create and build the preprocessor
preprocessor = PreprocessingModel(
    path_data="data/my_data.csv",
    features_specs=features_specs,
    # Enable advanced features
    use_distribution_aware=True,
    tabular_attention=True,
)
result = preprocessor.build_preprocessor()
model = result["model"]

# Use the preprocessor with your data
processed_features = model(input_data)

We've built an extensive documentation system to help you get the most from KDP:
- 🚀 Quick Start Guide - Get up and running in minutes
- 📊 Feature Processing - Learn about all supported feature types
- 🧙‍♂️ Auto-Configuration - Let KDP configure itself for your data
- 📈 Distribution-Aware Encoding - Smart handling of different distributions
- 👁️ Tabular Attention - Capture complex feature interactions
- 🔢 Advanced Numerical Embeddings - Rich representations for numbers
- 🤖 Transformer Blocks - Apply transformer architecture to tabular data
- 🎯 Feature Selection - Focus on what matters in your data
- 🧠 Feature-wise Mixture of Experts - Specialized processing per feature
- 🔗 Integration Guide - Use KDP with existing ML pipelines
- 🚀 Tabular Optimization - Supercharge your preprocessing
- 📈 Performance Tips - Handle large datasets efficiently
- 💡 Motivation - Why we built KDP
- 🤝 Contributing - Help improve KDP
Your preprocessing pipeline is built as a Keras model that can be used independently or as the first layer of any model.
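The sketch below shows both usage patterns with plain Keras. It does not call KDP: a small `Normalization`-based model stands in for the preprocessing model that KDP would build, and the names `preprocessing_model` and `full_model`, along with the toy data, are assumptions made for illustration.

```python
import numpy as np
import tensorflow as tf

# Toy data: one numeric feature, four rows.
raw = np.array([[18.0], [35.0], [52.0], [70.0]], dtype="float32")

# Stand-in preprocessing model (NOT produced by KDP): learns the mean and
# variance of the feature, then normalizes it.
normalizer = tf.keras.layers.Normalization(axis=-1)
normalizer.adapt(raw)
pre_input = tf.keras.Input(shape=(1,))
preprocessing_model = tf.keras.Model(pre_input, normalizer(pre_input))

# 1) Use the preprocessing model on its own.
features = preprocessing_model(raw)

# 2) Use it as the first stage of a larger model, so preprocessing and
#    prediction travel together as a single Keras model.
model_input = tf.keras.Input(shape=(1,))
hidden = preprocessing_model(model_input)
output = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)
full_model = tf.keras.Model(model_input, output)

predictions = full_model(raw)
print(features.shape, predictions.shape)
```

Because the preprocessing step is nested inside `full_model`, saving or serving that one model carries the preprocessing along with it, which is the deployment pattern described above.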
KDP outperforms alternative preprocessing approaches, especially as data size increases.
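One general reason single-pass designs hold up as data grows is that every feature's statistics are gathered during a single read of the dataset, rather than one read per feature. The following is a minimal sketch of that idea using Welford's online algorithm. It is not KDP's implementation, and `RunningStats` and `collect_stats` are hypothetical names introduced for this example.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance without storing the data."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self):
        """Population standard deviation of the values seen so far."""
        return math.sqrt(self._m2 / self.count) if self.count else 0.0


def collect_stats(rows, numeric_features):
    """Update the statistics of every feature while reading each row once."""
    stats = {name: RunningStats() for name in numeric_features}
    for row in rows:  # the data source is consumed exactly once
        for name in numeric_features:
            stats[name].update(row[name])
    return stats


rows = [
    {"age": 20.0, "income": 30_000.0},
    {"age": 30.0, "income": 50_000.0},
    {"age": 40.0, "income": 70_000.0},
]

# Passing an iterator shows that no second pass over the data is needed.
stats = collect_stats(iter(rows), ["age", "income"])
print(stats["age"].mean, stats["income"].mean)  # 30.0 50000.0
```

The same loop works on a stream read from disk in chunks, so memory use stays constant regardless of dataset size.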
We welcome contributions! Please check out our Contributing Guide for guidelines on how to proceed.
Have questions or want to connect with other KDP users? Join us on Discord.
KDP includes tools to help developers:
- Documentation Generation: Automatically generate API docs from docstrings
- Model Diagram Generation: Visualize model architectures with
  make generate_doc_content
  or run:
  python scripts/generate_model_diagrams.py
  This creates diagram images in docs/features/imgs/models/ for all feature types and configurations.
This project is licensed under the MIT License - see the LICENSE file for details.
- Built with TensorFlow and Keras
- Inspired by modern deep learning research
- Community-driven development
- All contributors who help make KDP better
Built with ❤️ for the ML community by 🦄 UnicoLab.ai


