UnicoLab/keras-data-processor

🌟 Keras Data Processor (KDP) - Powerful Data Preprocessing for TensorFlow 🌟

Provided and maintained by 🦄 UnicoLab

Python 3.9+ | TensorFlow 2.18+ | License: MIT | 🦄 UnicoLab | Documentation

Transform your raw data into ML-ready features with just a few lines of code!

KDP is a preprocessing system built on TensorFlow Keras. It handles everything from feature normalization to advanced embedding techniques, with the aim of making your ML pipelines faster, more robust, and easier to maintain. Built by 🦄 UnicoLab, it offers a clean, efficient, and extensible foundation for building sophisticated machine learning models for enterprise AI applications.

✨ Key Features

  • 🚀 Efficient Single-Pass Processing: Process all features in one go, dramatically faster than alternatives
  • 🧠 Distribution-Aware Encoding: Automatically detects and optimally handles different data distributions
  • 👁️ Tabular Attention: Captures complex feature interactions for better model performance
  • 🔍 Feature Selection: Automatically identifies and focuses on the most important features
  • 🔄 Feature-wise Mixture of Experts: Specialized processing for different feature types
  • 📦 Production-Ready: Deploy your preprocessing along with your model as a single unit
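To make the single-pass idea concrete, here is a minimal pure-Python sketch that collects the mean and variance of several numeric features in one sweep over the rows, using Welford's algorithm. It only illustrates the general technique; it is not KDP's implementation, and the function name `single_pass_stats` and the sample rows are hypothetical.

```python
def single_pass_stats(rows, feature_names):
    """Return (mean, variance) dicts for all features after one pass over rows.

    Uses Welford's online update, so no feature needs a second scan.
    Variance is the population variance (divides by the row count).
    """
    count = 0
    mean = {name: 0.0 for name in feature_names}
    m2 = {name: 0.0 for name in feature_names}  # running sum of squared deviations

    for row in rows:
        count += 1
        # Update every feature's running statistics from the same row.
        for name in feature_names:
            delta = row[name] - mean[name]
            mean[name] += delta / count
            m2[name] += delta * (row[name] - mean[name])

    variance = {
        name: (m2[name] / count if count else 0.0) for name in feature_names
    }
    return mean, variance


# Hypothetical sample data, for illustration only.
rows = [
    {"age": 20.0, "income": 30000.0},
    {"age": 40.0, "income": 50000.0},
    {"age": 60.0, "income": 70000.0},
]
mean, variance = single_pass_stats(rows, ["age", "income"])
```

Statistics like these are what a normalization step needs before it can be applied, so gathering them for every feature in one sweep avoids re-reading the dataset once per feature.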

🚀 Quick Installation

# Using pip
pip install kdp

# Using Poetry
poetry add kdp

📋 Simple Example

from kdp import PreprocessingModel, FeatureType

# Define your features
features_specs = {
    "age": FeatureType.FLOAT_NORMALIZED,
    "income": FeatureType.FLOAT_RESCALED,
    "occupation": FeatureType.STRING_CATEGORICAL,
    "description": FeatureType.TEXT
}

# Create and build the preprocessor
preprocessor = PreprocessingModel(
    path_data="data/my_data.csv",
    features_specs=features_specs,
    # Enable advanced features
    use_distribution_aware=True,
    tabular_attention=True
)
result = preprocessor.build_preprocessor()
model = result["model"]

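# Use the preprocessor with your data.
# Note: input_data is not defined in this snippet; it is assumed to be a
# batch of raw feature values keyed by the feature names declared above.
processed_features = model(input_data)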

📚 Comprehensive Documentation

We've built an extensive documentation system to help you get the most from KDP:

Core Guides

Advanced Topics

Integration & Performance

Background & Resources

🖼️ Model Architecture

Your preprocessing pipeline is built as a Keras model that can be used independently or as the first layer of any model.
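As a rough illustration of the "first layer" pattern, the sketch below chains a preprocessing model into a downstream Keras model. A small hand-built model stands in for the one returned by build_preprocessor(); the stand-in, its two inputs, the sample values, and the downstream layers are all assumptions made for illustration and do not reflect KDP's actual graph.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the preprocessing model (hypothetical; in practice this would
# be the "model" entry of the dict returned by build_preprocessor()).
age_in = tf.keras.Input(shape=(1,), name="age")
income_in = tf.keras.Input(shape=(1,), name="income")

age_norm = tf.keras.layers.Normalization(axis=-1)
age_norm.adapt(np.array([[20.0], [40.0], [60.0]], dtype="float32"))
income_scaled = tf.keras.layers.Rescaling(scale=1.0 / 100000.0)(income_in)

features = tf.keras.layers.Concatenate()([age_norm(age_in), income_scaled])
preprocessing_model = tf.keras.Model(
    inputs={"age": age_in, "income": income_in}, outputs=features
)

# Downstream layers consume the preprocessing output directly, so raw
# features go in at one end and predictions come out at the other.
hidden = tf.keras.layers.Dense(8, activation="relu")(preprocessing_model.output)
prediction = tf.keras.layers.Dense(1, activation="sigmoid")(hidden)
full_model = tf.keras.Model(inputs=preprocessing_model.input, outputs=prediction)

# Hypothetical raw batch keyed by feature name.
batch = {
    "age": np.array([[30.0], [50.0]], dtype="float32"),
    "income": np.array([[42000.0], [88000.0]], dtype="float32"),
}
predictions = full_model(batch)
```

Because the preprocessing layers are part of `full_model`, saving or serving that one model carries the preprocessing with it.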

📊 Performance

KDP outperforms alternative preprocessing approaches, especially as data size increases.

🤝 Contributing

We welcome contributions! Please check out our Contributing Guide for guidelines on how to proceed.

💬 Join Our Community

Have questions or want to connect with other KDP users? Join us on Discord.

🛠️ Development Tools

KDP includes tools to help developers:

  • Documentation Generation: Automatically generate API docs from docstrings
  • Model Diagram Generation: Visualize model architectures with make generate_doc_content or run:
    python scripts/generate_model_diagrams.py
    This creates diagram images in docs/features/imgs/models/ for all feature types and configurations.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Built with TensorFlow and Keras
  • Inspired by modern deep learning research
  • Community-driven development
  • All contributors who help make KDP better

Built with ❤️ for the ML community by 🦄 UnicoLab.ai
