With our partners, we are developing the AI Data Platform (KI-Datenplattform).
The AI Data Platform is an open, interoperable infrastructure developed by the KI Allianz Baden-Württemberg.
Its goal is to make high-quality datasets, AI models, and compute resources easily accessible to companies, researchers, and public institutions. The platform provides:
- A metadata-based catalog for datasets and AI models
- A cross-sectoral federation platform for data spaces containing domain-specific data
- Quality assurance tools for data preparation and annotation
- Seamless workflows to cloud and HPC resources for training and experimenting with data
- Built-in legal & ethical compliance for safe data use
Many organizations — especially SMEs — lack structured data, annotation tools, or legal clarity.
The AI Data Platform removes these barriers, helping teams build trustworthy and production-ready AI solutions faster.
Early adopters can:
- Test platform features
- Provide feedback within our User Forum
- Co-shape standards and use cases
- Co-develop the platform with us
- Fork our repositories to run their own data space catalog
What you can do soon:
- Efficiently find high-value, AI-ready datasets across sector domains
- Federate your data catalog within the KI Allianz to offer data and AI models to customers
- Use our catalogs to federate with your internal knowledge management systems
The power of the AI Data Platform lies in its set of modular, specialized components that work together to create a seamless user experience. This section explores the cornerstone releases that form the foundation of this new ecosystem:
- a reimagined frontend for an intuitive portal experience,
- a novel AI-powered search and interaction layer for superior discoverability,
- automated data quality services to ensure trustworthy AI,
- and a suite of data integration tools that connect critical sectoral data spaces to your AI workflows.
We are building the AI Data Platform on top of the field-proven piveau ecosystem developed by Fraunhofer FOKUS, which in turn builds on the widely adopted W3C Data Catalog Vocabulary (DCAT) standard.
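To illustrate the DCAT model that piveau builds on, the following Python sketch assembles a minimal `dcat:Dataset` with one `dcat:Distribution` serialized as JSON-LD. It uses only core DCAT and Dublin Core terms; the dataset IRI, title, keywords, and download URL are hypothetical placeholders, and production DCAT-AP records in piveau carry many more properties (publisher, license, temporal coverage, and so on).

```python
import json

# Namespace IRIs of the W3C DCAT vocabulary and the DCMI (Dublin Core) terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}
IANA_MEDIA_TYPES = "https://www.iana.org/assignments/media-types/"


def make_dataset(iri: str, title: str, description: str, keywords: list[str],
                 download_url: str, media_type: str) -> dict:
    """Return a minimal dcat:Dataset with a single dcat:Distribution (JSON-LD)."""
    return {
        "@context": CONTEXT,
        "@id": iri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": keywords,
        "dcat:distribution": [{
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            # DCAT-AP identifies media types by IANA IRIs, e.g. .../text/csv
            "dcat:mediaType": {"@id": IANA_MEDIA_TYPES + media_type},
        }],
    }


# Hypothetical example record.
record = make_dataset(
    iri="https://example.org/dataset/rail-traffic",
    title="Rail traffic counts",
    description="Hourly passenger counts per station (illustrative example).",
    keywords=["Bahn", "Schienenverkehr", "mobility"],
    download_url="https://example.org/files/rail-traffic.csv",
    media_type="text/csv",
)
print(json.dumps(record, indent=2, ensure_ascii=False))
```

Because every record shares this vocabulary, a catalog can federate with other DCAT-based catalogs without custom per-source mappings.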
🔗 https://github.com/KI-Allianz/piveau.nextjs
At the forefront of the platform is Piveau.nextjs, a new modern web interface that serves as the primary gateway for users. This component reimagines the Piveau data portal experience, focusing on discoverability, accessibility, and performance.
Its key features include:
- Full Feature Parity: The new frontend retains all core functionalities of the original Piveau platform, including robust dataset search, advanced filtering, sorting, and data visualization.
- Enhanced User Experience: It introduces new capabilities designed for modern data workflows. Users can now save and manage User Favorites directly in their browser and navigate to dedicated detail pages for AI models via AI Model Routing.
- Optimized Performance: Built on a modern technology stack featuring Next.js 15, the interface uses server-side rendering (SSR) for fast page loads. As a result, datasets and models are not only quick for users to access but also highly discoverable by external search engines, maximizing their reach and impact.
🔗 https://github.com/KI-Allianz/DCAT-metadata-assistant
We have integrated a DCAT-aware chat agent that answers complex questions about the datasets on the platform using their structured metadata.
🔗 https://github.com/KI-Allianz/semantic-hub-search
A significant upgrade to the platform's discovery engine comes with the Semantic-Hub-Search component. This innovation enhances the existing lexical (keyword-based) search with a powerful semantic function, creating a state-of-the-art hybrid system. Instead of merely matching keywords, the search now understands the conceptual meaning behind a user's query.
The primary benefits of this semantic approach are:
- Conceptual Understanding: The engine recognizes thematic relationships between terms. For example, a search for "Bahn" (train) will also return relevant results for "Schienenverkehr" (rail transport), significantly improving the breadth of discovery.
- Improved Accuracy: By understanding context and intent, the search delivers more relevant and precise results, reducing the time needed to find the right data.
- Cross-Lingual Capabilities: The underlying technology supports searches across different languages, breaking down barriers for a more inclusive data ecosystem.
This advanced functionality is powered by word embeddings generated via a configurable Ollama-based service. Search requests trigger a vector search in Elasticsearch, and the results are then intelligently combined with a classical lexical full-text search using a "rescore-query" to deliver the best of both worlds.
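The two-stage retrieval described above can be sketched as an Elasticsearch search body: a `script_score` query ranks documents by cosine similarity to the query embedding, and a `rescore` block then re-scores the top hits with a lexical `multi_match` query. This is a sketch, not the Semantic-Hub-Search implementation. The index field names (`embedding`, `title`, `description`), the weights, and the Ollama model name are hypothetical. The embedding helper assumes Ollama's `/api/embed` endpoint; older Ollama versions expose `/api/embeddings` instead.

```python
import json
import urllib.request


def embed_with_ollama(text: str, model: str = "nomic-embed-text",
                      base_url: str = "http://localhost:11434") -> list[float]:
    """Request an embedding from an Ollama server (assumes the /api/embed endpoint)."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    request = urllib.request.Request(f"{base_url}/api/embed", data=payload,
                                     headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embeddings"][0]


def build_hybrid_query(query_text: str, query_vector: list[float],
                       vector_field: str = "embedding",
                       text_fields: tuple[str, ...] = ("title", "description"),
                       window_size: int = 100,
                       query_weight: float = 1.0,
                       rescore_query_weight: float = 0.5) -> dict:
    """Build an Elasticsearch search body: semantic retrieval, then lexical rescoring."""
    return {
        # Stage 1: rank documents by cosine similarity to the query embedding.
        # Adding 1.0 keeps scores non-negative, which script_score requires.
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": f"cosineSimilarity(params.query_vector, '{vector_field}') + 1.0",
                    "params": {"query_vector": query_vector},
                },
            }
        },
        # Stage 2: re-score the top `window_size` hits with a classic full-text query.
        # With the default score_mode ("total"), both scores are combined as a weighted sum.
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "multi_match": {"query": query_text, "fields": list(text_fields)}
                },
                "query_weight": query_weight,
                "rescore_query_weight": rescore_query_weight,
            },
        },
    }


# Usage against a running stack (not executed here):
#   body = build_hybrid_query("Bahn", embed_with_ollama("Bahn"))
#   then send `body` as JSON to POST /<index>/_search
```

Shifting the two weights trades conceptual matches (such as "Bahn" and "Schienenverkehr") against exact keyword hits. Only documents inside the rescore window are affected.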
🔗 https://github.com/KI-Allianz/DataQualityServices
The principle that data quality determines AI success is a core tenet of the platform. A suite of automated services has been released to turn raw data into quality-assured assets, laying the foundation for trustworthy and reliable AI applications. These tools are accessible through a user-friendly, wizard-based interface that supports CSV/XLSX uploads and can publish cleaned results directly back to the Piveau Hub.
The available services are detailed below:
Data Quality AI Service: This multi-step pipeline transforms raw data into a clean, structured, and compliant format. It automates critical tasks including:
- Automated Feature Type Inference: Correctly classifies data columns.
- Personal Data Detection: Flags sensitive information for privacy compliance.
- Automated Data Imputation: Intelligently handles missing values.
- Anomaly Detection: Identifies unusual patterns that may indicate errors.
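The four pipeline steps can be illustrated with a deliberately simple sketch that uses only the standard library. It is not the Data Quality AI Service's implementation, which may rely on learned models. Here, type inference only distinguishes numeric from categorical columns, personal-data detection is reduced to an e-mail regex, imputation uses the median or mode, and anomaly detection applies Tukey's 1.5 * IQR rule. The example table is made up.

```python
import re
import statistics

# Deliberately simple e-mail pattern standing in for a real personal-data detector.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$")


def is_missing(value) -> bool:
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def infer_type(values: list) -> str:
    """Classify a column as 'numeric', 'categorical', or 'empty'."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return "empty"
    try:
        for v in present:
            float(v)
    except (TypeError, ValueError):
        return "categorical"
    return "numeric"


def contains_personal_data(values: list) -> bool:
    """Flag a column if any value looks like an e-mail address."""
    return any(isinstance(v, str) and EMAIL_RE.match(v.strip()) for v in values)


def impute(values: list, col_type: str) -> list:
    """Fill gaps with the median (numeric columns) or the most frequent value."""
    present = [v for v in values if not is_missing(v)]
    if not present:
        return list(values)
    if col_type == "numeric":
        fill = statistics.median([float(v) for v in present])
        return [fill if is_missing(v) else float(v) for v in values]
    fill = statistics.mode(present)
    return [fill if is_missing(v) else v for v in values]


def find_anomalies(values: list) -> list[int]:
    """Return indices of values outside Tukey's fences (1.5 * IQR beyond Q1/Q3)."""
    points = [(i, float(v)) for i, v in enumerate(values) if not is_missing(v)]
    if len(points) < 4:
        return []  # too few values for meaningful quartiles
    q1, _, q3 = statistics.quantiles([x for _, x in points], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [i for i, x in points if x < low or x > high]


def profile(table: dict) -> dict:
    """Run all four checks on each column of a column-oriented table."""
    report = {}
    for name, values in table.items():
        col_type = infer_type(values)
        report[name] = {
            "type": col_type,
            "personal_data": contains_personal_data(values),
            "imputed": impute(values, col_type),
            "anomalies": find_anomalies(values) if col_type == "numeric" else [],
        }
    return report


table = {
    "age": ["34", "29", "", "41", "38", "300"],
    "contact": ["anna@example.org", "ben@example.org", "cara@example.org",
                "dan@example.org", "eve@example.org", "finn@example.org"],
    "city": ["Stuttgart", "Karlsruhe", "Stuttgart", None, "Ulm", "Stuttgart"],
}
report = profile(table)
for column, result in report.items():
    print(column, result["type"], result["personal_data"], result["anomalies"])
```

In this toy table, the missing age is filled with the median (38), the value 300 falls outside the IQR fences, and the `contact` column is flagged as personal data.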
A key strategic goal of the AI Data Platform is to enable cross-sector innovation by bridging industry-specific data spaces. A collection of "harvesters" and tools has been developed to connect the platform with critical sectors, making siloed data findable and accessible in line with major European data initiatives.
Key integration tools in this release include:
🔗 https://github.com/KI-Allianz/piveau-excel-importer
A low-barrier tool for importing AI assets into our data catalogs directly from a Microsoft Excel template document.
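The importer's template layout is not described here, so the following is only a sketch of the general idea. It reads a header row plus one row per asset and turns each row into a DCAT JSON-LD record, rejecting rows that lack required cells. The column names (`Title`, `Description`, `Keywords`, `Download URL`) are hypothetical. Reading the workbook assumes the third-party `openpyxl` package, which is imported lazily so the mapping step runs without it.

```python
# Hypothetical template columns; the real importer's template may differ.
REQUIRED_COLUMNS = ("Title", "Description", "Download URL")


def read_template(path: str) -> list[dict]:
    """Read the active worksheet: header row first, then one asset per row."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    sheet = load_workbook(path, read_only=True).active
    rows = sheet.iter_rows(values_only=True)
    header_row = next(rows, None)
    if header_row is None:
        return []  # empty worksheet
    header = [str(cell).strip() if cell is not None else "" for cell in header_row]
    return [dict(zip(header, row)) for row in rows
            if any(cell is not None for cell in row)]  # skip blank rows


def row_to_dcat(row: dict) -> dict:
    """Map one template row to a minimal DCAT dataset; reject rows missing required cells."""
    missing = [c for c in REQUIRED_COLUMNS if not row.get(c)]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Keywords are entered as a comma-separated list in a single cell.
    keywords = [k.strip() for k in str(row.get("Keywords") or "").split(",") if k.strip()]
    return {
        "@context": {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"},
        "@type": "dcat:Dataset",
        "dct:title": str(row["Title"]).strip(),
        "dct:description": str(row["Description"]).strip(),
        "dcat:keyword": keywords,
        "dcat:distribution": [{
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": str(row["Download URL"]).strip()},
        }],
    }


# A row as read_template() would return it (illustrative values):
example_row = {"Title": "Weld seam images", "Description": "Annotated photos of weld seams.",
               "Keywords": "welding, quality, images", "Download URL": "https://example.org/welds.zip"}
dataset = row_to_dcat(example_row)
```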
🔗 https://github.com/KI-Allianz/hf2dcat
Import existing AI-ready datasets and AI models directly from Hugging Face, today's de facto source for such assets, and convert their metadata into the standardized DCAT format for better interoperability with data management workflows and other data spaces.
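As a rough illustration of such a conversion, the sketch below maps a few fields of Hugging Face Hub model metadata (as returned by the public `https://huggingface.co/api/models/<repo_id>` endpoint) onto DCAT terms. The field mapping is a hypothetical choice, not hf2dcat's actual mapping. Cataloguing models as `dcat:Dataset` and keeping the license as a short name are also simplifications.

```python
import json
import urllib.request

HF_API = "https://huggingface.co/api/models/"


def fetch_model_info(repo_id: str) -> dict:
    """Fetch public model metadata from the Hugging Face Hub REST API."""
    with urllib.request.urlopen(HF_API + repo_id) as response:
        return json.loads(response.read())


def hf_model_to_dcat(info: dict) -> dict:
    """Map a subset of Hub model metadata onto DCAT terms (hypothetical mapping)."""
    repo_id = info["id"]
    card = info.get("cardData") or {}
    page = f"https://huggingface.co/{repo_id}"
    dataset = {
        "@context": {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"},
        "@id": page,
        "@type": "dcat:Dataset",  # assumption: models are catalogued as dcat:Dataset
        "dct:identifier": repo_id,
        "dct:title": repo_id,
        "dcat:landingPage": {"@id": page},
        # Hub tags (task, library, language, ...) become free-text keywords.
        "dcat:keyword": sorted(set(info.get("tags", []))),
    }
    if info.get("lastModified"):
        dataset["dct:modified"] = info["lastModified"]
    if card.get("license"):
        # A real converter would map this short name to a license IRI.
        dataset["dct:license"] = card["license"]
    return dataset


# Offline example with made-up metadata in the shape the Hub API returns:
example_info = {"id": "example-org/example-model",
                "tags": ["text-classification", "de", "pytorch"],
                "lastModified": "2024-05-01T12:00:00.000Z",
                "cardData": {"license": "apache-2.0"}}
record = hf_model_to_dcat(example_info)
```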
🔗 https://github.com/KI-Allianz/aas2dcat
This PyQt5-based desktop application actively bridges the world of manufacturing with the platform. It maps Asset Administration Shell (AAS) models from FA³ST or BaSyx repositories into DCAT-AP datasets, enabling Industry 4.0 data to be published and discovered in alignment with Manufacturing-X concepts.
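The core of such a mapping can be sketched independently of the GUI. The function below turns one AAS shell in the V3 JSON serialization into a DCAT dataset. Multilingual descriptions become language-tagged literals, and each referenced submodel becomes a distribution. This is not aas2dcat's code. The repository base URL is hypothetical, and the `/submodels/{id}` path with base64url-encoded, unpadded identifiers is an assumption about the target repository's API.

```python
import base64

DCAT_CONTEXT = {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"}


def b64url(identifier: str) -> str:
    """Encode an AAS identifier for an API path (assumed: base64url without padding)."""
    return base64.urlsafe_b64encode(identifier.encode("utf-8")).decode("ascii").rstrip("=")


def shell_to_dcat(shell: dict, repository_url: str) -> dict:
    """Map an AAS (V3 JSON) shell to a DCAT dataset with one distribution per submodel."""
    # Multilingual AAS descriptions become language-tagged JSON-LD literals.
    descriptions = [{"@value": d["text"], "@language": d["language"]}
                    for d in shell.get("description", [])]
    distributions = []
    for ref in shell.get("submodels", []):
        submodel_id = ref["keys"][0]["value"]
        distributions.append({
            "@type": "dcat:Distribution",
            # Assumed Submodel Repository endpoint of an FA³ST/BaSyx-style server.
            "dcat:accessURL": {"@id": f"{repository_url}/submodels/{b64url(submodel_id)}"},
        })
    return {
        "@context": DCAT_CONTEXT,
        "@id": shell["id"],
        "@type": "dcat:Dataset",
        "dct:identifier": shell["id"],
        "dct:title": shell.get("idShort", shell["id"]),
        "dct:description": descriptions,
        "dcat:distribution": distributions,
    }


# Illustrative shell with made-up identifiers.
shell = {
    "id": "https://example.org/aas/press-01",
    "idShort": "HydraulicPress01",
    "description": [{"language": "en", "text": "Hydraulic press in line 3"},
                    {"language": "de", "text": "Hydraulikpresse in Linie 3"}],
    "submodels": [{"type": "ModelReference",
                   "keys": [{"type": "Submodel",
                             "value": "https://example.org/sm/press-01/operational-data"}]}],
}
record = shell_to_dcat(shell, "https://aas.example.org/api/v3.0")
```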
🔗 https://github.com/KI-Allianz/mds2hdcat
A powerful command-line tool designed to prepare health data for broader use. It converts metadata records from the NFDI4Health standard into the Health DCAT-AP standard, making them ready for inclusion in catalogues for the European Health Data Space (EHDS).
🔗 https://github.com/KI-Allianz/FROST-Server
An essential extension for IoT applications that promotes urban resilience. It enables SensorThings API data streams to be exported as a DCAT-AP compliant catalogue, making real-time sensor data from Smart City and industrial domains findable and reusable.
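To show what exporting a SensorThings data stream as a catalog entry involves, here is a sketch that maps one `Datastream` entity (as returned by a SensorThings API server such as FROST) to a DCAT dataset. The live `Observations` collection serves as the distribution, and `phenomenonTime` provides the temporal coverage. This is not the extension's implementation. The property mapping and the example entity are hypothetical.

```python
DCAT_CONTEXT = {"dcat": "http://www.w3.org/ns/dcat#", "dct": "http://purl.org/dc/terms/"}
JSON_MEDIA_TYPE = "https://www.iana.org/assignments/media-types/application/json"


def datastream_to_dcat(datastream: dict) -> dict:
    """Map one SensorThings Datastream entity to a DCAT dataset (hypothetical mapping)."""
    self_link = datastream["@iot.selfLink"]
    unit = datastream.get("unitOfMeasurement") or {}
    dataset = {
        "@context": DCAT_CONTEXT,
        "@id": self_link,
        "@type": "dcat:Dataset",
        "dct:title": datastream["name"],
        "dct:description": datastream.get("description", ""),
        "dcat:keyword": [k for k in (unit.get("name"), unit.get("symbol")) if k],
        "dcat:distribution": [{
            "@type": "dcat:Distribution",
            # Observations are served live by the SensorThings API itself.
            "dcat:accessURL": {"@id": f"{self_link}/Observations"},
            "dcat:mediaType": {"@id": JSON_MEDIA_TYPE},
        }],
    }
    # phenomenonTime is an ISO 8601 interval "start/end"; it is absent before the first observation.
    phenomenon_time = datastream.get("phenomenonTime") or ""
    if "/" in phenomenon_time:
        start, end = phenomenon_time.split("/", 1)
        dataset["dct:temporal"] = {"@type": "dct:PeriodOfTime",
                                   "dcat:startDate": start, "dcat:endDate": end}
    return dataset


# Illustrative entity from a hypothetical server.
datastream = {
    "@iot.id": 7,
    "@iot.selfLink": "https://frost.example.org/FROST-Server/v1.1/Datastreams(7)",
    "name": "Air temperature, Marktplatz",
    "description": "Temperature sensor at 3 m height",
    "unitOfMeasurement": {"name": "degree Celsius", "symbol": "°C"},
    "phenomenonTime": "2024-01-01T00:00:00Z/2024-06-30T23:50:00Z",
}
record = datastream_to_dcat(datastream)
```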
While these core components provide a powerful, ready-to-use platform, their true potential is unlocked by the developer community. To that end, we've equipped developers with the following tools to build the next generation of AI applications.
🔗 https://github.com/KI-Allianz/python-dcat-ap-hub
For developers, data scientists, and engineers who wish to programmatically interact with the platform's assets, the python-dcat-ap-hub library offers a streamlined and efficient solution. This Python library is designed for easy downloading and loading of datasets and models directly from the platform's catalog.
Its primary functions include:
- Loading Datasets: The library can download and load any dataset whose metadata is provided in the DCAT-AP (JSON-LD) format, simplifying the process of integrating data into custom scripts and applications.
- Loading Hugging Face Models: It includes a specialized function to directly load Hugging Face models that are referenced in the platform's catalog, providing seamless access to pre-trained models for AI development.
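The library's own function names are not documented here, so rather than guess its API, the following sketch shows what the two tasks involve under the hood. It collects download URLs from a DCAT-AP JSON-LD dataset, parses a CSV distribution, and resolves a Hugging Face model URL to a repository ID for `transformers`. All function names are hypothetical. The sketch also assumes a single compacted dataset node (no `@graph`), embedded distributions, and CSV files, and it requires the third-party `transformers` package only for model loading.

```python
import csv
import io
import urllib.request
from typing import Optional
from urllib.parse import urlparse

DCAT = "http://www.w3.org/ns/dcat#"


def _prop(node: dict, term: str) -> list:
    """Read a DCAT property written either as 'dcat:term' or as a full-IRI key."""
    value = node.get(f"dcat:{term}", node.get(DCAT + term, []))
    return value if isinstance(value, list) else [value]


def download_urls(dataset: dict) -> list[str]:
    """Collect downloadURL (falling back to accessURL) from embedded distributions."""
    urls = []
    for dist in _prop(dataset, "distribution"):
        if not isinstance(dist, dict):
            continue
        for link in _prop(dist, "downloadURL") or _prop(dist, "accessURL"):
            urls.append(link["@id"] if isinstance(link, dict) else link)
    return urls


def load_csv(url: str) -> list[dict]:
    """Download a CSV distribution and parse it into a list of row dicts."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))


def hf_repo_id(url: str) -> Optional[str]:
    """Return the repo ID for a huggingface.co model URL, or None for other URLs."""
    parsed = urlparse(url)
    if parsed.netloc != "huggingface.co":
        return None
    parts = [p for p in parsed.path.split("/") if p]
    if not parts or parts[0] in ("datasets", "spaces"):
        return None
    return "/".join(parts[:2])


def load_hf_model(repo_id: str):
    """Load a referenced model with the transformers library (third-party)."""
    from transformers import AutoModel
    return AutoModel.from_pretrained(repo_id)


# Offline example: a dataset node with one download URL and one access URL.
example = {
    "@type": "dcat:Dataset",
    "dcat:distribution": [
        {"@type": "dcat:Distribution", "dcat:downloadURL": {"@id": "https://example.org/data.csv"}},
        {"@type": "dcat:Distribution", DCAT + "accessURL": "https://example.org/api"},
    ],
}
urls = download_urls(example)
```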