In the modern enterprise, data is not merely a resource; it is the living nervous system of the organization. Yet, most data platforms remain fragmented, with isolated lakes, incompatible pipelines, and governance gaps that silently drain value. The Unified Data Intelligence Platform with Fabric & Open-Source Extensions reimagines this landscape as an interconnected, self-healing architecture that adapts to your workflows rather than forcing your workflows into rigid structures.
This solution accelerator provides a cohesive foundation built on Microsoft Fabric, while offering seamless integration points for Azure Databricks for advanced analytics and Microsoft Purview for unified governance. Unlike traditional siloed approaches, this repository implements a polyglot data mesh: a design where each domain owns its data product, yet all domains speak the same governance language.
The core innovation lies in its dual-lane ingestion pattern: real-time streaming data flows through Fabric's Eventhouse while batch historical data is orchestrated via Databricks pipelines, with Purview providing cross-platform lineage tracking. This eliminates the common trade-off between freshness and completeness.
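One way to picture the dual-lane pattern is as a merge of two record sets on a shared key, where the most recent version of each record wins. The following is a minimal plain-Python sketch of that idea only. The `Record` type, the `merge_lanes` function, and all field names are hypothetical and are not part of any Fabric, Databricks, or Purview API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    key: str         # business identifier (hypothetical)
    event_time: int  # epoch seconds at which the event occurred
    value: float
    lane: str        # "batch" or "stream"


def merge_lanes(batch: list[Record], stream: list[Record]) -> dict[str, Record]:
    """Combine both lanes, keeping the most recent record per key.

    The batch lane contributes completeness (history) and the streaming
    lane contributes freshness. On equal timestamps the stream record wins.
    """
    merged: dict[str, Record] = {}
    # Batch is processed first so that ">=" lets stream records win ties.
    for record in [*batch, *stream]:
        current = merged.get(record.key)
        if current is None or record.event_time >= current.event_time:
            merged[record.key] = record
    return merged


# Usage with made-up data: order-2 exists in both lanes.
batch = [
    Record("order-1", 100, 10.0, "batch"),
    Record("order-2", 100, 20.0, "batch"),
]
stream = [
    Record("order-2", 160, 25.0, "stream"),
    Record("order-3", 170, 5.0, "stream"),
]
view = merge_lanes(batch, stream)
```

In this toy view, `order-1` comes only from the batch lane, `order-3` only from the stream, and `order-2` resolves to the fresher streaming record.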
Every organization we have observed faces the same three paradoxes:
- The Speed Paradox: The faster data arrives, the messier it becomes.
- The Scale Paradox: The more data you store, the harder it is to find what matters.
- The Trust Paradox: The more governance you enforce, the slower innovation becomes.
This repository resolves all three simultaneously. By leveraging Fabric's OneLake as the single copy of truth, Databricks for compute-optimized transformations, and Purview for automated classification, we create a virtuous cycle: fresher data improves models, models improve governance, and governance builds trust.
This is not just another data integration template. It is a living ecosystem with the following capabilities:
- Real-time ingestion via Fabric Eventstream with sub-second latency
- Batch ingestion scheduled through Databricks orchestration (not cron-based, but event-driven)
- Schema-on-read flexibility that adapts to changing source formats without pipeline rewrites
- Automated classification using Purview's machine learning classifiers (no manual tagging)
- Cross-platform lineage visible as a directed acyclic graph spanning Fabric, Databricks, and external sources
- Policy-as-Code enforcement where data access rules travel with the data rather than being configured separately
- Support for T-SQL, PySpark, KQL, and DAX within the same virtual warehouse
- Language-agnostic semantic model that translates between dialects automatically
- Natural language querying (NLQ) via integrated AI copilot for business users
- Adaptive UI that renders on mobile, tablet, or 60-inch command center screens
- Real-time lineage visualization that zooms from macro architecture to individual column transformations
- Anomaly detection using statistical process control (not simple threshold alerts)
- Automatic retry with exponential backoff for transient failures
- Data quality checkpoints that pause the pipeline and generate corrective recommendations
- Versioned data products that allow rollback to any historical state without data duplication
- Built-in remediation chatbot that contextualizes errors with the specific data product lineage
- SLA monitoring that predicts potential breaches 30 minutes before they occur
- Runbook automation for common failure scenarios (e.g., schema drift, throttling, credential expiration)
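The "automatic retry with exponential backoff" capability listed above can be sketched as follows. This is an illustrative stand-alone helper, not the repository's implementation: `TransientError`, `retry_with_backoff`, and the default delay values are assumptions chosen for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. throttling)."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on TransientError with exponential backoff."""
    attempt = 1
    while True:
        try:
            return operation()
        except TransientError:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the failure to the caller
            # The upper bound doubles each attempt (base, 2*base, 4*base, ...), capped.
            upper = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter spreads out retries from concurrent callers.
            sleep(random.uniform(0, upper))
        attempt += 1


# Usage: an operation that fails twice, then succeeds.
calls = 0
delays: list[float] = []


def flaky_load() -> str:
    global calls
    calls += 1
    if calls < 3:
        raise TransientError("throttled")
    return "ok"


# Record the delays instead of actually sleeping, to keep the example fast.
result = retry_with_backoff(flaky_load, sleep=delays.append)
```

Only errors explicitly marked as transient are retried here; anything else propagates immediately, which avoids repeating an operation that can never succeed.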
Layers, top to bottom:
- Consumption Layer: Power BI Reports, Python Notebooks, REST API, Kafka Streams, External Apps
- Semantic & Governance
  - Microsoft Purview: automated classification, lineage tracking, policy enforcement
  - Data Mesh Domain Boundaries: product ownership per domain, contract-based data sharing, global catalog + local schemas
- Compute & Processing
  - Fabric: Lakehouse, Data Factory, Eventhouse
  - Azure: Spark pools, Serverless SQL, Streaming jobs
  - Databricks: Delta Lake, MLflow models, Feature store
- Unified Storage (OneLake): Bronze (Raw), Silver (Cleansed), Gold (Curated), External (Shared), Archive (Cold)
This architecture follows a medallion pattern with a twist: the bronze layer is not immutable; it can be rolled back to any point-in-time snapshot. The silver layer applies column-level lineage using Databricks' Unity Catalog combined with Purview's automated scanning. The gold layer is where data products are curated with versioned contracts.
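To make the point-in-time rollback idea concrete, here is a toy in-memory model of a versioned table built on an append-only commit log. It is loosely analogous to how transaction-log table formats such as Delta Lake expose earlier versions, but it is only a sketch: the `VersionedTable` class and its methods are hypothetical and do not mirror any real API.

```python
from typing import Optional


class VersionedTable:
    """Toy append-only commit log. Each commit stores only what changed."""

    def __init__(self) -> None:
        # One dict per commit: key -> new row, or None as a delete marker.
        self._commits: list[dict[str, Optional[dict]]] = []

    def commit(self, upserts: Optional[dict] = None, deletes=()) -> int:
        change: dict[str, Optional[dict]] = dict(upserts or {})
        for key in deletes:
            change[key] = None
        self._commits.append(change)
        return len(self._commits)  # version numbers start at 1

    def snapshot(self, version: Optional[int] = None) -> dict:
        """Rebuild the table state as of `version` by replaying commits."""
        if version is None:
            version = len(self._commits)
        if not 0 <= version <= len(self._commits):
            raise ValueError(f"unknown version {version}")
        state: dict = {}
        for change in self._commits[:version]:
            for key, row in change.items():
                if row is None:
                    state.pop(key, None)
                else:
                    state[key] = row
        return state

    def rollback(self, version: int) -> int:
        """Restore an earlier state by appending a new commit.

        History is never rewritten, so later versions stay queryable.
        """
        target = self.snapshot(version)
        current = self.snapshot()
        upserts = {k: v for k, v in target.items() if current.get(k) != v}
        deletes = [k for k in current if k not in target]
        return self.commit(upserts, deletes)


# Usage with made-up rows.
bronze = VersionedTable()
v1 = bronze.commit({"a": {"qty": 1}})
v2 = bronze.commit({"a": {"qty": 2}, "b": {"qty": 5}})
v3 = bronze.rollback(v1)  # the state of v1 becomes the new latest version
```

Because a rollback is itself a commit, the table can be restored to `v2` again later; each commit stores only the rows that differ, not a full copy.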
| Capability | Traditional Lakehouse | This Solution |
|---|---|---|
| Real-time ingestion | Manual setup needed | Built-in Eventstream |
| Cross-platform governance | Separate tools | Purview-integrated lineage |
| Query language support | Single dialect | T-SQL + PySpark + KQL + NLQ |
| Failure recovery | Manual retry | Self-healing with remediation |
| Mobile observability | None | Responsive adaptive dashboard |
| Multilingual NLP | None | AI copilot in 12 languages |
| Data product versioning | Not supported | Immutable time-travel snapshots |
| 24/7 proactive monitoring | Reactive alerts | Predictive SLA breach detection |
Before exploring this repository, make sure your environment has:
- An Azure subscription with Fabric capacity (F2 or higher recommended)
- Access to Microsoft Purview with data map configuration permissions
- A Databricks workspace (Premium tier for Unity Catalog)
- Power BI Desktop or Service for visualization consumption
- Python 3.10+ environment for local testing (optional but helpful)
├── fabric/                  # Fabric-specific artifacts
│   ├── lakehouse/           # OneLake table schemas and shortcuts
│   ├── eventstream/         # Real-time ingestion pipelines
│   ├── datafactory/         # Orchestration and transformation
│   └── semanticmodel/       # Power BI dataset and measures
├── databricks/              # Databricks integration layer
│   ├── notebooks/           # PySpark transformation notebooks
│   ├── unitycatalog/        # Schema, table, and model definitions
│   └── workflows/           # Orchestrated multi-task jobs
├── purview/                 # Governance and compliance
│   ├── classification/      # Custom classification rules
│   ├── lineage/             # Cross-system lineage mapping
│   └── policies/            # Data access policies (Policy-as-Code)
├── observability/           # Monitoring and diagnostics
│   ├── dashboards/          # Power BI and KQL dashboard templates
│   ├── alerts/              # Anomaly detection and remediation
│   └── support/             # Chatbot configuration and runbooks
├── shared/                  # Cross-cutting concerns
│   ├── schemas/             # Avro, Parquet, and Delta schema files
│   ├── config/              # Environment-specific parameters
│   └── utils/               # Shared Python libraries
├── docs/                    # Documentation and references
│   ├── architecture.md      # Detailed architecture documentation
│   ├── governance.md        # Data governance implementation guide
│   └── performance.md       # Tuning and optimization guidelines
├── tests/                   # Validation and quality checks
│   ├── unit/                # Unit tests for transformations
│   ├── integration/         # Cross-system integration tests
│   └── performance/         # Load and stress test scenarios
└── CONTRIBUTING.md          # Contribution guidelines
The fabric/ directory contains everything needed to instantiate a Fabric environment that aligns with medallion architecture best practices. The Eventhouse configuration captures real-time streams from Azure Event Hubs, while Data Factory pipelines handle scheduled batch loads. Both are configured to log lineage events into Purview automatically.
Within databricks/, you will find pre-built notebook templates for complex transformations that benefit from GPU acceleration or MLflow integration. The Unity Catalog definitions are intended to make every Databricks-created table visible to Purview scans, closing the governance loop.
The purview/ section includes custom classification rules for industry-specific data types (e.g., HIPAA fields, GDPR personal identifiers, PCI card data). The policy engine uses Azure Policy definitions that dynamically enforce access controls based on data sensitivity scores.
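As a rough illustration of sensitivity-based access decisions expressed as code, the sketch below evaluates a role against a classification label. Every name and number in it is an assumption made for the example: the labels, scores, roles, and the masking rule are hypothetical and are not taken from Purview or Azure Policy.

```python
from dataclasses import dataclass

# Hypothetical sensitivity score per classification label (higher = more sensitive).
SENSITIVITY = {
    "public": 0,
    "internal": 1,
    "gdpr_personal": 2,
    "pci_card": 3,
    "hipaa_health": 3,
}

# Hypothetical clearance per role: the highest score readable unmasked.
ROLE_CLEARANCE = {"analyst": 1, "data_engineer": 2, "compliance_officer": 3}


@dataclass(frozen=True)
class Decision:
    allowed: bool
    masked: bool
    reason: str


def evaluate_access(role: str, classification: str) -> Decision:
    """Decide access from the data's sensitivity score and the role's clearance."""
    # Fail closed: unknown roles get nothing.
    if role not in ROLE_CLEARANCE:
        return Decision(False, False, f"unknown role '{role}'")
    # Fail closed: unknown labels are treated as maximally sensitive.
    score = SENSITIVITY.get(classification, max(SENSITIVITY.values()))
    clearance = ROLE_CLEARANCE[role]
    if score <= clearance:
        return Decision(True, False, "within clearance")
    if score == clearance + 1:
        return Decision(True, True, "one level above clearance: masked read")
    return Decision(False, False, "sensitivity exceeds clearance")


# Usage
analyst_gdpr = evaluate_access("analyst", "gdpr_personal")
analyst_pci = evaluate_access("analyst", "pci_card")
```

Keeping the rules as plain data (`SENSITIVITY`, `ROLE_CLEARANCE`) means they can be versioned and reviewed alongside the data product definitions.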
24/7 operational support is built into the observability layer. The chatbot (located in observability/support/) uses event-driven triggers to diagnose pipeline failures in natural language, providing step-by-step remediation. The anomaly detection module uses moving averages and z-scores to identify unusual patterns before they become incidents.
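The moving-average and z-score approach mentioned above can be sketched with the standard library alone. This is a simplified illustration, not the module shipped in observability/alerts/: the function name `detect_anomalies`, the window size, and the threshold are assumptions.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indices of points far from the trailing moving average.

    A point is flagged when its z-score, computed against the mean and
    sample standard deviation of the previous `window` points, exceeds
    `threshold` in absolute value.
    """
    history = deque(maxlen=window)  # trailing window of recent values
    anomalies = []
    for index, value in enumerate(values):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            # A flat window has sigma == 0, so no z-score can be computed.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(index)
        # Simplification: every point, flagged or not, joins the baseline.
        history.append(value)
    return anomalies


# Usage with made-up "rows ingested per minute" readings.
rows_per_minute = [100, 102, 98, 101, 99, 100, 250, 101, 99]
flagged = detect_anomalies(rows_per_minute)
```

With this sample, only the spike at index 6 is flagged. A production variant would likely exclude flagged points from the baseline so that one outlier does not widen the band for the points that follow.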
The adaptive dashboard (built with Power BI's responsive layout capabilities) automatically reorganizes visualizations based on screen size. On mobile devices, it presents a focused view of pipeline health and recent anomalies; on desktop, it expands to show full lineage graphs and model performance metrics.
The platform's intelligent query copilot supports natural language inputs in the following languages:
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Japanese (ja)
- Chinese (zh-CN)
- Arabic (ar)
- Portuguese (pt-BR)
- Korean (ko)
- Italian (it)
- Dutch (nl)
- Polish (pl)
Language detection occurs automatically based on the query text, with no explicit configuration needed. The copilot returns results in the same language, including translated column names and measure definitions where possible.
- Start with `fabric/lakehouse/` to understand the medallion table structures.
- Review `databricks/notebooks/` for transformation patterns.
- Configure `purview/classification/` for your domain-specific data.

- Explore `fabric/semanticmodel/` for pre-built measures and dimensions.
- Use the NLQ copilot to query data without writing SQL.
- Monitor dashboard health in `observability/dashboards/`.

- Review policies in `purview/policies/` to understand access controls.
- Check lineage reports generated by Purview from the `purview/lineage/` module.
- Define new classification rules in `purview/classification/`.

- Configure alerts in `observability/alerts/` for proactive monitoring.
- Test the remediation chatbot in `observability/support/`.
- Review SLA compliance reports from the observability dashboards.
This project is licensed under the MIT License, a permissive license that allows reuse with minimal restrictions. See the full license text for details.
We welcome contributions that enhance the platform's extensibility, performance, or governance capabilities. Before contributing, please review:
- The contribution guidelines in `CONTRIBUTING.md`
- The architecture decisions in `docs/architecture.md`
- The governance policies to ensure compliance with data handling standards
This repository provides reference architectures and implementation patterns for building unified data platforms with Microsoft Fabric, Azure Databricks, and Microsoft Purview. It is provided "as is" without warranty of any kind, express or implied.
Users are responsible for:
- Ensuring compliance with their organizational data governance policies
- Conducting appropriate security reviews before production deployment
- Validating that the architecture meets their specific scalability and reliability requirements
- Maintaining current versions of all dependencies and connectors
Note on Support: While this repository includes patterns for building 24/7 support automation, the repository maintainers do not provide around-the-clock operational support for custom deployments. Production support should be arranged through standard Azure support channels.
| Version | Date | Changes |
|---|---|---|
| 2.0 | April 2026 | Added NLQ copilot, responsive UI, remediation chatbot, self-healing pipelines |
| 1.5 | Dec 2025 | Expanded Purview integration, added multilingual support |
| 1.0 | June 2025 | Initial release with Fabric + Databricks + Purview foundation |
Data platforms are not built to last; they are built to evolve. This repository provides the evolutionary scaffolding that allows your data architecture to adapt faster than your competition can analyze. By unifying Fabric's managed experience with Databricks' computational power and Purview's governance sophistication, you create a platform that is greater than the sum of its parts.
The future of enterprise data is not about choosing between platforms; it is about orchestrating them into a coherent whole that feels like a single, intelligent system. This repository is your starting point for that journey.