# 🦆 DuckGuard for Microsoft Fabric

Validate your Fabric Lakehouse and Warehouse data in 3 lines of Python.

**Works with:** OneLake (Parquet/Delta), SQL endpoints, Fabric notebooks.

[![GitHub](https://img.shields.io/github/stars/XDataHubAI/duckguard?style=social)](https://github.com/XDataHubAI/duckguard)
[![PyPI](https://img.shields.io/pypi/v/duckguard.svg)](https://pypi.org/project/duckguard/)

## 1. Install

In [None]:
# In a Fabric notebook:
%pip install duckguard[fabric] -q

## 2. Connect to Your Lakehouse

### Option A: OneLake (direct file access)

Access Parquet and Delta tables in your Lakehouse via OneLake.

In [None]:
from duckguard import connect

# OneLake path: Lakehouse tables
orders = connect(
    "fabric://my-workspace/my-lakehouse/Tables/orders",
    token="<your-azure-ad-token>"
)

# Or use the full OneLake URL
# orders = connect(
#     "onelake://my-workspace/my-lakehouse.Lakehouse/Tables/orders",
#     token="<your-azure-ad-token>"
# )
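Both URI forms above follow a fixed pattern, so they can be assembled from their parts rather than edited by hand. A minimal sketch, assuming only the `fabric://` and `onelake://` layouts shown in this cell; `lakehouse_table_uri` is a hypothetical helper, not part of DuckGuard:

```python
def lakehouse_table_uri(workspace: str, lakehouse: str, table: str,
                        scheme: str = "fabric") -> str:
    """Build a Lakehouse table URI in one of the two forms shown above.

    Hypothetical helper for illustration; not part of the DuckGuard API.
    """
    if scheme == "fabric":
        return f"fabric://{workspace}/{lakehouse}/Tables/{table}"
    if scheme == "onelake":
        # The full OneLake form carries the ".Lakehouse" item suffix.
        return f"onelake://{workspace}/{lakehouse}.Lakehouse/Tables/{table}"
    raise ValueError(f"unsupported scheme: {scheme!r}")


uri = lakehouse_table_uri("my-workspace", "my-lakehouse", "orders")
print(uri)
```

The resulting string can be passed to `connect(uri, token=...)` exactly like the literal paths above.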

### Option B: SQL Endpoint

Query via T-SQL. This works with both Lakehouse and Warehouse.

In [None]:
# SQL endpoint
orders = connect(
    "fabric+sql://your-workspace-guid.datawarehouse.fabric.microsoft.com",
    table="orders",
    database="my_lakehouse",
    token="<your-azure-ad-token>"
)

### Option C: Inside a Fabric Notebook

If you're running in a Fabric notebook, you can load data via Spark and pass it as a DataFrame, so no token is needed.

In [None]:
# In a Fabric notebook: load via Spark, validate via DuckGuard
# df = spark.sql("SELECT * FROM my_lakehouse.orders").toPandas()
# orders = connect(df)

# For this demo, we'll create sample data
import pandas as pd

df = pd.DataFrame({
    "order_id": [f"ORD{i:04d}" for i in range(1, 11)],
    "customer_id": [f"CUST{i:03d}" if i != 3 else None for i in range(1, 11)],
    "product": ["Widget", "Gadget", "Widget", "Gizmo", "Widget",
                "Gadget", "Widget", "Bundle", "Widget", "Gizmo"],
    "quantity": [2, 1, -3, 1, 500, 2, 1, 3, 1, 2],
    "total_amount": [70.37, 54.49, -93.08, 217.99, 16349.54,
                      113.97, 15.88, 326.97, 37.68, 435.98],
    "status": ["shipped", "delivered", "pending", "shipped", "pending",
               "INVALID", "delivered", "shipped", "delivered", "pending"],
    "email": ["alice@example.com", "bob@example.com", "charlie@example.com",
              None, "eve@example.com", "frank@example.com", "grace@example",
              "hans@example.de", "ivan@example.com", "jun@example.jp"],
})

orders = connect(df)
print(f"Rows: {orders.row_count}, Columns: {len(orders.columns)}")

## 3. Validate: Same API Everywhere

In [None]:
# These assertions work the same on Fabric, Snowflake, S3, or CSV
checks = [
    ("order_id not null", orders.order_id.is_not_null()),
    ("order_id unique", orders.order_id.is_unique()),
    ("customer_id not null", orders.customer_id.is_not_null()),
    ("quantity in [1, 100]", orders.quantity.between(1, 100)),
    ("total_amount positive", orders.total_amount.greater_than(0)),
    ("status valid", orders.status.isin(["pending", "shipped", "delivered", "cancelled"])),
]

for name, result in checks:
    icon = "✅" if result.passed else "❌"
    print(f"{icon} {name}")
    if not result.passed:
        print(f"   → {result.summary()}")
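To see which failures to expect from the demo data, the same rules can be cross-checked in plain Python. This is an independent sketch, not DuckGuard's implementation. It assumes that `between(1, 100)` includes both bounds, as the check label `[1, 100]` suggests. The columns are copied from the sample DataFrame in section 2.

```python
# Columns copied from the demo DataFrame in section 2, Option C.
customer_id = [f"CUST{i:03d}" if i != 3 else None for i in range(1, 11)]
quantity = [2, 1, -3, 1, 500, 2, 1, 3, 1, 2]
total_amount = [70.37, 54.49, -93.08, 217.99, 16349.54,
                113.97, 15.88, 326.97, 37.68, 435.98]
status = ["shipped", "delivered", "pending", "shipped", "pending",
          "INVALID", "delivered", "shipped", "delivered", "pending"]

allowed_status = {"pending", "shipped", "delivered", "cancelled"}

# Count the rows that violate each rule, using plain Python only.
# Assumption: the quantity range is inclusive on both ends.
violations = {
    "customer_id not null": sum(v is None for v in customer_id),
    "quantity in [1, 100]": sum(not (1 <= v <= 100) for v in quantity),
    "total_amount positive": sum(not (v > 0) for v in total_amount),
    "status valid": sum(v not in allowed_status for v in status),
}

for rule, count in violations.items():
    print(f"{rule}: {count} violating row(s)")
```

On the sample data this counts one null `customer_id`, two out-of-range quantities (`-3` and `500`), one non-positive `total_amount`, and one invalid `status`. The two `order_id` checks have nothing to flag.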

## 4. Quality Score

In [None]:
score = orders.score()

print(f"Quality Grade: {score.grade} ({score.overall:.1f}/100)")
print(f"  Completeness: {score.completeness:.1f}%")
print(f"  Uniqueness:   {score.uniqueness:.1f}%")
print(f"  Validity:     {score.validity:.1f}%")
print(f"  Consistency:  {score.consistency:.1f}%")
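In a scheduled notebook, a printed score is easy to overlook. One option is to turn it into a hard gate. The sketch below reads only `score.overall` and `score.grade`, the attributes printed above. `enforce_min_score` and the threshold of 80 are hypothetical choices, and a `SimpleNamespace` stands in for the real score object so the example runs without DuckGuard.

```python
from types import SimpleNamespace


def enforce_min_score(score, minimum: float = 80.0) -> None:
    """Raise if the overall quality score is below `minimum`.

    Hypothetical helper; the 80.0 default is an arbitrary example threshold.
    """
    if score.overall < minimum:
        raise ValueError(
            f"Quality score {score.overall:.1f} (grade {score.grade}) "
            f"is below the required minimum of {minimum:.1f}"
        )


# Stand-in object so the sketch runs without DuckGuard installed;
# in the notebook, pass the result of orders.score() instead.
demo_score = SimpleNamespace(overall=72.5, grade="C")

try:
    enforce_min_score(demo_score, minimum=80.0)
except ValueError as exc:
    print(exc)
```

Raising an exception makes the notebook cell fail, which in turn fails a pipeline activity that runs the notebook.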

## 5. Profile & PII Detection

In [None]:
from duckguard import AutoProfiler, SemanticAnalyzer

# Full profile
profile = AutoProfiler().profile(orders)
print(f"{'Column':<20} {'Nulls %':<10} {'Unique %':<10} {'Grade'}")
print("-" * 50)
for col in profile.columns:
    print(f"{col.name:<20} {col.null_percent:<10.1f} {col.unique_percent:<10.1f} {col.quality_grade}")

# PII scan
analysis = SemanticAnalyzer().analyze(orders)
if analysis.pii_columns:
    print(f"\n⚠️  PII found in: {analysis.pii_columns}")
    print("   → Consider masking before sharing this data")
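If the scan flags a column such as `email`, one simple way to act on the masking suggestion is to hide the local part of each address. A minimal sketch; `mask_email` is a hypothetical helper and not a DuckGuard feature, and real masking requirements depend on your compliance rules.

```python
def mask_email(value):
    """Mask the local part of an email address, keeping the first character.

    Hypothetical helper for illustration; not part of DuckGuard.
    Values without an "@" (or None) are returned unchanged.
    """
    if value is None or "@" not in value:
        return value
    local, _, domain = value.partition("@")
    return f"{local[:1]}***@{domain}"


emails = ["alice@example.com", None, "hans@example.de"]
masked = [mask_email(e) for e in emails]
print(masked)
```

On the demo DataFrame this could be applied with `df["email"].map(mask_email, na_action="ignore")` before the data is shared.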

## 6. Auto-Generate Validation Rules

In [None]:
from duckguard import generate_rules

yaml_rules = generate_rules(orders, dataset_name="fabric_orders")
print(yaml_rules)

## 7. Integrate with Fabric Pipelines

### In a Fabric Notebook Activity (Data Pipeline)

```python
from duckguard import connect, load_rules, execute_rules

# Load data from Lakehouse
df = spark.sql("SELECT * FROM my_lakehouse.orders").toPandas()
data = connect(df)

# Validate against rules
rules = load_rules("/lakehouse/default/Files/duckguard.yaml")
result = execute_rules(rules, data)

if not result.passed:
    raise Exception(
        f"Data quality check failed: {result.failed_count} failures\n"
        f"{result.summary()}"
    )
```

### As a pytest Check in CI/CD

```python
# tests/test_fabric_quality.py
import os

from duckguard import connect

def test_orders_quality():
    orders = connect(
        "fabric+sql://workspace.datawarehouse.fabric.microsoft.com",
        table="orders", database="lakehouse", token=os.environ["FABRIC_TOKEN"]
    )
    assert orders.row_count > 0
    assert orders.order_id.is_not_null()
    assert orders.order_id.is_unique()
    assert orders.total_amount.between(0, 50000)
```
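When `FABRIC_TOKEN` is missing in CI, `os.environ["FABRIC_TOKEN"]` fails with a bare `KeyError`, which is hard to interpret in a test log. A small sketch of a clearer failure; `require_env` is a hypothetical helper, and the placeholder value is set only so the example runs outside CI.

```python
import os


def require_env(name: str) -> str:
    """Return a required environment variable or fail with a clear message.

    Hypothetical helper; FABRIC_TOKEN is the variable name used in the test above.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it in the CI job before running the tests"
        )
    return value


# Demo only: set a placeholder so the sketch runs outside CI.
os.environ.setdefault("FABRIC_TOKEN", "<your-azure-ad-token>")
token = require_env("FABRIC_TOKEN")
print(f"Token loaded ({len(token)} characters)")
```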

## 8. Getting Your Azure AD Token

```python
# Use ONE of the two options below to obtain a token.

# Option 1: Azure Identity (recommended)
from azure.identity import DefaultAzureCredential
from duckguard import connect

credential = DefaultAzureCredential()
token = credential.get_token("https://analysis.windows.net/powerbi/api/.default").token

# Option 2: In a Fabric notebook (automatic)
# mssparkutils is available without an import inside Fabric notebooks;
# uncomment this line there instead of using Option 1.
# token = mssparkutils.credentials.getToken("pbi")

# Then pass to DuckGuard
data = connect("fabric://workspace/lakehouse/Tables/orders", token=token)
```

---

## Next Steps

- 📚 [Full Docs](https://xdatahubai.github.io/duckguard/)
- 🔌 [All Connectors](https://xdatahubai.github.io/duckguard/connectors/overview/): S3, Snowflake, Databricks, BigQuery, and more
- 🤖 [AI Features](https://xdatahubai.github.io/duckguard/guide/ai-features/): LLM-powered explain, suggest, and fix
- ⭐ [Star on GitHub](https://github.com/XDataHubAI/duckguard)