# Automated Data Observability Pipeline for Tableau Cloud

### Project Overview
In large-scale analytics environments, relying on users to report stale data is inefficient and damages trust. This project shifts the approach from **Reactive Support** to **Proactive Observability**.

Using the **Tableau Server Client (TSC)** library, a Python wrapper for the Tableau REST API, this pipeline:
1.  **Audits** the entire Tableau Cloud environment automatically.
2.  **Validates** the "Data Freshness" of extracts against defined SLAs.
3.  **Identifies** failures before they impact business decision-making.

> This is a **blueprint / template**, not a production-ready pipeline.
> Some sections intentionally use pseudocode to focus on architecture and reasoning.

### Tech Stack
* **Python 3.x**
* **Tableau Server Client (TSC)** for API interaction
* **Pandas** for structured log analysis
* **Data Governance** concepts (SLA monitoring)

### Data Freshness Logic (Conceptual)

This notebook uses a simplified and conceptual approach to evaluate data freshness.

Important:
- `last_refresh` is treated as a placeholder for extract refresh metadata.
- Assets without applicable refresh information (e.g. live connections) are classified as `NotApplicable`.
- `Unknown` or missing metadata is never treated as `Healthy`.

This approach avoids false-positive health indicators and reflects observability best practices.


In [None]:
# Install the Tableau Server Client library
# This is required as it is not included in the standard Databricks Runtime.

%pip install tableauserverclient

In [None]:
import tableauserverclient as TSC
import pandas as pd
import datetime
from datetime import timezone
from pyspark.sql import SparkSession

## Authentication Configuration

Authentication is handled via Personal Access Tokens (PAT).
In Databricks, credentials should be retrieved securely using Secret Scopes.

This notebook assumes a secret scope named `tableau` that contains the following keys:
- pat-name
- pat-value
- site-id
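
As a minimal sketch of how the secret lookup could report exactly which keys are missing, the helper below takes the lookup function as a parameter. The name `load_tableau_secrets` and the `get_secret` parameter are assumptions introduced for illustration; in Databricks, `dbutils.secrets.get` would be passed in, while the example uses a dictionary-backed stand-in so it runs anywhere.

```python
REQUIRED_KEYS = ("pat-name", "pat-value", "site-id")


def load_tableau_secrets(get_secret, scope="tableau"):
    """Fetch all required secrets, or raise one error naming every missing key."""
    values, missing = {}, []
    for key in REQUIRED_KEYS:
        try:
            value = get_secret(scope=scope, key=key)
        except Exception:
            # Any lookup failure is treated as "secret not available".
            value = None
        if value:
            values[key] = value
        else:
            missing.append(key)
    if missing:
        raise RuntimeError(
            f"Missing Tableau secrets in scope '{scope}': {', '.join(missing)}"
        )
    return values


# Stand-in secret store for illustration only (hypothetical values).
fake_store = {("tableau", "pat-name"): "my-token", ("tableau", "pat-value"): "s3cret"}


def fake_get(scope, key):
    return fake_store[(scope, key)]  # raises KeyError when the secret is absent


try:
    load_tableau_secrets(fake_get)
except RuntimeError as err:
    print(err)
```

In a notebook, the call would presumably look like `load_tableau_secrets(dbutils.secrets.get)`, with the returned dictionary feeding `TSC.PersonalAccessTokenAuth`.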


In [None]:
# --- CONFIGURATION ---
SERVER_URL = "https://prod-useast-b.online.tableau.com"

try:
    TOKEN_NAME  = dbutils.secrets.get(scope="tableau", key="pat-name")
    TOKEN_VALUE = dbutils.secrets.get(scope="tableau", key="pat-value")
    SITE_ID     = dbutils.secrets.get(scope="tableau", key="site-id")
except Exception as e:
    # Chain the original exception so the root cause stays visible in the traceback.
    raise RuntimeError(
        "Tableau credentials not found. "
        "This notebook expects PAT credentials stored in Databricks Secrets."
    ) from e

tableau_auth = TSC.PersonalAccessTokenAuth(
    token_name=TOKEN_NAME,
    personal_access_token=TOKEN_VALUE,
    site_id=SITE_ID
)

server = TSC.Server(SERVER_URL, use_server_version=True)


## Data Freshness Evaluation Logic (Conceptual)

Observability distinguishes between **absence of data** and **healthy data**.

Classification rules:
- Healthy: Extract refreshed within SLA
- Critical: Extract refresh exceeds SLA
- NotApplicable: Live connections or assets without refresh semantics
- Unknown: Metadata exists but freshness cannot be determined

Important:
- Missing refresh timestamps are never treated as "Healthy"
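
The four rules above can be sketched as a small classifier. This is an illustration under stated assumptions, not the notebook's implementation: the `has_extract` flag and the `classify_freshness` name are hypothetical, introduced here to separate live connections (`NotApplicable`) from extracts whose timestamp is missing or unusable (`Unknown`).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def classify_freshness(
    has_extract: bool,
    last_refresh: Optional[datetime],
    now: datetime,
    sla_hours: float = 24,
) -> str:
    """Map one asset to a status string using the four classification rules."""
    if not has_extract:
        # Live connections have no refresh semantics.
        return "NotApplicable"
    if last_refresh is None or last_refresh.tzinfo is None:
        # An extract exists, but its age cannot be computed reliably.
        return "Unknown"
    age_hours = (now - last_refresh).total_seconds() / 3600
    return "Critical" if age_hours > sla_hours else "Healthy"


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(classify_freshness(True, now - timedelta(hours=3), now))
print(classify_freshness(True, now - timedelta(hours=30), now))
print(classify_freshness(True, None, now))
print(classify_freshness(False, None, now))
```

Passing `now` explicitly keeps the function deterministic, which makes the rules easy to unit test.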


In [None]:
now_utc = datetime.datetime.now(timezone.utc)

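def calculate_freshness(last_refresh, sla_hours=24):
    """
    Conceptual freshness evaluation.
    
    Parameters:
    - last_refresh: timezone-aware datetime or None
    - sla_hours: SLA threshold in hours
    
    Returns:
    - hours_since_refresh (float or None)
    - status (str): 'Healthy', 'Critical', 'NotApplicable' or 'Unknown'
    """
    # No refresh metadata at all (e.g. live connections).
    if last_refresh is None:
        return None, 'NotApplicable'

    # A value without timezone information cannot be compared safely with
    # now_utc, so report it as Unknown rather than guessing.
    if getattr(last_refresh, 'tzinfo', None) is None:
        return None, 'Unknown'

    hours_since = (now_utc - last_refresh).total_seconds() / 3600

    if hours_since > sla_hours:
        return round(hours_since, 2), 'Critical'

    return round(hours_since, 2), 'Healthy'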


## Metadata Extraction Pipeline

This section iterates through Tableau Cloud assets and builds
a unified audit log for observability analysis.

⚠️ API calls are simplified and serve as conceptual placeholders.
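
The workbook and datasource loops in this section build identical records, so the row construction could be factored into one helper. The sketch below is an assumption about how that might look: `build_audit_record` is a hypothetical name, and a `SimpleNamespace` stands in for a TSC item so the example runs without a server connection.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def build_audit_record(asset_type, asset, last_refresh, hours_since, status, audit_ts):
    """Build one audit-log row; `asset` is any object exposing the attributes used here."""
    return {
        "asset_type": asset_type,
        "asset_id": asset.id,
        "asset_name": asset.name,
        "owner_id": asset.owner_id,
        "project_name": asset.project_name,
        "last_refresh_utc": last_refresh,
        "hours_since_refresh": hours_since,
        "status": status,
        "audit_timestamp": audit_ts,
    }


# Stand-in for a TSC workbook item (hypothetical values).
fake_wb = SimpleNamespace(id="wb-1", name="Sales", owner_id="u-1", project_name="Finance")
ts = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
record = build_audit_record("Workbook", fake_wb, ts, 0.0, "Healthy", ts)
print(record["asset_name"], record["status"])
```

Keeping the key names in one place means a schema change only has to be made once for both asset types.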


In [None]:
audit_log = []

print(f"Starting Audit on: {SERVER_URL}")

with server.auth.sign_in(tableau_auth):

    # --- WORKBOOK AUDIT (PSEUDOCODE) ---
    print("Scanning Workbooks...")
    # TSC.Pager walks every page; a single .get() call returns only the first page.
    all_workbooks = TSC.Pager(server.workbooks)

    for wb in all_workbooks:
        # Conceptual placeholder: real implementations may use extract refresh endpoints
        last_refresh = wb.updated_at

        hours_since, status = calculate_freshness(last_refresh)

        audit_log.append({
            'asset_type': 'Workbook',
            'asset_id': wb.id,
            'asset_name': wb.name,
            'owner_id': wb.owner_id,
            'project_name': wb.project_name,
            'last_refresh_utc': last_refresh,
            'hours_since_refresh': hours_since,
            'status': status,
            'audit_timestamp': now_utc
        })

    # --- DATASOURCE AUDIT (PSEUDOCODE) ---
    print("Scanning Datasources...")
    # TSC.Pager walks every page; a single .get() call returns only the first page.
    all_datasources = TSC.Pager(server.datasources)

    for ds in all_datasources:
        last_refresh = ds.updated_at  # Conceptual placeholder

        hours_since, status = calculate_freshness(last_refresh)

        audit_log.append({
            'asset_type': 'Datasource',
            'asset_id': ds.id,
            'asset_name': ds.name,
            'owner_id': ds.owner_id,
            'project_name': ds.project_name,
            'last_refresh_utc': last_refresh,
            'hours_since_refresh': hours_since,
            'status': status,
            'audit_timestamp': now_utc
        })

print(f"Audit Complete. Extracted {len(audit_log)} records.")


## Persistence (Bronze Layer)

The audit log is persisted as a Delta table.
This layer is intended for downstream analytics, alerting, and dashboards.


In [None]:
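# An explicit schema avoids inference errors when a column is entirely None
# or when the audit log is empty.
audit_schema = (
    "asset_type STRING, asset_id STRING, asset_name STRING, owner_id STRING, "
    "project_name STRING, last_refresh_utc TIMESTAMP, hours_since_refresh DOUBLE, "
    "status STRING, audit_timestamp TIMESTAMP"
)
audit_df = spark.createDataFrame(audit_log, schema=audit_schema)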

(
    audit_df
    .write
    .mode("append")
    .format("delta")
    .saveAsTable("tableau_observability_bronze")
)

display(audit_df.limit(10))


## Expected Outputs

- Unified audit table containing:
  - Asset metadata
  - Data freshness indicators
  - SLA-based health classification
- Bronze layer ready for:
  - Alerting
  - Governance dashboards
  - Historical trend analysis

## Next Steps

- Integrate Tableau Metadata API / GraphQL for lineage and field-level observability
- Introduce project-specific SLAs
- Build alerting on Critical assets
- Create a Gold-layer dashboard for operational monitoring
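
Two of these steps, project-specific SLAs and alerting on Critical assets, can be sketched in plain Python over the audit records. The project names, thresholds, and function names below are hypothetical placeholders, and printing stands in for a real alert channel.

```python
DEFAULT_SLA_HOURS = 24
# Hypothetical project names and thresholds, for illustration only.
PROJECT_SLA_HOURS = {"Finance": 6, "Marketing": 48}


def sla_for_project(project_name):
    """Return the SLA for a project, falling back to the default threshold."""
    return PROJECT_SLA_HOURS.get(project_name, DEFAULT_SLA_HOURS)


def apply_project_sla(record):
    """Return a copy of the record re-scored against its project's SLA."""
    hours = record["hours_since_refresh"]
    if hours is None:
        # Keep NotApplicable / Unknown exactly as recorded.
        return dict(record)
    limit = sla_for_project(record["project_name"])
    return {**record, "status": "Critical" if hours > limit else "Healthy"}


def critical_assets(records):
    """Select the records that should trigger an alert."""
    return [r for r in records if r["status"] == "Critical"]


sample = [
    {"asset_name": "Ledger", "project_name": "Finance",
     "hours_since_refresh": 8.0, "status": "Healthy"},
    {"asset_name": "Campaigns", "project_name": "Marketing",
     "hours_since_refresh": 30.0, "status": "Critical"},
    {"asset_name": "Live feed", "project_name": "Ops",
     "hours_since_refresh": None, "status": "NotApplicable"},
]
rescored = [apply_project_sla(r) for r in sample]
for r in critical_assets(rescored):
    print(f"ALERT: {r['asset_name']} ({r['project_name']}) is {r['hours_since_refresh']}h old")
```

With these placeholder thresholds, an asset that passes the global 24-hour SLA can still be Critical under a stricter project SLA, and the reverse holds for a looser one.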
