# Databricks SDK Integration with GRID

This notebook demonstrates how to use the Databricks SDK with the GRID framework for job orchestration, cluster management, and notebook operations.

**Prerequisites:**
- Set the `DATABRICKS_HOST` environment variable to your workspace URL
- Set the `DATABRICKS_TOKEN` (or `databricks`) environment variable to your API token
- `databricks-sdk>=0.40.0` installed
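
The prerequisites above can be checked before any client is created. The following is a minimal sketch using only the standard library. `resolve_credentials` and `installed_sdk_version` are hypothetical helper names, not part of GRID or the Databricks SDK, and preferring `DATABRICKS_TOKEN` over `databricks` when both are set is an assumption.

```python
import os
from importlib import metadata
from typing import Mapping, Optional, Tuple


def resolve_credentials(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (host, token) from a mapping of environment variables.

    Assumption: DATABRICKS_TOKEN takes precedence over `databricks`.
    """
    host = env.get("DATABRICKS_HOST", "").strip()
    token = (env.get("DATABRICKS_TOKEN") or env.get("databricks") or "").strip()

    missing = []
    if not host:
        missing.append("DATABRICKS_HOST")
    if not token:
        missing.append("DATABRICKS_TOKEN (or databricks)")
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return host, token


def installed_sdk_version() -> Optional[str]:
    """Return the installed databricks-sdk version string, or None if absent."""
    try:
        return metadata.version("databricks-sdk")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    try:
        host, token = resolve_credentials(os.environ)
        # Never print the token itself; its length is enough for a sanity check.
        print(f"host={host}, token length={len(token)}")
    except RuntimeError as exc:
        print(exc)
    print("databricks-sdk version:", installed_sdk_version())
```

The helper takes a mapping rather than reading `os.environ` directly so it can be exercised with a plain dictionary. The version string it reports still has to be compared against `0.40.0` separately.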

## 1. Install and Import Databricks SDK

First, configure the environment and verify that the `databricks-sdk` package is installed and importable.

In [7]:
# Setup: Add workspace to path and configure environment
import os
import sys

# Add workspace root to path
workspace_root = "e:\\grid"
if workspace_root not in sys.path:
    sys.path.insert(0, workspace_root)

# Set Databricks host - use the full URL without /browse/folders path
DATABRICKS_HOST = "https://dbc-9747ff30-23c5.cloud.databricks.com"
os.environ["DATABRICKS_HOST"] = DATABRICKS_HOST

print("✅ Environment configured")
print(f"   Python path includes: {workspace_root}")
print(f"   DATABRICKS_HOST: {os.getenv('DATABRICKS_HOST')}")
print(f"   databricks token: {'SET' if os.getenv('databricks') else 'NOT SET'}")

# Verify databricks-sdk is installed
try:
    import databricks.sdk
    print("✅ databricks-sdk is installed and importable")
except ImportError:
    print("❌ databricks-sdk not found")

✅ Environment configured
   Python path includes: e:\grid
   DATABRICKS_HOST: https://dbc-9747ff30-23c5.cloud.databricks.com
   databricks token: SET
✅ databricks-sdk is installed and importable
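
The setup cell notes that the host must be the bare workspace URL, without a `/browse/folders` path. A URL copied from the browser address bar can be reduced to that form with a small helper. This is a sketch, assuming workspace URLs always use `https`; `normalize_host` is a hypothetical name and `example.cloud.databricks.com` is a placeholder host.

```python
from urllib.parse import urlsplit


def normalize_host(url: str) -> str:
    """Reduce a pasted workspace URL to scheme://host[:port].

    Drops any path, query string, and fragment.
    """
    parts = urlsplit(url.strip())
    # Assumption: a valid workspace URL is always https with a hostname.
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"Expected an https:// workspace URL, got: {url!r}")
    return f"{parts.scheme}://{parts.netloc}"


if __name__ == "__main__":
    pasted = "https://example.cloud.databricks.com/browse/folders/123?o=456"
    print(normalize_host(pasted))
```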


## 2. Initialize Databricks Client

Create a `DatabricksClient` instance. The client reads its connection settings from environment variables:
- `DATABRICKS_HOST` (required): your Databricks workspace URL
- `DATABRICKS_TOKEN` or `databricks` (one is required): your API token

In [4]:
try:
    from src.integration.databricks import (
        DatabricksClient,
        DatabricksClustersManager,
        DatabricksJobsManager,
        DatabricksNotebooksManager,
    )
    print("✅ All GRID Databricks modules imported successfully!")
except ImportError as e:
    print(f"❌ Import error: {e}")
    print("\nTrying alternative import...")
    # Try direct module import
    from src.integration.databricks.client import DatabricksClient
    from src.integration.databricks.jobs import DatabricksJobsManager
    print("✅ Imports successful via direct module paths")

✅ All GRID Databricks modules imported successfully!
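
The try/except fallback in the cell above can be generalized to any ordered list of candidate module paths. The following is a sketch of that pattern using the standard library's `importlib`; `import_first_available` is a hypothetical helper, not part of GRID.

```python
import importlib
from types import ModuleType
from typing import Iterable


def import_first_available(candidates: Iterable[str]) -> ModuleType:
    """Import and return the first module path in `candidates` that resolves."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{name}: {exc}")
    raise ImportError(
        "None of the candidates could be imported:\n" + "\n".join(errors)
    )


if __name__ == "__main__":
    # Standard-library names are used here so the demo runs anywhere.
    module = import_first_available(["no_such_module_xyz", "json"])
    print(module.__name__)
```

In this notebook the candidates would be `"src.integration.databricks"` followed by `"src.integration.databricks.client"`, the two paths used in the cell above.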


## 3. Cluster Management

Let's explore cluster operations available through the client.

In [8]:
# Initialize client (reads DATABRICKS_HOST and 'databricks' env vars)
print("🔌 Connecting to Databricks...")
try:
    client = DatabricksClient()
    print("✅ Successfully connected!")
except Exception as e:
    print(f"❌ Connection failed: {e}")
    print("   Make sure DATABRICKS_HOST and 'databricks' env vars are set")
    raise

# Now test a simple operation
print("\n📊 Listing clusters in your workspace...")
try:
    clusters = client.list_clusters()

    if clusters:
        print(f"✅ Found {len(clusters)} cluster(s):")
        for cluster in clusters:
            print(f"  📍 {cluster.get('cluster_name', 'Unknown')} (ID: {cluster.get('cluster_id')})")
    else:
        print("✅ Connected successfully! No clusters currently running.")
        print("   (You can create clusters in the Databricks UI or via API)")
except Exception as e:
    print(f"❌ Error: {e}")

🔌 Connecting to Databricks...
✅ Successfully connected!

📊 Listing clusters in your workspace...
✅ Connected successfully! No clusters currently running.
   (You can create clusters in the Databricks UI or via API)
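
When the workspace does contain clusters, a short summary is often easier to read than the raw dictionaries. The sketch below assumes `list_clusters()` returns dictionaries with `cluster_name`, `cluster_id`, and `state` keys. The first two appear in the cell above; `state` is an assumption based on the Databricks Clusters REST API, and the sample data is made up for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def describe_cluster(cluster: Dict[str, Any]) -> str:
    """Format one cluster dictionary as a single line, tolerating missing keys."""
    name = cluster.get("cluster_name") or "Unknown"
    cluster_id = cluster.get("cluster_id") or "n/a"
    state = cluster.get("state") or "UNKNOWN"
    return f"{name} (ID: {cluster_id}, state: {state})"


def count_by_state(clusters: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Count clusters per lifecycle state."""
    return dict(Counter(c.get("state") or "UNKNOWN" for c in clusters))


if __name__ == "__main__":
    # Hypothetical sample data standing in for client.list_clusters().
    sample = [
        {"cluster_name": "etl", "cluster_id": "0101-abc", "state": "RUNNING"},
        {"cluster_name": "adhoc", "cluster_id": "0102-def", "state": "TERMINATED"},
    ]
    for cluster in sample:
        print(describe_cluster(cluster))
    print(count_by_state(sample))
```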


## 4. Job Management

Create and run jobs using DatabricksJobsManager.

**Note:** This example shows the structure. Replace with your actual notebook paths and job names.

In [None]:
try:
    jobs_manager = DatabricksJobsManager(client)

    # List existing jobs
    print("📋 Existing Jobs in Workspace:\n")
    jobs = jobs_manager.list_jobs()

    if not jobs:
        print("  No jobs found")
    else:
        for i, job in enumerate(jobs[:5], 1):  # Show first 5
            print(f"  {i}. {job['settings'].get('name', 'Unnamed')}")
            print(f"     Job ID: {job['job_id']}")
            print()

    # Example: create a notebook job (printed as text, not executed; customize for your use)
    print("\n📝 Example: Creating a Notebook Job\n")
    print("""
    # To create a job, use:
    job_id = jobs_manager.create_notebook_job(
        job_name="my-processing-job",
        notebook_path="/Repos/user/project/process_data.ipynb",
        cluster_id="cluster-123",  # Use cluster from step 3
        base_parameters={"input_path": "/data/input"}
    )
    print(f"Created job {job_id}")

    # Then run it:
    run_id = jobs_manager.run_job(job_id)
    print(f"Started run {run_id}")
    """)

    print("✅ Jobs manager initialized successfully!")

except Exception as e:
    print(f"❌ Error with jobs manager: {e}")
    jobs_manager = None

📋 Existing Jobs in Workspace:

  No jobs found

📝 Example: Creating a Notebook Job


    # To create a job, use:
    job_id = jobs_manager.create_notebook_job(
        job_name="my-processing-job",
        notebook_path="/Repos/user/project/process_data.ipynb",
        cluster_id="cluster-123",  # Use cluster from step 3
        base_parameters={"input_path": "/data/input"}
    )
    print(f"Created job {job_id}")

    # Then run it:
    run_id = jobs_manager.run_job(job_id)
    print(f"Started run {run_id}")
    
✅ Jobs manager initialized successfully!
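
For orientation, a notebook job like the one in the printed example corresponds roughly to the settings payload below, written in the Databricks Jobs API 2.1 format. This is only a sketch: what `create_notebook_job` actually sends is not shown in this notebook. `build_notebook_job_settings` and the task key `"main"` are hypothetical.

```python
import json
from typing import Any, Dict, Mapping, Optional


def build_notebook_job_settings(
    job_name: str,
    notebook_path: str,
    cluster_id: str,
    base_parameters: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a single-task notebook job definition (Jobs API 2.1 shape)."""
    if not notebook_path.startswith("/"):
        raise ValueError("notebook_path must be an absolute workspace path")
    # Notebook parameters are passed as strings, so coerce keys and values.
    params = {str(k): str(v) for k, v in (base_parameters or {}).items()}
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "main",  # hypothetical task key
                "existing_cluster_id": cluster_id,
                "notebook_task": {
                    "notebook_path": notebook_path,
                    "base_parameters": params,
                },
            }
        ],
    }


if __name__ == "__main__":
    settings = build_notebook_job_settings(
        job_name="my-processing-job",
        notebook_path="/Repos/user/project/process_data.ipynb",
        cluster_id="cluster-123",
        base_parameters={"input_path": "/data/input"},
    )
    print(json.dumps(settings, indent=2))
```

Building the payload as a plain dictionary makes it easy to inspect or log before any API call is made.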



## 5. Notebook Operations

Manage notebooks in the Databricks workspace.