# Verily Workbench Setup

Run this notebook once at the start of each JupyterLab session to initialize workspace environment variables.

It sets the following variables:
- `GOOGLE_CLOUD_PROJECT` - Google Cloud project ID of the workspace
- `WORKSPACE_CDR` - BigQuery dataset containing the OMOP CDR, in the form `project.dataset`
- `WORKSPACE_BUCKET` - Persistent workspace GCS bucket, as a `gs://` URI
- `WORKSPACE_TEMP_BUCKET` - Temporary workspace GCS bucket, as a `gs://` URI

**Time to run:** ~2 seconds
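
As a rough illustration of the value shapes listed above, the sketch below checks a mapping of these variables. It assumes, based on how the setup function in this notebook builds them, that bucket variables are `gs://` URIs and that `WORKSPACE_CDR` has the form `project.dataset`. The helper name `find_env_problems` and the sample values are hypothetical.

```python
from typing import List, Mapping

BUCKET_VARS = ("WORKSPACE_BUCKET", "WORKSPACE_TEMP_BUCKET")


def find_env_problems(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found in the workspace variables (hypothetical helper)."""
    problems: List[str] = []

    if not env.get("GOOGLE_CLOUD_PROJECT"):
        problems.append("GOOGLE_CLOUD_PROJECT is missing or empty")

    # Assumption: bucket variables are stored as gs://<bucket-name>
    for name in BUCKET_VARS:
        value = env.get(name) or ""
        if not value.startswith("gs://") or len(value) <= len("gs://"):
            problems.append(f"{name} should look like gs://<bucket>, got {value!r}")

    # Assumption: the CDR is stored as <project>.<dataset>
    parts = (env.get("WORKSPACE_CDR") or "").split(".")
    if len(parts) != 2 or not all(parts):
        problems.append("WORKSPACE_CDR should look like <project>.<dataset>")

    return problems


# Sample values are placeholders, not real workspace resources
sample_env = {
    "GOOGLE_CLOUD_PROJECT": "example-project",
    "WORKSPACE_CDR": "example-project.example_dataset",
    "WORKSPACE_BUCKET": "gs://example-bucket",
    "WORKSPACE_TEMP_BUCKET": "gs://example-temp-bucket",
}
print(find_env_problems(sample_env))  # prints []
```

After the setup has run, passing `os.environ` instead of `sample_env` would apply the same checks to the live session.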

In [None]:
import os
import json
import subprocess
from typing import Dict

In [None]:
def setup_aou_env(verbose: bool = True) -> Dict[str, str]:
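    """
    Set All of Us workspace environment variables using the wb CLI.

    Uses the Verily Workbench CLI to look up the current workspace and its
    resources, then sets environment variables for the current session.
    WORKSPACE_CDR is taken from the first BigQuery dataset in the resource
    list and is left as an empty string if none is found.

    Args:
        verbose: If True, print the variables after they are set

    Returns:
        dict: Environment variables that were set

    Raises:
        subprocess.CalledProcessError: If a wb CLI command exits with an error
        FileNotFoundError: If the wb executable is not on PATH
        KeyError: If the CLI output lacks an expected field
    """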
    # Extract workspace info
    workspace = json.loads(
        subprocess.run(
            ["wb", "workspace", "describe", "--format=json"],
            capture_output=True, text=True, check=True
        ).stdout
    )

    # Extract resources
    resources = json.loads(
        subprocess.run(
            ["wb", "resource", "list", "--format=json"],
            capture_output=True, text=True, check=True
        ).stdout
    )

    # Set Google Cloud project
    os.environ["GOOGLE_CLOUD_PROJECT"] = workspace["googleProjectId"]

    # Reset CDR so the first BigQuery dataset found below wins;
    # it stays empty if the workspace has no BigQuery dataset
    os.environ["WORKSPACE_CDR"] = ""

    # Set buckets and CDR from resources list
    for r in resources:
        if r["resourceType"] == "GCS_BUCKET":
            # Check temporary bucket first to avoid substring conflicts
            if "temporary-workspace-bucket" in r["id"]:
                os.environ["WORKSPACE_TEMP_BUCKET"] = f"gs://{r['bucketName']}"
            elif "workspace-bucket" in r["id"]:
                os.environ["WORKSPACE_BUCKET"] = f"gs://{r['bucketName']}"

        elif r["resourceType"] in ["BQ_DATASET", "BIGQUERY_DATASET"]:
            # Only set CDR if not already set (use first found)
            if os.environ.get("WORKSPACE_CDR") == "":
                os.environ["WORKSPACE_CDR"] = f"{r['projectId']}.{r['datasetId']}"

    # Collect variables for return; a variable that was never set is
    # reported as "" so the result matches the Dict[str, str] return type
    env_vars = {
        name: os.environ.get(name, "")
        for name in (
            "GOOGLE_CLOUD_PROJECT",
            "WORKSPACE_BUCKET",
            "WORKSPACE_TEMP_BUCKET",
            "WORKSPACE_CDR",
        )
    }

    if verbose:
        print("✅ Workspace environment variables set:")
        for key, val in env_vars.items():
            print(f"  {key} = {val}")

    return env_vars

In [None]:
# Execute setup with verbose output
env = setup_aou_env(verbose=True)

In [None]:
# Assign to Python variables for easy access
WORKSPACE_CDR = os.environ['WORKSPACE_CDR']
WORKSPACE_BUCKET = os.environ['WORKSPACE_BUCKET']
WORKSPACE_TEMP_BUCKET = os.environ['WORKSPACE_TEMP_BUCKET']
GOOGLE_CLOUD_PROJECT = os.environ['GOOGLE_CLOUD_PROJECT']

print("\n✅ Variables ready for analysis:")
print(f"  CDR: {WORKSPACE_CDR}")
print(f"  Bucket: {WORKSPACE_BUCKET}")
print(f"  Temp Bucket: {WORKSPACE_TEMP_BUCKET}")
print(f"  Project: {GOOGLE_CLOUD_PROJECT}")

## Usage in Analysis Notebooks

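These variables are now set in `os.environ` for the kernel that ran this notebook. Each notebook kernel is a separate process with its own environment, so the values are not shared automatically with notebooks running in other kernels.

In another notebook, once the setup code has run in that notebook's kernel (for example by running this notebook with IPython's `%run` magic), access them with: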

```python
import os

WORKSPACE_CDR = os.environ['WORKSPACE_CDR']
WORKSPACE_BUCKET = os.environ['WORKSPACE_BUCKET']
WORKSPACE_TEMP_BUCKET = os.environ['WORKSPACE_TEMP_BUCKET']
```
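
For instance, a downstream notebook might combine these values into a fully qualified BigQuery table name and a GCS object path. This is only a minimal sketch: `person` is a standard OMOP table, but whether it is present in your CDR, the `results/cohort.csv` path, the helper names `table_ref` and `bucket_path`, and the placeholder defaults are all assumptions made for illustration.

```python
import os

# Placeholder defaults are hypothetical; they only keep this sketch runnable
# when the setup has not been run in the current kernel.
workspace_cdr = os.environ.get("WORKSPACE_CDR") or "example-project.example_dataset"
workspace_bucket = os.environ.get("WORKSPACE_BUCKET") or "gs://example-bucket"


def table_ref(table: str, cdr: str = workspace_cdr) -> str:
    """Return a backtick-quoted, fully qualified BigQuery table name."""
    return f"`{cdr}.{table}`"


def bucket_path(relative_path: str, bucket: str = workspace_bucket) -> str:
    """Join a bucket URI and a relative object path with a single slash."""
    return f"{bucket.rstrip('/')}/{relative_path.lstrip('/')}"


query = f"SELECT COUNT(*) AS n FROM {table_ref('person')}"
output_uri = bucket_path("results/cohort.csv")

print(query)
print(output_uri)
```

Keeping the string assembly in small helpers means a change in the CDR or bucket only has to be picked up in one place.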

**Note:** You must rerun this setup notebook in every new JupyterLab session, because the variables do not persist after the kernel shuts down.
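
Because the setup has to be repeated each session, an analysis notebook may want to fail early with a clear message when a variable is absent. One way to sketch that, using a hypothetical helper named `require_env` that is not part of the wb CLI or this notebook:

```python
import os
from typing import Dict, Mapping, Optional, Sequence

REQUIRED_VARS = (
    "GOOGLE_CLOUD_PROJECT",
    "WORKSPACE_CDR",
    "WORKSPACE_BUCKET",
    "WORKSPACE_TEMP_BUCKET",
)


def require_env(
    names: Sequence[str] = REQUIRED_VARS,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return the requested variables, or raise if any is missing or empty."""
    source = os.environ if environ is None else environ
    missing = [name for name in names if not source.get(name)]
    if missing:
        raise RuntimeError(
            "Missing workspace variables: "
            + ", ".join(missing)
            + ". Run the setup notebook first."
        )
    return {name: source[name] for name in names}


# Demonstration with placeholder values instead of the live environment
demo = require_env(environ={name: "placeholder" for name in REQUIRED_VARS})
print(sorted(demo))
```

Calling `require_env()` with no arguments checks `os.environ`, so placing that call in the first cell of an analysis notebook would surface a skipped setup immediately.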