# Setting and Loading VWB Environment Variables

This notebook ensures that key workspace-level variables — like the Google Cloud project, workspace buckets, and CDR — are available to every notebook in this Jupyter environment.

It works by:
1. Writing the variables to your `~/.bashrc` file (so they persist across sessions).
2. Using a helper function to load them into Python’s `os.environ`.
3. Allowing other notebooks to simply load these variables instead of redefining them.

---

### Variables that will be set
- `GOOGLE_CLOUD_PROJECT`
- `WORKSPACE_BUCKET`
- `WORKSPACE_TEMP_BUCKET`
- `WORKSPACE_CDR`
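
Once these variables are loaded, another notebook can build storage paths and table references from them instead of hard-coding values. The following is a minimal sketch under stated assumptions: `workspace_refs` is a hypothetical helper, and the `results/summary.csv` object path and the `person` table name are illustrative placeholders that this notebook does not define.

```python
import os


def workspace_refs(env=None):
    """Build example references from the workspace variables.

    The object path and table name below are hypothetical placeholders.
    """
    env = os.environ if env is None else env
    bucket = env.get("WORKSPACE_BUCKET", "")
    cdr = env.get("WORKSPACE_CDR", "")
    return {
        "results_uri": f"{bucket}/results/summary.csv",
        "person_table": f"{cdr}.person",
    }


# Example with made-up values; in a real notebook, call workspace_refs() with no arguments.
example = workspace_refs({
    "WORKSPACE_BUCKET": "gs://example-bucket",
    "WORKSPACE_CDR": "example-project.example_dataset",
})
print(example["results_uri"])
print(example["person_table"])
```

Passing a dictionary in place of `os.environ` keeps the helper easy to try out before the real variables exist.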


We use the `wb` CLI to look up these values dynamically and set them. This is the preferred method because each data release can update the datasets and overwrite pre-existing data, so hard-coded values can go stale.

## Step 1: Read and write environment variables to `.bashrc`

This step:
- Extracts the main environment variables from the `wb` workspace description and resource list.
- Checks whether `~/.bashrc` exists (creates it if not).
- Appends export statements for the four variables.
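- If a variable already exists, the newly appended line overrides its old value, because the last `export` of a variable wins when the file is read (the old line stays in the file).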

You’ll only need to run this cell once unless you want to change the variable values.
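
The cell below appends its export lines, so re-running it leaves older lines in `~/.bashrc`. If you would rather keep exactly one line per variable, the following sketch shows one possible approach. `upsert_exports` is a hypothetical helper (not part of this notebook or the `wb` CLI), and the demonstration writes to a throwaway temporary file rather than your real `~/.bashrc`.

```python
import os
import tempfile


def upsert_exports(path, values):
    """Write one `export NAME="value"` line per variable, replacing older ones."""
    lines = []
    if os.path.exists(path):
        with open(path) as f:
            lines = f.read().splitlines()
    # Keep every line that is not an export of a variable we are about to write
    prefixes = tuple(f"export {name}=" for name in values)
    kept = [ln for ln in lines if not ln.strip().startswith(prefixes)]
    kept += [f'export {name}="{value}"' for name, value in values.items()]
    with open(path, "w") as f:
        f.write("\n".join(kept) + "\n")


# Demonstration on a throwaway file rather than the real ~/.bashrc
demo_path = os.path.join(tempfile.mkdtemp(), "bashrc_demo")
upsert_exports(demo_path, {"WORKSPACE_CDR": "old-project.old_dataset"})
upsert_exports(demo_path, {"WORKSPACE_CDR": "new-project.new_dataset"})
with open(demo_path) as f:
    demo_contents = f.read()
print(demo_contents)
```

Because this variant rewrites the whole file, it is worth copying `~/.bashrc` to a backup before pointing it at the real path.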

In [None]:
import json
import subprocess
import os

# --- Extract variables directly into os.environ ---
workspace = json.loads(subprocess.run(
    ["wb", "workspace", "describe", "--format=json"],
    capture_output=True, text=True, check=True
).stdout)

os.environ["GOOGLE_CLOUD_PROJECT"] = workspace["googleProjectId"]

resources = json.loads(subprocess.run(
    ["wb", "resource", "list", "--format=json"],
    capture_output=True, text=True, check=True
).stdout)

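# Start with an empty CDR so that the first BigQuery dataset found below wins
os.environ["WORKSPACE_CDR"] = ""

# --- Extract workspace resources (buckets and CDR dataset) ---
for r in resources:
    # 1. Bucket logic: map the workspace and temporary buckets to their variables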
    if r["resourceType"] == "GCS_BUCKET":
        print(f"Found bucket: id={r['id']}, bucketName={r['bucketName']}")
        
        # Check temporary bucket first to avoid substring conflicts
        if "temporary-workspace-bucket" in r["id"]:
            os.environ["WORKSPACE_TEMP_BUCKET"] = f"gs://{r['bucketName']}"
        elif "workspace-bucket" in r["id"]:
            os.environ["WORKSPACE_BUCKET"] = f"gs://{r['bucketName']}"

    # 2. BQ DATASET LOGIC (Only set if CDR is not already set)
    elif r["resourceType"] in ["BQ_DATASET", "BIGQUERY_DATASET"]:
        # Check if the WORKSPACE_CDR is still an empty string (i.e., not set yet)
        if os.environ.get("WORKSPACE_CDR") == "":
            os.environ["WORKSPACE_CDR"] = f"{r['projectId']}.{r['datasetId']}"
            print(f"Successfully set WORKSPACE_CDR to: {os.environ['WORKSPACE_CDR']}")

# --- Print what we got ---
print("\nVariables extracted:")
for var in ["GOOGLE_CLOUD_PROJECT", "WORKSPACE_BUCKET", 
            "WORKSPACE_TEMP_BUCKET", "WORKSPACE_CDR"]:
    value = os.environ.get(var)
    print(f"{var}: {value if value else 'NOT FOUND'}")


# --- Save to .bashrc (no dictionary needed!) ---
bashrc_path = os.path.expanduser("~/.bashrc")

# Check if .bashrc exists, create it if not
if not os.path.exists(bashrc_path):
    print(f"Creating {bashrc_path}...")
    with open(bashrc_path, "w") as f:
        f.write("# Created by Verily setup script\n")

# Simple approach: append the exports. Re-running this cell adds duplicate lines;
# the last export of a variable is the one that takes effect.
with open(bashrc_path, "a") as f:
    f.write("\n# Verily Workbench variables\n")
    for var in ["GOOGLE_CLOUD_PROJECT", "WORKSPACE_BUCKET", 
                "WORKSPACE_TEMP_BUCKET", "WORKSPACE_CDR"]:
        value = os.environ.get(var)
        if value:
            f.write(f'export {var}="{value}"\n')

print(f"\n✅ Saved to {bashrc_path}")

In [None]:
# Debug view: list every resource, then re-apply the bucket mapping.
# Check the temporary bucket first, since its id can also contain "workspace-bucket".
for r in resources:
    print(r["id"], r["resourceType"])
    if r["resourceType"] != "GCS_BUCKET":
        continue
    if "temporary" in r["id"]:
        os.environ["WORKSPACE_TEMP_BUCKET"] = f"gs://{r['bucketName']}"
    elif "workspace-bucket" in r["id"]:
        os.environ["WORKSPACE_BUCKET"] = f"gs://{r['bucketName']}"
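
The matching rules used in the cells above can also be written as a pure function, which makes the substring ordering easy to check without calling the CLI. This is a sketch only: `classify_resources` is a hypothetical name, and the sample records are made up to mimic the fields the cells read (`resourceType`, `id`, `bucketName`, `projectId`, `datasetId`).

```python
def classify_resources(resources):
    """Map resource records to the workspace variables they define."""
    found = {}
    for r in resources:
        rtype, rid = r.get("resourceType"), r.get("id", "")
        if rtype == "GCS_BUCKET":
            # Test the temporary bucket first: its id also contains "workspace-bucket"
            if "temporary-workspace-bucket" in rid:
                found["WORKSPACE_TEMP_BUCKET"] = f"gs://{r['bucketName']}"
            elif "workspace-bucket" in rid:
                found["WORKSPACE_BUCKET"] = f"gs://{r['bucketName']}"
        elif rtype in ("BQ_DATASET", "BIGQUERY_DATASET"):
            # Keep only the first dataset encountered
            found.setdefault("WORKSPACE_CDR", f"{r['projectId']}.{r['datasetId']}")
    return found


# Made-up records that mimic the fields used above
sample = [
    {"resourceType": "GCS_BUCKET", "id": "temporary-workspace-bucket", "bucketName": "tmp-123"},
    {"resourceType": "GCS_BUCKET", "id": "workspace-bucket", "bucketName": "main-123"},
    {"resourceType": "BQ_DATASET", "id": "cdr", "projectId": "proj", "datasetId": "release_1"},
    {"resourceType": "BQ_DATASET", "id": "other", "projectId": "proj", "datasetId": "release_2"},
]
result = classify_resources(sample)
print(result)
```

Returning a dictionary instead of writing to `os.environ` directly lets you inspect the result before committing it to the environment or to `~/.bashrc`.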

## Step 2: Load Environment Variables in Python and test

The following code reads your `~/.bashrc` file and copies each `export` line (except `PATH`) into Python’s runtime environment (`os.environ`), so you can use the variables directly in your code.

Any other notebook can run the same code at its start to reload the environment variables.

In [None]:
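import os


def load_bashrc_env(path="~/.bashrc"):
    """Copy `export NAME=value` lines from a bashrc-style file into os.environ."""
    with open(os.path.expanduser(path), "r") as f:
        for line in f:
            line = line.strip()
            if not line.startswith("export "):
                continue
            # Drop only the leading "export " so the value itself is left untouched
            parts = line[len("export "):].split("=", 1)
            if len(parts) != 2:
                continue
            var_name = parts[0].strip()
            var_value = parts[1].strip().strip("'\"")

            # Skip PATH completely: such lines typically reference $PATH,
            # which this simple parser does not expand
            if var_name == "PATH":
                continue

            os.environ[var_name] = var_value


# In your notebook, just run this:
load_bashrc_env()

# Now use them
print(f"GOOGLE_CLOUD_PROJECT = {os.environ.get('GOOGLE_CLOUD_PROJECT')}")
print(f"WORKSPACE_BUCKET = {os.environ.get('WORKSPACE_BUCKET')}")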

**Confirm that the environment variables are correctly set.**

In [None]:
!echo $WORKSPACE_CDR

In [None]:
!echo $WORKSPACE_TEMP_BUCKET
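
The same confirmation can be done in Python for all four variables at once. This is a small sketch, with `missing_vars` and `REQUIRED_VARS` as hypothetical names introduced here; it treats an empty string as missing, which matters for `WORKSPACE_CDR` when no BigQuery dataset was found.

```python
import os

REQUIRED_VARS = ["GOOGLE_CLOUD_PROJECT", "WORKSPACE_BUCKET",
                 "WORKSPACE_TEMP_BUCKET", "WORKSPACE_CDR"]


def missing_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_vars()
if missing:
    print("Missing or empty:", ", ".join(missing))
else:
    print("All workspace variables are set.")
```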