# LAB 09: Governance & Security

**Duration:** ~40 min | **Day:** 3 | **Difficulty:** Advanced

> *"Before go-live: secure RetailHub data with Column Masks, Row Filters, GRANT/REVOKE, and INFORMATION_SCHEMA."*

## Setup

In [None]:
%run ../../setup/00_setup

In [None]:
# Verify data is available
df_customers = spark.table(f"{CATALOG}.{SILVER_SCHEMA}.customers")
df_orders = spark.table(f"{CATALOG}.{SILVER_SCHEMA}.orders")
print(f"Customers: {df_customers.count()}, Orders: {df_orders.count()}")

---
## Task 1: Grant Privileges

Grant `SELECT` on the Silver schema to the `analysts` group.

Hint: You need `USE CATALOG`, `USE SCHEMA`, and `SELECT` grants.
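
Why three grants: Unity Catalog checks privileges at every level on the path from catalog to table, so `SELECT` on its own is not enough. The sketch below is a simplified plain-Python model of that check, not Databricks code; `REQUIRED` and `can_read` are hypothetical names used only for illustration.

```python
# Simplified model: reading a table needs a privilege at every level.
# REQUIRED and can_read are illustrative names, not a Databricks API.
REQUIRED = {
    "catalog": "USE CATALOG",
    "schema": "USE SCHEMA",
    "table": "SELECT",  # may be granted on the schema and inherited by its tables
}

def can_read(granted: dict[str, set[str]]) -> bool:
    """Return True only if every level of the hierarchy has its privilege."""
    return all(priv in granted.get(level, set()) for level, priv in REQUIRED.items())

# SELECT without the two USE privileges is not enough.
print(can_read({"table": {"SELECT"}}))  # False
print(can_read({
    "catalog": {"USE CATALOG"},
    "schema": {"USE SCHEMA"},
    "table": {"SELECT"},
}))  # True
```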

In [None]:
# TODO: Grant USE CATALOG to analysts
spark.sql(f"GRANT ________ ON CATALOG {CATALOG} TO `analysts`")

# TODO: Grant USE SCHEMA to analysts
spark.sql(f"GRANT ________ ON SCHEMA {CATALOG}.{SILVER_SCHEMA} TO `analysts`")

# TODO: Grant SELECT on schema to analysts
spark.sql(f"GRANT ________ ON SCHEMA {CATALOG}.{SILVER_SCHEMA} TO `analysts`")

In [None]:
# Verification
grants_df = spark.sql(f"SHOW GRANTS ON SCHEMA {CATALOG}.{SILVER_SCHEMA}")
grants_df.display()

assert grants_df.filter("Principal = 'analysts'").count() >= 2, "Grants not found for analysts group"
print("Task 1 PASSED")

---
## Task 2: Query INFORMATION_SCHEMA

List the tables in your catalog using `INFORMATION_SCHEMA.TABLES`, restricted to the Silver schema.

Hint: Filter by `table_schema` to see only Silver tables.

In [None]:
# TODO: Query INFORMATION_SCHEMA.TABLES for Silver schema
tables_df = spark.sql(f"""
    SELECT table_schema, table_name, table_type
    FROM {CATALOG}.INFORMATION_SCHEMA.________
    WHERE table_schema = '{SILVER_SCHEMA}'
""")
tables_df.display()

In [None]:
# Verification
assert tables_df.count() > 0, "No tables found in Silver schema"
print(f"Found {tables_df.count()} tables in Silver schema")
print("Task 2 PASSED")
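
A natural follow-up to listing tables is to look for columns that may hold personal data before deciding what to mask. The sketch below builds such a query against `INFORMATION_SCHEMA.COLUMNS`. `pii_columns_query`, its `pattern` default, and the catalog and schema names in the example are hypothetical; the column names used (`table_schema`, `table_name`, `column_name`, `data_type`) are the standard ones for that view.

```python
def pii_columns_query(catalog: str, schema: str, pattern: str = "%email%") -> str:
    """Build a query listing columns whose name matches a LIKE pattern."""
    return (
        "SELECT table_name, column_name, data_type\n"
        f"FROM {catalog}.INFORMATION_SCHEMA.COLUMNS\n"
        f"WHERE table_schema = '{schema}' AND column_name LIKE '{pattern}'\n"
        "ORDER BY table_name, column_name"
    )

# Placeholder names; in the notebook, pass CATALOG and SILVER_SCHEMA instead.
print(pii_columns_query("my_catalog", "silver"))
```

In the notebook, the returned string could be passed to `spark.sql(...)` and displayed like the cells above.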

---
## Task 3: Create a Column Mask Function

Create a SQL function that masks email addresses. Users outside the `admins` group should see only the first two characters followed by `***@***.***`.

Hint: Use `is_account_group_member('admins')` and `CONCAT(LEFT(email, 2), '***@***.***')`.

In [None]:
# TODO: Create masking function
spark.sql(f"""
    CREATE OR REPLACE FUNCTION {CATALOG}.{SILVER_SCHEMA}.mask_email(email STRING)
    RETURNS STRING
    RETURN CASE
        WHEN ________(________) THEN email
        ELSE CONCAT(LEFT(email, 2), '***@***.***')
    END
""")
print("Masking function created")

In [None]:
# Verification -- test the function
test_df = spark.sql(f"SELECT {CATALOG}.{SILVER_SCHEMA}.mask_email('john.doe@example.com') AS masked")
test_df.display()
print("Task 3 PASSED")
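
The verification cell only displays the result, so it helps to know what to expect. Below is a plain-Python model of the masking rule described in this task. `mask_email_model` and its `is_admin` flag are hypothetical stand-ins for the SQL function and the group-membership check, and the model assumes SQL `NULL` corresponds to Python `None`.

```python
from typing import Optional

def mask_email_model(email: Optional[str], is_admin: bool) -> Optional[str]:
    """Plain-Python stand-in for the SQL masking rule (illustration only)."""
    if is_admin:
        return email  # admins see the real value
    if email is None:
        return None   # CONCAT with a NULL argument yields NULL in SQL
    return email[:2] + "***@***.***"  # same as CONCAT(LEFT(email, 2), '***@***.***')

print(mask_email_model("john.doe@example.com", is_admin=False))  # jo***@***.***
print(mask_email_model("john.doe@example.com", is_admin=True))   # john.doe@example.com
```

If you run the verification cell as a member of `admins`, expect the unmasked address; otherwise expect `jo***@***.***`.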

---
## Task 4: Apply Column Mask to Table

Apply the `mask_email` function to the `email` column of the `customers` table.

Hint: `ALTER TABLE ... ALTER COLUMN ... SET MASK ...`

In [None]:
# TODO: Apply column mask
spark.sql(f"""
    ALTER TABLE {CATALOG}.{SILVER_SCHEMA}.customers
    ALTER COLUMN email SET ________ {CATALOG}.{SILVER_SCHEMA}.________
""")
print("Column mask applied to email column")

In [None]:
# Verification -- query the table (members of `admins` still see the real emails)
masked_df = spark.sql(f"SELECT customer_id, email FROM {CATALOG}.{SILVER_SCHEMA}.customers LIMIT 5")
masked_df.display()
print("Task 4 PASSED -- check if email is masked for non-admin users")
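
The cell above relies on a visual check. A stricter check could test each returned value against the mask pattern, as in this sketch. `looks_masked` and `MASK_PATTERN` are hypothetical names, and the check is only meaningful when run as a user outside the `admins` group, since admins see the real addresses.

```python
import re

# Up to two leading characters followed by the literal mask suffix.
MASK_PATTERN = re.compile(r".{0,2}\*\*\*@\*\*\*\.\*\*\*")

def looks_masked(value: str) -> bool:
    """Return True if value has the shape produced by the masking rule."""
    return MASK_PATTERN.fullmatch(value) is not None

print(looks_masked("jo***@***.***"))         # True
print(looks_masked("john.doe@example.com"))  # False
```

In the notebook, this could be applied to the collected rows, for example `all(looks_masked(r["email"]) for r in masked_df.collect() if r["email"] is not None)`.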

---
## Task 5: Create a Row Filter Function

Create a function that restricts which `orders` rows are visible based on `store_region`: members of a regional group see only the rows for that region, and members of `admins` see all rows.

Hint: Use `is_account_group_member()` with multiple OR conditions.

In [None]:
# TODO: Create row filter function
spark.sql(f"""
    CREATE OR REPLACE FUNCTION {CATALOG}.{SILVER_SCHEMA}.region_filter(region STRING)
    RETURNS BOOLEAN
    RETURN (
        is_account_group_member('admins')
        OR (is_account_group_member('________') AND region = '________')
        OR (is_account_group_member('________') AND region = '________')
    )
""")
print("Row filter function created")
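
Task 5 has no verification cell, so a plain-Python model of the predicate can help you reason about what the filter will return once it is attached in Task 6. The group and region names below (`emea_analysts`, `EMEA`, `apac_analysts`, `APAC`) and the sample orders are hypothetical placeholders, not the values the lab expects; substitute the ones from your workspace.

```python
# Hypothetical group-to-region mapping; replace with your workspace's values.
GROUP_REGION = {"emea_analysts": "EMEA", "apac_analysts": "APAC"}

def region_filter_model(region: str, user_groups: set[str]) -> bool:
    """Mirror the SQL predicate: admins see everything, others see their region."""
    if "admins" in user_groups:
        return True
    return any(g in user_groups and region == r for g, r in GROUP_REGION.items())

# Made-up rows standing in for the orders table.
orders = [
    {"order_id": 1, "store_region": "EMEA"},
    {"order_id": 2, "store_region": "APAC"},
    {"order_id": 3, "store_region": "LATAM"},
]

def visible(user_groups: set[str]) -> list[int]:
    """Return the order IDs a user with these groups would see."""
    return [o["order_id"] for o in orders
            if region_filter_model(o["store_region"], user_groups)]

print(visible({"emea_analysts"}))  # [1]
print(visible({"admins"}))         # [1, 2, 3]
print(visible({"interns"}))        # []
```

Note that a user in no matching group sees zero rows rather than an error, which is how a row filter behaves: rows for which the function returns false are silently excluded.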

---
## Task 6: Apply Row Filter to Table

Apply the `region_filter` function to the `orders` table.

Hint: `ALTER TABLE ... SET ROW FILTER ... ON (column)`

In [None]:
# TODO: Apply row filter
spark.sql(f"""
    ALTER TABLE {CATALOG}.{SILVER_SCHEMA}.orders
    SET ________ {CATALOG}.{SILVER_SCHEMA}.region_filter ON (________)
""")
print("Row filter applied to orders table")

In [None]:
# Verification -- check row count (admins should see all rows)
filtered_count = spark.sql(f"SELECT COUNT(*) AS cnt FROM {CATALOG}.{SILVER_SCHEMA}.orders").first()["cnt"]
print(f"Visible rows: {filtered_count}")
print("Task 6 PASSED")

---
## Task 7: Query Table Privileges

Use `INFORMATION_SCHEMA.TABLE_PRIVILEGES` to verify who has access to what.

In [None]:
# TODO: Query table privileges
privs_df = spark.sql(f"""
    SELECT grantor, grantee, table_schema, table_name, privilege_type
    FROM {CATALOG}.INFORMATION_SCHEMA.________
    ORDER BY grantee, table_name
""")
privs_df.display()

In [None]:
# Verification
assert privs_df.count() > 0, "No privileges found"
print(f"Found {privs_df.count()} privilege entries")
print("Task 7 PASSED")
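
A long privilege listing is easier to audit when grouped by grantee. The sketch below does that in plain Python on a list of dictionaries shaped like the query result. The sample rows are made up for illustration, and `summarize_privileges` is a hypothetical helper.

```python
from collections import defaultdict

def summarize_privileges(rows):
    """Group (table_name, privilege_type) pairs by grantee."""
    summary = defaultdict(set)
    for row in rows:
        summary[row["grantee"]].add((row["table_name"], row["privilege_type"]))
    # Sort the pairs so the output is stable and easy to compare.
    return {grantee: sorted(pairs) for grantee, pairs in summary.items()}

# Made-up sample rows with the same keys as the query's columns.
sample = [
    {"grantee": "analysts", "table_name": "customers", "privilege_type": "SELECT"},
    {"grantee": "analysts", "table_name": "orders", "privilege_type": "SELECT"},
    {"grantee": "etl_service", "table_name": "orders", "privilege_type": "MODIFY"},
]
for grantee, pairs in summarize_privileges(sample).items():
    print(grantee, pairs)
```

In the notebook, the real rows could be passed in as `summarize_privileges([r.asDict() for r in privs_df.collect()])`.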

---
## Task 8: Remove Mask and Filter (Cleanup)

Remove the column mask and row filter applied earlier.

Hint: `ALTER TABLE ... ALTER COLUMN ... DROP MASK` and `ALTER TABLE ... DROP ROW FILTER`

In [None]:
# TODO: Remove column mask from email
spark.sql(f"""
    ALTER TABLE {CATALOG}.{SILVER_SCHEMA}.customers
    ALTER COLUMN email ________ ________
""")

# TODO: Remove row filter from orders
spark.sql(f"""
    ALTER TABLE {CATALOG}.{SILVER_SCHEMA}.orders
    ________ ________ ________
""")

print("Cleanup complete -- mask and filter removed")

In [None]:
# Final verification
clean_df = spark.sql(f"SELECT customer_id, email FROM {CATALOG}.{SILVER_SCHEMA}.customers LIMIT 3")
clean_df.display()
print("Task 8 PASSED -- column mask and row filter removed")
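
The cells above remove only the mask and the filter; the two functions and the Task 1 grants remain in place. If you also want to remove those, the statements could be generated as in this sketch. `teardown_statements` is a hypothetical helper, the catalog and schema names in the example are placeholders, and revoking `USE CATALOG` is appropriate only if that grant was made solely for this lab.

```python
def teardown_statements(catalog: str, schema: str, group: str = "analysts") -> list[str]:
    """Build SQL that drops the lab's functions and revokes the Task 1 grants."""
    fq_schema = f"{catalog}.{schema}"
    return [
        f"DROP FUNCTION IF EXISTS {fq_schema}.mask_email",
        f"DROP FUNCTION IF EXISTS {fq_schema}.region_filter",
        f"REVOKE SELECT ON SCHEMA {fq_schema} FROM `{group}`",
        f"REVOKE USE SCHEMA ON SCHEMA {fq_schema} FROM `{group}`",
        f"REVOKE USE CATALOG ON CATALOG {catalog} FROM `{group}`",
    ]

# Placeholder names; in the notebook, pass CATALOG and SILVER_SCHEMA instead.
for stmt in teardown_statements("my_catalog", "silver"):
    print(stmt)  # in the notebook: spark.sql(stmt)
```

The functions are dropped only after the mask and filter are detached, since a function still referenced by a table cannot be cleanly removed from under it.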

---
## Summary

| Task | Concept | Key SQL |
|------|---------|--------|
| 1 | Privileges | `GRANT USE CATALOG` / `USE SCHEMA` / `SELECT ... TO ...` |
| 2 | Metadata | `INFORMATION_SCHEMA.TABLES` |
| 3-4 | Column Mask | `ALTER TABLE ... ALTER COLUMN ... SET MASK fn` |
| 5-6 | Row Filter | `ALTER TABLE ... SET ROW FILTER fn ON (col)` |
| 7 | Audit | `INFORMATION_SCHEMA.TABLE_PRIVILEGES` |
| 8 | Cleanup | `DROP MASK`, `DROP ROW FILTER` |