# Workshop 4: Unity Catalog Governance

## The Story

A security audit revealed that too many users have access to sensitive customer data.
Your task is to secure the `customers_silver` table using Unity Catalog.
You need to implement Row-Level Security (RLS) to ensure analysts can only see data from their own country.

**Your Mission:**
1. Audit current permissions.
2. Grant specific permissions to a group.
3. Create a Row Filter (RLS) to restrict access.
4. Create a Data Mask to hide PII (Personally Identifiable Information).

**Time:** 30 minutes


In [None]:
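%run ../00_setup


In [None]:
# Databricks expects %run to sit in a cell by itself, so the imports live in their own cell.
from pyspark.sql.functions import concat_ws, col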

# --- INDEPENDENT SETUP ---
# Ensure source table exists
table_name = f"{catalog}.{SILVER_SCHEMA}.customers_silver"
source_file = f"{volume_path}/workshop/main/Customers.csv"

try:
    spark.table(table_name)
    print(f"Table {table_name} exists.")
except Exception:  # spark.table raises when the table cannot be resolved
    print(f"Table {table_name} not found. Recreating from {source_file}...")
    
    try:
        # Read raw data
        df = spark.read.option("header", True).option("inferSchema", True).csv(source_file)
        
        # Apply basic transformations
        df_clean = df.withColumn("FullName", concat_ws(" ", col("FirstName"), col("LastName")))
        
        # Save as Delta
        df_clean.write.format("delta").mode("overwrite").saveAsTable(table_name)
        print("Created table for security workshop.")
    except Exception as e:
        print(f"Error creating table: {e}. Please ensure {source_file} exists.")

print(f"Working with table: {table_name}")

## Step 1: Audit Permissions

### Task 1.1: Check current grants

See who has access to the table.

**Hint:**
```sql
SHOW GRANTS ON TABLE catalog.schema.table_name
```


In [None]:
# TODO: Show grants
# display(spark.sql(f"SHOW GRANTS ON TABLE {table_name}"))
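
The mission also mentions granting permissions to a group. A minimal sketch of what that could look like, assuming a hypothetical account group called `analysts` and placeholder catalog and schema names (none of these come from the workshop setup). The helper only builds the SQL strings, so you can read them before passing each one to `spark.sql`:

```python
from typing import List


def build_read_grants(catalog: str, schema: str, table: str, group: str) -> List[str]:
    """Return the GRANT statements a group needs to read one Unity Catalog table."""
    principal = f"`{group}`"  # backticks protect group names containing spaces or dashes
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {principal}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {principal}",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {principal}",
    ]


# Placeholder values for illustration; in the notebook use `catalog` and `SILVER_SCHEMA`.
statements = build_read_grants("demo_catalog", "silver", "customers_silver", "analysts")
for stmt in statements:
    print(stmt)
    # spark.sql(stmt)  # uncomment in a Databricks notebook to apply the grant
```

Reading a table needs `SELECT` on the table plus `USE SCHEMA` and `USE CATALOG` on its parents, which is why the sketch emits three statements rather than one.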


## Step 2: Row-Level Security (Row Filter)

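We want to restrict access so that non-admin users can only see rows where `Country = 'Country_0'`. To keep the exercise simple, the filter hard-codes a single country instead of looking up each analyst's own country.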

### Task 2.1: Create a Row Filter Function

Create a SQL function that returns TRUE if the user is allowed to see the row.

**Hint:**
```sql
CREATE OR REPLACE FUNCTION filter_country(country STRING)
RETURN IF(is_account_group_member('admin'), true, country = 'Country_0')
```


In [None]:
# TODO: Create row filter function
# spark.sql(f"CREATE OR REPLACE FUNCTION {catalog}.{SILVER_SCHEMA}.filter_country ...")
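
To build intuition for what the row filter does before running it on the table, here is a plain-Python mirror of the SQL hint. It is only an illustration: the sample rows are made up, and `is_admin` stands in for the result of `is_account_group_member('admin')`.

```python
def filter_country(country, is_admin):
    """Plain-Python mirror of the SQL hint: admins see every row, others only Country_0."""
    return True if is_admin else country == "Country_0"


# Made-up sample rows; the real table has more columns.
rows = [
    {"FullName": "Ann Lee", "Country": "Country_0"},
    {"FullName": "Bo Chen", "Country": "Country_1"},
    {"FullName": "Cy Diaz", "Country": None},
]

visible_to_admin = [r for r in rows if filter_country(r["Country"], is_admin=True)]
visible_to_analyst = [r for r in rows if filter_country(r["Country"], is_admin=False)]

print(len(visible_to_admin), "rows visible to an admin")
print(len(visible_to_analyst), "rows visible to a non-admin")
```

Note the row with a missing country: in SQL, `NULL = 'Country_0'` does not evaluate to TRUE, so non-admins would not see that row either.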

### Task 2.2: Apply the Filter to the Table

**Hint:**
```sql
ALTER TABLE catalog.schema.table_name SET ROW FILTER catalog.schema.filter_country ON (Country)
```
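
When experimenting, it helps to have both the statement that attaches a filter and the one that removes it. The sketch below builds both as strings; the helper names and the placeholder catalog and schema are hypothetical, and each string would be passed to `spark.sql` in the notebook.

```python
def set_row_filter_sql(table, function, column):
    """Build the statement that attaches a row filter function to a table."""
    return f"ALTER TABLE {table} SET ROW FILTER {function} ON ({column})"


def drop_row_filter_sql(table):
    """Build the statement that removes the row filter from a table."""
    return f"ALTER TABLE {table} DROP ROW FILTER"


# Placeholder names for illustration; in the notebook use `table_name`, `catalog` and `SILVER_SCHEMA`.
table = "demo_catalog.silver.customers_silver"
apply_stmt = set_row_filter_sql(table, "demo_catalog.silver.filter_country", "Country")
undo_stmt = drop_row_filter_sql(table)

print(apply_stmt)
print(undo_stmt)
# spark.sql(apply_stmt)  # attach the filter
# spark.sql(undo_stmt)   # remove it again, e.g. before rerunning the workshop
```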


In [None]:
# TODO: Apply row filter
# spark.sql(f"ALTER TABLE {table_name} SET ROW FILTER ...")

## Step 3: Column Masking (Dynamic Data Masking)

We need to hide the `Email` column for non-admins.

### Task 3.1: Create a Masking Function

**Hint:**
```sql
CREATE OR REPLACE FUNCTION mask_email(email STRING)
RETURN CASE WHEN is_account_group_member('admin') THEN email ELSE '***@***.com' END
```


In [None]:
# TODO: Create masking function
# spark.sql(f"CREATE OR REPLACE FUNCTION {catalog}.{SILVER_SCHEMA}.mask_email ...")

# TODO: Apply column mask
# spark.sql(f"ALTER TABLE {table_name} ALTER COLUMN Email SET MASK ...")
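
The masking logic can be previewed the same way. The following is a plain-Python mirror of the `CASE` expression in the hint, with `is_admin` standing in for `is_account_group_member('admin')` and made-up email addresses:

```python
MASKED = "***@***.com"


def mask_email(email, is_admin):
    """Plain-Python mirror of the SQL hint: admins get the real value, others a fixed mask."""
    return email if is_admin else MASKED


# Made-up values; None plays the role of a NULL email.
emails = ["ann@example.com", "bo@example.com", None]

seen_by_admin = [mask_email(e, is_admin=True) for e in emails]
seen_by_analyst = [mask_email(e, is_admin=False) for e in emails]

print(seen_by_admin)
print(seen_by_analyst)
```

Because the `ELSE` branch returns a literal, non-admins see the mask even where the stored email is NULL, so they cannot tell which customers have no email on file. To remove a mask again, `ALTER TABLE ... ALTER COLUMN Email DROP MASK` should work.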

## Step 4: Verification

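Query the table to confirm that the row filter and column mask are working.
(Note: both functions pass data through unchanged for members of the `admin` group, so if you belong to that group you will still see every row and the real emails. To see the restrictions in effect, run the query as a user outside that group or temporarily change the group name in the functions.)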


In [None]:
# Verify results
display(spark.table(table_name))
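
Eyeballing the output works for a small table, but a programmatic check is easier to repeat. Below is a minimal sketch, assuming the rows have been collected into plain dictionaries (for example with `[r.asDict() for r in spark.table(table_name).limit(100).collect()]`) while running as a non-admin user. The function name and sample data are hypothetical.

```python
def check_restricted_view(rows, allowed_country="Country_0", masked_email="***@***.com"):
    """Return a list of problems found in rows as a non-admin user should see them."""
    problems = []
    for i, row in enumerate(rows):
        if row.get("Country") != allowed_country:
            problems.append(f"row {i}: unexpected country {row.get('Country')!r}")
        if row.get("Email") != masked_email:
            problems.append(f"row {i}: email is not masked")
    return problems


# Made-up samples: one compliant result set and one that leaks data.
sample_ok = [{"Country": "Country_0", "Email": "***@***.com"}]
sample_leaky = [{"Country": "Country_1", "Email": "bo@example.com"}]

print(check_restricted_view(sample_ok))
print(check_restricted_view(sample_leaky))
```

An empty list means every row matched the expected country and mask. Run as an admin, the same check would report problems, which is expected because admins bypass both functions.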


# Solution

The complete code is below.


In [None]:
# ============================================================
# FULL SOLUTION - Workshop 4: Unity Catalog Governance
# ============================================================

table_name = f"{catalog}.{SILVER_SCHEMA}.customers_silver"

# --- Step 1: Audit ---
print("CURRENT GRANTS:")
display(spark.sql(f"SHOW GRANTS ON TABLE {table_name}"))

# --- Step 2: Row Filter ---
# 1. Create Function
spark.sql(f"""
CREATE OR REPLACE FUNCTION {catalog}.{SILVER_SCHEMA}.filter_country(country STRING)
RETURN IF(is_account_group_member('admin'), true, country = 'Country_0')
""")

# 2. Apply Filter
spark.sql(f"ALTER TABLE {table_name} SET ROW FILTER {catalog}.{SILVER_SCHEMA}.filter_country ON (Country)")
print("Row Filter applied.")

# --- Step 3: Column Mask ---
# 1. Create Function
spark.sql(f"""
CREATE OR REPLACE FUNCTION {catalog}.{SILVER_SCHEMA}.mask_email(email STRING)
RETURN CASE WHEN is_account_group_member('admin') THEN email ELSE '***@***.com' END
""")

# 2. Apply Mask
spark.sql(f"ALTER TABLE {table_name} ALTER COLUMN Email SET MASK {catalog}.{SILVER_SCHEMA}.mask_email")
print("Column Mask applied.")

# --- Verification ---
print("\nVERIFICATION (As Admin):")
display(spark.table(table_name))