# 🎯 Insurance ABAC Demo - Step 1: Create Masking + Row Filtering Functions

## 📋 Overview
This notebook creates functions needed to apply column masks and row filters for the Insurance industry ABAC (Attribute-Based Access Control) demo.

### What are Column Masks + Row Filters?
Column masks control what values users see in specific columns, depending on who they are. Row filters let you control which rows a user can access in a table based on custom logic.

### Why Use Column Masking and Row Filtering?
- **Compliance**: Meet GDPR, CCPA, HIPAA, and other privacy regulations
- **Security**: Protect sensitive data from unauthorized access
- **Flexibility**: Apply different masks based on user roles and attributes
- **Analytics**: Preserve data utility for analysis while protecting privacy
- **Audit**: Pair with Unity Catalog audit logs to track data access patterns

### What This Notebook Creates
This notebook creates masking and row filter functions for the Insurance industry, including:
- **Partial Masking**: SSN, policy number, and phone masking that keeps only the last 4 characters
- **Identity Protection**: Email masking that hides the local part
- **Identifiers**: Deterministic hashing of policy holder IDs for cross-table analytics
- **Bucketing**: Claim amounts grouped into ranges
- **Row Filters**: Business-hours and high-value-claim filters

## 🎓 How to Use This Notebook
1. **Update Configuration**: Change the catalog name in the configuration cell below
2. **Run All Cells**: Execute cells sequentially (Shift+Enter or Run All)
3. **Proceed to Next Step**: Continue to notebook 2 to create the tables and load sample data

## ⚙️ Prerequisites
- ✅ Unity Catalog enabled workspace
- ✅ CREATE FUNCTION permission in the target catalog
- ✅ SQL Warehouse or Cluster attached to this notebook
- ✅ Account admin or catalog owner role (recommended)

---


## ⚙️ Configuration

### 🚨 IMPORTANT: Update Before Running!
Change `your_catalog_name` to **your catalog name** in the cell below, or set `catalog` and `schema` in `config.yaml`.

### What This Does:
- Sets the target Unity Catalog
- Creates the `insurance` schema if it doesn't exist

In [0]:
%pip install pyyaml

In [0]:
# 📋 Load Configuration from config.yaml
import yaml
from pathlib import Path

config_file = Path('config.yaml')
if config_file.exists():
    with open(config_file) as f:
        config = yaml.safe_load(f)
    CATALOG = config['catalog']
    SCHEMA = config['schema']
    print('✅ Configuration loaded from config.yaml')
else:
    # Fallback defaults
    CATALOG = 'your_catalog_name'
    SCHEMA = 'insurance'
    print('⚠️  config.yaml not found - using defaults')

# Report the values in use, whichever branch supplied them
print(f'   📊 Catalog: {CATALOG}')
print(f'   📁 Schema: {SCHEMA}')

# Set catalog and schema to use for the cells below
spark.sql(f"USE CATALOG {CATALOG}")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {SCHEMA}")
spark.sql(f"USE SCHEMA {SCHEMA}")
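
The cell above interpolates `CATALOG` and `SCHEMA` directly into SQL, so it can help to check the names before running it. The following is a minimal sketch, not part of the original notebook: `validate_identifier` and `PLACEHOLDER_CATALOG` are hypothetical names, and the pattern assumes simple unquoted identifiers (letters, digits, underscores), which is stricter than what Unity Catalog accepts with backticks.

```python
import re

# Hypothetical helper names; not part of the original notebook.
# Assumption: only simple, unquoted identifiers are accepted.
_SIMPLE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
PLACEHOLDER_CATALOG = "your_catalog_name"


def validate_identifier(name, kind):
    """Return `name` unchanged if it looks like a simple SQL identifier."""
    if not isinstance(name, str):
        raise ValueError(f"{kind} must be a string, got {type(name).__name__}")
    if name == PLACEHOLDER_CATALOG:
        raise ValueError(f"{kind} is still the placeholder '{name}'; update it first")
    if not _SIMPLE_IDENTIFIER.fullmatch(name):
        raise ValueError(f"{kind} '{name}' is not a simple identifier")
    return name


print(validate_identifier("insurance", "schema"))  # insurance
```

Calling `validate_identifier(CATALOG, "catalog")` before the `spark.sql` statements would stop the run early if the placeholder catalog name was never replaced.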


In [0]:
%sql
SELECT '🎯 Target: ' || current_catalog() || '.' || current_schema() AS status;

## MASKING FUNCTIONS FOR INSURANCE DATA


=============================================

1. SSN MASKING (Last 4 digits)

    Purpose: Mask SSN showing last 4 for identity verification

    Usage: Customer identification

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_ssn_last4(ssn STRING) 
RETURNS STRING
COMMENT 'ABAC utility: Mask SSN showing last 4 digits (XXX-XX-1234)'
RETURN CASE 
  WHEN ssn IS NULL THEN ssn 
  ELSE CONCAT('XXX-XX-', RIGHT(REPLACE(ssn, '-', ''), 4)) 
END;
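
To sanity-check the expected output without a SQL warehouse, the same logic can be mirrored in plain Python. This is an illustrative sketch only; the Python function is not what Unity Catalog executes, it just reproduces the `CASE` expression above.

```python
def mask_ssn_last4(ssn):
    """Python mirror of the SQL logic: NULL passes through, otherwise keep the last 4."""
    if ssn is None:
        return None
    digits = ssn.replace("-", "")  # same as REPLACE(ssn, '-', '')
    return "XXX-XX-" + digits[-4:]  # same as RIGHT(..., 4)


print(mask_ssn_last4("123-45-6789"))  # XXX-XX-6789
print(mask_ssn_last4("12"))           # XXX-XX-12
```

The second call shows that inputs shorter than four characters are returned in full after the prefix, so malformed values are not hidden by this mask.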

=============================================

2. POLICY NUMBER MASKING (Last 4 digits)

    Purpose: Mask policy numbers showing last 4

    Usage: Policy verification without full exposure

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_policy_number_last4(policy STRING) 
RETURNS STRING
COMMENT 'ABAC utility: Mask policy number showing last 4 digits'
RETURN CASE 
  WHEN policy IS NULL THEN policy 
  ELSE CONCAT('****', RIGHT(policy, 4)) 
END;
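
A plain-Python mirror of the expression above makes one edge case easy to see. It is a sketch for local experimentation, not the registered function.

```python
def mask_policy_number_last4(policy):
    """Python mirror of the SQL logic: NULL passes through, otherwise '****' + last 4."""
    if policy is None:
        return None
    return "****" + policy[-4:]  # same as RIGHT(policy, 4)


print(mask_policy_number_last4("POL-2024-001234"))  # ****1234
print(mask_policy_number_last4("AB12"))             # ****AB12
```

A policy number of four characters or fewer comes back complete after the asterisks, so this mask only protects longer identifiers.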

=============================================

3. AMOUNT BUCKETING (Claim amounts)

    Purpose: Group claim amounts into ranges

    Usage: Analytics without exact amounts

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_claim_amount_bucket(amt DECIMAL(12,2))
RETURNS STRING
COMMENT 'ABAC utility: Bucket claim amounts into ranges' 
RETURN CASE 
  WHEN amt IS NULL THEN 'Unknown' 
  WHEN amt < 1000 THEN '\$0-\$1K'
  WHEN amt < 5000 THEN '\$1K-\$5K' 
  WHEN amt < 10000 THEN '\$5K-\$10K' 
  ELSE '\$10K+' 
END;
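
The bucket boundaries are easiest to verify with a few boundary values. The sketch below mirrors the `CASE` expression in Python, assuming the `\$` sequences in the SQL cell resolve to a literal `$` in the stored function.

```python
from decimal import Decimal


def mask_claim_amount_bucket(amt):
    """Python mirror of the SQL CASE: first matching branch wins."""
    if amt is None:
        return "Unknown"
    if amt < 1000:
        return "$0-$1K"
    if amt < 5000:
        return "$1K-$5K"
    if amt < 10000:
        return "$5K-$10K"
    return "$10K+"


for value in (Decimal("999.99"), Decimal("1000.00"), Decimal("10000.00"), None):
    print(value, "->", mask_claim_amount_bucket(value))
```

Each upper bound is exclusive, so exactly 1000.00 falls into `$1K-$5K` and exactly 10000.00 into `$10K+`. Negative amounts land in `$0-$1K` because the first branch only checks the upper bound.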

=============================================

4. POLICY HOLDER ID HASHING

    Purpose: Hash policy holder IDs for privacy

    Usage: Policy holder tracking without exposure

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_policyholder_id_hash(id STRING) 
RETURNS STRING
COMMENT 'ABAC utility: Deterministic policy holder ID masking for joins'
RETURN CONCAT('PH_', SUBSTRING(SHA2(id, 256), 1, 12));
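
Because the hash is deterministic, the same ID always maps to the same token, which is what keeps joins across tables working. A rough Python equivalent using `hashlib` is sketched below; it assumes the ID is UTF-8 encoded before hashing and is meant for illustration, not as the registered function.

```python
import hashlib


def mask_policyholder_id_hash(id_value):
    """Python mirror: 'PH_' + first 12 hex characters of SHA-256; NULL stays NULL."""
    if id_value is None:
        return None  # CONCAT with a NULL argument yields NULL in SQL
    digest = hashlib.sha256(id_value.encode("utf-8")).hexdigest()
    return "PH_" + digest[:12]


token_a = mask_policyholder_id_hash("PH-000123")
token_b = mask_policyholder_id_hash("PH-000123")
print(token_a == token_b)  # True: same input, same token
print(len(token_a))        # 15
```

The hash is unsalted, so short or sequential IDs could in principle be recovered by hashing every candidate value. Whether that matters depends on how guessable the real IDs are.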

=============================================

5. EMAIL MASKING

    Purpose: Mask email addresses for privacy

    Usage: Customer contact information

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_email(email STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask email local part'
RETURN CASE 
  WHEN email IS NULL OR email = '' THEN email
  WHEN email NOT LIKE '%@%' THEN '****'
  ELSE CONCAT('****@', SPLIT(email, '@')[1])
END;
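
The branches above can be exercised locally with a small Python mirror. This is a sketch of the same logic, not the function stored in Unity Catalog.

```python
def mask_email(email):
    """Python mirror: hide the local part, keep the text after the first '@'."""
    if email is None or email == "":
        return email
    if "@" not in email:
        return "****"
    return "****@" + email.split("@")[1]  # same as SPLIT(email, '@')[1]


print(mask_email("jane.doe@example.com"))  # ****@example.com
print(mask_email("not-an-email"))          # ****
```

The domain stays visible by design, which preserves grouping by email provider but may still be identifying for personal or small-company domains.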

=============================================

6. PHONE MASKING

    Purpose: Mask phone numbers

    Usage: Customer contact information

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_phone(phone STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask phone number showing last 4 digits'
RETURN CASE 
  WHEN phone IS NULL OR phone = '' THEN phone
  WHEN LENGTH(REGEXP_REPLACE(phone, '[^0-9]', '')) < 4 THEN 'XXXX'
  ELSE CONCAT('XXXX', RIGHT(REGEXP_REPLACE(phone, '[^0-9]', ''), 4))
END;
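
A Python mirror of the phone mask, using `re.sub` in place of `REGEXP_REPLACE`, shows how differently formatted numbers normalize to the same output. It is an illustrative sketch only.

```python
import re


def mask_phone(phone):
    """Python mirror: strip non-digits, then keep only the last 4 digits."""
    if phone is None or phone == "":
        return phone
    digits = re.sub(r"[^0-9]", "", phone)
    if len(digits) < 4:
        return "XXXX"  # too few digits to show anything safely
    return "XXXX" + digits[-4:]


print(mask_phone("(555) 123-4567"))   # XXXX4567
print(mask_phone("+1 555.123.4567"))  # XXXX4567
print(mask_phone("12"))               # XXXX
```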

## ROW FILTER FUNCTIONS

=============================================

7. BUSINESS HOURS FILTER

    Purpose: Restrict access to business hours (9AM-5PM EST)

    Usage: Compliance with data access policies

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION filter_business_hours()
RETURNS BOOLEAN
COMMENT 'ABAC utility: Allow access only during business hours'
RETURN HOUR(CURRENT_TIMESTAMP()) BETWEEN 14 AND 21; -- 9AM-5PM EST as UTC hours 14-21; assumes a UTC session time zone and ignores daylight saving
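
A 9AM-5PM EST window corresponds to UTC hours 14 through 21, because 5PM EST is 22:00 UTC and the window should close at that instant. The Python sketch below checks those boundaries with a fixed UTC-5 offset. `within_business_hours` is a hypothetical name, and the fixed offset deliberately ignores daylight saving time, matching the "EST" wording above.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-5 offset (EST), no daylight saving adjustment.
EST = timezone(timedelta(hours=-5), "EST")


def within_business_hours(now_utc):
    """True from 09:00 up to, but not including, 17:00 EST.

    `now_utc` must be a timezone-aware datetime.
    """
    local = now_utc.astimezone(EST)
    return 9 <= local.hour < 17


for hour, minute in ((13, 59), (14, 0), (21, 59), (22, 0)):
    ts = datetime(2024, 1, 15, hour, minute, tzinfo=timezone.utc)
    print(ts.strftime("%H:%M UTC"), "->", within_business_hours(ts))
```

If access should follow US Eastern wall-clock time through the summer as well, a named time zone such as `America/New_York` would be needed instead of a fixed offset.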

=============================================

8. HIGH VALUE CLAIMS FILTER

    Purpose: Hide claims above a set threshold (10,000)

    Usage: Junior analysts see only regular claims

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION filter_high_value_claims(amount DECIMAL(12,2))
RETURNS BOOLEAN
COMMENT 'ABAC utility: Filter out high-value claims'
RETURN amount <= 10000; -- rows where this is TRUE stay visible, so claims above 10000 are hidden
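
A row filter keeps the rows for which its function returns TRUE and hides the rest. For junior analysts to see only regular claims, the predicate therefore has to be TRUE for amounts at or below the threshold. The Python sketch below illustrates that semantics on made-up sample rows; `is_regular_claim` and the claim records are hypothetical.

```python
from decimal import Decimal

HIGH_VALUE_THRESHOLD = Decimal("10000")  # threshold taken from the SQL function


def is_regular_claim(amount):
    """Predicate that is True only for rows a junior analyst may see.

    A NULL amount yields NULL in SQL, which a row filter treats as "hide".
    """
    return amount is not None and amount <= HIGH_VALUE_THRESHOLD


# Made-up sample rows for illustration only.
claims = [
    {"claim_id": "C1", "amount": Decimal("250.00")},
    {"claim_id": "C2", "amount": Decimal("10000.00")},
    {"claim_id": "C3", "amount": Decimal("48000.00")},
    {"claim_id": "C4", "amount": None},
]

visible = [c["claim_id"] for c in claims if is_regular_claim(c["amount"])]
print(visible)  # ['C1', 'C2']
```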

## Verify all functions created

In [0]:
%sql
SHOW FUNCTIONS LIKE 'mask_*';

In [0]:
%sql
SHOW FUNCTIONS LIKE 'filter_*';

## ✅ Success!

All Insurance functions have been created successfully!

### What You Just Created:
- ✅ Masking + filtering functions registered in Unity Catalog
- ✅ Functions available for use in SQL queries
- ✅ Foundation for ABAC policies ready

### 🎯 Next Step:
Continue to **`2_Create_Tables.ipynb`** to create the database tables and load sample data.

---
**Note**: These functions are stored in Unity Catalog and can be used across multiple notebooks and queries.
