# 🎯 Healthcare ABAC Demo - Step 1: Create Masking + Row Filtering Functions

## 📋 Overview
This notebook creates masking and filtering functions for the Healthcare industry ABAC (Attribute-Based Access Control) demo.

### What are Masking and Filtering Functions?
Masking functions are SQL user-defined functions (UDFs) that transform sensitive data to protect privacy while maintaining data utility for analytics. Row filters let you control which rows a user can access in a table based on custom logic. Masking and filtering functions are the foundation of ABAC policies in Unity Catalog.

### Why Use Masking and Filtering Functions?
- **Compliance**: Meet GDPR, CCPA, HIPAA, and other privacy regulations
- **Security**: Protect sensitive data from unauthorized access
- **Flexibility**: Apply different masks based on user roles and attributes
- **Analytics**: Preserve data utility for analysis while protecting privacy
- **Audit**: Keep protection logic in named, governed functions that are easy to review alongside Unity Catalog audit logs

### What This Notebook Creates
This notebook creates specialized functions for the Healthcare industry, including:
- **Identity Protection**: Partial string, email, phone, and address (city/state only) masking
- **Dates**: Year-only masking for dates of birth
- **Identifiers**: SHA-256 hashing, deterministic numeric masking that keeps join keys consistent, and last-4 policy number masking
- **Confidential Data**: Complete redaction of sensitive numeric fields (returns NULL)
- **Row Filters**: Business-hours, region-based, and hide-all-rows filters

## 🎓 How to Use This Notebook
1. **Update Configuration**: Change the catalog name in the configuration cell below
2. **Run All Cells**: Execute cells sequentially (Shift+Enter or Run All)
3. **Proceed to Next Step**: Continue to notebook 2 to create the tables and load sample data

## ⚙️ Prerequisites
- ✅ Unity Catalog enabled workspace
- ✅ USE CATALOG and CREATE SCHEMA on the target catalog, plus USE SCHEMA and CREATE FUNCTION on the target schema
- ✅ A cluster or serverless compute attached to this notebook (the configuration cell runs Python, so a SQL warehouse alone cannot run every cell)
- ✅ Account admin or catalog owner role (recommended)

---


## ⚙️ Configuration

### 🚨 IMPORTANT: Update Before Running!
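Change `your_catalog_name` to **your catalog name** in the configuration cell below, or set `catalog` (and optionally `schema`) in `config.yaml`.

### What This Does:
- Installs PyYAML and loads the catalog and schema names from `config.yaml`, falling back to the defaults in the cell if the file is missing
- Sets the target Unity Catalog
- Creates the schema (default `healthcare`) if it doesn't exist and makes it the current schema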

In [0]:
%pip install pyyaml

In [0]:
# 📋 Load Configuration from config.yaml
import yaml
from pathlib import Path

config_file = Path('config.yaml')
if config_file.exists():
    with open(config_file) as f:
        config = yaml.safe_load(f)
    CATALOG = config['catalog']
    SCHEMA = config['schema']
    print(f'✅ Configuration loaded from config.yaml')
    print(f'   📊 Catalog: {CATALOG}')
    print(f'   📁 Schema: {SCHEMA}')
else:
    # Fallback defaults
    CATALOG = 'your_catalog_name'
    SCHEMA = 'healthcare'
    print(f'⚠️  config.yaml not found - using defaults')
    print(f'   📊 Catalog: {CATALOG}')
    print(f'   📁 Schema: {SCHEMA}')

# Set catalog and schema to use for the cells below.
# Backticks let names containing special characters (e.g. hyphens) parse as identifiers.
spark.sql(f"USE CATALOG `{CATALOG}`")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS `{SCHEMA}`")
spark.sql(f"USE SCHEMA `{SCHEMA}`")


In [0]:
%sql
SELECT '🎯 Target: ' || current_catalog() || '.' || current_schema() AS status;

## MASKING FUNCTIONS FOR HEALTHCARE DATA


=============================================
1. PARTIAL STRING MASKING FUNCTION

    Purpose: Show only first and last characters with middle masked

    Usage: Patient names, addresses for partial visibility

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_string_partial(input STRING)
RETURNS STRING
COMMENT 'ABAC utility: Partial string masking showing first and last characters'
RETURN CASE 
    WHEN input IS NULL OR input = '' THEN input
    WHEN LENGTH(input) <= 2 THEN REPEAT('*', LENGTH(input))
    WHEN LENGTH(input) = 3 THEN CONCAT(LEFT(input, 1), '*', RIGHT(input, 1))
    ELSE CONCAT(LEFT(input, 1), REPEAT('*', LENGTH(input) - 2), RIGHT(input, 1))
END;
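You can sanity-check expected outputs locally by mirroring the same CASE logic in plain Python. This is a hypothetical port for illustration only. Policies call the Unity Catalog function above, not this code.

```python
from typing import Optional


def mask_string_partial(value: Optional[str]) -> Optional[str]:
    """Hypothetical Python port of the SQL mask_string_partial UDF (local checks only)."""
    # NULL and empty strings pass through unchanged, as in the SQL CASE.
    if value is None or value == "":
        return value
    # Strings of 1-2 characters are fully masked.
    if len(value) <= 2:
        return "*" * len(value)
    # Otherwise keep the first and last characters and star out the middle.
    # (The SQL LENGTH = 3 branch is a special case of this same rule.)
    return value[0] + "*" * (len(value) - 2) + value[-1]


print(mask_string_partial("Johnson"))  # J*****n
print(mask_string_partial("Al"))       # **
```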

=============================================

2. EMAIL MASKING FUNCTION

    Purpose: Mask email addresses while preserving domain structure

    Usage: Patient and provider email addresses

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_email(email STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask email local part while preserving domain'
RETURN CASE 
    WHEN email IS NULL OR email = '' THEN email
    WHEN LOCATE('@', email) > 0 THEN 
        CONCAT('****', SUBSTRING(email, LOCATE('@', email)))
    ELSE '****'
END;
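A plain-Python mirror of `mask_email` makes its behavior concrete. It is a hypothetical port for local checks. Everything from the first `@` onward is kept, and a value without an `@` collapses to `****`.

```python
from typing import Optional


def mask_email(email: Optional[str]) -> Optional[str]:
    """Hypothetical Python port of the SQL mask_email UDF."""
    # NULL and empty strings pass through unchanged.
    if email is None or email == "":
        return email
    # LOCATE('@', email) finds the first '@'; str.find does the same (0-based).
    at = email.find("@")
    if at >= 0:
        # Keep the '@' and the domain, replace the local part with four stars.
        return "****" + email[at:]
    # No '@' at all: mask the whole value.
    return "****"


print(mask_email("jane.doe@example.com"))  # ****@example.com
print(mask_email("not-an-email"))          # ****
```

Because the prefix is always exactly `****`, the masked value also hides the length of the local part.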

=============================================

3. PHONE NUMBER MASKING FUNCTION

    Purpose: Mask all but the last four characters of a phone number (separators are masked too, so only the length is preserved)

    Usage: Patient and emergency contact phone numbers

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_phone(phone STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask phone numbers, keeping only the last 4 characters and the original length'
RETURN CASE 
    WHEN phone IS NULL OR phone = '' THEN phone
    WHEN LENGTH(phone) >= 4 THEN 
        CONCAT(REPEAT('X', LENGTH(phone) - 4), RIGHT(phone, 4))
    ELSE REPEAT('X', LENGTH(phone))
END;
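The hypothetical Python port below, meant for local testing, shows the effect directly. Every character except the last four becomes `X`, including separators such as `-`, spaces and parentheses.

```python
from typing import Optional


def mask_phone(phone: Optional[str]) -> Optional[str]:
    """Hypothetical Python port of the SQL mask_phone UDF."""
    if phone is None or phone == "":
        return phone
    if len(phone) >= 4:
        # Replace everything but the last four characters with 'X'.
        return "X" * (len(phone) - 4) + phone[-4:]
    # Values shorter than four characters are fully masked.
    return "X" * len(phone)


print(mask_phone("555-123-4567"))  # XXXXXXXX4567
print(mask_phone("123"))           # XXX
```

A value of exactly four characters comes back unchanged, since all four are "the last four".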

=============================================

4. ONE-WAY STRING MASKING FUNCTION

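    Purpose: Hash string values with SHA-256 for one-way (irreversible) masking

    Usage: Patient names, addresses, email domains where consistent pseudonyms are needed (unsalted hashes of guessable values such as names can be reversed by dictionary lookup, so treat the output as pseudonymized rather than anonymized)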

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_string_hash(input STRING)
RETURNS STRING
COMMENT 'ABAC utility: One-way SHA-256 hash masking (deterministic pseudonymization)'
RETURN sha2(input, 256);
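For a UTF-8 string, `sha2(input, 256)` returns a lowercase hex digest, and Python's `hashlib` should produce the same value. The hypothetical sketch below also shows why an unsalted hash is pseudonymization rather than anonymization. Anyone with a list of candidate values can rebuild the mapping.

```python
import hashlib
from typing import Optional


def mask_string_hash(value: Optional[str]) -> Optional[str]:
    """Hypothetical Python equivalent of sha2(value, 256) on a string."""
    if value is None:
        return None  # sha2(NULL, 256) is NULL in SQL
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Deterministic: the same input always yields the same 64-character digest,
# so hashed columns can still be joined or grouped.
print(mask_string_hash("abc"))

# Dictionary attack: hash a list of likely values and look the digest up.
candidate_names = ["Alice", "Bob", "Carol"]
reverse_lookup = {mask_string_hash(name): name for name in candidate_names}
print(reverse_lookup[mask_string_hash("Bob")])  # Bob
```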

=============================================

5. DATE MASKING FUNCTION (YEAR ONLY)

    Purpose: Mask date to show only year for age calculation

    Usage: Date of birth masking while preserving year for demographics

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_date_year_only(input_date DATE)
RETURNS DATE
COMMENT 'ABAC utility: Mask date to show only year (January 1st of same year)'
RETURN CASE 
    WHEN input_date IS NULL THEN NULL
    ELSE DATE(CONCAT(YEAR(input_date), '-01-01'))
END;
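The hypothetical Python sketch below mirrors the masking. It also adds a hypothetical `age_in_years` helper to show one side effect for demographics. An age computed from the masked date can overstate the true age by up to one year, because every birthday moves to January 1st.

```python
from datetime import date
from typing import Optional


def mask_date_year_only(input_date: Optional[date]) -> Optional[date]:
    """Hypothetical Python port of the SQL mask_date_year_only UDF."""
    if input_date is None:
        return None
    # Keep the year, reset month and day to January 1st.
    return date(input_date.year, 1, 1)


def age_in_years(born: date, on: date) -> int:
    """Whole years between two dates (hypothetical helper)."""
    had_birthday = (on.month, on.day) >= (born.month, born.day)
    return on.year - born.year - (0 if had_birthday else 1)


dob = date(1985, 7, 23)
as_of = date(2024, 3, 1)
print(mask_date_year_only(dob))                       # 1985-01-01
print(age_in_years(dob, as_of))                       # 38
print(age_in_years(mask_date_year_only(dob), as_of))  # 39
```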

=============================================

6. ADDRESS MASKING FUNCTION

    Purpose: Mask street address while preserving city/state

    Usage: Patient addresses for geographic analysis without full PII

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_address_city_state(address STRING, city STRING, state STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask street address, show only city and state'
RETURN CASE 
    WHEN city IS NULL AND state IS NULL THEN '***'
    WHEN city IS NULL THEN state
    WHEN state IS NULL THEN city
    ELSE CONCAT(city, ', ', state)
END;
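The hypothetical Python port below shows that the `address` argument is accepted but never used, so the street line cannot leak. When this function is attached as a Unity Catalog column mask, the `city` and `state` values would typically come from other columns of the same row via `USING COLUMNS`.

```python
from typing import Optional


def mask_address_city_state(
    address: Optional[str], city: Optional[str], state: Optional[str]
) -> Optional[str]:
    """Hypothetical Python port of the SQL mask_address_city_state UDF."""
    # `address` is intentionally unused: the street line is never returned.
    if city is None and state is None:
        return "***"
    if city is None:
        return state
    if state is None:
        return city
    return f"{city}, {state}"


print(mask_address_city_state("12 Example St", "Melbourne", "VIC"))  # Melbourne, VIC
print(mask_address_city_state("12 Example St", None, None))          # ***
```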

=============================================

7. COMPLETE MASKING FUNCTION

    Purpose: Completely mask any sensitive numeric column by returning NULL

    Usage: PatientID, VisitID, or any numeric identifier that should be hidden

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_for_all_roles(id DECIMAL)
RETURNS DECIMAL
COMMENT 'ABAC utility: Completely mask numeric values by returning NULL'
RETURN NULL;

=============================================

8. FAST DETERMINISTIC MULTIPLIER HELPER FUNCTION

    Purpose: Generate consistent multiplier for referential masking

    Usage: Helper function for mask_decimal_referential

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION fast_deterministic_multiplier(id DECIMAL)
RETURNS DECIMAL(4,3)
COMMENT 'ABAC utility: Generate deterministic multiplier (1.000 to 1.999) for consistent masking'
RETURN 1 + MOD(CRC32(CAST(CAST(id AS STRING) AS BINARY)), 1000) * 0.001;
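The declared return type matters here. In Spark SQL a bare `DECIMAL` means `DECIMAL(10,0)`, so a multiplier declared that way would be rounded to a whole number (1 or 2) rather than staying in the 1.000 to 1.999 range. The hypothetical Python sketch below reproduces the formula with `zlib.crc32`, which computes the standard CRC-32 that Spark's `crc32` also uses. It assumes the ID's string form matches Python's `str()`. It then shows what rounding to scale 0 would do to the result.

```python
import zlib
from decimal import Decimal, ROUND_HALF_UP


def fast_deterministic_multiplier(id_value: int) -> Decimal:
    """Hypothetical port: 1 + MOD(CRC32(str(id)), 1000) * 0.001."""
    crc = zlib.crc32(str(id_value).encode("utf-8"))  # unsigned 32-bit checksum
    return Decimal(1) + Decimal(crc % 1000) * Decimal("0.001")


def round_to_scale_0(value: Decimal) -> Decimal:
    """Approximates casting to DECIMAL(10,0): round half up to an integer."""
    return value.quantize(Decimal(1), rounding=ROUND_HALF_UP)


for patient_id in (1001, 1002, 1003):
    m = fast_deterministic_multiplier(patient_id)
    print(patient_id, m, round_to_scale_0(m))
```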

=============================================

9. DETERMINISTIC NUMERIC MASKING WITH REFERENTIAL INTEGRITY

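    Purpose: Mask numeric values deterministically (the same input always yields the same output), so join relationships between masked columns are preserved

    Usage: Transform PatientID, ProviderID while maintaining join relationships (the mapping is not guaranteed to be one-to-one, so distinct IDs can occasionally collide)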

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_decimal_referential(id DECIMAL)
RETURNS DECIMAL
COMMENT 'ABAC utility: Mask numeric values while preserving referential integrity'
RETURN id * fast_deterministic_multiplier(id);
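Below is a hypothetical Python sketch of the combined masking. It assumes the multiplier keeps its three decimal places and that the bare `DECIMAL` return type rounds the product to an integer. The useful property is determinism: the same ID always maps to the same masked value, so masked foreign keys still join. The mapping is not one-to-one, though. Two different IDs can round to the same output, and very small IDs stay close to, or equal to, their original value.

```python
import zlib
from decimal import Decimal, ROUND_HALF_UP


def fast_deterministic_multiplier(id_value: int) -> Decimal:
    """Hypothetical port of the SQL helper: a value between 1.000 and 1.999."""
    crc = zlib.crc32(str(id_value).encode("utf-8"))
    return Decimal(1) + Decimal(crc % 1000) * Decimal("0.001")


def mask_decimal_referential(id_value: int) -> Decimal:
    """Hypothetical port: id * multiplier, rounded half up to an integer."""
    masked = Decimal(id_value) * fast_deterministic_multiplier(id_value)
    return masked.quantize(Decimal(1), rounding=ROUND_HALF_UP)


# Same ID in two "tables" -> same masked key, so the join still lines up.
print(mask_decimal_referential(1001) == mask_decimal_referential(1001))  # True

# How many distinct masked values do the first 10,000 IDs produce?
masked = [mask_decimal_referential(i) for i in range(1, 10_001)]
print("distinct masked values:", len(set(masked)), "of", len(masked))
```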

=============================================

10. POLICY NUMBER MASKING (Last 4 digits)

    Purpose: Mask policy numbers, showing only the last 4 characters

    Usage: Policy verification without full exposure

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_policy_number_last4(policy STRING) 
RETURNS STRING
COMMENT 'ABAC utility: Mask policy number showing last 4 digits'
RETURN CASE 
  WHEN policy IS NULL THEN policy 
  ELSE CONCAT('****', RIGHT(policy, 4)) 
END;
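The hypothetical Python port below mirrors the SQL. It also surfaces an edge case: `RIGHT(policy, 4)` returns the whole string when the value is shorter than four characters. Such values are therefore fully visible behind the `****` prefix. If policy numbers can be that short, an extra `LENGTH(policy) <= 4` branch that returns only stars would close the gap.

```python
from typing import Optional


def mask_policy_number_last4(policy: Optional[str]) -> Optional[str]:
    """Hypothetical Python port of the SQL mask_policy_number_last4 UDF."""
    if policy is None:
        return None
    # Like RIGHT(policy, 4): slicing returns the whole string if it is shorter.
    return "****" + policy[-4:]


print(mask_policy_number_last4("HC-2024-778812"))  # ****8812
print(mask_policy_number_last4("123"))             # ****123 (fully exposed)
```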

## ROW FILTER FUNCTIONS

=============================================

11. TIME-BASED FILTER: BUSINESS HOURS

    Purpose: Allow data access only during business hours (9 AM - 6 PM Melbourne time)

    Usage: Time-based access control for sensitive healthcare operations

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION business_hours_filter()
RETURNS BOOLEAN
COMMENT 'ABAC utility: Allow access only during business hours (9 AM - 6 PM Melbourne time)'
-- BETWEEN is inclusive at both ends, so hours 9 through 17 cover 09:00 to 17:59
RETURN hour(from_utc_timestamp(current_timestamp(), 'Australia/Melbourne')) BETWEEN 9 AND 17;
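Below is a hypothetical Python sketch of the 9 AM to 6 PM check. It uses `zoneinfo` to handle Melbourne's daylight-saving shift (UTC+10 in winter, UTC+11 in summer). Because SQL `BETWEEN` includes both endpoints, this window corresponds to hours 9 through 17. For testability, the sketch takes the current time as a parameter instead of reading the clock, which is an assumption not present in the SQL.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

MELBOURNE = ZoneInfo("Australia/Melbourne")


def business_hours_filter(now_utc: datetime) -> bool:
    """Hypothetical port: True between 09:00 and 17:59 Melbourne local time."""
    local_hour = now_utc.astimezone(MELBOURNE).hour
    return 9 <= local_hour <= 17  # equivalent to SQL: hour(...) BETWEEN 9 AND 17


# 00:00 UTC on 15 Jan is 11:00 in Melbourne (AEDT, UTC+11)
print(business_hours_filter(datetime(2024, 1, 15, 0, 0, tzinfo=timezone.utc)))   # True
# 07:30 UTC on 15 Jan is 18:30 in Melbourne, just after closing
print(business_hours_filter(datetime(2024, 1, 15, 7, 30, tzinfo=timezone.utc)))  # False
```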

=============================================

12. NO ROWS FILTER

    Purpose: Returns FALSE to filter out all rows (complete data hiding)

    Usage: Row-level security to hide all data from unauthorized users

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION no_rows()
RETURNS BOOLEAN
COMMENT 'ABAC utility: Returns FALSE to filter out all rows for complete data hiding'
RETURN FALSE;

=============================================

13. REGIONAL ACCESS FILTER

    Purpose: Keep only rows whose region is 'North' (the region is hard-coded for this demo)

    Usage: Regional compliance, e.g. limiting a group of users to one region's records

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION filter_by_region(region STRING)
RETURNS BOOLEAN
COMMENT 'ABAC utility: Keep only rows in the North region'
RETURN region = 'North';
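Row filters should behave like a `WHERE` predicate: a row is kept only when the function returns TRUE, so both FALSE and NULL hide it. The hypothetical Python simulation below applies `filter_by_region` and `no_rows` to a few in-memory rows. The `apply_row_filter` helper is illustrative only. A row whose region is NULL is dropped, because `NULL = 'North'` evaluates to NULL rather than FALSE.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


def filter_by_region(region: Optional[str]) -> Optional[bool]:
    """Hypothetical port: SQL `region = 'North'` with three-valued logic."""
    if region is None:
        return None  # NULL = 'North' is NULL in SQL
    return region == "North"


def no_rows() -> bool:
    """Hypothetical port of the SQL no_rows filter."""
    return False


def apply_row_filter(rows: List[Row], predicate: Callable[[Row], Optional[bool]]) -> List[Row]:
    # Keep a row only when the predicate is exactly True (False and None drop it).
    return [row for row in rows if predicate(row) is True]


rows: List[Row] = [
    {"visit_id": 1, "region": "North"},
    {"visit_id": 2, "region": "South"},
    {"visit_id": 3, "region": None},
]
north_only = apply_row_filter(rows, lambda r: filter_by_region(r["region"]))
hidden = apply_row_filter(rows, lambda r: no_rows())
print([r["visit_id"] for r in north_only])  # [1]
print(hidden)                               # []
```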

## Verify that all functions were created

In [0]:
%sql
SHOW FUNCTIONS LIKE 'mask_*';

In [0]:
%sql
SHOW FUNCTIONS LIKE 'filter_*|business_hours_filter|no_rows';

## ✅ Success!

All Healthcare functions have been created successfully!

### What You Just Created:
- ✅ Masking + filtering functions registered in Unity Catalog
- ✅ Functions available for use in SQL queries
- ✅ Foundation for ABAC policies ready

### Verify Your Functions:
You can verify the functions were created by running the query below (replace `healthcare` if you set a different schema in `config.yaml`):
```sql
SHOW USER FUNCTIONS IN healthcare;
```

### 🎯 Next Step:
Continue to **`2_Create_Tables.ipynb`** to create the database tables and load sample data.

---
**Note**: These functions are stored in Unity Catalog and can be used across multiple notebooks and queries.
