# 🎯 Finance ABAC Demo - Step 1: Create Masking + Row Filtering Functions

## 📋 Overview
This notebook creates the functions needed to apply **column masks** and **row filters** for the Finance industry ABAC (Attribute-Based Access Control) demo.

### What are Column Masks + Row Filters?
Column masks control what values users see in specific columns, depending on who they are. Row filters let you control which rows a user can access in a table based on custom logic.

### Why Use Column Masking and Row Filtering?
- **Compliance**: Meet GDPR, CCPA, HIPAA, and other privacy regulations
- **Security**: Protect sensitive data from unauthorized access
- **Flexibility**: Apply different masks based on user roles and attributes
- **Analytics**: Preserve data utility for analysis while protecting privacy
- **Audit**: Track and log all data access patterns

### What This Notebook Creates
This notebook will create specialized functions for key use cases in the Finance industry, including:
- **Identity Protection**: SSN, email, phone, and address masking
- **Financial Data**: Credit card, account, and routing number masking; transaction amount and income bucketing
- **Identifiers**: Deterministic hashing of customer and transaction IDs for cross-table analytics
- **Confidential Data**: Complete redaction of sensitive fields
- **Network Data**: IP address masking
- **Row Filters**: Business hours, transaction value, fraud flag, and region filters

## 🎓 How to Use This Notebook
1. **Update Configuration**: Set your catalog name in `config.yaml` or in the configuration cell below
2. **Run All Cells**: Execute cells sequentially (Shift+Enter or Run All)
3. **Proceed to Next Step**: Continue to notebook 2 to create the tables and load sample data

## ⚙️ Prerequisites
- ✅ Unity Catalog enabled workspace
- ✅ `USE CATALOG`, `CREATE SCHEMA` (if the schema does not exist yet), and `CREATE FUNCTION` privileges on the target catalog
- ✅ SQL Warehouse or Cluster attached to this notebook
- ✅ Account admin or catalog owner role (recommended)

---


## ⚙️ Configuration

### 🚨 IMPORTANT: Update Before Running!
Change `your_catalog_name` to **your catalog name** in the cell below, or set `catalog` (and optionally `schema`) in `config.yaml`.

### What This Does:
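- Loads `catalog` and `schema` from `config.yaml`, falling back to the defaults in the cell
- Sets the target Unity Catalog catalog
- Creates the schema (`finance` by default) if it doesn't exist and makes it the current schema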


In [0]:
%pip install pyyaml

In [0]:
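# 📋 Load configuration from config.yaml (falls back to defaults if the file is missing)
import yaml
from pathlib import Path

DEFAULT_CATALOG = 'your_catalog_name'
DEFAULT_SCHEMA = 'finance'

config_file = Path('config.yaml')
if config_file.exists():
    with open(config_file) as f:
        config = yaml.safe_load(f) or {}  # an empty file loads as None
    CATALOG = config.get('catalog', DEFAULT_CATALOG)
    SCHEMA = config.get('schema', DEFAULT_SCHEMA)
    print('✅ Configuration loaded from config.yaml')
else:
    CATALOG = DEFAULT_CATALOG
    SCHEMA = DEFAULT_SCHEMA
    print('⚠️  config.yaml not found - using defaults')

print(f'   📊 Catalog: {CATALOG}')
print(f'   📁 Schema: {SCHEMA}')

# Fail early if the placeholder catalog name was never replaced
if CATALOG == 'your_catalog_name':
    raise ValueError("Replace 'your_catalog_name' in config.yaml or in this cell before running.")

# Set the catalog and schema used by the cells below
# (backticks allow identifiers that contain characters such as hyphens)
spark.sql(f"USE CATALOG `{CATALOG}`")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS `{SCHEMA}`")
spark.sql(f"USE SCHEMA `{SCHEMA}`")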

In [0]:
%sql
SELECT '🎯 Target: ' || current_catalog() || '.' || current_schema() AS status;

## Masking Functions for Financial Data

**Domain**: Banking, Payments, Credit, Compliance

### 1. Credit Card Masking (Last 4 Digits)
- **Purpose**: Show only the last 4 digits of a credit card number (PCI-DSS compliance)
- **Usage**: Credit card numbers for verification


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_credit_card(card_number STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask credit card showing last 4 digits (PCI-DSS compliant)'
RETURN CASE 
  WHEN card_number IS NULL OR card_number = '' THEN card_number
  WHEN LENGTH(card_number) < 4 THEN REPEAT('*', LENGTH(card_number))
  ELSE CONCAT('****-****-****-', RIGHT(card_number, 4))
END;
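To reason about the edge cases without a cluster, the CASE expression above can be mirrored in plain Python. This is an illustrative sketch, not the Unity Catalog function: the helper simply reuses the SQL function's name, and `None` stands in for SQL `NULL`.

```python
from typing import Optional


def mask_credit_card(card_number: Optional[str]) -> Optional[str]:
    """Plain-Python mirror of the SQL mask_credit_card CASE logic (illustration only)."""
    # NULL (None) and empty strings pass through unchanged
    if card_number is None or card_number == '':
        return card_number
    # Values shorter than 4 characters are fully starred
    if len(card_number) < 4:
        return '*' * len(card_number)
    # Otherwise keep the last 4 characters behind a fixed 16-digit-style prefix
    return '****-****-****-' + card_number[-4:]


for sample in ['4111111111111111', '4111-1111-1111-1234', '12', '', None]:
    print(repr(sample), '->', repr(mask_credit_card(sample)))
```

Because the prefix is fixed, every card renders in the same `****-****-****-NNNN` layout, so the masked value does not reveal the original card length.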

### 2. Account Number Masking (Last 4 Digits)
- **Purpose**: Mask bank account numbers, showing only the last 4 digits
- **Usage**: Account verification without full exposure


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_account_last4(account_number STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask account number showing last 4 digits'
RETURN CASE 
  WHEN account_number IS NULL OR account_number = '' THEN account_number
  WHEN LENGTH(account_number) < 4 THEN REPEAT('*', LENGTH(account_number))
  ELSE CONCAT('****', RIGHT(account_number, 4))
END;
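A plain-Python mirror of the account masking rule, useful for checking short or missing values. It is a hypothetical helper for illustration, not the Unity Catalog function.

```python
from typing import Optional


def mask_account_last4(account_number: Optional[str]) -> Optional[str]:
    """Plain-Python mirror of the SQL mask_account_last4 CASE logic (illustration only)."""
    if account_number is None or account_number == '':
        return account_number  # NULL / empty pass through
    if len(account_number) < 4:
        return '*' * len(account_number)  # too short to reveal anything
    return '****' + account_number[-4:]  # fixed 4-star prefix + last 4 characters


for sample in ['000123456789', '1234', '123', None]:
    print(repr(sample), '->', repr(mask_account_last4(sample)))
```

The fixed-width `****` prefix means a 6-digit and a 12-digit account number look the same once masked.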

### 3. SSN Masking (Last 4 Digits)
- **Purpose**: Mask SSNs, showing only the last 4 digits for identity verification
- **Usage**: Customer identification


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_ssn_last4(ssn STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask SSN showing last 4 digits (XXX-XX-1234)'
RETURN CASE 
  WHEN ssn IS NULL OR ssn = '' THEN ssn
  WHEN LENGTH(ssn) < 4 THEN REPEAT('*', LENGTH(ssn))
  ELSE CONCAT('XXX-XX-', RIGHT(REPLACE(ssn, '-', ''), 4))
END;
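The SSN rule strips dashes before taking the last 4 digits. A plain-Python mirror (hypothetical helper, illustration only) makes that order of operations explicit:

```python
from typing import Optional


def mask_ssn_last4(ssn: Optional[str]) -> Optional[str]:
    """Plain-Python mirror of the SQL mask_ssn_last4 CASE logic (illustration only)."""
    if ssn is None or ssn == '':
        return ssn
    # The length check runs on the raw value, before dashes are removed
    if len(ssn) < 4:
        return '*' * len(ssn)
    # Dashes are stripped, then the last 4 digits go behind a fixed XXX-XX- prefix
    return 'XXX-XX-' + ssn.replace('-', '')[-4:]


for sample in ['123-45-6789', '123456789', '12', None]:
    print(repr(sample), '->', repr(mask_ssn_last4(sample)))
```

Dashed and undashed inputs produce the same output, so the masked column stays consistent even if the source data mixes formats.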

### 4. Amount Bucketing (Transaction Amounts)
- **Purpose**: Group transaction amounts into ranges
- **Usage**: Analytics without exact amounts


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_amount_bucket(amount DECIMAL(12,2))
RETURNS STRING
COMMENT 'ABAC utility: Bucket transaction amounts into ranges'
RETURN CASE
  WHEN amount IS NULL THEN 'Unknown'
  WHEN amount < 0 THEN 'Negative'
  WHEN amount = 0 THEN '\$0'
  WHEN amount < 100 THEN '\$0-\$100'
  WHEN amount < 500 THEN '\$100-\$500'
  WHEN amount < 1000 THEN '\$500-\$1K'
  WHEN amount < 5000 THEN '\$1K-\$5K'
  WHEN amount < 10000 THEN '\$5K-\$10K'
  WHEN amount < 50000 THEN '\$10K-\$50K'
  WHEN amount < 100000 THEN '\$50K-\$100K'
  ELSE '\$100K+'
END;
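The CASE branches are evaluated top to bottom, so each `amount < N` test only sees values that failed every earlier branch. A table-driven plain-Python mirror shows the resulting boundaries; it is an illustrative sketch (labels written with a plain `$`), not the Unity Catalog function.

```python
from decimal import Decimal
from typing import Optional

# (exclusive upper bound, label), checked in order like the SQL CASE branches
AMOUNT_BUCKETS = [
    (Decimal('100'), '$0-$100'),
    (Decimal('500'), '$100-$500'),
    (Decimal('1000'), '$500-$1K'),
    (Decimal('5000'), '$1K-$5K'),
    (Decimal('10000'), '$5K-$10K'),
    (Decimal('50000'), '$10K-$50K'),
    (Decimal('100000'), '$50K-$100K'),
]


def mask_amount_bucket(amount: Optional[Decimal]) -> str:
    """Plain-Python mirror of the SQL mask_amount_bucket CASE logic (illustration only)."""
    if amount is None:
        return 'Unknown'
    if amount < 0:
        return 'Negative'
    if amount == 0:
        return '$0'
    for upper, label in AMOUNT_BUCKETS:
        if amount < upper:
            return label
    return '$100K+'


for sample in [None, Decimal('-5'), Decimal('0'), Decimal('99.99'), Decimal('100'), Decimal('250000')]:
    print(repr(sample), '->', mask_amount_bucket(sample))
```

Each bucket includes its lower bound and excludes its upper bound, so an amount of exactly 100 lands in `$100-$500`, not `$0-$100`.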

### 5. Routing Number Masking
- **Purpose**: Mask bank routing numbers, showing only the last 3 digits
- **Usage**: Payment processing


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_routing_number(routing STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask routing number showing last 3 digits'
RETURN CASE 
  WHEN routing IS NULL OR routing = '' THEN routing
  WHEN LENGTH(routing) < 3 THEN REPEAT('*', LENGTH(routing))
  ELSE CONCAT('XXXXX', RIGHT(routing, 3))
END;
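A plain-Python mirror of the routing number rule (hypothetical helper, illustration only):

```python
from typing import Optional


def mask_routing_number(routing: Optional[str]) -> Optional[str]:
    """Plain-Python mirror of the SQL mask_routing_number CASE logic (illustration only)."""
    if routing is None or routing == '':
        return routing
    if len(routing) < 3:
        return '*' * len(routing)
    return 'XXXXX' + routing[-3:]  # fixed 5-character prefix + last 3 characters


print(mask_routing_number('021000021'))
```

A 9-digit routing number comes out as 8 characters, because the prefix has a fixed width rather than one `X` per hidden digit.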

### 6. Email Masking
- **Purpose**: Mask the local part of email addresses, keeping the domain
- **Usage**: Customer contact information


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_email(email STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask email local part'
RETURN CASE 
  WHEN email IS NULL OR email = '' THEN email
  WHEN email NOT LIKE '%@%' THEN '****'
  ELSE CONCAT('****@', SPLIT(email, '@')[1])
END;
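Spark SQL arrays use 0-based indexing with `[]`, so `SPLIT(email, '@')[1]` is the text after the first `@`. A plain-Python mirror (hypothetical helper, illustration only) follows the same rule:

```python
from typing import Optional


def mask_email(email: Optional[str]) -> Optional[str]:
    """Plain-Python mirror of the SQL mask_email CASE logic (illustration only)."""
    if email is None or email == '':
        return email
    if '@' not in email:  # equivalent to NOT LIKE '%@%'
        return '****'
    # Element 1 of the split is the part after the first '@'
    return '****@' + email.split('@')[1]


for sample in ['jane.doe@example.com', 'not-an-email', None]:
    print(repr(sample), '->', repr(mask_email(sample)))
```

Keeping the domain visible lets analysts still group contacts by email provider while the local part stays hidden.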

### 7. Phone Masking
- **Purpose**: Mask phone numbers, showing only the last 4 digits
- **Usage**: Customer contact information


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_phone(phone STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask phone number showing last 4 digits'
RETURN CASE 
  WHEN phone IS NULL OR phone = '' THEN phone
  WHEN LENGTH(REGEXP_REPLACE(phone, '[^0-9]', '')) < 4 THEN 'XXXX'
  ELSE CONCAT('XXXX', RIGHT(REGEXP_REPLACE(phone, '[^0-9]', ''), 4))
END;
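The phone rule removes every non-digit before counting or slicing. A plain-Python mirror (hypothetical helper, illustration only):

```python
import re
from typing import Optional


def mask_phone(phone: Optional[str]) -> Optional[str]:
    """Plain-Python mirror of the SQL mask_phone CASE logic (illustration only)."""
    if phone is None or phone == '':
        return phone
    digits = re.sub(r'[^0-9]', '', phone)  # same pattern as REGEXP_REPLACE(phone, '[^0-9]', '')
    if len(digits) < 4:
        return 'XXXX'
    return 'XXXX' + digits[-4:]


for sample in ['(555) 123-4567', '+1 555.123.9876', 'ext 12']:
    print(repr(sample), '->', repr(mask_phone(sample)))
```

Because formatting characters are dropped first, the same number written in different styles masks to the same value.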

### 8. Transaction ID Hashing
- **Purpose**: Replace transaction IDs with a truncated SHA-256 hash
- **Usage**: Transaction tracking without exposing the original IDs


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_transaction_hash(txn_id STRING)
RETURNS STRING
COMMENT 'ABAC utility: Hash transaction ID using SHA-256'
RETURN CASE 
  WHEN txn_id IS NULL OR txn_id = '' THEN txn_id
  ELSE CONCAT('TXN_', SUBSTRING(SHA2(txn_id, 256), 1, 16))
END;
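A plain-Python sketch of the same token construction, assuming that Spark's `SHA2(..., 256)` on a STRING hashes its UTF-8 bytes and returns a lowercase hex digest. The helper is hypothetical and for illustration only.

```python
import hashlib
from typing import Optional


def mask_transaction_hash(txn_id: Optional[str]) -> Optional[str]:
    """Plain-Python sketch of the SQL mask_transaction_hash logic (illustration only)."""
    if txn_id is None or txn_id == '':
        return txn_id
    # 64-character hex digest; SUBSTRING(..., 1, 16) keeps the first 16 characters
    digest = hashlib.sha256(txn_id.encode('utf-8')).hexdigest()
    return 'TXN_' + digest[:16]


print(mask_transaction_hash('abc'))
```

Sixteen hex characters keep 64 bits of the digest; the same input always yields the same token, so masked IDs can still be counted and deduplicated.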

### 9. Customer ID Deterministic Masking
- **Purpose**: Deterministic masking, so masked IDs still match across tables for joins
- **Usage**: Customer analytics while preserving privacy


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_customer_id_deterministic(customer_id STRING)
RETURNS STRING
COMMENT 'ABAC utility: Deterministic customer ID masking for joins'
RETURN CASE 
  WHEN customer_id IS NULL OR customer_id = '' THEN customer_id
  ELSE CONCAT('CUST_', SUBSTRING(SHA2(customer_id, 256), 1, 12))
END;
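Deterministic masking works for joins because the same input always maps to the same token. A plain-Python sketch with hypothetical sample rows (the helper and data are illustrative, not part of the demo tables), assuming SHA-256 over UTF-8 bytes as in the SQL:

```python
import hashlib
from typing import Optional


def mask_customer_id_deterministic(customer_id: Optional[str]) -> Optional[str]:
    """Plain-Python sketch of the SQL mask_customer_id_deterministic logic (illustration only)."""
    if customer_id is None or customer_id == '':
        return customer_id
    return 'CUST_' + hashlib.sha256(customer_id.encode('utf-8')).hexdigest()[:12]


# Hypothetical rows from two tables, both masked with the same function
accounts = [{'customer_id': 'C001', 'segment': 'retail'},
            {'customer_id': 'C002', 'segment': 'private'}]
transactions = [{'customer_id': 'C001', 'amount': 25},
                {'customer_id': 'C001', 'amount': 40},
                {'customer_id': 'C002', 'amount': 900}]

segment_by_masked_id = {mask_customer_id_deterministic(a['customer_id']): a['segment'] for a in accounts}

# Join on the masked key only; the raw IDs never appear in the output
joined = []
for t in transactions:
    masked_id = mask_customer_id_deterministic(t['customer_id'])
    joined.append((masked_id, segment_by_masked_id[masked_id], t['amount']))

for row in joined:
    print(row)
```

The hash is unsalted, so anyone who can guess the ID format (for example `C001` to `C999`) can recompute the tokens; a keyed hash with a secret is the usual hardening if that matters for your threat model.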

### 10. IP Address Masking (Last Octet)
- **Purpose**: Mask the last octet of IPv4 addresses, keeping the network portion
- **Usage**: Fraud detection without full IP exposure


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_ip_address(ip STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask last octet of IP address'
RETURN CASE 
  WHEN ip IS NULL OR ip = '' THEN ip
  WHEN ip NOT LIKE '%.%.%.%' THEN '***'
  ELSE CONCAT(
    SPLIT(ip, '\\.')[0], '.',
    SPLIT(ip, '\\.')[1], '.',
    SPLIT(ip, '\\.')[2], '.***'
  )
END;
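In the SQL, `'\\.'` becomes the regex `\.`, so `SPLIT` breaks on literal dots, and `LIKE '%.%.%.%'` only requires three dots somewhere in the string. A plain-Python mirror (hypothetical helper, illustration only):

```python
from typing import Optional


def mask_ip_address(ip: Optional[str]) -> Optional[str]:
    """Plain-Python mirror of the SQL mask_ip_address CASE logic (illustration only)."""
    if ip is None or ip == '':
        return ip
    # NOT LIKE '%.%.%.%' is true when fewer than three dots appear
    if ip.count('.') < 3:
        return '***'
    # Keep the first three dot-separated parts and hide the last octet
    parts = ip.split('.')
    return f'{parts[0]}.{parts[1]}.{parts[2]}.***'


for sample in ['192.168.1.42', 'localhost', '2001:db8::1']:
    print(repr(sample), '->', repr(mask_ip_address(sample)))
```

IPv6 addresses without dots fall through to `***`, while IPv4 addresses keep their /24 network, which is often enough to spot clustered fraud activity.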

### 11. Address Partial Masking
- **Purpose**: Show only city/state, hide the street address
- **Usage**: Geographic analysis without the full address

Assumes the input format `"Street, City, State ZIP"`.


In [0]:
%sql
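-- TRY_ELEMENT_AT (1-based) returns NULL instead of failing when a part is missing,
-- which matters when ANSI mode is on; TRIM drops the space that follows each comma
CREATE OR REPLACE FUNCTION mask_address_city_state(full_address STRING)
RETURNS STRING
COMMENT 'ABAC utility: Extract city/state from full address'
RETURN CASE 
  WHEN full_address IS NULL OR full_address = '' THEN full_address
  WHEN full_address LIKE '%,%' THEN CONCAT(
      COALESCE(TRIM(TRY_ELEMENT_AT(SPLIT(full_address, ','), 2)), '***'), ', ',
      COALESCE(SUBSTRING(TRIM(TRY_ELEMENT_AT(SPLIT(full_address, ','), 3)), 1, 2), '**')
    )
  ELSE '[REDACTED]'
END;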
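A plain-Python sketch of the intended city/state extraction for the assumed `"Street, City, State ZIP"` format. Each comma-separated part is trimmed, since the space after a comma would otherwise turn a two-character state slice into a space plus one letter; a missing state falls back to `**`. The helper is hypothetical and for illustration only.

```python
from typing import Optional


def mask_address_city_state(full_address: Optional[str]) -> Optional[str]:
    """Illustrative city/state extraction for 'Street, City, State ZIP' strings."""
    if full_address is None or full_address == '':
        return full_address
    if ',' not in full_address:
        return '[REDACTED]'
    parts = [part.strip() for part in full_address.split(',')]
    city = parts[1]  # always present once a comma exists
    state = parts[2][:2] if len(parts) > 2 else '**'  # first two characters of "State ZIP"
    return f'{city}, {state}'


for sample in ['123 Main St, San Francisco, CA 94105', '123 Main St, Springfield', 'no commas here']:
    print(repr(sample), '->', repr(mask_address_city_state(sample)))
```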

### 12. Salary/Income Bucketing
- **Purpose**: Group salary/income into ranges
- **Usage**: Credit analysis without exact figures


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_income_bracket(income DECIMAL(18,2))
RETURNS STRING
COMMENT 'ABAC utility: Bucket income into ranges'
RETURN CASE
  WHEN income IS NULL THEN 'Unknown'
  WHEN income < 0 THEN 'Invalid'
  WHEN income < 25000 THEN 'Under \$25K'
  WHEN income < 50000 THEN '\$25K-\$50K'
  WHEN income < 75000 THEN '\$50K-\$75K'
  WHEN income < 100000 THEN '\$75K-\$100K'
  WHEN income < 150000 THEN '\$100K-\$150K'
  WHEN income < 250000 THEN '\$150K-\$250K'
  ELSE '\$250K+'
END;
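Bucketing keeps the shape of the data usable for analysis even though exact figures are hidden. A plain-Python mirror of the bracket logic, applied to hypothetical sample incomes (illustration only, not the Unity Catalog function):

```python
from collections import Counter
from decimal import Decimal
from typing import Optional

# (exclusive upper bound, label), checked in order like the SQL CASE branches
INCOME_BRACKETS = [
    (Decimal('25000'), 'Under $25K'),
    (Decimal('50000'), '$25K-$50K'),
    (Decimal('75000'), '$50K-$75K'),
    (Decimal('100000'), '$75K-$100K'),
    (Decimal('150000'), '$100K-$150K'),
    (Decimal('250000'), '$150K-$250K'),
]


def mask_income_bracket(income: Optional[Decimal]) -> str:
    """Plain-Python mirror of the SQL mask_income_bracket CASE logic (illustration only)."""
    if income is None:
        return 'Unknown'
    if income < 0:
        return 'Invalid'
    for upper, label in INCOME_BRACKETS:
        if income < upper:
            return label
    return '$250K+'


# Hypothetical incomes: the bracket distribution survives masking
incomes = [Decimal('18000'), Decimal('42000'), Decimal('48000'), Decimal('130000'), None]
distribution = Counter(mask_income_bracket(i) for i in incomes)
print(distribution)
```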


## Row Filter Functions

### 13. Business Hours Filter
- **Purpose**: Restrict access to business hours (9AM-5PM EST)
- **Usage**: Compliance with data access policies


In [0]:
%sql
CREATE OR REPLACE FUNCTION filter_business_hours()
RETURNS BOOLEAN
COMMENT 'ABAC utility: Allow access only during business hours'
RETURN HOUR(CURRENT_TIMESTAMP()) BETWEEN 14 AND 21; -- 14:00-21:59 UTC = 9AM-5PM EST (UTC-5); assumes a UTC session time zone, and the window shifts by one hour under EDT
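9:00 EST is 14:00 UTC and 17:00 EST is 22:00 UTC, so the half-open window `9 <= hour < 17` in EST corresponds to UTC hours 14 through 21 (a UTC hour of 22 is already 5 PM EST). A plain-Python sketch of that arithmetic, using a hypothetical `in_business_hours` helper and a fixed UTC-5 offset:

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset for EST, as in the markdown above; daylight saving time is not modeled
EST = timezone(timedelta(hours=-5))


def in_business_hours(now_utc: datetime) -> bool:
    """Illustrative check: True from 9:00 through 16:59 EST."""
    local = now_utc.astimezone(EST)
    return 9 <= local.hour < 17


for hour in (13, 14, 21, 22):
    moment = datetime(2024, 1, 15, hour, 0, tzinfo=timezone.utc)
    print(f'{hour:02d}:00 UTC ->', in_business_hours(moment))
```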

### 14. High-Value Transaction Filter
- **Purpose**: Hide transactions above a threshold (5,000)
- **Usage**: Junior analysts see only regular transactions


In [0]:
%sql
CREATE OR REPLACE FUNCTION filter_high_value_transactions(amount DECIMAL(12,2))
RETURNS BOOLEAN
COMMENT 'ABAC utility: Filter out high-value transactions'
RETURN amount <= 5000; -- TRUE keeps the row, so transactions above 5000 are hidden
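A row filter keeps a row only when the function returns TRUE; FALSE and NULL both hide it, much like a `WHERE` clause. A plain-Python sketch of that rule, with hypothetical `apply_row_filter` and `is_regular_transaction` helpers and sample rows, where "regular" means at or below the 5,000 threshold described in the Usage note:

```python
from decimal import Decimal
from typing import Optional


def apply_row_filter(rows, predicate):
    """Keep a row only when the predicate returns True; False and None (SQL NULL) both hide it."""
    return [row for row in rows if predicate(row) is True]


def is_regular_transaction(amount: Optional[Decimal]) -> Optional[bool]:
    # Hypothetical predicate: regular means at or below 5,000; NULL stays NULL
    if amount is None:
        return None
    return amount <= Decimal('5000')


rows = [
    {'txn_id': 'T1', 'amount': Decimal('120.00')},
    {'txn_id': 'T2', 'amount': Decimal('7500.00')},
    {'txn_id': 'T3', 'amount': None},
]
visible = apply_row_filter(rows, lambda r: is_regular_transaction(r['amount']))
print([r['txn_id'] for r in visible])
```

Rows with a NULL amount disappear as well, so if those should stay visible the predicate needs an explicit `IS NULL` branch.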

### 15. Fraud Investigation Filter
- **Purpose**: Show only flagged/suspicious transactions
- **Usage**: Fraud team access


In [0]:
%sql
CREATE OR REPLACE FUNCTION filter_fraud_flagged_only(fraud_flag BOOLEAN)
RETURNS BOOLEAN
COMMENT 'ABAC utility: Show only fraud-flagged records'
RETURN fraud_flag = TRUE;

### 16. Regional Access Filter
- **Purpose**: Restrict data by geographic region (this demo hardcodes `CA`)
- **Usage**: Regional compliance (GDPR, state regulations)


In [0]:
%sql
CREATE OR REPLACE FUNCTION filter_by_region(state STRING)
RETURNS BOOLEAN
COMMENT 'ABAC utility: Filter data by user region'
RETURN state = 'CA';

## Verify That All Functions Were Created


In [0]:
%sql
SHOW FUNCTIONS LIKE 'mask_*';

In [0]:
%sql
SHOW FUNCTIONS LIKE 'filter_*';

## ✅ Success!

All Finance masking and row filter functions have been created successfully!

### What You Just Created:
- ✅ Masking + filtering functions registered in Unity Catalog
- ✅ Functions available for use in SQL queries
- ✅ Foundation for ABAC policies ready

### 🎯 Next Step:
Continue to **`2_Create_Tables.ipynb`** to create the tables and load sample data.

---
**Note**: These functions are stored in Unity Catalog and can be used across multiple notebooks and queries.
