# 🎯 Manufacturing ABAC Demo - Step 1: Create Masking + Filtering Functions

## 📋 Overview
This notebook creates masking and filtering functions for the Manufacturing industry ABAC (Attribute-Based Access Control) demo.

### What are Masking and Filtering Functions?
Masking functions are SQL user-defined functions (UDFs) that transform sensitive data to protect privacy while maintaining data utility for analytics. Row filters let you control which rows a user can access in a table based on custom logic. Masking and filtering functions are the foundation of ABAC policies in Unity Catalog.

### Why Use Masking and Filtering Functions?
- **Compliance**: Meet GDPR, CCPA, HIPAA, and other privacy regulations
- **Security**: Protect sensitive data from unauthorized access
- **Flexibility**: Apply different masks based on user roles and attributes
- **Analytics**: Preserve data utility for analysis while protecting privacy
- **Audit**: Keep access rules centralized in Unity Catalog, where they are easier to review and data access can be audited

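### What This Notebook Creates
This notebook creates masking and row filter functions for the Manufacturing industry, including:
- **Identity Protection**: Partial string, email, and phone number masking
- **Financial Data**: Cost bucketing into Low/Medium/High ranges
- **Identifiers**: SHA-256 hashing, deterministic numeric masking that preserves joins, and last-4 serial number masking
- **Confidential Data**: Redaction of specification text and hashing of CAD file references
- **Location and Time Data**: GPS precision reduction and 15-minute timestamp rounding
- **Row Filters**: Business-hours, maintenance-window, and deny-all filters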

## 🎓 How to Use This Notebook
1. **Update Configuration**: Change the catalog name in the configuration cell below
2. **Run All Cells**: Execute cells sequentially (Shift+Enter or Run All)
3. **Verify Success**: Check for ✅ success messages after each function
4. **Proceed to Next Step**: Continue to notebook 2 to create the tables

## ⚙️ Prerequisites
- ✅ Unity Catalog enabled workspace
- ✅ CREATE FUNCTION permission in the target catalog
- ✅ A cluster or serverless compute attached to this notebook (the configuration cell runs Python, which a SQL warehouse cannot execute)
- ✅ Account admin or catalog owner role (recommended)

## 🔄 Next Steps
After completing this notebook:
1. **Step 2**: `2_Create_Tables.ipynb` - Create schema and core tables
2. **Step 3**: `3_Setup_Tagging.ipynb` - Define and apply tags
3. **Step 4**: `4_Test_ABAC_Policies.ipynb` - Test functions through ABAC policies

---


## ⚙️ Configuration

### 🚨 IMPORTANT: Update Before Running!
Set `catalog` (and optionally `schema`) in `config.yaml`. If you are not using `config.yaml`, replace the fallback value `your_catalog_name` in the cell below with **your catalog name**.

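### What This Does:
- Loads the catalog and schema names from `config.yaml` (or falls back to the defaults)
- Sets the target Unity Catalog
- Creates the schema (default `manufacturing`) if it doesn't exist and makes it the current schema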

In [0]:
%pip install pyyaml

In [0]:
# 📋 Load configuration from config.yaml (falls back to defaults if the file is missing)
import yaml
from pathlib import Path

# Fallback defaults: replace 'your_catalog_name' here if you are not using config.yaml
CATALOG = 'your_catalog_name'
SCHEMA = 'manufacturing'

config_file = Path('config.yaml')
if config_file.exists():
    with open(config_file) as f:
        config = yaml.safe_load(f) or {}
    CATALOG = config.get('catalog', CATALOG)
    SCHEMA = config.get('schema', SCHEMA)
    print('✅ Configuration loaded from config.yaml')
else:
    print('⚠️  config.yaml not found - using defaults')
print(f'   📊 Catalog: {CATALOG}')
print(f'   📁 Schema: {SCHEMA}')

# Set the catalog and schema used by the SQL cells below
# (backticks let names containing hyphens or other special characters work)
spark.sql(f"USE CATALOG `{CATALOG}`")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS `{SCHEMA}`")
spark.sql(f"USE SCHEMA `{SCHEMA}`")

In [0]:
%sql
SELECT '🎯 Target: ' || current_catalog() || '.' || current_schema() AS status;

## COLUMN MASK FUNCTIONS

=============================================
1. PARTIAL STRING MASKING FUNCTION

    Purpose: Show only first and last characters with middle masked

    Usage: Names, addresses for partial visibility

=============================================

In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_string_partial(input STRING)
RETURNS STRING
COMMENT 'ABAC utility: Partial string masking showing first and last characters'
RETURN CASE 
  WHEN input IS NULL OR input = '' THEN input
  WHEN LENGTH(input) <= 2 THEN REPEAT('*', LENGTH(input))
  WHEN LENGTH(input) = 3 THEN CONCAT(LEFT(input, 1), '*', RIGHT(input, 1))
  ELSE CONCAT(LEFT(input, 1), REPEAT('*', LENGTH(input) - 2), RIGHT(input, 1))
END;
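
To check what `mask_string_partial` returns for short and long inputs without running a warehouse, here is a plain-Python sketch that mirrors the SQL `CASE` logic. It is an illustration only, not how Unity Catalog executes the function; the SQL's separate `LENGTH(input) = 3` branch produces the same result as the general case, so the sketch folds them together.

```python
from typing import Optional


def mask_string_partial(value: Optional[str]) -> Optional[str]:
    """Plain-Python mirror of the SQL mask_string_partial logic (illustration only)."""
    # NULL and empty strings pass through unchanged
    if value is None or value == "":
        return value
    # Strings of 1-2 characters are fully masked
    if len(value) <= 2:
        return "*" * len(value)
    # Otherwise keep the first and last character and mask everything in between
    return value[0] + "*" * (len(value) - 2) + value[-1]


for sample in ["Li", "Ann", "Chicago", "", None]:
    print(repr(sample), "->", repr(mask_string_partial(sample)))
```

Note that the masked output keeps the original length, so the length of a name or address is still visible to the reader.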

=============================================

2. EMAIL MASKING FUNCTION

    Purpose: Mask email addresses while preserving domain structure

    Usage: Supplier email addresses

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_email(email STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask email local part while preserving domain'
RETURN CASE 
  WHEN email IS NULL OR email = '' THEN email
  WHEN LOCATE('@', email) > 0 THEN CONCAT('****', SUBSTRING(email, LOCATE('@', email)))
  ELSE '****'
END;
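
`LOCATE('@', email)` is 1-based and returns 0 when there is no `@`, and `SUBSTRING(email, pos)` keeps everything from that position onward, so the domain survives with its leading `@`. The plain-Python sketch below mirrors that logic for illustration (Python's `str.find` is 0-based and returns -1 instead).

```python
from typing import Optional


def mask_email(email: Optional[str]) -> Optional[str]:
    """Plain-Python mirror of the SQL mask_email logic (illustration only)."""
    if email is None or email == "":
        return email
    at = email.find("@")  # first '@', 0-based; -1 when absent
    if at >= 0:
        # Replace the local part and keep '@domain'
        return "****" + email[at:]
    # No '@': mask the whole value
    return "****"


print(mask_email("jane.doe@supplier.com"))
print(mask_email("not-an-email"))
```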

=============================================

3. PHONE NUMBER MASKING FUNCTION

    Purpose: Mask all but the last 4 characters of a phone number (separators are masked too)

    Usage: Supplier contact phone numbers

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_phone(phone STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask phone number leaving last 4 visible'
RETURN CASE 
  WHEN phone IS NULL OR phone = '' THEN phone
  WHEN LENGTH(phone) >= 4 THEN CONCAT(REPEAT('X', LENGTH(phone) - 4), RIGHT(phone, 4))
  ELSE REPEAT('X', LENGTH(phone))
END;
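
`mask_phone` replaces every character except the last four, including parentheses, spaces, and dashes, so the output keeps the original length but not the original formatting. The plain-Python sketch below mirrors the SQL for illustration; keeping separators visible would need a variant that replaces only digits.

```python
from typing import Optional


def mask_phone(phone: Optional[str]) -> Optional[str]:
    """Plain-Python mirror of the SQL mask_phone logic (illustration only)."""
    if phone is None or phone == "":
        return phone
    if len(phone) >= 4:
        # Every character except the last four becomes 'X'
        return "X" * (len(phone) - 4) + phone[-4:]
    # Values shorter than 4 characters are fully masked
    return "X" * len(phone)


print(mask_phone("(312) 555-0142"))
print(mask_phone("123"))
```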

=============================================

4. ONE-WAY STRING MASKING FUNCTION

    Purpose: Hash string values using SHA-256 for irreversible masking

    Usage: Names, addresses, email domains for anonymization

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_string_hash(input STRING)
RETURNS STRING
COMMENT 'ABAC utility: One-way SHA-256 hash for anonymization'
RETURN sha2(input, 256);
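
SQL `sha2(input, 256)` returns the SHA-256 digest as a lowercase hex string, so the same input always yields the same 64-character hash and masked columns can still be joined or grouped. The sketch below uses Python's `hashlib` as an approximate stand-in, assuming UTF-8 encoding of the input. Because the hash is unsalted, low-variety values (for example a short list of supplier names) can be recovered by hashing guesses, so "irreversible" holds best for high-variety inputs.

```python
import hashlib
from typing import Optional


def mask_string_hash(value: Optional[str]) -> Optional[str]:
    """Approximate mirror of SQL sha2(input, 256): hex SHA-256 of the UTF-8 bytes."""
    if value is None:
        return None  # sha2(NULL, 256) is NULL in SQL
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


a = mask_string_hash("Acme Metals Inc.")
b = mask_string_hash("Acme Metals Inc.")
c = mask_string_hash("acme metals inc.")
print(a == b, a == c, len(a))
```

Hashing is case-sensitive, so values that should match across tables may need consistent normalization before masking.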

=============================================

5. FAST DETERMINISTIC MULTIPLIER HELPER FUNCTION

    Purpose: Generate consistent multiplier for referential masking

    Usage: Helper function for mask_decimal_referential

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION fast_deterministic_multiplier(id DECIMAL)
RETURNS DECIMAL(4,3) -- an unqualified DECIMAL means DECIMAL(10,0), which would round the multiplier to 1 or 2
COMMENT 'ABAC utility: Deterministic multiplier helper (1.000..1.999)'
RETURN 1 + MOD(CRC32(CAST(CAST(id AS STRING) AS BINARY)), 1000) * 0.001;
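
Because `MOD(..., 1000)` ranges over 0..999, the multiplier expression yields values from 1.000 to 1.999 in steps of 0.001; the declared return type decides whether those three decimal places survive. The plain-Python sketch below mirrors the expression, assuming integer-valued IDs (so the string form matches SQL's `CAST(id AS STRING)`) and that `zlib.crc32` computes the same standard CRC-32 as SQL `CRC32`. The helper name `deterministic_multiplier` is hypothetical.

```python
import zlib
from decimal import Decimal


def deterministic_multiplier(id_value: int) -> Decimal:
    """Mirror of 1 + MOD(CRC32(CAST(CAST(id AS STRING) AS BINARY)), 1000) * 0.001."""
    # Unsigned 32-bit CRC of the ID's decimal string form
    crc = zlib.crc32(str(id_value).encode("utf-8"))
    return 1 + Decimal(crc % 1000) * Decimal("0.001")


multipliers = [deterministic_multiplier(i) for i in range(10_000)]
print(min(multipliers), max(multipliers))
```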

=============================================

6. DETERMINISTIC NUMERIC MASKING WITH REFERENTIAL INTEGRITY

    Purpose: Mask numeric values while preserving referential relationships

    Usage: Transform IDs while maintaining join relationships

=============================================

In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_decimal_referential(id DECIMAL)
RETURNS DECIMAL
COMMENT 'ABAC utility: Deterministically mask numeric IDs while preserving referential integrity'
RETURN id * fast_deterministic_multiplier(id);
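
Referential integrity here comes from determinism: the same ID always maps to the same masked value, so joins on masked key columns still line up across tables. The sketch below illustrates that with hypothetical toy rows, assuming the helper keeps three decimal places and that the `DECIMAL(10,0)` result rounds half up. The mapping is not guaranteed to be one-to-one (two IDs can round to the same masked value), so treat it as best-effort pseudonymization rather than a reversible transform.

```python
import zlib
from decimal import Decimal, ROUND_HALF_UP


def mask_id(id_value: int) -> Decimal:
    """Sketch of id * multiplier, rounded to an integer like a DECIMAL(10,0) result."""
    crc = zlib.crc32(str(id_value).encode("utf-8"))
    multiplier = 1 + Decimal(crc % 1000) * Decimal("0.001")
    return (Decimal(id_value) * multiplier).quantize(Decimal("1"), rounding=ROUND_HALF_UP)


# Hypothetical rows that share supplier_id as a join key
orders = [(1, 1001), (2, 1002), (3, 1001)]  # (order_id, supplier_id)
suppliers = {1001: "Acme", 1002: "Globex"}  # supplier_id -> name

# Mask the key on both sides, then join on the masked values
masked_suppliers = {mask_id(sid): name for sid, name in suppliers.items()}
joined = [(oid, masked_suppliers[mask_id(sid)]) for oid, sid in orders]
print(joined)
```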

=============================================

7. SENSITIVE TEXT REDACTION FUNCTION

    Purpose: Replace input text with 'REDACTED_SPEC'

    Usage: Redact sensitive specification text

=============================================

In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_spec_text(input STRING)
RETURNS STRING
COMMENT 'ABAC utility: Redact sensitive specification text'
RETURN CASE WHEN input IS NULL THEN NULL ELSE 'REDACTED_SPEC' END;

=============================================

8. CAD REFERENCE HASHING FUNCTION

    Purpose: Hash CAD file references using SHA-256 for irreversible masking

    Usage: Mask CAD/PLM file references for non-authorized roles

=============================================

In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_cad_reference(uri STRING)
RETURNS STRING
COMMENT 'ABAC utility: Hash CAD/PLM file references for non-authorized roles'
RETURN CASE WHEN uri IS NULL THEN NULL ELSE sha2(uri, 256) END;

=============================================

9. COST BUCKETING

    Purpose: Group cost amounts into Low (under 100), Medium (100 to under 1,000), and High (1,000 and above) buckets

    Usage: Analytics without exact amounts

=============================================

In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_cost_bucket(amount DECIMAL(18,2))
RETURNS STRING
COMMENT 'ABAC utility: Bucketize costs into Low/Medium/High'
RETURN CASE 
  WHEN amount IS NULL THEN NULL
  WHEN amount < 100 THEN 'Low'
  WHEN amount < 1000 THEN 'Medium'
  ELSE 'High'
END;
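
The bucket boundaries are half-open: 100 falls into Medium and 1,000 into High, and negative amounts (such as credits) land in Low. The plain-Python sketch below mirrors the SQL `CASE` for illustration.

```python
from decimal import Decimal
from typing import Optional


def mask_cost_bucket(amount: Optional[Decimal]) -> Optional[str]:
    """Plain-Python mirror of the SQL mask_cost_bucket logic (illustration only)."""
    if amount is None:
        return None
    if amount < 100:
        return "Low"
    if amount < 1000:
        return "Medium"
    return "High"


for amt in [Decimal("99.99"), Decimal("100.00"), Decimal("999.99"), Decimal("1000.00"), Decimal("-5.00")]:
    print(amt, mask_cost_bucket(amt))
```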

=============================================

10. SERIAL NUMBER MASKING FUNCTION

    Purpose: Mask serial numbers showing last 4

    Usage: Serial number verification without full exposure

=============================================

In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_serial_last4(serial STRING)
RETURNS STRING
COMMENT 'ABAC utility: Show only last 4 characters of serial'
RETURN CASE 
  WHEN serial IS NULL OR serial = '' THEN serial
  WHEN LENGTH(serial) >= 4 THEN CONCAT(REPEAT('X', GREATEST(LENGTH(serial) - 4, 0)), RIGHT(serial, 4))
  ELSE REPEAT('X', LENGTH(serial))
END;

=============================================

11. GPS PRECISION MASKING FUNCTION

    Purpose: Reduce GPS precision by rounding to 2 decimal places

    Usage: Analytics without exact coordinates

=============================================

In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_gps_precision(coords DOUBLE)
RETURNS DOUBLE
COMMENT 'ABAC utility: Reduce GPS precision by rounding to 2 decimals'
RETURN ROUND(coords, 2);
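
The function takes one coordinate at a time, so latitude and longitude columns are masked separately. Two decimal places of a degree is roughly 1.1 km north-south. SQL `ROUND` rounds half up (away from zero on ties), while Python's built-in `round` rounds half to even, so the sketch below uses `Decimal` with `ROUND_HALF_UP` to approximate the SQL behavior.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def mask_gps_precision(coord: Optional[float]) -> Optional[float]:
    """Round one coordinate to 2 decimal places, half away from zero (HALF_UP)."""
    if coord is None:
        return None
    # str() avoids binary floating-point artifacts when converting to Decimal
    return float(Decimal(str(coord)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))


print(mask_gps_precision(41.8781), mask_gps_precision(-87.6298))
```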

=============================================

12. TIMESTAMP MASKING FUNCTION

    Purpose: Round timestamps into 15-minute intervals

    Usage: Analytics without exact timestamps

=============================================

In [0]:
%sql
CREATE OR REPLACE FUNCTION mask_timestamp_15min(ts TIMESTAMP)
RETURNS TIMESTAMP
COMMENT 'ABAC utility: Floor timestamp to 15-minute intervals'
-- 900 seconds = 15 minutes; pmod keeps the remainder non-negative so pre-1970 values also floor
RETURN timestamp_seconds(unix_seconds(ts) - pmod(unix_seconds(ts), 900));
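
Flooring to 15 minutes means subtracting the remainder of the epoch seconds modulo 900. SQL `%` keeps the sign of the dividend, so for pre-1970 timestamps it would round up rather than down; `pmod` (or Python's `%` with a positive divisor) always returns a non-negative remainder. The plain-Python sketch below illustrates the arithmetic on timezone-aware datetimes; the helper name `floor_to_15min` is hypothetical.

```python
import math
from datetime import datetime, timezone


def floor_to_15min(ts: datetime) -> datetime:
    """Floor a timezone-aware datetime to the start of its 15-minute (900 s) interval."""
    if ts.tzinfo is None:
        raise ValueError("ts must be timezone-aware")
    epoch_seconds = math.floor(ts.timestamp())
    # Python's % is non-negative for a positive divisor, so this floors pre-1970 values too
    return datetime.fromtimestamp(epoch_seconds - epoch_seconds % 900, tz=timezone.utc)


print(floor_to_15min(datetime(2024, 3, 5, 10, 7, 42, tzinfo=timezone.utc)))
```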

##ROW FILTER FUNCTIONS

=============================================

13. TIME-BASED FILTER: BUSINESS HOURS

    Purpose: Allow data access only during business hours (8AM - 6PM Chicago time)

    Usage: Time-based access control for sensitive data

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION business_hours_filter()
RETURNS BOOLEAN
COMMENT 'ABAC utility: Allow access only during business hours (8AM-6PM America/Chicago)'
-- hour() returns 0-23, so BETWEEN 8 AND 17 covers 08:00-17:59
RETURN hour(from_utc_timestamp(current_timestamp(), 'America/Chicago')) BETWEEN 8 AND 17;

=============================================

14. TIME-BASED FILTER: MAINTENANCE HOURS

    Purpose: Allow data access during maintenance night-shift window (10PM-6AM Chicago time)

    Usage: Time-based access control for sensitive data

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION maintenance_hours_filter()
RETURNS BOOLEAN
COMMENT 'ABAC utility: Allow access during maintenance window (22:00-06:00 America/Chicago)'
RETURN (
  hour(from_utc_timestamp(current_timestamp(), 'America/Chicago')) >= 22
  OR hour(from_utc_timestamp(current_timestamp(), 'America/Chicago')) < 6
);
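
Both time-based filters compare an integer hour (0-23) against a window, and the window logic is easiest to get right as a half-open range `[start, end)`: "8AM to 6PM" is hours 8 through 17, since an inclusive upper bound of 18 would also admit 18:00-18:59. A window that crosses midnight, like the maintenance shift, becomes `hour >= start OR hour < end`. The helper `in_window` below is a hypothetical plain-Python sketch of that check, not part of the SQL functions.

```python
def in_window(hour: int, start: int, end: int) -> bool:
    """True if hour is in the half-open window [start, end), wrapping past midnight."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if start <= end:
        return start <= hour < end
    # Window crosses midnight, e.g. 22 -> 6
    return hour >= start or hour < end


business = [h for h in range(24) if in_window(h, 8, 18)]
maintenance = [h for h in range(24) if in_window(h, 22, 6)]
print(business)
print(maintenance)
```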

=============================================

15. NO ROWS FILTER

    Purpose: Returns FALSE to filter out all rows (complete data hiding)

    Usage: Row-level security to hide all data from unauthorized users

=============================================


In [0]:
%sql
CREATE OR REPLACE FUNCTION no_rows_filter()
RETURNS BOOLEAN
COMMENT 'ABAC utility: Returns FALSE to deny all rows'
RETURN FALSE;

## Verify all functions created

In [0]:
%sql
SHOW USER FUNCTIONS LIKE 'mask_*|fast_*';

In [0]:
%sql
SHOW USER FUNCTIONS LIKE '*_filter';

## ✅ Success!

All Manufacturing functions have been created successfully!

### What You Just Created:
- ✅ Masking + filtering functions registered in Unity Catalog
- ✅ Functions available for use in SQL queries
- ✅ Foundation for ABAC policies ready

### Verify Your Functions:
You can list every function you created in the schema by running the command below (replace `manufacturing` if you configured a different schema):
```sql
SHOW USER FUNCTIONS IN manufacturing;
```

### 🎯 Next Step:
Continue to **`2_Create_Tables.ipynb`** to create the test tables and load sample data.

---
**Note**: These functions are stored in Unity Catalog and can be used across multiple notebooks and queries.
