# 🎯 Manufacturing ABAC Demo - Step 1: Create Masking Functions

## 📋 Overview
This notebook creates **masking functions** for the Manufacturing industry ABAC (Attribute-Based Access Control) demo.

### What are Masking Functions?
Masking functions are SQL user-defined functions (UDFs) that transform sensitive data to protect privacy while maintaining data utility for analytics. They are the foundation of ABAC policies in Unity Catalog.

### Why Use Masking Functions?
- **Compliance**: Meet GDPR, CCPA, HIPAA, and other privacy regulations
- **Security**: Protect sensitive data from unauthorized access
- **Flexibility**: Apply different masks based on user roles and attributes
- **Analytics**: Preserve data utility for analysis while protecting privacy
- **Audit**: Track and log all data access patterns

### What This Notebook Creates
This notebook will create specialized masking functions for the Manufacturing industry, including:
- **Identity Protection**: Partial string, email, and phone masking
- **Financial Data**: Cost bucketing (Low/Medium/High)
- **Identifiers**: Deterministic hashing for cross-table analytics, serial number last-4 display
- **Confidential Data**: Redaction of specification text and hashing of CAD file references
- **Location and Time Data**: GPS precision reduction and 15-minute timestamp binning
- **Row Filters**: Business-hours, maintenance-window, and deny-all filters

## 🎓 How to Use This Notebook
1. **Update Configuration**: Change the catalog name in the configuration cell below
2. **Run All Cells**: Execute cells sequentially (Shift+Enter or Run All)
3. **Verify Success**: Check for ✅ success messages after each function
4. **Proceed to Next Step**: Continue to notebook 2 to create the schema

## ⚙️ Prerequisites
- ✅ Unity Catalog enabled workspace
- ✅ CREATE FUNCTION permission in the target catalog
- ✅ SQL Warehouse or Cluster attached to this notebook
- ✅ Account admin or catalog owner role (recommended)

## 🔄 Next Steps
After completing this notebook:
1. **Step 2**: `2_Create_Schema.ipynb` - Create database schema and core tables
2. **Step 3**: `3_Create_Extended_Tables.ipynb` - Add supplementary tables
3. **Step 4**: `4_Test_Masking.ipynb` - Test all masking functions

---


In [None]:
# 📋 Load Configuration from config.yaml
import yaml
from pathlib import Path

config_file = Path('config.yaml')
if config_file.exists():
    with open(config_file) as f:
        config = yaml.safe_load(f)
    CATALOG = config['catalog']
    SCHEMA = config['schema']
    print('✅ Configuration loaded from config.yaml')
else:
    # Fallback defaults: replace the placeholder catalog name before running
    CATALOG = 'your_catalog_name'
    SCHEMA = 'manufacturing'
    print('⚠️  config.yaml not found - using defaults')

# Report the values in effect, whichever branch ran
print(f'   📊 Catalog: {CATALOG}')
print(f'   📁 Schema: {SCHEMA}')

# Make variables available to SQL cells
spark.conf.set('catalog_name', CATALOG)
spark.conf.set('schema_name', SCHEMA)


## ⚙️ Configuration

### 🚨 IMPORTANT: Update Before Running!
Set **your catalog name** in `config.yaml`, or replace the fallback `CATALOG` value in the cell above. The cell below reads the `CATALOG` and `SCHEMA` variables.

### What This Does:
- Sets the target Unity Catalog
- Creates the `manufacturing` schema if it doesn't exist
- Confirms the target location

### Example:
If your catalog is named `my_catalog` and you are not using `config.yaml`, change the fallback default:
```python
CATALOG = 'your_catalog_name'  # Change this!
```
to:
```python
CATALOG = 'my_catalog'  # Your catalog name
```


In [None]:
# Configuration: uses the CATALOG and SCHEMA values loaded above
spark.sql(f"USE CATALOG {CATALOG}")
spark.sql(
    f"CREATE SCHEMA IF NOT EXISTS {SCHEMA} "
    "COMMENT 'Manufacturing ABAC demo schema with masking functions'"
)
spark.sql(f"USE SCHEMA {SCHEMA}")

# Confirm the target location
spark.sql(
    "SELECT '🎯 Target: ' || current_catalog() || '.' || current_schema() AS status"
).show(truncate=False)

## Databricks Unity Catalog ABAC Functions (Manufacturing)
- **Purpose**: Attribute-Based Access Control (ABAC) utility functions
- **Domain**: Manufacturing (operations, maintenance, quality, supply chain)
- **Configuration (required)**: Replace `<<your_catalog_name>>` with your actual catalog name (e.g., `apscat`, `main`)


## Masking Functions

### 1) Partial string masking (e.g., names)


In [None]:
CREATE OR REPLACE FUNCTION mask_string_partial(input STRING)
RETURNS STRING
COMMENT 'ABAC utility: Partial string masking showing first and last characters'
RETURN CASE 
  WHEN input IS NULL OR input = '' THEN input
  WHEN LENGTH(input) <= 2 THEN REPEAT('*', LENGTH(input))
  WHEN LENGTH(input) = 3 THEN CONCAT(LEFT(input, 1), '*', RIGHT(input, 1))
  ELSE CONCAT(LEFT(input, 1), REPEAT('*', LENGTH(input) - 2), RIGHT(input, 1))
END;
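
To make the branch behavior concrete, here is a pure-Python sketch that mirrors the `CASE` expression above. It is illustrative only: it is not registered in Unity Catalog, and it assumes SQL `LENGTH` and Python `len` agree on the input (true for ordinary text).

```python
def mask_string_partial(value):
    """Illustrative mirror of the SQL mask_string_partial logic."""
    if value is None or value == "":
        return value  # NULL and empty input pass through unchanged
    n = len(value)
    if n <= 2:
        return "*" * n  # too short to reveal any character
    # Keep the first and last characters, star out everything in between
    return value[0] + "*" * (n - 2) + value[-1]


print(mask_string_partial("Johnson"))  # J*****n
print(mask_string_partial("Al"))       # **
```

Note that the masked value keeps the original length, so the mask still reveals how long the name was.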

### 2) Email masking (preserve domain)


In [None]:
CREATE OR REPLACE FUNCTION mask_email(email STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask email local part while preserving domain'
RETURN CASE 
  WHEN email IS NULL OR email = '' THEN email
  WHEN LOCATE('@', email) > 0 THEN CONCAT('****', SUBSTRING(email, LOCATE('@', email)))
  ELSE '****'
END;
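
The following pure-Python sketch mirrors the email logic above so the three branches can be checked without a cluster. It is an illustration, not the registered function; `str.find` stands in for SQL `LOCATE`, which is 1-based where `find` is 0-based.

```python
def mask_email(email):
    """Illustrative mirror of the SQL mask_email logic."""
    if email is None or email == "":
        return email
    at = email.find("@")  # index of the first '@', or -1 when absent
    if at >= 0:
        # Replace the local part with a fixed mask, keep '@' and the domain
        return "****" + email[at:]
    return "****"  # not a recognizable address: mask everything


print(mask_email("jane.doe@acme.com"))  # ****@acme.com
print(mask_email("not-an-email"))       # ****
```

Because the mask is always four characters, the output does not reveal the length of the local part.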

### 3) Phone masking (preserve last 4)


In [None]:
CREATE OR REPLACE FUNCTION mask_phone(phone STRING)
RETURNS STRING
COMMENT 'ABAC utility: Mask phone number leaving last 4 visible'
RETURN CASE 
  WHEN phone IS NULL OR phone = '' THEN phone
  WHEN LENGTH(phone) >= 4 THEN CONCAT(REPEAT('X', LENGTH(phone) - 4), RIGHT(phone, 4))
  ELSE REPEAT('X', LENGTH(phone))
END;
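
A pure-Python sketch of the same logic, offered only as an illustration of what the SQL returns. It makes one detail visible: separators such as `-` are masked like any other character, because the function works on the raw string.

```python
def mask_phone(phone):
    """Illustrative mirror of the SQL mask_phone logic."""
    if phone is None or phone == "":
        return phone
    if len(phone) >= 4:
        # Mask everything except the last four characters
        return "X" * (len(phone) - 4) + phone[-4:]
    return "X" * len(phone)  # shorter than 4: mask it all


print(mask_phone("312-555-0199"))  # XXXXXXXX0199
print(mask_phone("123"))           # XXX
```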

### 4) One-way hash for strings (referential but anonymized)


In [None]:
CREATE OR REPLACE FUNCTION mask_string_hash(input STRING)
RETURNS STRING
COMMENT 'ABAC utility: One-way SHA-256 hash for anonymization'
RETURN sha2(input, 256);

### 5) Deterministic referential masking for IDs (string)


In [None]:
CREATE OR REPLACE FUNCTION mask_id_referential(input STRING)
RETURNS STRING
COMMENT 'ABAC utility: Deterministic masking for string identifiers to preserve joins'
RETURN sha2(coalesce(input, ''), 256);
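
The two hash-based functions differ only in how they treat NULL. The sketch below mirrors both in pure Python with `hashlib`, assuming the input is hashed as UTF-8 bytes; it is an illustration, not the registered functions.

```python
import hashlib


def mask_string_hash(value):
    """Illustrative mirror of mask_string_hash: NULL stays NULL."""
    if value is None:
        return None  # sha2(NULL, 256) is NULL in Spark SQL
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def mask_id_referential(value):
    """Illustrative mirror of mask_id_referential: NULL is hashed as ''."""
    return hashlib.sha256((value or "").encode("utf-8")).hexdigest()


# The same input always yields the same 64-character token, which is what
# lets masked identifiers still join across tables.
print(mask_id_referential("WO-10023") == mask_id_referential("WO-10023"))  # True
print(len(mask_id_referential("WO-10023")))                                # 64
```

One consequence of the `coalesce`: every NULL identifier maps to the same token (the hash of the empty string), so rows with NULL keys would match each other in a join on the masked column, which they would not on the raw column.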

### 6) Deterministic referential masking for numeric IDs


In [None]:
-- DECIMAL with no precision defaults to DECIMAL(10,0), which would round the
-- multiplier to 1 or 2; DECIMAL(4,3) keeps its three fractional digits.
CREATE OR REPLACE FUNCTION fast_deterministic_multiplier(id DECIMAL)
RETURNS DECIMAL(4,3)
COMMENT 'ABAC utility: Deterministic multiplier helper (1.000..1.999)'
RETURN 1 + MOD(CRC32(CAST(CAST(id AS STRING) AS BINARY)), 1000) * 0.001;

CREATE OR REPLACE FUNCTION mask_decimal_referential(id DECIMAL)
RETURNS DECIMAL
COMMENT 'ABAC utility: Deterministically mask numeric IDs while preserving referential integrity'
RETURN id * fast_deterministic_multiplier(id);
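
As a rough illustration of the arithmetic, the sketch below reproduces the multiplier expression in pure Python. It assumes `zlib.crc32` matches Spark's `CRC32` (both are the standard CRC-32 checksum) and that the ID's string form is the same on both sides. It deliberately ignores SQL decimal precision and scale, so it shows the expression's value before any cast to the declared return type.

```python
import zlib
from decimal import Decimal


def fast_deterministic_multiplier(id_value):
    """Illustrative mirror: 1 + (CRC32 of the ID's string form mod 1000) * 0.001."""
    bucket = zlib.crc32(str(id_value).encode("utf-8")) % 1000  # 0..999
    return Decimal(1) + Decimal(bucket) * Decimal("0.001")


def mask_decimal_referential(id_value):
    """Illustrative mirror: scale the ID by its own deterministic multiplier."""
    return id_value * fast_deterministic_multiplier(id_value)


m = fast_deterministic_multiplier(Decimal(1001))
print(Decimal("1") <= m < Decimal("2"))  # True: bucket is 0..999
print(mask_decimal_referential(Decimal(1001)) == mask_decimal_referential(Decimal(1001)))  # True
```

Since `MOD(..., 1000)` yields 0 through 999, the expression ranges from 1.000 to 1.999. Two properties follow: an ID whose bucket is 0 is returned unchanged, and two different IDs can in principle produce the same masked value, so joins on the masked column are preserved only approximately.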

### 7) Redact sensitive specification text


In [None]:
CREATE OR REPLACE FUNCTION mask_spec_text(input STRING)
RETURNS STRING
COMMENT 'ABAC utility: Redact sensitive specification text'
RETURN CASE WHEN input IS NULL THEN NULL ELSE 'REDACTED_SPEC' END;

### 8) Redact or hash CAD file references


In [None]:
CREATE OR REPLACE FUNCTION mask_cad_reference(uri STRING)
RETURNS STRING
COMMENT 'ABAC utility: Hash CAD/PLM file references for non-authorized roles'
RETURN CASE WHEN uri IS NULL THEN NULL ELSE sha2(uri, 256) END;

### 9) Cost bucketing


In [None]:
CREATE OR REPLACE FUNCTION mask_cost_bucket(amount DECIMAL(18,2))
RETURNS STRING
COMMENT 'ABAC utility: Bucketize costs into Low/Medium/High'
RETURN CASE 
  WHEN amount IS NULL THEN NULL
  WHEN amount < 100 THEN 'Low'
  WHEN amount < 1000 THEN 'Medium'
  ELSE 'High'
END;
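
The bucket boundaries are easiest to see with values on either side of each threshold. This pure-Python sketch mirrors the `CASE` expression above and is illustrative only.

```python
from decimal import Decimal


def mask_cost_bucket(amount):
    """Illustrative mirror of the SQL mask_cost_bucket logic."""
    if amount is None:
        return None
    if amount < 100:
        return "Low"     # everything below 100, including negative amounts
    if amount < 1000:
        return "Medium"  # 100 up to, but not including, 1000
    return "High"


for value in ("99.99", "100.00", "999.99", "1000.00"):
    print(value, mask_cost_bucket(Decimal(value)))
# 99.99 Low
# 100.00 Medium
# 999.99 Medium
# 1000.00 High
```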

### 10) Serial last-4 display


In [None]:
CREATE OR REPLACE FUNCTION mask_serial_last4(serial STRING)
RETURNS STRING
COMMENT 'ABAC utility: Show only last 4 characters of serial'
RETURN CASE 
  WHEN serial IS NULL OR serial = '' THEN serial
  WHEN LENGTH(serial) >= 4 THEN CONCAT(REPEAT('X', GREATEST(LENGTH(serial) - 4, 0)), RIGHT(serial, 4))
  ELSE REPEAT('X', LENGTH(serial))
END;

### 11) GPS precision reduction (round to 2 decimals)


In [None]:
CREATE OR REPLACE FUNCTION mask_gps_precision(lat DOUBLE, lon DOUBLE)
RETURNS STRING
COMMENT 'ABAC utility: Reduce GPS precision by rounding to 2 decimals'
RETURN CONCAT(CAST(ROUND(lat, 2) AS STRING), ',', CAST(ROUND(lon, 2) AS STRING));
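
Rounding to two decimals coarsens a position to roughly 1.1 km in latitude (0.01 degree). The pure-Python sketch below mirrors the expression above as an illustration; Python's `round` and Spark's `ROUND` can differ on exact ties, so treat it as an approximation of the SQL rather than a bit-for-bit equivalent.

```python
def mask_gps_precision(lat, lon):
    """Illustrative mirror of the SQL mask_gps_precision logic."""
    if lat is None or lon is None:
        return None  # CONCAT with a NULL argument yields NULL
    return f"{round(lat, 2)},{round(lon, 2)}"


print(mask_gps_precision(41.8781, -87.6298))  # 41.88,-87.63
```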

### 12) Timestamp rounding to 15-minute bins


In [None]:
CREATE OR REPLACE FUNCTION mask_timestamp_15min(ts TIMESTAMP)
RETURNS TIMESTAMP
COMMENT 'ABAC utility: Floor timestamp to 15-minute intervals'
RETURN to_timestamp((unix_timestamp(ts) - (unix_timestamp(ts) % 900)));
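
The expression floors a timestamp by subtracting its remainder modulo 900 seconds. A pure-Python sketch of the same arithmetic, assuming timezone-aware timestamps at or after the Unix epoch (illustrative only):

```python
from datetime import datetime, timezone


def mask_timestamp_15min(ts):
    """Illustrative mirror: floor a timestamp to its 15-minute bin."""
    if ts is None:
        return None
    epoch = int(ts.timestamp())      # whole seconds since 1970-01-01 UTC
    floored = epoch - (epoch % 900)  # 900 s = 15 minutes
    return datetime.fromtimestamp(floored, tz=timezone.utc)


ts = datetime(2024, 1, 15, 10, 37, 42, tzinfo=timezone.utc)
print(mask_timestamp_15min(ts))  # 2024-01-15 10:30:00+00:00
```

Every reading between 10:30:00 and 10:44:59 lands in the same bin, which hides exact event times while keeping the data usable for trend analysis.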

## Row Filter Functions

### A) Business hours filter (8AM - 6PM Chicago time)


In [None]:
CREATE OR REPLACE FUNCTION business_hours_filter()
RETURNS BOOLEAN
COMMENT 'ABAC utility: Allow access only during business hours (8AM-6PM America/Chicago)'
-- BETWEEN is inclusive at both ends: hours 8..17 cover 08:00 through 17:59
RETURN hour(from_utc_timestamp(current_timestamp(), 'America/Chicago')) BETWEEN 8 AND 17;

### B) Maintenance night-shift window (10PM-6AM Chicago time)


In [None]:
CREATE OR REPLACE FUNCTION maintenance_hours_filter()
RETURNS BOOLEAN
COMMENT 'ABAC utility: Allow access during maintenance window (22:00-06:00 America/Chicago)'
RETURN (
  hour(from_utc_timestamp(current_timestamp(), 'America/Chicago')) >= 22
  OR hour(from_utc_timestamp(current_timestamp(), 'America/Chicago')) < 6
);
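
Both time-window filters compare the local hour against fixed bounds, and the edges are where mistakes usually hide. The sketch below expresses the two stated windows (8AM-6PM and 10PM-6AM) as half-open hour ranges in pure Python. The function names are hypothetical, and the local hour is passed in directly instead of being derived from the current time, so the checks are repeatable.

```python
def in_business_hours(local_hour):
    """08:00 up to, but not including, 18:00 local time."""
    return 8 <= local_hour < 18


def in_maintenance_window(local_hour):
    """22:00 through midnight to 05:59; the window wraps, hence 'or'."""
    return local_hour >= 22 or local_hour < 6


for hour in (7, 8, 17, 18, 22, 5, 6):
    print(hour, in_business_hours(hour), in_maintenance_window(hour))
# 7 False False
# 8 True False
# 17 True False
# 18 False False
# 22 False True
# 5 False True
# 6 False False
```

SQL `BETWEEN` is inclusive at both ends, so when a window is written with `BETWEEN` the upper bound decides whether the closing hour itself is admitted. Hours 6, 7, and 18 through 21 fall in neither window.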

### C) Hard deny (no rows)


In [None]:
CREATE OR REPLACE FUNCTION no_rows()
RETURNS BOOLEAN
COMMENT 'ABAC utility: Returns FALSE to deny all rows'
RETURN FALSE;

### D) Placeholder filters for tag-driven scenarios (return TRUE; policy tags and group roles drive binding)


In [None]:
CREATE OR REPLACE FUNCTION sensitive_asset_filter()
RETURNS BOOLEAN
COMMENT 'ABAC utility: Placeholder for sensitive asset contexts (policy-driven)'
RETURN TRUE;
CREATE OR REPLACE FUNCTION quality_nc_only()
RETURNS BOOLEAN
COMMENT 'ABAC utility: Placeholder to expose only non-conformance contexts (policy-driven)'
RETURN TRUE;

## Function Inventory
To list the functions after execution, run `SHOW USER FUNCTIONS;`.


## ✅ Success!

All Manufacturing masking and row filter functions have been created successfully!

### What You Just Created:
- ✅ Masking and row filter functions registered in Unity Catalog
- ✅ Functions available for use in SQL queries
- ✅ Foundation for ABAC policies ready

### Verify Your Functions:
You can verify the functions were created by running the following, replacing `apscat` with your catalog name:
```sql
SHOW FUNCTIONS IN apscat.manufacturing;
```

### 🎯 Next Step:
Continue to **`2_Create_Schema.ipynb`** to create the database tables and load sample data.

---
**Note**: These functions are stored in Unity Catalog and can be used across multiple notebooks and queries.
