# 🗄️ Insurance ABAC Demo - Step 2: Create Database Schema

## 📋 Overview
This notebook creates the **core database schema** for the Insurance industry ABAC demo.

### What This Notebook Does:
1. **Creates Tables**: Sets up primary tables with realistic Insurance industry structure
2. **Loads Sample Data**: Inserts representative test data for demonstrations
3. **Validates Schema**: Confirms tables are created with correct row counts
4. **Establishes Relationships**: Declares a primary key on each table and links tables through shared ID columns (`policyholder_id`, `policy_id`)

### Why This Schema?
This schema represents a typical Insurance industry data structure with:
- **Realistic Fields**: Common columns found in Insurance databases
- **Sensitive Data**: Fields that require masking (PII, financial, etc.)
- **Relationships**: Connected tables for realistic queries
- **Test Data**: Sufficient data for meaningful demonstrations

## 🎓 How to Use This Notebook
1. **Ensure Step 1 Is Complete**: Masking functions must be created first
2. **Run All Cells**: Execute cells sequentially
3. **Verify Counts**: Check table row counts match expected values
4. **Note Table Names**: You'll use these in testing and policy creation

## ⚙️ Prerequisites
- ✅ **Step 1 completed**: Masking functions created in `apscat.insurance`
- ✅ Unity Catalog CREATE TABLE permission
- ✅ SQL Warehouse or Cluster attached
- ✅ Schema already created (from Step 1)

## 📊 Expected Results
After running this notebook, you'll have:
- Three tables (`policyholders`, `policies`, `claims`), each with a primary key
- Three rows of sample data in each table
- A validation query showing the row count per table

## 🔄 Next Steps
After completing this notebook:
1. **Step 3**: `3_Create_Extended_Tables.ipynb` - Add supplementary tables
2. **Step 4**: `4_Test_Masking.ipynb` - Test masking functions with real data

---


In [None]:
# 📋 Load Configuration from config.yaml
import yaml
from pathlib import Path

config_file = Path('config.yaml')
if config_file.exists():
    with open(config_file) as f:
        config = yaml.safe_load(f)
    CATALOG = config['catalog']
    SCHEMA = config['schema']
    print('✅ Configuration loaded from config.yaml')
    print(f'   📊 Catalog: {CATALOG}')
    print(f'   📁 Schema: {SCHEMA}')
else:
    # Fallback defaults
    CATALOG = 'your_catalog_name'
    SCHEMA = 'insurance'
    print('⚠️  config.yaml not found - using defaults')
    print(f'   📊 Catalog: {CATALOG}')
    print(f'   📁 Schema: {SCHEMA}')

# Make variables available to SQL cells
spark.conf.set('catalog_name', CATALOG)
spark.conf.set('schema_name', SCHEMA)


## ⚙️ Configuration

The target catalog and schema for table creation come from `config.yaml`. In this demo they are:
- **Catalog**: `apscat`
- **Schema**: `insurance`

These values should match what you used in Step 1.
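
The next cell places `CATALOG` and `SCHEMA` directly into `USE CATALOG` and `USE SCHEMA` strings. Below is a minimal sketch of how those values could be checked first, assuming plain letter/digit/underscore names are all the demo needs. `quote_identifier` and `use_statements` are hypothetical helpers, not part of the original notebook.

```python
import re

# Hypothetical helpers (not in the original notebook).
# Assumption: catalog and schema names use only letters, digits and underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name):
    """Return the name wrapped in backticks, or raise if it looks unsafe."""
    if not isinstance(name, str) or not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"Unexpected identifier: {name!r}")
    return f"`{name}`"


def use_statements(catalog, schema):
    """Build the two USE statements with validated, quoted names."""
    return [
        f"USE CATALOG {quote_identifier(catalog)}",
        f"USE SCHEMA {quote_identifier(schema)}",
    ]


for statement in use_statements("apscat", "insurance"):
    print(statement)
    # In the notebook this could be passed on as: spark.sql(statement)
```

A value such as `"insurance; DROP TABLE claims"` would raise `ValueError` here instead of reaching `spark.sql`.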


In [None]:
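# Switch to the configured catalog and schema, then confirm the active location
spark.sql(f"USE CATALOG {CATALOG}")
spark.sql(f"USE SCHEMA {SCHEMA}")

spark.sql(
    "SELECT '🗄️ Creating tables in: ' || current_catalog() || '.' || current_schema() AS status"
).show(truncate=False)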

In [None]:
DROP TABLE IF EXISTS policyholders;

## Table: `policyholders`


In [None]:
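-- The key column is declared NOT NULL explicitly, since a primary key cannot hold NULLs
CREATE TABLE policyholders (
  policyholder_id STRING NOT NULL, first_name STRING, last_name STRING,
  ssn STRING, email STRING, phone STRING,
  PRIMARY KEY (policyholder_id)
) USING DELTA;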
INSERT INTO policyholders VALUES
('PH-1001', 'John', 'Smith', '123-45-6789', 'john@email.com', '555-0101'),
('PH-1002', 'Sarah', 'Johnson', '234-56-7890', 'sarah@email.com', '555-0102'),
('PH-1003', 'Mike', 'Williams', '345-67-8901', 'mike@email.com', '555-0103');

In [None]:
DROP TABLE IF EXISTS policies;

## Table: `policies`


In [None]:
-- policyholder_id links each policy to a row in policyholders
CREATE TABLE policies (
  policy_id STRING NOT NULL, policyholder_id STRING, policy_number STRING,
  policy_type STRING, premium DECIMAL(10,2), coverage_amount DECIMAL(12,2),
  PRIMARY KEY (policy_id)
) USING DELTA;
INSERT INTO policies VALUES
('P-5001', 'PH-1001', 'POL123456', 'Auto', 1200.00, 50000.00),
('P-5002', 'PH-1002', 'POL234567', 'Home', 1800.00, 300000.00),
('P-5003', 'PH-1003', 'POL345678', 'Life', 600.00, 500000.00);

In [None]:
DROP TABLE IF EXISTS claims;

## Table: `claims`


In [None]:
-- policy_id links each claim to a row in policies
CREATE TABLE claims (
  claim_id STRING NOT NULL, policy_id STRING, claim_amount DECIMAL(12,2),
  claim_date DATE, status STRING,
  PRIMARY KEY (claim_id)
) USING DELTA;
-- Typed DATE literals avoid relying on an implicit string-to-date cast
INSERT INTO claims VALUES
('C-1001', 'P-5001', 8500.00, DATE'2024-02-15', 'Approved'),
('C-1002', 'P-5002', 15000.00, DATE'2024-03-01', 'Pending'),
('C-1003', 'P-5003', 0.00, DATE'2024-03-10', 'Denied');

-- Validate: each table should report 3 rows
SELECT 'policyholders' AS tbl, COUNT(*) AS cnt FROM policyholders
UNION ALL SELECT 'policies', COUNT(*) FROM policies
UNION ALL SELECT 'claims', COUNT(*) FROM claims;

## ✅ Success!

Insurance database schema has been created successfully!

### What You Just Created:
- ✅ Three core tables (`policyholders`, `policies`, `claims`), each with a primary key
- ✅ Sample data loaded and ready for testing
- ✅ Tables linked through the shared `policyholder_id` and `policy_id` columns
- ✅ Schema ready for masking function testing

### Verify Your Tables:
You can list all tables by running:
```sql
SHOW TABLES IN apscat.insurance;
```

To see table details:
```sql
DESCRIBE TABLE apscat.insurance.<table_name>;
```
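
This notebook declares primary keys but no foreign keys, and Unity Catalog treats key constraints as informational rather than enforced, so the links between tables are worth checking by hand. Below is a minimal sketch that builds one orphan-row query per relationship. The `RELATIONSHIPS` list is inferred from the column names above, and `orphan_count_sql` is a hypothetical helper.

```python
# Parent/child links inferred from the column names in this notebook.
# Each entry is (child_table, parent_table, join_column).
RELATIONSHIPS = [
    ("policies", "policyholders", "policyholder_id"),
    ("claims", "policies", "policy_id"),
]


def orphan_count_sql(child, parent, key):
    """Build a query counting child rows whose key has no match in the parent."""
    return (
        f"SELECT COUNT(*) AS orphans "
        f"FROM {child} c LEFT ANTI JOIN {parent} p ON c.{key} = p.{key}"
    )


for child, parent, key in RELATIONSHIPS:
    query = orphan_count_sql(child, parent, key)
    print(query)
    # In the notebook: spark.sql(query).first()["orphans"] is expected to be 0
```

With the sample rows inserted above, every `policy_id` and `policyholder_id` has a matching parent row, so both queries should return zero orphans.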

### 📊 Data Summary:
The row count query above shows how many records are in each table (three per table with the sample data). This data will be used for testing masking functions in Step 4.
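
The comparison against expected counts can also be automated. Below is a minimal sketch, assuming the expected values are the three rows per table inserted in this notebook. `EXPECTED_COUNTS` and `find_count_mismatches` are hypothetical names, and the `actual` values are illustrative rather than real query output.

```python
# Expected counts taken from the INSERT statements in this notebook.
EXPECTED_COUNTS = {"policyholders": 3, "policies": 3, "claims": 3}


def find_count_mismatches(actual, expected=EXPECTED_COUNTS):
    """Return {table: (expected, actual)} for every table whose count differs."""
    mismatches = {}
    for table, want in expected.items():
        got = actual.get(table)  # None if the table is missing from the result
        if got != want:
            mismatches[table] = (want, got)
    return mismatches


# In the notebook, `actual` could be built from the validation query result:
#   actual = {row["tbl"]: row["cnt"] for row in spark.sql(validation_sql).collect()}
# where validation_sql holds the UNION ALL query shown earlier.
actual = {"policyholders": 3, "policies": 3, "claims": 2}  # illustrative values only
print(find_count_mismatches(actual))
```

An empty dictionary means every table matches. Any entry pairs the expected count with what was found.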

### 🎯 Next Step:
Continue to **`3_Create_Extended_Tables.ipynb`** to add supplementary tables that extend this schema.

---
**Tip**: Make a note of the table names and row counts for reference during testing.
