# 🗄️ Government ABAC Demo - Step 2: Create Database Schema

## 📋 Overview
This notebook creates the **core database schema** for the Government industry ABAC demo.

### What This Notebook Does:
1. **Creates Tables**: Creates three core tables (`citizens`, `licenses`, `tax_records`) modeled on a typical Government data structure
2. **Loads Sample Data**: Inserts representative test rows for demonstrations
3. **Validates Schema**: Confirms each table was created with the expected row count
4. **Establishes Relationships**: Defines a primary key on each table and links `licenses` and `tax_records` to `citizens` through `citizen_id`

### Why This Schema?
This schema represents a typical Government industry data structure with:
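- **Realistic Fields**: Common columns found in Government databases (citizen identity, addresses, licenses, tax records)
- **Sensitive Data**: Fields that require masking, such as PII (`ssn`, `address`) and financial data (`income`, `tax_owed`)
- **Relationships**: Tables linked by `citizen_id` for realistic join queries
- **Test Data**: A small, predictable sample (three rows per table) for demonstrations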
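To make "requires masking" concrete, the sketch below shows the kind of output a last-four-digits SSN mask might produce for the sample `ssn` values. It is a hypothetical Python illustration only: the name `mask_ssn_last4` and its behavior are assumptions, not the masking functions created in Step 1, whose exact output may differ.

```python
import re

# Hypothetical illustration: NOT the Step 1 masking function.
# Matches the ddd-dd-dddd format used in the sample data and captures the last four digits.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-(\d{4})$")


def mask_ssn_last4(ssn: str) -> str:
    """Replace all but the last four SSN digits with 'X'."""
    match = SSN_PATTERN.match(ssn)
    if match is None:
        # Unrecognized format: hide the whole value rather than leak part of it
        return "XXX-XX-XXXX"
    return f"XXX-XX-{match.group(1)}"


print(mask_ssn_last4("123-45-6789"))  # XXX-XX-6789
print(mask_ssn_last4("not-an-ssn"))   # XXX-XX-XXXX
```

Failing closed (masking everything) on unexpected input is a deliberate choice here: a partially masked malformed value could still expose sensitive digits.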

## 🎓 How to Use This Notebook
1. **Ensure Step 1 Complete**: Masking functions must be created first
2. **Run All Cells**: Execute cells sequentially
3. **Verify Counts**: Check that each table reports 3 rows in the validation query
4. **Note Table Names**: You'll use these in testing and policy creation

## ⚙️ Prerequisites
- ✅ **Step 1 completed**: Masking functions created in `apscat.government`
- ✅ Unity Catalog CREATE TABLE permission
- ✅ SQL Warehouse or Cluster attached
- ✅ Schema already created (from Step 1)

## 📊 Expected Results
After running this notebook, you'll have:
- Three tables (`citizens`, `licenses`, `tax_records`), each with a primary key
- Three rows of sample data in each table
- A validation query showing the row count of each table

## 🔄 Next Steps
After completing this notebook:
1. **Step 3**: `3_Create_Extended_Tables.ipynb` - Add supplementary tables
2. **Step 4**: `4_Test_Masking.ipynb` - Test masking functions with real data

---


In [None]:
# 📋 Load Configuration from config.yaml
import yaml
from pathlib import Path

config_file = Path('config.yaml')
if config_file.exists():
    with open(config_file) as f:
        config = yaml.safe_load(f)
    CATALOG = config['catalog']
    SCHEMA = config['schema']
    print(f'✅ Configuration loaded from config.yaml')
    print(f'   📊 Catalog: {CATALOG}')
    print(f'   📁 Schema: {SCHEMA}')
else:
    # Fallback defaults
    CATALOG = 'your_catalog_name'
    SCHEMA = 'government'
    print(f'⚠️  config.yaml not found - using defaults')
    print(f'   📊 Catalog: {CATALOG}')
    print(f'   📁 Schema: {SCHEMA}')

# Make variables available to SQL cells
spark.conf.set('catalog_name', CATALOG)
spark.conf.set('schema_name', SCHEMA)


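## ⚙️ Configuration

The next cell switches the session to the catalog and schema loaded above from `config.yaml` (or the fallback defaults). In this demo they are:
- **Catalog**: `apscat`
- **Schema**: `government`

These should match the values you used in Step 1. If `config.yaml` was not found, replace the `your_catalog_name` placeholder before continuing.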
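The configuration cells interpolate `CATALOG` and `SCHEMA` directly into SQL text, so a name containing characters such as `-` would fail to parse, and the fallback placeholder would silently point at a catalog that does not exist. The sketch below shows one way to guard against both, assuming Spark SQL's backtick quoting for identifiers (an embedded backtick is escaped by doubling it). The helper names `quote_identifier`, `qualified_name`, and `require_configured` are hypothetical, not part of this demo.

```python
def quote_identifier(name: str) -> str:
    """Wrap a catalog/schema/table name in backticks, doubling any embedded backticks."""
    if not name:
        raise ValueError("identifier must be non-empty")
    return "`" + name.replace("`", "``") + "`"


def qualified_name(*parts: str) -> str:
    """Build a dotted, fully quoted name such as `catalog`.`schema`.`table`."""
    return ".".join(quote_identifier(part) for part in parts)


def require_configured(catalog: str, placeholder: str = "your_catalog_name") -> str:
    """Fail fast if the fallback placeholder from the config cell is still in use."""
    if catalog == placeholder:
        raise ValueError("Set 'catalog' in config.yaml before running this notebook")
    return catalog


print(f"USE CATALOG {quote_identifier(require_configured('apscat'))}")  # USE CATALOG `apscat`
print(qualified_name("apscat", "government", "citizens"))  # `apscat`.`government`.`citizens`
```

In the notebook this could be used as `spark.sql(f"USE CATALOG {quote_identifier(require_configured(CATALOG))}")`; plain names like `apscat` behave exactly as before.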


In [None]:
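# Point all unqualified table names at the configured catalog and schema
spark.sql(f"USE CATALOG {CATALOG}")
spark.sql(f"USE SCHEMA {SCHEMA}")

# Confirm where the tables will be created
spark.sql(
    "SELECT '🗄️ Creating tables in: ' || current_catalog() || '.' || current_schema() AS status"
).show(truncate=False)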

## Table: `citizens`

In [None]:
DROP TABLE IF EXISTS citizens;

In [None]:
-- Primary key columns are declared NOT NULL; Unity Catalog key constraints are informational
CREATE TABLE citizens (
  citizen_id STRING NOT NULL, first_name STRING, last_name STRING, ssn STRING,
  address STRING, city STRING, state STRING, zip STRING,
  PRIMARY KEY (citizen_id)
) USING DELTA;
INSERT INTO citizens VALUES
('CZ-1001', 'John', 'Smith', '123-45-6789', '123 Main St', 'Springfield', 'IL', '62701'),
('CZ-1002', 'Sarah', 'Johnson', '234-56-7890', '456 Oak Ave', 'Madison', 'WI', '53703'),
('CZ-1003', 'Mike', 'Williams', '345-67-8901', '789 Pine Rd', 'Columbus', 'OH', '43201');

## Table: `licenses`

In [None]:
DROP TABLE IF EXISTS licenses;

In [None]:
CREATE TABLE licenses (
  license_id STRING NOT NULL, citizen_id STRING, license_type STRING, license_number STRING,
  issue_date DATE, expiry_date DATE,
  PRIMARY KEY (license_id)
) USING DELTA;
-- DATE literals make the conversion to the DATE columns explicit
INSERT INTO licenses VALUES
('L-2001', 'CZ-1001', 'Drivers', 'DL-123456', DATE'2020-01-15', DATE'2025-01-15'),
('L-2002', 'CZ-1002', 'Drivers', 'DL-234567', DATE'2019-05-20', DATE'2024-05-20'),
('L-2003', 'CZ-1003', 'Business', 'BL-345678', DATE'2021-03-10', DATE'2026-03-10');
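When extending the sample data with more licenses, a quick pre-check can catch date values that are not valid ISO `yyyy-mm-dd` dates or whose expiry does not fall after the issue date. This is a minimal pure-Python sketch under that assumption; the rows are copied from the `licenses` INSERT, and `invalid_date_ranges` is a hypothetical helper, not part of the demo.

```python
from datetime import date

# (license_id, issue_date, expiry_date) copied from the licenses INSERT statement
LICENSE_DATES = [
    ("L-2001", "2020-01-15", "2025-01-15"),
    ("L-2002", "2019-05-20", "2024-05-20"),
    ("L-2003", "2021-03-10", "2026-03-10"),
]


def invalid_date_ranges(rows):
    """Return license IDs whose dates fail to parse or whose expiry is not after issue."""
    bad = []
    for license_id, issued, expires in rows:
        try:
            if date.fromisoformat(expires) <= date.fromisoformat(issued):
                bad.append(license_id)
        except ValueError:
            # Not a valid ISO calendar date (e.g. month 13)
            bad.append(license_id)
    return bad


print(invalid_date_ranges(LICENSE_DATES))  # []
```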

## Table: `tax_records`

In [None]:
DROP TABLE IF EXISTS tax_records;

In [None]:
CREATE TABLE tax_records (
  record_id STRING NOT NULL, citizen_id STRING, tax_year INT,
  income DECIMAL(12,2), tax_owed DECIMAL(12,2),
  PRIMARY KEY (record_id)
) USING DELTA;
INSERT INTO tax_records VALUES
('T-3001', 'CZ-1001', 2023, 75000.00, 12500.00),
('T-3002', 'CZ-1002', 2023, 95000.00, 18000.00),
('T-3003', 'CZ-1003', 2023, 62000.00, 9800.00);

## Validate Row Counts

In [None]:
-- Each table should report 3 rows
SELECT 'citizens' AS tbl, COUNT(*) AS cnt FROM citizens
UNION ALL SELECT 'licenses', COUNT(*) FROM licenses
UNION ALL SELECT 'tax_records', COUNT(*) FROM tax_records;
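Rather than checking the validation output by eye, the counts can be compared programmatically. The sketch below assumes the expected value is 3 rows per table (the number inserted in this notebook); `row_count_mismatches` and `EXPECTED_ROW_COUNTS` are hypothetical names, and the commented line shows one possible way to collect the counts with `spark.sql(...).collect()`.

```python
# Assumed expectation: every table receives exactly three sample rows in this notebook
EXPECTED_ROW_COUNTS = {"citizens": 3, "licenses": 3, "tax_records": 3}


def row_count_mismatches(actual: dict, expected: dict) -> dict:
    """Return {table: (actual, expected)} for each table whose count differs or is missing."""
    mismatches = {}
    for table, want in expected.items():
        got = actual.get(table)
        if got != want:
            mismatches[table] = (got, want)
    return mismatches


# In the notebook, `actual` could be built from the validation query, e.g.:
# actual = {row["tbl"]: row["cnt"] for row in spark.sql(validation_sql).collect()}
actual = {"citizens": 3, "licenses": 3, "tax_records": 3}
print(row_count_mismatches(actual, EXPECTED_ROW_COUNTS))  # {}
```

An empty result means every table matches; a missing table shows up as `(None, 3)` rather than being skipped.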

## ✅ Success!

Government database schema has been created successfully!

### What You Just Created:
- ✅ Three core tables with primary keys
- ✅ Sample data loaded and ready for testing
- ✅ `licenses` and `tax_records` linked to `citizens` through `citizen_id`
- ✅ Schema ready for masking function testing
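The `citizen_id` link is a naming convention in this schema: no foreign key constraint is declared, and Unity Catalog key constraints are informational rather than enforced. A simple orphan check confirms that every child row points at an existing citizen. This pure-Python sketch uses the sample IDs from the INSERT statements; `orphaned_rows` is a hypothetical helper, and in SQL the same check would be a `LEFT ANTI JOIN` against `citizens`.

```python
# citizen_id values copied from the citizens INSERT statement
CITIZEN_IDS = {"CZ-1001", "CZ-1002", "CZ-1003"}

# (row_id, citizen_id) pairs copied from the licenses and tax_records INSERT statements
CHILD_ROWS = {
    "licenses": [("L-2001", "CZ-1001"), ("L-2002", "CZ-1002"), ("L-2003", "CZ-1003")],
    "tax_records": [("T-3001", "CZ-1001"), ("T-3002", "CZ-1002"), ("T-3003", "CZ-1003")],
}


def orphaned_rows(parent_ids, child_rows):
    """Return {table: [row_id, ...]} for child rows whose citizen_id has no parent row."""
    orphans = {}
    for table, rows in child_rows.items():
        missing = [row_id for row_id, citizen_id in rows if citizen_id not in parent_ids]
        if missing:
            orphans[table] = missing
    return orphans


print(orphaned_rows(CITIZEN_IDS, CHILD_ROWS))  # {}
```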

### Verify Your Tables:
You can list all tables by running the following (substitute your own catalog and schema if they differ from `apscat.government`):
```sql
SHOW TABLES IN apscat.government;
```

To see table details:
```sql
DESCRIBE TABLE apscat.government.<table_name>;
```

### 📊 Data Summary:
The row count query above should report 3 rows for each table. This data is used to test the masking functions in Step 4 (`4_Test_Masking.ipynb`).

### 🎯 Next Step:
Continue to **`3_Create_Extended_Tables.ipynb`** to add supplementary tables that extend this schema.

---
**Tip**: Keep note of the table names and row counts for reference during testing.
