# 🗄️ Retail ABAC Demo - Step 2: Create Database Schema

## 📋 Overview
This notebook creates the **core database schema** for the Retail industry ABAC demo.

### What This Notebook Does:
1. **Creates Tables**: Sets up the three primary tables (`customers`, `products`, `orders`) with a realistic Retail industry structure
2. **Loads Sample Data**: Inserts five representative rows into each table for demonstrations
3. **Validates Schema**: Confirms tables are created with the expected row counts
4. **Establishes Relationships**: Defines a primary key on each table; `orders` references `customers` through `customer_id`

### Why This Schema?
This schema represents a typical Retail industry data structure with:
- **Realistic Fields**: Common columns found in Retail databases
- **Sensitive Data**: Fields that require masking (PII, financial, etc.)
- **Relationships**: Connected tables for realistic queries
- **Test Data**: Sufficient data for meaningful demonstrations

## 🎓 How to Use This Notebook
1. **Ensure Step 1 Complete**: Masking functions must be created first
2. **Run All Cells**: Execute cells sequentially
3. **Verify Counts**: Check that each table's row count matches the expected value (5 rows per table)
4. **Note Table Names**: You'll use these in testing and policy creation

## ⚙️ Prerequisites
- ✅ **Step 1 completed**: Masking functions created in `apscat.retail`
- ✅ Unity Catalog CREATE TABLE permission
- ✅ SQL Warehouse or Cluster attached
- ✅ Schema already created (from Step 1)

## 📊 Expected Results
After running this notebook, you'll have:
- Three tables (`customers`, `products`, `orders`), each with a primary key
- Five rows of sample data in each table
- A validation query showing the row count for each table

## 🔄 Next Steps
After completing this notebook:
1. **Step 3**: `3_Create_Extended_Tables.ipynb` - Add supplementary tables
2. **Step 4**: `4_Test_Masking.ipynb` - Test masking functions with real data

---


In [None]:
# 📋 Load Configuration from config.yaml
import yaml
from pathlib import Path

config_file = Path('config.yaml')
if config_file.exists():
    with open(config_file) as f:
        config = yaml.safe_load(f)
    CATALOG = config['catalog']
    SCHEMA = config['schema']
    print('✅ Configuration loaded from config.yaml')
    print(f'   📊 Catalog: {CATALOG}')
    print(f'   📁 Schema: {SCHEMA}')
else:
    # Fallback defaults
    CATALOG = 'your_catalog_name'
    SCHEMA = 'retail'
    print('⚠️  config.yaml not found - using defaults')
    print(f'   📊 Catalog: {CATALOG}')
    print(f'   📁 Schema: {SCHEMA}')

# Make variables available to SQL cells
spark.conf.set('catalog_name', CATALOG)
spark.conf.set('schema_name', SCHEMA)


## ⚙️ Configuration

Setting the target catalog and schema for table creation.
- **Catalog**: `apscat`
- **Schema**: `retail`

This should match what you used in Step 1.
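
Because `CATALOG` and `SCHEMA` are interpolated directly into SQL strings in the cells that follow, it may be worth validating them before use. The sketch below is one possible approach, not part of the original notebook: `quote_identifier` and `qualified_name` are hypothetical helpers, and restricting names to letters, digits, and underscores is an assumption made here for simplicity (Unity Catalog itself accepts a wider range of names when they are backtick-quoted).

```python
import re

# Hypothetical helpers (not part of the original notebook).
# Assumption: only simple identifiers are accepted, so a typo or stray
# character in config.yaml fails early instead of producing broken SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Return the name wrapped in backticks, or raise ValueError if it is not a simple identifier."""
    if not isinstance(name, str) or not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"Invalid identifier: {name!r}")
    return f"`{name}`"


def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a three-level name such as `apscat`.`retail`.`customers`."""
    return ".".join(quote_identifier(part) for part in (catalog, schema, table))


print(qualified_name("apscat", "retail", "customers"))  # prints `apscat`.`retail`.`customers`
```

In the notebook, this could be used as `spark.sql(f"USE CATALOG {quote_identifier(CATALOG)}")`, so that an unexpected value raises a `ValueError` before any SQL runs.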


In [None]:
spark.sql(f"USE CATALOG {CATALOG}")
spark.sql(f"USE SCHEMA {SCHEMA}")

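# Run the status query through spark.sql so this cell stays valid Python
spark.sql("SELECT '🗄️ Creating tables in: ' || current_catalog() || '.' || current_schema() AS status").show(truncate=False)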

In [None]:
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {SCHEMA}")
spark.sql(f"USE SCHEMA {SCHEMA}")


In [None]:
DROP TABLE IF EXISTS customers;

## Table: `customers`


In [None]:
-- Key column is NOT NULL, as Unity Catalog requires for primary keys
CREATE TABLE customers (
  customer_id STRING NOT NULL, first_name STRING, last_name STRING, email STRING, phone STRING,
  address STRING, city STRING, state STRING, zip STRING, created_date DATE,
  PRIMARY KEY (customer_id)
) USING DELTA;
INSERT INTO customers VALUES
('C-1001', 'John', 'Smith', 'john.s@email.com', '555-0101', '123 Main St', 'New York', 'NY', '10001', '2023-01-15'),
('C-1002', 'Sarah', 'Johnson', 'sarah.j@email.com', '555-0102', '456 Oak Ave', 'Los Angeles', 'CA', '90001', '2023-02-20'),
('C-1003', 'Michael', 'Williams', 'm.will@email.com', '555-0103', '789 Pine Rd', 'Chicago', 'IL', '60601', '2023-03-10'),
('C-1004', 'Emily', 'Brown', 'ebrown@email.com', '555-0104', '321 Elm St', 'Houston', 'TX', '77001', '2023-04-05'),
('C-1005', 'David', 'Jones', 'djones@email.com', '555-0105', '654 Maple Dr', 'Phoenix', 'AZ', '85001', '2023-05-12');

In [None]:
DROP TABLE IF EXISTS products;

## Table: `products`


In [None]:
-- Key column is NOT NULL, as Unity Catalog requires for primary keys
CREATE TABLE products (
  product_id STRING NOT NULL, product_name STRING, category STRING, price DECIMAL(10,2),
  cost DECIMAL(10,2), stock INT,
  PRIMARY KEY (product_id)
) USING DELTA;
INSERT INTO products VALUES
('P-101', 'Laptop Pro', 'Electronics', 1299.99, 850.00, 50),
('P-102', 'Wireless Mouse', 'Electronics', 29.99, 12.00, 200),
('P-103', 'Coffee Maker', 'Appliances', 89.99, 45.00, 75),
('P-104', 'Running Shoes', 'Apparel', 129.99, 60.00, 150),
('P-105', 'Backpack', 'Accessories', 49.99, 20.00, 100);

In [None]:
DROP TABLE IF EXISTS orders;

## Table: `orders`


In [None]:
-- Key column is NOT NULL, as Unity Catalog requires for primary keys
CREATE TABLE orders (
  order_id STRING NOT NULL, customer_id STRING, order_date TIMESTAMP, total_amount DECIMAL(12,2),
  payment_method STRING, ip_address STRING, status STRING,
  PRIMARY KEY (order_id)
) USING DELTA;
INSERT INTO orders VALUES
('O-5001', 'C-1001', timestamp('2024-03-01 10:30:00'), 1329.98, 'Credit Card', '192.168.1.100', 'Completed'),
('O-5002', 'C-1002', timestamp('2024-03-02 14:15:00'), 219.98, 'PayPal', '10.0.0.50', 'Completed'),
('O-5003', 'C-1003', timestamp('2024-03-03 09:45:00'), 89.99, 'Credit Card', '172.16.0.25', 'Completed'),
('O-5004', 'C-1004', timestamp('2024-03-04 16:20:00'), 179.98, 'Debit Card', '192.168.2.75', 'Shipped'),
('O-5005', 'C-1005', timestamp('2024-03-05 11:10:00'), 1299.99, 'Credit Card', '10.1.1.100', 'Processing');

-- Validation: each table should report 5 rows
SELECT 'customers' AS tbl, COUNT(*) AS cnt FROM customers
UNION ALL SELECT 'products', COUNT(*) FROM products
UNION ALL SELECT 'orders', COUNT(*) FROM orders;

## ✅ Success!

The Retail database schema has been created successfully!

### What You Just Created:
- ✅ Core tables with primary keys
- ✅ Sample data loaded and ready for testing
- ✅ `orders` linked to `customers` through `customer_id`
- ✅ Schema ready for masking function testing
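
The link between `orders` and `customers` is carried only by the `customer_id` column; this notebook declares no foreign key, and Unity Catalog key constraints are informational rather than enforced. A quick integrity check can therefore be useful. The sketch below runs that check in plain Python on the keys copied from the `INSERT` statements above; `find_orphan_orders` is a hypothetical helper name, not part of the original notebook.

```python
# Keys copied from the INSERT statements in this notebook.
customer_ids = {"C-1001", "C-1002", "C-1003", "C-1004", "C-1005"}
orders = [
    ("O-5001", "C-1001"),
    ("O-5002", "C-1002"),
    ("O-5003", "C-1003"),
    ("O-5004", "C-1004"),
    ("O-5005", "C-1005"),
]


def find_orphan_orders(orders, customer_ids):
    """Return the order_ids whose customer_id has no matching customer row."""
    return [
        order_id
        for order_id, customer_id in orders
        if customer_id not in customer_ids
    ]


print(find_orphan_orders(orders, customer_ids))  # prints []
```

Against the live tables, a comparable check could be written as a `LEFT ANTI JOIN` from `orders` to `customers` on `customer_id`; an empty result means every order points at an existing customer.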

### Verify Your Tables:
You can list all tables by running:
```sql
SHOW TABLES IN apscat.retail;
```

To see table details:
```sql
DESCRIBE TABLE apscat.retail.<table_name>;
```

### 📊 Data Summary:
The row count query above shows how many records are in each table. This data will be used for testing masking functions in the next step.
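
The count check can also be done programmatically rather than by eye. The following is a minimal sketch, assuming the expected value of five rows per table implied by the `INSERT` statements in this notebook; `count_mismatches` and `EXPECTED_COUNTS` are hypothetical names introduced for illustration, and the `actual` dictionary is a stand-in for the result of the validation query.

```python
# Expected counts taken from the INSERT statements: five rows per table.
EXPECTED_COUNTS = {"customers": 5, "products": 5, "orders": 5}


def count_mismatches(actual, expected):
    """Return {table: (expected, actual)} for each table whose count differs or is missing."""
    return {
        table: (want, actual.get(table))
        for table, want in expected.items()
        if actual.get(table) != want
    }


# Stand-in values for illustration. In the notebook, `actual` could be built
# from the rows of the validation query (columns `tbl` and `cnt`).
actual = {"customers": 5, "products": 5, "orders": 5}

print(count_mismatches(actual, EXPECTED_COUNTS))  # prints {}
```

An empty dictionary means every table matched; any entry shows the expected count next to the observed one, with `None` marking a table that was missing from the results.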

### 🎯 Next Step:
Continue to **`3_Create_Extended_Tables.ipynb`** to add supplementary tables that extend this schema.

---
**Tip**: Keep note of the table names and row counts for reference during testing.
