# Create Delta Tables from Hive DDL

This notebook converts Hive/Impala DDL statements to Databricks Delta tables.

**Features:**
- Optional type optimization (STRING `_ts` columns → TIMESTAMP) - **disabled by default for safety**
- Managed Delta tables with auto-optimization
- Single file or batch processing
- Dry-run mode to preview DDL

**Prerequisites:**
- Hive DDL files uploaded to a Unity Catalog volume (paths under `/Volumes/...`)
- Appropriate permissions on the target catalog and schema
- Active compute (serverless or an all-purpose cluster)

## Setup

Import the table creation function.

In [None]:
import sys

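# Add the schema migration tool to the Python path.
# Update this path to wherever the tool lives in your own workspace.
TOOL_PATH = "/Workspace/Users/eliao@bpcs.com/nifi_to_databricks_test/tools/schema_migration_tool"
if TOOL_PATH not in sys.path:  # avoid duplicate entries when the cell is re-run
    sys.path.append(TOOL_PATH)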

from create_delta_tables import create_tables_from_hive_ddl

print("✓ Imports successful")

## Configuration

Set your target catalog and schema.

In [None]:
# Target Databricks catalog and schema
TARGET_CATALOG = "eliao"
TARGET_SCHEMA = "nifi_to_databricks"

print(f"Target location: {TARGET_CATALOG}.{TARGET_SCHEMA}")

## Option 1: Single File Processing

Convert a single Hive DDL file to a Delta table.

**Note:** By default, column types are kept as-is (no automatic conversion).
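
The call in the next cell returns a dictionary, and the print statements there read its `success_count`, `fail_count`, and `total` keys. As a minimal sketch, a small wrapper could turn a non-zero failure count into an exception so that a scheduled run does not pass silently. The helper name `check_result` is hypothetical and not part of the migration tool, and the example dictionary only imitates the shape of the tool's return value.

```python
def check_result(result: dict) -> dict:
    """Raise if any DDL file failed; otherwise return the result unchanged."""
    # Keys mirror those printed in this notebook: success_count, fail_count, total.
    failed = result.get("fail_count", 0)
    if failed:
        raise RuntimeError(
            f"{failed} of {result.get('total', '?')} file(s) failed to convert"
        )
    return result


# Stand-in dictionary shaped like the tool's return value (values are made up).
example = {"success_count": 2, "fail_count": 0, "total": 2}
checked = check_result(example)
```

In the notebook, the same helper could wrap the real call, for example `check_result(create_tables_from_hive_ddl(...))`.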

In [None]:
# Single file - update the path to your DDL file
SINGLE_FILE_PATH = "/Volumes/eliao/nifi_to_databricks/test_data_files/test.sql"

result = create_tables_from_hive_ddl(
    input_file=SINGLE_FILE_PATH,
    catalog=TARGET_CATALOG,
    schema=TARGET_SCHEMA
    # optimize_types=False is the default (keeps original types)
)

print(f"\n{'='*80}")
print("RESULT")
print(f"{'='*80}")
print(f"✓ Successfully created: {result['success_count']} table(s)")
print(f"✗ Failed: {result['fail_count']} table(s)")
print(f"Total processed: {result['total']} file(s)")

### Single File - Dry Run

Preview the generated DDL without creating any tables.

In [None]:
# Dry run - just show what would be created
result = create_tables_from_hive_ddl(
    input_file=SINGLE_FILE_PATH,
    catalog=TARGET_CATALOG,
    schema=TARGET_SCHEMA,
    dry_run=True  # Only preview, don't create
)

## Option 2: Batch Processing

Process multiple DDL files from a directory.
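
Before running a batch, it can help to see which files the directory actually contains. The sketch below lists `.sql` files with `pathlib`. It assumes the volume path is reachable through ordinary file APIs and that the tool picks up files by the `.sql` extension; neither is confirmed by the tool itself. The helper name `list_ddl_files` is hypothetical.

```python
from pathlib import Path


def list_ddl_files(directory: str, pattern: str = "*.sql") -> list[str]:
    """Return the sorted names of files in `directory` that match `pattern`."""
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {directory}")
    # Only regular files directly inside the directory; subdirectories are ignored.
    return sorted(p.name for p in root.glob(pattern) if p.is_file())
```

In the notebook this could be called as `list_ddl_files(BATCH_DIRECTORY)` once `BATCH_DIRECTORY` is set in the next cell.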

In [None]:
# Batch processing - update the path to your directory containing .sql files
BATCH_DIRECTORY = "/Volumes/eliao/nifi_to_databricks/hive_ddls/"

result = create_tables_from_hive_ddl(
    input_dir=BATCH_DIRECTORY,
    catalog=TARGET_CATALOG,
    schema=TARGET_SCHEMA
)

print(f"\n{'='*80}")
print("BATCH PROCESSING RESULT")
print(f"{'='*80}")
print(f"✓ Successfully created: {result['success_count']} table(s)")
print(f"✗ Failed: {result['fail_count']} table(s)")
print(f"Total processed: {result['total']} file(s)")

### Batch Processing - Dry Run

Preview the DDL for all files without creating any tables.

In [None]:
# Dry run for batch - see DDL for all files
result = create_tables_from_hive_ddl(
    input_dir=BATCH_DIRECTORY,
    catalog=TARGET_CATALOG,
    schema=TARGET_SCHEMA,
    dry_run=True  # Only preview, don't create
)

## Verify Created Tables

Check that the tables were created successfully.
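
For more than a handful of tables, comparing names by eye gets tedious. As a sketch, the expected table names can be compared against the names the schema reports. The helper `missing_tables` and the table name `orders_raw` are made up for illustration; in a notebook the `existing` list could be built from the `tableName` column of a `SHOW TABLES` result.

```python
def missing_tables(expected: list[str], existing: list[str]) -> list[str]:
    """Return expected table names that are absent from `existing`, ignoring case."""
    existing_lower = {name.lower() for name in existing}
    return [name for name in expected if name.lower() not in existing_lower]


# In a notebook, `existing` might come from (assumes a live `spark` session):
#   rows = spark.sql("SHOW TABLES IN eliao.nifi_to_databricks").collect()
#   existing = [row.tableName for row in rows]
expected = ["obf_table_raw", "orders_raw"]  # orders_raw is a hypothetical name
existing = ["obf_table_raw"]
print(missing_tables(expected, existing))
```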

In [None]:
%sql
-- Show all tables in the target schema (update the catalog and schema to match TARGET_CATALOG and TARGET_SCHEMA)
SHOW TABLES IN eliao.nifi_to_databricks;

In [None]:
%sql
-- Describe a specific table (update table name)
DESCRIBE EXTENDED eliao.nifi_to_databricks.obf_table_raw;

## Advanced Options

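### With Type Optimization (Use with Caution)

⚠️ **WARNING:** Enable this only if you are certain that your naming convention reserves the `_ts` suffix for timestamps.

This converts STRING columns whose names end with `_ts` to TIMESTAMP.

**Risk:** Any STRING column whose name ends with `_ts` is converted, even if it does not hold timestamp values (for example, a `status_ts` column that stores a status code). If the suffix were matched more loosely, as plain `ts`, a name such as `counts` would be caught as well.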
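
Which names are at risk depends on how the suffix is matched. The sketch below compares a strict `_ts` check with a looser `ts` check on a few sample column names. It only illustrates suffix matching and is not the tool's actual implementation; `matches_suffix` and the sample names are assumptions.

```python
def matches_suffix(column_name: str, suffix: str) -> bool:
    """Case-insensitive check for whether a column name ends with `suffix`."""
    return column_name.lower().endswith(suffix.lower())


names = ["event_ts", "status_ts", "counts", "bytes"]

# Strict rule: the name must end with an underscore followed by "ts".
strict = [n for n in names if matches_suffix(n, "_ts")]

# Loose rule: the name only has to end with "ts".
loose = [n for n in names if matches_suffix(n, "ts")]

print(strict)  # ['event_ts', 'status_ts']
print(loose)   # ['event_ts', 'status_ts', 'counts']
```

Even the strict rule selects `status_ts`, so the name alone cannot tell a real timestamp column from one that merely follows the pattern. Reviewing the dry-run DDL before enabling `optimize_types=True` is the safer path.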

In [None]:
# Process WITH type optimization (risky!)
result = create_tables_from_hive_ddl(
    input_file=SINGLE_FILE_PATH,
    catalog=TARGET_CATALOG,
    schema=TARGET_SCHEMA,
    optimize_types=True  # ⚠️ Converts _ts STRING columns to TIMESTAMP
)

### Custom Catalog and Schema

Create tables in different catalog/schema combinations.
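
When the same DDL goes to several targets, the calls differ only in their `catalog` and `schema` arguments, so a loop keeps them in one place. The sketch below takes the creation function as a parameter, which also makes it easy to try with a stand-in. `create_in_targets` is a hypothetical helper; it assumes the function accepts `input_file`, `catalog`, and `schema` keyword arguments, as in the cells of this notebook.

```python
from typing import Callable


def create_in_targets(
    create_fn: Callable[..., dict],
    input_file: str,
    targets: list[tuple[str, str]],
) -> dict[str, dict]:
    """Call `create_fn` once per (catalog, schema) pair and collect the results."""
    results: dict[str, dict] = {}
    for catalog, schema in targets:
        # Key each result by "catalog.schema" so the outcomes are easy to compare.
        results[f"{catalog}.{schema}"] = create_fn(
            input_file=input_file, catalog=catalog, schema=schema
        )
    return results
```

In the notebook this could be called as `create_in_targets(create_tables_from_hive_ddl, SINGLE_FILE_PATH, [("dev", "bronze"), ("prod", "bronze")])`.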

In [None]:
# Create in dev environment
result_dev = create_tables_from_hive_ddl(
    input_file=SINGLE_FILE_PATH,
    catalog="dev",
    schema="bronze"
)

# Create in prod environment
result_prod = create_tables_from_hive_ddl(
    input_file=SINGLE_FILE_PATH,
    catalog="prod",
    schema="bronze"
)

print(f"Dev: {result_dev['success_count']} tables created")
print(f"Prod: {result_prod['success_count']} tables created")

## Troubleshooting

### Check if schema exists

In [None]:
%sql
SHOW SCHEMAS IN eliao LIKE 'nifi*';

### Check catalog permissions

In [None]:
%sql
SHOW GRANTS ON CATALOG eliao;

### View table properties

In [None]:
%sql
-- View Delta table properties
SHOW TBLPROPERTIES eliao.nifi_to_databricks.obf_table_raw;

## Notes

- Tables are created as **managed Delta tables** (no LOCATION clause)
- Partitioning is preserved from the Hive table
- Auto-optimization is enabled by default
- Tables are created **empty** (structure only, no data)
- **Column types are preserved by default** - no automatic conversion
- To load data, run a separate `INSERT INTO` statement or use a data migration tool
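
As a rough sketch of the `INSERT INTO` route mentioned above, the statement can be built as a string and then passed to `spark.sql`. The source table `hive_metastore.legacy_db.obf_table_raw` is a made-up name, and `build_insert_statement` is a hypothetical helper. `SELECT *` assumes the source and target columns line up in order and type.

```python
def build_insert_statement(target_table: str, source_table: str) -> str:
    """Build an INSERT INTO ... SELECT statement that copies all rows."""
    return f"INSERT INTO {target_table} SELECT * FROM {source_table}"


stmt = build_insert_statement(
    "eliao.nifi_to_databricks.obf_table_raw",
    "hive_metastore.legacy_db.obf_table_raw",  # hypothetical source table
)
print(stmt)

# In a notebook with an active Spark session, the statement could be run with:
#   spark.sql(stmt)
```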