# Delta Lake: Creating and Managing Tables

Delta Lake supports **reliable, ACID-compliant tables** with advanced features like schema enforcement, constraints, cloning, and views.

This tutorial covers:

1. CTAS (Create Table As Select)  
2. Table Constraints  
3. Cloning Delta Tables

## Sample Table Setup

We’ll create a simple `sales` table with `CREATE TABLE` and insert a few rows.


In [0]:
USE CATALOG hive_metastore;

-- Create table definition
CREATE TABLE sales (
  id INT,
  customer STRING,
  amount DOUBLE,
  region STRING,
  order_date DATE
)
USING DELTA;

-- Insert sample rows
INSERT INTO sales VALUES
  (1, 'Alice', 120.50, 'West',  DATE'2025-08-01'),
  (2, 'Bob',   75.00,  'East',  DATE'2025-08-02'),
  (3, 'Carol', 200.00, 'West',  DATE'2025-08-02'),
  (4, 'David', 50.00,  'North', DATE'2025-08-03'),
  (5, 'Eve',   300.00, 'South', DATE'2025-08-03');

-- Verify data
SELECT * FROM sales;

## 1. Create Table As Select (CTAS)

CTAS (**Create Table As Select**) creates a new table and populates it directly from a query result in a single statement.

Key Points:
- The schema is **inferred automatically** from the query result; column names and types cannot be declared manually, so use aliases and `CAST` in the SELECT to control them.
- Can filter, rename, or transform columns in SELECT.
- Can add **options**: `COMMENT`, `PARTITIONED BY`, `LOCATION`.
- Supports both **managed** (default) and **external** (with LOCATION) tables.
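
Because the schema comes from the query, one way to control column types is to cast inside the SELECT. The following is a minimal sketch against the `sales` table created above; the table name `sales_typed_ctas` and the chosen target types are assumptions made for illustration.

```sql
-- Hypothetical table name; column types are set by casting in the SELECT
CREATE OR REPLACE TABLE sales_typed_ctas AS
SELECT
  CAST(id AS BIGINT)             AS id,      -- INT    -> BIGINT
  customer,
  CAST(amount AS DECIMAL(10, 2)) AS amount,  -- DOUBLE -> DECIMAL(10,2)
  order_date
FROM sales;

-- Check the schema that CTAS inferred from the query
DESCRIBE sales_typed_ctas;
```

`DESCRIBE` should report the cast types rather than the types of the source columns.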


In [0]:
-- Create a managed Delta table from SELECT
CREATE TABLE sales_ctas
COMMENT "Managed Sales table created via CTAS"
PARTITIONED BY (region)
AS
SELECT
    id,
    customer,
    amount * 1.18 AS amount_with_tax,   -- transformation
    region
FROM sales
WHERE region = 'West';

SELECT * FROM sales_ctas;

In [0]:
-- Inspect table details
DESCRIBE EXTENDED sales_ctas;

In [0]:
-- External CTAS (data stored in a custom location)
CREATE TABLE IF NOT EXISTS external_sales_ctas
COMMENT "External Sales table created via CTAS"
PARTITIONED BY (region)
LOCATION 'dbfs:/mnt/demo/external/sales_temp'
AS
SELECT id, customer, amount, region
FROM sales;

-- Verify
SELECT * FROM external_sales_ctas;

In [0]:
-- Inspect table details
DESCRIBE EXTENDED external_sales_ctas;
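
The practical difference between managed and external tables shows up when a table is dropped. As a sketch, assuming the `hive_metastore` catalog used in this tutorial: dropping an external table removes only the metastore entry, while the files at its `LOCATION` remain. The table name `external_drop_demo` and its path are hypothetical and exist only for this demonstration.

```sql
-- Hypothetical throwaway table and path, used only for this demonstration
CREATE TABLE IF NOT EXISTS external_drop_demo
LOCATION 'dbfs:/mnt/demo/external/sales_drop_demo'
AS
SELECT id, customer, amount
FROM sales;

-- Removes the table definition from the metastore
DROP TABLE external_drop_demo;
```

Listing the path afterwards (for example with `%fs ls` in a separate cell) should still show the Delta files, whereas dropping a managed table such as `sales_ctas` would also delete its data.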

In [0]:
-- Create a new table using CTAS with renamed, transformed, and filtered columns
CREATE OR REPLACE TABLE transformed_sales_ctas AS
SELECT
  id AS sale_id,                               -- rename
  customer,                                    -- keep as is
  UPPER(region) AS region_upper,               -- transform
  amount * 1.1 AS amount_with_tax,             -- transform (add 10% tax)
  order_date
FROM sales
WHERE amount > 100;                            -- filter

-- Verify
SELECT * FROM transformed_sales_ctas;


## 2. Table Constraints in Delta Lake

Constraints help enforce data quality.

### Types of Constraints
- **NOT NULL** – Column cannot contain NULL values.  
- **CHECK** – Condition must be satisfied by all rows.

Constraints are **enforced at write time** (INSERT/UPDATE/MERGE): a write that violates a constraint fails, and none of its rows are committed.


In [0]:
-- Create table with NOT NULL constraint
CREATE TABLE customers (
  id INT NOT NULL,
  name STRING NOT NULL,
  email STRING
) USING DELTA;

-- Add a CHECK constraint
ALTER TABLE customers
ADD CONSTRAINT valid_id CHECK (id > 0);

In [0]:

-- A write that violates a constraint fails (here: NULL in the NOT NULL column id):
INSERT INTO customers VALUES (NULL, 'John', 'john@example.com'); -- fails

In [0]:

-- A write that does not violate any constraint succeeds:
INSERT INTO customers VALUES (1, 'John', 'john@example.com'); -- passes
SELECT * FROM customers;
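
The cells above exercise only the NOT NULL constraint. A sketch of how the `valid_id` CHECK constraint could be exercised is shown below; the row for 'Jane' is made up for illustration. Run each statement on its own, since both are expected to fail.

```sql
-- Expected to fail: id = -5 violates the CHECK constraint valid_id (id > 0)
INSERT INTO customers VALUES (-5, 'Jane', 'jane@example.com');

-- Expected to fail as well: constraints are also checked on UPDATE,
-- and this would set id to a value that violates valid_id
UPDATE customers SET id = 0 WHERE id = 1;
```

After both failures, `SELECT * FROM customers` should still return only the row for John with `id = 1`.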

In [0]:
-- View table metadata; CHECK constraints appear in the properties column
DESCRIBE DETAIL customers;

In [0]:
SHOW TBLPROPERTIES customers;
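
CHECK constraints are stored as table properties, so a single constraint can be looked up by its property key and removed again when it is no longer needed. This is a sketch that assumes the key follows the `delta.constraints.<name>` pattern shown in the output of the cell above.

```sql
-- Look up only the CHECK constraint added earlier
-- (assumes the property key is delta.constraints.valid_id)
SHOW TBLPROPERTIES customers ('delta.constraints.valid_id');

-- Remove the CHECK constraint; later writes are no longer validated against it
ALTER TABLE customers DROP CONSTRAINT valid_id;
```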

## 3. Cloning Delta Tables

Delta supports **table cloning** for testing, migration, or experimentation. A shallow clone is zero-copy; a deep clone also copies the data files.

### Types of Clones
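- **SHALLOW CLONE**  
  - Copies only the table metadata (transaction log); no data files are copied.  
  - Fast and space-efficient.  
  - The clone still references the source table's data files, so it depends on those files remaining available.  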

- **DEEP CLONE**  
  - Full copy of data + logs.  
  - Useful for **isolated testing** or backup.  
  - Takes more time and storage.


In [0]:
-- Shallow clone: just metadata, no data copy
CREATE TABLE sales_clone SHALLOW CLONE sales_ctas;
DESCRIBE DETAIL sales_clone;

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/sales_clone'

In [0]:
-- Deep clone: full copy
CREATE TABLE sales_deep_clone DEEP CLONE sales_ctas;
DESCRIBE DETAIL sales_deep_clone;

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/sales_deep_clone'

In [0]:
-- Deep clone with location (useful for backups)
CREATE TABLE backup_sales
DEEP CLONE sales_ctas
LOCATION 'dbfs:/mnt/backup/sales_clone';
DESCRIBE DETAIL backup_sales;
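
For a backup that has to follow the source over time, one option is to re-run the deep clone against the same target. The sketch below reuses the names from the cell above and assumes the statement is re-run after `sales_ctas` has changed.

```sql
-- Re-running the deep clone against the same target refreshes the backup
CREATE OR REPLACE TABLE backup_sales
DEEP CLONE sales_ctas
LOCATION 'dbfs:/mnt/backup/sales_clone';

-- The table history should list one CLONE operation per run
DESCRIBE HISTORY backup_sales;
```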

In [0]:
%fs ls 'dbfs:/mnt/backup/sales_clone'

Use case: clone **production tables** into a **testing environment** without impacting production.
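
A minimal sketch of that use case, treating `sales_ctas` as the "production" table: changes made to a clone affect only the clone. The table name `sales_ctas_test` is hypothetical.

```sql
-- Hypothetical test table cloned from the "production" table sales_ctas
CREATE OR REPLACE TABLE sales_ctas_test SHALLOW CLONE sales_ctas;

-- Modify only the clone
UPDATE sales_ctas_test SET amount_with_tax = 0 WHERE id = 1;

-- Compare both tables: the source row should keep its original value
SELECT 'source' AS table_name, id, amount_with_tax FROM sales_ctas      WHERE id = 1
UNION ALL
SELECT 'clone'  AS table_name, id, amount_with_tax FROM sales_ctas_test WHERE id = 1;
```

A shallow clone is enough for short-lived experiments like this one; a deep clone is the safer choice when the test copy has to outlive maintenance on the source table.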


## Summary

- **CTAS** creates tables from queries with automatic schema inference.  
- **Constraints** (NOT NULL, CHECK) enforce data quality.  
- **Cloning** (SHALLOW/DEEP) enables testing, backups, and migrations.
