# Delta Table – CTAS (Create Table As Select) Example

## Overview
CTAS (Create Table As Select) is a **SQL statement**, supported by Databricks SQL and Spark SQL, that:
1. Creates a new table (a Delta table in this example)
2. Populates it immediately with the result of a **SELECT query**

**Advantages of CTAS:**
- Creates and populates the table in **one step**
- Avoids a separate `CREATE TABLE` followed by `INSERT INTO`
- Supports **transformations during table creation** (casting, filtering, computed columns)
- Can specify **table properties, partitioning, clustering, and storage location**

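Syntax (clauses in square brackets are optional):

```sql
CREATE TABLE <table_name>
USING <format>
[PARTITIONED BY (<col>, ...)]
[LOCATION <path>]
[TBLPROPERTIES (<key> = <value>, ...)]
AS
SELECT ...;
```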
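
As a rough illustration of how the optional clauses fit together, the sketch below assembles a CTAS statement from its parts in Python. `build_ctas` is a hypothetical helper written for this note, not a Databricks or PySpark API; it only builds a string, which could then be passed to `spark.sql(...)` in a notebook.

```python
def build_ctas(table, select_sql, fmt="DELTA", partition_by=None,
               location=None, properties=None):
    """Assemble a CTAS statement as a string (hypothetical helper).

    Values are interpolated as-is, without escaping, so this is only
    suitable for trusted, hand-written inputs.
    """
    parts = [f"CREATE TABLE {table}", f"USING {fmt}"]
    if partition_by:
        parts.append(f"PARTITIONED BY ({', '.join(partition_by)})")
    if location:
        parts.append(f"LOCATION '{location}'")
    if properties:
        props = ",\n".join(f"  '{k}' = '{v}'" for k, v in properties.items())
        parts.append(f"TBLPROPERTIES (\n{props}\n)")
    parts.append("AS")
    # Drop any trailing semicolon so exactly one is emitted at the end
    parts.append(select_sql.strip().rstrip(";"))
    return "\n".join(parts) + ";"


# Example: a partitioned CTAS over the staging table used in this notebook
print(build_ctas(
    "inceptez_catalog.inputdb.txn_ctas_opt",
    "SELECT * FROM inceptez_catalog.inputdb.txn_staging",
    partition_by=["state"],
))
```

Only the clauses that are actually supplied appear in the generated statement, which mirrors the optional brackets in the syntax above.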


In [0]:
# Step 1: Prepare sample transaction data as DataFrame
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.getOrCreate()

data = [
  ("00000000","06-26-2011","4007024","040.33","Exercise & Fitness","Cardio Machine Accessories","Clarksville","Tennessee","credit"),
  ("00000001","05-26-2011","4006742","198.44","Exercise & Fitness","Weightlifting Gloves","Long Beach","California","credit"),
  ("00000002","06-01-2011","4009775","005.58","Exercise & Fitness","Weightlifting Machine Accessories","Anaheim","California","credit"),
  ("00000003","06-05-2011","4002199","198.19","Gymnastics","Gymnastics Rings","Milwaukee","Wisconsin","credit"),
  ("00000004","12-17-2011","4002613","098.81","Team Sports","Field Hockey","Nashville  ","Tennessee","credit"),
  ("00000005","02-14-2011","4007591","193.63","Outdoor Recreation","Camping & Backpacking & Hiking","Chicago","Illinois","credit"),
  ("00000006","10-28-2011","4002190","027.89","Puzzles","Jigsaw Puzzles","Charleston","South Carolina","credit"),
  ("00000007","07-14-2011","4002964","096.01","Outdoor Play Equipment","Sandboxes","Columbus","Ohio","credit"),
  ("00000008","01-17-2011","4007361","010.44","Winter Sports","Snowmobiling","Des Moines","Iowa","credit")
]

schema = StructType([
    StructField("txnid", StringType(), True),
    StructField("txndate", StringType(), True),
    StructField("custid", StringType(), True),
    StructField("amount", StringType(), True),
    # Field order must match the tuples above: category comes before product
    StructField("category", StringType(), True),
    StructField("product", StringType(), True),
    StructField("city", StringType(), True),
    StructField("state", StringType(), True),
    StructField("paytype", StringType(), True)
])

df_txn = spark.createDataFrame(data, schema)

display(df_txn)

In [0]:

# Step 2: Write the DataFrame to a staging Delta table
# saveAsTable creates a regular managed Delta table (not a temporary one); it is the source for the CTAS examples.

df_txn.write.format("delta").mode("overwrite").saveAsTable("inceptez_catalog.inputdb.txn_staging")
print("Staging Table Created")

## Step 3: Create a new table using CTAS

- The new table `txn_ctas` is **created and populated** in a single command.
- Transformations can be applied during creation, e.g. casting `amount` to `DOUBLE` and parsing `txndate` (format `MM-dd-yyyy`) into a `DATE`.
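
To make the two transformations concrete, here is a plain-Python analog of what `to_date(txndate, 'MM-dd-yyyy')` and `CAST(amount AS DOUBLE)` do to one staging row. It is only an illustration and does not use Spark; `parse_txn` is a hypothetical name, and Spark's handling of malformed values depends on its settings, whereas this sketch simply raises `ValueError`.

```python
from datetime import date, datetime


def parse_txn(txndate: str, amount: str):
    """Parse an 'MM-dd-yyyy' date string and a numeric string."""
    # %m-%d-%Y is the strptime counterpart of Spark's MM-dd-yyyy pattern
    parsed_date = datetime.strptime(txndate, "%m-%d-%Y").date()
    # Leading zeros are ignored when the string is converted to a float
    parsed_amount = float(amount)
    return parsed_date, parsed_amount


# First row of the sample data
parsed_date, parsed_amount = parse_txn("06-26-2011", "040.33")
assert parsed_date == date(2011, 6, 26)
assert parsed_amount == 40.33
```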

In [0]:
%sql
CREATE TABLE inceptez_catalog.inputdb.txn_ctas
USING DELTA
AS
SELECT
  txnid,
  to_date(txndate,'MM-dd-yyyy') as txndate,
  custid,
  CAST(amount AS DOUBLE) as amount,
  product,
  category,
  city,
  state,
  paytype
FROM inceptez_catalog.inputdb.txn_staging;


In [0]:
%sql
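-- Query the new CTAS table and inspect its metadata.
-- When both statements share one cell, the notebook typically displays only the last result.
SELECT * FROM inceptez_catalog.inputdb.txn_ctas;
DESCRIBE FORMATTED inceptez_catalog.inputdb.txn_ctas;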
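
To confirm from Python that the casts took effect, one option is to read the column types back through the DataFrame API. The helper names below are assumptions for illustration; `spark.table` and `DataFrame.dtypes` are standard PySpark calls, and `spark` is the session object the notebook provides.

```python
def ctas_column_types(spark, table_name):
    """Return a {column_name: type_string} mapping for the given table."""
    # DataFrame.dtypes is a list of (column_name, type_string) pairs
    return dict(spark.table(table_name).dtypes)


def assert_expected_types(types):
    """Check the two columns that the CTAS query transforms."""
    assert types["txndate"] == "date", types["txndate"]
    assert types["amount"] == "double", types["amount"]


# In a notebook (assumes the CTAS cell has already run):
# types = ctas_column_types(spark, "inceptez_catalog.inputdb.txn_ctas")
# assert_expected_types(types)
```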

In [0]:
%sql
-- CTAS with a filter: copy only the 'Exercise & Fitness' rows into a new table
CREATE TABLE inceptez_catalog.inputdb.txn_ctas_exercise
USING DELTA
AS
SELECT *
FROM inceptez_catalog.inputdb.txn_ctas
WHERE category = 'Exercise & Fitness';
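
For comparison, roughly the same result can be produced with the DataFrame API instead of SQL. This is a sketch under assumptions: `create_filtered_table` is a hypothetical helper name, the target table name in the usage comment is made up, and `spark` is the notebook's session. The `errorifexists` save mode is chosen to mirror plain `CREATE TABLE`, which fails when the target already exists.

```python
def create_filtered_table(spark, source, target, category):
    """Write the rows of `source` with the given category into a new Delta table."""
    df = spark.table(source)
    filtered = df.filter(df["category"] == category)
    # "errorifexists" raises if `target` already exists
    filtered.write.format("delta").mode("errorifexists").saveAsTable(target)
    return filtered


# In a notebook:
# create_filtered_table(
#     spark,
#     "inceptez_catalog.inputdb.txn_ctas",
#     "inceptez_catalog.inputdb.txn_ctas_exercise_df",  # hypothetical target name
#     "Exercise & Fitness",
# )
```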

In [0]:
%sql
-- CTAS with partitioning and Delta table properties
CREATE TABLE inceptez_catalog.inputdb.txn_ctas_opt
USING DELTA
PARTITIONED BY (state)
TBLPROPERTIES (
  'delta.enableChangeDataFeed' = 'true',  -- Change Data Feed: records row-level changes
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact' = 'true'
)
AS
SELECT
  txnid,
  to_date(txndate,'MM-dd-yyyy') as txndate,
  custid,
  CAST(amount AS DOUBLE) as amount,
  product,
  category,
  city,
  state,
  paytype
FROM inceptez_catalog.inputdb.txn_staging;
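
One way to check that the properties were applied is to read them back with `SHOW TBLPROPERTIES`, which returns `key`/`value` rows. The helper names below are assumptions for illustration, and `spark` is the notebook's session.

```python
def table_properties(spark, table_name):
    """Return the table's properties as a {key: value} dict."""
    rows = spark.sql(f"SHOW TBLPROPERTIES {table_name}").collect()
    return {row["key"]: row["value"] for row in rows}


def missing_properties(actual, expected):
    """Return the expected keys whose value is absent or different in `actual`."""
    return sorted(k for k, v in expected.items() if actual.get(k) != v)


# In a notebook (assumes the CTAS cell has already run):
# props = table_properties(spark, "inceptez_catalog.inputdb.txn_ctas_opt")
# print(missing_properties(props, {
#     "delta.autoOptimize.optimizeWrite": "true",
#     "delta.autoOptimize.autoCompact": "true",
# }))  # an empty list means both properties are set as expected
```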