# Delta File – Checkpoint Example

**Checkpoint in Delta Lake**

### Overview
- **Purpose**: Reduce the time needed to reconstruct table state when the `_delta_log` directory holds many JSON commit files.
- **How it works**:
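  1. Writes a **Parquet checkpoint** file that summarizes the table state up to a given commit.
  2. Each checkpoint corresponds to a **specific table version**.
  3. Readers load the latest checkpoint and replay only the JSON commits written after it, instead of reading every JSON log file.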
- **Applies to** Delta tables written by both streaming and batch jobs.
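- **Best practice**: Keep checkpoints current for frequently updated Delta datasets. Delta Lake normally writes them automatically at an interval controlled by the `delta.checkpointInterval` table property.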
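
The steps above can be sketched in plain Python, without Spark. This is a simplified model, not Delta Lake's actual implementation: the `Action` class, the `replay` function, and the `part-N` file names are hypothetical, and a real log also carries metadata, protocol, and transaction actions. It only illustrates why a checkpoint at version 3 lets a reader skip the first four JSON commits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One simplified log action: a data file is added to or removed from the table."""
    kind: str  # "add" or "remove"
    path: str  # hypothetical data file name


def replay(commits, start_state=frozenset()):
    """Apply commits in version order on top of start_state and return the live files."""
    live = set(start_state)
    for actions in commits:
        for action in actions:
            if action.kind == "add":
                live.add(action.path)
            else:
                live.discard(action.path)
    return frozenset(live)


# Five commits, loosely mirroring a write, an overwrite, two updates, and a delete.
log = [
    [Action("add", "part-0")],                              # version 0
    [Action("remove", "part-0"), Action("add", "part-1")],  # version 1
    [Action("remove", "part-1"), Action("add", "part-2")],  # version 2
    [Action("remove", "part-2"), Action("add", "part-3")],  # version 3
    [Action("remove", "part-3"), Action("add", "part-4")],  # version 4
]

# Without a checkpoint, a reader replays every commit.
full_state = replay(log)

# With a checkpoint at version 3, a reader starts from that snapshot
# and replays only the commits after it (here, version 4).
checkpoint_version = 3
checkpoint_state = replay(log[: checkpoint_version + 1])
fast_state = replay(log[checkpoint_version + 1:], start_state=checkpoint_state)

commits_read_without = len(log)
commits_read_with = len(log) - (checkpoint_version + 1)
print(full_state == fast_state, commits_read_without, commits_read_with)
```

Both paths arrive at the same set of live files; the checkpointed path simply reads fewer commit entries to get there.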


In [0]:
# Step 1: Prepare sample transaction data
# `spark` and `display` are provided by the Databricks notebook runtime.
from pyspark.sql import Row

data = [
    Row(emp_id=1, emp_name="Venkat", dept="HR", salary=50000),
    Row(emp_id=2, emp_name="Sathish", dept="Finance", salary=60000),
    Row(emp_id=3, emp_name="Jay", dept="IT", salary=70000)
]
df_txn = spark.createDataFrame(data)
display(df_txn)


In [0]:
# Step 2: Write Data as Delta Files (not table)

# Save as Delta format in the given path
df_txn.write.format("delta").mode("overwrite").save("/Volumes/inceptez_catalog/inputdb/employee/dept_checkpoint")

# Verify the data written
df_verify = spark.read.format("delta").load("/Volumes/inceptez_catalog/inputdb/employee/dept_checkpoint")
display(df_verify)


In [0]:
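# Step 3: Overwrite the same Delta path, which adds another commit to the log
delta_path = "/Volumes/inceptez_catalog/inputdb/employee/dept_checkpoint"

# Overwrite existing data with the DataFrame prepared in Step 1
df_txn.write.format("delta").mode("overwrite").save(delta_path)

print("Data written to Delta path:", delta_path)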


In [0]:
%sql
-- Step 4: Update record directly using Delta path (no table)
UPDATE delta.`/Volumes/inceptez_catalog/inputdb/employee/dept_checkpoint`
SET salary = 55000
WHERE emp_name = 'Sathish';


In [0]:
%sql
-- Step 4 (continued): Update a second record directly using Delta path (no table)
UPDATE delta.`/Volumes/inceptez_catalog/inputdb/employee/dept_checkpoint`
SET salary = 82000
WHERE emp_name = 'Venkat';


In [0]:
%sql
-- Step 5: Delete record directly using Delta path (no table)
DELETE FROM delta.`/Volumes/inceptez_catalog/inputdb/employee/dept_checkpoint`
WHERE emp_name = 'Jay';


In [0]:
%sql
-- Step 6: Verify Data After Update/Delete
SELECT * FROM delta.`/Volumes/inceptez_catalog/inputdb/employee/dept_checkpoint`;


In [0]:
%sql
-- Step 7: Review the commit history (one row per table version)
DESC HISTORY delta.`/Volumes/inceptez_catalog/inputdb/employee/dept_checkpoint`;

In [0]:
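# Step 8: Create a checkpoint for the current table version
from delta.tables import DeltaTable

# Load the Delta table directly from its path
delta_tbl = DeltaTable.forPath(spark, "/Volumes/inceptez_catalog/inputdb/employee/dept_checkpoint")

# Create checkpoint.
# Assumption: the runtime exposes `checkpoint()` on `DeltaTable`. The documented
# open-source Python API does not list this method, so confirm it exists in your environment.
delta_tbl.checkpoint()

print("✅ Checkpoint created successfully.")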
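

To see what the checkpoint cell produced, one option is to look inside the table's `_delta_log` directory. The sketch below assumes the standard Delta log layout: 20-digit zero-padded version numbers, `<version>.json` commit files, `<version>.checkpoint.parquet` (or multi-part) checkpoint files, and a `_last_checkpoint` JSON pointer with a `version` field. The helper name `summarize_delta_log` is hypothetical, other checkpoint layouts are not handled, and the demo builds a throwaway directory with placeholder files rather than reading a real table.

```python
import json
import re
import tempfile
from pathlib import Path

# Assumed file-name patterns for commits and for single-file or multi-part checkpoints.
COMMIT_RE = re.compile(r"^(\d{20})\.json$")
CHECKPOINT_RE = re.compile(r"^(\d{20})\.checkpoint(?:\.\d+\.\d+)?\.parquet$")


def summarize_delta_log(log_dir):
    """Return (commit versions, checkpoint versions, version in _last_checkpoint or None)."""
    log_dir = Path(log_dir)
    commits, checkpoints = [], set()
    for entry in log_dir.iterdir():
        commit_match = COMMIT_RE.match(entry.name)
        checkpoint_match = CHECKPOINT_RE.match(entry.name)
        if commit_match:
            commits.append(int(commit_match.group(1)))
        elif checkpoint_match:
            checkpoints.add(int(checkpoint_match.group(1)))

    last_checkpoint = None
    pointer = log_dir / "_last_checkpoint"
    if pointer.exists():
        last_checkpoint = json.loads(pointer.read_text()).get("version")
    return sorted(commits), sorted(checkpoints), last_checkpoint


# Demo on a temporary directory with placeholder files for versions 0 to 4
# and a checkpoint at version 4.
with tempfile.TemporaryDirectory() as tmp:
    demo_log = Path(tmp) / "_delta_log"
    demo_log.mkdir()
    for version in range(5):
        (demo_log / f"{version:020d}.json").write_text("{}")
    (demo_log / f"{4:020d}.checkpoint.parquet").write_bytes(b"")
    (demo_log / "_last_checkpoint").write_text(json.dumps({"version": 4, "size": 3}))
    summary = summarize_delta_log(demo_log)

print(summary)
```

In this notebook the same function could be pointed at `/Volumes/inceptez_catalog/inputdb/employee/dept_checkpoint/_delta_log`, assuming the volume path is reachable through local file APIs on your cluster.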
