### VACUUM COMMAND IN DATABRICKS (DELTA LAKE)

#### PURPOSE:
- The VACUUM command in Databricks deletes obsolete files from a Delta table to free up storage.
- Obsolete files are created during operations like UPDATE, DELETE, MERGE, or OPTIMIZE.

#### WHY NEEDED:
- Delta tables maintain a transaction log (`_delta_log`) and keep the data files of older table versions for time travel and ACID compliance. Over time, these files accumulate and occupy storage.
- VACUUM removes data files that the current table version no longer references and that are older than the retention period.

#### COMMAND SYNTAX:
     VACUUM table_name [RETAIN num HOURS]
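
As a rough illustration of the syntax above, the statements below show the common forms. The table name `my_catalog.my_schema.my_table` is a placeholder. `DRY RUN` is an optional clause that is not listed in the syntax line above; it lists the files that would be deleted without deleting them.

```sql
-- Placeholder table name; substitute your own.
-- Use the table's configured retention (7 days by default).
VACUUM my_catalog.my_schema.my_table;

-- State the retention explicitly: 168 hours = 7 days.
VACUUM my_catalog.my_schema.my_table RETAIN 168 HOURS;

-- Preview only: list files eligible for deletion without deleting them.
VACUUM my_catalog.my_schema.my_table RETAIN 168 HOURS DRY RUN;
```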

#### DEFAULT RETENTION:
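- By default, Delta enforces a 7-day (168-hour) retention period to prevent accidental data loss.
- To override it, specify a different period using RETAIN num HOURS. A value shorter than 7 days is rejected unless the safety check `spark.databricks.delta.retentionDurationCheck.enabled` is set to false.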
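
The retention threshold can also be set per table instead of per command. This is a minimal sketch, assuming the `product_inventory` table created in Step 1 of this notebook already exists. `delta.deletedFileRetentionDuration` is the Delta table property that VACUUM falls back on when no RETAIN clause is given, and the 14-day value is only an example.

```sql
-- Example value only: keep unreferenced data files for 14 days instead of 7.
ALTER TABLE inceptez_catalog.inputdb.product_inventory
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 14 days');

-- Confirm the property is set on the table.
SHOW TBLPROPERTIES inceptez_catalog.inputdb.product_inventory;
```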

#### NOTES AND BEST PRACTICES:
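- Do not reduce retention below the default 7 days (168 hours) without understanding the consequences: files still needed by concurrent readers, long-running jobs, or time travel can be deleted.
- OPTIMIZE leaves the small files it compacted on storage. Run VACUUM afterwards to remove them once they are older than the retention period.
- VACUUM deletes only physical data files, not the transaction log or metadata. Older versions remain listed in the table history, but a version whose data files were deleted can no longer be read through time travel.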
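
The OPTIMIZE note can be sketched as follows, assuming the `product_inventory` table from this notebook exists. With the default retention, the files replaced by OPTIMIZE are too recent to be removed by a VACUUM that runs immediately afterwards.

```sql
-- Compact small files into larger ones; the replaced files stay on storage.
OPTIMIZE inceptez_catalog.inputdb.product_inventory;

-- Remove unreferenced files older than the retention period (7 days by default).
-- Files replaced by the OPTIMIZE above are kept until they age past that period.
VACUUM inceptez_catalog.inputdb.product_inventory;
```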

In [0]:
%sql
-- Step 1: Create a Delta table

CREATE TABLE IF NOT EXISTS inceptez_catalog.inputdb.product_inventory (
    product_id INT,
    product_name STRING,
    category STRING,
    price DOUBLE,
    quantity INT,
    updated_date DATE
)
USING DELTA;

In [0]:
%sql
-- Step 2: Insert initial data

INSERT INTO inceptez_catalog.inputdb.product_inventory VALUES
 (1, 'Laptop', 'Electronics', 65000, 10, '2025-10-01'),
 (2, 'Headphones', 'Electronics', 2500, 50, '2025-10-01'),
 (3, 'Desk Chair', 'Furniture', 4500, 20, '2025-10-01');



In [0]:
%sql
-- Step 3: Update some records (creates new Parquet file versions)

UPDATE inceptez_catalog.inputdb.product_inventory
SET price = price * 1.1,
    updated_date = '2025-10-05'
WHERE category = 'Electronics';

In [0]:
%sql
-- Step 4: View Delta table history (shows create, insert, update operations)

DESCRIBE HISTORY inceptez_catalog.inputdb.product_inventory;
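
The history output is what makes time travel possible, and time travel is the reason older data files are kept until VACUUM removes them. A small sketch, assuming the table was freshly created in Step 1 so that version 1 corresponds to the INSERT from Step 2; check the `version` column of DESCRIBE HISTORY and adjust the number if it differs.

```sql
-- Assumes version 1 is the INSERT from Step 2 (verify with DESCRIBE HISTORY).
-- Reads the table as it was before the UPDATE in Step 3.
SELECT product_id, product_name, price, updated_date
FROM inceptez_catalog.inputdb.product_inventory VERSION AS OF 1
ORDER BY product_id;
```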

In [0]:
%sql
-- Step 5: View current table details (file count, size, path)

DESCRIBE DETAIL inceptez_catalog.inputdb.product_inventory;

In [0]:
%sql
-- Step 6: Run VACUUM to permanently delete obsolete files
-- Default retention is 7 days for safety, so RETAIN 0 HOURS is rejected
-- unless the retention duration check is disabled (for demo only)
-- WARNING: Irreversible! Use only for demo or test

SET spark.databricks.delta.retentionDurationCheck.enabled = false;
VACUUM inceptez_catalog.inputdb.product_inventory RETAIN 0 HOURS;
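
After the demo, it is worth restoring the safety check and looking at what is left. The statements below are a sketch: the version number is an assumption (version 1 as the INSERT from Step 2), and whether the time travel read fails depends on whether VACUUM actually removed the files that version needs.

```sql
-- Restore the safety check that was disabled for the demo.
SET spark.databricks.delta.retentionDurationCheck.enabled = true;

-- Earlier versions are still listed, because VACUUM does not delete log entries.
DESCRIBE HISTORY inceptez_catalog.inputdb.product_inventory;

-- Assumed version number; this read may fail with a missing-file error
-- if the data files for that version were removed by VACUUM.
SELECT * FROM inceptez_catalog.inputdb.product_inventory VERSION AS OF 1;
```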