# Advanced Delta Lake Concepts

This notebook covers advanced Delta Lake features:  

- **Time Travel** (audit changes & restore data)  
- **Compaction & Z-Ordering** (merge small files & co-locate related data for faster filtering)  
- **Vacuum** (cleaning up old data files)  

## Time Travel - Audit Data Changes

In [0]:
-- Check the history of table changes
USE CATALOG hive_metastore;
DESCRIBE HISTORY employees;

In [0]:
-- Query older versions using timestamp
SELECT * FROM employees TIMESTAMP AS OF '2025-08-26 13:24:47';

In [0]:
-- Query using version number
SELECT * FROM employees VERSION AS OF 2;
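
Time travel can also be used to audit what changed between two versions. The following is a minimal sketch, assuming versions 2 and 3 both still appear in the table history and have the same schema; it is not part of the original walkthrough.

```sql
-- Rows that exist in version 3 but not in version 2
SELECT * FROM employees VERSION AS OF 3
EXCEPT
SELECT * FROM employees VERSION AS OF 2;
```

Swapping the two version numbers shows rows that were removed or changed going from version 2 to version 3.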

In [0]:
-- Query using the @v shorthand for a version number
SELECT * FROM employees@v3;

In [0]:
-- Make a change
DELETE FROM employees;

-- See new state
SELECT * FROM employees;


In [0]:
-- Restore to an older version
RESTORE TABLE employees TO VERSION AS OF 3;
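
A restore can also target a point in time instead of a version number. This sketch reuses the timestamp from the earlier time travel query and assumes it still falls within the table's retained history.

```sql
-- Restore to the table state as of a given timestamp
RESTORE TABLE employees TO TIMESTAMP AS OF '2025-08-26 13:24:47';
```

Either form of `RESTORE` is itself recorded as a new entry in the table history, so it can be audited with `DESCRIBE HISTORY`.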

In [0]:
-- Confirm the deleted rows are back
SELECT * FROM employees;

In [0]:
-- The DELETE and RESTORE operations now appear in the history
DESCRIBE HISTORY employees;

## Compaction - Optimize Small Files

In [0]:
-- Inspect current file details
DESCRIBE DETAIL employees;

In [0]:
-- Compact small files into fewer files
OPTIMIZE employees;

In [0]:
-- Check details again
DESCRIBE DETAIL employees;
-- before: numFiles = 3, after: numFiles = 1

In [0]:
-- Insert more rows (to generate new small files)
INSERT INTO employees VALUES (8, 'Rita', 2000000);
INSERT INTO employees VALUES (9, 'Girsa', 2000000);

In [0]:
DESCRIBE DETAIL employees;
-- now numFiles increased again (3)

In [0]:
-- Optimize with ZORDER (co-locates rows with similar salary values in the same files)
OPTIMIZE employees ZORDER BY salary;
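
Z-ordering helps queries that filter on the Z-ordered column, because files whose value range cannot match the filter can be skipped. A minimal sketch of such a query follows; the threshold value is only an illustration, and on a table this small the difference will not be measurable.

```sql
-- Filter on the Z-ordered column so data skipping can apply
SELECT * FROM employees WHERE salary >= 2000000;
```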

In [0]:
DESCRIBE DETAIL employees;
-- numFiles reduced back to 1

In [0]:
-- See history of operations
DESCRIBE HISTORY employees;


In [0]:
%fs ls 'dbfs:/user/hive/warehouse/employees'

## Vacuum - Clean Up Old Data Files
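- Removes data files that are no longer referenced by the current table version and are older than the retention threshold, freeing up storage.
- Default retention threshold = 7 days (168 hours).
- After vacuum, time travel to versions whose data files were removed no longer works.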
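
Before deleting anything, the effect of a vacuum can be previewed. The sketch below assumes the same `employees` table; `DRY RUN` lists files that would be deleted without removing them, and `RETAIN 168 HOURS` spells out the 7-day default explicitly.

```sql
-- Preview which files would be removed, without deleting them
VACUUM employees DRY RUN;

-- Vacuum with the default retention written out explicitly
VACUUM employees RETAIN 168 HOURS;
```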

In [0]:
-- Vacuum with default retention
VACUUM employees;

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/employees'

In [0]:
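-- Expected to fail while the retention safety check is enabled
-- (0 hours is below the 7-day default threshold)
VACUUM employees RETAIN 0 HOURS;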

In [0]:
-- Dangerous option: disable the safety check, then remove every file
-- not referenced by the current table version
SET spark.databricks.delta.retentionDurationCheck.enabled = false;
VACUUM employees RETAIN 0 HOURS;
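
Since the setting above stays in effect for the rest of the session, it is safer to turn the check back on once the demonstration is done. A minimal sketch, using the same configuration key as above:

```sql
-- Re-enable the retention safety check
SET spark.databricks.delta.retentionDurationCheck.enabled = true;
```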

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/employees'

In [0]:
-- Table details after vacuum
DESCRIBE DETAIL employees;

In [0]:
-- Try querying an old version (will fail if files are gone)
SELECT * FROM employees@v1;

In [0]:
-- CleanUp
-- Drop table
DROP TABLE employees;

-- Validate (expected to fail: the table no longer exists)
SELECT * FROM employees;

In [0]:
%fs ls 'dbfs:/user/hive/warehouse/employees'