# Delta Lake Versioning, Optimization, and Vacuuming

This notebook provides a hands-on overview of several essential features that Delta Lake brings to the data lakehouse.

---

## 🎯 Learning Objectives

By the end of this lab, you should be able to:

* Review table history
* Query previous table versions and roll back a table to a specific version
* Perform file compaction and Z-Ordering
* Preview files marked for permanent deletion and commit those deletes

---

## 🧪 Recreate the History of Your Bean Collection

This lab picks up where the last lab left off. The cells below replay all the operations from that lab in condensed form.

**For quick reference, the schema of the `beans` table created is:**

| Column    | Type    |
| --------- | ------- |
| name      | STRING  |
| color     | STRING  |
| grams     | FLOAT   |
| delicious | BOOLEAN |

---


### Create Table and Insert Initial Data

```sql
CREATE TABLE beans (
  name STRING,
  color STRING,
  grams FLOAT,
  delicious BOOLEAN
);

INSERT INTO beans VALUES
('black', 'black', 500, true),
('lentils', 'brown', 1000, true),
('jelly', 'rainbow', 42.5, false);
```

---

### Insert More Data

```sql
INSERT INTO beans VALUES
('pinto', 'brown', 1.5, true),
('green', 'green', 178.3, true),
('beanbag chair', 'white', 40000, false);
```

---

### Updates

```sql
UPDATE beans
SET delicious = true
WHERE name = 'jelly';

UPDATE beans
SET grams = 1500
WHERE name = 'pinto';
```

---

### Delete Data

```sql
DELETE FROM beans
WHERE delicious = false;
```

---

### Merge (Upsert) Operation

```sql
CREATE OR REPLACE TEMP VIEW new_beans(name, color, grams, delicious) AS VALUES
('black', 'black', 60.5, true),
('lentils', 'green', 500, true),
('kidney', 'red', 387.2, true),
('castor', 'brown', 25, false);

MERGE INTO beans a
USING new_beans b
ON a.name = b.name AND a.color = b.color
WHEN MATCHED THEN
  UPDATE SET grams = a.grams + b.grams
WHEN NOT MATCHED AND b.delicious = true THEN
  INSERT *
;
```
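To reason about what the `MERGE` above should produce, the following plain-Python sketch replays the same matching rules on in-memory rows. It is only a model of the semantics, not how Delta Lake executes the statement. The `merge_beans` helper and the tuple layout are assumptions made for illustration, the starting rows are what the earlier inserts, updates, and delete should leave behind, and the model assumes each `(name, color)` pair appears at most once in the source.

```python
# Toy model of the MERGE above; not Delta Lake's implementation.
# Rows are (name, color, grams, delicious) tuples.

def merge_beans(target, source):
    """Match on (name, color); add grams on a match; insert unmatched
    source rows only when they are delicious."""
    merged = {(name, color): (name, color, grams, delicious)
              for name, color, grams, delicious in target}
    for name, color, grams, delicious in source:
        key = (name, color)
        if key in merged:
            # WHEN MATCHED THEN UPDATE SET grams = a.grams + b.grams
            old = merged[key]
            merged[key] = (old[0], old[1], old[2] + grams, old[3])
        elif delicious:
            # WHEN NOT MATCHED AND b.delicious = true THEN INSERT *
            merged[key] = (name, color, grams, delicious)
    return list(merged.values())

# Expected contents of beans after the inserts, updates, and delete.
beans = [
    ("black", "black", 500.0, True),
    ("lentils", "brown", 1000.0, True),
    ("jelly", "rainbow", 42.5, True),
    ("pinto", "brown", 1500.0, True),
    ("green", "green", 178.3, True),
]
new_beans = [
    ("black", "black", 60.5, True),
    ("lentils", "green", 500.0, True),
    ("kidney", "red", 387.2, True),
    ("castor", "brown", 25.0, False),
]

result = merge_beans(beans, new_beans)
for row in result:
    print(row)
```

With this data only `black` matches on both name and color. The green `lentils` row is inserted next to the brown one, and `castor` is skipped because it is not delicious.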

---

In [0]:
%sql
CREATE OR REPLACE TABLE beans(
  name STRING,
  color STRING,
  grams FLOAT,
  delicious BOOLEAN
);

INSERT INTO beans VALUES
('black', 'black', 500, true),
('lentils', 'brown', 1000, true),
('jelly', 'rainbow', 42.5, false);

In [0]:
%sql
INSERT INTO beans VALUES
('pinto', 'brown',1.5, true),
('green', 'green',178.3, true),
('beanbag chair', 'white',40000, false);

In [0]:
%sql
UPDATE beans
SET delicious = true
WHERE name = 'jelly';

UPDATE beans
SET grams = 1500
WHERE name = 'pinto';

In [0]:
%sql
DELETE FROM beans
WHERE delicious = false;

In [0]:
%sql
CREATE OR REPLACE TEMP VIEW new_beans (name, color, grams, delicious) AS VALUES
('black', 'black', 60.5, true),
('lentils', 'green', 500, true),
('kidney', 'red', 387.2, true),
('castor', 'brown', 25, false);

In [0]:
%sql
SELECT * FROM new_beans;

In [0]:
%sql
--MERGE INTO students b
--USING updates u
--ON b.id = u.id
--WHEN MATCHED AND u.type = "update" THEN UPDATE SET *
--WHEN MATCHED AND u.type = "delete" THEN DELETE
--WHEN NOT MATCHED AND u.type = "insert" THEN INSERT *;

MERGE INTO beans b
USING new_beans nb
ON b.name = nb.name AND b.color = nb.color
WHEN MATCHED THEN 
    UPDATE SET b.grams = b.grams + nb.grams
WHEN NOT MATCHED AND nb.delicious = true
    THEN INSERT *;

## 🕒 Review the Table History

Delta Lake’s transaction log stores information about each transaction that modifies a table’s contents or settings.

### View the History

```sql
DESCRIBE HISTORY beans;
```

Each operation increments the table version.
Versions start at **0**.
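
As a quick cross-check, this sketch numbers the statements run so far in commit order. It assumes the `beans` table did not exist before this lab and that every statement produced exactly one commit. The labels are informal descriptions, not necessarily the exact strings that `DESCRIBE HISTORY` reports.

```python
# Statements executed in this lab, in commit order.
# Assumption: one commit each, starting from a table that did not exist.
statements = [
    "create table",
    "insert black, lentils, jelly",
    "insert pinto, green, beanbag chair",
    "update jelly to delicious",
    "update pinto grams",
    "delete non-delicious beans",
    "merge new_beans",
]

# Versions start at 0 and increase by one per commit.
versions = dict(enumerate(statements))
for version, statement in versions.items():
    print(version, statement)
```

Under these assumptions, version 4 is the table right after the second update and version 5 is the state just before the merge.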

---

## 🔍 Query a Specific Version

### Query by Version Number

```sql
SELECT * FROM beans VERSION AS OF 4;
```
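
One way to picture `VERSION AS OF` is as a replay of the transaction log up to the requested version. The sketch below is a toy model of that idea: each commit lists the data files it adds and removes, and the table at a given version is whatever files are still live after replaying the commits. The file names and the `files_as_of` helper are made up for illustration.

```python
# Toy transaction log: each commit lists data files added and removed.
# File names are hypothetical.
log = [
    {"add": [], "remove": []},                                   # version 0: create table
    {"add": ["part-a.parquet"], "remove": []},                   # version 1: insert
    {"add": ["part-b.parquet"], "remove": []},                   # version 2: insert
    {"add": ["part-c.parquet"], "remove": ["part-a.parquet"]},   # version 3: update rewrites a file
]

def files_as_of(log, version):
    """Return the set of data files that make up the table at `version`."""
    live = set()
    for commit in log[: version + 1]:
        live -= set(commit["remove"])
        live |= set(commit["add"])
    return live

print(sorted(files_as_of(log, 2)))
print(sorted(files_as_of(log, 3)))
```

In this model the file removed at version 3 is only dropped from the live set, not from storage, which is why version 2 stays queryable until that file is vacuumed.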

---

### Query by Timestamp

```sql
SELECT * FROM beans TIMESTAMP AS OF '2023-01-01T12:00:00.000+0000';
```

---

### Query a Path-Based Table at a Specific Version

```sql
SELECT * FROM delta.`/path/to/table` VERSION AS OF 3;
```

---

In [0]:
%sql
DESCRIBE HISTORY beans;

In [0]:
%sql
SELECT * FROM beans VERSION AS OF 4;

## ⏪ Restore a Previous Version

If you decide you want to roll back to a previous version of the table:

```sql
RESTORE TABLE beans VERSION AS OF 5;
```

---

## 📜 Confirm the Restore

```sql
DESCRIBE HISTORY beans;
```

You should see a new operation of type **RESTORE**.
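
The reason a new entry appears is that `RESTORE` does not erase the versions that came after the target; it records a new commit whose contents match the older version. The sketch below models that behaviour with a plain list of snapshots. The `restore` helper and the sample rows are illustrative assumptions, not Delta Lake internals.

```python
# Toy model: table history as a list of snapshots, index = version number.
history = [
    {"operation": "WRITE", "rows": ["black", "lentils"]},   # version 0
    {"operation": "DELETE", "rows": ["black"]},             # version 1
    {"operation": "MERGE", "rows": ["black", "kidney"]},    # version 2
]

def restore(history, version):
    """Append a new commit whose contents equal the snapshot at `version`."""
    target = history[version]
    history.append({"operation": "RESTORE", "rows": list(target["rows"])})
    return len(history) - 1  # version number of the new commit

new_version = restore(history, 1)
print(new_version, history[new_version])
```

The earlier `MERGE` entry is still in the history after the restore, so it remains possible to query or restore that version later.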

---



In [0]:
%sql
RESTORE TABLE beans VERSION AS OF 5;

In [0]:
%sql
SELECT * FROM beans;

In [0]:
%sql
DESCRIBE HISTORY beans;

## 🧱 File Compaction (OPTIMIZE)

Delta tables can accumulate many small files, which can slow down queries. `OPTIMIZE` compacts them into fewer, larger files.

### Optimize Table

```sql
OPTIMIZE beans;
```
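
Conceptually, compaction groups many small files into fewer, larger ones. The sketch below models that with a simple greedy grouping over file sizes. The `plan_compaction` helper, the `target_size` value, and the grouping strategy are illustrative assumptions and not a description of how `OPTIMIZE` actually selects files.

```python
def plan_compaction(file_sizes, target_size):
    """Greedily group file sizes so each group totals at most target_size.
    A file larger than target_size ends up in a group of its own."""
    groups, current, current_total = [], [], 0
    for size in file_sizes:
        # Start a new group when adding this file would exceed the target.
        if current and current_total + size > target_size:
            groups.append(current)
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        groups.append(current)
    return groups

small_files = [10, 20, 30, 40, 50, 60]  # sizes in MB (made-up numbers)
plan = plan_compaction(small_files, target_size=100)
print(plan)
```

Each inner list stands for one rewritten output file, so six input files become three in this example.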

---

### Optimize with Z-Ordering

Z-Ordering colocates related information in the same set of files.

```sql
OPTIMIZE beans
ZORDER BY (name);
```
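
To give a feel for how a Z-order curve keeps nearby values together, the sketch below interleaves the bits of two integer columns into a single Morton code and sorts by it. This is the textbook construction only. How Delta Lake maps column values onto the curve is not covered here, and with a single column such as `name` the effect is closer to a plain sort. The `z_value` helper and sample points are assumptions for illustration.

```python
def z_value(x, y, bits=8):
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code

points = [(3, 0), (0, 3), (1, 1), (2, 2), (0, 0)]
ordered = sorted(points, key=lambda p: z_value(*p))
print(ordered)
```

Rows that sort next to each other on this code would be written to the same or neighbouring files, which lets a filter on either column skip more files.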

---



In [0]:
%sql
OPTIMIZE beans;

In [0]:
%sql
OPTIMIZE beans
ZORDER BY (name);

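## 🧹 Cleaning Up Stale Data Files (VACUUM)

Looking at the operation metrics in the table history, you may notice that the table has produced many data files for a very small collection of data. Files that are no longer part of the current table version stay in storage until `VACUUM` removes them.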

### Preview Files Marked for Deletion

```sql
VACUUM beans RETAIN 168 HOURS DRY RUN;
```
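
As a rough mental model of what the dry run lists, a file is a candidate for deletion when it is no longer part of the current table version and it dropped out of the table longer ago than the retention window. The sketch below encodes that rule. The `vacuum_candidates` helper, the `removed_at` timestamps, and the file names are hypothetical, and the real command tracks file age in its own way.

```python
from datetime import datetime, timedelta

def vacuum_candidates(files, live, now, retain_hours=168):
    """Return files not in the current version whose removal predates the window."""
    cutoff = now - timedelta(hours=retain_hours)
    return sorted(
        name for name, removed_at in files.items()
        if name not in live and removed_at is not None and removed_at < cutoff
    )

now = datetime(2024, 1, 15, 12, 0)
files = {
    "part-a.parquet": datetime(2024, 1, 1, 12, 0),   # removed 14 days ago
    "part-b.parquet": datetime(2024, 1, 14, 12, 0),  # removed 1 day ago
    "part-c.parquet": None,                          # still part of the table
}
live = {"part-c.parquet"}

print(vacuum_candidates(files, live, now))                  # default 168-hour window
print(vacuum_candidates(files, live, now, retain_hours=0))  # no retention window
```

With a zero-hour window every file outside the current version becomes a candidate, which is why `RETAIN 0 HOURS` can break time travel to earlier versions.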

---

### Permanently Remove Old Files

> ⚠️ **WARNING**: This operation is irreversible.

```sql
VACUUM beans RETAIN 168 HOURS;
```

---

### Disable Retention Check (Demo Only)

```sql
SET spark.databricks.delta.retentionDurationCheck.enabled = false;

VACUUM beans RETAIN 0 HOURS;
```

> ⚠️ **NOTE**: This should only be done for demonstrations, **never in production**.

---

## ✅ Final Notes

* Delta Lake keeps **transaction logs** to support time travel
* `DESCRIBE HISTORY` is your best friend for auditing
* `OPTIMIZE` + `ZORDER` = better performance
* `VACUUM` controls storage growth, but must be used carefully

---

In [0]:
Nara 