# Databricks Deletion Vectors

### **What are Deletion Vectors?**
A **Deletion Vector (DV)** in Delta Lake is a mechanism that enables **faster and more efficient row-level deletes and updates** without rewriting entire data files.

Traditionally, when a record was deleted or updated in a Delta table, the affected Parquet file had to be rewritten.  
With **Deletion Vectors**, Delta Lake instead marks deleted rows **logically**: it stores their positions in a separate **deletion vector file** alongside the data files rather than physically removing them right away.

This allows:
- **Faster DELETE, UPDATE, and MERGE operations**
- **Reduced data rewriting**
- **Better concurrency and scalability**

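Support for reading tables with deletion vectors arrived in **Delta Lake 2.3**, and write support for `DELETE` followed in **Delta Lake 2.4**. On Databricks, whether new tables get deletion vectors **by default** depends on the runtime version and workspace settings, so check the `delta.enableDeletionVectors` table property rather than assuming.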

---

### **How Deletion Vectors Work**
1. When a DELETE or UPDATE happens, Delta records the row positions that were removed or changed in a **deletion vector bitmap**.  
2. The underlying data file remains unchanged, but during reads, those rows are **logically filtered out**.  
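3. A later **OPTIMIZE** or `REORG TABLE ... APPLY (PURGE)` rewrites the affected files without the deleted rows. **VACUUM** does not rewrite anything; it removes the old data files and deletion vector files once they pass the retention period.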
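
The three steps above can be sketched in plain Python. This is only a toy model for building intuition, not how Delta Lake is implemented: the `DataFile` class and its method names are hypothetical, and a Python `set` stands in for the compressed bitmap that a real deletion vector uses.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Toy stand-in for one immutable data file plus its deletion vector."""
    rows: list                                  # row data, never modified in place
    deleted: set = field(default_factory=set)   # positions marked as deleted

    def delete_where(self, predicate):
        """Step 1: record matching row positions instead of rewriting rows."""
        for pos, row in enumerate(self.rows):
            if predicate(row):
                self.deleted.add(pos)

    def read(self):
        """Step 2: skip marked positions at read time."""
        return [row for pos, row in enumerate(self.rows)
                if pos not in self.deleted]

    def compact(self):
        """Step 3: write a new file that physically omits the deleted rows."""
        return DataFile(rows=self.read())


employees = DataFile(rows=[
    (1, "Kamath", 5000),
    (2, "Raghu", 6000),
    (3, "Avantika", 7000),
    (4, "Bhavana", 8000),
])
employees.delete_where(lambda row: row[0] == 4)

print(len(employees.rows))   # 4: the underlying file is unchanged
print(employees.read())      # the row with emp_id 4 is filtered out
compacted = employees.compact()
print(len(compacted.rows))   # 3: the rewritten file no longer holds the row
```

The point of the sketch is that the delete touches only the small `deleted` set, while the cost of rewriting rows is deferred to `compact`.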

---

### **Key Features**
| **Feature** | **Description** |
|--------------|----------------|
| **Logical Deletes** | Marks deleted rows using a bitmap instead of rewriting files. |
| **Faster Updates** | Reduces I/O during MERGE and UPDATE operations. |
| **Automatic Management** | Databricks manages DV creation and cleanup automatically. |
| **Compatibility** | Works with Delta tables using column mapping mode `name` or `id`. |
| **Compaction** | `OPTIMIZE` can compact files and apply their DVs, physically removing the deleted rows. |

---

### **When Deletion Vectors Are Created**
- During `DELETE`, `UPDATE`, or `MERGE` operations.
- When **Liquid Clustering** or **OPTIMIZE** is used, DVs may also appear for data reorganization.

---

### **Important Configuration**
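You can set a session-level default that controls whether newly created tables use deletion vectors. Existing tables are controlled per table with the `delta.enableDeletionVectors` table property.
```sql
-- Session default: enable or disable deletion vectors for newly created tables
SET spark.databricks.delta.properties.defaults.enableDeletionVectors = true;
```
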
```python
# Setup: create a Delta table with sample data
data = [
    (1, "Kamath", 5000),
    (2, "Raghu", 6000),
    (3, "Avantika", 7000),
    (4, "Bhavana", 8000),
]

columns = ["emp_id", "name", "salary"]

df = spark.createDataFrame(data, columns)
(
    df.write.format("delta")
    .mode("overwrite")
    .option("mergeSchema", "true")
    .saveAsTable("inceptez_catalog.inputdb.employee_dv_demo")
)

display(spark.table("inceptez_catalog.inputdb.employee_dv_demo"))
```


### Step 1: Delete a Record from the Table

```sql
%sql
-- Inspect the table before the delete (look for delta.enableDeletionVectors)
DESCRIBE EXTENDED inceptez_catalog.inputdb.employee_dv_demo;
```

```sql
%sql
DELETE FROM inceptez_catalog.inputdb.employee_dv_demo WHERE emp_id = 4;
```

```sql
%sql
DESCRIBE EXTENDED inceptez_catalog.inputdb.employee_dv_demo;
```

```sql
%sql
-- Check the operation metrics recorded for the DELETE
DESC HISTORY inceptez_catalog.inputdb.employee_dv_demo;
```

### Step 2: Disable Deletion Vector Feature for the Table

```python
spark.sql("""
ALTER TABLE inceptez_catalog.inputdb.employee_dv_demo
SET TBLPROPERTIES ('delta.enableDeletionVectors' = false)
""")

# Confirm the table property
display(spark.sql("DESCRIBE EXTENDED inceptez_catalog.inputdb.employee_dv_demo"))
```


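### Step 3: Perform a DELETE Operation

```python
# Delete a record. Deletion vectors are now disabled, so this DELETE
# rewrites the affected data file instead of writing a deletion vector.
# emp_id 4 was already removed in Step 1, so a different row is used here.
spark.sql("DELETE FROM inceptez_catalog.inputdb.employee_dv_demo WHERE emp_id = 3")
```

```sql
%sql
-- Compare this DELETE's operation metrics with the one from Step 1
DESC HISTORY inceptez_catalog.inputdb.employee_dv_demo;
```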

```sql
%sql
-- UPDATE while deletion vectors are disabled
UPDATE inceptez_catalog.inputdb.employee_dv_demo SET name = 'Vishal' WHERE emp_id = 2;
```

```sql
%sql
DESC HISTORY inceptez_catalog.inputdb.employee_dv_demo;
```

```sql
%sql
-- Re-enable deletion vectors for the table
ALTER TABLE inceptez_catalog.inputdb.employee_dv_demo
SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
```

```sql
%sql
-- UPDATE again, this time with deletion vectors enabled
UPDATE inceptez_catalog.inputdb.employee_dv_demo SET name = 'Vimal' WHERE emp_id = 2;
```

```sql
%sql
DESC HISTORY inceptez_catalog.inputdb.employee_dv_demo;
```
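
To compare the history entries from the steps above, it can help to summarize each commit's `operationMetrics` map. The helper below is a minimal sketch: `describe_commit` is a hypothetical name, the metric keys (`numDeletionVectorsAdded`, `numAddedFiles`, `numRemovedFiles`) are assumptions that may differ across runtime versions, and the sample values are made up for illustration rather than taken from a real run.

```python
def describe_commit(operation, metrics):
    """Summarize whether a commit wrote deletion vectors or rewrote files.

    `metrics` mimics the operationMetrics map shown by DESC HISTORY, where
    values are strings. The key names are assumptions; verify them against
    your own table's history output.
    """
    dvs_added = int(metrics.get("numDeletionVectorsAdded", 0))
    files_added = int(metrics.get("numAddedFiles", 0))
    files_removed = int(metrics.get("numRemovedFiles", 0))

    if dvs_added > 0:
        return f"{operation}: wrote {dvs_added} deletion vector(s)"
    if files_removed > 0:
        return (f"{operation}: rewrote files "
                f"({files_removed} removed, {files_added} added)")
    return f"{operation}: no data files affected"


# Hypothetical metric values, for illustration only
with_dv = {"numDeletionVectorsAdded": "1", "numAddedFiles": "0", "numRemovedFiles": "0"}
without_dv = {"numDeletionVectorsAdded": "0", "numAddedFiles": "1", "numRemovedFiles": "1"}

print(describe_commit("DELETE", with_dv))
print(describe_commit("DELETE", without_dv))
```

In a notebook, the same function could be applied to each row returned by `spark.sql("DESC HISTORY ...").collect()`, passing the row's `operation` and `operationMetrics` fields.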