# Dynamic Partition Overwrites with SQL REPLACE USING

## Purpose
- This notebook demonstrates Databricks-native dynamic partition overwrites using the **`INSERT INTO ... REPLACE USING`** syntax, followed by the related **`REPLACE ON`** syntax.

## References
- [REPLACE USING Syntax](https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-syntax-dml-insert-into#replace-using)
- [Dynamic Partition Overwrites](https://docs.databricks.com/aws/en/delta/selective-overwrite#dynamic-partition-overwrites-with-replace-using)

## Parameters
- `catalog`: Unity Catalog name (default: main)
- `schema`: Schema name (default: your_schema)

In [0]:
USE CATALOG IDENTIFIER(:catalog);
CREATE SCHEMA IF NOT EXISTS IDENTIFIER(:schema);
USE SCHEMA IDENTIFIER(:schema);

In [0]:
DROP TABLE IF EXISTS target_students;
DROP TABLE IF EXISTS source_new_students;
DROP TABLE IF EXISTS target_students_replace_on;
DROP TABLE IF EXISTS source_new_students_replace_on;

In [0]:
CREATE OR REPLACE TABLE target_students (country STRING, table_origin STRING)
USING delta
CLUSTER BY (country);

INSERT INTO target_students VALUES
  ('UK', 'target'),
  ('NL', 'target'),
  (NULL, 'target');

CREATE OR REPLACE TABLE source_new_students (country STRING, table_origin STRING)
USING delta;

INSERT INTO source_new_students VALUES
  ('FR', 'source'),
  ('UK', 'source1'),
  ('UK', 'source2'),
  (NULL, 'source');

## REPLACE USING

In [0]:
-- Replace target rows whose country matches a source row, then insert all source rows.
INSERT INTO TABLE target_students
  REPLACE USING (country)
  SELECT * FROM source_new_students;

SELECT * FROM target_students;
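
To see which `country` values were replaced, one option is to group the target by country and origin. This is a minimal sketch that assumes only the tables created above; `row_count` is an illustrative alias, not something from the original notebook.

```sql
-- Count rows per (country, table_origin) to see which keys now come from the source.
SELECT country, table_origin, COUNT(*) AS row_count
FROM target_students
GROUP BY country, table_origin
ORDER BY country NULLS FIRST, table_origin;
```

The `NULL` country rows are worth checking in this output, since the REPLACE ON section of this notebook uses the null-safe `<=>` operator explicitly.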

## Legacy DPO syntax (skipped because it fails)
* Neither Serverless GC nor Serverless SQL supports setting the Spark conf `spark.sql.sources.partitionOverwriteMode=dynamic`, so the cell below is skipped.

In [0]:
%skip
SET spark.sql.sources.partitionOverwriteMode=dynamic;
INSERT OVERWRITE TABLE target_students SELECT * FROM source_new_students;

## REPLACE ON

In [0]:
CREATE OR REPLACE TABLE target_students_replace_on (country STRING, table_origin STRING)
USING delta
CLUSTER BY (country);

INSERT INTO target_students_replace_on VALUES
  ('UK', 'target'),
  ('NL', 'target'),
  (NULL, 'target');

CREATE OR REPLACE TABLE source_new_students_replace_on (country STRING, table_origin STRING)
USING delta;

INSERT INTO source_new_students_replace_on VALUES
  ('FR', 'source'),
  ('UK', 'source1'),
  ('UK', 'source2'),
  (NULL, 'source');

In [0]:
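-- `<=>` is null-safe equality, so a NULL country in the target matches a NULL country in the source.
INSERT INTO TABLE target_students_replace_on AS t
  REPLACE ON (t.country <=> s.country)
  (SELECT * FROM source_new_students_replace_on) AS s;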

SELECT * FROM target_students_replace_on;

In [0]:
-- The extra predicate restricts replacement to target rows where country = 'UK'.
INSERT INTO TABLE target_students_replace_on AS t
  REPLACE ON (t.country = s.country AND t.country = 'UK')
  VALUES ('UK', 'source'), ('FR', 'source') AS s(country, table_origin);

SELECT * FROM target_students_replace_on;
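
To confirm what each statement did to the table, the Delta commit history can be inspected. This is a hedged sketch: it assumes the table is a Delta table as created above, and the `LIMIT 5` is an arbitrary choice for illustration.

```sql
-- List the most recent commits on the table, newest first.
DESCRIBE HISTORY target_students_replace_on LIMIT 5;
```

The `version`, `operation`, and `operationMetrics` columns in the result show which commit each insert produced and how many rows it touched.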

## Summary

### Key Takeaways:

1. **No Spark conf needed** - `spark.sql.sources.partitionOverwriteMode=dynamic` is NOT required
2. **All data layouts** - Partitioned tables as well as Liquid Clustered tables
3. **Works on Serverless** - Both Serverless Jobs and Serverless SQL support this natively
4. **Automatic detection** - Databricks automatically determines which partitions/clusters to overwrite based on the data
---
* With `INSERT INTO ... REPLACE USING`, customers can now run dynamic data overwrites:
    * On any set of columns
    * With any table data layout
    * Across all compute types

* Customers can now use dbt-databricks' `insert_overwrite` with
    * DBSQL
    * Liquid Clustering
