

---

# 📌 Deep Dive into COPY Command Options

---

## 1. **Column Options: TRUNCATECOLUMNS & ENFORCE\_LENGTH**

When loading files, a text value can be **longer than the target column allows**. Two COPY options control what happens in that case.

### 🔹 `TRUNCATECOLUMNS`

* **What it does**:
  If a value is **longer than the target column length**, Snowflake **truncates** it instead of throwing an error.

* **Scenario:**
  You have a table:

  ```sql
  CREATE OR REPLACE TABLE customers (
    customer_id INT,
    name VARCHAR(10)  -- max 10 characters
  );
  ```

  Your file contains:

  ```
  1,Jonathan
  2,Alex
  3,Christopher
  ```

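  * If you run COPY with **no special option**:
    `Christopher` (11 chars) does not fit `VARCHAR(10)` → ❌ the load raises an error. Under the default `ON_ERROR = ABORT_STATEMENT`, the whole statement aborts and no rows are loaded.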

  * With `TRUNCATECOLUMNS = TRUE`:

    ```sql
    COPY INTO customers
    FROM @my_s3_stage/customers.csv
    FILE_FORMAT = (TYPE = CSV)
    TRUNCATECOLUMNS = TRUE;
    ```

    👉 `Christopher` becomes `Christophe` (first 10 chars).

* **Default?**
  ❌ By default, `TRUNCATECOLUMNS` is **FALSE** (over-length values raise a load error).
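
Before deciding to truncate, it can help to see which rows would fail. The sketch below reuses the `customers` table and `@my_s3_stage` stage from the scenario above. `VALIDATION_MODE` and `ON_ERROR` are standard COPY options, but the exact error text returned is not shown here because it may vary.

```sql
-- Dry run: report the errors COPY would hit, without loading any rows.
COPY INTO customers
FROM @my_s3_stage/customers.csv
FILE_FORMAT = (TYPE = CSV)
VALIDATION_MODE = RETURN_ERRORS;

-- Alternative to truncating: load the rows that fit and skip the rows that fail.
COPY INTO customers
FROM @my_s3_stage/customers.csv
FILE_FORMAT = (TYPE = CSV)
ON_ERROR = CONTINUE;
```

With the sample file, the validation run should flag the `Christopher` row, and the `ON_ERROR = CONTINUE` load should keep rows 1 and 2 only.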

---

### 🔹 `ENFORCE_LENGTH`

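* **What it does**:
  `ENFORCE_LENGTH` is alternative syntax for `TRUNCATECOLUMNS` with **reverse logic**, provided for compatibility with other systems. `ENFORCE_LENGTH = TRUE` raises an error when a value is longer than the target column; `ENFORCE_LENGTH = FALSE` truncates it.

* **Scenario:**
  Using the same `customers` table:

  ```sql
  COPY INTO customers
  FROM @my_s3_stage/customers.csv
  FILE_FORMAT = (TYPE = CSV)
  ENFORCE_LENGTH = TRUE;  -- same as the default behavior
  ```

  👉 In this case:

  * `Christopher` ❌ still causes a load error.
  * Why? Because `ENFORCE_LENGTH = TRUE` says: “Do not silently truncate.”
  * Set only one of the two options. The documentation describes them as one switch with opposite polarity and does not spell out which one wins when they contradict each other (for example `TRUNCATECOLUMNS = TRUE` with `ENFORCE_LENGTH = TRUE`), so do not rely on that combination.

* **Default?**
  ✅ By default, `ENFORCE_LENGTH` is **TRUE** (values longer than the column length raise an error).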
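
The reverse-logic form can be sketched as follows, again assuming the `customers` table and `@my_s3_stage` stage from the earlier scenario. `ENFORCE_LENGTH = FALSE` is expected to behave like `TRUNCATECOLUMNS = TRUE`; the follow-up query is just one way to check the result.

```sql
-- Reverse-logic spelling of TRUNCATECOLUMNS = TRUE.
COPY INTO customers
FROM @my_s3_stage/customers.csv
FILE_FORMAT = (TYPE = CSV)
ENFORCE_LENGTH = FALSE;

-- Check the result: every name should be at most 10 characters long.
SELECT customer_id, name, LENGTH(name) AS name_length
FROM customers
ORDER BY customer_id;
```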

---

### ✅ Quick Summary:

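* `TRUNCATECOLUMNS = TRUE` → Cuts values to fit.
* `ENFORCE_LENGTH = TRUE` → Raises an error for values that don’t fit.
* The two options are the same switch with opposite polarity: `ENFORCE_LENGTH = FALSE` also cuts values to fit.
* Default:

  * `TRUNCATECOLUMNS = FALSE`
  * `ENFORCE_LENGTH = TRUE`

👉 So, by default: an over-length value raises a load error, and `ON_ERROR` decides whether the statement aborts or the row is skipped.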

---

## 2. **FORCE Option**

### 🔹 What it does:

By default, Snowflake **remembers which files were already loaded** (to avoid duplicates).

* If you try to load the **same file again** (same name, unchanged contents), Snowflake will skip it.
* `FORCE = TRUE` tells Snowflake:
  👉 “Load it again, even if it was already loaded.”

---

### 🔹 Scenario

You have file `sales_20250912.csv` with 100 rows.

1. First load:

   ```sql
   COPY INTO sales_table
   FROM @my_s3_stage/sales/
   FILE_FORMAT = (TYPE = CSV);
   ```

   ✅ All 100 rows are loaded.

2. Run again **without FORCE**:

   ```sql
   COPY INTO sales_table
   FROM @my_s3_stage/sales/
   FILE_FORMAT = (TYPE = CSV);
   ```

   👉 Snowflake checks its load history, sees the file already processed → skips it.

3. Run again **with FORCE**:

   ```sql
   COPY INTO sales_table
   FROM @my_s3_stage/sales/
   FILE_FORMAT = (TYPE = CSV)
   FORCE = TRUE;
   ```

   👉 File is reloaded.
   Now your table has **200 rows** (duplicates).
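
To confirm what each run actually loaded, one option is the `COPY_HISTORY` table function. This is a sketch that assumes the table is named `sales_table` in the current schema and that the loads happened within the last 24 hours; adjust both to your case.

```sql
-- List the files COPY loaded into sales_table during the last 24 hours.
SELECT file_name, last_load_time, row_count, status
FROM TABLE(
  INFORMATION_SCHEMA.COPY_HISTORY(
    TABLE_NAME => 'SALES_TABLE',
    START_TIME => DATEADD(hours, -24, CURRENT_TIMESTAMP())
  )
)
ORDER BY last_load_time;
```

After the three runs above, `sales_20250912.csv` should appear for the first load and again for the `FORCE = TRUE` load, but not for the skipped second run.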

---

### ✅ When to use FORCE?

* Debugging / Testing (when you want to reload the same file).
* Backfill scenarios.
* Avoid in production unless you want duplicates!

---

## 3. **PURGE Option**

### 🔹 What it does:

After successfully loading files, Snowflake **deletes them from the stage** if `PURGE = TRUE`.

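* Works with **internal stages** (Snowflake-managed) or **named external stages** pointing to S3/GCS/Azure.
* On an external stage, the credentials behind the stage need permission to delete objects. If the purge fails, COPY does not report an error, so the files may silently remain.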

---

### 🔹 Scenario

You load files from `@my_s3_stage/sales/`.

1. **Without PURGE** (default):

   * Files remain in the stage.
   * You may accidentally reload them later if FORCE is used.

   ```sql
   COPY INTO sales_table
   FROM @my_s3_stage/sales/
   FILE_FORMAT = (TYPE = CSV);
   ```

   👉 Files are still sitting in the stage.

2. **With PURGE**:

   ```sql
   COPY INTO sales_table
   FROM @my_s3_stage/sales/
   FILE_FORMAT = (TYPE = CSV)
   PURGE = TRUE;
   ```

   👉 After successful load, Snowflake **removes the files** from the stage.

   * This saves storage cost.
   * Ensures no accidental reloading.
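
Whether files are still in the stage can be checked with `LIST`. The sketch assumes the same `@my_s3_stage/sales/` path as above; the `PATTERN` value is only an illustrative regular expression.

```sql
-- Show the files currently under the sales/ path of the stage.
LIST @my_s3_stage/sales/;

-- Narrow the listing to one day's files (hypothetical pattern).
LIST @my_s3_stage/sales/ PATTERN = '.*sales_20250912.*';
```

Running this before and after a `PURGE = TRUE` load is a simple way to verify that the purge actually happened.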

---

### ✅ When to use PURGE?

* If you treat the stage as **temporary storage**.
* If you want files **automatically cleaned up** after loading.

❌ Don’t use if:

* You need to keep raw files for auditing / replay.

---

# 🎯 Quick Recap

* **TRUNCATECOLUMNS** → Cuts values to fit column (default = FALSE).
* **ENFORCE\_LENGTH** → Reverse-logic form of TRUNCATECOLUMNS: TRUE raises an error for too-long values, FALSE truncates (default = TRUE).
* **FORCE** → Reloads already loaded files (default = FALSE).
* **PURGE** → Deletes files from stage after successful load (default = FALSE).

---


# 📌 Must-Ask Questions on TRUNCATECOLUMNS, ENFORCE\_LENGTH, FORCE, PURGE

---

### 1. **What’s the difference between `TRUNCATECOLUMNS` and `ENFORCE_LENGTH`?**

* **TRUNCATECOLUMNS = TRUE**
  → Cuts (truncates) values longer than column length instead of raising an error.

* **ENFORCE\_LENGTH = TRUE**
  → Raises an error for values that don’t fit the column. It is alternative syntax for `TRUNCATECOLUMNS` with reverse logic, so `ENFORCE_LENGTH = FALSE` truncates.

👉 **Scenario:**
Table column is `VARCHAR(10)`.
Row = `Christopher` (11 chars).

* With `TRUNCATECOLUMNS=TRUE` or `ENFORCE_LENGTH=FALSE` → `Christopher` becomes `Christophe`.
* With `ENFORCE_LENGTH=TRUE` (default) → ❌ load error.

✅ **Key difference**:

* There is no behavioral difference, only opposite polarity.
* `TRUNCATECOLUMNS = TRUE` → “I’ll cut it and keep it.”
* `ENFORCE_LENGTH = TRUE` → “Nope, doesn’t fit, raise an error.”

---

### 2. **If both are set (`TRUNCATECOLUMNS=TRUE` and `ENFORCE_LENGTH=TRUE`), which one wins?**

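👉 Don’t rely on either one winning.

The two options are one switch with opposite polarity, and this combination asks for contradictory behavior. The documentation does not spell out a precedence rule for it.

**Think of it like this:**

* `TRUNCATECOLUMNS = TRUE` and `ENFORCE_LENGTH = FALSE` both mean “truncate”.
* `TRUNCATECOLUMNS = FALSE` and `ENFORCE_LENGTH = TRUE` both mean “raise an error”.
* Set only one of them. If you inherit a statement that sets both, test it on a small file before trusting the result.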

---

### 3. **What happens if you load the same file twice without FORCE?**

By default, Snowflake tracks file load history (for 64 days).

* First load → file is processed ✅.
* Second load without `FORCE` → file is **skipped** (to prevent duplicates).

👉 Only new files get processed.

---

### 4. **What’s the risk of using `FORCE = TRUE` in production?**

* Snowflake will **reload already loaded files**.
* This can cause **duplicate rows** in your table.
* If your pipeline doesn’t handle deduplication → analytics will be wrong.

👉 **Scenario:**
Sales data file has 100 transactions.

* First load = 100 rows.
* Second load with FORCE = another 100 rows.
* Now you see 200 rows (duplicate transactions).

⚠️ Big risk in financial/transactional systems.
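
A quick duplicate check after a suspected double load might look like the sketch below. `transaction_id` is a hypothetical business key that does not appear in the original scenario; substitute whatever column uniquely identifies a row in your table.

```sql
-- transaction_id is a hypothetical key column; replace it with your table's real key.
SELECT transaction_id, COUNT(*) AS copies
FROM sales_table
GROUP BY transaction_id
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

An empty result suggests no key was loaded more than once. Any rows returned show which keys were duplicated and how many times.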

---

### 5. **How does PURGE affect data reprocessing strategies?**

* With `PURGE = TRUE`:

  * Files are **deleted from stage** after successful load.
  * You **cannot reload** the same files later (unless you re-upload them).
  * Saves storage cost, but limits flexibility.

* With `PURGE = FALSE` (default):

  * Files remain in stage.
  * You can reload or debug later.
  * But you need manual cleanup to avoid storage bloat.

👉 **Impact on reprocessing**:

* If you expect **audits / replays / debugging**, don’t use PURGE.
* If you treat stage as **temporary landing zone**, PURGE is safe.
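
If you keep `PURGE = FALSE`, the manual cleanup mentioned above can be sketched with `LIST` and `REMOVE`. The stage path comes from the earlier scenario, and the pattern (files from August 2025) is a hypothetical example.

```sql
-- Inspect first, then delete only the files you no longer need.
LIST @my_s3_stage/sales/ PATTERN = '.*sales_202508.*';
REMOVE @my_s3_stage/sales/ PATTERN = '.*sales_202508.*';
```

Listing with the same pattern before removing gives you a chance to confirm that only the intended files match.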

---

### 6. **Can PURGE be used on external stages (like S3)? What happens in that case?**

Yes ✅ you can use PURGE on external stages.

👉 **What happens:**

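* Snowflake issues a **delete request** to the external storage (S3, GCS, Azure), so the stage’s credentials need delete permission there.
* The file is removed from the bucket/container after a successful load. If the delete fails, COPY does not return an error, so check the stage if cleanup matters.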

⚠️ Be careful:

* This means Snowflake is managing deletion outside its own storage.
* If your data lake team also uses that S3 bucket, you might delete files they still need.

---

# 🎯 Quick Recap (Easy to Remember)

1. TRUNCATECOLUMNS = cut, ENFORCE\_LENGTH = error (same switch, opposite polarity; set only one).
2. Without FORCE → files load once only.
3. With FORCE → duplicates risk.
4. PURGE = auto-delete after load → good for temp stages, risky for audit needs.
5. PURGE on external stages = deletes from S3/GCS/Azure too.

---
