
**Scenario (Retail-X):** hourly `orders_YYYYMMDD.csv` files land in `s3://retailx-raw/orders/`. Some rows occasionally contain bad dates or non-numeric totals. You want to load everything that’s good, capture the bad rows in a place you can fix, then re-load only the fixed rows — safely, repeatably.

---

# TL;DR (five-step plan)

1. Create file format + stage.
2. Validate files first (`VALIDATION_MODE='RETURN_ERRORS'`) to see problems.
3. Load with `ON_ERROR='CONTINUE'` to let good rows in.
4. Capture error metadata with `VALIDATE(...)` or `RESULT_SCAN(LAST_QUERY_ID())`.
5. Extract actual bad rows (using `TRY_` functions when needed), write them to an internal stage or error table, fix them, then re-load only those fixed rows.
   (Examples follow.) ([Snowflake Documentation][1])

---

# A — Setup (file format, stage, table) — copy/paste these (edit names)

```sql
-- 1) File format for CSV
CREATE OR REPLACE FILE FORMAT retailx_csv_fmt
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  TRIM_SPACE = TRUE
  NULL_IF = ('', 'NULL', 'null')
  COMPRESSION = 'AUTO';

-- 2) External stage (assumes you already created STORAGE_INTEGRATION retailx_s3_int)
CREATE OR REPLACE STAGE retailx_orders_stage
  URL = 's3://retailx-raw/orders/'
  STORAGE_INTEGRATION = retailx_s3_int
  FILE_FORMAT = retailx_csv_fmt;

-- 3) Target (production) table
CREATE OR REPLACE TABLE retailx_orders (
  order_id    INTEGER,
  customer_id STRING,
  created_at  TIMESTAMP_NTZ,
  total_usd   NUMBER(10,2)
);
```

(If you don’t have `retailx_s3_int` yet, create it as a Storage Integration — we covered that flow earlier.) ([Snowflake Documentation][1])
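Before validating or loading anything, it can help to confirm that the stage and file format resolve the way you expect. The following is a minimal sketch, assuming the setup objects above exist and that at least one file matches the example date pattern used later in this guide:

```sql
-- List the staged files for one day (regex is matched against the file path)
LIST @retailx_orders_stage PATTERN = '.*orders_20250828.*[.]csv';

-- Preview a few rows as raw strings, with file name and row number
SELECT
  METADATA$FILENAME        AS source_file,
  METADATA$FILE_ROW_NUMBER AS source_row_num,
  t.$1, t.$2, t.$3, t.$4
FROM @retailx_orders_stage (
       FILE_FORMAT => 'retailx_csv_fmt',
       PATTERN     => '.*orders_20250828.*[.]csv'
     ) t
LIMIT 10;
```

If `LIST` returns nothing, the problem is in the stage URL, the integration, or the pattern, not in the data.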

---

# B — Step 1: **Validate** files (dry run) — find every error without loading

**Why first?** `VALIDATION_MODE` scans files and returns row-level error details — you learn what to fix before touching production tables. Use this when you want to inspect problems first. Example:

```sql
COPY INTO retailx_orders
  FROM @retailx_orders_stage
  FILE_FORMAT = (FORMAT_NAME = 'retailx_csv_fmt')
  PATTERN = '.*orders_20250828.*[.]csv'
  VALIDATION_MODE = 'RETURN_ERRORS';
```

* This returns a result-set listing each error (error message, file, line, column, row number). You can capture that result with `RESULT_SCAN(LAST_QUERY_ID())` or create a table from it. (Snowflake docs show the `VALIDATION_MODE` behavior.) ([Snowflake Documentation][1])

Example: save the validation output to an error-table:

```sql
CREATE OR REPLACE TABLE retailx_orders_validation_errors AS
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
```

`RESULT_SCAN(LAST_QUERY_ID())` reads the result set returned by the previous `COPY ... VALIDATION_MODE` command. Use that table to inspect full error details (error text, file, row, column). ([Snowflake Documentation][2])
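Because `LAST_QUERY_ID()` always points at the most recent statement in the session, any statement you run in between changes what it returns. One way to make the capture less fragile is to store the query ID in a session variable first. This is a sketch; the variable name `validation_qid` is an arbitrary choice, not something from the original workflow:

```sql
-- Run this as the very next statement after COPY ... VALIDATION_MODE
SET validation_qid = LAST_QUERY_ID();

-- The saved ID can now be used even after other statements have run
CREATE OR REPLACE TABLE retailx_orders_validation_errors AS
SELECT * FROM TABLE(RESULT_SCAN($validation_qid));
```

Query results are only kept for a limited time, so persist the output to a table soon after the validation run rather than relying on the query ID later.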

**Important caveat:** `VALIDATION_MODE` does *not* support `COPY` statements that perform SQL transformations during the load (e.g., `COPY INTO table (SELECT ...)`). If you must transform during load, do a validation strategy using a raw landing table or ad-hoc queries — see next sections. ([Snowflake Documentation][1])

---

# C — Step 2: Load the good rows — `ON_ERROR = 'CONTINUE'` (what it means)

```sql
COPY INTO retailx_orders
  FROM @retailx_orders_stage
  FILE_FORMAT = (FORMAT_NAME = 'retailx_csv_fmt')
  PATTERN = '.*orders_20250828.*[.]csv'
  ON_ERROR = 'CONTINUE';
```

**What `ON_ERROR = 'CONTINUE'` does**

* If a row in a file causes an error (parsing, type conversion, null into non-nullable, etc.), Snowflake **skips that row** and continues loading the remaining rows from that file and the subsequent files. It does **not** abort the whole command. This is different from the default `ABORT_STATEMENT` (which stops at first error) and `SKIP_FILE` (which skips the entire file on the first error). Use `CONTINUE` when you want to ingest all good rows while still discovering bad rows. ([Snowflake Documentation][3])
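`ON_ERROR` also accepts threshold forms of `SKIP_FILE`, which sit between "skip on the first error" and "never skip". The sketch below shows one of them on the same load; the threshold of 10 is an illustrative value, not a recommendation:

```sql
-- Tolerate a few bad rows per file, but skip any file
-- once it reaches 10 error rows
COPY INTO retailx_orders
  FROM @retailx_orders_stage
  FILE_FORMAT = (FORMAT_NAME = 'retailx_csv_fmt')
  PATTERN = '.*orders_20250828.*[.]csv'
  ON_ERROR = SKIP_FILE_10;

-- Percentage variant (use instead of the line above):
--   ON_ERROR = 'SKIP_FILE_5%'   -- skip a file when more than 5% of its rows fail
```

A threshold like this can be a reasonable guard against a file that is broken wholesale (wrong delimiter, wrong layout), where loading the few rows that happen to parse would be misleading.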

**After this run you will have:**

* Good rows inserted into `retailx_orders`.
* The load result (query) will report counts of rows loaded vs. errors.
* Metadata about the load is available via `TABLE(VALIDATE(...))`, `COPY_HISTORY`, `LOAD_HISTORY`, or by capturing `LAST_QUERY_ID()` results. ([Snowflake Documentation][4])

---

# D — Step 3: Collect the error details after the LOAD

You have two useful choices depending on whether you want *metadata about errors* or the *actual raw rows that failed*.

## D.1 — Metadata / error reasons (fast, built-in)

If you executed the `COPY` with `ON_ERROR='CONTINUE'`, capture all errors for that load run:

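```sql
-- Immediately after your COPY command, in the same session:
CREATE OR REPLACE TABLE retailx_orders_load_errors AS
SELECT * FROM TABLE(VALIDATE(retailx_orders, JOB_ID => '_last'));
```

`VALIDATE(table, JOB_ID => ...)` returns the errors encountered during the COPY job: one row per error with the error message, file, line, column name, row number, and the rejected record text (`REJECTED_RECORD`). Pass the table as an identifier rather than a quoted string, and pass either the COPY statement's query ID as a string or `'_last'` for the most recent load in the current session. Save the output to a table for triage. ([Snowflake Documentation][5])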
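Once the errors are in a table, a quick aggregate usually shows whether you are dealing with one systematic problem or many unrelated ones. The following sketch assumes the saved table keeps the column names from the documented `VALIDATE` output (`FILE`, `COLUMN_NAME`, `ERROR`); adjust the names if your table differs:

```sql
-- Count error rows per file, failing column, and error message
SELECT
  "FILE"        AS source_file,
  "COLUMN_NAME" AS failing_column,
  "ERROR"       AS error_message,
  COUNT(*)      AS error_rows
FROM retailx_orders_load_errors
GROUP BY 1, 2, 3
ORDER BY error_rows DESC;
```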

## D.2 — Capture the actual **raw rows** that failed (recommended if you want to fix & re-load only those rows)

`VALIDATE` and `RETURN_ERRORS` return **error metadata**, including a `REJECTED_RECORD` column, but that column holds the rejected record as one string rather than as separate fields, and neither works for loads that transform data. To get the problem rows as individual raw fields, query the stage with `TRY_` functions to detect rows that would fail when cast to the target types, then save those rows to a table or export them to a stage file.

Example: detect bad rows by using `TRY_CAST` / `TRY_TO_TIMESTAMP` (these return `NULL` instead of error):

```sql
CREATE OR REPLACE TABLE retailx_orders_bad_raw AS
SELECT
  t.$1::STRING AS order_id_raw,
  t.$2::STRING AS customer_id_raw,
  t.$3::STRING AS created_at_raw,
  t.$4::STRING AS total_usd_raw,
  METADATA$FILENAME           AS source_file,
  METADATA$FILE_ROW_NUMBER    AS source_row_num
FROM @retailx_orders_stage (
       FILE_FORMAT => 'retailx_csv_fmt',
       PATTERN     => '.*orders_20250828.*[.]csv'             -- scan only the files you loaded
     ) t
WHERE TRY_CAST(t.$1 AS INTEGER) IS NULL                       -- order_id not int
   OR TRY_TO_TIMESTAMP(t.$3, 'YYYY-MM-DD HH24:MI:SS') IS NULL -- bad date
   OR TRY_CAST(t.$4 AS NUMBER(10,2)) IS NULL;                 -- bad number (same type as target)
```

Notes:

* When you query a stage directly, positional columns are `$1,$2,...`. You can reference `METADATA$FILENAME` and `METADATA$FILE_ROW_NUMBER` to know exactly which file/row the problem came from. ([Snowflake Documentation][6], [The Information Lab Nederland][7])
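One caveat with a plain `TRY_... IS NULL` test is that it also flags rows where the field is simply empty, which the nullable target columns would accept. A possible refinement, sketched below, flags a field only when it has a value that fails to convert, and records which check failed. The `reject_reason` column and its messages are illustrative additions, and the sketch assumes empty fields arrive as `NULL` when the stage is queried:

```sql
SELECT *
FROM (
  SELECT
    t.$1::STRING             AS order_id_raw,
    t.$2::STRING             AS customer_id_raw,
    t.$3::STRING             AS created_at_raw,
    t.$4::STRING             AS total_usd_raw,
    METADATA$FILENAME        AS source_file,
    METADATA$FILE_ROW_NUMBER AS source_row_num,
    -- First failing check wins; NULL means the row looks loadable
    CASE
      WHEN t.$1 IS NOT NULL AND TRY_CAST(t.$1 AS INTEGER) IS NULL
        THEN 'order_id is not an integer'
      WHEN t.$3 IS NOT NULL AND TRY_TO_TIMESTAMP(t.$3, 'YYYY-MM-DD HH24:MI:SS') IS NULL
        THEN 'created_at is not a valid timestamp'
      WHEN t.$4 IS NOT NULL AND TRY_CAST(t.$4 AS NUMBER(10,2)) IS NULL
        THEN 'total_usd does not fit NUMBER(10,2)'
    END                      AS reject_reason
  FROM @retailx_orders_stage (
         FILE_FORMAT => 'retailx_csv_fmt',
         PATTERN     => '.*orders_20250828.*[.]csv'
       ) t
)
WHERE reject_reason IS NOT NULL;
```

Keep in mind that the explicit timestamp format here may be stricter than what `COPY` accepts with automatic format detection, so the two can disagree on borderline rows.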

### Export those bad rows to a file (optional)

If you prefer to fix rows offline or send to a data-fixer team, write them to an internal stage:

```sql
-- create a named internal stage for error files (if needed)
CREATE OR REPLACE STAGE retailx_error_stage;

-- unload the bad rows (results) to files in that internal stage
COPY INTO @retailx_error_stage/errors_
FROM ( SELECT * FROM retailx_orders_bad_raw )
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY = '"')
HEADER = TRUE;  -- HEADER is a copy option, so it goes outside FILE_FORMAT
```

Now you can `GET` these files from the internal stage, hand-fix them, re-stage them (to S3 or to the internal stage) and load them separately. `COPY INTO <location>` from a SELECT is supported. ([Snowflake Documentation][8])
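The download and re-upload round trip might look like the sketch below. `GET` and `PUT` run from SnowSQL or another client that supports file transfer, not from a web worksheet, and the local directory `/tmp/retailx_errors/` is a hypothetical path chosen for illustration:

```sql
-- See what the unload produced
LIST @retailx_error_stage;

-- Download the error files to a local directory (hypothetical path)
GET @retailx_error_stage file:///tmp/retailx_errors/ PATTERN = '.*errors_.*';

-- ...fix the rows locally and save them as orders_20250828_fixed.csv...

-- Upload the corrected file back to the internal stage
PUT file:///tmp/retailx_errors/orders_20250828_fixed.csv @retailx_error_stage
  AUTO_COMPRESS = TRUE;
```

The exported file carries a header row and the two extra source columns, so the corrected file needs to be reshaped to the four-column order layout (or loaded with a matching column list) before it goes into `retailx_orders`.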

---

# E — Step 4: Fix & re-load only the bad rows (safe ways)

You have two strong options — pick based on scale and automation needs.

## Option 1 — **Recommended for production**: Raw-landing → transform workflow (idempotent, easiest to reprocess)

1. **Load everything** into a `raw` table with loose schema (all `VARIANT` or `VARCHAR`) so the COPY **never fails** due to type mismatch.

   ```sql
   CREATE OR REPLACE TABLE raw_orders_rawcols (
     src_file STRING,
     file_row_number NUMBER,
     col1 STRING, col2 STRING, col3 STRING, col4 STRING,
     ingested_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
   );
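   -- COPY with a simple SELECT so file metadata lands next to the raw strings.
   -- Every data column stays a string, so type-conversion errors cannot occur here:
   COPY INTO raw_orders_rawcols (src_file, file_row_number, col1, col2, col3, col4)
     FROM (
       SELECT METADATA$FILENAME, METADATA$FILE_ROW_NUMBER, t.$1, t.$2, t.$3, t.$4
       FROM @retailx_orders_stage t
     )
     FILE_FORMAT = (FORMAT_NAME = 'retailx_csv_fmt')
     ON_ERROR = 'CONTINUE';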
   ```
2. Use SQL (with `TRY_` functions) to validate/clean rows inside Snowflake, INSERT clean rows into `retailx_orders`, and write the unfixable rows into `retailx_orders_error` for manual remediation.

   ```sql
   INSERT INTO retailx_orders
   SELECT
     TRY_CAST(col1 AS INTEGER),
     col2,
     TRY_TO_TIMESTAMP(col3,'YYYY-MM-DD HH24:MI:SS'),
     TRY_CAST(col4 AS NUMBER(10,2))
   FROM raw_orders_rawcols
   WHERE TRY_CAST(col1 AS INTEGER) IS NOT NULL
     AND TRY_TO_TIMESTAMP(col3,'YYYY-MM-DD HH24:MI:SS') IS NOT NULL
     AND TRY_CAST(col4 AS NUMBER(10,2)) IS NOT NULL;

   CREATE OR REPLACE TABLE retailx_orders_error AS
   SELECT * FROM raw_orders_rawcols
   WHERE NOT (TRY_CAST(col1 AS INTEGER) IS NOT NULL
          AND TRY_TO_TIMESTAMP(col3,'YYYY-MM-DD HH24:MI:SS') IS NOT NULL
          AND TRY_CAST(col4 AS NUMBER(10,2)) IS NOT NULL);
   ```

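**Why recommended:** the raw table keeps immutable source data for replay, the cleaning rules live in SQL that is easy to automate and re-run, and nothing reaches `retailx_orders` until it has passed validation. Note that the plain `INSERT` above re-inserts every valid raw row each time it runs, so make the transform idempotent (for example, filter on `ingested_at` or `MERGE` on `order_id`) before scheduling it. (This is the most robust production pattern.)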
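One way to make the raw-to-final step safe to re-run is to replace the `INSERT` with a `MERGE`. The sketch below assumes `order_id` uniquely identifies an order, which the source does not state; if that is not true for your feed, pick a different key:

```sql
MERGE INTO retailx_orders AS tgt
USING (
  SELECT
    TRY_CAST(col1 AS INTEGER)                       AS order_id,
    col2                                            AS customer_id,
    TRY_TO_TIMESTAMP(col3, 'YYYY-MM-DD HH24:MI:SS') AS created_at,
    TRY_CAST(col4 AS NUMBER(10,2))                  AS total_usd
  FROM raw_orders_rawcols
  WHERE TRY_CAST(col1 AS INTEGER) IS NOT NULL
    AND TRY_TO_TIMESTAMP(col3, 'YYYY-MM-DD HH24:MI:SS') IS NOT NULL
    AND TRY_CAST(col4 AS NUMBER(10,2)) IS NOT NULL
  -- Keep one row per order_id: the most recently ingested copy
  QUALIFY ROW_NUMBER() OVER (
            PARTITION BY TRY_CAST(col1 AS INTEGER)
            ORDER BY ingested_at DESC, file_row_number DESC
          ) = 1
) AS src
ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE SET
  customer_id = src.customer_id,
  created_at  = src.created_at,
  total_usd   = src.total_usd
WHEN NOT MATCHED THEN INSERT (order_id, customer_id, created_at, total_usd)
  VALUES (src.order_id, src.customer_id, src.created_at, src.total_usd);
```

Running this twice leaves the target unchanged the second time, and a corrected row that is re-ingested later overwrites the earlier version instead of duplicating it.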

## Option 2 — Lightweight ad-hoc: fix raw bad files and re-stage

1. Use the `retailx_orders_bad_raw` table from step D.2 or the exported CSV on `@retailx_error_stage`.
2. Fix the CSV rows (either manually or with a script).
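3. Re-stage the corrected rows under a new filename, e.g. `orders_20250828_fixed.csv`, in S3 or in an internal stage, so Snowflake's load metadata treats it as a file it has not loaded yet. The file should contain only the fixed rows, in the original four-column layout.
4. Run `COPY INTO retailx_orders FROM @retailx_orders_stage FILES = ('orders_20250828_fixed.csv')` to load only that file.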

**Important:** re-running `COPY` on the original, unchanged file loads nothing, because Snowflake's load metadata marks it as already loaded. `FORCE = TRUE` overrides that check, but it re-loads the whole file and duplicates the good rows that are already in the table. So prefer creating a new corrected file, or load the fixed rows directly from an error table into the main table via `INSERT`. ([Snowflake Documentation][1])
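The `INSERT` route can be sketched as follows, assuming the bad rows in `retailx_orders_bad_raw` have been corrected in place (for example with `UPDATE` statements) and that `order_id` is a usable uniqueness key, which is an assumption rather than something the source guarantees:

```sql
-- Load only rows that now convert cleanly and are not already in the target
INSERT INTO retailx_orders (order_id, customer_id, created_at, total_usd)
SELECT
  TRY_CAST(b.order_id_raw AS INTEGER),
  b.customer_id_raw,
  TRY_TO_TIMESTAMP(b.created_at_raw, 'YYYY-MM-DD HH24:MI:SS'),
  TRY_CAST(b.total_usd_raw AS NUMBER(10,2))
FROM retailx_orders_bad_raw b
WHERE TRY_CAST(b.order_id_raw AS INTEGER) IS NOT NULL
  AND TRY_TO_TIMESTAMP(b.created_at_raw, 'YYYY-MM-DD HH24:MI:SS') IS NOT NULL
  AND TRY_CAST(b.total_usd_raw AS NUMBER(10,2)) IS NOT NULL
  AND NOT EXISTS (
        SELECT 1
        FROM retailx_orders o
        WHERE o.order_id = TRY_CAST(b.order_id_raw AS INTEGER)
      );
```

Rows that still fail a check stay behind in `retailx_orders_bad_raw`, so the same statement can be re-run after the next round of fixes.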

---

# F — Audit and monitoring (where to look for load details)

* Use `TABLE(VALIDATE(table, JOB_ID => '<query_id>'))` or `TABLE(RESULT_SCAN(LAST_QUERY_ID()))` after validation loads to see row-level errors. ([Snowflake Documentation][5])
* Use `INFORMATION_SCHEMA.LOAD_HISTORY` or the `COPY_HISTORY` table function to see file-level load results and counts for the last 14 days. Example:

```sql
SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
       TABLE_NAME => 'RETAILX_ORDERS',
       START_TIME => DATEADD('day', -2, CURRENT_TIMESTAMP())
     ))
ORDER BY last_load_time DESC;
```

These show which files were processed, rows loaded, and error counts. ([Snowflake Documentation][4])
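The `LOAD_HISTORY` view mentioned above can answer the narrower question "which recent files had errors?". A minimal sketch, using the view's documented columns and run in the database that holds `retailx_orders`:

```sql
-- Files loaded into the table in the last two days that had at least one error
SELECT
  file_name,
  last_load_time,
  status,
  row_count,
  row_parsed,
  error_count,
  first_error_message
FROM INFORMATION_SCHEMA.LOAD_HISTORY
WHERE table_name = 'RETAILX_ORDERS'
  AND last_load_time >= DATEADD('day', -2, CURRENT_TIMESTAMP())
  AND error_count > 0
ORDER BY last_load_time DESC;
```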

---

# G — Short FAQ (quick answers)

Q — *Does `ON_ERROR = 'CONTINUE'` record the skipped row text?*
A — Not in the target table. The error output from `VALIDATE` or `VALIDATION_MODE` carries the error metadata (line/column/error) plus the rejected record as a single string; if you want the original row split into fields, it is best to query the stage (positional `$1,$2...`) with `TRY_` functions and capture the raw row into a table or stage. ([Snowflake Documentation][5])

Q — *Can I automate this whole fix-and-reload?*
A — Yes: wrap the steps in a stored procedure or Snowflake Task: 1) `COPY` with `VALIDATION_MODE` or with `ON_ERROR = 'CONTINUE'`, 2) `VALIDATE(...)` → store errors, 3) generate error file or error table, 4) run cleansing stored proc / external job to fix errors, 5) stage corrected files and `COPY` them in. You can also use Snowpipe for continuous loads and monitor its load errors through `COPY_HISTORY`. ([Snowflake Documentation][9])
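As a starting point, the load step alone could be scheduled with a task along these lines. The task name `retailx_orders_hourly_load` and the warehouse `retailx_load_wh` are hypothetical, and the schedule simply mirrors the hourly file arrival from the scenario:

```sql
CREATE OR REPLACE TASK retailx_orders_hourly_load
  WAREHOUSE = retailx_load_wh   -- hypothetical warehouse name
  SCHEDULE  = '60 MINUTE'
AS
  COPY INTO retailx_orders
    FROM @retailx_orders_stage
    FILE_FORMAT = (FORMAT_NAME = 'retailx_csv_fmt')
    PATTERN = '.*orders_.*[.]csv'
    ON_ERROR = 'CONTINUE';

-- Tasks are created suspended; resume to start the schedule
ALTER TASK retailx_orders_hourly_load RESUME;
```

Each run relies on Snowflake's load metadata to skip files it has already loaded, so the broad pattern does not cause reloads. Error capture and cleansing would be added as further statements in a stored procedure that the task calls.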

Q — *Should I ever use `FORCE=TRUE` to reload the same file after fixing it?*
A — Only if you’re sure you want to re-load **all** rows in that file (and deduplicate later). Prefer staging corrected rows as new files or using the raw-landing & SQL-cleanse approach to avoid duplicates. ([Snowflake Documentation][1])

---

# H — Recommended production pattern (summary)

* **Always** validate files first (especially the first time a feed runs). `VALIDATION_MODE='RETURN_ERRORS'` scans the files without loading any rows, so it cannot affect a production table, and it prevents surprises. ([Snowflake Documentation][1])
* **Prefer** landing raw data in a raw table (all strings/variants). Use idempotent transformation SQL to push clean data to final tables and write bad rows to an error table for human review. This is robust and easily automated.
* If you must operate ad-hoc, **extract bad rows** using SELECT from stage with `TRY_` functions, write them to an internal stage or table, fix them, and reload only the corrected file(s).
* Use `VALIDATE` / `RESULT_SCAN(LAST_QUERY_ID())` to capture error metadata and keep a persistent error log table for triage.

---

# References / docs (most relevant)

* `COPY INTO <table>` (`VALIDATION_MODE`, `ON_ERROR`, `PATTERN`, `PURGE`). ([Snowflake Documentation][1])
* `VALIDATE(table, JOB_ID => ...)` table function (returns all errors for a past COPY). ([Snowflake Documentation][5])
* `RESULT_SCAN(LAST_QUERY_ID())` to capture resultsets of the last command. ([Snowflake Documentation][2])
* `COPY_HISTORY` / load-history docs for auditing. ([Snowflake Documentation][4])
* Querying staged file metadata (`METADATA$FILENAME`, `METADATA$FILE_ROW_NUMBER`). ([Snowflake Documentation][6])

---


[1]: https://docs.snowflake.com/en/sql-reference/sql/copy-into-table "COPY INTO <table> | Snowflake Documentation"
[2]: https://docs.snowflake.com/en/sql-reference/functions/result_scan "RESULT_SCAN | Snowflake Documentation"
[3]: https://docs.snowflake.com/en/sql-reference/sql/copy-into-table "COPY INTO <table> | Snowflake Documentation"
[4]: https://docs.snowflake.com/en/sql-reference/functions/copy_history "COPY_HISTORY | Snowflake Documentation"
[5]: https://docs.snowflake.com/en/sql-reference/functions/validate "VALIDATE | Snowflake Documentation"
[6]: https://docs.snowflake.com/en/user-guide/querying-stage "Querying Data in Staged Files | Snowflake Documentation"
[7]: https://www.theinformationlab.nl/2022/08/26/snowflake-skills-3-metadata/ "Snowflake Skills #3 - Metadata | The Information Lab Nederland"
[8]: https://docs.snowflake.com/en/sql-reference/sql/copy-into-location "COPY INTO <location> | Snowflake Documentation"
[9]: https://docs.snowflake.com/en/user-guide/data-load-bulk-ts "Troubleshooting bulk data loads | Snowflake Documentation"
