

---

# 1) High-level overview — what Snowpipe is and why it exists

**What is Snowpipe?**
Snowpipe is Snowflake’s *serverless continuous file ingestion* service. It watches a stage (internal or external like AWS S3) and loads new files into Snowflake automatically (micro-batches), following the `COPY INTO` logic defined in a named **PIPE**. Use it when you want new files available for analytics within minutes, without scheduling big batch COPY jobs. ([Snowflake Documentation][1])

**Purpose / problem Snowpipe solves**

* Removes manual/scheduled ETL runs and the long latency between file arrival and availability in the data warehouse.
* Automates ingest so data consumers (dashboards, ML features) see fresh data quickly.
* Offloads load compute to Snowflake’s serverless ingestion service — you don’t manage a warehouse for the ingestion itself. ([Snowflake Documentation][1])

**What problem it *doesn’t* solve** (common wrong assumption)

* It’s not a full streaming message platform like Kafka — it ingests files (micro-batches) or supports Snowpipe Streaming for near-real-time client streaming. For record-level, real-time event processing you might prefer a streaming solution + Snowpipe Streaming or Snowflake Streams + Tasks. ([Snowflake Documentation][2])

---

# 2) The story / real scenario (storytelling approach)

Imagine: Acme Payments collects transaction files from POS terminals. Each store uploads a compressed CSV every few minutes to an S3 bucket `acme-raw/transactions/`. Analysts want the transactions table updated within ~2–5 minutes of file arrival for fraud-detection dashboards.

We’ll use Snowpipe so each new file in S3 is picked up and loaded into `ACME_DB.PUBLIC.TRANSACTIONS` as soon as it’s created. We’ll configure:

1. External stage that points to the S3 bucket.
2. File format for the CSV.
3. Table to hold ingested rows.
4. A PIPE object whose `COPY INTO` defines how to load.
5. Auto-ingest via S3 event notifications sent to the Snowflake-managed SQS queue that the pipe exposes as its notification channel (directly, or through an SNS topic).

---

# 3) Snowpipe workflow (step by step, simplified)

1. **File appears** in S3 bucket (user SDK, app, or another service uploads file).
2. **S3 emits an event** (object created) and delivers it to the Snowflake-managed SQS queue shown as the pipe's `notification_channel`, either directly or through an SNS topic that this queue subscribes to. ([Snowflake Documentation][3])
3. Snowflake receives the notification and **queues the file for ingestion** into the pipe.
4. Snowpipe (Snowflake’s serverless ingestion compute) executes the `COPY INTO` defined in the pipe for those queued files and loads rows into the target table. (Snowflake prevents duplicate loads by tracking file load metadata.) ([Snowflake Documentation][1])
5. You can monitor loads (`COPY_HISTORY`, `VALIDATE_PIPE_LOAD`, Snowsight) and usage (`PIPE_USAGE_HISTORY`), and respond to failures (file format issues, permission problems). ([Snowflake Documentation][4])
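To make steps 2–4 concrete, here is a toy in-memory model in plain Python (no Snowflake or AWS calls; `ObjectCreatedEvent` and `ToyPipe` are hypothetical names). It assumes, as the Snowpipe docs describe, that a pipe's load metadata records each loaded file's path and name and skips that path on later notifications, even if the file was modified. It is a sketch for intuition, not Snowflake's implementation.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectCreatedEvent:
    """Simplified 'object created' notification (hypothetical shape)."""
    path: str
    etag: str  # changes when the object is overwritten


@dataclass
class ToyPipe:
    """Toy pipe: a notification queue plus per-pipe load metadata."""
    queue: deque = field(default_factory=deque)
    loaded_paths: set = field(default_factory=set)  # paths this pipe has loaded
    table: list = field(default_factory=list)       # stands in for the target table

    def notify(self, event: ObjectCreatedEvent) -> None:
        # Steps 2-3: the event arrives and the file is queued for ingestion.
        self.queue.append(event)

    def run_once(self) -> int:
        # Step 4: drain the queue and "COPY" each file not loaded before.
        # Dedupe is by path only, so an overwritten file (new ETag) is skipped.
        loaded_now = 0
        while self.queue:
            event = self.queue.popleft()
            if event.path in self.loaded_paths:
                continue
            self.table.append(event.path)
            self.loaded_paths.add(event.path)
            loaded_now += 1
        return loaded_now


pipe = ToyPipe()
pipe.notify(ObjectCreatedEvent("transactions/a.csv.gz", "etag-1"))
pipe.notify(ObjectCreatedEvent("transactions/a.csv.gz", "etag-1"))  # duplicate event
print(pipe.run_once())  # 1
pipe.notify(ObjectCreatedEvent("transactions/a.csv.gz", "etag-2"))  # overwritten file
print(pipe.run_once())  # 0: same path, so it is skipped
```

The second run shows why the document recommends unique file names: overwriting a file in place does not get its new contents loaded.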

---

# 4) Step-by-step demo (SQL + AWS items). Put these into your real account with adjustments.

> Note: the SQL below is runnable in Snowflake (change names to your account objects). For AWS, you'll use the AWS console/CLI to create the IAM role/policy, the S3 event notification, and (optionally) an SNS topic.

## Snowflake side — core SQL

1. **Create target table**

```sql
CREATE OR REPLACE TABLE acme_db.public.transactions (
  txn_id STRING,
  store_id STRING,
  txn_ts TIMESTAMP_NTZ,
  amount NUMBER(12,2),
  raw_json VARIANT
);
```

2. **Create file format** (CSV example)

```sql
CREATE OR REPLACE FILE FORMAT acme_db.public.ff_csv
  TYPE = 'CSV'
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  TRIM_SPACE = TRUE
  NULL_IF = ('NULL','');
```

3. **Create external stage pointing to S3**
   (The example below uses inline credentials for illustration only; in production use a STORAGE INTEGRATION with an IAM role.)

```sql
CREATE OR REPLACE STAGE acme_db.public.s3_stage
  URL = 's3://acme-raw/transactions/'
  FILE_FORMAT = (FORMAT_NAME = 'acme_db.public.ff_csv')
  -- best practice: STORAGE_INTEGRATION = <integration_name> instead of CREDENTIALS
  CREDENTIALS = (AWS_KEY_ID = 'XXXX' AWS_SECRET_KEY = 'YYYY');
```

Or use a Storage Integration (recommended) — Snowflake docs show steps to create and then configure IAM role/policy. ([Snowflake Documentation][3])

4. **Create the pipe** (this is the Snowpipe object)

```sql
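-- Assumes the CSV columns are, in order: txn_id, store_id, txn_ts, amount.
-- Listing the target columns avoids a column-count mismatch with the
-- 5-column table; raw_json stays NULL for CSV loads.
CREATE OR REPLACE PIPE acme_db.public.pipe_transactions
  AUTO_INGEST = TRUE
  AS
  COPY INTO acme_db.public.transactions (txn_id, store_id, txn_ts, amount)
  FROM (SELECT $1, $2, $3, $4 FROM @acme_db.public.s3_stage)
  FILE_FORMAT = (FORMAT_NAME = acme_db.public.ff_csv)
  ON_ERROR = 'CONTINUE';  -- Snowpipe's default is SKIP_FILE; CONTINUE keeps the good rows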
```

Notes:

* `AUTO_INGEST = TRUE` enables automatic notifications to trigger ingestion (you must configure S3 notifications/SQS properly). ([Snowflake Documentation][5])

5. **Get the pipe’s notification channel value** (you’ll need this to configure SQS)

```sql
DESC PIPE acme_db.public.pipe_transactions;
-- In the output, the NOTIFICATION_CHANNEL column holds the ARN of the Snowflake-managed SQS queue
```

You'll use the `notification_channel` value (the ARN of a Snowflake-managed SQS queue) as the destination of the S3 event notification, or let Snowflake subscribe it to your SNS topic (details below). ([InterWorks][6])

## AWS side — key steps (high level)

1. **Create an IAM role and policy** that allow Snowflake to read the S3 bucket (used by a storage integration). The official Snowflake doc has the exact JSON; for read-only loading it grants `s3:GetObject` and `s3:GetObjectVersion` on the objects and `s3:ListBucket` and `s3:GetBucketLocation` on the bucket. ([Snowflake Documentation][3])

2. **Route S3 events to Snowflake's SQS queue.** Snowflake owns this queue; you do not create one. Two common patterns:

   * **Direct (simplest):** in the bucket's event notification settings, send object-created events for the relevant prefix to an SQS queue destination and paste the pipe's `notification_channel` ARN as the queue ARN.
   * **Via SNS (when other consumers also need the same events):** create an SNS topic and point the S3 event notification at it; run `SELECT SYSTEM$GET_AWS_SNS_IAM_POLICY('<sns_topic_arn>');` and add the returned statement to the topic's access policy so Snowflake's queue may subscribe; then create the pipe with `AWS_SNS_TOPIC = '<sns_topic_arn>'`.

3. **Set S3 event types** so every completed upload triggers ingestion: `s3:ObjectCreated:*` (console: "All object create events") covers `Put`, `Post`, `Copy`, and `CompleteMultipartUpload`. If you list types individually, do not forget **CompleteMultipartUpload**; without it, multipart uploads of large files never trigger ingestion. This is a very common gotcha. ([Snowflake Documentation][7])

4. **Provide/verify the notification channel**: when `AUTO_INGEST = TRUE`, the `notification_channel` column of `DESC PIPE` / `SHOW PIPES` holds the ARN of the Snowflake-managed SQS queue. It is the *destination* of your S3 (or SNS) notifications; there is no queue policy of your own to edit. ([InterWorks][6])
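For the direct pattern (bucket events sent straight to the queue named in the pipe's `notification_channel`), the bucket-side configuration is a small JSON document. The sketch below builds it as a Python dict in the shape S3's notification configuration uses (`QueueConfigurations`, `QueueArn`, `Events`, `Filter`); the rule `Id`, the prefix, and the placeholder ARN are assumptions, and the ARN check only covers the standard `aws` partition.

```python
import re

# Standard-partition SQS ARN, e.g. arn:aws:sqs:us-east-1:123456789012:sf-snowpipe-EXAMPLE
SQS_ARN = re.compile(r"^arn:aws:sqs:[a-z0-9-]+:\d{12}:[A-Za-z0-9_-]{1,80}$")


def snowpipe_notification_config(notification_channel: str, prefix: str) -> dict:
    """Return an S3 NotificationConfiguration dict that sends every object-created
    event under `prefix` to the SQS queue named in the pipe's notification_channel."""
    if not SQS_ARN.match(notification_channel):
        raise ValueError(f"not an SQS ARN: {notification_channel!r}")
    return {
        "QueueConfigurations": [
            {
                "Id": "snowpipe-auto-ingest",  # hypothetical rule name
                "QueueArn": notification_channel,
                # Wildcard covers Put, Post, Copy and CompleteMultipartUpload.
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "prefix", "Value": prefix}]}
                },
            }
        ]
    }


config = snowpipe_notification_config(
    "arn:aws:sqs:us-east-1:123456789012:sf-snowpipe-EXAMPLE",  # placeholder ARN
    "transactions/",
)
print(config["QueueConfigurations"][0]["Events"])  # ['s3:ObjectCreated:*']
```

With boto3, a dict like this can be passed as `NotificationConfiguration` to `put_bucket_notification_configuration` for the `acme-raw` bucket. That call replaces the bucket's whole notification configuration, so merge in any existing rules first.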

## Alternative to auto notifications: REST API / insertFiles

If you cannot rely on S3 events, you can programmatically call Snowpipe REST API `insertFiles` (via Snowflake-provided client SDKs) to tell Snowflake which files to load. That’s useful when your system knows the file names and wants to notify Snowflake directly. ([Snowflake Documentation][8])

---

# 5) Important operational commands & day-to-day queries (cheat sheet)

These are the queries you’ll use every day to manage/monitor Snowpipe:

* **Show pipes**

```sql
SHOW PIPES IN DATABASE acme_db;
```

* **Describe pipe** (copy statement, notification channel)

```sql
DESC PIPE acme_db.public.pipe_transactions;
```

> Look for `NOTIFICATION_CHANNEL` in the output. ([Snowflake Documentation][5])

* **Get pipe status** (JSON)

```sql
SELECT SYSTEM$PIPE_STATUS('acme_db.public.pipe_transactions');
```

> Useful to see `executionState` (e.g., `RUNNING`, `PAUSED`, or one of the `STOPPED_*` / `STALLED_*` states) and backlog fields such as `pendingFileCount`. ([Snowflake Documentation][9])

* **Pause/resume a pipe**

```sql
ALTER PIPE acme_db.public.pipe_transactions SET PIPE_EXECUTION_PAUSED = TRUE;
ALTER PIPE acme_db.public.pipe_transactions SET PIPE_EXECUTION_PAUSED = FALSE;
```

* **Check pipe usage (short window)**
  *Table function* returning credits, bytes, and files per pipe for up to the last 14 days (default: the last 10 minutes)

```sql
SELECT *
FROM TABLE(acme_db.INFORMATION_SCHEMA.PIPE_USAGE_HISTORY(
   DATE_RANGE_START => DATEADD('day', -1, CURRENT_TIMESTAMP()),
   DATE_RANGE_END => CURRENT_TIMESTAMP(),
   PIPE_NAME => 'acme_db.public.pipe_transactions'
));
```

or account-level:

```sql
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.PIPE_USAGE_HISTORY
WHERE PIPE_NAME = 'PIPE_TRANSACTIONS'  -- pipe name only, not database.schema-qualified
  AND START_TIME >= '2025-10-01';
```

(Use the account-usage view for up to 365 days.) ([Snowflake Documentation][4])

* **COPY_HISTORY / LOAD_HISTORY**: inspect COPY statements and file loads. `COPY_HISTORY` covers both bulk `COPY INTO` and Snowpipe loads; the Information Schema `LOAD_HISTORY` view covers bulk COPY only and does not return Snowpipe loads.

* **Show failed loads**: check `COPY_HISTORY` (`STATUS`, `FIRST_ERROR_MESSAGE`) and `VALIDATE_PIPE_LOAD` for the pipe's errors; Snowsight also shows this visually. ([Snowflake Documentation][10])

---

# 6) Costs and compute behaviour (short, essential)

* **Serverless ingestion**: Snowpipe runs on Snowflake-managed serverless compute, so you don't configure a virtual warehouse for it (unlike a manual `COPY INTO`, which runs on your warehouse). Snowpipe has its own billing model: the current, simpler model charges a fixed number of credits per GB ingested, while the older model charged for serverless compute time plus a per-file overhead. Monitor `PIPE_USAGE_HISTORY` and your billing lines to estimate cost, and plan for cost monitoring, because continuous micro-batches add up. ([Snowflake Documentation][11])
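Under a flat per-GB model, a cost estimate is simple arithmetic over the bytes a pipe ingested. The sketch below sums byte counts (for example the `BYTES_INSERTED` values from `PIPE_USAGE_HISTORY`) and multiplies by a rate you supply; the rate is deliberately a parameter rather than a published number, and treating a GB as 1024³ bytes is an assumption about how usage is rounded.

```python
from typing import Iterable

BYTES_PER_GB = 1024 ** 3  # assumption: binary GB; check how your bill rounds


def estimate_snowpipe_credits(bytes_inserted: Iterable[int], credits_per_gb: float) -> float:
    """Estimate credits under a flat credits-per-GB Snowpipe model.

    bytes_inserted: e.g. the BYTES_INSERTED values from PIPE_USAGE_HISTORY rows.
    credits_per_gb: your contracted/current rate (not hard-coded on purpose).
    """
    if credits_per_gb < 0:
        raise ValueError("credits_per_gb must be non-negative")
    total = 0
    for b in bytes_inserted:
        if b < 0:
            raise ValueError("byte counts must be non-negative")
        total += b
    return total / BYTES_PER_GB * credits_per_gb


# Hypothetical: one day of 50 GB ingested at a made-up rate of 0.01 credits/GB.
print(estimate_snowpipe_credits([40 * BYTES_PER_GB, 10 * BYTES_PER_GB], 0.01))
```

Note that under a per-GB model the number of files no longer drives the estimate; it mattered under the older per-file overhead.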

---

# 7) Common pitfalls & troubleshooting (with solutions)

1. **No ingestion after upload**

   * Missing S3 event type: large multipart uploads fire `s3:ObjectCreated:CompleteMultipartUpload`, not `Put`. Add it, or use `s3:ObjectCreated:*`. ([Snowflake Documentation][7])
   * Wrong destination: the S3 event notification (or the SNS subscription) must target the exact SQS ARN from the pipe's `notification_channel`. Re-run `DESC PIPE` and compare it with the bucket's event configuration. ([InterWorks][6])

2. **Duplicate loads**

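   * Snowpipe keeps per-pipe load metadata (the path and name of each loaded file) for 14 days and skips any file whose path and name it has already loaded, **even if the file was later modified** (new ETag). So overwriting a file under the same name usually means its new contents are silently *not* loaded, while real duplicates come from the same data written under two different names or from `ALTER PIPE ... REFRESH` after a pipe was recreated. Best practice: write files with unique names (e.g., include a UUID or timestamp) and still dedupe in downstream processing.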

3. **Permission issues**

   * Ensure Snowflake has permission to read the S3 bucket (if using credentials or storage integration IAM role). Use the exact IAM policy from Snowflake docs. ([Snowflake Documentation][3])

4. **High bill / unexpected costs**

   * Check `PIPE_USAGE_HISTORY` and `ACCOUNT_USAGE.PIPE_USAGE_HISTORY` for billed bytes and credits. External processes (external tables auto metadata refreshes, etc.) can also show as “NULL” pipe entries — investigate if you see unexpected costs. ([Snowflake Documentation][12])

5. **File format errors**

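   * `COPY` has no option that writes failing rows to a table. Choose `ON_ERROR` deliberately (`SKIP_FILE`, the Snowpipe default, or `CONTINUE` to keep the good rows), find failures with `VALIDATE_PIPE_LOAD` and the `FIRST_ERROR_MESSAGE` column of `COPY_HISTORY`, and copy them into an error table with a downstream job if you need one.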
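For the duplicate-load advice above (unique file names), a minimal sketch of a writer-side naming helper: date path, time, store ID, and a random UUID. The key layout is a hypothetical convention, not something Snowpipe requires; the date-based path also makes it easy to target a narrowed `ALTER PIPE ... REFRESH PREFIX = '...'` later.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def unique_object_key(prefix: str, store_id: str, now: Optional[datetime] = None) -> str:
    """Build an S3 key that is never reused: date path + time + store + random UUID."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y/%m/%d/%H%M%S")
    return f"{prefix.rstrip('/')}/{stamp}_{store_id}_{uuid.uuid4().hex}.csv.gz"


print(unique_object_key("transactions/", "store-042"))
```

Because the UUID differs on every call, two uploads in the same second from the same store still get distinct keys.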

---

# 8) Snowpipe advanced topics (brief pointers)

* **Snowpipe Streaming** — separate capability for very low latency client streaming ingestion; billed differently and supports near-real-time streaming clients. Use when you need continuous record-by-record ingestion rather than file micro-batches. ([Snowflake Documentation][2])

* **REST API insertFiles** — useful when you want to control which files to ingest (push notification from your app). Works with Snowflake SDKs (Java/Python). ([Snowflake Documentation][8])

* **Snowsight UI**: view pipes, their status, and their load history in the browser; useful for spotting pipes that stall or fail. Helpful for operations. ([Snowflake Documentation][10])

---

# 9) Must-know SQL snippets (copy-paste friendly)

Create a minimal pipe (example):

```sql
CREATE OR REPLACE PIPE mydb.public.my_pipe AUTO_INGEST = TRUE AS
COPY INTO mydb.public.target_table
FROM @mydb.public.s3_stage
FILE_FORMAT = (FORMAT_NAME = mydb.public.ff_csv)
ON_ERROR = 'CONTINUE';
```

Pause a pipe:

```sql
ALTER PIPE mydb.public.my_pipe SET PIPE_EXECUTION_PAUSED = TRUE;
```

Get status:

```sql
SELECT SYSTEM$PIPE_STATUS('mydb.public.my_pipe');
```

See recent usage (10 minutes default):

```sql
SELECT * FROM TABLE(INFORMATION_SCHEMA.PIPE_USAGE_HISTORY());
```

Account usage (1 year):

```sql
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.PIPE_USAGE_HISTORY WHERE PIPE_NAME = 'MY_PIPE' ORDER BY START_TIME DESC LIMIT 100;
```

---

# 10) Self-test questions (you should be able to answer these)

(Short answers expected when you’re ready.)

1. What is the role of a `PIPE` in Snowpipe?
2. How does Snowpipe differ from a scheduled `COPY INTO` job (compute, latency, management)?
3. How do you configure Snowpipe auto-ingest for AWS S3 — what AWS objects are needed? (Be specific.)
4. Where do you find the Snowpipe `notification_channel`? How is it used?
5. What S3 event types must you include to ensure large multipart uploads are ingested?
6. How do you pause a pipe, and why might you do that?
7. How does Snowpipe billing work in the latest model? Where do you check billed bytes?
8. When would you use the Snowpipe REST API `insertFiles` instead of auto-ingest?
9. How do you diagnose a file that failed to load (what views and logs)?
10. What best practices prevent duplicate/partial loads and unexpected costs?

(If you can answer these out loud — short, correct, and concise — you’re solid.)

---

# 11) Quick best-practice checklist (operational)

* Use **Storage Integrations + IAM role** instead of embedding keys in stages. ([Snowflake Documentation][3])
* Name files uniquely (timestamp + uuid) to avoid re-ingestion ambiguity.
* Include `CompleteMultipartUpload` in S3 event config. ([Snowflake Documentation][7])
* Use `PIPE_USAGE_HISTORY` and the Account Usage views to track costs. ([Snowflake Documentation][12])
* Monitor `SYSTEM$PIPE_STATUS` and alert whenever `executionState` is anything other than `RUNNING` (e.g., `PAUSED`, `STOPPED_*`, `STALLED_*`). ([Snowflake Documentation][9])

---

# 12) Short troubleshooting playbook (when Snowpipe doesn’t load)

1. `SHOW PIPES; DESC PIPE <pipe>;` — confirm pipe exists and `AUTO_INGEST` true and copy statement correct. ([Snowflake Documentation][5])
2. Check S3 bucket event configuration — includes `Put` and `CompleteMultipartUpload`. ([Snowflake Documentation][7])
3. Check that the bucket's event notification (or your SNS topic subscription) targets the SQS ARN shown in the pipe's `notification_channel`. ([InterWorks][6])
4. Inspect `VALIDATE_PIPE_LOAD` and `COPY_HISTORY` for errors and failed files, and `PIPE_USAGE_HISTORY` for activity and billed bytes. ([Snowflake Documentation][4])
5. If using REST API, check your `insertFiles` implementation and credentials. ([Snowflake Documentation][8])

---

# 13) Where to read / references (official docs to bookmark)

* Snowpipe introduction & concepts. ([Snowflake Documentation][1])
* Automating Snowpipe for Amazon S3 (IAM, bucket config). ([Snowflake Documentation][3])
* `CREATE PIPE` syntax & details. ([Snowflake Documentation][5])
* Snowpipe billing (fixed credits per GB). ([Snowflake Documentation][11])
* Snowpipe troubleshooting (multipart uploads, notifications). ([Snowflake Documentation][7])

---

# 14) Final guidance — how I’d teach this in a hands-on lab

1. **Lab 1 (30 mins)**: Create a simple S3 bucket + sample CSVs and a stage + pipe with `AUTO_INGEST=FALSE`. Manually `COPY INTO` to understand COPY behavior and errors.
2. **Lab 2 (45 mins)**: Enable `AUTO_INGEST=TRUE`, use `DESC PIPE` to get the notification channel, point the S3 event notification at it (directly or via an SNS topic), upload files, and watch `COPY_HISTORY` and `PIPE_USAGE_HISTORY`. Fix a broken file to practice troubleshooting.
3. **Lab 3 (30 mins)**: Use the `insertFiles` REST API to programmatically queue a file and compare latency and cost to auto-ingest.
4. **Lab 4 (optional)**: Try Snowpipe Streaming for a sample streaming client and compare architecture & cost.

---



[1]: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-intro "Snowpipe"
[2]: https://docs.snowflake.com/en/user-guide/snowpipe-streaming/data-load-snowpipe-streaming-overview "Snowpipe Streaming"
[3]: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-s3 "Automating Snowpipe for Amazon S3"
[4]: https://docs.snowflake.com/en/sql-reference/functions/pipe_usage_history "PIPE_USAGE_HISTORY"
[5]: https://docs.snowflake.com/en/sql-reference/sql/create-pipe "CREATE PIPE - Snowpipe"
[6]: https://interworks.com/blog/2023/02/21/automated-ingestion-from-aws-s3-into-snowflake-via-snowpipe/ "Automated Ingestion from AWS S3 into Snowflake via ..."
[7]: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-ts "Troubleshooting Snowpipe"
[8]: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-rest-apis "Snowpipe REST API"
[9]: https://docs.snowflake.com/en/sql-reference/functions/system_pipe_status "SYSTEM$PIPE_STATUS"
[10]: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-snowsight "Manage Snowpipe in Snowsight"
[11]: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-billing "Snowpipe costs"
[12]: https://docs.snowflake.com/en/sql-reference/account-usage/pipe_usage_history "PIPE_USAGE_HISTORY view"



---

## 1️⃣ What is the role of a `PIPE` in Snowpipe?

A **PIPE** in Snowflake is an object that defines **how files will be loaded automatically** into a target table.

It acts like a *bridge* between:

* The **stage** (where your files are stored, e.g., S3), and
* The **table** (where the data should finally land).

Inside the pipe, Snowflake stores:

* The `COPY INTO` command (which defines the file format, target table, and error handling).
* The *state* of which files have already been loaded (so duplicates aren’t reloaded).
* The *connection* to a notification channel (for auto-ingest).

So you can think of it as:
🧩 **PIPE = Automation + Metadata + COPY logic**

When Snowflake receives a notification (that a new file arrived), the **PIPE** automatically executes that `COPY INTO` internally, using Snowflake’s *serverless compute*.

---

## 2️⃣ How does Snowpipe differ from a scheduled `COPY INTO` job (compute, latency, management)?

| Feature        | **Snowpipe**                                                                 | **Scheduled COPY INTO**                                  |
| -------------- | ---------------------------------------------------------------------------- | -------------------------------------------------------- |
| **Compute**    | Uses **Snowflake-managed serverless compute** (you don’t manage a warehouse) | Requires your **own warehouse** to be up/running         |
| **Trigger**    | Event-driven (new file upload triggers ingestion)                            | Time-based (runs on a fixed schedule, e.g., hourly)      |
| **Latency**    | Low latency (data available within minutes)                                  | Higher latency (depends on schedule interval)            |
| **Management** | Fully automated                                                              | You must orchestrate scheduling, retries, and monitoring |
| **Cost model** | Charged per GB (serverless ingestion credits)                                | Warehouse credits consumed while COPY runs               |
| **Use case**   | Near real-time ingestion                                                     | Periodic or bulk loading                                 |

In short:
➡️ **Snowpipe = Continuous, automated, serverless ingestion**
➡️ **Scheduled COPY = Orchestrated, batch, warehouse-based ingestion**

---

## 3️⃣ How do you configure Snowpipe auto-ingest for AWS S3 — what AWS objects are needed? (Be specific.)

For **auto-ingest** to work, you need AWS services to send *notifications* to Snowflake when a new file lands in S3.

Here’s the required setup (step-by-step logic):

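1. **S3 bucket**: where your raw files are uploaded.
   Example: `s3://acme-raw/transactions/`

2. **IAM role + storage integration**: grants Snowflake read access to the bucket (`s3:GetObject`, `s3:GetObjectVersion`, `s3:ListBucket`, `s3:GetBucketLocation`).

3. **Snowflake-managed SQS queue (the notification channel)**: you do **not** create this queue. When you create a pipe with `AUTO_INGEST = TRUE`, Snowflake exposes its queue's ARN in the pipe's `notification_channel` column (`DESC PIPE` / `SHOW PIPES`).

4. **S3 event notification configuration**

   * Configure the bucket (for the relevant prefix) to send these events to that SQS queue ARN:

     * `s3:ObjectCreated:Put`
     * `s3:ObjectCreated:CompleteMultipartUpload` (very important for large files)
     * or simply `s3:ObjectCreated:*`, which covers both (plus `Post` and `Copy`)

5. **SNS topic (optional)**: needed only when other consumers must receive the same bucket events, because S3 rejects overlapping notification rules for the same prefix and event type. Then S3 publishes to your SNS topic, you allow Snowflake's queue to subscribe (`SELECT SYSTEM$GET_AWS_SNS_IAM_POLICY('<sns_topic_arn>');` returns the statement to add to the topic policy), and you create the pipe with `AWS_SNS_TOPIC = '<sns_topic_arn>'`.

So the flow is:

```
S3 → Snowflake's SQS queue (notification_channel) → Pipe → COPY INTO table
S3 → SNS topic → Snowflake's SQS queue → Pipe → COPY INTO table   (SNS variant)
```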

---

## 4️⃣ Where do you find the Snowpipe `notification_channel`? How is it used?

Once your pipe is created, run:

```sql
DESC PIPE <database>.<schema>.<pipe_name>;
```

Look at the **notification_channel** column of the output (`SHOW PIPES` shows it too).
Example output (trimmed to two columns; the values are placeholders):

```
+-------------------+--------------------------------------------------------+
| name              | notification_channel                                   |
+-------------------+--------------------------------------------------------+
| PIPE_TRANSACTIONS | arn:aws:sqs:us-east-1:123456789012:sf-snowpipe-EXAMPLE |
+-------------------+--------------------------------------------------------+
```

✅ **How it’s used:**

* It is the ARN of an SQS queue that **Snowflake** owns and reads from.
* You paste it as the **destination** of your S3 bucket's event notification (SQS queue ARN); in the SNS pattern, Snowflake subscribes this queue to your SNS topic instead.
* You don't create this queue or edit its policy; your side of the setup is the bucket notification (and the SNS topic policy, if you use SNS).
* If bucket events never reach this queue, an auto-ingest pipe never hears about new files.

In other words, this channel is the **mailbox** that connects your bucket events to your Snowflake pipe.

---

## 5️⃣ What S3 event types must you include to ensure large multipart uploads are ingested?

Include both of these **S3 event types** (or simply `s3:ObjectCreated:*`, which covers them plus `Post` and `Copy`):

1. `s3:ObjectCreated:Put`
   → triggers when a standard upload completes.

2. `s3:ObjectCreated:CompleteMultipartUpload`
   → triggers when large files (uploaded in multiple parts) are completed.

⚠️ If you miss `CompleteMultipartUpload`, Snowpipe won’t detect large file uploads that use multipart upload (common with SDKs or Spark).
This is one of the most frequent misconfigurations causing “Snowpipe not ingesting my files” problems.
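To audit an existing bucket for this gotcha, a minimal sketch that checks whether a notification rule's event list covers the multipart event. It assumes you already have the `Events` list of each rule (for example from boto3's `get_bucket_notification_configuration`); S3 event filters use a trailing `*` wildcard, which `fnmatchcase` handles.

```python
from fnmatch import fnmatchcase
from typing import Iterable

MULTIPART_EVENT = "s3:ObjectCreated:CompleteMultipartUpload"


def covers_event(configured_events: Iterable[str], event: str) -> bool:
    """True if an S3 notification's event list matches `event`,
    honoring wildcards such as 's3:ObjectCreated:*' (case-sensitive)."""
    return any(fnmatchcase(event, pattern) for pattern in configured_events)


print(covers_event(["s3:ObjectCreated:Put"], MULTIPART_EVENT))  # False: the gotcha
print(covers_event(["s3:ObjectCreated:*"], MULTIPART_EVENT))    # True
```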

---

## 6️⃣ How do you pause a pipe, and why might you do that?

To **pause**:

```sql
ALTER PIPE <pipe_name> SET PIPE_EXECUTION_PAUSED = TRUE;
```

To **resume**:

```sql
ALTER PIPE <pipe_name> SET PIPE_EXECUTION_PAUSED = FALSE;
```

✅ **Why pause a pipe:**

* You are troubleshooting an ingestion issue and don’t want more files to load.
* You need to make schema or format changes (e.g., table structure change).
* You want to stop ingestion temporarily for maintenance or cost control.

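While paused, the pipe keeps receiving event notifications but does not load files; the messages are retained for 14 days and processed after you resume. A pipe paused for longer than that is considered *stale*, and resuming it requires `SYSTEM$PIPE_FORCE_RESUME` with an explicit staleness override.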

---

## 7️⃣ How does Snowpipe billing work in the latest model? Where do you check billed bytes?

### 🔹 Billing model

Snowpipe uses **serverless compute**, so you’re billed:

* Per **GB ingested** (fixed rate per GB, regardless of number of files)
* Or, under the older model, for **serverless compute time plus a per-file overhead charge** (which is why many tiny files used to be expensive)

You **don't pay for a user-managed warehouse**; the ingestion compute is Snowflake-managed and billed as Snowpipe usage.

### 🔹 Where to check billed usage

You can monitor Snowpipe usage and cost through:

1. **ACCOUNT_USAGE.PIPE_USAGE_HISTORY**

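   ```sql
   SELECT START_TIME, END_TIME, CREDITS_USED, BYTES_INSERTED, FILES_INSERTED
   FROM SNOWFLAKE.ACCOUNT_USAGE.PIPE_USAGE_HISTORY
   WHERE PIPE_NAME = '<PIPE_NAME>'   -- pipe name only, without database/schema
   ORDER BY START_TIME DESC;
   ```

   → Shows credits used, bytes inserted, and files inserted per time window.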

2. **Snowsight / Classic UI → Usage → Serverless Features → Snowpipe**
   → Shows detailed billing by pipe.

---

## 8️⃣ When would you use the Snowpipe REST API `insertFiles` instead of auto-ingest?

You use the **REST API (insertFiles)** when:

* You **can’t configure AWS S3 event notifications**, e.g., due to organizational restrictions.
* You want your **own application or Airflow DAG** to control ingestion (manually trigger it when a file is ready).
* You have **custom naming or timing logic** (for example, only ingest files after validation).
* Your files land in an **internal stage**, or event notifications aren't available to you (auto-ingest works only with external stages: S3 events, Azure Event Grid, or GCS Pub/Sub).

With `insertFiles`, your app directly calls:

```
POST /v1/data/pipes/{pipeName}/insertFiles
```

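with a JSON body listing file paths relative to the pipe's stage (`{"files": [{"path": "<path/relative/to/stage>"}]}`), authenticated with key-pair (JWT) authentication.
Snowflake then queues those files for ingestion.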
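A minimal sketch of assembling that request with the standard library only: it builds the URL (including the optional `requestId` query parameter) and the JSON body, but does not send it. The host, pipe, and file path are placeholders, and authentication (a key-pair JWT in the `Authorization` header) is deliberately left out; in practice Snowflake's ingest SDKs (e.g., the `snowflake-ingest` Python package) build and sign this request for you.

```python
import json
import uuid
from typing import Iterable, Optional, Tuple
from urllib.parse import quote


def build_insert_files_request(account_host: str, pipe: str, paths: Iterable[str],
                               request_id: Optional[str] = None) -> Tuple[str, bytes]:
    """Return (url, body) for a Snowpipe insertFiles POST (not sent here).

    account_host: "<account_identifier>.snowflakecomputing.com" (placeholder)
    pipe: fully qualified pipe name, e.g. "acme_db.public.pipe_transactions"
    paths: file paths relative to the pipe's stage location
    """
    files = [{"path": p} for p in paths]
    if not files:
        raise ValueError("insertFiles needs at least one file path")
    request_id = request_id or str(uuid.uuid4())  # lets you correlate retries
    url = (f"https://{account_host}/v1/data/pipes/{quote(pipe)}/insertFiles"
           f"?requestId={quote(request_id)}")
    body = json.dumps({"files": files}).encode("utf-8")
    return url, body


url, body = build_insert_files_request(
    "myaccount.snowflakecomputing.com",          # placeholder host
    "acme_db.public.pipe_transactions",
    ["2025/10/01/120000_store-042_abc.csv.gz"],  # hypothetical staged file
)
print(url)
print(body.decode())
```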

---

## 9️⃣ How do you diagnose a file that failed to load (what views and logs)?

To troubleshoot ingestion issues:

1. **PIPE_USAGE_HISTORY**

   ```sql
   SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.PIPE_USAGE_HISTORY
   WHERE PIPE_NAME = '<pipe_name>' ORDER BY START_TIME DESC;
   ```

   → Shows credits, bytes, and files per time window. It tells you *when* loading slowed or stopped, not *why*; it has no error details.

2. **COPY_HISTORY**

   ```sql
   SELECT * FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
       TABLE_NAME => '<table_name>',
       START_TIME => DATEADD('hour', -1, CURRENT_TIMESTAMP())
   ));
   ```

   → Shows details of every file loaded, including errors.

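3. **VALIDATE_PIPE_LOAD** (table function, last 14 days)
   → Returns the errors the pipe hit, per file. Note that the Information Schema `LOAD_HISTORY` view does **not** include Snowpipe loads; use `COPY_HISTORY` instead.

4. **Snowsight UI**: open the pipe in the database object explorer
   → Graphical view of its status and load history, including errors.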

Common errors you’ll see:

* File format mismatch
* Missing columns
* Truncated values
* Access denied (S3 permission issue)
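If you pull `COPY_HISTORY` rows into a script (for example through a dictionary-style cursor of the Snowflake Python connector), a small filter turns them into a triage list. This sketch assumes rows keyed by the `COPY_HISTORY` column names `FILE_NAME`, `STATUS`, `ERROR_COUNT`, and `FIRST_ERROR_MESSAGE`, and treats the `Load failed` and `Partially loaded` statuses, or any error count above zero, as needing attention.

```python
from typing import Dict, Iterable, List, Optional

PROBLEM_STATUSES = {"load failed", "partially loaded"}


def files_needing_attention(rows: Iterable[Dict]) -> List[Dict[str, Optional[str]]]:
    """Filter COPY_HISTORY rows down to files that failed or partially loaded."""
    problems = []
    for row in rows:
        status = (row.get("STATUS") or "").lower()
        if status in PROBLEM_STATUSES or (row.get("ERROR_COUNT") or 0) > 0:
            problems.append({
                "file": row.get("FILE_NAME"),
                "status": row.get("STATUS"),
                "first_error": row.get("FIRST_ERROR_MESSAGE"),
            })
    return problems
```

The `first_error` field is usually enough to tell the four common errors listed above apart.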

---

## 🔟 What best practices prevent duplicate/partial loads and unexpected costs?

Here’s the golden checklist:

| Category               | Best Practice                                                                                               |
| ---------------------- | ----------------------------------------------------------------------------------------------------------- |
| **File naming**        | Always use **unique filenames** (include timestamp or UUID). Never overwrite files.                         |
| **Error handling**     | Choose `ON_ERROR` deliberately (`SKIP_FILE` is the Snowpipe default; `CONTINUE` keeps good rows) and copy failures from `VALIDATE_PIPE_LOAD` / `COPY_HISTORY` into an error table. |
| **Event config**       | Use `s3:ObjectCreated:*`, or at least both `Put` and `CompleteMultipartUpload`, in the S3 event config.     |
| **Permissions**        | Use **Storage Integrations** and IAM roles (avoid hard-coded keys).                                         |
| **Monitoring**         | Query `PIPE_USAGE_HISTORY` weekly for cost trends.                                                          |
| **Schema changes**     | Pause pipes before altering table schema.                                                                   |
| **Batch optimization** | Aim for files of roughly 100–250 MB compressed; merge tiny files before staging, since many small files add overhead. |
| **Auditing**           | Use `COPY_HISTORY` to verify no duplicate ingestion occurred.                                               |
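For the batch-optimization row, a minimal sketch of a producer-side planner that greedily groups small files, in arrival order, into batches near a target size before they are merged and uploaded. The 150 MB default is an assumption chosen inside the commonly cited ~100–250 MB range, and the greedy approach is one simple choice, not the only one.

```python
from typing import List, Sequence, Tuple

TARGET_BYTES = 150 * 1024 ** 2  # assumption: a point inside the ~100-250 MB guidance


def plan_batches(files: Sequence[Tuple[str, int]],
                 target_bytes: int = TARGET_BYTES) -> List[List[str]]:
    """Greedily group (name, size_in_bytes) pairs, in arrival order, into batches
    that stay within target_bytes; a file larger than the target gets its own batch."""
    if target_bytes <= 0:
        raise ValueError("target_bytes must be positive")
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in files:
        if current and current_size + size > target_bytes:
            batches.append(current)  # close the batch before it overflows
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

When concatenating CSVs that each carry a header row, keep only the first header, because the file format skips exactly one (`SKIP_HEADER = 1`).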

---

✅ **Summary Table (Quick Recall)**

| Question | Key Concept                                                                 |
| -------- | --------------------------------------------------------------------------- |
| 1        | PIPE = object that automates COPY INTO & tracks files                       |
| 2        | Snowpipe = serverless, event-driven; COPY = scheduled, warehouse-driven     |
| 3        | S3 bucket, IAM role/storage integration, S3 events → Snowflake's SQS queue (optional SNS) |
| 4        | `DESC PIPE` shows notification_channel → used as the S3 event destination   |
| 5        | Add `Put` + `CompleteMultipartUpload` events (or `s3:ObjectCreated:*`)      |
| 6        | Pause via `ALTER PIPE SET PIPE_EXECUTION_PAUSED`                            |
| 7        | Billed per GB serverless ingestion → check PIPE_USAGE_HISTORY               |
| 8        | REST API `insertFiles` used when you can’t use auto-ingest                  |
| 9        | Diagnose via VALIDATE_PIPE_LOAD / COPY_HISTORY (PIPE_USAGE_HISTORY for activity) |
| 10       | Unique filenames, correct events, monitor costs, pause before schema change |

---




## 1) How to check for any error in Snowpipe? (strategy / SQL)

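Snowpipe does **not** alert you by default when a file fails. You either *query the Snowflake objects that record load/validation info* and build an alerting wrapper, or configure Snowpipe **error notifications**: create a notification integration (e.g., to an SNS topic) and set it on the pipe with `ERROR_INTEGRATION = <integration_name>`. Key built-in tools for the query approach: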

**A. `VALIDATE_PIPE_LOAD` (table function)** — shows errors encountered by Snowpipe for a pipe in the last 14 days.

```sql
SELECT *
FROM TABLE(VALIDATE_PIPE_LOAD(
  PIPE_NAME => 'MYDB.MYSCHEMA.MY_PIPE',
  START_TIME => DATEADD('day', -1, CURRENT_TIMESTAMP())
));
```

Use this first — it returns file-level validation errors and useful metadata. ([Snowflake Documentation][1])

**B. `INFORMATION_SCHEMA.COPY_HISTORY` (table function)** — shows COPY statements (including those executed by Snowpipe) and the first error per file (useful if a file produced an error during COPY).

```sql
SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME => 'TRANSACTIONS',
  START_TIME => DATEADD('hour', -6, CURRENT_TIMESTAMP())
))
ORDER BY LAST_LOAD_TIME DESC;
```

This is the canonical place to read COPY error messages per file. ([Snowflake Documentation][2])

**C. `PIPE_USAGE_HISTORY` / `SNOWFLAKE.ACCOUNT_USAGE.PIPE_USAGE_HISTORY`** — pipe activity and billed bytes (quick way to spot spikes/failures). The table function `PIPE_USAGE_HISTORY()` returns recent pipe activity (about 14 days) and the account usage view returns up to 365 days for billing/ops analysis. ([Snowflake Documentation][3])

**D. `SYSTEM$PIPE_STATUS`** — returns a JSON with pipe health info: `executionState`, `lastIngestedTimestamp`, `numOutstandingMessagesOnChannel`, etc. Good to check whether the pipe is `RUNNING` / `PAUSED` or if messages are piling up in the channel.

```sql
SELECT SYSTEM$PIPE_STATUS('MYDB.MYSCHEMA.MY_PIPE');
```

Use this when investigating ingestion lag or pending messages. ([Snowflake Documentation][4])

**E. Snowsight UI** — visual history of pipe loads and copy errors (quick for human triage). ([Stack Overflow][5])

---

## 2) Strategy because “Snowpipe doesn't throw any error or notification”

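Partly correct: out of the box, Snowpipe won't email or page you, although it can push error messages to a cloud messaging service if you set `ERROR_INTEGRATION` on the pipe. **Operational strategy**: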

1. **Poll & validate automatically**

   * Schedule a short Task (or external orchestration like Airflow) to run every 1–5 minutes:

     * `SELECT SYSTEM$PIPE_STATUS(...)` — fail if `executionState != 'RUNNING'` or `numOutstandingMessagesOnChannel > threshold`.
     * `SELECT * FROM TABLE(VALIDATE_PIPE_LOAD(...))`: the function returns only error rows, so any row in the window means raise an alert.
     * `SELECT * FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(...)) WHERE ERROR_COUNT > 0`: find the earliest failing file(s).
   * If any check fails, **send a notification** (Teams/Slack/email) using `SYSTEM$SEND_SNOWFLAKE_NOTIFICATION` or call an external webhook. (You can create an Alert object or a Task that calls `SYSTEM$SEND_SNOWFLAKE_NOTIFICATION`.) ([Medium][6])

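2. **Leverage AWS-side signals**

   * The SQS queue behind `notification_channel` belongs to Snowflake, so you cannot attach a DLQ (dead-letter queue) or alarms to it. If you want an AWS-side audit trail, use the SNS pattern and subscribe an additional queue of your own (with its own DLQ) to the same topic, then compare the object keys it receives with `COPY_HISTORY` to find files that never loaded.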

3. **Instrument COPY behavior**

   * In the pipe's `COPY INTO`, choose `ON_ERROR` based on your tolerance: `CONTINUE` loads the good rows, `SKIP_FILE` (the Snowpipe default) skips any file with errors. `COPY` has no option that writes failed filenames to an `error_table`; collect them from `VALIDATE_PIPE_LOAD` / `COPY_HISTORY` with a scheduled task instead.

4. **Store failure rows / auditing**

   * Use a staging table where Snowpipe first loads raw rows, then run a validation pipeline (Stream + Task) that checks the data and writes bad rows to `staging_errors` with full context. This is a common, robust pattern for production.
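The polling checks in step 1 reduce to a small decision function. This sketch takes the JSON string returned by `SYSTEM$PIPE_STATUS` and the row count from `VALIDATE_PIPE_LOAD` for the polling window, and returns alert messages (empty means healthy). It reads `executionState` and `pendingFileCount`; `numOutstandingMessagesOnChannel` could be checked the same way. The backlog threshold is a hypothetical default to tune per pipe, and delivery of the messages (Slack, Teams, email) is left to whatever notifier you use.

```python
import json
from typing import List


def pipe_alerts(pipe_status_json: str, validation_error_rows: int,
                max_pending_files: int = 100) -> List[str]:
    """Turn one polling cycle into alert messages; an empty list means healthy.

    pipe_status_json: string returned by SELECT SYSTEM$PIPE_STATUS('<pipe>')
    validation_error_rows: row count from VALIDATE_PIPE_LOAD for the window
    max_pending_files: hypothetical backlog threshold; tune per pipe
    """
    status = json.loads(pipe_status_json)
    alerts = []
    state = status.get("executionState")
    if state != "RUNNING":
        alerts.append(f"pipe executionState is {state}")
    pending = int(status.get("pendingFileCount", 0))
    if pending > max_pending_files:
        alerts.append(f"{pending} files pending (threshold {max_pending_files})")
    if validation_error_rows > 0:
        alerts.append(f"VALIDATE_PIPE_LOAD returned {validation_error_rows} error row(s)")
    return alerts


print(pipe_alerts('{"executionState": "RUNNING", "pendingFileCount": 0}', 0))  # []
```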

> TL;DR: **don’t rely on implicit Snowpipe alerts** — poll VALIDATE_PIPE_LOAD, COPY_HISTORY, and SYSTEM$PIPE_STATUS and wire those checks into your alerting system.

References: VALIDATE_PIPE_LOAD docs and Troubleshooting guide. ([Snowflake Documentation][1])

---

## 3) “You can't alter the COPY command under same pipe object! You have to recreate the pipe object” — correctness & safe recipe

**Fact:** You **cannot** change the `COPY INTO <table>` statement of a pipe using `ALTER PIPE`. The Snowflake docs explicitly list that the `COPY INTO` clause cannot be modified via `ALTER PIPE`. To change the COPY definition you must `CREATE OR REPLACE PIPE` (or `DROP` and `CREATE`). ([Snowflake Documentation][7])

**Safe procedure to change COPY logic with minimal risk** (recommended runbook):

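1. `ALTER PIPE MYDB.MYSCHEMA.MY_PIPE SET PIPE_EXECUTION_PAUSED = TRUE;`
2. Confirm the pipe is paused and no files are still queued:

   ```sql
   SELECT s:executionState::string AS execution_state, s:pendingFileCount::int AS pending_files
   FROM (SELECT parse_json(SYSTEM$PIPE_STATUS('MYDB.MYSCHEMA.MY_PIPE')) AS s);
   ```

   Wait until `execution_state` is `PAUSED` and `pending_files` is 0. A short wait-and-check loop works. ([Snowflake Documentation][4])
3. Run `CREATE OR REPLACE PIPE MYDB.MYSCHEMA.MY_PIPE AUTO_INGEST = TRUE AS COPY INTO ...` with the new definition. This drops the old pipe, including its load history, and creates a new one.
4. Optionally run `ALTER PIPE MYDB.MYSCHEMA.MY_PIPE REFRESH;` to push recently staged files into the pipe's ingest queue (see the next section for REFRESH semantics). The new pipe has no load history, so scope the refresh to avoid reloading files the old pipe already loaded.
5. `ALTER PIPE MYDB.MYSCHEMA.MY_PIPE SET PIPE_EXECUTION_PAUSED = FALSE;`
6. Validate with `SYSTEM$PIPE_STATUS` (expect `executionState` = `RUNNING`) and with `PIPE_USAGE_HISTORY` / `VALIDATE_PIPE_LOAD`.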

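**Why pause first?** Without a pause, in-flight notifications may be processed against either the old or the new definition. That makes behavior unpredictable and can produce duplicates. Pausing and then waiting for `pendingFileCount` to reach 0 lets files that are already queued finish loading under the old definition before it is replaced.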
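The wait-and-check loop from step 2 can be sketched in Python as follows. This minimal sketch assumes a hypothetical `fetch_status` callable that runs `SELECT SYSTEM$PIPE_STATUS('MYDB.MYSCHEMA.MY_PIPE')` (for example through a connector cursor) and returns the JSON string. The timeout and poll interval are illustrative defaults. `sleep` is injectable only so the loop can be tested without actually waiting.

```python
import json
import time
from typing import Any, Callable


def wait_until_drained(
    fetch_status: Callable[[], str],
    timeout_s: float = 600.0,
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll until the pipe is PAUSED with pendingFileCount == 0; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = json.loads(fetch_status())
        # Both conditions must hold before the pipe definition is replaced.
        if status.get("executionState") == "PAUSED" and status.get("pendingFileCount", 0) == 0:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"pipe not drained before timeout; last status: {status}")
        sleep(poll_interval_s)
```

The function raises instead of returning a flag, so an orchestrator task fails loudly. That stops the `CREATE OR REPLACE PIPE` step from running against a pipe that still has queued files.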

References/notes: docs explain `COPY` cannot be altered with `ALTER PIPE` and suggest pausing/checking status before recreating. ([Snowflake Documentation][7])

---

## 4) If you CREATE OR REPLACE a pipe, do you need to refresh pipe metadata? How to refresh?

**Only if you want files that were already staged to be picked up. Two relevant facts:**

* Snowpipe keeps *file-load metadata at the pipe level* to prevent reloading the same files, and that metadata belongs to the specific pipe object. When you recreate a pipe, its load history is dropped and the new pipe starts with an empty history. New files still arrive through notifications as usual. You need an explicit `REFRESH` only to make Snowpipe (re)consider files that were already staged, and doing so can load duplicates of files the old pipe had already loaded. ([Snowflake Documentation][8])

* **`ALTER PIPE ... REFRESH`** is the supported refresh mechanism. It copies files staged **within the previous 7 days** into the Snowpipe ingest queue for loading. `ALTER PIPE mypipe REFRESH;` makes Snowpipe attempt to load recently staged files, subject to the pipe's file tracking and the 7-day limit. The optional `PREFIX = '<path>'` and `MODIFIED_AFTER = '<timestamp>'` parameters narrow the set of files. REFRESH is intended for short-term recovery or for pulling in recent historical files, not for routine loads of old archives. ([Snowflake Documentation][7])

Example:

```sql
-- after recreating the pipe
ALTER PIPE MYDB.MYSCHEMA.MY_PIPE REFRESH;
SELECT SYSTEM$PIPE_STATUS('MYDB.MYSCHEMA.MY_PIPE');
```

**Important caveats:**

* REFRESH only considers files staged in the **last 7 days**. Older files need a manual `COPY INTO` or the REST API to be re-ingested. ([Stack Overflow][9])
* Recreating a pipe starts a *new*, empty load history. If you need to preserve the prior history or avoid duplicates, capture timestamps or file lists before dropping the pipe and design a safe refresh window. A common pattern is to record the last-processed file timestamp before the replacement. Afterwards, refresh only newer files with `ALTER PIPE ... REFRESH MODIFIED_AFTER = '<timestamp>'`. Community threads discuss this pattern. ([Stack Overflow][10])
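The "record the last-processed timestamp, then refresh only newer files" pattern can be sketched as a small Python helper. It builds the statement and refuses timestamps outside the 7-day REFRESH window. This sketch assumes a timezone-aware timestamp captured before the pipe was replaced. The helper name is hypothetical. `pipe_name` is interpolated without quoting, so it must come from trusted configuration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# ALTER PIPE ... REFRESH only considers files staged within the previous 7 days.
REFRESH_WINDOW = timedelta(days=7)


def build_refresh_sql(
    pipe_name: str, last_processed: datetime, now: Optional[datetime] = None
) -> str:
    """Return an ALTER PIPE ... REFRESH statement limited to files modified after last_processed."""
    if last_processed.tzinfo is None:
        raise ValueError("last_processed must be timezone-aware")
    now = now or datetime.now(timezone.utc)
    if now - last_processed > REFRESH_WINDOW:
        # Too old for REFRESH; these files need a manual COPY INTO instead.
        raise ValueError("last_processed is outside the 7-day REFRESH window")
    # ISO 8601 with an explicit UTC offset, e.g. '2024-05-08T06:30:00+00:00'.
    ts = last_processed.astimezone(timezone.utc).isoformat()
    return f"ALTER PIPE {pipe_name} REFRESH MODIFIED_AFTER = '{ts}'"
```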

---

## 5) Is it always good practice to have 1 pipe object for one S3 bucket? (notification channel behavior)

**Reality & recommendation:**

* **Notification channel behavior:** if you create multiple pipes that target the same external bucket (or the same account) Snowflake often shows the **same `notification_channel`** (SQS) value for those pipes — because the cloud messaging channel (SNS/SQS integration) is tied to the **account/region** and bucket mapping. So yes, multiple pipes can end up sharing the same channel ARN. That’s expected. ([Cloudyard][11])

* **But** sharing a notification channel **does not mean** pipes won’t compete or accidentally load the same files. If multiple pipes point to overlapping paths (or use the same stage/prefix), you can accidentally have the same file eligible to be loaded by more than one pipe (and lead to duplicate rows). The basic rule: **file load metadata is tracked per-pipe**, so two different pipes do not share load history — they can both load the same file unless you design separation.

**Best practice (recommended):**

1. **One ingest pipeline per logical target**: create one pipe per logical source→table line (e.g., `bucket/prefix/payments/` → `payments_table`). This minimizes coordination and dedup complexity.
2. **Use distinct stages or prefix path** for different pipes: `@stage/payments/`, `@stage/customers/` — this prevents overlap.
3. If you must have multiple pipes for the same bucket, **scope each pipe's `COPY INTO` explicitly** so the file sets are guaranteed to be disjoint. Use a distinct stage path (`FROM @stage/payments/`), a `PATTERN` regex, or both.
4. **Avoid relying on shared notification channel semantics** to isolate loads — treat notification channel as transport only and separate at the pipe definition level. ([Cloudyard][11])
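You can automate the check for overlapping pipes by comparing each pipe's `COPY` source location. This minimal Python sketch assumes you have already collected a mapping from pipe name to stage location (for example `'@raw_stage/payments/'`) out of your pipe definitions. It compares locations as plain string prefixes and ignores `PATTERN` clauses. It also cannot detect two different stages that point at the same bucket, so treat it as a coarse lint rather than a guarantee. The function, pipe, and stage names are hypothetical.

```python
from itertools import combinations


def overlapping_pipes(pipe_locations: dict[str, str]) -> list[tuple[str, str]]:
    """Return pairs of pipes whose stage locations overlap.

    Locations are compared as plain string prefixes, so '@raw_stage/pay' overlaps
    '@raw_stage/payments/'. Ending every location with '/' avoids accidental matches.
    """
    overlaps: list[tuple[str, str]] = []
    # Sorting makes the output order deterministic.
    for (pipe_a, loc_a), (pipe_b, loc_b) in combinations(sorted(pipe_locations.items()), 2):
        if loc_a.startswith(loc_b) or loc_b.startswith(loc_a):
            overlaps.append((pipe_a, pipe_b))
    return overlaps
```

A non-empty result means at least one file could be eligible for two pipes. Because load metadata is per pipe, such a file could be loaded twice.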

Short summary: **notification channel may be the same**, but best practice is *one pipe per logical prefix/table* (or use patterns) to avoid accidental duplicate ingestion and to simplify operational ownership.

---

## 6) Truncate scenario — exact behavior

> Scenario: You uploaded `file1.csv` in S3; Snowpipe ingested it into `table T`. Then you `TRUNCATE TABLE T`. Next you upload `file2.csv`. Will Snowpipe reload `file1.csv` (again) or only `file2.csv`?

**Answer (explicit):** Snowpipe **decides whether to load files based on the pipe’s internal file-load metadata**, not the current contents of the target table. Because `file1.csv` was already processed by that pipe, Snowpipe will **not** attempt to re-ingest `file1.csv` simply because the table was truncated. Only `file2.csv` (new file) will be loaded automatically.
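A toy Python model makes the key point concrete. The memory of loaded files lives on the pipe, not in the table, so clearing the table does not change which files the pipe skips. This is an illustrative simulation only. It ignores the 14-day metadata retention, and each `ingest` call stands in for a batch of notifications (or a REFRESH) that presents those paths.

```python
from dataclasses import dataclass, field


@dataclass
class PipeModel:
    """Toy pipe: remembers loaded file paths independently of the target table."""

    loaded_paths: set[str] = field(default_factory=set)

    def ingest(self, presented: list[str], table: list[str]) -> list[str]:
        """Load each presented path the pipe has not loaded before; return what was loaded."""
        loaded_now: list[str] = []
        for path in presented:
            if path in self.loaded_paths:
                continue  # already in the pipe's load metadata, so it is skipped
            table.append(path)  # stand-in for the rows this file contributes
            self.loaded_paths.add(path)
            loaded_now.append(path)
        return loaded_now


if __name__ == "__main__":
    pipe, table_t = PipeModel(), []
    pipe.ingest(["file1.csv"], table_t)  # first load
    table_t.clear()  # TRUNCATE TABLE T: the pipe's memory is untouched
    print(pipe.ingest(["file1.csv", "file2.csv"], table_t))  # ['file2.csv']
    print(table_t)  # ['file2.csv']
```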

**If you need to reload `file1.csv` after truncating**, your options are:

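* Run a manual `COPY INTO <table> ... FILES = ('file1.csv') FORCE = TRUE;`. `FORCE = TRUE` loads the file regardless of any load metadata, so use it only when you actually want those rows loaded again. After the truncate they are no longer in the table, so reloading them does not create duplicates. ([Snowflake Documentation][2])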
* Or delete the pipe and recreate it (new pipe has no load history) and then `ALTER PIPE ... REFRESH` (or rely on notifications) — but recreate means you must be careful not to accidentally duplicate *other* files; this approach is more disruptive and not recommended solely to reprocess files. ([Stack Overflow][10])
* Or upload the file with a **new unique filename** (recommended operational pattern) — Snowpipe sees it as new and will ingest it. (Use timestamp/UUID in file name.)
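Generating a unique object key for each upload takes only a few lines. This Python sketch combines a UTC timestamp with a random UUID. The key layout (`<prefix>/<stem>_<UTC timestamp>_<uuid><ext>`) is just an illustrative convention. The `transactions/` prefix echoes the example bucket path used earlier in this guide.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def unique_object_key(
    prefix: str, stem: str, ext: str, now: Optional[datetime] = None
) -> str:
    """Return a key such as 'transactions/pos_20240508T063000Z_<32 hex chars>.csv.gz'."""
    now = now or datetime.now(timezone.utc)
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # uuid4 makes collisions negligible even for uploads within the same second.
    return f"{prefix.rstrip('/')}/{stem}_{stamp}_{uuid.uuid4().hex}{ext}"
```

Re-uploading the contents of `file1.csv` under such a key gives Snowpipe a path it has never seen, so the file is loaded like any other new file.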

**Key takeaways:**

* Truncating the table does **not** reset the pipe’s loaded-file tracking.
* To re-ingest an already-loaded file you must force the COPY or change file identity (filename) or recreate the pipe (with side-effects). ([Snowflake Documentation][8])

---

## 7) “Snowpipe copies data to the table based on the name of the files but only copy command do that based on the hash value of the files” — explanation & scenarios

There’s a subtle but important distinction here — let’s break it down precisely.

**How Snowpipe prevents duplicates (what it *tracks*)**

* **Snowpipe’s internal metadata** (associated **with the pipe**) stores the *path + filename* of files that have been loaded by that pipe, and prevents re-loading files with the **same name**. The docs explicitly state Snowpipe prevents loading files with the same **name** even if their eTag/metadata changed. In other words: **file identity for Snowpipe = file path + name** (not checksum). That’s why changing file contents but keeping the same filename typically will *not* trigger re-ingestion by Snowpipe. ([Snowflake Documentation][8])

**How `COPY INTO` (manual bulk load) handles duplicates**

* `COPY INTO <table>` (manual/bulk copy) uses separate metadata: bulk-load history is stored with the **target table** for 64 days. `COPY` skips files that were already loaded into that table *and have not changed since*. A file whose contents changed (a new ETag/checksum) is treated as new, even if the name is the same. `FORCE = TRUE` loads every specified file regardless of load history. ([Snowflake Documentation][8]) ([Snowflake Documentation][2])

**Scenarios — what happens and why**

1. **Scenario A — Snowpipe + same filename overwritten in S3**

   * Upload `data.csv` → Snowpipe loads it. Later overwrite `data.csv` in S3 with a different payload but same filename. Snowpipe will **not** re-load it (because load history for that pipe records the filename). If you need re-load, you must use a manual `COPY` with `FORCE=TRUE`, change the filename, or recreate the pipe (not recommended). ([Snowflake Documentation][8])

2. **Scenario B — Bulk COPY run twice with same files and `FORCE=FALSE`**

   * The second `COPY` skips files whose table load metadata shows they were already loaded and are unchanged. `FORCE=TRUE` reloads them anyway. ([Snowflake Documentation][2])

3. **Scenario C — Multiple pipes point to same bucket and same file**

   * Because load metadata is **per-pipe**, each pipe might load the same file independently (duplicates). Avoid this by segregating prefixes, using PATTERNs, or designing only one pipe to handle a prefix. ([Cloudyard][11])

4. **Scenario D — Reprocess historic files**

   * Use `ALTER PIPE ... REFRESH` (last 7 days) or `COPY INTO` manually (with `FORCE=TRUE`) or REST API `insertFiles` to explicitly trigger ingestion of older files. ([Snowflake Documentation][7])
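The contrast between the two tracking rules can be modeled with two toy Python classes. This simplified model rests on a few assumptions. Each tracker instance stands for one pipe (Snowpipe) or one target table (bulk COPY). An opaque `etag` string stands in for the file's ETag/checksum. Retention windows (14 and 64 days) are ignored. The class names are hypothetical.

```python
class SnowpipeStyleTracker:
    """Skips a path that was loaded before, even if its content (ETag) changed."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def should_load(self, path: str, etag: str) -> bool:
        # etag is deliberately ignored: identity is the path alone.
        if path in self._seen:
            return False
        self._seen.add(path)
        return True


class BulkCopyStyleTracker:
    """Skips a path only if it was loaded with the same ETag, unless force=True."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}

    def should_load(self, path: str, etag: str, force: bool = False) -> bool:
        if not force and self._seen.get(path) == etag:
            return False  # same name and unchanged content: already loaded
        self._seen[path] = etag
        return True
```

Scenario A corresponds to `SnowpipeStyleTracker.should_load("data.csv", <new etag>)` returning `False`. Scenario B corresponds to the bulk tracker skipping an unchanged file unless `force=True`. Scenario C corresponds to two separate `SnowpipeStyleTracker` instances, each of which would accept the same path.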

**Rule of thumb:**

* **Snowpipe = filename-based, per-pipe tracking (short retention, ~14 days)**.
* **Bulk COPY = table-level tracking + COPY options (FORCE, PURGE, VALIDATION_MODE)** and can be used to force re-ingestion.

---

## Practical runbook — common tasks & SQL snippets

**Check pipe state + pending messages**

```sql
SELECT parse_json(SYSTEM$PIPE_STATUS('MYDB.MYSCHEMA.MY_PIPE')) AS pipe_status;
```

**Find validation errors (last 24 hours)**

```sql
SELECT *
FROM TABLE(VALIDATE_PIPE_LOAD(
  PIPE_NAME=>'MYDB.MYSCHEMA.MY_PIPE',
  START_TIME=>DATEADD('day', -1, CURRENT_TIMESTAMP())
));
```

**Inspect COPY history for table in last 6 hours**

```sql
SELECT *
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
  TABLE_NAME=>'TRANSACTIONS',
  START_TIME=>DATEADD('hour', -6, CURRENT_TIMESTAMP())
));
```

**Pause pipe safely**

```sql
ALTER PIPE MYDB.MYSCHEMA.MY_PIPE SET PIPE_EXECUTION_PAUSED = TRUE;
-- wait until SYSTEM$PIPE_STATUS shows executionState = 'PAUSED' and pendingFileCount = 0
```

**Recreate pipe safely (change COPY)**

```sql
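-- 1) pause
ALTER PIPE MYDB.MYSCHEMA.MY_PIPE SET PIPE_EXECUTION_PAUSED = TRUE;

-- 2) verify executionState = 'PAUSED' and pendingFileCount = 0 (repeat until true)
SELECT s:executionState::string AS execution_state, s:pendingFileCount::int AS pending_files
FROM (SELECT parse_json(SYSTEM$PIPE_STATUS('MYDB.MYSCHEMA.MY_PIPE')) AS s);

-- 3) create or replace (new COPY); this also drops the old pipe's load history
CREATE OR REPLACE PIPE MYDB.MYSCHEMA.MY_PIPE AUTO_INGEST = TRUE AS
  COPY INTO ... ;

-- 4) optionally refresh, scoped so files the old pipe already loaded are not reloaded
--    (replace the placeholder with the last-processed timestamp captured before step 3)
ALTER PIPE MYDB.MYSCHEMA.MY_PIPE REFRESH MODIFIED_AFTER = '<last-processed timestamp>';

-- 5) resume
ALTER PIPE MYDB.MYSCHEMA.MY_PIPE SET PIPE_EXECUTION_PAUSED = FALSE;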
```

**Force re-load a file explicitly**

```sql
COPY INTO MYDB.MYSCHEMA.TRANSACTIONS
  FROM @my_stage/path/
  FILES = ('file1.csv')
  FORCE = TRUE;
```

---

## Short list of gotchas and final recommendations

* **Retention windows differ**: Snowpipe file metadata ≈ **14 days**; bulk load history stored in table metadata is longer (docs mention 64 days). Plan reprocessing strategies accordingly. ([Snowflake Documentation][8])
* **Avoid overwriting files** with the same name when you expect Snowpipe to re-ingest the changed content. Use unique names (timestamp/UUID). ([Stack Overflow][12])
* **Pause→Check pending→Recreate→(scoped) Refresh→Resume** is the safe way to change pipe definitions. An unscoped REFRESH on a recreated pipe can reload files the old pipe already loaded. ([Stack Overflow][5])
* **Use VALIDATE_PIPE_LOAD and COPY_HISTORY** in automation to surface file-level errors; build alerts off these. ([Snowflake Documentation][1])

---


[1]: https://docs.snowflake.com/en/sql-reference/functions/validate_pipe_load "VALIDATE_PIPE_LOAD"
[2]: https://docs.snowflake.com/en/sql-reference/sql/copy-into-table "COPY INTO <table>"
[3]: https://docs.snowflake.com/en/sql-reference/functions/pipe_usage_history "PIPE_USAGE_HISTORY"
[4]: https://docs.snowflake.com/en/sql-reference/functions/system_pipe_status "SYSTEM$PIPE_STATUS"
[5]: https://stackoverflow.com/questions/60728379/how-to-move-or-alter-a-pipe-without-missing-or-duplicating-any-records "How to Move or Alter a Pipe without missing or duplicating ..."
[6]: https://medium.com/snowflake/introduction-to-snowflakes-data-pipeline-alerts-notifications-9beac8d127cc "Introduction to Snowflake's data pipeline alerts & notifications"
[7]: https://docs.snowflake.com/en/sql-reference/sql/alter-pipe "ALTER PIPE"
[8]: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-intro "Snowpipe"
[9]: https://stackoverflow.com/questions/72675419/why-snow-pipe-is-not-loading-all-the-data "Why snow pipe is not loading all the data"
[10]: https://stackoverflow.com/questions/69166544/preserving-the-load-history-when-re-creating-pipes-in-snowflake "Preserving the load history when re-creating pipes in ..."
[11]: https://cloudyard.in/2022/06/snowpipe-load-multiple-tables-with-same-bucket/ "Snowpipe: Load multiple tables with same bucket"
[12]: https://stackoverflow.com/questions/76660028/how-much-time-snowpipe-keeps-track-of-files-that-has-being-already-loaded "How much time SnowPipe keeps track of files that has ..."
