
---

# 1) Quick glossary (one-liner)

* **Stage** = pointer (URL + credentials) to file(s) in S3 (or Snowflake internal storage). You can `LIST` and query the files directly. ([Snowflake Documentation][1])
* **File format** = reusable object that describes CSV/JSON/PARQUET layout. ([Snowflake Documentation][2])
* **Storage Integration** = Snowflake object that holds a Snowflake-generated IAM identity and the list of allowed S3 locations; your AWS IAM role trusts that identity. A stage can reference a storage integration. ([Snowflake Documentation][3])
* **External table** = a Snowflake object that maps files in an external stage to columns, keeps metadata, and can be refreshed automatically (closer to a real table but data remains in S3). ([Snowflake Documentation][4])

---

# 2) Complete demo: from file → query (replace placeholders with your values)

### Assumptions / sample files

S3 bucket: `s3://retailx-raw`
Files under `orders/` with csv like `orders_20250828.csv` containing:

```
order_id,customer_id,created_at,total_usd
1001,c_001,2025-08-28 10:01:02, 199.50
1002,c_002,2025-08-28 10:05:34, 9.99
```

---

### Step A — create a File Format (CSV)

```sql
CREATE OR REPLACE FILE FORMAT retailx_csv_fmt
  TYPE = CSV
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  SKIP_HEADER = 1
  FIELD_DELIMITER = ','
  TRIM_SPACE = TRUE
  NULL_IF = ('', 'NULL', 'null')
  COMPRESSION = 'AUTO';
```

(You can create JSON / PARQUET formats similarly.) ([Snowflake Documentation][2])

---

### Step B — (you already may have) create a Storage Integration

> If you already created an integration & AWS role earlier, skip to Step C.

```sql
CREATE OR REPLACE STORAGE INTEGRATION retailx_s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  -- Required for S3: ARN of the IAM role Snowflake will assume (placeholder, replace with yours)
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<aws_account_id>:role/<role_name>'
  STORAGE_ALLOWED_LOCATIONS = ('s3://retailx-raw/');
```

After creating it, run:

```sql
DESC INTEGRATION retailx_s3_int;
-- copy STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID into the IAM role's trust policy (see the Snowflake/AWS steps)
```

(You need to finish the AWS side: create IAM role, trust the Snowflake ARN and external id, attach a least-privilege S3 policy — see the Snowflake storage integration docs.) ([Snowflake Documentation][5])

---

### Step C — create an external stage that uses the storage integration and the file format

```sql
CREATE OR REPLACE STAGE retailx_orders_stage
  URL = 's3://retailx-raw/orders/'
  STORAGE_INTEGRATION = retailx_s3_int
  FILE_FORMAT = retailx_csv_fmt;
```

(You can also omit `FILE_FORMAT` here and pass it when querying.) ([Snowflake Documentation][1])

---

### Step D — sanity checks

```sql
-- show files
LIST @retailx_orders_stage;
```

If `LIST` works, you have connectivity. If it fails with an access-denied error, check the IAM role's trust policy, `STORAGE_ALLOWED_LOCATIONS`, and the permissions in the role policy and S3 bucket policy.

---

### Step E — ad-hoc query of CSV files in the stage (no COPY)

When you query a staged CSV directly, Snowflake exposes column positions as `$1`, `$2`, ... — then you can cast/alias them:

```sql
SELECT
  TRY_CAST(t.$1 AS INTEGER)        AS order_id,
  t.$2::STRING                    AS customer_id,
  TRY_TO_TIMESTAMP(t.$3, 'YYYY-MM-DD HH24:MI:SS') AS created_at,
  TRY_CAST(t.$4 AS NUMBER(10,2))  AS total_usd
FROM @retailx_orders_stage (FILE_FORMAT => 'retailx_csv_fmt') t
WHERE TRY_CAST(t.$4 AS NUMBER) > 50;
```

Notes:

* You can query the whole stage, a path (`@stage/subpath/`), or a single file (`@stage/file.csv.gz`). Querying is *on-the-fly* parsing of files. ([Snowflake Documentation][6])
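
As a minimal sketch of those scopes, the queries below read the whole stage and then a single file. The file name is the sample file from the assumptions above; because the stage URL already ends in `orders/`, the file sits directly under the stage root.

```sql
-- Whole stage: every file under s3://retailx-raw/orders/
SELECT COUNT(*) AS row_count
FROM @retailx_orders_stage (FILE_FORMAT => 'retailx_csv_fmt') t;

-- Single file, with METADATA$FILENAME to confirm which file each row came from
SELECT METADATA$FILENAME AS source_file,
       t.$1              AS order_id_raw,
       t.$4              AS total_usd_raw
FROM @retailx_orders_stage/orders_20250828.csv (FILE_FORMAT => 'retailx_csv_fmt') t;
```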

---

# 3) Filters, joins, views, tables — examples

### 3A — Filtering while reading files

You saw `WHERE TRY_CAST(t.$4 AS NUMBER) > 50` above: predicate filters are allowed, but on a direct stage read they are applied after each file is parsed, so every file under the path is still read. To limit the files scanned, narrow the stage path, add a `PATTERN`, or use an external table with partition columns. Even then, reading raw files on the fly is less efficient than pruning on a native table. ([Snowflake Documentation][4], [Stack Overflow][7])
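
One way to cut down the files read, sketched here under the assumption that file names follow the `orders_YYYYMMDD.csv` convention from the sample, is to combine the row filter with a `PATTERN` regex:

```sql
-- Only files whose path matches the regex are read; [.] matches a literal dot
SELECT COUNT(*) AS big_orders
FROM @retailx_orders_stage (
       FILE_FORMAT => 'retailx_csv_fmt',
       PATTERN     => '.*orders_202508.*[.]csv'
     ) t
WHERE TRY_CAST(t.$4 AS NUMBER(10,2)) > 50;
```

The `WHERE` clause still runs row by row; only the `PATTERN` reduces the amount of data fetched from S3.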

### 3B — Join the staged data with an internal table (works fine)

```sql
-- internal customer master table
CREATE OR REPLACE TABLE retailx_customers (customer_id STRING, customer_name STRING);

-- join ad-hoc stage data with internal table
SELECT s.order_id, s.customer_id, c.customer_name, s.total_usd
FROM (
  SELECT
    TRY_CAST(t.$1 AS INTEGER) AS order_id,
    t.$2::STRING             AS customer_id,
    TRY_CAST(t.$4 AS NUMBER(10,2)) AS total_usd
  FROM @retailx_orders_stage (FILE_FORMAT => 'retailx_csv_fmt') t
) s
JOIN retailx_customers c ON s.customer_id = c.customer_id
WHERE s.total_usd > 100;
```

Yes — you can join staged data with tables. The staged side is parsed on the fly (positional columns), the table side benefits from Snowflake table optimizations.

### 3C — Create a view that wraps the stage query

```sql
CREATE OR REPLACE VIEW v_orders_stage AS
SELECT
  TRY_CAST(t.$1 AS INTEGER) AS order_id,
  t.$2::STRING               AS customer_id,
  TRY_TO_TIMESTAMP(t.$3,'YYYY-MM-DD HH24:MI:SS') AS created_at,
  TRY_CAST(t.$4 AS NUMBER(10,2)) AS total_usd
FROM @retailx_orders_stage (FILE_FORMAT => 'retailx_csv_fmt') t;
```

A normal `VIEW` over a stage is allowed (view stores the query). When you query the view, files are re-read and re-parsed at query time. *You cannot create a materialized view on a stage query.* ([Snowflake Documentation][8], [Medium][9])

### 3D — Create a table (persist/load the data into Snowflake)

If you want the performance and Snowflake features, load into a table:

```sql
CREATE OR REPLACE TABLE retailx_orders AS
SELECT
  TRY_CAST(t.$1 AS INTEGER) AS order_id,
  t.$2::STRING              AS customer_id,
  TRY_TO_TIMESTAMP(t.$3,'YYYY-MM-DD HH24:MI:SS') AS created_at,
  TRY_CAST(t.$4 AS NUMBER(10,2)) AS total_usd
FROM @retailx_orders_stage (FILE_FORMAT => 'retailx_csv_fmt') t;
```

Now data lives in Snowflake storage, will be micro-partitioned, and queries will typically be far faster and cheaper for repeated workloads. ([Snowflake Documentation][10])

---

# 4) External table (the “nice middle ground”)

External tables create metadata in Snowflake and let you define named columns (mapping into `VALUE:c1`, `VALUE:c2`, etc.), and support automatic refresh (SNS/SQS) for S3 to pick up new files. They are useful when you need table-like access without copying data into Snowflake. Example:

```sql
CREATE OR REPLACE EXTERNAL TABLE retailx_orders_ext (
  order_id    varchar as (value:c1::varchar),
  customer_id varchar as (value:c2::varchar),
  created_at  timestamp_ntz as (value:c3::timestamp_ntz),
  total_usd   number(10,2) as (value:c4::number(10,2))
)
WITH LOCATION = @retailx_orders_stage
FILE_FORMAT = (FORMAT_NAME = 'retailx_csv_fmt')
AUTO_REFRESH = TRUE
REFRESH_ON_CREATE = TRUE;
```

Key points:

* External tables can be partitioned and their metadata refreshed automatically (SNS/SQS) to register new files. ([Snowflake Documentation][4])
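
A short usage sketch for the external table defined above: a manual metadata refresh (only needed when event notifications are not set up or have not fired yet), followed by a query that uses the named columns.

```sql
-- Register files added or removed since the last refresh
ALTER EXTERNAL TABLE retailx_orders_ext REFRESH;

-- Query by column name; METADATA$FILENAME is available on external tables too
SELECT order_id,
       customer_id,
       total_usd,
       METADATA$FILENAME AS source_file
FROM retailx_orders_ext
WHERE total_usd > 50;
```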

---

# 5) Differences vs “traditional” Snowflake table queries (detailed)

### What internal Snowflake tables give you (advantages):

* **Micro-partitions & pruning** (very fast scanning / min/max metadata) — Snowflake stores column min/max and can prune partitions. This massively reduces scan cost for selective filters. ([Snowflake Documentation][10])
* **Clustering, time travel, cloning, DML, materialized views on tables** — advanced features you don’t get on stage reads or (some of) external tables. ([Snowflake Documentation][4])
* **Faster repeated queries** via the warehouse's local data cache and the result cache, and more cost-efficient compute for analytical workloads. ([Teej][11])

### What stage / external reads **lack** or are weaker at:

* **No micro-partition pruning**, because the files are not loaded into Snowflake’s micro-partition storage: every query must re-scan and re-parse them (slower for heavy analytic workloads). ([Stack Overflow][7])
* **Less pushdown & metadata**: stage queries return positional columns (`$1`, `$2`) unless you wrap them or use external tables to define columns. External tables can add metadata but still won’t have micro-partitions. ([Snowflake Documentation][6])
* **No materialized views directly on stage queries** (external tables can have materialized views in some cases, but there are caveats). ([Medium][9])

---

# 6) Benefits of keeping data in S3 & querying from Snowflake

* **Storage cost savings** (S3 can be cheaper than Snowflake long-term storage for cold/raw data). ([phData][12])
* **Single source of truth / interoperability** — other systems can read the same S3 files.
* **Fast exploration & QA** — you can sample raw files without duplicating them into Snowflake.
* **Immediate availability** for new raw files if using external tables + auto refresh or Snowpipe. ([Snowflake Documentation][4])

---

# 7) Disadvantages / costs / caveats

* **Query performance**: external queries are generally slower and can be more compute-heavy compared to native Snowflake tables. ([Stack Overflow][7])
* **Limited Snowflake features**: no micro-partitioning, no time travel, no cloning, limited materialized view options, some DDL/DML limitations. ([Snowflake Documentation][4])
* **Metadata maintenance**: external tables require refreshes (manual or event-driven) to reflect new/removed files; auto-refresh costs (Snowpipe charges) can appear. ([Snowflake Documentation][4])
* **Potential egress costs**: if bucket and Snowflake are in *different* regions, you may pay cross-region data transfer. (Keep them same region where possible.) ([Medium][13])

---

# 8) When to use each option (rules of thumb)

* **Query stage / external table**

  * Use for: exploration, profiling raw data, QA, cross-platform sharing, and for rarely-queried historical datasets where you want to avoid copying.
  * Use external table (not ad-hoc stage select) when you want table-like convenience (named columns, refresh) without loading. ([Snowflake Documentation][4])
* **Load into internal Snowflake table**

  * Use for: production analytics, frequent queries, joins/aggregations at scale, low latency dashboards, or any workload that benefits from micro-partitions, clustering, and Snowflake optimizations. ([Snowflake Documentation][10])

---

# 9) Can I query *only* the S3 files I added to the Integration?

Short answer: **No single magic permission in the integration alone controls the exact files — three things must align**:

1. **Stage URL / external table LOCATION** — the stage or external table points to a specific bucket/path (that's your first limit). You can only query files under that stage path. ([Snowflake Documentation][4])
2. **Storage Integration `STORAGE_ALLOWED_LOCATIONS`** — on the Snowflake side you can restrict allowed S3 URIs for that integration; stage URLs must align to those locations. ([Snowflake Documentation][3])
3. **AWS IAM role / bucket policy** — AWS must grant the Snowflake role `s3:ListBucket` / `s3:GetObject` for the specific bucket & prefixes. If the IAM policy does not allow a prefix, Snowflake cannot read it even if the integration lists it. ([Snowflake Documentation][5])

So: **you can only successfully query files that are (A) in the stage path, (B) allowed by STORAGE\_ALLOWED\_LOCATIONS, and (C) allowed by the AWS role/bucket policy**. All three must allow access.
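
To see which layer is blocking a read, you can inspect each one from Snowflake. A minimal sketch using the demo objects from section 2:

```sql
-- (A) Stage path: the URL property shows the prefix the stage is limited to
DESC STAGE retailx_orders_stage;

-- (B) Integration: check STORAGE_ALLOWED_LOCATIONS (and STORAGE_BLOCKED_LOCATIONS)
DESC INTEGRATION retailx_s3_int;

-- (C) AWS side: LIST succeeds only if the IAM role permits s3:ListBucket on this prefix
LIST @retailx_orders_stage;
```

If (A) and (B) look right but `LIST` still fails, the remaining suspect is the IAM role policy or the bucket policy on the AWS side.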

---

# 10) Short checklist / action items (practical)

* For quick inspection: create FILE FORMAT + STAGE → `LIST` → `SELECT $1,$2... FROM @stage(...)` (good for QA). ([Snowflake Documentation][2])
* For production queries: prefer loading data into a Snowflake table or create an EXTERNAL TABLE with `AUTO_REFRESH` and partition definitions. ([Snowflake Documentation][4])
* If you must query external often: benchmark both (external table vs internal table CTAS) — the cost/latency difference can be large. ([Stack Overflow][7])

---

# 11) Quick Q\&A recap (tiny)

* Can you filter while querying a stage? **Yes.**
* Can you join stage data with tables? **Yes** (but stage side parsed on the fly).
* Can you create views over stage selects? **Yes** (but not materialized views over a stage query). ([Snowflake Documentation][8], [Medium][9])
* Can you create tables from stage data? **Yes** (CTAS or `COPY INTO`), and this gives best performance. ([Snowflake Documentation][10])

---


[1]: https://docs.snowflake.com/en/sql-reference/sql/create-stage "CREATE STAGE - Snowflake Documentation"
[2]: https://docs.snowflake.com/en/sql-reference/sql/create-file-format "CREATE FILE FORMAT - Snowflake Documentation"
[3]: https://docs.snowflake.com/en/sql-reference/sql/create-storage-integration "CREATE STORAGE INTEGRATION - Snowflake Documentation"
[4]: https://docs.snowflake.com/en/sql-reference/sql/create-external-table "CREATE EXTERNAL TABLE - Snowflake Documentation"
[5]: https://docs.snowflake.com/en/user-guide/data-load-s3-config-storage-integration "Configuring a Snowflake storage integration to access Amazon S3"
[6]: https://docs.snowflake.com/en/user-guide/querying-stage "Querying Data in Staged Files - Snowflake Documentation"
[7]: https://stackoverflow.com/questions/70755218/snowflake-query-performance-is-unexpectedly-slower-for-external-parquet-tables-v "Snowflake query performance is unexpectedly slower for external ..."
[8]: https://docs.snowflake.com/en/sql-reference/sql/create-view "CREATE VIEW - Snowflake Documentation"
[9]: https://medium.com/snowflake/snowflake-external-table-vs-query-on-stage-pros-cons-a839b52dbab1 "Snowflake External Table Vs Query on Stage…Pros & Cons - Medium"
[10]: https://docs.snowflake.com/en/user-guide/tables-clustering-micropartitions "Micro-partitions & Data Clustering - Snowflake Documentation"
[11]: https://teej.ghost.io/a-guide-to-the-snowflake-results-cache/ "A Guide To The Snowflake Results Cache - Teej - Ghost"
[12]: https://www.phdata.io/blog/when-to-use-internal-versus-external-stages-in-snowflake/ "When To Use Internal vs. External Stages in Snowflake - phData"
[13]: https://medium.com/%40zakary.leblanc/snowflake-cliff-notes-internal-external-stage-7a702bbe8748 "Snowflake Cliff Notes: Internal/External Stage | by Zakary LeBlanc"



---

### **1. Querying a CSV in External Stage + `$1,$2` notation**

```sql
-- Assume stage already exists
SELECT $1 AS id,
       $2 AS name,
       $3 AS salary
FROM @my_ext_stage/sales/
(FILE_FORMAT => 'my_csv_format');  -- the format name is passed as a string literal
```

* `$1`, `$2`, `$n` → column **positions** in the file (not table columns).
* Stage queries are always positional: even when the CSV has a header row, Snowflake does not use it for column names (`SKIP_HEADER` only skips it), so `$1` means "first column in file".
* You can alias them to meaningful names.

---

### **2. What is a Storage Integration?**

* A **Storage Integration** is a Snowflake object that stores an **IAM role ARN** instead of hardcoded AWS keys.
* Why? → **Security**. You don’t drop permanent AWS keys into Snowflake; instead, your IAM role trusts a Snowflake-generated IAM user, which assumes the role.
* `STORAGE_ALLOWED_LOCATIONS` → restricts which S3 paths this integration can access. Example:

  ```sql
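  CREATE STORAGE INTEGRATION my_s3_int
    TYPE = EXTERNAL_STAGE
    STORAGE_PROVIDER = 'S3'
    ENABLED = TRUE
    STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<aws_account_id>:role/<role_name>'  -- placeholder: your IAM role
    STORAGE_ALLOWED_LOCATIONS = ('s3://my-company-bucket/data/');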
  ```

---

### **3. Create External Stage with Integration + AWS Trust**

```sql
CREATE STAGE my_ext_stage
  STORAGE_INTEGRATION = my_s3_int
  URL = 's3://my-company-bucket/data/'
  FILE_FORMAT = my_csv_format;
```

* On **AWS side**:

  * Create IAM role.
  * Trust relationship → allow **Snowflake’s IAM user** to assume the role, on condition that it presents **your integration’s external ID**.
* `DESC INTEGRATION my_s3_int` returns both values: `STORAGE_AWS_IAM_USER_ARN` and `STORAGE_AWS_EXTERNAL_ID`.

---

### **4. Performance tradeoffs External Table vs Native Table**

* **External Table**:

  * Reads directly from S3 each time.
  * No micro-partitions, so no clustering; pruning is limited to the partition columns you define.
  * Slower queries if files are small/many.
* **Native Table (after COPY INTO)**:

  * Data ingested → stored in **Snowflake’s micro-partitions**.
  * Partition pruning, clustering, stats, caching, materialized views → all work.

👉 Best practice: Use external tables for **discovery / staging**, COPY INTO for **production**.

---

### **5. Partitioning semantics with METADATA\$FILENAME**

```sql
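CREATE OR REPLACE EXTERNAL TABLE sales_ext (
  -- Partition column; offsets (12, 10) assume the date starts at character 12 of the file path
  sales_date DATE AS (TO_DATE(SUBSTRING(METADATA$FILENAME, 12, 10), 'YYYY-MM-DD')),
  -- External table columns must be expressions over VALUE (c1, c2, ... for CSV)
  id STRING AS (VALUE:c1::STRING),
  amount NUMBER AS (VALUE:c2::NUMBER)
)
PARTITION BY (sales_date)
WITH LOCATION = @my_ext_stage/sales/
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
AUTO_REFRESH = TRUE;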
```

* Here Snowflake extracts `sales_date` from the **filename path**.
* Allows partition pruning (e.g., only scan `2025-08-29/` files).
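
Because the `SUBSTRING` offsets depend entirely on the path layout, it is worth checking them against real file paths first. A minimal sketch, assuming the stage's default file format applies and that paths contain a `YYYY-MM-DD` folder:

```sql
-- Inspect real paths and what the offsets actually extract
SELECT DISTINCT
       METADATA$FILENAME                    AS file_path,
       SUBSTRING(METADATA$FILENAME, 12, 10) AS extracted_date
FROM @my_ext_stage/sales/
LIMIT 10;

-- A filter on the partition column lets Snowflake skip files for other dates
SELECT id, amount
FROM sales_ext
WHERE sales_date = '2025-08-29'::DATE;
```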

---

### **6. Can you run UPDATE on an external table?**

* ❌ No. External tables are **read-only metadata layer** on top of files.
* If you need updates → COPY data into a Snowflake table.
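
A minimal sketch of that route, assuming the `sales_ext` table from the partitioning example; `sales_native` and the id value are hypothetical names used only for illustration:

```sql
-- Materialize the external data as a native table
CREATE OR REPLACE TABLE sales_native AS
SELECT id, amount, sales_date
FROM sales_ext;

-- DML now works because the rows live in Snowflake storage
UPDATE sales_native
SET amount = 0
WHERE id = 'A-100';   -- hypothetical id value
```

The update changes only the native copy; the files in S3 stay as they were.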

---

### **7. Auto-refresh with SNS/SQS**

* Flow:

  1. New files land in S3.
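  2. S3 event notification → SQS queue managed by Snowflake (directly, or through an SNS topic that the queue subscribes to).
  3. Snowflake reads the event from that queue → auto-refreshes the external table metadata.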
* Without this, you’d need manual `ALTER EXTERNAL TABLE … REFRESH`.
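
Two statements are useful when wiring this up, sketched here for the `sales_ext` table from the partitioning example:

```sql
-- The notification_channel column holds the ARN of the SQS queue
-- that the S3 event notification (or SNS subscription) should target
SHOW EXTERNAL TABLES LIKE 'sales_ext';

-- Manual fallback: rescan the location and register new or removed files
ALTER EXTERNAL TABLE sales_ext REFRESH;
```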

---

### **8. PATTERN parameter**

* Regex filter for staged files. Example:

```sql
SELECT *
FROM @my_ext_stage
(PATTERN => '.*2025-08.*[.]csv');  -- [.] matches a literal dot; a bare . matches any character
```

* Needed when your stage has mixed files but query should only read a subset.

---

### **9. JSON Array file format (STRIP\_OUTER\_ARRAY)**

```sql
CREATE FILE FORMAT my_json_format 
  TYPE = JSON 
  STRIP_OUTER_ARRAY = TRUE;
```

* Makes each JSON array element a separate row.
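
With that format, each array element can be queried as one VARIANT value in `$1`. A minimal sketch, where the `json/` path and the `id` and `amount` field names are assumptions for illustration:

```sql
-- One row per array element; fields are pulled out of the VARIANT with the : operator
SELECT t.$1:id::STRING           AS id,      -- "id" is an assumed field name
       t.$1:amount::NUMBER(10,2) AS amount   -- "amount" is an assumed field name
FROM @my_ext_stage/json/ (FILE_FORMAT => 'my_json_format') t;
```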

---

### **10. Result Cache validity**

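* Snowflake's **result cache** keeps a query result for 24 hours; each reuse resets the window, up to 31 days. It lives in the cloud services layer rather than in a warehouse, and another user can reuse a result when the query text matches exactly and their role has the required privileges.
* For external tables: cached results are reused until the registered file metadata changes, so an external table that has not been refreshed can return outdated cached results.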
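
When benchmarking external reads, the result cache can hide the real scan cost. A minimal sketch, assuming the `sales_ext` table from the partitioning example, that turns off result reuse for the session:

```sql
-- Disable result reuse so timings reflect real file scans
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

SELECT COUNT(*) FROM sales_ext;

-- Restore the default afterwards
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```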

---

### **11. COPY INTO vs CTAS**

* **COPY INTO**:

  * Standard for ingestion.
  * Handles errors (`ON_ERROR`), supports dry-run validation (`VALIDATION_MODE`), and tracks already-loaded files so reruns load only new ones.
* **CTAS**:

  * Creates new table from query result.
  * One-off operation, not designed for pipelines.

👉 Production ingestion = **COPY INTO**.
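
As a sketch of the `COPY INTO` route, using the demo stage and file format from the walkthrough; `retailx_orders_loaded` is a hypothetical table name, and its column order is assumed to match the CSV column order:

```sql
-- Hypothetical target table; columns are filled positionally from the CSV
CREATE TABLE IF NOT EXISTS retailx_orders_loaded (
  order_id    INTEGER,
  customer_id STRING,
  created_at  TIMESTAMP_NTZ,
  total_usd   NUMBER(10,2)
);

-- Loads only files not already recorded in the table's load metadata
COPY INTO retailx_orders_loaded
FROM @retailx_orders_stage
FILE_FORMAT = (FORMAT_NAME = 'retailx_csv_fmt')
ON_ERROR = 'ABORT_STATEMENT';
```

Running the same `COPY INTO` again after new files arrive picks up just those files, which is what makes it suitable for scheduled pipelines.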

---

### **12. S3 bucket region placement**

* Snowflake account in `us-east-1` → keep bucket in `us-east-1`.
* If bucket in another region → **cross-region data transfer costs + latency**.
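
To compare the two, you can check where the Snowflake account runs (the bucket's region is visible in the S3 console):

```sql
-- Returns the cloud and region of the current account, in a form such as AWS_US_EAST_1
SELECT CURRENT_REGION();
```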

---

### **13. Inspect staged file metadata**

```sql
SELECT METADATA$FILENAME, 
       METADATA$FILE_ROW_NUMBER, 
       METADATA$FILE_LAST_MODIFIED
FROM @my_ext_stage/sales/;
```

* Lets you see which file/row/time data came from.

---

### **14. STORAGE\_BLOCKED\_LOCATIONS**

* Opposite of `STORAGE_ALLOWED_LOCATIONS`.
* Prevents Snowflake from accessing certain S3 paths.
* Use case: bucket has sensitive PII zone → block it in integration.
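
A minimal sketch on the `my_s3_int` integration from earlier; the `data/pii/` prefix is a hypothetical example of a sensitive zone:

```sql
-- Allow the data/ prefix but block the pii/ prefix beneath it
ALTER STORAGE INTEGRATION my_s3_int SET
  STORAGE_ALLOWED_LOCATIONS = ('s3://my-company-bucket/data/')
  STORAGE_BLOCKED_LOCATIONS = ('s3://my-company-bucket/data/pii/');
```

Stages that use this integration can then point anywhere under `data/` except the blocked prefix.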

---

### **15. External Parquet Schema Inference**

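* Snowflake can infer column definitions from Parquet files with `INFER_SCHEMA`. To create an external table from the inferred schema, use:

```sql
CREATE EXTERNAL TABLE my_parquet_table
  USING TEMPLATE (
    SELECT ARRAY_AGG(OBJECT_CONSTRUCT(*))
    FROM TABLE(
      INFER_SCHEMA(
        LOCATION=>'@my_ext_stage/parquet/',
        FILE_FORMAT=>'my_parquet_format'
      )
    )
  )
  -- LOCATION and FILE_FORMAT are still required on the table itself
  LOCATION = @my_ext_stage/parquet/
  FILE_FORMAT = (FORMAT_NAME = 'my_parquet_format');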
```

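* `USING TEMPLATE` fixes the column list at creation time, based on the files present then; later schema changes in the files are not picked up automatically.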
* Ensures stable schema for BI queries.

---




### **1) Named external stage vs direct S3 URL with storage integration**

* **Named external stage**:

  * You pre-create it once, storing the S3 path and integration.
  * Centralized security: credentials are hidden, not exposed in every query.
  * Reusable: many teams/queries can reference `@MY_STAGE/...`.
  * Allows `LIST`, `REMOVE`, and direct `SELECT` on the staged files from Snowflake.

* **Direct S3 URL with storage integration**:

  ```sql
  copy into 's3://my-bucket/path/'
  from MYTABLE
  storage_integration = MY_INT
  file_format = (type=csv);
  ```

  * Quick and flexible.
  * But path and integration must be repeated in every query.
  * No way to `LIST` from Snowflake (since it’s not a named stage).

👉 **Best practice**: use named stages for production pipelines (security + reusability).
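
For comparison, a sketch of the same unload through a named stage, reusing the bucket path and integration from the direct-URL example; `MY_STAGE` and the `mytable/` prefix are illustrative names:

```sql
-- One-time setup: the stage stores the path and the integration
create stage if not exists MY_STAGE
  url = 's3://my-bucket/path/'
  storage_integration = MY_INT;

-- Unload through the stage; no URL or integration repeated here
copy into @MY_STAGE/mytable/
from MYTABLE
file_format = (type = csv);

-- Confirm what was written
list @MY_STAGE/mytable/;
```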

---

### **2) Read consistency & `AT (TIMESTAMP => …)`**

Snowflake queries always run on a **consistent snapshot** of data.

* But if you run **two separate queries** (e.g., `COUNT(*)` and `COPY INTO`), the snapshots may differ unless you explicitly **lock them to the same time**.
* Use `AT (TIMESTAMP => $SNAP_TS)` to ensure both queries see the *same frozen version* of the table. This relies on Time Travel, so the timestamp must fall within the table's retention period.

📌 Example:

```sql
set SNAP_TS = current_timestamp();

select count(*) from ORDERS at (timestamp => $SNAP_TS);

copy into @MY_STAGE/orders/
from (select * from ORDERS at (timestamp => $SNAP_TS));
```

This guarantees the count and the unloaded files match exactly.

---

### **3) Why `INCLUDE_QUERY_ID` + `DETAILED_OUTPUT`**

* **`INCLUDE_QUERY_ID=TRUE`**:

  * Appends the Snowflake query ID into each filename → ensures uniqueness and traceability.
  * Prevents overwriting when two processes export to the same prefix.
  * Great for audit trails.

* **`DETAILED_OUTPUT=TRUE`**:

  * Makes `COPY INTO` return **one row per file unloaded**.
  * Shows `FILE_NAME`, `FILE_SIZE`, and `ROW_COUNT` for each file.
  * You can sum `ROW_COUNT` and check that the total matches the expected row count.

📌 Example:

```sql
copy into @MY_STAGE/orders/
from (select * from ORDERS)
include_query_id = true
detailed_output = true;
```

---

### **4) Why `PARTITION BY` can’t be combined with `SINGLE=TRUE` or `OVERWRITE=TRUE`**

* **`PARTITION BY`** → Snowflake must create *multiple folders/files* (one per partition).
* **`SINGLE=TRUE`** → forces *one file total*. These conflict.
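* **`OVERWRITE=TRUE`** → overwrites existing files whose names match the files being unloaded; it does not clear the prefix. Partitioned unloads always put the query ID in the file names (`INCLUDE_QUERY_ID=TRUE` is forced), so names never repeat and Snowflake rejects the combination.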

👉 **Design workaround**:

* Instead of `OVERWRITE=TRUE`, unload into a **date-stamped prefix**:

  ```
  s3://bucket/orders/dt=2025-08-30/
  ```

  Each run is isolated. Use S3 lifecycle rules to expire old folders.
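
Both approaches can be sketched as follows. `CREATED_AT` is an assumed timestamp column on `ORDERS`, used only to illustrate the partition expression:

```sql
-- Option 1: isolate each run under its own date-stamped prefix
copy into @MY_STAGE/orders/dt=2025-08-30/
from (select * from ORDERS)
file_format = (type = parquet);

-- Option 2: let Snowflake build the dt=... folders from the data
copy into @MY_STAGE/orders/
from (select * from ORDERS)
partition by ('dt=' || to_varchar(CREATED_AT, 'YYYY-MM-DD'))
file_format = (type = parquet);
```

Option 1 keeps full control over the prefix per run; option 2 spreads one unload across as many `dt=` folders as there are distinct dates in the data.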

---

### **5) Parquet vs CSV (and CSV gotchas)**

* **Parquet**:

  * Columnar, compressed, schema preserved.
  * Best for big data tools (Athena, Spark, Glue).
  * Much smaller files → saves cost.

* **CSV**:

  * Universal (every tool reads it).
  * Human-readable.
  * But bigger, and fragile with nulls, quotes, line breaks.

**CSV fixes to avoid corruption**:

```sql
create or replace file format FF_CSV
  type = csv
  field_delimiter = ','
  field_optionally_enclosed_by = '"'
  null_if = ('\\N','NULL')
  empty_field_as_null = true
  compression = gzip;
```

---

### **6) How to read back files from S3 (without loading into a table)**

This is how you validate after unloading:

📌 Parquet:

```sql
select count(*) 
from @MY_STAGE/orders/ (file_format => 'FF_PARQUET');
```

📌 CSV:

```sql
select count(*)
from @MY_STAGE/orders/ (file_format => 'FF_CSV');
```

You can also spot check:

```sql
select metadata$filename, count(*)
from @MY_STAGE/orders/ (file_format => 'FF_CSV')
group by 1;
```
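
To put the two numbers side by side, a sketch that compares the table count with the count read back from the files (this assumes the prefix holds only files from this unload and that `ORDERS` has not changed since; otherwise pin the table read with `AT` as in section 2):

```sql
-- Row counts: the source table versus the unloaded Parquet files
select
  (select count(*) from ORDERS) as table_rows,
  (select count(*) from @MY_STAGE/orders/ (file_format => 'FF_PARQUET')) as file_rows;
```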

---

### **7) Permissions & storage integration (IAM + external ID)**

* With a **storage integration**, Snowflake does not need to store your AWS keys: access is delegated through an IAM role instead.

* In Snowflake:

  ```sql
  create storage integration MY_INT
    type = external_stage
    storage_provider = s3
    storage_aws_role_arn = 'arn:aws:iam::123456789012:role/my-snowflake-role'
    storage_allowed_locations = ('s3://my-bucket/path/')
    enabled = true;
  ```

* Then `DESCRIBE INTEGRATION` gives you:

  * Snowflake **user ARN**
  * **External ID**

* In AWS IAM:

  * Create a role with trust policy: “Allow Snowflake’s user ARN to assume this role, but only with this external ID.”
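  * Attach an S3 policy allowing `s3:PutObject` and `s3:ListBucket` for unloading, `s3:GetObject` and `s3:GetObjectVersion` for reading the files back, and optionally `s3:DeleteObject`.

The external ID condition guards against the confused-deputy problem: another Snowflake account cannot assume your role, because it would not present your external ID.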

---

### **8) Encryption options**

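When unloading directly to an S3 URL, you can specify encryption in the `COPY INTO` statement. For a named stage, the `ENCRYPTION` setting belongs on the stage definition instead:

```sql
copy into 's3://my-bucket/path/orders/'
from MYTABLE
storage_integration = MY_INT
file_format = (type=parquet)
encryption = ( type='AWS_SSE_KMS' kms_key_id='arn:aws:kms:us-east-1:123:key/abcd...' );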
```

Options:

* `AWS_SSE_S3` → Default S3 server-side encryption (AES-256).
* `AWS_SSE_KMS` → Use a customer-managed KMS key.
* `NONE` → No encryption (rarely used).
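
Encryption can also be set once on a named stage, so every unload through it inherits the setting. A minimal sketch, where `MY_KMS_STAGE` is a hypothetical stage name and the KMS key ID is a placeholder to replace:

```sql
-- Stage-level encryption: parameters inside ENCRYPTION are space-separated
create or replace stage MY_KMS_STAGE
  url = 's3://my-bucket/path/'
  storage_integration = MY_INT
  encryption = (type = 'AWS_SSE_KMS' kms_key_id = '<your-kms-key-id>');

-- Unloads through the stage use the stage's encryption setting
copy into @MY_KMS_STAGE/orders/
from MYTABLE
file_format = (type = parquet);
```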

---

### **9) What if query returns 0 rows?**

* Snowflake won’t write any data file at all. The prefix may be empty.
* This can break downstream jobs expecting at least one file.

👉 Solutions:

* Design consumer jobs to **tolerate empty directories**.
* Or create a **marker file** manually (e.g., an empty `_SUCCESS` file in S3).
* Or in the unload query, force a dummy row:

  ```sql
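  select * from MYTABLE
  union all
  -- one NULL per column of MYTABLE (three columns assumed here)
  select null, null, null where not exists (select 1 from MYTABLE);
  ```

  (This ensures at least one file lands; it contains a single all-NULL row that consumers must ignore.)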

---
