
---

# The Story: “Retail-X” builds a clean S3→Snowflake lane

Your team at **Retail-X** has raw CSV/JSON files landing in an S3 bucket every hour. You want a secure, least-privilege connection so Snowflake can read just the right prefixes, with optional KMS encryption, and you want to be able to add more S3 locations later without re-wiring everything.

We’ll do this properly using a **Snowflake Storage Integration** + **AWS IAM Role** + **bucket policy**. This is Snowflake’s recommended pattern today (it avoids sharing long-lived access keys and uses an External ID to prevent “confused deputy” attacks). ([Snowflake Documentation][1])

---

## 0) Ground rules & fundamentals (read this once)

* **Regions & cost reality.** If your **Snowflake account and S3 bucket are in the same AWS region**, you avoid cross-region *data transfer* fees from AWS. You’ll still pay normal S3 **request** costs (GET/LIST) and Snowflake compute for the COPY. If you cross Regions, expect extra egress charges and latency. Check S3 pricing for current rules. ([Amazon Web Services, Inc.][2], [AWS Documentation][3])
* **Why Storage Integrations?** They’re first-class Snowflake objects that store a Snowflake-managed AWS identity and the S3 locations you allow. No secrets in SQL, easier audits, safer by design. (The older “role-chaining” flow is deprecated.) ([Snowflake Documentation][4])
* **Least privilege always.** You’ll give Snowflake `s3:ListBucket` and `s3:GetBucketLocation` on the bucket, and `s3:GetObject` (plus `s3:GetObjectVersion`) only on the object paths you list in Snowflake’s **STORAGE\_ALLOWED\_LOCATIONS**. Add `s3:PutObject` only if you’ll **unload** to S3. Add KMS permissions if you use SSE-KMS. ([Snowflake Documentation][1])
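
To compare regions before you commit to a bucket, a quick check like the one below may help. `CURRENT_REGION()` is a built-in Snowflake context function; the value shown in the comment is only an illustration, not your account’s actual region.

```sql
-- Returns the cloud region of the current Snowflake account,
-- for example AWS_US_WEST_2 (illustrative value).
SELECT CURRENT_REGION();
```

Compare the result with the Region shown for the bucket in the S3 console.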

---

## 1) Design the access: pick your S3 locations and permissions

Decide which S3 prefixes Snowflake can touch. Example:

```text
s3://retailx-raw/orders/       (read)
s3://retailx-raw/customers/    (read)
s3://retailx-exports/          (optional: write if UNLOAD)
```

We will enforce these both in **Snowflake** (allowed locations) *and* in **AWS IAM** (policy Resource ARNs). Defense in depth. ([Snowflake Documentation][5])

---

## 2) Create the Snowflake Storage Integration (first!)

Create it **before** touching IAM. You’ll then **DESCRIBE** it to fetch the Snowflake **AWS IAM User ARN** and the **External ID** that must be trusted by your AWS role.

```sql
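-- As ACCOUNTADMIN or a role with the CREATE INTEGRATION privilege.
-- Caution: re-running CREATE OR REPLACE generates a new External ID and
-- breaks the link to existing stages; use ALTER for later changes.
CREATE OR REPLACE STORAGE INTEGRATION retailx_s3_int
  TYPE                     = EXTERNAL_STAGE
  STORAGE_PROVIDER         = 'S3'
  ENABLED                  = TRUE
  -- Required for S3. Assumption: the role is created in step 3 under this name.
  STORAGE_AWS_ROLE_ARN     = 'arn:aws:iam::<your-account-id>:role/retailx-snowflake-role'
  STORAGE_ALLOWED_LOCATIONS = (
    's3://retailx-raw/orders/',
    's3://retailx-raw/customers/',
    's3://retailx-exports/'
  )
  -- STORAGE_BLOCKED_LOCATIONS = ('<s3-uri>')  -- optionally list disallowed prefixes
  COMMENT = 'Retail-X: read raw data, optional unload to exports';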
```

Now pull the Snowflake-managed identity values:

```sql
DESC INTEGRATION retailx_s3_int;
```

From this output, copy:

* **STORAGE\_AWS\_IAM\_USER\_ARN** (Snowflake’s IAM user that will assume your role)
* **STORAGE\_AWS\_EXTERNAL\_ID** (unique ExternalId you must require in your role trust policy) ([Snowflake Documentation][1])

> Why this order? Because AWS needs to *trust* the exact Snowflake ARN + ExternalId you just generated.
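
If you would rather not pick the two values out of the full `DESC` output by eye, a sketch like the following filters them. It assumes the `DESC INTEGRATION` result exposes lowercase `property` and `property_value` columns (hence the double quotes) and that both statements run back to back in the same session.

```sql
DESC INTEGRATION retailx_s3_int;

-- Re-read the previous result set and keep only the two rows AWS needs.
SELECT "property", "property_value"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "property" IN ('STORAGE_AWS_IAM_USER_ARN', 'STORAGE_AWS_EXTERNAL_ID');
```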

---

## 3) In AWS IAM: create the role **trusted by Snowflake**

### 3.1 Create an **IAM role** (console or IaC)

* **Trusted entity:** choose “AWS account” → “Another AWS account”. Whatever the wizard asks for, the final trust policy must name the exact **STORAGE\_AWS\_IAM\_USER\_ARN** returned by `DESC INTEGRATION` as its principal.
* Add a **trust policy** condition that requires `sts:ExternalId` to equal the `STORAGE_AWS_EXTERNAL_ID` value you copied. This prevents confused deputy risks.

**Example trust policy (AssumeRole)**—replace the placeholders with your values:

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "TrustSnowflake",
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::<SNOWFLAKE_AWS_ACCOUNT_ID>:user/<snowflake-user>" },
    "Action": "sts:AssumeRole",
    "Condition": { "StringEquals": { "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>" } }
  }]
}
```

External IDs are the AWS-recommended way to secure third-party role assumption.

### 3.2 Attach an **inline policy** with least-privilege S3 access

If you only **load** data (read-only):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    { "Sid": "BucketLocation",
      "Effect": "Allow",
      "Action": [ "s3:GetBucketLocation" ],
      "Resource": "arn:aws:s3:::retailx-raw"
    },
    { "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": [ "s3:ListBucket" ],
      "Resource": "arn:aws:s3:::retailx-raw",
      "Condition": { "StringLike": { "s3:prefix": [ "orders/*", "customers/*" ] } }
    },
    { "Sid": "ReadObjects",
      "Effect": "Allow",
      "Action": [ "s3:GetObject", "s3:GetObjectVersion" ],
      "Resource": [
        "arn:aws:s3:::retailx-raw/orders/*",
        "arn:aws:s3:::retailx-raw/customers/*"
      ]
    }
  ]
}
```

If you’ll **unload** results to `s3://retailx-exports/`, add:

```json
{
  "Sid": "WriteExports",
  "Effect": "Allow",
  "Action": [ "s3:PutObject", "s3:AbortMultipartUpload", "s3:ListBucketMultipartUploads" ],
  "Resource": [
    "arn:aws:s3:::retailx-exports",
    "arn:aws:s3:::retailx-exports/*"
  ]
}
```
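
For orientation, this is roughly what the unload that needs those write permissions looks like on the Snowflake side. It is a sketch that assumes the integration is already linked to the role, that `retailx_orders` exists, and that `orders/` under the exports bucket is an acceptable (hypothetical) target prefix.

```sql
-- Unload query results to the exports bucket through the integration.
COPY INTO 's3://retailx-exports/orders/'
FROM (SELECT * FROM retailx_orders)
STORAGE_INTEGRATION = retailx_s3_int
FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
HEADER = TRUE;  -- write column names as the first row of each file
```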

If your files are **SSE-KMS encrypted**, add the KMS permissions (and make sure the **KMS key policy** allows this role):

```json
{
  "Sid": "AllowKMS",
  "Effect": "Allow",
  "Action": [ "kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey" ],
  "Resource": "arn:aws:kms:<region>:<account-id>:key/<key-id>"
}
```

These permissions mirror Snowflake’s docs for S3 access & KMS. ([Snowflake Documentation][1])

> Subtle but important: permission the **bucket** separately from the **objects** (List vs Get/Put). Most “AccessDenied” headaches are mismatches between allowed **prefixes** here and **STORAGE\_ALLOWED\_LOCATIONS** back in Snowflake.

---

## 4) Back to Snowflake: link the integration to your AWS role

Update the integration with your new **role ARN**:

```sql
ALTER STORAGE INTEGRATION retailx_s3_int
  SET STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::<your-account-id>:role/retailx-snowflake-role';
```

Re-`DESC INTEGRATION retailx_s3_int;` and confirm `STORAGE_AWS_ROLE_ARN` is set and `ENABLED` = TRUE. This aligns with Snowflake’s current integration flow. ([Snowflake Documentation][5])

> Can I allow **multiple S3 paths** with one integration? **Yes.** Put as many URIs as you need into **STORAGE\_ALLOWED\_LOCATIONS**; you can also **ALTER** later to add more. Remember to update your IAM policy too, or AWS will still block you. ([Snowflake Documentation][5])
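
As a sketch of adding a location later: `SET` replaces the whole list rather than appending to it, so the existing URIs are repeated. The `returns/` prefix is a hypothetical addition used only for illustration.

```sql
-- SET overwrites the list: repeat the current URIs and append the new one.
ALTER STORAGE INTEGRATION retailx_s3_int
  SET STORAGE_ALLOWED_LOCATIONS = (
    's3://retailx-raw/orders/',
    's3://retailx-raw/customers/',
    's3://retailx-raw/returns/',   -- hypothetical new prefix
    's3://retailx-exports/'
  );
```

The matching IAM change is a new `s3:prefix` entry and a new object `Resource` ARN for the same prefix.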

---

## 5) Create an external stage in Snowflake (the “friendly handle”)

```sql
CREATE OR REPLACE STAGE retailx_ext_stage
  URL='s3://retailx-raw/orders/'
  STORAGE_INTEGRATION=retailx_s3_int
  FILE_FORMAT = ( TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY='"' SKIP_HEADER=1 )
  -- If files are SSE-KMS encrypted, add:
  -- ENCRYPTION = ( TYPE='AWS_SSE_KMS' KMS_KEY_ID='arn:aws:kms:region:acct:key/key-id' );
;
```

External stages encapsulate the URL + integration + file format (+ encryption settings). ([Snowflake Documentation][6])

List files to sanity-check access:

```sql
LIST @retailx_ext_stage;
```
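
Beyond listing, it can be worth peeking at a few rows before loading. The query below is a minimal sketch that assumes the stage’s CSV file format is applied when no `FILE_FORMAT` is passed and that the files have at least four columns.

```sql
-- $1..$4 are positional columns; METADATA$FILENAME is the source file path.
SELECT METADATA$FILENAME AS file_name, t.$1, t.$2, t.$3, t.$4
FROM @retailx_ext_stage t
LIMIT 10;
```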

---

## 6) Load data with COPY INTO (and handle errors like a pro)

Create the table:

```sql
CREATE OR REPLACE TABLE retailx_orders(
  order_id    STRING,
  customer_id STRING,
  created_at  TIMESTAMP_NTZ,
  total_usd   NUMBER(10,2)
);
```

Load:

```sql
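COPY INTO retailx_orders
FROM @retailx_ext_stage
PATTERN = '.*2025/08/28/orders_.*[.]csv[.]gz'  -- regex; FILES accepts only exact file names
ON_ERROR = 'CONTINUE'  -- load good rows and skip bad ones instead of aborting
-- VALIDATION_MODE = 'RETURN_ERRORS'  -- dry run: use this instead of ON_ERROR to preview errors without loading
;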
```

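* `VALIDATION_MODE` lets you “dry-run” parsing to see errors without loading anything, so run it as a separate statement before the real load.
* `ON_ERROR='CONTINUE'` loads the good rows and skips the bad ones. The COPY output and load history report error counts and the first error per file. ([Snowflake Documentation][7])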
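
To see the rows a load skipped, Snowflake’s `VALIDATE` table function can be pointed at the most recent COPY. This sketch assumes it runs in the same session as the load and that the COPY did not transform columns.

```sql
-- Rows rejected by the last COPY INTO retailx_orders in this session.
SELECT * FROM TABLE(VALIDATE(retailx_orders, JOB_ID => '_last'));
```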

Check load history:

```sql
SELECT * FROM snowflake.account_usage.copy_history
WHERE table_name = 'RETAILX_ORDERS'
ORDER BY last_load_time DESC;
```

This view shows file, row counts, and first error messages for the last 365 days. ([Snowflake Documentation][8])
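
Account Usage views lag behind real time, so for a load you just ran, the Information Schema table function of the same name may be more convenient. This is a sketch; the 24-hour window is an arbitrary choice, and the function covers a much shorter history than the view.

```sql
-- Recent loads for one table, straight from the current database.
SELECT file_name, status, row_count, row_parsed, first_error_message
FROM TABLE(information_schema.copy_history(
  TABLE_NAME => 'RETAILX_ORDERS',
  START_TIME => DATEADD(hours, -24, CURRENT_TIMESTAMP())
));
```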

> Quick workflow: start with `VALIDATION_MODE=RETURN_ERRORS`, fix file format/columns, then run the real `COPY`. For stubborn files, use a staging table with all-STRING columns and **parse/clean** inside Snowflake.
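
The all-STRING staging approach could look like the sketch below. The table name `retailx_orders_raw` is hypothetical, and the column order is assumed to match the files.

```sql
-- Every column is STRING, so type parsing cannot reject a row at load time.
CREATE OR REPLACE TABLE retailx_orders_raw (
  order_id    STRING,
  customer_id STRING,
  created_at  STRING,
  total_usd   STRING
);

COPY INTO retailx_orders_raw
FROM @retailx_ext_stage
PATTERN = '.*orders_.*[.]csv[.]gz';

-- TRY_* functions return NULL instead of raising an error on bad input.
INSERT INTO retailx_orders (order_id, customer_id, created_at, total_usd)
SELECT order_id,
       customer_id,
       TRY_TO_TIMESTAMP_NTZ(created_at),
       TRY_TO_NUMBER(total_usd, 10, 2)
FROM retailx_orders_raw;
```

Rows where a parsed column is NULL but the raw column is not are the ones to investigate.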

---

## 7) Optional but important: Auto-ingest with Snowpipe

If you want files to load as soon as they land, configure **Snowpipe auto-ingest** (S3 → SNS/SQS → Snowflake). This needs the right events and queue permissions. Snowflake’s guide walks you step-by-step. ([Snowflake Documentation][9])
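
On the Snowflake side, the pipe itself is small. The following is a minimal sketch with a hypothetical pipe name; the S3 event notification that feeds it still has to be configured in AWS.

```sql
-- The pipe wraps a COPY statement and runs it for each newly notified file.
CREATE OR REPLACE PIPE retailx_orders_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO retailx_orders
  FROM @retailx_ext_stage;

-- The notification_channel column holds the SQS ARN to use in the S3 event config.
SHOW PIPES LIKE 'retailx_orders_pipe';
```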

---

## 8) Security & networking boosters (add as needed)

* **Block lists.** You can set `STORAGE_BLOCKED_LOCATIONS` to explicitly deny risky prefixes even if someone later adds them to policies. ([Snowflake Documentation][5])
* **Private connectivity.** If you use AWS PrivateLink/VPC endpoints, see Snowflake’s “private connectivity to AWS” notes for allow-listing Snowflake VPC IDs. ([Snowflake Documentation][10])
* **KMS hygiene.** If you use SSE-KMS, ensure both the **IAM role policy** *and* the **KMS key policy** allow decrypt/encrypt for that role. ([Snowflake Documentation][11])
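
A block list is set the same way as the allow list. In this sketch, `customers/pii/` is a hypothetical sensitive prefix that sits under an allowed location.

```sql
-- Stages cannot point at a blocked prefix, even though its parent is allowed.
ALTER STORAGE INTEGRATION retailx_s3_int
  SET STORAGE_BLOCKED_LOCATIONS = ('s3://retailx-raw/customers/pii/');
```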

---

## 9) Clean answers to your specific questions

**Q: “If we keep S3 and Snowflake in the same region, we won’t incur extra cost.”**
**A:** Mostly right—but nuanced. Same-region avoids AWS **data transfer out** fees you’d pay for cross-region reads. You still pay S3 **request** charges and Snowflake compute for loading. Always check the current S3 pricing page. ([Amazon Web Services, Inc.][2])

**Q: “S3 policies—what do we need?”**
**A:** On the IAM **role** that Snowflake assumes: `s3:ListBucket` (limited to your prefixes) and `s3:GetBucketLocation` on the bucket, and `s3:GetObject` on the objects (+ `s3:PutObject` if unloading). If using KMS: allow `kms:Decrypt/Encrypt/GenerateDataKey`. ([Snowflake Documentation][1])

**Q: “How to create an IAM policy?”**
**A:** Use the inline JSON shown in §3.2 (tailor bucket names/prefixes). Attach it to the **role** you created for Snowflake. ([Snowflake Documentation][1])

**Q: “IAM Role—step by step?”**
**A:** Create role → set **trust policy** to the **Snowflake IAM User ARN** + require `sts:ExternalId` from `DESC INTEGRATION` → attach the S3 permissions inline policy. (Trust is on the **role**, not on S3 itself—S3 doesn’t have “trust relationships.”)

**Q: “Create Snowflake Integration object—full process?”**
**A:** `CREATE STORAGE INTEGRATION` with your allowed locations → `DESC INTEGRATION` to get **IAM user ARN** + **External ID** → build AWS **role + trust policy** → `ALTER STORAGE INTEGRATION ... SET STORAGE_AWS_ROLE_ARN='arn:aws:iam::...:role/...'`. ([Snowflake Documentation][1])

**Q: “Can I add more S3 paths to one integration?”**
**A:** Yes. Use `ALTER STORAGE INTEGRATION ... SET STORAGE_ALLOWED_LOCATIONS = (...)` to add more URIs. Also update the IAM policy `Resource` ARNs accordingly. ([Snowflake Documentation][5])

**Q: “How do I get the Snowflake ARN and add it to the trust relationship? External ID too?”**
**A:** `DESC INTEGRATION <name>` returns `STORAGE_AWS_IAM_USER_ARN` and `STORAGE_AWS_EXTERNAL_ID`. Put the ARN under `Principal.AWS` and add a `Condition` requiring `sts:ExternalId` to equal that value. ([Snowflake Documentation][1])

---

## 10) End-to-end “happy path” checklist (copy/paste to your runbook)

1. **CREATE STORAGE INTEGRATION** with allowed S3 URIs. ([Snowflake Documentation][5])
2. **DESC INTEGRATION** → copy **IAM User ARN** + **External ID**. ([Snowflake Documentation][1])
3. **Create IAM Role** in AWS with **trust policy** (Principal = Snowflake ARN; `sts:ExternalId` required).
4. **Attach inline S3 policy** (List/Get for load; Put for unload; KMS if needed). ([Snowflake Documentation][1])
5. **ALTER STORAGE INTEGRATION** to set `STORAGE_AWS_ROLE_ARN`. ([Snowflake Documentation][5])
6. **CREATE STAGE** with `STORAGE_INTEGRATION=...` and file format (+ KMS if used). ([Snowflake Documentation][6])
7. **LIST @stage** → **COPY INTO** with `VALIDATION_MODE` first, then real load with `ON_ERROR` tuned. ([Snowflake Documentation][7])
8. **Monitor** via `ACCOUNT_USAGE.COPY_HISTORY` (and Snowsight). ([Snowflake Documentation][8])

---

## 11) Common gotchas (and how to spot them fast)

* **AccessDenied:** Prefix mismatch between IAM policy `Resource` ARNs and the actual folder you listed in Snowflake. Fix both sides. ([Snowflake Documentation][1])
* **KMS failures:** You set `ENCRYPTION=AWS_SSE_KMS` but the role lacks KMS permissions or the **KMS key policy** doesn’t trust the role. Add both. ([Snowflake Documentation][11])
* **Wrong trust entity:** You used an AWS account ID (or wrong ARN) instead of the exact **Snowflake IAM User ARN** from `DESC INTEGRATION`. Update trust policy. ([Snowflake Documentation][1])
* **Cross-region surprises:** S3 bucket and Snowflake region differ; expect slower loads and possible egress fees from AWS. Align regions where possible. ([Amazon Web Services, Inc.][2])

---

## 12) Practice / must-answer questions

1. Why are Storage Integrations safer than embedding AWS keys in stages? (Name two reasons.) ([Snowflake Documentation][4])
2. Show the exact **trust policy** snippet you’d use to require Snowflake’s `ExternalId`. Why is it important?
3. Which two permissions do you *always* need for read-only loads, and where do they apply (bucket vs object)? ([Snowflake Documentation][1])
4. How do `STORAGE_ALLOWED_LOCATIONS` and your IAM `Resource` ARNs work together to enforce least privilege? ([Snowflake Documentation][5])
5. Your files are SSE-KMS. What two places must mention the KMS key to succeed? (Hint: stage `ENCRYPTION` and IAM/KMS policies.) ([Snowflake Documentation][11])
6. Give a `COPY INTO` command that first validates and then loads, capturing bad rows (what two clauses do you change?). ([Snowflake Documentation][7])
7. Where do you find the **Snowflake IAM User ARN** and **External ID** you must put in AWS? Which command shows them? ([Snowflake Documentation][1])
8. What happens (cost-wise and latency-wise) if your S3 bucket is in a different region than Snowflake? Where would you verify current pricing? ([Amazon Web Services, Inc.][2])

---

## 13) Bonus patterns you’ll likely need soon

* **Multiple teams, one integration.** Use one integration with many **allowed** URIs, but isolate each team with separate **stages** and **roles** in Snowflake; keep IAM policy tightly scoped per prefix. ([Snowflake Documentation][5])
* **Auto-ingest.** Add Snowpipe with S3→SNS/SQS notifications when freshness matters. ([Snowflake Documentation][9])
* **Private connectivity.** If you require no public internet paths, review Snowflake’s Private Connectivity notes and allow-listed VPC IDs. ([Snowflake Documentation][10])
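
The “multiple teams, one integration” pattern might be wired up as follows. Role and stage names are hypothetical. An administrator creates the stage, and the team role receives `USAGE` on the stage only, not on the integration, so it cannot create stages on other prefixes.

```sql
-- Run as the role that owns the integration.
CREATE ROLE IF NOT EXISTS orders_loader;

CREATE STAGE IF NOT EXISTS orders_stage
  URL = 's3://retailx-raw/orders/'
  STORAGE_INTEGRATION = retailx_s3_int;

-- External stages are shared through the USAGE privilege.
GRANT USAGE ON STAGE orders_stage TO ROLE orders_loader;
```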

---


[1]: https://docs.snowflake.com/en/user-guide/data-load-s3-config-storage-integration "Option 1: Configuring a Snowflake storage integration to access Amazon S3 | Snowflake Documentation"
[2]: https://aws.amazon.com/s3/pricing/ "S3 Pricing - AWS"
[3]: https://docs.aws.amazon.com/cur/latest/userguide/cur-data-transfers-charges.html "Understanding data transfer charges - AWS Documentation"
[4]: https://docs.snowflake.com/en/user-guide/data-load-s3-config "Configuring secure access to Amazon S3 | Snowflake Documentation"
[5]: https://docs.snowflake.com/en/sql-reference/sql/create-storage-integration "CREATE STORAGE INTEGRATION | Snowflake Documentation"
[6]: https://docs.snowflake.com/en/user-guide/data-load-s3-create-stage "Creating an S3 stage - Snowflake Documentation"
[7]: https://docs.snowflake.com/en/sql-reference/sql/copy-into-table "COPY INTO <table> | Snowflake Documentation"
[8]: https://docs.snowflake.com/en/sql-reference/account-usage/copy_history "COPY_HISTORY view | Snowflake Documentation"
[9]: https://docs.snowflake.com/en/user-guide/data-load-snowpipe-auto-s3 "Automating Snowpipe for Amazon S3 - Snowflake Documentation"
[10]: https://docs.snowflake.com/en/user-guide/data-load-s3-allow "Allowing the Virtual Private Cloud IDs | Snowflake Documentation"
[11]: https://docs.snowflake.com/en/user-guide/data-load-s3-encrypt "AWS data file encryption | Snowflake Documentation"



---

## **1) Why are Storage Integrations safer than embedding AWS keys in stages? (Name two reasons.)**

✅ **Answer:**

* Snowflake manages the AWS identity internally—**no long-lived access/secret keys** stored in your SQL objects.
* You get **least-privilege enforcement** via `STORAGE_ALLOWED_LOCATIONS`, plus AWS IAM trust policies with **External ID** to block unauthorized assume-role attempts.

📌 **Example:**
If you hard-coded AWS keys in a `CREATE STAGE`, those long-lived keys would live in your DDL scripts, would need manual rotation, and would work from anywhere if they leaked. With Storage Integrations, there are no keys to store or expose in Snowflake.

---

## **2) Show the exact trust policy snippet you’d use to require Snowflake’s `ExternalId`. Why is it important?**

✅ **Answer:**

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "AWS": "arn:aws:iam::<SNOWFLAKE_ACCOUNT>:user/<snowflake-iam-user>"
    },
    "Action": "sts:AssumeRole",
    "Condition": {
      "StringEquals": {
        "sts:ExternalId": "<STORAGE_AWS_EXTERNAL_ID>"
      }
    }
  }]
}
```

📌 **Why important?**
The `ExternalId` guards against the **confused deputy problem**: the role can be assumed only by a request that presents your integration’s External ID, so a request that lacks it is rejected even when it comes from the trusted principal.

---

## **3) Which two permissions do you *always* need for read-only loads, and where do they apply (bucket vs object)?**

✅ **Answer:**

1. `s3:ListBucket` → on the **bucket** (`arn:aws:s3:::bucket-name`) to enumerate object keys.
2. `s3:GetObject` → on the **objects** (`arn:aws:s3:::bucket-name/prefix/*`) to actually read data.

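📌 Without `ListBucket`, Snowflake can’t even “see” your files. Without `GetObject`, it can’t read them. Snowflake’s documented policy also grants `s3:GetBucketLocation` on the bucket and `s3:GetObjectVersion` on the objects.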

---

## **4) How do `STORAGE_ALLOWED_LOCATIONS` and your IAM `Resource` ARNs work together to enforce least privilege?**

✅ **Answer:**

* `STORAGE_ALLOWED_LOCATIONS` → Snowflake-side filter: limits what URIs you can use in a stage.
* IAM `Resource` ARNs → AWS-side filter: limits what Snowflake’s role can actually access.

📌 Both must allow access. If either side denies, access fails → “defense in depth.”

---

## **5) Your files are SSE-KMS. What two places must mention the KMS key to succeed?**

✅ **Answer:**

1. In the **Snowflake stage definition**, add:

   ```sql
   ENCRYPTION = ( TYPE='AWS_SSE_KMS' KMS_KEY_ID='<kms-key-arn>' )
   ```
2. In AWS IAM and **KMS key policy**: grant the Snowflake IAM role permissions (`kms:Decrypt`, `kms:Encrypt`, `kms:GenerateDataKey`).

📌 Both are needed. If you forget the key policy, you’ll see `AccessDeniedException` even if IAM role has permissions.

---

## **6) Give a `COPY INTO` command that first validates and then loads, capturing bad rows (what two clauses do you change?).**

✅ **Answer:**

* **Step 1 – validate only:**

  ```sql
  COPY INTO retailx_orders
  FROM @retailx_ext_stage
  PATTERN = '.*2025/08/28/orders_.*[.]csv[.]gz'
  VALIDATION_MODE = 'RETURN_ERRORS';
  ```

* **Step 2 – real load with bad rows captured:**

  ```sql
  COPY INTO retailx_orders
  FROM @retailx_ext_stage
  PATTERN = '.*2025/08/28/orders_.*[.]csv[.]gz'
  ON_ERROR = 'CONTINUE';
  ```

📌 Change **`VALIDATION_MODE`** for dry run, and then use **`ON_ERROR`** when doing the actual load.

---

## **7) Where do you find the Snowflake IAM User ARN and External ID you must put in AWS? Which command shows them?**

✅ **Answer:**
Run:

```sql
DESC INTEGRATION retailx_s3_int;
```

Look for:

* `STORAGE_AWS_IAM_USER_ARN`
* `STORAGE_AWS_EXTERNAL_ID`

📌 These values must be plugged into the AWS IAM Role trust policy.

---

## **8) What happens (cost-wise and latency-wise) if your S3 bucket is in a different region than Snowflake? Where would you verify current pricing?**

✅ **Answer:**

* **Cost:** You pay AWS **cross-region data transfer charges** when Snowflake reads from the bucket.
* **Latency:** Data travels across regions → slower load times.
* **Where to check:** AWS **S3 Pricing** page (specifically “Data Transfer” section).

📌 Always align Snowflake region with S3 bucket region if possible.

---
