*You’ve just joined my team, and I need to make sure you’re not only able to run AWS CLI commands but also understand why we use them, how they connect to Snowflake, and what pitfalls you must avoid.*

We’ll go step by step. 🚀

---

# 1. **AWS CLI Basics – The Foundation**

Imagine AWS CLI as your **remote control** for AWS services.
Instead of logging in to the AWS Console (the web UI), you can send commands from your terminal and instantly interact with **S3 buckets, EC2, IAM, etc.**

For Snowflake, why do we care?
👉 Because **Snowflake integrates heavily with S3**: we often load and unload data between Snowflake and S3. If you don’t know AWS CLI, you’ll be blind when verifying files, moving data, or debugging failures.

Example scenario:

* You set up a Snowflake `COPY INTO` command to pull data from an S3 bucket.
* Your job fails. The Snowflake error message says `File not found`.
* With AWS CLI, you can instantly check: “Does the file even exist in the bucket?”

So, the CLI is like your **stethoscope as a doctor** when diagnosing S3–Snowflake pipelines.

---

# 2. **How to Create AWS CLI Configuration**

This is your first step.
AWS CLI needs to know **“Who are you?”** and **“Which AWS account should I connect to?”**

### Step 2.1 – Install AWS CLI

* Windows: Download the MSI installer.
* Mac: `brew install awscli`.
* Debian/Ubuntu: `sudo apt-get install awscli` (distro packages may lag behind the official AWS installer).

Verify installation:

```bash
aws --version
```

### Step 2.2 – Get AWS Access Credentials

Now here’s where students usually get stuck. To use AWS CLI, you need an **Access Key ID** and a **Secret Access Key**.
Think of this as your **username + password** for programmatic access.

👉 How do you get them?

1. Log in to the AWS Console (with your company credentials).
2. Go to **IAM (Identity & Access Management)**.
3. Choose **Users** → Select your user account.
4. Under **Security Credentials**, create a new **Access Key**.

   * AWS gives you two values (the secret is shown only once, at creation time):

     * `AWS_ACCESS_KEY_ID = AKIAIOSFODNN7EXAMPLE`
     * `AWS_SECRET_ACCESS_KEY = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY`

⚠️ Important:

* Treat the secret key like your bank card PIN. If leaked → someone can delete your S3 buckets!
* Many companies (including top data teams) prefer using **IAM Roles** instead of static keys, but for learning, keys are fine.

### Step 2.3 – Configure AWS CLI

Run:

```bash
aws configure
```

It will ask for:

```
AWS Access Key ID [None]: <paste here>
AWS Secret Access Key [None]: <paste here>
Default region name [None]: us-east-1
Default output format [None]: json
```

👉 Region is critical!

* If your bucket is in `us-east-1` but you accidentally set `ap-south-1`, some commands fail with confusing region or redirect errors even though the bucket exists.

👉 Output format:

* `json` (default)
* `table` (pretty printed)
* `text` (simplified)

Pro tip: I keep mine as `table` for human readability.

✅ Now your AWS CLI is ready.
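`aws configure` saves these answers in two INI-style files, `~/.aws/credentials` and `~/.aws/config`. As a rough sketch (the `region_for_profile` helper and the sample text are illustrative, not part of the CLI), Python's standard `configparser` can show which region a profile would use:

```python
import configparser

# Sample text in the layout of ~/.aws/config; the values are made up.
SAMPLE_CONFIG = """\
[default]
region = us-east-1
output = table

[profile analytics]
region = eu-west-1
"""


def region_for_profile(config_text, profile="default"):
    """Return the region configured for a profile, or None if it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # In the config file, named profiles carry a "profile " prefix; default does not.
    section = "default" if profile == "default" else f"profile {profile}"
    if not parser.has_section(section):
        return None
    return parser.get(section, "region", fallback=None)


if __name__ == "__main__":
    print(region_for_profile(SAMPLE_CONFIG))
    print(region_for_profile(SAMPLE_CONFIG, "analytics"))
```

To check your real setup, read the file with `open(os.path.expanduser("~/.aws/config")).read()` and pass the text in, or simply run `aws configure list`.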

---

# 3. **Checking List of Files in an S3 Bucket**

Let‚Äôs say your Snowflake pipeline needs to read files from `s3://company-data/raw/2025/`.

To confirm what’s inside:

```bash
aws s3 ls s3://company-data/raw/2025/
```

Output example:

```
2025-08-01 12:45:32     145123 sales_data_2025-08-01.csv
2025-08-02 14:22:10     167890 sales_data_2025-08-02.csv
```

* First column = Last-modified date
* Second = Last-modified time
* Third = File size (in bytes)
* Fourth = File name

👉 Why is this important for Snowflake?

* Before you do `COPY INTO my_table FROM @stage`, you can **verify the files exist** and check if they have the right size.

Scenario:
Your pipeline fails with “File empty” → You run `aws s3 ls` and notice the file size is `0`. Boom! Problem found.
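If you want to automate that zero-byte check, the four columns are easy to parse. This is a minimal sketch; `parse_ls_line` and `empty_files` are hypothetical helper names, and the third sample row is invented to show an empty file:

```python
from typing import NamedTuple


class S3Listing(NamedTuple):
    date: str
    time: str
    size: int
    name: str


def parse_ls_line(line):
    """Split one object line of `aws s3 ls` output into its four columns."""
    # maxsplit=3 keeps file names that contain spaces in one piece.
    date, time, size, name = line.split(None, 3)
    return S3Listing(date, time, int(size), name)


def empty_files(ls_output):
    """Return the names of all zero-byte objects in the listing."""
    names = []
    for line in ls_output.splitlines():
        stripped = line.strip()
        # Skip blank lines and "PRE folder/" rows, which describe prefixes.
        if not stripped or stripped.startswith("PRE "):
            continue
        row = parse_ls_line(stripped)
        if row.size == 0:
            names.append(row.name)
    return names


SAMPLE_LS = """\
2025-08-01 12:45:32     145123 sales_data_2025-08-01.csv
2025-08-02 14:22:10     167890 sales_data_2025-08-02.csv
2025-08-03 09:10:00          0 sales_data_2025-08-03.csv
"""

if __name__ == "__main__":
    print(empty_files(SAMPLE_LS))
```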

---

# 4. **How to Copy Files from S3 to Local Folder**

Sometimes, as a Data Engineer, you need to quickly download files from S3 to check their raw content.

Command:

```bash
aws s3 cp s3://company-data/raw/2025/sales_data_2025-08-01.csv ./local_folder/
```

* `s3://company-data/...` = S3 path
* `./local_folder/` = Local path

👉 If you want to copy **entire folders**:

```bash
aws s3 cp s3://company-data/raw/2025/ ./local_folder/ --recursive
```

Pro Tip:
Without `--recursive`, `aws s3 cp` expects a single object key. Pointing it at a folder fails instead of copying the contents.

Real-world case:
You download a file locally and open it in Excel/Notepad. Suddenly you realize:

* The delimiter is `|` not `,`.
* That’s why Snowflake `COPY INTO` failed.
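You don't always need Excel for this. A quick count of candidate delimiters in the header line usually tells you enough. The sketch below is a simple heuristic with a hypothetical `guess_delimiter` helper, not a full CSV sniffer:

```python
def guess_delimiter(header_line, candidates=(",", "|", ";", "\t")):
    """Return the candidate character that occurs most often in the header line."""
    counts = {d: header_line.count(d) for d in candidates}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError("no known delimiter found in header line")
    return best


if __name__ == "__main__":
    # Hypothetical header from a downloaded file.
    print(repr(guess_delimiter("order_id|store_id|amount")))
```

If the result is not `,`, the Snowflake file format needs a matching `FIELD_DELIMITER`.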

---

# 5. **How to Upload Files from Local to S3 Bucket**

Now imagine you’re testing a pipeline. You created a dummy CSV file locally and want to push it into S3 so Snowflake can read it.

Command:

```bash
aws s3 cp ./local_folder/my_test_file.csv s3://company-data/raw/2025/
```

👉 For multiple files:

```bash
aws s3 cp ./local_folder/ s3://company-data/raw/2025/ --recursive
```

---

# 6. **Additional Must-Know AWS CLI for Snowflake**

Since you’ll use AWS CLI for **Snowflake integrations**, here are extra commands you must know:

### 6.1 – Sync local and S3

```bash
aws s3 sync ./local_folder/ s3://company-data/raw/2025/
```

* Only copies new/changed files.
* Very handy when refreshing test data.

### 6.2 – Remove file from S3

```bash
aws s3 rm s3://company-data/raw/2025/sales_data_2025-08-01.csv
```

### 6.3 – Check bucket region

```bash
aws s3api get-bucket-location --bucket company-data
```

👉 Useful when Snowflake gives region mismatch error.
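One gotcha when reading the result: for buckets in `us-east-1`, the command returns a `LocationConstraint` of `null` rather than the region name. A small sketch (the `bucket_region` helper is hypothetical) that normalizes the JSON output:

```python
import json


def bucket_region(get_bucket_location_json):
    """Turn `aws s3api get-bucket-location` JSON output into a region name."""
    payload = json.loads(get_bucket_location_json)
    # us-east-1 buckets report a null LocationConstraint.
    return payload.get("LocationConstraint") or "us-east-1"


if __name__ == "__main__":
    print(bucket_region('{"LocationConstraint": null}'))
    print(bucket_region('{"LocationConstraint": "eu-west-1"}'))
```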

---

# 7. **Real-Life Story: AWS CLI + Snowflake**

You’re on-call for a production issue:

* BI team complains: “Today’s sales data is missing in Snowflake.”
* You check Snowflake table → empty.
* You check Snowflake stage → no new file.
* You run `aws s3 ls s3://company-data/raw/2025/` → today’s file isn’t there.

Turns out, the upstream team forgot to upload the file.
You request the file, and they send you the CSV.
You run `aws s3 cp ./sales_data_today.csv s3://company-data/raw/2025/`.
Then in Snowflake:

```sql
COPY INTO sales_table
FROM @my_s3_stage
FILE_FORMAT = (TYPE = 'CSV');
```

✅ Problem solved. AWS CLI saved the day.

---

# 8. **Must-Know Questions (for you to self-check later)**

1. How do you configure AWS CLI for the first time?
2. What is the difference between `aws s3 cp` and `aws s3 sync`?
3. Why is the `region` important when configuring AWS CLI?
4. How can you check if a file exists in an S3 bucket before running a Snowflake `COPY INTO`?
5. What’s the difference between `--recursive` and without it when using `aws s3 cp`?
6. How would you debug a Snowflake load failure using AWS CLI?

---



### **1. How do you configure AWS CLI for the first time?**

You configure AWS CLI with the `aws configure` command.

Steps:

1. Install AWS CLI (`aws --version` to verify).
2. Generate **Access Key** and **Secret Key** from AWS Console → IAM → Security Credentials.
3. Run:

   ```bash
   aws configure
   ```

   It will ask you for:

   * **AWS Access Key ID**
   * **AWS Secret Access Key**
   * **Default Region Name** (e.g., `us-east-1`)
   * **Default Output Format** (`json`, `table`, or `text`)

👉 Example:

```
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: abcdefghijklmnopqrstuvwx123456789
Default region name [None]: us-east-1
Default output format [None]: table
```

Now CLI is connected to your AWS account.

---

### **2. What is the difference between `aws s3 cp` and `aws s3 sync`?**

* **`cp` (copy)** → Copies a single file (or everything under a path with `--recursive`).
  Example:

  ```bash
  aws s3 cp ./file.csv s3://company-data/raw/
  ```

  👉 Always copies, even if the file already exists in S3.

* **`sync` (synchronize)** → Copies only new or changed files from source to destination. Extra files at the destination stay in place unless you add `--delete`.
  Example:

  ```bash
  aws s3 sync ./local_folder/ s3://company-data/raw/
  ```

  👉 Faster and more efficient when you have thousands of files.
  👉 Useful in pipelines when you need to keep local + S3 aligned.

🔑 Think of `cp` as "manual copy", while `sync` is "keep both folders up to date".
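To make "new or changed" concrete, here is a simplified model of the default local-to-S3 comparison: upload when the object is missing, the sizes differ, or the local file is newer. The function names and metadata tuples are illustrative assumptions, and the real command has extra switches (`--size-only`, `--exact-timestamps`, `--delete`) that this sketch ignores:

```python
def needs_upload(local_meta, remote_meta):
    """Decide whether one local file would be uploaded by a sync.

    Each meta value is a (size_bytes, mtime_epoch) tuple; remote_meta is
    None when the object does not exist in S3 yet.
    """
    if remote_meta is None:
        return True
    local_size, local_mtime = local_meta
    remote_size, remote_mtime = remote_meta
    return local_size != remote_size or local_mtime > remote_mtime


def files_to_upload(local, remote):
    """Return the sorted names of local files that a sync would copy."""
    return sorted(
        name for name, meta in local.items()
        if needs_upload(meta, remote.get(name))
    )


if __name__ == "__main__":
    local = {"a.csv": (100, 50), "b.csv": (200, 50), "c.csv": (300, 90), "d.csv": (10, 10)}
    remote = {"a.csv": (100, 60), "b.csv": (250, 60), "c.csv": (300, 80)}
    print(files_to_upload(local, remote))
```

`cp --recursive` would upload all four files here; the sync model skips `a.csv` because nothing about it changed.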

---

### **3. Why is the `region` important when configuring AWS CLI?**

Because **S3 buckets are region-specific**.

* If your bucket is in `us-east-1` but your CLI is set to `ap-south-1`, you’ll see errors like:

  ```
  An error occurred (PermanentRedirect) when calling the ListObjectsV2 operation:
  The bucket you are attempting to access must be addressed using the specified region.
  ```
* Region matters on the Snowflake side too: when the bucket and your Snowflake account sit in different regions, loads can be slower and may incur data transfer costs.

👉 Example:
If your company S3 bucket is in `eu-west-1` but your CLI profile says `us-east-1`, your own checks can fail or mislead you while you debug a Snowflake load.

So always **match CLI region with bucket region**.

---

### **4. How can you check if a file exists in an S3 bucket before running a Snowflake `COPY INTO`?**

You use `aws s3 ls` to check.

Example:

```bash
aws s3 ls s3://company-data/raw/2025/sales_data_2025-08-01.csv
```

If the file exists → You’ll see file details (date, size, name).
If not → No output.

👉 Why does this matter?
Before you run:

```sql
COPY INTO sales_table FROM @my_s3_stage;
```

You want to confirm the file is really there in S3, otherwise the load will fail or quietly load nothing.
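One caveat: `aws s3 ls` matches by prefix, so a key like `sales_data_2025-08-01.csv.bak` would also show up. For an exact check in a script, `aws s3api head-object` is stricter and exits non-zero when the key is missing. A sketch, with hypothetical helper names and assuming the `aws` executable is on your `PATH`:

```python
import subprocess


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key)."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {uri}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError(f"URI needs both a bucket and a key: {uri}")
    return bucket, key


def head_object_command(uri):
    """Build the argument list for an exact-key existence check."""
    bucket, key = split_s3_uri(uri)
    return ["aws", "s3api", "head-object", "--bucket", bucket, "--key", key]


def object_exists(uri):
    """Run head-object; a zero exit code means the key exists and is readable."""
    result = subprocess.run(head_object_command(uri), capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    print(head_object_command("s3://company-data/raw/2025/sales_data_2025-08-01.csv"))
```

A non-zero exit can also mean missing permissions, so read the error text before concluding the file is absent.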

---

### **5. What‚Äôs the difference between `--recursive` and without it when using `aws s3 cp`?**

* Without `--recursive`:

  ```bash
  aws s3 cp s3://company-data/raw/2025/ ./local_folder/
  ```

  👉 `cp` expects a single object key, so pointing it at a prefix like this fails. To copy one file, give its full key.

* With `--recursive`:

  ```bash
  aws s3 cp s3://company-data/raw/2025/ ./local_folder/ --recursive
  ```

  👉 Copies **all files and subfolders** under that prefix.

Real case:

* You need all daily CSVs in `/2025/`. Without `--recursive`, you’d have to copy them one key at a time.
* With `--recursive`, you’ll get the whole folder at once.

---

### **6. How would you debug a Snowflake load failure using AWS CLI?**

Let’s say your Snowflake load command fails:

```sql
COPY INTO sales_table
FROM @my_s3_stage FILE_FORMAT = (TYPE = 'CSV');
```

Steps to debug with AWS CLI:

1. **Check if file exists in bucket**

   ```bash
   aws s3 ls s3://company-data/raw/2025/
   ```

   * If missing → the upstream team didn’t upload it.

2. **Check file size**

   * If `0` bytes → empty file, so Snowflake has nothing to load.

3. **Download and inspect file locally**

   ```bash
   aws s3 cp s3://company-data/raw/2025/sales_data_2025-08-01.csv .
   head -5 sales_data_2025-08-01.csv
   ```

   * Maybe wrong delimiter (`|` instead of `,`) or header issues.

4. **Check bucket region**

   ```bash
   aws s3api get-bucket-location --bucket company-data
   ```

   * If region doesn’t match Snowflake external stage → you must fix stage definition.

👉 By combining these checks, you can quickly identify why Snowflake is failing.

---




## 🔑 **Core AWS CLI Commands You Must Know (Grouped by Service)**

### **1. General Utility**

1. `aws configure` – Configure credentials (access key, secret key, region, output).
2. `aws configure list` – See which credentials/region CLI is currently using.
3. `aws sts get-caller-identity` – Shows *who you are* (AWS account ID, user/role). Helps debug credential issues.

---

### **2. S3 (Snowflake’s Best Friend)**

👉 These are **critical** since Snowflake loads/unloads data from S3.

4. `aws s3 ls` – List buckets or files inside a bucket.
5. `aws s3 cp` – Copy files between local ↔ S3 or S3 ↔ S3.
6. `aws s3 mv` – Move/rename files in S3 (removes from source).
7. `aws s3 rm` – Delete files in S3.
8. `aws s3 sync` – Synchronize folders between local and S3.
9. `aws s3 presign` – Generate a **presigned URL** for temporary access to one object (handy for sharing a sample file with someone who has no AWS credentials).
10. `aws s3api get-bucket-location` – Check which region the bucket is in.

---

### **3. IAM (Security & Access Control)**

👉 Snowflake often uses **IAM Roles** or **IAM Users** for S3 integration.

11. `aws iam list-users` – List IAM users.
12. `aws iam list-roles` – List IAM roles.
13. `aws iam get-user` – See details of your IAM user.
14. `aws iam attach-user-policy` – Attach a policy to a user (like S3 read-only).
15. `aws iam create-access-key` – Generate access/secret key for a user.

---

### **4. STS (Temporary Security Tokens)**

👉 Many companies don’t give static keys; instead they use **STS tokens** for security.

16. `aws sts assume-role` – Assume a role and get temporary credentials.
17. `aws sts get-session-token` – Get a temporary session (for MFA-enabled accounts).

---

### **5. CloudWatch (Logs & Monitoring)**

👉 Snowflake won’t directly use this, but as a data engineer you may debug upstream AWS jobs here.

18. `aws cloudwatch list-metrics` – List available metrics.
19. `aws cloudwatch get-metric-data` – Retrieve metric data (like S3 request counts).
20. `aws logs tail` – Tail CloudWatch Logs log groups in real time (AWS CLI v2).

---

### **6. EC2 (Infrastructure Basics)**

👉 Even if you’re Snowflake-focused, you’ll often check EC2 because upstream/downstream apps may be running on it.

21. `aws ec2 describe-instances` – Show EC2 instance details.
22. `aws ec2 start-instances` – Start EC2 instance.
23. `aws ec2 stop-instances` – Stop EC2 instance.

---

### **7. Miscellaneous (Often Needed in Data Engineering)**

24. `aws kms list-keys` – List encryption keys (important for S3 + Snowflake when using SSE-KMS).
25. `aws secretsmanager get-secret-value` – Retrieve credentials/secrets (some Snowflake connectors store DB passwords here).

---

## 📌 Why These Matter for Snowflake Engineers

* **S3 commands (4–10)** → Daily life when loading/unloading data to Snowflake.
* **IAM commands (11–15)** → You’ll configure Snowflake external stages using IAM Roles/Users.
* **STS commands (16–17)** → Many companies enforce temporary tokens for Snowflake pipelines.
* **CloudWatch & EC2 (18–23)** → When debugging why data didn’t reach Snowflake (e.g., logs from ETL jobs).
* **KMS & Secrets Manager (24–25)** → For handling encrypted S3 buckets and managing DB credentials securely.

---




# **When is it important to use AWS CLI instead of the web console?**

Think of AWS CLI as the **toolbelt** of a data engineer, while the web console is like a **manual screwdriver**. Both can get the job done, but sometimes the CLI is the only efficient way.

---

## **1️⃣ Automation and Repeatable Processes**

* **Scenario:** You have a Snowflake pipeline that loads **thousands of daily CSV files** into an external stage in S3.
* If you use the web console: You’d have to manually upload each file or folder.
* With CLI:

```bash
aws s3 sync ./daily_csv/ s3://company-data/raw/2025/
```

* ✅ Outcome: All files uploaded automatically. You can schedule it with cron/Airflow.

**Key takeaway:** Whenever tasks are repetitive or scheduled, **CLI is essential**. Manual console clicks are error-prone and slow.

---

## **2️⃣ Working in Remote or Headless Environments**

* **Scenario:** Your Snowflake ETL server runs in AWS EC2 (Linux) and has no GUI.
* To debug missing files in S3, you **cannot use a web browser**.
* With CLI:

```bash
aws s3 ls s3://company-data/raw/2025/
```

* ✅ Outcome: You can instantly see all files, sizes, and timestamps from the terminal.

**Key takeaway:** CLI is critical for servers, automation scripts, or when a GUI isn’t available.

---

## **3️⃣ Large-Scale File Operations**

* **Scenario:** Your company ingests **millions of rows per day**, spread across **hundreds of CSV files** in S3.
* Web console: Uploading, moving, or deleting hundreds of files manually is impossible.
* CLI:

```bash
aws s3 rm s3://company-data/raw/2025/ --recursive
aws s3 cp ./new_files/ s3://company-data/raw/2025/ --recursive
```

* ✅ Outcome: Bulk operations done in seconds. Since `rm --recursive` deletes everything under the prefix, preview it first with `--dryrun`.

**Key takeaway:** CLI is **much faster and reliable** for large-scale data movement, which is common in Snowflake pipelines.

---

## **4️⃣ Debugging and Diagnostics**

* **Scenario:** A Snowflake `COPY INTO` command fails with `access denied`.
* Web console might let you check policies manually, but CLI gives precise answers:

```bash
# Show the bucket policy attached to the bucket
aws s3api get-bucket-policy --bucket company-data
# Simulate whether the role's IAM policies allow reading objects under the prefix
aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:role/SnowflakeRole \
  --action-names s3:GetObject \
  --resource-arns "arn:aws:s3:::company-data/raw/2025/*"
```

* ✅ Outcome: You can see exactly **what permissions exist** and whether the Snowflake role can read/write files.

**Key takeaway:** CLI lets you **inspect, debug, and validate permissions programmatically**.
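The simulation returns JSON with one entry per action under `EvaluationResults`, each carrying an `EvalDecision` of `allowed`, `implicitDeny`, or `explicitDeny`. A minimal sketch for pulling out the blocked actions; the `denied_actions` helper and the sample payload are illustrative:

```python
import json


def denied_actions(simulation_json):
    """List the action names the simulation did not allow."""
    results = json.loads(simulation_json).get("EvaluationResults", [])
    return [r["EvalActionName"] for r in results if r["EvalDecision"] != "allowed"]


# Trimmed, made-up example of simulate-principal-policy output.
SAMPLE_SIMULATION = json.dumps({
    "EvaluationResults": [
        {"EvalActionName": "s3:GetObject", "EvalDecision": "allowed"},
        {"EvalActionName": "s3:PutObject", "EvalDecision": "implicitDeny"},
    ]
})

if __name__ == "__main__":
    print(denied_actions(SAMPLE_SIMULATION))
```

An `implicitDeny` usually means no policy grants the action, while `explicitDeny` means some policy actively blocks it.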

---

## **5️⃣ Integration with Snowflake Scripts and Pipelines**

* **Scenario:** You’re writing an Airflow DAG or a Python ETL script that loads files into Snowflake external stages.
* CLI commands can be called directly from Python:

```python
import os

# Shell out to the AWS CLI; os.system returns the command's exit status.
os.system("aws s3 cp ./data.csv s3://company-data/raw/2025/")
```

* ✅ Outcome: Fully automated Snowflake pipeline without human intervention.

**Key takeaway:** CLI is a **natural fit for DevOps and CI/CD workflows**.
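In a real pipeline you usually want a failed upload to stop the task rather than pass silently. One common alternative to `os.system` is `subprocess.run` with an argument list and `check=True`. This is a sketch with hypothetical helper names; the `runner` parameter exists only so the function can be exercised without AWS access:

```python
import subprocess


def build_upload_command(local_path, s3_uri):
    """Return the argument list for uploading one file with the AWS CLI."""
    # A list avoids shell quoting problems with spaces in paths.
    return ["aws", "s3", "cp", local_path, s3_uri]


def upload(local_path, s3_uri, runner=subprocess.run):
    """Upload a file; with check=True a non-zero exit raises CalledProcessError."""
    return runner(build_upload_command(local_path, s3_uri), check=True)


if __name__ == "__main__":
    print(build_upload_command("./data.csv", "s3://company-data/raw/2025/"))
```

Inside an Airflow task, the raised exception marks the task as failed, so the Snowflake `COPY INTO` step does not run against a missing file.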

---

## **6️⃣ Security & Temporary Access**

* **Scenario:** Your company uses **temporary STS credentials** for Snowflake external stages (no permanent keys).
* Web console doesn’t let you assume temporary credentials programmatically.
* CLI allows:

```bash
aws sts assume-role --role-arn arn:aws:iam::123456789012:role/SnowflakeRole --role-session-name SnowflakeSession
```

* ✅ Outcome: You can use temporary credentials to access S3 from scripts safely.

**Key takeaway:** CLI is essential when **security policies require ephemeral credentials**.
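`assume-role` prints a JSON document whose `Credentials` block holds an `AccessKeyId`, a `SecretAccessKey`, and a `SessionToken`. To use them, export them as the standard AWS environment variables. A minimal sketch; the `credentials_to_env` helper and the sample values are made up:

```python
import json


def credentials_to_env(assume_role_json):
    """Map `aws sts assume-role` output to the AWS environment variables."""
    creds = json.loads(assume_role_json)["Credentials"]
    return {
        "AWS_ACCESS_KEY_ID": creds["AccessKeyId"],
        "AWS_SECRET_ACCESS_KEY": creds["SecretAccessKey"],
        "AWS_SESSION_TOKEN": creds["SessionToken"],
    }


# Fake values, shaped like real assume-role output.
SAMPLE_ASSUME_ROLE = json.dumps({
    "Credentials": {
        "AccessKeyId": "ASIAEXAMPLEKEYID",
        "SecretAccessKey": "exampleSecretKey",
        "SessionToken": "exampleSessionToken",
        "Expiration": "2025-08-01T13:45:32+00:00",
    }
})

if __name__ == "__main__":
    for name, value in credentials_to_env(SAMPLE_ASSUME_ROLE).items():
        print(f"export {name}={value}")
```

The credentials stop working at the `Expiration` time, so long-running jobs need to assume the role again.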

---

## **7️⃣ Versioning, Metadata, and Detailed Inspection**

* **Scenario:** A Snowflake load failed because the wrong file version was uploaded.
* CLI allows inspecting metadata:

```bash
aws s3api head-object --bucket company-data --key raw/2025/sales_data.csv
```

* ✅ Outcome: Check **file size, last modified date, encryption** → fix the pipeline.

**Key takeaway:** CLI provides **fine-grained inspection** that the web console often doesn’t show easily.
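The `head-object` response is JSON with fields such as `ContentLength`, `LastModified`, `ServerSideEncryption`, and (on versioned buckets) `VersionId`. A short sketch that reduces it to the parts you care about when debugging a load; the `summarize_head_object` helper and the sample payload are illustrative:

```python
import json


def summarize_head_object(head_object_json):
    """Pick the debugging-relevant fields out of head-object output."""
    meta = json.loads(head_object_json)
    return {
        "size_bytes": meta["ContentLength"],
        "last_modified": meta.get("LastModified"),
        # The key is absent when the response reports no server-side encryption.
        "encryption": meta.get("ServerSideEncryption", "none reported"),
        # Present only when bucket versioning is enabled.
        "version_id": meta.get("VersionId"),
    }


# Made-up example shaped like real head-object output.
SAMPLE_HEAD = json.dumps({
    "ContentLength": 145123,
    "LastModified": "2025-08-01T12:45:32+00:00",
    "ServerSideEncryption": "aws:kms",
    "ContentType": "text/csv",
})

if __name__ == "__main__":
    print(summarize_head_object(SAMPLE_HEAD))
```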

---

# **✅ Summary Table: When CLI is Better**

| Use Case                    | Why CLI is Preferred                                     |
| --------------------------- | -------------------------------------------------------- |
| Automation & Scheduling     | Repeatable tasks without manual clicks                   |
| Remote servers / headless   | Works without GUI                                        |
| Large-scale file operations | Faster for bulk upload/download/move/delete              |
| Debugging / Permissions     | Inspect policies, simulate access, check bucket metadata |
| Integration with pipelines  | Embeddable in scripts/DAGs for CI/CD                     |
| Temporary credentials       | STS credentials are easy to request and use from scripts |
| Detailed file inspection    | File metadata, size, last modified, encryption           |

---

**Story-Based Perspective:**

Imagine it’s **3 AM** and a Snowflake pipeline failed. The web console is slow and requires a VPN. Using the CLI, you:

1. Check which files are missing: `aws s3 ls …`
2. Confirm file size & encryption: `aws s3api head-object …`
3. Upload fixed files: `aws s3 cp …`
4. Test IAM permissions: `aws iam simulate-principal-policy …`

Within **10 minutes**, the pipeline is back online, which would take far longer by clicking through a web browser.

---

💡 **Bottom Line:**
You use **CLI over web console** whenever you need **automation, speed, debugging, large-scale operations, or programmatic access**, which is almost daily in Snowflake pipelines.

---




# **1️⃣ Copying Through Staging (Snowflake Stage)**

A **stage** in Snowflake is like a **temporary storage area** for files before loading into tables. You can use **internal stages** (inside Snowflake) or **external stages** (like S3).

---

### **Scenario:**

You receive multiple daily CSV files in S3 and want to load them into a Snowflake table.

### **Step 1 – Create a Stage**

```sql
-- Internal stage
CREATE OR REPLACE STAGE my_stage;

-- OR External stage (S3)
CREATE OR REPLACE STAGE my_s3_stage
URL='s3://company-data/raw/2025/'
STORAGE_INTEGRATION = my_s3_integration;
```

* **Internal stage** → Snowflake manages storage.
* **External stage** → Snowflake reads files directly from S3 using an **integration**.

---

### **Step 2 – Copy Files into Stage**

If internal (run `PUT` from a client such as SnowSQL):

```sql
PUT file://C:\local\sales_data_2025.csv @my_stage;
```

If external → Stage already points to S3, so you don’t need to `PUT`.

---

### **Step 3 – Copy into Table**

```sql
COPY INTO sales_table
FROM @my_stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY='"');
```

✅ Advantages:

* You can **inspect files** before loading.
* Works with **multiple files**.
* Supports **error handling** (`ON_ERROR = CONTINUE | SKIP_FILE | ABORT_STATEMENT`).

❌ Disadvantages:

* Extra step → staging can increase latency.
* Slightly more storage cost if using internal stage.

---

# **2️⃣ Direct Copy from S3 to Table**

Snowflake also supports **copying directly from an S3 URL** (an external location), **without creating a named stage or uploading to an internal stage first**.

---

### **Query Example:**

```sql
COPY INTO sales_table
FROM 's3://company-data/raw/2025/'
STORAGE_INTEGRATION = my_s3_integration
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY='"')
ON_ERROR = 'CONTINUE';
```

* Snowflake reads **directly from S3**.
* Saves time because you **skip the PUT to internal stage**.

✅ Advantages:

* Faster for **large datasets**.
* No extra internal storage cost.

❌ Disadvantages:

* Harder to **pre-check file content**.
* Error handling may require **re-downloading/uploading files** to fix issues.

---

# **3️⃣ Direct Unload (Snowflake → S3)**

Direct unload is used to **export table/query results from Snowflake to S3**, optionally compressed.

---

### **Query Example:**

```sql
COPY INTO 's3://company-data/unload/2025/'
FROM sales_table
STORAGE_INTEGRATION = my_s3_integration
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY='"' COMPRESSION = GZIP)
MAX_FILE_SIZE = 50000000;
```

* **COMPRESSION** → GZIP, BZ2, etc.
* **MAX\_FILE\_SIZE** → Upper size limit, in bytes, for each output file.

✅ Advantages:

* Efficient export to S3.
* Supports **parallel unload**.

❌ Disadvantages:

* If your query returns millions of rows, **many small files** may be generated → requires later consolidation.
* This form writes only **from Snowflake → S3**; to get the files locally, download them afterwards (for example with `aws s3 cp`).

---

# **4️⃣ Zip Compression Format in Snowflake**

* Snowflake **cannot load `.zip` archives** natively, whether they contain one file or many. `ZIP` is not among the values accepted by the `COMPRESSION` file format option.
* It can load **single-stream compressed files** such as gzip or bzip2 directly.

### **Important Notes:**

| Compression    | Snowflake Support | Notes                                              |
| -------------- | ----------------- | -------------------------------------------------- |
| GZIP (.gz)     | ✅ Yes             | `COMPRESSION = GZIP`; one data file per `.gz`      |
| BZIP2 (.bz2)   | ✅ Yes             | `COMPRESSION = BZ2`; one data file per `.bz2`      |
| ZIP (.zip)     | ❌ No              | Unzip first, optionally re-compress as gzip        |
| Internal Stage | ✅                 | Can use PUT + staged file; `PUT` gzips by default  |

---

# **5️⃣ How to Copy Zip Files from S3 to Snowflake**

### **Scenario:** A compressed CSV in S3.

A `.zip` file is not loadable as-is, so the load below assumes the CSV has been re-compressed as gzip (for example `sales_data.csv.gz`):

```sql
COPY INTO sales_table
FROM @my_s3_stage
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION = GZIP);
```

* `COMPRESSION = GZIP` tells Snowflake how the file is compressed (`AUTO` can also detect it).
* `COMPRESSION = ZIP` is not an accepted value, so don’t expect it to work.

---

### **What happens with multiple files inside a ZIP?**

* Snowflake **cannot read** the members of a ZIP archive.
* It does not unpack the archive. Expect the load to fail or to yield unreadable rows, because the archive bytes get parsed as if they were CSV text.

---

### **What happens if ZIP has a single file?**

* ❌ Same problem: a single-file `.zip` is still a ZIP container, and Snowflake does not decompress it.
* Convert it to gzip and the data loads normally.

---

### **How to solve the ZIP problem?**

1. **Unzip before uploading:**

   * Using local unzip plus AWS CLI:

```bash
unzip my_data.zip -d ./unzipped_files/
aws s3 cp ./unzipped_files/ s3://company-data/raw/2025/ --recursive
```

2. **Re-compress each extracted file as its own `.gz`** (if you want compressed files in S3).

3. **Use internal staging + PUT** → you can manage files before COPY.
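For automation, the unpack-and-recompress steps can be scripted with Python's standard `zipfile` and `gzip` modules, so each CSV ends up as its own gzip file. This is a sketch under stated assumptions: `zip_to_gzip_files` is a hypothetical helper, and it flattens folder paths inside the archive, so members that share a base name would overwrite each other:

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def zip_to_gzip_files(zip_path, out_dir):
    """Write every file inside a ZIP archive as a separate .gz file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Keep only the base name; folder structure in the ZIP is dropped.
            target = out / (Path(info.filename).name + ".gz")
            with archive.open(info) as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written


if __name__ == "__main__":
    for path in zip_to_gzip_files("my_data.zip", "./gzipped_files"):
        print(path)
```

Afterwards, `aws s3 cp ./gzipped_files/ s3://company-data/raw/2025/ --recursive` uploads the results, and `COPY INTO` can read them with `COMPRESSION = GZIP`.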

---

# **6️⃣ Summary Table – Copy Types & Notes**

| Type          | Example                       | Pros                          | Cons                                  | Notes                               |
| ------------- | ----------------------------- | ----------------------------- | ------------------------------------- | ----------------------------------- |
| Through Stage | `COPY INTO table FROM @stage` | Inspect files, error handling | Extra step, storage cost              | PUT needed only for internal stage  |
| Direct Copy   | `COPY INTO table FROM S3`     | Fast, no extra storage        | Hard to pre-check files               | Ideal for large datasets            |
| Direct Unload | `COPY INTO S3 FROM table`     | Parallel export, compressed   | Many small files, only Snowflake → S3 | Use COMPRESSION and MAX\_FILE\_SIZE |

---

### **Story Perspective**

Imagine a Snowflake pipeline:

* Morning: Files arrive in an **S3 ZIP archive** with **multiple CSVs inside**.
* Direct copy fails → Snowflake can’t read ZIP archives.
* Solution:

  1. Unzip locally or on a staging server → single files.
  2. Upload to S3 → use `COPY INTO` table.
* End result: Table updated, error-free, ready for BI dashboards.

---

✅ **Takeaways:**

1. Use **staging** when you want control, inspection, and error handling.
2. Use **direct copy** when files are large and you want speed.
3. **Direct unload** is great for pushing data back to S3 efficiently.
4. Snowflake does **not** load ZIP archives; unzip first, and gzip the files if you want them compressed.

---
