
---

# Snowflake Staging, Unloading & Downloading — from zero to expert

## 0) Mental model (why stages exist at all)

* **Tables = kitchen** where data is prepared.
* **Stages = pickup counters** (file storage areas). You can write data **out** from tables/queries to a stage (this is called **unloading**), and you can **GET** those staged files down to your laptop. Staged files stay there until you remove them.
* **Kinds of stages** you’ll meet while downloading:

  * **User stage**: `@~` (each user has one).
  * **Table stage**: `@%MY_TABLE` (attached to a table).
  * **Named stage**: `@MY_STAGE` (a first-class object you create; can be internal or external like S3/GCS/Azure).

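> Key truth: For file-based exports, you don't download rows directly from a table. You first **unload** rows to a stage (files), then **GET** those files to local. The one shortcut is spooling query output from the client (Way A in the next section).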

---

## 1) Two legit ways to “download data” with SnowSQL

### Way A — Quick and dirty: print query results to a local file

Sometimes you just need a one-off CSV of a query. SnowSQL can spool the **query result** (not staged files) straight to disk.

**Example**

```bash
# 1) Connect
snowsql -a <account> -u <user> -r <role> -w <warehouse> -d <database> -s <schema>

# 2) (Optional) Make the output CSV-friendly
!set output_format=csv
!set timing=false

# 3) Run your query and send results to a local file
!spool /home/you/downloads/orders_2025-08-24.csv
SELECT ORDER_ID, CUSTOMER_ID, TOTAL_AMOUNT
FROM ANALYTICS.SALES.ORDERS
WHERE ORDER_DATE >= '2025-08-01';
!spool off
```

**When to use**: Small/medium result sets, ad-hoc exports, no need to manage stage files.

**Limitations**: This is *client* export—no parallelization, no server-side partitioning, and fewer controls (compression, file sizing) than a proper unload.

---

### Way B — Production-grade: UNLOAD to a stage, then GET to local

This is the **recommended** way for anything real: it runs server-side, scales to very large datasets, and gives you control over file format, compression, file sizing, and partitioning.

**Step B1: Create a file format (repeatable, explicit)**

```sql
CREATE OR REPLACE FILE FORMAT ff_csv_gz
  TYPE = CSV
  COMPRESSION = GZIP
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  NULL_IF = ('\\N', 'NULL', '');
```

**Step B2: Choose a stage**

* User stage (quick): `@~`
* Named internal stage (reusable):

```sql
CREATE OR REPLACE STAGE mystage_internal
  FILE_FORMAT = ff_csv_gz;  -- optional default
```

**Step B3: UNLOAD (server-side export)**

```sql
-- Example: unload recent orders to a subfolder in a named stage
COPY INTO @mystage_internal/exports/orders/2025-08-24/
FROM (
  SELECT ORDER_ID, CUSTOMER_ID, TOTAL_AMOUNT, ORDER_DATE
  FROM ANALYTICS.SALES.ORDERS
  WHERE ORDER_DATE >= '2025-08-01'
)
FILE_FORMAT = (FORMAT_NAME = ff_csv_gz)
HEADER = TRUE              -- HEADER is a COPY option, so it sits outside FILE_FORMAT
OVERWRITE = TRUE           -- overwrite existing files whose names match the new output
SINGLE = FALSE             -- produce multiple files for parallelism (good for big data)
MAX_FILE_SIZE = 50000000;  -- ~50MB target chunks (tune as you like)
```

> **What is “unloading”?**
> In Snowflake, **unloading** means using `COPY INTO <stage>` to write the *result of a table or query* into files (CSV/JSON/PARQUET, compressed or not) on a stage (internal or external). Other databases might have a statement literally named `UNLOAD`; Snowflake uses `COPY INTO` for both *loading* (files → table) and *unloading* (table/query → files).

**Step B4: GET the files to your laptop (SnowSQL)**

```bash
# Connect if not already connected
snowsql -a <account> -u <user> -r <role> -d <database> -s <schema>

# Download all files from that path into a local folder.
# In SnowSQL a statement continues across lines until the semicolon (no backslashes).
GET @mystage_internal/exports/orders/2025-08-24/ file:///home/you/downloads/orders_2025-08-24/
  PATTERN='.*'
  PARALLEL=8;
```
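
Once `GET` returns, it can be worth confirming that every downloaded file is a readable gzip archive before relying on it. The sketch below is one hedged way to do that with standard tools. `verify_downloads` is a hypothetical helper name (not part of SnowSQL), and it assumes the files were unloaded with gzip compression.

```bash
# verify_downloads DIR
# Prints the number of valid .gz files in DIR.
# Returns non-zero if any archive is corrupt or if DIR holds no .gz files.
verify_downloads() {
  dir=$1
  count=0
  for f in "$dir"/*.gz; do
    [ -e "$f" ] || continue        # the glob matched nothing
    gzip -t "$f" || return 1       # gzip -t tests integrity without extracting
    count=$((count + 1))
  done
  [ "$count" -gt 0 ] || return 1   # treat an empty folder as a failed download
  echo "$count"
}

# Example call (the path is illustrative):
# verify_downloads /home/you/downloads/orders_2025-08-24
```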

**Notes**

* `GET` runs from a **client** such as SnowSQL (or one of the Snowflake drivers and connectors); it cannot be executed from web-interface SQL worksheets.
* Files come down **as they exist on the stage**. If you wrote `.gz` files, you’ll get `.gz` locally (which is usually what you want).
* Use a **new subfolder per run** (like `.../YYYY-MM-DD/`) for tidy versioning.

---

## 2) “Can I unload on an existing staging file?”

Short, honest answer: **No, you can’t append to an existing file.**

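* Snowflake **generates file names** for multi-file unloads (e.g., `data_0_0_0.csv.gz`, optionally with a query ID). Anything after the last `/` in the target path is used as a file-name prefix.
* You **can** pick an exact file name by unloading a single file (`SINGLE = TRUE` with a target such as `@mystage_internal/exports/orders.csv.gz`), but you **cannot** append to `orders.csv` once it exists.
* You **can** set `OVERWRITE = TRUE` to **replace** existing files **whose names match** the files the new unload produces. Files with other names in the same path are left in place.
* If you need reruns that never clobber earlier output, add `INCLUDE_QUERY_ID = TRUE` so filenames include a unique query ID (it cannot be combined with `OVERWRITE = TRUE` or `SINGLE = TRUE`):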

```sql
COPY INTO @mystage_internal/exports/orders/2025-08-24/
FROM (SELECT ... )
FILE_FORMAT=(FORMAT_NAME=ff_csv_gz)
HEADER=TRUE
OVERWRITE=FALSE
INCLUDE_QUERY_ID=TRUE;
```

* Best practice: **write to a new prefix/folder** each run (`.../run_ts=<timestamp>/`). That makes lineage and reruns painless.

---

## 3) End-to-end mini “stories” you’ll actually do

### Story A: Analyst wants a quick CSV today

* Needs small extract, doesn’t care about staging.

```bash
snowsql -a <acct> -u <user>
!set output_format=csv
!spool ~/Downloads/top_customers.csv
SELECT CUSTOMER_ID, SUM(TOTAL_AMOUNT) AS revenue
FROM ANALYTICS.SALES.ORDERS
GROUP BY 1
ORDER BY revenue DESC
LIMIT 1000;
!spool off
```

Done in a minute. Perfect for ad-hoc shares.

---

### Story B: Data engineer ships a repeatable daily export

* Wants compressed, partitioned files, clean foldering, same shape every day.

```sql
-- 1) One-time setup (HEADER is a COPY option, not a file format option)
CREATE OR REPLACE FILE FORMAT ff_csv_gz TYPE=CSV COMPRESSION=GZIP;
CREATE OR REPLACE STAGE exports_stage FILE_FORMAT=ff_csv_gz;

-- 2) Daily job, run through SnowSQL by your scheduler (e.g., Airflow).
--    SnowSQL substitutes &{run_dt} into the statement before it is sent.
!set variable_substitution=true
!define run_dt=2025-08-23

COPY INTO @exports_stage/orders/&{run_dt}/
FROM (
  SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE, TOTAL_AMOUNT
  FROM ANALYTICS.SALES.ORDERS
  WHERE ORDER_DATE = '&{run_dt}'
)
FILE_FORMAT=(FORMAT_NAME=ff_csv_gz)
HEADER=TRUE
OVERWRITE=TRUE
SINGLE=FALSE
MAX_FILE_SIZE=50000000;
```

Then your operator (or you) pulls files as needed:

```bash
snowsql -a <acct> -u <user>
GET @exports_stage/orders/2025-08-23/ file:///data/exports/orders/2025-08-23/ PARALLEL=8;
```
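
Downstream tools sometimes want one CSV rather than a folder of parts. A minimal sketch of stitching the downloaded parts together locally follows. It assumes each part is a gzip-compressed CSV that starts with its own header row (as with `HEADER=TRUE`), and `merge_parts` is an illustrative name. Row order in the result simply follows the file order.

```bash
# merge_parts DIR OUT
# Concatenates DIR/*.csv.gz into the plain-text file OUT, keeping a single header line.
merge_parts() {
  dir=$1
  out=$2
  first=1
  : > "$out"                                  # start from an empty output file
  for f in "$dir"/*.csv.gz; do
    [ -e "$f" ] || continue                   # the glob matched nothing
    if [ "$first" -eq 1 ]; then
      gzip -dc "$f" >> "$out"                 # first part: keep its header
      first=0
    else
      gzip -dc "$f" | tail -n +2 >> "$out"    # later parts: skip the repeated header
    fi
  done
}

# Example call (paths are illustrative):
# merge_parts /data/exports/orders/2025-08-23 /data/exports/orders_2025-08-23.csv
```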

---

### Story C: Use the user stage for quick experiments

```sql
COPY INTO @~/scratch/orders_sample/
FROM (SELECT * FROM ANALYTICS.SALES.ORDERS SAMPLE (1));
```

```bash
GET @~/scratch/orders_sample/ file:///home/you/tmp/orders_sample/;
```

No objects to manage. Great for trying things.
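
To look inside the sample after the download, you can unpack copies of the files while leaving the originals alone. This is a small sketch under the assumption that the files arrived gzip-compressed (the usual default for CSV unloads); `decompress_all` is a hypothetical helper name.

```bash
# decompress_all SRC DEST
# Writes an uncompressed copy of each SRC/*.gz into DEST; the .gz files stay untouched.
decompress_all() {
  src=$1
  dest=$2
  mkdir -p "$dest"
  for f in "$src"/*.gz; do
    [ -e "$f" ] || continue                  # the glob matched nothing
    name=$(basename "$f" .gz)                # data_0_0_0.csv.gz -> data_0_0_0.csv
    gzip -dc "$f" > "$dest/$name" || return 1
  done
}

# Example call (paths are illustrative):
# decompress_all /home/you/tmp/orders_sample /home/you/tmp/orders_sample_plain
```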

---

## 4) Common options you’ll actually tune

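* **FILE\_FORMAT**: CSV/JSON/PARQUET; compression such as `GZIP`, `BZIP2`, `ZSTD`, or `NONE` (the available codecs depend on the file type).
* **HEADER**: `TRUE`/`FALSE`. This is a `COPY INTO` option (not a file format option) that writes the column names into the output files.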
* **SINGLE**:

  * `TRUE` = one file (convenient but can be large & slower).
  * `FALSE` = multiple files (parallel, scalable).
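* **MAX\_FILE\_SIZE**: Upper size limit per file, in bytes (helps downstream limits).
* **OVERWRITE**: `TRUE` to overwrite existing files with matching names at that path; files with other names are left untouched.
* **INCLUDE\_QUERY\_ID**: Makes filenames unique across retries/reruns. Not allowed together with `OVERWRITE = TRUE` or `SINGLE = TRUE`.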
* **PATTERN (GET)**: Regex to filter which staged files to download.

---

## 5) Permissions you need (quick checklist)

* To **unload**: `SELECT` on the source table/view/query’s objects **and** ability to write to the **target stage** (own user stage is fine; for a named internal stage your role needs `WRITE` on the stage plus `USAGE` on its database and schema).
* To **GET**: Ability to **read** from that stage (your user stage is automatically fine; for a named internal stage your role needs `READ` on the stage).
* For **external stages** (S3/GCS/Azure): the stage must be created with working credentials or a **storage integration**; your role needs `USAGE` on the stage.

---

## 6) Practical troubleshooting & gotchas

* **“GET command not found”**: `GET` has to run from a client such as SnowSQL. It won’t run in web UI worksheets. Use the SnowSQL CLI.
* **“Overwrite or append?”**: You can **replace** (`OVERWRITE=TRUE`), but not **append to an existing file**. Write to new folders for each run if you need both old and new.
* **Huge single files**: If downstream tools struggle with one giant file, use `SINGLE=FALSE` and tune `MAX_FILE_SIZE`.
* **Local decompression**: If you unloaded with `COMPRESSION=GZIP`, you’ll download `.gz`. Decompress locally if needed (`gunzip`, 7-Zip, etc.).
* **Consistent schemas**: Lock your export schema via a view so column order/types remain stable over time.

---

## 7) Fully worked example you can copy-paste

**SQL (one-time + daily)**

```sql
-- One-time setup
CREATE OR REPLACE FILE FORMAT ff_csv_gz
  TYPE=CSV COMPRESSION=GZIP FIELD_OPTIONALLY_ENCLOSED_BY='"' NULL_IF=('');

CREATE OR REPLACE STAGE daily_exports FILE_FORMAT=ff_csv_gz;

-- Daily export (yesterday's orders), run through SnowSQL.
-- &{dt} is a SnowSQL variable; the path is quoted because it contains "=".
!set variable_substitution=true
!define dt=2025-08-23

COPY INTO '@daily_exports/orders/dt=&{dt}/'
FROM (
  SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE, TOTAL_AMOUNT
  FROM ANALYTICS.SALES.ORDERS
  WHERE ORDER_DATE = '&{dt}'
)
FILE_FORMAT=(FORMAT_NAME=ff_csv_gz)
HEADER=TRUE
OVERWRITE=TRUE
SINGLE=FALSE
MAX_FILE_SIZE=52428800; -- 50MB
```

**SnowSQL (download)**

```bash
snowsql -a <account> -u <user> -r <role> -d <database> -s <schema>
GET '@daily_exports/orders/dt=2025-08-23/' file:///home/you/exports/orders/2025-08-23/
  PATTERN='.*[.]csv[.]gz' PARALLEL=8;
```
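
A scheduler usually has to derive both the stage path and the local folder from the run date. The sketch below prints the matching `GET` statement and creates the local directory. `build_get` is a hypothetical helper, the `dt=` layout mirrors the example above, and the stage path is quoted on the assumption that quoting is the safe choice for paths containing `=`.

```bash
# build_get STAGE_ROOT RUN_DATE LOCAL_ROOT
# Validates RUN_DATE (YYYY-MM-DD), creates LOCAL_ROOT/RUN_DATE, and prints a GET statement.
# LOCAL_ROOT must be an absolute path so that "file://" + path forms a valid file URI.
build_get() {
  stage_root=$1
  run_date=$2
  local_root=$3
  case $run_date in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;      # looks like YYYY-MM-DD
    *) echo "bad date: $run_date" >&2; return 1 ;;
  esac
  mkdir -p "$local_root/$run_date" || return 1
  printf "GET '@%s/dt=%s/' file://%s/%s/ PARALLEL=8;\n" \
    "$stage_root" "$run_date" "$local_root" "$run_date"
}

# Example call (names are illustrative):
# build_get daily_exports/orders 2025-08-23 /home/you/exports/orders
```

The printed statement could then be handed to SnowSQL, for example with `snowsql -c <connection> -q "$(build_get daily_exports/orders 2025-08-23 /home/you/exports/orders)"`.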

---

## 8) Extra fundamentals that matter (even if not asked)

* **Table stage** `@%TABLE_NAME`: Handy when exporting data related to a single table and you want to keep artifacts “with” the table.
* **External stages**: If you eventually want others (or Spark/Databricks) to pick up files from S3/GCS/Azure, create a **storage integration** and unload there; the GET step becomes optional because consumers read directly from cloud storage.
* **Data formats**: Prefer **PARQUET** for analytics pipelines (columnar, typed); prefer **CSV** for human-sharing or tools that expect CSV.
* **Versioning**: Put **dates or run IDs in the path**; don’t rely on overwriting in place.
* **Idempotency**: Use `INCLUDE_QUERY_ID=TRUE` or new folders to avoid collisions on retries.

---

## 9) Must-know questions to test yourself

1. What are the differences between **user**, **table**, and **named** stages, and when would you choose each for an export?
2. Explain **unloading** in Snowflake. Which statement do you use and why is it used for both load and unload?
3. How do **SINGLE**, **MAX\_FILE\_SIZE**, and **HEADER** affect your exported files?
4. Can you **append** to a staged file during unload? If not, what pattern achieves the same business goal?
5. When would you choose **CSV** vs **PARQUET** for exports?
6. How do you **GET** only certain files from a stage to your laptop?
7. What privileges are required for unloading and GET-ing from **named stages** vs the **user stage**?
8. Why might you add `INCLUDE_QUERY_ID=TRUE` to your unload, and what problem does it solve?
9. How do you design a **repeatable daily export** so that reruns don’t corrupt or mix outputs?
10. What are the trade-offs of using **SnowSQL spooling** vs **server-side unload + GET**?

---

## Quick cheat sheet

* **Ad-hoc to local**: `!spool file.csv` … `SELECT ...;` `!spool off`
* **Proper export**: `COPY INTO @stage/path/ FROM (SELECT ...) FILE_FORMAT=(...) OVERWRITE=TRUE;`
* **Download files**: `GET @stage/path/ file:///local/path/ PATTERN='.*' PARALLEL=8;`
* **No append**: Use new path per run or `OVERWRITE=TRUE` to replace.

---



# Must-Know Q\&A on Snowflake Unloading & Staging

---

## 1. Do you need to create a separate stage to unload data?

* **No**.

  * You can unload into **existing built-in stages**:

    * **User stage** (`@~`)
    * **Table stage** (`@%MY_TABLE`)
  * You create a **named stage** only when you want something reusable, shared, or external.
* **Example**:

```sql
-- Unload to user stage
COPY INTO @~/orders_unload/ FROM (SELECT * FROM orders);
```

---

## 2. Do internal stages cost money?

* **Yes**, because files are stored in **Snowflake-managed storage**, billed at the same per-TB rate as table storage.
* If you unload 100GB and don’t remove it, you keep paying for it until you `REMOVE`.
* **Best practice**: `GET` the files, confirm the download, then `REMOVE` them from the stage.
* **Example**:

```sql
-- After downloading, free up space
REMOVE @~/orders_unload/;
```
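
To get a feel for what forgotten files cost, a back-of-the-envelope calculation is enough. The sketch below is only arithmetic: the rate is a **placeholder** you must replace with the per-TB monthly price from your own contract, `stage_cost_usd` is a hypothetical helper name, and 1 TB is taken as 1024 GB for simplicity.

```bash
# stage_cost_usd SIZE_GB RATE_PER_TB_MONTH MONTHS
# Prints the approximate storage cost of leaving SIZE_GB on an internal stage.
stage_cost_usd() {
  awk -v gb="$1" -v rate="$2" -v months="$3" \
    'BEGIN { printf "%.2f\n", gb / 1024 * rate * months }'
}

# 100 GB left behind for 6 months at a placeholder rate of 23 (USD per TB per month)
stage_cost_usd 100 23 6   # prints 13.48
```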

---

## 3. Do external stages cost money?

* **No Snowflake storage charge**.
* Files sit in your cloud bucket (S3, GCS, Azure Blob), so you pay **cloud provider storage fees** instead.
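* You still pay Snowflake **compute** to run the unload (`COPY INTO`), and data transfer charges can apply when the bucket sits in a different region or cloud than your Snowflake account.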

---

## 4. What is “unloading” in Snowflake?

* **Unloading = exporting query/table results into files on a stage**.
* Done using `COPY INTO @stage ...` (the same command used for loading).
* Supports **CSV, PARQUET, JSON** with compression options.
* **Example**:

```sql
COPY INTO @mystage/orders/2025-08-24/
FROM (SELECT * FROM orders WHERE order_date = '2025-08-23')
FILE_FORMAT=(TYPE=CSV COMPRESSION=GZIP)
HEADER=TRUE;
```

---

## 5. Can you unload onto an existing staging file?

* **No append to a file.**
* You can only:

  * **Overwrite files** with matching names (`OVERWRITE=TRUE`)
  * Or **write to a new folder/prefix** (best practice).
* **Workaround for uniqueness**: Use `INCLUDE_QUERY_ID=TRUE` (with `OVERWRITE=FALSE`; the two cannot be combined).
* **Example**:

```sql
COPY INTO @mystage/orders/run_2025_08_24/
FROM (SELECT * FROM orders)
OVERWRITE=TRUE;
```

---

## 6. How do you download staged data to your laptop?

* Use the **`GET` command** from a client such as SnowSQL (it executes client-side).
* It does not run in web-interface SQL worksheets.
* **Example**:

```bash
GET @mystage/orders/run_2025_08_24/ file:///home/you/downloads/orders/2025-08-24/ PARALLEL=8;
```

---

## 7. When would you choose CSV vs Parquet for unloads?

* **CSV**:

  * For business users, Excel, ad-hoc sharing.
  * Easy to read, but larger size, no data types.
* **Parquet**:

  * For data pipelines, analytics, Spark/Databricks/BigQuery.
  * Columnar, compressed, type-aware, efficient.
* **Rule of thumb**: CSV for humans, Parquet for machines.

---

## 8. What’s the difference between spooling (`!spool`) vs unloading (`COPY INTO + GET`)?

* **Spooling**:

  * Done in SnowSQL with `!spool filename.csv`.
  * Good for small ad-hoc exports.
  * No stage involved.
* **Unloading**:

  * Server-side export with `COPY INTO`.
  * Scalable, supports partitioning, compression, multiple files.
  * Best for large/production exports.

---

## 9. What permissions do you need to unload and download?

* **Unloading**:

  * `SELECT` privilege on source tables/views.
  * `WRITE` on the target internal named stage (`USAGE` instead if the stage is external).
* **Downloading (GET)**:

  * `READ` privilege on the internal named stage.
  * User stage (`@~`) needs no extra grant (it’s private).
* **External stage**: stage must be created with valid cloud credentials or storage integration, and your role needs access to that stage.

---

## 10. How do you design a repeatable daily export safely?

* Use **new subfolders per run** (`.../YYYY-MM-DD/`).
* Use `OVERWRITE=TRUE` only within that run’s folder.
* Alternatively, use `INCLUDE_QUERY_ID=TRUE` (with `OVERWRITE=FALSE`) for unique file names; the two options cannot be combined.
* Clean up when done.
* **Example**:

```sql
-- Run through SnowSQL; the scheduler supplies the run date as a SnowSQL variable.
!set variable_substitution=true
!define dt=2025-08-23

COPY INTO @daily_exports/orders/&{dt}/
FROM (SELECT * FROM orders WHERE order_date = '&{dt}')
FILE_FORMAT=(TYPE=CSV COMPRESSION=GZIP)
HEADER=TRUE
OVERWRITE=TRUE;
```

---

# 🧠 Quick Memory Anchors

* **Stages = pickup counters** (files wait there before you grab them).
* **Internal stage = Snowflake’s pantry** → you pay rent (storage).
* **External stage = Your cloud pantry** → you pay AWS/GCP/Azure.
* **Unload = COPY INTO @stage**.
* **No appends** → overwrite or version folders.
* **Download = GET** in SnowSQL, not in web UI.

---