

# 1) Quick recap (fundamentals you must own)

* **What is an external table?**
  An external table in Snowflake is a read-only table definition that points to files in external cloud storage (S3 / GCS / Azure Blob). Snowflake stores *file-level metadata* (file names, etag/md5, last modified, registration timestamp, partition info) — but **the data files themselves live outside Snowflake**. ([Snowflake Documentation][1])

* **Two ways to keep metadata in sync:**

  1. **Manual refresh** (you call it).
  2. **Auto refresh** — Snowflake wires an internal Snowpipe pipe to a cloud messaging queue (SQS / Pub/Sub / Event Grid) so events from the storage trigger metadata refreshes. AUTO\_REFRESH is `TRUE` by default but requires you configure event notifications. ([Snowflake Documentation][2])
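As a minimal sketch of the auto-refresh option for S3, assuming a hypothetical stage `my_s3_stage` that already points at the bucket (the stage name, path, and file format are placeholders, not taken from a real setup): the table is created with `AUTO_REFRESH = TRUE`, and the `notification_channel` column of `SHOW EXTERNAL TABLES` then shows the queue that the bucket's event notifications should target.

```sql
-- Hypothetical stage name and path; adjust to your environment.
CREATE OR REPLACE EXTERNAL TABLE mydb.public.my_ext_table
  WITH LOCATION = @mydb.public.my_s3_stage/data/daily/
  AUTO_REFRESH = TRUE
  FILE_FORMAT = (TYPE = CSV);

-- The notification_channel column identifies the queue that storage events must be sent to.
SHOW EXTERNAL TABLES LIKE 'MY_EXT_TABLE' IN SCHEMA mydb.public;
```

Until the bucket actually publishes events to that channel, the table behaves as if it were manual-refresh only.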

---

# 2) The single most important command you’ll use

**Refresh external table metadata (manual)**

```sql
-- refresh entire external table metadata
ALTER EXTERNAL TABLE mydb.public.my_ext_table REFRESH;

-- refresh only a relative path (useful for partitioned folders)
ALTER EXTERNAL TABLE mydb.public.my_ext_table REFRESH '2025/09/05/';
```

This reads the cloud path, synchronizes the external table’s file metadata (adds new files, marks removed files, updates changed files). Use when auto-refresh is not configured or as a manual corrective step. ([Snowflake Documentation][2])

Also useful variants:

```sql
ALTER EXTERNAL TABLE my_ext_table ADD FILES ('path/file1.parquet');    -- register specific files
ALTER EXTERNAL TABLE my_ext_table REMOVE FILES ('path/file1.parquet'); -- un-register specific files
ALTER EXTERNAL TABLE my_ext_table SET AUTO_REFRESH = TRUE;            -- enable auto refresh
```

(Only the table owner or higher can run refresh/add/remove; you also need `USAGE` on the stage and file format). ([Snowflake Documentation][2])

---

# 3) How to *inspect* what Snowflake actually knows about files

* List currently registered files (and see ETag / MD5):

```sql
SELECT * 
FROM TABLE(information_schema.external_table_files(TABLE_NAME => 'MY_EXT_TABLE'));
```

This returns `FILE_NAME`, `REGISTERED_ON`, `FILE_SIZE`, `LAST_MODIFIED`, `ETAG`, `MD5`. Use this to confirm what files Snowflake *thinks* exist. ([Snowflake Documentation][3])

* See the *history* of registration events (which files were registered, updated, unregistered, or failed to register):

```sql
SELECT * 
FROM TABLE(information_schema.external_table_file_registration_history(
       TABLE_NAME => 'MY_EXT_TABLE',
       START_TIME => DATEADD('hour', -24, CURRENT_TIMESTAMP())
));
```

The `OPERATION_STATUS` column tells you `REGISTERED_NEW`, `REGISTERED_UPDATE`, `UNREGISTERED`, `REGISTER_FAILED`, etc. This is your audit trail for metadata refreshes. ([Snowflake Documentation][4])
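For alerting, the same function can be filtered down to failures. This is a sketch that assumes only the `FILE_NAME`, `OPERATION_STATUS`, and `MESSAGE` output columns; the 24-hour window is an arbitrary choice.

```sql
-- Files that failed to register or unregister in the last 24 hours.
SELECT file_name, operation_status, message
FROM TABLE(information_schema.external_table_file_registration_history(
       TABLE_NAME => 'MY_EXT_TABLE',
       START_TIME => DATEADD('hour', -24, CURRENT_TIMESTAMP())
))
WHERE operation_status IN ('REGISTER_FAILED', 'UNREGISTER_FAILED');
```

An empty result means no failures were recorded in that window; it does not prove that events are arriving.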

* Check the hidden pipe status used for auto-refresh (Snowpipe internal):

```sql
SELECT SYSTEM$EXTERNAL_TABLE_PIPE_STATUS('MYDB.PUBLIC.MY_EXT_TABLE');
```

This returns a JSON object with fields such as `executionState`, `pendingFileCount`, `notificationChannelName`, and `lastReceivedMessageTimestamp`, which is very handy when auto refresh looks broken. ([Snowflake Documentation][5])
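Because the function returns a JSON string, it can be parsed into columns. The sketch below pulls out only the fields mentioned above; the column aliases are illustrative, and the timestamp is kept as a string to avoid assuming its exact format.

```sql
SELECT
  s.status:executionState::STRING               AS execution_state,
  s.status:pendingFileCount::NUMBER             AS pending_file_count,
  s.status:lastReceivedMessageTimestamp::STRING AS last_received_message
FROM (
  -- Parse the JSON string once, then extract fields from the VARIANT.
  SELECT PARSE_JSON(SYSTEM$EXTERNAL_TABLE_PIPE_STATUS('MYDB.PUBLIC.MY_EXT_TABLE')) AS status
) AS s;
```

A stale `last_received_message` while files keep landing usually points at the notification setup rather than at Snowflake.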

---

# 4) Story time — “Daily reports” (add / delete / update scenarios explained)

Imagine: your biz team dumps a daily file to `s3://company/data/daily/2025-09-05/file.csv`, and you have an external table `reports.daily`.

## A) **Add a new file** (common)

**What you do:** upload file to S3 under the external-table path.

**What Snowflake sees (immediately):**

* **If AUTO\_REFRESH = FALSE** (or notifications not set): nothing changes. The external table *metadata* still does not list the new file until someone runs `ALTER EXTERNAL TABLE ... REFRESH` or explicitly `ADD FILES`. So the file won’t show in `EXTERNAL_TABLE_FILES` until refresh. ([Snowflake Documentation][2])

* **If AUTO\_REFRESH = TRUE** and S3 event notifications / SQS are correctly configured: Snowflake’s hidden pipe receives event messages and will queue a metadata refresh automatically. Use `SYSTEM$EXTERNAL_TABLE_PIPE_STATUS` to confirm activity. ([Snowflake Documentation][6])

**Command to force it right now:**

```sql
ALTER EXTERNAL TABLE reports.daily REFRESH;
```

---

## B) **Delete a file** (the tricky one you asked about)

**Scenario:** Someone deletes `s3://company/data/daily/2025-09-03/file.csv`.

**Key points you must know (and why folks get surprised):**

1. **Snowflake keeps file-level metadata until you refresh.** If the external table metadata is not refreshed after the deletion, `EXTERNAL_TABLE_FILES` will still show the deleted file as *registered* (Snowflake’s metadata wasn’t updated yet). After you run `ALTER EXTERNAL TABLE ... REFRESH`, the registration history will show an `UNREGISTERED` event for that file. ([Snowflake Documentation][3])

2. **Even after physical deletion, queries might still return the old rows because of caching — three caches to understand:**

   * **Result cache (Cloud Services layer)**: if someone ran the *same* query recently and the result is still in the **result cache** (persisted results), Snowflake will return that cached result instantly without re-scanning files. This makes the query appear to still "see" the deleted data. The result cache is used when the query text is identical and Snowflake believes underlying data hasn’t changed. Result cache lives \~24 hours (with extensions). You can control it with `USE_CACHED_RESULT`. ([Snowflake Documentation][7])

   * **Metadata cache (Cloud Services layer)**: Snowflake’s cloud control plane caches file metadata (what files are registered). If metadata hasn’t been refreshed, Snowflake still thinks the deleted file exists and will plan to read it. (Refreshing syncs the metadata store.) ([Snowflake Community][8], [Snowflake Documentation][2])

   * **Warehouse (local) cache (compute layer / SSD)**: when a virtual warehouse previously read a file, it can cache blocks locally. If the same warehouse is still running and the data blocks are in its cache, subsequent queries can read those cached blocks without contacting S3. Note: the warehouse cache is dropped when the warehouse suspends. ([Snowflake Documentation][9])

**So when will a query fetch data from the cloud services cache vs remaining S3 files?**

* **It will return cached results (cloud services result cache) when**: the query text is identical to a recently executed query whose result is still in cache and metadata indicates “no underlying change”. In that case Snowflake doesn’t read files at all (fast, zero compute). ([Snowflake Documentation][7])

* **It will read from the warehouse local cache (not contacting S3) when**: the same warehouse previously read that file and the local cache still holds the blocks (warehouse not suspended / cache not evicted). This results in data returned even if S3 file was later deleted — but only for queries executed on that warehouse and while the cache exists. ([Snowflake Documentation][9])

* **It will fetch directly from S3 (or skip the file) when**: the query is not using the result cache and the warehouse cache doesn't contain the data. If metadata still lists the file, Snowflake will attempt to read it from S3: if the file is truly gone, Snowflake’s behavior depends on the error — the docs say if Snowflake encounters an error scanning a file it may skip it and continue scanning other files (query might partially succeed). If the metadata has been refreshed and the file is unregistered, Snowflake will not attempt to read that file — it will read only the remaining registered files. ([Snowflake Documentation][1])

**Practical flow when you see stale rows after a deletion:**

1. Check whether the result cache served the query: run `ALTER SESSION SET USE_CACHED_RESULT = FALSE;` and repeat the query so Snowflake has to do a real read. ([Snowflake Documentation][7])
2. Run `SELECT * FROM TABLE(information_schema.external_table_files(TABLE_NAME=>'MY_EXT_TABLE'));` to see registered files and `EXTERNAL_TABLE_FILE_REGISTRATION_HISTORY` to see if an UNREGISTERED event exists. ([Snowflake Documentation][3])
3. If metadata still lists the file, `ALTER EXTERNAL TABLE my_ext_table REFRESH;` to sync metadata. ([Snowflake Documentation][2])
4. If you still see deleted data, ensure you’re not reading from a warehouse's local SSD cache (suspend/resume the warehouse to purge local cache or run on a different warehouse). ([Snowflake Documentation][9])
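The four steps above can be sketched as one script for the `reports.daily` example. The warehouse name `my_wh` and the `COUNT(*)` probe query are placeholders for whatever warehouse and query showed the stale rows.

```sql
-- 1. Bypass the result cache for this session, then rerun the suspicious query.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
SELECT COUNT(*) FROM reports.daily;

-- 2. Is the deleted file still registered?
SELECT file_name, registered_on, last_modified
FROM TABLE(information_schema.external_table_files(TABLE_NAME => 'REPORTS.DAILY'))
WHERE file_name LIKE '%2025-09-03/file.csv';

-- 3. Sync the metadata with what is actually in the bucket.
ALTER EXTERNAL TABLE reports.daily REFRESH;

-- 4. Drop the warehouse's local cache (my_wh is a placeholder;
--    SUSPEND raises an error if the warehouse is already suspended).
ALTER WAREHOUSE my_wh SUSPEND;
ALTER WAREHOUSE my_wh RESUME;
```

Running the steps in this order makes it easier to tell which layer was responsible: if the row count changes after step 1, it was the result cache; if it changes after step 3, it was stale metadata.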

---

## C) **Update (overwrite) a file**

**Action:** an existing file `path/f1.parquet` is overwritten with new contents (same path/name).

**What Snowflake does:**

* When metadata is refreshed, the registration history will show `REGISTERED_UPDATE` for that file and `EXTERNAL_TABLE_FILES` will report new `ETAG` / `MD5` values. Use those to detect content changes. ([Snowflake Documentation][4])
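A quick way to confirm that an overwrite was picked up is to compare the file's hash columns before and after the refresh. This sketch uses only columns that `EXTERNAL_TABLE_FILES` returns; the `LIKE` pattern is an assumption about how the path appears in `FILE_NAME`.

```sql
-- Run before and after the refresh; a changed ETAG / MD5 means new content was registered.
SELECT file_name, etag, md5, last_modified, registered_on
FROM TABLE(information_schema.external_table_files(TABLE_NAME => 'MY_EXT_TABLE'))
WHERE file_name LIKE '%path/f1.parquet';
```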

**How to perform a safe replace (atomic at metadata level):**

```sql
BEGIN;
  ALTER EXTERNAL TABLE my_ext_table REMOVE FILES ('path/f1.parquet');
  ALTER EXTERNAL TABLE my_ext_table ADD FILES ('path/f1.parquet'); -- newly uploaded file
COMMIT;
```

Using an explicit transaction keeps the metadata change consistent for readers. The `REGISTERED_UPDATE` operation appears if Snowflake detects a change during refresh. ([Snowflake Documentation][2])

**If you rely on AUTO\_REFRESH:** event notification should trigger a refresh that picks up the update. But if you don’t have notifications set, run manual `REFRESH` to pick up the new hash/register an update. ([Snowflake Documentation][6])

---

# 5) Short troubleshooting playbook (what to run when things go wrong)

1. Did the new/updated/deleted file land in S3? → confirm with `aws s3 ls` (outside Snowflake).
2. What does Snowflake *think*?

   ```sql
   SELECT * FROM TABLE(information_schema.external_table_files(TABLE_NAME=>'MY_EXT_TABLE'));
   SELECT * FROM TABLE(information_schema.external_table_file_registration_history(
       TABLE_NAME => 'MY_EXT_TABLE',
       START_TIME => DATEADD('hour', -12, CURRENT_TIMESTAMP())
   ));
   SELECT SYSTEM$EXTERNAL_TABLE_PIPE_STATUS('MYDB.PUBLIC.MY_EXT_TABLE');
   ```

   ([Snowflake Documentation][3])
3. Force metadata sync if auto refresh not configured:
   `ALTER EXTERNAL TABLE my_ext_table REFRESH;` ([Snowflake Documentation][2])
4. Check whether result cache used: temporarily disable cached results and rerun:
   `ALTER SESSION SET USE_CACHED_RESULT = FALSE;` — rerun query. ([Snowflake Documentation][7])
5. If data still “stuck”, suspend/resume warehouse (clears local SSD cache) or run query on a different warehouse. ([Snowflake Documentation][9])
6. If you need to explicitly un-register a file: `ALTER EXTERNAL TABLE my_ext_table REMOVE FILES ('path/that/file');` ([Snowflake Documentation][2])

---

# 6) Best practices & things smart teams do

* **Use S3/GCS/Azure event notifications + `AUTO_REFRESH = TRUE`** so metadata syncs automatically. (Default is TRUE but requires event notifications configured.) ([Snowflake Documentation][6])
* **Use `EXTERNAL_TABLE_FILE_REGISTRATION_HISTORY` and `EXTERNAL_TABLE_FILES` regularly** in alerts to detect registration failures. ([Snowflake Documentation][4])
* **If you need transactional consistency for replaces**, use explicit `BEGIN; REMOVE/ADD FILES; COMMIT;` so you control when metadata switches. ([Snowflake Documentation][2])
* **For frequent updates / ACID needs**, consider Iceberg-managed tables on object storage (Snowflake supports Iceberg) — external tables are read-only and not meant for heavy update semantics. ([Snowflake Documentation][10])
* **If performance matters**, copy important files into native Snowflake tables (or use materialized views on the external table) — external queries scan object storage and are generally slower. ([Snowflake Documentation][1])
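For the last point, a minimal sketch of copying external data into a native table with `CREATE TABLE ... AS SELECT`. The target name `reports.daily_snapshot` and the column aliases are hypothetical; keeping the source file and row number makes the copy traceable back to object storage.

```sql
-- Hypothetical target table; rerun (or schedule) to refresh the snapshot.
CREATE OR REPLACE TABLE reports.daily_snapshot AS
SELECT
  value                    AS record,       -- the raw row as a VARIANT
  metadata$filename        AS source_file,  -- which file the row came from
  metadata$file_row_number AS source_row    -- position of the row within that file
FROM reports.daily;
```

Queries against the snapshot use native micro-partitions and no longer depend on the external files being present.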

---

# 7) Quick command cheat-sheet (copy / paste friendly)

```sql
-- refresh metadata
ALTER EXTERNAL TABLE db.schema.ext_table REFRESH;
-- refresh only a prefix
ALTER EXTERNAL TABLE db.schema.ext_table REFRESH '2025/09/';

-- add/remove specific files (manual control)
ALTER EXTERNAL TABLE db.schema.ext_table ADD FILES ('path/file1.parquet', 'path/file2.parquet');
ALTER EXTERNAL TABLE db.schema.ext_table REMOVE FILES ('path/file1.parquet');

-- check registered files and MD5/ETAG
SELECT * FROM TABLE(information_schema.external_table_files(TABLE_NAME=>'DB.SCHEMA.EXT_TABLE'));

-- check registration history (REGISTERED_UPDATE, UNREGISTERED, etc)
SELECT * FROM TABLE(information_schema.external_table_file_registration_history(
    TABLE_NAME => 'DB.SCHEMA.EXT_TABLE',
    START_TIME => DATEADD('day', -1, CURRENT_TIMESTAMP())
));

-- check auto-refresh (hidden pipe) status
SELECT SYSTEM$EXTERNAL_TABLE_PIPE_STATUS('DB.SCHEMA.EXT_TABLE');

-- temporarily disable cached results to force real reads
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
```

Sources: Snowflake docs for `ALTER EXTERNAL TABLE`, `EXTERNAL_TABLE_FILES`, `EXTERNAL_TABLE_FILE_REGISTRATION_HISTORY`, `SYSTEM$EXTERNAL_TABLE_PIPE_STATUS`, and caching docs. ([Snowflake Documentation][2])

---

# 8) Common questions you should be able to answer (and short answers)

(Practice answering these out loud — I’d grade you on clarity.)

1. **How does Snowflake know which files belong to an external table?**
   Snowflake stores file-level metadata in the external table metadata (names, etag/md5, last modified, registered\_on). This is updated by `ALTER EXTERNAL TABLE ... REFRESH` or by AUTO\_REFRESH via cloud events. ([Snowflake Documentation][3])

2. **What happens if you delete or overwrite a file in S3 that an external table references?**
   Metadata must be refreshed to reflect deletion/overwrite. Registration history will show `UNREGISTERED` or `REGISTERED_UPDATE`. Until refresh, Snowflake may still try to read the old file (and caches may hide the deletion). ([Snowflake Documentation][4])

3. **Explain Snowflake’s caching layers and how they affect external tables.**
   There are (1) result cache (cloud, returns whole query results), (2) metadata cache (cloud services layer), and (3) warehouse local cache (compute local SSD). These can cause queries to return “stale” data if metadata and caches are not synced/cleared. ([Snowflake Documentation][7], [Snowflake Community][8])

4. **How do you force an external table to forget about a file?**
   Run `ALTER EXTERNAL TABLE ... REMOVE FILES ('path')` or run a `REFRESH` that removes the file automatically. Use `external_table_file_registration_history` to verify `UNREGISTERED`. ([Snowflake Documentation][2])

5. **How to safely replace a file used by an external table without producing inconsistent reads?**
   Use an explicit transaction around `REMOVE FILES` and `ADD FILES` or upload the new file under a new name and then `ADD FILES` and `REMOVE FILES` in a BEGIN/COMMIT. ([Snowflake Documentation][2])

---

# 9) Final checklist you can paste into runbooks

* [ ] Configure cloud event notifications (S3 → SNS → Snowflake SQS or GCS Pub/Sub) and set `AUTO_REFRESH = TRUE`. ([Snowflake Documentation][6])
* [ ] Add monitoring: daily check of `external_table_file_registration_history` for FAILED statuses. ([Snowflake Documentation][4])
* [ ] When adding/removing/updating: prefer `ALTER EXTERNAL TABLE ... REFRESH` (or ADD/REMOVE FILES inside a transaction for replace). ([Snowflake Documentation][2])
* [ ] If stale data observed: disable result cache (`USE_CACHED_RESULT = FALSE`) and/or suspend/resume warehouse to clear local cache, and run `REFRESH`. ([Snowflake Documentation][7])

---



[1]: https://docs.snowflake.com/en/user-guide/tables-external-intro "Introduction to external tables | Snowflake Documentation"
[2]: https://docs.snowflake.com/en/sql-reference/sql/alter-external-table "ALTER EXTERNAL TABLE | Snowflake Documentation"
[3]: https://docs.snowflake.com/en/sql-reference/functions/external_table_files "EXTERNAL_TABLE_FILES | Snowflake Documentation"
[4]: https://docs.snowflake.com/en/sql-reference/functions/external_table_registration_history "EXTERNAL_TABLE_FILE_REGISTRATION_HISTORY | Snowflake Documentation"
[5]: https://docs.snowflake.com/en/sql-reference/functions/system_external_table_pipe_status "SYSTEM$EXTERNAL_TABLE_PIPE_STATUS | Snowflake Documentation"
[6]: https://docs.snowflake.com/en/user-guide/tables-external-s3 "Refresh external tables automatically for Amazon S3"
[7]: https://docs.snowflake.com/en/user-guide/querying-persisted-results "Using Persisted Query Results | Snowflake Documentation"
[8]: https://community.snowflake.com/s/article/Caching-in-the-Snowflake-Cloud-Data-Platform "Caching in the Snowflake Cloud Data Platform"
[9]: https://docs.snowflake.com/en/user-guide/performance-query-warehouse-cache "Optimizing the warehouse cache | Snowflake Documentation"
[10]: https://docs.snowflake.com/en/user-guide/tables-iceberg "Apache Iceberg™ tables | Snowflake Documentation"



---

# 1. Why partitions in external tables?

Imagine your company stores **10 million JSON files** in S3 under this structure:

```
s3://company/logs/year=2025/month=09/day=05/file1.json
s3://company/logs/year=2025/month=09/day=05/file2.json
s3://company/logs/year=2025/month=09/day=06/file1.json
...
```

If you query an external table without partitions:

```sql
SELECT COUNT(*) FROM logs_ext;
```

Snowflake must scan **all registered files** — millions — even if you only want “yesterday’s logs”. That’s slow and costly.

👉 **Partitioning solves this by telling Snowflake which files belong to which logical partitions** (year, month, day). Snowflake can prune partitions and only read a fraction of files.

---

# 2. How to define a partitioned external table

Example: S3 data organized by folder path:

```
s3://company/logs/year=2025/month=09/day=05/file1.json
```

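Create the external table with partition columns parsed from the file path:

```sql
CREATE OR REPLACE EXTERNAL TABLE logs_ext (
   -- Every external table column is a virtual column defined by an expression.
   log_data VARIANT AS (value),
   -- Partition columns are parsed from the path in metadata$filename.
   year     STRING AS (SPLIT_PART(SPLIT_PART(metadata$filename, 'year=', 2), '/', 1)),
   month    STRING AS (SPLIT_PART(SPLIT_PART(metadata$filename, 'month=', 2), '/', 1)),
   day      STRING AS (SPLIT_PART(SPLIT_PART(metadata$filename, 'day=', 2), '/', 1))
)
PARTITION BY (year, month, day)
LOCATION = @my_s3_stage/logs/
FILE_FORMAT = (TYPE = JSON);
```

### Explanation:

* `metadata$filename` → pseudo-column holding each file's path. Snowflake does not read Hive-style folder names (`year=2025`, `month=09`) on its own, so the partition columns extract them with expressions.
* `metadata$external_table_partition` is a different mechanism: it is used only for user-specified (manual) partitions.
* Snowflake stores the computed partition values at metadata-level when files are registered.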
* When you run:

  ```sql
  SELECT COUNT(*) FROM logs_ext WHERE year = '2025' AND month = '09' AND day = '05';
  ```

  Snowflake prunes metadata and only reads files under `year=2025/month=09/day=05`.
  ✅ No need to scan millions of files.

---

# 3. Are partitions just for “pattern queries”?

Your assumption:

> “Basically to query a particular pattern of files, we need partition (Am I correct?)”

✔️ You’re **half right**.

* Yes, partitions allow us to **query specific subsets of files** efficiently (e.g., `WHERE year='2025' AND month='09'`).
* But also, partitions **reduce metadata scan cost**: without partitions, Snowflake must evaluate *all file names* to find matches. With partitions, metadata is hierarchical and pruned faster.

So partitions = performance + efficiency + manageability.

---

# 4. Querying specific files

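Every external table exposes a `value` column (the row as a VARIANT) plus **pseudo-columns** for file inspection:

* `metadata$filename` → file name including its path
* `metadata$file_row_number` → row number within file

For a file's last modified timestamp, use the `LAST_MODIFIED` column of `information_schema.external_table_files`.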

👉 To query a **single file**:

```sql
SELECT *
FROM logs_ext
WHERE metadata$filename LIKE '%day=05/file1.json';
```

👉 To query using **partition pruning + file name**:

```sql
SELECT *
FROM logs_ext
WHERE year = '2025'
  AND month = '09'
  AND day = '05'
  AND metadata$filename LIKE '%file1.json';
```

This is super handy when debugging a single corrupted file.
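When the problem file is not known yet, a per-file row count within one partition often narrows it down. This sketch assumes the `logs_ext` table and partition columns defined earlier; the aliases are illustrative.

```sql
-- Row counts per file for one day; an unexpectedly small or huge count is a good lead.
SELECT
  metadata$filename             AS file_name,
  COUNT(*)                      AS row_count,
  MAX(metadata$file_row_number) AS max_row_number
FROM logs_ext
WHERE year = '2025'
  AND month = '09'
  AND day = '05'
GROUP BY metadata$filename
ORDER BY row_count DESC;
```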

---

# 5. Other partition-related functions / pseudo-columns

Besides `metadata$external_table_partition`, Snowflake provides:

* `metadata$filename` → full path
* `metadata$file_row_number` → row within file (great for dedup testing)
* `LAST_MODIFIED` → per-file timestamp for freshness checks (a column of `information_schema.external_table_files`, not a pseudo-column)
* `FILE_SIZE` → per-file size (also a column of `information_schema.external_table_files`)

These are very helpful for **auditing, partition management, or debugging**.

---

# 6. How to add partitions manually

Normally, Snowflake computes partition values from each file's path (through the partition column expressions) when files are registered during `REFRESH`.
But sometimes you need **manual partitions**.

For example:
Suppose your company has **quarterly financial data** stored by quarter:

```
s3://company/finance/2025-Q1/report.parquet
s3://company/finance/2025-Q2/report.parquet
```

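But you want to define partitions manually (`quarter`) instead of deriving them from the path. That requires `PARTITION_TYPE = USER_SPECIFIED`, with each partition column defined on `metadata$external_table_partition`:

```sql
CREATE OR REPLACE EXTERNAL TABLE finance_ext (
   report  VARIANT AS (value),
   -- Partition values come from ADD PARTITION; the key is the upper-cased column name.
   quarter STRING AS (PARSE_JSON(metadata$external_table_partition):QUARTER::STRING)
)
PARTITION BY (quarter)
LOCATION = @my_s3_stage/finance/
PARTITION_TYPE = USER_SPECIFIED
FILE_FORMAT = (TYPE = PARQUET);
```

Now, add partitions manually. `LOCATION` takes a path relative to the table's location, not a stage reference:

```sql
ALTER EXTERNAL TABLE finance_ext ADD PARTITION (quarter='2025-Q1')
   LOCATION '2025-Q1/';

ALTER EXTERNAL TABLE finance_ext ADD PARTITION (quarter='2025-Q2')
   LOCATION '2025-Q2/';
```

Check them by querying the partition column, or list the registered files:

```sql
SELECT DISTINCT quarter FROM finance_ext;

SELECT file_name
FROM TABLE(information_schema.external_table_files(TABLE_NAME => 'FINANCE_EXT'));
```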

---

# 7. Purpose of `metadata$external_table_partition`

This pseudo-column **stores the partition values** you supply in `ADD PARTITION` for tables created with `PARTITION_TYPE = USER_SPECIFIED`.
Think of it as a **dictionary of partition keys/values**:

Example value for one partition (keys are the upper-cased column names):

```json
{
   "YEAR":"2025",
   "MONTH":"09",
   "DAY":"05"
}
```

You extract keys into real columns (`year`, `month`, `day`) when defining the table, for example with `PARSE_JSON(metadata$external_table_partition):YEAR::STRING`. Without this, manually added partitions wouldn't be queryable.

---

# 8. Big gotcha — Refresh with manual partitions

❌ **Important limitation**:

> “Operation external table refresh not supported for external tables with user-specified partition”

Why?

* Auto-refresh/refresh scans the whole external stage and automatically detects new partitions/files.
* If you defined **manual partitions**, Snowflake doesn’t know how to refresh automatically (you told it explicitly what partitions exist).

👉 So, for manual partitioned external tables:

* You must use `ALTER EXTERNAL TABLE ... ADD PARTITION` and `DROP PARTITION` manually.
* No `REFRESH` allowed.

This prevents conflicts between “manual control” vs “auto-discovery”.

---

# 9. Real use-case story — Marketing campaign data

Your marketing team uploads campaign data daily in this folder:

```
s3://company/campaigns/country=US/year=2025/month=09/day=05/file.json
s3://company/campaigns/country=CA/year=2025/month=09/day=05/file.json
```

* If you create an **auto-partitioned table** (partition columns parsed from `metadata$filename`), you just run `REFRESH` and new partitions are picked up.
* If you instead use **manual partitions**, you'd need to run `ALTER EXTERNAL TABLE ... ADD PARTITION (country='US', year='2025', month='09', day='05') LOCATION 'country=US/year=2025/month=09/day=05/'` for each new partition every day. That's painful.

👉 Rule of thumb:

* Use **auto-partition** when data is continuously added (logs, events).
* Use **manual partition** when data is structured by fixed business periods (quarters, fiscal years) and you want control.
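For the campaign layout above, the auto-partitioned variant could look like the following sketch. The table name `campaigns_ext`, the stage `my_s3_stage`, and the `campaign` column are assumptions for illustration; the partition columns are parsed from the `key=value` folders in `metadata$filename`.

```sql
CREATE OR REPLACE EXTERNAL TABLE campaigns_ext (
  campaign VARIANT AS (value),
  -- Take the text after 'country=' up to the next '/', and likewise for the other keys.
  country  STRING  AS (SPLIT_PART(SPLIT_PART(metadata$filename, 'country=', 2), '/', 1)),
  year     STRING  AS (SPLIT_PART(SPLIT_PART(metadata$filename, 'year=', 2), '/', 1)),
  month    STRING  AS (SPLIT_PART(SPLIT_PART(metadata$filename, 'month=', 2), '/', 1)),
  day      STRING  AS (SPLIT_PART(SPLIT_PART(metadata$filename, 'day=', 2), '/', 1))
)
PARTITION BY (country, year, month, day)
LOCATION = @my_s3_stage/campaigns/
FILE_FORMAT = (TYPE = JSON);

-- New daily folders are registered on refresh; no ADD PARTITION needed.
ALTER EXTERNAL TABLE campaigns_ext REFRESH;

-- Prunes down to a single country and day.
SELECT COUNT(*)
FROM campaigns_ext
WHERE country = 'US' AND year = '2025' AND month = '09' AND day = '05';
```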

---

# 10. Cheat sheet of partition queries

```sql
-- See registered partition values (query the partition columns)
SELECT DISTINCT year, month, day
FROM logs_ext;

-- Query by partition keys
SELECT COUNT(*)
FROM logs_ext
WHERE year='2025' AND month='09';

-- Query by filename
SELECT *
FROM logs_ext
WHERE metadata$filename LIKE '%day=05/file1.json';

-- Add a manual partition (LOCATION is relative to the table's location)
ALTER EXTERNAL TABLE finance_ext
  ADD PARTITION (quarter='2025-Q3')
  LOCATION '2025-Q3/';

-- Remove a partition
ALTER EXTERNAL TABLE finance_ext
  DROP PARTITION LOCATION '2025-Q2/';
```

---

# 11. Questions you should be ready to answer

1. Why do we need partitions in external tables?
2. Difference between auto-partitioning (partition columns parsed from `metadata$filename`) vs manual partitioning (using `metadata$external_table_partition`).
3. Why is `REFRESH` not allowed on manually partitioned external tables?
4. How do you query a single file from an external table?
5. What is `metadata$filename` vs `metadata$external_table_partition`?
6. When would you choose manual partitioning over auto-partitioning?

---




## 🔑 Questions You Should Be Ready to Answer

---

### **1. What is an External Table in Snowflake? Why is it used?**

**Answer:**

* An external table is a schema object that references data stored in external stages (e.g., S3, Azure Blob, GCS).
* It lets you **query semi-structured or structured files without loading them into Snowflake tables.**
* **Use cases:**

  * Data lake architecture (query data “in place”).
  * Quick exploration of raw data.
  * Avoid unnecessary storage costs when data doesn’t need to be in Snowflake permanently.

---

### **2. Why do we need to refresh metadata for external tables?**

**Answer:**

* Snowflake only learns about new/removed/updated files in S3 when the metadata is refreshed, either manually or through auto-refresh event notifications.
* Until then, the **external table's file metadata** is stale.
* Refresh records each file change in the registration history (REGISTERED\_NEW, UNREGISTERED, REGISTERED\_UPDATE).

**Command:**

```sql
ALTER EXTERNAL TABLE my_ext_table REFRESH;
```

---

### **3. What happens if you delete a file from S3 but don’t refresh the external table?**

**Answer:**

* The **file metadata still lists the file** as registered.
* If queried soon after deletion, Snowflake may still **return data from a cache** (for example the result cache in the cloud services layer, or the warehouse's local cache).
* Once a refresh runs, the file is recorded as **UNREGISTERED** in the registration history.

---

### **4. Explain caching behavior with external tables.**

**Answer:**

* **Case 1: File deleted, but query runs before metadata refresh**

  * Snowflake may serve the query from a **cache** (result cache or warehouse cache), so the query still works temporarily.
* **Case 2: File deleted, after metadata refresh**

  * File shows as **UNREGISTERED**, and queries will only read from remaining files in S3.

---

### **5. What happens if you update a file in S3?**

**Answer:**

* Without auto-refresh, Snowflake does not detect the update until you refresh.
* On refresh, the file status changes to **REGISTERED\_UPDATE**.
* File’s hash value changes → Snowflake recognizes new content.

---

### **6. Why do we need partitions in external tables?**

**Answer:**

* Imagine **10M log files across years** stored in S3.
* Querying all files every time is expensive and slow.
* **Partitioning** (by date, region, etc.) helps Snowflake **prune irrelevant files** and only scan what’s needed.

**Example:**

```sql
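-- Assumes paths like logs/2025-09-06/... and an event_time field in each record
CREATE EXTERNAL TABLE logs_ext (
    log_date   DATE          AS (TO_DATE(SPLIT_PART(metadata$filename, '/', 2))),
    event_time TIMESTAMP_NTZ AS (value:event_time::TIMESTAMP_NTZ),
    log_data   VARIANT       AS (value))
  PARTITION BY (log_date)
  LOCATION = @my_stage/logs/
  FILE_FORMAT = (TYPE = JSON);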
```

---

### **7. Is partitioning only for file filtering?**

**Answer:**

* Mostly yes, but also for **performance scaling**.
* By partitioning, queries only touch **a subset of files**, saving costs and improving performance.
* Example:

  * Querying all logs from `2025-09-06` doesn’t need to scan logs from 2024.

---

### **8. How can you query a single file in an external table?**

**Answer:**
Using `METADATA$FILENAME`:

```sql
SELECT *
FROM logs_ext
WHERE METADATA$FILENAME LIKE '%2025-09-06-log1.json';
```

---

### **9. How to add partitions manually to an external table?**

**Answer:**

* Normally partitions are derived from file paths.
* But in some cases, you need **user-specified partitions** (e.g., multiple datasets in the same folder); the table must then be created with `PARTITION_TYPE = USER_SPECIFIED`.
* Use `ALTER EXTERNAL TABLE ... ADD PARTITION`, where `LOCATION` is a path relative to the table's location.

**Example:**

```sql
ALTER EXTERNAL TABLE logs_ext
  ADD PARTITION (region='US', dt='2025-09-06')
  LOCATION 'US/2025/09/06/';
```
```

---

### **10. What is `METADATA$EXTERNAL_TABLE_PARTITION`?**

**Answer:**

* A **system pseudo-column** that holds the partition values supplied through `ADD PARTITION` on tables with user-specified partitions.
* Partition columns are defined as expressions on it, so querying those columns is how you verify which partitions are registered.

**Example:**

```sql
SELECT DISTINCT region, dt
FROM logs_ext;
```
```

---

### **11. Why can’t you refresh a manually partitioned external table?**

**Answer:**

* Snowflake doesn’t know how to automatically scan for new partitions when you define them manually.
* That’s why you get:

  > “Operation external table refresh not supported for external tables with user-specified partition.”
* You must **manually add/drop partitions**.

---

### **12. Real Case Scenario: External Table in a Data Lake Pipeline**

Imagine you’re a data engineer at an e-commerce company.

* Daily transaction logs go into S3: `s3://company-logs/year/month/day/transactions.json`.
* You create an external table partitioned by date.
* **Case 1: New file arrives** → refresh to register it.
* **Case 2: Old file deleted** → refresh → marked UNREGISTERED.
* **Case 3: File updated** → refresh → REGISTERED\_UPDATE.
* **Case 4: Data scientist queries “yesterday’s data”** → partition pruning kicks in, only scans that folder.

This saves **costs** and improves **performance** massively in production.

---

✅ By being ready with these questions, you’ll be confident handling **any real-world external table discussion** — both technical fundamentals and practical scenarios.

---
