

---

# 🌟 Understanding Snowflake Tasks: The Story of “Daily Sales ETL Automation”

Imagine you’re working at **IQVIA**, where your Snowflake warehouse stores **sales data from multiple regions**.
Every night, new sales files land in an **S3 bucket**, loaded into a **raw staging table** through a **Snowpipe**.
Now, once that data arrives, you want to:

1. **Transform** it (cleaning, aggregating, joining, etc.)
2. **Load** it into your **analytics-ready fact table**
3. **Update reports** every morning at 6 AM

But you face one problem —
you don’t want to log in every night at 2 AM to run `INSERT INTO ... SELECT ...`. 😴

That’s where **Snowflake Tasks** come to the rescue.

---

## 🧠 What Are Snowflake Tasks?

> A **Task** in Snowflake is a *scheduled* SQL operation (or series of operations) that runs automatically — similar to a **cron job** or **Airflow DAG**, but *natively inside Snowflake*.

✅ You can use it to:

* Schedule SQL statements (like INSERT, UPDATE, MERGE, CALL procedure)
* Automate data pipelines
* Create dependencies between tasks (like workflows or DAGs)
* Chain tasks for ETL/ELT jobs

---

## 🎯 Purpose of Tasks

The **purpose** of Snowflake Tasks is to **automate data processing workflows** directly inside the Snowflake ecosystem.

Without tasks, you’d need:

* An external scheduler (Airflow, Azure Data Factory, etc.)
* Custom scripts to trigger SQL commands

With **tasks**, you can:

* Schedule recurring SQL logic inside Snowflake itself
* Reduce dependency on external orchestrators
* Control and monitor workflows easily using **TASK_HISTORY**

---

## 🚨 The Problem They Solve

Let’s take our scenario again.

### Before Tasks

You had to:

* Write SQL transformations manually
* Run them daily using Airflow, or manually
* Monitor failures via external tools

### After Tasks

Now you can:

* Schedule Snowflake to automatically run your ETL SQL
* Chain multiple steps (load → transform → aggregate → report refresh)
* View logs using **TASK_HISTORY**
* Suspend, resume, or reschedule tasks with `ALTER TASK` when needed

You’ve just automated your data pipeline *within Snowflake* — no external dependency.

---

## ⚙️ How to Create a Task — Step by Step

Let’s build one for our “Daily Sales ETL”.

### 🧩 Step 1: Create the Task

```sql
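CREATE OR REPLACE TASK DAILY_SALES_ETL_TASK
  WAREHOUSE = COMPUTE_WH
  -- Cron fields: minute, hour, day of month, month, day of week, then a time zone
  SCHEDULE = 'USING CRON 0 6 * * * Asia/Dhaka'
  AS
  -- Copy today's staged rows into the fact table
  INSERT INTO FACT_SALES
  SELECT * FROM STAGING_SALES WHERE LOAD_DATE = CURRENT_DATE();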
```

### 🧠 Explanation

| Clause                        | Description                                                                 |
| ----------------------------- | --------------------------------------------------------------------------- |
| `CREATE OR REPLACE TASK`      | Defines a new task                                                          |
| `WAREHOUSE = COMPUTE_WH`      | Specifies which warehouse will run the SQL                                  |
| `SCHEDULE = 'USING CRON ...'` | Defines when it runs (cron syntax plus a time zone)                         |
| `AS`                          | The actual SQL logic (can be `INSERT`, `MERGE`, `CALL` a stored proc, etc.) |

This means every day at **6 AM (Dhaka time)**, this task will:

* Resume the **COMPUTE_WH** warehouse if it is suspended (assuming auto-resume is enabled on it)
* Run the SQL inside it
* Load daily sales data into your fact table

---

## 🕒 Scheduling a Task

You can schedule using two formats:

### 1️⃣ Using CRON

```sql
SCHEDULE = 'USING CRON 0 6 * * * Asia/Dhaka'
```

→ Runs daily at 6 AM Dhaka time.

### 2️⃣ Using INTERVAL

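```sql
SCHEDULE = '60 MINUTE'
```

→ Runs every 60 minutes, counted from the time the task is resumed, whether or not the previous run succeeded. The interval is written in minutes here because that is the long-standing form of the interval syntax.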
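
A few more schedule strings, as a sketch of how the cron fields vary. The five fields are minute, hour, day of month, month, and day of week, followed by a time zone name. The task name is the one created earlier; the schedules themselves are illustrative only. A task has to be suspended before its schedule is changed.

```sql
-- Suspend before changing the definition
ALTER TASK DAILY_SALES_ETL_TASK SUSPEND;

-- Monday to Friday only, at 06:00 Dhaka time
ALTER TASK DAILY_SALES_ETL_TASK SET SCHEDULE = 'USING CRON 0 6 * * 1-5 Asia/Dhaka';

-- Every 4 hours, on the hour, in UTC
ALTER TASK DAILY_SALES_ETL_TASK SET SCHEDULE = 'USING CRON 0 */4 * * * UTC';

-- First day of every month at 01:30 UTC
ALTER TASK DAILY_SALES_ETL_TASK SET SCHEDULE = 'USING CRON 30 1 1 * * UTC';

-- Reactivate the task with whichever schedule was set last
ALTER TASK DAILY_SALES_ETL_TASK RESUME;
```

Each `SET SCHEDULE` overwrites the previous one, so in practice you would keep only one of the three statements.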

---

## 💤 Suspending a Task

Sometimes you need to pause a task temporarily — maybe for maintenance or debugging.

```sql
ALTER TASK DAILY_SALES_ETL_TASK SUSPEND;
```

To resume:

```sql
ALTER TASK DAILY_SALES_ETL_TASK RESUME;
```

🧩 *Tip:*
When you create a task, it starts in **SUSPENDED** state by default.
You need to **resume** it to activate.
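
Before activating the schedule, it can help to confirm the task's state and trigger one run by hand. A minimal sketch using the task created earlier; it assumes your role has the privileges needed to execute tasks:

```sql
-- Confirm the task exists; the "state" column shows suspended or started
SHOW TASKS LIKE 'DAILY_SALES_ETL_TASK';

-- Trigger a single run right away, independent of the schedule
EXECUTE TASK DAILY_SALES_ETL_TASK;

-- Once the manual run looks good, activate the schedule
ALTER TASK DAILY_SALES_ETL_TASK RESUME;
```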

---

## 📜 Checking Task History

To check when and how a task ran:

```sql
SELECT *
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
  TASK_NAME => 'DAILY_SALES_ETL_TASK',
  RESULT_LIMIT => 10
));
```

This will show:

* Run start & end time
* Status in the `STATE` column (`SUCCEEDED`, `FAILED`, `SKIPPED`, and so on)
* Query ID (to trace in query history)
* Error message (if any)

✅ Perfect for debugging and monitoring.
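
For day-to-day monitoring you usually care about failures only. The query below is a sketch that lists failed runs from the last 24 hours across all tasks in the current database; the 24-hour window and the limit of 100 are arbitrary choices.

```sql
SELECT name, state, error_message, scheduled_time, completed_time, query_id
FROM TABLE(INFORMATION_SCHEMA.TASK_HISTORY(
  SCHEDULED_TIME_RANGE_START => DATEADD('hour', -24, CURRENT_TIMESTAMP()),
  RESULT_LIMIT => 100
))
WHERE state = 'FAILED'
ORDER BY scheduled_time DESC;
```

`RESULT_LIMIT` is applied before the `WHERE` filter, so raise it if many tasks run in that window.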

---

## 🧩 Creating a Task Workflow (Task Dependency)

Here’s where things get exciting.
Snowflake allows **dependent tasks**, where one task triggers **another** after it finishes successfully.

Imagine this flow:

```
RAW → STAGING → FACT → REPORT
```

Each arrow means:

* When RAW load completes, STAGING starts
* When STAGING completes, FACT starts
* When FACT completes, REPORT updates

We can model that in Snowflake.

---

### 🧠 Step 1: Create the Root Task

```sql
CREATE OR REPLACE TASK LOAD_RAW_SALES
  WAREHOUSE = COMPUTE_WH
  SCHEDULE = 'USING CRON 0 2 * * * Asia/Dhaka'
  AS
  -- Load new files from the stage
  -- (assumes EXTERNAL_STAGE is a named stage with a file format attached)
  COPY INTO RAW_SALES
  FROM @EXTERNAL_STAGE;
```

---

### 🧠 Step 2: Create a Child Task (Depends On Parent)

```sql
CREATE OR REPLACE TASK TRANSFORM_STAGING
  WAREHOUSE = COMPUTE_WH
  AFTER LOAD_RAW_SALES
  AS
  INSERT INTO STAGING_SALES
  SELECT * FROM RAW_SALES WHERE LOAD_DATE = CURRENT_DATE();
```

---

### 🧠 Step 3: Another Child Task (Chain Further)

```sql
CREATE OR REPLACE TASK AGGREGATE_FACT
  WAREHOUSE = COMPUTE_WH
  AFTER TRANSFORM_STAGING
  AS
  INSERT INTO FACT_SALES
  SELECT REGION, SUM(SALES_AMOUNT)
  FROM STAGING_SALES
  GROUP BY REGION;
```

---

### 🧠 Step 4: Final Task for Reporting

```sql
CREATE OR REPLACE TASK REFRESH_REPORTS
  WAREHOUSE = COMPUTE_WH
  AFTER AGGREGATE_FACT
  AS
  CALL REFRESH_DAILY_REPORT();
```
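
The procedure `REFRESH_DAILY_REPORT` is called above but never defined. One possible shape, written in Snowflake Scripting, is sketched below. The report table `DAILY_SALES_REPORT` and the `FACT_SALES` column names `REGION` and `TOTAL_SALES` are assumptions made for illustration, not part of the original pipeline.

```sql
CREATE OR REPLACE PROCEDURE REFRESH_DAILY_REPORT()
  RETURNS STRING
  LANGUAGE SQL
AS
$$
BEGIN
  -- Rebuild the (hypothetical) report table from the fact table
  CREATE OR REPLACE TABLE DAILY_SALES_REPORT AS
    SELECT REGION, TOTAL_SALES, CURRENT_TIMESTAMP() AS REFRESHED_AT
    FROM FACT_SALES;

  -- Return a short status message to the caller
  RETURN 'DAILY_SALES_REPORT refreshed';
END;
$$;
```

Wrapping the logic in a procedure keeps the task definition to a single `CALL`, which is useful once the refresh needs more than one statement.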

---

### 🧭 Graph Visualization of the Workflow

Here’s the logical **Task Graph**:

```
          ┌──────────────────┐
          │  LOAD_RAW_SALES  │  (Root Task)
          └────────┬─────────┘
                   ↓
         ┌────────────────────┐
         │ TRANSFORM_STAGING  │
         └────────┬───────────┘
                   ↓
         ┌────────────────────┐
         │  AGGREGATE_FACT    │
         └────────┬───────────┘
                   ↓
         ┌────────────────────┐
         │  REFRESH_REPORTS   │
         └────────────────────┘
```

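👉 Each child starts automatically once its parent finishes successfully.
👉 Only the **root task** has a schedule; the others are triggered by their parents. Every task in the graph still has to be **resumed** (children first, then the root) before the graph runs.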
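
Newly created tasks start suspended, and that applies to every task in the graph, not only the root. A minimal sketch of activating the chain, using the task names from the steps above (depending on your session context, the name passed to the function may need to be fully qualified):

```sql
-- Option 1: resume the root and every dependent task in one call
SELECT SYSTEM$TASK_DEPENDENTS_ENABLE('LOAD_RAW_SALES');

-- Option 2: resume manually, children first and the root last
ALTER TASK REFRESH_REPORTS RESUME;
ALTER TASK AGGREGATE_FACT RESUME;
ALTER TASK TRANSFORM_STAGING RESUME;
ALTER TASK LOAD_RAW_SALES RESUME;
```

Use one option or the other. Resuming the root last avoids a scheduled run starting while some children are still suspended.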

---

## 🧩 Task Dependency Behavior

| Concept              | Description                                                                 |
| -------------------- | --------------------------------------------------------------------------- |
| **Root Task**        | Has a schedule defined                                                      |
| **Child Task**       | Uses `AFTER` clause to define dependency                                    |
| **Failure Handling** | Child tasks only run if parent succeeds                                     |
| **Graph Shape**      | A graph has one root; a task can have several children (and several parents), forming a DAG (Directed Acyclic Graph) |
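
To illustrate branching, the sketch below adds a second child under `TRANSFORM_STAGING` and a task that waits for two parents. The names `AGGREGATE_BY_PRODUCT`, `FACT_SALES_BY_PRODUCT`, `PRODUCT_ID`, and `REFRESH_ALL_REPORTS` are hypothetical; suspend the root task before adding tasks to an existing graph.

```sql
-- A second child of TRANSFORM_STAGING; it runs alongside AGGREGATE_FACT
-- (hypothetical table and column names)
CREATE OR REPLACE TASK AGGREGATE_BY_PRODUCT
  WAREHOUSE = COMPUTE_WH
  AFTER TRANSFORM_STAGING
  AS
  INSERT INTO FACT_SALES_BY_PRODUCT
  SELECT PRODUCT_ID, SUM(SALES_AMOUNT)
  FROM STAGING_SALES
  GROUP BY PRODUCT_ID;

-- A task with two parents; it waits until both aggregations have succeeded
CREATE OR REPLACE TASK REFRESH_ALL_REPORTS
  WAREHOUSE = COMPUTE_WH
  AFTER AGGREGATE_FACT, AGGREGATE_BY_PRODUCT
  AS
  CALL REFRESH_DAILY_REPORT();
```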

---

## ⚡️ Most Used Variations of Tasks

| Task Type                     | Description                             | Example                                   |
| ----------------------------- | --------------------------------------- | ----------------------------------------- |
| **Single Task**               | Independent scheduled SQL               | Daily ETL job                             |
| **Chained Tasks (Workflow)**  | Dependent tasks forming DAG             | Multi-step ETL                            |
| **Procedure Task**            | Calls a stored procedure                | `AS CALL my_procedure();`                 |
| **Child (Triggered) Task**    | Has no schedule; runs when its parent completes | `AFTER parent_task`               |
| **Stream + Task Combination** | Automates incremental load from streams | Use `SELECT * FROM my_stream` inside task |

---

## 🔍 Real-World Example — Stream + Task

You have a stream that captures changes from `STG_CUSTOMER`.
You want those changes to be automatically merged into your `DIM_CUSTOMER` table.

```sql
CREATE OR REPLACE TASK CUSTOMER_DIM_UPSERT
  WAREHOUSE = COMPUTE_WH
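  SCHEDULE = '60 MINUTE'
  -- Skip the run (and the warehouse cost) when the stream has no new changes
  WHEN SYSTEM$STREAM_HAS_DATA('STG_CUSTOMER_STREAM')
  AS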
  MERGE INTO DIM_CUSTOMER d
  USING STG_CUSTOMER_STREAM s
  ON d.CUST_ID = s.CUST_ID
  WHEN MATCHED THEN UPDATE SET ...
  WHEN NOT MATCHED THEN INSERT (...);
```

This ensures every hour your dimension table stays up-to-date.
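
The task assumes the stream already exists. A minimal sketch of creating and inspecting it, using the table and stream names from the example above:

```sql
-- Track inserts, updates, and deletes on the staging table
CREATE OR REPLACE STREAM STG_CUSTOMER_STREAM ON TABLE STG_CUSTOMER;

-- Preview pending changes; each row carries METADATA$ACTION and METADATA$ISUPDATE
SELECT * FROM STG_CUSTOMER_STREAM;

-- Returns TRUE while the stream holds changes that have not been consumed
SELECT SYSTEM$STREAM_HAS_DATA('STG_CUSTOMER_STREAM');
```

A plain `SELECT` does not consume the stream. The offset only advances when the stream is read inside a DML statement such as the `MERGE` in the task.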

---

## 💡 Pro Tips

1. **Monitor using TASK_HISTORY** and **QUERY_HISTORY**
2. **Use stored procedures** for complex multi-step SQL logic
3. **Always test manually before scheduling**
4. **Check task status** with SQL: `SHOW TASKS;` (see the `state` column)
5. **Warehouse suspension:**
   If the warehouse is suspended, Snowflake resumes it when the task runs, provided the warehouse has `AUTO_RESUME = TRUE`
6. **Error retries:**
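   Tasks do not retry by default. Set `TASK_AUTO_RETRY_ATTEMPTS` on the root task to retry failed graph runs, and `SUSPEND_TASK_AFTER_NUM_FAILURES` to suspend the task after repeated failures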
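
Retry and failure behaviour can be set explicitly on the root task. A sketch, assuming the `LOAD_RAW_SALES` root task from the workflow above; the numbers are arbitrary examples:

```sql
-- The root task must be suspended before it is altered
ALTER TASK LOAD_RAW_SALES SUSPEND;

-- Retry a failed graph run up to 3 times
ALTER TASK LOAD_RAW_SALES SET TASK_AUTO_RETRY_ATTEMPTS = 3;

-- Suspend the task automatically after 5 consecutive failed runs
ALTER TASK LOAD_RAW_SALES SET SUSPEND_TASK_AFTER_NUM_FAILURES = 5;

ALTER TASK LOAD_RAW_SALES RESUME;
```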

---

## 🧩 Commonly Asked Questions (Must Know)

1. **Can a task call another task directly?**
   → No. Only via dependency (`AFTER`).

2. **What happens if parent task fails?**
   → Child task won’t run.

3. **Can multiple tasks depend on one parent?**
   → Yes, that creates a branching DAG.

4. **Can we alter schedule or SQL of a task?**
   → Yes, using `ALTER TASK` commands (suspend the task, or the root of its graph, first).

5. **Can a task run a stored procedure?**
   → Absolutely, and that’s very common in production ETL.

6. **Where can I see all tasks?**
   → `SHOW TASKS;`

7. **What happens if a task runs long?**
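   → Only one instance of a task runs at a time by default, so a scheduled run that would overlap is skipped. A run is also cancelled if it exceeds `USER_TASK_TIMEOUT_MS` (one hour by default).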
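
As a sketch of questions 4 and 7 in practice, the statements below change a task's SQL and raise its timeout. The task is the `DAILY_SALES_ETL_TASK` from earlier; the new filter and the 2-hour timeout are illustrative values only.

```sql
ALTER TASK DAILY_SALES_ETL_TASK SUSPEND;

-- Replace the SQL the task runs (here: load yesterday's rows instead of today's)
ALTER TASK DAILY_SALES_ETL_TASK MODIFY AS
  INSERT INTO FACT_SALES
  SELECT * FROM STAGING_SALES WHERE LOAD_DATE = CURRENT_DATE() - 1;

-- Allow a single run up to 2 hours (the value is in milliseconds)
ALTER TASK DAILY_SALES_ETL_TASK SET USER_TASK_TIMEOUT_MS = 7200000;

ALTER TASK DAILY_SALES_ETL_TASK RESUME;
```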

---

## 🧭 Summary

| Concept          | Description                                               |
| ---------------- | --------------------------------------------------------- |
| **Purpose**      | Automate and schedule SQL workflows in Snowflake          |
| **Benefit**      | No external orchestration needed                          |
| **Scheduling**   | CRON or INTERVAL                                          |
| **Dependencies** | Created with `AFTER` clause                               |
| **Monitoring**   | `INFORMATION_SCHEMA.TASK_HISTORY`                         |
| **Use Cases**    | ETL automation, incremental loads, data refresh pipelines |

---
