# Table of Contents

- [Cron Jobs (Before Orchestration)](#cron-jobs-before-orchestration)

# Cron Jobs (Before Orchestration)

![Cron Logo](./images/cron.png)

## What is Cron?
**Cron** is a Unix/Linux utility (from the 1970s) that automatically runs commands or scripts on a schedule you define.

---

## How a Cron Job Works
A cron job line has **five timing fields** followed by the command:

    MINUTE(0–59) HOUR(0–23) DAY(1–31) MONTH(1–12) WEEKDAY(0–6)  command

You can use `*` (asterisk) in any field to mean “any value”. In the `WEEKDAY` field, `0` is Sunday and `6` is Saturday.
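
To make the field layout concrete, here is a minimal Python sketch that splits one cron line into its five timing fields and the command. The names `CronEntry` and `parse_cron_line` are made up for this illustration and are not part of cron itself; the sketch only splits the text and does not validate ranges.

```python
from typing import NamedTuple


class CronEntry(NamedTuple):
    """One cron line: five timing fields plus the command (illustrative type)."""
    minute: str
    hour: str
    day: str
    month: str
    weekday: str
    command: str


def parse_cron_line(line: str) -> CronEntry:
    # Split on whitespace at most five times so the command keeps its own spaces.
    parts = line.split(maxsplit=5)
    if len(parts) != 6:
        raise ValueError(f"expected five timing fields and a command: {line!r}")
    return CronEntry(*parts)


entry = parse_cron_line('0 0 1 1 * echo "Happy New Year"')
print(entry.minute, entry.hour, entry.day, entry.month, entry.weekday)
print(entry.command)
```

Keeping the fields as strings is deliberate here: a field may hold `*` as well as a number, so converting to integers is left to whatever code interprets the schedule.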

**Examples**
- Run at midnight on Jan 1 every year:

      0 0 1 1 * echo "Happy New Year"

- Run every night at midnight:

      0 0 * * * python ingest_from_rest_api.py
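
The two examples above can be checked with a small Python sketch that tests whether a given moment matches a schedule. It is a simplified model, not cron's real implementation: it assumes each field is either `*` or a single number (no ranges, lists, or steps), and the helper names `field_matches` and `cron_matches` are hypothetical.

```python
from datetime import datetime


def field_matches(field: str, value: int) -> bool:
    # "*" matches anything; otherwise the field must equal the value exactly.
    return field == "*" or int(field) == value


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if `when` matches the five timing fields in `expr`."""
    minute, hour, day, month, weekday = expr.split()
    # Python's isoweekday() is Monday=1..Sunday=7; cron uses Sunday=0..Saturday=6.
    cron_weekday = when.isoweekday() % 7
    time_ok = (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(month, when.month)
    )
    day_ok = field_matches(day, when.day)
    weekday_ok = field_matches(weekday, cron_weekday)
    # When both day fields are restricted, cron runs if either one matches.
    if day != "*" and weekday != "*":
        return time_ok and (day_ok or weekday_ok)
    return time_ok and day_ok and weekday_ok


print(cron_matches("0 0 1 1 *", datetime(2025, 1, 1, 0, 0)))   # New Year's midnight
print(cron_matches("0 0 1 1 *", datetime(2025, 1, 2, 0, 0)))   # a day later
print(cron_matches("0 0 * * *", datetime(2025, 6, 15, 0, 0)))  # any midnight
```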

![Cron Fields](./images/cron_work.png)

---

## Before Orchestration: Pure Scheduling with Cron
Teams used to chain data pipeline steps by scheduling **multiple cron jobs** at staggered times, hoping each step would finish before the next one started:

- 12:00 AM → ingest API  
- 01:00 AM → transform  
- 02:00 AM → combine with DB  
- 03:00 AM → load to warehouse

This is a **pure scheduling approach**: cron has no dependency awareness, so each job simply starts at its scheduled time whether or not the previous step has finished.
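
As a rough illustration, the staggered schedule above corresponds to four independent crontab lines. The Python sketch below generates them. Only `ingest_from_rest_api.py` comes from the earlier example; the other script names are placeholders invented for this sketch.

```python
# (hour, command) pairs for the four pipeline steps, one hour apart.
PIPELINE = [
    (0, "python ingest_from_rest_api.py"),
    (1, "python transform.py"),          # hypothetical script name
    (2, "python combine_with_db.py"),    # hypothetical script name
    (3, "python load_to_warehouse.py"),  # hypothetical script name
]


def crontab_lines(steps):
    # Each step runs at minute 0 of its hour, every day; nothing links the lines.
    return [f"0 {hour} * * * {command}" for hour, command in steps]


for line in crontab_lines(PIPELINE):
    print(line)
```

Nothing in the generated lines connects one step to the next; the only thing that keeps them in order is the clock.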

---

## Problems with Pure Cron Scheduling
- ❌ No dependency checks (a 1 AM job runs even if the midnight job failed or ran long)  
- ❌ Minimal monitoring/alerting; failures often discovered late  
- ❌ Debugging & observability are DIY (logs, alerts, retries)  
- ❌ Fragile when task durations vary
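
The fragility can be seen in a small simulation. The sketch below assumes the hourly schedule from the previous section and uses made-up run times in minutes; `find_overlaps` is a hypothetical helper that reports every step that starts before the step ahead of it has finished.

```python
# (step name, scheduled start hour) in pipeline order.
SCHEDULE = [("ingest", 0), ("transform", 1), ("combine", 2), ("load", 3)]


def find_overlaps(schedule, durations_min):
    """Return (upstream, downstream) pairs where downstream starts too early."""
    overlaps = []
    for (prev_name, prev_hour), (next_name, next_hour) in zip(schedule, schedule[1:]):
        prev_end = prev_hour * 60 + durations_min[prev_name]  # minutes after midnight
        next_start = next_hour * 60
        if next_start < prev_end:
            overlaps.append((prev_name, next_name))
    return overlaps


# Illustrative durations in minutes (not measurements).
normal_night = {"ingest": 20, "transform": 30, "combine": 25, "load": 15}
slow_night = dict(normal_night, ingest=75)  # the API is slow: ingest ends at 01:15

print(find_overlaps(SCHEDULE, normal_night))  # []
print(find_overlaps(SCHEDULE, slow_night))    # [('ingest', 'transform')]
```

On the slow night, the transform step starts at 01:00 against data that is still being ingested, and cron raises no error about it.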

![Pure Scheduling Drawbacks](./images/pure_scheduiling.png)

---

## When Cron Is Still a Good Fit
- ✅ Simple, independent, recurring tasks (backups, log cleanup, small data fetches)  
- ✅ Quick prototypes where a full orchestrator is overkill  
- ✅ Environments with very light automation needs

**Rule of thumb:** If tasks depend on other tasks finishing, or you need retries, backfills, SLAs, or rich monitoring, use an **orchestration tool** (e.g., Airflow, Prefect, Dagster). Otherwise, Cron is perfectly fine for small periodic jobs.
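
To show what "depending on other tasks finishing" means in practice, here is a toy Python sketch of a runner that executes steps in order, retries a failing step, and skips everything downstream once a step gives up. It is not how Airflow, Prefect, or Dagster are implemented; `run_in_order` and the stand-in task functions are hypothetical.

```python
import time


def run_in_order(steps, max_retries=2, retry_delay_s=0.0):
    """Run (name, callable) steps in order; stop at the first step that keeps failing."""
    completed = []
    for name, task in steps:
        for attempt in range(1, max_retries + 2):  # first try plus retries
            try:
                task()
                completed.append(name)
                break
            except Exception as exc:
                print(f"{name}: attempt {attempt} failed: {exc}")
                if attempt > max_retries:
                    print(f"{name}: giving up, downstream steps are skipped")
                    return completed
                time.sleep(retry_delay_s)
    return completed


# Stand-in tasks: transform fails once, combine always fails.
calls = {"transform": 0}

def ingest():
    pass

def transform():
    calls["transform"] += 1
    if calls["transform"] == 1:
        raise RuntimeError("temporary error")

def combine():
    raise RuntimeError("database unreachable")

def load():
    pass

result = run_in_order(
    [("ingest", ingest), ("transform", transform), ("combine", combine), ("load", load)],
    max_retries=1,
)
print(result)  # ['ingest', 'transform']
```

Unlike four separate cron entries, the load step never runs here, because the step it depends on did not succeed. Orchestration tools provide this kind of behavior along with scheduling, backfills, and monitoring.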