A GitHub Actions workflow for automating a WARN Act notice ETL pipeline.
The extract, transform and load Action runs every few hours. It does the following:
- 🔪 Gather raw WARN Act notices from all of our sources with warn-scraper
- 🪢 Consolidate the raw files into a single, standardized dataset with warn-transformer
- ⏫ Upload the files to our archive on biglocalnews.org with upload-files
- 📟 Send Slack and Teams alerts
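
As a rough illustration of the consolidation step, here is a minimal Python sketch that merges raw files with different headers into one standardized list of rows. It is not warn-transformer's actual code: the `STANDARD_FIELDS` schema, the `standardize` and `consolidate` helpers, and the sample data are all hypothetical, chosen only to show the idea.

```python
import csv
import io

# Hypothetical standardized schema; the real warn-transformer fields may differ.
STANDARD_FIELDS = ["state", "company", "notice_date", "jobs"]


def standardize(row, state, field_map):
    """Rename one raw row's columns to the standard schema and tag its source."""
    out = {field: "" for field in STANDARD_FIELDS}
    out["state"] = state
    for raw_name, std_name in field_map.items():
        # Missing or empty raw values become empty strings.
        out[std_name] = (row.get(raw_name) or "").strip()
    return out


def consolidate(sources):
    """Merge (state, csv_text, field_map) triples into one list of rows."""
    merged = []
    for state, csv_text, field_map in sources:
        for row in csv.DictReader(io.StringIO(csv_text)):
            merged.append(standardize(row, state, field_map))
    return merged


# Two made-up sources that use different column headers.
SAMPLE_SOURCES = [
    (
        "IA",
        "Company,Notice Date,Emp #\nAcme, 2022-01-03 ,12\n",
        {"Company": "company", "Notice Date": "notice_date", "Emp #": "jobs"},
    ),
    (
        "CA",
        "Employer,Received,Layoffs\nGlobex,2022-01-04,40\n",
        {"Employer": "company", "Received": "notice_date", "Layoffs": "jobs"},
    ),
]

merged_rows = consolidate(SAMPLE_SOURCES)
```

Keeping a per-source field map makes each source's quirks explicit, so adding a new source in this sketch means adding one mapping rather than changing the merge logic.
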
flowchart TB
    subgraph Extract
        A[Scrape sources] --> B[Commit to source-specific branches]
        B --> C[Upload raw files to biglocalnews.org]
    end
    subgraph Transform
        subgraph Consolidate
            D[Download raw files from biglocalnews.org] --> E[Merge into a single file]
        end
        subgraph Integrate
            F[Reconcile latest data with current database]
            F --> G[Identify any additions and amendments]
        end
    end
    subgraph Load
        H[Commit transformed files to `transformer` branch] --> I[Upload transformed files to biglocalnews.org]
    end
    subgraph Alert
        subgraph Members
            L[Forward new notices via Slack and Teams bots]
        end
        subgraph Administrators
            J[Post status report to Big Local News Slack]
        end
    end
    Extract --> Transform
    Consolidate --> Integrate
    Transform --> Load
    Load --> Alert
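
The Integrate stage in the flowchart above reconciles the latest data with the current database and identifies additions and amendments. A minimal Python sketch of that comparison follows. It assumes each notice carries a stable unique key (here a hypothetical `id` field); the `reconcile` function and the sample rows are illustrative and are not taken from the project's code.

```python
def reconcile(current, latest, key="id"):
    """Compare the latest rows against the current database.

    Returns (additions, amendments): rows whose key is new, and rows
    whose key already exists but whose contents differ.
    """
    current_by_key = {row[key]: row for row in current}
    additions, amendments = [], []
    for row in latest:
        existing = current_by_key.get(row[key])
        if existing is None:
            additions.append(row)
        elif existing != row:
            amendments.append(row)
    return additions, amendments


# Made-up sample data; the "id" key is an assumption of this sketch.
current_db = [
    {"id": "ia-1", "company": "Acme", "jobs": "12"},
    {"id": "ca-1", "company": "Globex", "jobs": "40"},
]
latest_rows = [
    {"id": "ia-1", "company": "Acme", "jobs": "12"},    # unchanged
    {"id": "ca-1", "company": "Globex", "jobs": "55"},  # amended
    {"id": "tx-1", "company": "Initech", "jobs": "7"},  # new
]

additions, amendments = reconcile(current_db, latest_rows)
```

In this sketch the additions and amendments are the rows that would feed the alerts, while unchanged rows are ignored.
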
The project is sponsored by Big Local News, a program at Stanford University that collects data for impactful journalism. The code is maintained by Ben Welsh, a visiting data journalist from the Los Angeles Times.