Helper script + GitHub workflow that scrapes a few Irish left media sources and mirrors them into static RSS feeds you can host yourself (or via GitHub Pages). Each feed keeps a JSON cache so we can retain older entries and avoid duplicates.
| Slug | Source URL | Output file |
|---|---|---|
| `journal9` | https://www.thejournal.ie/topic/9-at-9/ | `data/journal9.xml` |
| `red_articles` | https://rednetwork.net/articles/ | `data/red_articles.xml` |
| `red_theory` | https://rednetwork.net/red-theory/ | `data/red_theory.xml` |
| `imr_issue` | https://irishmarxistreview.net/index.php/imr | `data/imr_issue.xml` |

    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt

    python3 journal9.py                              # refresh every feed
    python3 journal9.py --source red_theory
    python3 journal9.py --dry-run --source journal9
    python3 journal9.py --list-sources

Key flags:

- `--source <slug>` (repeatable) limits the run to specific feeds; defaults to all.
- `--max-items <n>` overrides the per-feed history length for this run.
- `--dry-run` prints the generated XML instead of writing files or saving history.
- `--list-sources` shows the available slugs and exits.

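
For orientation, the flags above map naturally onto `argparse`. This is only a sketch of how such a CLI could be declared, not the actual contents of `journal9.py`; the `SOURCES` list, `build_parser`, and `selected_sources` are hypothetical names, with the slugs copied from the table.

```python
import argparse

# Hypothetical slug list, copied from the table of sources.
SOURCES = ["journal9", "red_articles", "red_theory", "imr_issue"]


def build_parser() -> argparse.ArgumentParser:
    """Declare the four documented flags (assumed layout, not the real script)."""
    parser = argparse.ArgumentParser(description="Refresh the mirrored RSS feeds.")
    # action="append" makes the flag repeatable; the default stays None,
    # which selected_sources() below treats as "all feeds".
    parser.add_argument("--source", action="append", choices=SOURCES, metavar="<slug>")
    parser.add_argument("--max-items", type=int, metavar="<n>")
    parser.add_argument("--dry-run", action="store_true")
    parser.add_argument("--list-sources", action="store_true")
    return parser


def selected_sources(args: argparse.Namespace) -> list[str]:
    """Return the requested slugs, or every slug when none were given."""
    return args.source or list(SOURCES)


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.list_sources:
        print("\n".join(SOURCES))
    else:
        print("Would refresh:", ", ".join(selected_sources(args)))
```

Using `choices` means a mistyped slug fails fast with a usage error rather than silently refreshing nothing.
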
All output XML + history JSON lives under data/.
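
The history JSON is what lets a feed retain older entries and skip duplicates. A minimal sketch of that idea follows, assuming each cached item is a dict with a `link` key used as its identity; the item schema, the function names, and the choice of `link` as the key are illustrative assumptions rather than the script's confirmed behaviour.

```python
import json
from pathlib import Path


def merge_history(cached: list[dict], fresh: list[dict], max_items: int) -> list[dict]:
    """Put fresh items first, drop links already seen, cap the length."""
    merged, seen = [], set()
    for item in fresh + cached:
        if item["link"] in seen:  # assumed identity key
            continue
        seen.add(item["link"])
        merged.append(item)
    return merged[:max_items]


def load_history(path: Path) -> list[dict]:
    """Read the cache, or start empty on the first run."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else []


def save_history(path: Path, items: list[dict]) -> None:
    """Write the cache, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(items, indent=2, ensure_ascii=False), encoding="utf-8")
```

Because fresh items are visited before cached ones, a re-scraped entry replaces its older cached copy instead of appearing twice.
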
- Push the repo and enable GitHub Pages in Settings → Pages, choosing the `main` branch and the `/ (root)` folder (Pages only offers `/ (root)` or `/docs` as the publishing folder). Every XML under `data/` will be reachable at `https://<user>.github.io/<repo>/data/<file>.xml`.
- `.github/workflows/journal9.yml` (legacy name) runs daily at 09:15 UTC and on manual dispatch. It installs deps, executes `python3 journal9.py`, and commits any changed XML/history files back with the default `GITHUB_TOKEN`.
- Kick off the workflow manually once so the feeds exist before sharing URLs with your reader (Inoreader, NetNewsWire, etc.).
- Share the Pages URLs (`.../journal9.xml`, `.../red_articles.xml`, `.../red_theory.xml`). Every workflow run keeps them fresh automatically.

Prefer running it yourself? Add a cron entry (example: every day at 09:05 local time):

    5 9 * * * /usr/bin/env bash -lc 'cd /home/you/RSS-feed && source .venv/bin/activate && python3 journal9.py'

Then make `data/` (or symlinks to its files) available in whatever directory your web server exposes.

- Journal.ie content is seeded from their official topic RSS feed; we then scrape the linked article to mirror the nine talking points.
- Red Network sections lack RSS, so we scrape their grid pages, follow the latest post, and mirror the intro paragraphs from `.reader__content`.
- Irish Marxist Review already exposes article-level RSS, but we scrape the homepage “Current Issue” block so you get one tidy notification per new issue (cover, date, contents).
- The `<description>` HTML is entity-escaped to keep the XML simple; mainstream feed readers render it correctly.
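
To illustrate the entity-escaping point, here is a small sketch using the standard library's `xml.etree.ElementTree`, which escapes `<`, `>`, and `&` in element text when serialising. Whether `journal9.py` builds its XML this way is an assumption; `build_item` and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_item(title: str, link: str, description_html: str) -> ET.Element:
    """Build one RSS <item>; the HTML is stored as plain element text."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # No manual escaping here: the serialiser turns markup characters
    # into entities, so the HTML never becomes child XML elements.
    ET.SubElement(item, "description").text = description_html
    return item


if __name__ == "__main__":
    item = build_item("Example", "https://example.org/post", "<p>Tea & toast</p>")
    print(ET.tostring(item, encoding="unicode"))
```

A feed reader that parses the XML gets the original HTML string back from `<description>` and renders it, which is why the escaped form is safe to ship.
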