
Scope news-workflow git staging to $ARTICLE_DATE to avoid E003 >100 file PRs#1900

Merged
pethers merged 2 commits into main from copilot/debug-news-realtime-monitor
Apr 21, 2026

Conversation


Copilot AI commented Apr 21, 2026

News Realtime Monitor run 24719881413 failed both create_pull_request calls with E003: Cannot create pull request with more than 100 files (602 and 604 files received, respectively). gh-aw's create_pull_request.cjs hard-caps patches at MAX_FILES = 100.

Root cause

Three workflows staged articles with archive-wide globs:

# news-realtime-monitor.md (pre-fix)
git add news/*realtime*.html news/*breaking*.html news/*monitor*.html

That glob matches every historical breaking/realtime/monitor article (222+ files × 14 langs). Any earlier step that touched those files on disk (Playwright validation, HTML auto-fix, translation pass) made git add stage the whole archive. The existing 90-file guard only unstaged analysis/data/ + analysis/weekly/, which didn't help when the bulk came from news/.
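The effect of the archive-wide glob can be reproduced in a throwaway directory. This is a minimal sketch, not the workflow's own code: the file names, the three sample dates, and the variables `demo_dir`, `archive_count`, and `today_count` are assumptions made up for illustration.

```shell
#!/bin/sh
set -eu

ARTICLE_DATE="2026-04-21"          # assumed value for illustration
demo_dir="$(mktemp -d)"            # throwaway directory, not the real repo
mkdir -p "$demo_dir/news"

# Create hypothetical breaking-news articles for three dates in two languages.
for d in 2026-04-19 2026-04-20 "$ARTICLE_DATE"; do
  for lang in en sv; do
    : > "$demo_dir/news/$d-breaking-example-$lang.html"
  done
done

archive_count=0
today_count=0
# Same three patterns as the pre-fix `git add` line, expanded by the shell.
for f in "$demo_dir"/news/*realtime*.html \
         "$demo_dir"/news/*breaking*.html \
         "$demo_dir"/news/*monitor*.html; do
  [ -e "$f" ] || continue          # skip patterns that matched nothing
  archive_count=$((archive_count + 1))
  case "${f##*/}" in
    "$ARTICLE_DATE"-*) today_count=$((today_count + 1)) ;;
  esac
done

echo "archive-wide glob matches: $archive_count"
echo "of which dated $ARTICLE_DATE: $today_count"
rm -rf "$demo_dir"
```

Only two of the six matched files belong to the current date. The other four would be staged as soon as any earlier step modified them on disk.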

Changes

  • news-realtime-monitor.md — stage only news/$ARTICLE_DATE-*{breaking,realtime,monitor}*-{en,sv}.html; drop analysis/data/ (MCP cache, 240+ files) and analysis/weekly/ (cumulative rollup) from default stage; extend fallback chain to also unstage analysis/daily/.../documents/ then news/metadata/.

  • news-evening-analysis.md — same date-scoped pattern for *evening-analysis* and *evening* HTML.

  • news-article-generator.md — keep its existing diff-based staging, add the same defensive filter on top.

  • Shared defensive filter (all three) — unstage any news/YYYY-MM-DD-* path whose date ≠ $ARTICLE_DATE:

    git diff --cached --name-only > /tmp/staged_files.txt
    awk -v today="$ARTICLE_DATE" \
        '$0 ~ "^news/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]" && $0 !~ today {print}' \
        /tmp/staged_files.txt > /tmp/historical_news.txt
    [ -s /tmp/historical_news.txt ] && xargs -a /tmp/historical_news.txt git reset HEAD --

    The ^news/YYYY-MM-DD anchor leaves news/metadata/* (non-dated) untouched.
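To see which paths the defensive filter selects for unstaging, the awk expression can be run against a hand-written list instead of a real index. The sketch below reuses the expression from the PR unchanged; the sample paths and the `staged` and `historical` variables are hypothetical.

```shell
#!/bin/sh
set -eu

ARTICLE_DATE="2026-04-21"          # assumed value for illustration
staged="$(mktemp)"                 # stands in for /tmp/staged_files.txt

# Hypothetical staged paths: today's article, yesterday's article,
# a non-dated metadata file, and a file outside news/.
printf '%s\n' \
  "news/2026-04-21-breaking-example-en.html" \
  "news/2026-04-20-breaking-example-en.html" \
  "news/metadata/index.json" \
  "analysis/daily/2026-04-21/summary.md" > "$staged"

# Same filter as in the PR: dated news/ paths that do not contain today's date.
historical="$(awk -v today="$ARTICLE_DATE" \
    '$0 ~ "^news/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]" && $0 !~ today {print}' \
    "$staged")"

echo "$historical"
rm -f "$staged"
```

Only `news/2026-04-20-breaking-example-en.html` is printed, so only that path would be passed to `git reset HEAD --`. Note that `$0 !~ today` is an unanchored match: it tests whether the date string appears anywhere in the path, not only in the file-name prefix.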

Not changed

The other 9 news workflows (committee-reports, motions, propositions, interpellations, week-ahead, month-ahead, weekly-review, monthly-review, translate) already use date-scoped staging from PR #1867 (run 24653843681, same class of failure).

Lock files regenerated via gh aw compile (v0.69.0). Incidental aw_context_workflows drift in sibling lock files is v0.69 handler-config noise that the Compile Agentic Workflows CI would produce anyway.

Copilot AI linked an issue Apr 21, 2026 that may be closed by this pull request
@github-actions github-actions Bot added the size-xs Extra small change (< 10 lines) label Apr 21, 2026
@github-actions

🏷️ Automatic Labeling Summary

This PR has been automatically labeled based on the files changed and PR metadata.

Applied Labels: size-xs

Label Categories

  • 🗳️ Content: news, dashboard, visualization, intelligence
  • 💻 Technology: html-css, javascript, workflow, security
  • 📊 Data: cia-data, riksdag-data, data-pipeline, schema
  • 🌍 I18n: i18n, translation, rtl
  • 🔒 ISMS: isms, iso-27001, nist-csf, cis-controls
  • 🏗️ Infrastructure: ci-cd, deployment, performance, monitoring
  • 🔄 Quality: testing, accessibility, documentation, refactor
  • 🤖 AI: agent, skill, agentic-workflow

For more information, see .github/labeler.yml.

@github-actions

🔍 Lighthouse Performance Audit

| Category | Score | Status |
| --- | --- | --- |
| Performance | 85/100 | 🟡 |
| Accessibility | 95/100 | 🟢 |
| Best Practices | 90/100 | 🟢 |
| SEO | 95/100 | 🟢 |

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

… >100 file PR failures

Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/c5e3770d-650a-4377-a606-1976c5ea38bd

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
@github-actions github-actions Bot added the documentation (Documentation updates), workflow (GitHub Actions workflows), ci-cd (CI/CD pipeline changes), news (News articles and content generation), agentic-workflow (Agentic workflow changes), and size-m (Medium change, 50-250 lines) labels Apr 21, 2026

Copilot AI changed the title from "[WIP] Debug workflow failure in news realtime monitor" to "Scope news-workflow git staging to $ARTICLE_DATE to avoid E003 >100 file PRs" Apr 21, 2026
Copilot AI requested a review from pethers April 21, 2026 12:23
@pethers pethers marked this pull request as ready for review April 21, 2026 12:24
Copilot AI review requested due to automatic review settings April 21, 2026 12:24
@pethers pethers merged commit f14069a into main Apr 21, 2026
13 checks passed
@pethers pethers deleted the copilot/debug-news-realtime-monitor branch April 21, 2026 12:24
Copilot AI left a comment

Pull request overview

This PR updates agentic news workflows to avoid gh-aw safe-outputs E003 failures by ensuring git add only stages files for the current $ARTICLE_DATE, preventing accidental staging of large historical news/ archives and bulk analysis caches.

Changes:

  • Scope workflow staging patterns to news/$ARTICLE_DATE-* (EN/SV) for realtime-monitor and evening-analysis.
  • Add a defensive “unstage historical dated news files” filter to realtime-monitor, evening-analysis, and the manual article-generator workflow.
  • Regenerate workflow lock files (adds aw_context input + aw_context_workflows drift from compilation).

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| .github/workflows/news-realtime-monitor.md | Date-scoped staging for realtime/breaking/monitor articles; unstage analysis/data + analysis/weekly; add defensive unstage filter and staged-count remediation. |
| .github/workflows/news-realtime-monitor.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-evening-analysis.md | Date-scoped staging for evening analysis articles; unstage analysis/data; add defensive unstage filter and staged-count remediation. |
| .github/workflows/news-evening-analysis.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-article-generator.md | Add defensive unstage filter to prevent diff-based staging from including historical dated news files. |
| .github/workflows/news-article-generator.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-propositions.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-motions.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-monthly-review.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-month-ahead.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-interpellations.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-committee-reports.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |

Comment on lines +872 to +873
# cases where an earlier bash step accidentally modified historical articles and their
# paths leaked in via `git add news/metadata/` globbing or similar subtle issues.
Copilot AI Apr 21, 2026

The comment suggests historical news/YYYY-MM-DD-* files could be staged via git add news/metadata/ “globbing”, but git add news/metadata/ can only stage paths under news/metadata/. Consider rewording this to reflect the actual risk (e.g., earlier steps or broader git add patterns staging historical news/ files) to avoid misleading future debugging.

Suggested change
- # cases where an earlier bash step accidentally modified historical articles and their
- # paths leaked in via `git add news/metadata/` globbing or similar subtle issues.
+ # cases where an earlier bash step accidentally modified historical articles and broader
+ # `git add news/...` patterns or other staging steps pulled those paths into the index.

Copilot uses AI. Check for mistakes.
Comment on lines +1031 to 1032
echo "⚠️ $STAGED_COUNT files exceeds safe threshold. Removing weekly analysis."
git reset HEAD -- analysis/weekly/ 2>/dev/null || true
Copilot AI Apr 21, 2026

The log message says “Removing weekly analysis” when $STAGED_COUNT > 90, but this script no longer stages analysis/weekly/ in the first place, so this step will typically be a no-op and won’t reduce the staged count. Either stage analysis/weekly/ intentionally (if it’s needed) or adjust the fallback/remediation messaging/steps so they reflect what’s actually staged.

Suggested change
- echo "⚠️ $STAGED_COUNT files exceeds safe threshold. Removing weekly analysis."
- git reset HEAD -- analysis/weekly/ 2>/dev/null || true
+ awk '$0 ~ "^analysis/weekly/" {print}' /tmp/staged_files.txt > /tmp/staged_weekly_analysis.txt
+ if [ -s /tmp/staged_weekly_analysis.txt ]; then
+   WEEKLY_COUNT=0
+   awk 'END{print NR}' /tmp/staged_weekly_analysis.txt > /tmp/weekly_count.txt
+   read WEEKLY_COUNT < /tmp/weekly_count.txt 2>/dev/null || true
+   echo "⚠️ $STAGED_COUNT files exceeds safe threshold. Unstaging $WEEKLY_COUNT staged analysis/weekly/ files."
+   git reset HEAD -- analysis/weekly/ 2>/dev/null || true
+ else
+   echo "⚠️ $STAGED_COUNT files exceeds safe threshold, but no analysis/weekly/ files are currently staged. Continuing to next remediation step."
+ fi

Comment on lines +883 to +888
# Enforce safe-outputs 100-file PR limit (hard cap: 100; soft threshold: 90)
git diff --cached --name-only > /tmp/staged_files.txt
awk 'END{print NR}' /tmp/staged_files.txt > /tmp/staged_count.txt
STAGED_COUNT=0
read STAGED_COUNT < /tmp/staged_count.txt 2>/dev/null || true
echo "📊 Staged file count: $STAGED_COUNT (limit: 100)"
Copilot AI Apr 21, 2026

This section labels 100 as a “hard cap” but the script only performs best-effort reductions when STAGED_COUNT > 90 and will still proceed even if the final staged count remains >100 (in which case safe-outputs PR creation will fail). Consider either adding an explicit > 100 guard that aborts with a clear error, or rewording to clarify that 100 is the safe-outputs tool cap rather than a guarantee enforced here.
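One way the explicit guard suggested here might look is sketched below. It is an assumption-laden illustration, not code from the PR: the function name `check_staged_limit` and the message wording are hypothetical, and `MAX_FILES` simply mirrors the cap named in the PR description.

```shell
#!/bin/sh
# Hypothetical guard: fail fast when the staged count exceeds the cap,
# instead of letting create_pull_request fail later with E003.
MAX_FILES=100

check_staged_limit() {
  count="$1"
  if [ "$count" -gt "$MAX_FILES" ]; then
    echo "❌ $count staged files exceeds the ${MAX_FILES}-file limit; aborting before PR creation." >&2
    return 1
  fi
  echo "✅ $count staged files is within the ${MAX_FILES}-file limit."
}

# Example calls with fixed counts; in the workflow the argument would be
# the $STAGED_COUNT value computed after the remediation steps have run.
check_staged_limit 42
check_staged_limit 101 || echo "guard tripped as expected"
```

Placing such a check after the best-effort reductions would turn a late safe-outputs failure into an early, clearly worded error, while leaving the 90-file soft threshold logic unchanged.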


Labels

agentic-workflow (Agentic workflow changes), ci-cd (CI/CD pipeline changes), documentation (Documentation updates), news (News articles and content generation), size-m (Medium change, 50-250 lines), size-xs (Extra small change, < 10 lines), workflow (GitHub Actions workflows)


Development

Successfully merging this pull request may close these issues.

[aw] News Realtime Monitor failed

3 participants