Scope news-workflow git staging to $ARTICLE_DATE to avoid E003 >100 file PRs#1900
🏷️ Automatic Labeling Summary
This PR has been automatically labeled based on the files changed and PR metadata. Applied Labels: size-xs
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
… >100 file PR failures Agent-Logs-Url: https://github.com/Hack23/riksdagsmonitor/sessions/c5e3770d-650a-4377-a606-1976c5ea38bd Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Pull request overview
This PR updates agentic news workflows to avoid gh-aw safe-outputs E003 failures by ensuring git add only stages files for the current $ARTICLE_DATE, preventing accidental staging of large historical news/ archives and bulk analysis caches.
Changes:
- Scope workflow staging patterns to `news/$ARTICLE_DATE-*` (EN/SV) for realtime-monitor and evening-analysis.
- Add a defensive “unstage historical dated news files” filter to realtime-monitor, evening-analysis, and the manual article-generator workflow.
- Regenerate workflow lock files (adds `aw_context` input + `aw_context_workflows` drift from compilation).
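To make the date-scoped pattern concrete, here is a minimal POSIX shell sketch of a predicate that accepts only the current date's EN/SV realtime-monitor articles. The function name `is_todays_article` is hypothetical and not taken from the workflows; the breaking/realtime/monitor article types follow the PR description. It illustrates the matching rule only and is not the workflow's actual staging code.

```shell
#!/bin/sh
# Hypothetical predicate (not from the PR): succeed only for paths that look like
# news/<ARTICLE_DATE>-*{breaking,realtime,monitor}*-{en,sv}.html
is_todays_article() {
  path=$1
  article_date=$2
  # The path must start with news/<ARTICLE_DATE>- and end in -en.html or -sv.html.
  case "$path" in
    "news/$article_date-"*-en.html | "news/$article_date-"*-sv.html) ;;
    *) return 1 ;;
  esac
  # The path must name one of the article types this workflow produces.
  case "$path" in
    *breaking* | *realtime* | *monitor*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage sketch (assumes ARTICLE_DATE is set and the step runs at the repo root):
#   for f in news/*.html; do
#     is_todays_article "$f" "$ARTICLE_DATE" && git add -- "$f"
#   done
```

Historical articles fail the first `case` and are never passed to `git add`, even if an earlier step modified them on disk.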
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| .github/workflows/news-realtime-monitor.md | Date-scoped staging for realtime/breaking/monitor articles; unstage analysis/data + analysis/weekly; add defensive unstage filter and staged-count remediation. |
| .github/workflows/news-realtime-monitor.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-evening-analysis.md | Date-scoped staging for evening analysis articles; unstage analysis/data; add defensive unstage filter and staged-count remediation. |
| .github/workflows/news-evening-analysis.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-article-generator.md | Add defensive unstage filter to prevent diff-based staging from including historical dated news files. |
| .github/workflows/news-article-generator.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-propositions.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-motions.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-monthly-review.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-month-ahead.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-interpellations.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| .github/workflows/news-committee-reports.lock.yml | Compiled lock drift (dispatch schema updates incl. aw_context). |
| # cases where an earlier bash step accidentally modified historical articles and their | ||
| # paths leaked in via `git add news/metadata/` globbing or similar subtle issues. |
The comment suggests historical news/YYYY-MM-DD-* files could be staged via git add news/metadata/ “globbing”, but git add news/metadata/ can only stage paths under news/metadata/. Consider rewording this to reflect the actual risk (e.g., earlier steps or broader git add patterns staging historical news/ files) to avoid misleading future debugging.
| # cases where an earlier bash step accidentally modified historical articles and their | |
| # paths leaked in via `git add news/metadata/` globbing or similar subtle issues. | |
| # cases where an earlier bash step accidentally modified historical articles and broader | |
| # `git add news/...` patterns or other staging steps pulled those paths into the index. |
| echo "⚠️ $STAGED_COUNT files exceeds safe threshold. Removing weekly analysis." | ||
| git reset HEAD -- analysis/weekly/ 2>/dev/null || true |
The log message says “Removing weekly analysis” when $STAGED_COUNT > 90, but this script no longer stages analysis/weekly/ in the first place, so this step will typically be a no-op and won’t reduce the staged count. Either stage analysis/weekly/ intentionally (if it’s needed) or adjust the fallback/remediation messaging/steps so they reflect what’s actually staged.
| echo "⚠️ $STAGED_COUNT files exceeds safe threshold. Removing weekly analysis." | |
| git reset HEAD -- analysis/weekly/ 2>/dev/null || true | |
| awk '$0 ~ "^analysis/weekly/" {print}' /tmp/staged_files.txt > /tmp/staged_weekly_analysis.txt | |
| if [ -s /tmp/staged_weekly_analysis.txt ]; then | |
| WEEKLY_COUNT=0 | |
| awk 'END{print NR}' /tmp/staged_weekly_analysis.txt > /tmp/weekly_count.txt | |
| read WEEKLY_COUNT < /tmp/weekly_count.txt 2>/dev/null || true | |
| echo "⚠️ $STAGED_COUNT files exceeds safe threshold. Unstaging $WEEKLY_COUNT staged analysis/weekly/ files." | |
| git reset HEAD -- analysis/weekly/ 2>/dev/null || true | |
| else | |
| echo "⚠️ $STAGED_COUNT files exceeds safe threshold, but no analysis/weekly/ files are currently staged. Continuing to next remediation step." | |
| fi |
| # Enforce safe-outputs 100-file PR limit (hard cap: 100; soft threshold: 90) | ||
| git diff --cached --name-only > /tmp/staged_files.txt | ||
| awk 'END{print NR}' /tmp/staged_files.txt > /tmp/staged_count.txt | ||
| STAGED_COUNT=0 | ||
| read STAGED_COUNT < /tmp/staged_count.txt 2>/dev/null || true | ||
| echo "📊 Staged file count: $STAGED_COUNT (limit: 100)" |
This section labels 100 as a “hard cap” but the script only performs best-effort reductions when STAGED_COUNT > 90 and will still proceed even if the final staged count remains >100 (in which case safe-outputs PR creation will fail). Consider either adding an explicit > 100 guard that aborts with a clear error, or rewording to clarify that 100 is the safe-outputs tool cap rather than a guarantee enforced here.
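One way to add the explicit guard suggested here is sketched below. This is an illustration under assumptions, not the workflow's code: the function name `check_staged_count` is hypothetical, and `MAX_FILES=100` mirrors the cap reported in the E003 error.

```shell
#!/bin/sh
# Hypothetical guard (not from the PR): fail fast when the staged file count
# still exceeds the safe-outputs cap after the best-effort reductions.
MAX_FILES=100   # assumed to match the cap reported by E003

check_staged_count() {
  # $1 = number of staged files
  if [ "$1" -gt "$MAX_FILES" ]; then
    echo "❌ $1 staged files exceeds the $MAX_FILES-file limit; aborting before PR creation." >&2
    return 1
  fi
  echo "✅ $1 staged files is within the $MAX_FILES-file limit."
}

# Usage sketch, reusing the STAGED_COUNT value the existing step already computes:
#   check_staged_count "$STAGED_COUNT" || exit 1
```

Placed after the remediation chain, this turns a late E003 failure in PR creation into an early error in the staging step itself.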
News Realtime Monitor run 24719881413 failed both `create_pull_request` calls with `E003: Cannot create pull request with more than 100 files` (received 602/604). gh-aw's `create_pull_request.cjs` hard-caps patches at `MAX_FILES = 100`.

Root cause

Three workflows staged articles with archive-wide globs. Those globs match every historical breaking/realtime/monitor article (222+ files × 14 langs). Any earlier step that touched those files on disk (Playwright validation, HTML auto-fix, translation pass) made `git add` stage the whole archive. The existing 90-file guard only unstaged `analysis/data/` + `analysis/weekly/`, which didn't help when the bulk came from `news/`.

Changes
- `news-realtime-monitor.md`: stage only `news/$ARTICLE_DATE-*{breaking,realtime,monitor}*-{en,sv}.html`; drop `analysis/data/` (MCP cache, 240+ files) and `analysis/weekly/` (cumulative rollup) from default stage; extend fallback chain to also unstage `analysis/daily/.../documents/` then `news/metadata/`.
- `news-evening-analysis.md`: same date-scoped pattern for `*evening-analysis*` and `*evening*` HTML.
- `news-article-generator.md`: keep its existing diff-based staging, add the same defensive filter on top.

Shared defensive filter (all three): unstage any `news/YYYY-MM-DD-*` path whose date ≠ `$ARTICLE_DATE`. The `^news/YYYY-MM-DD` anchor leaves `news/metadata/*` (non-dated) untouched.

Not changed
The other 9 news workflows (`committee-reports`, `motions`, `propositions`, `interpellations`, `week-ahead`, `month-ahead`, `weekly-review`, `monthly-review`, `translate`) already use date-scoped staging from PR #1867 (run 24653843681, same class of failure).

Lock files regenerated via `gh aw compile` (v0.69.0). Incidental `aw_context_workflows` drift in sibling lock files is v0.69 handler-config noise that the `Compile Agentic Workflows` CI would produce anyway.
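The shared defensive filter described above could look roughly like the following sketch. It is written under assumptions and is not the code from the PR: the helper name `stale_news_paths` is hypothetical, and it reads a list of staged paths on stdin, as produced by `git diff --cached --name-only`.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): read staged paths on stdin and print
# the dated news/ paths whose date prefix differs from the article date ($1).
stale_news_paths() {
  awk -v d="$1" '
    # Only paths anchored at news/YYYY-MM-DD- are candidates, so non-dated
    # paths such as news/metadata/* never match.
    /^news\/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]-/ {
      # Characters 6-15 hold the date; print the path when it is not the article date.
      if (substr($0, 6, 10) != d) print
    }
  '
}

# Usage sketch (assumes a git checkout and that ARTICLE_DATE is set):
#   git diff --cached --name-only | stale_news_paths "$ARTICLE_DATE" |
#   while IFS= read -r p; do
#     git reset -q HEAD -- "$p"
#   done
```

Because the filter looks only at what is already in the index, it works the same way after glob-based staging and after diff-based staging.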