$ #72802
PROPOSAL: Check readme file included in this PR
Tests
Verify that no errors appear in the JS console
Offline tests
QA Steps
Verify that no errors appear in the JS console
PR Author Checklist
I linked the correct issue in the ### Fixed Issues section above
I wrote clear testing steps that cover the changes made in this PR
I added steps for local testing in the Tests section
I added steps for the expected offline behavior in the Offline steps section
I added steps for Staging and/or Production testing in the QA steps section
I added steps to cover failure scenarios (i.e. verify an input displays the correct error message if the entered data is not correct)
I turned off my network connection and tested it while offline to ensure it matches the expected behavior (i.e. verify the default avatar icon is displayed if app is offline)
I tested this PR with a High Traffic account against the staging or production API to ensure there are no regressions (e.g. long loading states that impact usability).
I included screenshots or videos for tests on all platforms
I ran the tests on all platforms & verified they passed on:
Android: Native
Android: mWeb Chrome
iOS: Native
iOS: mWeb Safari
MacOS: Chrome / Safari
MacOS: Desktop
I verified there are no console errors (if there's a console error not related to the PR, report it or open an issue for it to be fixed)
I verified there are no new alerts related to the canBeMissing param for useOnyx
I verified that any callback methods that were added or modified are named for what the method does and never what callback they handle (i.e. toggleReport and not onIconClick)
I verified that comments were added to code that is not self explanatory
I verified that any new or modified comments were clear, correct English, and explained "why" the code was doing something instead of only explaining "what" the code was doing.
I verified any copy / text shown in the product is localized by adding it to src/languages/* files and using the translation method
If any non-english text was added/modified, I used JaimeGPT to get English > Spanish translation. I then posted it in #expensify-open-source and it was approved by an internal Expensify engineer. Link to Slack message:
I verified all numbers, amounts, dates and phone numbers shown in the product are using the localization methods
I verified any copy / text that was added to the app is grammatically correct in English. It adheres to proper capitalization guidelines (note: only the first word of header/labels should be capitalized), and is either coming verbatim from figma or has been approved by marketing (in order to get marketing approval, ask the Bug Zero team member to add the Waiting for copy label to the issue)
I verified proper file naming conventions were followed for any new files or renamed files. All non-platform specific files are named after what they export and are not named "index.js". All platform-specific files are named for the platform the code supports as outlined in the README.
I verified the JSDocs style guidelines (in STYLE.md) were followed
If a new code pattern is added I verified it was agreed to be used by multiple Expensify engineers
I tested other components that can be impacted by my changes (i.e. if the PR modifies a shared library or component like Avatar, I verified the components using Avatar are working as expected)
I verified all code is DRY (the PR doesn't include any logic written more than once, with the exception of tests)
I verified any variables that can be defined as constants (ie. in CONST.ts or at the top of the file that uses the constant) are defined as such
I verified that if a function's arguments changed that all usages have also been updated correctly
If any new file was added I verified that:
The file has a description of what it does and/or why is needed at the top of the file if the code is not self explanatory
If a new CSS style is added I verified that:
A similar style doesn't already exist
The style can't be created with an existing StyleUtils function (i.e. StyleUtils.getBackgroundAndBorderStyle(theme.componentBG))
If new assets were added or existing ones were modified, I verified that:
The assets are optimized and compressed (for SVG files, run npm run compress-svg)
The assets load correctly across all supported platforms.
If the PR modifies code that runs when editing or sending messages, I tested and verified there is no unexpected behavior for all supported markdown - URLs, single line code, code blocks, quotes, headings, bold, strikethrough, and italic.
If the PR modifies a generic component, I tested and verified that those changes do not break usages of that component in the rest of the App (i.e. if a shared library or component like Avatar is modified, I verified that Avatar is working as expected in all cases)
If the PR modifies a component related to any of the existing Storybook stories, I tested and verified all stories for that component are still working as expected.
If the PR modifies a component or page that can be accessed by a direct deeplink, I verified that the code functions as expected when the deeplink is used - from a logged in and logged out account.
If the PR modifies the UI (e.g. new buttons, new UI components, changing the padding/spacing/sizing, moving components, etc) or modifies the form input styles:
I verified that all the inputs inside a form are aligned with each other.
I added Design label and/or tagged @Expensify/design so the design team can review the changes.
If a new page is added, I verified it's using the ScrollView component to make it scrollable when more elements are added to the page.
I added unit tests for any new feature or bug fix in this PR to help automatically prevent regressions in this user flow.
If the main branch was merged into this PR after a review, I tested again and verified the outcome was still expected according to the Test steps.
The scripts create the following directories for caching (one file per workflow run):
job_summary_url/
- Content: Raw URLs to job summary pages
- Format: Text files named {run_id}.txt
- Example: 18589302302.txt → /Expensify/App/actions/runs/18589302302/jobs/41169259293/summary_raw
- Purpose: Avoids re-scraping HTML pages to find job summary URLs
- Use Case: URL mapping for direct access to job summaries
job_summary_md/
- Content: Complete job summary markdown from GitHub
- Format: Markdown files named {run_id}.md
- Example: 18589302302.md → Full AI review summary including errors, warnings, and statistics
- Purpose: Preserves the complete context of each review
- Size: Typically 5-50 KB per file
- Use Case: Full historical record of all AI review outputs
job_errors/
- Content: Extracted error blocks from job summaries (between --- delimiters)
- Format: Markdown files named {run_id}.md
- Example: Contains only the error sections with full context:

```
---
❌ **Error:** This command requires approval
**File:** src/libs/actions/Report.ts
**Details:** Command execution blocked by security policy
---
```

- Purpose: Structured error data with surrounding context
- Use Case: Contextual error analysis, pattern detection, debugging
job_errors_titles/
- Content: Just the error message titles (one per line)
- Format: Text files named {run_id}.txt
- Example:

```
❌ **Error:** This command requires approval
❌ **Error:** could not determine current branch
```

- Purpose: Quick frequency analysis without full context
- Use Case: Current aggregation reports, trend analysis
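Taken together, the caches above support a cache-first lookup: check the directory before touching the network. A minimal sketch, where `fetch_summary` is a stand-in for the real scraper (stubbed here so the example runs offline):

```shell
# Cache-first lookup for a run's job summary. The directory name matches
# the layout above; fetch_summary stands in for the real scraper.
CACHE_DIR="job_summary_md"
mkdir -p "$CACHE_DIR"

# Stand-in fetcher so the sketch runs without network access.
fetch_summary() { echo "summary for run $1"; }

get_summary() {
    cache_file="$CACHE_DIR/$1.md"
    if [ -f "$cache_file" ]; then
        cat "$cache_file"                       # cache hit: no fetch needed
    else
        fetch_summary "$1" | tee "$cache_file"  # miss: fetch once, then cache
    fi
}

get_summary 18589302302 >/dev/null  # first call populates the cache
get_summary 18589302302             # second call is served from disk
```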
Cached Data for Advanced Analysis
💡 Pro Tip: The cached data in job_errors/ and job_summary_md/ contains rich contextual information beyond just error titles. This data can be leveraged for:
Contextual Error Analysis: Understanding which files, functions, or code patterns trigger specific errors
Error Co-occurrence: Identifying errors that frequently appear together in the same run
Temporal Patterns: Analyzing how errors evolve over time or correlate with code changes
PR/Author Correlation: Linking errors to specific pull requests or authors
Error Classification: Automatically categorizing errors by type (permissions, syntax, rate limits, etc.)
Root Cause Analysis: Tracing errors back to specific code changes using the full context
Predictive Analysis: Building models to predict potential errors before they occur
The current aggregator focuses on frequency, but the cached data supports much deeper analysis. Consider building additional tools that parse job_errors/ for contextual insights.
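As a first cut at contextual analysis, the **File:** field can be pulled out of each cached error block to rank which files trigger errors most often. A sketch (the sample file created below is purely illustrative; field names follow the job_errors/ example above):

```shell
# Rank source files by how many cached error blocks mention them.
# The sample file is illustrative; real files are named {run_id}.md.
mkdir -p job_errors
cat > job_errors/sample.md <<'EOF'
---
❌ **Error:** This command requires approval
**File:** src/libs/actions/Report.ts
**Details:** Command execution blocked by security policy
---
EOF

# Extract the File: field, then count occurrences per file.
grep -h '^\*\*File:\*\*' job_errors/*.md \
    | sed 's/^\*\*File:\*\* //' \
    | sort | uniq -c | sort -rn
```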
Final Report
The final report (job_error_report.md by default) includes:
Generation timestamp - When the report was created
Source information - Which cache directory was analyzed
Files processed - Number of workflow runs included
Summary statistics:
Total error count
Unique error types
Frequency-ranked table - Errors sorted by occurrence count
Error messages - Full error text for each unique error
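A sketch of how those sections can be assembled from job_errors_titles/ (file names and error titles below are illustrative, not real cache contents):

```shell
# Assemble a minimal version of the report from cached title files.
mkdir -p job_errors_titles
echo 'Error: This command requires approval'     > job_errors_titles/101.txt
echo 'Error: could not determine current branch' > job_errors_titles/102.txt
echo 'Error: This command requires approval'     > job_errors_titles/103.txt

{
    echo "# Job Error Report"
    echo "Generated: $(date -u +%Y-%m-%dT%H:%M:%SZ)"        # generation timestamp
    echo "Source: job_errors_titles/"                        # source information
    echo "Files processed: $(ls job_errors_titles | wc -l | tr -d ' ')"
    echo
    echo "## Errors by frequency"
    sort job_errors_titles/*.txt | uniq -c | sort -rn        # frequency-ranked table
} > job_error_report.md
```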
```shell
# Manual process required:
# 1. Open GitHub in browser
# 2. Open DevTools > Application > Cookies
# 3. Find user_session cookie
# 4. Copy and export:
export GITHUB_USER_SESSION="new_cookie_value"
```
2. HTML Scraping Instead of Official API
Severity: 🔴 High
Impact: Breaks if GitHub changes their HTML structure
Fragile: Any change to GitHub's HTML breaks the script
Requires authentication cookies (can't use API tokens)
Slower than direct API access
Not officially supported by GitHub
Why: Job summaries are not available through the official GitHub Actions API.
Better Solution Available:
Have the reviewer upload its logs as a GitHub Actions artifact in addition to writing them to the job summary. Artifacts can then be retrieved with the official GitHub CLI.
3. No Rate Limiting Handling
Severity: 🟡 Medium
Impact: Script fails silently when hitting GitHub API rate limits
Problem:
No detection of rate limit status
No retry logic when rate limited
No waiting/backoff mechanism
Silent failures that are hard to debug
Manifestation:
```shell
# Runs fine for 50-60 requests, then:
Processing run: 12345678
No job summary found, skipping...   # Actually rate limited!
```
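One way to address this is to detect the status code and back off before retrying. A sketch with the network call stubbed out so it runs offline (the real script would call curl and inspect the HTTP status; `fetch_status` here fails twice, then succeeds):

```shell
# Retry with exponential backoff when a fetch reports HTTP 429.
# fetch_status is a stub so the sketch runs without network access.
ATTEMPTS_FILE=$(mktemp)
echo 0 > "$ATTEMPTS_FILE"
fetch_status() {
    n=$(cat "$ATTEMPTS_FILE")
    echo $((n + 1)) > "$ATTEMPTS_FILE"
    if [ "$n" -ge 2 ]; then echo 200; else echo 429; fi
}

fetch_with_retry() {
    delay=1
    for try in 1 2 3 4 5; do
        status=$(fetch_status)
        if [ "$status" = "200" ]; then
            echo "succeeded on attempt $try"
            return 0
        fi
        echo "rate limited (HTTP $status); retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))   # exponential backoff: 1s, 2s, 4s, ...
    done
    echo "giving up after 5 attempts" >&2
    return 1
}

fetch_with_retry
```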
⚠️ Medium Priority Issues
4. Hardcoded Repository Path
Severity: 🟡 Medium
Impact: Can't easily use with other repositories
Problem:
```shell
# Hardcoded in the regex:
/Expensify/App/actions/runs/${run_id}/jobs/[0-9]+/summary_raw
```
Workaround: Edit the script to change repository path
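A lighter fix than editing the script is to read the repository from an environment variable, defaulting to Expensify/App (`GH_REPO` is an assumed variable name, not something the script currently supports):

```shell
# Build the summary-URL pattern from a configurable repository slug.
GH_REPO="${GH_REPO:-Expensify/App}"   # assumed variable; default keeps current behavior
run_id=18589302302
summary_pattern="/${GH_REPO}/actions/runs/${run_id}/jobs/[0-9]+/summary_raw"
echo "$summary_pattern"
```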
5. Sequential Processing (Slow for Large Batches)
Severity: 🟡 Medium
Impact: Takes ~10-15 minutes to process 100 workflow runs
Problem:
Processes one run at a time in a while loop
Network latency multiplied by number of runs
Could be parallelized for 5-10x speedup
Current Performance:
10 runs: ~1 minute
50 runs: ~5 minutes
100 runs: ~10 minutes
500 runs: ~50 minutes
Potential Solution: Implement parallel processing with xargs -P or GNU parallel.
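A sketch of the `xargs -P` approach, with the per-run work stubbed to writing a marker file so it runs offline (in the real script, the inner command would fetch and cache one run):

```shell
# Process run IDs with up to 4 workers instead of one at a time.
# The inner command is a stand-in for the real per-run fetch.
OUT_DIR=$(mktemp -d)
printf '%s\n' 111 222 333 444 \
    | xargs -P 4 -I{} sh -c "echo processed > '$OUT_DIR/{}.txt'"
ls "$OUT_DIR" | wc -l   # all four runs handled
```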
6. No CI/CD Integration
Severity: 🟡 Medium
Impact: Can't run automatically in GitHub Actions or other CI systems
Problem:
Requires manual cookie setup (not available in CI)
No GitHub Actions workflow provided
Can't leverage Actions cache for faster runs
No automatic scheduling
7. Limited Error Context in Reports
Severity: 🟡 Medium
Impact: Hard to understand root causes from aggregated reports
Problem:
Current aggregator only shows error titles
Discards the context preserved in the job_errors/ directory (affected files, error details, and surrounding output)
Potential Solution: Build enhanced analyzer that parses job_errors/ for contextual data. See "Cached Data for Advanced Analysis" section above for ideas.
📝 Minor Issues
8. No Progress Indicators for Long Operations
Severity: 🟢 Low
Impact: Appears stuck during long runs
Problem:
```shell
Processing run: 18589302302
Fetching job summary: https://github.com/...
# Appears frozen here for 5-10 seconds
```
Workaround: Be patient, check network activity
Potential Solution: Add progress bars or timestamps to output
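A minimal version of the timestamp option: a `log` helper (hypothetical, not in the current script) that prefixes every line, so gaps between messages are visible:

```shell
# Timestamped logging so long fetches do not look frozen.
log() { printf '[%s] %s\n' "$(date -u +%H:%M:%S)" "$*"; }

log "Processing run: 18589302302"
log "Fetching job summary..."
```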
9. Cache Directories Not in .gitignore
Severity: 🟢 Low
Impact: Risk of committing large cache files
Problem:
Cache directories (job_summary_url/, job_summary_md/, etc.) not automatically ignored
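A possible fix: append the cache directories and the generated report to .gitignore (directory names taken from the layout described earlier; the default report name is job_error_report.md):

```shell
# Ignore the scraper's cache directories and generated report.
cat >> .gitignore <<'EOF'
job_summary_url/
job_summary_md/
job_errors/
job_errors_titles/
job_error_report.md
EOF
```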