An AI-powered test failure analysis tool that helps QA teams identify root causes, track trends, and prioritize fixes across Android, iOS, and Web platforms.
| Capability | Detail |
|---|---|
| Test capacity | 2,000+ tests with smooth performance |
| AI confidence | 60–95% for root cause detection |
| Platforms | Android, iOS, Web |
| Trend tracking | 7-day vs 7-day comparison |
| Grouping algorithms | 5+ (error message, stack trace, name, error code, pattern) |
| Dependencies | None — pure HTML/CSS/JavaScript |
1. Open `flaky-test-dashboard.html` in any modern browser.
2. Load data — choose one:
   - Click 📊 Load Demo Data to generate 2,000 realistic test records instantly.
   - Click 📂 Upload CSV File and select your file (see CSV format below).
3. Explore — the dashboard auto-analyzes your data and populates all sections.
4. Act — use Create Ticket buttons to log issues, and Trend Analysis to verify fixes over time.
Four summary cards at the top of the page give you an instant snapshot:
| Card | What it shows |
|---|---|
| Total Tests | Count of all failures in current filter |
| AI Groups | Number of detected failure clusters |
| Top Platform | Most affected platform with icon |
| Trend | Direction and % change vs previous period |
Main section — left panel
Similar failures are automatically clustered using five algorithms:
- Error Message Similarity — Levenshtein distance between error strings
- Stack Trace Similarity — function call signature comparison
- Test Name Similarity — normalized name matching
- Error Code Clustering — HTTP status code grouping
- Pattern Recognition — regex-based pattern matching
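As an illustration of the first algorithm in the list, here is a minimal sketch of Levenshtein-based error message similarity. The function names and the normalization to a 0..1 score are assumptions made for this example, not the dashboard's actual code.

```javascript
// Classic dynamic-programming Levenshtein distance, O(m × n) time.
// Only two rows of the table are kept, so memory is O(n).
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: turn the distance into a 0..1 similarity score.
function messageSimilarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}
```

For example, `levenshtein("kitten", "sitting")` returns `3`. Two error strings could then be grouped when `messageSimilarity` exceeds a chosen threshold; the threshold itself is not specified here.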
Each group card shows:
- Pattern name, affected test count, and platform badges
- Severity: 🔴 Critical (10+ tests) · 🟠 High (5–10) · 🟡 Medium (1–5)
- Root cause analysis box when confidence is at least 60%
- Expandable list of individual tests
- Create Ticket for Group button
Groups are paginated at 10 per page.
Main section — right panel
AI-generated recommendations surfaced from your failure patterns:
| Insight Type | Example |
|---|---|
| Platform Alert | "Android has 2.3x more failures than iOS" |
| Team Impact | "Auth Team most affected — 450 failures" |
| Trend Warning | "Failures increased 120% this week" |
| Quick Win | "Fix 1 root cause to resolve 450 tests (95% confidence)" |
Full-width section — bottom of page · 2-column layout
Compares the last 7 days against the prior 7 days per root cause.
Trend categories:
| Direction | Criteria | Border | Urgency |
|---|---|---|---|
| 📈 Increasing | ≥ +20% | Red | 🚨 Critical (≥+50%) or |
| 📉 Decreasing | ≤ −20% | Green | ✅ Low — keep monitoring |
| ➡️ Stable | −20% to +20% | Yellow | 📋 Medium — plan a fix |
Each card includes a sparkline mini-chart, previous vs current failure count, urgency label, and a recommended action. Cards are sorted by urgency (increasing first), then by count.
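The classification and sort order described above could be sketched roughly as follows. The thresholds come from the table; the function names, the card shape, the zero-baseline handling, and the descending sort by count are assumptions. Because the urgency label for increases below +50% is not given, the sketch only flags the critical case.

```javascript
// Compare the last 7 days with the prior 7 days for one root cause.
function classifyTrend(previousCount, currentCount) {
  let changePct;
  if (previousCount === 0) {
    // Assumption: a jump from zero counts as +100%, and 0 to 0 as no change.
    changePct = currentCount === 0 ? 0 : 100;
  } else {
    changePct = ((currentCount - previousCount) * 100) / previousCount;
  }

  let direction = "stable";
  if (changePct >= 20) direction = "increasing";
  else if (changePct <= -20) direction = "decreasing";

  return {
    changePct,
    direction,
    critical: direction === "increasing" && changePct >= 50,
  };
}

// Increasing first, then stable (medium urgency), then decreasing (low).
const DIRECTION_RANK = { increasing: 0, stable: 1, decreasing: 2 };

// Assumption: within one direction, higher current counts come first.
function sortTrendCards(cards) {
  return [...cards].sort(
    (x, y) =>
      DIRECTION_RANK[x.direction] - DIRECTION_RANK[y.direction] ||
      y.current - x.current
  );
}
```

With these rules, `classifyTrend(10, 15)` yields a +50% change, which is classified as increasing and flagged critical.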
Full-width section · 2-column layout
Platform-specific root cause detection — up to 10 cards total, sorted by impact.
Detection methods:
- HTTP error code analysis (500 → Backend Error, 503 → Overload, 429 → Rate Limit, etc.)
- Stack trace extraction and comparison
- Keyword combinations (`pool` + `exhausted` → Connection Pool Issue)
- Weighted similarity scoring (stack traces weighted 1.5×)
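A rough sketch of how the error code and keyword rules could be applied to a failure description is shown below. Only the mappings named in the list are included; the rule tables, the function name, and the status code regex are assumptions for illustration.

```javascript
// HTTP status codes named in the list above; other codes are left unmapped.
const HTTP_CAUSES = { 500: "Backend Error", 503: "Overload", 429: "Rate Limit" };

// Keyword combinations: every keyword must appear for the rule to match.
const KEYWORD_RULES = [
  { keywords: ["pool", "exhausted"], cause: "Connection Pool Issue" },
];

function detectRootCauses(text) {
  const causes = new Set();
  const lower = text.toLowerCase();

  // Assumption: a status code is any standalone 3-digit number from 400 to 599.
  for (const match of text.matchAll(/\b([45]\d{2})\b/g)) {
    const cause = HTTP_CAUSES[match[1]];
    if (cause) causes.add(cause);
  }

  for (const rule of KEYWORD_RULES) {
    if (rule.keywords.every((k) => lower.includes(k))) causes.add(rule.cause);
  }
  return [...causes];
}
```

For instance, `detectRootCauses("HTTP 503 Service Unavailable")` returns `["Overload"]`, and a description mentioning both "pool" and "exhausted" returns `["Connection Pool Issue"]`.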
Confidence levels:
| Range | Color | Interpretation |
|---|---|---|
| 90–100% | Green | High confidence — fix it |
| 70–89% | Orange | Likely correct — investigate |
| 60–69% | Yellow | Worth checking |
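The table maps directly onto a small lookup. The sketch below assumes confidence is a percentage and that anything below 60% gets no band; both the function name and the return shape are hypothetical.

```javascript
// Map a confidence percentage to the color and interpretation in the table.
function confidenceBand(confidencePct) {
  if (confidencePct >= 90) return { color: "green", label: "High confidence" };
  if (confidencePct >= 70) return { color: "orange", label: "Likely correct" };
  if (confidencePct >= 60) return { color: "yellow", label: "Worth checking" };
  return null; // Assumption: below 60% no band is shown.
}
```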
Each card provides evidence bullets, affected teams, detected error codes, and two action buttons: View X Tests (filters the main list) and Create Ticket.
Filters available:
- Platform — All / Android / iOS / Web
- Team — All teams or a specific one
- Date Range — Start + end date pickers with Apply button
All filters combine. Results update in real time.
Search (top-right) matches against test name, failure description, platform, and team. Matching is partial and case-insensitive.
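One way the combined filters and search could work is sketched below. Field names follow the CSV columns; the filter object shape and the function name are hypothetical.

```javascript
// Apply platform, team, date range, and search text together.
// Every active filter must pass for a test to be kept.
function applyFilters(tests, filters = {}) {
  const { platform = "All", team = "All", startDate, endDate, query = "" } = filters;
  const needle = query.trim().toLowerCase();

  return tests.filter((t) => {
    if (platform !== "All" && t.platform !== platform) return false;
    if (team !== "All" && t.team_name !== team) return false;
    // ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    if (startDate && t.date < startDate) return false;
    if (endDate && t.date > endDate) return false;
    if (!needle) return true;
    // Partial, case-insensitive match across the searchable fields.
    return [t.test_case, t.failure_description, t.platform, t.team_name].some(
      (field) => String(field ?? "").toLowerCase().includes(needle)
    );
  });
}
```

Calling `applyFilters(tests, { platform: "Android", query: "timeout" })` would keep only Android tests whose name, description, platform, or team contains "timeout" in any letter case.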
Click any individual test to open a full-detail modal:
- Test name, ID, platform, team, and execution date
- Full failure description and stack trace
- Detected error codes
Close with ×, click outside the modal, or press Escape.
Expected header row:
`test_case, failure_description, platform, date, team_name, test_id, stack_trace`
Column aliases accepted:
| Field | Accepted column names |
|---|---|
| Test name | test_case, testcase, test |
| Error | failure_description, description, error |
| Platform | platform |
| Date | date (YYYY-MM-DD) |
| Team | team_name, team |
| ID | test_id, id |
| Stack trace | stack_trace, stacktrace (optional but recommended) |
Including stack traces significantly improves root cause confidence.
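The alias table above could be applied at load time roughly as follows. This is a minimal sketch, not the dashboard's actual parser: the function names are hypothetical, and the small CSV reader is included only so that quoted stack traces containing commas or newlines survive.

```javascript
const COLUMN_ALIASES = {
  test_case: ["test_case", "testcase", "test"],
  failure_description: ["failure_description", "description", "error"],
  platform: ["platform"],
  date: ["date"],
  team_name: ["team_name", "team"],
  test_id: ["test_id", "id"],
  stack_trace: ["stack_trace", "stacktrace"],
};

// Returns one canonical field name (or null) per column position.
function mapHeaders(headerRow) {
  const lookup = new Map();
  for (const [field, aliases] of Object.entries(COLUMN_ALIASES)) {
    for (const alias of aliases) lookup.set(alias, field);
  }
  return headerRow.map((h) => lookup.get(h.trim().toLowerCase()) ?? null);
}

// Minimal CSV reader: quoted fields, escaped quotes (""), newlines in quotes.
function parseCsvRows(text) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') { field += '"'; i++; }
      else if (ch === '"') inQuotes = false;
      else field += ch;
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ",") {
      row.push(field); field = "";
    } else if (ch === "\n") {
      row.push(field); rows.push(row); row = []; field = "";
    } else if (ch !== "\r") {
      field += ch;
    }
  }
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

// Convert CSV text into test objects keyed by canonical field names.
function loadTests(csvText) {
  const [header, ...data] = parseCsvRows(csvText);
  if (!header) return [];
  const fields = mapHeaders(header);
  return data
    .filter((cells) => cells.some((c) => c !== "")) // skip blank lines
    .map((cells) => {
      const test = {};
      fields.forEach((name, i) => { if (name) test[name] = cells[i] ?? ""; });
      return test;
    });
}
```

A file whose header reads `TestCase,error,platform,date,team,id,stacktrace` would load into objects with the canonical keys `test_case`, `failure_description`, and so on. Unrecognized columns are ignored in this sketch.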
Stack: Pure HTML5 + CSS3 + ES6 JavaScript · No framework · No build tools · No external dependencies
Browser support: Chrome 90+, Firefox 88+, Safari 14+, Edge 90+, iOS Safari, Chrome Mobile
Performance targets:
| Operation | Target |
|---|---|
| Load 2,000 tests | < 1 second |
| Filter / search | < 100 ms |
| Chart render | < 200 ms |
| Modal open | < 50 ms |
| Page change | < 100 ms |
Algorithm complexity:
| Algorithm | Complexity | Used for |
|---|---|---|
| Levenshtein distance | O(m × n) | Error message similarity |
| Stack trace matching | O(n) extract + O(1) compare | Call stack matching |
| Advanced grouping | O(n²) worst / O(n log n) avg | Test clustering |
Daily standup — Load latest data, check Trend Analysis for 📈 increasing issues, assign 🚨 critical items.
Sprint planning — Review Root Cause Analysis sorted by impact, estimate fixes, prioritize by ROI.
Incident response — Filter by incident date range, identify root cause with high confidence, verify fix with Trend Analysis.
Regression detection — Monitor daily trends, watch for sudden 📈 increases, correlate with recent deployments.
Executive reporting — Use before/after Trend Analysis to demonstrate improvement and ROI of quality initiatives.
Pro tip: Start with Load Demo Data to explore all features before uploading your own CSV.