A Chrome extension that analyses Apache Spark Web UI pages and overlays visual diagnostics highlighting performance problems. Click the extension icon, hit Analyze, and see issues inline — no cluster config changes, no Spark plugins, no server-side dependencies.
Think of it as a client-side performance linter for your Spark jobs.
| # | Check | Page | Severity |
|---|---|---|---|
| 1 | Data skew — Max duration vs p75/median in Summary Metrics | Stage detail | Critical / Warning |
| 2 | Shuffle spill — Non-zero memory or disk spill | Stage detail | Critical / Warning |
| 3 | GC pressure — GC time exceeding a threshold fraction of task duration | Stage detail | Critical / Warning |
| 4 | Straggler tasks — Individual tasks far slower than the median | Stage detail | Warning |
| 5 | Too many small tasks — Low median duration + high task count | Stage detail | Info |
| 6 | Join strategy flags — CartesianProduct, BroadcastNestedLoopJoin, SortMergeJoin | SQL detail | Critical / Warning / Info |
| 7 | Missing predicate pushdown — Empty PushedFilters with Filter operators present | SQL detail | Warning |
| 8 | Shuffle indicators — Exchange nodes highlighted in the physical plan | SQL detail | Info |
| 9 | Dominant stages — A single stage consuming most of job time | Jobs / Job detail | Warning |
| 10 | Executor imbalance — Large spread in shuffle read across executors | Executors | Warning |
| 11 | Memory over-utilisation — Executor storage memory approaching capacity | Executors | Critical / Warning |
| 12 | Long filter conditions — Excessive predicates or character count in filters | SQL detail | Warning / Info |
| 13 | Failed queries — Queries with failed status or error messages | SQL list / detail | Critical |
| 14 | Small files — Scans reading many small files (high task overhead) | SQL detail | Warning / Info |
| 15 | PII detection — Column names or literals matching PII patterns | SQL detail | Info |
| 16 | Estimation errors — Large row estimation multipliers from stale statistics | SQL detail | Critical / Warning |
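To make the first check concrete, here is a minimal sketch of how a skew heuristic of this shape can work, using the documented default thresholds (1.5× p75 for a warning, 3.0× median for critical). The function name and return values are illustrative, not Kindling's actual API.

```javascript
// Hypothetical skew classifier: compares the max task duration from the
// Summary Metrics row against the median and 75th percentile.
// Thresholds mirror the defaults listed in the Options table below.
function classifySkew(maxMs, medianMs, p75Ms, { skew = 1.5, severeSkew = 3.0 } = {}) {
  if (medianMs > 0 && maxMs / medianMs >= severeSkew) return "critical";
  if (p75Ms > 0 && maxMs / p75Ms >= skew) return "warning";
  return null; // durations are balanced — no issue
}
```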
- Download the latest zip from Releases — `kindling-chrome-*.zip` or `kindling-firefox-*.zip`
- Unzip to a folder
- Chrome: Open `chrome://extensions/`, enable Developer mode, click Load unpacked, select the unzipped folder
- Firefox: Open `about:debugging#/runtime/this-firefox`, click Load Temporary Add-on…, select `manifest.json` in the unzipped folder
- Clone or download this repository
- Run `npm install && npm run build`
- Open `chrome://extensions/` in Chrome
- Enable Developer mode (top-right toggle)
- Click Load unpacked and select the `extension/` directory
- The flame icon appears in your toolbar
- Navigate to any Spark Web UI page (standalone, History Server, or Databricks driver proxy)
- Click the Kindling icon in the toolbar
- Click Analyze
- Issues appear both in the popup summary and as inline highlights on the page
- Hover highlighted cells for details. Click issues in the floating panel to scroll to them.
- Click Clear to remove all overlays
The extension badge shows the issue count coloured by worst severity (red = critical, amber = warning, grey = info).
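The severity ordering behind the badge colour can be sketched as follows. The colour values and helper names are assumptions for illustration; only the red/amber/grey mapping comes from the text above.

```javascript
// Pick the worst severity among detected issues, for badge colouring.
// Colours approximate "red = critical, amber = warning, grey = info".
const SEVERITY_COLOURS = { critical: "#d32f2f", warning: "#ffb300", info: "#9e9e9e" };
const SEVERITY_RANK = { critical: 3, warning: 2, info: 1 };

function worstSeverity(issues) {
  return issues.reduce(
    (worst, issue) =>
      SEVERITY_RANK[issue.severity] > (SEVERITY_RANK[worst] || 0) ? issue.severity : worst,
    null
  );
}

// Called from the service worker after analysis (sketch, not the real code).
function updateBadge(issues) {
  const worst = worstSeverity(issues);
  chrome.action.setBadgeText({ text: issues.length ? String(issues.length) : "" });
  if (worst) chrome.action.setBadgeBackgroundColor({ color: SEVERITY_COLOURS[worst] });
}
```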
Kindling detects the Spark UI by DOM structure, not URL patterns, so it works anywhere the Spark UI is rendered:
- `localhost:4040` (live application)
- `localhost:18080` (History Server)
- Databricks driver proxy pages
- Any host running the Spark Web UI
Compatible with Spark 3.x and 4.x.
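A structural check of this kind might look like the sketch below. The selectors and tab names are assumptions about typical Spark UI markup, not the real `detectSparkUI` implementation.

```javascript
// Detect the Spark UI by its DOM landmarks rather than the URL:
// Spark UI pages carry a navbar with tabs such as Jobs, Stages, Executors.
function looksLikeSparkUI(doc) {
  const tabs = Array.from(doc.querySelectorAll("ul.navbar-nav li a, div#navbar a"))
    .map((a) => a.textContent.trim());
  const expected = ["Jobs", "Stages", "Storage", "Environment", "Executors"];
  const hits = expected.filter((t) => tabs.includes(t)).length;
  return hits >= 3; // tolerate tab differences across Spark versions
}
```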
Click the Options link in the popup (or right-click the extension icon and select Options) to adjust thresholds:
| Setting | Default | Description |
|---|---|---|
| Skew threshold | 1.5 | Max/p75 duration ratio to flag skew |
| Severe skew threshold | 3.0 | Max/median duration ratio for severe skew |
| Straggler multiple | 5.0 | Task duration vs median to flag as straggler |
| GC pressure threshold | 0.10 | GC time as fraction of task duration (10%) |
| Small task duration | 50 ms | Below this median = small task |
| Small task count min | 1000 | Minimum tasks before flagging |
| Executor skew ratio | 3.0 | Max/min executor shuffle read ratio |
| Stage time dominance | 0.80 | Stage time as fraction of total job time |
| Memory utilisation (warning) | 0.70 | Storage memory fraction to flag warning |
| Memory utilisation (critical) | 0.90 | Storage memory fraction to flag critical |
| Long filter characters | 500 | Filter condition character count threshold |
| Long filter predicates | 10 | Filter condition predicate count threshold |
| Small file min count | 100 | Minimum files before flagging small files |
| Small file avg size (warning) | 32 MB | Average file size below this triggers warning |
| Small file avg size (critical) | 1 MB | Average file size below this triggers critical |
| Estimation error (warning) | 10 | Row estimation multiplier for warning |
| Estimation error (critical) | 100 | Row estimation multiplier for critical |
Settings are stored in `chrome.storage.sync` and persist across devices.
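Reading the thresholds back with defaults merged in might look like this sketch. The key names are illustrative (the real defaults live in `src/content/config.js`); passing an object of defaults to `chrome.storage.sync.get` fills in any keys the user never changed.

```javascript
// Illustrative subset of the defaults from the Options table above.
const DEFAULTS = { skewThreshold: 1.5, severeSkewThreshold: 3.0, gcPressureThreshold: 0.10 };

// Pure merge helper: stored values override defaults.
function mergeConfig(stored = {}) {
  return { ...DEFAULTS, ...stored };
}

// In the extension, chrome.storage.sync.get performs the same merge.
function loadConfig(callback) {
  chrome.storage.sync.get(DEFAULTS, callback);
}
```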
```
kindling/
  src/                         ES module source (compiled by esbuild)
    content/
      kindling.js              Entry point — state, message listener, run/clear
      config.js                DEFAULT_CONFIG, SEVERITY, ONE_GIB
      parsers.js               parseDuration, parseBytes, parseCellValue, etc.
      utils.js                 escapeHtml
      dom.js                   highlightElement/Row/Cell, clearHighlights
      detection.js             detectSparkUI, getPageType
      tables.js                findColumnIndex, getDataRows, findTable, parseSummaryMetrics
      sql-highlighting.js      SQL tokenizer, syntax highlighting, tooltips
      panel.js                 Diagnostic panel, clipboard export (JSON/Markdown)
      checkers/
        stage.js               Skew, spill, GC, stragglers, small tasks
        sql.js                 Joins, pushdown, filters, failed queries, small files, PII, estimation
        jobs.js                Dominant stages, stage badges
        executors.js           Memory utilisation, executor balance
    popup/popup.js             Popup injection and result display
    options/options.js         Options page load/save
  extension/                   Load this directory as an unpacked Chrome extension
    manifest.json              Chrome extension manifest (Manifest V3)
    background/
      service-worker.js        Badge updates
    content/
      kindling.js              Built bundle (generated — do not edit)
      kindling.css             Injected styles for highlights and panel
    popup/
      popup.html, popup.js, popup.css
    options/
      options.html, options.js
    icons/
      icon-{16,32,48,128}.png
  test/
    helpers.js                 Shared test helpers (JSDOM fixtures, assertions)
    detection.test.js          Spark UI + page type detection tests
    stage-checkers.test.js     Skew, spill, GC tests
    sql-checkers.test.js       SQL checker tests
    executor-checkers.test.js  Memory, executor balance, dominant stage tests
    panel.test.js              Panel and export tests
    config.test.js             Config, CSS, Firefox build tests
    spark_anti_patterns.py     Databricks notebook for manual testing
  test/e2e/                    Playwright + PySpark E2E tests
    conftest.py                Fixtures, injection helpers, Spark REST API helpers
    workloads.py               PySpark workload generators
    test_kindling_e2e.py       E2E test cases
```
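Because checkers read rendered HTML rather than the Spark REST API, helpers like those in `parsers.js` convert the UI's human-readable values back into numbers. A sketch of what such a helper might look like (the unit list and exact behaviour are assumptions, not the real `parseDuration`):

```javascript
// Spark UI renders durations as text like "340 ms" or "1.5 min";
// checkers need them back in milliseconds for ratio comparisons.
const DURATION_UNITS = { ms: 1, s: 1000, min: 60000, h: 3600000 };

function parseDuration(text) {
  const m = /^([\d.]+)\s*(ms|s|min|h)$/.exec(text.trim());
  return m ? parseFloat(m[1]) * DURATION_UNITS[m[2]] : NaN;
}
```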
- User clicks the toolbar icon — the popup opens
- Popup injects `content/kindling.js` + `content/kindling.css` into the active tab via `chrome.scripting`
- Content script checks the DOM for Spark UI elements (nav tabs, table IDs, navbar brand)
- If found, it determines the page type and runs the relevant heuristic checkers
- Checkers parse rendered tables (Summary Metrics, task table, executor table) and plan text
- Issues are highlighted inline with colour-coded backgrounds and hover tooltips
- A floating panel summarises all issues, clickable to scroll to each one
- Results are sent back to the popup and badge is updated
The content script only runs when you click Analyze — it never auto-injects.
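The injection step above can be sketched with Manifest V3's `chrome.scripting` API. The file paths match the repo layout; the message shape (`{ action: "run" }`) is an assumption about how the popup talks to the content script's message listener.

```javascript
// Inject the content script and styles into the active tab, then ask the
// content script to run its checks. Sketch only — not Kindling's exact code.
async function analyzeTab(tabId) {
  await chrome.scripting.insertCSS({ target: { tabId }, files: ["content/kindling.css"] });
  await chrome.scripting.executeScript({ target: { tabId }, files: ["content/kindling.js"] });
  // Content script replies with the list of detected issues.
  return chrome.tabs.sendMessage(tabId, { action: "run" });
}
```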
```
npm install            # Install dependencies
npm run build          # Bundle src/ → extension/ (one-off)
npm run watch          # Bundle with live rebuild on save
npm run lint           # ESLint (no-var, prefer-const, eqeqeq)
npm test               # Run all unit tests
npm run test:e2e       # Build + run E2E tests (requires Java, uv)
npm run build:firefox  # Build Firefox-compatible extension in dist/firefox/
npm run package        # Build + zip both Chrome and Firefox extensions into dist/
```

Edit files in `src/`, then reload the extension in `chrome://extensions/`.
The E2E test suite (test/e2e/) spins up a local PySpark session with Spark UI, generates workloads that trigger each diagnostic, then uses Playwright to inject Kindling and validate detection. Requires Java 17+ and uv.
```
cd test/e2e
uv sync
uv run playwright install chromium
uv run pytest -v
```

Some checks cannot be tested in PySpark local mode or CI:
| Check | Reason |
|---|---|
| Executor imbalance | Local mode only has the driver — no executors to compare |
| Memory utilisation | Checker skips driver rows — needs non-driver executors |
| Stragglers | Requires active-tasks-table which only exists mid-execution |
| Shuffle spill | Unified memory manager absorbs spill with available driver memory |
| Dominant stage | Local mode stages share CPU — can't create >80% time dominance |
| Small tasks | CI runners are too slow — median task duration exceeds 50ms threshold |
These checks are covered by unit tests with synthetic DOM fixtures. The Databricks test notebook (test/spark_anti_patterns.py) covers all checks for manual verification on a real cluster.
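The synthetic-fixture approach can be illustrated as below: instead of a live Spark UI, the checker is fed already-parsed row data (the real tests build JSDOM tables; the checker name and row shape here are illustrative).

```javascript
// Hypothetical GC-pressure checker operating on parsed task rows,
// flagging tasks whose GC time exceeds 10% of task duration.
function checkGcPressure(rows, threshold = 0.10) {
  return rows
    .filter((r) => r.durationMs > 0 && r.gcMs / r.durationMs > threshold)
    .map((r) => ({ taskId: r.taskId, severity: "warning" }));
}

// Synthetic fixture standing in for a rendered task table.
const fixture = [
  { taskId: 1, durationMs: 1000, gcMs: 50 },  // 5% GC — fine
  { taskId: 2, durationMs: 1000, gcMs: 250 }, // 25% GC — flagged
];
```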