observability: indexing dashboard polish — realm column, static gauge, longest-jobs move#4821
…, longest-jobs move

- Active Indexing & Per-realm indexing status: realm column derives from args.realmURL with a host-subdomain fallback so root-of-host (published) realms no longer render blank (e.g. `buckpublishedsep8`).
- Queued Indexing Jobs: add a leading `realm` column (`(all realms)` for full-reindex). The pre-existing `concurrency_group` column stays for operators that lean on it.
- Active Indexing percent column: gauge cell now hides its inline value (`valueDisplayMode: hidden`) so the bar bounds stay at full cell width instead of shrinking around the label. The percentage moves to a small adjacent left-aligned column with a blank header so the digits sit flush against the bar.
- Longest from-scratch-index / incremental-index jobs (24h) panels move from the Job Queue dashboard to the Indexing dashboard (the more natural home), with their realm extraction unified to the same fallback used elsewhere in this dashboard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Observability diff (vs staging)
diff --git a/tmp/remote-canon.8rrtDx/dashboards/boxel-status/indexing.json b/tmp/committed-canon.AZBm7q/dashboards/boxel-status/indexing.json
index 86c9743..378bd40 100644
--- a/tmp/remote-canon.8rrtDx/dashboards/boxel-status/indexing.json
+++ b/tmp/committed-canon.AZBm7q/dashboards/boxel-status/indexing.json
@@ -69,6 +69,10 @@
"uid": "cef5v5sl9k7i8f"
},
"description": "System-wide operator action: queue a full reindex across every realm. The button disables itself while a `full-reindex` orchestration job is already pending or running. Per-realm reindex moved to the Realms dashboard. Click POSTs with `Authorization: Bearer ${grafana_secret}` (substituted from SSM at apply time, CS-10929).",
+ "fieldConfig": {
+ "defaults": {},
+ "overrides": []
+ },
"gridPos": {
"h": 8,
"w": 24,
@@ -629,7 +633,8 @@
"id": "custom.cellOptions",
"value": {
"mode": "gradient",
- "type": "gauge"
+ "type": "gauge",
+ "valueDisplayMode": "hidden"
}
},
{
@@ -646,6 +651,42 @@
}
]
},
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "pct"
+ },
+ "properties": [
+ {
+ "id": "unit",
+ "value": "percent"
+ },
+ {
+ "id": "decimals",
+ "value": 1
+ },
+ {
+ "id": "displayName",
+ "value": " "
+ },
+ {
+ "id": "custom.align",
+ "value": "left"
+ },
+ {
+ "id": "custom.minWidth",
+ "value": 70
+ },
+ {
+ "id": "custom.width",
+ "value": 70
+ },
+ {
+ "id": "custom.filterable",
+ "value": false
+ }
+ ]
+ },
{
"matcher": {
"id": "byName",
@@ -766,7 +807,7 @@
"editorMode": "code",
"format": "table",
"rawQuery": true,
- "rawSql": "SELECT\n j.id AS job_id,\n RTRIM(REGEXP_REPLACE(j.concurrency_group, '^indexing:https?://[^/]+/', ''), '/') AS realm,\n COALESCE(j.args->>'realmURL','') AS realm_url,\n j.job_type,\n COALESCE(jp.files_completed, 0) AS files_completed,\n COALESCE(jp.total_files, 0) AS total_files,\n CASE WHEN COALESCE(jp.total_files, 0) > 0\n THEN (jp.files_completed::float / jp.total_files) * 100\n ELSE 0\n END AS percent,\n EXTRACT(EPOCH FROM (NOW() - jr.created_at)) AS elapsed_seconds,\n jr.created_at AS started_at,\n jr.worker_id,\n jr.id AS reservation_id\n FROM jobs j\n JOIN job_reservations jr ON jr.job_id = j.id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n LEFT JOIN job_progress jp ON jp.job_id = j.id\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND j.finished_at IS NULL\n ORDER BY jr.created_at DESC;",
+ "rawSql": "SELECT\n job_id,\n realm,\n realm_url,\n job_type,\n files_completed,\n total_files,\n percent,\n percent AS pct,\n elapsed_seconds,\n started_at,\n worker_id,\n reservation_id\nFROM (\n SELECT\n j.id AS job_id,\n COALESCE(\n NULLIF(RTRIM(REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://[^/]+/', ''), '/'), ''),\n REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://([^./:]+).*$', '\\1')\n ) AS realm,\n COALESCE(j.args->>'realmURL','') AS realm_url,\n j.job_type,\n COALESCE(jp.files_completed, 0) AS files_completed,\n COALESCE(jp.total_files, 0) AS total_files,\n CASE WHEN COALESCE(jp.total_files, 0) > 0\n THEN (jp.files_completed::float / jp.total_files) * 100\n ELSE 0\n END AS percent,\n EXTRACT(EPOCH FROM (NOW() - jr.created_at)) AS elapsed_seconds,\n jr.created_at AS started_at,\n jr.worker_id,\n jr.id AS reservation_id\n FROM jobs j\n JOIN job_reservations jr ON jr.job_id = j.id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n LEFT JOIN job_progress jp ON jp.job_id = j.id\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n AND j.finished_at IS NULL\n) active\nORDER BY started_at DESC;",
"refId": "A"
}
],
@@ -898,7 +939,7 @@
"editorMode": "code",
"format": "table",
"rawQuery": true,
- "rawSql": "SELECT\n RTRIM(REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://[^/]+/', ''), '/') AS realm,\n COALESCE(j.args->>'realmURL','') AS realm_url,\n COUNT(*) FILTER (WHERE j.status = 'unfulfilled' AND jr.id IS NULL) AS pending,\n COUNT(*) FILTER (WHERE j.status = 'unfulfilled' AND jr.id IS NOT NULL) AS in_flight,\n MAX(j.finished_at) AS last_completed_at,\n EXTRACT(EPOCH FROM (NOW() - MIN(j.created_at)\n FILTER (WHERE j.status = 'unfulfilled' AND jr.id IS NULL))) AS oldest_pending_seconds\n FROM jobs j\n LEFT JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n GROUP BY j.args->>'realmURL'\n ORDER BY pending DESC, in_flight DESC, last_completed_at DESC NULLS LAST\n LIMIT 200;",
+ "rawSql": "SELECT\n COALESCE(\n NULLIF(RTRIM(REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://[^/]+/', ''), '/'), ''),\n REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://([^./:]+).*$', '\\1')\n ) AS realm,\n COALESCE(j.args->>'realmURL','') AS realm_url,\n COUNT(*) FILTER (WHERE j.status = 'unfulfilled' AND jr.id IS NULL) AS pending,\n COUNT(*) FILTER (WHERE j.status = 'unfulfilled' AND jr.id IS NOT NULL) AS in_flight,\n MAX(j.finished_at) AS last_completed_at,\n EXTRACT(EPOCH FROM (NOW() - MIN(j.created_at)\n FILTER (WHERE j.status = 'unfulfilled' AND jr.id IS NULL))) AS oldest_pending_seconds\n FROM jobs j\n LEFT JOIN job_reservations jr ON j.id = jr.job_id\n AND jr.completed_at IS NULL AND jr.locked_until > NOW()\n WHERE j.job_type IN ('from-scratch-index','incremental-index')\n GROUP BY j.args->>'realmURL'\n ORDER BY pending DESC, in_flight DESC, last_completed_at DESC NULLS LAST\n LIMIT 200;",
"refId": "A"
}
],
@@ -1034,7 +1075,7 @@
"editorMode": "code",
"format": "table",
"rawQuery": true,
- "rawSql": "SELECT \n j.id, \n j.priority, \n j.job_type, \n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS concurrency_group, \n j.status AS status, \n j.created_at AS created_at, \n\n\n -- Wait time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - j.created_at))\n END\n AS wait_seconds,\n j.id as job_id\n\nFROM \n jobs j\n \nLEFT JOIN \n job_reservations jr ON j.id = jr.job_id\n\nWHERE\njr.job_id IS NULL AND j.status = 'unfulfilled' AND j.job_type IN ('from-scratch-index','incremental-index','full-reindex') \n \nORDER BY \n j.created_at ASC\nLIMIT 500;",
+ "rawSql": "SELECT \n j.id, \n j.priority, \n j.job_type, \n CASE\n WHEN j.job_type = 'full-reindex' THEN '(all realms)'\n ELSE COALESCE(\n NULLIF(RTRIM(REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://[^/]+/', ''), '/'), ''),\n REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://([^./:]+).*$', '\\1')\n )\n END AS realm,\n CASE \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '') \n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '') \n ELSE j.concurrency_group \n END AS concurrency_group, \n j.status AS status, \n j.created_at AS created_at, \n\n\n -- Wait time in seconds\n CASE \n WHEN jr.created_at IS NOT NULL \n THEN EXTRACT(EPOCH FROM (jr.created_at - j.created_at))\n ELSE \n EXTRACT(EPOCH FROM (NOW() - j.created_at))\n END\n AS wait_seconds,\n j.id as job_id\n\nFROM \n jobs j\n \nLEFT JOIN \n job_reservations jr ON j.id = jr.job_id\n\nWHERE\njr.job_id IS NULL AND j.status = 'unfulfilled' AND j.job_type IN ('from-scratch-index','incremental-index','full-reindex') \n \nORDER BY \n j.created_at ASC\nLIMIT 500;",
"refId": "A",
"sql": {
"columns": [
@@ -1057,6 +1098,280 @@
],
"title": "Queued Indexing Jobs",
"type": "table"
+ },
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "description": "Top 10 from-scratch-index jobs that finished in the past 24 hours, ranked by run duration. `run_seconds` measures from the final reservation's `created_at` to `finished_at`, so a job that was retried is timed on the attempt that completed it. Use this to spot realms whose full reindex is regressing.",
+ "fieldConfig": {
+ "defaults": {
+ "color": {
+ "mode": "thresholds"
+ },
+ "custom": {
+ "align": "left",
+ "cellOptions": {
+ "type": "auto"
+ },
+ "filterable": true,
+ "inspect": false,
+ "minWidth": 150
+ },
+ "mappings": [],
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green"
+ },
+ {
+ "color": "red",
+ "value": 80
+ }
+ ]
+ }
+ },
+ "overrides": [
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "run_seconds"
+ },
+ "properties": [
+ {
+ "id": "unit",
+ "value": "s"
+ }
+ ]
+ },
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "worker_id"
+ },
+ "properties": [
+ {
+ "id": "links",
+ "value": [
+ {
+ "targetBlank": true,
+ "title": "View logs",
+ "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
+ }
+ ]
+ },
+ {
+ "id": "mappings",
+ "value": [
+ {
+ "options": {
+ "pattern": "^(.{6}).*$",
+ "result": {
+ "index": 0,
+ "text": "View logs ($1)"
+ }
+ },
+ "type": "regex"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "reservation_id"
+ },
+ "properties": [
+ {
+ "id": "custom.hidden",
+ "value": true
+ }
+ ]
+ }
+ ]
+ },
+ "gridPos": {
+ "h": 10,
+ "w": 12,
+ "x": 0,
+ "y": 54
+ },
+ "id": 20,
+ "options": {
+ "cellHeight": "sm",
+ "footer": {
+ "countRows": false,
+ "enablePagination": false,
+ "fields": "",
+ "reducer": [
+ "sum"
+ ],
+ "show": false
+ },
+ "showHeader": true,
+ "sortBy": [
+ {
+ "desc": true,
+ "displayName": "run_seconds"
+ }
+ ]
+ },
+ "pluginVersion": "10.4.1",
+ "targets": [
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "editorMode": "code",
+ "format": "table",
+ "rawQuery": true,
+ "rawSql": "SELECT\n j.id,\n lr.reservation_id,\n CASE\n WHEN j.job_type = 'full-reindex' THEN '(all realms)'\n ELSE COALESCE(\n NULLIF(RTRIM(REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://[^/]+/', ''), '/'), ''),\n REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://([^./:]+).*$', '\\1')\n )\n END AS realm,\n j.status,\n lr.started_at,\n j.finished_at,\n EXTRACT(EPOCH FROM (j.finished_at - lr.started_at)) AS run_seconds,\n lr.worker_id\nFROM jobs j\nJOIN LATERAL (\n SELECT jr.id AS reservation_id, jr.created_at AS started_at, jr.worker_id\n FROM job_reservations jr\n WHERE jr.job_id = j.id\n ORDER BY jr.created_at DESC\n LIMIT 1\n) lr ON TRUE\nWHERE j.job_type = 'from-scratch-index'\n AND j.finished_at IS NOT NULL\n AND j.finished_at > NOW() - INTERVAL '24 hours'\nORDER BY run_seconds DESC NULLS LAST\nLIMIT 10;",
+ "refId": "A"
+ }
+ ],
+ "title": "Longest from-scratch-index jobs (24h)",
+ "type": "table"
+ },
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "description": "Top 10 incremental-index jobs that finished in the past 24 hours, ranked by run duration. `run_seconds` measures from the final reservation's `created_at` to `finished_at`, so a job that was retried is timed on the attempt that completed it. Outliers here usually mean a heavy invalidation fan-out from a single edit.",
+ "fieldConfig": {
+ "defaults": {
+ "color": {
+ "mode": "thresholds"
+ },
+ "custom": {
+ "align": "left",
+ "cellOptions": {
+ "type": "auto"
+ },
+ "filterable": true,
+ "inspect": false,
+ "minWidth": 150
+ },
+ "mappings": [],
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green"
+ },
+ {
+ "color": "red",
+ "value": 80
+ }
+ ]
+ }
+ },
+ "overrides": [
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "run_seconds"
+ },
+ "properties": [
+ {
+ "id": "unit",
+ "value": "s"
+ }
+ ]
+ },
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "worker_id"
+ },
+ "properties": [
+ {
+ "id": "links",
+ "value": [
+ {
+ "targetBlank": true,
+ "title": "View logs",
+ "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
+ }
+ ]
+ },
+ {
+ "id": "mappings",
+ "value": [
+ {
+ "options": {
+ "pattern": "^(.{6}).*$",
+ "result": {
+ "index": 0,
+ "text": "View logs ($1)"
+ }
+ },
+ "type": "regex"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "matcher": {
+ "id": "byName",
+ "options": "reservation_id"
+ },
+ "properties": [
+ {
+ "id": "custom.hidden",
+ "value": true
+ }
+ ]
+ }
+ ]
+ },
+ "gridPos": {
+ "h": 10,
+ "w": 12,
+ "x": 12,
+ "y": 54
+ },
+ "id": 21,
+ "options": {
+ "cellHeight": "sm",
+ "footer": {
+ "countRows": false,
+ "enablePagination": false,
+ "fields": "",
+ "reducer": [
+ "sum"
+ ],
+ "show": false
+ },
+ "showHeader": true,
+ "sortBy": [
+ {
+ "desc": true,
+ "displayName": "run_seconds"
+ }
+ ]
+ },
+ "pluginVersion": "10.4.1",
+ "targets": [
+ {
+ "datasource": {
+ "type": "grafana-postgresql-datasource",
+ "uid": "cef5v5sl9k7i8f"
+ },
+ "editorMode": "code",
+ "format": "table",
+ "rawQuery": true,
+ "rawSql": "SELECT\n j.id,\n lr.reservation_id,\n CASE\n WHEN j.job_type = 'full-reindex' THEN '(all realms)'\n ELSE COALESCE(\n NULLIF(RTRIM(REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://[^/]+/', ''), '/'), ''),\n REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://([^./:]+).*$', '\\1')\n )\n END AS realm,\n j.status,\n lr.started_at,\n j.finished_at,\n EXTRACT(EPOCH FROM (j.finished_at - lr.started_at)) AS run_seconds,\n lr.worker_id\nFROM jobs j\nJOIN LATERAL (\n SELECT jr.id AS reservation_id, jr.created_at AS started_at, jr.worker_id\n FROM job_reservations jr\n WHERE jr.job_id = j.id\n ORDER BY jr.created_at DESC\n LIMIT 1\n) lr ON TRUE\nWHERE j.job_type = 'incremental-index'\n AND j.finished_at IS NOT NULL\n AND j.finished_at > NOW() - INTERVAL '24 hours'\nORDER BY run_seconds DESC NULLS LAST\nLIMIT 10;",
+ "refId": "A"
+ }
+ ],
+ "title": "Longest incremental-index jobs (24h)",
+ "type": "table"
}
],
"refresh": "5s",
diff --git a/tmp/remote-canon.8rrtDx/dashboards/boxel-status/job-queue.json b/tmp/committed-canon.AZBm7q/dashboards/boxel-status/job-queue.json
index a2e16cc..d2de711 100644
--- a/tmp/remote-canon.8rrtDx/dashboards/boxel-status/job-queue.json
+++ b/tmp/committed-canon.AZBm7q/dashboards/boxel-status/job-queue.json
@@ -1112,280 +1112,6 @@
],
"title": "Finished Jobs (limit 500)",
"type": "table"
- },
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "description": "Top 10 from-scratch-index jobs that finished in the past 24 hours, ranked by run duration. `run_seconds` measures from the final reservation's `created_at` to `finished_at`, so a job that was retried is timed on the attempt that completed it. Use this to spot realms whose full reindex is regressing.",
- "fieldConfig": {
- "defaults": {
- "color": {
- "mode": "thresholds"
- },
- "custom": {
- "align": "left",
- "cellOptions": {
- "type": "auto"
- },
- "filterable": true,
- "inspect": false,
- "minWidth": 150
- },
- "mappings": [],
- "thresholds": {
- "mode": "absolute",
- "steps": [
- {
- "color": "green"
- },
- {
- "color": "red",
- "value": 80
- }
- ]
- }
- },
- "overrides": [
- {
- "matcher": {
- "id": "byName",
- "options": "run_seconds"
- },
- "properties": [
- {
- "id": "unit",
- "value": "s"
- }
- ]
- },
- {
- "matcher": {
- "id": "byName",
- "options": "worker_id"
- },
- "properties": [
- {
- "id": "links",
- "value": [
- {
- "targetBlank": true,
- "title": "View logs",
- "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
- }
- ]
- },
- {
- "id": "mappings",
- "value": [
- {
- "options": {
- "pattern": "^(.{6}).*$",
- "result": {
- "index": 0,
- "text": "View logs ($1)"
- }
- },
- "type": "regex"
- }
- ]
- }
- ]
- },
- {
- "matcher": {
- "id": "byName",
- "options": "reservation_id"
- },
- "properties": [
- {
- "id": "custom.hidden",
- "value": true
- }
- ]
- }
- ]
- },
- "gridPos": {
- "h": 10,
- "w": 12,
- "x": 0,
- "y": 58
- },
- "id": 20,
- "options": {
- "cellHeight": "sm",
- "footer": {
- "countRows": false,
- "enablePagination": false,
- "fields": "",
- "reducer": [
- "sum"
- ],
- "show": false
- },
- "showHeader": true,
- "sortBy": [
- {
- "desc": true,
- "displayName": "run_seconds"
- }
- ]
- },
- "pluginVersion": "10.4.1",
- "targets": [
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "editorMode": "code",
- "format": "table",
- "rawQuery": true,
- "rawSql": "SELECT\n j.id,\n lr.reservation_id,\n CASE\n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '')\n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '')\n ELSE j.concurrency_group\n END AS realm,\n j.status,\n lr.started_at,\n j.finished_at,\n EXTRACT(EPOCH FROM (j.finished_at - lr.started_at)) AS run_seconds,\n lr.worker_id\nFROM jobs j\nJOIN LATERAL (\n SELECT jr.id AS reservation_id, jr.created_at AS started_at, jr.worker_id\n FROM job_reservations jr\n WHERE jr.job_id = j.id\n ORDER BY jr.created_at DESC\n LIMIT 1\n) lr ON TRUE\nWHERE j.job_type = 'from-scratch-index'\n AND j.finished_at IS NOT NULL\n AND j.finished_at > NOW() - INTERVAL '24 hours'\nORDER BY run_seconds DESC NULLS LAST\nLIMIT 10;",
- "refId": "A"
- }
- ],
- "title": "Longest from-scratch-index jobs (24h)",
- "type": "table"
- },
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "description": "Top 10 incremental-index jobs that finished in the past 24 hours, ranked by run duration. `run_seconds` measures from the final reservation's `created_at` to `finished_at`, so a job that was retried is timed on the attempt that completed it. Outliers here usually mean a heavy invalidation fan-out from a single edit.",
- "fieldConfig": {
- "defaults": {
- "color": {
- "mode": "thresholds"
- },
- "custom": {
- "align": "left",
- "cellOptions": {
- "type": "auto"
- },
- "filterable": true,
- "inspect": false,
- "minWidth": 150
- },
- "mappings": [],
- "thresholds": {
- "mode": "absolute",
- "steps": [
- {
- "color": "green"
- },
- {
- "color": "red",
- "value": 80
- }
- ]
- }
- },
- "overrides": [
- {
- "matcher": {
- "id": "byName",
- "options": "run_seconds"
- },
- "properties": [
- {
- "id": "unit",
- "value": "s"
- }
- ]
- },
- {
- "matcher": {
- "id": "byName",
- "options": "worker_id"
- },
- "properties": [
- {
- "id": "links",
- "value": [
- {
- "targetBlank": true,
- "title": "View logs",
- "url": "/d/fetquzizsej28b?${__url_time_range}&var-job_id=${__data.fields.id}.${__data.fields.reservation_id}&orgId=1&viewPanel=3"
- }
- ]
- },
- {
- "id": "mappings",
- "value": [
- {
- "options": {
- "pattern": "^(.{6}).*$",
- "result": {
- "index": 0,
- "text": "View logs ($1)"
- }
- },
- "type": "regex"
- }
- ]
- }
- ]
- },
- {
- "matcher": {
- "id": "byName",
- "options": "reservation_id"
- },
- "properties": [
- {
- "id": "custom.hidden",
- "value": true
- }
- ]
- }
- ]
- },
- "gridPos": {
- "h": 10,
- "w": 12,
- "x": 12,
- "y": 58
- },
- "id": 21,
- "options": {
- "cellHeight": "sm",
- "footer": {
- "countRows": false,
- "enablePagination": false,
- "fields": "",
- "reducer": [
- "sum"
- ],
- "show": false
- },
- "showHeader": true,
- "sortBy": [
- {
- "desc": true,
- "displayName": "run_seconds"
- }
- ]
- },
- "pluginVersion": "10.4.1",
- "targets": [
- {
- "datasource": {
- "type": "grafana-postgresql-datasource",
- "uid": "cef5v5sl9k7i8f"
- },
- "editorMode": "code",
- "format": "table",
- "rawQuery": true,
- "rawSql": "SELECT\n j.id,\n lr.reservation_id,\n CASE\n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/.+' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://[^/]+/', '')\n WHEN j.concurrency_group ~ '^indexing:https://[^/]+/?$' THEN REGEXP_REPLACE(j.concurrency_group, '^indexing:https://', '')\n ELSE j.concurrency_group\n END AS realm,\n j.status,\n lr.started_at,\n j.finished_at,\n EXTRACT(EPOCH FROM (j.finished_at - lr.started_at)) AS run_seconds,\n lr.worker_id\nFROM jobs j\nJOIN LATERAL (\n SELECT jr.id AS reservation_id, jr.created_at AS started_at, jr.worker_id\n FROM job_reservations jr\n WHERE jr.job_id = j.id\n ORDER BY jr.created_at DESC\n LIMIT 1\n) lr ON TRUE\nWHERE j.job_type = 'incremental-index'\n AND j.finished_at IS NOT NULL\n AND j.finished_at > NOW() - INTERVAL '24 hours'\nORDER BY run_seconds DESC NULLS LAST\nLIMIT 10;",
- "refId": "A"
- }
- ],
- "title": "Longest incremental-index jobs (24h)",
- "type": "table"
}
],
"refresh": "5s",
(Run: https://github.com/cardstack/boxel/actions/runs/25835910315)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f65383ce09
Pull request overview
Polish pass on the Grafana Indexing dashboard: ensure the realm column is always populated (even for root-of-host published realms), make the Active Indexing percent gauge render at full cell width with a separate numeric column, give Queued Indexing Jobs a dedicated realm column, and relocate the two "Longest …-index jobs (24h)" tables from the Job Queue dashboard to the Indexing dashboard. Also normalizes a number of \u2014 JSON escapes to literal em-dashes.
Changes:
- Update SQL realm extraction to fall back to the host's first label when the path is empty, and add `'(all realms)'` for `full-reindex`.
- Hide the gauge's inline value and add a new `pct` column with a blank header for the percentage display.
- Move the two longest-indexing-jobs (24h) tables from `job-queue.json` into `indexing.json`.
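The realm fallback in the first change can be sketched outside the database. The following is a minimal Python approximation of the SQL expression used in the panels, not the dashboard's actual code: the function name `realm_label` and the `app.example.test` URL are hypothetical, and Python's `re.sub` stands in for PostgreSQL's `REGEXP_REPLACE`.

```python
import re


def realm_label(realm_url):
    """Approximate the dashboard's realm extraction (illustrative only)."""
    url = realm_url or ""  # mirrors COALESCE(j.args->>'realmURL', '')
    # Strip scheme and host, then trailing slashes (REGEXP_REPLACE + RTRIM).
    path = re.sub(r"^https?://[^/]+/", "", url, count=1).rstrip("/")
    if path:  # NULLIF(path, '') is non-null, so COALESCE picks the path
        return path
    # Empty path: fall back to the first label of the host.
    return re.sub(r"^https?://([^./:]+).*$", r"\1", url, count=1)


# Root-of-host (published) realm: no path, so the host label is used.
print(realm_label("https://buckpublishedsep8.staging.boxel.build/"))  # buckpublishedsep8
# Hypothetical realm with a path: the path is used as before.
print(realm_label("https://app.example.test/user/realm/"))  # user/realm
```

The path form is tried first so that existing rows keep their current labels; only rows that would otherwise be blank take the host-label branch.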
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/observability/grafanactl/resources/dashboards/boxel-status/job-queue.json | Removes the two Longest-jobs panels; replaces \u2014 escapes with em-dash characters. |
| packages/observability/grafanactl/resources/dashboards/boxel-status/indexing.json | Adds the relocated panels, updates Active Indexing/Per-realm/Queued Indexing queries with the realm fallback, configures the static gauge and adjacent pct column. |
Comments suppressed due to low confidence (1)
packages/observability/grafanactl/resources/dashboards/boxel-status/indexing.json:1369
- Same backreference-escaping bug as in the from-scratch panel: `'\\\\1'` decodes to SQL `'\\1'`, which `regexp_replace` treats as a literal backslash + `1` rather than as a backreference. Use `'\\1'` (matching the form used at lines 810/942/1077) so the root-of-host fallback emits the actual hostname first label.
"rawSql": "SELECT\n j.id,\n lr.reservation_id,\n CASE\n WHEN j.job_type = 'full-reindex' THEN '(all realms)'\n ELSE COALESCE(\n NULLIF(RTRIM(REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://[^/]+/', ''), '/'), ''),\n REGEXP_REPLACE(COALESCE(j.args->>'realmURL',''), '^https?://([^./:]+).*$', '\\\\1')\n )\n END AS realm,\n j.status,\n lr.started_at,\n j.finished_at,\n EXTRACT(EPOCH FROM (j.finished_at - lr.started_at)) AS run_seconds,\n lr.worker_id\nFROM jobs j\nJOIN LATERAL (\n SELECT jr.id AS reservation_id, jr.created_at AS started_at, jr.worker_id\n FROM job_reservations jr\n WHERE jr.job_id = j.id\n ORDER BY jr.created_at DESC\n LIMIT 1\n) lr ON TRUE\nWHERE j.job_type = 'incremental-index'\n AND j.finished_at IS NOT NULL\n AND j.finished_at > NOW() - INTERVAL '24 hours'\nORDER BY run_seconds DESC NULLS LAST\nLIMIT 10;",
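The escaping problem described in this comment can be reproduced without Grafana or PostgreSQL. The sketch below decodes the two JSON spellings of the replacement argument and applies each with Python's `re.sub`, which for this particular case treats `\\` (literal backslash) and `\1` (backreference) in a replacement string the same way the comment describes for `regexp_replace`. It is a stand-in for illustration, not a test against the real database.

```python
import json
import re

PATTERN = r"^https?://([^./:]+).*$"
URL = "https://buckpublishedsep8.staging.boxel.build/"

# Raw JSON text of the replacement argument, as stored in the dashboard file.
buggy_json = r'"\\\\1"'  # four backslashes in the file
fixed_json = r'"\\1"'    # two backslashes in the file

buggy_repl = json.loads(buggy_json)  # decodes to two backslashes + "1"
fixed_repl = json.loads(fixed_json)  # decodes to one backslash + "1"

# Two backslashes mean "literal backslash", leaving a plain "1" behind.
buggy_out = re.sub(PATTERN, buggy_repl, URL)
# One backslash + "1" is a backreference to the captured host label.
fixed_out = re.sub(PATTERN, fixed_repl, URL)

print(repr(buggy_out))  # '\\1'  (a literal backslash followed by 1)
print(repr(fixed_out))  # 'buckpublishedsep8'
```

Each layer (JSON, then the regex replacement syntax) consumes one level of escaping, so a backreference needs exactly two backslashes in the JSON file.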
…edupe percent

- Longest from-scratch-index / incremental-index jobs (24h): the realm fallback regex was using a JSON `'\\\\1'` (4 backslashes), which decodes to SQL `'\\1'` and emits a literal `\1` instead of the captured host label. Fix to `'\\1'` so root-of-host (published) realms render their subdomain, matching the form used in the other panels.
- Active Indexing: the `percent` (gauge) and `pct` (left-aligned number) columns held duplicate CASE expressions. Wrap the main select in a subquery and project `percent AS pct` from a single source expression so the two fields can't drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
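The subquery approach from the Active Indexing fix can be tried in isolation. This sketch uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, with a simplified `job_progress` table and made-up rows. The `CAST(... AS REAL)` replaces the `::float` cast from the real query, and everything else about the schema is an assumption for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job_progress (
        job_id INTEGER, files_completed INTEGER, total_files INTEGER
    );
    -- Hypothetical rows: one job a quarter done, one with no files counted yet.
    INSERT INTO job_progress VALUES (1, 25, 100), (2, 0, 0);
    """
)

rows = conn.execute(
    """
    SELECT job_id, percent, percent AS pct   -- both columns share one source
    FROM (
        SELECT job_id,
               CASE WHEN COALESCE(total_files, 0) > 0
                    THEN (CAST(files_completed AS REAL) / total_files) * 100
                    ELSE 0
               END AS percent                -- the CASE is written once
        FROM job_progress
    ) active
    ORDER BY job_id
    """
).fetchall()

for job_id, percent, pct in rows:
    print(job_id, percent, pct)
```

Because the outer query only re-projects `percent`, a later change to the CASE expression updates the gauge and the numeric column together.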
Thanks for adding the preview @lukemelia, so much easier to review properly 🎉


Summary
- Active Indexing and Per-realm indexing status left the `realm` cell blank for root-of-host (published) realms because their `realmURL` has no path segment after the host. The column now falls back to the first host label (e.g. `https://buckpublishedsep8.staging.boxel.build/` → `buckpublishedsep8`), so every row has a realm.
- Queued Indexing Jobs: adds a leading `realm` column (with `(all realms)` for `full-reindex`). The pre-existing `concurrency_group` column stays for operators that lean on it.
- Active Indexing: the `percent` gauge cell now hides its inline value (`valueDisplayMode: hidden`), so the bar bounds stay at full cell width instead of shrinking around the label. The percentage moves to a small adjacent left-aligned column with a blank header so the digits sit flush against the bar.

Test plan
- `pnpm --filter @cardstack/observability apply --env staging` (or local equivalent) and open the Indexing dashboard.
- Queued Indexing Jobs: `realm` column populated for `from-scratch-index` / `incremental-index`; shows `(all realms)` for a queued `full-reindex`.

🤖 Generated with Claude Code