diff --git a/asap-dropin/.env b/asap-dropin/.env index 79e0f408..592c6a1d 100644 --- a/asap-dropin/.env +++ b/asap-dropin/.env @@ -19,3 +19,10 @@ REMOTE_WRITE_PORT=9091 # Point your Grafana Prometheus datasource here: # http://localhost:${QUERY_ENGINE_PORT} QUERY_ENGINE_PORT=8088 + +# ── Observation window ──────────────────────────────────────────────────────── +# How long ASAPQuery observes Grafana queries before generating an acceleration +# plan (in seconds). All queries are forwarded to Prometheus during this window. +# Rule of thumb: set this to at least 3× your Grafana dashboard refresh interval. +# Example: 30s refresh → 90s minimum; 1m refresh → 180s minimum. +TRACKER_OBSERVATION_WINDOW_SECS=180 diff --git a/asap-dropin/README.md b/asap-dropin/README.md index bb16ec99..fdf206b4 100644 --- a/asap-dropin/README.md +++ b/asap-dropin/README.md @@ -2,7 +2,7 @@ A self-contained single-container Docker Compose that adds ASAPQuery to an existing Prometheus and Grafana deployment. -On startup, all queries are forwarded transparently to your upstream Prometheus. After one observation window (default 10 min), the engine automatically plans and activates sketch-based acceleration based on the real queries it observed from Grafana. +On startup, all queries are forwarded transparently to your upstream Prometheus. After one observation window (default 180s), the engine automatically plans and activates sketch-based acceleration based on the real queries it observed from Grafana. ## Prerequisites @@ -10,33 +10,58 @@ On startup, all queries are forwarded transparently to your upstream Prometheus. - A running Prometheus instance - A running Grafana instance (with a Prometheus datasource) -## Quick Start +## Architecture + +``` +Prometheus ──remote_write──▶ ASAPQuery (:9091) + ▲ │ + │ unsupported queries ▼ builds sketches + └──────────── ASAPQuery (:8088) ◀── Grafana +``` -### 1. Configure environment +## Setup -Edit `.env`: +### Step 1 — Configure environment + +Edit `.env` to match your deployment: | Variable | Default | Description | |---|---|---| -| `PROMETHEUS_URL` | `http://host.docker.internal:9090` | URL of your Prometheus, reachable from inside Docker | +| `PROMETHEUS_URL` | `http://host.docker.internal:9090` | URL of your Prometheus, reachable from inside the ASAPQuery container | | `PROMETHEUS_SCRAPE_INTERVAL` | `15` | Your Prometheus scrape interval in seconds | -| `REMOTE_WRITE_PORT` | `9091` | Host port for the remote-write receiver | -| `QUERY_ENGINE_PORT` | `8088` | Host port for the ASAPQuery query engine | +| `REMOTE_WRITE_PORT` | `9091` | ASAPQuery data ingest port — must be free on the host | +| `QUERY_ENGINE_PORT` | `8088` | ASAPQuery query endpoint port — must be free on the host | +| `TRACKER_OBSERVATION_WINDOW_SECS` | `180` | How long to observe queries before planning (see note below) | **Finding the right `PROMETHEUS_URL`:** -- **Docker Desktop (Mac/Windows):** `http://host.docker.internal:9090` (default) -- **Linux (Prometheus on host):** `http://172.17.0.1:9090` (default Docker bridge gateway) -- **Prometheus in another Docker Compose:** create a shared external network +- **Prometheus on the same host as Docker:** `http://172.17.0.1:9090` (default Docker bridge gateway on Linux) +- **Prometheus in another Docker Compose:** use a shared external Docker network and the Prometheus service name + +**Setting `TRACKER_OBSERVATION_WINDOW_SECS`:** +Set this to at least 3× your Grafana dashboard refresh interval so ASAPQuery sees enough query repetitions to build a useful plan. 
+- Grafana refresh 30s → set to 90 or higher +- Grafana refresh 1m → set to 180 or higher (default) +- Grafana refresh 5m → set to 900 or higher -### 2. Start ASAPQuery +### Step 2 — Start ASAPQuery ```bash docker compose up -d ``` -### 3. Add remote_write to your Prometheus +Verify it started: -Add this to your `prometheus.yml` and reload Prometheus: +```bash +docker compose logs queryengine +``` + +You should see a line confirming Prometheus is reachable, then the engine waiting for the observation window. + +### Step 3 — Configure Prometheus remote_write + +Prometheus needs to send all ingested samples to ASAPQuery so it can build sketches. + +**Add this block to your `prometheus.yml`:** ```yaml remote_write: @@ -46,29 +71,65 @@ remote_write: sample_age_limit: 5m ``` -### 4. Point Grafana at ASAPQuery +> **Finding the right `remote_write` URL:** The URL is from Prometheus's perspective. +> - **Prometheus on the same host as Docker:** `http://localhost:9091/receive` (default above) +> - **Prometheus in Docker on the same host:** `http://host.docker.internal:9091/receive` (Mac/Windows) or `http://172.17.0.1:9091/receive` (Linux) +> - Change `9091` if you set a different `REMOTE_WRITE_PORT` in `.env` -Change your Grafana Prometheus datasource URL from your Prometheus address to: +**Reload Prometheus to apply the change:** +If Prometheus was started with `--web.enable-lifecycle`: +```bash +curl -X POST http://localhost:9090/-/reload ``` -http://localhost:8088 + +Otherwise, send SIGHUP to the Prometheus process: +```bash +kill -HUP $(pgrep prometheus) ``` -ASAPQuery speaks the Prometheus query API. Queries it can accelerate are answered from sketches; all others are transparently forwarded to your upstream Prometheus. +See the [Prometheus configuration docs](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) for more details on reloading. -## Architecture +### Step 4 — Add an ASAPQuery datasource in Grafana + +Create a new datasource in Grafana pointing at ASAPQuery, then switch your dashboards to use it. + +1. Open Grafana in your browser +2. Go to **Connections → Data Sources** +3. Click **Add new data source** and select **Prometheus** +4. Set the **Name** to something like `ASAPQuery` +5. Set the **URL** to: + ``` + http://localhost:8088 + ``` + (Change the port if you set a different `QUERY_ENGINE_PORT` in `.env`) +6. Click **Save & Test** — you should see "Data source is working" +7. Open your dashboards and switch their datasource to `ASAPQuery` + +ASAPQuery speaks the Prometheus query API. Queries it can accelerate are answered from sketches; all others are transparently forwarded to your upstream Prometheus, so your dashboards continue to work. + +### Step 5 — Verify end-to-end + +Use your Grafana dashboards normally. During the observation window, all queries pass through to Prometheus transparently. 
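+
+You can also sanity-check the passthrough path directly with curl. This is a minimal smoke test: it assumes the default `QUERY_ENGINE_PORT` of 8088 and uses `up` as an example metric, so substitute any metric your Prometheus actually scrapes:
+
+```bash
+# Query ASAPQuery over the Prometheus query API. During the observation
+# window the result is simply proxied from your upstream Prometheus.
+curl 'http://localhost:8088/api/v1/query?query=up'
+```
+
+A successful response is JSON with `"status":"success"`, the same shape Prometheus itself returns.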
+
+After the observation window elapses, check the ASAPQuery logs:
+```bash
+docker compose logs queryengine | grep query_tracker
+```
+
+You should see lines like:
+```
+query_tracker: planner succeeded — streaming aggregations: N, inference queries: M
 ```
-Your Prometheus ──remote_write──▸ ASAPQuery (:9091/receive)
-                 │
-                 ▼
-Your Grafana ◂──query──── ASAPQuery Query Engine (:8088)
-                 │
-                 ▼ (fallback / passthrough)
-            Your Prometheus
+
+From this point on, check the routing in the logs:
+
+```bash
+docker compose logs queryengine | grep "destination="
 ```
 
-The query engine embeds the planner and runs it automatically after observing real Grafana queries for one observation window. No separate planner container, no Kafka, no Arroyo.
+Lines with `destination=asap` are served by ASAPQuery; lines with `destination=prometheus` are forwarded to your upstream Prometheus.
 
 ## Development
 
diff --git a/asap-dropin/docker-compose.yml b/asap-dropin/docker-compose.yml
index 8ce572b1..fb527d75 100644
--- a/asap-dropin/docker-compose.yml
+++ b/asap-dropin/docker-compose.yml
@@ -9,7 +9,7 @@ name: asapquery-dropin
 # 4. Point your Grafana datasource URL -> http://localhost:${QUERY_ENGINE_PORT}
 #
 # The query engine starts with an empty plan and forwards all queries to Prometheus.
-# After the observation window (default 10 min), it automatically generates a plan
+# After the observation window (default 180s), it automatically generates a plan
 # based on real query patterns and begins precomputing sketches.
 
 networks:
@@ -43,11 +43,11 @@ services:
       - "--lock-strategy=per-key"
       - "--forward-unsupported-queries"
       - "--enable-query-tracker"
-      - "--tracker-observation-window-secs=600"
+      - "--tracker-observation-window-secs=${TRACKER_OBSERVATION_WINDOW_SECS:-180}"
     healthcheck:
       test: ["CMD-SHELL", "bash -c 'echo > /dev/tcp/localhost/8088' 2>/dev/null || exit 1"]
       interval: 10s
       timeout: 5s
       retries: 10
       start_period: 15s
-    restart: unless-stopped
+    restart: "no"
diff --git a/asap-query-engine/src/drivers/query/servers/http.rs b/asap-query-engine/src/drivers/query/servers/http.rs
index 3866bbdb..7eee5880 100644
--- a/asap-query-engine/src/drivers/query/servers/http.rs
+++ b/asap-query-engine/src/drivers/query/servers/http.rs
@@ -217,6 +217,12 @@ async fn process_query_request(
         total_duration.as_secs_f64() * 1000.0
     );
     debug!("=== RETURNING SUCCESS RESPONSE ===");
+    info!(
+        "query='{}' destination=asap asap_latency_ms={:.2} total_latency_ms={:.2}",
+        parsed_request.query,
+        query_duration.as_secs_f64() * 1000.0,
+        total_duration.as_secs_f64() * 1000.0
+    );
 
     match state
         .adapter
@@ -238,6 +244,11 @@ async fn process_query_request(
     // Step 4: Handle unsupported query using fallback client
     if let Some(fallback) = &state.fallback {
         debug!("Query not supported locally, forwarding to fallback");
+        info!(
+            "query='{}' destination=prometheus total_latency_ms={:.2}",
+            parsed_request.query,
+            total_duration.as_secs_f64() * 1000.0
+        );
         // Fallback client handles the HTTP call and returns formatted response
         match fallback
             .execute_query_with_headers(parsed_request, headers)
@@ -248,6 +259,11 @@
         }
     } else {
         debug!("Query not supported and forwarding disabled, returning error");
+        info!(
+            "query='{}' destination=none_unsupported total_latency_ms={:.2}",
+            parsed_request.query,
+            total_duration.as_secs_f64() * 1000.0
+        );
         // Adapter formats the unsupported query error for its protocol
         match state.adapter.format_unsupported_query_response().await {
             Ok(json) => json.into_response(),
diff --git 
a/asap-query-engine/src/main.rs b/asap-query-engine/src/main.rs index 92aade49..80fb52f2 100644 --- a/asap-query-engine/src/main.rs +++ b/asap-query-engine/src/main.rs @@ -403,6 +403,40 @@ async fn main() -> Result<()> { adapter_config, }; + // Verify Prometheus is reachable before starting + { + let client = reqwest::Client::new(); + let health_url = format!( + "{}/api/v1/status/runtimeinfo", + args.prometheus_server.trim_end_matches('/') + ); + match client + .get(&health_url) + .timeout(std::time::Duration::from_secs(5)) + .send() + .await + { + Ok(resp) if resp.status().is_success() => { + info!("Prometheus reachable at {}", args.prometheus_server); + } + Ok(resp) => { + error!( + "Prometheus at {} returned HTTP {} — cannot start", + args.prometheus_server, + resp.status() + ); + std::process::exit(1); + } + Err(e) => { + error!( + "Cannot reach Prometheus at {}: {}", + args.prometheus_server, e + ); + std::process::exit(1); + } + } + } + let query_tracker = if args.enable_query_tracker { use query_engine_rust::planner_client::{LocalPlannerClient, PlannerResult}; use query_engine_rust::QueryTrackerConfig;