GET /v1/alerts?origin=X causes systematic timeout (25s+) while same query without origin= responds in 13ms #4470

@jmrGrav

Description

Calling GET /v1/alerts?has_active_decision=true&origin=X against the local LAPI hangs until the client gives up (25s with curl -m 25, on every call), while the same request without the origin= parameter completes in ~13ms with HTTP 200. This makes cscli decisions list --origin X unusable in scripts and breaks any tooling that relies on it, e.g. cron jobs that filter decisions by origin.

The bottleneck is not SQLite — the underlying SQL query completes in <100ms with or without an index on decisions.origin. The hang is somewhere in the Go handler for the origin filter on this route.

Environment

  • CrowdSec: v1.7.8-debian-pragmatic-amd64-63227459 (Codename: alphaga, BuildDate: 2026-05-11)
  • GoVersion: 1.26.2
  • OS: Ubuntu 24.04.4 LTS, kernel 6.17.0-23-generic
  • DB backend: SQLite (44 MB, ~44k rows in decisions, ~33k active)
  • LAPI binding: 127.0.0.1:8080 (local-only)

Steps to reproduce

# Get a JWT
LOGIN=$(grep "^login:" /etc/crowdsec/local_api_credentials.yaml | awk '{print $2}')
PASS=$(grep "^password:" /etc/crowdsec/local_api_credentials.yaml | awk '{print $2}')
JWT=$(curl -s -X POST http://127.0.0.1:8080/v1/watchers/login \
  -H "Content-Type: application/json" \
  -d "{\"machine_id\":\"$LOGIN\",\"password\":\"$PASS\"}" \
  | jq -r .token)

# Fast — completes in ~13ms
time curl -s -m 25 -H "Authorization: Bearer $JWT" \
  "http://127.0.0.1:8080/v1/alerts?has_active_decision=true&include_capi=false&limit=100" \
  -o /dev/null -w "HTTP %{http_code} time=%{time_total}s\n"

# Hangs — times out (HTTP 000 after 25s)
time curl -s -m 25 -H "Authorization: Bearer $JWT" \
  "http://127.0.0.1:8080/v1/alerts?has_active_decision=true&include_capi=false&limit=100&origin=cscli" \
  -o /dev/null -w "HTTP %{http_code} time=%{time_total}s\n"

Observed output:

HTTP 200 time=0.013825s
HTTP 000 time=25.002199s

Same behaviour through cscli:

$ time cscli decisions list -o json | wc -c           # ~0.3s, returns full payload
$ time cscli decisions list --origin cscli -o json    # always 15s timeout (cscli's internal timeout)
$ time cscli decisions list --origin cscli            # same timeout in table mode
$ time cscli decisions list --value 1.2.3.4 -o json   # ~0.2s, works
$ time cscli decisions list --scope Ip -o json        # ~0.2s, works

Only the origin filter is affected. Reproducible with any value of origin (cscli, crowdsec, lists, CAPI).
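
For anyone who prefers to script the comparison rather than run curl by hand, here is a minimal Python sketch of the same two requests. The helper names (build_alerts_url, time_request, compare) are made up for illustration and are not part of CrowdSec; the URL and query parameters are the ones from the curl commands above, and the JWT is assumed to have been obtained as shown there.

```python
import time
import urllib.error
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8080"  # LAPI binding from the Environment section


def build_alerts_url(origin=None, base_url=BASE_URL):
    """Build the /v1/alerts URL used in the repro, optionally with origin=."""
    params = {"has_active_decision": "true", "include_capi": "false", "limit": "100"}
    if origin is not None:
        params["origin"] = origin
    return f"{base_url}/v1/alerts?{urlencode(params)}"


def time_request(url, jwt, timeout=25.0):
    """Return (status, seconds); status is None if the request failed or timed out.

    Note: urllib's timeout applies to each blocking socket operation,
    not to the request as a whole like curl -m does.
    """
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {jwt}"})
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # server answered, but with an error status
    except OSError:
        status = None  # covers URLError, connection errors and socket timeouts
    return status, time.perf_counter() - start


def compare(jwt, origin="cscli"):
    """Time the unfiltered request, then the same request with origin=."""
    for label, url in (("no origin", build_alerts_url()),
                       ("origin=" + origin, build_alerts_url(origin))):
        status, elapsed = time_request(url, jwt)
        print(f"{label}: HTTP {status} time={elapsed:.3f}s")
```

Calling compare(jwt) with a valid token should show the same split as the curl output, if the behaviour is the same on your install.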

Investigation

To rule out the database as the bottleneck:

  1. Existing indexes on decisions: decision_value, decision_until, decision_alert_decisions, decision_start_ip_end_ip. No index on origin.
  2. Direct SQLite benchmark of the equivalent JOIN (alerts JOIN decisions, filtered by origin and until > now()): 70ms (uses the decision_until index).
  3. Ran CREATE INDEX idx_decision_origin_until ON decisions(origin, until) followed by ANALYZE → the SQL query drops to 21ms, with EXPLAIN QUERY PLAN confirming use of the new index.
  4. cscli decisions list --origin still times out with the new index in place. The index was dropped afterwards.

→ The handler is not SQL-bound. The hang happens elsewhere in the Go path for the origin= query parameter. Possible candidates: a misbehaving filter loop, an N+1 fetch, a deadlock, or unbounded pagination. The TCP connection from cscli/curl stays in ESTABLISHED state during the whole hang, then the client times out.
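
The index check from steps 1–4 can be reproduced in isolation with a sketch like the one below. It runs against an in-memory database with a deliberately simplified schema: the table layout, the REAL type for until, and the exact shape of the query are assumptions for illustration, not the real CrowdSec schema or the query the LAPI generates. Only the index definition is taken from step 3.

```python
import sqlite3
import time

# Simplified stand-in for the CrowdSec schema (assumed, not the real one):
# only the columns needed to filter decisions by origin and expiry.
SCHEMA = """
CREATE TABLE alerts (id INTEGER PRIMARY KEY, scenario TEXT);
CREATE TABLE decisions (
    id INTEGER PRIMARY KEY,
    alert_decisions INTEGER REFERENCES alerts(id),
    origin TEXT,
    value TEXT,
    until REAL
);
CREATE INDEX decision_until ON decisions(until);
"""

# Assumed approximation of the JOIN behind the origin filter.
QUERY = """
SELECT a.id, d.value
FROM alerts AS a
JOIN decisions AS d ON d.alert_decisions = a.id
WHERE d.origin = ? AND d.until > ?
"""


def build_demo_db(rows=1000):
    """Create an in-memory DB with one decision per alert, spread over 4 origins."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    now = time.time()
    origins = ["cscli", "crowdsec", "lists", "CAPI"]
    for i in range(rows):
        conn.execute("INSERT INTO alerts (id, scenario) VALUES (?, ?)", (i, "demo"))
        # Within each origin, alternate expired and still-active decisions.
        offset = 3600 + i
        until = now + offset if (i // 4) % 2 else now - offset
        conn.execute(
            "INSERT INTO decisions (alert_decisions, origin, value, until) "
            "VALUES (?, ?, ?, ?)",
            (i, origins[i % 4], f"10.0.{i // 256}.{i % 256}", until),
        )
    conn.commit()
    return conn


def query_plan(conn, params):
    """Return the 'detail' column of EXPLAIN QUERY PLAN for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, params).fetchall()
    return [row[3] for row in rows]


def add_origin_index(conn):
    """Apply the index from step 3 and refresh the planner statistics."""
    conn.execute("CREATE INDEX idx_decision_origin_until ON decisions(origin, until)")
    conn.execute("ANALYZE")
```

Printing query_plan(conn, ("cscli", time.time())) before and after add_origin_index(conn) shows which index the planner picks. To test a real install, work on a copy of the database file rather than the live one.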

Expected behavior

GET /v1/alerts?origin=X should return in the same order of magnitude as the unfiltered query (tens to hundreds of milliseconds), since filtering reduces the result set rather than expanding it.

Workaround

Drop the origin= filter and filter client-side:

import json
import subprocess

# Fetch all active alerts (one call, ~300ms)
result = subprocess.run(
    ["cscli", "decisions", "list", "-o", "json"],
    capture_output=True, text=True, timeout=15,
)
alerts = json.loads(result.stdout)

# Filter by origin in Python
cscli_bans = {
    dec["value"]
    for alert in alerts
    for dec in (alert.get("decisions") or [])
    if dec.get("origin") == "cscli"
    and dec.get("type") == "ban"
    and dec.get("scope", "").lower() == "ip"
}

The broken filter is still a strict regression for callers that need it server-side, but filtering client-side works around the issue completely.


Happy to provide additional diagnostics (pprof, strace, journalctl, full schema dump) if useful.
