cachedb_redis: add Unix socket transport and lazy connection #3856

Open
NormB wants to merge 3 commits into OpenSIPS:master from NormB:mr/feature-redis-unix-socket-lazy
NormB (Member) commented Mar 30, 2026

Summary

  • Add Unix domain socket support as an alternative to TCP
  • Add lazy_connect parameter to defer connection until first use

Details

Unix socket transport

The module now supports connecting to Redis via Unix domain sockets using a query parameter in the cachedb_url:

modparam("cachedb_redis", "cachedb_url",
    "redis:local://localhost/?socket=/var/run/redis/redis.sock")

The REDIS_UNIX_SOCKET flag distinguishes socket connections from TCP. MI output includes transport (tcp/unix) and socket_path fields. Unix socket connections are supported in both single-instance and cluster topology refresh paths.
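
As a rough illustration of how a `socket=` query parameter could be pulled out of a cachedb_url, here is a minimal C sketch. The helper name `extract_socket_path` and its return convention are assumptions made for this example; the module's actual URL parser is not shown in this PR description and may work differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the module's actual parser: copy the value of the
 * "socket" query parameter from a cachedb_url into out.
 * Returns 1 if a non-empty path was found, 0 if the URL carries no usable
 * socket parameter (i.e. plain TCP), and -1 if the path does not fit in out. */
static int extract_socket_path(const char *url, char *out, size_t out_len)
{
    const char *key = "socket=";
    size_t key_len = strlen(key);
    const char *sep = strchr(url, '?');       /* start of the query string */

    while (sep != NULL) {
        const char *param = sep + 1;          /* skip '?' or '&' */
        if (strncmp(param, key, key_len) == 0) {
            const char *val = param + key_len;
            size_t len = strcspn(val, "&");   /* value ends at '&' or NUL */
            if (len == 0)
                return 0;
            if (len >= out_len)
                return -1;
            memcpy(out, val, len);
            out[len] = '\0';
            return 1;
        }
        sep = strchr(param, '&');             /* move to the next parameter */
    }
    return 0;
}
```

With hiredis, a path obtained this way would typically be handed to `redisConnectUnix()` instead of `redisConnect()`; whether the module takes exactly that route is an assumption here.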

Lazy connection

The new lazy_connect parameter (integer, default 0) defers the Redis connection from child_init to the first cache operation. This is useful when:

  • Redis may not be available at OpenSIPS startup
  • Multiple cachedb_url groups are configured but not all are needed immediately
  • Startup time matters and Redis connection latency is non-trivial

Both TCP and Unix socket connections support lazy mode.
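
The deferred-connect behaviour described above can be sketched in C as follows. This is a simplified model, not the module's code: `struct lazy_con`, `ensure_connected`, `con_init`, `cache_op` and the stub connector are all hypothetical names, and the real `redis_con` struct and `child_init` path are more involved.

```c
#include <stddef.h>

/* Hypothetical connection handle; the real redis_con struct has more fields. */
struct lazy_con {
    int connected;                            /* 1 once the backend link is up */
    int connect_attempts;                     /* times do_connect was invoked */
    int (*do_connect)(struct lazy_con *con);  /* returns 0 on success */
};

/* Connect only if no connection exists yet. */
static int ensure_connected(struct lazy_con *con)
{
    if (con->connected)
        return 0;
    con->connect_attempts++;
    if (con->do_connect(con) != 0)
        return -1;
    con->connected = 1;
    return 0;
}

/* Init path: connect eagerly unless lazy mode is enabled. */
static int con_init(struct lazy_con *con, int lazy_connect)
{
    con->connected = 0;
    con->connect_attempts = 0;
    if (lazy_connect)
        return 0;                             /* defer until first use */
    return ensure_connected(con);
}

/* Every cache operation goes through ensure_connected() first. */
static int cache_op(struct lazy_con *con)
{
    if (ensure_connected(con) != 0)
        return -1;
    /* ... the actual Redis command would be issued here ... */
    return 0;
}

/* Stub connector that always succeeds, for demonstration only. */
static int stub_connect_ok(struct lazy_con *con)
{
    (void)con;
    return 0;
}
```

In this model, `con_init(&c, 1)` performs no connection attempt; the first `cache_op(&c)` triggers exactly one, and later operations reuse the established connection.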

| Parameter | Type | Default | Description |
|---|---|---|---|
| lazy_connect | integer | 0 | Defer Redis connection until first cache operation |

Testing

| Suite | Tests | Description |
|---|---|---|
| test_unix_socket | 19 | Store/fetch, MI reporting, PING latency, recovery |
| test_lazy_connect | 17 | Deferred connect for all transport types, recovery |

Compatibility

No behavioral change for existing TCP configurations. The lazy_connect parameter defaults to 0 (disabled).

Dependencies

Debian added 3 commits March 30, 2026 04:30
Fix several correctness and safety issues in parse_moved_reply()
and the MOVED redirect handler:

- Add slot value overflow protection: return ERR_INVALID_SLOT
  when parsed slot exceeds 16383 during digit accumulation,
  preventing signed integer overflow on malformed MOVED replies.

- Add port value overflow protection: return ERR_INVALID_PORT
  when parsed port exceeds 65535 during digit accumulation,
  complementing the existing post-loop range check and preventing
  signed integer overflow on malformed input.

- Fix undefined behavior in the no-colon endpoint fallback path:
  replace comparison of potentially-NULL out->endpoint.s against
  end pointer with (p < end), which achieves the same logic using
  the scan position variable that is always valid.

- Replace pkg_malloc heap allocation of redis_moved struct with
  stack allocation in the MOVED handler. The struct is small
  (~24 bytes) and never outlives the enclosing scope, making heap
  allocation unnecessary. This eliminates the OOM error path and
  two pkg_free() calls.
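
The overflow protection described in the first two bullets amounts to checking the bound inside the digit loop rather than only after it. A minimal C sketch, assuming a generic helper `parse_bounded_uint` and a single error code `ERR_INVALID_VALUE` (both hypothetical; the commit itself names ERR_INVALID_SLOT and ERR_INVALID_PORT):

```c
#include <stddef.h>

#define ERR_INVALID_VALUE (-1)  /* hypothetical stand-in for the real error codes */

/* Parse decimal digits from [p, end) into *out, rejecting the value as soon
 * as it exceeds max. Because v <= max holds before every multiplication,
 * v never grows past max * 10 + 9, which fits a 32-bit int for max <= 65535. */
static int parse_bounded_uint(const char *p, const char *end, int max, int *out)
{
    int v = 0;

    if (p >= end)
        return ERR_INVALID_VALUE;             /* empty input */
    for (; p < end; p++) {
        if (*p < '0' || *p > '9')
            return ERR_INVALID_VALUE;         /* non-digit character */
        v = v * 10 + (*p - '0');
        if (v > max)
            return ERR_INVALID_VALUE;         /* bail out before overflow */
    }
    *out = v;
    return 0;
}
```

A slot would be parsed with `max` set to 16383 and a port with 65535; an arbitrarily long digit string is rejected after at most a few iterations.
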

Replace the static cluster topology (built once at startup, never
refreshed) with runtime discovery and automatic refresh:

Topology discovery and refresh:
- Probe CLUSTER SHARDS (Redis 7+) with fallback to CLUSTER SLOTS
  (Redis 3+) for backward compatibility
- O(1) slot_table[16384] lookup replaces per-query linked-list scan
- Automatic topology refresh on MOVED redirect, connection failure,
  or query targeting an unmapped slot (rate-limited to 1/sec)
- Dynamic node creation when MOVED points to an unknown endpoint
- Stale node pruning during refresh with safe connection cleanup
- Cap the redirect loop at 5 redirects to prevent a worker hang on
  pathological cluster state
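
The O(1) slot lookup and the 1/sec refresh rate limit from the list above can be sketched roughly as below. The struct layout and the function names (`node_for_slot`, `assign_range`, `refresh_allowed`) are assumptions for illustration, and the time source is left to the caller as a plain seconds value.

```c
#include <stddef.h>

#define REDIS_SLOTS 16384

struct node {                      /* hypothetical, heavily simplified */
    const char *endpoint;
};

struct cluster {
    struct node *slot_table[REDIS_SLOTS]; /* slot -> owner, NULL if unmapped */
    long last_refresh;                    /* seconds; 0 means "never" */
};

/* O(1) lookup: index the table directly instead of scanning a node list. */
static struct node *node_for_slot(const struct cluster *cl, unsigned slot)
{
    return slot < REDIS_SLOTS ? cl->slot_table[slot] : NULL;
}

/* Record that node n owns the inclusive slot range [first, last]. */
static void assign_range(struct cluster *cl, unsigned first, unsigned last,
                         struct node *n)
{
    for (unsigned s = first; s <= last && s < REDIS_SLOTS; s++)
        cl->slot_table[s] = n;
}

/* Returns 1 and records the time if a refresh may run at 'now',
 * 0 if one already ran within the same second. */
static int refresh_allowed(struct cluster *cl, long now)
{
    if (cl->last_refresh != 0 && now - cl->last_refresh < 1)
        return 0;
    cl->last_refresh = now;
    return 1;
}
```

A NULL result from `node_for_slot` corresponds to the "unmapped slot" case, which is one of the triggers for a rate-limited refresh.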

Cluster observability via MI commands:
- redis_cluster_info: full topology dump including per-node connection
  status, slot assignments, query/error/moved/ask counters, and
  last activity timestamp
- redis_cluster_refresh: trigger manual topology refresh (bypasses
  rate limit)
- redis_ping_nodes: per-node PING with microsecond latency reporting
- All MI commands support optional group filter parameter

Statistics:
- redis_queries, redis_queries_failed, redis_moved, redis_ask,
  redis_topology_refreshes (module-level stat counters)
- Per-node query, error, moved, ask counters in redis_cluster_info

Hash slot correctness:
- Hash tag {…} extraction per Redis Cluster specification
- CRC16 modulo 16384 replaces bitwise AND with slots_assigned
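
For reference, the key-to-slot mapping from the Redis Cluster specification can be sketched in C as follows: CRC16 (CCITT/XMODEM variant, polynomial 0x1021, initial value 0) over the key, or over the hash tag if a non-empty `{…}` section is present, taken modulo 16384. The function names are hypothetical and the bitwise CRC is chosen for brevity; the module may well use a table-driven variant.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, bit by bit. */
static uint16_t crc16(const char *buf, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hash only the text between the first '{' and the first '}' after it,
 * provided that text is non-empty; otherwise hash the whole key. */
static unsigned key_hash_slot(const char *key, size_t len)
{
    const char *open = memchr(key, '{', len);

    if (open != NULL) {
        size_t rest = len - (size_t)(open + 1 - key);
        const char *close = memchr(open + 1, '}', rest);
        if (close != NULL && close > open + 1)
            return crc16(open + 1, (size_t)(close - open - 1)) % 16384;
    }
    return crc16(key, len) % 16384;
}
```

Keys such as `{user1000}.following` and `{user1000}.followers` share the tag `user1000` and therefore land in the same slot, which is what makes multi-key operations on them possible in a cluster.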

ASK redirect handling:
- Detect ASK responses alongside existing MOVED handling
- Send ASKING command to target node before retrying original query
- Do not update slot map (ASK is a temporary mid-migration redirect)
- Refactor parse_moved_reply into parse_redirect_reply with prefix
  parameter; inline wrappers for backward compatibility
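
The refactoring in the last bullet, one parser taking the redirect prefix plus thin wrappers, might look roughly like this. It is a simplified sketch: `struct redirect` stands in for the real redis_moved struct, the function names are shortened, and host/port splitting of the endpoint is omitted. The reply format `MOVED <slot> <host>:<port>` (and likewise for ASK) follows the Redis Cluster specification.

```c
#include <stddef.h>
#include <string.h>

struct redirect {            /* hypothetical stand-in for redis_moved */
    int slot;
    const char *endpoint;    /* points into the reply, not NUL-terminated */
    size_t endpoint_len;
};

/* Parse "<prefix> <slot> <endpoint>"; returns 0 on success, -1 otherwise. */
static int parse_redirect(const char *reply, size_t len, const char *prefix,
                          struct redirect *out)
{
    size_t plen = strlen(prefix);
    const char *p = reply, *end = reply + len;
    int slot = 0, digits = 0;

    if (len < plen + 1 || memcmp(p, prefix, plen) != 0 || p[plen] != ' ')
        return -1;
    p += plen + 1;
    for (; p < end && *p >= '0' && *p <= '9'; p++, digits++) {
        slot = slot * 10 + (*p - '0');
        if (slot > 16383)
            return -1;                   /* invalid slot, also stops overflow */
    }
    if (digits == 0 || p >= end || *p != ' ')
        return -1;
    p++;                                 /* skip the space before the endpoint */
    if (p >= end)
        return -1;                       /* missing endpoint */
    out->slot = slot;
    out->endpoint = p;
    out->endpoint_len = (size_t)(end - p);
    return 0;
}

/* Thin wrappers so existing call sites keep a prefix-free interface. */
static inline int parse_moved(const char *r, size_t len, struct redirect *out)
{
    return parse_redirect(r, len, "MOVED", out);
}

static inline int parse_ask(const char *r, size_t len, struct redirect *out)
{
    return parse_redirect(r, len, "ASK", out);
}
```

On a successful `parse_ask`, the caller would send ASKING to the returned endpoint and retry the query there without touching the slot map, as the bullets above describe.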

Connection reliability:
- TCP keepalive via redis_keepalive parameter (default 10s)
- Stack allocation for redis_moved structs (eliminates OOM paths)
- NULL guards on malformed CLUSTER SHARDS/SLOTS reply elements
- Integer overflow protection in slot and port parsing
- NULL guards in MI command handlers for group_name/initial_url

Documentation:
- New section: Redis Cluster Support (topology discovery, automatic
  refresh, MOVED/ASK handling, hash tags)
- MI command reference: redis_cluster_info, redis_cluster_refresh,
  redis_ping_nodes
- Authentication URL format documentation (classic, ACL, no-auth)
- New parameter: redis_keepalive

Test suite (186 tests):
- C unit tests: hash slot calculation (37), MI counter helpers (41)
- Integration: topology startup (12), ASK redirect (16), topology
  refresh (13), MI commands (50), edge cases (16)
- Trap EXIT handlers for safe cluster state restoration
- python3 preflight checks for JSON-dependent tests

Depends on: OpenSIPS#3815 (hash tag + modulo fix), OpenSIPS#3852 (ASK redirect)

Add Unix domain socket support as an alternative to TCP connections:
- New URL format: redis:group://localhost/?socket=/path/to/sock
- REDIS_UNIX_SOCKET flag for connection and node identification
- MI output includes transport type (tcp/unix) and socket_path
- Unix socket path tracked in redis_con and cluster_node structs

Add lazy connection establishment:
- New lazy_connect module parameter (integer, default 0)
- Defers Redis connection until first cache operation
- Works for both TCP and Unix socket transport modes

Test suite:
- test_unix_socket.sh: 19 integration tests
- test_lazy_connect.sh: 17 integration tests
- Test stubs synced with production struct layout

Depends on: MR B (feature/redis-cluster-management)