cachedb_redis: add Unix socket transport and lazy connection #3856
Open
NormB wants to merge 3 commits into OpenSIPS:master
added 3 commits
March 30, 2026 04:30
Fix several correctness and safety issues in parse_moved_reply() and the MOVED redirect handler:
- Add slot value overflow protection: return ERR_INVALID_SLOT when the parsed slot exceeds 16383 during digit accumulation, preventing signed integer overflow on malformed MOVED replies.
- Add port value overflow protection: return ERR_INVALID_PORT when the parsed port exceeds 65535 during digit accumulation, complementing the existing post-loop range check and preventing signed integer overflow on malformed input.
- Fix undefined behavior in the no-colon endpoint fallback path: replace the comparison of the potentially-NULL out->endpoint.s against the end pointer with (p < end), which achieves the same logic using the scan position variable that is always valid.
- Replace the pkg_malloc heap allocation of the redis_moved struct with stack allocation in the MOVED handler. The struct is small (~24 bytes) and never outlives the enclosing scope, making heap allocation unnecessary. This eliminates the OOM error path and two pkg_free() calls.
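For reviewers, the overflow guard can be pictured with the minimal sketch below. The helper name parse_bounded_uint and its return convention are hypothetical and are not taken from the patch. It only illustrates checking the limit inside the digit loop so the accumulator never grows past it.

```c
#include <stddef.h>

/* Hypothetical helper, not the module's actual code: parse decimal digits
 * from [p, end) into *out, rejecting any value above max.
 * Returns the number of bytes consumed, or -1 on overflow or no digits. */
static int parse_bounded_uint(const char *p, const char *end,
                              int max, int *out)
{
    const char *start = p;
    int v = 0;

    while (p < end && *p >= '0' && *p <= '9') {
        v = v * 10 + (*p - '0');
        /* Check inside the loop: v was <= max before this step, so for
         * limits such as 16383 or 65535 the multiplication above cannot
         * overflow a 32-bit int. */
        if (v > max)
            return -1;
        p++;
    }
    if (p == start)
        return -1;              /* no digits at all */
    *out = v;
    return (int)(p - start);
}
```

With max set to 16383 the same routine would cover the slot field, and with 65535 the port field. A post-loop range check alone would come too late, because a long digit run could already have overflowed the accumulator.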
Replace the static cluster topology (built once at startup, never
refreshed) with runtime discovery and automatic refresh:
Topology discovery and refresh:
- Probe CLUSTER SHARDS (Redis 7+) with fallback to CLUSTER SLOTS
(Redis 3+) for backward compatibility
- O(1) slot_table[16384] lookup replaces per-query linked-list scan
- Automatic topology refresh on MOVED redirect, connection failure,
or query targeting an unmapped slot (rate-limited to 1/sec)
- Dynamic node creation when MOVED points to an unknown endpoint
- Stale node pruning during refresh with safe connection cleanup
- Cap redirect loop at 5 max redirects to prevent worker hang on
pathological cluster state
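The O(1) lookup can be sketched as a flat array indexed by slot. This is an illustrative model only: struct node, map_slot_range and node_for_slot are hypothetical names, and the real module's structures will differ.

```c
#include <stddef.h>

#define NUM_SLOTS 16384

/* Hypothetical, simplified node record for illustration. */
struct node {
    const char *endpoint;
};

/* One pointer per hash slot; NULL means the slot is unmapped. */
static struct node *slot_table[NUM_SLOTS];

/* Assign an inclusive slot range to a node, as a topology refresh would.
 * Returns -1 if the range is invalid. */
static int map_slot_range(struct node *n, int first, int last)
{
    if (first < 0 || last >= NUM_SLOTS || first > last)
        return -1;
    for (int s = first; s <= last; s++)
        slot_table[s] = n;
    return 0;
}

/* O(1) lookup: index directly by slot. A NULL result corresponds to the
 * "unmapped slot" case that would trigger a rate-limited refresh. */
static struct node *node_for_slot(unsigned int slot)
{
    return slot < NUM_SLOTS ? slot_table[slot] : NULL;
}
```

The table costs one pointer per slot (128 KiB with 8-byte pointers) in exchange for removing the per-query list walk.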
Cluster observability via MI commands:
- redis_cluster_info: full topology dump including per-node connection
status, slot assignments, query/error/moved/ask counters, and
last activity timestamp
- redis_cluster_refresh: trigger manual topology refresh (bypasses
rate limit)
- redis_ping_nodes: per-node PING with microsecond latency reporting
- All MI commands support optional group filter parameter
Statistics:
- redis_queries, redis_queries_failed, redis_moved, redis_ask,
redis_topology_refreshes (module-level stat counters)
- Per-node query, error, moved, ask counters in redis_cluster_info
Hash slot correctness:
- Hash tag {…} extraction per Redis Cluster specification
- CRC16 modulo 16384 replaces bitwise AND with slots_assigned
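As a reference point for the two items above, here is a minimal sketch of the slot calculation as the Redis Cluster specification describes it: CRC16 (XMODEM variant) over the hash tag if a non-empty {…} is present, otherwise over the whole key, reduced modulo 16384. The function names are hypothetical, and the bitwise CRC stands in for whatever table-driven version the module uses.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0.
 * Bitwise implementation for brevity. */
static uint16_t crc16_xmodem(const char *buf, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical helper: map a key to one of the 16384 cluster slots. */
static unsigned int key_hash_slot(const char *key, size_t len)
{
    const char *open = memchr(key, '{', len);

    if (open) {
        size_t rest = len - (size_t)(open + 1 - key);
        const char *close = memchr(open + 1, '}', rest);

        /* Hash only the tag when the first '{' is followed by a '}'
         * with at least one character in between. */
        if (close && close > open + 1) {
            key = open + 1;
            len = (size_t)(close - key);
        }
    }
    return crc16_xmodem(key, len) % 16384;
}
```

Under these rules {user1000}.following and {user1000}.followers land in the same slot as user1000, while a key such as foo{}{bar} is hashed in full because its first tag is empty.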
ASK redirect handling:
- Detect ASK responses alongside existing MOVED handling
- Send ASKING command to target node before retrying original query
- Do not update slot map (ASK is a temporary mid-migration redirect)
- Refactor parse_moved_reply into parse_redirect_reply with prefix
parameter; inline wrappers for backward compatibility
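The prefix-parameter refactor and the different treatment of MOVED and ASK might look roughly like the sketch below. All names here are hypothetical and the functions only classify a reply. The actual parse_redirect_reply also extracts the slot and endpoint.

```c
#include <string.h>

enum redirect_kind { REDIRECT_NONE, REDIRECT_MOVED, REDIRECT_ASK };

/* Shared check, parameterized by prefix: the reply must start with the
 * prefix followed by a space, e.g. "MOVED 3999 127.0.0.1:6381". */
static int has_redirect_prefix(const char *reply, const char *prefix)
{
    size_t n = strlen(prefix);

    return strncmp(reply, prefix, n) == 0 && reply[n] == ' ';
}

/* Thin inline wrappers keep the old per-type entry points available. */
static inline int is_moved_reply(const char *reply)
{
    return has_redirect_prefix(reply, "MOVED");
}

static inline int is_ask_reply(const char *reply)
{
    return has_redirect_prefix(reply, "ASK");
}

static enum redirect_kind classify_redirect(const char *reply)
{
    if (is_moved_reply(reply))
        return REDIRECT_MOVED;
    if (is_ask_reply(reply))
        return REDIRECT_ASK;
    return REDIRECT_NONE;
}

/* MOVED is permanent, so the slot map is updated; ASK is not. */
static int updates_slot_map(enum redirect_kind k)
{
    return k == REDIRECT_MOVED;
}

/* Only an ASK redirect requires sending ASKING before the retry. */
static int needs_asking(enum redirect_kind k)
{
    return k == REDIRECT_ASK;
}
```

Requiring the trailing space keeps strings such as ASKING or MOVEDX from being mistaken for redirects.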
Connection reliability:
- TCP keepalive via redis_keepalive parameter (default 10s)
- Stack allocation for redis_moved structs (eliminates OOM paths)
- NULL guards on malformed CLUSTER SHARDS/SLOTS reply elements
- Integer overflow protection in slot and port parsing
- NULL guards in MI command handlers for group_name/initial_url
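For context on the keepalive item, the sketch below shows how TCP keepalive is typically enabled on a connected socket with standard socket options. How redis_keepalive is actually wired into the module (for instance through the Redis client library) is not shown in this description, so the helper name and the mapping of the parameter to the idle time are assumptions.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on keepalive probes for fd and, where the
 * platform exposes TCP_KEEPIDLE (e.g. Linux), start probing after
 * idle_secs seconds of inactivity. Returns 0 on success, -1 on error. */
static int enable_tcp_keepalive(int fd, int idle_secs)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
#ifdef TCP_KEEPIDLE
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
                   &idle_secs, sizeof(idle_secs)) < 0)
        return -1;
#else
    (void)idle_secs;    /* idle time is not tunable on this platform */
#endif
    return 0;
}
```

Keepalive lets a worker notice a silently dropped connection between queries instead of discovering it on the next cache operation.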
Documentation:
- New section: Redis Cluster Support (topology discovery, automatic
refresh, MOVED/ASK handling, hash tags)
- MI command reference: redis_cluster_info, redis_cluster_refresh,
redis_ping_nodes
- Authentication URL format documentation (classic, ACL, no-auth)
- New parameter: redis_keepalive
Test suite (186 tests):
- C unit tests: hash slot calculation (37), MI counter helpers (41)
- Integration: topology startup (12), ASK redirect (16), topology
refresh (13), MI commands (50), edge cases (16)
- Trap EXIT handlers for safe cluster state restoration
- python3 preflight checks for JSON-dependent tests
Depends on: OpenSIPS#3815 (hash tag + modulo fix), OpenSIPS#3852 (ASK redirect)
Add Unix domain socket support as an alternative to TCP connections:
- New URL format: redis:group://localhost/?socket=/path/to/sock
- REDIS_UNIX_SOCKET flag for connection and node identification
- MI output includes transport type (tcp/unix) and socket_path
- Unix socket path tracked in redis_con and cluster_node structs
Add lazy connection establishment:
- New lazy_connect module parameter (integer, default 0)
- Defers Redis connection until first cache operation
- Works for both TCP and Unix socket transport modes
Test suite:
- test_unix_socket.sh: 19 integration tests
- test_lazy_connect.sh: 17 integration tests
- Test stubs synced with production struct layout
Depends on: MR B (feature/redis-cluster-management)
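The two features in this commit can be modeled with the short sketch below. It is not the module's code: url_socket_path, struct conn, conn_init and conn_query are hypothetical, and do_connect is a stand-in that only counts calls where a real implementation would open the TCP or Unix socket connection.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of the "socket=" query parameter
 * into path. Returns 1 if found, 0 if absent, -1 if empty or too long. */
static int url_socket_path(const char *url, char *path, size_t cap)
{
    const char *sep = strchr(url, '?');

    while (sep) {
        const char *param = sep + 1;

        if (strncmp(param, "socket=", 7) == 0) {
            const char *val = param + 7;
            size_t n = strcspn(val, "&");

            if (n == 0 || n >= cap)
                return -1;
            memcpy(path, val, n);
            path[n] = '\0';
            return 1;
        }
        sep = strchr(param, '&');   /* advance to the next parameter */
    }
    return 0;
}

/* Hypothetical connection state for the lazy-connect illustration. */
struct conn {
    int connected;      /* 0 until the first successful connect */
    int connect_calls;  /* number of connect attempts so far */
};

/* Stand-in for the real TCP or Unix socket connect. */
static int do_connect(struct conn *c)
{
    c->connect_calls++;
    c->connected = 1;
    return 0;
}

/* Per-process init: connect eagerly unless lazy_connect is set. */
static int conn_init(struct conn *c, int lazy_connect)
{
    c->connected = 0;
    c->connect_calls = 0;
    return lazy_connect ? 0 : do_connect(c);
}

/* Every cache operation connects on demand if nothing is open yet. */
static int conn_query(struct conn *c)
{
    if (!c->connected && do_connect(c) < 0)
        return -1;
    return 0;   /* the actual Redis command would be issued here */
}
```

With lazy_connect set, conn_init performs no connect and the first conn_query pays the connection cost once. With the default of 0 the connection is opened during init, which matches the existing behavior.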
Summary
- lazy_connect parameter to defer connection until first use
Details
Unix socket transport
The module now supports connecting to Redis via Unix domain sockets using a socket query parameter in the cachedb_url, for example redis:group://localhost/?socket=/path/to/sock.
The REDIS_UNIX_SOCKET flag distinguishes socket connections from TCP. MI output includes transport (tcp/unix) and socket_path fields. Unix socket connections are supported in both single-instance and cluster topology refresh paths.
Lazy connection
The new lazy_connect parameter (integer, default 0) defers the Redis connection from child_init to the first cache operation. This is useful when:
- cachedb_url groups are configured but not all are needed immediately
Both TCP and Unix socket connections support lazy mode.
Testing
Compatibility
No behavioral change for existing TCP configurations. The lazy_connect parameter defaults to 0 (disabled).
Dependencies