Skip to content

merge#16

Closed
JimB123 wants to merge 100 commits into
jims-forklessfrom
unstable
Closed

merge#16
JimB123 wants to merge 100 commits into
jims-forklessfrom
unstable

Conversation

@JimB123
Copy link
Copy Markdown
Owner

@JimB123 JimB123 commented Feb 18, 2026

No description provided.

phermansson and others added 30 commits January 8, 2026 19:27
…ey-io#2945)

Previously, only changes to argv[0] were considered when deciding
whether to re-prepare the command and recompute the cluster slot. This
caused issues where changes to the argument count (argc) would not
trigger the necessary prepare in case additional keys were injected to
the command

This fix adds a check to ensure that changes to argc are properly
considered.

Signed-off-by: Patrik Hermansson <phermansson@gmail.com>
… explicitly set (valkey-io#2777)

By default, servers in a cluster don't have human-readable nodename
unless explicitly set by user. Most logging statements in the cluster
are in the form of `serverLog(LL_NOTICE, "Node %.40s (%s) did
something..", node->name, node->human_nodename, ...);` , and thus this
would always result in an empty `()` like the following:

```
* Node dd4f43d9e4a1a2875a514733c08d4870c21db682 () is now a replica of node 1b5d2a4865c04aaa1ccf06db66f22a043411ded7 () in shard d3fc557d12ab3c5a8850c7b0c9afb252b473cad1
* Clear FAIL state for node dd4f43d9e4a1a2875a514733c08d4870c21db682 (): replica is reachable again.
* NODE 524978a9924b5841c4c93f83581d202cfbce1443 () possibly failing.
* FAIL message received from 747e0f97a58cd5a7ac4c2130417fc47a807802ed () about 1b5d2a4865c04aaa1ccf06db66f22a043411ded7 ()
```

**This PR adds a helper function `humanNodename` specifically for
logging purpose that can print the node's ip:port when its nodename is
not explicitly set.** Now the logs can greatly help developers to
quickly recognize which node is doing stuff:

```
* NODE 03b3811da7b33451f6e4988295b0158f410d2c00 (127.0.0.1:7001) possibly failing.
* Marking node 03b3811da7b33451f6e4988295b0158f410d2c00 (127.0.0.1:7001) as failing (quorum reached).
* NODE b2f0fc93632d12c06039335b025365d484ff9049 (127.0.0.1:7002) possibly failing.
* Marking node b2f0fc93632d12c06039335b025365d484ff9049 (127.0.0.1:7002) as failing (quorum reached).
# Cluster state changed: fail
```

There’s no behavior change in cluster state, gossip, failover, etc, and
**the human nodename of each node is not changed**. This PR is for
logging only and is safe:

- For developers who never cared about the empty (), this is just a
small quality-of-life improvement: suddenly the logs tell them which
node is which, without extra node ID lookups.

- For developers who rely on custom names they prefer, nothing changes –
their explicit nodename takes precedence.

Signed-off-by: Zhijun <dszhijun@gmail.com>
…n and data corruption (valkey-io#3004)

When loading AOF in cluster mode, keys inside a MULTI/EXEC block could
be
inserted into wrong hash slots, causing key duplication and data
corruption.

The root cause was the slot caching optimization in getKeySlot(). This
optimization reuses a cached slot value to avoid recalculating the hash
for every key operation. However, when replaying AOF, a transaction may
contain commands affecting keys in different slots. The cached slot from
a previous command (e.g., SET k1) would incorrectly be used for
subsequent
commands in the transaction (e.g., SET k0), causing k0 to be stored in
k1's
slot.

The existing code already skipped this optimization for replicated
clients
(commands from primary) using isReplicatedClient(). This change extends
that to also skip for AOF clients by using mustObeyClient() instead,
which
covers both replicated clients and the AOF client.

Fixes valkey-io#2995, introduced in valkey-io#1949.

Signed-off-by: aditya.teltia <teltia.aditya22@gmail.com>
…on is in the past (valkey-io#3023)

if an expired time is used, the condition is ignored, and it directly
becomes the effect of the `hdel` command.
This is mainly important for cases where the user would like to protect
deleting the data in case the `HEXPIREAT` command is used and the user
would like to protect delay execution of this command to delete fields.

fixes: valkey-io#3021

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
**The problem/use-case that the feature addresses**

I teach Valkey to people frequently. The general process is to start on
standalone Valkey (or 'try me' Valkey) to get started. However, in the
more advanced sections, I introduce the concepts of required to develop
for a Valkey Cluster. The key learning here is to understand slots (e.g.
multi-key operations, using hash tags, etc.). Standing up a cluster is a
hassle for someone just learning, and, while I usually demo `CLUSTER
KEYSLOT` for the users, they can't run that command on their single
instance since all of the `CLUSTER` commands are blocked. I think Valkey
students would learn a lot more quickly if they could experiment with
`CLUSTER KEYSLOT` themselves in standalone mode.

In this case, `CLUSTER KEYSLOT` doesn't require anything in the cluster
- it just does the math on the key to determine the slot number.

**Description of the feature**

This feature unblocks `CLUSTER KEYSLOT` on single instances.

**Alternatives you've considered**

- Having a cluster on try-me valkey (too many resources)
- Having single instance clusters (discussed before, but far more
complex than this trivial change)

**Additional information**

Their might be less repetitious logic, but the entire bits that
generates `CLUSTER KEYSLOT` result is just 3 lines, so repetitious feels
fine?

---------

Signed-off-by: Kyle J. Davis <kyledvs@amazon.com>
…ion (valkey-io#3030)

Before valkey-io#2858, we has this check:
```
static int scriptVerifyOOM(scriptRunCtx *run_ctx, char **err) {
    if (run_ctx->flags & SCRIPT_ALLOW_OOM) {
        /* Allow running any command even if OOM reached */
        return C_OK;
    }

    /* If we reached the memory limit configured via maxmemory, commands that
     * could enlarge the memory usage are not allowed, but only if this is the
     * first write in the context of this script, otherwise we can't stop
     * in the middle. */

    if (server.maxmemory &&                          /* Maxmemory is actually enabled. */
        !mustObeyClient(run_ctx->original_client) && /* Don't care about mem for replicas or AOF. */
        !(run_ctx->flags & SCRIPT_WRITE_DIRTY) &&    /* Script had no side effects so far. */
        server.pre_command_oom_state &&              /* Detected OOM when script start. */
        (run_ctx->c->cmd->flags & CMD_DENYOOM)) {
        *err = sdsdup(shared.oomerr->ptr);
        return C_ERR;
    }

    return C_OK;
}
```

If we reached the memory limit configured via maxmemory, commands that
could enlarge the memory usage are not allowed, but only if this is the
first write in the context of this script, otherwise we can't stop
in the middle.

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
This change restores maxmemory to 0 at the end of both OOM scripting
tests, preventing memory configuration from leaking into subsequent
tests.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
The patch fixes the error by adding `pull‑requests: write` to the
permissions block of `weekly.yml`. Github rejects if the reusable
workflow's (here daily.yml) permissions are not provided.

We recently changed the permissions in `daily.yml` where we gave write
permissions in valkey-io#2907
With this change, we bring parity to the permissions since `weekly.yml`
uses calls `daily.yml` workflow call method.

Fixes: https://github.com/valkey-io/valkey/actions/runs/20890545320

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Introduced by mistake in valkey-io#2807.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
…copy-avoidance (valkey-io#3046)

In releaseBufReferences(), clusterSlotStatsAddNetworkBytesOutForSlot() is called in the IO
thread after writing the reply to the client, and (since valkey-io#2652) it's also done in the normal
code path in the main thread (afterCommand() -> clusterSlotStatsAddNetworkBytesOutForUserClient()).
This means the output bytes are recorded twice in the cluster slot stats.

Fixes valkey-io#3043. Bug was introduced in valkey-io#2652 (9.0.1).

Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
…piration is in the past (valkey-io#3048)

valkey-io#3023 was only partially solving
the problem.
We need to avoid expiring items in case of any validation rule failed.

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
…erhead (valkey-io#3005)

The metric `used_memory_dataset` turned into an insanely large number close to 2^64 (actually
overflowed negative value), as reported in valkey-io#2994.

## Double-Counted database memory

When server starts, the global variable `server.initial_memory_usage` is used to record a memory
baseline in InitServerLast. This `server.initial_memory_usage` has clearly included initial database
memory, since databases are created in initServer.

In function getMemoryOverheadData, the `mem_total` is firstly assigned the baseline, which includes
initial database memory. And then all extra memory usage of databases are added to mem_total. The initial
database memory are therefore counted TWICE.

This eventually caused wrongly larger `used_memory_overhead`. For a database with only a couple of keys,
the `used_memory_overhead` is easily larger than `used_memory` and causes an overflowed `used_memory_dataset`.

## Missed Empty Databases

In function getMemoryOverheadData(), kvstores without any allocated hashtable are ignored from calculation:
```c
if (db == NULL || !kvstoreNumAllocatedHashtables(db->keys)) continue;
```

However, even the kvstore has no allocated hashtable, there are still some memory allocated by kvstoreCreate(),
including `hashtable_size_index`, which can be larger than 128 KiB.

On the contrary, this caused wrongly smaller `used_memory_overhead` for an empty database. When we insert only
ONE key to the database, the database is suddenly taken into account, and `used_memory_overhead` will increase
(for `used_memory_dataset` decrease) by more than 128 KiB due to the single key insertion.

Signed-off-by: Ace Breakpoint <chemistudio@gmail.com>
Signed-off-by: bpint <chemistudio@gmail.com>
Signed-off-by: Madelyn Olson <madelyneolson@gmail.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Add VM_ClusterKeySlotC that takes a data pointer and len as parameters.

Purpose: A module can avoid creating a ValkeyModuleString in a hot code
path.

Closes valkey-io#2901

---------

Signed-off-by: Su Ko <rhtn1128@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
The metric tracks the number of seconds the main thread spends doing
active work as supposed to waiting for work. The active time is exposed
as the INFO field `used_active_time_main_thread` in the CPU section.

When I/O threads are used, the main thread uses a busy-loop when it's
waiting for work, so the reported CPU usage is near 100% even if the
thread has capacity to handle more work. This new metric attempts to
provide a useful metric of how loaded the main thread is by excluding
the time the thread is just waiting for work.

The busy loop consists of the cycle (beforeSleep, non-blocking epoll,
afterSleep). Only the duration of the loops that handle at least one
event loop event (network, file or timer event) or some work from I/O
threads (execution of commands received from clients by I/O threads) is
counted as active time.

Implements the main thread part of valkey-io#2065.

---------

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
…3071)

Fix flaky tests in `slot-stats.tcl` and `introspection.tcl`.

Both tests share a similar flaky issue - when using a deferring client,
after `$rd <command>` returns, we **cannot** guarantee that the server
has already executed the command.


#### 1. "CLUSTER SLOT-STATS network-bytes-in, in-line buffer processing"

Add `$rd read` after `$rd flush` to wait for the server response,
ensuring the command is fully processed before checking slot stats.

#### 2. "MONITOR log blocked command only once"

Skip any leading `info` commands from MONITOR output before asserting
`blpop`. Since we cannot guarantee command execution order from the
deferring client's perspective, `info` commands triggered by
`wait_for_blocked_clients_count` may appear before the expected `blpop`.

Signed-off-by: yzc-yzc <96833212+yzc-yzc@users.noreply.github.com>
…nable repl-diskless-sync (valkey-io#3051)

We should mention it in valkey.conf otherwise people may got it wrong.

Also, when the primary node receives a REPLCONF CAPA dual-channel but
repl-diskless-sync is not enabled (we will ignore the capa in this case),
we will also print a notice message so that people can be aware of it from
the logs. We believe if a user enables dual-channel-replication-enabled,
they intend to use dual channel replication.

Closes valkey-io#3044.

Signed-off-by: Binbin <binloveplay1314@qq.com>
…key-io#3075)

This is a minor cleanup, remove the useless replicas var and we
can break the loop ASAP.

Signed-off-by: Binbin <binloveplay1314@qq.com>
As we can see, it is entirely possible for the microseconds to have a
value of 0:
```
*** [err]: The microsecond part of the TIME command will not overflow in tests/unit/introspection-2.tcl
Expected '0' to be more than '0' (context: type eval line 4 cmd {assert_morethan $microseconds 0} proc ::test)
```

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: wangxiaomeng <wangxiaomeng@kylinos.cn>
…o#3079)

In single-thread mode with --duration, the benchmark could hang
indefinitely after the duration elapsed.

## Root Cause
Consider the following scenario: 

**Step 1** (T = 0.99s, duration = 1s):
- `readHandler` receives response, calls `clientDone()`
- `isBenchmarkFinished()` returns `false` (0.99s < 1s)
- `clientDone()` calls `resetClient()`

**Step 2** (T = 1.01s):
- `writeHandler` is called
- `isBenchmarkFinished()` returns `true` (1.01s >= 1s)
- `writeHandler` returns early, **no request sent, no read event
registered**

**Result**:
- `readHandler` never triggers → `clientDone()` never called → no
`aeStop()`
- `showThroughput()` checks `num_threads && isBenchmarkFinished()`,
skipping single-thread mode
- **Event loop never terminates**

Fixes valkey-io#2893, fixes valkey-io#2843

---------

Signed-off-by: yzc-yzc <96833212+yzc-yzc@users.noreply.github.com>
Signed-off-by: Ping Xie <pingxie@outlook.com>
Simple change to `hashtableDebug` to improve readability:
* Improve indentation
* Remove "(empty)" lines

Signed-off-by: Jim Brunner <brunnerj@amazon.com>
### Overview
This PR adds support for automatic background TLS reloading, closes
valkey-io#2649
TLS validity checks and fail-fast behavior on invalid certificates are
handled separately in valkey-io#2999.
- New configuration
  - `tls-auto-reload-interval <seconds>`
  - `0` disabled (default, backward compatible)
  - `>0` check interval in seconds
- TLS materials change detection in background
  - SHA-256 fingerprint checking for certificate files
- `inode + mtime` checking for CA certificate directories and key files
- Skips reload if materials haven't changed
`tlsCheckMaterialsAndUpdateCache`
- TLS contexts reload
- CPU-intensive certificate parsing happens in dedicated BIO worker
thread `BIO_TLS_RELOAD`
  - Main thread never blocks, atomically swaps SSL contexts
- Two-phase reload: background preparation `tlsConfigureAsync` + main
thread application `tlsApplyPendingReload`


**Note**: Original TLS load and reload still remain in main thread using
`tlsConfigureSync`, including:
- Initial TLS load (server startup)
- Runtime reload via CONFIG SET


---------

Signed-off-by: Yang Zhao <zymy701@gmail.com>
…alkey-io#3082)

Currently, the weekly runs do not progress if there is a failed workflow
as github CI treats `fail-fast` to be true by default. With this change,
we continue to test all the branches even after failure.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
…alkey-io#3022)

if the hrandfield(e.g. hrandfield myhash) command without other args
does not find a valid field, it will return an uninitialized lval.

```
debug set-active-expire no
hsetex myhash ex 1 fields 2 f1 v1 f2 v2
after 1s...
hrandfield myhash
[will return some uninitialized number]
```

related: valkey-io#3021

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
The test case added in valkey-io#3042 fails on platforms without MPTCP support.

This change automatically skips the MPTCP tests on platforms without
MTPCP.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
…table (valkey-io#3092)

Thanks madolson for pointing out the issue with
valkey-io#2907.

Whenever we were using the label, it was picking up the head of the
unstable instead of the PR. The reason was that we changed the target to
`pull_request_target`. We had to do that since we wanted to remove the
label after the run is completed.

With this change, it correctly picks up the commit hash of the PR and
checks it out.

To test this, I merged the changes in my forked repository's
[unstable](valkey-io/valkey@unstable...sarthakaggarwal97:valkey:unstable),
created a dummy
[PR](sarthakaggarwal97#61), and was able
to verify that the commit of the PR is being checked out in the
[action](https://github.com/sarthakaggarwal97/valkey/actions/runs/21229598167/job/61084986422?pr=61#step:3:81).

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
…y-io#3088)

Memory leak in the VM_GetCommandKeysWithFlags function when
MAX_KEYS_BUFFER is reached.
We will malloc the memory so we should always free the getKeysResult.
```
    /* Resize if necessary */
    if (numkeys > result->size) {
        if (result->keys != result->keysbuf) {
            /* We're not using a static buffer, just (re)alloc */
            result->keys = zrealloc(result->keys, numkeys * sizeof(keyReference));
        } else {
            /* We are using a static buffer, copy its contents */
            result->keys = zmalloc(numkeys * sizeof(keyReference));
            if (result->numkeys) memcpy(result->keys, result->keysbuf, result->numkeys * sizeof(keyReference));
        }
        result->size = numkeys;
    }
```

Closes valkey-io#3087.

Signed-off-by: arshidkv12 <arshidkv12@gmail.com>
Signed-off-by: Arshid <arshidkv12@gmail.com>
Signed-off-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
When using XREAD STREAMS <stream> + on an empty stream created with
MKSTREAM, valkey returns an error instead of nil.

This happens because is missing a check on the stream length.

The fix adds the length check on the condition.

Fixes valkey-io#2728

Signed-off-by: diego-ciciani01 <diego.ciciani@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
enjoy-binbin and others added 28 commits February 9, 2026 18:55
…noring certain node (valkey-io#3069)" (valkey-io#3176)

This reverts valkey-io#3069 commit dbdc3d9.

Clearly, it doesn't handle TLS-related aspects, which caused it
to fail some TLS tests. Furthermore, this check is also incorrect
under the "announce client port" setting.

There isn't much value in doing this and there is no need to extend
the helper function.

Signed-off-by: Binbin <binloveplay1314@qq.com>
valkey-io#3138)

**Fixes valkey-io#3127**

## Description
Adds a `valgrind:skip` tag to automatically skip tests when running with
`--valgrind`. This allows skipping before the server starts, rather than
checking `$::valgrind` inside tests.

## Changes
- Added tag check in `server.tcl`
- Replaced `!$::valgrind` checks with the new tag in affected tests
- Updated README documentation
- Refactored tests in `hash.tcl` by adding `valgrind:skip` tag inline
- Added `valgrind:skip` tag to a test in `incr.tcl`

## Testing
Verified tests are correctly skipped with `--valgrind` and run normally
without it.

Signed-off-by: Mangat Toor <immangat@amazon.com>
Co-authored-by: Mangat Toor <immangat@amazon.com>
…o#3171)

The problem: when running `./runtest --large-memory --tags large-memory`
the `--tags large-memory` filter would look for tests with
`large-memory` tag, it found the nested tag in the test context but
`start_server` didn't have `large-memory`, so the filtering logic had
issues. And that's why we did not notice how we introduced this failure:

```
[err]: CVE-2025-32023: Sparse HLL XZERO overflow triggers crash in tests/unit/hyperloglog.tcl
Expected an error matching '*INVALIDOBJ*' but got '1' (context: type eval line 24 cmd {assert_error {*INVALIDOBJ*} {r pfadd hll_overflow foo}} proc ::test)
```

Tags `large-memory`, `needs:other-server`, `compatible-redis`, and
`network` now restricted to top-level `start_server`. Prevents `--tags`
filtering issues when these tags appear in nested contexts. Added
validation that errors on nested usage. Refactored tests that fail new
validation. Also fixed all `lsearch` calls to use `-exact` flag to
preventing bugs like "tls" matching "tls:skip".

---------

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Adding more client flags in the information populated by
`ValkeyModule_GetClientInfoById()`:

    VALKEYMODULE_CLIENTINFO_FLAG_PRIMARY
    VALKEYMODULE_CLIENTINFO_FLAG_REPLICA
    VALKEYMODULE_CLIENTINFO_FLAG_MONITOR
    VALKEYMODULE_CLIENTINFO_FLAG_MODULE
    VALKEYMODULE_CLIENTINFO_FLAG_AUTHENTICATED
    VALKEYMODULE_CLIENTINFO_FLAG_EVER_AUTHENTICATED
    VALKEYMODULE_CLIENTINFO_FLAG_FAKE

Closes valkey-io#2590

---------

Signed-off-by: martinrvisser <mvisser@hotmail.com>
Signed-off-by: martinrvisser <martinrvisser@users.noreply.github.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
…io#3135)

There should be a way for people to monitor the amount of output buffer
data accumulating at the source node during the migration process.

This PR also do a change in slotExportTryDoPause, previously, we used 
`getClientOutputBufferMemoryUsage` to determine if the offset was sufficient
to trigger the pause, it also adding the listNode overhead. Now we use
client->bytes.

Signed-off-by: Binbin <binloveplay1314@qq.com>
…urn type (valkey-io#3184)

This fixes a regression where PFADD returned 1 instead of INVALIDOBJ for
corrupted sparse HLL payloads. In small strings refactor (valkey-io#2516) we
incorrectly changed the datatype from int to bool. That function still
uses -1 as its error signal.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
We have been seeing github actions runners being OOM when large memory
tests are run with ASan. The operation eventually is being canceled
during the test.

This change moves the large-memory tests with ASan and UBSan to separate
jobs, so we get a dedicated runner with its own timeout. We can tweak
the number of simultaneous test clients for these tests without
affecting the other test jobs.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
I think a lot of new contributors are not aware how they can run the
extended list of tests in their forked repo branch with other test
arguments.

This change could be helpful to some people.

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
Currently if we just run make test, it fails because Lua is not built
automatically. Thanks to @zuiderkwast for pointing the fix.

Make Test Passed

```

Test Summary: 5895 passed, 0 failed

\o/ All tests passed without errors!

Cleanup: may take some time... OK
```

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
If the client executing `ACL LOAD` is a user that is removed from ACL
file, it frees itself while still executing the command. Same happens
for user while removing it's channel permissions.
This causes `c->conn` pointer to become `null` and later
`serverAssert(c->conn)` fails for that client on write in
`prepareClientToWrite()`.
Solution is to flag current client to be closed after command completes
and reply is written, later this client is freed async.

Resolves valkey-io#3181

---------

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
…alkey-io#3185)

Updates to latest versions for each of the github actions used.

Pinning prevents an attack where the upstream action dependency is
compromised and the "v4" tag for example gets edited to point to a
malicious version. We already do this for most checkout actions in our
workflows.

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
…#3186)

When building Valkey 9 for openSUSE Tumbleweed with RDMA enabled there are some
build failures for the i586 and armv7l arches (with `-Werror`):
```
[   45s] cc -std=c99 -pedantic -O3 -fPIC -fvisibility=hidden  -O2 -Wall -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=3 -fstack-protector-strong -funwind-tables -fasynchronous-unwind-tables -fstack-clash-protection -Werror=return-type -flto=auto -g -Wall -Wextra -pedantic -Wstrict-prototypes -Wwrite-strings -Wno-missing-field-initializers -Werror -g -ggdb  -Iinclude/valkey -I../../src/ -I../../src/ -MMD -MP -c src/rdma.c -o obj/rdma.o
[   45s] src/rdma.c: In function ‘rdmaPostRecv’:
[   45s] src/rdma.c:164:21: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
[   45s]   164 |     recv_wr.wr_id = (uint64_t)cmd;
[   45s]       |                     ^
[   45s] src/rdma.c: In function ‘rdmaSendCommand’:
[   45s] src/rdma.c:305:21: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
[   45s]   305 |     send_wr.wr_id = (uint64_t)_cmd;
[   45s]       |                     ^
[   45s] In file included from /usr/include/sys/types.h:176,
[   45s]                  from include/valkey/valkey.h:42,
[   45s]                  from include/valkey/async.h:34,
[   45s]                  from src/rdma.c:36:
[   45s] src/rdma.c: In function ‘connRdmaRegisterRx’:
[   45s] src/rdma.c:323:31: error: cast from pointer to integer of different size [-Werror=pointer-to-int-cast]
[   45s]   323 |     cmd.memory.addr = htobe64((uint64_t)ctx->recv_buf);
[   45s]       |                               ^
[   45s] src/rdma.c: In function ‘connRdmaHandleRecv’:
[   45s] src/rdma.c:341:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
[   45s]   341 |         ctx->tx_addr = (char *)be64toh(cmd->memory.addr);
[   45s]       |                        ^
[   45s] cc1: all warnings being treated as errors
[   45s] make[3]: *** [Makefile:228: obj/rdma.o] Error 1
```

Fixes valkey-io#3183

Signed-off-by: stydxm <stydxm@outlook.com>
…y-io#3034)

With this we get more detailed backtrace information, including
information about static functions. Off by default - to enable you must
enable at compile time:

    make USE_LIBBACKTRACE=yes

Signed-off-by: Rain Valentine <rsg000@gmail.com>
…alkey-io#3204)

A relatively simple cleanup in our workflows. Using actions is best
practice.

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Add a CI specifically for TLS to build and test
both built-in and module modes.

---------

Signed-off-by: Yang Zhao <zymy701@gmail.com>
Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Only call gettimeofday() periodically and add monotonic delta. This
avoids a syscall if we have a fast monotonic clock, which we have now by
default on most platforms.

`ustime()` is called once per command execution. In a busy server
running 1M RPS, `ustime()` is called every microsecond. By calling
`gettimeofday()` only once every millisecond, we save 99.9% of these
syscalls.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Replace the implementation of byte-reverse functions.

The new implementation compiles to a single bswap instruction. The old
implementation didn't.

The new implementation is small and all in the .h file. The .c file is
deleted.

On little-endian systems, this affects streams, module timers and
cluster code and a few more things, where `htonu64()` is used for
storing stream-id or timestamps in a rax in big-endian order.

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
…sue (valkey-io#1840)

It's worth mentioning that enabling `maxmemory-clients` will affect the
performance so people can perform its own benchmark before enabling it.

Signed-off-by: Binbin <binloveplay1314@qq.com>
At one point in development there was assumption that intset of database
IDs would have IDs only in `[0, server.dbnum-1]` range, but
[here](valkey-io#2309 (comment))
we decided to change upper bound to `INT32_MAX` for ACL compatibility
reasons between nodes.
Because of that change, assumption is not true anymore and we should
explicitly check each database list to contain access to full `[0,
server.dbnum-1]` range.
Also added test for that.

Signed-off-by: Daniil Kashapov <daniil.kashapov.ykt@gmail.com>
…alkey-io#3202)

We made some changes to the workflow where the label was getting removed
every time we ran it. This changes handles it, removes
`pull_request_target` and doesn't re-trigger on adding another label.

I tried it here: sarthakaggarwal97#65
The code is already merged in my unstable.

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
This PR enables `USE_LIBBACKTRACE=yes` across all CI builds and builds
upon the changes introduced in valkey-io#3034. Alpine-based jobs previously
attempted to install `libbacktrace-dev`, which does not exist in
Alpine’s apk repositories.

This caused these two errors in the daily tests below:
-
https://github.com/valkey-io/valkey/actions/runs/22045858351/job/63694456995
-
https://github.com/valkey-io/valkey/actions/runs/22045858351/job/63694457018

To resolve this, Alpine jobs now build GNU libbacktrace from source
inside the container before compiling Valkey. This aligns Alpine
behavior with other environments (Ubuntu jobs) and now avoids utilizing
non-existent Alpine packages.

An alternative approach we can consider is to disable `USE_LIBBACKTRACE`
for Alpine-based tests.

Signed-off-by: Nikhil Manglore <nmanglor@amazon.com>
)

If there is a crash between the time the job is popped and freed, we
technically leak memory. This change allows us to peek, and pop just
before we are about to free the job.

Fixes valgrind errors:
https://github.com/valkey-io/valkey/actions/runs/21969557125/job/63467641572#step:6:8648

---------

Signed-off-by: Sarthak Aggarwal <sarthagg@amazon.com>
The test somehow is slow and due to the short cluster-node-timeout, an
automatic
failover may fail to trigger due to cluster-replica-validity-factor:
```
*** [err]: Automatic failover vote is not limited by two times the node timeout - mixed failover in tests/unit/cluster/manual-failover.tcl
The third failover does not happen

xxx # Cluster state changed: fail
xxx # Cluster is currently down: At least one hash slot is not served by any available node. Please check the 'cluster-require-full-coverage' configuration.
```

Closes valkey-io#3203.

Signed-off-by: Binbin <binloveplay1314@qq.com>
in valkey-io#2807 we added safe-iterator invalidation when a hashtable is
deleted.
When we create a safeIterator we initialize it to it->table=0 and
it-index=-1.
HOWEVER say someone creates a safe iterator never progress it and later
on just delete the iterator using `hashtableReleaseIterator`.
the iterator will NOT be untracked since, `hashtableCleanupIterator`
will skip the untrack:

```c
    if (!(iter->index == -1 && iter->table == 0)) {
        if (isSafe(iter)) {
            hashtableResumeRehashing(iter->hashtable);
            untrackSafeIterator(iter);
        } else {
            assert(iter->fingerprint == hashtableFingerprint(iter->hashtable));
        }
    }
```

and then when the hashtable is freed the iterator will be referenced
leading to a use-after-free.

For example:

```c
    it = hashtableCreateIterator(ht, HASHTABLE_ITER_SAFE);
    hashtableReleaseIterator(it);
    hashtableRelease(ht);
```

This PR fix it by ALWAYS untrack safe iterators which have a valid
hashtable when they are being cleaned up.

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
…ommands (valkey-io#3160)

Summary:

Prevent MODULE UNLOAD when ACL rules reference a module subcommand.
Avoids crash during ACL recompute after module removal.
Adds coverage for subcommand ACL rules.
Tests:

make test
Fixes valkey-io#2797

---------

Signed-off-by: sachinvmurthy <sachin.murthy97@gmail.com>
Signed-off-by: Sachin Venkatesha Murthy <sachin.murthy97@gmail.com>
Large-memory tests were failing with "I/O error reading reply" after the
first test `EVAL - JSON string encoding a string larger than 2GB`. Three
tests that shared a single server instance led to cascading failures
when the first test's 3GB allocation left the server in a
memory-stressed state.

This PR isolates each test by starting its own server. Additionally,
I've reduced the JSON encoding test size from 3GB to 2.25GB. This
included changing the allocated 1GB string (concatenated 3x) to a 750MB
string to avoid memory exhaustion while still testing strings greater
than 2GB like we intended.

---------

Signed-off-by: Nikhil Manglore <nmanglor@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.