
[BUG] Inability to write when reached maxmemory in FLASH mode #645

Open
jianjun126 opened this issue Apr 26, 2023 · 25 comments

Comments

@jianjun126

jianjun126 commented Apr 26, 2023

Describe the bug
Hi,
I have tested on CentOS 7 with memtier_benchmark (v1.4.0). When memory reaches its limit, writes sometimes fail entirely or write performance drops very low, for example:

[screenshot]

After the memory limit is reached, write performance also degrades to a very low value once keys start to expire or when a query-performance test is started, for example:

[screenshots]

To reproduce

keydb command:
./keydb-server ./keydb.conf --storage-provider flash /data1/6333/ --storage-provider-options "use_direct_reads=true;allow_mmap_reads=false;use_direct_writes=true;allow_mmap_writes=false"
keydb config:
keydb.conf.zip

memtier_benchmark command:
taskset -c 26-29,78-84 memtier_benchmark -s 127.0.0.1 -p 6333 -t 4 -c 20 -n 2000000 --distinct-client-seed --command="set __key__ __data__ ex 66000" --key-prefix="testkey_v3_" --key-minimum=100000000 --key-maximum=999000000 -R -d 800

Expected behavior
When maximum memory is reached in FLASH mode, KeyDB should continue to accept writes and maintain good performance.

Additional information
I have tried modifying several KeyDB parameters, but the behavior described above persists.

@jianjun126 jianjun126 changed the title [BUG] [BUG] Inability to write when reached maxmemory in FLASH mode Apr 26, 2023
@paulmchen
Contributor

It sounds like you are hitting the replication output buffer hard limit, which triggers a fast full sync (check your log for confirmation). If that is the case, increasing the replication hard limit in your conf will help:

client-output-buffer-limit replica 2gb 2gb 60

In the SSD case, you can also adjust the following parameter to a value larger than 1, to better handle large write loads:

maxmemory-eviction-tenacity 35
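
If you would rather not restart the server, both settings can usually also be applied at runtime with CONFIG SET (a sketch only, assuming keydb-cli and the repro port 6333; confirm against your version's config handling):

keydb-cli -p 6333 config set client-output-buffer-limit "replica 2gb 2gb 60"
keydb-cli -p 6333 config set maxmemory-eviction-tenacity 35
keydb-cli -p 6333 config rewrite   # optionally persist the change back to keydb.conf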

@jianjun126
Author

@paulmchen Thank you very much for your suggestion. After modifying the configuration as you suggested, the same issues remain: writes still fail and write performance is low. Here are two screenshots of the test results.

[screenshots]

@jianjun126
Author

@paulmchen
Hi paulmchen,
I tried modifying some other parameters, but still couldn't solve the problem. Could you please give me some more advice on this issue? Thanks for your kind attention; I look forward to your prompt reply.

@msotheeswaran-sc
Collaborator

Hi @jianjun126, are the writes being rejected or hanging? It could be a similar problem to #646, but with the expireset taking up all the memory instead of the slots_to_keys map.

@paulmchen
Contributor

@jianjun126 since it looks like you have a single-master configuration (without a slave), it won't be a replication backlog issue or a fast full sync issue. I suggest running FlameGraph to determine where the bottleneck causing the low CPU usage is.

Follow these instructions to set up and run FlameGraph to identify your system's performance issue:

  1. Place the FlameGraph tool in an empty folder (e.g. /FlameGraph):
    git clone https://github.com/brendangregg/FlameGraph # or download it from github

  2. Start your KeyDB server and run your client workload

  3. Shortly before (or as soon as) you see low or zero QPS, do the following:
    cd /FlameGraph
    perf record --call-graph=dwarf -p [process_id] # where process_id is the process id of the KeyDB server running

    Record for about 30 seconds (while seeing 0 QPS, or until the end of the slow period if it lasts less than a minute), then stop the recording.

    perf script > out.perf # send result to out.perf file
    ./stackcollapse-perf.pl out.perf > out.folded
    ./flamegraph.pl out.folded > keydb-fg-result.svg

  4. It would be helpful if you could share the svg file so we can check where the CPU time is going
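
For convenience, the steps above can be combined into a single run (a sketch; the 30-second window and output file names simply follow the steps above, and [process_id] is the PID of the running keydb-server):

    cd /FlameGraph
    perf record --call-graph=dwarf -p [process_id] -- sleep 30   # sample the server for ~30 s while QPS is low/zero
    perf script > out.perf                                       # dump the samples as text
    ./stackcollapse-perf.pl out.perf > out.folded                # collapse the stacks
    ./flamegraph.pl out.folded > keydb-fg-result.svg             # render the flame graph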

@jianjun126
Author

jianjun126 commented May 6, 2023

@paulmchen
Hi paulmchen,
I have got two SVG results with FlameGraph; each perf run takes about 50 seconds. Here are two relevant screenshots of memtier_benchmark.
"LOW OPS"
image
"ZERO OPS"
image

"FlameGraph result while LOW OPS"
keydb-fg-slow

"FlameGraph result while ZERO OPS"
keydb-fg-0ops

@paulmchen
Contributor

paulmchen commented May 6, 2023

According to the FG diagram, more than 73% of CPU cycles are spent performing evictions, and EvictionPoolPopulate alone consumes more than 55% of CPU cycles (see evict.cpp). My suspicion is that your volatile-ttl setting for maxmemory-policy may not work well with the ex 66000 in your benchmark command:

memtier_benchmark -s 127.0.0.1 -p 6333 -t 4 -c 20 -n 2000000 --distinct-client-seed --command="set __key__ __data__ ex 66000" --key-prefix="testkey_v3_" --key-minimum=100000000 --key-maximum=999000000 -R -d 800

You may try the following:

  1. Run the benchmark from a separate client machine. The benchmark tool is currently running on the same machine as the KeyDB server, so under heavy workload the client and the server compete for CPU/memory. (bind 127.0.0.1 -::1 in the configuration listens only on the loopback interface; change it to your private IP so you can drive the heavy workload from another client machine on the same subnet without the client and server competing for resources.)

  2. Try with this maxmemory_policy instead:
    maxmemory-policy allkeys-lru

  3. If you have more than 1 core on your keydb server, giving it a bit more threads to handle the workload will help a lot. For example:

    server-threads 4
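
Put together, the keydb.conf changes from points 1-3 might look like the excerpt below (a sketch; 10.0.0.5 is a placeholder for your server's private IP):

    bind 10.0.0.5                   # placeholder private IP; lets a remote client run the benchmark
    maxmemory-policy allkeys-lru    # evict any key by LRU instead of volatile-ttl
    server-threads 4                # use more threads if cores are available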

@jianjun126
Author

@paulmchen
Hi paulmchen,

My test environment is a dual-socket server with 8269CY CPUs and six memory channels, and almost no other tasks run on it during the test. So it is almost certainly not a hardware resource issue.
I have also tested the other parameters you suggested many times, e.g. server-threads 4, server-threads 5, maxmemory-policy allkeys-lru, and allkeys-lfu.

If necessary, I will retest as soon as possible with your suggested parameters and environment.

@quwu0820

Platinum 8269CY: 26 cores / 52 threads, 2.5 GHz base frequency.

@jianjun126
Author

@paulmchen @msotheeswaran

I wrote a script to test KeyDB's performance at a low write rate, but I found that even at a low write rate the inability-to-write problem still occurs.

The test method: within each 300-second window, write randomly 5000-8000 times per second for 50-70 seconds, and 250 times per second for the rest of the window; each write is 8000 bytes (a sketch of this pattern is shown below).

During the test I varied maxmemory; whether it was 1 GB, 4 GB, or 24 GB, the problem occurred, typically about 1.5 hours or 3.5 hours into the test.
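
The original script was not posted; the following is only a minimal Python sketch of the write pattern described above, assuming the redis-py client and the same port, key prefix, and 66000 s expiry as the memtier command (all of these are illustrative):

import random
import time

import redis  # assumption: redis-py talking to the KeyDB server

r = redis.Redis(host="127.0.0.1", port=6333)
VALUE = b"x" * 8000  # 8000-byte payload per write

def write_at(rate, seconds):
    """Issue roughly `rate` SET commands per second for `seconds` seconds."""
    end = time.time() + seconds
    while time.time() < end:
        slot_start = time.time()
        for _ in range(rate):
            key = "testkey_v3_%d" % random.randint(100000000, 999000000)
            r.set(key, VALUE, ex=66000)  # same expiry as the benchmark command
        # sleep off whatever remains of this one-second slot
        time.sleep(max(0.0, 1.0 - (time.time() - slot_start)))

while True:  # one 300-second window per iteration
    burst_seconds = random.randint(50, 70)                # 50-70 s of high-rate writes
    write_at(random.randint(5000, 8000), burst_seconds)
    write_at(250, 300 - burst_seconds)                    # low rate for the rest of the window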

@paulmchen
Contributor

That is a bug; the following commit addresses the 0 and low QPS issue (the 0 QPS is caused by eviction kicking in right after maxmemory is reached). However, it may have a side effect: after reaching maxmemory, memory usage may continue to grow. @JohnSully, John, could the following commit be added to main as well, and are there any side effects, for example memory growing without being evicted effectively and on time?

#439

@jianjun126 you can try this commit and see if it helps.

@jianjun126
Author

jianjun126 commented May 20, 2023

@paulmchen
Hi paulmchen,
I tried the code modified in #439 with memtier_benchmark. It does address the 0 OPS issue, but the low OPS issue remains.
At the same time, as you guessed, memory keeps growing without being evicted effectively.

[screenshots]

@paulmchen
Contributor

@JohnSully @msotheeswaran John and Malavan, this seems to be a pretty serious problem; I am able to reproduce it as well. Note: with the fixes in #439 the zero QPS problem is gone, but after reaching maximum memory the memory continues to grow. Could it be something related to the GC not taking effect?

@msotheeswaran-sc
Collaborator

msotheeswaran-sc commented May 24, 2023

I believe it is actually from the expireset: currently, when a key is evicted to storage, its expire entry stays in memory in the expireset. There is no mechanism to expire keys in the storage provider, so without keeping the entry in the expireset the key would stay in rocksdb without ever being expired until it is accessed again. I am working on a bigger change to add support for expiring from rocksdb; in the meantime you can try this commit: 6eb595d, but I have not tested it.

Edit: There was a mistake, so you will also need this commit: 6a32023

@hengku
Contributor

hengku commented May 24, 2023

Hi @jianjun126, not sure if the following (suggested by @JohnSully) can help your case; I tried it and it greatly reduced the occurrences of 0 QPS, at least in my testing environment.

  1. still use the original code without applying the fix in Enable eviction tenacity feature for storage providers #439
  2. set maxmemory-samples to 5 in keydb.conf. Its default value is 16 in the code if you don't specify it in the conf
  3. keep maxmemory-eviction-tenacity at its default value of 10

@jianjun126
Author

jianjun126 commented May 25, 2023

@msotheeswaran Hi Malavan, I tried the code modified in 6a32023 with memtier_benchmark. It also addresses the 0 OPS issue and avoids the memory growth, but the low OPS issue remains. When the test program first started running, the write rate could reach 40,000 OPS, but after two hours it was only 1000-2000 OPS.
[screenshots]

@jianjun126
Author

jianjun126 commented May 25, 2023

@hengku Hi hengku, thanks very much for your attention and suggestions. I tried the parameters you suggested with KeyDB v6.3.3. However, during the test there were still 0 OPS periods, one of which lasted 235 seconds.

[screenshots]

Here is the config file and test command.
keydb.zip
memtier_benchmark -s 127.0.0.1 -p 6600 -t 4 -c 20 -n 1000000 --distinct-client-seed --command="set __key__ __data__ ex 66000" --key-prefix="testkey_v1_" --key-minimum=100000000 --key-maximum=999000000 -R -d 800

From my test results, this seems to be quite different from yours. Could you help me check the config file, or share your testing process?

@hengku
Contributor

hengku commented May 25, 2023

Oh, I am using version 6.2.1 with some in-house code changes. I also observed the 0 QPS issue recently, and it seems to be fixed by setting those two parameters. Below are my conf and memtier command:

port 6379
bind 192.168.0.2
protected-mode no
daemonize no
timeout 0
server-threads 3
maxmemory 1gb
storage-provider flash /test
maxmemory-policy allkeys-lru
maxmemory-samples 5
maxmemory-eviction-tenacity 10
save ""
appendonly no
min-clients-per-thread 0

memtier_benchmark -s 192.168.0.2 -p 6379 -t 10 -c 100 -n 10000000000 -d 256 --key-minimum=1 --key-maximum=10000000000 --ratio 1:0 --key-pattern=P:P

@hengku
Contributor

hengku commented May 26, 2023

@jianjun126 I tried another approach using the same testing environment and memtier command as above, and with it I did not observe 0/very low QPS or continuously growing used memory. Not sure if you still want to give it a try and see if it works for your case.

  1. apply fix Enable eviction tenacity feature for storage providers #439 on top of your original code
  2. set maxmemory-eviction-tenacity to 35
  3. regarding maxmemory-samples, either 5 or 16 worked in my case
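
For reference, one way to apply #439 (step 1) on top of an existing checkout is via GitHub's pull-request refs (a sketch; the local branch name is illustrative, origin is assumed to point at the KeyDB GitHub repository, and you may need to resolve conflicts):

    git fetch origin pull/439/head:pr-439   # fetch the PR as a local branch
    git merge pr-439                        # merge it onto your current code
    make                                    # rebuild keydb-server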

@jianjun126
Author

@hengku Hi hengku, thanks again for sharing your setup and suggestions. I tried the parameters and command you suggested. The test results are summarized below.
a. KeyDB v6.3.3 + your conf + your memtier command:
----There was still a 0 OPS issue; it dropped to less than 1000 OPS after 360 seconds, and memory did not grow.
b. KeyDB v6.3.3 + code in #439 + your conf + your memtier command:
----I tested twice; memory kept growing and reached 240 GB after about 2 hours.
c. KeyDB v6.3.3 + code in #439 + your conf + your memtier command (with "-d 256" changed to "-d 800"):
----I tested three times; there were OOM errors once maxmemory was reached, and then it dropped to less than 100 OPS. I am not sure whether memory would keep growing, because I stopped the test since the write speed was so low.
d. KeyDB v6.3.3 + code in #439 + your conf + my memtier command:
----I tested twice; there were still low OPS periods lasting some seconds, and memory kept growing.
e. v6.2.1:
----I don't have a license for KeyDB Pro, so I can't test it.

@msotheeswaran-sc
Collaborator

@msotheeswaran Hi Malavan, I tried the code modified in 6a32023 with memtier_benchmark. It also addresses the 0 OPS issue and avoids the memory growth, but the low OPS issue remains. When the test program first started running, the write rate could reach 40,000 OPS, but after two hours it was only 1000-2000 OPS.

@jianjun126 what memtier command did you use for this? Eventually memory will be full and every new key will require evicting existing keys to FLASH first, which results in much lower QPS.

@jianjun126
Author

@msotheeswaran Hi Malavan
The memtier command is:
memtier_benchmark -s 127.0.0.1 -p 6600 -t 4 -c 20 -n 1000000 --distinct-client-seed --command="set __key__ __data__ ex 66000" --key-prefix="testkey_v1_" --key-minimum=100000000 --key-maximum=999000000 -R -d 800

My application scenario is the same as what you describe: memory is always full, new data is continuously written to KeyDB at high OPS, existing data has to be continuously evicted, and older data is deleted from disk after some time.

Could you give some suggestions for this application scenario?

@JohnSully
Collaborator

@jianjun126 Why did you disable mmap? This results in large buffers being created and freed, which can exacerbate this problem.

@jianjun126
Author

@JohnSully we want to use direct I/O, so mmap cannot be enabled.

[screenshot]

@jianjun126
Author

jianjun126 commented Nov 20, 2023

@msotheeswaran-sc From the release notes, this issue seems to have been fixed, so I tested again with the same configuration as before (with maxmemory changed to 8 GB). The performance of the new version has indeed improved, but similar issues to the earlier ones still occur.
[screenshot]

Changing the test data length from "-d 800" to "-d 8" also avoids the 0 OPS problem.
[screenshot]

If maxstorage is configured, there are a large number of OOM errors.
[screenshot]
