latency spike and wave pattern in disk usage - v19.2.4 #45557

Closed
ghost opened this issue Mar 1, 2020 · 4 comments
Labels
T-storage Storage Team

Comments

@ghost

ghost commented Mar 1, 2020

Describe the problem

Please describe the issue you observed, and any steps we can take to reproduce it:
Latency spikes on a production system.

To Reproduce

What did you do? Describe in your own words.
Added a whole lot of BLOBs (~1 TB; ~50 million rows).
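For illustration only, a hypothetical sketch of what this kind of bulk BLOB load might look like; the table name, column name, row size, and batch size are made up, not taken from the real workload:

```sql
-- Hypothetical sketch; schema and sizes are illustrative only.
CREATE TABLE IF NOT EXISTS blobs (
    id   UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    data BYTES NOT NULL   -- roughly 20 KB per row * ~50M rows ≈ 1 TB
);

-- One batch of an (assumed) client-side insert loop:
INSERT INTO blobs (data)
SELECT repeat('x', 20000)::BYTES
FROM generate_series(1, 1000);
```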

Additional data / screenshots
[Screenshot: 1-week disk usage graph]

Environment:

single node
started with v19.2.2; updated to v19.2.4 after noticing the problem
Linux 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Jira issue: CRDB-8005

@petermattis
Collaborator

The wave pattern in disk usage is really strange. Even with RocksDB compactions we never see fluctuations that dramatic. Can you share the RocksDB SSTables and RocksDB Flushes/Compactions graphs (on the Storage dashboard)?

What did you do? Describe in your own words.
Added a whole lot of BLOBs (~1 TB; ~50 million rows).

It would be helpful for you to precisely describe what your workload is doing, the types of queries, the size of the cluster, including the number of nodes and the types of machines. Even better if you can share source code from the workload program.

@petermattis petermattis added this to Incoming in Storage via automation Mar 2, 2020
@ghost
Author

ghost commented Mar 2, 2020

[Screenshot: Storage dashboard graphs]

There are two drops; those were caused by me restarting the node.
The reason it's now less violent is (I guess) because I set gc.ttlseconds = 1000000.
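For reference, gc.ttlseconds is set through a zone configuration; the change was presumably something along these lines (the exact zone isn't stated above, so the targets below are examples):

```sql
-- Raise the GC TTL to 1000000 s (~11.5 days) from the 19.2 default of 25 h,
-- deferring garbage collection of old MVCC versions.
-- Cluster-wide:
ALTER RANGE default CONFIGURE ZONE USING gc.ttlseconds = 1000000;

-- Or scoped to a single (example) table:
ALTER TABLE blobs CONFIGURE ZONE USING gc.ttlseconds = 1000000;
```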

It would be helpful for you to precisely describe what your workload is doing, the types of queries, the size of the cluster, including the number of nodes and the types of machines. Even better if you can share source code from the workload program.

Single node; 48 cores; 512 GB RAM; 6 NVMe disks in RAID 10.
As said, this is a production system, not a stress-test system.
All types of queries, averaging 2-3k queries/s.

@ghost
Author

ghost commented Mar 7, 2020

Yesterday I ran into that problem again (latency spikes) and talked with @dt about how to change the soft and hard limits for pending compaction bytes. Now that I think of it, about a week before the problems started I dropped a JSONB column from a table; that column was maybe 400-500 GB.
So maybe this is related to #24029 and #26693.
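For context, the drop itself would have been a plain schema change like the hypothetical statement below; the point is that the old column data is not removed right away, it only becomes garbage-collectable once gc.ttlseconds expires, and clearing it out then drives heavy RocksDB compaction activity:

```sql
-- Hypothetical names, not from this issue. The ALTER returns quickly, but the
-- ~400-500 GB of old JSONB values stay on disk as MVCC garbage until the
-- gc.ttlseconds window passes, and removing them keeps compactions busy.
ALTER TABLE events DROP COLUMN payload;
```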

cockroach-rocksdb.db.root.2020-03-07T07_37_58Z.043406.log

@petermattis petermattis moved this from Incoming to To Do (future milestone) in Storage Mar 23, 2020
@petermattis petermattis moved this from To Do (future milestone) to To Do (investigations) in Storage Apr 30, 2020
@jlinder jlinder added the T-storage Storage Team label Jun 16, 2021
@mwang1026

19.2 has been EOL for a while. If you repro this in a newer version, feel free to reopen.

Storage automation moved this from To Do (investigations) to Done Mar 2, 2022