SELECT query sometimes taking a lot of time in 3 node cluster #25151
What kind of disks are you running on? That log message means that RocksDB is taking a long time to sync to disk; perhaps the IO load is too high for your disk configuration. You can use standard disk benchmarking tools to verify this.
To verify this I created a new set-up with a 5-node cluster. I am no longer seeing the high-IO log message you are referring to, but the query is still slow.
Number of rows in the table:
The query is still slow. @jordanlewis, I can share the debug logs with you.
@jordanlewis The nodes are of the following type:
Thanks. You can email me: first name @ cockroachlabs.com. Those log lines are rather suspicious. The request had to wait 5 seconds for overlapping requests before executing; after the wait, it finished very quickly. What other load is on the system while you're doing this? cc @nvanbenschoten, anything else we can infer about what's going on from here?
@jordanlewis I have emailed the logs with the subject "Cockroach Debug Log". Let me know if you have received it.
Yep, got it. Thanks. What other queries are you running on the system when you experience these slow selects? Are the selects still slow without other load on the system?
@jordanlewis There are other queries executing on the system. Is there a way I can get a snapshot of the queries currently being executed on CockroachDB? Then I can share those as well.
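For reference, a minimal sketch of how currently running queries can be inspected in CockroachDB 2.x; this is not from the original thread, so verify against the docs for your version:

```sql
-- List queries currently executing across the whole cluster.
SHOW CLUSTER QUERIES;

-- Or query the internal table directly, e.g. to sort by how long each query has been running.
SELECT node_id, query, start, client_address
FROM crdb_internal.cluster_queries
ORDER BY start;
```

Capturing this output periodically while the slow SELECT runs would show what else is executing at the same time.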
@jordanlewis The load on the system:
@jordanlewis - I stopped all SQL queries to the cluster, as you can see from the image below. The select is still slow.
The number of rows in the table:
@jordanlewis @nvanbenschoten - Let me know if you need any more info from my side. Also, is there a way I can improve the performance of the cluster?
@debraj-manna do you see the same slowness if you narrow down your select to a single row? I.e., run the slow query and then run it again with a predicate that explicitly selects only the first row. Is it still slow in that case, every time?
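As an illustration of that narrowing (the table name `config_str` comes from later in the thread; the key column `id` and its value are hypothetical, since the real schema was not captured here):

```sql
-- The original slow query, in its full-scan form (shape only; the real query was not captured).
SELECT * FROM config_str;

-- The same query constrained to a single row by an explicit predicate on the primary key.
SELECT * FROM config_str WHERE id = 'some-key';
```

If the single-row form is consistently fast while the full query is slow, the problem is more likely in scanning or returning the larger result set than in a single hot range.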
@tschottdorf - Yes
@nvanbenschoten - Did you observe any misconfiguration of the cluster on my part?
@debraj-manna could you provide that output as well? If the query is still slow in that case, we can take a look at tracing. To do this, first run the statement with tracing enabled, as sketched below.
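The exact commands were not captured in this thread, but session tracing in CockroachDB 2.x worked roughly as follows (a sketch; the SELECT is a placeholder for the actual slow statement):

```sql
-- Enable tracing for this session, run the slow statement, then turn tracing off.
SET tracing = on;
SELECT * FROM config_str;  -- placeholder for the slow query
SET tracing = off;

-- Dump the recorded trace for the statements run while tracing was on.
SHOW TRACE FOR SESSION;
```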
To help us get a bit more information, could you also provide a screenshot of that page?
@nvanbenschoten - I have enabled all the SQL queries again, so all of these reports are with other SQL queries running. Please find attached the requested output; below is what is shown on opening the page and clicking through.
@nvanbenschoten - I have stopped all queries again, so all of the reports below for node 2 are with no SQL queries coming into the cluster.
-no-requests.pdf
Could you also explain what that value means?
@debraj-manna how large are these values?
Also, what is the workload running on this table? Does it include writes as well as reads?
@nvanbenschoten - The values are not that huge.
Yes, the workload has inserts and updates along with selects.
@nvanbenschoten - Below is the current load on the 5-node cluster. About 30% of these queries are on the `config_str` table.
Are these inserts and updates also taking multiple seconds? Also, have there been any recent logs on node 2 containing that string?
@nvanbenschoten No, I am not seeing any recent logs with that string.
Inserts seem to be failing on node 2; I tried doing a random insert.
@nvanbenschoten - I have emailed you the debug logs at the address above.
@debraj-manna is this forum post written by someone working on the same database as you: https://forum.cockroachlabs.com/t/trying-to-split-same-key-range/1592/31? I ask because I see that your `kv.raft.command.max_size` setting is set to the same value as what we suggested in that post, and increasing that value even further (`SET CLUSTER SETTING kv.raft.command.max_size='200000000';`) should help with this issue as well. I suspect that the root cause of this performance problem is related to #25233.
Yes, it is me. OK, I will try increasing this further and get back to you.
@nvanbenschoten - Do I have to restart the cluster after increasing the command size?
@debraj-manna, no, you don't need to. The setting is propagated through a gossip protocol, though, so it might take a few seconds to take effect.
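For completeness, a short sketch of applying and then verifying the setting discussed above (the value is the one suggested earlier in the thread):

```sql
-- Raise the maximum raft command size, as suggested above.
SET CLUSTER SETTING kv.raft.command.max_size = '200000000';

-- No restart is needed; once gossip has propagated the change, confirm it took effect.
SHOW CLUSTER SETTING kv.raft.command.max_size;
```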
@nvanbenschoten - The performance seems to be improving. I will monitor this further and post back. One question, though: the size of the index will grow over time, so will I have to increase the command size further over time?
You shouldn't need to, but we've already seen this twice with your cluster, so I'm not confident that it won't come back up again. To clarify, this is only a problem with rare internal commands that perform range splits, and it's something we've only seen in your cluster. We'll continue to monitor this issue in #25233, so I'll close this thread. Please re-open if you see performance degrade again.
I am not able to reopen it, so I commented on #25233.
Crossposting from forum
I have a 3-node cockroach cluster in a dev set-up.
On a table like the one below:
Sometimes a query like the one below takes more than 20 seconds:
The pprof output & cockroach logs from each node are attached.
Load on the cluster is about 600 reads per second and about 150 inserts per second.
Number of rows in the table:
In my logs I am sometimes seeing the warning trace below:
Environment
Cockroach nodes were started like below:
Some sample data:
cockroach_logs.zip