Extremely slow INSERT INTO <table> SELECT #7253
Comments

Two questions. Can you show the MAL plan? Is the insert also slow on version 11.41?

Inserting data into a table nowadays does a sort of "auto vacuum". Unused slots (due to previous deletes) may get reused for new data. That is why we don't simply do an append, but do an update instead. The update may well end up appending the data if no slots were available.
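The slot-reuse behaviour described here can be sketched in a few lines of C (a toy illustration, not MonetDB's actual code; all names are invented):

```c
/* Toy sketch of the "reuse deleted slots" policy: an insert first
 * consumes a slot freed by an earlier delete, and only appends when
 * no free slot exists. Not MonetDB code; names are made up. */
#include <assert.h>
#include <stddef.h>

#define CAP 16

typedef struct {
    int vals[CAP];
    int used[CAP];          /* 1 if the slot holds a live value */
    size_t count;           /* high-water mark (append position) */
    size_t free_slots[CAP]; /* stack of reusable slot indices */
    size_t nfree;
} table;

/* Insert v; returns the slot index used. Reusing a freed slot makes
 * the operation an update rather than a plain append. */
static size_t tbl_insert(table *t, int v) {
    size_t slot;
    if (t->nfree > 0)
        slot = t->free_slots[--t->nfree];   /* reuse a gap */
    else
        slot = t->count++;                  /* append */
    t->vals[slot] = v;
    t->used[slot] = 1;
    return slot;
}

static void tbl_delete(table *t, size_t slot) {
    t->used[slot] = 0;
    t->free_slots[t->nfree++] = slot;       /* remember the gap */
}
```

An insert therefore behaves like an update whenever earlier deletes left gaps behind, which is why the statement does not reduce to a simple append.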

Here's the trace (75 minutes this time):

@sjoerdmullender Indeed, many rows were deleted from the receiving table. @PedroTadim At the moment I don't have an 11.41 deployment to test this on, and it takes hours to get to the state where this happens. So unless really necessary I would avoid that.

It looks like some 134000 rows were added. How big is the receiving table?

The receiving table is

While in the debugger, can you also print

This operation was in a transaction. To test again and check what you asked, I rolled back and restarted the same transaction. Is it perhaps because the receiving bat had had its gaps filled by the previous attempt, even though that transaction was rolled back?

Is

Also add

This happened yesterday already and I hoped it was a glitch, but it seems to happen consistently after this. When it happened yesterday I let it go the whole night, but it never completed. Can it be related?

Ouch! The problem above (stuck at startup) doesn't happen after the I had taken a tar of the database in the state it was just before the So I need to recreate the db from scratch, it will take a couple of hours :/

Did you tar a running database? If so, you might want to consider the

Yes, I did it with the hot snapshot.

One of the things I wanted to know is how many duplicates there are, which translates into an estimate of the hash chain lengths. But I realized I might be able to get this information from the already provided information.
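The estimate alluded to here is simple arithmetic (illustrative only; the function name is made up): under a hash, all occurrences of one value share a single chain, so the average chain length is the row count divided by the number of distinct values.

```c
#include <assert.h>

/* Average duplicate-chain length under a hash: every occurrence of a
 * value lands in that value's chain, so on average a chain holds
 * rows / distinct links. Illustrative sketch only. */
static double avg_chain_len(unsigned long long rows,
                            unsigned long long distinct) {
    return (double)rows / (double)distinct;
}
```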

I will be able to be back in gdb in a few minutes, I think. In the meantime, I am also worried about the startup issue, which does not depend on the insert.

Yes, doesn't sound too good.

Yes, I'll make a separate ticket for the other issue. Here it is:

Possibly related (and reproducible): #7254

@sjoerdmullender Is this information enough? I needed to roll back the deployment where I observed this, but I can reproduce it if we need more info.

Can you run the slow query (and perhaps also the fast one) after the query

Result of Then the same, after a rollback:

I'm trying to replicate the insert/replace issue in a reproducible script, but no luck so far. The script should more or less be like:

The debug output above would suggest that the occurs when a persistent hash is available on the receiving bat.

The hash should be persistent automatically, but only if the table on which it is created is already persistent. And there probably lies the rub: it may well be that you have to restart the server in between, i.e. after creating and filling the table, and before the point query.

This is the script I have so far (still not triggering the issue):

I have tried to stop and restart the server in between each of these steps; the hash table never seems to be persistent (I can see the hash with

The difference between the queries is that the `count(*)` splits the table in mitosis pieces, whereas the straight select doesn't. Why the difference, I don't know.

Just to be sure: how much should I trust what If I do what you said, after the point query I can see a hash in In any case, the 3 steps above + deletion + insert still do not trigger the issue, unfortunately.

I didn't look at sys.storage(), I looked at the file system. After my set of queries, there were two large hash-related files for a bat (*.thashl and *.thashb) which stay around.

Indeed, I can confirm that the hashes are persistent. But apparently something is still missing to trigger the issue.

However, now that the slow start issue is solved, I can bootstrap from the tar backup of the original db, so I can have the issue under gdb in a few minutes. Please let me know if there is something I can still check.

A remote gdb session would probably be fruitful. But let's not do that today, it's getting a bit late.

No, of course. I am actually trying a different approach right now.

I now know what causes the problem. You have a lot of duplicate values in the column, of which a whole bunch were deleted. Now that you're trying to insert new values, those slots get reused. It is at this point that the old value is deleted from the hash table. In order to find the value in the chain, we have to go through a long list because of all the duplicates. This apparently happens for a lot of slots, so that is taking a lot of time. If you're adding new values that also already occur many times, we will need to go through the list again to find the proper place to add a link in the chain (the chains are ordered).

I just checked: the value that had been deleted and is now being replaced is 23607200. There are still 15643023 occurrences of that value in the table, not counting the already deleted ones. That's a very long chain to have to go through, so perhaps this is a possible trigger to delete the hash.
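Why a single replacement is so expensive can be modelled with a toy chain (a hypothetical layout, not GDK's real hash structure): all duplicates of one value share one chain, so unlinking a given slot means visiting every link that precedes it.

```c
/* Toy model of a per-value duplicate chain; not GDK's actual layout.
 * chain_delete counts how many links it must visit to unlink a slot. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct link { size_t slot; struct link *next; } link;

/* Unlink `slot` from the chain; *steps counts links visited. */
static link *chain_delete(link *head, size_t slot, size_t *steps) {
    link **pp = &head;
    *steps = 0;
    while (*pp) {
        ++*steps;
        if ((*pp)->slot == slot) {
            link *dead = *pp;
            *pp = dead->next;   /* splice the link out */
            free(dead);
            break;
        }
        pp = &(*pp)->next;
    }
    return head;
}

static link *chain_push(link *head, size_t slot) {
    link *n = malloc(sizeof *n);
    assert(n != NULL);
    n->slot = slot;
    n->next = head;
    return n;
}
```

With 15643023 duplicates of value 23607200, each reused slot pays a walk on that order of magnitude, which is consistent with the 45-to-75-minute timings reported above.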
…ove hash. This fixes bug #7253.

I pushed a fix. The fix involves throwing away the hash table in certain conditions (more than 1000 links in the collision/duplicates list). This value of 1000 is tunable using the
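The policy of the fix can be stated in a couple of lines. The threshold of 1000 comes from the comment above, but the constant and function names here are hypothetical (the actual name of the tunable is in the elided text):

```c
/* Sketch of the fix's policy; CHAIN_LIMIT is a hypothetical stand-in
 * for the real tunable mentioned above. */
#include <assert.h>
#include <stddef.h>

#define CHAIN_LIMIT 1000

typedef enum { VIA_HASH, DROP_HASH } update_path;

/* If the duplicate chain for the value being replaced is longer than
 * the limit, maintaining the hash costs more than rebuilding it later,
 * so the hash is thrown away and the update proceeds without it. */
static update_path choose_path(size_t chain_len) {
    return chain_len > CHAIN_LIMIT ? DROP_HASH : VIA_HASH;
}
```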

Yep, seems fixed indeed, thanks!
Describe the bug

An `INSERT INTO <table> SELECT ...` statement that takes less than 1 second in 11.39 takes about 45 minutes in 11.43. I could not yet make a reproducible script for the issue; I see it happening regularly on a production ETL.

The query is

where:
What `gdb` says:

This is `b`:

This is `n`:

This is `p`:

It seems that the time is spent appending the new values one by one, calling `HASHdelete_locked()` and `HASHinsert_locked()` every time. I don't really understand why this operation doesn't just result in a simple `BATappend()`. What in the original query should trigger a row-at-a-time update?
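A back-of-envelope cost model (plain arithmetic, purely illustrative) makes the gap concrete: the row-at-a-time path pays one duplicate-chain walk per inserted row, while a bulk `BATappend()`-style path pays a constant amount per row.

```c
#include <assert.h>

/* Row-at-a-time path: each insert walks the duplicate chain once.
 * Illustrative cost model only, not MonetDB code. */
static unsigned long long cost_row_at_a_time(unsigned long long rows,
                                             unsigned long long chain_len) {
    return rows * chain_len;
}

/* Bulk-append path: each row is a straight copy, no hash walk. */
static unsigned long long cost_bulk_append(unsigned long long rows) {
    return rows;
}
```

With the numbers reported in this thread (about 134000 inserted rows and a duplicate chain of 15643023 links), the row-at-a-time path does on the order of 2 trillion link visits versus 134000 copies for the bulk path.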
Software versions