New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Task Query with Authorization and Identity link are slow #2217
Comments
This comment was imported from JIRA and written by user @tmetzke Copied from CAM-14092: Hello Tobias, Thank you for reply.
SELECT COUNT(1) FROM ACTREPROCDEF; --> 3; ps: As you can see tables are small in row numbers.
<explain-in-strategy.html> 120,097 sec / Lost connection... <explain-or-strategy.html> 29,106 sec / 0,0000069 sec <explain-multiples-joins-strategy.html> 0,779 sec / 0,0000050 sec Please let me know if you need something more.
Regards, Denis. |
I have the same problem in task table with 14k records. The duration of calculating the count/list request is about 1 minute and 30 seconds. SELECT COUNT(DISTINCT RES.ID_ )
FROM ACT_RU_TASK RES
LEFT JOIN ACT_RU_IDENTITYLINK I ON I.TASK_ID_ = RES.ID_
LEFT JOIN ACT_RE_PROCDEF D ON RES.PROC_DEF_ID_ = D.ID_
INNER JOIN (
SELECT A.*
FROM ACT_RU_AUTHORIZATION A
WHERE A.TYPE_ < 2
AND (
A.USER_ID_ IN ( 'user1', '*')
OR A.GROUP_ID_ IN ( 'G1', 'G2', 'G3', 'G4', 'G5', 'G6' )
)
AND (A.RESOURCE_TYPE_ = 7 AND A.PERMS_ & 2 = 2 OR A.RESOURCE_TYPE_ = 6 AND A.PERMS_ &64 = 64)
) AUTH ON AUTH.RESOURCE_ID_ IN ( RES.ID_ , D.KEY_ , '*' )
WHERE RES.SUSPENSION_STATE_ = 1
AND AUTH.RESOURCE_ID_ IS NOT NULL; In general, when combining tables, if possible, it is better to move SELECT COUNT(DISTINCT RES.ID_ )
FROM ACT_RU_TASK RES
LEFT JOIN ACT_RE_PROCDEF D ON RES.PROC_DEF_ID_ = D.ID_
INNER JOIN (
SELECT A.*
FROM ACT_RU_AUTHORIZATION A
WHERE A.TYPE_ < 2
AND (
A.USER_ID_ IN ( 'user1', '*')
OR A.GROUP_ID_ IN ( 'G1', 'G2', 'G3', 'G4', 'G5', 'G6' )
)
AND (A.RESOURCE_TYPE_ = 7 AND A.PERMS_ & 2 = 2 OR A.RESOURCE_TYPE_ = 6 AND A.PERMS_ &64 = 64)
) AUTH ON AUTH.RESOURCE_ID_ IN ( RES.ID_ , D.KEY_ , '*' )
LEFT JOIN ACT_RU_IDENTITYLINK I ON I.TASK_ID_ = RES.ID_
WHERE RES.SUSPENSION_STATE_ = 1
AND AUTH.RESOURCE_ID_ IS NOT NULL; If you agree with this idea, I will prepare PR. |
Thanks for sharing this idea. I haven't heard of it yet, but it can well be that this improves performance. Is there any reference that contributes your statement that the join order matters besides the observed execution time? If you have, it would also be nice to see query plans for the original and the refactored query. Cheers, |
The main idea is based on the number of final results per join. The lowest number of results is selected in
|
Thanks for sharing that. I'm a little surprised that the query planner isn't able to rewrite the order of joins to the most optimal one, but I guess it just cannot tell that it's irrelevant if we get less output rows after all the joins are applied. Feel free to create a pull request for that change. We should create a separate ticket for this optimiziation, too. The description here is about indexes on MySQL not being applied and it's unlikely the join order changes this. Btw did you modify the query from your previous comment, such as removing predicates? I'm asking because the join with |
Sorry, I made the change in the source code and tested it on MS SQL Server, but apparently two predicates were not sent when writing the comment!
SELECT COUNT(DISTINCT RES.ID_ )
FROM ACT_RU_TASK RES
LEFT JOIN ACT_RU_IDENTITYLINK I ON I.TASK_ID_ = RES.ID_
LEFT JOIN ACT_RE_PROCDEF D ON RES.PROC_DEF_ID_ = D.ID_
INNER JOIN (
SELECT A.*
FROM ACT_RU_AUTHORIZATION A
WHERE A.TYPE_ < 2
AND (
A.USER_ID_ IN ( 'user1', '*')
OR A.GROUP_ID_ IN ( 'G1', 'G2', 'G3', 'G4', 'G5', 'G6' )
)
AND (A.RESOURCE_TYPE_ = 7 AND A.PERMS_ & 2 = 2 OR A.RESOURCE_TYPE_ = 6 AND A.PERMS_ &64 = 64)
) AUTH ON AUTH.RESOURCE_ID_ IN ( RES.ID_ , D.KEY_ , '*' )
WHERE I.TYPE_ = 'candidate'
AND I.GROUP_ID_ IN (N'G1')
AND RES.SUSPENSION_STATE_ = 1
AND AUTH.RESOURCE_ID_ IS NOT NULL;
SELECT COUNT(DISTINCT RES.ID_ )
FROM ACT_RU_TASK RES
LEFT JOIN ACT_RE_PROCDEF D ON RES.PROC_DEF_ID_ = D.ID_
INNER JOIN (
SELECT A.*
FROM ACT_RU_AUTHORIZATION A
WHERE A.TYPE_ < 2
AND (
A.USER_ID_ IN ( 'user1', '*')
OR A.GROUP_ID_ IN ( 'G1', 'G2', 'G3', 'G4', 'G5', 'G6' )
)
AND (A.RESOURCE_TYPE_ = 7 AND A.PERMS_ & 2 = 2 OR A.RESOURCE_TYPE_ = 6 AND A.PERMS_ &64 = 64)
) AUTH ON AUTH.RESOURCE_ID_ IN ( RES.ID_ , D.KEY_ , '*' )
LEFT JOIN ACT_RU_IDENTITYLINK I ON I.TASK_ID_ = RES.ID_
WHERE I.TYPE_ = 'candidate'
AND I.GROUP_ID_ IN (N'G1')
AND RES.SUSPENSION_STATE_ = 1
AND AUTH.RESOURCE_ID_ IS NOT NULL; |
Not sure I understand the query plans. Both plans begin (at the right hand side) with an inner join (second join to the right) between If you could export the plans as |
Files are attached. I don't think I left anything else :) Do you get another |
Hey @mojtaba-khallash, Sorry for the delay here, I was pretty busy the last two weeks. Thanks for sharing the plans, I had a first look at them. Before I move ahead: Did you take generate these plans against a database that has a production-like amount of data? The reason I am asking is that the stats seem off. For example, the clustered index scan on |
No I did not have access to the original data and reported on the test database. The following files are executed on the original data. Of course, the data has changed since the first comment was posted, but the improvement is still visible. |
Thanks for sharing the actual plans, I had a closer look now. I can see that neither of the plans use an index to access the authorization table for this query and in both queries the authorization check is the expensive operation (nested loop join (node id 6) in the original query plan; nested loops join (node id 7) and key lookup (node id 28) in the refactored query plan). I believe that two factors contribute to the costs: They use table scans on ACT_RU_AUTHORIZATION (or key lookups for a very large number of rows) and they use nested loops joins (instead of e.g. more efficient merge joins). The improvement you report is okay, but not really an order of magnitude better. 90 secs vs 30 secs is not a drastic improvement, of course it's noticable in real life, but it doesn't seem like solving the root problem that we see here. In that sense, I'm rather reluctant to rewrite the query like proposed and would rather look in other directions that may improve the query more. Picking up on the analysis from the first paragraph, there are two ways:
In the engine, we already have two indexes:
Neither of them is used. I would guess because the predicate is on both USER_ID_ and GROUP_ID_ and is an OR condition. Maybe one index that uses both fields would work better. I recommend to look into this with a DBA, since this depends on more factors than what we can see in the query plan (e.g. table cardinalities and data distribution) and I cannot claim I am a SQL server expert. |
This issue was imported from JIRA:
What is this name?
This pseudonym name was generated based on the user name in JIRA to protect the personal data of our JIRA users. You can use this identifier to search for issues by the same reporter.The task query in combination with authorization and identity link are slow on MySQL.
To reproduce start 5000 process instances of the invoice process.
With authorization the query returns a result after 30s, without authorization after 1 s.
On 7.4 with the old authorization query it runs as well much faster.
Problem is that on this combination the existing indices are not used by MySQL.
Hint:
Since we test in the performance test suite the task query with authorization and 50k task.
I assume that the combination with the identity link table is the problem.
Links:
The text was updated successfully, but these errors were encountered: