revisit the subscription finishing query #4865

ticoann · 2013-12-02T22:06:45Z

https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMBS/MySQL/Subscriptions/GetAndMarkNewFinishedSubscriptions.py
Not sure how can it be improved or what other things need to be changed.
There are a couple of problems on this query.

performance. It takes ~4hours to finish the query sometimes.
hard to know where the condition fails. (Whether is is not finished because of child subscription is not done or current subscription is not finished).

Will it be helpful to break down the query?

Dirk could you look at this if you have a chance? If you don't have time, could you give me some hint to improve this.

hufnagel · 2013-12-02T23:40:49Z

It would be good to make a snapshot of a database when it is in a state where the query takes a very long time. I can take a look at things, but without being able to test various different query optimizations and look at the execution plans and trace the queries, it's hard to make any improvements.

If this is encountered in an active agent, just shutdown the agent and let me know. I can then open a ticket with Oracle support to make a copy of the db and we can restart the agent after. It will mean at least a few hours downtime though.

ticoann · 2013-12-03T04:22:47Z

We probably need to do that. The one I was looking at (vocms201) currently takes more than ~23h and going for this query to finish. I have to ask ops whether we can shutdown the agent completely for a few hours. So I guess it is not possible to copy db while agent is running. Anyway, I will talk to ops tomorrow.

hufnagel · 2013-12-03T08:47:54Z

I think at the very least one has to lock the db while the copy is running. Otherwise the copied information would not be internally consistent, because you would allow changes in some already copied tables while other not yet copied tables haven't been copied yet and that would mean the target database might have inconsistent information (FK relationships missing etc). If you can let me know the database, I can open a ticket with CERN Oracle Support and ask, they might have a way (blocking the application for a while might be preferable to shutting the agent down outright).

hufnagel · 2013-12-03T09:10:39Z

Btw, I have been observing this from the other end. On the weekly CMSR Oracle production use statistics reports send by the CERN Oracle team, the WMAgent production instances have been constantly among the highest load of all of the CMS production Oracle accounts. And we have 6 of them ! This is just an example from the other end that contributes to that, I am sure there are others. We need to do a schema review and look at how we do queries and make sure needed indexes are there and what the most busiest queries are etc etc. Otherwise we will run against a wall at some point (would say queries taking 24 hours already is hitting a wall).

ericvaandering · 2013-12-03T14:10:37Z

Yes, this is on the work plan. Probably Feb or March is when it will start.

On Dec 3, 2013, at 3:10, Dirk Hufnagel notifications@github.com wrote:

Btw, I have been observing this from the other end. On the weekly CMSR Oracle production use statistics reports send by the CERN Oracle team, the WMAgent production instances have been constantly among the highest load of all of the CMS production Oracle accounts. And we have 6 of them ! This is just an example from the other end that contributes to that, I am sure there are others. We need to do a schema review and look at how we do queries and make sure needed indexes are there and what the most busiest queries are etc etc. Otherwise we will run against a wall at some point (would say queries taking 24 hours already is hitting a wall).

—
Reply to this email directly or view it on GitHub.

ticoann · 2014-02-06T17:26:25Z

            SELECT complete_subscription.id
            FROM ( SELECT wmbs_subscription.id,
                          wmbs_subscription.fileset,
                          wmbs_workflow.name
                   FROM wmbs_subscription
                       INNER JOIN wmbs_fileset ON
                           wmbs_fileset.id = wmbs_subscription.fileset AND
                           wmbs_fileset.open = 0
                       INNER JOIN wmbs_workflow ON
                           wmbs_workflow.id = wmbs_subscription.workflow AND
                           wmbs_workflow.injected = 1
                       LEFT OUTER JOIN wmbs_sub_files_available ON
                           wmbs_sub_files_available.subscription = wmbs_subscription.id
                       LEFT OUTER JOIN wmbs_sub_files_acquired ON
                           wmbs_sub_files_acquired.subscription = wmbs_subscription.id
                       LEFT OUTER JOIN wmbs_jobgroup ON
                           wmbs_jobgroup.subscription = wmbs_subscription.id
                       LEFT OUTER JOIN wmbs_job ON
                           wmbs_job.jobgroup = wmbs_jobgroup.id AND
                           wmbs_job.state_time > 1386074466 AND
                           wmbs_job.state != 4
                   WHERE wmbs_subscription.finished = 0
                   GROUP BY wmbs_subscription.id,
                            wmbs_subscription.fileset,
                            wmbs_workflow.name
                   HAVING COUNT(wmbs_sub_files_available.subscription) = 0
                   AND COUNT(wmbs_sub_files_acquired.subscription) = 0
                   AND COUNT(wmbs_job.id) = 0 ) complete_subscription
                INNER JOIN wmbs_fileset ON
                    wmbs_fileset.id = complete_subscription.fileset
                LEFT OUTER JOIN wmbs_fileset_files ON
                    wmbs_fileset_files.fileset = wmbs_fileset.id
                LEFT OUTER JOIN wmbs_file_parent ON
                    wmbs_file_parent.parent = wmbs_fileset_files.fileid
                LEFT OUTER JOIN wmbs_fileset_files child_fileset ON
                    child_fileset.fileid = wmbs_file_parent.child
                LEFT OUTER JOIN wmbs_subscription child_subscription ON
                    child_subscription.fileset = child_fileset.fileset AND
                    child_subscription.finished = 0
                LEFT OUTER JOIN wmbs_workflow child_workflow ON
                    child_subscription.workflow = child_workflow.id AND
                    child_workflow.name != complete_subscription.name
            GROUP BY complete_subscription.id
            HAVING COUNT(child_workflow.name) = 0

1097 rows selected.

Execution Plan

Plan hash value: 1923316344

| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |

| 0 | SELECT STATEMENT | | 1 | 197 | | 557K (94)| 00:00:39 |
|* 1 | FILTER | | | | | | |
| 2 | HASH GROUP BY | | 1 | 197 | | 557K (94)| 00:00:39 |
|* 3 | HASH JOIN RIGHT OUTER | | 10976 | 2111K| | 557K (94)| 00:00:39 |
| 4 | TABLE ACCESS FULL | WMBS_WORKFLOW | 915 | 56730 | | 15 (0)| 00:00:01 |
| 5 | VIEW | | 10976 | 1447K| | 557K (94)| 00:00:39 |
|* 6 | HASH JOIN RIGHT OUTER | | 10976 | 2208K| | 557K (94)| 00:00:39 |
|* 7 | TABLE ACCESS FULL | WMBS_SUBSCRIPTION | 1806 | 21672 | | 11 (0)| 00:00:01 |
|* 8 | HASH JOIN OUTER | | 10976 | 2079K| | 557K (94)| 00:00:39 |
|* 9 | HASH JOIN OUTER | | 5590 | 998K| | 552K (95)| 00:00:38 |
| 10 | NESTED LOOPS OUTER | | 2748 | 458K| | 549K (95)| 00:00:38 |
| 11 | NESTED LOOPS | | 53 | 8480 | | 548K (95)| 00:00:38 |
| 12 | VIEW | | 53 | 8215 | | 548K (95)| 00:00:38 |
|* 13 | FILTER | | | | | | |
| 14 | HASH GROUP BY | | 53 | 6678 | | 548K (95)| 00:00:38 |
|* 15 | HASH JOIN RIGHT OUTER | | 2108M| 247G| | 95679 (71)| 00:00:07 |
|* 16 | TABLE ACCESS FULL | WMBS_JOB | 9697 | 198K| | 25277 (1)| 00:00:02 |
|* 17 | HASH JOIN RIGHT OUTER | | 2108M| 206G| | 37937 (91)| 00:00:03 |
| 18 | TABLE ACCESS FULL | WMBS_JOBGROUP | 96366 | 846K| | 241 (2)| 00:00:01 |
|* 19 | HASH JOIN | | 63M| 5805M| | 5228 (39)| 00:00:01 |
|* 20 | TABLE ACCESS FULL | WMBS_FILESET | 49732 | 339K| | 244 (3)| 00:00:01 |
|* 21 | HASH JOIN RIGHT OUTER| | 63M| 5381M| 7176K| 4006 (26)| 00:00:01 |
| 22 | TABLE ACCESS FULL | WMBS_SUB_FILES_ACQUIRED | 459K| 1793K| | 901 (2)| 00:00:01 |
|* 23 | HASH JOIN OUTER | | 249K| 20M| | 619 (3)| 00:00:01 |
|* 24 | HASH JOIN | | 1805 | 142K| | 27 (4)| 00:00:01 |
|* 25 | TABLE ACCESS FULL | WMBS_SUBSCRIPTION | 1806 | 28896 | | 11 (0)| 00:00:01 |
|* 26 | TABLE ACCESS FULL | WMBS_WORKFLOW | 905 | 58825 | | 15 (0)| 00:00:01 |
| 27 | TABLE ACCESS FULL | WMBS_SUB_FILES_AVAILABLE | 249K| 976K| | 588 (2)| 00:00:01 |
|* 28 | INDEX UNIQUE SCAN | WMBS_FILESET_PK | 1 | 5 | | 0 (0)| 00:00:01 |
| 29 | TABLE ACCESS BY INDEX ROWID | WMBS_FILESET_FILES | 52 | 572 | | 8 (0)| 00:00:01 |
|* 30 | INDEX RANGE SCAN | WMBS_FILESET_FILES_IDX_FILESET | 80 | | | 2 (0)| 00:00:01 |
| 31 | TABLE ACCESS FULL | WMBS_FILE_PARENT | 5342K| 61M| | 3719 (5)| 00:00:01 |

| 32 | TABLE ACCESS FULL | WMBS_FILESET_FILES | 5157K| 54M| | 4824 (4)| 00:00:01 |

Predicate Information (identified by operation id):

1 - filter(COUNT("CHILD_WORKFLOW"."NAME")=0)
3 - access("CHILD_SUBSCRIPTION"."WORKFLOW"="CHILD_WORKFLOW"."ID"(+))
filter("CHILD_WORKFLOW"."NAME"(+)<>"COMPLETE_SUBSCRIPTION"."NAME")
6 - access("CHILD_SUBSCRIPTION"."FILESET"(+)="CHILD_FILESET"."FILESET")
7 - filter("CHILD_SUBSCRIPTION"."FINISHED"(+)=0)
8 - access("CHILD_FILESET"."FILEID"(+)="WMBS_FILE_PARENT"."CHILD")
9 - access("WMBS_FILE_PARENT"."PARENT"(+)="WMBS_FILESET_FILES"."FILEID")
13 - filter(COUNT("WMBS_SUB_FILES_AVAILABLE"."SUBSCRIPTION")=0 AND COUNT("WMBS_SUB_FILES_ACQUIRED"."SUBSCRIPTION")=0
AND COUNT("WMBS_JOB"."ID")=0)
15 - access("WMBS_JOB"."JOBGROUP"(+)="WMBS_JOBGROUP"."ID")
16 - filter("WMBS_JOB"."STATE"(+)<>4 AND "WMBS_JOB"."STATE_TIME"(+)>1386074466)
17 - access("WMBS_JOBGROUP"."SUBSCRIPTION"(+)="WMBS_SUBSCRIPTION"."ID")
19 - access("WMBS_FILESET"."ID"="WMBS_SUBSCRIPTION"."FILESET")
20 - filter(TO_NUMBER("WMBS_FILESET"."OPEN")=0)
21 - access("WMBS_SUB_FILES_ACQUIRED"."SUBSCRIPTION"(+)="WMBS_SUBSCRIPTION"."ID")
23 - access("WMBS_SUB_FILES_AVAILABLE"."SUBSCRIPTION"(+)="WMBS_SUBSCRIPTION"."ID")
24 - access("WMBS_WORKFLOW"."ID"="WMBS_SUBSCRIPTION"."WORKFLOW")
25 - filter("WMBS_SUBSCRIPTION"."FINISHED"=0)
26 - filter("WMBS_WORKFLOW"."INJECTED"=1)
28 - access("WMBS_FILESET"."ID"="COMPLETE_SUBSCRIPTION"."FILESET")
30 - access("WMBS_FILESET_FILES"."FILESET"(+)="WMBS_FILESET"."ID")

Statistics

   442  recursive calls
     0  db block gets
667078  consistent gets
     0  physical reads
   940  redo size
 19677  bytes sent via SQL*Net to client
  1323  bytes received via SQL*Net from client
    75  SQL*Net roundtrips to/from client
    57  sorts (memory)
     0  sorts (disk)
  1097  rows processed

ghost assigned hufnagel Dec 2, 2013

ticoann mentioned this issue Feb 6, 2014

WMCore-02-2014-issues #4935

Closed

ticoann added this to the WMAgent1404 milestone Apr 3, 2014

ticoann mentioned this issue Apr 3, 2014

WMCore-04-2014-issues #4997

Closed

ticoann mentioned this issue Apr 18, 2014

improve task archiver query performance #5088

Merged

ticoann closed this as completed in #5088 Apr 27, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

revisit the subscription finishing query #4865

revisit the subscription finishing query #4865

ticoann commented Dec 2, 2013

hufnagel commented Dec 2, 2013

ticoann commented Dec 3, 2013

hufnagel commented Dec 3, 2013

hufnagel commented Dec 3, 2013

ericvaandering commented Dec 3, 2013

ticoann commented Feb 6, 2014

revisit the subscription finishing query #4865

revisit the subscription finishing query #4865

Comments

ticoann commented Dec 2, 2013

hufnagel commented Dec 2, 2013

ticoann commented Dec 3, 2013

hufnagel commented Dec 3, 2013

hufnagel commented Dec 3, 2013

ericvaandering commented Dec 3, 2013

ticoann commented Feb 6, 2014

Execution Plan

| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |

| 32 | TABLE ACCESS FULL | WMBS_FILESET_FILES | 5157K| 54M| | 4824 (4)| 00:00:01 |

Predicate Information (identified by operation id):

Statistics