revisit the subscription finishing query #4865
Comments
It would be good to take a snapshot of a database while it is in a state where the query takes a very long time. I can take a look at things, but without being able to test different query optimizations, look at the execution plans and trace the queries, it's hard to make any improvements. If this is encountered in an active agent, just shut down the agent and let me know. I can then open a ticket with Oracle support to make a copy of the db, and we can restart the agent afterwards. It will mean at least a few hours of downtime though.
We probably need to do that. On the one I was looking at (vocms201), this query has been running for more than 23 hours and is still going. I have to ask ops whether we can shut down the agent completely for a few hours. So I guess it is not possible to copy the db while the agent is running. Anyway, I will talk to ops tomorrow.
I think at the very least one has to lock the db while the copy is running. Otherwise the copy would not be internally consistent: tables that were already copied could change while other tables are still waiting to be copied, so the target database might end up with inconsistent information (missing FK relationships etc). If you can let me know the database, I can open a ticket with CERN Oracle Support and ask. They might have a way (blocking the application for a while might be preferable to shutting the agent down outright).
Btw, I have been observing this from the other end. In the weekly CMSR Oracle production usage statistics reports sent by the CERN Oracle team, the WMAgent production instances have consistently been among the highest-load CMS production Oracle accounts. And we have 6 of them! This query is just one example of what contributes to that load; I am sure there are others. We need to do a schema review, look at how we do queries, make sure the needed indexes are there, find out which queries are the busiest, etc. Otherwise we will hit a wall at some point (I would say queries taking 24 hours is already hitting a wall).
Yes, this is on the work plan. Probably Feb or March is when it will start. On Dec 3, 2013, at 3:10, Dirk Hufnagel notifications@github.com wrote:
1097 rows selected.

Execution Plan
Plan hash value: 1923316344

| Id | Operation         | Name               | Rows  | Bytes | TempSpc | Cost (%CPU) | Time     |
|  0 | SELECT STATEMENT  |                    |     1 |   197 |         |  557K  (94) | 00:00:39 |
| 32 | TABLE ACCESS FULL | WMBS_FILESET_FILES | 5157K |   54M |         |  4824   (4) | 00:00:01 |

Predicate Information (identified by operation id):
1 - filter(COUNT("CHILD_WORKFLOW"."NAME")=0)

Statistics
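The `TABLE ACCESS FULL` step on `WMBS_FILESET_FILES` in the plan above is the kind of thing a schema and index review would look at. As a rough illustration only, here is a minimal sketch of how a plan can change once a filter column is indexed. It uses Python's built-in `sqlite3` as a stand-in for Oracle, and the table layout, the index name `idx_fileset_files_fileset` and the query are simplified assumptions, not the real WMBS schema or the real finishing query. Oracle's optimizer may well choose differently on production data.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for WMBS_FILESET_FILES (assumed columns).
conn.execute("CREATE TABLE wmbs_fileset_files (fileid INTEGER, fileset INTEGER)")
conn.executemany(
    "INSERT INTO wmbs_fileset_files (fileid, fileset) VALUES (?, ?)",
    [(i, i % 100) for i in range(10000)],
)

# Illustrative query only: count the files that belong to one fileset.
query = "SELECT COUNT(*) FROM wmbs_fileset_files WHERE fileset = ?"

# Without an index the only option is to scan the whole table.
before = plan_details(conn, query, (7,))

# Hypothetical index on the filter column.
conn.execute(
    "CREATE INDEX idx_fileset_files_fileset ON wmbs_fileset_files (fileset)"
)
after = plan_details(conn, query, (7,))

print("before index:", before)
print("after index: ", after)
```

Comparing the two printed plans shows whether the engine switched from a full scan to an index lookup. The same before/after comparison would have to be repeated on a copy of the real Oracle database to be meaningful.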
https://github.com/dmwm/WMCore/blob/master/src/python/WMCore/WMBS/MySQL/Subscriptions/GetAndMarkNewFinishedSubscriptions.py
Not sure how it can be improved or what other things need to be changed.
There are a couple of problems with this query.
Would it be helpful to break down the query?
Dirk, could you look at this if you have a chance? If you don't have time, could you give me some hints on how to improve it?
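One possible way to break the query down is to isolate a single predicate and test alternative formulations of it on their own. The plan for this query contains the filter `COUNT("CHILD_WORKFLOW"."NAME")=0`, which suggests an outer join followed by a count. The sketch below, again using `sqlite3` as a stand-in for Oracle, checks that a `NOT EXISTS` form returns the same rows on a toy data set. The `workflow` table, its `parent` column and the join condition are hypothetical and are not taken from the real query. The two forms are only equivalent when the counted column is never NULL, and whether `NOT EXISTS` is actually faster on Oracle would have to be checked against real execution plans.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical parent/child workflow relation, for illustration only.
conn.executescript(
    """
    CREATE TABLE workflow (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        parent INTEGER
    );
    INSERT INTO workflow VALUES (1, 'root_a', NULL);
    INSERT INTO workflow VALUES (2, 'child_of_a', 1);
    INSERT INTO workflow VALUES (3, 'root_b', NULL);
    """
)

# Form 1: outer join plus HAVING COUNT(...) = 0, similar in shape to the
# filter shown in the execution plan.
outer_join_form = """
    SELECT w.id
    FROM workflow w
    LEFT OUTER JOIN workflow child_workflow ON child_workflow.parent = w.id
    GROUP BY w.id
    HAVING COUNT(child_workflow.name) = 0
    ORDER BY w.id
"""

# Form 2: the same condition expressed as an anti-join.
not_exists_form = """
    SELECT w.id
    FROM workflow w
    WHERE NOT EXISTS (SELECT 1 FROM workflow c WHERE c.parent = w.id)
    ORDER BY w.id
"""

childless_via_count = [row[0] for row in conn.execute(outer_join_form)]
childless_via_not_exists = [row[0] for row in conn.execute(not_exists_form)]

print(childless_via_count)
print(childless_via_not_exists)
```

Checking each piece for identical results on a small data set first makes it safer to then compare timings and plans of the candidate forms on a database snapshot.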