DBS3Upload component keeps running during the injection of not valid files coming from a race condition on DBS server side #11107
Comments
@jhonatanamado you outlined three different issues here:
Therefore, the latter two issues should disappear in new workflow submissions, since I put the fixes into the new DBS server version, which is already deployed on testbed. The original issue, that WMCore should raise an exception, is what this ticket should address.
@jhonatanamado Jhonatan, the database constraint error is likely something that shouldn't even be reported to the end user. It's also likely not the error we should be looking at (there are a few errors nested in the error message, as far as I could follow). Anyhow, I see that Valentin has made some changes to the DBSWriter code and we have a new dbs2go release in testbed. Before doing any further investigation, could you please trigger a new replay (and ensure the output datasets will be unique, which I think is already done in your workflow)? Please let us know about the outcome. Thanks
As requested, I deployed a second replay in testbed, ensuring that the output datasets are unique, with some runs that put considerable pressure on the server. Checking the ComponentLog, I saw some Oracle errors, including:
@jhonatanamado thanks for running another test. From the log you posted, this error is strange to me (even though not critical):
Regarding this error message:
I have two comments to make:
Last but not least, the
Regarding Oracle errors:
I'm not sure you answered the questions I raised above. To add to this: given that we are inserting one unit, which is the block and its related data/metadata, I think it should all be done within a single transaction. If you split it into multiple transactions, how can you roll back a transaction that has already been completed and committed? In addition, I do not think database-layer information should be returned to the end user. The end user does not need to know anything about the database, tables, columns, constraints and so on. What we care about is whether a block, a file, or a processing_string failed, succeeded, or is duplicated, and so on. If I understand correctly what is done on the server side, you:
Is that correct? If so, it means that the client could get 10 error messages back because the very first insertion (e.g. dataset id) failed, cascading this error to all the other inserts running concurrently.
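To make the single-transaction argument concrete, here is a minimal sketch using Python's standard `sqlite3` module as a stand-in for the real Oracle backend. The table layout, `insert_block`, and `BlockInjectionError` are hypothetical names made up for illustration; they are not the dbs2go or DBSWriter schema or API. The point is only that one failing insert rolls back the whole block, and the caller gets a single block-level error rather than database internals.

```python
import sqlite3


class BlockInjectionError(Exception):
    """Hypothetical user-facing error: names the block, hides DB details."""


def make_db():
    # In-memory database with a simplified, made-up schema (assumption).
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE datasets (name TEXT PRIMARY KEY);
        CREATE TABLE blocks (name TEXT PRIMARY KEY, dataset TEXT NOT NULL);
        CREATE TABLE files (lfn TEXT PRIMARY KEY, block TEXT NOT NULL);
        """
    )
    return conn


def insert_block(conn, dataset, block, lfns):
    """Insert the dataset, the block and its files as one unit of work."""
    try:
        # The connection context manager commits if the body succeeds
        # and rolls back every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "INSERT OR IGNORE INTO datasets (name) VALUES (?)", (dataset,)
            )
            conn.execute(
                "INSERT INTO blocks (name, dataset) VALUES (?, ?)",
                (block, dataset),
            )
            conn.executemany(
                "INSERT INTO files (lfn, block) VALUES (?, ?)",
                [(lfn, block) for lfn in lfns],
            )
    except sqlite3.IntegrityError as exc:
        # Report one block-level failure instead of the constraint name.
        raise BlockInjectionError(
            f"block {block} could not be injected (duplicate or invalid data)"
        ) from exc
```

With this shape, a duplicate file in the second block leaves no trace of that block or its other files in the database, so there is nothing half-committed to clean up afterwards.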
We had a chat with Alan and agreed that dataset output configurations will be injected within a single transaction. I rolled back my changes on testbed. I also provided a temporary solution for the race conditions; please see my comments here: #11106 (comment)
Regarding the @jhonatanamado as there is no real problem here, besides the future code refactoring to be addressed in #11106, shall we close this ticket?
Yes @amaltaro, I'm closing this ticket.
Impact of the bug
T0 Agent, WMAgent, Agents/Services using DBS3Upload
Describe the bug
While testing a replay against the dbs2go server, we found that the DBS3Upload component of the Tier0 agent keeps running when the following error appears in the component log.
Later, all other calls injecting data into DBS succeeded and the replay finished without issues (all workflows were marked as archived). However, for the block
/ExpressCosmics/Tier0_REPLAY_2022-Express-v37/FEVT#7555befd-7860-4940-b23b-a1ad71d30eac
the files in dasgoclient appear as not valid. More information about this race condition is described in #11106
How to reproduce it
This error appears only when a new replay version is set. For datasets/blocks already created in DBS, the injection of new files succeeds.
Expected behavior
The component should raise an exception if the injection is not possible.
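As a rough illustration of this expectation, the sketch below shows an upload loop that records per-block failures and raises once at the end instead of carrying on silently. `upload_blocks`, `BlockUploadError`, and the `inject` callable are hypothetical names used only for this example; they are not the actual DBS3Upload or DBS client API.

```python
import logging


class BlockUploadError(Exception):
    """Hypothetical error raised when one or more blocks fail to inject."""

    def __init__(self, failed):
        self.failed = failed  # maps block name to error text
        super().__init__(f"{len(failed)} block(s) failed: {sorted(failed)}")


def upload_blocks(blocks, inject):
    """Call inject(block) for every block, then raise if any call failed.

    `inject` is an assumed stand-in for whatever call sends a block to
    the DBS server; it is expected to raise on failure.
    """
    failed = {}
    for block in blocks:
        try:
            inject(block)
        except Exception as exc:
            # Log and keep going so the remaining blocks are still tried.
            logging.error("Injection failed for block %s: %s", block, exc)
            failed[block] = str(exc)
    if failed:
        # Surface the failure to the caller instead of finishing quietly.
        raise BlockUploadError(failed)
```

Whether the component should stop at the first failure or try the remaining blocks first is a design choice; this sketch tries them all so that one bad block does not hold up the others, but still ends the cycle with an exception.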
Additional context and error message
Full component log can be found in
/afs/cern.ch/user/c/cmst0/public/WMCoreIssues/DBS2GO