
Update data files for DropBoxMetadata tag #40522

Merged
1 commit merged on Jan 29, 2023

Conversation

malbouis
Contributor

PR description:

This is to update the JSON files that are used to produce the DropBoxMetadata tag. Some of the tags (SiStripBadChannel_PCL_v1_prompt, BeamSpotObjects_PCL_byRun_v1_prompt, BeamSpotObjects_PCL_byLumi_v1_prompt) had pcl synchronisation in Prep DB and this was preventing upload from Tier0 replays. This PR replaces the faulty tags and also updates a few other ones to have a different name between the Prod and Prep tags, to try to avoid such a case of wrong synchronisation in Prep DB in the future.

The difference between the new DMD tag produced with the changes introduced here and the previous one can be seen in https://cern.ch/go/LC9S

The new DMD tag has been included in the 126X_dataRun3_Express_Queue and will be picked up when a new Express GT is created.
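
For orientation, here is a hypothetical sketch of the kind of per-record entry such JSON files carry; the field names, connection placeholders and the renamed prep tag are illustrative assumptions, not the actual content of the files changed in this PR. The point is that the Prep record gets its own tag name (and a non-pcl synchronisation) so that Tier0 replay uploads are no longer blocked.

```python
# Hypothetical illustration only: field names, connection strings and the renamed
# prep tag are assumptions, not the actual content of the JSON files in this PR.
record_prod = {
    "destinationDatabase": "<prod conditions DB connection>",
    "destinationTags": {"BeamSpotObjects_PCL_byRun_v1_prompt": {}},  # unchanged prod tag, pcl sync
}
record_prep = {
    "destinationDatabase": "<prep conditions DB connection>",
    # made-up renamed tag, distinct from the prod one, with a looser synchronisation
    "destinationTags": {"BeamSpotObjects_PCL_byRun_v1_prompt_replays": {}},
}
```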

PR validation:

  • created a Candidate GT by queueing the new tag to 126X_dataRun3_Express_Queue
  • ran the RTM workflow 1001.3 with the candidate GT from above

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

Not a backport.

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-40522/33727

  • This PR adds an extra 12KB to the repository

@cmsbuild
Contributor

A new Pull Request was created by @malbouis for master.

It involves the following packages:

  • CondFormats/Common (db, alca)

@malbouis, @yuanchao, @cmsbuild, @saumyaphor4252, @ggovi, @tvami, @ChrisMisan, @francescobrivio can you please review it and eventually sign? Thanks.
@mmusich, @tocheng, @seemasharmafnal this is something you requested to watch as well.
@perrotta, @dpiparo, @rappoccio you are the release manager for this.

cms-bot commands are listed here

@malbouis
Contributor Author

please test

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b59c0c/29986/summary.html
COMMIT: 226b19c
CMSSW: CMSSW_13_0_X_2023-01-13-1100/el8_amd64_gcc11
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/40522/29986/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 7 differences found in the comparisons
  • DQMHistoTests: Total files compared: 49
  • DQMHistoTests: Total histograms compared: 3555538
  • DQMHistoTests: Total failures: 6
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3555510
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 48 files compared)
  • Checked 211 log files, 162 edm output root files, 49 DQM output files
  • TriggerResults: no differences found

@mmusich
Contributor

mmusich commented Jan 13, 2023

@malbouis
I am puzzled by this PR:

(SiStripBadChannel_PCL_v1_prompt, ...) had pcl synchronisation in Prep DB and this was preventing upload from Tier0 replays.

the corresponding prod tag for the strip bad components has pcl synchronization as well:

"SiStripBadChannel_FromOfflineCalibration_GR10_v1_prompt": {}

https://cms-conddb.cern.ch/cmsDbBrowser/list/Prod/tags/SiStripBadChannel_FromOfflineCalibration_GR10_v1_prompt

and it has been set like this on purpose.
See the "manual":

[Screenshot from 2023-01-13 22-20-42 attached]

Prompt synchronization would imply that if the payload produced by the PCL has an IoV smaller than the FCSR, it is uploaded nonetheless at FCSR+1. This is not OK for a bad-component condition that changes dynamically run by run: you don't want the bad components computed on run n to be written for run n+1, since the dynamic part driven by readout noise might be temporary.
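
To make this concrete, here is a minimal sketch of the difference (a simplified, assumed model of the uploader's decision, not the actual cmsDbUpload / Tier0 uploader code) for a payload whose IoV falls below the FCSR (first condition safe run); the run numbers are made up:

```python
# Simplified model of the uploader's synchronization check; an assumption for
# illustration, not the real upload-service implementation.
def accept_upload(sync, since, fcsr):
    """Return (accepted, effective_since) for a payload with IOV 'since'
    checked against the first condition safe run 'fcsr'."""
    if sync == "any":
        return True, since                    # no check against the FCSR
    if sync == "prompt":
        # a late payload is not rejected: it is written at FCSR+1 instead
        return True, max(since, fcsr + 1)
    if sync == "pcl":
        # a late payload is rejected outright: run-n bad components must not
        # end up being applied to run n+1
        return (since > fcsr, since if since > fcsr else None)
    raise ValueError("unknown synchronization: %s" % sync)

# A Tier0 replay processes an old run, so its 'since' is far below the FCSR:
print(accept_upload("pcl", since=355000, fcsr=362000))     # (False, None): rejected
print(accept_upload("prompt", since=355000, fcsr=362000))  # (True, 362001): moved to FCSR+1
```
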
Can you elaborate why

had pcl synchronisation in Prep DB and this was preventing upload from Tier0 replays.

?
Also, shouldn't this change be made in concert with the subsystem experts?

@malbouis
Contributor Author

malbouis commented Jan 13, 2023

Hi Marco,

as far as I understand, the Prep DB tag is only used for the replays in this case. All replays do the upload to the Prep DB. If the synchronisation of the tag being uploaded by a replay is pcl, it will not be possible to do the upload as the run from the replay is in the past, therefore the change in this PR.

I don't see anything wrong with it as we are not touching the Prod DB.

Is the Prep DB tag for the strip bad components used for anything other than the replays? If so, I would recommend moving it to Prod DB.

Thanks.

@mmusich
Contributor

mmusich commented Jan 13, 2023

it will not be possible to do the upload as the run from the replay is in the past, therefore the change in this PR.

I think the intention of the original author was to have the same setup as used in production. Is it relevant that replays re-upload the same conditions on the prep tag?

Is the Prep DB tag for the strip bad components used for anything other than the replays?

I don't think so

@malbouis
Contributor Author

I think the intention of the original author was to have the same setup as used in production. Is it relevant that replays re-upload the same conditions on the prep tag?

We would like to always make sure the upload from Tier0 is successful. At the moment, we are getting an error from the Tier0 uploader when doing the replays and this is sub-optimal.

Is the Prep DB tag for the strip bad components used for anything other than the replays?

I don't think so

Then we are good, right? Since we would like to have a successful upload from the Tier0 uploader during the replays, we could use this new tag created in this PR.

@mmusich
Contributor

mmusich commented Jan 13, 2023

Then we are good, right? Since we would like to have a successful upload from the Tier0 uploader during the replays, we could use this new tag created in this PR.

well, I am still unsure why you want to have a successful upload. The uploads were rejected without any problems in the past 10+ years of operations.

@malbouis
Contributor Author

well, I am still unsure why you want to have a successful upload. The uploads were rejected without any problems in the past 10+ years of operations.

I think it is an extra layer of checks that all is as expected in a replay. If the tag is not used for anything else, I also don't see why it cannot be used for the original intent (replays).

@mmusich
Contributor

mmusich commented Jan 13, 2023

I think it is an extra layer of checks that all is as expected in a replay. If the tag is not used for anything else, I also don't see why it cannot be used for the original intent (replays).

The fact that the upload is rejected, following the rules of pcl synchronization, is exactly what is expected to happen. Unless there is a desire to check the content of the conditions at every replay (I am not sure who would be supposed to do that), I don't see the need for the upload. It also seems dangerous to have the replay and production setups diverge, as in the future someone might be tempted to change the production environment to even them out (and that would be wrong).

@malbouis
Contributor Author

I think it is an extra layer of checks that all is as expected in a replay. If the tag is not used for anything else, I also don't see why it cannot be used for the original intent (replays).

The fact that the upload is rejected, following the rules of pcl synchronization, is exactly what is expected to happen. Unless there is a desire to check the content of the conditions at every replay (I am not sure who would be supposed to do that), I don't see the need for the upload. It also seems dangerous to have the replay and production setups diverge, as in the future someone might be tempted to change the production environment to even them out (and that would be wrong).

But if the Prep DB is for the replays, I don't see why someone would want to even out Prep and Prod in the future, knowing that Prep DB is used only for the replays. Even if they do, the synchronisation of the prod tags should be kept in any case as we cannot have a tag with 'any' synch in prod. I honestly don't see how someone would try to even out the tags between prep and prod. And I still think it is more productive to have a successful upload (to eventually check the conditions uploaded to prep; it doesn't have to be at every replay, but it might be useful at some point) than just having a failure and getting used to ignoring upload failures.

@mmusich
Contributor

mmusich commented Jan 13, 2023

Even if they do, the synchronisation of the prod tags should be kept in any case as we cannot have a tag with 'any' synch in prod.

So, you plan to leave the prep tag unsynchronized? Mmh, not a very realistic setup.

@tvami
Contributor

tvami commented Jan 15, 2023

hi @mmusich

without any problems in the past 10+ years of operations

As you know, this year we had a few replays that were "successful", but when we plugged them into real operations we had issues. Those issues could have been discovered by reading the logs. This situation triggered the T0 team to take warnings/errors in the logs more seriously.

We'd prefer not to have this error come up in every replay...
If you are worried that the tags in Prep are used for something more than technical checks, we could rename them and add forReplays or something similar to the tag name.
About synchronization... we could set them to offline: we rarely make replays on runs that are before the last replayed run, so adding an IOV before the last IOV boundary will not come up, while adding an IOV for the same run number will come up a few times.
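
For illustration, a rough sketch of what such an offline policy would check, under the same caveat as the earlier sketch (an assumed model, not the real upload service): the comparison is against the tag's last existing IOV rather than against the FCSR.

```python
# Rough model of 'offline' synchronization, for illustration only (an assumption,
# not the real policy implementation): appending at or after the tag's last IOV
# is allowed, inserting before it is not.
def accept_upload_offline(since, existing_iovs):
    last = max(existing_iovs) if existing_iovs else 0
    return since >= last

# Replays normally move forward in run number, so appending is accepted; whether
# a repeated 'since' for the same run should pass is exactly the edge case
# mentioned above (run numbers are made up):
print(accept_upload_offline(356000, [355000, 355500]))  # True: after the last IOV boundary
print(accept_upload_offline(354000, [355000, 355500]))  # False: before the last IOV boundary
```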

What do you think?

@mmusich
Contributor

mmusich commented Jan 16, 2023

This situation triggered the T0 team to take warnings/errors in the logs more seriously.

OK, that's good to know.

About synchronization... we could set them to offline: we rarely make replays on runs that are before the last replayed run, so adding an IOV before the last IOV boundary will not come up, while adding an IOV for the same run number will come up a few times.

Sometimes one does "preplays" though, right? Especially when you are not really sure whether Prompt will come out right or not. I saw that happening at least a couple of times in the past.

@malbouis
Contributor Author

I think that in this specific case of the replays the 'any' synchronisation is actually appropriate. We are not interested in the overall history of the tags but rather in each replay independently.

To summarize:

  • We would like to have clean logs from the Tier0 uploader for the replays (as a good practice not to ignore errors related to replays, and by request from the Tier0 experts)

  • For this, we think the best would be to keep all the PrepDB tags with 'any' synchronisation

I would like, if possible, to move forward with this PR. Marco, please let us know if there is any objection from your side.

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@mmusich
Contributor

mmusich commented Jan 25, 2023

seems we converged,

I wouldn't say we have converged; rather, we have established that we have different perceptions of what actually matters to test in a replay. Having said that, this PR is rather inconsequential, as the actual change in the Tier0 replay behavior hinges on a change of the Global Tag used there and not on the content of the release.

@rappoccio
Contributor

hold

  • We probably should discuss this strategy; it would be good to have everyone on board with the plan.

@cmsbuild
Contributor

Pull request has been put on hold by @rappoccio
They need to issue an unhold command to remove the hold state or L1 can unhold it for all

@francescobrivio
Contributor

Ciao @mmusich,

in that way the synchronization of the condition uploader daemon is not tested.

but the synchronization checks are not a feature of the Tier0 daemon, right?
AFAIU the sync is a feature of the tags themselves and the checks are performed by the cmsDbUpload service (whether the upload request comes from uploadConditions.py or from the Tier0 daemon).
So I believe that a replay, which is by definition out of sync (at least in my mind) given it is always deployed on old data, is not really the best way to test the synchronization upload policies.

that we have different perceptions of what actually matters to test in a replay.

Yes, I think that's the case 😄 As Helena mentioned already earlier in the thread, with this PR we want to make sure the replays are able to upload conditions at each replay, i.e. to check that there is no issue with the upload itself (e.g. possible authentication failures, old passwords in Tier0, etc..., all cases that happened in 2021 after the LS2 break).

@tvami
Contributor

tvami commented Jan 25, 2023

Hi @rappoccio

Having said that, this PR is rather inconsequential,

I see this comment from Marco as no objection to getting this PR merged.

@mmusich
Contributor

mmusich commented Jan 25, 2023

but the synchronization checks are not a feature of the Tier0 daemon, right?

well, no.
The "synchronization" you are referring to is a field in the database, but that is what is checked (together with the information of the FCSR) by the uploader setup to decide whether upload or not. So it is a feature of the uploading system - in the case of a Tier-0 replay of the condtion uploader daemon.

given it is always deployed on old data, is not really the best way to test the synchronization upload policies.

why not? Part of the PCL concept hinges precisely on the "too late to be accepted" (see detailed explanation at #40522 (comment))

(e.g. possible authentication failures, old passwords in Tier0, etc..., all cases that happened in 2021 after the LS2 break).

all of this can be checked as well with an upload failure - just read the logs :)

@tvami
Contributor

tvami commented Jan 25, 2023

all of this can be checked as well with an upload failure - just read the logs :)

but the point is that if the T0 team makes errors pop up based on the logs, then all replays will be unsuccessful because of a known feature

@mmusich
Contributor

mmusich commented Jan 25, 2023

then all replays will be unsuccessful because of a known feature

Certainly the log parsing can be adapted to distinguish between an expected failure and a real failure.

@malbouis
Contributor Author

@mmusich, AlCaDB would really like to have this PR merged, as we have been dealing with many replays and this would ease their evaluation for us.
Do you have strong objections to merging this PR?
We can of course re-discuss this matter in the future and re-evaluate the decision taken here.
But for now, we would really appreciate being able to merge it as is.
Please let us know.

@mmusich
Contributor

mmusich commented Jan 27, 2023

Do you have strong objections to merging this PR?

@malbouis, as far as I can tell nothing is holding this PR.
I am still a bit dubious about the implied consequences. I hope that the change in the replay behaviour will be properly announced to the subsystem experts who might not be following the GitHub thread, and that once the policy change is enforced, the now "successful" replay uploads will be actively scrutinized, not relying only on the return code of the uploader.

@tvami
Contributor

tvami commented Jan 27, 2023

@rappoccio
given #40522 (comment), can you please unhold?

@perrotta
Contributor

unhold

@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@perrotta
Contributor

+1
