make a T0-like relval for 2018 HI #24619

slava77 · 2018-09-21T12:12:52Z

[as a follow up to the discussion in the joint ops meeting on Sep 21]
please add a relval matrix workflow to be used for testing T0-like setup in offline environment and IBs. Initial setup can be based on the 2018 MD3 HI test runs.
Once we start running, another or this setup can be updated to use actual data.

@franzoni

cmsbuild · 2018-09-21T12:13:05Z

A new Issue was created by @slava77 Slava Krutelyov.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

slava77 · 2018-09-21T12:13:06Z

assign pdmv

cmsbuild · 2018-09-21T12:13:18Z

New categories assigned: pdmv

@prebello,@pgunnell,@zhenhu you have been requested to review this Pull request/Issue and eventually sign? Thanks

slava77 · 2018-09-21T12:38:17Z

the best setup for this would be to use the files with data content before repacking of the FED data so that this relval test also includes the reHLT part as well

zhenhu · 2018-09-21T12:43:33Z

Hi @slava77 , could you please give us a bit more information about this workflow? such as the input, the era, the conditions, etc.

slava77 · 2018-09-21T13:18:04Z

assign alca,hlt

I'm adding AlCa to the thread to provide inputs on the conditions and perhaps ALCA parts of the workflows.
I'm also adding HLT to suggest a configuration.

I think that a fraction of this workflow will be like wf 140.55, only using Run2_2018_pp_on_AA era and scenario --pp

cmsbuild · 2018-09-21T13:18:14Z

New categories assigned: hlt,alca

@lpernie,@franzoni,@pohsun,@tocheng,@Martin-Grunewald,@fwyzard you have been requested to review this Pull request/Issue and eventually sign? Thanks

slava77 · 2018-09-21T13:19:20Z

@mandrenguyen @icali
please follow and/or advise here as well in order to get the right inputs

zhenhu · 2018-09-21T13:51:41Z

I chatted with @mmusich and got some receipt on how to modify wf 140.55.

We will update the input dataset with the MD3 data
We need a menu from HLT which does the hybrid ZS + repacking (something similar to HYBRIDZSHI2015)
The rest steps in 140.55, such as 'RECOHID15','HARVESTDHI15', can be reused. But we need to change the conditions for 2018 detector.
What we are missing is mainly the 2nd item above. Please let me know if you have any comments.

icali · 2018-09-21T17:30:55Z

in the HLT menu that was provided was the /cdaq/special/HeavyIonTest2018/TS2TestFull/V4. However, the menu contains data with the MB ReducedFormat collection that however still have the same rawDataRepacker name.

Any advice on how we can create the configuration able to manage both the rawDataRepacker and rawDataRepackerReducedFormat collections?

We have been advices to use the module FWCore.ParameterSet.MassReplaceInputTag. In which part of the workflow can we apply it? probably in the T0 repacking step would be the best place so to have all the RAW data on tape with the same name collection.

slava77 · 2018-09-21T20:49:14Z

in the HLT menu that was provided was the /cdaq/special/HeavyIonTest2018/TS2TestFull/V4. However, the menu contains data with the MB ReducedFormat collection that however still have the same rawDataRepacker name.

Any advice on how we can create the configuration able to manage both the rawDataRepacker and rawDataRepackerReducedFormat collections?

I'm a bit confused about the possible variety of content.
Do we have a one-to-one mapping of PDs to output FED collection names? (one PD <-> one FED name).
If so, the T0 configuration will just need to deal with a different FED collection name pickup.

We have been advices to use the module FWCore.ParameterSet.MassReplaceInputTag. In which part of the workflow can we apply it? probably in the T0 repacking step would be the best place so to have all the RAW data on tape with the same name collection.

IIRC, using MassReplaceInputTag and doing something at T0 repacking steps are two different solutions

instruct the repacking job to "rename" rawDataRepackerReducedFormat to rawDataCollector (the standard name) by copying or with an EDAlias. This way all downstream consumers of this data will be able to read the FED data without any modifications. @Dr15Jones can EDAlias be used to "rename" a product made in another process?

in the regular T0 (or rereco) configuration apply MassReplaceInputTag depending on which FED collection name the input dataset has.

recall that for the HI configuration we use the ConfigBuilder --repacked option to make this renaming of input tags from rawDataRepacker to rawDataCollector

similar to this, one can add --reduced flag and copy-paste/edit the implementation

cmssw/Configuration/Applications/python/Options.py

Lines 285 to 289 in 1cd19ae

    
           expertSettings.add_option("--repacked", 
        
                                     help="When the input file is a file with repacked raw data with label rawDataRepacker", 
        
                                     action="store_true", 
        
                                     default=False, 
        
                                     dest="isRepacked"

and

cmssw/Configuration/Applications/python/ConfigBuilder.py

Lines 2219 to 2223 in 1cd19ae

    
           if self._options.isRepacked: 
        
               self.pythonCfgCode +="\n" 
        
               self.pythonCfgCode +="from Configuration.Applications.ConfigBuilder import MassReplaceInputTag\n" 
        
               self.pythonCfgCode +="MassReplaceInputTag(process)\n" 
        
               MassReplaceInputTag(self.process)

with MassReplaceInputTag(process, new="rawDataReducer")

My guess is that option 1. will require more time than we have to develop and will attempt to add complexity to the repacking step with an increased chance of errors. Errors in the repacking step are likely a complete loss of data.
Option 2. is more practical.

Dr15Jones · 2018-09-21T21:12:09Z

@Dr15Jones can EDAlias be used to "rename" a product made in another process?

No.

icali · 2018-09-22T06:41:13Z

adding @FHead and @stahlleiton that are going to implement the HLT menu.

in the HLT menu that was provided was the /cdaq/special/HeavyIonTest2018/TS2TestFull/V4. However, the menu contains data with the MB ReducedFormat collection that however still have the same rawDataRepacker name.
Any advice on how we can create the configuration able to manage both the rawDataRepacker and rawDataRepackerReducedFormat collections?

I'm a bit confused about the possible variety of content.
Do we have a one-to-one mapping of PDs to output FED collection names? (one PD <-> one FED name).
If so, the T0 configuration will just need to deal with a different FED collection name pickup.

Yes, there is going a one-to-one mapping of PDs to output FED collection. In order to close the loop, I would propose the following naming convention:

PDs without the HI in the name (standard pp name): rawDataCollector
PDs with HI in the name and without the mention ReducedFormat: rawDataRepacker
PDs with HI in the name and with the mention ReducedFormat: rawDataReducedFormat (I removed to Repacker from the previous proposed name to shorten the name)

Please let us know if this convention looks reasonable to you. Any name suggestion for the reduced format collection/PDs is more than welcome.

We have been advices to use the module FWCore.ParameterSet.MassReplaceInputTag. In which part of the workflow can we apply it? probably in the T0 repacking step would be the best place so to have all the RAW data on tape with the same name collection.

IIRC, using MassReplaceInputTag and doing something at T0 repacking steps are two different solutions
instruct the repacking job to "rename" rawDataRepackerReducedFormat to rawDataCollector (the standard name) by copying or with an EDAlias. This way all downstream consumers of this data will be able to read the FED data without any modifications. @Dr15Jones can EDAlias be used to "rename" a product made in another process?
in the regular T0 (or rereco) configuration apply MassReplaceInputTag depending on which FED collection name the input dataset has.
recall that for the HI configuration we use the ConfigBuilder --repacked option to make this renaming of input tags from rawDataRepacker to rawDataCollector
similar to this, one can add --reduced flag and copy-paste/edit the implementation
  [cmssw/Configuration/Applications/python/Options.py](https://github.com/cms-sw/cmssw/blob/1cd19ae3348cf8de750490004c266d2e8b48b328/Configuration/Applications/python/Options.py#L285-L289)


    Lines 285 to 289
  in
  [1cd19ae](/cms-sw/cmssw/commit/1cd19ae3348cf8de750490004c266d2e8b48b328)





    
      
       expertSettings.add_option("--repacked", 
    

    
      
                                 help="When the input file is a file with repacked raw data with label rawDataRepacker", 
    

    
      
                                 action="store_true", 
    

    
      
                                 default=False, 
    

    
      
                                 dest="isRepacked" 
and
  [cmssw/Configuration/Applications/python/ConfigBuilder.py](https://github.com/cms-sw/cmssw/blob/1cd19ae3348cf8de750490004c266d2e8b48b328/Configuration/Applications/python/ConfigBuilder.py#L2219-L2223)


    Lines 2219 to 2223
  in
  [1cd19ae](/cms-sw/cmssw/commit/1cd19ae3348cf8de750490004c266d2e8b48b328)





    
      
       if self._options.isRepacked: 
    

    
      
           self.pythonCfgCode +="\n" 
    

    
      
           self.pythonCfgCode +="from Configuration.Applications.ConfigBuilder import MassReplaceInputTag\n" 
    

    
      
           self.pythonCfgCode +="MassReplaceInputTag(process)\n" 
    

    
      
           MassReplaceInputTag(self.process) 
with MassReplaceInputTag(process, new="rawDataReducer")
My guess is that option 1. will require more time than we have to develop and will attempt to add complexity to the repacking step with an increased chance of errors. Errors in the repacking step are likely a complete loss of data.
Option 2. is more practical.

I personally would have preferred option 1 because it would simplify the operations for any future RAW data manipulation. However, if it is less "safe", lets go with option 2. Could it be possible to add also the second collection name to the same --repacker flag? It would/could simplify the operation for future raw processing.
Thank you again!

slava77 · 2018-09-22T13:39:50Z

@Dr15Jones can EDAlias be used to "rename" a product made in another process?

No.

Is it possible to have an EDAlias specific to output file?
Let's say, we are writing in a given process the same type of product from producerA and producerB: producerA goes to file A, producerB to file B. I would like to make the consumers of file A or B to get this product with the same InputTag.

slava77 · 2018-09-22T13:42:47Z

Please let us know if this convention looks reasonable to you. Any name suggestion for the reduced format collection/PDs is more than welcome.

these look OK to me

Could it be possible to add also the second collection name to the same --repacker flag? It would/could simplify the operation for future raw processing.

I think this will work even better, it just needs a bit more creative coding (not just copy-paste/replace that I proposed now).

slava77 · 2018-09-22T13:45:48Z

[in the T0 repacking step] 1. instruct the repacking job to "rename" rawDataRepackerReducedFormat to rawDataCollector (the standard name) by copying or with an EDAlias. This way all downstream consumers of this data will be able to read the FED data without any modifications.

I personally would have preferred option 1 because it would simplify the operations for any future RAW data manipulation.

@drkovalskyi @hufnagel may want to comment on feasibility of this request for the T0 developments (to be delivered in ~4 weeks).

fwyzard · 2018-09-22T14:01:17Z

On Fri, 21 Sep 2018, 22:50 Slava Krutelyov, ***@***.***> wrote: My guess is that option 1. will require more time than we have to develop

and will attempt to add complexity to the repacking step with an increased chance of errors. Errors in the repacking step are likely a complete loss of data. Doesn't it "just" require 3 different repacking configurations, and a mapping between dataset name and which configuration to use?

Option 2. is more practical.

Actually, Option 2. pushes the complexity to all present and future consumers of these data. It is more practical only for the person that would have to implement Option 1.

…

slava77 · 2018-09-22T14:08:43Z

On 9/22/18 7:01 AM, Andrea Bocci wrote: On Fri, 21 Sep 2018, 22:50 Slava Krutelyov, ***@***.***> wrote: > My guess is that option 1. will require more time than we have to develop and will attempt to add complexity to the repacking step with an increased chance of errors. Errors in the repacking step are likely a complete loss of data. Doesn't it "just" require 3 different repacking configurations, and a mapping between dataset name and which configuration to use?

Yes, it's not that complicated. But making mistakes in the repacking step are particularly dangerous. So, debugging/validation requires extreme care.

> Option 2. is more practical. Actually, Option 2. pushes the complexity to all present and future consumers of these data. It is more practical only for the person that would have to implement Option 1. >

You are correct in the long run. My assessment was considering the delivery time of just about 4 weeks

hufnagel · 2018-09-23T01:15:11Z

Repack configurations are generated in

Configuration.DataProcessing.Repack

and we would need a tweak there to convert the data products somehow and generate the correct output. I don't see how producing that tweak and testing it standalone falls anywhere under Tier0 development.

Once this configuration tweak has been produced and is integrated into Config.DP, this would turn into a Tier0 testing/validation issue though. But I don't see that as a big problem assuming the previous standalone testing was thorough.

davidlange6 · 2018-09-23T08:39:44Z

eg, the tier0 work is to create a matrix of configurations for repack just as there is for prompt (which does go wrong from time to time..) - how much time do people have to notice repack errors before data falls on the floor?

…

On Sep 23, 2018, at 3:15 AM, Dirk Hufnagel ***@***.***> wrote: Repack configurations are generated in Configuration.DataProcessing.Repack and we would need a tweak there to convert the data products somehow and generate the correct output. I don't see how producing that tweak and testing it standalone falls anywhere under Tier0 development. Once this configuration tweak has been produced and is integrated into Config.DP, this would turn into a Tier0 testing/validation issue though. But I don't see that a big problem assuming the previous standalone testing was thorough. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

hufnagel · 2018-09-23T15:14:44Z

You mean you wouldn't be able to handle this within one repack configuration that auto-detects what it's supposed to be doing ? Why not ?

Next level would be passing a parameter to Configuration.DataProcessing.Repack that configures whether or not we get a standard repack or this new thing. Yeah, that would need Tier0 development work.

Repack errors that crash CMSSW cause paused jobs which block streamer deletions. As long as we don't run out of space at P5 it even blocks streamer deletion there. A repack error due to bad configuration almost certainly isn't recoverable within the same Tier0 instance though. You are talking about having to do recovery replays here.

The real problematic case is repack errors that don't crash CMSSW. You have 7 days to notice that normally, but if we are very busy it could be less (since we delete streamers more aggressively then).

fwyzard · 2018-09-23T16:35:49Z

You mean you wouldn't be able to handle this within one repack configuration that auto-detects what it's supposed to be doing ? Why not ?

I would say because you need to look at the data to figure out what to do, and you cannot do that at configuration level.

hufnagel · 2018-09-23T22:31:02Z

Frankly, the only piece of the Tier0 that looks at the data are the CMSSW jobs. Nothing else cares about how the 0's and 1's are organized in the data files.

So how would this work then ? We trust that HLT puts such data into a special stream and we configure that stream to be repacked in a special way ?

Either way, before there can be any Tier0 development here, someone needs to create a valid repack configuration for this data. That repack configuration needs to be tested standalone and then needs to be integrated into Config.DP.Repack (add a parameter to activate it that is false by default for instance).

Once all of that is in place, doing the Tier0 development work to create a config flag to enable this for a stream and use it shouldn't be that much work. Few days for the code changes, longer for the Tier0Ops validation/testing (could be much longer if the Config.DP.Repack changes weren't done correctly before).

fwyzard · 2018-09-23T22:40:43Z

On 24 September 2018 at 00:31, Dirk Hufnagel ***@***.***> wrote: So how would this work then ? We trust that HLT puts such data into a special stream and we configure that stream to be repacked in a special way ?

Yes, one would map different streams to different job configurations. Either way, before there can be any Tier0 development here, someone needs

to create a valid repack configuration for this data. That repack configuration needs to be tested standalone and then needs to be integrated into Config.DP.Repack (add a parameter to activate it that is false by default for instance).

True, except I would say that the Tier-0 development and validation can happen in parallel to the development of the different repacking configurations.

hufnagel · 2018-09-23T22:55:01Z

True, except I would say that the Tier-0 development and validation can happen in parallel to the development of the different repacking configurations.

The part that can happen in parallel consists of adding a dozen code lines spread across a few places. Which is not difficult assuming you know what these few places are (which I think I do).

The vast majority of the Tier0 development will consist of running this against a new CMSSW release with various stream configurations to make sure it works correctly (extracting and looking at the generated repack configurations). It's a bit pointless starting any of this without having a CMSSW release with the Config.DP.Repack changes in place.

davidlange6 · 2018-09-24T06:22:10Z

On Sep 23, 2018, at 6:35 PM, Andrea Bocci ***@***.***> wrote: You mean you wouldn't be able to handle this within one repack configuration that auto-detects what it's supposed to be doing ? Why not ? I would say because you need to look at the data to figure out what to do, and you cannot do that at configuration level.

I'm not sure why it would be so hard to have a module look for one of N collections and use it? It does expand "replacing" to be more than just repacking, but that is the case regardless in this thread. [and the process name of the FEDRawData in raw files would change at a minimum.. but again thats likely true in any solution downstream of the HLT]

…

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

fwyzard · 2018-09-24T06:42:05Z

I'm not sure why it would be so hard to have a module look for one of N collections and use it?

Mhm, yes, I think that technically it would work. I don't know if the existing raw data collector module can cope with missing input collections, but it should be a simple extension. It does expand "replacing" to be more than just repacking, but that is the

case regardless in this thread.

[and the process name of the FEDRawData in raw files would change at a

minimum.. but again thats likely true in any solution downstream of the HLT]

Actually, if we choose what to run at configuration level, we can keep the original process name (LHC or HLT) for all the cases where the full raw data is available, and the renaming is not needed. And now I'm thinking, what if we extend the raw data collector module, or we put an EDFilter in front it, so that the renaming happens only if the data is found to have the "skimmed" name ? As long as the HLT is correctly configured to send a single raw data collection (original, zero-suppressed, or skimmed), it may work with a single repacking configuration.

davidlange6 · 2018-09-24T08:06:50Z

On Sep 24, 2018, at 8:42 AM, Andrea Bocci ***@***.***> wrote: And now I'm thinking, what if we extend the raw data collector module, or we put an EDFilter in front it, so that the renaming happens only if the data is found to have the "skimmed" name ?

its possible (as you would know better than me..) - but then you give up the consistent raw data format across data sets

fwyzard · 2018-09-24T08:27:08Z

Yes, but the inconsistency would only be in the "process name" part of the collections, which most configurations ignore anyway.

So we should be able to run the same downstream configuration on all the inputs, and still have a simple way to check from the data what one is running on (and differentiate if needed).

icali · 2018-09-25T09:31:57Z

Thank you for all the input. Not an easy decision between 1 and 2. We should consider also that the same running mode will happen also during run 3 so what is decided now will be kept for the next 4 HI runs. In this optic I'm still in favor of option 1 also if more complex to implement. As HI we can inject some manpower and update/test locally the Config.DP.Repack but we would need some guidance.

However, what is not clear to me is how it is possible to have the failure mode in which the repacking jobs are not crashing but the data results to be corrupted. The majority of our data will indeed not be reconstructed immediately. A subtile failure in the RAW data repacking would be spot only when the streamers file will not be available anymore.

Thinking out laud, do you think that option 3 could be feasible? The option 3 would be implement option 2 and include in the reco sequence a raw skimming configuration that takes rawDataReducedFormat and produces rawDataRepacker. Only the rawDataRepacker will go to tape while the original data will be delete.

davidlange6 · 2018-09-25T09:39:36Z

On Sep 25, 2018, at 11:31 AM, Ivan Amos Cali ***@***.***> wrote: Thank you for all the input. Not an easy decision between 1 and 2. We should consider also that the same running mode will happen also during run 3 so what is decided now will be kept for the next 4 HI runs. In this optic I'm still in favor of option 1 also if more complex to implement. As HI we can inject some manpower and update/test locally the Config.DP.Repack but we would need some guidance. However, what is not clear to me is how it is possible to have the failure mode in which the repacking jobs are not crashing but the data results to be corrupted. The majority of our data will indeed not be reconstructed immediately. A subtile failure in the RAW data repacking would be spot only when the streamers file will not be available anymore.

right - that is the risk of introducing complexities into the repack. Its not corrupted that I worry about, its dropped data products due to a job misconfiguration.

Thinking out laud, do you think that option 3 could be feasible? The option 3 would be implement option 2 and include in the reco sequence a raw skimming configuration that takes rawDataReducedFormat and produces rawDataRepacker. Only the rawDataRepacker will go to tape while the original data will be delete.

waiting until prompt is done to archive to tape has a much higher risk of losing the raw data.. (especially given your comments above)

…

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

slava77 · 2018-09-25T12:17:11Z

@Dr15Jones can EDAlias be used to "rename" a product made in another process?

No.

Is it possible to have an EDAlias specific to output file?
Let's say, we are writing in a given process the same type of product from producerA and producerB: producerA goes to file A, producerB to file B. I would like to make the consumers of file A or B to get this product with the same InputTag.

@Dr15Jones please comment if this is a possibility from the framework/edm side.

Dr15Jones · 2018-09-25T12:52:04Z

@slava77 it is not possible.

slava77 · 2018-09-26T12:41:38Z

Based on the inputs so far, I've been thinking of still using options "2" (editing only the reco/processing step).
One possible option here is to modify the RAW2DIGI step, or make a new one, say RAWS2RAW and have it in the standard processing for everything. This way we just do the collection renaming in the same process, not splitting it to multiple processes.

The implementation will be a modified version of RawDataCollectorByLabel. For a standard rawDataCollector FED collection name

it will have its instance rawDataCollector
in the configuration it will have a list of alternative collections to pick, all required to skip the current process name.
- By implementation only the first available collection can be used and if more than one is available in the first or any later event, there will be an exception
- Non-standard RAW files/streams with multiple FED collections will have to be configured in processing to not have the RAWS2RAW step
as an optimization option, if the input has rawDataCollector , this producer doesn't write anything to the event so that the downstream picks up this collection from the inputs.

davidlange6 · 2018-09-26T13:00:53Z

On Sep 26, 2018, at 2:41 PM, Slava Krutelyov ***@***.***> wrote: Based on the inputs so far, I've been thinking of still using options "2" (editing only the reco/processing step). One possible option here is to modify the RAW2DIGI step, or make a new one, say RAWS2RAW and have it in the standard processing for everything. This way we just do the collection renaming in the same process, not splitting it to multiple processes. The implementation will be a modified version of RawDataCollectorByLabel. For a standard rawDataCollector FED collection name • it will have its instance rawDataCollector • in the configuration it will have a list of alternative collections to pick, all required to skip the current process name. • By implementation only the first available collection can be used and if more than one is available in the first or any later event, there will be an exception • Non-standard RAW files/streams with multiple FED collections will have to be configured in processing to not have the RAWS2RAW step • as an optimization option, if the input has rawDataCollector , this producer doesn't write anything to the event so that the downstream picks up this collection from the inputs.

Any idea about the performance overhead of this? (I guess another way to exploit the delete early mechanism..)

…

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

slava77 · 2018-09-26T13:52:47Z

Any idea about the performance overhead of this? (I guess another way to exploit the delete early mechanism..)

If the optimization option is in place (the standard collection is not copied), the overhead is minimal.
in all other cases the overhead in making a copy is perhaps unavoidable and the memory use can improve by using the early deletion.

A solution with a smaller overhead can include switching to producing only a "pointer" to the right FED collection. This will need one time modification to the algorithms consuming the FED collection.

davidlange6 · 2018-09-26T14:11:27Z

On Sep 26, 2018, at 3:53 PM, Slava Krutelyov ***@***.***> wrote: Any idea about the performance overhead of this? (I guess another way to exploit the delete early mechanism..) • If the optimization option is in place (the standard collection is not copied), the overhead is minimal. • in all other cases the overhead in making a copy is perhaps unavoidable and the memory use can improve by using the early deletion. A solution with a smaller overhead can include switching to producing only a "pointer" to the right FED collection. This will need one time modification to the algorithms consuming the FED collection.

one of the L1 pull requests merged today had the same problem.. there I imagine the overhead is low just to ignore..Thinking outloud, putting an InputTag into the event that identifies the "right" FED collection would be less ugly than a pointer-like interface.. [but the only good thing about these solutions is that since they are not in the repack step, they are less likely to cause data loss if buggy]

…

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

icali · 2018-09-27T17:02:21Z

it is true that using a module running all the time and being part of the standard rawtodigi as a rawtoraw will make it transparent from having datasets with different collection name and it could be extendible without issues and no-one should care anymore about HI or pp data.

Just a naive question, why this mechanism cannot simply be implemented in the flag mechanism. Now we have --data and --repacked. In a previous discussion in this thread it was proposed a way to adjust the --repacked flag to have 2 input collections. Wouldn't be enough to adjust/update the --data flag allowing to understand the tree collections name?

As announcement, we will have a slot in the joint meeting tomorrow to discuss the issue. If we have a sufficient critical mass, it would be very useful.

franzoni · 2018-09-28T10:39:03Z

Greetings,

what proposed here has the BIG benefit that we don't need to discriminate between 3 different names "from memory" when setting up cmsDrivers to process data:
i.e. no need to remember in the future (and we'll fail to remember) that the HIN data of 2018 have a PD-dependent cmsDriver configuration
(cmsDriver does not detect primary dataset name in input).

@slava77 , when do we actually collect RAW data w/ this feature:

Non-standard RAW files/streams with multiple FED collections will have to be configured in
processing to not have the RAWS2RAW step

, if ever?

davidlange6 · 2018-09-28T10:51:40Z

On Sep 27, 2018, at 7:02 PM, Ivan Amos Cali ***@***.***> wrote: it is true that using a module running all the time and being part of the standard rawtodigi as a rawtoraw will make it transparent from having datasets with different collection name and it could be extendible without issues and no-one should care anymore about HI or pp data.

well for those using cmsDriver driven configurations at least.. other configs will need adjusting.

Just a naive question, why this mechanism cannot simply be implemented in the flag mechanism. Now we have --data and --repacked. In a previous discussion in this thread it was proposed a way to adjust the --repacked flag to have 2 input collections. Wouldn't be enough to adjust/update the --data flag allowing to understand the tree collections name?

that is to say to have 3 different config data processing scenarios, picking the right one in the tier0 configuration.. That has been generally disfavored given its additional complexity

…

As announcement, we will have a slot in the joint meeting tomorrow to discuss the issue. If we have a sufficient critical mass, it would be very useful. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

davidlange6 · 2018-09-28T10:59:28Z

On Sep 28, 2018, at 12:39 PM, Giovanni Franzoni ***@***.***> wrote: Greetings, what proposed here has the BIG benefit that we don't need to discriminate between 3 different names "from memory" when setting up cmsDrivers to process data: i.e. no need to remember in the future (and we'll fail to remember) that the HIN data of 2018 have a PD-dependent cmsDriver configuration (cmsDriver does not detect primary dataset name in input). @slava77 , when do we actually collect RAW data w/ this feature: Non-standard RAW files/streams with multiple FED collections will have to be configured in processing to not have the RAWS2RAW step , if ever?

@slava77 is correctly anticipating future (even if currently thought as crazy) scenarios..

…

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

slava77 · 2018-09-28T12:07:40Z

On 9/28/18 3:59 AM, David Lange wrote: > On Sep 28, 2018, at 12:39 PM, Giovanni Franzoni ***@***.***> wrote: > > Greetings, > > what proposed here has the BIG benefit that we don't need to discriminate between 3 different names "from memory" when setting up cmsDrivers to process data: > i.e. no need to remember in the future (and we'll fail to remember) that the HIN data of 2018 have a PD-dependent cmsDriver configuration > (cmsDriver does not detect primary dataset name in input). > > @slava77 , when do we actually collect RAW data w/ this feature: > > Non-standard RAW files/streams with multiple FED collections will have to be configured in > processing to not have the RAWS2RAW step > > , if ever? @slava77 is correctly anticipating future (even if currently thought as crazy) scenarios.. >

I was not thinking of just the future. I think that we have data like this already, although I don't have an example at hand. I would like to be proven wrong though.

fabiocos · 2018-11-11T09:43:56Z

@slava77 @icali @mandrenguyen I wonder whether it would make sense to replace the initial test input data for workflows 140.56 and 140.57 with the first collision data now that they arrive

mandrenguyen · 2018-11-11T10:00:22Z

@fabiocos Good call. I believe the following datasets would be a good choice:
/HIHardProbes/HIRun2018A-v1/RAW
/HIMinimumBiasReducedFormat0/HIRun2018A-v1/RAW
It looks like for run 326383 all detectors were on for all LS.
I believe the policy is to remove the RAW from disk shortly after prompt reco is done. Should we copy the relevant files to the CERN T2?

mandrenguyen · 2018-11-11T10:07:12Z

On closer inspection the tracker was off in 326383 after lumi 243, so the last 20 LS of the run.
So we should either add a lumi mask or consider another run.
Another possibility is 326479 which looks like a short run of 23 LS, where all detectors were on the whole time.

prebello · 2018-11-12T09:34:45Z

hi @mandrenguyen, for confirmation, may we enable 140.56 and 140.57 wfs to use
/HIHardProbes/HIRun2018A-v1/RAW
/HIMinimumBiasReducedFormat0/HIRun2018A-v1/RAW, respectively,
in the run 326479 with LS [1,23] ?

mandrenguyen · 2018-11-12T09:48:11Z

Hi @prebello
Yes !

slava77 · 2019-01-23T21:55:17Z

it looks like this issue can be closed

cmsbuild added the pending-assignment label Sep 21, 2018

cmsbuild added pdmv-pending pending-signatures and removed pending-assignment labels Sep 21, 2018

cmsbuild added alca-pending hlt-pending labels Sep 21, 2018

icali mentioned this issue Oct 7, 2018

Raw data remapper #24819

Merged

fabiocos mentioned this issue Oct 24, 2018

make a production-like workflow for HI 2018 #24587

Closed

zhenhu mentioned this issue Oct 25, 2018

add 2 production-like workflows for HI 2018 #25005

Merged

prebello mentioned this issue Nov 12, 2018

changing RAW input dataset for HI2018 wfs #25210

Merged

slava77 closed this as completed Jan 23, 2019

make a T0-like relval for 2018 HI #24619

make a T0-like relval for 2018 HI #24619

Comments

slava77 commented Sep 21, 2018

cmsbuild commented Sep 21, 2018

slava77 commented Sep 21, 2018

cmsbuild commented Sep 21, 2018

slava77 commented Sep 21, 2018

zhenhu commented Sep 21, 2018

slava77 commented Sep 21, 2018

cmsbuild commented Sep 21, 2018

slava77 commented Sep 21, 2018

zhenhu commented Sep 21, 2018

icali commented Sep 21, 2018

slava77 commented Sep 21, 2018

Dr15Jones commented Sep 21, 2018

icali commented Sep 22, 2018

slava77 commented Sep 22, 2018

slava77 commented Sep 22, 2018

slava77 commented Sep 22, 2018

fwyzard commented Sep 22, 2018 via email

slava77 commented Sep 22, 2018 via email

hufnagel commented Sep 23, 2018 • edited

davidlange6 commented Sep 23, 2018 via email

hufnagel commented Sep 23, 2018

fwyzard commented Sep 23, 2018

hufnagel commented Sep 23, 2018

fwyzard commented Sep 23, 2018 via email

hufnagel commented Sep 23, 2018

davidlange6 commented Sep 24, 2018 via email

fwyzard commented Sep 24, 2018 via email

davidlange6 commented Sep 24, 2018 via email

fwyzard commented Sep 24, 2018

icali commented Sep 25, 2018

davidlange6 commented Sep 25, 2018 via email

slava77 commented Sep 25, 2018

Dr15Jones commented Sep 25, 2018

slava77 commented Sep 26, 2018

davidlange6 commented Sep 26, 2018 via email

slava77 commented Sep 26, 2018

davidlange6 commented Sep 26, 2018 via email

icali commented Sep 27, 2018

franzoni commented Sep 28, 2018

davidlange6 commented Sep 28, 2018 via email

davidlange6 commented Sep 28, 2018 via email

slava77 commented Sep 28, 2018 via email

fabiocos commented Nov 11, 2018

mandrenguyen commented Nov 11, 2018

mandrenguyen commented Nov 11, 2018

prebello commented Nov 12, 2018

mandrenguyen commented Nov 12, 2018

slava77 commented Jan 23, 2019

hufnagel commented Sep 23, 2018 •

edited