
make a T0-like relval for 2018 HI #24619

Closed · slava77 opened this issue Sep 21, 2018 · 48 comments

@slava77 commented Sep 21, 2018

[as a follow-up to the discussion in the joint ops meeting on Sep 21]
Please add a relval matrix workflow to be used for testing a T0-like setup in the offline environment and IBs. The initial setup can be based on the 2018 MD3 HI test runs.
Once we start taking data, this or another setup can be updated to use actual data.

@franzoni

@cmsbuild commented

A new Issue was created by @slava77 Slava Krutelyov.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos, @kpedro88 can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@slava77 commented Sep 21, 2018

assign pdmv

@cmsbuild commented

New categories assigned: pdmv

@prebello, @pgunnell, @zhenhu you have been requested to review this Pull request/Issue and eventually sign. Thanks

@slava77 commented Sep 21, 2018

The best setup for this would be to use files with data content from before the repacking of the FED data, so that this relval test also includes the reHLT part.

@zhenhu commented Sep 21, 2018

Hi @slava77, could you please give us a bit more information about this workflow, such as the input, the era, the conditions, etc.?

@slava77 commented Sep 21, 2018

assign alca,hlt

I'm adding AlCa to the thread to provide inputs on the conditions and perhaps ALCA parts of the workflows.
I'm also adding HLT to suggest a configuration.

I think that a fraction of this workflow will be like wf 140.55, only using the Run2_2018_pp_on_AA era and the --scenario pp option.
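
For illustration, a minimal sketch of that era/scenario combination, assuming the standard Era_Run2_2018_pp_on_AA_cff fragment (the process name and everything beyond the era choice are placeholders, not the actual relval step definition):

```python
# Minimal sketch: a pp-scenario process built with the 2018 HI era.
import FWCore.ParameterSet.Config as cms
from Configuration.Eras.Era_Run2_2018_pp_on_AA_cff import Run2_2018_pp_on_AA

# equivalent of "--era Run2_2018_pp_on_AA --scenario pp" on the cmsDriver command line
process = cms.Process("RECO", Run2_2018_pp_on_AA)
```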

@cmsbuild commented

New categories assigned: hlt,alca

@lpernie, @franzoni, @pohsun, @tocheng, @Martin-Grunewald, @fwyzard you have been requested to review this Pull request/Issue and eventually sign. Thanks

@slava77 commented Sep 21, 2018

@mandrenguyen @icali
please follow and/or advise here as well, so that we get the right inputs

@zhenhu commented Sep 21, 2018

I chatted with @mmusich and got a recipe for how to modify wf 140.55:

  1. We will update the input dataset with the MD3 data.
  2. We need a menu from HLT which does the hybrid ZS + repacking (something similar to HYBRIDZSHI2015).
  3. The remaining steps in 140.55, such as 'RECOHID15' and 'HARVESTDHI15', can be reused, but we need to change the conditions for the 2018 detector.

What we are missing is mainly the 2nd item above. Please let me know if you have any comments.
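
For reference, such a workflow would presumably be declared following the existing pattern in Configuration/PyReleaseValidation/python/relval_standard.py; in this sketch the workflow number and all 2018 step names are placeholders:

```python
# Hypothetical sketch, following the pattern of the existing HI workflows
# (e.g. 140.53 with HYBRIDZSHI2015); all names below are placeholders.
workflows[140.56] = ['', ['RunHI2018',        # input: MD3 (later: real 2018 HI) RAW data
                          'REPACKHID18',      # hybrid ZS + repacking step from the HLT menu
                          'RECOHID18',        # RECOHID15 retargeted to 2018 conditions
                          'HARVESTDHI18']]    # HARVESTDHI15 retargeted to 2018 conditions
```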

@icali commented Sep 21, 2018

The HLT menu that was provided is /cdaq/special/HeavyIonTest2018/TS2TestFull/V4. However, the menu produces the MB ReducedFormat collection, which still carries the same rawDataRepacker name.

Any advice on how we can create a configuration able to manage both the rawDataRepacker and rawDataRepackerReducedFormat collections?

We have been advised to use the module FWCore.ParameterSet.MassReplaceInputTag. In which part of the workflow can we apply it? The T0 repacking step would probably be the best place, so that all the RAW data go to tape with the same collection name.

@slava77 commented Sep 21, 2018

> The HLT menu that was provided is /cdaq/special/HeavyIonTest2018/TS2TestFull/V4. However, the menu produces the MB ReducedFormat collection, which still carries the same rawDataRepacker name.
>
> Any advice on how we can create a configuration able to manage both the rawDataRepacker and rawDataRepackerReducedFormat collections?

I'm a bit confused about the possible variety of content.
Do we have a one-to-one mapping of PDs to output FED collection names? (one PD <-> one FED name).
If so, the T0 configuration will just need to deal with a different FED collection name pickup.

> We have been advised to use the module FWCore.ParameterSet.MassReplaceInputTag. In which part of the workflow can we apply it? The T0 repacking step would probably be the best place, so that all the RAW data go to tape with the same collection name.

IIRC, using MassReplaceInputTag and doing something at the T0 repacking step are two different solutions:

  1. instruct the repacking job to "rename" rawDataRepackerReducedFormat to rawDataCollector (the standard name) by copying or with an EDAlias. This way all downstream consumers of this data will be able to read the FED data without any modifications. @Dr15Jones can EDAlias be used to "rename" a product made in another process?
  2. in the regular T0 (or rereco) configuration apply MassReplaceInputTag depending on which FED collection name the input dataset has.
    • recall that for the HI configuration we use the ConfigBuilder --repacked option to make this renaming of input tags from rawDataRepacker to rawDataCollector
    • similar to this, one can add a --reduced flag and copy-paste/edit the implementation:

      ```python
      expertSettings.add_option("--repacked",
                                help="When the input file is a file with repacked raw data with label rawDataRepacker",
                                action="store_true",
                                default=False,
                                dest="isRepacked")
      ```

      and

      ```python
      if self._options.isRepacked:
          self.pythonCfgCode +="\n"
          self.pythonCfgCode +="from Configuration.Applications.ConfigBuilder import MassReplaceInputTag\n"
          self.pythonCfgCode +="MassReplaceInputTag(process)\n"
          MassReplaceInputTag(self.process)
      ```

      with MassReplaceInputTag(process, new="rawDataReducer")

My guess is that option 1 will require more time than we have to develop, and will add complexity to the repacking step with an increased chance of errors. Errors in the repacking step likely mean a complete loss of data.
Option 2 is more practical.
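
Spelled out, the copy-paste/edit version of option 2 might look like the following sketch. The flag name and the rawDataReducer label follow the snippet above; the `new` keyword of MassReplaceInputTag is assumed from that same snippet:

```python
# A sketch only, not the final implementation.

# Configuration/Applications/python/Options.py, next to --repacked:
expertSettings.add_option("--reduced",
                          help="When the input file is a file with reduced raw data with label rawDataReducer",
                          action="store_true",
                          default=False,
                          dest="isReduced")

# Configuration/Applications/python/ConfigBuilder.py, next to the isRepacked block:
if self._options.isReduced:
    self.pythonCfgCode += "\n"
    self.pythonCfgCode += "from Configuration.Applications.ConfigBuilder import MassReplaceInputTag\n"
    self.pythonCfgCode += "MassReplaceInputTag(process, new='rawDataReducer')\n"
    MassReplaceInputTag(self.process, new="rawDataReducer")
```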

@Dr15Jones commented

> @Dr15Jones can EDAlias be used to "rename" a product made in another process?

No.

@icali commented Sep 22, 2018

Adding @FHead and @stahlleiton, who are going to implement the HLT menu.

> The HLT menu that was provided is /cdaq/special/HeavyIonTest2018/TS2TestFull/V4. However, the menu produces the MB ReducedFormat collection, which still carries the same rawDataRepacker name.
> Any advice on how we can create a configuration able to manage both the rawDataRepacker and rawDataRepackerReducedFormat collections?

> I'm a bit confused about the possible variety of content.
> Do we have a one-to-one mapping of PDs to output FED collection names? (one PD <-> one FED name).
> If so, the T0 configuration will just need to deal with a different FED collection name pickup.

Yes, there is going to be a one-to-one mapping of PDs to output FED collections. To close the loop, I would propose the following naming convention:

  • PDs without HI in the name (standard pp name): rawDataCollector
  • PDs with HI in the name and without the mention of ReducedFormat: rawDataRepacker
  • PDs with HI in the name and with the mention of ReducedFormat: rawDataReducedFormat (I removed the Repacker from the previously proposed name to shorten it)

Please let us know if this convention looks reasonable to you. Any name suggestion for the reduced format collection/PDs is more than welcome.

> We have been advised to use the module FWCore.ParameterSet.MassReplaceInputTag. In which part of the workflow can we apply it? The T0 repacking step would probably be the best place, so that all the RAW data go to tape with the same collection name.

> IIRC, using MassReplaceInputTag and doing something at the T0 repacking step are two different solutions:
>
>   1. instruct the repacking job to "rename" rawDataRepackerReducedFormat to rawDataCollector (the standard name) by copying or with an EDAlias. This way all downstream consumers of this data will be able to read the FED data without any modifications. @Dr15Jones can EDAlias be used to "rename" a product made in another process?
>
>   2. in the regular T0 (or rereco) configuration apply MassReplaceInputTag depending on which FED collection name the input dataset has.
>
>     • recall that for the HI configuration we use the ConfigBuilder --repacked option to make this renaming of input tags from rawDataRepacker to rawDataCollector
>
>     • similar to this, one can add a --reduced flag and copy-paste/edit the implementation:

>       [cmssw/Configuration/Applications/python/Options.py](https://github.com/cms-sw/cmssw/blob/1cd19ae3348cf8de750490004c266d2e8b48b328/Configuration/Applications/python/Options.py#L285-L289), lines 285 to 289 in [1cd19ae](/cms-sw/cmssw/commit/1cd19ae3348cf8de750490004c266d2e8b48b328):
>
>       ```python
>       expertSettings.add_option("--repacked",
>                                 help="When the input file is a file with repacked raw data with label rawDataRepacker",
>                                 action="store_true",
>                                 default=False,
>                                 dest="isRepacked")
>       ```
>
>       and
>
>       [cmssw/Configuration/Applications/python/ConfigBuilder.py](https://github.com/cms-sw/cmssw/blob/1cd19ae3348cf8de750490004c266d2e8b48b328/Configuration/Applications/python/ConfigBuilder.py#L2219-L2223), lines 2219 to 2223 in [1cd19ae](/cms-sw/cmssw/commit/1cd19ae3348cf8de750490004c266d2e8b48b328):
>
>       ```python
>       if self._options.isRepacked:
>           self.pythonCfgCode +="\n"
>           self.pythonCfgCode +="from Configuration.Applications.ConfigBuilder import MassReplaceInputTag\n"
>           self.pythonCfgCode +="MassReplaceInputTag(process)\n"
>           MassReplaceInputTag(self.process)
>       ```
>
>       with MassReplaceInputTag(process, new="rawDataReducer")

> My guess is that option 1 will require more time than we have to develop, and will add complexity to the repacking step with an increased chance of errors. Errors in the repacking step likely mean a complete loss of data.
> Option 2 is more practical.

I personally would have preferred option 1 because it would simplify the operations for any future RAW data manipulation. However, if it is less "safe", let's go with option 2. Could it be possible to also add the second collection name to the same --repacked flag? It would/could simplify the operation for future raw processing.
Thank you again!

@slava77 commented Sep 22, 2018

> > @Dr15Jones can EDAlias be used to "rename" a product made in another process?
>
> No.

Is it possible to have an EDAlias specific to an output file?
Let's say we are writing, in a given process, the same type of product from producerA and producerB: producerA goes to file A, producerB to file B. I would like consumers of file A or B to get this product with the same InputTag.
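
For concreteness, the scenario being asked about, sketched as configuration (the module and file names are hypothetical; the per-output-file alias itself is the missing piece):

```python
# Sketch: one process writes the same product type from producerA to file A and
# from producerB to file B. The feature being asked for, which does not exist, is
# a per-output-file EDAlias so that consumers of either file could read the
# product back under one common InputTag.
import FWCore.ParameterSet.Config as cms

process.fileA = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("fileA.root"),
    outputCommands = cms.untracked.vstring("drop *",
                                           "keep FEDRawDataCollection_producerA_*_*"))
process.fileB = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("fileB.root"),
    outputCommands = cms.untracked.vstring("drop *",
                                           "keep FEDRawDataCollection_producerB_*_*"))
```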

@slava77 commented Sep 22, 2018

> Please let us know if this convention looks reasonable to you. Any name suggestion for the reduced format collection/PDs is more than welcome.

These look OK to me.

> Could it be possible to also add the second collection name to the same --repacked flag? It would/could simplify the operation for future raw processing.

I think this will work even better; it just needs a bit more creative coding (not just the copy-paste/replace that I proposed above).
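
The "more creative" variant could make one string-valued option carry the label instead of adding a second boolean; a sketch only, with hypothetical option and variable names, reusing the MassReplaceInputTag keyword from the snippet quoted earlier:

```python
# Options.py (sketch): one option carrying the FED collection label.
expertSettings.add_option("--repackedLabel",
                          help="Label of the repacked FED raw data collection in the input file, e.g. rawDataRepacker or rawDataReducer",
                          action="store",
                          default="",
                          dest="repackedLabel")

# ConfigBuilder.py (sketch), replacing the isRepacked block:
if self._options.repackedLabel:
    self.pythonCfgCode += "from Configuration.Applications.ConfigBuilder import MassReplaceInputTag\n"
    self.pythonCfgCode += "MassReplaceInputTag(process, new='%s')\n" % self._options.repackedLabel
    MassReplaceInputTag(self.process, new=self._options.repackedLabel)
```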

@slava77 commented Sep 22, 2018

> [in the T0 repacking step] 1. instruct the repacking job to "rename" rawDataRepackerReducedFormat to rawDataCollector (the standard name) by copying or with an EDAlias. This way all downstream consumers of this data will be able to read the FED data without any modifications.

> I personally would have preferred option 1 because it would simplify the operations for any future RAW data manipulation.

@drkovalskyi @hufnagel may want to comment on the feasibility of this request for the T0 developments (to be delivered in ~4 weeks).

@fwyzard commented Sep 22, 2018 via email

@slava77 commented Sep 22, 2018 via email

@hufnagel commented Sep 23, 2018

Repack configurations are generated in Configuration.DataProcessing.Repack, and we would need a tweak there to convert the data products somehow and generate the correct output. I don't see how producing that tweak and testing it standalone falls anywhere under Tier0 development.

Once this configuration tweak has been produced and is integrated into Config.DP, this turns into a Tier0 testing/validation issue. But I don't see that as a big problem, assuming the previous standalone testing was thorough.
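
As an illustration only, such a Config.DP tweak might take the shape of a small customisation re-collecting the reduced-format FEDs under the standard label inside the repack job; the customise-function shape, the input label, and the RawDataCollectorByLabel parameter names are all assumptions to be checked, not the actual design:

```python
# Hypothetical sketch of a Config.DP.Repack tweak for option 1.
import FWCore.ParameterSet.Config as cms

def customiseRepackForReducedFormat(process):
    # write the FEDs out again under the standard collector label
    process.rawDataCollector = cms.EDProducer("RawDataCollectorByLabel",
        verifyFedIds = cms.untracked.bool(False),  # parameter names assumed
        RawCollectionList = cms.VInputTag(cms.InputTag("rawDataRepackerReducedFormat")))
    process.reducedFormatPath = cms.Path(process.rawDataCollector)
    return process
```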

@davidlange6 commented Sep 23, 2018 via email

@hufnagel commented

> You mean you wouldn't be able to handle this within one repack configuration that auto-detects what it's supposed to be doing? Why not?

Next level would be passing a parameter to Configuration.DataProcessing.Repack that configures whether or not we get a standard repack or this new thing. Yeah, that would need Tier0 development work.

Repack errors that crash CMSSW cause paused jobs which block streamer deletions. As long as we don't run out of space at P5 it even blocks streamer deletion there. A repack error due to bad configuration almost certainly isn't recoverable within the same Tier0 instance though. You are talking about having to do recovery replays here.

The real problematic case is repack errors that don't crash CMSSW. You have 7 days to notice that normally, but if we are very busy it could be less (since we delete streamers more aggressively then).

@fwyzard commented Sep 23, 2018

> You mean you wouldn't be able to handle this within one repack configuration that auto-detects what it's supposed to be doing? Why not?

I would say because you need to look at the data to figure out what to do, and you cannot do that at configuration level.

@hufnagel commented

Frankly, the only piece of the Tier0 that looks at the data is the CMSSW jobs. Nothing else cares about how the 0's and 1's are organized in the data files.

So how would this work then? We trust that HLT puts such data into a special stream, and we configure that stream to be repacked in a special way?

Either way, before there can be any Tier0 development here, someone needs to create a valid repack configuration for this data. That repack configuration needs to be tested standalone and then integrated into Config.DP.Repack (adding a parameter to activate it that is false by default, for instance).

Once all of that is in place, doing the Tier0 development work to create a config flag to enable this for a stream and use it shouldn't be that much work: a few days for the code changes, longer for the Tier0Ops validation/testing (could be much longer if the Config.DP.Repack changes weren't done correctly before).

@fwyzard commented Sep 23, 2018 via email

@hufnagel commented

> True, except I would say that the Tier-0 development and validation can happen in parallel to the development of the different repacking configurations.

The part that can happen in parallel consists of adding a dozen code lines spread across a few places, which is not difficult assuming you know what those few places are (which I think I do).

The vast majority of the Tier0 development will consist of running this against a new CMSSW release with various stream configurations to make sure it works correctly (extracting and looking at the generated repack configurations). It's a bit pointless starting any of this without having a CMSSW release with the Config.DP.Repack changes in place.

@davidlange6 commented Sep 24, 2018 via email

@fwyzard commented Sep 24, 2018 via email

@davidlange6 commented Sep 24, 2018 via email

@fwyzard commented Sep 24, 2018

Yes, but the inconsistency would only be in the "process name" part of the collections, which most configurations ignore anyway.

So we should be able to run the same downstream configuration on all the inputs, and still have a simple way to check from the data what one is running on (and differentiate if needed).

@icali commented Sep 25, 2018

Thank you for all the input. It is not an easy decision between 1 and 2. We should also consider that the same running mode will occur during Run 3, so what is decided now will be kept for the next four HI runs. With this in mind, I'm still in favor of option 1, even if it is more complex to implement. As HI we can inject some manpower and update/test Config.DP.Repack locally, but we would need some guidance.

However, what is not clear to me is how it is possible to have a failure mode in which the repacking jobs do not crash but the data turn out to be corrupted. Indeed, the majority of our data will not be reconstructed immediately; a subtle failure in the RAW data repacking would be spotted only once the streamer files are no longer available.

Thinking out loud, do you think an option 3 could be feasible? Option 3 would be to implement option 2 and include in the reco sequence a RAW skimming configuration that takes rawDataReducedFormat and produces rawDataRepacker (sketched below). Only the rawDataRepacker would go to tape, while the original data would be deleted.
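
A sketch of that option 3, assuming the labels proposed earlier in the thread; the RawDataCollectorByLabel parameter names and the RAWoutput module name are illustrative, not a worked-out design:

```python
# Sketch of "option 3" inside the prompt reco process.
import FWCore.ParameterSet.Config as cms

# Re-collect the reduced-format FEDs under the standard repacked label:
process.rawDataRepacker = cms.EDProducer("RawDataCollectorByLabel",
    verifyFedIds = cms.untracked.bool(False),  # parameter names assumed
    RawCollectionList = cms.VInputTag(cms.InputTag("rawDataReducedFormat")))
process.rawSkim = cms.Path(process.rawDataRepacker)

# Archive only the re-labeled copy; the original reduced-format collection is dropped
# ("RAWoutput" stands for whatever output module feeds tape in the real config):
process.RAWoutput.outputCommands.extend([
    "drop FEDRawDataCollection_rawDataReducedFormat_*_*",
    "keep FEDRawDataCollection_rawDataRepacker_*_*",
])
```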

@davidlange6 commented Sep 25, 2018 via email

@slava77 commented Sep 25, 2018

> @Dr15Jones can EDAlias be used to "rename" a product made in another process?
>
> No.
>
> Is it possible to have an EDAlias specific to an output file?
> Let's say we are writing, in a given process, the same type of product from producerA and producerB: producerA goes to file A, producerB to file B. I would like consumers of file A or B to get this product with the same InputTag.

@Dr15Jones please comment on whether this is a possibility from the framework/EDM side.

@Dr15Jones commented

@slava77 it is not possible.

@slava77 commented Sep 26, 2018

Based on the inputs so far, I've been thinking of still using option "2" (editing only the reco/processing step).
One possible option is to modify the RAW2DIGI step, or to make a new one, say RAWS2RAW, and have it in the standard processing for everything. This way we do the collection renaming in the same process, without splitting it across multiple processes.

The implementation will be a modified version of RawDataCollectorByLabel (see the sketch after this list). For a standard rawDataCollector FED collection name:

  • it will have the instance label rawDataCollector
  • in the configuration it will have a list of alternative collections to pick from, all required to skip the current process name
    • by implementation, only the first available collection is used; if more than one is available in the first or any later event, an exception is thrown
    • non-standard RAW files/streams with multiple FED collections will have to be configured in processing without the RAWS2RAW step
  • as an optimization, if the input already has rawDataCollector, this producer doesn't write anything to the event, so that downstream modules pick up this collection from the input
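
A minimal configuration sketch of the proposed step; "RawDataMapperByLabel" is a placeholder name for the modified RawDataCollectorByLabel described above, and the parameter names are illustrative only:

```python
import FWCore.ParameterSet.Config as cms

process.rawDataMapper = cms.EDProducer("RawDataMapperByLabel",
    # alternative collections to pick from, looked up while skipping the current
    # process; if more than one is present in an event, an exception is thrown
    rawCollectionList = cms.VInputTag("rawDataCollector",
                                      "rawDataRepacker",
                                      "rawDataReducedFormat"),
    # if this collection is already in the input, nothing is written to the event
    mainCollection = cms.InputTag("rawDataCollector"))

process.RAWS2RAW = cms.Sequence(process.rawDataMapper)
```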

@davidlange6 commented Sep 26, 2018 via email

@slava77 commented Sep 26, 2018

> Any idea about the performance overhead of this? (I guess another way to exploit the delete early mechanism..)

  • If the optimization option is in place (the standard collection is not copied), the overhead is minimal.
  • In all other cases the overhead of making a copy is perhaps unavoidable, and the memory use can be improved with early deletion.

A solution with smaller overhead could be to produce only a "pointer" to the right FED collection. This would require a one-time modification to the algorithms consuming the FED collection.

@davidlange6 commented Sep 26, 2018 via email

@icali commented Sep 27, 2018

It is true that a module running all the time as part of the standard RAW2DIGI step, acting as a raw-to-raw converter, would make the different collection names transparent; it could be extended without issues, and no one would need to care anymore whether the data are HI or pp.

Just a naive question: why can't this mechanism simply be implemented in the flag mechanism? Right now we have --data and --repacked. Earlier in this thread a way was proposed to adjust the --repacked flag to take two input collections. Wouldn't it be enough to adjust/update the --data flag so that it understands the three collection names?

As an announcement: we will have a slot in the joint meeting tomorrow to discuss this issue. If we reach a sufficient critical mass, it would be very useful.

@franzoni commented

Greetings,

What is proposed here has the BIG benefit that we don't need to discriminate between three different names "from memory" when setting up cmsDrivers to process data:
i.e. no need to remember in the future (and we will fail to remember) that the HIN data of 2018 have a PD-dependent cmsDriver configuration
(cmsDriver does not detect the primary dataset name of its input).

@slava77, when do we actually collect RAW data with this feature:

> Non-standard RAW files/streams with multiple FED collections will have to be configured in processing without the RAWS2RAW step

, if ever?

@davidlange6 commented Sep 28, 2018 via email

@davidlange6 commented Sep 28, 2018 via email

@slava77 commented Sep 28, 2018 via email

@fabiocos commented

@slava77 @icali @mandrenguyen I wonder whether it would make sense to replace the initial test input data for workflows 140.56 and 140.57 with the first collision data, now that they are arriving.

@mandrenguyen commented

@fabiocos Good call. I believe the following datasets would be a good choice:
/HIHardProbes/HIRun2018A-v1/RAW
/HIMinimumBiasReducedFormat0/HIRun2018A-v1/RAW
It looks like for run 326383 all detectors were on for all LS.
I believe the policy is to remove the RAW from disk shortly after prompt reco is done. Should we copy the relevant files to the CERN T2?

@mandrenguyen commented

On closer inspection, the tracker was off in 326383 after lumi 243, i.e. for the last 20 LS of the run, so we should either add a lumi mask or consider another run.
Another possibility is 326479, which looks like a short run of 23 LS where all detectors were on the whole time.

@prebello commented

Hi @mandrenguyen, to confirm: may we enable the 140.56 and 140.57 wfs to use
/HIHardProbes/HIRun2018A-v1/RAW and
/HIMinimumBiasReducedFormat0/HIRun2018A-v1/RAW, respectively,
in run 326479 with LS [1,23]?

@mandrenguyen commented

Hi @prebello
Yes!
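
In relval_steps.py terms, the inputs just agreed on would presumably look something like this sketch; the InputInfo keywords (in particular run and ls) are assumed from existing entries, and the step labels are placeholders:

```python
steps['RunHIHardProbes2018'] = {'INPUT': InputInfo(
    dataSet='/HIHardProbes/HIRun2018A-v1/RAW',
    label='HIHP2018', run=[326479], ls={326479: [[1, 23]]},
    events=100000, location='STD')}

steps['RunHIMBReducedFormat2018'] = {'INPUT': InputInfo(
    dataSet='/HIMinimumBiasReducedFormat0/HIRun2018A-v1/RAW',
    label='HIMBRF2018', run=[326479], ls={326479: [[1, 23]]},
    events=100000, location='STD')}
```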

@slava77 commented Jan 23, 2019

It looks like this issue can be closed.

slava77 closed this as completed Jan 23, 2019