Request Types Arguments

Alan Malta Rodrigues edited this page Mar 16, 2020 · 5 revisions

The following tables describe all the parameters that can be used to configure a request, with a short explanation of each parameter and its possible values.

Users can specify any of these arguments when creating a request. Required arguments MUST be specified by the user, otherwise a validation error will occur; optional arguments take the listed default value if the user gives no value.

Navigating this page

  1. Common Arguments
  2. DataProcessing
       • ReReco
  3. TaskChain
  4. Resubmission

Common arguments (Top)

These arguments apply to all request types, unless the section for a specific request type states otherwise.

Required arguments

Argument Type Restrictions Description
Priority Integer 1 to 1M Absolute priority of the request (higher number has higher priority)
Requestor String - Login for the user creating the request
Group String - Requestor's user group
CMSSWVersion String CMSSW release valid name CMSSW release to be used
ScramArch String Production arch for the chosen CMSSW release ScramArch for the CMSSW release
TimePerEvent Number Positive Average time expected to process a single event in the main task, in seconds
Memory Number Positive Average RSS usage expected by a cmsRun process in the main task
SizePerEvent Number Positive Average size contribution to the output for every event in the input, in KBytes; for requests with no input, it is the average size of a produced event
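
As a rough illustration of the required-argument rule, the sketch below checks a request dictionary against the required common arguments listed above and reports the ones that are missing. The helper name `missing_required` and all sample values are hypothetical; the actual validation is done by the request manager and may behave differently.

```python
# Required common arguments, copied from the table above.
REQUIRED_COMMON = (
    "Priority", "Requestor", "Group", "CMSSWVersion",
    "ScramArch", "TimePerEvent", "Memory", "SizePerEvent",
)


def missing_required(request):
    """Return the required common arguments absent from `request`, in table order."""
    return [name for name in REQUIRED_COMMON if name not in request]


# Placeholder values for illustration only; "Memory" is left out on purpose.
example_request = {
    "Priority": 90000,
    "Requestor": "someuser",
    "Group": "somegroup",
    "CMSSWVersion": "CMSSW_X_Y_Z",
    "ScramArch": "some_arch",
    "TimePerEvent": 12.5,
    "SizePerEvent": 250,
}

print(missing_required(example_request))  # ['Memory']
```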

Optional arguments

Argument Type Restrictions Description Default
VoGroup String - Virtual Organization Group unknown
VoRole String - Virtual Organization Role unknown
AcquisitionEra String No dashes, first character is a letter Suggested acquisition era for the output datasets None
ProcessingVersion Integer Positive values Suggested processing version for the output datasets 0
ProcessingString String - Suggested processing string for the output datasets -
SiteBlacklist Comma separated list Elements of the list are CMS sites Suggested site blacklist for the request []
SiteWhitelist Comma separated list Elements of the list are CMS sites Suggested site whitelist for the request []
UnmergedLFNBase String - Suggested unmerged LFN base /store/unmerged
MergedLFNBase String - Suggested merged LFN base /store/data
MinMergeSize Integer Positive values Suggested minimum size for merged files, in bytes 2 GiB
MaxMergeSize Integer Positive values Suggested maximum size for merged files, in bytes 4 GiB
MaxWaitTime Integer Positive values Suggested time that an unmerged file will wait before being forcibly merged, in seconds 1 day
MaxMergeEvents Integer Positive values Suggested maximum number of events in merged files 100k
ValidStatus String Either VALID or PRODUCTION DBS status of the output datasets PRODUCTION
DbsUrl String Valid URL URL to the DBS instance where the input data is registered http://cmsdbsprod.cern.ch/cms_dbs_prod_global/servlet/DBSServlet
DashboardHost String Valid URL URL to the dashboard instance where the request information will be reported cms-wmagent-job.cern.ch
DashboardPort Integer Positive values Port for the dashboard instance 8884
OverrideCatalog String - Path to the override catalog to use for reading the input data in the jobs -
RunNumber Integer - Run number for scenario-based DQM harvesting 0
PeriodicHarvestInterval Integer Positive value or zero Frequency for periodic DQM harvesting in seconds, 0 means no periodic harvesting. 0
DQMUploadProxy String - Fixed location of the proxy to use for the DQM upload (Tier-0 only) -
DQMUploadUrl String - URL to the DQM instance where the harvested data will be uploaded https://cmsweb.cern.ch/dqm/dev
DQMSequences Comma separated list - List of DQM sequences for DQM scenario-based harvesting []
DQMConfigCacheID String Valid ID for a CouchDB document Configuration for DQM harvesting -
EnableHarvesting String "True" or "False" Indicates if harvesting should be enabled on DQM output False
EnableNewStageout String "True" or "False" Indicates if the new stageout plugins should be used at runtime False
IncludeParents String "True" or "False" Indicates if the first task processing should include parents or not False
Multicore String "auto" or number of cores Indicates if multicore CMSSW should be used and with how many cores -
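
The sketch below shows one way the optional-argument behaviour and a few of the restrictions above could be expressed. It is an illustration only: the helpers `with_defaults` and `check_restrictions` are hypothetical, only a subset of the arguments is covered, and "1M" is assumed to mean 1,000,000.

```python
import re

# A subset of the optional common arguments with their documented defaults.
COMMON_DEFAULTS = {
    "VoGroup": "unknown",
    "VoRole": "unknown",
    "ProcessingVersion": 0,
    "SiteBlacklist": [],
    "SiteWhitelist": [],
    "UnmergedLFNBase": "/store/unmerged",
    "MergedLFNBase": "/store/data",
    "ValidStatus": "PRODUCTION",
}


def with_defaults(request):
    """Return a copy of `request` with the documented defaults filled in."""
    # Copy list defaults so callers never share the same list object.
    merged = {
        key: list(value) if isinstance(value, list) else value
        for key, value in COMMON_DEFAULTS.items()
    }
    merged.update(request)
    return merged


def check_restrictions(request):
    """Return messages for violated restrictions (empty list if none)."""
    problems = []
    priority = request.get("Priority")
    # Assumption: "1 to 1M" means 1 to 1,000,000 inclusive.
    if priority is not None and not 1 <= priority <= 1_000_000:
        problems.append("Priority must be between 1 and 1M")
    era = request.get("AcquisitionEra")
    # No dashes, first character is a letter.
    if era is not None and not re.fullmatch(r"[A-Za-z][^-]*", era):
        problems.append("AcquisitionEra must start with a letter and contain no dashes")
    status = request.get("ValidStatus")
    if status is not None and status not in ("VALID", "PRODUCTION"):
        problems.append("ValidStatus must be VALID or PRODUCTION")
    return problems
```

For example, `with_defaults({"VoRole": "production"})` keeps the user's `VoRole` and fills `VoGroup` with `unknown`, while `check_restrictions({"Priority": 0})` reports the priority as out of range.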

DataProcessing (Top)

The following arguments are shared by MonteCarloFromGEN, ReReco and ReDigi requests.

Required arguments

Argument Type Restrictions Description
InputDataset String Dataset format Input dataset for the request
GlobalTag String - Global tag for the processing jobs

Optional arguments

Argument Type Restrictions Description Default
ConfigCacheUrl String Couch URL format URL to the CouchDB where the configuration documents are stored Defaults to the CouchDB instance configured in the ReqMgr
OpenRunningTimeout Integer Positive or zero Time that the WorkQueue will wait for new blocks to appear in the input dataset before closing the request 0
BlockBlacklist Comma separated list Elements of the list are valid blocks List of blocks to be excluded from the input dataset in the processing []
BlockWhitelist Comma separated list Elements of the list are valid blocks List of blocks from the input dataset to be processed []
RunBlacklist Comma separated list Elements of the list are positive numbers List of runs to be excluded from the input dataset in the processing []
RunWhitelist Comma separated list Elements of the list are positive numbers List of runs from the input dataset to be processed []
SplittingAlgo String EventAwareLumiBased, EventBased, LumiBased or FileBased Splitting algorithm for the jobs EventAwareLumiBased
EventsPerJob Integer Positive Desired events per job if EventAwareLumiBased or EventBased are chosen as the splitting algorithm Adjusted to produce 8h jobs based on the specified time per event
LumisPerJob Integer Positive Desired lumis per job if LumiBased is chosen as the splitting algorithm 8
FilesPerJob Integer Positive Desired files per job if FileBased is chosen as the splitting algorithm 1
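
The default for EventsPerJob is described above as adjusted to produce 8h jobs from the specified time per event. A minimal sketch of that arithmetic follows; the function name, the rounding down, and the floor of one event per job are assumptions, not taken from the actual implementation.

```python
def default_events_per_job(time_per_event, target_hours=8):
    """Estimate events per job so that a job runs for about `target_hours`.

    `time_per_event` is the TimePerEvent argument, in seconds.
    """
    if time_per_event <= 0:
        raise ValueError("TimePerEvent must be positive")
    # Assumption: round down, but never go below one event per job.
    return max(1, int(target_hours * 3600 / time_per_event))


# 8 h = 28800 s; at 12.5 s per event that is 2304 events.
print(default_events_per_job(12.5))  # 2304
```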

ReReco (Top)

Required arguments

Argument Type Restrictions Description
ConfigCacheID String ID of a valid CouchDB document Configuration to be used in the processing jobs

Optional arguments

Argument Type Restrictions Description Default
TransientOutputModules Comma separated list The modules must exist as output of the processing task and must have at least one skim that uses them as input It indicates output modules in the task that won't be stored permanently, i.e. not merged []

Skim arguments

ReReco supports multiple skim tasks, specified through the following arguments. All of them are optional at first; however, once SkimName#N is defined for a given "#N", SkimInput#N and Skim#NConfigCacheID become required. Note that these definitions use "#N" as a placeholder for the skim number, which must run sequentially from 1 to the number of desired skims.

Argument Type Restrictions Description Default
SkimName#N String - Name for the Nth skim -
SkimInput#N String - Output module to use as input for the Nth skim -
Skim#NConfigCacheID String ID for a valid CouchDB document Configuration to be used in the jobs of the Nth skim -
SkimTimePerEvent#N Number Positive Average time expected to process a single event in the Nth skim, in seconds The time per event of the processing task
SkimMemory#N Number Positive Average RSS usage expected by a cmsRun process in the Nth skim Memory specified for the processing task
SkimSizePerEvent#N Number Positive Average size contribution to the output for every event in the input for the Nth skim, in KBytes. Because the skim output contains fewer events than its input, this value should equal the size of an output event times the skim efficiency Size per event of the processing task
SkimSplittingAlgo#N String EventAwareLumiBased, EventBased, LumiBased or FileBased Splitting algorithm for the Nth skim jobs FileBased
SkimEventsPerJob#N Integer Positive Desired events per job if EventAwareLumiBased or EventBased are chosen as the splitting algorithm for the Nth skim Adjusted to produce 8h jobs based on the specified time per event
SkimLumisPerJob#N Integer Positive Desired lumis per job if LumiBased is chosen as the splitting algorithm for the Nth skim 8
SkimFilesPerJob#N Integer Positive Desired files per job if FileBased is chosen as the splitting algorithm for the Nth skim 1
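
The skim naming rules above can be sketched as a small check. This is only an illustration: `check_skims` is a hypothetical helper, and it assumes "#N" is replaced directly by the skim number (for example `SkimName1`, `Skim1ConfigCacheID`).

```python
import re


def check_skims(request):
    """Return problems found in the SkimName#N style arguments of `request`."""
    # Collect every skim number N for which SkimNameN is defined.
    numbers = sorted(
        int(match.group(1))
        for match in (re.fullmatch(r"SkimName(\d+)", key) for key in request)
        if match
    )
    problems = []
    # Skim numbers must run sequentially from 1 to the number of skims.
    if numbers != list(range(1, len(numbers) + 1)):
        problems.append("Skim numbers must be sequential starting from 1")
    # Defining SkimNameN makes SkimInputN and SkimNConfigCacheID required.
    for n in numbers:
        for required in (f"SkimInput{n}", f"Skim{n}ConfigCacheID"):
            if required not in request:
                problems.append(
                    f"{required} is required because SkimName{n} is defined"
                )
    return problems
```

A request that defines `SkimName1` and `SkimInput1` but no `Skim1ConfigCacheID` would yield a single message naming the missing argument.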

Notes

  • Although optional, StepOneOutputModuleName becomes mandatory if StepTwoConfigCacheID is specified. The same applies to StepTwoOutputModuleName if StepThreeConfigCacheID is specified.

TaskChain (Top)

TaskChain is a special workflow where individual tasks are specified as dictionaries inside the main request dictionary.

Main dictionary arguments

Required arguments

Argument Type Restrictions Description
TaskChain Integer Positive Number of tasks in the request
GlobalTag String - Global tag for the processing jobs

Optional arguments

Argument Type Restrictions Description Default
ConfigCacheUrl String Couch URL format URL to the CouchDB where the configuration documents are stored Defaults to the CouchDB instance configured in the ReqMgr
FirstEvent Integer Positive Number of the first event to be produced, provided the top task is a generation task 1
FirstLumi Integer Positive Number of the first lumi to be produced, provided the top task is a generation task 1
IgnoredOutputModules Comma separated list - List of output modules that won't be staged out from the worker nodes []

Task dictionary arguments

Required arguments

Argument Type Restrictions Description
TaskName String - Unique name for the task
ConfigCacheID String ID for a valid CouchDB document Configuration to be used in the jobs of this task
PrimaryDataset String - Primary dataset for the output of the task; mandatory only in the first task when it is a generator task, optional otherwise
InputDataset String Dataset format Input dataset for the task; mandatory, and only has an effect, in the first task when it is not a generator task
InputTask String - Name of the task that serves as input for this task; not required in the first task
InputFromOutputModule String - Name of the output module that serves as input for this task; not required in the first task
RequestNumEvents Integer Positive Number of requested events; only required in the first task when it is a generator task

Optional arguments

Argument Type Restrictions Description Default
KeepOutput String "True" or "False" Indicates if the output for the current task should be stored permanently (i.e. merged) True
Seeding String AutomaticSeeding or ReproducibleSeeding Seeding to be used in the production jobs AutomaticSeeding
MCPileup String Dataset format Dataset to be used in the MixingModule in the jobs -
DataPileup String Dataset format Dataset to be used in the DataMixingModule in the jobs -
TransientOutputModules Comma separated list The modules must exist as output of the task and must have at least one subsequent task that uses them as input It indicates output modules in the task that won't be stored permanently, i.e. not merged []
BlockBlacklist Comma separated list Elements of the list are valid blocks List of blocks to be excluded from the input dataset in the processing []
BlockWhitelist Comma separated list Elements of the list are valid blocks List of blocks from the input dataset to be processed []
RunBlacklist Comma separated list Elements of the list are positive numbers List of runs to be excluded from the input dataset in the processing []
RunWhitelist Comma separated list Elements of the list are positive numbers List of runs from the input dataset to be processed []
SplittingAlgo String EventAwareLumiBased, EventBased, LumiBased or FileBased Splitting algorithm for the jobs EventAwareLumiBased
EventsPerJob Integer Positive Desired events per job if EventAwareLumiBased or EventBased are chosen as the splitting algorithm Adjusted to produce 8h jobs based on the specified time per event
LumisPerJob Integer Positive Desired lumis per job if LumiBased is chosen as the splitting algorithm 8
FilesPerJob Integer Positive Desired files per job if FileBased is chosen as the splitting algorithm 1

Notes

  • The number of task dictionaries must be equal to the number in the TaskChain argument.
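
To make the nesting concrete, here is a sketch of a two-task request dictionary with a check of the rule in the note above. The key names `Task1` and `Task2` for the task dictionaries are an assumption (this page does not state how the task dictionaries are keyed), `check_task_chain` is a hypothetical helper, and all values are placeholders.

```python
def check_task_chain(request):
    """Return problems with the task dictionaries of a TaskChain request."""
    problems = []
    # Assumption: task dictionaries are stored under "Task1", "Task2", ...
    task_keys = [
        key for key, value in request.items()
        if key.startswith("Task") and key[4:].isdigit() and isinstance(value, dict)
    ]
    if len(task_keys) != request.get("TaskChain"):
        problems.append("Number of task dictionaries must equal TaskChain")
    # TaskName is documented as a unique name for the task.
    names = [request[key].get("TaskName") for key in task_keys]
    if len(set(names)) != len(names):
        problems.append("TaskName must be unique")
    return problems


example = {
    "TaskChain": 2,
    "GlobalTag": "SOME_GLOBAL_TAG",
    "Task1": {  # generator task: PrimaryDataset and RequestNumEvents are required
        "TaskName": "Generation",
        "ConfigCacheID": "placeholder-id-1",
        "PrimaryDataset": "SomePrimaryDataset",
        "RequestNumEvents": 10000,
    },
    "Task2": {  # reads the output of Task1
        "TaskName": "Reconstruction",
        "ConfigCacheID": "placeholder-id-2",
        "InputTask": "Generation",
        "InputFromOutputModule": "SomeOutputModule",
    },
}

print(check_task_chain(example))  # []
```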

Resubmission (Top)

Resubmission requests don't share the arguments defined as common above.

Required arguments

Argument Type Restrictions Description
InitialTaskPath String - Task to recover with the resubmission request
ACDCServer String Valid couch URL URL to the server where the ACDC records are located
ACDCDatabase String - Name of the CouchDB database that holds the ACDC records

Optional arguments

Argument Type Restrictions Description Default
CollectionName String - Alternative name for the collection record in the ACDC database Equal to the original request name
IgnoredOutputModules Comma separated list - List of output modules that won't be staged out from the worker nodes []
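
As an illustration of the Resubmission arguments and their defaults, the sketch below assembles the argument dictionary. The helper `resubmission_args`, its `original_request_name` parameter, and the sample values are hypothetical; how the original request name is actually obtained is not described on this page.

```python
def resubmission_args(initial_task_path, acdc_server, acdc_database,
                      original_request_name, collection_name=None,
                      ignored_output_modules=None):
    """Build Resubmission arguments, applying the documented defaults."""
    return {
        # Required arguments.
        "InitialTaskPath": initial_task_path,
        "ACDCServer": acdc_server,
        "ACDCDatabase": acdc_database,
        # Optional: defaults to the original request name.
        "CollectionName": collection_name or original_request_name,
        # Optional: defaults to an empty list.
        "IgnoredOutputModules": list(ignored_output_modules or []),
    }


# Placeholder values for illustration only.
args = resubmission_args(
    initial_task_path="/some_request/SomeTask",
    acdc_server="https://couch.example.invalid",
    acdc_database="some_acdc_database",
    original_request_name="some_request",
)
print(args["CollectionName"])  # some_request
```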
