# Tips for running jobs on datasets that are not yet published

There are two ways of doing this: <strong>lxbatch jobs</strong> and <strong>local crab jobs</strong>. Both options first need a list of the input files, written as paths that can be opened via xrootd.

## <hr> 1. To grep the list of files

    ls -l (or eos ls -l) /directory/of/your/files/ | grep [common prefix] | awk '{print ""$[the nth column from -l]}' > filename.list

<strong>For example,</strong> 

    eos ls /store/group/phys_heavyions/ztu/reco/HIPhysicsMinBiasUPC/v0/hlt/histo/v5/000/262/548/ |grep PbPb| awk '{print"root://eoscms.cern.ch//eos/cms/store/group/phys_heavyions/ztu/reco/HIPhysicsMinBiasUPC/v0/hlt/histo/v5/000/262/548/"$1}' > file.list
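
The same list can also be assembled in Python, which may be easier to adapt than the awk one-liner. The following is a minimal sketch, not part of the original recipe: the function name `build_file_list`, the sample file names, and the directory `/store/group/example/dir/` are hypothetical, and the default prefix is simply the one used in the example above.

```python
def build_file_list(names, directory, prefix="root://eoscms.cern.ch//eos/cms", keyword=""):
    """Return one xrootd URL per file name that contains `keyword`."""
    # Normalise the directory so it has exactly one leading and one trailing slash.
    directory = "/" + directory.strip("/") + "/"
    return [prefix + directory + name.strip()
            for name in names
            if name.strip() and keyword in name]


# Hypothetical file names, standing in for the output of `eos ls`.
names = ["PbPb_histo_1.root", "PbPb_histo_2.root", "pp_histo_1.root", ""]
urls = build_file_list(names, "/store/group/example/dir/", keyword="PbPb")
print("\n".join(urls))
```

Writing `"\n".join(urls)` to a file gives the equivalent of file.list.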

## <hr> 2. To merge files on EOS

    cmsLs /store/group/phys_heavyions/ztu/reco/HIPhysicsMinBiasUPC/v0/hlt/histo/v5/000/262/548/ | grep PbPbhisto | awk -v p="" '{if ($1!="") p=p" root://eoscms.cern.ch//eos/cms/store/group/phys_heavyions/ztu/reco/HIPhysicsMinBiasUPC/v0/hlt/histo/v5/000/262/548/"$1}; END{print "hadd PbPbhisto_run262548_v5.root" p}' > source_me

<strong>Then type</strong>
    
    . source_me 

## <hr> 3. To xrdcp (xrootd copy) a large number of files

The idea is similar: grab the list of files, put xrdcp in front of each source URL, and follow the URL with a space and the local destination. Each line of the output file is then a complete xrdcp command.

    eos ls /store/group/phys_heavyions/velicanu/HIRun2015/HIMinimumBias2/RAW/v0/000/262/620/ |grep FEVT| awk '{print"xrdcp root://xrootd-cms.infn.it//store/group/phys_heavyions/velicanu/HIRun2015/HIMinimumBias2/RAW/v0/000/262/620/"$1 " ./"$1}' > v0_262_620.list
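
For reference, here is a rough Python sketch of the same idea. It only builds the command strings and copies nothing. The function name `build_xrdcp_commands` and the sample names are hypothetical; the default redirector is the one from the command above.

```python
def build_xrdcp_commands(names, remote_dir, redirector="root://xrootd-cms.infn.it/", keyword=""):
    """Return one 'xrdcp <source URL> ./<name>' line per matching file name."""
    remote_dir = "/" + remote_dir.strip("/") + "/"
    commands = []
    for name in names:
        name = name.strip()
        if name and keyword in name:
            # Source URL first, then a space, then the local destination.
            commands.append("xrdcp %s%s%s ./%s" % (redirector, remote_dir, name, name))
    return commands


# Hypothetical file names, standing in for the output of `eos ls`.
sample = ["FEVT_1.root", "FEVT_2.root", "notes.txt"]
for line in build_xrdcp_commands(sample, "/store/group/example/dir/", keyword="FEVT"):
    print(line)
```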




## <hr> 4. To prepare the lxbatch jobs

- <strong> Add this at the top of your config file </strong> (cmsRun config.py)
    
<pre><code>from FWCore.ParameterSet.VarParsing import VarParsing
options = VarParsing('analysis')
options.register ('isPP',
                  False,
                  VarParsing.multiplicity.singleton,
                  VarParsing.varType.bool,
                  "Flag if this is a pp simulation")
options.parseArguments()</code></pre>

- <strong> Replace the relevant blocks with these three </strong>

<pre><code>process.source = cms.Source("NewEventStreamFileReader",
    fileNames = cms.untracked.vstring(options.inputFiles[0])
)

# inside the definition of your output module:
fileName = cms.untracked.string(options.outputFile),

process.maxEvents = cms.untracked.PSet(
    input = cms.untracked.int32(options.maxEvents)
)</code></pre>

- <strong> Always test it before submitting the jobs </strong> 

<pre><code>cmsRun config.py outputFile=test.root maxEvents=3 inputFiles=[input files]</code></pre>

<hr>
- <strong> Prepare another script to submit the lxbatch jobs </strong>
    - The config file name needs to be replaced
    - The output name can be changed if you want.
    

Please refer to <dfn> submitexample.py </dfn> as an example. 
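
To give a feeling for what such a script does, here is a minimal sketch. It is not the actual submitexample.py, whose contents are not shown here: the function names, the job names `job_N`, and the output names `output_N.root` are all assumptions made for illustration. It only builds one bsub command line per input file and does not submit anything. A real job would typically also have to set up the CMSSW environment before calling cmsRun, which is left out.

```python
import os


def read_file_list(path):
    """Read file.list, dropping empty lines and surrounding whitespace."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def build_bsub_commands(input_files, queue, out_dir, config="config.py"):
    """Return one bsub command line per input file (nothing is submitted here)."""
    commands = []
    for index, input_file in enumerate(input_files):
        job_name = "job_%d" % index  # hypothetical naming scheme
        output_file = os.path.join(out_dir, "output_%d.root" % index)
        payload = "cmsRun %s inputFiles=%s outputFile=%s" % (config, input_file, output_file)
        commands.append('bsub -q %s -J %s "%s"' % (queue, job_name, payload))
    return commands


# Hypothetical inputs; in practice they would come from read_file_list("file.list").
example_files = ["root://example.invalid//store/file_0.root",
                 "root://example.invalid//store/file_1.root"]
for command in build_bsub_commands(example_files, "my_queue", "out"):
    print(command)
```

Printing the commands first and only running them once they look right is a cheap way to catch a wrong queue or output path before many jobs are submitted.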

- <strong> Submit the jobs </strong>

<strong>Check which queues you can submit to:</strong>
<pre><code>bqueues</code></pre>

<strong> then </strong>

<pre><code>python submitexample.py -q [queue name] -o [dir/of/output] -i [file.list]</code></pre>

<strong> Use </strong>

    bjobs
    
<strong>to check the job status</strong>

<strong>Or, if you want to kill a specific job</strong>

    bkill -J [JOB_NAME]
 
<strong>Or, if you want to kill all of your jobs</strong>

    bkill 0


## <hr> 5. Local crab jobs

Local crab jobs are easier to set up than lxbatch jobs. The difference is that lxbatch jobs may sometimes run faster, because the input files are usually at CERN.

<strong>Here is the slightly modified crab3 local job config file:</strong>

<pre><code>
from CRABClient.UserUtilities import config, getUsernameFromSiteDB
config = config()
config.General.requestName = 'demo_v1'
config.General.workArea = 'demo_v1'
config.General.transferOutputs = True
config.General.transferLogs = True
config.JobType.allowUndistributedCMSSW = True

config.JobType.pluginName = 'Analysis'
config.JobType.psetName = 'config.py'

config.Data.userInputFiles = list(open('file.list'))
config.Data.splitting = 'FileBased'
config.Data.ignoreLocality = False
config.Data.unitsPerJob = 5
config.Data.outLFNDirBase = '/store/user/%s/' % (getUsernameFromSiteDB())
config.Data.publication = False
config.Site.storageSite = 'T2_US_MIT'
</code></pre>

<strong> then </strong>

    crab submit -c [crab config name]
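
With FileBased splitting, unitsPerJob is the number of input files per job, so the length of file.list fixes the number of jobs. The sketch below only illustrates that arithmetic. It is not CRAB's own code, and the helper names and sample entries are hypothetical. The actual grouping done by CRAB may order the files differently.

```python
def read_file_list(lines):
    """Strip whitespace and drop empty lines from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def split_file_based(files, units_per_job):
    """Group files into consecutive chunks of at most units_per_job files."""
    if units_per_job < 1:
        raise ValueError("units_per_job must be at least 1")
    return [files[i:i + units_per_job] for i in range(0, len(files), units_per_job)]


# Twelve hypothetical entries standing in for the lines of file.list.
lines = ["root://example.invalid//store/file_%d.root\n" % i for i in range(12)]
jobs = split_file_based(read_file_list(lines), units_per_job=5)
print(len(jobs), [len(job) for job in jobs])
```

With 12 files and unitsPerJob = 5 this gives three jobs of 5, 5, and 2 files.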