runTheMatrix.py -i all whitelists CERN even if files are elsewhere #22278

Closed
wmtford opened this issue Feb 20, 2018 · 17 comments · Fixed by #31535

@wmtford

wmtford commented Feb 20, 2018

Some workflows in runTheMatrix.py with the recycle option seek a GEN-SIM dataset that exists only at FNAL. The dasquery step fails because the command contains the requirement site=T2_CH_CERN. Could a command-line option --allsites, valid in conjunction with -i, be added to runTheMatrix.py that would allow the user to suppress this site requirement?

For example, in CMSSW_10_0_0_pre2 the command

runTheMatrix.py --command=--number=10 -w 2017 -l 10859.0 -i all

generates at step1
dasgoclient --limit 0 --query 'file dataset=/RelValQCD_Pt_3000_3500_13/CMSSW_10_0_0_pre2-100X_upgrade2018_realistic_v1-v1/GEN-SIM site=T2_CH_CERN'

But looking in DAS I find that this dataset exists only at T1_US_FNAL.

The query is generated in

Configuration/PyReleaseValidation/python/MatrixUtil.py

where we find

    if len(self.run) is not 0:
        return ["file {0}={1} run={2} site=T2_CH_CERN".format(query_by, query_source, query_run) for query_run in self.run]
        # return ["file {0}={1} run={2} ".format(query_by, query_source, query_run) for query_run in self.run]
    else:
        return ["file {0}={1} site=T2_CH_CERN".format(query_by, query_source)]
        # return ["file {0}={1} ".format(query_by, query_source)]

I can run successfully by reversing the comments in these two pairs of lines.
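
One way to avoid editing the file by hand would be to make the site a parameter of the query builder. The following is only a minimal sketch under stated assumptions, not the actual MatrixUtil.py code: `build_das_queries` and its `site` argument are hypothetical names introduced for illustration, and an empty `site` is assumed to mean "no site restriction".

```python
def build_das_queries(query_by, query_source, runs=(), site="T2_CH_CERN"):
    """Return DAS 'file' query strings, optionally restricted to one site.

    Hypothetical helper: an empty (or None) `site` drops the site clause,
    so the query matches files at any site.
    """
    site_clause = " site={0}".format(site) if site else ""
    if runs:
        # One query per run, mirroring the per-run branch quoted above.
        return [
            "file {0}={1} run={2}{3}".format(query_by, query_source, run, site_clause)
            for run in runs
        ]
    return ["file {0}={1}{2}".format(query_by, query_source, site_clause)]


if __name__ == "__main__":
    dataset = ("/RelValQCD_Pt_3000_3500_13/"
               "CMSSW_10_0_0_pre2-100X_upgrade2018_realistic_v1-v1/GEN-SIM")
    print(build_das_queries("dataset", dataset))           # restricted to T2_CH_CERN
    print(build_das_queries("dataset", dataset, site=""))  # no site restriction
```

Keeping T2_CH_CERN as the default would preserve the current behaviour for anyone who does not pass a site explicitly.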

@cmsbuild

A new Issue was created by @wmtford Bill Ford.

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@fabiocos

assign pdmv

@cmsbuild

New categories assigned: pdmv

@fabozzi,@prebello,@GurpreetSinghChahal you have been requested to review this Pull request/Issue and eventually sign? Thanks

@prebello

@fabiocos PdmV has been dealing with it so far. It is an occasional issue that does not always occur.
Therefore PdmV doesn't think that this PR should be considered.

@prebello

-1

@prebello

@davidlange6, @Dr15Jones, @smuzaffar, @fabiocos
Could you please refresh my memory about the reason to keep T2_CH_CERN in the whitelist as it is?
It seems that years ago @hengne requested to change it and received complaints from offline regarding the IB and PR tests.

@fabozzi

fabozzi commented Feb 21, 2018

Hi, the command-line option --ibeos, introduced recently by @smuzaffar with PR #22072,
should actually do the job. Right?

@christopheralanwest

I would like to note that an even more common command does not work as intended:

runTheMatrix.py -l limited -i all

which is suggested as a test before submitting any PR. As a result, a user cannot reproduce all of the PR integration tests. As an example from a recent PR, the workflow

4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC

executes

dasgoclient --limit 0 --query 'file dataset=/Cosmics/Run2011A-v1/RAW run=160960 site=T2_CH_CERN'

which gives no results when run by a user, but somehow the same command returns a non-zero list of files when run within the official PR tests. From an earlier reply, I suppose that this happens because the user does not have the same stale cache that is used by the PR integration tests.

If no code changes will be implemented, may I suggest transferring the files used by the integration tests back to T2_CH_CERN using a PhEDEx user category that is not auto-managed to avoid being removed by the dynamic data placement algorithm?

@schneiml

schneiml commented Feb 22, 2018

@christopheralanwest as [1] shows, there is some ibeos magic involved with the PR test. This explains why there are files in the PR tests.

[1] https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-22262/26154/runTheMatrix-results/4.22_RunCosmics2011A+RunCosmics2011A+RECOCOSD+ALCACOSD+SKIMCOSD+HARVESTDC/cmdLog (ctrl-f 'ibeos')

@wmtford

wmtford commented Feb 22, 2018

With the --ibeos option I can successfully run runTheMatrix.py as a user on lxplus. On other sites (cmslpc at FNAL, or my usual T3_US_COLORADO), the dasquery step works, but then step2 fails, presumably because the ibeos environment isn't compatible with these non-CERN sites:
ERROR executing cd 10808.0_SingleMuPt100+SingleMuPt100_pythia8_2018_GenSimFullINPUT+DigiFull_2018+RecoFull_2018+ALCAFull_2018+HARVESTFull_2018; cmsDriver.py step2 --conditions auto:phase1_2018_realistic -s DIGI:pdigi_valid,L1,DIGI2RAW,HLT:@relval2018 --datatier GEN-SIM-DIGI-RAW -n 10 --geometry DB:Extended --era Run2_2018 --eventcontent FEVTDEBUGHLT --number=10 --filein filelist:step1_dasquery.log --fileout file:step2.root > step2_SingleMuPt100+SingleMuPt100_pythia8_2018_GenSimFullINPUT+DigiFull_2018+RecoFull_2018+ALCAFull_2018+HARVESTFull_2018.log 2>&1; ret= 16896

@smuzaffar

@wmtford , can you please provide the log file of the failed tests with --ibeos?

@wmtford

wmtford commented Mar 2, 2018

Yes, here is the output of runTheMatrix.py:
matrix_ibeos.txt
The first 8 lines come from the batch system at T3_US_COLORADO.
Files generated by the script:
step1_dasquery.log

step2_DIGI_L1_DIGI2RAW_HLT.py.txt

step2_SingleMuPt100+SingleMuPt100_pythia8_2018_GenSimFullINPUT+DigiFull_2018+RecoFull_2018+ALCAFull_2018+HARVESTFull_2018.log

@mmusich

mmusich commented Sep 21, 2020

@smuzaffar and @wmtford
was there perhaps any conclusion on this issue?

@carolinecollard for your information.

@smuzaffar

@prebello, the T2_CH_CERN restriction was added to make sure that we transfer datasets/blocks to CERN under the ib-relval group. This is no longer needed, as IBs/PRs now use files from the ibeos area. I have no objection to dropping this restriction or to adding a command-line option to drop it.

Let me know which option you would prefer (adding a new command-line option is the safe choice, though).

@mmusich

mmusich commented Sep 21, 2020

I think @prebello left PdmV some time ago.
I am adding the current PdmV team instead: @chayanit, @wajidalikhan, @jordan-martins

@chayanit

Thanks for pointing this out to us. Adding a new command-line option would be a good choice. Could you provide the PR? @smuzaffar

@smuzaffar

#31535 adds a --sites <site> option to select a specific site. Setting it to an empty string will search all sites. The default is T2_CH_CERN.
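
The behaviour described here (a default site, with an empty string meaning "all sites") can be illustrated with a small standalone sketch. This is not the code from #31535 and makes no claim about how runTheMatrix.py parses its options; only the option name, its default and the meaning of the empty string are taken from the comment above, while `parse_sites` and `site_clause` are hypothetical names.

```python
import argparse


def parse_sites(argv):
    """Parse a --sites style option; the default mirrors the one described above."""
    parser = argparse.ArgumentParser(description="Sketch of a --sites style option")
    parser.add_argument(
        "--sites",
        default="T2_CH_CERN",
        help="site used in the DAS query; an empty string searches all sites",
    )
    return parser.parse_args(argv).sites


def site_clause(site):
    """Return the ' site=...' fragment, or nothing when no site is requested."""
    return " site={0}".format(site) if site else ""


if __name__ == "__main__":
    for argv in ([], ["--sites", "T1_US_FNAL"], ["--sites", ""]):
        print(repr(argv), "->", repr("file dataset=/A/B/C" + site_clause(parse_sites(argv))))
```

With such an option, the example from the original report would presumably be run as `runTheMatrix.py --command=--number=10 -w 2017 -l 10859.0 -i all --sites ""` to search all sites.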
