TaskPublish tries to insert illegal dataset #7359
So the question is how tm_publish_name = NanoTuples-V2p0_lpclonglived-crab_PrivateProduction_Fall18_DR_step3_VLLPair_VLLToTauS_MVLL1000_MS10_ctau1000_batch3_v1-3ee3afd6b5a1410aea6d0b4d52723d06-00000000000000000000000000000000 and tm_output_lfn = /store/group/lpcdihiggsboost/NanoTuples/V2p0/MC_Autumn18/v1/ passed through the checks in
given that i.e.
should produce a string of length 15+184+1=200, and Lexicon.procdataset has a limit of 199.
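The length arithmetic above can be reproduced with a short Python sketch. It assumes, as the comment implies, that the processed-dataset name is built as `<groupname>-<publishname>`, with the group name taken as the path component right after /store/group/ in tm_output_lfn. PROCDATASET_MAX_LEN is simply the 199 limit quoted for Lexicon.procdataset, not an import from WMCore.

```python
# Values from the task parameters quoted above; the last field of the publish
# name is a placeholder hash made of 32 zeros.
tm_publish_name = (
    "NanoTuples-V2p0_lpclonglived-crab_PrivateProduction_Fall18_DR_step3_"
    "VLLPair_VLLToTauS_MVLL1000_MS10_ctau1000_batch3_v1-"
    "3ee3afd6b5a1410aea6d0b4d52723d06-" + "0" * 32
)
tm_output_lfn = "/store/group/lpcdihiggsboost/NanoTuples/V2p0/MC_Autumn18/v1/"

PROCDATASET_MAX_LEN = 199  # limit quoted for Lexicon.procdataset

# Assumption: the group name is the path component right after /store/group/.
# "/store/group/lpcdihiggsboost/...".split("/") -> ["", "store", "group", "lpcdihiggsboost", ...]
group_name = tm_output_lfn.split("/")[3]
processed_name = f"{group_name}-{tm_publish_name}"

print(len(group_name), len(tm_publish_name), len(processed_name))  # 15 184 200
print(len(processed_name) <= PROCDATASET_MAX_LEN)  # False
```

Each piece passes on its own; only the combination with the group prefix crosses the limit, which is why a check on the publish name alone lets it through.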
FWIW, I verified that by removing one character from the publishname, the
First finding: in that task submission, when RESTUserWorkflow is called, the argument
This is actually OK and to be expected, since the user did not set
So why do we end up with the group name in the output dataset name?
outdataset is set in PostJob and stored in FMD. I checked FMD for this task at
The problem is likely in PostJob, and has likely been there for a long time, if not forever :-(
The problem started with commit 86cc6a5. So for 5 months we have always put the group name in the output dataset name, even if the user did not ask for it. This new naming does not seem to bother anybody, and at this point it may be better to live with it and simply remove the flexibility: one parameter less to worry about.
Actions:
It looks like in that commit, lines like these (CRABServer/src/python/TaskWorker/Actions/PostJob.py, lines 2340 to 2345 in 33bffca)
became like these (CRABServer/src/python/TaskWorker/Actions/PostJob.py, lines 2724 to 2733 in 86cc6a5),
i.e. I did not port over the check on
So the question is whether to do 1. or 2.
@mapellidario @novicecpp @dciangot @KatyEllis any advice? (I like to share the blame for difficult decisions!) I am leaning towards 2., as indicated above in #7359 (comment). Maybe we can talk about this next Tuesday; hopefully the failure which made me look is a rare one.
I'd say prevent the illegal names from seeping through, but I may not have understood the full consequences.
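One way to prevent illegal names from seeping through is to compute, at submission time, the processed-dataset name that will eventually be used (group prefix included) and validate that. Below is a minimal Python sketch under stated assumptions: group_from_lfn and check_publish_name are hypothetical helpers, not CRABServer functions; the group name is assumed to be prepended whenever the output LFN is under /store/group/<group>/; the character class is the processed-dataset part of the Lexicon.dataset regex quoted in this issue, with underscores allowed. Checking the publish name with its placeholder hash should be enough for length purposes, because in this task the 32-zero placeholder was later replaced by a 32-character hash (cd471944433cef30a1e69a7cb38aa7e8).

```python
import re

# Assumed from the processed-dataset field of the Lexicon.dataset regex quoted
# in this issue, with underscores allowed (they appear in real dataset names).
PROCDATASET_RE = re.compile(r"[a-zA-Z0-9._-]{1,199}")


def group_from_lfn(output_lfn):
    """Hypothetical helper: return <group> for /store/group/<group>/..., else None."""
    parts = output_lfn.strip("/").split("/")
    if len(parts) >= 3 and parts[0] == "store" and parts[1] == "group":
        return parts[2]
    return None


def check_publish_name(output_lfn, publish_name):
    """Hypothetical submission-time check: fail early on names DBS would reject.

    Assumes the group name is always prepended for /store/group/ output (the
    naming observed since commit 86cc6a5). Other output areas are not modeled
    here, so for them only publish_name itself is checked.
    """
    group = group_from_lfn(output_lfn)
    name = f"{group}-{publish_name}" if group else publish_name
    if not PROCDATASET_RE.fullmatch(name):
        raise ValueError(
            f"processed dataset name would be {len(name)} chars (max 199) "
            f"or contain illegal characters: {name}"
        )
    return name


# Usage with the values from this task (placeholder hash of 32 zeros).
LFN = "/store/group/lpcdihiggsboost/NanoTuples/V2p0/MC_Autumn18/v1/"
PUBLISH = (
    "NanoTuples-V2p0_lpclonglived-crab_PrivateProduction_Fall18_DR_step3_"
    "VLLPair_VLLToTauS_MVLL1000_MS10_ctau1000_batch3_v1-"
    "3ee3afd6b5a1410aea6d0b4d52723d06-" + "0" * 32
)
try:
    check_publish_name(LFN, PUBLISH)
except ValueError as err:
    print("rejected:", err)
# One character shorter is accepted (199 chars).
print(len(check_publish_name(LFN, PUBLISH[1:])))
```

The point of the design is that the check runs on the same string the publisher will later send to DBS, rather than on its pieces separately.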
Let's recap. In the following you can assume
Behavior 1: no matter where the user sends the output, data is published in a dataset named like
Behavior 2: the configuration flag
Where problems arise:
There are two possible solutions:
I guess one could check the Task table of the CRAB Oracle DB to find out how much that parameter was used in the past. I suspect it is one of those features added on "one user's whim" which people can live without. I hope I made things clearer. I agree that it is not easy to pick, but I do not want to have a "referendum" among users either.
So the conclusion is that currently we publish as
We reached consensus on sticking with Behavior 2, i.e. keep the current dataset names and
Hmm... we need to remove the extra option from the client first, or the server will complain about an extra keyword.
* deprecate publishWithGroupName as per dmwm/CRABServer#7359
* pylint
* pylint
CRABClient updated (tag v3.221004) and on its way to the crab-dev IB via cms-sw/cmsdist#8118
REST and TW changes are now in my branch, and I checked that things work using the new client from GH.
One bit was still missing on the REST side. I updated #7359 (comment).
CRABClient update requested in cms-sw/cmsdist#8159.
CRABClient is now updated:
Time to finalize.
Tagged in v3.221114.
v3.221114 deployed in preprod and validated: https://github.com/dmwm/CRABServer/releases/tag/v3.221114
In this task
https://cmsweb.cern.ch/crabserver/ui/task/220801_190609%3Aapresyan_crab_VLLPair_VLLToTauS_MVLL1000_MS10_ctau1000_batch2_v1
CRAB eventually constructs this output dataset name
/VLLPair_VLLToTauS_MVLL1000_MS10_ctau1000/lpcdihiggsboost-NanoTuples-V2p0_lpclonglived-crab_PrivateProduction_Fall18_DR_step3_VLLPair_VLLToTauS_MVLL1000_MS10_ctau1000_batch3_v1-3ee3afd6b5a1410aea6d0b4d52723d06-cd471944433cef30a1e69a7cb38aa7e8/USER
which is illegal (the second piece between / is 200 characters long instead of the maximum of 199), so when TaskPublish tries to use it, it fails immediately at
CRABServer/src/python/Publisher/TaskPublish.py, line 663 in 644156d,
when trying to check what was already published (if anything), because the DBS server rejects the dataset name.
Checking the dataset name above with Lexicon.dataset returns
AssertionError: '/VLLPair_VLLToTauS_MVLL1000_MS10_ctau1000/lpcdihiggsboost-NanoTuples-V2p0_lpclonglived-crab_PrivateProduction_Fall18_DR_step3_VLLPair_VLLToTauS_MVLL1000_MS10_ctau1000_batch3_v1-3ee3afd6b5a1410aea6d0b4d52723d06-cd471944433cef30a1e69a7cb38aa7e8/USER' does not match regular expression ^/[a-zA-Z0-9-]{1,99}/[a-zA-Z0-9.-]{1,199}/[A-Z-]{1,50}$
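To see which of the three fields trips the check, the name can be split on / and each field matched separately. This is a minimal Python sketch; bad_fields is a hypothetical helper, not a WMCore function. The per-field patterns follow the regex in the error message, but with underscores allowed, which is an assumption: the pattern as rendered here appears to have lost them, since the primary dataset name also contains underscores while only the second field is reported as the problem.

```python
import re

# Per-field patterns, assumed from the Lexicon.dataset regex quoted above,
# with underscores allowed in the primary and processed fields.
FIELD_PATTERNS = {
    "primary": re.compile(r"[a-zA-Z0-9_-]{1,99}"),
    "processed": re.compile(r"[a-zA-Z0-9._-]{1,199}"),
    "tier": re.compile(r"[A-Z-]{1,50}"),
}


def bad_fields(dataset):
    """Return the names of the /primary/processed/tier fields that fail their pattern."""
    parts = dataset.split("/")
    if len(parts) != 4 or parts[0] != "":
        return ["structure"]
    fields = dict(zip(FIELD_PATTERNS, parts[1:]))
    return [name for name, value in fields.items()
            if not FIELD_PATTERNS[name].fullmatch(value)]


# The output dataset name constructed for this task.
dataset = (
    "/VLLPair_VLLToTauS_MVLL1000_MS10_ctau1000/"
    "lpcdihiggsboost-NanoTuples-V2p0_lpclonglived-crab_PrivateProduction_"
    "Fall18_DR_step3_VLLPair_VLLToTauS_MVLL1000_MS10_ctau1000_batch3_v1-"
    "3ee3afd6b5a1410aea6d0b4d52723d06-cd471944433cef30a1e69a7cb38aa7e8"
    "/USER"
)
print(len(dataset.split("/")[2]))  # 200
print(bad_fields(dataset))  # ['processed']
```

Dropping any single character from the processed field (e.g. "batch3_v1" -> "batch3v1") makes bad_fields return an empty list, consistent with the limit being exceeded by exactly one character.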