variable used before it is defined #8624

Closed
belforte opened this issue Aug 15, 2024 · 7 comments

@belforte
Member

Problem handling 240812_094541:avdas_crab_20240812_114528 because of local variable 'ruleId' referenced before assignment failure, traceback follows
Traceback (most recent call last):
  File "/data/srv/current/lib/python/site-packages/TaskWorker/Actions/RucioActions.py", line 79, in createOrReuseRucioRule
    ruleIds = self.rucioClient.add_replication_rule( # N.B. returns a list
  File "/usr/local/lib/python3.8/site-packages/rucio/client/ruleclient.py", line 71, in add_replication_rule
    raise exc_cls(exc_msg)
rucio.common.exception.DuplicateRule: A duplicate rule for this account, did, rse_expression, copies already exists.
Details: (cx_Oracle.IntegrityError) ORA-00001: unique constraint (CMS_RUCIO_PROD.RULES_SC_NA_AC_RS_CO_UQ_IDX) violated

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/srv/current/lib/python/site-packages/TaskWorker/Actions/Handler.py", line 94, in executeAction
    output = work.execute(nexti
[... message truncated to the first 1000 chars ...]

@belforte belforte self-assigned this Aug 15, 2024
@belforte belforte mentioned this issue Aug 16, 2024
6 tasks
@belforte
Member Author

belforte commented Aug 16, 2024

This test (which makes sure that we do not try to extend a rule created by e.g. WMA) needs to be modified

    ruleIdGen = self.rucioClient.list_did_rules(scope=did['scope'], name=did['name'])
    for rule in ruleIdGen:
        if rule['account'] == self.rucioAccount:
            ruleId = rule['id']
            break
    # extend rule lifetime
    self.rucioClient.update_replication_rule(ruleId, {'lifetime': lifetime})

I thought this was due to the changes I made to introduce an individual quota for tape recall.
In this case a task from avdas was trying to recall the same dataset as in an existing rule created for a task by wkwon.
Recall rules now have the user's account name as account, while previously it was always crab_tape_recall.

But Rucio should not raise "duplicate rule" if account is different :-(
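
The failure mode of the lookup quoted above can be reproduced without Rucio. This is only a minimal sketch: the function name, the rule dictionaries and the account names are made up for illustration. It shows that when no rule in the list belongs to our account, the loop ends without ever binding ruleId, and the next use of it fails with the "referenced before assignment" error seen in the log.

```python
def find_own_rule_id(rules, account):
    """Same pattern as the lookup above: ruleId is bound only if an account matches."""
    for rule in rules:
        if rule['account'] == account:
            ruleId = rule['id']
            break
    # if the loop never matched, ruleId was never assigned
    return ruleId


# hypothetical rule list: the only rule belongs to a different account
rules = [{'account': 'other_account', 'id': 'rule-owned-by-someone-else'}]

try:
    find_own_rule_id(rules, account='some_user')
    outcome = 'found'
except UnboundLocalError:
    # Python 3.8 reports this as "local variable 'ruleId' referenced before assignment"
    outcome = 'UnboundLocalError'

print(outcome)
```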

@belforte
Member Author

Indeed, the error came while handling task 240812_094541:avdas_crab_20240812_114528
A bit later task 240812_124249:avdas_crab_20240812_144239 was submitted with the same dataset as input, and this time it worked.
That dataset is /DYto2L-2Jets_MLL-50_TuneCP5_13p6TeV_amcatnloFXFX-pythia8/Run3Summer22EEDRPremix-124X_mcRun3_2022_realistic_postEE_v1-v4/AODSIM and has only two rules:

  1. 6e937f42aad448c9b0829bff769188f5 from wmcore_output
  2. 761856cd099f48fb8e532d37c5eda3cc from that user, avdas

@belforte
Member Author

I do not understand how we could have got a DuplicateRule exception from Rucio before 761856cd099f48fb8e532d37c5eda3cc was created.

So I am inclined to call this unreproducible and move on.

@belforte
Member Author

On hold until it happens again. Maybe a glitch in Rucio?

@belforte
Member Author

Hmm, the problem is not that rare. It happens every few days and started on Aug 6 (we have logs going back to July 12).

twlog.txt.2024-08-06:Problem handling 240806_150852:jbierken_crab_TnP_ntuplizer_muon_Z_Run2022_AOD_Run2022B because of local variable 'ruleId' referenced before assignment failure, traceback follows
twlog.txt.2024-08-06:Problem handling 240806_150916:jbierken_crab_TnP_ntuplizer_muon_Z_Run2022_AOD_Run2022C2 because of local variable 'ruleId' referenced before assignment failure, traceback follows
twlog.txt.2024-08-06:Problem handling 240806_152640:jbierken_crab_TnP_ntuplizer_muon_Z_Run2022_AOD_Run2022B because of local variable 'ruleId' referenced before assignment failure, traceback follows
twlog.txt.2024-08-06:Problem handling 240806_152705:jbierken_crab_TnP_ntuplizer_muon_Z_Run2022_AOD_Run2022C2 because of local variable 'ruleId' referenced before assignment failure, traceback follows
twlog.txt.2024-08-07:Problem handling 240807_112221:jbierken_crab_TnP_ntuplizer_muon_Z_Run2022_AOD_Run2022C2 because of local variable 'ruleId' referenced before assignment failure, traceback follows
twlog.txt.2024-08-07:Problem handling 240807_153529:aguven_crab_20240807_173507 because of local variable 'ruleId' referenced before assignment failure, traceback follows
twlog.txt.2024-08-08:Problem handling 240808_073352:avdas_crab_20240808_093334 because of local variable 'ruleId' referenced before assignment failure, traceback follows
twlog.txt.2024-08-08:Problem handling 240808_093049:jbierken_crab_TnP_ntuplizer_muon_Z_Run2022_AOD_Run2022C2 because of local variable 'ruleId' referenced before assignment failure, traceback follows
twlog.txt.2024-08-09:Problem handling 240809_153536:pchou_crab_ppref_MC_ZB0_1400v172_HLT_20240808 because of local variable 'ruleId' referenced before assignment failure, traceback follows
twlog.txt.2024-08-09:Problem handling 240809_153640:pchou_crab_ppref_MC_ZB0_1400v172_noL1_HLT_20240808 because of local variable 'ruleId' referenced before assignment failure, traceback follows
twlog.txt.2024-08-12:Problem handling 240812_094541:avdas_crab_20240812_114528 because of local variable 'ruleId' referenced before assignment failure, traceback follows
twlog.txt.2024-08-12:Problem handling 240812_144322:sdansana_crab_HToSS_SmuonHadronFiltered_MH125_MS1p1_ctauS10_2017_20240812164310 because of local variable 'ruleId' referenced before assignment failure, traceback follows
twlog.txt.2024-08-16:Problem handling 240816_100136:aguven_crab_20240816_120109 because of local variable 'ruleId' referenced before assignment failure, traceback follows

Deserves more thought
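
To see how often this happens, the grep output above can be tallied per day. The sketch below is only an illustration: the regular expression assumes the exact "twlog.txt.<date>:Problem handling <task> because of local variable 'ruleId'" layout shown in the listing, and failures_per_day is a hypothetical helper, not something in CRABServer.

```python
import re
from collections import Counter

# assumed layout of one grep output line, taken from the listing above
LINE_RE = re.compile(
    r"^twlog\.txt\.(\d{4}-\d{2}-\d{2}):Problem handling (\S+) "
    r"because of local variable 'ruleId'"
)


def failures_per_day(lines):
    """Count 'ruleId' failures per log date; lines that do not match are ignored."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# three lines copied from the listing above
sample = [
    "twlog.txt.2024-08-07:Problem handling 240807_153529:aguven_crab_20240807_173507 because of local variable 'ruleId' referenced before assignment failure, traceback follows",
    "twlog.txt.2024-08-08:Problem handling 240808_073352:avdas_crab_20240808_093334 because of local variable 'ruleId' referenced before assignment failure, traceback follows",
    "twlog.txt.2024-08-08:Problem handling 240808_093049:jbierken_crab_TnP_ntuplizer_muon_Z_Run2022_AOD_Run2022C2 because of local variable 'ruleId' referenced before assignment failure, traceback follows",
]

for day, count in sorted(failures_per_day(sample).items()):
    print(day, count)
```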

@belforte belforte reopened this Aug 17, 2024
@belforte
Member Author

One possibility is to check that the code finds the duplicate rule and, if it does not, raise a clearer error.

The other is to assume that Rucio does not check for "same account" and do that check ourselves.

I am still puzzled.
I can start by printing all rules for that DID when we make the check for same account:

    for rule in ruleIdGen:
        if rule['account'] == self.rucioAccount:
            ruleId = rule['id']
            break
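
A possible shape for the first option, combined with the diagnostic printout, is sketched below. This is not the change that went into the repository: OwnRuleNotFound, find_own_rule_id and FakeRucioClient are hypothetical names, and the fake client only imitates the list_did_rules(scope=..., name=...) call used in the snippet above so that the example runs without Rucio. The DID and rule values are made up.

```python
import logging


class OwnRuleNotFound(Exception):
    """Hypothetical error: no rule on the DID belongs to our account."""


def find_own_rule_id(rucioClient, did, rucioAccount, logger=None):
    """Log every rule on the DID and return the id of the one owned by rucioAccount."""
    logger = logger or logging.getLogger(__name__)
    ruleId = None  # bound up front, so a failed lookup can be detected
    accountsSeen = []
    for rule in rucioClient.list_did_rules(scope=did['scope'], name=did['name']):
        accountsSeen.append(rule['account'])
        logger.info("DID %s:%s has rule %s owned by %s",
                    did['scope'], did['name'], rule['id'], rule['account'])
        if ruleId is None and rule['account'] == rucioAccount:
            ruleId = rule['id']
    if ruleId is None:
        raise OwnRuleNotFound(
            f"no rule for account {rucioAccount} on {did['scope']}:{did['name']}, "
            f"accounts found: {accountsSeen}")
    return ruleId


class FakeRucioClient:
    """Stand-in for the real client, for illustration only."""

    def __init__(self, rules):
        self.rules = rules

    def list_did_rules(self, scope, name):
        # the real call is used as an iterable in the snippet above; mimic that here
        return iter(self.rules)


did = {'scope': 'cms', 'name': '/Some/Dataset/AODSIM'}  # made-up DID
client = FakeRucioClient([
    {'id': 'rule-a', 'account': 'wmcore_output'},
    {'id': 'rule-b', 'account': 'some_user'},
])
print(find_own_rule_id(client, did, 'some_user'))
```

With this shape the caller can catch OwnRuleNotFound and report which accounts do own rules on the DID, instead of failing on an unbound variable.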

belforte added a commit to belforte/CRABServer that referenced this issue Aug 17, 2024
belforte added a commit that referenced this issue Aug 19, 2024
* check rule status. Fix #8626

* protect CheckTapeRecall against empty dataframe

* adapt CheckTapeRecall to current use of Rucio. Fix #8630

* add diagnostic and tentative fix for #8624

* pylint
@belforte
Member Author

fixed in #8632
