Skip to content

[8.0] fix(wms): correctly log the pilot job reference during the matching process#7924

Merged
chrisburr merged 2 commits intoDIRACGrid:rel-v8r0from
aldbr:v8.0_FIX_Matcher-logging
Dec 13, 2024
Merged

[8.0] fix(wms): correctly log the pilot job reference during the matching process#7924
chrisburr merged 2 commits intoDIRACGrid:rel-v8r0from
aldbr:v8.0_FIX_Matcher-logging

Conversation

@aldbr
Copy link
Contributor

@aldbr aldbr commented Dec 3, 2024

Closes #7884

BEGINRELEASENOTES
*WorkloadManagement
FIX: correctly log the pilot job reference during the matching process
ENDRELEASENOTES

@DIRACGridBot DIRACGridBot added the alsoTargeting:integration Cherry pick this PR to integration after merge label Dec 3, 2024
@aldbr
Copy link
Contributor Author

aldbr commented Dec 4, 2024

https://github.com/DIRACGrid/DIRAC/actions/runs/12144642670/job/33864474014?pr=7924#step:10:935

--- Logging error ---
Traceback (most recent call last):
  File "/home/dirac/ServerInstallDIR/diracos/lib/python3.11/logging/__init__.py", line 1113, in emit
    stream.write(msg + self.terminator)
ValueError: I/O operation on closed file.
Call stack:
  File "/home/dirac/ServerInstallDIR/diracos/lib/python3.11/weakref.py", line 666, in _exitfunc
    f()
  File "/home/dirac/ServerInstallDIR/diracos/lib/python3.11/weakref.py", line 590, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/home/dirac/ServerInstallDIR/diracos/lib/python3.11/site-packages/DIRAC/Core/Utilities/DictCache.py", line 247, in _purgeAll
    deleteFunction(cache[cKey]["value"])
  File "/home/dirac/ServerInstallDIR/diracos/lib/python3.11/site-packages/DIRAC/WorkloadManagementSystem/DB/TaskQueueDB.py", line 1001, in __deleteTQIfEmpty
    result = self.deleteTaskQueueIfEmpty(tqId, tqOwnerDN, tqOwnerGroup)
  File "/home/dirac/ServerInstallDIR/diracos/lib/python3.11/site-packages/DIRAC/WorkloadManagementSystem/DB/TaskQueueDB.py", line 1027, in deleteTaskQueueIfEmpty
    retVal = self._query(sqlCmd, conn=connObj)
  File "/home/dirac/ServerInstallDIR/diracos/lib/python3.11/site-packages/DIRAC/Core/Utilities/MySQL.py", line 717, in _query
    self.log.debug(f"_query: {self._safeCmd(cmd)}")
  File "/home/dirac/ServerInstallDIR/diracos/lib/python3.11/site-packages/DIRAC/FrameworkSystem/private/standardLogging/Logging.py", line 352, in debug
    return self._createLogRecord(LogLevels.DEBUG, sMsg, sVarMsg)
  File "/home/dirac/ServerInstallDIR/diracos/lib/python3.11/site-packages/DIRAC/FrameworkSystem/private/standardLogging/Logging.py", line 417, in _createLogRecord
    self._logger.log(level, "%s", sMsg, exc_info=exc_info, extra=extra)
Message: '%s'
Arguments: ('_query: SELECT TQId FROM `tq_TaskQueues` WHERE Enabled >= 1 AND `tq_TaskQueues`.TQId = 26 AND `tq_TaskQueues`.TQId not in ( SELECT DISTINCT TQId from `tq_Jobs` )',)

@aldbr
Copy link
Contributor Author

aldbr commented Dec 4, 2024

Update: ok funny, this is not a new bug: https://github.com/DIRACGrid/DIRAC/actions/runs/12140642296/job/33850877559

I am not sure to fully understand, but it looks like it is due to the weakref. When we are in __deleteTQIfEmpty, it's like the streams have already been cleaned up, and the logger, even if it's still alive, cannot emit the messages anymore.

May be we should not try to log/print messages in methods used by weakref? But then we take the risk of missing some errors

@aldbr aldbr force-pushed the v8.0_FIX_Matcher-logging branch from 1a95e90 to 07cc337 Compare December 4, 2024 14:12
@aldbr aldbr marked this pull request as ready for review December 4, 2024 14:12
resourceDescription["Setup"] = self.serviceInfoDict["clientSetup"]
credDict = self.getRemoteCredentials()
pilotRef = resourceDescription.get("PilotReference", "Unknown")
pilotRef = resourceDescription.get("PilotReference")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not harming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was probably to handle cases of old pilots without a pilotRef.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification, I reverted my change.

@@ -0,0 +1,16 @@
""" Logging context module"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the logging code is in DIRAC.FrameworkSystem.private.standardLogging right? You can add this code in one of the existing module (here it feels a bit "lost").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, but then it becomes weird to import a package coming from <something>.private.<something_else>.

May be the best solution would be to add it somewhere in DIRAC.FrameworkSystem.private.standardLogging and then call it through DIRAC.FrameworkSystem.Client.Logger.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I applied the changes

@aldbr aldbr force-pushed the v8.0_FIX_Matcher-logging branch from 07cc337 to 50a3882 Compare December 9, 2024 13:34
@chrisburr chrisburr merged commit d5aebd1 into DIRACGrid:rel-v8r0 Dec 13, 2024
@DIRACGridBot DIRACGridBot added sweep:done All sweeping actions have been done for this PR sweep:failed Sweeping failed and needs manual intervention labels Dec 13, 2024
@DIRACGridBot
Copy link

Sweep summary

Sweep ran in https://github.com/DIRACGrid/DIRAC/actions/runs/12317774899

Failed:

  • integration
    cherry-pick d5aebd1 into integration failed
    check merge conflicts on a local copy of this repository
    git fetch upstream
    git checkout upstream/integration -b cherry-pick-2-d5aebd11d-integration
    git cherry-pick -x -m 1 d5aebd11d
    # Fix the conflicts
    git cherry-pick --continue
    git commit --amend -m 'sweep: #7924 fix(wms): correctly log the pilot job reference during the matching process' --author='aldbr <aldbr@outlook.com>'
    git push -u origin cherry-pick-2-d5aebd11d-integration
    
    # If you have the GitHub CLI installed the PR can be made with
    gh pr create \
         --label 'sweep:from rel-v8r0' \
         --base integration \
         --repo DIRACGrid/DIRAC \
         --title '[sweep:integration] fix(wms): correctly log the pilot job reference during the matching process' \
         --body 'Sweep #7924 `fix(wms): correctly log the pilot job reference during the matching process` to `integration`.
    
    Adding original author @aldbr as watcher.
    
    BEGINRELEASENOTES
    *WorkloadManagement
    FIX: correctly log the pilot job reference during the matching process
    ENDRELEASENOTES
    Closes #7952'

aldbr added a commit to aldbr/DIRAC that referenced this pull request Dec 17, 2024
maxnoe pushed a commit to maxnoe/DIRAC that referenced this pull request Feb 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

alsoTargeting:integration Cherry pick this PR to integration after merge sweep:done All sweeping actions have been done for this PR sweep:failed Sweeping failed and needs manual intervention

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants