Process initialisation logging not saved to node #4777

Open
chrisjsewell opened this issue Feb 24, 2021 · 1 comment

@chrisjsewell
Member

There's also a strange thing - in the daemon log file, I see different errors (for different nodes). E.g. I have this:

02/10/2021 11:09:57 PM <97679> plumpy.processes: [ERROR] Process<112337> failed to register as an RPC subscriber
Traceback (most recent call last):
  File "/home/pizzi/.virtualenvs/aiida-dev/lib/python3.7/site-packages/plumpy/processes.py", line 296, in init
    identifier = self._communicator.add_rpc_subscriber(self.message_receive, identifier=str(self.pid))
  File "/home/pizzi/.virtualenvs/aiida-dev/lib/python3.7/site-packages/plumpy/communications.py", line 120, in add_rpc_subscriber
    return self._communicator.add_rpc_subscriber(converted, identifier)
  File "/home/pizzi/.virtualenvs/aiida-dev/lib/python3.7/site-packages/kiwipy/rmq/threadcomms.py", line 184, in add_rpc_subscriber
    self._communicator.add_rpc_subscriber(self._wrap_subscriber(subscriber), identifier)
  File "/home/pizzi/.virtualenvs/aiida-dev/lib/python3.7/site-packages/pytray/aiothreads.py", line 155, in await_
    return self.await_submit(awaitable).result(timeout=self.task_timeout)
  File "/usr/lib/python3.7/concurrent/futures/_base.py", line 437, in result
    raise TimeoutError()
concurrent.futures._base.TimeoutError

but verdi process report 112337 has no log messages... In a similar weird way, if I e.g. look for 109485 in the log file (that is one with an error above), I only find this line in the logs:

02/10/2021 11:06:31 PM <97678> aiida.orm.nodes.process.workflow.workchain.WorkChainNode: [REPORT] [108861|PwBaseWorkChain|run_process]: launching PwCalculation<109485> iteration #1

so no error... Any idea why? (even the timing is different) I'm confused!

EDIT: (A note on timing: there could be a 1h shift due to UTC vs local time, and a few minutes difference between the submission and the exception - still it's not clear to me why the messages in the log file and in the process report are not the same)

Originally posted by @giovannipizzi in #4745 (comment)

@chrisjsewell
Member Author

> There's also a strange thing - in the daemon log file, I see different errors (for different nodes). E.g. I have this
> so no error... Any idea why? (even the timing is different) I'm confused!

This exception is logged when the process is being created: https://github.com/aiidateam/plumpy/blob/b1bde82403be36a76525b0c6359a175a422c0c1c/plumpy/processes.py#L298-L299.
Perhaps logging to the node has not been initialised at this point 🤷
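
To illustrate the hypothesis above, here is a minimal sketch using only the standard `logging` module. It is not AiiDA or plumpy code: `ListHandler`, `daemon_handler` and `node_handler` are hypothetical stand-ins for the daemon log file and for a handler that stores records on the process node. It only shows that a record emitted before a handler is attached never reaches that handler, which would explain an error appearing in the daemon log but not in the process report.

```python
import logging


class ListHandler(logging.Handler):
    """Hypothetical handler that keeps formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("demo.process")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo isolated from the root logger

# Stand-in for the daemon log file: attached from the very start.
daemon_handler = ListHandler()
logger.addHandler(daemon_handler)

# Stand-in for a node-bound handler: created, but not attached yet.
node_handler = ListHandler()

# Emitted while the process is still being created.
logger.error("failed to register as an RPC subscriber")

# The node-bound handler is attached only after initialisation.
logger.addHandler(node_handler)
logger.error("error after initialisation")

print("daemon log:", daemon_handler.messages)
print("node log:  ", node_handler.messages)
```

In this sketch the first error is present only in `daemon_handler.messages`, while both handlers receive the second one.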

It is also worth noting that if the RPC/broadcast subscribers fail to register when the process is being created on the daemon, these processes will not be able to receive kill/pause/play/status messages.
Catching of the TimeoutError was added in aiidateam/plumpy#81.
@sphuber/@muhrin do you think this is ideal behaviour or, if the process is being recreated on the daemon, could/should it except instead, so that it can be re-queued by RMQ?
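
The two options in the question can be sketched as follows. This is a simplified illustration, not plumpy's implementation: `FlakyCommunicator`, `init_process` and the `reraise` flag are hypothetical names introduced here, and the only detail taken from the traceback is that registration raises `concurrent.futures.TimeoutError`.

```python
import logging
from concurrent.futures import TimeoutError as FutureTimeoutError

logger = logging.getLogger("demo.subscriber")
logger.addHandler(logging.NullHandler())  # keep the demo output quiet


class FlakyCommunicator:
    """Hypothetical communicator whose registration always times out."""

    def add_rpc_subscriber(self, callback, identifier):
        raise FutureTimeoutError()


def init_process(communicator, pid, reraise=False):
    """Try to register an RPC subscriber for the process.

    Returns True if registration succeeded. On timeout the error is logged;
    it is then either swallowed (the process continues, but cannot receive
    kill/pause/play/status messages) or re-raised so the caller can fail the
    task and have it re-queued.
    """
    try:
        communicator.add_rpc_subscriber(lambda msg: None, identifier=str(pid))
    except FutureTimeoutError:
        logger.exception("Process<%s> failed to register as an RPC subscriber", pid)
        if reraise:
            raise
        return False
    return True


# Current behaviour as described above: log the error and carry on.
reachable = init_process(FlakyCommunicator(), 112337)
print("reachable:", reachable)
```

With `reraise=True` the same call would propagate the `TimeoutError` to the caller instead of returning `False`, which is the alternative being asked about.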

Anyway, I think this should be opened as a separate, specific issue.

Originally posted by @chrisjsewell in #4745 (comment)
