Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception in poll causes crash of entire instance #1064

Closed
reidsunderland opened this issue May 22, 2024 · 4 comments
Closed

Exception in poll causes crash of entire instance #1064

reidsunderland opened this issue May 22, 2024 · 4 comments
Assignees
Labels
bug Something isn't working crasher Crashes entire app.

Comments

@reidsunderland
Copy link
Member

If there's an unhandled exception in a poll, it causes the entire instance to crash.

Obviously it would be ideal if people writing poll plugins handled all exceptions, but imo we should deal with cases where people don't. Many times these exceptions are temporary (network issues, etc.) and won't happen again if the poll runs again. Right now, the whole process crashes and if we're lucky, sr3 sanity will restart it.

I think this section should be wrapped in a try/except:

if self.have_vip:
for plugin in self.plugins['poll']:
new_incoming = plugin()
if len(new_incoming) > 0:
self.worklist.incoming.extend(new_incoming)

We have a try/except for other callbacks, so a poll should be the same.

 

Current Behaviour:

The entire instance crashes.

$ sr3 foreground poll/nasa_omisips_omto3
exchange: xs_NASA-TEST_ split: 5
.2024-07-21 20:15:18,288 [WARNING] 3055003 sarracenia.flow.poll __init__ post_exchange: xs_NASA-TEST is different from exchange: xs_bunnymaster. The settings need for multiple instances to share a poll.
2024-07-21 20:15:18,288 [INFO] 3055003 sarracenia.flow.poll __init__ adding poll plugin, because missing from: ['sarracenia.flowcb.retry.Retry', 'sarracenia.flowcb.housekeeping.resources.Resources', 'sarracenia.flowcb.nodupe.disk.Disk', 'poll.nasa_omisips', 'log', 'sarracenia.flowcb.scheduled.poll.Poll']
2024-07-21 20:15:18,288 [WARNING] 3055003 sarracenia.flow.poll __init__ nodupe_ttl < fileAgeMax means some files could age out of the cache and be re-ingested ( see : https://github.com/MetPX/sarracenia/issues/904
2024-07-21 20:15:18,289 [INFO] 3055003 sarracenia.flow loadCallbacks flowCallback plugins to load: ['sarracenia.flowcb.post.message.Message', 'sarracenia.flowcb.retry.Retry', 'sarracenia.flowcb.housekeeping.resources.Resources', 'sarracenia.flowcb.nodupe.disk.Disk', 'poll.nasa_omisips', 'log', 'sarracenia.flowcb.scheduled.poll.Poll', 'sarracenia.flowcb.poll.Poll']
2024-07-21 20:15:18,297 [INFO] 3055003 sarracenia.flowcb.nodupe.disk __init__ time_to_live=1209600,
2024-07-21 20:15:18,306 [INFO] 3055003 sarracenia.flowcb.log __init__ poll initialized with: logEvents: {'after_post', 'after_accept', 'on_housekeeping', 'after_work'},  logMessageDump: False
2024-07-21 20:15:18,309 [INFO] 3055003 sarracenia.flowcb.scheduled update_appointments for 2024-07-21 20:15:18.309413+00:00: []
2024-07-21 20:15:18,310 [INFO] 3055003 sarracenia.flow run callbacks loaded: ['sarracenia.flowcb.post.message.Message', 'sarracenia.flowcb.retry.Retry', 'sarracenia.flowcb.housekeeping.resources.Resources', 'sarracenia.flowcb.nodupe.disk.Disk', 'poll.nasa_omisips', 'log', 'sarracenia.flowcb.scheduled.poll.Poll', 'sarracenia.flowcb.poll.Poll']
2024-07-21 20:15:18,310 [INFO] 3055003 sarracenia.flow run pid: 3055003 poll/nasa_omisips_omto3 instance: 0
2024-07-21 20:15:18,330 [INFO] 3055003 sarracenia.moth.amqp putSetup exchange declared: xs_NASA-TEST (as: amqp://bunnymaster@localhost/)
2024-07-21 20:15:18,330 [INFO] 3055003 sarracenia.flowcb.post.message on_start starting
2024-07-21 20:15:18,332 [INFO] 3055003 sarracenia.flowcb.scheduled.poll gather waiting for next poll
Traceback (most recent call last):
  File "/net/local/home/sunderlandr/sr3/sarracenia/instance.py", line 248, in <module>
    i.start()
  File "/net/local/home/sunderlandr/sr3/sarracenia/instance.py", line 239, in start
    self.running_instance.run()
  File "/net/local/home/sunderlandr/sr3/sarracenia/flow/__init__.py", line 486, in run
    self.gather()
  File "/net/local/home/sunderlandr/sr3/sarracenia/flow/__init__.py", line 1041, in gather
    new_incoming = plugin()
  File "/net/local/home/sunderlandr/.config/sr3/plugins/poll/nasa_omisips.py", line 61, in poll
    logger.info(hello_this_should_crash)
NameError: name 'hello_this_should_crash' is not defined
2024-07-21 20:15:18,399 3054998 [CRITICAL] root run_command subprocess.run failed err=Command '['/usr/bin/python3', '/net/local/home/sunderlandr/sr3/sarracenia/instance.py', '--no', '0', 'foreground', 'poll/nasa_omisips_omto3']' returned non-zero exit status 1.
@reidsunderland
Copy link
Member Author

Behaviour with a try/except added:

sunderlandr@edcm-dirt22-4:~/sr3$ sr3 foreground poll/nasa_omisips_omto3
exchange: xs_NASA-TEST_ split: 5
.2024-07-21 21:24:28,308 [WARNING] 3063160 sarracenia.flow.poll __init__ post_exchange: xs_NASA-TEST is different from exchange: xs_bunnymaster. The settings need for multiple instances to share a poll.
2024-07-21 21:24:28,308 [INFO] 3063160 sarracenia.flow.poll __init__ adding poll plugin, because missing from: ['sarracenia.flowcb.retry.Retry', 'sarracenia.flowcb.housekeeping.resources.Resources', 'sarracenia.flowcb.nodupe.disk.Disk', 'poll.nasa_omisips', 'log', 'sarracenia.flowcb.scheduled.poll.Poll']
2024-07-21 21:24:28,308 [WARNING] 3063160 sarracenia.flow.poll __init__ nodupe_ttl < fileAgeMax means some files could age out of the cache and be re-ingested ( see : https://github.com/MetPX/sarracenia/issues/904
2024-07-21 21:24:28,308 [INFO] 3063160 sarracenia.flow loadCallbacks flowCallback plugins to load: ['sarracenia.flowcb.post.message.Message', 'sarracenia.flowcb.retry.Retry', 'sarracenia.flowcb.housekeeping.resources.Resources', 'sarracenia.flowcb.nodupe.disk.Disk', 'poll.nasa_omisips', 'log', 'sarracenia.flowcb.scheduled.poll.Poll', 'sarracenia.flowcb.poll.Poll']
2024-07-21 21:24:28,316 [INFO] 3063160 sarracenia.flowcb.nodupe.disk __init__ time_to_live=1209600,
2024-07-21 21:24:28,323 [INFO] 3063160 sarracenia.flowcb.log __init__ poll initialized with: logEvents: {'after_accept', 'after_work', 'on_housekeeping', 'after_post'},  logMessageDump: False
2024-07-21 21:24:28,325 [INFO] 3063160 sarracenia.flowcb.scheduled update_appointments for 2024-07-21 21:24:28.325443+00:00: []
2024-07-21 21:24:28,326 [INFO] 3063160 sarracenia.flow run callbacks loaded: ['sarracenia.flowcb.post.message.Message', 'sarracenia.flowcb.retry.Retry', 'sarracenia.flowcb.housekeeping.resources.Resources', 'sarracenia.flowcb.nodupe.disk.Disk', 'poll.nasa_omisips', 'log', 'sarracenia.flowcb.scheduled.poll.Poll', 'sarracenia.flowcb.poll.Poll']
2024-07-21 21:24:28,326 [INFO] 3063160 sarracenia.flow run pid: 3063160 poll/nasa_omisips_omto3 instance: 0
2024-07-21 21:24:28,345 [INFO] 3063160 sarracenia.moth.amqp putSetup exchange declared: xs_NASA-TEST (as: amqp://bunnymaster@localhost/)
2024-07-21 21:24:28,345 [INFO] 3063160 sarracenia.flowcb.post.message on_start starting
2024-07-21 21:24:28,346 [INFO] 3063160 sarracenia.flowcb.scheduled.poll gather waiting for next poll
2024-07-21 21:24:28,346 [ERROR] 3063160 sarracenia.flow gather flowCallback plugin poll.nasa_omisips.Nasa_omisips.poll crashed: name 'hello_this_should_crash' is not defined
2024-07-21 21:24:28,846 [INFO] 3063160 sarracenia.flowcb.scheduled.poll gather waiting for next poll

or and with debug logging, it gives the details:

2024-07-21 21:25:33,048 [ERROR] 3063659 sarracenia.flow gather flowCallback plugin poll.nasa_omisips.Nasa_omisips.poll crashed: name 'hello_this_should_crash' is not defined
2024-07-21 21:25:33,049 [DEBUG] 3063659 sarracenia.flow gather details:
Traceback (most recent call last):
  File "/net/local/home/sunderlandr/sr3/sarracenia/flow/__init__.py", line 1042, in gather
    new_incoming = plugin()
  File "/net/local/home/sunderlandr/.config/sr3/plugins/poll/nasa_omisips.py", line 61, in poll
    logger.info(hello_this_should_crash)
NameError: name 'hello_this_should_crash' is not defined

@reidsunderland reidsunderland self-assigned this May 22, 2024
@reidsunderland reidsunderland added the crasher Crashes entire app. label May 22, 2024
@petersilva
Copy link
Contributor

close... but not quite right... for the other plugins, when debug is enabled, we don't try/except... because people want it to crash with maximum info. so you need an if to execute without try/except wrapper when debug is on. That would be consistent with the other entry points.

It's also weird that it isn't it's own _runCallbackPoll() would be the more natural way to arrange things...

@petersilva petersilva added the bug Something isn't working label May 22, 2024
@reidsunderland
Copy link
Member Author

oh yeah, I didn't notice the if in the other ones. I'll add that and move it into _runCallbackPoll().

I'm pretty sure this code was only recently moved into flow/init.py. It used to be in flow/poll.py.

@petersilva
Copy link
Contributor

ok, I added to the same branch:

  • code so that in a flow class, one can declare routines like on_start, on_stop, on_heartbeat, etc... and have them called. before now, that only worked with callbacks.
  • added an on_start to flow/poll.py so that it only loads the default poll if no other callback has already done it.

maybe try it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working crasher Crashes entire app.
Projects
None yet
Development

No branches or pull requests

2 participants