Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash loop on event with ASN = 0 #1765

Closed
gethvi opened this issue Mar 1, 2021 · 2 comments · Fixed by #1766
Closed

Crash loop on event with ASN = 0 #1765

gethvi opened this issue Mar 1, 2021 · 2 comments · Fixed by #1766
Assignees
Labels
bug Indicates an unexpected problem or unintended behavior component: core
Milestone

Comments

@gethvi
Copy link
Contributor

gethvi commented Mar 1, 2021

Every now and then there appears an event with source.asn = 0 in our pipeline. Whenever a bot receives such message, it throws an exception InvalidValue. The bot then gets stuck in a loop between this exception and PipelineError exception.

The defaults configuration:

  "error_dump_message": true,
  "error_log_exception": true,
  "error_log_message": true,
  "error_max_retries": 3,
  "error_procedure": "pass",
  "error_retry_delay": 15,

Log (from intelmqctl run):

abusix-expert-cz: Bot has found a problem.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/bot.py", line 269, in start
    self.process()
  File "/usr/local/lib/python3.8/dist-packages/intelmq/bots/experts/abusix/expert.py", line 26, in process
    event = self.receive_message()
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/bot.py", line 606, in receive_message
    self.__current_message = libmessage.MessageFactory.unserialize(message,
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/message.py", line 72, in unserialize
    return MessageFactory.from_dict(message, harmonization=harmonization,
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/message.py", line 54, in from_dict
    return class_reference(message, auto=True, harmonization=harmonization)
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/message.py", line 539, in __init__
    super().__init__(template, auto, harmonization)
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/message.py", line 127, in __init__
    self.add(key, value, sanitize=True)
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/message.py", line 249, in add
    raise exceptions.InvalidValue(key, old_value)
intelmq.lib.exceptions.InvalidValue: invalid value 0 (<class 'int'>) for key 'source.asn'
abusix-expert-cz: Current Message(event): None.
abusix-expert-cz: Bot will continue in 15 seconds.
abusix-expert-cz: Pipeline failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/bot.py", line 269, in start
    self.process()
  File "/usr/local/lib/python3.8/dist-packages/intelmq/bots/experts/abusix/expert.py", line 26, in process
    event = self.receive_message()
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/bot.py", line 591, in receive_message
    message = self.__source_pipeline.receive()
  File "/usr/local/lib/python3.8/dist-packages/intelmq/lib/pipeline.py", line 122, in receive
    raise exceptions.PipelineError("There's already a message, first "
intelmq.lib.exceptions.PipelineError: pipeline failed - "There's already a message, first acknowledge the existing one."
abusix-expert-cz: Bot will continue in 15 seconds.

The bot keeps throwing these two exceptions and never stops. Expected behavior would be to follow defaults.conf, try it three times (even better would be to try just once for this kind of exception because the problem is obviously not going to go away) and then just dump the message and continue processing another one. However the message is never dumped (not to dump file nor the log - see the log line Current Message(event): None.) because it is never even loaded.

Currently the InvalidValue exceptions gets caught by a generic Exception handler:

intelmq/intelmq/lib/bot.py

Lines 329 to 353 in b7ba19c

except Exception as exc:
# in case of serious system issues, exit immediately
if isinstance(exc, MemoryError):
self.logger.exception('Out of memory. Exit immediately. Reason: %r.' % exc.args[0])
self.stop()
elif isinstance(exc, (IOError, OSError)) and exc.errno == 28:
self.logger.exception('Out of disk space. Exit immediately.')
self.stop()
error_on_message = sys.exc_info()
if self.error_log_exception:
self.logger.exception("Bot has found a problem.")
else:
self.logger.error(utils.error_message_from_exc(exc))
self.logger.error("Bot has found a problem.")
if self.error_log_message:
# Print full message if explicitly requested by config
self.logger.info("Current Message(event): %r.",
self.__current_message)
# In case of permanent failures, stop now
if isinstance(exc, exceptions.ConfigurationError):
self.stop()

I believe it should be better handled in a similar fashion to DecodingError:

intelmq/intelmq/lib/bot.py

Lines 320 to 327 in b7ba19c

except exceptions.DecodingError as exc:
self.logger.exception('Could not decode message from pipeline. No retries useful.')
# ensure that we do not re-process the faulty message
self.__error_retries_counter = self.error_max_retries + 1
error_on_message = sys.exc_info()
message_to_dump = exc.object

@ghost
Copy link

ghost commented Mar 1, 2021

I agree with the proposed fix - but where does the invalid value come from in the first place? It should not get into the pipeline at all.

@ghost ghost assigned gethvi Mar 1, 2021
@ghost ghost added bug Indicates an unexpected problem or unintended behavior component: core labels Mar 1, 2021
gethvi pushed a commit to gethvi/intelmq that referenced this issue Mar 1, 2021
@ghost ghost added this to the 2.3.0 milestone Mar 1, 2021
@gethvi
Copy link
Contributor Author

gethvi commented Mar 1, 2021

I agree with the proposed fix - but where does the invalid value come from in the first place? It should not get into the pipeline at all.

Well I have no idea. This bug only happened in our test environment which was subject to a lot of cruel experiments over time. :-) For reasons beyond me every now and then there is repetitive appearance of events from 2017 (such event caused this). Hopefully I will investigate deeper at some point.

gethvi pushed a commit to gethvi/intelmq that referenced this issue Mar 1, 2021
@ghost ghost modified the milestones: 2.3.0, 3.0.0 Mar 2, 2021
gethvi pushed a commit to gethvi/intelmq that referenced this issue Mar 2, 2021
gethvi pushed a commit to gethvi/intelmq that referenced this issue Mar 2, 2021
tests for certtools#1765 / PR certtools#1766

checks if invalid values in the pipeline are correctly handled (dumped,
logged)
@ghost ghost closed this as completed in #1766 Mar 3, 2021
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior component: core
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant