Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ParserBot: erroneous raw line recovery in error handling #1850

Closed
ghost opened this issue Apr 8, 2021 · 1 comment
Closed

ParserBot: erroneous raw line recovery in error handling #1850

ghost opened this issue Apr 8, 2021 · 1 comment
Labels
bug Indicates an unexpected problem or unintended behavior component: core help wanted Indicates that a maintainer wants help on an issue or pull request

Comments

@ghost
Copy link

ghost commented Apr 8, 2021

This logic here:

intelmq/intelmq/lib/bot.py

Lines 1005 to 1031 in 7ba8b62

for line in self.parse(report):
if not line:
continue
try:
value = self.parse_line(line, report)
if value is None:
continue
elif type(value) is list or isinstance(value, types.GeneratorType):
# filter out None
events = list(filter(bool, value))
else:
events = [value]
except Exception:
self.logger.exception('Failed to parse line.')
self.__failed.append((traceback.format_exc(), line))
else:
events_count += len(events)
self.send_message(*events)
for exc, line in self.__failed:
report_dump = report.copy()
report_dump.change('raw', self.recover_line(line))
if self.parameters.error_dump_message:
self._dump_message(exc, report_dump)
if self._Bot__destination_queues and '_on_error' in self._Bot__destination_queues:
self.send_message(report_dump, path='_on_error')

does not work with all recover_line_* methods. Some methods use the parameter line, others use self.current_line. The overall logic is fine, but there is a major bug:

process collects all fails (self.__failed is appended with line) in the first loop (for line in self.parse(report))
In the second loop (for exc, line in self.__failed), recover_line is called with line.
If recover_line accesses self.current_line, the data is wrong, as self.current_line is then the last line of the report, not the actual one.

Unfortunately, simply fixing some recover_line_* functions is not enough, the process in self.process needs to be thought through and eventually adapted as well.

  • self.current_line should be deleted after the parsing end to prohibit this error in the future.
  • self.recover_line behaviour should be harmonized, making it applicable for use in self.parse_line and in self.process
  • self.process should be investigated

In the future we also need better tests, but that's a bigger task and I'm afraid we can't stem that on a short term. And unfortunately the issue is important, as it leads to bogus (wrong/duplicated) data in the dumps and therefore loss of data.

@ghost ghost added bug Indicates an unexpected problem or unintended behavior component: core labels Apr 8, 2021
@ghost ghost added this to the 2.3.2 milestone Apr 8, 2021
@ghost ghost modified the milestones: 2.3.2, 2.3.3 Apr 27, 2021
@ghost ghost modified the milestones: 2.3.3, 3.0.0 May 31, 2021
@ghost ghost removed this from the 3.0.0 milestone Jun 17, 2021
@sebix sebix added the help wanted Indicates that a maintainer wants help on an issue or pull request label Feb 3, 2022
@sebix
Copy link
Member

sebix commented Aug 8, 2022

fixed by #2192

@sebix sebix closed this as completed Aug 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior component: core help wanted Indicates that a maintainer wants help on an issue or pull request
Projects
None yet
Development

No branches or pull requests

1 participant