ParserBot: erroneous raw line recovery in error handling #1850

ghost · 2021-04-08T18:43:31Z

This logic here:

Lines 1005 to 1031 in 7ba8b62

    
           for line in self.parse(report): 
        
               if not line: 
        
                   continue 
        
               try: 
        
                   value = self.parse_line(line, report) 
        
                   if value is None: 
        
                       continue 
        
                   elif type(value) is list or isinstance(value, types.GeneratorType): 
        
                       # filter out None 
        
                       events = list(filter(bool, value)) 
        
                   else: 
        
                       events = [value] 
        
               except Exception: 
        
                   self.logger.exception('Failed to parse line.') 
        
                   self.__failed.append((traceback.format_exc(), line)) 
        
               else: 
        
                   events_count += len(events) 
        
                   self.send_message(*events) 
        
           for exc, line in self.__failed: 
        
               report_dump = report.copy() 
        
               report_dump.change('raw', self.recover_line(line)) 
        
               if self.parameters.error_dump_message: 
        
                   self._dump_message(exc, report_dump) 
        
               if self._Bot__destination_queues and '_on_error' in self._Bot__destination_queues: 
        
                   self.send_message(report_dump, path='_on_error')

does not work with all recover_line_* methods. Some methods use the parameter line, others use self.current_line. The overall logic is fine, but there is a major bug:

process collects all fails (self.__failed is appended with line) in the first loop (for line in self.parse(report))
In the second loop (for exc, line in self.__failed), recover_line is called with line.
If recover_line accesses self.current_line, the data is wrong, as self.current_line is then the last line of the report, not the actual one.

Unfortunately, simply fixing some recover_line_* functions is not enough, the process in self.process needs to be thought through and eventually adapted as well.

self.current_line should be deleted after the parsing end to prohibit this error in the future.
self.recover_line behaviour should be harmonized, making it applicable for use in self.parse_line and in self.process
self.process should be investigated

In the future we also need better tests, but that's a bigger task and I'm afraid we can't stem that on a short term. And unfortunately the issue is important, as it leads to bogus (wrong/duplicated) data in the dumps and therefore loss of data.

The text was updated successfully, but these errors were encountered:

sebix · 2022-08-08T09:25:59Z

fixed by #2192

ghost added bug Indicates an unexpected problem or unintended behavior component: core labels Apr 8, 2021

ghost added this to the 2.3.2 milestone Apr 8, 2021

ghost modified the milestones: 2.3.2, 2.3.3 Apr 27, 2021

ghost modified the milestones: 2.3.3, 3.0.0 May 31, 2021

ghost removed this from the 3.0.0 milestone Jun 17, 2021

sebix added the help wanted Indicates that a maintainer wants help on an issue or pull request label Feb 3, 2022

sebix closed this as completed Aug 8, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ParserBot: erroneous raw line recovery in error handling #1850

ParserBot: erroneous raw line recovery in error handling #1850

ghost commented Apr 8, 2021

sebix commented Aug 8, 2022

ParserBot: erroneous raw line recovery in error handling #1850

ParserBot: erroneous raw line recovery in error handling #1850

Comments

ghost commented Apr 8, 2021

sebix commented Aug 8, 2022