
LogAnalyzer


Test component name

SONiC LogAnalyzer

The scope of SONiC LogAnalyzer is to analyze the log content produced by SONiC during the run of PTF SONiC tests. The analysis is performed after each PTF test run.

Overview

PTF tests run on a host machine connected to the DUT switch and perform functional testing and validation of the SONiC system. As part of test outcome validation, we also need to examine all logs produced by the SONiC system and check for any reported errors.

We will introduce a new test component, LogAnalyzer, written in Python, which will perform the log analysis.

Ansible script for deployment of LogAnalyzer

There will be an Ansible script orchestrating the whole test setup and run. The script will set up the DUT and the PTF host. With respect to log analysis, it will perform the following actions:

  • Deploy the PTF test on the test host
  • Deploy LogAnalyzer and all needed files on the DUT switch
  • Run LogAnalyzer to add a start-marker
  • Start the PTF test
  • After the PTF test finishes, collect the test report
  • Run LogAnalyzer to add an end-marker
  • Run LogAnalyzer to perform the analysis
  • Collect the report output and provide a test summary

The Ansible script will deploy LogAnalyzer along with all configuration files it needs to run.
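
As a rough illustration of this sequence, here is a minimal Python sketch of the loganalyzer invocations (the real orchestration is Ansible over ssh; run_on_dut is a hypothetical stand-in, and the paths and run_id values follow the Sample section later in this page):

```python
import subprocess

def run_on_dut(cmd):
    # Stand-in for the Ansible-managed ssh session to the DUT.
    return subprocess.call(cmd, shell=True)

run_id = "fib.2016-12-01.22-16-24"  # test name + date-time stamp

# Add the start-marker before the PTF test begins.
run_on_dut("python /tmp/loganalyzer.py --action init --run_id %s" % run_id)

# ... deploy and run the PTF test, collect its report ...

# Per the Sample section, the analyze action also writes the end-marker.
rc = run_on_dut("python /tmp/loganalyzer.py --action analyze --run_id %s "
                "--out_dir /tmp/test_results "
                "--match_files_in test_match_messages.txt,common_match.txt"
                % run_id)
if rc != 0:
    print("LogAnalyzer reported %d suspicious messages" % rc)
```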

Log Files to be analyzed

SONiC is a collection of several executables that log to various locations. Currently, logging is done in:

  • /var/log/syslog
  • /var/log/bgpd.log
  • /var/log/zebra.log

LogAnalyzer will be parameterized with a list of log files to analyze, so that any set of files can be specified.

Log rotation

To avoid log rotation occurring while tests are running, the Ansible script will force a log rotation during initialization.

LogAnalyzer functionality

LogAnalyzer will perform analysis of the logs produced by a running SONiC system, restricted to the portion of the logs written while the PTF test ran. It will generate an output file with an analysis summary, which will be consumed by the Ansible script.

Analysis Markers

To analyze only the logs produced during the test run, the Ansible script will invoke LogAnalyzer before starting the test activity, so that LogAnalyzer knows the starting point of the analysis. On invocation, LogAnalyzer will write a unique start-marker into the log files. This defines the starting point of the log analysis; all log messages following the start-marker will be considered for analysis.

When the PTF test finishes, the Ansible script will invoke LogAnalyzer to put a unique end-marker into the log files. This defines the ending point of the log analysis; any messages following the end-marker will not be considered.
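
How the markers physically reach the log files is an implementation detail not fixed by this design; here is a minimal sketch, assuming syslog-managed files receive the marker via a regular syslog call and other files get it appended directly (write_start_marker is a hypothetical name):

```python
import syslog

def write_start_marker(run_id, extra_log_files=()):
    marker = "start-LogAnalyzer-%s" % run_id
    # Files fed by syslogd (e.g. /var/log/syslog) receive the marker
    # as an ordinary INFO-priority syslog message.
    syslog.syslog(syslog.LOG_INFO, marker)
    # Other log files (e.g. /var/log/bgpd.log) get it appended directly.
    for path in extra_log_files:
        with open(path, "a") as f:
            f.write(marker + "\n")
```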

Marker format

Start Marker format: "start-LogAnalyzer-runID"

End marker format: "end-LogAnalyzer-runID"

'runID' is a string passed in by the caller that identifies the current run. The Ansible script will provide the runID.

The runID will be a concatenation of the test name and a date-time stamp (with one-second resolution), uniquely identifying the test and the run date.
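
For example (a sketch; the timestamp layout matches the Sample section later in this page):

```python
import time

def make_run_id(test_name):
    # e.g. "fib.2016-12-01.22-16-24": test name plus a date-time
    # stamp with one-second resolution.
    return "%s.%s" % (test_name, time.strftime("%Y-%m-%d.%H-%M-%S"))
```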

When instructed to perform analysis, LogAnalyzer will iterate over the specified log files and (as sketched below):

  • find the start-marker
  • perform the analysis
  • stop the analysis once the end-marker is discovered
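
A minimal sketch of that scan, assuming the markers appear verbatim as lines in the log files (lines_between_markers is a hypothetical helper):

```python
def lines_between_markers(log_path, start_marker, end_marker):
    # Yield only the messages logged between the two markers.
    in_run = False
    with open(log_path) as f:
        for line in f:
            if start_marker in line:
                in_run = True
                continue
            if end_marker in line:
                break
            if in_run:
                yield line
```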

Analysis of log messages

Set of messages to match for

As part of its input, LogAnalyzer will be provided with a file describing the set of message types to look for and flag as errors or warnings. The file, log.analysis.match.set.txt, will contain a set of regular expressions. It will be placed on the DUT as part of setup, and its path will be provided to LogAnalyzer at startup.

Each PTF test will provide its own version of the file, with content specific to the test. The file will be authored by the test developer.

The messages to match will contain the following keywords: ERROR, ERR, WARN, WARNING, panic.

When LogAnalyzer scans the log files, it will match the log content against the matching set. All matching messages will be collected in a separate file, log.analysis.result.log.

Ignore list for non-relevant messages

SONiC consists of various components that perform logging. Some components produce BGP-related logs; others log learned IP addresses. During runtime, any SONiC component can asynchronously output an error or warning message that is not related to the test currently being run. For example, while running trap tests there can be error messages about learned IP addresses. Such messages should be ignored; otherwise we will face many false positives.

To address this issue, we will provide LogAnalyzer with a data file, log.analysis.ignore.set.txt, in which all non-relevant messages are listed as regular expressions.

Platform-specific messages are another example of messages to ignore.

Each PTF test will have its own private version of the ignore file, with content specific to the test. This file will be created as part of PTF test development.

Analysis of the logs

For each log file, LogAnalyzer will analyze all messages between the start-marker and the end-marker in the following way (a sketch follows the list):

  1. Compare the message against the expressions in log.analysis.ignore.set.txt.
    • if the message matches, skip it.
  2. Otherwise, compare it against the set of expressions in log.analysis.match.set.txt.
    • if the message matches, collect it into log.analysis.result.log.
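
A sketch of this ignore-then-match step (the function and its arguments are illustrative, not the actual implementation):

```python
def classify(lines, ignore_set, match_set, result_path):
    # ignore_set and match_set are lists of compiled regex objects.
    matches = 0
    with open(result_path, "a") as result:
        for line in lines:
            # Step 1: skip anything covered by the ignore set.
            if any(rx.search(line) for rx in ignore_set):
                continue
            # Step 2: collect anything covered by the match set.
            if any(rx.search(line) for rx in match_set):
                result.write(line)
                matches += 1
    return matches
```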

Return value

When LogAnalyzer finishes, it will return the number of detected messages matching the regular expressions. A value of 0 means no concerning messages were detected.

Summary of log analysis

LogAnalyzer will produce log.analysis.summary.log with a summary of the issues found. The summary will be collected by the Ansible script and presented to the user after the test run.

The format will be:

LOG ANALYSIS SUMMARY:

MESSAGES DETECTED: NUMBER
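
A sketch of how the summary and the return value could fit together, folding in the per-file lines shown in the Sample output section below (write_summary is hypothetical; note a Unix exit status is truncated to 0-255, so a real implementation may want to cap the value):

```python
import sys

def write_summary(path, per_file_counts):
    # per_file_counts maps each analyzed log file to its match count.
    total = sum(per_file_counts.values())
    with open(path, "w") as f:
        f.write("LOG ANALYSIS SUMMARY:\n")
        for log_file, count in per_file_counts.items():
            f.write("FILE: %s MATCHES %d\n" % (log_file, count))
        f.write("MESSAGES DETECTED: %d\n" % total)
    return total

if __name__ == "__main__":
    total = write_summary("log.analysis.summary.log",
                          {"/var/log/syslog": 0})
    sys.exit(min(total, 255))  # exit status doubles as the match count
```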

DUMP file

If LogAnalyzer detects errors in the logs, it will also need to collect a DUMP file representing the full content of SAI and the underlying SDK.

Backup of logs in case of errors/warnings

In the event of identifying errors, LogAnalyzer will return a non-zero value to the invoking Ansible script, which will in turn create a backup of the log files for post-mortem analysis.

LogAnalyzer Command line parameters

LogAnalyzer will run in two modes: one to initialize the analysis and one to perform the analysis.

Initialize analysis

The Ansible script will invoke LogAnalyzer and instruct it to insert the start-marker into all log files.

The parameters specified for this phase will be:

  • action="init"
  • log_file_list=path,{path}
  • start_marker="string"

Analysis phase

After PTF test runs and finishes, the Ansible script will invoke LogAnalyzer to perform analysis of the logs.

  1. action='analyze'
  2. analysis_src=path - path to:
    1. log.analysis.match.set.txt
    2. log.analysis.ignore.set.txt
  3. analysis_destination=path - destination path where the following files will be placed:
    1. log.analysis.result.log
    2. log.analysis.summary.log
    3. backed-up log file(s)
  4. log_file_list=path,{path} - list of log files to analyze
  5. start_marker="string"
  6. end_marker="string"
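
These parameters map onto the command-line flags described in the Usage section below; a minimal argparse sketch of the two invocation modes (the flag names come from Usage, the wiring is an assumption):

```python
import argparse

parser = argparse.ArgumentParser(description="SONiC LogAnalyzer")
parser.add_argument("--action", required=True, choices=["init", "analyze"])
parser.add_argument("--run_id", required=True,
                    help="unique ID used to build the start/end markers")
parser.add_argument("--logs", help="comma-separated log files to analyze")
parser.add_argument("--out_dir", help="output directory (analyze only)")
parser.add_argument("--match_files_in", help="comma-separated match-set files")
parser.add_argument("--ignore_files_in", help="comma-separated ignore-set files")
parser.add_argument("--verbose", action="store_true")
args = parser.parse_args()

start_marker = "start-LogAnalyzer-%s" % args.run_id
end_marker = "end-LogAnalyzer-%s" % args.run_id
```
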
Interaction of PTF test with LogAnalyzer

PTF tests will not directly interact with LogAnalyzer; this function is performed by the Ansible script.

Interaction of Ansible script with LogAnalyzer

The Ansible script will provide LogAnalyzer with all input parameters. It will create an ssh session to the DUT through which it will:

  1. invoke LogAnalyzer before starting the PTF test, to put the start-marker into the log files.
  2. invoke LogAnalyzer at test end to put the end-marker into the log files.
  3. invoke LogAnalyzer to perform the analysis and create the output files. The LogAnalyzer return code will indicate whether any errors were found.
  4. copy all output files produced by LogAnalyzer from the DUT.

If LogAnalyzer returns a non-zero value:

  • the script will back up all test files and logs from the DUT to a directory with the same name as the runID (see the definition of the start-marker).
  • the content of log.analysis.summary.log will be printed to the console, following the output of the PTF test.

Usage

loganalyzer input parameters:

--help          Print usage.

--verbose       Print verbose output during the run.

--action init|analyze
                Action to perform.
                init - initialize the analysis by placing a start-marker
                into all log files specified in the --logs parameter.
                analyze - perform log analysis of the files specified
                in the --logs parameter.

--out_dir path  Directory path where output files are placed.
                Must be present when --action == analyze.

--logs path{,path}
                List of full paths to log files to be analyzed.
                Implicitly, the system log file will also be processed.

--run_id string
                String passed to loganalyzer, uniquely identifying the
                analysis session. Used to construct the start/end markers.

--match_files_in path{,path}
                List of paths to files containing match strings. A string
                from a log file matching any string from match_files_in
                will be collected and reported. Must be present when
                --action == analyze. Since syslog is always analyzed,
                match files for it must also be passed via match_files_in.

--ignore_files_in path{,path}
                List of paths to files containing ignore strings. A string
                from a log file matching any string from these files will
                be ignored during analysis. Must be present when
                --action == analyze.

Sample

Command line

python /tmp/loganalyzer.py --action init --run_id fib.2016-12-01.22-16-24

This command will write a start-marker into syslog:

Dec 2 00:10:14 str-msn2700-04 INFO start-LogAnalyzer-fib.2016-12-01.22-16-24

python /tmp/loganalyzer.py --action analyze --run_id fib.2016-12-01.22-16-24 --out_dir /tmp/test_results --match_files_in test_match_messages.txt,common_match.txt

This command will write an end-marker into syslog and perform the analysis:

Dec 2 00:10:18 str-msn2700-04 INFO end-LogAnalyzer-fib.2016-12-01.22-16-24

Strings between the start- and end-markers will be analyzed; see below for the produced output.

file test_match_messages.txt

s, "removeNeighbor: Neighbor is still referenced"

file common_match.txt

r, ".ERR", ".WARN", "crash"

r, "kernel:.*Oops", "kernel:.*hung", "kernel.*oom\s"

r, "kernel:.*scheduling", "kernel:.*atomic", "kernel:.*panic"

r, "kernel:.*\serr", "kernel:.*allocation", "kernel:.*kill",

r, "kernel:.kmemleak.","kernel:.* Err:"

s, "ERR"

Sample output

Result file:

-----------Matches found in file:/var/log/syslog-----------

Dec 2 00:10:15 str-msn2700-04 ERR orchagent: :- removeNeighbor: Neighbor is still referenced ip:10.0.0.31

Dec 2 00:10:16 str-msn2700-04 ERR orchagent: :- removeNeighbor: Neighbor is still referenced ip:10.0.0.31

Dec 2 00:10:17 str-msn2700-04 ERR orchagent: :- removeNeighbor: Neighbor is still referenced ip:10.0.0.31

Dec 2 00:10:18 str-msn2700-04 ERR orchagent: :- removeNeighbor: Neighbor is still referenced ip:10.0.0.31

Matches:4 Total matches:4

Summary file:

LOG ANALYSIS SUMMARY

FILE: /var/log/syslog MATCHES 4

TOTAL MATCHES: 4

Scale / Performance

LogAnalyzer is not intended to measure scale or performance, but as a result of log analysis it may report performance-related error conditions that appeared during a test run.

LogAnalyzer should not cause noticeable performance degradation of the DUT during analysis.

Related DUT CLI commands

TODO

Related DUT configuration files

The DUT will be provided with the configuration files for LogAnalyzer to operate on. The files are specified in the section 'Analysis of log messages'. In summary, those are:

  • log.analysis.match.set.txt
  • log.analysis.ignore.set.txt

Related SAI APIs

LogAnalyzer will analyze messages from SONiC; it does not target SAI. SAI and SDK messages will not be analyzed.

Output from LogAnalyzer

Please refer to 'Analysis of log messages' section above.

Test structure

Setup configuration

The PTF host needs to be connected to a port through which it will send packets to the switch, and needs connectivity via the ports through which the switch will forward received packets back to the host for validation.

Configuration scripts

TODO

Test cases

LogAnalyzer is a test tool, invoked by the Ansible script around each PTF test run. The test cases will be defined in the test plan documents for the specific PTF tests, not in this document.
