
[receiver/filelog] encoding not applied to multiline option #39011

Open
@DougManton

Component(s)

receiver/filelog

What happened?

Description

SAP audit log files are UTF-16LE encoded, are continuously appended to, and consist of fixed-length records with no line terminators. The filelog receiver's encoding and multiline options should be able to handle this, but I've found a reproducible bug, along with a workaround.

Steps to Reproduce

  1. Create a UTF-16LE file named auditlog.txt containing 10 SAP audit log records, written back to back with no separators, in the format:
2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        2AUK20250227000000002316500018D110.102.8BATCH_ALRI                      SAPMSSY1                                0501Z91_VALR_IF&&Z91_VAL_PLSTATUS                                   10.122.81.29        
  2. Configure the filelog receiver as follows:
receivers:
  filelog/sap:
    include: [ auditlog.txt ]
    encoding: utf-16le
    multiline:
      line_start_pattern: '([23])[A-Z][A-Z][A-Z0-9]\d{14}00'
    preserve_trailing_whitespaces: true
    start_at: beginning
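
For anyone wanting to reproduce step 1 without access to an SAP system, a test file can be generated with a short script. This is only a sketch: the function names are made up for illustration, and the record is the first part of the sample record above padded with spaces to 200 characters (an assumption, rather than a byte-for-byte copy of a real record).

```python
from pathlib import Path

RECORD_LEN = 200  # record length in characters, as reported in this issue

# Start of the sample record; the remaining fields are replaced by space
# padding (an assumption, to keep the sketch short).
RECORD_HEAD = "2AUK20250227000000002316500018D110.102.8BATCH_ALRI"


def build_audit_log(n_records: int = 10) -> bytes:
    """Return n fixed-length records as UTF-16LE bytes (no BOM, no newlines)."""
    record = RECORD_HEAD.ljust(RECORD_LEN)
    # The "utf-16-le" codec writes no byte order mark.
    return (record * n_records).encode("utf-16-le")


def write_audit_log(path: str, n_records: int = 10) -> None:
    """Write the generated records to the given path."""
    Path(path).write_bytes(build_audit_log(n_records))
```

Calling write_audit_log("auditlog.txt") produces a 4000-byte file (10 records of 200 characters, 2 bytes per character) that can be used with the configuration above.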

Expected Result

10 log events

Actual Result

1 log event

Workaround

I suspected that multiline processing does not honour the file encoding, so the pattern is applied to the raw UTF-16LE bytes and fails to match. To test this theory, I changed the multiline pattern to match only the first 8 bits of each 16-bit character, skipping the second byte with a dot:

receivers:
  filelog/sap:
    include: [ auditlog.txt ]
    encoding: utf-16le
    multiline:
      line_start_pattern: '([23]).[A-Z].[A-Z].[A-Z0-9].(\d.){14}0.0.'
    preserve_trailing_whitespaces: true
    start_at: beginning

This configuration outputs 10 log records, each containing a complete 200-character record.
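
The theory can be illustrated outside the collector. The sketch below uses Python's re module on bytes, not the collector's own Go regexp engine, so it only demonstrates the byte-level effect and is not the receiver's actual code path. The sample data is the start of the record above padded with spaces to 200 characters (an assumption), and count_matches is a helper introduced here for illustration.

```python
import re

# Patterns from the two configurations in this issue, as byte patterns.
ORIGINAL = rb"([23])[A-Z][A-Z][A-Z0-9]\d{14}00"
WORKAROUND = rb"([23]).[A-Z].[A-Z].[A-Z0-9].(\d.){14}0.0."

# Ten fixed-length records with no separators (padded sample data).
record = "2AUK20250227000000002316500018D110.102.8BATCH_ALRI".ljust(200)
text = record * 10
raw = text.encode("utf-16-le")  # every ASCII character becomes <char> 0x00


def count_matches(pattern: bytes, data: bytes) -> int:
    """Count non-overlapping matches of a byte pattern in data."""
    return sum(1 for _ in re.finditer(pattern, data))


# Original pattern against raw UTF-16LE bytes: the interleaved 0x00 bytes
# break every character class, so nothing matches.
print(count_matches(ORIGINAL, raw))  # 0

# Workaround pattern against the same raw bytes: one match per record.
print(count_matches(WORKAROUND, raw))  # 10

# Original pattern against the decoded text (re-encoded as ASCII so the
# byte pattern applies): one match per record, which is the expected result.
print(count_matches(ORIGINAL, text.encode("ascii")))  # 10
```

The original pattern only finds the record boundaries once the data has been decoded, which is consistent with the multiline split running before (or without) the encoding being applied.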

Collector version

v0.122.0

Environment information

OS: macOS 15.3.2

OpenTelemetry Collector configuration

receivers:
  filelog/sap:
    include: [ auditlog.txt ]
    encoding: utf-16le
    multiline:
      line_start_pattern: '([23])[A-Z][A-Z][A-Z0-9]\d{14}00'
    preserve_trailing_whitespaces: true
    start_at: beginning
exporters:
  file/debug:
    path: debug.json
service:
  pipelines:
    logs:
      receivers:
        - filelog/sap
      exporters:
        - file/debug
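
To compare the expected and actual results, the records written to debug.json can be counted with a small script. This is a sketch that assumes the file exporter writes one OTLP JSON object per line, with log records nested under resourceLogs, scopeLogs and logRecords; count_log_records is a hypothetical helper, not part of the collector.

```python
import json


def count_log_records(path: str) -> int:
    """Count log records in a file of newline-delimited OTLP JSON (assumed layout)."""
    total = 0
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines between batches
            payload = json.loads(line)
            for resource_logs in payload.get("resourceLogs", []):
                for scope_logs in resource_logs.get("scopeLogs", []):
                    total += len(scope_logs.get("logRecords", []))
    return total
```

With the configuration above, count_log_records("debug.json") should return 10 if the records are split correctly; the bug described here yields 1.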

Log output

Additional context

No response

Metadata

Labels

bug, help wanted, never stale, receiver/filelog