UTF-8 encoding is not enforced on all messages #180

Open
jasonwbarnett opened this issue Nov 11, 2021 · 3 comments

Comments

@jasonwbarnett
Contributor

Environment

Ruby v2.7.3
semantic_logger v4.7.4

Expected Behavior

I would expect every logged message to have its encoding forced to UTF-8.

Actual Behavior

E [9051:SemanticLogger::Appenders] SemanticLogger::Appenders -- Failed to log to appender: SemanticLogger::Appender::SplunkHttp -- Exception: Encoding::UndefinedConversionError: "\xE2" from ASCII-8BIT to UTF-8
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appender/splunk_http.rb:102:in `to_json'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appender/splunk_http.rb:102:in `call'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appender/http.rb:165:in `log'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appenders.rb:20:in `block in log'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appenders.rb:18:in `each'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appenders.rb:18:in `log'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appender/async.rb:152:in `process_messages'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appender/async.rb:121:in `process'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appender/async.rb:77:in `block in thread'
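
This appears to happen when a log hash containing a binary (ASCII-8BIT) string is serialized. A minimal sketch of the failure outside the appender (the payload below is illustrative, not taken from the application):

```ruby
require "json"

# A value tagged ASCII-8BIT holding a byte with no UTF-8 mapping.
payload = { message: "caf\xE2".b }

# .to_json tries to convert the string to UTF-8 and raises:
#   Encoding::UndefinedConversionError: "\xE2" from ASCII-8BIT to UTF-8
# (the exact error class can differ between json gem versions)
payload.to_json
```
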
@reidmorrison
Owner

Sounds like an issue with all JSON rendering, since the data is assumed to be UTF-8 compatible.

To fix the issue properly, we should update all formatters that output JSON to convert all strings to valid UTF-8 before calling .to_json. It would have been nice if .to_json directly supported encoding conversion options to handle this condition.
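
Ruby's String#encode can already do the per-string conversion with replacement options; shown here only as a sketch of the call the formatters would make, not as an existing semantic_logger API:

```ruby
# Bytes that are invalid in, or have no mapping to, UTF-8 are replaced
# instead of raising Encoding::UndefinedConversionError.
"caf\xE2".b.encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")
# => "caf?"
```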

The log attributes most at risk of containing non-UTF-8 data:

  • message
  • tags
  • named_tags
  • payload
  • exception (all messages need cleansing)

Alternatively, a recursive helper function could be written to walk the entire hash structure, fixing or stripping all non-UTF-8 characters before calling .to_json on it.
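
A rough sketch of that recursive helper (module and method names are illustrative only, not part of semantic_logger):

```ruby
module Utf8Cleanser
  module_function

  # Recursively walk a log hash/array and force every string to valid UTF-8,
  # so the result is safe to pass to .to_json.
  def cleanse(value)
    case value
    when String then to_utf8(value)
    when Hash   then value.each_with_object({}) { |(k, v), h| h[cleanse(k)] = cleanse(v) }
    when Array  then value.map { |v| cleanse(v) }
    else value
    end
  end

  # Strings already tagged UTF-8 only need invalid bytes scrubbed; anything
  # else is converted, replacing bytes that cannot be represented.
  def to_utf8(str)
    if str.encoding == Encoding::UTF_8
      str.valid_encoding? ? str : str.scrub("?")
    else
      str.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
    end
  end
end

# Usage: Utf8Cleanser.cleanse(log_hash).to_json
```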

@reidmorrison
Owner

@jasonwbarnett I would be interested to know your experience with the SplunkHttp appender. We tried it a few years ago and found that our application became dependent on the availability of the Splunk HTTP servers.

For example, when the Splunk http servers were down for any longer than a few minutes, our Rails apps would run out of memory trying to hold all the logs in memory, waiting for the Splunk http servers to recover.

We have instead moved to an asynchronous model where we write the logs to an EBS volume and a Splunk listener picks up the logs at its leisure. This removes Splunk as a dependency for running our mission-critical Rails app.

Ideally we want to log to Kafka and have Splunk read from that instead. It would be interesting to know whether Splunk has made any progress on Kafka support in the last few years.

Of course, none of this fixes the non-UTF-8 issue above, which we definitely need to address.

@jasonwbarnett
Contributor Author

> @jasonwbarnett I would be interested to know your experience with the SplunkHttp appender. We tried it a few years ago and found that our application became dependent on the availability of the Splunk HTTP servers.
>
> For example, when the Splunk http servers were down for any longer than a few minutes, our Rails apps would run out of memory trying to hold all the logs in memory, waiting for the Splunk http servers to recover.

@reidmorrison The company I work for runs an extremely large Splunk infrastructure that is incredibly robust and reliable. So much so that when we asked Splunk about their SaaS offering, they said we were running better infrastructure than they were and that they couldn't meet our needs. All that to say, I don't believe Splunk HEC reliability has ever been a problem for us.
