This repository has been archived by the owner on Mar 21, 2020. It is now read-only.

Configurable explicit encoding as UTF-8 #8

Merged: 2 commits merged on Apr 20, 2017


@johanneswuerbach johanneswuerbach commented Apr 9, 2017

Fluentd internally encodes events as ASCII-8BIT, while JSON is encoded as
UTF-8 by default. During that implicit conversion, errors like
Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8 might
be raised when an event contains invalid data, and the event is then discarded.

This adds a coerce_to_utf8 configuration option, which encodes events
explicitly to UTF-8 and replaces invalid and undefined chars with a configurable
non_utf8_replacement_string.
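
As a rough illustration of the behaviour described above, the sketch below reproduces the conversion error on a binary-tagged string and then performs an explicit, lossy conversion. The helper name `coerce_to_utf8`, its signature, and the sample bytes are assumptions made for this example; they are not the plugin's actual implementation, which follows the linked Google Cloud plugin code.

```ruby
# Hypothetical helper for illustration only; the name and signature are
# assumptions, not the plugin's API.
# Explicitly converts a string to UTF-8, substituting `replacement` for
# every byte that is invalid or has no UTF-8 mapping.
def coerce_to_utf8(input, replacement = ' ')
  input.encode('UTF-8', invalid: :replace, undef: :replace, replace: replacement)
end

# Sample event data tagged as ASCII-8BIT (binary).
raw = "caf\xC3\xA9".b

# Without replacement options, bytes >= 0x80 cannot be converted from
# ASCII-8BIT, so the conversion raises.
begin
  raw.encode('UTF-8')
rescue Encoding::UndefinedConversionError => e
  puts "conversion failed: #{e.message}"
end

# With explicit coercion, each unconvertible byte is replaced instead.
puts coerce_to_utf8(raw, '?')
```

Note that the replacement is applied per byte: a multi-byte character stored in a binary-tagged string is not reinterpreted as UTF-8, so it becomes several replacement strings.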

Conversion code thanks to
https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud/blob/dbc28575/lib/fluent/plugin/out_google_cloud.rb#L1284

Based on #7

Allow use of ${record}, in addition to ${tag_parts} and ${hostname}, as
interpolated values for sourcetype, source and index.

Switches from the mixin to the Fluentd-provided expander, similar to
https://github.com/kazegusuri/fluent-plugin-prometheus/blob/348c112d/lib/fluent/plugin/prometheus.rb

Added various tests to ensure the old values and the new values work.
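
To make the interpolation idea concrete, here is a minimal stand-alone sketch of placeholder expansion. It is not Fluentd's expander and not the plugin's code: the function name `expand_placeholders`, the `${record["key"]}` and `${tag_parts[N]}` syntax, and the sample values are all assumptions chosen for illustration.

```ruby
# Hypothetical stand-in for placeholder expansion; NOT Fluentd's expander.
# The placeholder syntax handled here is an assumption for this example.
PLACEHOLDER = /\$\{(?:tag_parts\[(?<index>-?\d+)\]|record\["(?<key>[^"]+)"\]|hostname)\}/

def expand_placeholders(template, tag:, hostname:, record:)
  tag_parts = tag.split('.')
  template.gsub(PLACEHOLDER) do
    match = Regexp.last_match
    if match[:index]
      tag_parts[match[:index].to_i].to_s   # ${tag_parts[N]}
    elsif match[:key]
      record[match[:key]].to_s             # ${record["key"]}
    else
      hostname                             # ${hostname}
    end
  end
end

puts expand_placeholders(
  '${tag_parts[1]}-${record["app"]}@${hostname}',
  tag: 'kube.web.access',
  hostname: 'node-1',
  record: { 'app' => 'shop' }
)
```

Unknown placeholders are left untouched in this sketch, and missing record keys expand to an empty string; the real expander may behave differently.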
Fluentd internally encodes events as ASCII-8BIT, while JSON is encoded as
UTF-8 by default.

Work around runtime encoding errors like
`Encoding::UndefinedConversionError: "\xC3" from ASCII-8BIT to UTF-8`
by adding a `coerce_to_utf8` configuration option, which encodes events
to UTF-8 and replaces invalid and undefined characters.

Conversion code thanks to
https://github.com/GoogleCloudPlatform/fluent-plugin-google-cloud/blob/dbc28575/lib/fluent/plugin/out_google_cloud.rb#L1284
@johanneswuerbach johanneswuerbach changed the title Encode utf8 Configurable explicit encoding as UTF-8 Apr 9, 2017
@brycied00d
Owner

@johanneswuerbach Just a heads-up that GitHub is reporting "The key that signed this is expired. GPG key ID: 74DB0F4D956CCCE3"

@brycied00d brycied00d merged commit 0bdeb49 into brycied00d:master Apr 20, 2017
brycied00d added a commit that referenced this pull request Apr 20, 2017
Configurable explicit encoding as UTF-8
brycied00d added a commit that referenced this pull request Apr 20, 2017
This release merges PR #7 and #8
* Use of ${record} for variable interpolation.
* Encoding/reencoding support and configurability.
@johanneswuerbach johanneswuerbach deleted the encode-utf8 branch April 20, 2017 18:23
2 participants