Mark kubectl-sourced strings with invalid encoding as UTF-8 #646

KnVerey · 2019-11-27T19:25:31Z

What are you trying to accomplish with this PR?

Fixes #395 for good.

How is this accomplished?

We know that kubectl is returning utf-8 encoded strings, so if Ruby has interpreted them otherwise, correct it.
Only do the gsub when we're going to use the result.
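
As a rough illustration of the second point, the guard could look something like the sketch below. `log_kubectl_output` is a hypothetical helper, not the actual method in lib/krane/kubectl.rb; it assumes only a standard-library `Logger` and the `output_is_sensitive` flag that appears in the diff.

```ruby
require "logger"

# Hypothetical helper (not the real Krane method): only build the
# whitespace-squashed string when the debug line will actually be written.
def log_kubectl_output(logger, out, output_is_sensitive: false)
  return if output_is_sensitive
  return unless logger.debug? # skip the gsub entirely at INFO and above

  logger.debug("Kubectl out: " + out.gsub(/\s+/, " "))
end
```

With the guard in place, a badly tagged string is only passed to gsub when the logger is at debug level.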

What could go wrong?

I don't have much experience working with encodings, so if this solution is weird, call me out!

KnVerey · 2019-11-27T19:26:52Z

test/unit/krane/kubectl_test.rb

+  def test_debug_level_output_log_uses_correct_encoding
+    logger.level = ::Logger::DEBUG
+    good = "hélas"
+    bad = good.dup.force_encoding(Encoding::US_ASCII)


This obviously isn't how the encoding goes bad in the wild, but this test does reproduce the stack trace from the bug report. Note that we have to dup because string literals are frozen, and force_encoding is considered a modification even though it doesn't touch the characters themselves.
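
For anyone following along, here is a small standalone sketch of what the two test lines above do. The method name `reproduce_bad_encoding` is made up for illustration, and the explicit `.freeze` stands in for a frozen string literal; the ArgumentError from gsub is what I would expect to see, matching the failure class in the bug report.

```ruby
# Illustrative only: shows why dup is needed and that retagging keeps the bytes.
def reproduce_bad_encoding
  good = "h\u00e9las".freeze # stands in for a frozen string literal

  frozen_error = begin
    good.force_encoding(Encoding::US_ASCII) # retagging counts as a modification
    nil
  rescue FrozenError => e
    e
  end

  bad = good.dup.force_encoding(Encoding::US_ASCII)

  gsub_error = begin
    bad.gsub(/\s+/, " ") # regexp matching against invalid bytes
    nil
  rescue ArgumentError => e
    e
  end

  {
    frozen_error: frozen_error,
    same_bytes: bad.bytes == good.bytes, # only the tag changed
    bad_valid: bad.valid_encoding?,
    gsub_error: gsub_error,
  }
end
```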

timothysmith0609

This seems straightforward to me. I can't imagine the responses coming from Kubernetes ever being anything other than UTF-8.

dturn

Even after googling I'm not 100% confident I'd know a correct fix vs. one with a subtle issue.

dturn · 2019-11-29T18:58:12Z

lib/krane/kubectl.rb

-        logger.debug("Kubectl out: " + out.gsub(/\s+/, ' ')) unless output_is_sensitive
+
+        # https://github.com/Shopify/krane/issues/395
+        if out.encoding != Encoding::UTF_8


What if we did this only when ENV["LC_ALL"] or ENV["LANG"] aren't set? I'm not 100% sure, but it seems like in other cases (e.g. utf-16) this might be wrong?

> it seems like in other cases (e.g. utf-16) this might be wrong?

What are you thinking would cause the string we get back from kubectl to actually have a different encoding?

dalehamel · 2019-12-02T14:55:14Z

I wonder what the consequences of ignoring the locales could be...

but if the code assumes UTF-8 it is probably better to assert this.

KnVerey

After doing more reading on encoding, I'm feeling confident that this is the right way to fix the bug, if we want to fix it. The alternate view is that this is really a problem with the user's environment, and they should be responsible for either fixing it globally, invoking krane with the correct env set, or at least invoking ruby with the correct locale (which you can do via ruby -E, though I'm not sure how to do it with a gem executable).

Here are some resources I found informative:

https://ruby-doc.org/core-2.6/Encoding.html
https://www.justinweiss.com/articles/3-steps-to-fix-encoding-problems-in-ruby/
Related problem handling UTF-8 content from an external source in RubyGems: "Broken UTF-8 handling when environment locales are not set" (rubygems/rubygems#139), and its fix: ruby/psych@fe65329.

Reviewers, please take another look and voice your opinion on whether we want this fix.

KnVerey · 2020-01-23T00:14:11Z

lib/krane/kubectl.rb

-        logger.debug("Kubectl out: " + out.gsub(/\s+/, ' ')) unless output_is_sensitive
+
+        # https://github.com/Shopify/krane/issues/395
+        unless out.valid_encoding?


This returns false when the string's bytes are not valid in the encoding the string claims to have. I think this is more strictly correct than my previous solution of checking whether the original tag is something other than UTF-8 and always changing it if so. The trade-off is potentially performance: in this version we have to check the validity of every single string, but we only have to dup the string and change its tag when it contains a byte sequence that is invalid for its tag (in the bug's environment, any non-ASCII character). In the previous version, we never had to check validity, but if your locales are set wrong, we're dup'ing and forcing the encoding on every single string.
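
To make the comparison concrete, here is a side-by-side sketch of the two approaches. Both method names are hypothetical and exist only for this illustration; the bodies mirror the two diff hunks in this thread.

```ruby
# Hypothetical names. First iteration: retag whenever the tag is not UTF-8.
def retag_if_not_utf8(out)
  return out if out.encoding == Encoding::UTF_8

  out.dup.force_encoding(Encoding::UTF_8)
end

# Second iteration: retag only when the bytes are invalid for the current tag.
def retag_if_invalid(out)
  return out if out.valid_encoding?

  out.dup.force_encoding(Encoding::UTF_8)
end
```

For a pure-ASCII string tagged US-ASCII, the first version still dups and retags, while the second returns the original object untouched. For a string with non-ASCII bytes tagged US-ASCII, both return a UTF-8 tagged copy.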

Note that setting Encoding.default_external would also fix the bug, but I don't think we should do that, because as a gem, we should not set globals (not to mention that the docs say "don't set this yourself").

KnVerey · 2020-01-23T00:19:35Z

lib/krane/kubectl.rb

+
+        # https://github.com/Shopify/krane/issues/395
+        unless out.valid_encoding?
+          out = out.dup.force_encoding(Encoding::UTF_8)


After further thought and reading about encodings, I am comfortable saying this is safe to do. The string we're tagging here isn't arbitrary--it's coming from kubectl. I'd be shocked if kubectl decided to suddenly start outputting e.g. UTF-16 encoding. Putting that aside, the worst case is that a user gets a non-UTF8 string and still sees the original bug. If we want to be super paranoid, we can add this code after this line:

unless out.valid_encoding?
  out.encode(invalid: :replace, undef: :replace)
end

That will force the string to be UTF8, with invalid characters replaced with "?". But I just don't think it's worth it. I spent a few minutes trying to figure out how to test that, and couldn't figure out how.
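
For reference, a sketch of what the paranoid variant might look like end to end. Note that it swaps in String#scrub for the encode call quoted above, since scrub is the standard-library method aimed at replacing invalid bytes in place; `utf8_or_scrubbed` is a hypothetical name and none of this is part of the PR.

```ruby
# Hypothetical helper, not part of this PR: retag as UTF-8 and, if the bytes
# are still invalid, replace each invalid sequence with "?".
def utf8_or_scrubbed(out)
  return out if out.valid_encoding?

  retagged = out.dup.force_encoding(Encoding::UTF_8)
  return retagged if retagged.valid_encoding?

  retagged.scrub("?") # last resort: the output is no longer byte-faithful
end
```

Feeding it Latin-1 bytes (for example "h\xE9las" tagged US-ASCII) gives "h?las", which also hints at one way such a case could be tested: build the invalid input from explicit byte escapes.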

KnVerey · 2020-01-23T00:23:09Z

test/unit/krane/kubectl_test.rb

@@ -348,6 +348,29 @@ def test_retry_delay_backoff
    end
  end

+  def test_kubectl_run_fixes_encoding_when_locales_set_to_non_utf8


This is now a much better version of the test: it reproduces the Ruby environment from the bug report (to confirm, start a console session with the wrong locale env vars set and check the Encoding values below). The implementation is inspired by this patch on Psych, which I found via a bug report on RubyGems.

Note that I confirmed this is a true regression test: it fails if I remove my new code.
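
The general shape of that kind of test setup, sketched outside of the Krane test suite: temporarily swap `Encoding.default_external`, read some UTF-8 bytes from an IO, and observe the tag Ruby assigns. Both helper names are hypothetical, and a plain pipe stands in for kubectl's stdout here, so treat this as an approximation of the approach rather than the test in the diff.

```ruby
# Hypothetical test helper: swap the global for the duration of the block,
# silencing the warning Ruby can emit under -w when it is reassigned.
def with_default_external(encoding)
  original = Encoding.default_external
  verbose, $VERBOSE = $VERBOSE, nil
  Encoding.default_external = encoding
  $VERBOSE = verbose
  yield
ensure
  $VERBOSE = nil
  Encoding.default_external = original
  $VERBOSE = verbose
end

# Stand-in for subprocess output: push raw bytes through a pipe and read
# them back, letting Ruby tag the result with the default external encoding.
def read_through_pipe(bytes)
  reader, writer = IO.pipe
  writer.binmode
  writer.write(bytes)
  writer.close
  reader.read
ensure
  reader&.close
  writer&.close
end
```

Calling `with_default_external(Encoding::US_ASCII) { read_through_pipe("hélas") }` should hand back a string tagged US-ASCII whose valid_encoding? is false, which is the state the fix is meant to repair.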

dturn

Let's ship this. If the perf impact becomes an issue we can roll it back, but I think we should try and be helpful when possible and safe.

timothysmith0609 · 2020-01-25T02:59:57Z

lib/krane/kubectl.rb

+
+        # https://github.com/Shopify/krane/issues/395
+        unless out.valid_encoding?
+          out = out.dup.force_encoding(Encoding::UTF_8)


I think the better behaviour here is, as you suggest, to let the original bug come through if they hit it. Relying on undef: feels very close to masking a (potentially) larger problem

KnVerey requested review from dturn, jonpulsifer and timothysmith0609 November 27, 2019 19:27

timothysmith0609 approved these changes Nov 27, 2019

KnVerey force-pushed the kubectl_out_encoding branch from ea4327c to 91c2db8 Compare November 29, 2019 18:48

dturn reviewed Nov 29, 2019

KnVerey requested a review from dalehamel November 29, 2019 19:27

KnVerey added 2 commits January 22, 2020 16:51

Force intepret kubectl response encoding as UTF8

f64ba69

More sophisticated test and check

11ef6ba

KnVerey force-pushed the kubectl_out_encoding branch from 91c2db8 to 11ef6ba Compare January 23, 2020 00:13

KnVerey requested review from dturn and timothysmith0609 January 23, 2020 00:48

dturn approved these changes Jan 23, 2020

timothysmith0609 approved these changes Jan 25, 2020

KnVerey changed the title ~~Force intepret kubectl response encoding as UTF8~~ Mark kubectl-sourced strings with invalid encoding as UTF-8 Jan 27, 2020

KnVerey merged commit 8abe30d into master Jan 27, 2020

KnVerey deleted the kubectl_out_encoding branch January 27, 2020 19:42

timothysmith0609 temporarily deployed to rubygems February 25, 2020 18:04 Inactive
