Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark kubectl-sourced strings with invalid encoding as UTF-8 #646

Merged
merged 2 commits into from
Jan 27, 2020

Conversation

KnVerey
Copy link
Contributor

@KnVerey KnVerey commented Nov 27, 2019

What are you trying to accomplish with this PR?

Fixes #395 for good.

How is this accomplished?

  • We know that kubectl is returning utf-8 encoded strings, so if Ruby has interpreted them otherwise, correct it.
  • Only do the gsub when we're going to use the result.

What could go wrong?

  • I don't have much experience working with encodings, so if this solution is weird, call me out!

def test_debug_level_output_log_uses_correct_encoding
logger.level = ::Logger::DEBUG
good = "hélas"
bad = good.dup.force_encoding(Encoding::US_ASCII)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This obviously isn't how the encoding goes bad in the wild, but this test does reproduce the stack trace from the bug report. Note that we have to dup because string literals are frozen, and force_encoding is considered a modification even though it doesn't touch the characters themselves.

Copy link
Contributor

@timothysmith0609 timothysmith0609 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems straightforward to me. I don't imagine the responses coming from kubernetes to ever be anything other than UTF-8

Copy link
Contributor

@dturn dturn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even after googling I'm not 100% confident I'd know a correct fix vs. one with a subtle issue.

logger.debug("Kubectl out: " + out.gsub(/\s+/, ' ')) unless output_is_sensitive

# https://github.com/Shopify/krane/issues/395
if out.encoding != Encoding::UTF_8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we did this only when ENV["LC_ALL"] or ENV["LANG"] aren't set? I'm not 100% sure, but it seems like in other cases (e.g. utf-16) this might be wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems like in other cases (e.g. utf-16) this might be wrong?

What are you thinking would cause the string we get back from kubectl to actually have a different encoding?

@dalehamel
Copy link
Member

I wonder what the consequences of ignoring the locales could be...

but if the code assumes UTF-8 it is probably better to assert this.

Copy link
Contributor Author

@KnVerey KnVerey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After doing more reading on encoding, I'm feeling confident that this is the right way to fix the bug, if we want to fix it. The alternate view is that this is really a problem with the user's environment, and they should be responsible for either fixing it globally, invoking krane with the correct env set, or at least invoking ruby with the correct locale (which you can do via ruby -E, though I'm not sure how to do it with a gem executable).

Here are some resources I found informative:

Reviewers, please take another look and voice your opinion on whether we want this fix.

logger.debug("Kubectl out: " + out.gsub(/\s+/, ' ')) unless output_is_sensitive

# https://github.com/Shopify/krane/issues/395
unless out.valid_encoding?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This returns false when the string uses characters that aren't in the character set for the encoding the string is claiming to have. I think this is more strictly correct than my previous solution of checking if the original tag is utf8 and always changing it if so. The trade-off is potentially performance: in this version we have to check the encoding on every single string, but we only have to dup the string and change it when a non-ASCII character is found. In the previous version, we never had to check the encoding, but if your locales are set wrong, we're dup'ing and forcing encoding on every single string.

Note that setting Encoding.default_external would also fix the bug, but I don't think we should do that, because as a gem, we should not set globals (not to mention that the docs say "don't set this yourself").


# https://github.com/Shopify/krane/issues/395
unless out.valid_encoding?
out = out.dup.force_encoding(Encoding::UTF_8)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further thought and reading about encodings, I am comfortable saying this is safe to do. The string we're tagging here isn't arbitrary--it's coming from kubectl. I'd be shocked if kubectl decided to suddenly start outputting e.g. UTF-16 encoding. Putting that aside, the worst case is that a user gets a non-UTF8 string and still sees the original bug. If we want to be super paranoid, we can add this code after this line:

          unless out.valid_encoding?
            out.encode(invalid: :replace, undef: :replace)
          end

That will force the string to be UTF8, with invalid characters replaced with "?". But I just don't think it's worth it. I spent a few minutes trying to figure out how to test that, and couldn't figure out how.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the better behaviour here is, as you suggest, to let the original bug come through if they hit it. Relying on undef: feels very close to masking a (potentially) larger problem

@@ -348,6 +348,29 @@ def test_retry_delay_backoff
end
end

def test_kubectl_run_fixes_encoding_when_locales_set_to_non_utf8
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now a much better version of the test that reproduces the Ruby environment from the bug report (to confirm, start a console session with the wrong local envs set and check the Encoding values below). Implementation inspired by this patch on Pysch, which I found via a bug report on RubyGems.

Note that I confirmed this is a true regression test that files if I remove my new code.

Copy link
Contributor

@dturn dturn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's ship this. If the perf impact becomes an issue we can roll it back, but I think we should try and be helpful when possible and safe.


# https://github.com/Shopify/krane/issues/395
unless out.valid_encoding?
out = out.dup.force_encoding(Encoding::UTF_8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the better behaviour here is, as you suggest, to let the original bug come through if they hit it. Relying on undef: feels very close to masking a (potentially) larger problem

@KnVerey KnVerey changed the title Force intepret kubectl response encoding as UTF8 Mark kubectl-sourced strings with invalid encoding as UTF-8 Jan 27, 2020
@KnVerey KnVerey merged commit 8abe30d into master Jan 27, 2020
@KnVerey KnVerey deleted the kubectl_out_encoding branch January 27, 2020 19:42
@timothysmith0609 timothysmith0609 temporarily deployed to rubygems February 25, 2020 18:04 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid byte sequence in US-ASCII exception when locales are not set as expected
4 participants