-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mark kubectl-sourced strings with invalid encoding as UTF-8 #646
Conversation
test/unit/krane/kubectl_test.rb
Outdated
def test_debug_level_output_log_uses_correct_encoding | ||
logger.level = ::Logger::DEBUG | ||
good = "hélas" | ||
bad = good.dup.force_encoding(Encoding::US_ASCII) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This obviously isn't how the encoding goes bad in the wild, but this test does reproduce the stack trace from the bug report. Note that we have to dup
because string literals are frozen, and force_encoding
is considered a modification even though it doesn't touch the characters themselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems straightforward to me. I don't imagine the responses coming from kubernetes to ever be anything other than UTF-8
ea4327c
to
91c2db8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even after googling I'm not 100% confident I'd know a correct fix vs. one with a subtle issue.
lib/krane/kubectl.rb
Outdated
logger.debug("Kubectl out: " + out.gsub(/\s+/, ' ')) unless output_is_sensitive | ||
|
||
# https://github.com/Shopify/krane/issues/395 | ||
if out.encoding != Encoding::UTF_8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we did this only when ENV["LC_ALL"]
or ENV["LANG"]
aren't set? I'm not 100% sure, but it seems like in other cases (e.g. utf-16) this might be wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like in other cases (e.g. utf-16) this might be wrong?
What are you thinking would cause the string we get back from kubectl to actually have a different encoding?
I wonder what the consequences of ignoring the locales could be... but if the code assumes UTF-8 it is probably better to assert this. |
91c2db8
to
11ef6ba
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After doing more reading on encoding, I'm feeling confident that this is the right way to fix the bug, if we want to fix it. The alternate view is that this is really a problem with the user's environment, and they should be responsible for either fixing it globally, invoking krane with the correct env set, or at least invoking ruby with the correct locale (which you can do via ruby -E
, though I'm not sure how to do it with a gem executable).
Here are some resources I found informative:
- https://ruby-doc.org/core-2.6/Encoding.html
- https://www.justinweiss.com/articles/3-steps-to-fix-encoding-problems-in-ruby/
- Related problem handling UTF8 content from an external source in RubyGems: Broken UTF-8 handling when environment locales are not set rubygems/rubygems#139. And its fix: ruby/psych@fe65329.
Reviewers, please take another look and voice your opinion on whether we want this fix.
logger.debug("Kubectl out: " + out.gsub(/\s+/, ' ')) unless output_is_sensitive | ||
|
||
# https://github.com/Shopify/krane/issues/395 | ||
unless out.valid_encoding? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This returns false when the string uses characters that aren't in the character set for the encoding the string is claiming to have. I think this is more strictly correct than my previous solution of checking if the original tag is utf8 and always changing it if so. The trade-off is potentially performance: in this version we have to check the encoding on every single string, but we only have to dup the string and change it when a non-ASCII character is found. In the previous version, we never had to check the encoding, but if your locales are set wrong, we're dup'ing and forcing encoding on every single string.
Note that setting Encoding.default_external
would also fix the bug, but I don't think we should do that, because as a gem, we should not set globals (not to mention that the docs say "don't set this yourself").
|
||
# https://github.com/Shopify/krane/issues/395 | ||
unless out.valid_encoding? | ||
out = out.dup.force_encoding(Encoding::UTF_8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After further thought and reading about encodings, I am comfortable saying this is safe to do. The string we're tagging here isn't arbitrary--it's coming from kubectl. I'd be shocked if kubectl decided to suddenly start outputting e.g. UTF-16 encoding. Putting that aside, the worst case is that a user gets a non-UTF8 string and still sees the original bug. If we want to be super paranoid, we can add this code after this line:
unless out.valid_encoding?
out.encode(invalid: :replace, undef: :replace)
end
That will force the string to be UTF8, with invalid characters replaced with "?". But I just don't think it's worth it. I spent a few minutes trying to figure out how to test that, and couldn't figure out how.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the better behaviour here is, as you suggest, to let the original bug come through if they hit it. Relying on undef:
feels very close to masking a (potentially) larger problem
@@ -348,6 +348,29 @@ def test_retry_delay_backoff | |||
end | |||
end | |||
|
|||
def test_kubectl_run_fixes_encoding_when_locales_set_to_non_utf8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now a much better version of the test that reproduces the Ruby environment from the bug report (to confirm, start a console session with the wrong local envs set and check the Encoding
values below). Implementation inspired by this patch on Pysch, which I found via a bug report on RubyGems.
Note that I confirmed this is a true regression test that files if I remove my new code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's ship this. If the perf impact becomes an issue we can roll it back, but I think we should try and be helpful when possible and safe.
|
||
# https://github.com/Shopify/krane/issues/395 | ||
unless out.valid_encoding? | ||
out = out.dup.force_encoding(Encoding::UTF_8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the better behaviour here is, as you suggest, to let the original bug come through if they hit it. Relying on undef:
feels very close to masking a (potentially) larger problem
What are you trying to accomplish with this PR?
Fixes #395 for good.
How is this accomplished?
What could go wrong?