Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add defensive conversions to display methods, tests #69662

Draft
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

nitzmahone
Copy link
Member

SUMMARY
  • prevent display/debug/verbose messages from causing new failures (eg by increasing verbosity to a level that tries to render a bytes-ish message string)
  • defensive early convert messages and other strings to text
  • adds basic regression tests with mixed bytes/text non-ASCII inputs (that fail without the changes)
ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

lib/ansible/utils/display.py

ADDITIONAL INFORMATION

(this was after a painful debug session where running with -vvv caused a new failure deep in a place where the resultant exception was lost because of an implicit bytes via ascii encoding in a display method that expected text)

@ansibot ansibot added affects_2.10 This issue/PR affects Ansible v2.10 bug This issue/PR relates to a bug. core_review In order to be merged, this PR must follow the core review workflow. needs_triage Needs a first human triage before being processed. support:community This issue/PR relates to code supported by the Ansible community. support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels May 22, 2020
Copy link
Contributor

@jborean93 jborean93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for this, while we run the risk of missing data in the output or log the alternative is for Ansible to fail and bring down everything. I think the trade-off of missing some chars in the log output is worth making Ansible more stable.

@ansibot ansibot added shipit This PR is ready to be merged by Core needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed core_review In order to be merged, this PR must follow the core review workflow. shipit This PR is ready to be merged by Core labels May 22, 2020
@nitzmahone nitzmahone removed the needs_triage Needs a first human triage before being processed. label May 22, 2020
@nitzmahone nitzmahone requested a review from abadger May 22, 2020 00:37
@nitzmahone
Copy link
Member Author

@abadger I know you had a contradictory opinion on this- I think it's better to try and fix potential display issues (which probably would be happening in most cases anyway if all callers are blindly to_texting everything anyway) than to crash Ansible on an unrelated thing when someone cranks up the verbosity to investigate a problem.

Also happy to discuss changes/tests to make it more resilient to those kinds of things, so long as the ultimate goal remains "display methods don't raise exceptions".

@ansibot ansibot added shipit This PR is ready to be merged by Core and removed needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. labels May 22, 2020
Copy link
Contributor

@abadger abadger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, big veto from me on this. This code is either unnecessary (all the changing of string literals) or actively harmful (converting via to_text).

  • If you are running into issues where this would seem to solve a problem, then it means that you have other code where you aren't properly containing your data types which is the real problem that unicode sandwich fixes. converting to text in the called function is hiding those errors and allowing the bad code to multiply (I would throw in a pandemic reference but it's probably too soon ;-). You must not pass around variable values whose type you don't know. You must not send the wrong type of variable to other interfaces. Those rules are really obvious to people when you're dealing with a text object and a DataLoader object, for instance. But when we deal with text and byte strings, for some reason, people start thinking those rules no longer apply. That is wrong and you have to stop thinking that or your code base will degenerate. This is why I didn't really think you should be passing native strings around in the first place and then finally relented on the condition that all native strings are in variables prefixed with n_ just like all byte strings are in variables prefixed with b_. If you're not following that, then you're going to run into these types of issues everywhere (for instance, the place that I ran into when I tried to fix the ansiballz loader earlier... the variables containing native strings hadn't been marked so the code was throwing errors when it was being passed to an interface that expected a text string. Marking the variable would let us know immediately not to do that without a conversion in the caller.) It's not appropriate to throw to_text() conversions into all the receivers in the code. You wouldn't try to handle sh.RunningCommand, pathlib.Path, or Exception objects by demanding that the called code transform all of those into strings before they tried to operate on them. Instead you make the caller do that work. It's the same idea for byte strings. Ansible-1.7 and earlier code made the called code responsible for the transformation (without the convenience functions) and it littered the code with unnecessary conversions and numerous unicode errors because no one ever knew whether the data was guaranteed to be a text string or if it could sometimes be a byte string. Instead, of going back to that, you need to limit your usage of non-text strings to within the borders of a subsystem and use the n_ and b_ convention to keep track of what variables are storing those types so you don't let those variables leak out.
  • You're not messing with just logging here. display(), do_var_prompt(), banner(), and banner_cowsay() are all UI functions, not logging functions. These need to keep taking text. It is entirely appropriate for these functions to throw an error if they were passed the wrong type just as they should error if you passed them a random class instead of a string. (Note: display is both a UI function and the backend for both other UI functions and the logging functions. If you get jimi-c to signoff on some of these functions being logging functions[see next bullet], you should probably stop using display as the backend for both classes of functions. Even if the implementation looks the same for each, at first, except for the types which they accept, it will make the code clearer and easier to modify in the future if you make them two separate things. You may end up with display being a high level, UI function and part of its present code being moved to a low level backend which UI and logging code both use to actually print to the screen. Or you might end up using a logging library with a stdouthandler for the logging backend which I keep saying is the right way to go long term ;-)
  • I would argue, along with you, that error, debug, warn, deprecated, (maybe verbose... it's been a while since I looked at that) and v*()should be considered logging, not ui and I can see where we could decide that logging is its own subsystem which should accept anything and then figure out how it is going to record useful information about what it received at its input border. But jimi-c has never wanted to go along with me on that. Until you change his mind that those are logging, these need to count as UI functions and fall under the same rules as above.
    • If or when you do change his mind, you should not be converting bytes to text. Instead, you should be converting to the text representation of the bytes. The reason is that, if I'm debugging why I'm getting Unicode tracebacks somewhere in the calling code, I need the logging code to be showing me that the variable I passed in was a bytes object, rather than a text object. Converting destroys that information whereas a representation should preserve it. However, that's a little funky... the representation you get from python3 when you call repr on a byte string is good for the unicode sandwich model. It displays that the string is a byte string, as out of the ordinary. The python2 repr does not make the byte string standout. Context switching between the two formats as you debug a problem is a sure way to lose your sanity. It would probably be best to write a new function for the logging functions that check if they're dealing with a byte string specifically and if so, they output something the python3 representation in either case. (ie: all byte strings get logged as b'This is a caf\xc3\xa9').
    • Non-string objects should be treated the same way but I'm also not sure if it's as simple as applying repr or to_text to them... Because python allows objects to define their own str and their own repr methods, sometimes the repr gives more helpful information and sometimes converting to a string does so. I think I've seen more instances where repr would be more helpful in logging but it's definitely a toss up.

(To be clear about what I said in the first line, I wouldn't mind converting all string literals to explicit u"text string" and b"byte string". But all of the code where you made those changes in this PR are just cosmetic differences. It worries me when someone does that while they're trying to rejigger the framework that underpines our unicode work because it doesn't tell me whether they actually understand the issues around bytes, text, and encoding inside of python or if they're making changes with less than the full picture. There's a Unicode blog post or slidedeck that I read where the presenter says that the problem with unicode in python2 is that you can't write solutions to the problem with less than the full picture... You have to understand 95% of it otherwise you will get it wrong. The Unicode sandwich methodology is good because it segments the code into a few pieces where you have to have that 95% understanding and the rest of the code where you can operate safely as long as you follow a small set of rules.)

@ansibot ansibot added needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed shipit This PR is ready to be merged by Core labels May 22, 2020
@samdoran
Copy link
Contributor

There's a lot to unpack here, because text. 🤯

In short, I don't think we should merge this.

It seems to trade short term gain (less hair-pulling during debugging, which is a very reasonable desire) for longer term pain (what if we actually need the type of a passed value but can't get it because our debugging machinery is converting everything to native types?).

If anything, this seems to highlight the need for a separation of UI display functions, which should be resilient (crashing Ansible by increasing verbosity is quite dumb from a user perspective), with debug/logging functions, which should expose the gnarly unicode/bytes information and potential errors in helpful ways for the programmer when needed.

@nitzmahone
Copy link
Member Author

nitzmahone commented May 26, 2020

It seems like we're about 90% in agreement about the underlying problem, but our perspectives and motivations differ (and again, TBC, I'm not against approaching a solution in a different way, but we need to do something sooner than rewriting the entire display/logging infra or saying "fix all the callers", because we no longer control them all).

It is entirely appropriate for these functions to throw an error if they were passed the wrong type

My primary beef here is that they don't. We expect that all our display inputs are text, but we don't enforce it or warn on dangerous inputs (I'm including any non-text-string under the umbrella of "dangerous"). That leads to "works on my machine" development cases where implicit string conversion prevents fail fast behavior when the code is being developed, and only fails on a host configured for ASCII default encoding when non-ASCII characters are present. This leads to end-users hitting unrelated failures because they turned up the verbosity. No matter where you're coming from on any of the technical arguments, I'm sure you'll agree that's not ok, and that's the primary problem I want to fix.

The part I've never gotten is this long-standing preoccupation with using the display methods as some sort of purity test for the string handling of their callers. Is there still stupid and poorly-written string handling happening throughout our codebase? Undoubtedly (and I'm fully willing to own up to writing some of it). Are the display methods an effective forcing function to drive people to change that? I don't think so- it just leads to an endless game of whack-a-mole with folks wrapping dynamically-generated display inputs in to_text when tests/users complain about it.

But all of the code where you made those changes in this PR are just cosmetic differences

Not exactly- I did it to fix test failures under the assumption this PR makes that raising an exception on a non-convertible display input is unacceptable (because several of these methods rely on implicit conversion internally as well, which works fine until it doesn't).

I see three (ish) options:

  1. do nothing, continue to allow display failures to bomb on failed implicit conversion - This sucks for end-users and devs alike, since problems are only caught on high verbosity under "exotic" filesystems, which is exactly when you don't want it to fail. This will also happen more often as we lose control/influence over the callers in collectionized code.
  2. make all display methods resilient to failure with silent implicit conversion (as this PR does), or with warning + traceback (but not exception) on either:
    • any bytes-typed input (preferable for getting callers to fix their code at the expense of short-term noise),
    • only on failed implicit internal conversion (meh, devs are less likely to see and fix)
  3. make all display methods fail fast on any non-text input - painful one-time conversion for all inputs in our repo, and will certainly break collections and 3rd-party callers. This also still has the potential to expose end-users to the "I increased the verbosity and triggered an unrelated bug" problem, though less so since it'd trip on any bytes input instead of one that can't be implicitly coerced, but someone that slapped in some display.vvvvvv() code and never ran it could still trigger it.

Also of note: Python itself has followed a form of 3) with things like os.path.join, where implicit conversion is strictly forbidden. I think that'd be more painful for our users in the short-term (including devs that call our display APIs), but it's still much better than the middle-of-the-road position we've always taken.

@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Jun 4, 2020
@nitzmahone nitzmahone marked this pull request as draft June 15, 2020 16:34
@ansibot ansibot added shipit This PR is ready to be merged by Core needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. and removed needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. shipit This PR is ready to be merged by Core labels Jun 15, 2020
@samdoran samdoran added the ci_verified Changes made in this PR are causing tests to fail. label Jun 19, 2020
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Jun 27, 2020
@ansibot ansibot added pre_azp This PR was last tested before migration to Azure Pipelines. and removed ci_verified Changes made in this PR are causing tests to fail. stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. labels Dec 5, 2020
@ansibot ansibot added the WIP This issue/PR is a work in progress. Nevertheless it was shared for getting input from peers. label Feb 16, 2021
@ansibot ansibot removed the support:community This issue/PR relates to code supported by the Ansible community. label Mar 4, 2021
@ansibot ansibot added the needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html label Mar 2, 2022
* prevent display/debug/verbose messages from causing new failures (eg by increasing verbosity to a level that tries to render a bytes-ish message string)
* defensive early convert messages and other strings to text
* adds basic regression tests with mixed bytes/text non-ASCII inputs (that fail without the changes)
@ansibot ansibot removed needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html pre_azp This PR was last tested before migration to Azure Pipelines. labels Feb 23, 2023
@ansibot ansibot added the stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. label Mar 3, 2023
@ansibot ansibot added the stale_review Updates were made after the last review and the last review is more than 7 days old. label Jul 12, 2023
@ansibot ansibot added the needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html label Sep 12, 2023
@ansibot ansibot added needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html and removed needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html labels Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects_2.10 This issue/PR affects Ansible v2.10 bug This issue/PR relates to a bug. needs_rebase https://docs.ansible.com/ansible/devel/dev_guide/developing_rebasing.html needs_revision This PR fails CI tests or a maintainer has requested a review/revision of the PR. stale_ci This PR has been tested by CI more than one week ago. Close and re-open this PR to get it retested. stale_review Updates were made after the last review and the last review is more than 7 days old. support:core This issue/PR relates to code supported by the Ansible Engineering Team. WIP This issue/PR is a work in progress. Nevertheless it was shared for getting input from peers.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants