Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

W1202 - logging-fstring-interpolation is not useful #2395

Closed
pzelnip opened this issue Aug 8, 2018 · 21 comments
Closed

W1202 - logging-fstring-interpolation is not useful #2395

pzelnip opened this issue Aug 8, 2018 · 21 comments

Comments

@pzelnip
Copy link
Contributor

pzelnip commented Aug 8, 2018

Is your feature request related to a problem? Please describe

In my experiments W1202 is misleading as f-strings performance is virtually equivalent to %-operator formatting in practice.

Describe the solution you'd like

W1202 be deprecated, as it can lead people to not use f-strings, which (if available) are the preferred mechanism for string interpolation.

Additional context

This is related to my inquiry around this warning: #2354

To verify my hypothesis that f-strings are virtually as performant as %-operator formatting I wrote the following snippet to explore timing related to log statements with the 3 approaches (f-strings, .format(), and %-operator). This was on a Django app I'm working on.

import timeit

def timeitex1(num_times):
    fstring = timeit.timeit(
        'logging.info(f"""this is a {thing}, and thing: {thing + "blah"}""")',
        setup="import logging ; thing='1234'",
        number=num_times,
    )

    formattim = timeit.timeit(
        'logging.info("this is a {}, and thing: {}".format(thing, thing + "blah"))',
        setup="import logging ; thing='1234'",
        number=num_times,
    )

    percentop = timeit.timeit(
        'logging.info("this is a %s, and thing: %s", thing, thing + "blah")',
        setup="import logging ; thing='1234'",
        number=num_times,
    )

    return (fstring, formattim, percentop)

With logging output disabled (ie in my Django settings I had: LOGGING["loggers"] = {}), %-formatting and f-strings are virtually identical.

num_times = 10,000:
(0.016536445124074817, 0.02152163698337972, 0.018616185057908297)

num_times = 100,000:
(0.16004435811191797, 0.20005284599028528, 0.1561291899997741)

num_times = 1,000,000:
(1.641325417906046, 2.0023047979921103, 1.6249939629342407)

Returned times are tuples of "time for f-strings", "time for the format() call", and "time for %-formatting".

That's a difference of ~1% (1.641325417906046 vs 1.6249939629342407 seconds) between f-strings & the %-operator on the largest run (.format calls being clearly slower).

With logging enabled, it's a total wash (technically f-strings was the winner in the larger runs but by such a small margin it's inconsequential). This is with log output going to standard out (so I only went up to 100,000 entries as I didn't want to dump 3 million log statements to my console):

num_times = 10,000:
(1.6245153648778796, 2.0137458329554647, 1.6029843359719962)

num_times = 100,000:
(4.39409149507992, 4.51585376705043, 4.532764581032097)

So basically unless you are emitting a ton of logging statements for a level that are never actually logged (say because the LOG_LEVEL is set high), then it doesn't matter. And if you're in that scenario, then really there's no noticeable difference between %-operator formatting or f-strings (just don't do .format() calls).

OTOH, if you are emitting the logging message, then it's likely that the f-string approach will be faster, so following W1202 will actually make your code (extremely slightly) slower.

@Steffan-Ennis
Copy link

Steffan-Ennis commented Aug 16, 2018

Have a read of #1788, fstrings are gone for good

@FrankDMartinez
Copy link

How about an option to exempt f-strings? By default, f-strings would be flagged; with the option, no.

@mcallaghan-bsm
Copy link

mcallaghan-bsm commented Jun 3, 2019

Do we have consensus on this? We too feel this should be deprecated.

As of Python 3.6 the recommended way to do this is with a formatted string literal. https://docs.python.org/3/reference/lexical_analysis.html#f-strings

Note that there are TWO warnings that could come ...

# python3 -m pylint blahblah.py 
************* Module blahblah
blahblah.py:8:13: W1202: Use % formatting in logging functions and pass the % parameters as arguments (logging-format-interpolation)
blahblah.py:11:0: W1203: Use % formatting in logging functions and pass the % parameters as arguments (logging-fstring-interpolation)

This is from:

MY_VAR_BAR = 'ywah'
_LOG.warning('foo %s', MY_VAR_BAR)
_LOG.warning('foo {}'.format(MY_VAR_BAR))
_LOG.warning('foo {MY_VAR_BAR}')
_LOG.warning("foo {MY_VAR_BAR}")
_LOG.warning(f"foo {MY_VAR_BAR}")

go

# python3 blahblah.py
foo ywah
foo ywah
foo {MY_VAR_BAR}
foo {MY_VAR_BAR}
foo ywah

Of note, in our .pylintrc file, we did already bump it to new format for logging-format-style...

[LOGGING]

# Format style used to check logging format string. `old` means using %
# formatting, while `new` is for `{}` formatting.
logging-format-style=new

@earonesty
Copy link

This is a per-project issue. We're writing a filesystem. There are certain points in the code where we have TRACE level logging. These are hit millions of times and interpolation can kill performance. Even DEBUG level f-strings can slaughter the functioning of low level code.

On the other hand, the readability of f-strings should generally be preferred when the number of interpolations isn't significant. In this case, providing developer with linter tools and options that help them make smart decisions is the right choice... not arbitrarily deprecating or enabling an option that has, at least for me personally, rescued seriously broken code.

Many pylint settings fall into this nuanced trade-off range, where considerations of readability, code complexity, etc. need to be balanced against each other and there are not easy choices. These generally cause controversy, and repeated issue tickets (this one is the 3rd!):

#2354
#1788

This could be an indicator that pylint needs more nuanced options to allow configuration of the linter depending on the nature of the project.

See: #3000 for an example.

@mcallaghan-bsm
Copy link

In addition to @earonesty 's comment above, another consideration MAY be that if we add clarity and documentation to these such nuanced warnings, the projects and users can make better educated decisions on their own.

(In this example, the information available at the moment for the fstring-interpolation warning leads to confusion and at worse developers NOT using this for non-performant required code which hurts readability unnecessarily)

@tristanbrown
Copy link

@earonesty, I think the point that's missed here is that the purpose of pylint is not to make every project's algorithmic decisions for them, but to make code generally consistent for readability purposes. The cases where the performance of interpolation matters seem particularly niche, whereas the cases where f-strings enhance readability are broad and varied.

In that sense, -in general- f-strings should be favored by default by style-checkers. The few projects where f-strings cause performance problems can set non-default configurations.

@chdsbd
Copy link

chdsbd commented Jul 16, 2019

Here’s a copy of a comment I made on #1788. #1788 (comment)

It seems like everyone here is talking about performance. There are other benefits to letting logging do the interpolation.

  1. if interpolation fails, it will be logged instead of raising an exception
  2. things like Sentry can aggregate log messages. If you interpolate before logging, Sentry would see it as one event 0

@earonesty
Copy link

I've come around to the idea that f-strings are almost always a bad idea, certainly in logging. The run-time implications and side-effects of the f-string eval() simply don't belong in logging statements.

@tristanbrown
Copy link

I've come around to the idea that f-strings are almost always a bad idea, certainly in logging. The run-time implications and side-effects of the f-string eval() simply don't belong in logging statements.

@earonesty Can you elaborate on those side-effects?

@earonesty
Copy link

earonesty commented Oct 21, 2019

As @chdsbd said, string interpolation is happening at run-time in a way that can lead to exceptions. And while it's abundantly clear what the potential side effects of log.debug("%s", str(a.thing)), it's less clear that log.debug(f"{a.thing}") is the same thing.

@mcallaghan-bsm
Copy link

@earonesty I really don't agree with this as a general rule of thumb. It completely depends on the criticality of your application and frequency of logging. It is definitely possible that some applications may wish to avoid this for robustness and performance reasons, but MOST ~80% of Python programs need not worry.

@earonesty
Copy link

earonesty commented Oct 24, 2019

Sure, they don't have to worry. But linters are all about best practices. And in this case, you have a choice of logging lines that either can or cannot cause exceptions... and either can or cannot be aggregated. Worse... if you're writing a library... you're making this decision on behalf of your users. f-strings just don't improve readability that much in log lines to justify the possibility of all 3 issues: run-time failures, aggregation incompatibility, and performance issues.

@tristanbrown
Copy link

@earonesty it's generally best practice and more pythonic to explicitly catch your exceptions. Trusting the logger to implicitly catch exceptions, but only when the code is contained within the logging statement, seems like a rather odd and unreliable approach.

I disagree about the extent of readability. f-strings put an expression exactly where it should go in a string statement. That's a monumental step up from positional %s replacements, improving readability and reducing the potential for bugs:

Suppose you have a logging string requiring 5 different variable replacements. It's already difficult to see which element of the tuple maps to which %s instance. Now you want to rearrange the message a little bit, or add a variable to the message, or remove one. Now are you sure you've rearranged all of the variables in the tuple correctly? The potential for error here is quite high, especially if you're working with someone else's code.

Most people should be using f-strings where possible. I don't deny that there are edge cases where %s or even .format can work better, but f-strings should be the default.

@earonesty
Copy link

earonesty commented Oct 24, 2019

@tristanbrown

  1. i don't think you understand what kinds of exceptions are handled. the exceptions are those that result from unintentional invalid string conversions. of course, relying on it is absurd. it is equally absurd for your log lines to introduce bugs at runtime. "%s" conversion fixes some, but not all of this. ideally, you could pass f-like-strings for deferred evaluation when used. that way log lines could never break a working program. but python doesn't have that feature yet.

  2. i agree, f-strings are easier to work with when you have larger templates with many variables. not that this is typical for logging. i cloned the top 10 'trending' repos from github just now and not one of them used more than 2 in a log line.

  3. many, many people think that logging with f-strings results in deferred evaluation. i've fought that fire enough times at enough companies to know never to allow it.

@thormick
Copy link

I do see that this is an old, closed issue, but it's the first hit when I Googled this pylint warning, but it's still being referenced here and on SO.

When I test it the way it's tested in the original it seems to impose overhead with getting the logger that muddles the results, more than 40% of the measured runtime being getting the logger over and over. I don't quite see why, logging with logger directly this way should just create a root logger once and get it on each subsequent run, but there's still a significant overhead. Maybe timeit interferes somehow, or there's more to it anyhow. I also wanted to check how formatting with the % operator performs, as it's used on a project I'm working on, and it's a tiny bit faster than using format. There also isn't code in the f-strings on the projects I'm currently on, so I skipped the + "blah" part.

My whole test then becomes:

import timeit


def timeitex2(num_times):
    fstring = timeit.timeit(
        'logging.info(f"""this is a {thing}, and thing: {thing}""")',
        setup="import logging ; thing='1234'",
        number=num_times,
    )
    formattim = timeit.timeit(
        'logging.info("this is a {}, and thing: {}".format(thing, thing))',
        setup="import logging ; thing='1234'",
        number=num_times,
    )
    percentop = timeit.timeit(
        'logging.info("this is a %s, and thing: %s", thing, thing)',
        setup="import logging ; thing='1234'",
        number=num_times,
    )
    formatpercentop = timeit.timeit(
        'logging.info("this is a %s, and thing: %s" % (thing, thing))',
        setup="import logging ; thing='1234'",
        number=num_times,
    )
    fstring2 = timeit.timeit(
        'logger.info(f"""this is a {thing}, and thing: {thing}""")',
        setup="import logging ; thing='1234' ; logger = logging.getLogger(__name__)",
        number=num_times,
    )
    formattim2 = timeit.timeit(
        'logger.info("this is a {}, and thing: {}".format(thing, thing))',
        setup="import logging ; thing='1234' ; logger = logging.getLogger(__name__)",
        number=num_times,
    )
    percentop2 = timeit.timeit(
        'logger.info("this is a %s, and thing: %s", thing, thing)',
        setup="import logging ; thing='1234' ; logger = logging.getLogger(__name__)",
        number=num_times,
    )
    formatpercentop2 = timeit.timeit(
        'logger.info("this is a %s, and thing: %s" % (thing, thing))',
        setup="import logging ; thing='1234' ; logger = logging.getLogger(__name__)",
        number=num_times,
    )
    return (fstring, formattim, percentop, formatpercentop, fstring2, formattim2, percentop2, formatpercentop2)

So the four first results are using logging.info, the last four use the same string formatting style but use a logger object.

I ran this several times until I got a run where every result was lower than for all previous runs:

>>> print(timeitex2(10000000))
(4.543375627021305, 5.602089578984305, 4.252049928996712, 5.226050494937226, 2.43524838401936, 3.362805937998928, 1.9745253099827096, 3.079829377005808)

So while f-strings are only 6% slower in this test when using logger directly:

>>> print( 4.543375627021305 / 4.252049928996712 )
1.0685141761948531

They're 23% slower when testing using a logger instance:

>>> print( 2.43524838401936 / 1.9745253099827096 )
1.2333335874234423

This is still not much if there isn't much logging per second, of course, so I'd say the discussion about correctness vs. readability trumps this in the general case.

To clarify, this is of course when the logs aren't being emitted, when they are this is moot anyhow, and f-strings seem a slight bit more performant, as originally mentioned.

@FrankDMartinez
Copy link

For the record, conversations like this, where people get bogged down in what others should do in their projects instead of giving them the tools to do what they determine they should do and project maintainers appear to not have any interest in supporting said tool (in this case, an option), are why I have stopped using so many tools, pylint being no exception.

@earonesty
Copy link

earonesty commented Feb 12, 2021

@thormick that's fine for integers. what about complex structures where str() is extremely expensive. i've had to debug code where str() was hitting an endpoint. i'm surprised the effect is so large just for ints.

@FrankDMartinez it's easy enough to disable. i disable a lot of pylint things. my pylintrc:

disable=broad-except,
        wrong-import-position,
        wrong-import-order,
        trailing-newlines,
        unidiomatic-typecheck,
        no-else-raise,
        bad-continuation,
        line-too-long,
        too-few-public-methods,
        no-else-return,
        missing-final-newline,
        logging-fstring-interpolation,
        trailing-whitespace,
        len-as-condition,               # change this at your peril!, len!=0 is not the same as truthy for all sequences!
        fixme,
        ungrouped-imports,
        invalid-name,
        consider-using-in,
        too-many-ancestors

@FrankDMartinez
Copy link

@earonesty Thanks. How long have you known about the disabling aspect?

@earonesty
Copy link

earonesty commented Feb 12, 2021

i just thought everyone knew about it :(. bad to assume. didn't notice your first comment was about how to disable it.

@couling
Copy link

couling commented Apr 19, 2021

I'm surprised people spend so long arguing the performance case for this and not mention the code-injection attack this check prevents. Personally I'd have thought the issue of causing errors by using f-strings was of greater importance.

If you pass unsanitized input to logging you can get errors if that input happens to contain %s. So a code block as below is prone to error just because some user typed %s.

value = get_user_input()
logger.info(f"User wrote {value}")

Now I know this linter check doesn't catch the trivial logger.info(value). But in my experience it does catch 99/100 of those bugs just because logging some string without describing it is pretty uncommon.

@Pierre-Sassoulas
Copy link
Member

The configuration for fstring is avaialble since pylint 2.4 and was introduced in eeca6cf.

See, http://pylint.pycqa.org/en/latest/whatsnew/2.4.html?highlight=fstring%20interpolation

Allow the option of f-strings as a valid logging string formatting method.

logging-fstring--interpolation has been merged into logging-format-interpolation to allow the logging-format-style option to control which logging string format style is valid. To allow this, a new fstr value is valid for the logging-format-style option.

I'm going to prevent further discussion here.

@pylint-dev pylint-dev locked as resolved and limited conversation to collaborators Apr 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants