Subtle changes to sprintf in 6.e #519

@lizmat

With RakuAST getting closer to being the default backend, it was time for
me to revisit the work I did on sprintf about 3 years ago (which was
halted because synthetically generated code could not be used in precomp
files; that is now fixed, thanks to @ugexe++).

Background

Perhaps some background first: the current (legacy) implementation of
sprintf actually lives in NQP and has a number of known issues:

  1. slow

Every call to sprintf takes the format string, parses it using the
grammar in NQP, and produces a string from the result of that parse.

  2. errors

Several nooks and crannies produce incorrect strings, e.g.
sprintf("%#-08.2f",0) produces "0000.000" whereas it should produce
"00000.00". Or sprintf("%#o",-64) producing "0-100".

  3. unimplemented features

Some more obscure features are not implemented, such as # in %f
(show decimal point even if the precision is 0), or %F (uppercase
Inf and NaN).

  4. native num based

The NQP implementation is based on native nums and is thus limited in
its precision.

Re-imagine

So with all of that known now (well, actually I only knew about 1.) it
felt like an excellent opportunity to create a new implementation using
RakuAST.

The big difference with the legacy implementation is that the RakuAST
implementation only runs the sprintf grammar once and from that
creates a Callable. A simple example: the format string "foo%xbar"
creates:

-> Int() $a! {
    ("foo", $a ?? $a.base(16).lc !! "0", "bar").join
}

The advantage to this approach is that once the Callable has been
created, it is a piece of code that will be executed (and runtime
optimized) like any other Raku code. And in principle, the above
format could be converted to a Callable at compile time, because
there are no dynamic parts in the format string (which would happen
at the optimize stage in RakuAST, which is still quite underdeveloped
at the moment).
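
To make this concrete, here is a minimal sketch that treats the block
shown above as if it had been written by hand. The name &fmt is only
for illustration and is not part of any API; the real Callable is
generated from the format string, not typed in.

```raku
# Hand-written copy of the block shown above for the format "foo%xbar".
# &fmt is a hypothetical name used only in this sketch.
my &fmt = -> Int() $a! {
    ("foo", $a ?? $a.base(16).lc !! "0", "bar").join
}

# Once created, it is called like any other code; no format parsing happens here.
say fmt(255);     # fooffbar
say fmt("16");    # foo10bar  (the Int() coercion turns the string into 16)
say fmt(0);       # foo0bar
```

The format string is only looked at when the block is created; every
later call just runs the block.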

Anyways, I blogged about it at the time.

Issues / Worries

  1. %s vs $foo

Parsing a format string and turning it into executable code is quite a
bit more involved than parsing it just to create a string. With the
legacy implementation, an old optimization for interpolating a string
was to put the variable in a double quoted format string, instead of
using %s in the format and passing the string as an argument.

This optimization is still valid (to an extent) as long as the contents
of the variable don't change. However, if the contents of that variable
change, every distinct format string needs its own Callable, and
creating one for each change can become a "burden". Some timings:

$ time raku -e 'use v6.d; my $a = "foo"; Nil for ^100000'
real    0.10s
$ time raku -e 'use v6.d; my $a = "foo"; sprintf("zip%s",$a) for ^100000'
real    1.92s
$ time raku -e 'use v6.d; my $a = "foo"; sprintf("zip$a") for ^100000'
real    0.81s
$ time raku -e 'use v6.d; sprintf("zip$_") for ^100000'
real    0.81s
$ time raku -e 'use v6.*; my $a = "foo"; sprintf("zip%s",$a) for ^100000'
real    0.19s
$ time raku -e 'use v6.*; my $a = "foo"; sprintf("zip$a") for ^100000'
real    0.16s
$ time raku -e 'use v6.*; sprintf("zip$_") for ^100000'
real    59.65s

This shows that the new sprintf implementation, even without compile
time caching of the Callable, is roughly 10x to 20x as fast as the old
implementation once the 0.10s of loop overhead is subtracted, except
when the format string is not static, in which case the new
implementation is about 80x slower.

So I feel that might become a gotcha for some code in the ecosystem,
when people are expecting to have things go faster, but in fact would
only see a very significant slowdown.

In RakuAST at CHECK time, it should be possible to issue a "worry"
for the case when a sprintf is being called with a dynamic string.
However, as seen above, if the variable in there is not changing its
value often, it may well be a valid optimization. So YMMV.
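
The difference between a static and a dynamic format string can be
sketched with a small user-level cache. Everything here is an
assumption made for illustration: build-formatter is a hypothetical
stand-in for the expensive "parse the format and generate a Callable"
step (it simply delegates to sprintf), and cached-sprintf and %cache
are not how the 6.e implementation is actually written.

```raku
# Counts how often the (pretend) expensive build step runs.
my $builds = 0;

# Hypothetical stand-in for "parse format string, generate Callable".
sub build-formatter(Str $format) {
    $builds++;
    -> |args { sprintf($format, |args) }   # placeholder body, not the generated code
}

my %cache;   # format string => Callable

sub cached-sprintf(Str $format, |args) {
    %cache{$format} = build-formatter($format) unless %cache{$format}:exists;
    %cache{$format}(|args)
}

# Static format, changing argument: the Callable is built once.
cached-sprintf("zip%s", $_) for ^1000;
say $builds;    # 1

# Dynamic format (interpolated variable): one build per distinct string.
$builds = 0;
%cache = ();
cached-sprintf("zip$_") for ^1000;
say $builds;    # 1000
```

With an interpolated variable that rarely changes, the second loop
would behave like the first, which is why the old optimization can
still pay off in that case.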

  2. Stricter binding

As seen in the generated code example for "foo%xbar", the argument
uses a coercion to produce the Int value that is needed to convert it
to a hexadecimal representation.

-> Int() $a! {
    ("foo", $a ?? $a.base(16).lc !! "0", "bar").join
}

However, this creates a difference in behaviour for type objects
(a difference that is actually "enshrined" in some spectests):

$ raku -e 'use v6.d; dd sprintf("foo%x",Num)'
Use of uninitialized value of type Num in numeric context
  in block <unit> at -e line 1
"foo0"
$ raku -e 'use v6.*; dd sprintf("foo%x",Num)'
Cannot create an Int from a 'Num' type object
  in block <unit> at -e line 1

The stricter coercion rules make passing the Num type object an
execution error, because Num.Int is an execution error.

One could argue that this is an improvement, as in the legacy
implementation it would just create a warning, and produce "0"
and thus hide a problem in the input data.

On the other hand one could argue that this is a regression.

If we're going to consider this a regression, then additional
checking code would have to be inserted, which would slow down
the general execution of sprintf. And if we're going to
do that, we would need to decide whether this is only going
to be for the sprintf function, and not for the new q:o//
adverb in quoting.
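
As a rough idea of what such additional checking code could look like,
here is a hand-written variant of the block above with a guard for
type objects. This is only a sketch under my own assumptions: the name
&lenient-hex is hypothetical, and unlike the legacy implementation it
does not issue a warning.

```raku
# Hypothetical lenient variant of the "foo%xbar" block: any type object
# is mapped to 0 before conversion instead of failing the Int() coercion.
my &lenient-hex = -> $a {
    my Int $i = $a.defined ?? $a.Int !! 0;   # the extra check paid on every call
    ("foo", $i ?? $i.base(16).lc !! "0", "bar").join
}

say lenient-hex(255);   # fooffbar
say lenient-hex(Num);   # foo0bar
```

The .defined test runs on every call, also for perfectly valid input,
which is the slowdown referred to above.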

Suggestions

I think I covered all possible issues with regards to sprintf
in 6.e. But I probably missed a few. So please comment!
